Task 3 - Sequence-based predictions
Revision as of 12:36, 7 May 2013
May 7, 2013: STILL UNDER CONSTRUCTION!
You can start with secondary structure, but I am still updating some of the databases and the tasks for the other prediction features.
In contrast to the vast amount of known protein sequences, information about structure and function is available for only very few proteins. Sequence-based predictions of protein features aim to decrease this gap. Many sequence-based prediction methods use evolutionary information. Sequence alignments are therefore often a prerequisite for the predictions.
Theoretical background talk
The talk will give an introduction to sequence-based protein predictions. In particular:
- secondary structure
- disorder
- transmembrane helices
- signal peptides
- GO terms
Where to run the jobs
- You can log in to the student computer pool from outside: i12k-biolab??.informatik.tu-muenchen.de, where ?? goes from 01 to 10.
- Work in the student computer pool.
- You can also install the programs on your own computer.
Protein sequence databases
Up-to-date
/mnt/project/pracstrucfunc13/data/swissprot/uniprot_sprot (BLAST db, May 7, 2013)
Questions to answer
- What features are predicted?
- Discuss the results for your protein and the example proteins. Using the predictions, what could you learn about your protein and the example proteins? Compare to the available knowledge in UniProt, PDB, DisProt, OPM, PDBTM, Pfam...
- Look for other methods to get an idea of how many different methods are available to predict secondary structure, disorder, transmembrane helices, signal peptides and GO terms. You should be able to name several more methods in the discussion. (You can also try out more methods.)
- What else can be (or is being) predicted from protein sequence alone?
- Which predictions can be improved considerably by structure-based approaches?
Secondary structure
Use ReProf to predict secondary structure for your protein. Apply ReProf to your protein and also to these proteins (UniProt AC):
- P10775
- Q9X0E6
- Q08209
ReProf is installed on the student computers. Usage: reprof (see man reprof). As input, use a FASTA sequence as well as a position-specific scoring matrix (PSSM). The PSSM can be generated with PSI-BLAST or HHblits. ReProf uses the PSI-BLAST PSSM format, i.e. HHblits output will have to be converted.
Compare the ReProf results to PsiPred and DSSP_server (DSSP). Find out more about the example proteins (and yours) using UniProt and the PDB.
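One way to make the comparison between a prediction and DSSP concrete is a per-residue agreement score (often called Q3). The following Python sketch is only an illustration: the reduction of DSSP's eight states to three (H, G, I to helix; E, B to strand; everything else to coil) is a common convention and an assumption here, the function names are hypothetical, and the strings are toy data. Prediction tools may use different letters for the coil state (for example L or C), so harmonize them before comparing.

```python
# Assumed 8-to-3 state reduction (a common convention, not prescribed by the task):
# H, G, I -> H (helix); E, B -> E (strand); everything else -> C (coil/other).
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp_states):
    """Map a DSSP 8-state string to a 3-state string (H, E, C)."""
    return "".join(DSSP_TO_3STATE.get(state, "C") for state in dssp_states)


def q3_agreement(predicted, observed):
    """Return the fraction of positions where two 3-state strings agree."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    if not predicted:
        raise ValueError("strings must not be empty")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)


if __name__ == "__main__":
    # Toy strings for illustration only, not real predictions.
    dssp = "  HHHHGGG EEEEB TT "
    prediction = "CCHHHHHHHCEEEECCCCC"
    observed = reduce_dssp(dssp)
    print(observed)
    print(round(q3_agreement(prediction, observed), 2))
```

The two strings must cover the same residues, so align the predicted sequence to the residues present in the PDB structure first; unresolved residues in the structure have no DSSP state.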
Disorder
Use IUPred to predict disorder for your protein. Apply IUPred to the example proteins given above, too.
IUPred is installed on the student computers. Usage: iupred. Try the different types: "long", "short" and "glob". You can find more information here: /opt/iupred/ (README and example). Try out the IUPred server and have a look at the graphical output.
Compare the results to the information in the DisProt database. Note: DisProt does not cover the complete UniProt list of sequences. If your protein is not in DisProt, you can use "Search by sequence" in DisProt to look for similar proteins. Look for at least one other (interesting!) disordered protein, for example by browsing DisProt.
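To compare per-residue disorder scores with the regions annotated in DisProt, it helps to turn the scores into segments. The Python sketch below assumes that the "long" and "short" output consists of whitespace-separated lines with position, residue and score, and that lines starting with # are comments; check this against the README in /opt/iupred/. The cutoff of 0.5 is a commonly used threshold and is an assumption here, as are the function names.

```python
def parse_iupred(lines):
    """Parse per-residue scores from IUPred-style text output.

    Assumed line layout: position, residue, score, separated by
    whitespace; lines starting with '#' are treated as comments.
    """
    scores = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        scores.append((int(fields[0]), fields[1], float(fields[2])))
    return scores


def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return inclusive (start, end) positions of runs with score >= threshold."""
    regions = []
    start = None
    previous = None
    for position, _residue, score in scores:
        if score >= threshold:
            if start is None:
                start = position
            previous = position
        elif start is not None:
            # The run ended at the previous residue; keep it if long enough.
            if previous - start + 1 >= min_length:
                regions.append((start, previous))
            start = None
    if start is not None and previous - start + 1 >= min_length:
        regions.append((start, previous))
    return regions
```

For example, `disordered_regions(parse_iupred(open("my_protein.iupred")))` would list the predicted segments for a saved output file (the file name is hypothetical). The "glob" type reports globular domains in a different layout, so this parser is not meant for it.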
Transmembrane helices
Use PolyPhobius and MEMSAT-SVM to predict transmembrane helices for your protein and for the following proteins (UniProt IDs given):
- P35462
- Q9YDF8
- P47863
PolyPhobius
In contrast to its precursor Phobius, PolyPhobius uses homology information for the prediction. You input a multiple sequence alignment (MSA) in Kalign format instead of a fasta sequence. You can use any MSA, for example one generated during task 2. PolyPhobius provides scripts, too.
PolyPhobius is installed in /mnt/project/pracstrucfunc13/polyphobius/.
- perl script blastget (/mnt/project/pracstrucfunc12/polyphobius/blastget) to run BLAST. You can use any sequence database, for example big_80. However, we recommend SwissProt. It is fast and the results usually do not improve when using a larger database. You need to create an index for BLAST that blastget uses (see blastget -h).
- Kalign (/mnt/opt/T-Coffee/bin/kalign) to generate the MSA.
- jphobius (/mnt/project/pracstrucfunc12/polyphobius/jphobius) with the MSA as input. Do not forget the -poly parameter.
MEMSAT-SVM
MEMSAT-SVM uses support vector machines (SVMs) to predict transmembrane helices.
Compare the results to the membrane assignment of the structures for these proteins in OPM and/or PDBTM.
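The comparison with OPM or PDBTM can be done by eye, but a small helper makes it easier to count matched, missed and extra helices. The Python sketch below treats helices as inclusive (start, end) residue ranges and counts a predicted helix as matching a reference helix if they share at least five residues. This overlap criterion, the function names and the segments in the usage note are assumptions for illustration only.

```python
def overlap_length(segment_a, segment_b):
    """Number of residues shared by two inclusive (start, end) segments."""
    start = max(segment_a[0], segment_b[0])
    end = min(segment_a[1], segment_b[1])
    return max(0, end - start + 1)


def match_helices(predicted, reference, min_overlap=5):
    """Pair each reference helix with at most one predicted helix.

    A pair counts as a match if the segments share at least min_overlap
    residues (assumed criterion). Returns (matches, missed, extra), where
    missed are unmatched reference helices and extra are unmatched
    predicted helices.
    """
    unused = list(predicted)
    matches = []
    missed = []
    for ref in reference:
        # Pick the still-unused prediction with the largest overlap.
        best = max(unused, key=lambda pred: overlap_length(pred, ref), default=None)
        if best is not None and overlap_length(best, ref) >= min_overlap:
            matches.append((best, ref))
            unused.remove(best)
        else:
            missed.append(ref)
    return matches, missed, unused
```

With the toy input `match_helices([(10, 30), (100, 120)], [(12, 33), (80, 99)])`, the first pair matches, (80, 99) is reported as missed and (100, 120) as extra. Make sure both sets of segments use the same residue numbering (UniProt sequence positions versus PDB residue numbers).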
Signal peptides
Use SignalP to predict signal peptides for the following proteins:
- P02768
- P47863
- P11279
You can look for more example proteins with different signal peptides or targeting signals.
You can run signalp in the student computer pool. This is version 3.0 right now. However, Tim will update to version 4.0 soon. Please note which version you are using. You can also use the SignalP server, version 4.0. For one of the example proteins I got a different prediction from SignalP 3.0 and 4.0.
You can use the Signal Peptide Website to look up the proteins. Check also for predicted transmembrane helices using PolyPhobius.
GO terms
Use GOPET and ProtFun2.0 to predict GO terms for your protein. What can you learn about its function from sequence alone?
Use Pfam -> SEQUENCE SEARCH to find out more about the Pfam family of your protein.