Task 3 - Sequence-based predictions 2012
In contrast to the vast number of known protein sequences, information about structure and function is available for only very few proteins. Sequence-based predictions of protein features aim to narrow this gap. Many sequence-based prediction methods use evolutionary information, so sequence alignments are often a prerequisite for the predictions.
Theoretical background talks
The talks will give an introduction to sequence-based protein predictions. In particular:
- secondary structure
- transmembrane helices
- GO terms
The slides are available here (sorry, too large to upload to the wiki system).
Where to run the jobs
- You can log in to the student computer pool from outside:
  where ?? ranges from 01 to 10.
- Work in the student computer pool.
- You can also install the programs on your own computer.
Use ReProf (available as a Debian package from Rostlab) to predict secondary structure for your protein. Also apply ReProf to these proteins (UniProt IDs given):
Use FASTA sequences for the prediction. You can find out about ReProf usage by running reprof or reading the man page (man reprof). Peter Hoenigschmig (firstname.lastname@example.org) would like to hear about anything that would improve the description or that seems unclear. For help, you can always ask us first.
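Once ReProf has run, the per-residue prediction can be condensed into one secondary structure string for comparison. The following is a minimal sketch, not part of the ReProf distribution. It assumes the output is a whitespace-separated table whose first non-comment line names the columns, including AA (residue) and PHEL (predicted state: H, E or L). The function names and the example rows are made up for illustration, so check the column names against your own output file.

```python
def parse_reprof(text):
    """Return (sequence, secondary_structure) from ReProf-style output text.

    Assumption: lines starting with '#' are comments, and the first
    remaining line is a header that names the columns 'AA' and 'PHEL'.
    """
    header = None
    aa_col = ss_col = None
    seq, sec = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if header is None:
            header = fields
            aa_col = header.index("AA")
            ss_col = header.index("PHEL")
            continue
        seq.append(fields[aa_col])
        sec.append(fields[ss_col])
    return "".join(seq), "".join(sec)


def state_fractions(sec):
    """Fraction of residues predicted as helix (H), strand (E) and loop (L)."""
    if not sec:
        return {}
    return {state: sec.count(state) / len(sec) for state in "HEL"}


# Made-up example rows, only to show the expected layout.
EXAMPLE = "# made-up rows\nNo\tAA\tPHEL\tRI_S\n1\tM\tL\t9\n2\tK\tH\t7\n3\tV\tH\t8\n4\tL\tE\t5\n"

if __name__ == "__main__":
    sequence, structure = parse_reprof(EXAMPLE)
    print(sequence, structure, state_fractions(structure))
```

The resulting string can be placed next to the secondary structure annotated in UniProt or derived from a PDB entry.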
Use IUPred to predict disorder for your protein. Apply IUPred to the example proteins given above, too (run iupred). You can find more information (README and example) here:
Compare the results to the information in the DisProt database. Note: DisProt does not cover the complete UniProt list of sequences. If your protein is not in DisProt, you can use "Search by sequence" in DisProt to look for similar proteins. You can also use DisProt to look for more disordered proteins.
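To compare with DisProt, it helps to turn the per-residue IUPred scores into regions. The sketch below assumes that each non-comment output line holds position, residue and score, and that residues scoring at or above 0.5 are treated as disordered. The threshold, the min_len parameter and the function names are assumptions chosen for illustration, not values prescribed by this task.

```python
def parse_iupred(text):
    """Return the list of per-residue scores from IUPred-style output text.

    Assumption: comment lines start with '#', data lines look like
    '<position> <residue> <score>'.
    """
    scores = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        _position, _residue, score = line.split()[:3]
        scores.append(float(score))
    return scores


def disordered_regions(scores, threshold=0.5, min_len=1):
    """Return 1-based inclusive (start, end) stretches with score >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_len:
        regions.append((start, len(scores)))
    return regions


# Made-up example, only to show the expected layout.
EXAMPLE = "# made-up scores\n1 M 0.90\n2 S 0.80\n3 L 0.20\n4 V 0.10\n5 G 0.60\n6 S 0.70\n7 P 0.70\n"

if __name__ == "__main__":
    print(disordered_regions(parse_iupred(EXAMPLE)))
```

Raising min_len filters out very short stretches, which can make the comparison with the longer regions annotated in DisProt easier to read.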
Use PolyPhobius to predict transmembrane helices for your protein and for the following proteins (UniProt IDs given):
PolyPhobius is installed in
In contrast to its precursor Phobius, PolyPhobius uses homology information for the prediction. First, you have to run a BLAST search. PolyPhobius is distributed with its own Perl script for this purpose (see blastget -h). Use only the -db and -ix parameters. The input is the FASTA sequence of each of the proteins given above. Use SwissProt (/mnt/project/pracstrucfunc12/data/swissprot/uniprot_sprot) as the database and /mnt/project/pracstrucfunc12/data/index_pp/uniprot_sprot.idx as the index.
Use the blastget output to create an MSA using Kalign (
Use the MSA as input for PolyPhobius (see jphobius -h). Do not forget the -poly parameter.
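The predicted helices can be pulled out of the PolyPhobius output for later comparison with OPM or UniProt. This sketch assumes the long output format, in which features appear as lines of the form "FT   TRANSMEM   start   end" (and "FT   SIGNAL   start   end" for a predicted signal peptide). If your output looks different, the parsing has to be adapted. The function name and the example text are made up.

```python
def parse_phobius_long(text):
    """Collect predicted signal peptide and transmembrane segments.

    Assumption: feature lines start with 'FT', followed by the feature
    key and 1-based start and end positions.
    """
    features = {"SIGNAL": [], "TRANSMEM": []}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "FT" and fields[1] in features:
            features[fields[1]].append((int(fields[2]), int(fields[3])))
    return features


# Made-up example, only to show the expected layout.
EXAMPLE = """ID   example
FT   SIGNAL        1     23
FT   REGION        1      5       N-REGION.
FT   TOPO_DOM     24    100       NON CYTOPLASMIC.
FT   TRANSMEM    101    121
FT   TOPO_DOM    122    150       CYTOPLASMIC.
//
"""

if __name__ == "__main__":
    result = parse_phobius_long(EXAMPLE)
    print("signal peptide:", result["SIGNAL"])
    print("TM helices:", result["TRANSMEM"])
```

Keeping signal peptides and transmembrane helices apart is useful here, because an N-terminal signal peptide is easily mistaken for a first transmembrane helix.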
Use SignalP to predict signal peptides for the following proteins:
You can look for more example proteins with different signal peptides or targeting signals.
You can run signalp in the student computer pool. This is currently version 3.0; Tim will update it to version 4.0 soon, so please note which version you are using. You can also use the SignalP server (version 4.0). The versions can disagree: for one of the example proteins, I got different predictions from SignalP 3.0 and 4.0.
You can use the Signal Peptide Website to look up the proteins. Check also for predicted transmembrane helices using PolyPhobius.
Use Pfam -> SEQUENCE SEARCH to find out more about the Pfam family of your protein.
Questions to answer
- What features are predicted?
- Discuss the results for your protein and the example proteins. Using the predictions, what could you learn about your protein and the example proteins? Compare to the available knowledge in UniProt, PDB, DisProt, OPM, Pfam...
- Look for other methods to get an idea of how many different ones are available. (You can also try out more methods.)
- What else could be predicted from protein sequence alone?
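For the comparison with existing knowledge asked for above, a simple per-residue agreement can put a number on how well a prediction matches a reference annotation. This is a minimal sketch under stated assumptions: both inputs are strings of equal length with one state character per residue (for example H/E/L for secondary structure), and the reference string has already been derived from UniProt or a PDB entry by some other means. The function name is made up.

```python
def per_residue_agreement(predicted, reference):
    """Fraction of positions where predicted and reference states match."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


if __name__ == "__main__":
    # Made-up strings: 6 of 7 positions agree.
    print(per_residue_agreement("HHHLLEE", "HHLLLEE"))
```

A single number hides where the disagreements are, so it is worth also looking at the two strings side by side.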