Sequence-based predictions (Phenylketonuria)
This page is still under construction.
Summary
Sequence-based prediction methods can infer a variety of structural and functional properties of a protein from its sequence alone. We applied the following methods to phenylalanine hydroxylase (PAH, UniProt P00439) and, where listed in brackets, to additional example proteins:
- ReProf for secondary structure prediction (P10775, Q9X0E6, Q08209)
- IUPred and MD (MetaDisorder) for disorder prediction (P10775, Q9X0E6, Q08209)
- PolyPhobius and MEMSAT-SVM to predict transmembrane helices (P35462, Q9YDF8, P47863)
- SignalP to predict signal peptides (P02768, P47863, P11279)
- GOPET and ProtFun2.0 to predict GO terms
- Pfam sequence search to identify the Pfam family of our protein
The results are presented and discussed in detail below.
Secondary structure
We wrote a script, filter_seqStruc.pl, that extracts the secondary structure strings from the ReProf, PsiPred and DSSP outputs.
DSSP requires PDB files as input. Empty positions in its output are converted to '-'. The PDB IDs used are:
- P10775: 2BNH
- Q9X0E6: 1VHF
- Q08209: ...
- P00439: 1PAH

Secondary structure letter codes used by each method

Type | ReProf | PsiPred | DSSP
---|---|---|---
Helix (alpha) | H | H | G, H, I
Extended strand (beta) | E | E | B, E
Loops/Turns | L | C | S, T

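The mapping in the table can be written down as a small lookup. The following Python sketch is not the filter_seqStruc.pl script (which is not reproduced on this page); the names `STATE_MAP` and `reduce_states` are made up for illustration, and we assume that every letter missing from the table, including empty DSSP positions, should become '-'.

```python
# Letter codes per method, taken from the table above.
# Reduced alphabet: H = helix, E = strand, L = loop/turn, '-' = no assignment.
STATE_MAP = {
    "reprof": {"H": "H", "E": "E", "L": "L"},
    "psipred": {"H": "H", "E": "E", "C": "L"},
    "dssp": {
        "G": "H", "H": "H", "I": "H",
        "B": "E", "E": "E",
        "S": "L", "T": "L",
    },
}


def reduce_states(structure: str, method: str) -> str:
    """Translate a per-residue structure string into the H/E/L alphabet.

    Assumption: letters that are not listed in the table (for example an
    empty DSSP position given as ' ' or '-') are written as '-'.
    """
    mapping = STATE_MAP[method.lower()]
    return "".join(mapping.get(letter, "-") for letter in structure)


# Made-up DSSP string, only to show the conversion.
print(reduce_states("GGHHI BEE-ST", "dssp"))  # -> HHHHH-EEE-LL
```

Reducing all three outputs to the same alphabet first makes the later position-by-position comparison a plain string comparison.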
P10775 (RNH1)
The following tables compare the ReProf predictions with the DSSP assignment and the PsiPred prediction, and additionally with the secondary structure annotated in UniProt.

Sensitivity of predicted secondary structures against the DSSP structure.

Letter | FASTA | PSSM-Big | PSSM-Swissprot | PsiPred | UniProt
---|---|---|---|---|---
E | 21 | 63 | 81 | 84 | 63
H | 72 | 95 | 92 | 83 | 91
L | 72 | 85 | 79 | 95 | 73
total | 46 | 64 | 64 | 63 | 73

Sensitivity of predicted secondary structures against the PsiPred structure.

Letter | FASTA | PSSM-Big | PSSM-Swissprot | DSSP | UniProt
---|---|---|---|---|---
E | 20 | 69 | 91 | 87 | 60
H | 78 | 100 | 99 | 98 | 96
L | 62 | 77 | 71 | 33 | 0
total | 63 | 84 | 84 | 63 | 42

Sensitivity of predicted secondary structures against the UniProt structure.

Letter | FASTA | PSSM-Big | PSSM-Swissprot | PsiPred | DSSP
---|---|---|---|---|---
E | 22 | 56 | 71 | 73 | 80
H | 74 | 97 | 95 | 89 | 100
L | 0 | 0 | 0 | 0 | 0
total | 31 | 43 | 44 | 42 | 73

The poor total values in the DSSP and UniProt comparisons are caused by the unknown ...
...
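This page does not state how the sensitivity values in the tables were computed. A common definition, assumed here, is: for each state, the number of positions where the compared string and the reference string both show that state, divided by the number of positions where the reference shows it. The Python sketch below implements this definition. The function name `sensitivity` is hypothetical, and the "total" row is computed only over reference positions that carry one of the three states, which may differ from what filter_seqStruc.pl does. The two input strings are invented for illustration and are not data from our proteins.

```python
from collections import Counter


def sensitivity(predicted: str, reference: str, states: str = "EHL") -> dict:
    """Per-state and total sensitivity in percent (rounded).

    A state that never occurs in the reference has no defined sensitivity
    and is reported as None.
    """
    if len(predicted) != len(reference):
        raise ValueError("both structure strings must have the same length")

    in_reference = Counter()  # how often each state occurs in the reference
    agreed = Counter()        # how often the prediction shows the same state
    for pred, ref in zip(predicted, reference):
        in_reference[ref] += 1
        if pred == ref:
            agreed[ref] += 1

    result = {}
    for state in states:
        if in_reference[state]:
            result[state] = round(100 * agreed[state] / in_reference[state])
        else:
            result[state] = None

    # Assumption: positions without an E/H/L reference state are ignored.
    total_ref = sum(in_reference[s] for s in states)
    total_ok = sum(agreed[s] for s in states)
    result["total"] = round(100 * total_ok / total_ref) if total_ref else None
    return result


# Invented example strings, already reduced to the E/H/L alphabet.
print(sensitivity("HHHHLLEEEL", "HHHLLLEEEE"))
# -> {'E': 75, 'H': 100, 'L': 67, 'total': 80}
```

With this definition a reference that contains no positions of a state cannot yield a meaningful value for that state, which is worth keeping in mind when reading rows that are 0 across all columns.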
Q9X0E6 (CUTA)
...
Q08209 (PPP3CA)
...
P00439 (PAH)
...
Disorder
IUPred
IUPred predicts long and short disordered regions as well as globular domains. ...
First, we compiled IUPred with the following command:
cc /opt/iupred/iupred.c -o /mnt/home/student/.../iupred
The program can then be invoked as follows, with exactly one of long, short or glob as the second argument:
iupred sequence.fasta long/short/glob > output.txt
Since IUPred writes its results only to standard output, we redirected the output into a file.
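To run all three modes for one sequence without typing the command three times, the invocation above can be wrapped in a short script. The following Python sketch assumes that the compiled binary can be called as `iupred` (adjustable via the `executable` parameter); the function names and the output file naming scheme are our own choices for illustration and are not part of IUPred.

```python
import subprocess
from pathlib import Path

IUPRED_MODES = ("long", "short", "glob")


def iupred_command(fasta, mode, executable="iupred"):
    """Build the argument list for one call: iupred <fasta> <mode>."""
    if mode not in IUPRED_MODES:
        raise ValueError(f"mode must be one of {IUPRED_MODES}, got {mode!r}")
    return [executable, str(fasta), mode]


def run_iupred(fasta, out_dir=".", executable="iupred"):
    """Run every mode and save standard output to <name>_<mode>.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for mode in IUPRED_MODES:
        out_file = out_dir / f"{Path(fasta).stem}_{mode}.txt"
        with open(out_file, "w") as handle:
            # Same effect as "iupred sequence.fasta long > output.txt".
            subprocess.run(
                iupred_command(fasta, mode, executable),
                stdout=handle,
                check=True,
            )
        written.append(out_file)
    return written


print(iupred_command("sequence.fasta", "long"))
# -> ['iupred', 'sequence.fasta', 'long']
```

Calling `run_iupred("sequence.fasta", "iupred_out")` would then write three files into the (hypothetical) directory `iupred_out`, one per mode.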
MD (MetaDisorder)
MetaDisorder is a ...
The program can be invoked with the following command:
predictprotein --seqfile sequence.fasta --target metadisorder -p output_name -o output-directory
DisProt
DisProt is a database of ...
We could not find exact matches in DisProt for our protein or for two of the example proteins, so we used the best hits returned by the DisProt sequence search with the Smith-Waterman algorithm:
- P00439 (PAH): best hit is Tyrosine 3-monooxygenase (P04177) with an E-value of 1.1e-116
- P10775 (RNH1): best hit is NALP1_HUMAN (Q9C000-1) with an E-value of 1.4e-24
- Q9X0E6 (CUTA): best hit is an uncharacterized protein (Q57696) with an E-value of 0.34
The PSI-BLAST search returned the same best hits except for CUTA. For CUTA, the Smith-Waterman hit had the better (lower) E-value, so we used that hit.
The only protein with an exact match in DisProt was Q08209 (PPP3CA).
Transmembrane helices
...
PolyPhobius
...
P00439 (PAH)
...
P35462 (DRD3)
...
Q9YDF8 (KVAP)
...
P47863 (AQP4)
...
MEMSAT-SVM
...
Signal peptides
...
P00439 (PAH)
...
P02768 (ALB)
...
P47863 (AQP4)
...
P11279 (LAMP1)
...
GO terms
...
Pfam
...
Discussion
Questions:
- What features are predicted?
- Discuss the results for your protein and the example proteins. Using the predictions, what could you learn about your protein and the example proteins? Compare to the available knowledge in UniProt, PDB, DisProt, OPM, PDBTM, Pfam...
- Look for other methods to get an idea of how many different tools are available to predict secondary structure, disorder, transmembrane helices, signal peptides and GO terms. You should be able to name several more methods in the discussion. (You can also try out more methods.)
- What else can be (or is) predicted from protein sequence alone?
- Which predictions can be improved considerably by structure-based approaches?