Difference between revisions of "Gaucher Disease: Task 03 - Sequence-based predictions"
Revision as of 17:21, 11 August 2013
This page is still under construction.
Secondary Structure
In this task, the secondary structure of a protein is predicted using ReProf and compared to the PsiPred prediction and the DSSP structure assignment.
Evaluation results
The evaluation results of ReProf against PsiPred and DSSP are summarized in <xr id="secondary structure results"/>. The ReProf run started from the Psi-BLAST PSSM obtained after a run against big_80 with 3 iterations and an E-value cutoff of 10E-10 (as described in the lab journal in the link above).
<figtable id="secondary structure results">
Query | Precision PsiPred: E | Precision PsiPred: H | Precision PsiPred: L | Precision PsiPred: Total | Precision DSSP: E | Precision DSSP: H | Precision DSSP: L | Precision DSSP: Total |
---|---|---|---|---|---|---|---|---|
P10775 | 47.3 | 99.4 | 59.6 | 72.6 | 42.1 | 96.0 | 62.2 | 78.5 |
Q9X0E6 | 86.1 | 97.2 | 75.9 | 87.1 | 83.0 | 97.3 | 77.0 | 87.8 |
Q08209 | 87.5 | 74.3 | 86.0 | 82.2 | 70.5 | 75.9 | 88.9 | 78.8 |
P04062 | 88.4 | 96.8 | 76.2 | 83.6 | 80.0 | 84.1 | 86.3 | 83.3 |
</figtable>
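Per-state precision values of this kind can be computed with a short script. The sketch below is an assumption about how such numbers are obtained, not the evaluation script used for the table: precision for a state is taken as the share of residues predicted in that state that carry the same state in the reference, "Total" is the share of all compared residues that agree, and reference positions without an assignment (`-` in the DSSP track) are skipped. The function name and the toy strings are hypothetical.

```python
def per_state_precision(predicted, reference, states="EHL", unassigned="-"):
    """Return precision per state and overall agreement, both in percent."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    # Assumption: positions without a reference assignment are ignored.
    pairs = [(p, r) for p, r in zip(predicted, reference) if r != unassigned]
    result = {}
    for state in states:
        # Reference states at all positions predicted as `state`.
        hits = [r for p, r in pairs if p == state]
        result[state] = 100 * hits.count(state) / len(hits) if hits else None
    agree = sum(1 for p, r in pairs if p == r)
    result["Total"] = 100 * agree / len(pairs) if pairs else None
    return result


# Toy example (hypothetical strings, not taken from the alignment).
toy = per_state_precision("HHHEEL", "HHLEE-")
print(toy)
```

A state that is never predicted at a compared position yields `None` instead of a percentage, so it cannot be mistaken for a precision of zero.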
P04062
Aligned view
Aligned view of the secondary structure predictions with ReProf and PsiPred, the DSSP assignment and the UniProt annotation for the Gaucher's disease protein, P04062, is shown below.
Sequence: MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHT
ReProf:   LLLLLLHHHHLLHHLLHHHHHHHHHHHHHHHHHHHHHHLLLLLLEEELLLLLEEEEEELLLLLLLLLLLLLLLLEEEEEEELLLLLEEEEELLLLLLLLL
PsiPred:  LLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHHLLLLLLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLLL
DSSP:     ----------------------------------------E---EEE-LLLLEEEEEELL---E--------LLEEEEEEEELLL--LEEEEEE-ELL--
UniProt:  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLEEEELEEEELLLLLLLLLLLLLLLLLEEEEEEEELLLLLEEEEEEELEEELL

Sequence: GTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLI
ReProf:   LLLEEEEEELLLLEEEEEEELEEHHHHHHHHHHLLLHHHHHHHHHHHLLLLLLLEEEEEEELLLLLLLLLLLEEELLLLLLLLLLLLLLHHHHHHHHHHH
PsiPred:  LLLEEEEEELLLLLLEEEEEELLLLHHHHHHHHLLLHHHHHHHHHHLLLLLLLLLEEEEEELLLLLLLLLLLLLLLLLLLLLLLLLLLLHHLLLLLHHHH
DSSP:     --LEEEEEEEEEEEEE--EEEEE--HHHHHHHLLL-HHHHHHHHHHHHLLLLL---EEEEEEL--LLLLL---L--LLL-LL-LL----HHHHHHHHHHH
UniProt:  LLEEEEEEEEEEEEEELLEEEEELLHHHHHHHHLLLHHHHHHHHHHHHLLLLLLLLEEEEEEELLEEEEELLLLLLEEELLLLLLLLLLHHHHLLHHHHH

Sequence: HRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIA
ReProf:   HHHHHHLLLLEEEEEELLLLLLEEEELLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHHLLLLEEEEELLLLLLLLLLLLLLLLLELLLHHHHHHHHH
PsiPred:  HHHHHHLLLLLEEEELLLLLLLLLLLLLLLLLLLLLLLLLLLHHHHHHHHHHHHHHHHHHHLLLLLLEEEELLLLLLLLLLLLLLLLLLLLHHHHHHHHH
DSSP:     HHHHHH-LL--EEEEEEL---HHHELL-LLLLL-EELL-LLLHHHHHHHHHHHHHHHHHHHLL---LEEEL-LLLLHHHLLL--L---E--HHHHHHHHH
UniProt:  HHHHHHLLLLLEEEEEEELLLHHHEEELEEEEELEEEELLLLHHHHHHHHHHHHHHHHHHHLLLLLEEEEELLLHHHHHLLLLEEELLLLLHHHHHHHHH

Sequence: RDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGM
ReProf:   HHHHHHHHHLLLLLEEEEEEELLLLLLHHHHHHHLLLHHHHHHHHHLEEEELLLLLLLLHHHHHHHHHHLLLLLEEEEEEEELLLLLLLLLLLLLHHHHH
PsiPred:  HHHHHHHHLLLLLLEEEEEELLLLLLLLLLLHHHLLLHHHHHHLLEEEEELLLLLLLHHHHHHHHHHHHLLLLEEEEEELLLLLLLLLLLLLLLLHHHHH
DSSP:     HHHHHHHHLLLLLLLEEEEEEEEHHHLLHHHHHHHLLHHHHLL--EEEEEEELLL---HHHHHHHHHHH-LLLEEEEEEEE----LLL-L--LL-HHHHH
UniProt:  HLHHHHHHLLLLLLEEEEEEEEEHHHLLHHHHHHHLLHHHHLLLLEEEEEEELHHHLLHHHHHHHHHHHLLLEEEEEEEEELLLLEEELLLLLLLHHHHH

Sequence: QYSHSIITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAVALMHPDGSAVVVVL
ReProf:   HHHHHHHHHHHHHLHHEEEEEEEELLLLLLLLLLLLEEEEEEEELLLLEEEEELLEHHHHEEEELLLLLLEEEEEELLLLLLEEEEEEELLLLLEEEEEE
PsiPred:  HHHHHHHHHHHLLLEEEEEELLLLLLLLLLLLLLLLLLLLEEEELLLLEEEELLLEEEEEHHLLLLLLLLEEEEEELLLLLLLEEEEEELLLLLEEEEEE
DSSP:     HHHHHHHHHHHLLEEEEEEEEL-E-LLL---LL------LEEEEHHHLEEEE-HHHHHHHHHHLL--LL-EEEEEEELL--LEEEEEEE-LLL-EEEEEE
UniProt:  HHHHHHHHHHHLLEEEEEEEEEEELEEELLLLLLLLLLLEEEEEHHHLEEEELHHHHHHHHHHLLLLLLLEEEEEEEEELLEEEEEEEELLLLLEEEEEE

Sequence: NRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ReProf:   LLLLLLEEEEEEELLLEEEEEELLLLEEEEEEEELL
PsiPred:  ELLLLLEEEEEELLLLLEEEEELLLLEEEEEEEELL
DSSP:     E-LLL-EEEEEEELLLEEEEEEE-LLEEEEEEE---
UniProt:  ELEEELEEEEEEELLLEEEEEEELLLEEEEEEELLL
Comparison to available knowledge
Here we compare the secondary structure predictions and DSSP assignment to the available knowledge in UniProt and PDB.
UniProt
The UniProt secondary structure annotation assigns residues to one of three states: helix, strand or turn. The annotation might be unreliable if no experimental evidence is available for the protein. However, the existence of our protein, P04062, was verified at the protein level, so we can rely on the annotation to some extent.
The UniProt secondary structure annotation for P04062 is shown in the image above. It is also included in the alignment in the previous section, where both turns and positions not in one of the three states (helix, strand or turn) are treated as loops. The main difference lies near the beginning of the sequence: ReProf and PsiPred both predict one long helix there, and ReProf additionally predicts two short helices before it (with 4 and 2 residues), whereas UniProt annotates only loops (and DSSP has no assignment there). Overall, the secondary structures look very similar, apart from small disagreements in the exact position and length of a segment and short segments that are not present in every track. The latter may be falsely predicted or assigned.
PDB
1OGS consists of two identical chains, A and B. In the cartoon representation of one of the chains, colored by secondary structure, one can see that it contains many alternating helices and sheets connected by loops. The beta-barrel fold can be recognized easily. This supports our predictions, the DSSP assignment and the UniProt annotation of the secondary structure of the protein sequence P04062.
- TODO: Make alignments and comparison to UniProt and PDB for the other 3 proteins.
Redo the structure visualization with VMD with right colors.
protein
Aligned view
Comparison to available knowledge
PDB
UniProt
Discussion
From the secondary structure predictions and assignments, we learned that the secondary structure of our protein consists mainly of helices and strands. It is a dimer, and each chain folds into a beta-barrel domain.
- TODO: Discuss the results also for the example proteins. Using the predictions, what could you learn about the example proteins?
Disorder
In this task we predict disordered and globular protein regions using IUPred and MetaDisorder.
Lab journal
IUPred
IUPred is a protein disorder predictor. The user can choose one of three options:
- long for prediction of long disorders
- short for prediction of short disorders
- glob for prediction of structured, globular domains
The IUPred prediction results for each protein are presented and described in the plots below. The disorder tendency ranges from 0 to 1 and is plotted for each residue in a protein sequence. Residues with a tendency above 0.5 are considered disordered.
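The 0.5 cutoff can be applied to a list of per-residue scores to obtain disordered regions. This is a minimal sketch: the function name and the example scores are made up for illustration and are not IUPred output.

```python
def disordered_regions(scores, threshold=0.5):
    """Return 1-based inclusive (start, end) runs with score above threshold."""
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos  # a new disordered run begins here
        elif start is not None:
            regions.append((start, pos - 1))  # the run ended one residue earlier
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run reaches the C-terminus
    return regions


# Hypothetical scores for a 7-residue toy sequence.
example = disordered_regions([0.9, 0.7, 0.6, 0.3, 0.2, 0.5, 0.8])
print(example)
```

A score of exactly 0.5 is treated as ordered here, following the wording "above 0.5".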
IUPred results for protein P04062. There is almost no difference between the "long" and "short" predictions; the latter only predicts more disorder at the beginning and the end of the protein. According to "glob", almost the whole protein sequence, from position 4 to the end (536), lies in a globular domain. Only a short region consisting of the first three residues is predicted to be disordered.
IUPred results for protein P10775. There is almost no difference between the "long" and "short" predictions; the latter only predicts more disorder at the beginning and the end of the protein. According to "glob", the whole protein sequence (456 residues long) is predicted to lie in a globular domain.
IUPred results for protein Q9X0E6. There is almost no difference between the "long" and "short" predictions; the latter only predicts more disorder at the beginning and the end of the protein. According to "glob", the whole protein sequence (101 residues long) is predicted to lie in a globular domain.
IUPred results for protein Q08209. There are only small deviations between the "long" and "short" predictions: "long" predicts more disorder at the end of the protein, whereas "short" predicts more disorder at the beginning and the very end of the protein. According to "glob", the major part of the protein sequence, from position 5 to 446 of a total of 521 residues, lies in a globular domain. Therefore, two disordered regions are predicted: one consisting of only four residues at the beginning and one containing 75 amino acids at the end of the protein.
MetaDisorder
MetaDisorder (MD) is a meta-predictor that combines several prediction methods:
- NORSnet: prediction of unstructured loops
- PROFbval: prediction of residue flexibility from sequence
- Ucon: prediction of protein disorder using predicted internal contacts
In addition to the prediction scores of the three predictors, MD outputs the final decision on disorder as well as MDrel, the reliability of the final prediction, whose values range from 0 to 9 (9 is the strongest prediction). The raw prediction scores and the final MD score for each of the four proteins are visualized in the following plots.
MetaDisorder results for protein P04062. The MD prediction (red line) looks very similar to the IUPred prediction: the very beginning of the protein is predicted to be disordered. The predictions of the individual programs look very different and fluctuate strongly. PROFbval (blue line) outputs higher scores (frequently over 0.5) than NORSnet (green line) and Ucon (purple line), although Ucon has some high peaks over 0.5. Overall, the MD prediction seems more reliable than the predictions of the stand-alone methods.
MetaDisorder results for protein P10775. The MD results are very similar to the IUPred "short" results: the very beginning and end of the protein seem to be slightly disordered, but the score rises only slightly above 0.5, so this prediction is not very reliable. The PROFbval prediction is the most similar to MD because of the higher scores at the ends, although its scores are higher overall (often over 0.5). NORSnet and Ucon output lower scores, which are lower still at the ends; Ucon sometimes has high peaks. Again, MD seems to predict disordered regions better than the individual programs.
MetaDisorder results for protein Q9X0E6. Compared to the IUPred prediction, the MD, PROFbval and Ucon predictions look very different and fluctuate, predicting several disordered regions that are not present in the IUPred results. Only NORSnet, like IUPred, predicts the whole protein as lacking unstructured loops. The worse prediction of MD may be caused by the short length of this protein.
MetaDisorder results for protein Q08209. Like IUPred, MD predicts disorder at the beginning and the end of the protein. The NORSnet prediction is also similar, but it predicts less disorder at the very beginning and more disorder after it (approx. positions 5-40). Interestingly, both MD and NORSnet predict slight peaks around positions 240 and 375. The two other methods, Ucon and PROFbval, give strongly fluctuating predictions with higher scores and many peaks. Here MD and NORSnet seem to make more reliable predictions.
DisProt
Q08209 is the only one of the four proteins that could be found in DisProt by ID. The protein has 5 disordered regions, 2 of which overlap. These regions lie at the ends of the sequence. The 6th region is ordered, lies in the core of the sequence and is also the longest region. All regions map to the PDB structure 1AUI:A.
Region | Type | Name | Location | Length | Structural/functional type | Functional classes | Functional subclasses |
---|---|---|---|---|---|---|---|
1 | Disordered - Extended | | 1 - 13 | 13 | Relationship to function unknown | Unknown | Unknown |
2 | Disordered - Extended | | 374 - 468 | 95 | Function arises via a disorder to order transition | Molecular assembly | Autoregulatory, Protein-protein binding |
3 | Disordered - Extended | CaM-binding domain | 390 - 414 | 25 | Function arises via a disorder to order transition | Molecular assembly | Protein-protein binding, Autoregulatory |
4 | Disordered - Extended | Autoinhibitory region | 469 - 486 | 18 | Function arises via an order to disorder transition and vice versa | Molecular assembly | Protein-protein binding, Autoregulatory |
5 | Disordered - Extended | | 487 - 521 | 35 | Relationship to function unknown | Unknown | Unknown |
6 | Ordered | | 14 - 373 | 360 | | | |
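The statement that two of the disordered regions overlap can be checked directly from the coordinates in the table above. A minimal sketch, with the region boundaries copied from the table and hypothetical helper names:

```python
from itertools import combinations

# Disordered regions of Q08209 as listed in the DisProt table (1-based, inclusive).
disordered = {
    1: (1, 13),
    2: (374, 468),
    3: (390, 414),
    4: (469, 486),
    5: (487, 521),
}


def region_length(region):
    """Length of an inclusive (start, end) region."""
    start, end = region
    return end - start + 1


def overlapping_pairs(regions):
    """Return pairs of region numbers whose residue ranges share a position."""
    pairs = []
    for a, b in combinations(sorted(regions), 2):
        (a_start, a_end), (b_start, b_end) = regions[a], regions[b]
        if a_start <= b_end and b_start <= a_end:
            pairs.append((a, b))
    return pairs


print(overlapping_pairs(disordered))
print({number: region_length(r) for number, r in disordered.items()})
```

Only regions 2 and 3 overlap: the CaM-binding domain (390-414) lies entirely inside region 2 (374-468), while regions 4 and 5 start directly after their predecessors without sharing a residue.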
Transmembrane Helices
Four proteins, including the protein that causes Gaucher's disease, were analysed with respect to transmembrane helices. The prediction tools used differ in the features they analyse. While PolyPhobius only distinguishes between residues that are part of a transmembrane helix and residues that lie inside or outside the cytoplasm, MEMSAT-SVM also predicts re-entrant helices and pore-lining helices. Since pore-lining helices are also transmembrane helices, this kind of helix is detected by both prediction tools. For re-entrant helices the two programs differ. In general, a membrane helix crosses the membrane, so that its two ends lie on different sides of the membrane. In contrast, a re-entrant helix has both its ends on the same side of the membrane. MEMSAT-SVM can predict re-entrant helices, but PolyPhobius either treats such a helix as a general membrane helix that crosses the membrane (seen for Q9YDF8) or ignores it (seen for P47863). As a consequence, when re-entrant helices are involved, the C-terminus or the N-terminus may be predicted on different membrane sides, and some helices may be predicted to run in a different direction through the membrane.
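How re-entrant helices affect the predicted topology can be illustrated with a small sketch: every helix that crosses the membrane moves the chain to the other side, while a re-entrant helix returns to the side it started from. The function and the labels "TM" and "RE" are hypothetical and only count crossings; they are not part of either prediction tool.

```python
SIDES = ("cytoplasmic", "extracellular")


def c_terminal_side(n_terminal_side, helix_kinds):
    """Derive the C-terminal side from the N-terminal side and the helix types.

    helix_kinds lists the helices from N- to C-terminus, "TM" for a helix that
    crosses the membrane and "RE" for a re-entrant helix.
    """
    if n_terminal_side not in SIDES:
        raise ValueError(f"unknown side: {n_terminal_side}")
    # Only crossing helices flip the side; re-entrant helices do not.
    crossings = sum(1 for kind in helix_kinds if kind == "TM")
    return SIDES[(SIDES.index(n_terminal_side) + crossings) % 2]


# Six crossing helices plus one re-entrant helix, cytoplasmic N-terminus.
with_reentrant = c_terminal_side("cytoplasmic", ["TM"] * 3 + ["RE"] + ["TM"] * 3)
# The same seven helices all read as crossing helices, extracellular N-terminus.
all_crossing = c_terminal_side("extracellular", ["TM"] * 7)
print(with_reentrant, all_crossing)
```

Both calls end with a cytoplasmic C-terminus even though the N-terminal sides differ, which is the pattern reported for the two Q9YDF8 predictions in the comparison table further down. The position of the re-entrant helix in the first call is arbitrary, since only the number of crossings matters.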
Human Glucosylceramidase
This protein is not a membrane protein; it is located on the extracellular side of the membrane, as documented in OPM. For the same reason there is no entry in PDBTM, as this database only contains membrane proteins. The PolyPhobius prediction leads to the same result. In addition, PolyPhobius predicted the signal peptide (including the N/H/C-regions). MEMSAT-SVM detected a false positive transmembrane helix. As glucosylceramidase cleaves lipids of cell membranes, the active site of the enzyme may be mistaken for a transmembrane helix.
Comparison of TMH for Glucosylceramidase (P04062, human)
Feature | MEMSAT-SVM (prediction) | PolyPhobius (prediction) | OPM (assignment) | PDBTM (assignment) |
---|---|---|---|---|
# of TMH | 1 | - | - | - |
TMH Topology | 456-471 | - | - | - |
N-terminal | extracellular | extracellular | extracellular | - |
C-terminal | cytoplasmic | extracellular | extracellular | - |
Signal peptide | 1-34 | 1-40 | - | - |
Re-entrant Helix | - | - | - | - |
Pore-lining Helix | 1 | - | - | - |
Graphical position | - | |||
more information | P04062 | 1OGS | 1OGS is not in the PDBTM |
Aeropyrum pernix Voltage-gated potassium channel
For this protein of the archaeon Aeropyrum pernix, 4 different PDB IDs were found:
- 1ORQ: chain C
- 1ORS: chain C
- 2A0L: chain A/B
- 2KYH: chain A
As all these PDB IDs represent structures of different chains, which are not identical, it was difficult to choose one of them. In the end, 1ORS was chosen for two reasons. Its X-ray structure has the highest resolution compared to the others. In addition, this structure represents a sensor domain, which must be important for the protein. The predictions give completely different results from the assignments. As the predictions are more similar to each other, they were compared to each other; the same was done for the two assignments. Both predictions have the same number of helices. Nevertheless, some helices deviate considerably in their position. MEMSAT-SVM predicted a re-entrant helix where PolyPhobius detected a transmembrane helix. That is why the two programs predict the N-terminus on different sides.
The OPM assignment actually has one helix more, but only because it declares its helices differently from PDBTM. The third helix of PDBTM consists of two shorter consecutive helices. Together they form one larger helix that crosses the membrane once, and they are therefore counted as one helix in PDBTM. In OPM, these two mini-helices, which would each be too short to cross the membrane alone, are counted separately. Apart from a slight deviation of a few residues at the ends of the helices, the structure is the same in both databases.
Comparison of TMH for Voltage-gated potassium channel (Q9YDF8, Aeropyrum pernix)
Feature | MEMSAT-SVM (prediction) | PolyPhobius (prediction) | OPM (assignment) | PDBTM (assignment) |
---|---|---|---|---|
# of TMH | 6 | 7 | 5 | 4 |
TMH Topology | 1. 43-59 2. 72-90 3. 101-118 4. 128-143 5. 163-184 6. 188-217 7. 221-245 | 1. 42-60 2. 68-88 3. 108-129 4. 137-157 5. 163-184 6. 196-213 7. 224-244 | 1. 25-46 2. 55-78 3. 86-97 4. 100-107 5. 117-148 6. - 7. - | 1. 27-50 2. 55-75 3. 88-107 4. - 5. 118-142 6. - 7. - |
N-terminal | cytoplasmic | extracellular | cytoplasmic | cytoplasmic |
C-terminal | cytoplasmic | cytoplasmic | cytoplasmic | cytoplasmic |
Signal peptide | - | - | ||
Re-entrant Helix | 1 | - | ||
Pore-lining Helix | 1 | - | ||
Graphical position | ||||
more information | Q9YDF8 | 1ORS | 1ORS |
Rat Aquaporin-4
Both predictions give results similar to the assignments of OPM and PDBTM. All predicted transmembrane helices differ in their position by only a few residues. The protein consists of 6 transmembrane helices and 2 re-entrant helices. PolyPhobius skips the re-entrant helices but predicts the remaining membrane helices well. MEMSAT-SVM predicts the re-entrant helices at positions similar to those in the database entries. Unfortunately, MEMSAT-SVM predicts the orientation in the membrane incorrectly: instead of placing the C- and N-termini in the cytoplasm, it places both ends in the extracellular region.
The two assignments are very similar. OPM does not explicitly mark two of its helices as re-entrant, but both can be recognized as re-entrant in the OPM visualisation. In PDBTM, the re-entrant helices are colored gold and stand out only slightly against the yellow transmembrane helices. All pictures can be found in the table below.
Comparison of TMH for Aquaporin-4 (P47863, rat)
Feature | MEMSAT-SVM (prediction) | PolyPhobius (prediction) | OPM (assignment) | PDBTM (assignment) |
---|---|---|---|---|
# of TMH | 8 | 6 | 8 (per chain) | 8 (per chain) |
TMH Topology | 1. 35-56 2. 71-89 3. 93-109 4. 113-136 5. 157-178 6. 190-205 7. 209-225 8. 232-252 | 34-58 70-91 115-136 156-177 188-208 231-252 | 34-56 70-88 98-107 112-136 156-178 189-203 214-223 231-252 | 39-55 72-89 95-106 116-133 158-177 188-205 209-222 231-248 |
N-terminal | extracellular | cytoplasmic | cytoplasmic | cytoplasmic |
C-terminal | extracellular | cytoplasmic | cytoplasmic | cytoplasmic |
Signal peptide | 1-20 | |||
Re-entrant Helix | 2 | 95-106 209-222 | ||
Pore-lining Helix | 4 | |||
Graphical position | ||||
more information | P47863 | 2D57 | 2D57 |
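Which assigned helices a predictor misses can be read off by checking every assigned segment for an overlapping predicted segment. The sketch below uses the PolyPhobius and PDBTM coordinates for P47863 from the table above; the helper names and the minimum overlap of one residue are assumptions chosen for illustration.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def unmatched(reference, predicted, min_overlap=1):
    """Reference segments that no predicted segment overlaps sufficiently."""
    return [
        ref for ref in reference
        if all(overlap(ref, pred) < min_overlap for pred in predicted)
    ]


# Coordinates for P47863 copied from the comparison table.
polyphobius = [(34, 58), (70, 91), (115, 136), (156, 177), (188, 208), (231, 252)]
pdbtm = [(39, 55), (72, 89), (95, 106), (116, 133),
         (158, 177), (188, 205), (209, 222), (231, 248)]

missed = unmatched(pdbtm, polyphobius)
print(missed)
```

The two PDBTM segments without a PolyPhobius counterpart are 95-106 and 209-222, the same two ranges that appear in the Re-entrant Helix row of the table.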
Human D3 dopamine receptor
Comparison of TMH for D3 dopamine receptor (P35462, human)
Feature | MEMSAT-SVM (prediction) | PolyPhobius (prediction) | OPM (assignment) | PDBTM (assignment) |
---|---|---|---|---|
# of TMH | 6 | 7 | 7 | 7 |
TMH Topology | 32-55 65-88 101-129 151-169 188-209 331-354 | 30-55 66-88 105-126 150-170 188-212 329-352 367-386 | 34-52 67-91 101-126 150-170 187-209 330-351 363-386 | 35-52 68-84 109-123 152-166 191-206 334-347 368-382 |
N-terminal | extracellular | extracellular | extracellular | extracellular |
C-terminal | extracellular | cytoplasmic | cytoplasmic | cytoplasmic |
Signal peptide | 1-29 | - | ||
Re-entrant Helix | - | - | - | |
Pore-lining Helix | 1 | |||
Graphical position | ||||
more information | P35462 | 3PBL | 3PBL |
Signal Peptides
For the following proteins, the signal peptides and their cleavage sites were predicted with SignalP:
- Glucosylceramidase (P04062, human)
- Serum albumin (P02768, human)
- Lysosome-associated membrane glycoprotein 1 (P11279, human)
- Aquaporin 4 (P47863, rat)
The four eukaryotic proteins were also looked up in the Signal Peptide Database to compare the entries with the prediction results.
Glucosylceramidase (P04062)
For glucosylceramidase, the SignalP prediction differs from the database entry.
In the database the protein has a signal peptide of 39 residues. A signal peptide is characterized by high hydrophobicity in its core region, followed by the cleavage site[3]. In particular, the higher hydrophobicity of residues 18-23 and 27-34 points to a signal peptide (green area in the hydrophobicity image).
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASG
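The hydrophobic core can also be located numerically by scanning the 39-residue peptide with a sliding window. The sketch below assumes the Kyte-Doolittle hydropathy scale and a window of 5 residues; the scale and window behind the database image are not stated on this page, so the result is only indicative. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, see lead-in).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(sequence, window=5):
    """Return (1-based start, mean hydropathy) of the most hydrophobic window."""
    if len(sequence) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_mean = None, float("-inf")
    for i in range(len(sequence) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        if mean > best_mean:  # ties keep the first window found
            best_start, best_mean = i + 1, mean
    return best_start, best_mean


signal_peptide = "MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASG"
start, mean = most_hydrophobic_window(signal_peptide)
print(start, round(mean, 2))
```

Under these assumptions the most hydrophobic 5-residue window covers residues 27-31 (GLLLL), which falls inside the 27-34 stretch named in the text.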
However, SignalP predicts no signal peptide. In the visualisation of the different scores below, the green signal peptide score indicates the most likely signal peptide. The green line is higher for the first 39 residues than for the later residues, but the calculated D-score of the detected peptide, 0.37, lies below the threshold (0.5), so the peptide is rejected as a signal peptide. These residues are not only defined as a signal peptide by the database, but were also detected, with a slight deviation, by the transmembrane helix predictors MEMSAT-SVM (residues 1-34) and PolyPhobius (residues 1-40).
SignalP result for P04062: The green line represents the signal peptide score. The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-scores.
Serum albumin (P02768)
The signal peptide consists of residues 1-18; it is predicted by SignalP as well as documented in the Signal Peptide Database.
MKWVTFISLLFLFSSAYS
The images below show a clear prediction of the signal peptide: a high S-score for the signal peptide region, with a D-score of 0.85 far above the threshold. The cleavage site is predicted between residues 18 and 19. The database shows high hydrophobicity for residues 6-14, which also marks the region as a signal peptide.
SignalP result for P02768: The green line represents the signal peptide score. The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-scores.
Lysosome-associated membrane glycoprotein 1 (P11279)
For this protein the scores are even higher than for serum albumin. The signal peptide consists of the following 28 residues:
MAAPGSARRPLLLLLLLLLLGLMHCASA
The database shows a large hydrophobic region of 17 residues. At the end of the protein, a transmembrane helix with a length of 23 residues ending in the cytoplasm is documented in the database entry of this protein. The SignalP prediction gives a D-score of 0.95 for the detected signal peptide. The cleavage site is predicted between residues 28 and 29 (ASA-AM).
SignalP result for P11279: The green line represents the signal peptide score. The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-scores.
Aquaporin 4 (P47863)
The rat protein has no entry in the Signal Peptide Database, as no signal peptide exists for it. The visualised prediction results show at first sight that the protein does not have a signal peptide. All scores are lower than 0.21, which is far below the threshold for signal peptides.
SignalP result for P47863: The green line represents the signal peptide score. The higher the score, the higher the probability that a residue is part of a signal peptide. A high raw cleavage site score (C-score) marks the residue directly after the cleavage site. The blue line shows a combination of the C- and S-scores. The threshold is marked by a red dotted line.
GO Terms
The GO term prediction is not very good; it depends a lot on what is already known.
Discussion
Other available methods
Prediction of | Tool |
---|---|
secondary structure | GOR |
disorder | DISOPRED2 |
transmembrane helices | MEMSAT3, TMHMM, PredictProtein, DAS, HMMTOP, TMpred |
signal peptides | PrediSi, PolyPhobius, MEMSAT-SVM, SIGCLEAVE, ANTHEPROT, Signal Find Server, SPD, SPEPlip, SOSUIsignal |
GO terms | |
What else can be predicted from protein sequence alone
- Fold recognition (profile based pGenTHREADER and rapid GenTHREADER)
- Fold domain recognition (pDomTHREADER)
- Protein domain prediction (DomPred)
- Homology modelling (BioSerf v2.0)
- Function prediction (eukaryotic function: FFPred v2.0)
- Prediction of TM topology and helix packing (SVM-based MEMPACK)
http://bioinf.cs.ucl.ac.uk/psipred/
- Cleavage site prediction
- Ab initio structure prediction (not very successful: it is a combinatorial problem, computationally intensive and worse for longer sequences. Moreover, biological molecules are not necessarily in the lowest-energy conformation.)
- Solvent accessibility
- Metal binding sites, active sites
- Protein-protein interactions
- SNP effect prediction
Which predictions can be improved considerably by structure-based approaches
- Solvent accessibility