Difference between revisions of "Sequence-based predictions HEXA"

From Bioinformatikpedia
(TMHMM)
(Prediction of transmembrane alpha-helices and signal peptides)
 
(21 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
=== Secondary Structure Prediction ===
 
=== Secondary Structure Prediction ===
   
To analyse the secondary structure of our protein we used different methods. In our analysis we used PSIPRED, Jpred3 and DSSP. In the analysis section of this page we want to compare these three methods to see if the methods gave similar results or if they differ extremely.
+
To analyse the secondary structure of our protein, we used three methods: PSIPRED, Jpred3 and DSSP. In the analysis section of this page we compare these three methods to see whether they give similar results or differ substantially.
   
 
[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/secstr_general Here]] you can find some general information about these methods.
 
[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/secstr_general Here]] you can find some general information about these methods.
  +
<br><br>
 
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>
 
----
 
----
   
Line 13: Line 14:
 
After analysing the secondary structure, we also want to have a look at disordered regions in this protein. For this, we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we want to compare the different methods and variants to see whether their predictions are similar, and to decide which method seems to be the best one for our purpose.

After analysing the secondary structure, we also want to have a look at disordered regions in this protein. For this, we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we want to compare the different methods and variants to see whether their predictions are similar, and to decide which method seems to be the best one for our purpose.
   
To get more insight in the methods and the theory behind them we also offer you an [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/disorder_general general information page]].
+
To get more insight into the methods and the theory behind them, we also offer you a [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/disorder_general general information page]].
  +
<br><br>
 
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>
 
----
 
----
   
Line 24: Line 26:
   
 
To have a closer look at the different methods we again provide an [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/transmembrane_signal_peptide_general information page.]]
 
To have a closer look at the different methods we again provide an [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/transmembrane_signal_peptide_general information page.]]
  +
<br><br>
 
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>
 
----
 
----
   
Line 32: Line 35:
   
 
Again, we also provide a [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_terms_general general information page]] about the GO term methods we used in our analysis.

Again, we also provide a [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_terms_general general information page]] about the GO term methods we used in our analysis.
  +
<br><br>
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>
   
 
== Secondary Structure prediction ==
 
== Secondary Structure prediction ==
Line 77: Line 82:
 
=== Comparison of the different methods ===
 
=== Comparison of the different methods ===
   
To determine how succesful our secondary structure prediction with PSIPRED and Jpred are, we had to compare it with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets whereas both prediction methods predict some beta-sheets. Therefore the main comparison in this case refers to the alpha-helices.
+
To determine how successful our secondary structure predictions with PSIPRED and Jpred were, we had to compare them with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets, whereas both prediction methods predict some beta-sheets. Therefore, the main comparison in this case refers to the alpha-helices.
   
For PSIPRED the prediction of the alpha-helices was good. In the most cases the alpha-helices of DSSP und PSIPRED corrspond. There is only one helix which is predicted by PSIPRED which is not assigned as helix by DSSP. Furthermore there are three helices which are allocated as helices by DSSP which were not predicted by PSIPRED. The most of these helices which were presented only in one output are very small ones.
+
For PSIPRED, the prediction of the alpha-helices was good. In most cases the alpha-helices of DSSP and PSIPRED correspond. Only one helix predicted by PSIPRED is not assigned as a helix by DSSP. Furthermore, three helices assigned by DSSP were not predicted by PSIPRED. Most of the helices that appear in only one of the two outputs are very small.
   
For Jpred3 the prediction of the alpha-helices was sufficiently good. In the most cases it agrees with DSSP. There are only two helices which are predicted by Jpred and which are not also assigned by DSSP. In contrary there are three small helics which are allocated to an alpha-helices by DSSP but are not predicted by Jpred. There is another special case where DSSP assignes two helices which are separated by a turn and Jpred predicts there only one big helix.
+
For Jpred3, the prediction of the alpha-helices was sufficiently good. In most cases it agrees with DSSP. Only two helices predicted by Jpred are not assigned by DSSP. Conversely, three small helices assigned by DSSP are not predicted by Jpred. In one further special case, DSSP assigns two helices separated by a turn, whereas Jpred predicts a single long helix there.
   
All in all, the prediction of the helices is probably good because they correspond mostly with the assignmet of DSSP. The only negative aspect is, that both prediction methods predict a lot of sheets which were not assigned by DSSP at all.
+
All in all, the prediction of the helices is probably good, because the predicted helices mostly correspond to the DSSP assignment. The only negative aspect is that both prediction methods predict many sheets which were not assigned by DSSP at all.
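The helix-by-helix comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both inputs are per-residue strings of equal length in which `H` marks a helix residue, the function names are hypothetical, and the two toy strings are invented for the example and are not the real PSIPRED, Jpred3 or DSSP output for HEXA.

```python
from itertools import groupby


def helix_segments(ss):
    """Return 1-based inclusive (start, end) tuples for every run of 'H' in ss."""
    segments = []
    pos = 1
    for state, run in groupby(ss):
        length = len(list(run))
        if state == "H":
            segments.append((pos, pos + length - 1))
        pos += length
    return segments


def overlaps(a, b):
    """True if the inclusive ranges a and b share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_helices(predicted, assigned):
    """Classify helices as predicted only, assigned only, or merged by the predictor."""
    pred = helix_segments(predicted)
    ref = helix_segments(assigned)
    return {
        # helices the predictor reports that have no counterpart in the assignment
        "only_predicted": [p for p in pred if not any(overlaps(p, r) for r in ref)],
        # helices in the assignment that the predictor misses completely
        "only_assigned": [r for r in ref if not any(overlaps(r, p) for p in pred)],
        # one predicted helix covering two or more assigned helices
        "merged": [p for p in pred if sum(overlaps(p, r) for r in ref) >= 2],
    }


# Invented toy strings (28 residues each), NOT real HEXA data.
predicted = "C" * 2 + "H" * 5 + "C" * 4 + "E" * 4 + "C" * 2 + "H" * 9 + "C" * 2
assigned = "C" * 2 + "H" * 4 + "C" * 3 + "H" * 3 + "C" * 5 + "H" * 3 + "T" + "H" * 4 + "C" * 3

result = compare_helices(predicted, assigned)
print(result)
```

In the toy example the small assigned helix at positions 10 to 12 is missed by the predictor, and the long predicted helix at positions 18 to 26 spans two assigned helices that are separated by a turn, which mirrors the special case described for Jpred.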
  +
<br><br>
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>
   
 
== Prediction of disordered regions ==
 
== Prediction of disordered regions ==
   
Before we start with the analysis of the results of the different methods, we checked, if our protein has one or more disoredered regions. Therefore, we search our protein in the DisProt database and didn't found it, so our protein doesn't have any disordered regions. Another possibility to find out if the protein has disordered regions, is to check in UniProt, if there is an entry for DisProt.
+
Before starting the analysis of the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the [[http://www.disprot.org/ DisProt database]] and did not find it, so our protein does not have any annotated disordered regions. Another way to find out whether the protein has disordered regions is to check whether its [[http://www.uniprot.org/ UniProt]] entry links to [[http://www.disprot.org DisProt]].
   
 
=== Results ===
 
=== Results ===
Line 139: Line 146:
   
 
=== Comparison of the different POODLE variants ===
 
=== Comparison of the different POODLE variants ===
POODLE-L doesn't find any disordered regions. This is the result we expected, because our protein doesn't posses any disordered regions.
+
POODLE-L does not find any disordered regions. This is the result we expected, because our protein does not possess any disordered regions.
   
Both POODLE-S variants found several short disordered regions, which is a false positive result. Interesstingly, there seems to be more missing electrons in the electron density map, than residues with high B-factor value.
+
Both POODLE-S variants found several short disordered regions, which is a false positive result. Interestingly, there seem to be more residues missing from the electron density map than residues with a high B-factor value.
   
 
POODLE-I found the same result as POODLE-S with high B-factor, which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor).
 
POODLE-I found the same result as POODLE-S with high B-factor, which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor).
Line 147: Line 154:
 
Therefore, the predictions of short disordered regions are wrong results. Only the prediction of POODLE-L is correct.
 
Therefore, the predictions of short disordered regions are wrong results. Only the prediction of POODLE-L is correct.
   
In general, these predictions are used, if nothing is known about the protein. Therefore, normally we don't know, that the prediction is wrong. Because of that, we want to trust the result and we want to check if the disordered regions overlap with the functionally important residues, because it seems that disordered regions are functionally very important.
+
In general, these predictions are used if nothing is known about the protein. Normally, therefore, we would not know that the prediction is wrong. Because of that, we want to trust the result and check whether the predicted disordered regions overlap with the functionally important residues, because disordered regions seem to be functionally very important.
 
We check this for POODLE-S with missing residues and POODLE-I, because POODLE-S with high B-factor values shows the same result as POODLE-I.
 
We check this for POODLE-S with missing residues and POODLE-I, because POODLE-S with high B-factor values shows the same result as POODLE-I.
   
Line 256: Line 263:
 
<br><br>
 
<br><br>
 
POODLE-L, IUPred(long) and IUPred(structure) predict the disordered regions correctly.

POODLE-L, IUPred(long) and IUPred(structure) predict the disordered regions correctly.
The baddest prediction result gave POODLE-S (B-factor) which predicts 47 residues as disordered, followed by POODLE-S (missing) (24 wrong predicted residues) and POODLE-I (23 wrong predicted residues).<br><br>
+
POODLE-S (B-factor) gave the worst prediction result, predicting 47 residues as disordered, followed by POODLE-S (missing) (24 wrongly predicted residues) and POODLE-I (23 wrongly predicted residues).<br><br>
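A ranking by wrongly predicted residues, like the one above, could be computed along the following lines. This is a minimal sketch, not the procedure used on this page: it assumes each method's output has been reduced to 1-based inclusive residue ranges, it counts every residue whose predicted state differs from the reference, and the method names and ranges are invented placeholders.

```python
def covered_residues(ranges):
    """Expand inclusive 1-based (start, end) ranges into a set of residue positions."""
    residues = set()
    for start, end in ranges:
        residues.update(range(start, end + 1))
    return residues


def rank_methods(predictions, reference_ranges=()):
    """Sort methods by the number of residues on which they disagree with the reference.

    With an empty reference (no disordered region known), every residue that is
    predicted as disordered counts as one wrong prediction.
    """
    reference = covered_residues(reference_ranges)
    errors = {
        method: len(covered_residues(ranges) ^ reference)
        for method, ranges in predictions.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])


# Invented placeholder predictions, NOT the real POODLE or IUPred output.
toy_predictions = {
    "method_A": [],
    "method_B": [(1, 5), (100, 104)],
    "method_C": [(1, 12)],
}

ranking = rank_methods(toy_predictions)
print(ranking)
```

Using the symmetric difference means the same function also counts missed residues once a protein with known disordered regions is used as the reference.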
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>
   
 
== Prediction of transmembrane alpha-helices and signal peptides ==
 
== Prediction of transmembrane alpha-helices and signal peptides ==
Line 314: Line 322:
 
=== Results ===
 
=== Results ===
   
==== TMHMM ====
+
==== Transmembrane Helices ====
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|-
 
|-
  +
|
|Organism
 
  +
|colspan="3" | TMHMM
  +
|colspan="3" | Phobius
  +
|colspan="3" | PolyPhobius
  +
|colspan="3" | OCTOPUS
  +
|colspan="3" | SPOCTOPUS
  +
|-
  +
|protein
  +
|start position
  +
|end position
  +
|location
  +
|start position
  +
|end position
  +
|location
  +
|start position
  +
|end position
  +
|location
  +
|start position
  +
|end position
  +
|location
 
|start position
 
|start position
 
|end position
 
|end position
 
|location
 
|location
 
|-
 
|-
|HEXA HUMAN
+
|rowspan="3" | HEXA HUMAN
 
|1
 
|1
  +
|529
  +
|outside
  +
|23
  +
|529
  +
|outside
  +
|20
  +
|520
  +
|outside
  +
|1
  +
|2
  +
|inside
  +
|22
 
|529
 
|529
 
|outside
 
|outside
 
|-
 
|-
|rowspan="13" | BACR HALSA
+
|colspan="9" |
  +
|3
  +
|23
  +
|TM helix
  +
|colspan="3" |
  +
|-
  +
|colspan="9" |
  +
|24
  +
|529
  +
|outside
  +
|colspan="3" |
  +
|-
  +
|rowspan="15" | BACR HALSA
  +
|1
  +
|22
  +
|outside
  +
|
  +
|
  +
|
  +
|
  +
|
  +
|
  +
|1
  +
|22
  +
|outside
 
|1
 
|1
 
|22
 
|22
Line 336: Line 399:
 
|42
 
|42
 
|TM Helix
 
|TM Helix
  +
|23
  +
|42
  +
|TM helix
  +
|22
  +
|43
  +
|TM helix
  +
|23
  +
|43
  +
|TM helix
  +
|23
  +
|43
  +
|TM helix
 
|-
 
|-
 
|43
 
|43
  +
|54
  +
|inside
  +
|43
  +
|53
  +
|inside
  +
|44
  +
|54
  +
|inside
  +
|44
  +
|54
  +
|inside
  +
|44
 
|54
 
|54
 
|inside
 
|inside
Line 344: Line 431:
 
|77
 
|77
 
|TM Helix
 
|TM Helix
  +
|54
  +
|76
  +
|TM helix
  +
|55
  +
|77
  +
|TM helix
  +
|55
  +
|75
  +
|TM helix
  +
|55
  +
|75
  +
|TM helix
 
|-
 
|-
 
|78
 
|78
 
|91
 
|91
  +
|outside
  +
|77
  +
|95
  +
|outside
  +
|78
  +
|94
  +
|outside
  +
|76
  +
|95
  +
|outside
  +
|76
  +
|95
 
|outside
 
|outside
 
|-
 
|-
Line 352: Line 463:
 
|114
 
|114
 
|TM Helix
 
|TM Helix
  +
|96
  +
|114
  +
|TM helix
  +
|95
  +
|114
  +
|TM helix
  +
|96
  +
|116
  +
|TM helix
  +
|96
  +
|116
  +
|TM helix
 
|-
 
|-
 
|115
 
|115
  +
|120
  +
|inside
  +
|115
  +
|120
  +
|inside
  +
|115
  +
|120
  +
|inside
  +
|117
  +
|121
  +
|inside
  +
|117
 
|120
 
|120
 
|inside
 
|inside
Line 360: Line 495:
 
|143
 
|143
 
|TM Helix
 
|TM Helix
  +
|121
  +
|142
  +
|TM helix
  +
|121
  +
|141
  +
|TM helix
  +
|122
  +
|142
  +
|TM helix
  +
|121
  +
|141
  +
|TM helix
 
|-
 
|-
 
|144
 
|144
  +
|147
  +
|outside
  +
|143
  +
|147
  +
|outside
  +
|142
  +
|147
  +
|outside
  +
|143
  +
|147
  +
|outside
  +
|142
 
|147
 
|147
 
|outside
 
|outside
Line 368: Line 527:
 
|170
 
|170
 
|TM Helix
 
|TM Helix
  +
|148
  +
|169
  +
|TM helix
  +
|148
  +
|166
  +
|TM helix
  +
|148
  +
|168
  +
|TM helix
  +
|148
  +
|168
  +
|TM helix
 
|-
 
|-
 
|171
 
|171
 
|189
 
|189
  +
|inside
  +
|170
  +
|189
  +
|inside
  +
|167
  +
|186
  +
|inside
  +
|169
  +
|185
  +
|inside
  +
|169
  +
|185
 
|inside
 
|inside
 
|-
 
|-
Line 376: Line 559:
 
|212
 
|212
 
|TM Helix
 
|TM Helix
  +
|190
  +
|212
  +
|TM helix
  +
|187
  +
|205
  +
|TM helix
  +
|186
  +
|206
  +
|TM helix
  +
|186
  +
|206
  +
|TM helix
 
|-
 
|-
 
|213
 
|213
 
|262
 
|262
  +
|outside
  +
|213
  +
|217
  +
|outside
  +
|206
  +
|215
  +
|outside
  +
|207
  +
|216
  +
|outside
  +
|207
  +
|216
 
|outside
 
|outside
 
|-
 
|-
  +
|colspan="3" |
|RET4 HUMAN
 
  +
|218
  +
|237
  +
|TM helix
  +
|216
  +
|237
  +
|TM helix
  +
|217
  +
|237
  +
|TM helix
  +
|217
  +
|237
  +
|TM helix
  +
|-
  +
|colspan="3" |
  +
|238
  +
|262
  +
|inside
  +
|238
  +
|262
  +
|inside
  +
|238
  +
|262
  +
|inside
  +
|238
  +
|262
  +
|inside
  +
|-
  +
|rowspan="3" | RET4 HUMAN
  +
|colspan="9" |
  +
|1
  +
|1
  +
|inside
  +
|colspan="3" |
  +
|-
  +
|colspan="9" |
  +
|2
  +
|23
  +
|TM helix
  +
|colspan="3" |
  +
|-
 
|1
 
|1
  +
|201
  +
|outside
  +
|19
  +
|201
  +
|outside
  +
|19
  +
|201
  +
|outside
  +
|24
  +
|201
  +
|outside
  +
|20
 
|201
 
|201
 
|outside
 
|outside
 
|-
 
|-
|INSL5 HUMAN
+
|rowspan="3" | INSL5 HUMAN
  +
|colspan="9" |
  +
|1
  +
|1
  +
|inside
  +
|colspan="3" |
  +
|-
  +
|colspan="9" |
  +
|2
  +
|32
  +
|TM helix
  +
|colspan="3" |
  +
|-
 
|1
 
|1
  +
|135
  +
|outside
  +
|23
  +
|135
  +
|outside
  +
|23
  +
|135
  +
|outside
  +
|33
  +
|135
  +
|outside
  +
|24
 
|135
 
|135
 
|outside
 
|outside
 
|-
 
|-
 
|rowspan="5" | LAMP1 HUMAN
 
|rowspan="5" | LAMP1 HUMAN
  +
|1
  +
|10
  +
|inside
  +
|colspan="6" |
  +
|1
 
|10
 
|10
 
|inside
 
|inside
  +
|colspan="3" |
 
|-
 
|-
 
|11
 
|11
 
|33
 
|33
 
|TM Helix
 
|TM Helix
  +
|colspan="6" |
  +
|11
  +
|31
  +
|TM helix
  +
|colspan="3" |
 
|-
 
|-
 
|34
 
|34
  +
|383
  +
|outside
  +
|29
  +
|381
  +
|outside
  +
|29
  +
|381
  +
|outside
  +
|32
  +
|383
  +
|outside
  +
|30
 
|383
 
|383
 
|outside
 
|outside
Line 406: Line 712:
 
|406
 
|406
 
|TM Helix
 
|TM Helix
  +
|382
  +
|405
  +
|TM helix
  +
|382
  +
|405
  +
|TM helix
  +
|384
  +
|404
  +
|TM helix
  +
|384
  +
|404
  +
|TM helix
 
|-
 
|-
 
|407
 
|407
 
|417
 
|417
 
|inside
 
|inside
  +
|406
  +
|417
  +
|outside
  +
|406
  +
|417
  +
|outside
  +
|405
  +
|417
  +
|outside
  +
|405
  +
|417
  +
|outside
  +
|-
  +
|rowspan="5" | A4 HUMAN
  +
|colspan="9" |
  +
|1
  +
|5
  +
|outside
  +
|colspan="3" |
  +
|-
  +
|colspan="9" |
  +
|6
  +
|11
  +
|R
  +
|colspan="3" |
 
|-
 
|-
|rowspan="3" | A4 HUMAN
 
 
|1
 
|1
 
|700
 
|700
  +
|outside
  +
|18
  +
|700
  +
|outside
  +
|18
  +
|700
  +
|outside
  +
|12
  +
|701
  +
|outside
  +
|19
  +
|701
 
|outside
 
|outside
 
|-
 
|-
Line 419: Line 773:
 
|723
 
|723
 
|TM Helix
 
|TM Helix
  +
|701
  +
|723
  +
|TM helix
  +
|701
  +
|723
  +
|TM helix
  +
|702
  +
|722
  +
|TM helix
  +
|702
  +
|722
  +
|TM helix
 
|-
 
|-
 
|724
 
|724
  +
|770
  +
|inside
  +
|724
  +
|770
  +
|inside
  +
|724
  +
|770
  +
|inside
  +
|723
  +
|770
  +
|inside
  +
|723
 
|770
 
|770
 
|inside
 
|inside
 
|-
 
|-
 
|}
 
|}
  +
<br><br>
  +
The table above summarises the results of the different methods which predict transmembrane helices. As the table shows, OCTOPUS often predicts a transmembrane helix where none of the other methods predicts one. Phobius, PolyPhobius and SPOCTOPUS always show very similar results, whereas TMHMM and OCTOPUS differ from these results.<br><br>
  +
  +
==== Signal Peptide ====
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
|
  +
|colspan="2" | Phobius
  +
|colspan="2" | PolyPhobius
  +
|colspan="2" | SPOCTOPUS
  +
|colspan="1" | TargetP
  +
|colspan="2" | SignalP
  +
|-
  +
|protein
  +
|start position
  +
|end position
  +
|start position
  +
|end position
  +
|start position
  +
|end position
  +
|location
  +
|start position
  +
|end position
  +
|-
  +
|HEXA HUMAN
  +
|1
  +
|22
  +
|1
  +
|19
  +
|7
  +
|21
  +
|secretory pathway
  +
|1
  +
|22
  +
|-
  +
|BACR HALSA
  +
|colspan="6" | no prediction available
  +
|secretory pathway
  +
|1
  +
|38
  +
|-
  +
|RET4 HUMAN
  +
|1
  +
|18
  +
|1
  +
|18
  +
|6
  +
|19
  +
|secretory pathway
  +
|1
  +
|18
  +
|-
  +
|INSL5 HUMAN
  +
|1
  +
|22
  +
|1
  +
|22
  +
|6
  +
|23
  +
|secretory pathway
  +
|1
  +
|22
  +
|-
  +
|LAMP1 HUMAN
  +
|1
  +
|28
  +
|1
  +
|28
  +
|12
  +
|29
  +
|secretory pathway
  +
|1
  +
|28
  +
|-
  +
|A4 HUMAN
  +
|1
  +
|17
  +
|1
  +
|17
  +
|5
  +
|18
  +
|secretory pathway
  +
|1
  +
|15
  +
|-
  +
|}
  +
<br>
  +
The last table lists the signal peptide predictions of the different methods. At first glance, the methods predict a signal peptide in almost all cases, although the end position of the signal differs. Phobius, PolyPhobius and SPOCTOPUS gave no signal peptide prediction for BACR_HALSA. Furthermore, TargetP does not predict the position of the signal peptide; it only predicts the location of the protein.<br><br>
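To make the differences in the predicted end positions easier to see, the values from the signal peptide table can be summarised per protein. The sketch below is only one possible way to do this: the end positions are copied from the table, BACR_HALSA is left out because three of the methods gave no prediction for it, TargetP is left out because it reports no positions, and the function name and the choice of "most frequent end position" as a consensus are assumptions made for illustration.

```python
from collections import Counter

# Predicted signal peptide end positions, copied from the table above.
SIGNAL_PEPTIDE_END = {
    "HEXA_HUMAN": {"Phobius": 22, "PolyPhobius": 19, "SPOCTOPUS": 21, "SignalP": 22},
    "RET4_HUMAN": {"Phobius": 18, "PolyPhobius": 18, "SPOCTOPUS": 19, "SignalP": 18},
    "INSL5_HUMAN": {"Phobius": 22, "PolyPhobius": 22, "SPOCTOPUS": 23, "SignalP": 22},
    "LAMP1_HUMAN": {"Phobius": 28, "PolyPhobius": 28, "SPOCTOPUS": 29, "SignalP": 28},
    "A4_HUMAN": {"Phobius": 17, "PolyPhobius": 17, "SPOCTOPUS": 18, "SignalP": 15},
}


def summarise_end_positions(ends_by_method):
    """Return the most frequent end position, its vote count and the overall spread."""
    ends = list(ends_by_method.values())
    consensus, votes = Counter(ends).most_common(1)[0]
    return {"consensus_end": consensus, "votes": votes, "spread": max(ends) - min(ends)}


summary = {
    protein: summarise_end_positions(ends)
    for protein, ends in SIGNAL_PEPTIDE_END.items()
}

for protein, stats in summary.items():
    print(protein, stats)
```

A small spread means the methods agree closely on where the signal peptide ends; it says nothing about whether that position is correct, which still requires the annotated cleavage site.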
   
 
=== Comparison of the different methods ===
 
=== Comparison of the different methods ===
Line 693: Line 1,159:
 
|}
 
|}
   
TMHMM is the baddest prediction method. This can also be seen at the example of BACR_HALSA, because TMHMM is the only prediction method, which do not recognize the 7 transmembrane helices.
+
TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, because TMHMM is the only prediction method which does not recognize the 7 transmembrane helices.
 
SPOCTOPUS and PolyPhobius are the best prediction methods.<br><br>
 
SPOCTOPUS and PolyPhobius are the best prediction methods.<br><br>
In general the prediction of transmembrane helices works quite good and almost all predictions are very close to the real protein.
+
In general, the prediction of transmembrane helices works quite well and almost all predictions are very close to the real protein.
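The two criteria used in this comparison, the number of recognized transmembrane helices and the number of wrongly predicted residues, can be expressed in code as follows. This is a hedged sketch: the page does not state exactly how the wrong residues were counted, so the definition here (residues whose transmembrane/non-transmembrane state differs from the reference) is an assumption, the function names are hypothetical, and the segment lists are invented toy data in the same start/end/location layout as the result table.

```python
def tm_positions(segments):
    """Collect all residue positions that lie inside a segment labelled as TM helix."""
    positions = set()
    for start, end, location in segments:
        # The table mixes "TM Helix" and "TM helix", so compare case-insensitively.
        if location.lower() == "tm helix":
            positions.update(range(start, end + 1))
    return positions


def count_tm_helices(segments):
    """Number of segments labelled as transmembrane helix."""
    return sum(1 for _, _, location in segments if location.lower() == "tm helix")


def wrong_tm_residues(predicted, reference):
    """Residues that are TM in exactly one of the two annotations."""
    return len(tm_positions(predicted) ^ tm_positions(reference))


# Invented toy topologies for a 70-residue protein, NOT real data.
reference = [
    (1, 9, "outside"),
    (10, 30, "TM helix"),
    (31, 40, "inside"),
    (41, 60, "TM helix"),
    (61, 70, "outside"),
]
predicted = [
    (1, 11, "outside"),
    (12, 30, "TM Helix"),
    (31, 70, "inside"),
]

print(count_tm_helices(reference), count_tm_helices(predicted))
print(wrong_tm_residues(predicted, reference))
```

In the toy data the prediction finds only one of the two reference helices and starts the first helix two residues late, which gives 22 wrongly predicted residues.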
 
<br><br>
 
<br><br>
 
* Comparison of signal peptide prediction
 
* Comparison of signal peptide prediction
 
<br><br>
 
<br><br>
Now we compared TargetP and SignalP which can only predict signal peptides. Furthermore we compared SPOCTOPUS, Phobius and PolyPhobius.
+
Next, we compared TargetP and SignalP, which only predict signal peptides. Furthermore, we compared SPOCTOPUS, Phobius and PolyPhobius.
 
TargetP does not predict the start and end position of the signal peptide, instead it predicts only the location of the protein.
 
TargetP does not predict the start and end position of the signal peptide, instead it predicts only the location of the protein.
   
Line 898: Line 1,364:
 
In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.<br>
 
In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.<br>
 
Therefore, it is difficult to compare the different methods. First of all, Phobius and PolyPhobius are more powerful than the other prediction methods, because they predict both. On average they predict the location and the position as well as the other prediction methods. None of the methods could identify the transmembrane proteins; all methods predict them as proteins of the secretory pathway. Therefore, it is useful to use Phobius or PolyPhobius, because they predict more than the other methods. Furthermore, both methods can also predict transmembrane helices.

Therefore, it is difficult to compare the different methods. First of all, Phobius and PolyPhobius are more powerful than the other prediction methods, because they predict both. On average they predict the location and the position as well as the other prediction methods. None of the methods could identify the transmembrane proteins; all methods predict them as proteins of the secretory pathway. Therefore, it is useful to use Phobius or PolyPhobius, because they predict more than the other methods. Furthermore, both methods can also predict transmembrane helices.
The results of Phobius were a litte bit better than the results of PolyPhobius.<br>
+
The results of Phobius were a little bit better than the results of PolyPhobius.<br>
We also wanted to mention, that SignalP gave you the possibility to choose between the prediction for eukaryotes, gram-positive bacteria and gram-negative bacteria. In our analyse we also analysied BACR_HALSA, which is an archaea protein. We tested all three prediction methods for this protein and all three methods failed. BACR_HALSA don't posses a signal peptide, but every method predicts one. Only the eukaryotic prediction method recogniced a signal anchor for BACR_HALSA, whereas the other two methods could not give a prediction of the location.<br><br>
+
We also want to mention that SignalP gives you the possibility to choose between predictions for eukaryotes, gram-positive bacteria and gram-negative bacteria. In our analysis we also analysed BACR_HALSA, which is an archaeal protein. We tested all three prediction methods for this protein and all three methods failed. BACR_HALSA does not possess a signal peptide, but every method predicts one. Only the eukaryotic prediction method recognized a signal anchor for BACR_HALSA, whereas the other two methods could not give a prediction of the location.<br><br>
 
<br><br>
 
<br><br>
 
* Comparison of the combined methods
 
* Comparison of the combined methods
 
<br><br>
 
<br><br>
The last thing, which we wanted to compare, was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore we combined our two further comparisons.
+
The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore, we combined our two previous comparisons.
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 914: Line 1,380:
 
|SPOCTOPUS
 
|SPOCTOPUS
 
|-
 
|-
|rowspan="3" | HEXA_HUMAN
+
|rowspan="3" | HEXA_HUMAN
 
|#wrong predicted residues (TM)
 
|#wrong predicted residues (TM)
 
|0
 
|0
Line 932: Line 1,398:
 
!colspan="5" |
 
!colspan="5" |
 
|-
 
|-
|rowspan="3" | BACR_HALSA
+
|rowspan="3" | BACR_HALSA
 
|#wrong predicted residues (TM)
 
|#wrong predicted residues (TM)
 
|29
 
|29
Line 1,020: Line 1,486:
 
|no prediction
 
|no prediction
 
|-
 
|-
!colspan="5" | Average
+
!colspan="5" | Average
 
|-
 
|-
 
|rowspan="3" |
 
|rowspan="3" |
Line 1,040: Line 1,506:
 
|}
 
|}
   
In general, PolyPhobius gave the best results. Although it predicts the singal peptide stop position a little bit badder than Phobius, the transmembrane prediction is significant bettern than by Phobius. The predictions of SPOCTOPUS are also good, but sadly SPOCTOPUS does not predict the location of the protein.<br>
+
In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is significantly better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.<br>
 
Therefore, it seems a good choice to use PolyPhobius, which is on average the best method for transmembrane and signal peptide prediction.<br><br>

Therefore, it seems a good choice to use PolyPhobius, which is on average the best method for transmembrane and signal peptide prediction.<br><br>
  +
<br><br>
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>
   
== Prediction of GO terms ==
+
==== Signal Peptide ====
 
Before we start with our analysis, we decided to check the GO annotations for the six sequences:
 
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
  +
|
!colspan="2" | HEXA_HUMAN
 
  +
|colspan="2" | Phobius
  +
|colspan="2" | PolyPhobius
  +
|colspan="2" | SPOCTOPUS
  +
|colspan="1" | TargetP
  +
|colspan="2" | SignalP
 
|-
 
|-
  +
|protein
|rowspan="14" | Process
 
  +
|start position
|skeletal system development
 
  +
|end position
  +
|start position
  +
|end position
  +
|start position
  +
|end position
  +
|location
  +
|start position
  +
|end position
 
|-
 
|-
  +
|HEXA HUMAN
|carbohydrate metabolic process
 
  +
|1
  +
|22
  +
|1
  +
|19
  +
|7
  +
|21
  +
|secretory pathway
  +
|1
  +
|22
 
|-
 
|-
  +
|BACR HALSA
|ganglioside catabolic process
 
  +
|colspan="6" | no prediction available
  +
|secretory pathway
  +
|1
  +
|38
 
|-
 
|-
  +
|RET4 HUMAN
|lysosome organization
 
  +
|1
  +
|18
  +
|1
  +
|18
  +
|6
  +
|19
  +
|secretory pathway
  +
|1
  +
|18
 
|-
 
|-
  +
|INSL5 HUMAN
|sensory perception of sound
 
  +
|1
  +
|22
  +
|1
  +
|22
  +
|6
  +
|23
  +
|secretory pathway
  +
|1
  +
|22
 
|-
 
|-
  +
|LAMP1 HUMAN
|locomotory behavior
 
  +
|1
  +
|28
  +
|1
  +
|28
  +
|12
  +
|29
  +
|secretory pathway
  +
|1
  +
|28
 
|-
 
|-
  +
|A4 HUMAN
|adult walking behavior
 
|-
+
|1
  +
|17
|lipid storage
 
|-
+
|1
  +
|17
|sexual reproduction
 
|-
+
|5
  +
|18
|glycosaminoglycan metabolic process
 
  +
|secretory pathway
|-
 
  +
|1
|myelination
 
|-
+
|15
|cell morphogenesis involved in neuron differentiation
 
|-
 
|neuromuscular process controlling posture
 
|-
 
|neuromuscular process controlling balance
 
|-
 
|rowspan="8" |Function
 
|catalytic activity
 
|-
 
|hydrolase activity, hydrolyzing O-glycosyl compounds
 
|-
 
|beta-N-acetylhexosaminidase activity
 
|-
 
|protein binding
 
|-
 
|hydrolase activity
 
|-
 
|hydrolase activity, acting on glycosyl bonds
 
|-
 
|cation binding
 
|-
 
|protein heterodimerization activity
 
|-
 
|rowspan="2" |Component
 
|lysosome
 
|-
 
|membrane
 
|-
 
!colspan="2" | BACR_HALSA
 
|-
 
|rowspan="6" | Process
 
|transport
 
|-
 
|ion transport
 
|-
 
|phototransduction
 
|-
 
|proton transport
 
|-
 
|protein-chromophore linkage
 
|-
 
|response to stimulus
 
|-
 
|rowspan="3" | Function
 
|receptor activity
 
|-
 
|ion channel activity
 
|-
 
|photoreceptor activity
 
|-
 
|rowspan="3" | Component
 
|plasma membrane
 
|-
 
|membrane
 
|-
 
|integral to membrane
 
|-
 
!colspan="2" | RET4_HUMAN
 
|-
 
|rowspan="30"| Process
 
|eye development
 
|-
 
|gluconeogenesis
 
|-
 
|transport
 
|-
 
|spermatogenesis
 
|-
 
|heart development
 
|-
 
|visual perception
 
|-
 
|male gonad development
 
|-
 
|embryo development
 
|-
 
|maintenance of gastrointestinal epithelium
 
|-
 
|lung development
 
|-
 
|positive regulation of insulin secretion
 
|-
 
|response to retinoic acid
 
|-
 
|response to insulin stimulus
 
|-
 
|retinol transport
 
|-
 
|retinol metabolic process
 
|-
 
|glucose homeostasis
 
|-
 
|response to ethanol
 
|-
 
|embryonic organ morphogenesis
 
|-
 
|embryonic skeletal system development
 
|-
 
|cardiac muscle tissue development
 
|-
 
|female genitalia morphogenesis
 
|-
 
|detection of light stimulus involved in visual perception
 
|-
 
|positive regulation of immunoglobulin secretion
 
|-
 
|retina development in camera-type eye
 
|-
 
|negative regulation of cardiac muscle cell proliferation
 
|-
 
|embryonic retina morphogenesis in camera-type eye
 
|-
 
|uterus development
 
|-
 
|vagina development
 
|-
 
|urinary bladder development
 
|-
 
|heart trabecula formation
 
|-
 
|rowspan="7" | Function
 
|transporter activity
 
|-
 
|binding
 
|-
 
|retinoid binding
 
|-
 
|protein binding
 
|-
 
|retinal binding
 
|-
 
|retinol binding
 
|-
 
|retinol transporter activity
 
|-
 
|rowspan="2" | Component
 
|extracellular region
 
|-
 
|extracellular space
 
|-
 
!colspan="2" | INSL5_HUMAN
 
|-
 
|Process
 
|biological_process
 
|-
 
|Function
 
|hormone activity
 
|-
 
|rowspan="2" | Component
 
|cellular_component
 
|-
 
|extracellular region
 
|-
 
!colspan="2" | LAMP1_HUMAN
 
|-
 
|Process
 
|autophagy
 
|-
 
|rowspan="16" | Component
 
|membrane fraction
 
|-
 
|lysosome
 
|-
 
|lysosomal membrane
 
|-
 
|endosome
 
|-
 
|late endosome
 
|-
 
|multivesicular body
 
|-
 
|plasma membrane
 
|-
 
|integral to plasma membrane
 
|-
 
|external side of plasma membrane
 
|-
 
|cell surface
 
|-
 
|endosome membrane
 
|-
 
|membrane
 
|-
 
|integral to membrane
 
|-
 
|vesicle
 
|-
 
|sarcolemma
 
|-
 
|melanosome
 
|-
 
!colspan="2" | A4_HUMAN
 
|-
 
|rowspan="42" | Process
 
|G2 phase of mitotic cell cycle
 
|-
 
|suckling behavior
 
|-
 
|platelet degranulation
 
|-
 
|mRNA polyadenylation
 
|-
 
|regulation of translation
 
|-
 
|protein phosphorylation
 
|-
 
|cellular copper ion homeostasis
 
|-
 
|endocytosis
 
|-
 
|apoptosis
 
|-
 
|induction of apoptosis
 
|-
 
|cell adhesion
 
|-
 
|regulation of epidermal growth factor receptor activity
 
|-
 
|Notch signaling pathway
 
|-
 
|axonogenesis
 
|-
 
|blood coagulation
 
|-
 
|mating behavior
 
|-
 
|locomotory behavior
 
|-
 
|axon cargo transport
 
|-
 
|cell death
 
|-
 
|adult locomotory behavior
 
|-
 
|visual learning
 
|-
 
|negative regulation of peptidase activity
 
|-
 
|positive regulation of peptidase activity
 
|-
 
|axon midline choice point recognition
 
|-
 
|neuron remodeling
 
|-
 
|dendrite development
 
|-
 
|platelet activation
 
|-
 
|extracellular matrix organization
 
|-
 
|forebrain development
 
|-
 
|neuron projection development
 
|-
 
|ionotropic glutamate receptor signaling pathway
 
|-
 
|regulation of multicellular organism growth
 
|-
 
|innate immune response
 
|-
 
|negative regulation of neuron differentiation
 
|-
 
|positive regulation of mitotic cell cycle
 
|-
 
|positive regulation of transcription from RNA polymerase II promoter
 
|-
 
|collateral sprouting in absence of injury
 
|-
 
|regulation of synapse structure and activity
 
|-
 
|neuromuscular process controlling balance
 
|-
 
|synaptic growth at neuromuscular junction
 
|-
 
|neuron apoptosis
 
|-
 
|smooth endoplasmic reticulum calcium ion homeostasis
 
|-
 
|rowspan="11" | Function
 
|DNA binding
 
|-
 
|serine-type endopeptidase inhibitor activity
 
|-
 
|receptor binding
 
|-
 
|binding
 
|-
 
|protein binding
 
|-
 
|peptidase activator activity
 
|-
 
|peptidase inhibitor activity
 
|-
 
|acetylcholine receptor binding
 
|-
 
|identical protein binding
 
|-
 
|metal ion binding
 
|-
 
|PTB domain binding
 
|-
 
|rowspan="24" | Component
 
|extracellular region
 
|-
 
|membrane fraction
 
|-
 
|cytoplasm
 
|-
 
|Golgi apparatus
 
|-
 
|plasma membrane
 
|-
 
|integral to plasma membrane
 
|-
 
|coated pit
 
|-
 
|cell surface
 
|-
 
|membrane
 
|-
 
|integral to membrane
 
|-
 
|synaptosome
 
|-
 
|axon
 
|-
 
|platelet alpha granule lumen
 
|-
 
|cytoplasmic vesicle
 
|-
 
|neuromuscular junction
 
|-
 
|ciliary rootlet
 
|-
 
|neuron projection
 
|-
 
|dendritic spine
 
|-
 
|dendritic shaft
 
|-
 
|intracellular membrane-bounded organelle
 
|-
 
|apical part of cell
 
|-
 
|synapse
 
|-
 
|perinuclear region of cytoplasm
 
|-
 
|spindle midzone
 
 
|-
 
|-
 
|}
 
|}
 
<br>
 
<br>
  +
The last table lists the signal peptide predictions made by the different methods.<br><br>
A detailed list of the GO annotation terms of each protein can be found [[https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Go_annotations_here here]].
 
===GOPET===
 
   
  +
=== Comparison of the different methods ===
We tried to predict the GO annotations with GOPET for our six different proteins.
 
 
<br><br>
 
<br><br>
  +
We decided to split the comparison of the methods, because it is unfair to directly compare a method which cannot predict a signal peptide with a method which can. Therefore, we split the comparison into one comparison for transmembrane helices, one for signal peptides and one for the combination of both.
* '''HEXA_HUMAN'''
 
 
<br><br>
 
<br><br>
  +
* Comparison of transmembrane helix prediction
[[Image:hexa_human_gopet.png|center|Result of the GOPET prediction for HEXA_HUMAN]]
 
  +
<br><br>
 
  +
Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues, which are signal peptides, because all transmembrane-only prediction methods predicted this region as a transmembrane helix, which is wrong.
The method only predicts functional GO terms. HEXA_HUMAN has 8 annotated GO functions, and the method also predicts 8 GO function terms. Therefore we decided to check whether all predictions are correct. We checked whether the general term is correct and also whether the GO id is correct.
 
  +
<br>
  +
For this comparison we counted the wrongly predicted transmembrane residues, the residues wrongly predicted as outside and the residues wrongly predicted as inside.
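
The counting described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not stated on this page: predictions and reference are given as per-residue topology strings using the hypothetical labels <code>M</code> (membrane), <code>o</code> (outside) and <code>i</code> (inside), a wrong residue is attributed to the state that was predicted for it, and the percentage is taken over the compared residues. The <code>skip</code> parameter stands for the leading signal peptide residues that were left out of the comparison.

```python
from collections import Counter


def count_wrong_residues(predicted: str, reference: str, skip: int = 0) -> dict:
    """Count mismatching residues, grouped by the (wrongly) predicted state.

    Assumed labels: 'M' = membrane, 'o' = outside, 'i' = inside.
    The first `skip` residues (e.g. a signal peptide) are ignored.
    """
    if len(predicted) != len(reference):
        raise ValueError("topology strings must have the same length")

    wrong = Counter()
    for pred, ref in zip(predicted[skip:], reference[skip:]):
        if pred != ref:
            wrong[pred] += 1

    counts = {
        "transmembrane": wrong["M"],
        "outside": wrong["o"],
        "inside": wrong["i"],
    }
    counts["sum"] = sum(counts.values())
    compared = len(reference) - skip
    counts["percent"] = 100.0 * counts["sum"] / compared if compared else 0.0
    return counts


if __name__ == "__main__":
    # Toy strings, not taken from any of the six proteins.
    print(count_wrong_residues("oMMMMMiioo", "ooMMMMiiii"))
```

Attributing a wrong residue to the predicted state is one possible reading of the table rows; attributing it to the true state would be just as easy to implement.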
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
  +
|rowspan="2" |
|GO term
 
  +
|rowspan="2" |
|confidence
 
  +
|colspan="5" | methods
|prediction term
 
  +
|rowspan="1" |
|prediction GOid
 
 
|-
 
|-
  +
|TMHMM
|hexosaminidase activity
 
  +
|Phobius
|97%
 
  +
|PolyPhobius
|right
 
  +
|OCTOPUS
|wrong
 
  +
|SPOCTOPUS
  +
|Transmembrane protein
 
|-
 
|-
  +
|rowspan="5" | HEXA_HUMAN
|beta-N-acetylhexosaminidase activity
 
  +
|#wrong transmembrane
|96%
 
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|0
  +
|0
  +
|rowspan="5" | no
 
|-
 
|-
  +
|#wrong outside
|hydrolase activity
 
|96%
+
|0
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|0
 
|-
 
|-
  +
|#wrong inside
|hydrolase activity acting on glycosyl bonds
 
|96%
+
|0
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|0
 
|-
 
|-
  +
|#wrong sum
|hydrolase activity hydrolyzing O-glycosyl compounds
 
|96%
+
|0
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|0
 
|-
 
|-
  +
|%wrong predicted
|catalytic activity
 
|96%
+
|0%
  +
|0%
|right
 
  +
|0%
|right
 
  +
|0%
  +
|0%
  +
|-
  +
!colspan="8" |
 
|-
 
|-
  +
|rowspan="5" | BACR_HALSA
|hydrolase activity hydrolyzing N-glycosyl compounds
 
  +
|#wrong transmembrane
|78%
 
  +
|24
|wrong
 
  +
|20
|wrong
 
  +
|12
  +
|16
  +
|11
  +
|rowspan="5" | yes (7 transmembrane helices)
 
|-
 
|-
  +
|#wrong outside
|protein heterodimerization activity
 
|61%
+
|46
  +
|5
|right
 
  +
|3
|right
 
  +
|4
  +
|6
 
|-
 
|-
  +
|#wrong inside
|}
 
  +
|4
<br><br>
 
  +
|4
* '''BACR_HALSA'''
 
  +
|2
<br><br>
 
  +
|0
[[Image:bacr_halsa_gopet.png|center|Result of the GOPET prediction for BACR_HALSA]]
 
  +
|0
 
The method only predicts functional GO terms. BACR_HALSA has 3 annotated GO functions, and the method also predicts 3 GO function terms. Therefore we decided to check whether all predictions are correct.
 
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|GO term
 
|confidence
 
|prediction term
 
|prediction GOid
 
 
|-
 
|-
  +
|#wrong sum
|ion channel activity
 
|77%
+
|74
  +
|29
|right
 
  +
|17
|right
 
  +
|20
  +
|17
 
|-
 
|-
  +
|%wrong predicted
|G-protein coupled photoreceptor activity
 
|75%
+
|29%
  +
|11%
|right
 
  +
|6%
|wrong
 
  +
|8%
  +
|6%
 
|-
 
|-
  +
!colspan="8" |
|hydrogen ion transmembrane transporter activity
 
|60%
 
|wrong
 
|wrong
 
 
|-
 
|-
  +
|rowspan="5" | RET4_HUMAN
|}
 
  +
|#wrong transmembrane
<br><br>
 
  +
|0
* '''RET4_HUMAN'''
 
  +
|0
<br><br>
 
  +
|0
[[Image:ret4_human_gopet.png|center|Result of the GOPET prediction for RET4_HUMAN]]
 
  +
|5
 
  +
|0
The method only predicts functional GO terms. RET4_HUMAN has 7 annotated GO functions. The method predicts 8 GO function terms. Therefore we decided to check whether all predictions are correct.
 
  +
|rowspan="5" | no
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|GO term
 
|confidence
 
|prediction term
 
|prediction GOid
 
 
|-
 
|-
  +
|#wrong outside
|binding
 
|90%
+
|0
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|0
 
|-
 
|-
  +
|#wrong inside
|retinoid binding
 
|81%
+
|0
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|0
 
|-
 
|-
  +
|#wrong sum
|lipid binding
 
|80%
+
|0
  +
|0
|wrong
 
  +
|0
|wrong
 
  +
|5
  +
|0
 
|-
 
|-
  +
|%wrong predicted
|retinol binding
 
|78%
+
|0%
  +
|0%
|right
 
  +
|0%
|right
 
  +
|2%
  +
|0%
 
|-
 
|-
  +
!colspan="8" |
|transporter activity
 
|78%
 
|right
 
|right
 
 
|-
 
|-
  +
|rowspan="5" | INSL5_HUMAN
|retinal binding
 
  +
|#wrong transmembrane
|78%
 
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|10
  +
|0
  +
|rowspan="5" | no
 
|-
 
|-
  +
|#wrong outside
|lipid transport activity
 
|69%
+
|0
  +
|0
|wrong
 
  +
|0
|wrong
 
  +
|0
  +
|0
 
|-
 
|-
  +
|#wrong inside
|high-density lipoprotein particle binding
 
|60%
+
|0
  +
|0
|wrong
 
  +
|0
|wrong
 
  +
|0
  +
|0
 
|-
 
|-
  +
|#wrong sum
|}
 
  +
|0
<br><br>
 
  +
|0
* '''INSL5_HUMAN'''
 
  +
|0
<br><br>
 
  +
|10
[[Image:insl5_human_gopet.png|center|Result of the GOPET prediction for INSL5_HUMAN]]
 
  +
|0
 
The method only predicts functional GO terms. INSL5_HUMAN has 1 annotated GO function, and the method also predicts 1 GO function term. Therefore we decided to check whether the prediction is correct.
 
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|GO term
 
|confidence
 
|prediction term
 
|prediction GOid
 
 
|-
 
|-
  +
|%wrong predicted
|hormone activity
 
|80%
+
|0%
  +
|0%
|right
 
  +
|0%
|right
 
  +
|8%
  +
|0%
 
|-
 
|-
  +
!colspan="8" |
|}
 
<br><br>
 
* '''LAMP1_HUMAN'''
 
<br><br>
 
[[Image:lamp1_human_gopet.png|center|Result of the GOPET prediction for LAMP1_HUMAN]]
 
 
The method only predicts functional GO terms. LAMP1_HUMAN has 0 annotated GO functions. The method predicts 2 GO function terms. Therefore the predictions are wrong.
 
<br><br>
 
* '''A4_HUMAN'''
 
<br><br>
 
[[Image:a4_human_gopet.png|center|Result of the GOPET prediction for A4_HUMAN]]
 
 
The method only predicts functional GO terms. A4_HUMAN has 11 annotated GO functions. The method predicts 13 GO function terms. Therefore we decided to check whether all predictions are correct.
 
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|GO term
 
|confidence
 
|prediction term
 
|prediction GOid
 
 
|-
 
|-
  +
|rowspan="5" | LAMP1_HUMAN
|endopeptidase inhibitor activity
 
  +
|#wrong transmembrane
|87%
 
  +
|5
|right
 
  +
|3
|wrong
 
  +
|4
  +
|3
  +
|1
  +
|rowspan="5" | yes (single-spanning)
 
|-
 
|-
  +
|#wrong outside
|serine-type endopeptidase inhibitor activity
 
|86%
+
|2
  +
|0
|right
 
  +
|0
|right
 
  +
|1
  +
|1
 
|-
 
|-
  +
|#wrong inside
|plasmin inhibitor activity
 
|83%
+
|0
  +
|0
|wrong
 
  +
|0
|wrong
 
  +
|1
  +
|1
 
|-
 
|-
  +
|#wrong sum
|trypsin inhibitor activity
 
|83%
+
|7
  +
|3
|wrong
 
  +
|4
|wrong
 
  +
|5
  +
|3
 
|-
 
|-
  +
|%wrong predicted
|peptidase inhibitor activity
 
|82%
+
|2%
  +
|0%
|right
 
  +
|1%
|right
 
  +
|1%
  +
|0%
 
|-
 
|-
  +
!colspan="8" |
|binding
 
|79%
 
|right
 
|right
 
 
|-
 
|-
  +
|rowspan="5" | A4_HUMAN
|protein binding
 
  +
|#wrong transmembrane
|74%
 
  +
|0
|right
 
  +
|0
|right
 
  +
|0
  +
|0
  +
|0
  +
|rowspan="5" | yes (single-spanning)
 
|-
 
|-
  +
|#wrong outside
|metal ion binding
 
|73%
+
|1
  +
|1
|right
 
  +
|1
|right
 
  +
|1
  +
|2
 
|-
 
|-
  +
|#wrong inside
|DNA binding
 
|71%
+
|0
  +
|0
|right
 
  +
|0
|right
 
  +
|1
  +
|1
 
|-
 
|-
  +
|#wrong sum
|heparin binding
 
|70%
+
|1
  +
|1
|wrong
 
  +
|1
|right
 
  +
|2
  +
|3
 
|-
 
|-
  +
|%wrong predicted
|zinc ion binding
 
|69%
+
|0%
  +
|0%
|wrong
 
  +
|0%
|wrong
 
  +
|0%
  +
|0%
 
|-
 
|-
  +
!colspan="8" | Average number of wrong predicted residues
|copper ion binding
 
|69%
 
|wrong
 
|wrong
 
|-
 
|iron ion binding
 
|67%
 
|wrong
 
|wrong
 
 
|-
 
|-
  +
|
  +
|
  +
|13.6
  +
|5.5
  +
|3.6
  +
|7
  +
|3.8
  +
|
 
|}
 
|}
<br><br>
 
<br><br>
 
 
=== Pfam ===
 
We used the Pfam webserver for our analysis and decided to trust only the significant Pfam-A matches. To check whether the predictions are correct, we mapped the Pfam ids to GO ids with the help of a mapping website [[http://www.geneontology.org/external2go/pfam2go]]. If a successful mapping was not possible, we compared the name of the predicted Pfam family with the names of the GO terms. If the names were similar or equal, we decided to trust the mapping.
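
The mapping step can be sketched roughly as follows. The sketch assumes that the pfam2go file uses lines of the form <code>Pfam:ACCESSION name > GO:term name ; GO:id</code> with comment lines starting with <code>!</code>; this line format, the placeholder accession <code>PF99999</code> and the function names are assumptions made for illustration and are not taken from our analysis.

```python
import re
from collections import defaultdict

# Assumed line format: "Pfam:PF00000 Name > GO:term name ; GO:0000000"
LINE_RE = re.compile(
    r"^Pfam:(?P<acc>PF\d+)\s+(?P<name>\S+)\s+>\s+GO:(?P<term>.+?)\s+;\s+(?P<go_id>GO:\d{7})\s*$"
)


def parse_pfam2go(lines):
    """Return {Pfam accession: [(GO id, GO term name), ...]}."""
    mapping = defaultdict(list)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comments and empty lines
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            mapping[match["acc"]].append((match["go_id"], match["term"]))
    return dict(mapping)


def check_prediction(accession, mapping, annotated_go_ids):
    """'right'/'wrong' if a mapping exists, 'unmapped' otherwise.

    'unmapped' corresponds to the case where the family name has to be
    compared with the GO term names by hand.
    """
    if accession not in mapping:
        return "unmapped"
    mapped_ids = {go_id for go_id, _ in mapping[accession]}
    return "right" if mapped_ids & set(annotated_go_ids) else "wrong"


# Placeholder lines with a made-up accession (PF99999).
SAMPLE = [
    "! example header comment",
    "Pfam:PF99999 Example_fam > GO:hydrolase activity, hydrolyzing O-glycosyl compounds ; GO:0004553",
    "Pfam:PF99999 Example_fam > GO:carbohydrate metabolic process ; GO:0005975",
]

if __name__ == "__main__":
    sample_mapping = parse_pfam2go(SAMPLE)
    print(check_prediction("PF99999", sample_mapping, {"GO:0004553"}))
```

A family counts as "right" here as soon as one of its mapped GO ids is among the annotated ids; a stricter rule that requires all mapped ids to be annotated would be equally possible.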
 
   
  +
TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, because TMHMM is the only prediction method that does not recognize the 7 transmembrane helices.
  +
SPOCTOPUS and PolyPhobius are the best prediction methods.<br><br>
  +
In general, the prediction of transmembrane helices works quite well and almost all predictions are very close to the real protein.
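
The ranking can be reproduced from the "#wrong sum" rows of the table above. The following Python sketch simply averages these sums over the six proteins (in the order HEXA_HUMAN, BACR_HALSA, RET4_HUMAN, INSL5_HUMAN, LAMP1_HUMAN, A4_HUMAN). The unrounded averages can differ in the last digit from the values in the table, which appear to be cut off after one decimal place.

```python
# "#wrong sum" per protein, copied from the table above.
# Order: HEXA_HUMAN, BACR_HALSA, RET4_HUMAN, INSL5_HUMAN, LAMP1_HUMAN, A4_HUMAN
WRONG_SUM = {
    "TMHMM":       [0, 74, 0, 0, 7, 1],
    "Phobius":     [0, 29, 0, 0, 3, 1],
    "PolyPhobius": [0, 17, 0, 0, 4, 1],
    "OCTOPUS":     [0, 20, 5, 10, 5, 2],
    "SPOCTOPUS":   [0, 17, 0, 0, 3, 3],
}


def average_wrong(sums):
    """Mean number of wrongly predicted residues per protein."""
    return sum(sums) / len(sums)


averages = {method: average_wrong(sums) for method, sums in WRONG_SUM.items()}

# Methods sorted from fewest to most wrongly predicted residues.
ranking = sorted(averages, key=averages.get)

if __name__ == "__main__":
    for method in ranking:
        print(f"{method}: {averages[method]:.2f}")
```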
 
<br><br>
 
<br><br>
  +
* Comparison of signal peptide prediction
* '''HEXA_HUMAN'''
 
 
Graphical representation of the prediction result of Pfam:
 
[[Image:hexa_human_pfam.png|center|Result of the Pfam prediction for HEXA_HUMAN]]
 
 
Pfam found two significant Pfam-A matches:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|Family
 
|E-Value
 
|GO id
 
|prediction
 
|-
 
|Glycosyl hydrolase family 20, domain 2
 
|3.7e-43
 
|GO:0004553
 
|right
 
|-
 
|Glycosyl hydrolase family 20, catalytic domain
 
|1.8e-84
 
|GO:0005975
 
|right
 
|-
 
|}
 
 
<br><br>
 
<br><br>
  +
Now we compared TargetP and SignalP which can only predict signal peptides. Furthermore we compared SPOCTOPUS, Phobius and PolyPhobius.
* '''BACR_HALSA'''
 
  +
TargetP does not predict the start and end position of the signal peptide, instead it predicts only the location of the protein.
<br><br>
 
Graphical representation of the prediction result of Pfam:
 
[[Image:bacr_halsa_pfam.png|center|Result of the Pfam prediction for BACR_HALSA]]
 
   
Pfam found one significant Pfam-A match:
 
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
  +
|rowspan="2" |
|Family
 
  +
|rowspan="2" |
|E-Value
 
  +
|colspan="6" | methods
|GOid
 
|prediction
 
 
|-
 
|-
  +
|real position
|rowspan="3" | Bacteriorhodopsin-like protein
 
  +
|Phobius
|rowspan="3" | 2e-88
 
  +
|PolyPhobius
|GO:0005216
 
  +
|SPOCTOPUS
|right
 
  +
|TargetP
  +
|SignalP
 
|-
 
|-
  +
|rowspan="3" | HEXA_HUMAN
|GO:0006811
 
  +
|stop position
|right
 
  +
|22
  +
|22
  +
|19
  +
|21
  +
|no prediction
  +
|22
 
|-
 
|-
  +
|#wrong residues
|GO:0016020
 
  +
|
|right
 
  +
|0
  +
|3
  +
|3
  +
|no prediction
  +
|0
 
|-
 
|-
  +
|location
|}
 
  +
|secretory pathway
<br><br>
 
  +
|secretory pathway
* '''RET4_HUMAN'''
 
  +
|secretory pathway
<br><br>
 
  +
|no prediction
Graphical representation of the prediction result of Pfam:
 
  +
|secretory pathway
[[Image:ret4_human_pfam.png|center|Result of the Pfam prediction for RET4_HUMAN]]
 
  +
|no prediction
 
Pfam found one significant Pfam-A match:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|Family
 
|E-Value
 
|GOid
 
|prediction
 
 
|-
 
|-
  +
!colspan="8" |
|Lipocalin/cytosolic fatty-acid binding protein family
 
|1.7e-22
 
|GO:0005488
 
|right
 
|}
 
<br><br>
 
* '''INSL5_HUMAN'''
 
<br><br>
 
Graphical representation of the prediction result of Pfam:
 
[[Image:insl5_human_pfam.png|center|Result of the Pfam prediction for INSL5_HUMAN]]
 
 
Pfam found two significant Pfam-A matches:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|Family
 
|E-Value
 
|GOid
 
|prediction
 
 
|-
 
|-
|rowspan="2" | Insulin/IGF/Relaxin family
+
|rowspan="3" | BACR_HALSA
  +
|stop position
|rowspan="2" | 6.7e-08
 
  +
|not available
|GO:0005179
 
  +
|no prediction
|right
 
  +
|no prediction
  +
|no prediction
  +
|no prediction
  +
|no consensus prediction
 
|-
 
|-
  +
|#wrong predicted
|GO:0005576
 
  +
|not available
|right
 
  +
|not available
  +
|not available
  +
|not available
  +
|no prediction
  +
|not available
 
|-
 
|-
  +
|location
|}
 
  +
|membrane
<br><br>
 
  +
|not available
* '''LAMP1_HUMAN'''
 
  +
|not available
<br><br>
 
  +
|not available
Graphical representation of the prediction result of Pfam:
 
  +
|secretory pathway
[[Image:lamp1_human_pfam2.png|center|Result of the Pfam prediction for LAMP1_HUMAN]]
 
  +
|non-signal peptide
 
Pfam found one significant Pfam-A match:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|Family
 
|E-Value
 
|GOid
 
|prediction
 
 
|-
 
|-
  +
!colspan="8" |
|Lysosome-associated membrane glycoprotein (LAMP)
 
|2.3e-135
 
|GO:0016020
 
|right
 
 
|-
 
|-
  +
|rowspan="3" | RET4_HUMAN
|}
 
  +
|stop position
<br><br>
 
  +
|18
* '''A4_HUMAN'''
 
  +
|18
<br><br>
 
  +
|18
Graphical representation of the prediction result of Pfam:
 
  +
|19
[[Image:a4_human_pfam.png|center|Result of the Pfam prediction for A4_HUMAN]]
 
  +
|no prediction
 
  +
|18
Pfam found six significant Pfam-A matches:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
|Family
 
|E-Value
 
|GOid
 
|prediction
 
 
|-
 
|-
  +
|#wrong predicted
|Amyloid A4 N-terminal heparin-binding
 
  +
|
|4e-42
 
  +
|0
|none
 
  +
|0
|right
 
  +
|1
  +
|no prediction
  +
|0
 
|-
 
|-
  +
|location
|Copper-binding of amyloid precursor CuBD
 
  +
|secretory pathway
|2.3e-27
 
  +
|secretory pathway
|none
 
  +
|secretory pathway
|right
 
  +
|no prediction
|-
 
  +
|secretory pathway
|Kunitz/Bovine pancreatic trypsin inhibitor domain
 
  +
|no prediction
|3e-19
 
|GO:0004867
 
|right
 
 
|-
 
|-
|E2 domain of amyloid precursor protein
 
|1.6e-74
 
|none
 
|right
 
|-
 
|rowspan="2" | Beta-amyloid peptide (beta-APP)
 
|rowspan="2" | 4.3e-28
 
|GO:0005488
 
|right
 
|-
 
|GO:0016021
 
|right
 
|-
 
|Beta-amyloid precursor protein C-terminus
 
|1.1e-29
 
|none
 
|right
 
|-
 
|}
 
<br><br>
 
<br><br>
 
   
  +
!colspan="8" |
=== ProtFun 2.2 ===
 
<br><br>
 
ProtFun 2.2 does not give a clear prediction of whether the protein belongs to a class or not; instead it gives probabilities and odd scores.
 
We decided to use a cutoff of 2, so all classes with an odd score of 2 or higher count as predicted results for us. You can also find a "=>" sign in the result file. This sign marks the result with the highest information content. We also take this line as a result, even if its odd score is lower than 2. If all results have an odd score lower than 2, the line with this sign is our only result.<br>
 
Because the prediction categories are very general, it was not possible to map them to GO ids. Therefore, we checked the known GO annotations. If there was a hint for a category and the protein was predicted to be in this category, we decided that the prediction is right; if the known GO annotations and the category conflict, we counted the prediction as wrong.
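
The selection rule can be written down as a short Python sketch. The cell format (<code>"1.661 =>"</code>) follows the result tables on this page; the function names are only illustrative, and the example values are the odd scores from the enzyme class table of HEXA_HUMAN.

```python
def parse_cell(cell: str):
    """Split a table cell such as '1.661 =>' into (value, marked)."""
    marked = cell.strip().endswith("=>")
    value = float(cell.replace("=>", "").strip())
    return value, marked


def select_classes(rows, cutoff: float = 2.0):
    """Return the classes we accept as predicted.

    rows: list of (class name, odd score cell) pairs.
    A class is accepted if its odd score reaches the cutoff
    or if its line carries the '=>' marker.
    """
    selected = []
    for name, cell in rows:
        odds, marked = parse_cell(cell)
        if odds >= cutoff or marked:
            selected.append(name)
    return selected


# Odd scores of the HEXA_HUMAN enzyme class table.
HEXA_ENZYME_CLASS = [
    ("Oxidoreductase (EC 1.-.-.-)", "0.685"),
    ("Transferase (EC 2.-.-.-)", "0.582"),
    ("Hydrolase (EC 3.-.-.-)", "1.039"),
    ("Lyase (EC 4.-.-.-)", "1.143"),
    ("Isomerase (EC 5.-.-.-)", "0.856"),
    ("Ligase (EC 6.-.-.-)", "1.661 =>"),
]

if __name__ == "__main__":
    print(select_classes(HEXA_ENZYME_CLASS))
```

Because the marked line is always kept, the case in which no class reaches the cutoff needs no special handling: the marked line is then automatically the only result.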
 
<br><br>
 
* '''HEXA_HUMAN'''
 
<br><br>
 
The ProtFun Server calculated the following prediction result for HEXA_HUMAN:
 
 
{| border="1" style="text-align:center; border-spacing:0;"
 
!colspan="4" | Functional category
 
 
|-
 
|-
  +
|rowspan="3" | INSL5_HUMAN
|Functional category
 
  +
|stop position
|Probability
 
  +
|22
|Odd score
 
  +
|22
|Prediction
 
  +
|22
  +
|22
  +
|no prediction
  +
|22
 
|-
 
|-
  +
|#wrong residues
|Amino acid biosynthesis
 
  +
|
|0.161
 
  +
|0
|7.331
 
  +
|0
|wrong
 
  +
|0
  +
|no prediction
  +
|0
 
|-
 
|-
  +
|location
|Biosynthesis of cofactors
 
  +
|secretory pathway
|0.332
 
  +
|secretory pathway
|4.609
 
  +
|secretory pathway
|right
 
  +
|no prediction
  +
|secretory pathway
  +
|no prediction
 
|-
 
|-
  +
!colspan="8" |
|Cell envelope
 
|0.804 =>
 
|13.186 =>
 
|right
 
 
|-
 
|-
  +
|rowspan="3" | LAMP1_HUMAN
|Cellular processes
 
  +
|stop position
|0.110
 
  +
|28
|1.506
 
  +
|28
|right
 
  +
|28
  +
|29
  +
|no prediction
  +
|28
 
|-
 
|-
  +
|#wrong residues
|Central intermediary metabolism
 
  +
|
|0.432
 
  +
|0
|6.856
 
  +
|0
|right
 
  +
|1
  +
|no prediction
  +
|0
 
|-
 
|-
  +
|location
|Energy metabolism
 
  +
|transmembrane helix
|0.113
 
  +
|secretory pathway
|1.259
 
  +
|secretory pathway
|right
 
  +
|no prediction
  +
|secretory pathway
  +
|no prediction
 
|-
 
|-
  +
!colspan="8" |
|Fatty acid metabolism
 
|0.019
 
|1.427
 
|right
 
 
|-
 
|-
  +
|rowspan="3" | A4_HUMAN
|Purines and Pyrimidines
 
  +
|stop position
|0.519
 
  +
|17
|2.136
 
  +
|17
|wrong
 
  +
|17
  +
|18
  +
|no prediction
  +
|17
 
|-
 
|-
  +
|#wrong residues
|Regulatory functions
 
  +
|
|0.018
 
|0.111
+
|0
  +
|0
|right
 
  +
|1
  +
|no prediction
  +
|0
 
|-
 
|-
  +
|location
|Replication and transcription
 
  +
|transmembrane helix
|0.073
 
  +
|secretory pathway
|0.271
 
  +
|secretory pathway
|right
 
  +
|no prediction
  +
|secretory pathway
  +
|secretory pathway
 
|-
 
|-
  +
!colspan="8" | Average number of wrong predictions
|Translation
 
|0.040
 
|0.904
 
|right
 
 
|-
 
|-
  +
|rowspan="2" |
|Transport and binding
 
  +
|sum of wrong predicted residues
|0.685
 
  +
|
|1.670
 
  +
|0
|right
 
|-
+
|3
  +
|2
!colspan="4" | Enzyme/non-enzyme
 
  +
|no prediction
|-
 
  +
|0
|Enzyme/non-enzyme
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Enzyme
 
|0.792 =>
 
|2.764 =>
 
|right
 
|-
 
|Nonenzyme
 
|0.208
 
|0.292
 
|right
 
|-
 
!colspan="4" | Enzyme class
 
|-
 
|Enzyme class
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Oxidoreductase (EC 1.-.-.-)
 
|0.143
 
|0.685
 
|right
 
|-
 
|Transferase (EC 2.-.-.-)
 
|0.201
 
|0.582
 
|right
 
|-
 
|Hydrolase (EC 3.-.-.-)
 
|0.329
 
|1.039
 
|wrong
 
|-
 
|Lyase (EC 4.-.-.-)
 
|0.054
 
|1.143
 
|right
 
|-
 
|Isomerase (EC 5.-.-.-)
 
|0.027
 
|0.856
 
|right
 
|-
 
|Ligase (EC 6.-.-.-)
 
|0.085 =>
 
|1.661 =>
 
|right
 
|-
 
!colspan="4" | Gene ontology category
 
|-
 
|Gene ontology category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Signal transducer
 
|0.083
 
|0.389
 
|right
 
|-
 
|Receptor
 
|0.105
 
|0.617
 
|right
 
|-
 
|Hormone
 
|0.001
 
|0.206
 
|right
 
|-
 
|Structural protein
 
|0.010
 
|0.357
 
|right
 
|-
 
|Transporter
 
|0.024
 
|0.222
 
|right
 
|-
 
|Ion channel
 
|0.018
 
|0.310
 
|right
 
|-
 
|Voltage-gated ion channel
 
|0.002
 
|0.082
 
|right
 
|-
 
|Cation channel
 
|0.010
 
|0.218
 
|right
 
|-
 
|Transcription
 
|0.058
 
|0.453
 
|right
 
|-
 
|Transcription regulation
 
|0.026
 
|0.205
 
|right
 
|-
 
|Stress response
 
|0.004
 
|0.500
 
|right
 
|-
 
|Immune response
 
|0.014
 
|0.167
 
|right
 
|-
 
|Growth factor
 
|0.005
 
|0.372
 
|right
 
|-
 
|Metal ion transport
 
|0.009
 
|0.020
 
|right
 
 
|-
 
|-
  +
|#right predicted locations / #predicted locations
  +
|
  +
|3/5
  +
|3/5
  +
|no prediction
  +
|3/5
  +
|no prediction
 
|}
 
|}
   
  +
SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. Furthermore, SignalP predicts whether there is a signal peptide or not.
  +
In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.<br>
  +
Therefore, it is difficult to compare the different methods. First of all, Phobius and PolyPhobius are more powerful than the other prediction methods, because they predict both. On average they predict the location and the position as well as the other prediction methods do. None of the methods could identify the transmembrane proteins; all of them predict these as proteins of the secretory pathway. Therefore, it is useful to use Phobius or PolyPhobius, because they predict more than the other methods. Furthermore, both methods can also predict transmembrane helices.
  +
The results of Phobius were a little bit better than the results of PolyPhobius.<br>
  +
We also want to mention that SignalP gives you the possibility to choose between the prediction for eukaryotes, gram-positive bacteria and gram-negative bacteria. In our analysis we also analysed BACR_HALSA, which is an archaeal protein. We tested all three prediction methods for this protein and all three methods failed. BACR_HALSA does not possess a signal peptide, but every method predicts one. Only the eukaryotic prediction method recognized a signal anchor for BACR_HALSA, whereas the other two methods could not give a prediction of the location.<br><br>
 
<br><br>
 
<br><br>
  +
* Comparison of the combined methods
* '''BACR_HALSA'''
 
 
<br><br>
 
<br><br>
  +
The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore we combined our two previous comparisons.
The ProtFun Server calculated the following prediction result for BACR_HALSA:
 
  +
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
  +
|rowspan="2" |
!colspan="4" | Functional category
 
  +
|rowspan="2" |
  +
|colspan="3" | methods
 
|-
 
|-
  +
|Phobius
|Functional category
 
  +
|PolyPhobius
|Probability
 
  +
|SPOCTOPUS
|Odd score
 
|Prediction
 
 
|-
 
|-
  +
|rowspan="3" | HEXA_HUMAN
|Amino acid biosynthesis
 
  +
|#wrong predicted residues (TM)
|0.033
 
  +
|0
|1.495
 
  +
|0
|right
 
  +
|0
 
|-
 
|-
  +
|#wrong predicted residues (SP)
|Biosynthesis of cofactors
 
|0.186
+
|0
  +
|3
|2.589
 
  +
|2
|wrong
 
 
|-
 
|-
  +
|location
|Cell envelope
 
|0.029
 
|0.483
 
 
|right
 
|right
|-
 
|Cellular processes
 
|0.051
 
|0.698
 
 
|right
 
|right
  +
|no prediction
 
|-
 
|-
  +
!colspan="5" |
|Central intermediary metabolism
 
|0.045
 
|0.711
 
|right
 
 
|-
 
|-
  +
|rowspan="3" | BACR_HALSA
|Energy metabolism
 
  +
|#wrong predicted residues (TM)
|0.138
 
  +
|29
|1.537
 
  +
|17
|right
 
  +
|17
 
|-
 
|-
  +
|#wrong predicted residues (SP)
|Fatty acid metabolism
 
|0.016
+
|n.a.
|1.265
+
|n.a.
  +
|n.a.
|right
 
 
|-
 
|-
  +
|location
|Purines and Pyrimidines
 
|0.302
+
|n.a.
|1.244
+
|n.a.
  +
|no prediction
|right
 
 
|-
 
|-
  +
!colspan="5" |
|Regulatory functions
 
|0.013
 
|0.080
 
|wrong
 
 
|-
 
|-
  +
|rowspan="3" | RET4_HUMAN
|Replication and transcription
 
  +
|#wrong predicted residues (TM)
|0.019
 
|0.073
+
|0
  +
|0
|right
 
  +
|0
 
|-
 
|-
  +
|#wrong predicted residues (SP)
|Translation
 
|0.059
+
|0
  +
|0
|1.339
 
  +
|0
|right
 
 
|-
 
|-
  +
|location
|Transport and binding
 
|0.791 =>
 
|1.929 =>
 
 
|right
 
|right
|-
 
!colspan="4" | Enzyme/non-enzyme
 
|-
 
|Enzyme/non-enzyme
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Enzyme
 
|0.199
 
|0.696
 
 
|right
 
|right
  +
|no prediction
 
|-
 
|-
  +
!colspan="5" |
|Nonenzyme
 
|0.801 =>
 
|1.122 =>
 
|right
 
 
|-
 
|-
!colspan="4" | Enzyme class
+
|rowspan="3" | INSL5_HUMAN
  +
|#wrong predicted residues (TM)
  +
|0
  +
|0
  +
|0
 
|-
 
|-
  +
|#wrong predicted residues (SP)
|Enzyme class
 
  +
|0
|Probability
 
  +
|0
|Odd score
 
  +
|1
|Prediction
 
 
|-
 
|-
  +
|location
|Oxidoreductase (EC 1.-.-.-)
 
|0.114
 
|0.549
 
 
|right
 
|right
|-
 
|Transferase (EC 2.-.-.-)
 
|0.031
 
|0.091
 
 
|right
 
|right
  +
|no prediction
 
|-
 
|-
  +
!colspan="5" |
|Hydrolase (EC 3.-.-.-)
 
|0.057
 
|0.180
 
|right
 
 
|-
 
|-
  +
|rowspan="3" | LAMP1_HUMAN
|Lyase (EC 4.-.-.-)
 
  +
|#wrong predicted residues (TM)
|0.020
 
  +
|3
|0.430
 
  +
|4
|right
 
  +
|3
 
|-
 
|-
  +
|#wrong predicted residues (SP)
|Isomerase (EC 5.-.-.-)
 
|0.010
+
|0
|0.321
+
|0
  +
|0
|right
 
|-
 
|Ligase (EC 6.-.-.-)
 
|0.017
 
|0.625
 
|right
 
 
|-
 
|-
  +
|location
!colspan="4" | Gene ontology category
 
|-
 
|Gene ontology category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Signal transducer
 
|0.258
 
|1.205
 
 
|wrong
 
|wrong
|-
 
|Receptor
 
|0.355
 
|2.087
 
|right
 
|-
 
|Hormone
 
|0.001
 
|0.206
 
|right
 
|-
 
|Structural protein
 
|0.006
 
|0.200
 
|right
 
|-
 
|Transporter
 
|0.440 =>
 
|4.036 =>
 
|right
 
|-
 
|Ion channel
 
|0.010
 
|0.169
 
 
|wrong
 
|wrong
  +
|no prediction
 
|-
 
|-
  +
!colspan="5" |
|Voltage-gated ion channel
 
|0.004
 
|0.172
 
|right
 
 
|-
 
|-
  +
|rowspan="3" | A4_HUMAN
|Cation channel
 
  +
|#wrong predicted residues (TM)
|0.078
 
  +
|0
|1.689
 
  +
|0
|right
 
  +
|0
 
|-
 
|-
  +
|#wrong predicted residues (SP)
|Transcription
 
  +
|1
|0.026
 
  +
|1
|0.205
 
  +
|3
|right
 
 
|-
 
|-
  +
|location
|Transcription regulation
 
|0.028
 
|0.226
 
|right
 
|-
 
|Stress response
 
|0.012
 
|0.139
 
|right
 
|-
 
|Immune response
 
|0.011
 
|0.128
 
|right
 
|-
 
|Growth factor
 
|0.010
 
|0.727
 
|right
 
|-
 
|Metal ion transport
 
|0.049
 
|0.106
 
|right
 
|-
 
|}
 
<br><br>
 
* '''RET4_HUMAN'''
 
<br><br>
 
The ProtFun Server calculated the following prediction result for RET4_HUMAN:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
!colspan="4" | Functional category
 
|-
 
|Functional category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Amino acid biosynthesis
 
|0.017
 
|0.751
 
|right
 
|-
 
|Biosynthesis of cofactors
 
|0.044
 
|0.610
 
|right
 
|-
 
|Cell envelope
 
|0.804 =>
 
|13.186 =>
 
|right
 
|-
 
|Cellular processes
 
|0.075
 
|1.021
 
 
|wrong
 
|wrong
|-
 
|Central intermediary metabolism
 
|0.197
 
|3.128
 
|right
 
|-
 
|Energy metabolism
 
|0.043
 
|0.475
 
|right
 
|-
 
|Fatty acid metabolism
 
|0.016
 
|1.265
 
|right
 
|-
 
|Purines and Pyrimidines
 
|0.275
 
|1.131
 
|right
 
|-
 
|Regulatory functions
 
|0.013
 
|0.080
 
|right
 
|-
 
|Replication and transcription
 
|0.022
 
|0.084
 
|right
 
|-
 
|Translation
 
|0.032
 
|0.721
 
|right
 
|-
 
|Transport and binding
 
|0.800
 
|1.951
 
 
|wrong
 
|wrong
  +
|no prediction
 
|-
 
|-
!colspan="4" | Enzyme/non-enzyme
+
!colspan="5" | Average
 
|-
 
|-
  +
|rowspan="3" |
|Enzyme/non-enzyme
 
  +
|avg(#wrong predicted residues (TM))
|Probability
 
  +
|5.3
|Odd score
 
  +
|3.5
|Prediction
 
  +
|3.3
 
|-
 
|-
  +
|avg(#wrong predicted residues (SP))
|Enzyme
 
|0.544 =>
+
|0.1
|1.900 =>
+
|0.6
  +
|1
|right
 
 
|-
 
|-
  +
|#location (right predicted) / #location(predicted)
|Nonenzyme
 
  +
|3/5
|0.456
 
  +
|3/5
|0.639
 
  +
|no prediction
|right
 
|-
 
!colspan="4" | Enzyme class
 
|-
 
|Enzyme class
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Oxidoreductase (EC 1.-.-.-)
 
|0.095
 
|0.458
 
|right
 
|-
 
|Transferase (EC 2.-.-.-)
 
|0.038
 
|0.109
 
|right
 
|-
 
|Hydrolase (EC 3.-.-.-)
 
|0.235
 
|0.742
 
|right
 
|-
 
|Lyase (EC 4.-.-.-)
 
|0.059 =>
 
|1.264 =>
 
|wrong
 
|-
 
|Isomerase (EC 5.-.-.-)
 
|0.010
 
|0.321
 
|right
 
|-
 
|Ligase (EC 6.-.-.-)
 
|0.017
 
|0.326
 
|right
 
|-
 
!colspan="4" | Gene ontology category
 
|-
 
|Gene ontology category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Signal transducer
 
|0.202
 
|0.942
 
|right
 
|-
 
|Receptor
 
|0.147
 
|0.862
 
|right
 
|-
 
|Hormone
 
|0.004
 
|0.667
 
|right
 
|-
 
|Structural protein
 
|0.002
 
|0.058
 
|right
 
|-
 
|Transporter
 
|0.025
 
|0.232
 
|right
 
|-
 
|Ion channel
 
|0.016
 
|0.288
 
|right
 
|-
 
|Voltage-gated ion channel
 
|0.003
 
|0.148
 
|right
 
|-
 
|Cation channel
 
|0.010
 
|0.215
 
|right
 
|-
 
|Transcription
 
|0.027
 
|0.207
 
|right
 
|-
 
|Transcription regulation
 
|0.025
 
|0.196
 
|right
 
|-
 
|Stress response
 
|0.161
 
|1.829
 
|right
 
|-
 
|Immune response
 
|0.239 =>
 
|2.813 =>
 
|wrong
 
|-
 
|Growth factor
 
|0.023
 
|1.617
 
|right
 
|-
 
|Metal ion transport
 
|0.009
 
|0.020
 
|right
 
 
|-
 
|-
 
|}
 
|}
  +
<br><br>
 
  +
In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is clearly better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.<br>
*'''INSL5_HUMAN'''
 
  +
Therefore, it seems a good choice to use PolyPhobius, which is on average the best method for transmembrane and signal peptide prediction.<br><br>
<br><br>
 
  +
The ProtFun Server calculated the following prediction result for INSL5_HUMAN:
 
  +
== Prediction of GO terms ==
{| border="1" style="text-align:center; border-spacing:0;"
 
  +
!colspan="4" | Functional category
 
  +
Before we start with our analysis, we decided to check the GO annotations for the six sequences, which can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_annotation_of_the_proteins here]]:
|-
 
  +
|Functional category
 
  +
A detailed list of the GO annotation terms of each protein can be found [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Go_annotations_here here]].
|Probability
 
  +
|Odd score
 
  +
=== Results ===
|Prediction
 
  +
|-
 
  +
We created a separate result page for each protein. Unfortunately, it is not possible to summarize the results briefly, so please have a look at the individual result pages for the detailed output.
|Amino acid biosynthesis
 
  +
|0.011
 
  +
*[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_HEXA_HUMAN HEXA HUMAN]]
|0.484
 
  +
*[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_BACR_HALSA BACR HALSA]]
|right
 
  +
*[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_RET4_HUMAN RET4 HUMAN]]
|-
 
  +
*[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_INSL5_HUMAN INSL5 HUMAN]]
|Biosynthesis of cofactors
 
  +
*[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_LAMP1_HUMAN LAMP1 HUMAN]]
|0.040
 
  +
*[[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_Terms_A4_HUMAN A4 HUMAN]]
|0.558
 
|right
 
|-
 
|Cell envelope
 
|0.756 =>
 
|12.393 =>
 
|right
 
|-
 
|Cellular processes
 
|0.033
 
|0.448
 
|right
 
|-
 
|Central intermediary metabolism
 
|0.048
 
|0.755
 
|right
 
|-
 
|Energy metabolism
 
|0.036
 
|0.397
 
|right
 
|-
 
|Fatty acid metabolism
 
|0.016
 
|1.265
 
|right
 
|-
 
|Purines and Pyrimidines
 
|0.144
 
|0.592
 
|right
 
|-
 
|Regulatory functions
 
|0.014
 
|0.087
 
|right
 
|-
 
|Replication and Transcription
 
|0.020
 
|0.075
 
|right
 
|-
 
|Translation
 
|0.032
 
|0.735
 
|right
 
|-
 
|Transport and binding
 
|0.834
 
|2.033
 
|right
 
|-
 
!colspan="4" | Enzyme/non-enzyme
 
|-
 
|Enzyme/non-enzyme
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Enzyme
 
|0.209
 
|0.729
 
|right
 
|-
 
|Nonenzyme
 
|0.791 =>
 
|1.109 =>
 
|right
 
|-
 
!colspan="4" | Enzyme class
 
|-
 
|Enzyme class
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Oxidoreductase (EC 1.-.-.-)
 
|0.056
 
|0.268
 
|right
 
|-
 
|Transferase (EC 2.-.-.-)
 
|0.031
 
|0.091
 
|right
 
|-
 
|Hydrolase (EC 3.-.-.-)
 
|0.062
 
|0.195
 
|right
 
|-
 
|Lyase (EC 4.-.-.-)
 
|0.020
 
|0.430
 
|right
 
|-
 
|Isomerase (EC 5.-.-.-)
 
|0.010
 
|0.321
 
|right
 
|-
 
|Ligase (EC 6.-.-.-)
 
|0.017
 
|0.327
 
|right
 
|-
 
!colspan="4" | Gene ontology category
 
|-
 
|Gene ontology category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Signal transducer
 
|0.374
 
|1.746
 
|right
 
|-
 
|Receptor
 
|0.128
 
|0.750
 
|right
 
|-
 
|Hormone
 
|0.247 =>
 
|37.936 =>
 
|right
 
|-
 
|Structural protein
 
|0.001
 
|0.041
 
|right
 
|-
 
|Transporter
 
|0.025
 
|0.228
 
|right
 
|-
 
|Ion channel
 
|0.010
 
|0.168
 
|right
 
|-
 
|Voltage-gated ion channel
 
|0.003
 
|0.131
 
|right
 
|-
 
|Cation channel
 
|0.010
 
|0.215
 
|right
 
|-
 
|Transcription
 
|0.054
 
|0.425
 
|right
 
|-
 
|Transcription regulation
 
|0.091
 
|0.724
 
|right
 
|-
 
|Stress response
 
|0.099
 
|1.128
 
|right
 
|-
 
|Immune response
 
|0.178
 
|2.090
 
|wrong
 
|-
 
|Growth factor
 
|0.061
 
|4.379
 
|wrong
 
|-
 
|Metal ion transport
 
|0.009
 
|0.020
 
|right
 
|-
 
|}
 
<br><br>
 
* '''LAMP1_HUMAN'''
 
<br><br>
 
The ProtFun server calculated the following prediction results for LAMP1_HUMAN:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
!colspan="4" | Functional category
 
|-
 
|Functional category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Amino acid biosynthesis
 
|0.011
 
|0.484
 
|right
 
|-
 
|Biosynthesis of cofactors
 
|0.053
 
|0.735
 
|right
 
|-
 
|Cell envelope
 
|0.804 =>
 
|13.186 =>
 
|right
 
|-
 
|Cellular processes
 
|0.027
 
|0.373
 
|right
 
|-
 
|Central intermediary metabolism
 
|0.138
 
|2.188
 
|right
 
|-
 
|Energy metabolism
 
|0.037
 
|0.411
 
|right
 
|-
 
|Fatty acid metabolism
 
|0.016
 
|1.265
 
|right
 
|-
 
|Purines and Pyrimidines
 
|0.533
 
|2.195
 
|wrong
 
|-
 
|Regulatory functions
 
|0.015
 
|0.090
 
|right
 
|-
 
|Replication and transcription
 
|0.019
 
|0.073
 
|right
 
|-
 
|Translation
 
|0.027
 
|0.613
 
|right
 
|-
 
|Transport and binding
 
|0.834
 
|2.033
 
|right
 
|-
 
!colspan="4" | Enzyme/non-enzyme
 
|-
 
|Enzyme/non-enzyme
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Enzyme
 
|0.276
 
|0.965
 
|right
 
|-
 
|Nonenzyme
 
|0.724 =>
 
|1.014 =>
 
|right
 
|-
 
!colspan="4" | Enzyme class
 
|-
 
|Enzyme class
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Oxidoreductase (EC 1.-.-.-)
 
|0.039
 
|0.187
 
|right
 
|-
 
|Transferase (EC 2.-.-.-)
 
|0.046
 
|0.134
 
|right
 
|-
 
|Hydrolase (EC 3.-.-.-)
 
|0.058
 
|0.184
 
|right
 
|-
 
|Lyase (EC 4.-.-.-)
 
|0.020
 
|0.430
 
|right
 
|-
 
|Isomerase (EC 5.-.-.-)
 
|0.010
 
|0.321
 
|right
 
|-
 
|Ligase (EC 6.-.-.-)
 
|0.017
 
|0.326
 
|right
 
|-
 
!colspan="4" | Gene ontology category
 
|-
 
|Gene ontology category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Signal transducer
 
|0.396
 
|1.849
 
|right
 
|-
 
|Receptor
 
|0.282
 
|1.659
 
|right
 
|-
 
|Hormone
 
|0.001
 
|0.206
 
|right
 
|-
 
|Structural protein
 
|0.011
 
|0.408
 
|right
 
|-
 
|Transporter
 
|0.024
 
|0.222
 
|right
 
|-
 
|Ion channel
 
|0.008
 
|0.147
 
|right
 
|-
 
|Voltage-gated ion channel
 
|0.002
 
|0.111
 
|right
 
|-
 
|Cation channel
 
|0.010
 
|0.215
 
|right
 
|-
 
|Transcription
 
|0.032
 
|0.247
 
|right
 
|-
 
|Transcription regulation
 
|0.018
 
|0.142
 
|right
 
|-
 
|Stress response
 
|0.246
 
|2.795
 
|right
 
|-
 
|Immune response
 
|0.371 =>
 
|4.368 =>
 
|right
 
|-
 
|Growth factor
 
|0.013
 
|0.956
 
|right
 
|-
 
|Metal ion transport
 
|0.009
 
|0.020
 
|right
 
|-
 
|}
 
<br><br>
 
* '''A4_HUMAN'''
 
<br><br>
 
The ProtFun server calculated the following prediction results for A4_HUMAN:
 
{| border="1" style="text-align:center; border-spacing:0;"
 
!colspan="4" | Functional category
 
|-
 
|Functional category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Amino acid biosynthesis
 
|0.020
 
|0.921
 
|right
 
|-
 
|Biosynthesis of cofactors
 
|0.261
 
|3.623
 
|right
 
|-
 
|Cell envelope
 
|0.804 =>
 
|13.186 =>
 
|right
 
|-
 
|Cellular processes
 
|0.053
 
|0.070
 
|right
 
|-
 
|Central intermediary metabolism
 
|0.184
 
|2.920
 
|right
 
|-
 
|Energy metabolism
 
|0.023
 
|0.259
 
|right
 
|-
 
|Fatty acid metabolism
 
|0.016
 
|1.265
 
|right
 
|-
 
|Purines and Pyrimidines
 
|0.417
 
|1.716
 
|right
 
|-
 
|Regulatory functions
 
|0.013
 
|0.084
 
|wrong
 
|-
 
|Replication and transcription
 
|0.029
 
|0.109
 
|right
 
|-
 
|Translation
 
|0.027
 
|0.613
 
|right
 
|-
 
|Transport and binding
 
|0.827
 
|2.016
 
|right
 
|-
 
!colspan="4" | Enzyme/non-enzyme
 
|-
 
|Enzyme/non-enzyme
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Enzyme
 
|0.392 =>
 
|1.368 =>
 
|right
 
|-
 
|Nonenzyme
 
|0.608
 
|0.852
 
|right
 
|-
 
!colspan="4" | Enzyme class
 
|-
 
|Enzyme class
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Oxidoreductase (EC 1.-.-.-)
 
|0.024
 
|0.114
 
|right
 
|-
 
|Transferase (EC 2.-.-.-)
 
|0.208
 
|0.603
 
|right
 
|-
 
|Hydrolase (EC 3.-.-.-)
 
|0.190
 
|0.600
 
|right
 
|-
 
|Lyase (EC 4.-.-.-)
 
|0.020
 
|0.430
 
|right
 
|-
 
|Isomerase (EC 5.-.-.-)
 
|0.010
 
|0.324
 
|right
 
|-
 
|Ligase (EC 6.-.-.-)
 
|0.048
 
|0.946
 
|right
 
|-
 
!colspan="4" | Gene ontology category
 
|-
 
|Gene ontology category
 
|Probability
 
|Odd score
 
|Prediction
 
|-
 
|Signal transducer
 
|0.126
 
|0.586
 
|right
 
|-
 
|Receptor
 
|0.036
 
|0.211
 
|right
 
|-
 
|Hormone
 
|0.001
 
|0.206
 
|right
 
|-
 
|Structural protein
 
|0.034 =>
 
|1.205 =>
 
|right
 
|-
 
|Transporter
 
|0.024
 
|0.222
 
|right
 
|-
 
|Ion channel
 
|0.009
 
|0.162
 
|right
 
|-
 
|Voltage-gated ion channel
 
|0.002
 
|0.108
 
|right
 
|-
 
|Cation channel
 
|0.010
 
|0.215
 
|right
 
|-
 
|Transcription
 
|0.043
 
|0.335
 
|right
 
|-
 
|Transcription regulation
 
|0.018
 
|0.143
 
|right
 
|-
 
|Stress response
 
|0.076
 
|0.862
 
|right
 
|-
 
|Immune response
 
|0.016
 
|0.183
 
|right
 
|-
 
|Growth factor
 
|0.005
 
|0.372
 
|right
 
|-
 
|Metal ion transport
 
|0.009
 
|0.020
 
|right
 
|-
 
|}
 
<br><br>
 
 
<br><br>
 
<br><br>
   
Line 3,105: Line 2,269:
|-
|#GO terms
|colspan="4" | 25
|-
|true positive (in %)
Line 3,139: Line 2,303:
|-
|#GO terms
|colspan="4" | 12
|-
|true positive (in %)
Line 3,173: Line 2,337:
|-
|#GO terms
|colspan="4" | 41
|-
|true positive (in %)
Line 3,207: Line 2,371:
|-
|#GO terms
|colspan="4" | 4
|-
|true positive (in %)
Line 3,241: Line 2,405:
|-
|#GO terms
|colspan="4" | 17
|-
|true positive (in %)
Line 3,275: Line 2,439:
|-
|#GO terms
|colspan="4" | 78
|-
|true positive (in %)
Line 3,290: Line 2,454:
 
|}
 
|}
   
As you can see in the table above, each method predicts only a small subset of the annotated GO terms. In general, GOPET seems to be the best method, because it is the only method which predicts the GO terms and, overall, it mostly has the best true positive ratio. Furthermore, it also predicts more GO terms than the other methods.<br>

It was not possible to calculate the ratio between true positives and annotated GO terms for ProtFun, because this method uses a fixed set of predefined terms and only predicts the probability that the protein belongs to each of them.<br>

In general, GO term prediction does not work very well, and the prediction results give only hints about the function and localization of the protein.<br><br>
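The true positive ratio discussed above can be sketched as a simple set comparison. This is only a minimal illustration: the function name `true_positive_stats` and the placeholder term names are assumptions made for this example, not real GO identifiers and not the output of any of the methods.

```python
def true_positive_stats(predicted: set, annotated: set) -> dict:
    """Compare predicted terms with annotated terms for one protein."""
    true_positives = predicted & annotated
    return {
        "true_positives": len(true_positives),
        # share of the predictions that are correct
        "percent_of_predicted": 100 * len(true_positives) / len(predicted) if predicted else 0.0,
        # share of the annotated terms that were recovered
        "percent_of_annotated": 100 * len(true_positives) / len(annotated) if annotated else 0.0,
    }


# Placeholder terms, purely hypothetical
annotated_terms = {"term_a", "term_b", "term_c", "term_d"}
predicted_terms = {"term_a", "term_b", "term_x"}

stats = true_positive_stats(predicted_terms, annotated_terms)
print(stats)
```

Reporting both percentages separately makes it visible when a method predicts few terms but gets most of them right.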
  +
Back to [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Tay-Sachs_Disease Tay-Sachs Disease]]<br>

Latest revision as of 22:27, 30 August 2011

General Information

Secondary Structure Prediction

To analyse the secondary structure of our protein we used different methods. In our analysis we used PSIPRED, Jpred3 and DSSP. In the analysis section of this page we want to compare these three methods to see if the methods give similar results or if they differ extremely.

[Here] you can find some general information about these methods.

Back to [Tay-Sachs Disease]


Prediction of disordered regions

After analysing the secondary structure, we also want to look at disordered regions in this protein. For this we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we want to compare the different methods and variants to see whether the predictions are similar, and to decide which method seems best suited for our purpose.

To get more insight into the methods and the theory behind them we also offer you a [general information page].

Back to [Tay-Sachs Disease]


Prediction of transmembrane helices and signal peptides

The third big analysis section is the prediction of transmembrane helices and signal peptides. We merged both predictions into one section, because several prediction methods can predict both.

We used several methods: some predict only transmembrane helices, some predict only signal peptides, and some predict both.

To have a closer look at the different methods we again provide an [information page.]

Back to [Tay-Sachs Disease]


Prediction of GO Terms

The last section is about the analysis of GO Terms. As before, we used several methods and compared them to each other.

Again, we also provide a [general information page] about the GO Term methods we used in our analysis.

Back to [Tay-Sachs Disease]

Secondary Structure prediction

Results

The detailed output of the different prediction methods can be found [here]

Here we only present a short summary of the output of the different methods.

  • Predicted Helices
method #helices
PSIPRED 14
Jpred3 14
DSSP 16
  • Predicted Beta-Sheets
method #sheets
PSIPRED 15
Jpred3 15
DSSP 0

Comparison of the different methods

To determine how successful our secondary structure predictions with PSIPRED and Jpred3 were, we compared them with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets, whereas both prediction methods predict some beta-sheets. Therefore, the main comparison in this case refers to the alpha-helices.

For PSIPRED, the prediction of the alpha-helices was good. In most cases the alpha-helices of DSSP and PSIPRED correspond. Only one helix predicted by PSIPRED is not assigned as a helix by DSSP. Furthermore, three helices assigned by DSSP were not predicted by PSIPRED. Most of the helices that appear in only one of the two outputs are very small.

For Jpred3, the prediction of the alpha-helices was sufficiently good. In most cases it agrees with DSSP. Only two helices predicted by Jpred3 are not assigned by DSSP. Conversely, three small helices assigned by DSSP are not predicted by Jpred3. In one special case, DSSP assigns two helices separated by a turn where Jpred3 predicts a single long helix.

All in all, the prediction of the helices is good, because the predicted helices mostly correspond to the DSSP assignment. The only negative aspect is that both prediction methods predict a lot of sheets which were not assigned by DSSP at all.
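The helix comparison described above can also be done residue by residue. The following is a minimal sketch, assuming both outputs have already been reduced to equal-length strings over the three states H (helix), E (strand) and C (coil). The function name and the toy strings are hypothetical and are not the real HEXA output.

```python
def helix_agreement(predicted: str, reference: str) -> dict:
    """Count helix residues shared by, or unique to, two secondary structure strings."""
    if len(predicted) != len(reference):
        raise ValueError("both strings must cover the same residues")
    pairs = list(zip(predicted, reference))
    return {
        "both": sum(p == "H" and r == "H" for p, r in pairs),
        "only_predicted": sum(p == "H" and r != "H" for p, r in pairs),
        "only_reference": sum(p != "H" and r == "H" for p, r in pairs),
    }


# Toy example, not real data
predicted = "CCHHHHHHCCEEEECC"
reference = "CCCHHHHHHCCCCCCC"

result = helix_agreement(predicted, reference)
print(result)
```

Counting residues instead of whole helices avoids having to decide when two slightly shifted helices count as the same helix.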

Back to [Tay-Sachs Disease]

Prediction of disordered regions

Before we started with the analysis of the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the [DisProt database] and did not find it, so we assume that our protein does not have any disordered regions. Another way to find out whether a protein has disordered regions is to check whether its [UniProt] entry contains a cross-reference to [DisProt].

Results

The detailed results of the different methods can be found [here]

In this section, we only want to give a summary of the output of the different methods.

method #disordered regions in the protein #disordered regions on the brink
Disopred 0 2
POODLE-I 3 2
POODLE-L 0 0
POODLE-S (B-factors) 3 2
POODLE-S (missing residues) 4 2
IUPred (short) 0 2
IUPred (long) 0 0
IUPred (structural information) 0 0
Meta-Disorder 0 0

Comparison of the different POODLE variants

POODLE-L does not find any disordered regions. This is the result we expected, because our protein does not possess any disordered regions.

Both POODLE-S variants found several short disordered regions, which is a false positive result. Interestingly, more residues are predicted as missing from the electron density map than as having a high B-factor value.

POODLE-I found the same result as POODLE-S with high B-factor, which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor).

Therefore, the predictions of short disordered regions are wrong results. Only the prediction of POODLE-L is correct.

In general, these predictions are used when nothing is known about the protein, so normally we would not know that a prediction is wrong. We therefore treat the results as if we trusted them and check whether the predicted disordered regions overlap with the functionally important residues, because disordered regions often seem to be functionally important. We check this for POODLE-S (missing residues) and POODLE-I, because POODLE-S (high B-factor) shows the same result as POODLE-I.

functional residues disordered
residue position amino acid function POODLE-S (missing) POODLE-I
323 E active site ordered ordered
115 N Glycosylation ordered ordered
157 N Glycosylation ordered ordered
259 N Glycosylation ordered ordered
58 (connected with 104) C Disulfide bond disordered ordered
104 (connected with 58) C Disulfide bond disordered ordered
277 (connected with 328) C Disulfide bond ordered ordered
328 (connected with 277) C Disulfide bond ordered ordered
505 (connected with 522) C Disulfide bond ordered ordered
522 (connected with 505) C Disulfide bond ordered ordered

As the table above shows, only one disulfide bond is located in a predicted disordered region; all other functionally important residues are located in ordered regions. This is a further hint that the predictions are wrong.
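The overlap check in the table above can be sketched in a few lines. The residue positions are taken from the table. The disordered regions are hypothetical placeholders chosen only for illustration; they are not the actual POODLE output.

```python
# Functionally important residues of HEXA, as listed in the table above
FUNCTIONAL_RESIDUES = {
    323: "active site",
    115: "glycosylation",
    157: "glycosylation",
    259: "glycosylation",
    58: "disulfide bond",
    104: "disulfide bond",
    277: "disulfide bond",
    328: "disulfide bond",
    505: "disulfide bond",
    522: "disulfide bond",
}


def residues_in_regions(positions, regions):
    """Return the sorted positions that fall into any inclusive (start, end) region."""
    return sorted(
        pos for pos in positions
        if any(start <= pos <= end for start, end in regions)
    )


# Hypothetical predicted disordered regions (assumption, not real output)
example_regions = [(50, 60), (100, 110)]

hits = residues_in_regions(FUNCTIONAL_RESIDUES, example_regions)
print(hits)
```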

Comparison of the different methods

To compare the different methods, we counted how many residues each method predicts as disordered; in our case, every such prediction is wrong.

methods
Disopred POODLE-I POODLE-L POODLE-S (missing) POODLE-S (B-factor) IUPred (short) IUPred (long) IUPred (structure) Meta-Disorder
#wrong predicted residues 5 23 0 47 24 3 0 0 0



POODLE-L, IUPred (long), IUPred (structure) and Meta-Disorder predict no disordered residues, which is correct. The worst result came from POODLE-S (missing), which predicts 47 residues as disordered, followed by POODLE-S (B-factor) (24 wrongly predicted residues) and POODLE-I (23 wrongly predicted residues).
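Counting the residues predicted as disordered is simple once the predicted regions are available as start and end positions. The sketch below assumes inclusive residue ranges. The method names and regions are made-up placeholders, not the real predictions.

```python
def count_disordered_residues(regions):
    """Number of distinct residues covered by inclusive (start, end) regions."""
    covered = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return len(covered)


# Hypothetical predictions for two unnamed methods
example_predictions = {
    "method_a": [(1, 3), (527, 529)],
    "method_b": [],
}

counts = {
    name: count_disordered_residues(regions)
    for name, regions in example_predictions.items()
}
print(counts)
```

Using a set means overlapping regions are not counted twice.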

Back to [Tay-Sachs Disease]

Prediction of transmembrane alpha-helices and signal peptides

Because most of the proteins we used in this practical are not membrane proteins, we got five additional proteins for the transmembrane and signal peptide analyses.

Additional proteins:

name organism location transmembrane protein sequence
BACR_HALSA Halobacterium salinarium (Archaea) Cell membrane Multi-pass membrane protein [P02945.fasta]
RET4_HUMAN Human (Homo sapiens) extracellular space No [P02753.fasta]
INSL5_HUMAN Human (Homo sapiens) extracellular region No [Q9Y5Q6.fasta]
LAMP1_HUMAN Human (Homo sapiens) Cell membrane Single-pass membrane protein [P11279.fasta]
A4_HUMAN Human (Homo sapiens) Cell membrane Single-pass membrane protein [P05067.fasta]

The detailed output for the different organisms and the different prediction methods can be found here:

Results

Transmembrane Helices

TMHMM Phobius PolyPhobius OCTOPUS SPOCTOPUS
protein start position end position location start position end position location start position end position location start position end position location start position end position location
HEXA HUMAN 1 529 outside 23 529 outside 20 520 outside 1 2 inside 22 529 outside
3 23 TM helix
24 529 outside
BACR HALSA 1 22 outside 1 22 outside 1 22 outside
23 42 TM Helix 23 42 TM helix 22 43 TM helix 23 43 TM helix 23 43 TM helix
43 54 inside 43 53 inside 44 54 inside 44 54 inside 44 54 inside
55 77 TM Helix 54 76 TM helix 55 77 TM helix 55 75 TM helix 55 75 TM helix
78 91 outside 77 95 outside 78 94 outside 76 95 outside 76 95 outside
92 114 TM Helix 96 114 TM helix 95 114 TM helix 96 116 TM helix 96 116 TM helix
115 120 inside 115 120 inside 115 120 inside 117 121 inside 117 120 inside
121 143 TM Helix 121 142 TM helix 121 141 TM helix 122 142 TM helix 121 141 TM helix
144 147 outside 143 147 outside 142 147 outside 143 147 outside 142 147 outside
148 170 TM Helix 148 169 TM helix 148 166 TM helix 148 168 TM helix 148 168 TM helix
171 189 inside 170 189 inside 167 186 inside 169 185 inside 169 185 inside
190 212 TM Helix 190 212 TM helix 187 205 TM helix 186 206 TM helix 186 206 TM helix
213 262 outside 213 217 outside 206 215 outside 207 216 outside 207 216 outside
218 237 TM helix 216 237 TM helix 217 237 TM helix 217 237 TM helix
238 262 inside 238 262 inside 238 262 inside 238 262 inside
RET4 HUMAN 1 1 inside
2 23 TM helix
1 201 outside 19 201 outside 19 201 outside 24 201 outside 20 201 outside
INSL5 HUMAN 1 1 inside
2 32 TM helix
1 135 outside 23 135 outside 23 135 outside 33 135 outside 24 135 outside
LAMP1 HUMAN 1 10 inside 1 10 inside
11 33 TM Helix 11 31 TM helix
34 383 outside 29 381 outside 29 381 outside 32 383 outside 30 383 outside
384 406 TM Helix 382 405 TM helix 382 405 TM helix 384 404 TM helix 384 404 TM helix
407 417 inside 406 417 outside 406 417 outside 405 417 outside 405 417 outside
A4 HUMAN 1 5 outside
6 11 R
1 700 outside 18 700 outside 18 700 outside 12 701 outside 19 701 outside
701 723 TM Helix 701 723 TM helix 701 723 TM helix 702 722 TM helix 702 722 TM helix
724 770 inside 724 770 inside 724 770 inside 723 770 inside 723 770 inside



The table above summarizes the results of the different methods which predict transmembrane helices. As you can see, OCTOPUS often predicts a transmembrane helix where none of the other methods predicts one. Phobius, PolyPhobius and SPOCTOPUS always show very similar results, whereas TMHMM and OCTOPUS differ from these results.

Signal Peptide

Phobius PolyPhobius SPOCTOPUS TargetP SignalP
protein start position end position start position end position start position end position location start position end position
HEXA HUMAN 1 22 1 19 7 21 secretory pathway 1 22
BACR HALSA no prediction available secretory pathway 1 38
RET4 HUMAN 1 18 1 18 6 19 secretory pathway 1 18
INSL5 HUMAN 1 22 1 22 6 23 secretory pathway 1 22
LAMP1 HUMAN 1 28 1 28 12 29 secretory pathway 1 28
A4 HUMAN 1 17 1 17 5 18 secretory pathway 1 15


The last table lists the signal peptide predictions of the different methods. At first glance, the methods almost always predict a signal peptide, although the predicted stop position differs. Phobius, PolyPhobius and SPOCTOPUS gave no signal peptide prediction for BACR_HALSA. Furthermore, TargetP does not predict the position of the signal peptide; it only predicts the location of the protein.

Comparison of the different methods



We decided to split the comparison of the methods, because it is unfair to directly compare a method which cannot predict signal peptides with a method which can. Therefore, we split the comparison into one comparison for transmembrane helices, one for signal peptides and one for the combination of both.

  • Comparison of transmembrane helix prediction



Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues, which are signal peptides, because all transmembrane-only prediction methods predicted these regions as transmembrane helices, which is wrong.
For this comparison we counted the wrongly predicted transmembrane residues, the wrongly predicted outside residues and the wrongly predicted inside residues.

methods
TMHMM Phobius PolyPhobius OCTOPUS SPOCTOPUS Transmembrane protein
HEXA_HUMAN #wrong transmembrane 0 0 0 0 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 0 0
%wrong predicted 0% 0% 0% 0% 0%
BACR_HALSA #wrong transmembrane 24 20 12 16 11 yes (7 transmembrane helices)
#wrong outside 46 5 3 4 6
#wrong inside 4 4 2 0 0
#wrong sum 74 29 17 20 17
%wrong predicted 29% 11% 6% 8% 6%
RET4_HUMAN #wrong transmembrane 0 0 0 5 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 5 0
%wrong predicted 0% 0% 0% 2% 0%
INSL5_HUMAN #wrong transmembrane 0 0 0 10 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 10 0
%wrong predicted 0% 0% 0% 8% 0%
LAMP1_HUMAN #wrong transmembrane 5 3 4 3 1 yes (single-spanning)
#wrong outside 2 0 0 1 1
#wrong inside 0 0 0 1 1
#wrong sum 7 3 4 5 3
%wrong predicted 2% 0% 1% 1% 0%
A4_HUMAN #wrong transmembrane 0 0 0 0 0 yes (single-spanning)
#wrong outside 1 1 1 1 2
#wrong inside 0 0 0 1 1
#wrong sum 1 1 1 2 3
%wrong predicted 0% 0% 0% 0% 0%
Average number of wrong predicted residues
13.6 5.5 3.6 7 3.8

TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, where TMHMM is the only prediction method which does not recognize the 7 transmembrane helices. SPOCTOPUS and PolyPhobius are the best prediction methods.

In general, the prediction of transmembrane helices works quite well and almost all predictions are very close to the real protein.
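The counting behind the table above can be sketched as a per-residue comparison of two topology strings. This is a minimal sketch under stated assumptions: the labels `M` (membrane), `o` (outside) and `i` (inside) and the toy strings are hypothetical, and errors are grouped here by the predicted label, which may differ from how the counts in the table were derived.

```python
from collections import Counter


def count_wrong_residues(predicted: str, reference: str) -> Counter:
    """Count mismatching residues, grouped by the label that was predicted."""
    if len(predicted) != len(reference):
        raise ValueError("both strings must cover the same residues")
    wrong = Counter({"M": 0, "o": 0, "i": 0})
    for pred_label, ref_label in zip(predicted, reference):
        if pred_label != ref_label:
            wrong[pred_label] += 1
    return wrong


# Toy topology strings, not real data
reference = "ooooMMMMMMiiii"
predicted = "oooMMMMMMiiiii"

wrong = count_wrong_residues(predicted, reference)
total_wrong = sum(wrong.values())
percent_wrong = 100 * total_wrong / len(reference)
print(dict(wrong), total_wrong, round(percent_wrong, 1))
```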

  • Comparison of signal peptide prediction



Next, we compared TargetP and SignalP, which only predict signal peptides, together with the signal peptide predictions of SPOCTOPUS, Phobius and PolyPhobius. TargetP does not predict the start and end position of the signal peptide; it only predicts the location of the protein.

methods
real position Phobius PolyPhobius SPOCTOPUS TargetP SignalP
HEXA_HUMAN stop position 22 22 19 21 no prediction 22
#wrong residues 0 3 3 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
BACR_HALSA stop position not available no prediction no prediction no prediction no prediction no consensus prediction
#wrong predicted not available not available not available not available no prediction not available
location membrane not available not available not available secretory pathway non-signal peptide
RET4_HUMAN stop position 18 18 18 19 no prediction 18
#wrong predicted 0 0 1 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
INSL5_HUMAN stop position 22 22 22 22 no prediction 22
#wrong residues 0 0 0 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
LAMP1_HUMAN stop position 28 28 28 29 no prediction 28
#wrong residues 0 0 1 no prediction 0
location transmembrane helix secretory pathway secretory pathway no prediction secretory pathway no prediction
A4_HUMAN stop position 17 17 17 18 no prediction 17
#wrong residues 0 0 1 no prediction 0
location transmembrane helix secretory pathway secretory pathway no prediction secretory pathway secretory pathway
Average number of wrong prediction
sum of wrong predicted residues 0 3 2 no prediction 0
#right predicted locations / #predicted locations 3/5 3/5 no prediction 3/5 no prediction

SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. Furthermore, SignalP predicts whether the sequence contains a signal peptide or not. In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.
Therefore, it is difficult to compare the different methods. First of all, Phobius and PolyPhobius are more powerful than the other prediction methods, because they predict both. On average, they predict the location and the position as well as the other prediction methods. None of the methods could predict the transmembrane proteins; all methods predict them as proteins of the secretory pathway. Therefore, it is useful to use Phobius or PolyPhobius, because they predict more than the other methods. Furthermore, both methods can also predict transmembrane helices. The results of Phobius were a little bit better than the results of PolyPhobius.
We also want to mention that SignalP lets you choose between prediction modes for eukaryotes, gram-positive bacteria and gram-negative bacteria. Our analysis also included BACR_HALSA, which is an archaeal protein. We tested all three prediction modes for this protein and all three failed: BACR_HALSA does not possess a signal peptide, but every mode predicts one. Only the eukaryotic mode recognized a signal anchor for BACR_HALSA, whereas the other two could not give a prediction of the location.
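The stop position comparison for Phobius and PolyPhobius can be reproduced with a short sketch. The stop positions are copied from the comparison table above. Treating the error as the absolute difference between predicted and real stop position is an assumption of this example, and only the two methods for which this reading matches the table are included.

```python
# Real signal peptide stop positions, from the "real position" column above
REAL_STOP = {
    "HEXA_HUMAN": 22,
    "RET4_HUMAN": 18,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
    "A4_HUMAN": 17,
}

# Predicted stop positions, from the same table
PREDICTED_STOP = {
    "Phobius": {"HEXA_HUMAN": 22, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
    "PolyPhobius": {"HEXA_HUMAN": 19, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                    "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
}


def total_stop_error(predicted, real):
    """Sum of absolute differences between predicted and real stop positions."""
    return sum(abs(predicted[protein] - real[protein]) for protein in real)


totals = {
    method: total_stop_error(stops, REAL_STOP)
    for method, stops in PREDICTED_STOP.items()
}
print(totals)
```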



  • Comparison of the combined methods



Finally, we compared the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore, we combined our two previous comparisons.

methods
Phobius PolyPhobius SPOCTOPUS
HEXA_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 3 2
location right right no prediction
BACR_HALSA #wrong predicted residues (TM) 29 17 17
#wrong predicted residues (SP) n.a. n.a. n.a.
location n.a n.a no prediction
RET4_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 0 0
location right right no prediction
INSL5_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 0 1
location right right no prediction
LAMP1_HUMAN #wrong predicted residues (TM) 3 4 3
#wrong predicted residues (SP) 0 0 0
location wrong wrong no prediction
A4_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 1 1 3
location wrong wrong no prediction
Average
avg(#wrong predicted residues (TM)) 5.3 3.5 3.3
avg(#wrong predicted residues (SP)) 0.1 0.6 1
#location (right predicted) / #location(predicted) 3/5 3/5 no prediction

In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is significantly better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.
Therefore, it seems a good choice to use PolyPhobius, which is on average the best method for transmembrane and signal peptide prediction.
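The averages in the combined table can be recomputed from the per-protein counts. The sketch below uses the transmembrane (TM) counts from the combined table above, in the order HEXA, BACR, RET4, INSL5, LAMP1, A4. Averaging over all six proteins is an assumption, chosen because it is consistent with the averages shown in the table.

```python
# Wrongly predicted TM residues per protein, from the combined table above
TM_ERRORS = {
    "Phobius": [0, 29, 0, 0, 3, 0],
    "PolyPhobius": [0, 17, 0, 0, 4, 0],
    "SPOCTOPUS": [0, 17, 0, 0, 3, 0],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


averages = {method: round(mean(errors), 2) for method, errors in TM_ERRORS.items()}
print(averages)
```

Note that BACR_HALSA dominates these averages, so a single difficult protein largely determines the ranking.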



Back to [Tay-Sachs Disease]

Signal Peptide

 Phobius  PolyPhobius  SPOCTOPUS TargetP  SignalP
protein start position end position start position end position start position end position location start position end position
HEXA HUMAN 1 22 1 19 7 21 secretory pathway 1 22
BACR HALSA no prediction available secretory pathway 1 38
RET4 HUMAN 1 18 1 18 6 19 secretory pathway 1 18
INSL5 HUMAN 1 22 1 22 6 23 secretory pathway 1 22
LAMP1 HUMAN 1 28 1 28 12 29 secretory pathway 1 28
A4 HUMAN 1 17 1 17 5 18 secretory pathway 1 15


In the last table there is a list with the results of the prediction of the signal peptides created by different methods.

Comparison of the different methods



We decided to split the comparison of the methods, because it is unfair to directly compare a method which can not predict a signal peptide and a method which predicts signal peptides. Therefore, we split the comparison in one comparison for transmembrane helices, one for signal peptides and one for the combination of both.

  • Comparison of transmembrane helix prediction



Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues which are signal peptides, because all only-transmembrane prediction methods predicted these region as transmembrane helices, which is wrong.
For this comparison we counted the wrong predicted transmembrane residues, the wrong predicted outside located residues and the wrong predicted inside residues.

methods
TMHMM Phobius PolyPhobius OCTOPUS SPOCTOPUS Transmembrane protein
HEXA_HUMAN #wrong transmembrane 0 0 0 0 0 no
#wrong outside 0 0 0 0 0
#wrong insde 0 0 0 0 0
#wrong sum 0 0 0 0 0
%wrong predicted 0% 0% 0% 0% 0%
BACR_HALSA #wrong transmembrane 24 20 12 16 11 yes (7 transmembrane helices)
#wrong outside 46 5 3 4 6
#wrong inside 4 4 2 0 0
#wrong sum 74 29 17 20 17
%wrong predicted 29% 11% 6% 8% 6%
RET4_HUMAN #wrong transmembrane 0 0 0 5 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 5 0
%wrong predicted 0% 0% 0% 2% 0%
INSL5_HUMAN #wrong transmembrane 0 0 0 10 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 10 0
%wrong predicted 0% 0% 0% 8% 0%
LAMP1_HUMAN #wrong transmembrane 5 3 4 3 1 yes (single-spanning)
#wrong outside 2 0 0 1 1
#wrong inside 0 0 0 1 1
#wrong sum 7 3 4 5 3
%wrong predicted 2% 0% 1% 1% 0%
A4_HUMAN #wrong transmembrane 0 0 0 0 0 yes (single-spanning)
#wrong outside 1 1 1 1 2
#wrong inside 0 0 0 1 1
#wrong sum 1 1 1 2 3
%wrong predicted 0% 0% 0% 0% 0%
Average number of wrong predicted residues
13.6 5.5 3.6 7 3.8

TMHMM is the baddest prediction method. This can also be seen at the example of BACR_HALSA, because TMHMM is the only prediction method, which do not recognize the 7 transmembrane helices. SPOCTOPUS and PolyPhobius are the best prediction methods.

In general the prediction of transmembrane helices works quite good and almost all predictions are very close to the real protein.

  • Comparison of signal peptide prediction



Now we compared TargetP and SignalP which can only predict signal peptides. Furthermore we compared SPOCTOPUS, Phobius and PolyPhobius. TargetP does not predict the start and end position of the signal peptide, instead it predicts only the location of the protein.

methods
real position Phobius PolyPhobius SPOCTOPUS TargetP SignalP
HEXA_HUMAN stop position 22 22 19 21 no prediction 22
#wrong residues 0 3 3 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
BACR_HALSA stop position not available no prediction no prediction no prediction no prediction no consensus prediction
#wrong predicted not available not available not available not available no prediction not available
location membrane not available not available not available secretory pathway non-signal peptide
RET4_HUMAN stop position 18 18 18 19 no prediction 18
#wrong predicted 0 0 1 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
INSL5_HUMAN stop position 22 22 22 22 no prediction 22
#wrong residues 0 0 0 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
LAMP1_HUMAN stop position 28 28 28 29 no prediction 28
#wrong residues 0 0 1 no prediction 0
location transmembrane helix secretory pathway secretory pathway no prediction secretory pathway no prediction
A4_HUMAN stop position 17 17 17 18 no prediction 17
#wrong residues 0 0 1 no prediction 0
location transmembrane helix secretory pathway secretory pathway no prediction secretory pathway secretory pathway
Average number of wrong predictions
sum of wrong predicted residues 0 3 2 no prediction 0
#right predicted locations / #predicted locations 3/5 3/5 no prediction 3/5 no prediction

SPOCTOPUS and SignalP do not predict the location of the protein; they predict only the start and stop position of the signal peptide. SignalP additionally predicts whether the sequence contains a signal peptide at all. In contrast, TargetP predicts only the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.
This makes it difficult to compare the methods directly. Phobius and PolyPhobius are more powerful than the other methods because they predict both position and location, and on average they predict each of them as well as the other methods do. None of the methods recognized the transmembrane proteins; all of them assigned these proteins to the secretory pathway. Phobius or PolyPhobius are therefore a useful choice: they predict more than the other methods and can also predict transmembrane helices. The results of Phobius were slightly better than those of PolyPhobius.
We also want to mention that SignalP lets you choose between prediction modes for eukaryotes, gram-positive bacteria and gram-negative bacteria. Our analysis also included BACR_HALSA, which is an archaeal protein. We tested all three SignalP modes on this protein and all three failed: BACR_HALSA does not possess a signal peptide, but every mode predicts one. Only the eukaryotic mode recognized a signal anchor for BACR_HALSA, whereas the other two modes could not give a prediction of the location.
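The "#wrong residues" rows for the stop position can be read as the distance between the predicted and the real end of the signal peptide. The following Python sketch shows this for the RET4_HUMAN row of the table above. The function name `wrong_residues` and the use of `None` for "no prediction" are assumptions made for this example, and the absolute difference is our reading of the measure, not a documented definition.

```python
def wrong_residues(real_stop, predicted_stop):
    """Distance (in residues) between predicted and real signal peptide end.

    Returns None when either value is missing, e.g. for methods such as
    TargetP that do not report a stop position.
    """
    if real_stop is None or predicted_stop is None:
        return None
    return abs(predicted_stop - real_stop)


# Stop positions for RET4_HUMAN, copied from the table above.
real_stop = 18
predicted = {
    "Phobius": 18,
    "PolyPhobius": 18,
    "SPOCTOPUS": 19,
    "TargetP": None,  # no stop position predicted
    "SignalP": 18,
}

errors = {method: wrong_residues(real_stop, stop) for method, stop in predicted.items()}
for method, error in errors.items():
    print(method, "no prediction" if error is None else error)
```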



  • Comparison of the combined methods



The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore we combined our two previous comparisons.

methods
Phobius PolyPhobius SPOCTOPUS
 HEXA_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 3 2
location right right no prediction
 BACR_HALSA #wrong predicted residues (TM) 29 17 17
#wrong predicted residues (SP) n.a. n.a. n.a.
location n.a. n.a. no prediction
RET4_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 0 0
location right right no prediction
INSL5_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 0 1
location right right no prediction
LAMP1_HUMAN #wrong predicted residues (TM) 3 4 3
#wrong predicted residues (SP) 0 0 0
location wrong wrong no prediction
A4_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 1 1 3
location wrong wrong no prediction
 Average
avg(#wrong predicted residues (TM)) 5.3 3.5 3.3
avg(#wrong predicted residues (SP)) 0.1 0.6 1
#location (right predicted) / #location(predicted) 3/5 3/5 no prediction

In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is considerably better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.
Therefore, PolyPhobius seems to be a good choice, as it is on average the best method for combined transmembrane and signal peptide prediction.
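The transmembrane averages in the table above are the arithmetic mean of the six per-protein values, which can be checked with a short Python sketch. The dictionary name `tm_wrong` is chosen for this example only; the numbers are copied from the "#wrong predicted residues (TM)" rows. The table values appear to be cut off after one decimal place.

```python
from statistics import mean

# "#wrong predicted residues (TM)" per protein, in the order
# HEXA, BACR, RET4, INSL5, LAMP1, A4 (copied from the table above).
tm_wrong = {
    "Phobius": [0, 29, 0, 0, 3, 0],
    "PolyPhobius": [0, 17, 0, 0, 4, 0],
    "SPOCTOPUS": [0, 17, 0, 0, 3, 0],
}

averages = {method: mean(values) for method, values in tm_wrong.items()}
for method, average in averages.items():
    print(f"{method}: {average:.2f}")

# Method with the lowest average number of wrong TM residues.
best_tm = min(averages, key=averages.get)
print("lowest TM error:", best_tm)
```

On the transmembrane numbers alone SPOCTOPUS is marginally ahead of PolyPhobius; the overall recommendation for PolyPhobius also takes into account that it predicts the location.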

Prediction of GO terms

Before starting our analysis, we checked the GO annotations for the six sequences, which can be found [here].

A detailed list of the GO annotation terms of each protein can be found [here].

Results

We created a separate result page for each protein. Unfortunately, the results cannot be summarized briefly, so please have a look at the individual result pages for the detailed output.



Comparison of the different methods



It is difficult to compare these methods. First, two of the methods (GOPET and Pfam) are homology-based, whereas ProtFun is an ab initio method, so it is to be expected that the results differ. Second, each method has a different prediction focus and names its results slightly differently. Only GOPET predicts exact GO IDs; the other two methods predict only approximate functions and processes.
Therefore, to compare the results, we calculated the fraction of right predictions among all predictions and the ratio between right predictions and annotated GO terms.

methods
GOPET terms GOPET GOids Pfam ProtFun
HEXA_HUMAN #true positive 7 7 2 31
#false positive 1 1 0 3
#predictions 8 8 2 34
#GO terms 25
fraction of true positives 0.87 0.87 1 0.91
ratio true positive/annotated GO terms 0.28 0.28 0.08 not possible
BACR_HALSA #true positive 2 1 1 30
#false positive 1 2 0 4
#predictions 3 3 1 34
#GO terms 12
fraction of true positives 0.66 0.33 1 0.88
ratio true positive/annotated GO terms 0.16 0.08 0.08 not possible
RET4_HUMAN #true positive 5 5 1 30
#false positive 3 3 0 4
#predictions 8 8 1 34
#GO terms 41
fraction of true positives 0.62 0.62 1 0.88
ratio true positive/annotated GO terms 0.12 0.12 0.02 not possible
INSL5_HUMAN #true positive 1 1 1 32
#false positive 0 0 0 2
#predictions 1 1 1 34
#GO terms 4
fraction of true positives 1 1 1 0.94
ratio true positive/annotated GO terms 0.25 0.25 0.25 not possible
LAMP1_HUMAN #true positive 0 0 1 33
#false positive 2 2 0 1
#predictions 2 2 1 34
#GO terms 17
fraction of true positives 0 0 1 0.97
ratio true positive/annotated GO terms 0 0 0.05 not possible
A4_HUMAN #true positive 7 7 6 33
#false positive 6 6 0 1
#predictions 13 13 6 34
#GO terms 78
fraction of true positives 0.53 0.53 1 0.97
ratio true positive/annotated GO terms 0.08 0.08 0.07 not possible

As you can see in the table above, each method predicts only a small subset of the annotated GO terms. In general, GOPET seems to be the best method, because it is the only method that predicts GO terms directly and, overall, it mostly has the best ratio of true positives to annotated GO terms. Furthermore, it also predicts more GO terms than the other methods.
It was not possible to calculate the ratio between true positives and annotated GO terms for ProtFun, because this method uses its own predefined terms and predicts only the probability that the protein belongs to each of them.
In general, GO term prediction does not work very well, and the prediction results give only hints about the function and localization of the protein.
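The two measures used in the table above can be written down as a short Python sketch. The function name `go_metrics` and its parameter names are chosen for this example only; the input numbers are the HEXA_HUMAN counts from the table.

```python
def go_metrics(true_positives, predictions, annotated_terms):
    """Return (fraction of right predictions, ratio of right predictions
    to annotated GO terms) for one method and one protein."""
    if predictions <= 0 or annotated_terms <= 0:
        raise ValueError("predictions and annotated_terms must be positive")
    fraction_right = true_positives / predictions
    ratio_annotated = true_positives / annotated_terms
    return fraction_right, ratio_annotated


# HEXA_HUMAN counts copied from the table above (25 annotated GO terms).
gopet = go_metrics(true_positives=7, predictions=8, annotated_terms=25)
pfam = go_metrics(true_positives=2, predictions=2, annotated_terms=25)

print("GOPET:", gopet)
print("Pfam:", pfam)
```

The example shows why the two measures have to be read together: Pfam reaches a fraction of 1 because both of its predictions are right, but it covers a smaller part of the annotated GO terms than GOPET.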

Back to [Tay-Sachs Disease]