Sequence-based predictions HEXA

From Bioinformatikpedia
Revision as of 14:01, 11 August 2011 by Link (talk | contribs) (Secondary Structure prediction)

General Information

Secondary Structure Prediction

To analyse the secondary structure of our protein we used three methods: PSIPRED, Jpred3 and DSSP. PSIPRED and Jpred3 predict the secondary structure from the sequence, whereas DSSP assigns it from the known 3D structure. In the analysis section of this page we compare the three methods to see whether they give similar results or differ strongly.

[Here] you can find some general information about these methods.


Prediction of disordered regions

After analysing the secondary structure, we also wanted to look at disordered regions in this protein. For this we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we compare the different methods and variants to see whether their predictions are similar, and we decide which method seems to be the best one for our purpose.

To give more insight into the methods and the theory behind them we also offer a [general information page].


Prediction of transmembrane helices and signal peptides

The third big analysis section is the prediction of transmembrane helices and signal peptides. We merged both predictions into one section, because several prediction methods can predict both.

We used several methods: some predict only transmembrane helices, some predict only signal peptides, and some are combined methods.

To have a closer look at the different methods we again provide an [information page.]


Prediction of GO Terms

The last section is about the analysis of GO Terms. As before, we used several methods and compared them to each other.

Again we also provide a [general information page] about the GO Term methods we used in our analysis.

Secondary Structure prediction

PSIPRED predicted 14 alpha-helices and 15 beta-strands. The rest of the protein was predicted as coil. The detailed output can be seen [here].

Prediction of disordered regions

Before we started with the analysis of the results of the different methods, we checked whether our protein has one or more known disordered regions. We searched for our protein in the DisProt database and did not find it, so we assume that our protein does not have any disordered regions. Another way to find out whether a protein has annotated disordered regions is to check whether its UniProt entry contains a cross-reference to DisProt.


Disopred

Disopred predicts two disordered regions in our protein. The first region is at the beginning of the protein (first two residues) and the second region is at the end (last three residues). We consider this prediction wrong, because it is normal that the first and the last amino acids are not resolved in the electron density map. So, our protein Hexosaminidase A has no disordered regions.

Result of the Disopred prediction. * marks an amino acid that belongs to a disordered region, whereas . marks a non-disordered residue.
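A per-residue mask of this kind can be turned into a list of regions with a few lines of Python. This is only a minimal sketch: the function name `disordered_regions` and the toy mask are our own illustration and not the real Disopred output for HEXA_HUMAN.

```python
from itertools import groupby


def disordered_regions(mask, disordered="*"):
    """Return 1-based (start, end, length) tuples for each run of `disordered`."""
    regions = []
    position = 1
    for symbol, run in groupby(mask):
        length = len(list(run))
        if symbol == disordered:
            regions.append((position, position + length - 1, length))
        position += length
    return regions


# Toy mask (hypothetical): two disordered residues at the start, three at the end.
toy_mask = "**" + "." * 10 + "***"
for start, end, length in disordered_regions(toy_mask):
    print(start, end, length)
```

The same helper works for any tool that reports one symbol per residue, as long as the disorder symbol is passed in.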


POODLE

We decided to test several POODLE variants and to compare the results.

  • POODLE-I

POODLE-I predicted five disordered regions:

start position end position length
1 2 2
14 19 6
83 89 7
105 109 5
527 529 3


  • POODLE-L

POODLE-L found no disordered regions. Therefore, it predicts no disordered region longer than 40 aa in our protein.


  • POODLE-S (High B-factor residues)

This POODLE-S variant predicts residues with high B-factor values in the crystal structure, which imply uncertainty in the assigned atom positions.

POODLE-S predicted five disordered regions:

start position end position length
0 2 2
13 19 7
83 88 6
105 109 5
526 529 4


  • POODLE-S (missing residues)

POODLE-S (missing residues) predicts a disordered region where an amino acid is present in the sequence record but missing from the electron density map.

POODLE-S (missing residues) found six disordered regions:

start position end position length
17 18 2
53 61 9
78 109 33
153 153 1
280 280 1
345 345 1


Graphical Output:

Prediction of POODLE-S (High B-factor residues)
Prediction of POODLE-S (missing residues)
Prediction of POODLE-I
Prediction of POODLE-L


  • Comparison of the different POODLE variants:

POODLE-L does not find any disordered regions. This is the result we expected, because our protein does not possess any disordered regions.

Both POODLE-S variants found several short disordered regions, which are false positives. Interestingly, more residues are predicted to be missing from the electron density map than to have a high B-factor value.

POODLE-I gave nearly the same result as POODLE-S (high B-factor); the region boundaries differ by at most one residue. This was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor).

Therefore, the predicted short disordered regions are wrong. Only the prediction of POODLE-L is correct.

In general, these predictions are used when nothing is known about the protein, so normally we would not know that a prediction is wrong. For that reason we now take the results at face value and check whether the predicted disordered regions overlap with the functionally important residues, since disordered regions are often functionally important. We check this for POODLE-S (missing residues) and POODLE-I only, because POODLE-S (high B-factor) gives almost the same result as POODLE-I.

functional residues disordered
residue position amino acid function POODLE-S (missing) POODLE-I
323 E active site ordered ordered
115 N Glycosylation ordered ordered
157 N Glycosylation ordered ordered
259 N Glycosylation ordered ordered
58 (connected with 104) C Disulfide bond disordered ordered
104 (connected with 58) C Disulfide bond disordered ordered
277 (connected with 328) C Disulfide bond ordered ordered
328 (connected with 277) C Disulfide bond ordered ordered
505 (connected with 522) C Disulfide bond ordered ordered
522 (connected with 505) C Disulfide bond ordered ordered

As the table above shows, only one disulfide bond (58 to 104) lies in a predicted disordered region, and only according to POODLE-S (missing residues). All other functionally important residues lie in ordered regions. This is a further hint that the predictions are wrong.
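The ordered/disordered columns of the table can be reproduced by testing each functional residue against the predicted regions. The sketch below uses the region and residue positions listed on this page; the names `state`, `POODLE_S_MISSING`, `POODLE_I` and `FUNCTIONAL` are our own.

```python
# Predicted disordered regions as (start, end), taken from the tables above.
POODLE_S_MISSING = [(17, 18), (53, 61), (78, 109), (153, 153), (280, 280), (345, 345)]
POODLE_I = [(1, 2), (14, 19), (83, 89), (105, 109), (527, 529)]

# Functionally important residues of HEXA_HUMAN as listed in the table.
FUNCTIONAL = {
    323: "active site",
    115: "glycosylation",
    157: "glycosylation",
    259: "glycosylation",
    58: "disulfide bond",
    104: "disulfide bond",
    277: "disulfide bond",
    328: "disulfide bond",
    505: "disulfide bond",
    522: "disulfide bond",
}


def state(position, regions):
    """Return 'disordered' if the position lies inside any region, else 'ordered'."""
    inside = any(start <= position <= end for start, end in regions)
    return "disordered" if inside else "ordered"


for position, function in FUNCTIONAL.items():
    print(position, function, state(position, POODLE_S_MISSING), state(position, POODLE_I))
```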

IUPred

We tested the three different IUPred variants, which are offered by the webserver.

  • IUPred (short)
Result of the IUPred prediction that focuses on short disordered regions.

As the picture shows, the IUPred variant that focuses on short disordered regions predicts disorder only at the beginning and at the end of the protein. These predictions are probably wrong: the termini often lack a defined secondary structure, but they also lack a function.

  • IUPred (long)

Next we look at the prediction of long disordered regions:

Result of the IUPred prediction that focuses on long disordered regions.

The picture above shows the result of this prediction. No disordered region is predicted, not even at the beginning or at the end of the protein. This prediction is good, because the HEXA_HUMAN protein does not possess any disordered regions.


  • IUPred (with structural information)

Finally, we analysed the prediction of IUPred with the additional use of structural information.

Result of the IUPred prediction with additional structural information

As before, the method did not find any disordered regions. Therefore, two of the three IUPred variants give the right result. Only the variant that focuses on short disordered regions predicted two disordered regions, but these are located at the very beginning and the very end of the protein, so we regard them as wrong.

Meta-Disorder

Meta-Disorder did not predict any disordered region in our protein. Some of the individual methods that Meta-Disorder combines predicted disordered regions, but Meta-Disorder builds a consensus over all of these methods, and the consensus contains no disordered regions.

Graphical representation of the result:

Result of the Meta-Disorder prediction


The result is very good: HEXA_HUMAN does not have any disordered regions, so the prediction of Meta-Disorder is right.

Comparison of the different methods

To compare the methods, we counted how many residues each method predicted as disordered. Since HEXA_HUMAN has no disordered regions, every such residue counts as a wrong prediction.

methods
Disopred POODLE-I POODLE-L POODLE-S (missing) POODLE-S (B-factor) IUPred (short) IUPred (long) IUPred (structure) Meta-Disorder
#wrong predicted residues 5 23 0 47 24 3 0 0 0



POODLE-L, IUPred (long), IUPred (structure) and Meta-Disorder predict correctly that there are no disordered regions. The worst result came from POODLE-S (missing), which predicts 47 residues as disordered, followed by POODLE-S (B-factor) (24 wrongly predicted residues) and POODLE-I (23 wrongly predicted residues).
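The POODLE counts in the table are the sums of the length columns of the region tables above. A small sketch (names such as `PREDICTIONS`, `total_listed` and `inconsistent_rows` are our own) reproduces these sums and also flags rows whose listed length does not equal end - start + 1. Two rows are flagged; whether this is a transcription slip or a different coordinate convention of the tool is not clear from this page.

```python
# Region tables from this page as (start, end, listed length).
PREDICTIONS = {
    "POODLE-I": [(1, 2, 2), (14, 19, 6), (83, 89, 7), (105, 109, 5), (527, 529, 3)],
    "POODLE-S (B-factor)": [(0, 2, 2), (13, 19, 7), (83, 88, 6), (105, 109, 5), (526, 529, 4)],
    "POODLE-S (missing)": [(17, 18, 2), (53, 61, 9), (78, 109, 33),
                           (153, 153, 1), (280, 280, 1), (345, 345, 1)],
}


def total_listed(rows):
    """Sum the listed region lengths (third column)."""
    return sum(length for _, _, length in rows)


def inconsistent_rows(rows):
    """Return rows whose listed length differs from end - start + 1."""
    return [(s, e, n) for s, e, n in rows if e - s + 1 != n]


for method, rows in PREDICTIONS.items():
    print(method, total_listed(rows), inconsistent_rows(rows))
```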

Prediction of transmembrane alpha-helices and signal peptides

Because most of the proteins we used in this practical are not membrane proteins, we got five additional proteins for the transmembrane and signal peptide analyses.

Additional proteins:

name organism location transmembrane protein sequence
BACR_HALSA Halobacterium salinarium (Archaea) Cell membrane Multi-pass membrane protein [P02945.fasta]
RET4_HUMAN Human (Homo sapiens) extracellular space No [P02753.fasta]
INSL5_HUMAN Human (Homo sapiens) extracellular region No [Q9Y5Q6.fasta]
LAMP1_HUMAN Human (Homo sapiens) Cell membrane Single-pass membrane protein [P11279.fasta]
A4_HUMAN Human (Homo sapiens) Cell membrane Single-pass membrane protein [P05067.fasta]

TMHMM

We analysed the six sequences with TMHMM.

  • HEXA_HUMAN



Prediction of TMHMM for the transmembrane helices of HEXA_HUMAN
start position end position location
1 529 outside

TMHMM predicts no transmembrane helix at all; the whole protein is labelled as outside (the non-cytoplasmic side). To evaluate this result, we compared the data from UniProt with our prediction.

Comparison between the real transmembrane helices and the TMHMM result.

As you can see above, the TMHMM prediction is completely right, except for the signal peptide, which TMHMM cannot predict.

  • BACR_HALSA



Prediction of TMHMM for the transmembrane helices of BACR_HALSA
start position end position location
1 22 outside
23 42 TM Helix
43 54 inside
55 77 TM Helix
78 91 outside
92 114 TM Helix
115 120 inside
121 143 TM Helix
144 147 outside
148 170 TM Helix
171 189 inside
190 212 TM Helix
213 262 outside

TMHMM predicts six transmembrane helices for BACR_HALSA. We compared the TMHMM prediction with the real transmembrane helices of BACR_HALSA:

Comparison between the real transmembrane helices and the TMHMM result.

The prediction is very good, especially at the beginning, where predicted and real helices overlap almost completely. At the end of the protein, however, TMHMM misses one transmembrane helix: the real protein has 7 transmembrane helices, whereas TMHMM predicts only 6. This is a serious error, because it makes a difference for the function whether there are 6 or 7 helices, although the rest of the TMHMM prediction is quite good.
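The helix count can be read directly from the segment table. The following sketch (with our own names `TMHMM_BACR_HALSA`, `helix_count` and `covers_sequence`) counts the predicted helices and checks that the segments cover the 262 residues without gaps or overlaps.

```python
# TMHMM segments for BACR_HALSA as listed in the table above.
TMHMM_BACR_HALSA = [
    (1, 22, "outside"), (23, 42, "TM Helix"), (43, 54, "inside"),
    (55, 77, "TM Helix"), (78, 91, "outside"), (92, 114, "TM Helix"),
    (115, 120, "inside"), (121, 143, "TM Helix"), (144, 147, "outside"),
    (148, 170, "TM Helix"), (171, 189, "inside"), (190, 212, "TM Helix"),
    (213, 262, "outside"),
]


def helix_count(segments):
    """Count segments labelled as transmembrane helix."""
    return sum(1 for _, _, label in segments if label.lower() == "tm helix")


def covers_sequence(segments, length):
    """Check that segments run from 1 to `length` without gaps or overlaps."""
    expected_start = 1
    for start, end, _ in segments:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == length + 1


print(helix_count(TMHMM_BACR_HALSA), covers_sequence(TMHMM_BACR_HALSA, 262))
```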

  • RET4_HUMAN



Prediction of TMHMM for the transmembrane helices of RET4_HUMAN
start position end position location
1 201 outside

TMHMM predicts no transmembrane helices. The whole protein is located in the extracellular space.

Comparison with the real structure of the protein:

Comparison between the real transmembrane helices and the TMHMM result.

The TMHMM prediction is completely right. This shows that TMHMM can also recognize that a protein is not a transmembrane protein.

  • INSL5_HUMAN



Prediction of TMHMM for the transmembrane helices of INSL5_HUMAN
start position end position location
1 135 outside

TMHMM predicts no transmembrane helices. The whole protein is located in the extracellular space.

Comparison with the real structure of the protein:

Comparison between the real transmembrane helices and the TMHMM result.

The TMHMM prediction is again completely right.

  • LAMP1_HUMAN



Prediction of TMHMM for the transmembrane helices of LAMP1_HUMAN
start position end position location
1 10 inside
11 33 TM Helix
34 383 outside
384 406 TM Helix
407 417 inside

TMHMM predicts two transmembrane helices, separated by a very long loop that is located on the outside.

Comparison with the real structure of the protein:

Comparison between the real transmembrane helices and the TMHMM result.

The prediction of TMHMM is quite good. Only at the beginning of the protein does TMHMM predict one wrong transmembrane helix (in reality this region is a signal peptide); the rest of the prediction is correct.

  • A4_HUMAN



Prediction of TMHMM for the transmembrane helices of A4_HUMAN


start position end position location
1 700 outside
701 723 TM Helix
724 770 inside

TMHMM predicts one transmembrane helix near the end of the protein. As we already know, A4_HUMAN is a single-spanning transmembrane protein, so the number of transmembrane helices is predicted correctly.

Comparison with the real structure of the protein:

Comparison between the real transmembrane helices and the TMHMM result.

The result of the TMHMM prediction is very good. Except for the first residues at the beginning and the exact start position of the transmembrane helix, the prediction is correct.

Phobius and PolyPhobius



  • HEXA_HUMAN



Prediction of Phobius for the transmembrane helices and signal peptides of HEXA_HUMAN
Prediction of PolyPhobius for the transmembrane helices and signal peptides of HEXA_HUMAN
Phobius PolyPhobius
start position end position prediction start position end position prediction
Signal peptide prediction
1 5 N-Region 1 5 N-Region
6 17 H-Region 6 15 H-Region
18 22 C-Region 16 19 C-Region
Summary signal peptide
1 22 Signal Peptide 1 19 Signal Peptide
Transmembrane helices prediction
23 529 outside 20 520 outside

Neither method predicts a transmembrane helix, which is correct, because HEXA_HUMAN is located in the lysosome. We compared the results of Phobius and PolyPhobius with the real protein.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein
Comparison between the prediction of PolyPhobius and the real protein

The prediction of Phobius is slightly better than the PolyPhobius prediction: Phobius predicts the beginning and the end of the signal peptide exactly, whereas PolyPhobius cuts the signal peptide short (it ends at position 19 instead of 22).

  • BACR_HALSA



Prediction of Phobius for the transmembrane helices and signal peptides of BACR_HALSA
Prediction of PolyPhobius for the transmembrane helices and signal peptides of BACR_HALSA
Phobius PolyPhobius
start position end position prediction start position end position prediction
Signal peptide prediction
No prediction available
Transmembrane helices prediction
23 42 TM helix 22 43 TM helix
43 53 inside 44 54 inside
54 76 TM helix 55 77 TM helix
77 95 outside 78 94 outside
96 114 TM helix 95 114 TM helix
115 120 inside 115 120 inside
121 142 TM helix 121 141 TM helix
143 147 outside 142 147 outside
148 169 TM helix 148 166 TM helix
170 189 inside 167 186 inside
190 212 TM helix 187 205 TM helix
213 217 outside 206 215 outside
218 237 TM helix 216 237 TM helix
238 262 inside 238 262 inside

Neither method predicts a signal peptide, and both recognize that this protein is a transmembrane protein with seven helices. The predictions differ only in the exact start and end positions of the helices, mostly by 1 to 3 residues; the largest difference is at the end of the sixth helix (212 for Phobius, 205 for PolyPhobius).
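The boundary differences between the two predictions can be checked helix by helix. This is a small sketch using the helix positions from the table above; the names `PHOBIUS`, `POLYPHOBIUS` and `boundary_shifts` are our own.

```python
# Predicted transmembrane helices for BACR_HALSA as (start, end), from the table above.
PHOBIUS = [(23, 42), (54, 76), (96, 114), (121, 142), (148, 169), (190, 212), (218, 237)]
POLYPHOBIUS = [(22, 43), (55, 77), (95, 114), (121, 141), (148, 166), (187, 205), (216, 237)]


def boundary_shifts(helices_a, helices_b):
    """Return (start shift, end shift) in residues for each pair of helices."""
    if len(helices_a) != len(helices_b):
        raise ValueError("both predictions must contain the same number of helices")
    return [(abs(a_start - b_start), abs(a_end - b_end))
            for (a_start, a_end), (b_start, b_end) in zip(helices_a, helices_b)]


for number, (start_shift, end_shift) in enumerate(boundary_shifts(PHOBIUS, POLYPHOBIUS), 1):
    print(f"helix {number}: start differs by {start_shift}, end differs by {end_shift}")
```

Pairing the helices by their order only makes sense because both methods predict the same number of helices; for predictions with different helix counts an overlap-based pairing would be needed.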

To evaluate the predictions, we compared them with the real transmembrane helices.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein
Comparison between the prediction of PolyPhobius and the real protein



  • RET4_HUMAN



Prediction of Phobius for the transmembrane helices and signal peptides of RET4_HUMAN
Prediction of PolyPhobius for the transmembrane helices and signal peptides of RET4_HUMAN
Phobius PolyPhobius
start position end position prediction start position end position prediction
Signal peptide prediction
1 2 N-Region 1 3 N-Region
3 13 H-Region 4 13 H-Region
14 18 C-Region 14 18 C-Region
Summary signal peptide
1 18 secretory signal peptide 1 18 secretory signal peptide
Transmembrane helices prediction
19 201 outside 19 201 outside

Both methods predict a signal peptide for the secretory pathway. This result is correct.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein
Comparison between the prediction of PolyPhobius and the real protein

Both methods show exactly the same result.

  • INSL5_HUMAN



Prediction of Phobius for the transmembrane helices and signal peptides of INSL5_HUMAN
Prediction of PolyPhobius for the transmembrane helices and signal peptides of INSL5_HUMAN
Phobius PolyPhobius
start position end position prediction start position end position prediction
Signal peptide prediction
1 5 N-Region 1 4 N-Region
6 17 H-Region 5 16 H-Region
18 22 C-Region 17 22 C-Region
Summary signal peptide
1 22 secretory signal peptide 1 22 secretory signal peptide
Transmembrane helices prediction
23 135 outside 23 135 outside

Both methods predict a signal peptide for the secretory pathway. The summary predictions are identical; only the boundaries of the N-, H- and C-regions differ by one residue.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein
Comparison between the prediction of PolyPhobius and the real protein

The complete prediction is correct.

  • LAMP1_HUMAN



Prediction of Phobius for the transmembrane helices and signal peptides of LAMP1_HUMAN
Prediction of PolyPhobius for the transmembrane helices and signal peptides of LAMP1_HUMAN
Phobius PolyPhobius
start position end position prediction start position end position prediction
Signal peptide prediction
1 10 N-Region 1 9 N-Region
11 22 H-Region 10 22 H-Region
23 28 C-Region 23 28 C-Region
Summary signal peptide
1 28 secretory signal peptide 1 28 secretory signal peptide
Transmembrane helices prediction
29 381 outside 29 381 outside
382 405 TM helix 382 405 TM helix
406 417 outside 406 417 outside

The results of both methods are nearly identical.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein
Comparison between the prediction of PolyPhobius and the real protein

The results of both prediction methods are the same, and furthermore they agree with the real protein.

  • A4_HUMAN



Prediction of Phobius for the transmembrane helices and signal peptides of A4_HUMAN
Prediction of PolyPhobius for the transmembrane helices and signal peptides of A4_HUMAN
Phobius PolyPhobius
start position end position prediction start position end position prediction
Signal peptide prediction
1 1 N-Region 1 3 N-Region
2 12 H-Region 4 12 H-Region
13 17 C-Region 13 17 C-Region
Summary signal peptide
1 17 secretory signal peptide 1 17 secretory signal peptide
Transmembrane helices prediction
18 700 outside 18 700 outside
701 723 TM helix 701 723 TM helix
724 770 inside 724 770 inside

The results of both methods are nearly identical.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein
Comparison between the prediction of PolyPhobius and the real protein

The results of both prediction methods are the same, and furthermore they agree with the real protein.

OCTOPUS and SPOCTOPUS



  • HEXA_HUMAN



Prediction of OCTOPUS for the transmembrane helices of HEXA_HUMAN
Prediction of SPOCTOPUS for the transmembrane helices of HEXA_HUMAN
OCTOPUS SPOCTOPUS
start position end position prediction start position end position prediction
1 2 inside 1 6 N-terminal of a signal peptide
3 23 TM helix 7 21 signal peptide
24 529 outside 22 529 outside

The results of these two predictions differ. OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts a signal peptide at the same location.
To check which method is right, we compared the predictions with the real protein.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein
Comparison between the prediction of SPOCTOPUS and the real protein

SPOCTOPUS gave the better result, because SPOCTOPUS recognizes the signal peptide, whereas OCTOPUS predicts a transmembrane helix instead.

  • BACR_HALSA



Prediction of OCTOPUS for the transmembrane helices of BACR_HALSA
Prediction of SPOCTOPUS for the transmembrane helices of BACR_HALSA
OCTOPUS SPOCTOPUS
start position end position prediction start position end position prediction
1 22 outside 1 22 outside
23 43 TM helix 23 43 TM helix
44 54 inside 44 54 inside
55 75 TM helix 55 75 TM helix
76 95 outside 76 95 outside
96 116 TM helix 96 116 TM helix
117 121 inside 117 120 inside
122 142 TM helix 121 141 TM helix
143 147 outside 142 147 outside
148 168 TM helix 148 168 TM helix
169 185 inside 169 185 inside
186 206 TM helix 186 206 TM helix
207 216 outside 207 216 outside
217 237 TM helix 217 237 TM helix
238 262 inside 238 262 inside

Both methods give very similar results, which are identical except for a few residues. Both predicted all seven transmembrane helices, which is a very good result.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein
Comparison between the prediction of SPOCTOPUS and the real protein



  • RET4_HUMAN



Prediction of OCTOPUS for the transmembrane helices of RET4_HUMAN
Prediction of SPOCTOPUS for the transmembrane helices of RET4_HUMAN
OCTOPUS SPOCTOPUS
start position end position prediction start position end position prediction
1 1 inside 1 5 N-terminal of a signal peptide
2 23 TM helix 6 19 signal peptide
24 201 outside 20 201 outside


As for HEXA_HUMAN, OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts the signal peptide.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein
Comparison between the prediction of SPOCTOPUS and the real protein



  • INSL5_HUMAN



Prediction of OCTOPUS for the transmembrane helices of INSL5_HUMAN
Prediction of SPOCTOPUS for the transmembrane helices of INSL5_HUMAN
OCTOPUS SPOCTOPUS
start position end position prediction start position end position prediction
1 1 inside 1 5 N-terminal of a signal peptide
2 32 TM helix 6 23 signal peptide
33 135 outside 24 135 outside



Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein
Comparison between the prediction of SPOCTOPUS and the real protein

As we have already seen before, OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts this region as a signal peptide, which is correct.

  • LAMP1_HUMAN



Prediction of OCTOPUS for the transmembrane helices of LAMP1_HUMAN
Prediction of SPOCTOPUS for the transmembrane helices of LAMP1_HUMAN
OCTOPUS SPOCTOPUS
start position end position prediction start position end position prediction
1 10 inside 1 11 N-terminal of a signal peptide
11 31 TM helix 12 29 signal peptide
32 383 outside 30 383 outside
384 404 TM helix 384 404 TM helix
405 417 outside 405 417 outside

As for HEXA_HUMAN and RET4_HUMAN, OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts the signal peptide.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein
Comparison between the prediction of SPOCTOPUS and the real protein



  • A4_HUMAN



Prediction of OCTOPUS for the transmembrane helices of A4_HUMAN
Prediction of SPOCTOPUS for the transmembrane helices of A4_HUMAN
OCTOPUS SPOCTOPUS
start position end position prediction start position end position prediction
1 5 outside 1 4 N-terminal of signal peptide
6 11 R 5 18 Signal peptide
12 701 outside 19 701 outside
702 722 TM helix 702 722 TM helix
723 770 inside 723 770 inside

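In contrast to HEXA_HUMAN and RET4_HUMAN, OCTOPUS does not predict a transmembrane helix at the N-terminus here, but a short region labelled R (residues 6 to 11, presumably a reentrant region), whereas SPOCTOPUS again predicts the signal peptide.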

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein
Comparison between the prediction of SPOCTOPUS and the real protein

TargetP

All of our proteins come from human or archaea, so we only used the non-plant option of TargetP.


  • HEXA_HUMAN



Location Probability
mitochondrial targeting SP 0.214
secretory pathway SP 0.877
other 0.009

TargetP predicts a secretory pathway signal peptide for this protein, which is correct.



  • BACR_HALSA



Location Probability
mitochondrial targeting SP 0.019
secretory pathway SP 0.897
other 0.562

TargetP predicts that this protein contains a secretory pathway signal peptide. The score for this signal peptide is very high, but the result is wrong: BACR_HALSA is a multi-pass transmembrane protein, and neither Phobius nor PolyPhobius predicted a signal peptide for it.

  • RET4_HUMAN



Location Probability
mitochondrial targeting SP 0.242
secretory pathway SP 0.928
other 0.020

TargetP predicts a secretory pathway signal peptide for this protein, which is completely correct.

  • INSL5_HUMAN



Location Probability
mitochondrial targeting SP 0.074
secretory pathway SP 0.899
other 0.037

As before, TargetP predicts a secretory pathway signal peptide, which is again correct.

  • LAMP1_HUMAN



Location Probability
mitochondrial targeting SP 0.043
secretory pathway SP 0.953
other 0.017

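TargetP predicts a secretory pathway signal peptide. LAMP1_HUMAN is a single-pass transmembrane protein, but it also carries an N-terminal signal peptide (see the comparisons with the real protein in the TMHMM and Phobius sections above), so this prediction agrees with the real protein.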

  • A4_HUMAN



Location Probability
mitochondrial targeting SP 0.035
secretory pathway SP 0.937
other 0.084

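TargetP predicts a secretory pathway signal peptide. A4_HUMAN is a single-pass transmembrane protein, but the Phobius prediction with a signal peptide at residues 1 to 17 matched the real protein (see above), so this prediction also agrees with the real protein.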
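The TargetP scores of all six proteins can be compared side by side. The sketch below assumes that the reported location is simply the one with the highest score, and it adds the margin to the second-highest score as a rough indication of how clear the decision is. The names `TARGETP` and `call` are our own, and the margin is our own illustration rather than a value reported on this page.

```python
# TargetP scores from the tables above: mitochondrial targeting SP, secretory pathway SP, other.
TARGETP = {
    "HEXA_HUMAN": {"mitochondrial": 0.214, "secretory": 0.877, "other": 0.009},
    "BACR_HALSA": {"mitochondrial": 0.019, "secretory": 0.897, "other": 0.562},
    "RET4_HUMAN": {"mitochondrial": 0.242, "secretory": 0.928, "other": 0.020},
    "INSL5_HUMAN": {"mitochondrial": 0.074, "secretory": 0.899, "other": 0.037},
    "LAMP1_HUMAN": {"mitochondrial": 0.043, "secretory": 0.953, "other": 0.017},
    "A4_HUMAN": {"mitochondrial": 0.035, "secretory": 0.937, "other": 0.084},
}


def call(scores):
    """Return the top-scoring location and its margin over the runner-up."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, second) = ranked[0], ranked[1]
    return best, round(top - second, 3)


for protein, scores in TARGETP.items():
    location, margin = call(scores)
    print(protein, location, margin)
```

In this sketch BACR_HALSA has the smallest margin of the six proteins, because its score for "other" is also fairly high.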

SignalP

For our analysis we used both the hidden Markov model based and the neural network based prediction.
The neural network prediction reports three different scores: the S-score, which is the score for the signal peptide; the C-score, which is the score for the cleavage site; and the Y-score, which combines the S-score and the C-score and is used to predict the cleavage site, because it is more precise than the C-score alone. The hidden Markov model reports probabilities for a signal peptide and for a signal anchor.
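To illustrate why a combined score can locate the cleavage site better than the C-score alone, here is a minimal sketch. It assumes that the Y-score at a position is the geometric mean of the C-score and the drop of the mean S-score across that position, which is how we understand the published definition. The window size `d`, the function name `y_scores` and the toy scores are assumptions of ours and are not taken from SignalP.

```python
from math import sqrt


def y_scores(c_scores, s_scores, d=3):
    """Combine C-scores with the local drop of the S-score (sketch, assumed window d)."""
    result = []
    for i, c in enumerate(c_scores):
        before = s_scores[max(0, i - d):i]   # up to d positions before i
        after = s_scores[i:i + d]            # position i and the following ones
        if not before:
            result.append(0.0)
            continue
        drop = sum(before) / len(before) - sum(after) / len(after)
        result.append(sqrt(c * drop) if drop > 0 else 0.0)
    return result


# Toy example: the S-score is high for the first five residues and low afterwards.
s = [1.0] * 5 + [0.0] * 5
# The C-score has a spurious peak inside the signal peptide (index 2)
# and a lower peak where the S-score actually drops (index 5).
c = [0.0, 0.0, 0.6, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]

y = y_scores(c, s)
print(c.index(max(c)), y.index(max(y)))
```

In the toy data the highest C-score lies inside the signal peptide, but the highest Y-score lies at the position where the S-score drops, which is the behaviour described above.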

  • HEXA_HUMAN



Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 22 22 23 signal peptide

Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
signal peptide 1.000 0.000 22 23
Result of the SignalP method based on the neural network
Result of the SignalP method based on the hidden Markov model

Both methods predict a signal peptide with the same cleavage site (between residues 22 and 23). This is correct, because HEXA_HUMAN enters the secretory pathway.

  • BACR_HALSA



BACR_HALSA is an archaeal protein. SignalP offers prediction models for eukaryotes and for bacteria (gram-positive and gram-negative), but none for archaea. Therefore, we decided to use all three models and to compare the results with the real protein.

eukaryotes

Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 38 38 39 signal peptide


Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
signal peptide 0.017 0.859 15 16
Result of the SignalP method based on the neural network for BACR_HALSA with the prediction method for eukaryotes
Result of the SignalP method based on the hidden Markov model for BACR_HALSA with the prediction method for eukaryotes



gram-negative bacteria

Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 42 42 43 no signal peptide

Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
Non-secretory protein 0.000 0.000
Result of the SignalP method based on the neural network for BACR_HALSA with the prediction method for gram-negative bacteria
Result of the SignalP method based on the hidden Markov model for BACR_HALSA with the prediction method for gram-negative bacteria



gram-positive bacteria

Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 33 33 34 no signal peptide

Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
Non-secretory protein 0.000 0.000
Result of the SignalP method based on the neural network for BACR_HALSA with the prediction method for gram-positive bacteria
Result of the SignalP method based on the hidden Markov model for BACR_HALSA with the prediction method for gram-positive bacteria


Only the eukaryotic prediction method predicts a signal peptide (with the neural network), whereas both bacterial methods predict that this protein has no signal peptide. On the other hand, only the eukaryotic hidden Markov model gives a high signal anchor probability (0.859), which is correct, because BACR_HALSA is a transmembrane protein. Therefore, it seems that the eukaryotic prediction method suits BACR_HALSA better.

  • RET4_HUMAN



Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 18 18 19 signal peptide

Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
signal peptide 1.000 0.000 18 19
Result of the SignalP method based on the neural network for RET4_HUMAN
Result of the SignalP method based on the hidden Markov model for RET4_HUMAN

Both methods predict a signal peptide for RET4_HUMAN, which is correct.

  • INSL5_HUMAN



Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 22 22 23 signal peptide

Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
signal peptide 0.999 0.000 22 23
Result of the SignalP method based on the neural network for INSL5_HUMAN
Result of the SignalP method based on the hidden Markov model for INSL5_HUMAN

Both methods predict a signal peptide for INSL5_HUMAN, which is correct.

  • LAMP1_HUMAN



Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 28 28 29 signal peptide

Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
signal peptide 1.000 0.000 28 29
Result of the SignalP method based on the neural network for LAMP1_HUMAN
Result of the SignalP method based on the hidden Markov model for LAMP1_HUMAN

Both methods predict a signal peptide for LAMP1_HUMAN with a cleavage site between residues 28 and 29. LAMP1_HUMAN is a transmembrane protein, but it also carries an N-terminal signal peptide: the Phobius prediction with a signal peptide at residues 1 to 28 matched the real protein (see above). The SignalP prediction is therefore consistent with the real protein.

  • A4_HUMAN



Result of the neural network

Signal peptide Cleavage site
start position end position start position end position prediction
1 17 17 18 signal peptide

Result of the hidden Markov model

prediction signal peptide probability signal anchor probability cleavage site start cleavage site end
signal peptide 1.000 0.000 17 18
Result of the SignalP method based on the neural network for A4_HUMAN
Result of the SignalP method based on the hidden Markov model for A4_HUMAN

Both methods predict a signal peptide for A4_HUMAN with a cleavage site between residues 17 and 18. A4_HUMAN is a transmembrane protein, but it also carries an N-terminal signal peptide: the Phobius prediction with a signal peptide at residues 1 to 17 matched the real protein (see above). The SignalP prediction is therefore consistent with the real protein.



Comparison of the different methods



We split the comparison of the methods, because it would be unfair to compare a method that cannot predict signal peptides directly with one that can. We therefore made three comparisons: one for transmembrane helices, one for signal peptides and one for the combination of both.

  • Comparison of transmembrane helix prediction



Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues, which belong to the signal peptide, because all transmembrane-only prediction methods predicted these regions as transmembrane helices, which is wrong.
For this comparison we counted the wrongly predicted transmembrane residues, the wrongly predicted outside residues and the wrongly predicted inside residues.

methods
TMHMM Phobius PolyPhobius OCTOPUS SPOCTOPUS Transmembrane protein
HEXA_HUMAN #wrong transmembrane 0 0 0 0 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 0 0
%wrong predicted 0% 0% 0% 0% 0%
BACR_HALSA #wrong transmembrane 24 20 12 16 11 yes (7 transmembrane helices)
#wrong outside 46 5 3 4 6
#wrong inside 4 4 2 0 0
#wrong sum 74 29 17 20 17
%wrong predicted 29% 11% 6% 8% 6%
RET4_HUMAN #wrong transmembrane 0 0 0 5 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 5 0
%wrong predicted 0% 0% 0% 2% 0%
INSL5_HUMAN #wrong transmembrane 0 0 0 10 0 no
#wrong outside 0 0 0 0 0
#wrong inside 0 0 0 0 0
#wrong sum 0 0 0 10 0
%wrong predicted 0% 0% 0% 8% 0%
LAMP1_HUMAN #wrong transmembrane 5 3 4 3 1 yes (single-spanning)
#wrong outside 2 0 0 1 1
#wrong inside 0 0 0 1 1
#wrong sum 7 3 4 5 3
%wrong predicted 2% 0% 1% 1% 0%
A4_HUMAN #wrong transmembrane 0 0 0 0 0 yes (single-spanning)
#wrong outside 1 1 1 1 2
#wrong inside 0 0 0 1 1
#wrong sum 1 1 1 2 3
%wrong predicted 0% 0% 0% 0% 0%
Average number of wrong predicted residues
13.6 5.5 3.6 7 3.8

TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, where TMHMM is the only prediction method that does not recognize the 7 transmembrane helices. SPOCTOPUS and PolyPhobius are the best prediction methods.

In general, the prediction of transmembrane helices works quite well and almost all predictions are very close to the real protein.
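
The residue counting used for the table above can be sketched in a few lines of Python. This is only an illustration under assumptions: it assumes that the prediction and the reference topology are both available as per-residue strings with hypothetical state letters (M = membrane, o = outside, i = inside), and that a wrong residue is attributed to the state the method predicted for it. The function name and the toy sequences are made up and are not taken from any of the tools or proteins above.

```python
from collections import Counter


def count_wrong_residues(predicted: str, reference: str, skip: int = 0) -> Counter:
    """Count wrongly predicted residues, grouped by the predicted state.

    Both strings hold one letter per residue: 'M', 'o' or 'i' (assumed encoding).
    `skip` drops the first residues, e.g. a signal peptide.
    """
    if len(predicted) != len(reference):
        raise ValueError("topology strings must have the same length")
    wrong = Counter({"M": 0, "o": 0, "i": 0})
    for pred, ref in zip(predicted[skip:], reference[skip:]):
        if pred != ref:
            wrong[pred] += 1
    return wrong


# Hypothetical 20-residue toy example (not one of the six proteins).
reference = "ooooMMMMMMMMiiiiiiii"
predicted = "oooooMMMMMMMMMiiiiii"

wrong = count_wrong_residues(predicted, reference)
total = sum(wrong.values())
percent_wrong = 100 * total / len(reference)
print(dict(wrong), total, percent_wrong)
```

In this toy case the predicted helix is shifted by one residue at its start and extended by two at its end, so three residues count as wrong. Passing `skip` leaves out a leading signal peptide, as was done for the table.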

  • Comparison of signal peptide prediction



Here we compared TargetP and SignalP, which can only predict signal peptides, together with the signal peptide predictions of SPOCTOPUS, Phobius and PolyPhobius. TargetP does not predict the start and end position of the signal peptide; it only predicts the location of the protein.

methods
real position Phobius PolyPhobius SPOCTOPUS TargetP SignalP
HEXA_HUMAN stop position 22 22 19 21 no prediction 22
#wrong residues 0 3 3 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
BACR_HALSA stop position not available no prediction no prediction no prediction no prediction no consensus prediction
#wrong predicted not available not available not available not available no prediction not available
location membrane not available not available not available secretory pathway non-signal peptide
RET4_HUMAN stop position 18 18 18 19 no prediction 18
#wrong predicted 0 0 1 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
INSL5_HUMAN stop position 22 22 22 22 no prediction 22
#wrong residues 0 0 0 no prediction 0
location secretory pathway secretory pathway secretory pathway no prediction secretory pathway no prediction
LAMP1_HUMAN stop position 28 28 28 29 no prediction 28
#wrong residues 0 0 1 no prediction 0
location transmembrane helix secretory pathway secretory pathway no prediction secretory pathway no prediction
A4_HUMAN stop position 17 17 17 18 no prediction 17
#wrong residues 0 0 1 no prediction 0
location transmembrane helix secretory pathway secretory pathway no prediction secretory pathway secretory pathway
Average number of wrong prediction
sum of wrong predicted residues 0 3 2 no prediction 0
#right predicted locations / #predicted locations 3/5 3/5 no prediction 3/5 no prediction

SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. In addition, SignalP predicts whether a signal peptide is present at all. In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.
Therefore, it is difficult to compare the different methods. Phobius and PolyPhobius are more powerful than the other prediction methods, because they predict both location and position, and on average they predict each of them as well as the other methods do. None of the methods could recognize the transmembrane proteins; all methods predict them as proteins of the secretory pathway. It is therefore useful to use Phobius or PolyPhobius, because they predict more than the other methods and can also predict transmembrane helices. The results of Phobius were a little better than the results of PolyPhobius.
SignalP also lets you choose between predictions for eukaryotes, Gram-positive bacteria and Gram-negative bacteria. Our analysis included BACR_HALSA, which is an archaeal protein, so we tested all three prediction modes for this protein, and all three failed. BACR_HALSA does not possess a signal peptide, but every mode predicts one. Only the eukaryotic mode recognized a signal anchor for BACR_HALSA, whereas the other two modes could not give a prediction of the location.
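
As a small worked example of the "#wrong" rows in the table above, the following Python sketch computes the distance between the predicted and the real stop position of the signal peptide. It assumes that the number of wrong residues is simply the absolute difference of the two stop positions; the function name is hypothetical, and the values are the RET4_HUMAN row of the table.

```python
from typing import Optional


def stop_position_error(predicted_stop: Optional[int], real_stop: int) -> Optional[int]:
    """Absolute distance between predicted and real stop position.

    Returns None when the method gives no stop position (e.g. TargetP).
    """
    if predicted_stop is None:
        return None
    return abs(predicted_stop - real_stop)


# Values copied from the RET4_HUMAN row of the table above.
real_stop = 18
predicted = {
    "Phobius": 18,
    "PolyPhobius": 18,
    "SPOCTOPUS": 19,
    "TargetP": None,  # predicts only the location, no positions
    "SignalP": 18,
}

errors = {method: stop_position_error(stop, real_stop) for method, stop in predicted.items()}
print(errors)
```

Methods without a stop position are kept as `None` rather than 0, so that "no prediction" is not mistaken for a perfect prediction.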



  • Comparison of the combined methods



The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore, we combined our two previous comparisons.

methods
Phobius PolyPhobius SPOCTOPUS
 HEXA_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 3 2
location right right no prediction
 BACR_HALSA #wrong predicted residues (TM) 29 17 17
#wrong predicted residues (SP) n.a. n.a. n.a.
location n.a. n.a. no prediction
RET4_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 0 0
location right right no prediction
INSL5_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 0 0 1
location right right no prediction
LAMP1_HUMAN #wrong predicted residues (TM) 3 4 3
#wrong predicted residues (SP) 0 0 0
location wrong wrong no prediction
A4_HUMAN #wrong predicted residues (TM) 0 0 0
#wrong predicted residues (SP) 1 1 3
location wrong wrong no prediction
 Average
avg(#wrong predicted residues (TM)) 5.3 3.5 3.3
avg(#wrong predicted residues (SP)) 0.1 0.6 1
#location (right predicted) / #location(predicted) 3/5 3/5 no prediction

In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is considerably better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.
Therefore, PolyPhobius seems a good choice: on average it is the best method for combined transmembrane and signal peptide prediction.
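
The average row of the combined table can be reproduced with a short Python sketch. It assumes that the average is the arithmetic mean over all six proteins; the per-protein values are the "#wrong predicted residues (TM)" rows of the table above, in the order HEXA, BACR, RET4, INSL5, LAMP1, A4.

```python
from statistics import mean

# "#wrong predicted residues (TM)" per protein, copied from the table above.
tm_errors = {
    "Phobius": [0, 29, 0, 0, 3, 0],
    "PolyPhobius": [0, 17, 0, 0, 4, 0],
    "SPOCTOPUS": [0, 17, 0, 0, 3, 0],
}

# Arithmetic mean over the six proteins, rounded to one decimal place.
averages = {method: round(mean(values), 1) for method, values in tm_errors.items()}
print(averages)
```

Because BACR_HALSA contributes most of the wrong residues, the average is dominated by this single protein, which is worth keeping in mind when ranking the methods.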

Prediction of GO terms

Before we start with our analysis, we checked the GO annotations for the six sequences:

HEXA_HUMAN
Process skeletal system development
carbohydrate metabolic process
ganglioside catabolic process
lysosome organization
sensory perception of sound
locomotory behavior
adult walking behavior
lipid storage
sexual reproduction
glycosaminoglycan metabolic process
myelination
cell morphogenesis involved in neuron differentiation
neuromuscular process controlling posture
neuromuscular process controlling balance
Function catalytic activity
hydrolase activity, hydrolyzing O-glycosyl compounds
beta-N-acetylhexosaminidase activity
protein binding
hydrolase activity
hydrolase activity, acting on glycosyl bonds
cation binding
protein heterodimerization activity
Component lysosome
membrane
BACR_HALSA
Process transport
ion transport
phototransduction
proton transport
protein-chromophore linkage
response to stimulus
Function receptor activity
ion channel activity
photoreceptor activity
Component plasma membrane
membrane
integral to membrane
 RET4_HUMAN
Process eye development
gluconeogenesis
transport
spermatogenesis
heart development
visual perception
male gonad development
embryo development
maintenance of gastrointestinal epithelium
lung development
positive regulation of insulin secretion
response to retinoic acid
response to insulin stimulus
retinol transport
retinol metabolic process
glucose homeostasis
response to ethanol
embryonic organ morphogenesis
embryonic skeletal system development
cardiac muscle tissue development
female genitalia morphogenesis
detection of light stimulus involved in visual perception
positive regulation of immunoglobulin secretion
retina development in camera-type eye
negative regulation of cardiac muscle cell proliferation
embryonic retina morphogenesis in camera-type eye
uterus development
vagina development
urinary bladder development
heart trabecula formation
Function transporter activity
binding
retinoid binding
protein binding
retinal binding
retinol binding
retinol transporter activity
Component extracellular region
extracellular space
 INSL5_HUMAN
Process biological_process
Function hormone activity
 Component cellular_component
extracellular region
LAMP1_HUMAN
Process autophagy
 Component membrane fraction
lysosome
lysosomal membrane
endosome
late endosome
multivesicular body
plasma membrane
integral to plasma membrane
external side of plasma membrane
cell surface
endosome membrane
membrane
integral to membrane
vesicle
sarcolemma
melanosome
 A4_HUMAN
 Process G2 phase of mitotic cell cycle
suckling behavior
platelet degranulation
mRNA polyadenylation
regulation of translation
protein phosphorylation
cellular copper ion homeostasis
endocytosis
apoptosis
induction of apoptosis
cell adhesion
regulation of epidermal growth factor receptor activity
Notch signaling pathway
axonogenesis
blood coagulation
mating behavior
locomotory behavior
axon cargo transport
cell death
adult locomotory behavior
visual learning
negative regulation of peptidase activity
positive regulation of peptidase activity
axon midline choice point recognition
neuron remodeling
dendrite development
platelet activation
extracellular matrix organization
forebrain development
neuron projection development
ionotropic glutamate receptor signaling pathway
regulation of multicellular organism growth
innate immune response
negative regulation of neuron differentiation
positive regulation of mitotic cell cycle
positive regulation of transcription from RNA polymerase II promoter
collateral sprouting in absence of injury
regulation of synapse structure and activity
neuromuscular process controlling balance
synaptic growth at neuromuscular junction
neuron apoptosis
smooth endoplasmic reticulum calcium ion homeostasis
 Function DNA binding
serine-type endopeptidase inhibitor activity
receptor binding
binding
protein binding
peptidase activator activity
peptidase inhibitor activity
acetylcholine receptor binding
identical protein binding
metal ion binding
PTB domain binding
 Component extracellular region
membrane fraction
cytoplasm
Golgi apparatus
plasma membrane
integral to plasma membrane
coated pit
cell surface
membrane
integral to membrane
synaptosome
axon
platelet alpha granule lumen
cytoplasmic vesicle
neuromuscular junction
ciliary rootlet
neuron projection
dendritic spine
dendritic shaft
intracellular membrane-bounded organelle
apical part of cell
synapse
perinuclear region of cytoplasm
spindle midzone


A detailed list of the GO annotation terms of each protein can be found [here].

GOPET

We tried to predict the GO annotations with GOPET for our six different proteins.

  • HEXA_HUMAN



Result of the GOPET prediction for HEXA_HUMAN

The method only predicts functional GO terms. HEXA_HUMAN has 8 annotated GO functions, and the method also predicts 8 GO function terms. We therefore checked whether all predictions are correct, both for the general term and for the GO number.

GO term confidence prediction term prediction GOid
hexosaminidase activity 97% right wrong
beta-N-acetylhexosaminidase activity 96% right right
hydrolase activity 96% right right
hydrolase activity acting on glycosyl bonds 96% right right
hydrolase activity hydrolyzing O-glycosyl compounds 96% right right
catalytic activity 96% right right
hydrolase activity hydrolyzing N-glycosyl compounds 78% wrong wrong
protein heterodimerization activity 61% right right
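
The stricter of the two checks can be approximated by exact name matching, as in the following Python sketch. It is only an illustration under assumptions: GO numbers are replaced by normalized term names, the annotated list is the "Function" block of HEXA_HUMAN given above, and the `normalize` helper is hypothetical. The looser "general term" check was a manual judgement and is not reproduced here.

```python
def normalize(term: str) -> str:
    """Lower-case a GO term name and drop commas so spelling variants compare equal."""
    return " ".join(term.lower().replace(",", " ").split())


# Annotated GO functions of HEXA_HUMAN, as listed earlier on this page.
annotated = [
    "catalytic activity",
    "hydrolase activity, hydrolyzing O-glycosyl compounds",
    "beta-N-acetylhexosaminidase activity",
    "protein binding",
    "hydrolase activity",
    "hydrolase activity, acting on glycosyl bonds",
    "cation binding",
    "protein heterodimerization activity",
]

# GO function terms predicted by GOPET for HEXA_HUMAN (table above).
predicted = [
    "hexosaminidase activity",
    "beta-N-acetylhexosaminidase activity",
    "hydrolase activity",
    "hydrolase activity acting on glycosyl bonds",
    "hydrolase activity hydrolyzing O-glycosyl compounds",
    "catalytic activity",
    "hydrolase activity hydrolyzing N-glycosyl compounds",
    "protein heterodimerization activity",
]

annotated_set = {normalize(term) for term in annotated}
exact = [term for term in predicted if normalize(term) in annotated_set]
missed = [term for term in predicted if normalize(term) not in annotated_set]
print(len(exact), missed)
```

A term such as "hexosaminidase activity" fails the exact check although it is a more general form of an annotated term, which is why the table distinguishes between the term and the GO id.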



  • BACR_HALSA



Result of the GOPET prediction for BACR_HALSA

The method only predicts functional GO terms. BACR_HALSA has 3 annotated GO functions, and the method also predicts 3 GO function terms. We therefore checked whether all predictions are correct.

GO term confidence prediction term prediction GOid
ion channel activity 77% right right
G-protein coupled photoreceptor activity 75% right wrong
hydrogen ion transmembrane transporter activity 60% wrong wrong



  • RET4_HUMAN



Result of the GOPET prediction for RET4_HUMAN

The method only predicts functional GO terms. RET4_HUMAN has 7 annotated GO functions, while the method predicts 8 GO function terms. We therefore checked whether all predictions are correct.

GO term confidence prediction term prediction GOid
binding 90% right right
retinoid binding 81% right right
lipid binding 80% wrong wrong
retinol binding 78% right right
transporter activity 78% right right
retinal binding 78% right right
lipid transport activity 69% wrong wrong
high-density lipoprotein particle binding 60% wrong wrong



  • INSL5_HUMAN



Result of the GOPET prediction for INSL5_HUMAN

The method only predicts functional GO terms. INSL5_HUMAN has 1 annotated GO function, and the method also predicts 1 GO function term. We therefore checked whether the prediction is correct.

GO term confidence prediction term prediction GOid
hormone activity 80% right right



  • LAMP1_HUMAN



Result of the GOPET prediction for LAMP1_HUMAN

The method only predicts functional GO terms. LAMP1_HUMAN has 0 annotated GO functions, but the method predicts 2 GO function terms. Therefore, the predictions are wrong.

  • A4_HUMAN



Result of the GOPET prediction for A4_HUMAN

The method only predicts functional GO terms. A4_HUMAN has 11 annotated GO functions, while the method predicts 13 GO function terms. We therefore checked whether all predictions are correct.

GO term confidence prediction term prediction GOid
endopeptidase inhibitor activity 87% right wrong
serine-type endopeptidase inhibitor activity 86% right right
plasmin inhibitor activity 83% wrong wrong
trypsin inhibitor activity 83% wrong wrong
peptidase inhibitor activity 82% right right
binding 79% right right
protein binding 74% right right
metal ion binding 73% right right
DNA binding 71% right right
heparin binding 70% wrong right
zinc ion binding 69% wrong wrong
copper ion binding 69% wrong wrong
iron ion binding 67% wrong wrong





Pfam

We used the web server for our analysis and decided to trust only the significant Pfam-A matches. To check whether the predictions are correct, we mapped the Pfam ids to GO ids with the help of a mapping website [[1]]. If no mapping was available, we compared the name of the predicted Pfam family with the names of the GO terms. If the names were similar or equal, we decided to trust the match.
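
The mapping step can be sketched in Python as follows. This is a hedged illustration and not the procedure actually used: it assumes that the mapping is available as a text file in the line layout of the Gene Ontology "external2go" files (such as pfam2go), that is `Pfam:<accession> <name> > GO:<term name> ; GO:<id>`. The accession `PF00000`, the family name and the function name `parse_pfam2go` are placeholders.

```python
from collections import defaultdict


def parse_pfam2go(lines):
    """Parse lines in the assumed pfam2go layout into {accession: [(go_id, go_name), ...]}."""
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # '!' marks comment lines
            continue
        left, _, right = line.partition(" > ")
        accession = left.split()[0].split(":", 1)[1]
        name, _, go_id = right.rpartition(" ; ")
        if name.startswith("GO:"):
            name = name[3:]
        mapping[accession].append((go_id, name))
    return dict(mapping)


# Hypothetical excerpt; accession and family name are placeholders.
sample = [
    "! illustrative excerpt, not downloaded from any server",
    "Pfam:PF00000 Example_family > GO:hydrolase activity, hydrolyzing O-glycosyl compounds ; GO:0004553",
    "Pfam:PF00000 Example_family > GO:carbohydrate metabolic process ; GO:0005975",
]

mapping = parse_pfam2go(sample)
print(mapping["PF00000"])
```

The GO ids obtained this way can then be compared with the annotated GO ids of the protein; families without an entry in the mapping still need the manual name comparison described above.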



  • HEXA_HUMAN

Graphical representation of the prediction result of Pfam:

Result of the Pfam prediction for HEXA_HUMAN

Pfam found two significant Pfam-A matches:

Family E-Value GO id prediction
Glycosyl hydrolase family 20, domain 2 3.7e-43 GO:0004553 right
Glycosyl hydrolase family 20, catalytic domain 1.8e-84 GO:0005975 right



  • BACR_HALSA



Graphical representation of the prediction result of Pfam:

Result of the Pfam prediction for BACR_HALSA

Pfam found one significant Pfam-A match:

Family E-Value GOid prediction
 Bacteriorhodopsin-like protein  2e-88 GO:0005216 right
GO:0006811 right
GO:0016020 right



  • RET4_HUMAN



Graphical representation of the prediction result of Pfam:

Result of the Pfam prediction for RET4_HUMAN

Pfam found one significant Pfam-A match:

Family E-Value GOid prediction
Lipocalin/cytosolic fatty-acid binding protein family 1.7e-22 GO:0005488 right



  • INSL5_HUMAN



Graphical representation of the prediction result of Pfam:

Result of the Pfam prediction for INSL5_HUMAN

Pfam found one significant Pfam-A match:

Family E-Value GOid prediction
 Insulin/IGF/Relaxin family  6.7e-08 GO:0005179 right
GO:0005576 right



  • LAMP1_HUMAN



Graphical representation of the prediction result of Pfam:

Result of the Pfam prediction for LAMP1_HUMAN

Pfam found one significant Pfam-A match:

Family E-Value GOid prediction
Lysosome-associated membrane glycoprotein (LAMP) 2.3e-135 GO:0016020 right



  • A4_HUMAN



Graphical representation of the prediction result of Pfam:

Result of the Pfam prediction for A4_HUMAN

Pfam found six significant Pfam-A matches:

Family E-Value GOid prediction
Amyloid A4 N-terminal heparin-binding 4e-42 none right
Copper-binding of amyloid precursor CuBD 2.3e-27 none right
Kunitz/Bovine pancreatic trypsin inhibitor domain 3e-19 GO:0004867 right
E2 domain of amyloid precursor protein 1.6e-74 none right
 Beta-amyloid peptide (beta-APP)  4.3e-28 GO:0005488 right
GO:0016021 right
Beta-amyloid precursor protein C-terminus 1.1e-29 none right





ProtFun 2.2



ProtFun 2.2 does not state clearly whether a protein belongs to a class; instead it gives probabilities and odds scores (the "Odd score" column in the tables below). We chose a cutoff of 2, so every class with an odds score of 2 or higher counts as a predicted class for us. The result file also contains a "=>" sign, which marks the result with the highest information content. We also accept this line as a result, even if its odds score is lower than 2. If all results have an odds score lower than 2, the line with this sign is our only result.
Because the prediction categories are very general, it was not possible to map them to GO ids. Therefore, we checked the known GO annotations. If there was a hint for a category and the protein was predicted to be in this category, we counted the prediction as right; if the known GO annotations and the predicted category conflict, we counted the prediction as wrong.
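
The selection rule described above can be written down as a short Python sketch. It assumes that each result line has already been parsed into a tuple of category, probability, odds score and a flag for the "=>" sign; the function name and the tuple layout are hypothetical. The numbers are the functional category and enzyme class values reported below for HEXA_HUMAN.

```python
def select_predictions(rows, cutoff=2.0):
    """Return the categories that count as predicted.

    rows: iterable of (category, probability, odds, marked), where `marked`
    is True for the line carrying the "=>" sign.
    """
    return [name for name, _prob, odds, marked in rows if odds >= cutoff or marked]


functional_category = [
    ("Amino acid biosynthesis", 0.161, 7.331, False),
    ("Biosynthesis of cofactors", 0.332, 4.609, False),
    ("Cell envelope", 0.804, 13.186, True),
    ("Cellular processes", 0.110, 1.506, False),
    ("Central intermediary metabolism", 0.432, 6.856, False),
    ("Energy metabolism", 0.113, 1.259, False),
    ("Fatty acid metabolism", 0.019, 1.427, False),
    ("Purines and Pyrimidines", 0.519, 2.136, False),
    ("Regulatory functions", 0.018, 0.111, False),
    ("Replication and transcription", 0.073, 0.271, False),
    ("Translation", 0.040, 0.904, False),
    ("Transport and binding", 0.685, 1.670, False),
]

enzyme_class = [
    ("Oxidoreductase", 0.143, 0.685, False),
    ("Transferase", 0.201, 0.582, False),
    ("Hydrolase", 0.329, 1.039, False),
    ("Lyase", 0.054, 1.143, False),
    ("Isomerase", 0.027, 0.856, False),
    ("Ligase", 0.085, 1.661, True),
]

selected_functional = select_predictions(functional_category)
selected_enzyme = select_predictions(enzyme_class)
print(selected_functional)
print(selected_enzyme)
```

The enzyme class block shows the fallback case: no odds score reaches the cutoff, so only the line marked with "=>" remains as a result.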

  • HEXA_HUMAN



The ProtFun Server calculated following prediction result for HEXA_HUMAN:

 Functional category
Functional category Probability Odd score Prediction
Amino acid biosynthesis 0.161 7.331 wrong
Biosynthesis of cofactors 0.332 4.609 right
Cell envelope 0.804 => 13.186 => right
Cellular processes 0.110 1.506 right
Central intermediary metabolism 0.432 6.856 right
Energy metabolism 0.113 1.259 right
Fatty acid metabolism 0.019 1.427 right
Purines and Pyrimidines 0.519 2.136 wrong
Regulatory functions 0.018 0.111 right
Replication and transcription 0.073 0.271 right
Translation 0.040 0.904 right
Transport and binding 0.685 1.670 right
Enzyme/non-enzyme
Enzyme/non-enzyme Probability Odd score Prediction
Enzyme 0.792 => 2.764 => right
Nonenzyme 0.208 0.292 right
Enzyme class
Enzyme class Probability Odd score Prediction
Oxidoreductase (EC 1.-.-.-) 0.143 0.685 right
Transferase (EC 2.-.-.-) 0.201 0.582 right
Hydrolase (EC 3.-.-.-) 0.329 1.039 wrong
Lyase (EC 4.-.-.-) 0.054 1.143 right
Isomerase (EC 5.-.-.-) 0.027 0.856 right
Ligase (EC 6.-.-.-) 0.085 => 1.661 => right
 Gene ontology category
Gene ontology category Probability Odd score Prediction
Signal transducer 0.083 0.389 right
Receptor 0.105 0.617 right
Hormone 0.001 0.206 right
Structural protein 0.010 0.357 right
Transporter 0.024 0.222 right
Ion channel 0.018 0.310 right
Voltage-gated ion channel 0.002 0.082 right
Cation channel 0.010 0.218 right
Transcription 0.058 0.453 right
Transcription regulation 0.026 0.205 right
Stress response 0.004 0.500 right
Immune response 0.014 0.167 right
Growth factor 0.005 0.372 right
Metal ion transport 0.009 0.020 right



  • BACR_HALSA



The ProtFun Server calculated following prediction result for BACR_HALSA:

Functional category
Functional category Probability Odd score Prediction
Amino acid biosynthesis 0.033 1.495 right
Biosynthesis of cofactors 0.186 2.589 wrong
Cell envelope 0.029 0.483 right
Cellular processes 0.051 0.698 right
Central intermediary metabolism 0.045 0.711 right
Energy metabolism 0.138 1.537 right
Fatty acid metabolism 0.016 1.265 right
Purines and Pyrimidines 0.302 1.244 right
Regulatory functions 0.013 0.080 wrong
Replication and transcription 0.019 0.073 right
Translation 0.059 1.339 right
Transport and binding 0.791 => 1.929 => right
Enzyme/non-enzyme
Enzyme/non-enzyme Probability Odd score Prediction
Enzyme 0.199 0.696 right
Nonenzyme 0.801 => 1.122 => right
Enzyme class
Enzyme class Probability Odd score Prediction
Oxidoreductase (EC 1.-.-.-) 0.114 0.549 right
Transferase (EC 2.-.-.-) 0.031 0.091 right
Hydrolase (EC 3.-.-.-) 0.057 0.180 right
Lyase (EC 4.-.-.-) 0.020 0.430 right
Isomerase (EC 5.-.-.-) 0.010 0.321 right
Ligase (EC 6.-.-.-) 0.017 0.625 right
Gene ontology category
Gene ontology category Probability Odd score Prediction
Signal transducer 0.258 1.205 wrong
Receptor 0.355 2.087 right
Hormone 0.001 0.206 right
Structural protein 0.006 0.200 right
Transporter 0.440 => 4.036 => right
Ion channel 0.010 0.169 wrong
Voltage-gated ion channel 0.004 0.172 right
Cation channel 0.078 1.689 right
Transcription 0.026 0.205 right
Transcription regulation 0.028 0.226 right
Stress response 0.012 0.139 right
Immune response 0.011 0.128 right
Growth factor 0.010 0.727 right
Metal ion transport 0.049 0.106 right



  • RET4_HUMAN



The ProtFun Server calculated following prediction result for RET4_HUMAN:

Functional category
Functional category Probability Odd score Prediction
Amino acid biosynthesis 0.017 0.751 right
Biosynthesis of cofactors 0.044 0.610 right
Cell envelope 0.804 => 13.186 => right
Cellular processes 0.075 1.021 wrong
Central intermediary metabolism 0.197 3.128 right
Energy metabolism 0.043 0.475 right
Fatty acid metabolism 0.016 1.265 right
Purines and Pyrimidines 0.275 1.131 right
Regulatory functions 0.013 0.080 right
Replication and transcription 0.022 0.084 right
Translation 0.032 0.721 right
Transport and binding 0.800 1.951 wrong
Enzyme/non-enzyme
Enzyme/non-enzyme Probability Odd score Prediction
Enzyme 0.544 => 1.900 => right
Nonenzyme 0.456 0.639 right
Enzyme class
Enzyme class Probability Odd score Prediction
Oxidoreductase (EC 1.-.-.-) 0.095 0.458 right
Transferase (EC 2.-.-.-) 0.038 0.109 right
Hydrolase (EC 3.-.-.-) 0.235 0.742 right
Lyase (EC 4.-.-.-) 0.059 => 1.264 => wrong
Isomerase (EC 5.-.-.-) 0.010 0.321 right
Ligase (EC 6.-.-.-) 0.017 0.326 right
Gene ontology category
Gene ontology category Probability Odd score Prediction
Signal transducer 0.202 0.942 right
Receptor 0.147 0.862 right
Hormone 0.004 0.667 right
Structural protein 0.002 0.058 right
Transporter 0.025 0.232 right
Ion channel 0.016 0.288 right
Voltage-gated ion channel 0.003 0.148 right
Cation channel 0.010 0.215 right
Transcription 0.027 0.207 right
Transcription regulation 0.025 0.196 right
Stress response 0.161 1.829 right
Immune response 0.239 => 2.813 => wrong
Growth factor 0.023 1.617 right
Metal ion transport 0.009 0.020 right



  • INSL5_HUMAN



The ProtFun Server calculated following prediction result for INSL5_HUMAN:

Functional category
Functional category Probability Odd score Prediction
Amino acid biosynthesis 0.011 0.484 right
Biosynthesis of cofactors 0.040 0.558 right
Cell envelope 0.756 => 12.393 => right
Cellular processes 0.033 0.448 right
Central intermediary metabolism 0.048 0.755 right
Energy metabolism 0.036 0.397 right
Fatty acid metabolism 0.016 1.265 right
Purines and Pyrimidines 0.144 0.592 right
Regulatory functions 0.014 0.087 right
Replication and Transcription 0.020 0.075 right
Translation 0.032 0.735 right
Transport and binding 0.834 2.033 right
Enzyme/non-enzyme
Enzyme/non-enzyme Probability Odd score Prediction
Enzyme 0.209 0.729 right
Nonenzyme 0.791 => 1.109 => right
Enzyme class
Enzyme class Probability Odd score Prediction
Oxidoreductase (EC 1.-.-.-) 0.056 0.268 right
Transferase (EC 2.-.-.-) 0.031 0.091 right
Hydrolase (EC 3.-.-.-) 0.062 0.195 right
Lyase (EC 4.-.-.-) 0.020 0.430 right
Isomerase (EC 5.-.-.-) 0.010 0.321 right
Ligase (EC 6.-.-.-) 0.017 0.327 right
Gene ontology category
Gene ontology category Probability Odd score Prediction
Signal transducer 0.374 1.746 right
Receptor 0.128 0.750 right
Hormone 0.247 => 37.936 => right
Structural protein 0.001 0.041 right
Transporter 0.025 0.228 right
Ion channel 0.010 0.168 right
Voltage-gated ion channel 0.003 0.131 right
Cation channel 0.010 0.215 right
Transcription 0.054 0.425 right
Transcription regulation 0.091 0.724 right
Stress response 0.099 1.128 right
Immune response 0.178 2.090 wrong
Growth factor 0.061 4.379 wrong
Metal ion transport 0.009 0.020 right



  • LAMP1_HUMAN



The ProtFun Server calculated following prediction result for LAMP1_HUMAN:

Functional category
Functional category Probability Odd score Prediction
Amino acid biosynthesis 0.011 0.484 right
Biosynthesis of cofactors 0.053 0.735 right
Cell envelope 0.804 => 13.186 => right
Cellular processes 0.027 0.373 right
Central intermediary metabolism 0.138 2.188 right
Energy metabolism 0.037 0.411 right
Fatty acid metabolism 0.016 1.265 right
Purines and Pyrimidines 0.533 2.195 wrong
Regulatory functions 0.015 0.090 right
Replication and transcription 0.019 0.073 right
Translation 0.027 0.613 right
Transport and binding 0.834 2.033 right
Enzyme/non-enzyme
Enzyme/non-enzyme Probability Odd score Prediction
Enzyme 0.276 0.965 right
Nonenzyme 0.724 => 1.014 => right
Enzyme class
Enzyme class Probability Odd score Prediction
Oxidoreductase (EC 1.-.-.-) 0.039 0.187 right
Transferase (EC 2.-.-.-) 0.046 0.134 right
Hydrolase (EC 3.-.-.-) 0.058 0.184 right
Lyase (EC 4.-.-.-) 0.020 0.430 right
Isomerase (EC 5.-.-.-) 0.010 0.321 right
Ligase (EC 6.-.-.-) 0.017 0.326 right
Gene ontology category
Gene ontology category Probability Odd score Prediction
Signal transducer 0.396 1.849 right
Receptor 0.282 1.659 right
Hormone 0.001 0.206 right
Structural protein 0.011 0.408 right
Transporter 0.024 0.222 right
Ion channel 0.008 0.147 right
Voltage-gated ion channel 0.002 0.111 right
Cation channel 0.010 0.215 right
Transcription 0.032 0.247 right
Transcription regulation 0.018 0.142 right
Stress response 0.246 2.795 right
Immune response 0.371 => 4.368 => right
Growth factor 0.013 0.956 right
Metal ion transport 0.009 0.020 right



  • A4_HUMAN



The ProtFun Server calculated following prediction result for A4_HUMAN:

Functional category
Functional category Probability Odd score Prediction
Amino acid biosynthesis 0.020 0.921 right
Biosynthesis of cofactors 0.261 3.623 right
Cell envelope 0.804 => 13.186 => right
Cellular processes 0.053 0.070 right
Central intermediary metabolism 0.184 2.920 right
Energy metabolism 0.023 0.259 right
Fatty acid metabolism 0.016 1.265 right
Purines and Pyrimidines 0.417 1.716 right
Regulatory functions 0.013 0.084 wrong
Replication and transcription 0.029 0.109 right
Translation 0.027 0.613 right
Transport and binding 0.827 2.016 right
Enzyme/non-enzyme
Enzyme/non-enzyme Probability Odd score Prediction
Enzyme 0.392 => 1.368 => right
Nonenzyme 0.608 0.852 right
Enzyme class
Enzyme class Probability Odd score Prediction
Oxidoreductase (EC 1.-.-.-) 0.024 0.114 right
Transferase (EC 2.-.-.-) 0.208 0.603 right
Hydrolase (EC 3.-.-.-) 0.190 0.600 right
Lyase (EC 4.-.-.-) 0.020 0.430 right
Isomerase (EC 5.-.-.-) 0.010 0.324 right
Ligase (EC 6.-.-.-) 0.048 0.946 right
Gene ontology category
Gene ontology category Probability Odd score Prediction
Signal transducer 0.126 0.586 right
Receptor 0.036 0.211 right
Hormone 0.001 0.206 right
Structural protein 0.034 => 1.205 => right
Transporter 0.024 0.222 right
Ion channel 0.009 0.162 right
Voltage-gated ion channel 0.002 0.108 right
Cation channel 0.010 0.215 right
Transcription 0.043 0.335 right
Transcription regulation 0.018 0.143 right
Stress response 0.076 0.862 right
Immune response 0.016 0.183 right
Growth factor 0.005 0.372 right
Metal ion transport 0.009 0.020 right





Comparison of the different methods



It is difficult to compare these methods. First, two of the methods (GOPET and Pfam) are homology-based, whereas ProtFun is an ab initio method, so it is not surprising that the results differ. Second, each method has a different prediction focus and names its results slightly differently. Only GOPET predicts exact GO numbers; the other two methods only predict approximate functions and processes.
Therefore, to compare the results, we calculated the fraction of right predictions and the ratio between right predictions and annotated GO terms.

methods
GOPET terms GOPET GOids Pfam ProtFun
HEXA_HUMAN #true positive 7 7 2 31
#false positive 1 1 0 3
#predictions 8 8 2 34
#GO terms 25
fraction of true positives 0.87 0.87 1 0.91
ratio true positive/annotated GO terms 0.28 0.28 0.08 not possible
BACR_HALSA #true positive 2 1 1 30
#false positive 1 2 0 4
#predictions 3 3 1 34
#GO terms 12
fraction of true positives 0.66 0.33 1 0.88
ratio true positive/annotated GO terms 0.16 0.08 0.08 not possible
RET4_HUMAN #true positive 5 5 1 30
#false positive 3 3 0 4
#predictions 8 8 1 34
#GO terms 41
fraction of true positives 0.62 0.62 1 0.88
ratio true positive/annotated GO terms 0.12 0.12 0.02 not possible
INSL5_HUMAN #true positive 1 1 1 32
#false positive 0 0 0 2
#predictions 1 1 1 34
#GO terms 4
fraction of true positives 1 1 1 0.94
ratio true positive/annotated GO terms 0.25 0.25 0.25 not possible
LAMP1_HUMAN #true positive 0 0 1 33
#false positive 2 2 0 1
#predictions 2 2 1 34
#GO terms 17
fraction of true positives 0 0 1 0.97
ratio true positive/annotated GO terms 0 0 0.05 not possible
A4_HUMAN #true positive 7 7 6 33
#false positive 6 6 0 1
#predictions 13 13 6 34
#GO terms 78
fraction of true positives 0.53 0.53 1 0.97
ratio true positive/annotated GO terms 0.08 0.08 0.07 not possible

As you can see in the table above, each method only predicts a small subset of the annotated GO terms. In general, GOPET seems to be the best method, because it is the only method that predicts the GO terms themselves, it mostly has the best ratio of true positives to annotated GO terms, and it predicts more GO terms than the other methods.
It was not possible to calculate the ratio between true positives and annotated GO terms for ProtFun, because this method has a fixed set of terms and only predicts the probability that the protein belongs to each of them.
In general, GO term prediction does not work very well, and the prediction results only give hints about the function and localization of the protein.
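
The two measures used in the comparison table can be reproduced with a few lines of Python. This is a sketch under the assumption that the first measure is the number of right predictions divided by the number of predictions, and the second is the number of right predictions divided by the number of annotated GO terms; the function name is hypothetical. The example values are the HEXA_HUMAN entries for GOPET and Pfam.

```python
def evaluate(true_positive: int, predictions: int, annotated: int):
    """Return (fraction of right predictions, ratio of right predictions to annotated terms)."""
    fraction_right = true_positive / predictions if predictions else 0.0
    ratio_annotated = true_positive / annotated if annotated else 0.0
    return fraction_right, ratio_annotated


# HEXA_HUMAN: 25 annotated GO terms in total.
gopet_fraction, gopet_ratio = evaluate(true_positive=7, predictions=8, annotated=25)
pfam_fraction, pfam_ratio = evaluate(true_positive=2, predictions=2, annotated=25)

print(gopet_fraction, gopet_ratio)
print(pfam_fraction, pfam_ratio)
```

The table appears to cut the values off after two decimal places instead of rounding them, which is why 7/8 = 0.875 is listed as 0.87.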