Sequence-based predictions HEXA
General Information
Secondary Structure Prediction
To analyse the secondary structure of our protein, we used PSIPRED, Jpred3 and DSSP. In the analysis section of this page, we compare these three methods to see whether they give similar results or differ strongly.
[Here] you can find some general information about these methods.
Prediction of disordered regions
After analysing the secondary structure, we also looked at disordered regions in this protein. For this, we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we compare the different methods and variants to see whether their predictions are similar, and we decide which method seems best suited for our purpose.
To get more insight into the methods and the theory behind them, we also offer a [general information page].
Prediction of transmembrane helices and signal peptides
The third big analysis section is the prediction of transmembrane helices and signal peptides. We merged both predictions into one section, because several prediction methods can predict both.
We used several methods: some predict only transmembrane helices, some predict only signal peptides, and some are combined methods.
For a closer look at the different methods, we again provide an [information page.]
Prediction of GO Terms
The last section is about the analysis of GO Terms. As before, we used several methods and compared them to each other.
Again, we also provide a [general information page] about the GO term methods we used in our analysis.
Secondary Structure prediction
Results
The detailed output of the different prediction methods can be found [here].
Here we only present a short summary of the output of the different methods.
- Predicted Helices
method | #helices |
PSIPRED | 14 |
Jpred3 | 14 |
DSSP | 16 |
- Predicted Beta-Sheets
method | #sheets |
PSIPRED | 15 |
Jpred3 | 15 |
DSSP | 0 |
Comparison of the different methods
To determine how successful our secondary structure predictions with PSIPRED and Jpred3 are, we compared them with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets, whereas both prediction methods predict some. Therefore, the main comparison in this case refers to the alpha-helices.
For PSIPRED, the prediction of the alpha-helices was good. In most cases, the alpha-helices of DSSP and PSIPRED correspond. Only one helix predicted by PSIPRED is not assigned as a helix by DSSP, and three helices assigned by DSSP were not predicted by PSIPRED. Most of the helices that appear in only one of the two outputs are very short.
For Jpred3, the prediction of the alpha-helices was sufficiently good. In most cases, it agrees with DSSP. Only two helices predicted by Jpred3 are not assigned by DSSP. Conversely, three short helices assigned by DSSP are not predicted by Jpred3. In one further case, DSSP assigns two helices separated by a turn where Jpred3 predicts one long helix.
All in all, the prediction of the helices is probably good, because the predicted helices mostly correspond to the DSSP assignment. The only negative aspect is that both prediction methods predict many sheets that DSSP does not assign at all.
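The helix comparison described here can also be expressed per residue. The following is a minimal sketch, assuming that both the prediction and the DSSP assignment have been reduced to three-state strings of equal length (`H` helix, `E` strand, `C` other). The function names and the toy strings are hypothetical and are not the actual HEXA_HUMAN output.

```python
def count_segments(states: str, symbol: str) -> int:
    """Count maximal runs of `symbol` in a per-residue state string."""
    count = 0
    previous = ""
    for state in states:
        if state == symbol and previous != symbol:
            count += 1
        previous = state
    return count


def helix_agreement(predicted: str, assigned: str) -> float:
    """Fraction of assigned helix residues that are also predicted as helix."""
    if len(predicted) != len(assigned):
        raise ValueError("both strings must cover the same residues")
    helix_positions = [i for i, state in enumerate(assigned) if state == "H"]
    if not helix_positions:
        return 0.0
    hits = sum(1 for i in helix_positions if predicted[i] == "H")
    return hits / len(helix_positions)


# Toy strings (hypothetical, 20 residues), not the real HEXA_HUMAN results.
predicted = "CCHHHHCCEEECCHHHHHHC"
assigned = "CCHHHHCCCCCCCHHHCHHC"

print(count_segments(predicted, "H"))  # number of helices in the prediction
print(count_segments(assigned, "H"))   # number of helices in the assignment
print(helix_agreement(predicted, assigned))
```

In the toy example, the assignment contains two helices separated by a single non-helix residue where the prediction has one long helix, which is the kind of case described for Jpred3. Counting segments alone would report a difference there even though the per-residue agreement is high.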
Prediction of disordered regions
Before analysing the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the DisProt database and did not find it, so we assume that our protein does not have any disordered regions. Another way to find out whether a protein has disordered regions is to check whether its UniProt entry contains a cross-reference to DisProt.
Results
The detailed results of the different methods can be found [here].
In this section, we only want to give a summary of the output of the different methods.
method | #disordered regions in the protein | #disordered regions on the brink |
Disopred | 0 | 2 |
POODLE-I | 3 | 2 |
POODLE-L | 0 | 0 |
POODLE-S (B-factors) | 3 | 2 |
POODLE-S (missing residues) | 4 | 2 |
IUPred (short) | 0 | 2 |
IUPred (long) | 0 | 0 |
IUPred (structural information) | 0 | 0 |
Meta-Disorder | 0 | 0 |
Comparison of the different POODLE variants
POODLE-L doesn't find any disordered regions. This is the result we expected, because our protein doesn't possess any disordered regions.
Both POODLE-S variants found several short disordered regions, which are false positives. Interestingly, the variant based on residues missing from the electron density map predicts more disordered regions than the variant based on residues with high B-factor values.
POODLE-I found the same result as POODLE-S (high B-factor), which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor).
Therefore, the predicted short disordered regions are wrong. Only the prediction of POODLE-L is correct.
In general, these predictions are used when nothing is known about the protein, so normally we would not know that a prediction is wrong. Because of that, we take the results at face value here and check whether the predicted disordered regions overlap with the functionally important residues, since disordered regions often seem to be functionally important. We check this for POODLE-S (missing residues) and POODLE-I, because POODLE-S (high B-factor) shows the same result as POODLE-I.
functional residues | disordered | |||
---|---|---|---|---|
residue position | amino acid | function | POODLE-S (missing) | POODLE-I |
323 | E | active site | ordered | ordered |
115 | N | Glycosylation | ordered | ordered |
157 | N | Glycosylation | ordered | ordered |
259 | N | Glycosylation | ordered | ordered |
58 (connected with 104) | C | Disulfide bond | disordered | ordered |
104 (connected with 58) | C | Disulfide bond | disordered | ordered |
277 (connected with 328) | C | Disulfide bond | ordered | ordered |
328 (connected with 277) | C | Disulfide bond | ordered | ordered |
505 (connected with 522) | C | Disulfide bond | ordered | ordered |
522 (connected with 505) | C | Disulfide bond | ordered | ordered |
As the table above shows, only one disulfide bond is located in a predicted disordered region; all other functionally important residues are located in ordered regions. This is a further hint that the predictions are wrong.
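The overlap check behind this table can be sketched as follows. The residue positions are taken from the table, but the disordered region boundaries in `example_regions` are hypothetical placeholders, because the exact boundaries are only given in the detailed method output; the function names are our own illustration as well.

```python
# Functionally important residues of HEXA_HUMAN, as listed in the table.
FUNCTIONAL_RESIDUES = {
    323: "active site",
    115: "glycosylation",
    157: "glycosylation",
    259: "glycosylation",
    58: "disulfide bond",
    104: "disulfide bond",
    277: "disulfide bond",
    328: "disulfide bond",
    505: "disulfide bond",
    522: "disulfide bond",
}


def is_disordered(position, regions):
    """True if the 1-based position lies inside any (start, end) region."""
    return any(start <= position <= end for start, end in regions)


def disordered_functional_residues(residues, regions):
    """Sorted positions of functional residues that fall into a region."""
    return sorted(pos for pos in residues if is_disordered(pos, regions))


# HYPOTHETICAL region boundaries, chosen only to illustrate the check.
example_regions = [(1, 10), (55, 60), (100, 106)]

print(disordered_functional_residues(FUNCTIONAL_RESIDUES, example_regions))
```

With the real region boundaries of a method in place of `example_regions`, the returned list corresponds to the rows marked "disordered" in that method's column.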
Comparison of the different methods
We also compared the results of the different methods. For this, we counted how many residues each method predicts as disordered, which is wrong in our case.
methods | |||||||||
Disopred | POODLE-I | POODLE-L | POODLE-S (missing) | POODLE-S (B-factor) | IUPred (short) | IUPred (long) | IUPred (structure) | Meta-Disorder | |
#wrong predicted residues | 5 | 23 | 0 | 47 | 24 | 3 | 0 | 0 | 0 |
POODLE-L, IUPred (long), IUPred (structure) and Meta-Disorder predict no disordered residues, which is correct.
The worst prediction result came from POODLE-S (missing), which predicts 47 residues as disordered, followed by POODLE-S (B-factor) (24 wrongly predicted residues) and POODLE-I (23 wrongly predicted residues).
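The ranking can be reproduced from the counts in the table. This is a small sketch: the counts are copied from the table above, the sequence length of 529 is the HEXA_HUMAN length used in the topology table on this page, and the function names are hypothetical.

```python
SEQUENCE_LENGTH = 529  # length of HEXA_HUMAN

# Residues predicted as disordered (all counted as wrong for this protein).
wrong_residues = {
    "Disopred": 5,
    "POODLE-I": 23,
    "POODLE-L": 0,
    "POODLE-S (missing)": 47,
    "POODLE-S (B-factor)": 24,
    "IUPred (short)": 3,
    "IUPred (long)": 0,
    "IUPred (structure)": 0,
    "Meta-Disorder": 0,
}


def rank_methods(counts):
    """Methods sorted from most to fewest wrongly predicted residues."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def fraction_wrong(count, length=SEQUENCE_LENGTH):
    """Share of the sequence that was wrongly predicted as disordered."""
    return count / length


for method, count in rank_methods(wrong_residues):
    print(f"{method:22s} {count:3d} {fraction_wrong(count):.1%}")
```

Relating the counts to the sequence length puts the errors into perspective: even the worst variant mislabels less than a tenth of the residues.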
Prediction of transmembrane alpha-helices and signal peptides
Because most of the proteins we used in this practical are not membrane proteins, we got five additional proteins for the transmembrane and signal peptide analyses.
Additional proteins:
name | organism | location | transmembrane protein | sequence |
BACR_HALSA | Halobacterium salinarium (Archaea) | Cell membrane | Multi-pass membrane protein | [P02945.fasta] |
RET4_HUMAN | Human (Homo sapiens) | extracellular space | No | [P02753.fasta] |
INSL5_HUMAN | Human (Homo sapiens) | extracellular region | No | [Q9Y5Q6.fasta] |
LAMP1_HUMAN | Human (Homo sapiens) | Cell membrane | Single-pass membrane protein | [P11279.fasta] |
A4_HUMAN | Human (Homo sapiens) | Cell membrane | Single-pass membrane protein | [P05067.fasta] |
The detailed output for the different proteins and the different prediction methods can be found here:
- [HEXA_HUMAN]
- [BACR_HALSA]
- [RET4_HUMAN]
- [INSL5_HUMAN]
- [LAMP1_HUMAN]
- [A4_HUMAN]
Results
Transmembrane Helices
TMHMM | Phobius | PolyPhobius | OCTOPUS | SPOCTOPUS | |||||||||||
protein | start position | end position | location | start position | end position | location | start position | end position | location | start position | end position | location | start position | end position | location |
HEXA HUMAN | 1 | 529 | outside | 23 | 529 | outside | 20 | 520 | outside | 1 | 2 | inside | 22 | 529 | outside |
3 | 23 | TM helix | |||||||||||||
24 | 529 | outside | |||||||||||||
BACR HALSA | 1 | 22 | outside | 1 | 22 | outside | 1 | 22 | outside | ||||||
23 | 42 | TM Helix | 23 | 42 | TM helix | 22 | 43 | TM helix | 23 | 43 | TM helix | 23 | 43 | TM helix | |
43 | 54 | inside | 43 | 53 | inside | 44 | 54 | inside | 44 | 54 | inside | 44 | 54 | inside | |
55 | 77 | TM Helix | 54 | 76 | TM helix | 55 | 77 | TM helix | 55 | 75 | TM helix | 55 | 75 | TM helix | |
78 | 91 | outside | 77 | 95 | outside | 78 | 94 | outside | 76 | 95 | outside | 76 | 95 | outside | |
92 | 114 | TM Helix | 96 | 114 | TM helix | 95 | 114 | TM helix | 96 | 116 | TM helix | 96 | 116 | TM helix | |
115 | 120 | inside | 115 | 120 | inside | 115 | 120 | inside | 117 | 121 | inside | 117 | 120 | inside | |
121 | 143 | TM Helix | 121 | 142 | TM helix | 121 | 141 | TM helix | 122 | 142 | TM helix | 121 | 141 | TM helix | |
144 | 147 | outside | 143 | 147 | outside | 142 | 147 | outside | 143 | 147 | outside | 142 | 147 | outside | |
148 | 170 | TM Helix | 148 | 169 | TM helix | 148 | 166 | TM helix | 148 | 168 | TM helix | 148 | 168 | TM helix | |
171 | 189 | inside | 170 | 189 | inside | 167 | 186 | inside | 169 | 185 | inside | 169 | 185 | inside | |
190 | 212 | TM Helix | 190 | 212 | TM helix | 187 | 205 | TM helix | 186 | 206 | TM helix | 186 | 206 | TM helix | |
213 | 262 | outside | 213 | 217 | outside | 206 | 215 | outside | 207 | 216 | outside | 207 | 216 | outside | |
218 | 237 | TM helix | 216 | 237 | TM helix | 217 | 237 | TM helix | 217 | 237 | TM helix | ||||
238 | 262 | inside | 238 | 262 | inside | 238 | 262 | inside | 238 | 262 | inside | ||||
RET4 HUMAN | 1 | 1 | inside | ||||||||||||
2 | 23 | TM helix | |||||||||||||
1 | 201 | outside | 19 | 201 | outside | 19 | 201 | outside | 24 | 201 | outside | 20 | 201 | outside | |
INSL5 HUMAN | 1 | 1 | inside | ||||||||||||
2 | 32 | TM helix | |||||||||||||
1 | 135 | outside | 23 | 135 | outside | 23 | 135 | outside | 33 | 135 | outside | 24 | 135 | outside | |
LAMP1 HUMAN | 1 | 10 | inside | 1 | 10 | inside | |||||||||
11 | 33 | TM Helix | 11 | 31 | TM helix | ||||||||||
34 | 383 | outside | 29 | 381 | outside | 29 | 381 | outside | 32 | 383 | outside | 30 | 383 | outside | |
384 | 406 | TM Helix | 382 | 405 | TM helix | 382 | 405 | TM helix | 384 | 404 | TM helix | 384 | 404 | TM helix | |
407 | 417 | inside | 406 | 417 | outside | 406 | 417 | outside | 405 | 417 | outside | 405 | 417 | outside | |
A4 HUMAN | 1 | 5 | outside | ||||||||||||
6 | 11 | R | |||||||||||||
1 | 700 | outside | 18 | 700 | outside | 18 | 700 | outside | 12 | 701 | outside | 19 | 701 | outside | |
701 | 723 | TM Helix | 701 | 723 | TM helix | 701 | 723 | TM helix | 702 | 722 | TM helix | 702 | 722 | TM helix | |
724 | 770 | inside | 724 | 770 | inside | 724 | 770 | inside | 723 | 770 | inside | 723 | 770 | inside |
The table above summarises the results of the different methods that predict transmembrane helices.
Signal Peptide
Phobius | PolyPhobius | SPOCTOPUS | TargetP | SignalP | |||||
protein | start position | end position | start position | end position | start position | end position | location | start position | end position |
HEXA HUMAN | 1 | 22 | 1 | 19 | 7 | 21 | secretory pathway | 1 | 22 |
BACR HALSA | no prediction available | secretory pathway | 1 | 38 | |||||
RET4 HUMAN | 1 | 18 | 1 | 18 | 6 | 19 | secretory pathway | 1 | 18 |
INSL5 HUMAN | 1 | 22 | 1 | 22 | 6 | 23 | secretory pathway | 1 | 22 |
LAMP1 HUMAN | 1 | 28 | 1 | 28 | 12 | 29 | secretory pathway | 1 | 28 |
A4 HUMAN | 1 | 17 | 1 | 17 | 5 | 18 | secretory pathway | 1 | 15 |
The table above lists the signal peptide predictions of the different methods.
Comparison of the different methods
We decided to split the comparison of the methods, because it is unfair to directly compare a method that cannot predict signal peptides with one that can. Therefore, we split the comparison into one for transmembrane helices, one for signal peptides and one for the combination of both.
- Comparison of transmembrane helix prediction
Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison, we skipped the first residues, which are signal peptides, because the transmembrane-only prediction methods tend to predict these regions as transmembrane helices, which is wrong.
For this comparison, we counted the wrongly predicted transmembrane residues, the wrongly predicted outside residues and the wrongly predicted inside residues.
methods | |||||||
TMHMM | Phobius | PolyPhobius | OCTOPUS | SPOCTOPUS | Transmembrane protein | ||
HEXA_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 0 | 0 | no |
#wrong outside | 0 | 0 | 0 | 0 | 0 | ||
#wrong inside | 0 | 0 | 0 | 0 | 0 | ||
#wrong sum | 0 | 0 | 0 | 0 | 0 | ||
%wrong predicted | 0% | 0% | 0% | 0% | 0% | ||
BACR_HALSA | #wrong transmembrane | 24 | 20 | 12 | 16 | 11 | yes (7 transmembrane helices) |
#wrong outside | 46 | 5 | 3 | 4 | 6 | ||
#wrong inside | 4 | 4 | 2 | 0 | 0 | ||
#wrong sum | 74 | 29 | 17 | 20 | 17 | ||
%wrong predicted | 29% | 11% | 6% | 8% | 6% | ||
RET4_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 5 | 0 | no |
#wrong outside | 0 | 0 | 0 | 0 | 0 | ||
#wrong inside | 0 | 0 | 0 | 0 | 0 | ||
#wrong sum | 0 | 0 | 0 | 5 | 0 | ||
%wrong predicted | 0% | 0% | 0% | 2% | 0% | ||
INSL5_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 10 | 0 | no |
#wrong outside | 0 | 0 | 0 | 0 | 0 | ||
#wrong inside | 0 | 0 | 0 | 0 | 0 | ||
#wrong sum | 0 | 0 | 0 | 10 | 0 | ||
%wrong predicted | 0% | 0% | 0% | 8% | 0% | ||
LAMP1_HUMAN | #wrong transmembrane | 5 | 3 | 4 | 3 | 1 | yes (single-spanning) |
#wrong outside | 2 | 0 | 0 | 1 | 1 | ||
#wrong inside | 0 | 0 | 0 | 1 | 1 | ||
#wrong sum | 7 | 3 | 4 | 5 | 3 | ||
%wrong predicted | 2% | 0% | 1% | 1% | 0% | ||
A4_HUMAN | #wrong transmembrane | 0 | 0 | 0 | 0 | 0 | yes (single-spanning) |
#wrong outside | 1 | 1 | 1 | 1 | 2 | ||
#wrong inside | 0 | 0 | 0 | 1 | 1 | ||
#wrong sum | 1 | 1 | 1 | 2 | 3 | ||
%wrong predicted | 0% | 0% | 0% | 0% | 0% | ||
Average number of wrong predicted residues | |||||||
13.6 | 5.5 | 3.6 | 7 | 3.8 |
TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, where TMHMM is the only prediction method that does not recognize all 7 transmembrane helices.
SPOCTOPUS and PolyPhobius are the best prediction methods.
In general, the prediction of transmembrane helices works quite well, and almost all predictions are very close to the true topology.
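The averages in the last row of the table can be recomputed from the "#wrong sum" rows. The sketch below copies those sums from the table (order: HEXA_HUMAN, BACR_HALSA, RET4_HUMAN, INSL5_HUMAN, LAMP1_HUMAN, A4_HUMAN); the variable names are our own.

```python
# "#wrong sum" per protein, copied from the comparison table.
wrong_sum = {
    "TMHMM": [0, 74, 0, 0, 7, 1],
    "Phobius": [0, 29, 0, 0, 3, 1],
    "PolyPhobius": [0, 17, 0, 0, 4, 1],
    "OCTOPUS": [0, 20, 5, 10, 5, 2],
    "SPOCTOPUS": [0, 17, 0, 0, 3, 3],
}

# Average number of wrongly predicted residues over the six proteins.
averages = {method: sum(values) / len(values) for method, values in wrong_sum.items()}

for method, average in sorted(averages.items(), key=lambda item: item[1]):
    print(f"{method:12s} {average:.2f}")
```

The values in the table seem to be cut off after the first decimal place rather than rounded, so small differences to the recomputed averages are expected. The ordering of the methods is the same either way.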
- Comparison of signal peptide prediction
Next, we compared TargetP and SignalP, which can only predict signal peptides, with SPOCTOPUS, Phobius and PolyPhobius.
TargetP does not predict the start and end position of the signal peptide, instead it predicts only the location of the protein.
methods | |||||||
real position | Phobius | PolyPhobius | SPOCTOPUS | TargetP | SignalP | ||
HEXA_HUMAN | stop position | 22 | 22 | 19 | 21 | no prediction | 22 |
#wrong residues | 0 | 3 | 3 | no prediction | 0 | ||
location | secretory pathway | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
BACR_HALSA | stop position | not available | no prediction | no prediction | no prediction | no prediction | no consensus prediction |
#wrong predicted | not available | not available | not available | not available | no prediction | not available | |
location | membrane | not available | not available | not available | secretory pathway | non-signal peptide | |
RET4_HUMAN | stop position | 18 | 18 | 18 | 19 | no prediction | 18 |
#wrong predicted | 0 | 0 | 1 | no prediction | 0 | ||
location | secretory pathway | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
INSL5_HUMAN | stop position | 22 | 22 | 22 | 22 | no prediction | 22 |
#wrong residues | 0 | 0 | 0 | no prediction | 0 | ||
location | secretory pathway | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
LAMP1_HUMAN | stop position | 28 | 28 | 28 | 29 | no prediction | 28 |
#wrong residues | 0 | 0 | 1 | no prediction | 0 | ||
location | transmembrane helix | secretory pathway | secretory pathway | no prediction | secretory pathway | no prediction | |
A4_HUMAN | stop position | 17 | 17 | 17 | 18 | no prediction | 17 |
#wrong residues | 0 | 0 | 1 | no prediction | 0 | ||
location | transmembrane helix | secretory pathway | secretory pathway | no prediction | secretory pathway | secretory pathway | |
Average number of wrong prediction | |||||||
sum of wrong predicted residues | 0 | 3 | 2 | no prediction | 0 | ||
#right predicted locations / #predicted locations | 3/5 | 3/5 | no prediction | 3/5 | no prediction |
SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. In addition, SignalP predicts whether the sequence contains a signal peptide at all.
In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.
Therefore, it is difficult to compare the different methods. Phobius and PolyPhobius offer more than the other prediction methods, because they predict both. On average, they predict the location and the position as well as the other prediction methods. None of the methods recognized the transmembrane proteins; all methods predicted them as proteins of the secretory pathway. It is therefore useful to use Phobius or PolyPhobius, because they predict more than the other methods and can also predict transmembrane helices.
The results of Phobius were a little bit better than those of PolyPhobius.
We also want to mention that SignalP lets you choose between predictions for eukaryotes, gram-positive bacteria and gram-negative bacteria. Our analysis also included BACR_HALSA, which is an archaeal protein. We tested all three prediction modes for this protein and all three failed: BACR_HALSA doesn't possess a signal peptide, but every mode predicts one. Only the eukaryotic mode recognized a signal anchor for BACR_HALSA, whereas the other two modes could not give a prediction of the location.
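The "#wrong residues" values for the stop position can be illustrated with a short sketch. It assumes that the error is the absolute difference between the real and the predicted stop position; this counting rule is our assumption, since the page does not spell it out. The stop positions for Phobius and PolyPhobius are copied from the comparison table, and the function names are hypothetical.

```python
# Real signal peptide stop positions, from the comparison table.
REAL_STOP = {
    "HEXA_HUMAN": 22,
    "RET4_HUMAN": 18,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
    "A4_HUMAN": 17,
}

# Predicted stop positions, from the same table.
PREDICTED_STOP = {
    "Phobius": {"HEXA_HUMAN": 22, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
    "PolyPhobius": {"HEXA_HUMAN": 19, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                    "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
}


def stop_error(real_stop, predicted_stop):
    """Assumed rule: residues between the real and the predicted stop."""
    return abs(real_stop - predicted_stop)


def total_stop_error(predictions, real_stops):
    """Sum of stop position errors over all proteins with a real stop."""
    return sum(stop_error(real_stops[p], predictions[p]) for p in real_stops)


for method, predictions in PREDICTED_STOP.items():
    print(method, total_stop_error(predictions, REAL_STOP))
```

Under this assumption, the totals for Phobius and PolyPhobius agree with the "sum of wrong predicted residues" row of the table. BACR_HALSA is left out because it has no real signal peptide.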
- Comparison of the combined methods
The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore, we combined our two previous comparisons.
methods | ||||
Phobius | PolyPhobius | SPOCTOPUS | ||
HEXA_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 0 | 3 | 2 | |
location | right | right | no prediction | |
BACR_HALSA | #wrong predicted residues (TM) | 29 | 17 | 17 |
#wrong predicted residues (SP) | n.a. | n.a. | n.a. | |
location | n.a | n.a | no prediction | |
RET4_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 0 | 0 | 0 | |
location | right | right | no prediction | |
INSL5_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 0 | 0 | 1 | |
location | right | right | no prediction | |
LAMP1_HUMAN | #wrong predicted residues (TM) | 3 | 4 | 3 |
#wrong predicted residues (SP) | 0 | 0 | 0 | |
location | wrong | wrong | no prediction | |
A4_HUMAN | #wrong predicted residues (TM) | 0 | 0 | 0 |
#wrong predicted residues (SP) | 1 | 1 | 3 | |
location | wrong | wrong | no prediction | |
Average | ||||
avg(#wrong predicted residues (TM)) | 5.3 | 3.5 | 3.3 | |
avg(#wrong predicted residues (SP)) | 0.1 | 0.6 | 1 | |
#location (right predicted) / #location(predicted) | 3/5 | 3/5 | no prediction |
In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position a little worse than Phobius, its transmembrane prediction is considerably better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.
Therefore, PolyPhobius seems to be a good choice; on average, it is the best method for combined transmembrane and signal peptide prediction.
Prediction of GO terms
Before we started our analysis, we checked the GO annotations for the six sequences, which can be found [here]:
A detailed list of the GO annotation terms of each protein can be found [here].
Results
GOPET
We tried to predict the GO annotations with GOPET for our six different proteins.
- HEXA_HUMAN
The method only predicts functional GO terms. HEXA_HUMAN has 8 annotated GO functions, and the method also predicts 8 GO function terms. Therefore, we checked whether all predictions are correct: both whether the general term is correct and whether the GO id is correct.
GO term | confidence | prediction term | prediction GOid |
hexosaminidase activity | 97% | right | wrong |
beta-N-acetylhexosaminidase activity | 96% | right | right |
hydrolase activity | 96% | right | right |
hydrolase activity acting on glycosyl bonds | 96% | right | right |
hydrolase activity hydrolyzing O-glycosyl compounds | 96% | right | right |
catalytic activity | 96% | right | right |
hydrolase activity hydrolyzing N-glycosyl compounds | 78% | wrong | wrong |
protein heterodimerization activity | 61% | right | right |
- BACR_HALSA
The method only predicts functional GO terms. BACR_HALSA has 3 annotated GO functions, and the method also predicts 3 GO function terms. Therefore, we checked whether all predictions are correct.
GO term | confidence | prediction term | prediction GOid |
ion channel activity | 77% | right | right |
G-protein coupled photoreceptor activity | 75% | right | wrong |
hydrogen ion transmembrane transporter activity | 60% | wrong | wrong |
- RET4_HUMAN
The method only predicts functional GO terms. RET4_HUMAN has 7 annotated GO functions. The method predicts 8 GO function terms. Therefore, we checked whether all predictions are correct.
GO term | confidence | prediction term | prediction GOid |
binding | 90% | right | right |
retinoid binding | 81% | right | right |
lipid binding | 80% | wrong | wrong |
retinol binding | 78% | right | right |
transporter activity | 78% | right | right |
retinal binding | 78% | right | right |
lipid transport activity | 69% | wrong | wrong |
high-density lipoprotein particle binding | 60% | wrong | wrong |
- INSL5_HUMAN
The method only predicts functional GO terms. INSL5_HUMAN has 1 annotated GO function, and the method also predicts 1 GO function term. Therefore, we checked whether the prediction is correct.
GO term | confidence | prediction term | prediction GOid |
hormone activity | 80% | right | right |
- LAMP1_HUMAN
The method only predicts functional GO terms. LAMP1_HUMAN has 0 annotated GO functions. The method predicts 2 GO function terms. Therefore, both predictions are wrong.
- A4_HUMAN
The method only predicts functional GO terms. A4_HUMAN has 11 annotated GO functions. The method predicts 13 GO function terms. Therefore, we checked whether all predictions are correct.
GO term | confidence | prediction term | prediction GOid |
endopeptidase inhibitor activity | 87% | right | wrong |
serine-type endopeptidase inhibitor activity | 86% | right | right |
plasmin inhibitor activity | 83% | wrong | wrong |
trypsin inhibitor activity | 83% | wrong | wrong |
peptidase inhibitor activity | 82% | right | right |
binding | 79% | right | right |
protein binding | 74% | right | right |
metal ion binding | 73% | right | right |
DNA binding | 71% | right | right |
heparin binding | 70% | wrong | right |
zinc ion binding | 69% | wrong | wrong |
copper ion binding | 69% | wrong | wrong |
iron ion binding | 67% | wrong | wrong |
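To summarise the GOPET tables, the share of correct term predictions per protein can be computed as below. This is only a sketch: the flags are copied from the "prediction term" column of the tables above (in table order), the two LAMP1_HUMAN predictions are entered as wrong as stated in the text, and the function name `precision` is our own choice.

```python
# True = "right", False = "wrong" in the "prediction term" column.
term_correct = {
    "HEXA_HUMAN": [True, True, True, True, True, True, False, True],
    "BACR_HALSA": [True, True, False],
    "RET4_HUMAN": [True, True, False, True, True, True, False, False],
    "INSL5_HUMAN": [True],
    "LAMP1_HUMAN": [False, False],  # no table; both predictions are wrong
    "A4_HUMAN": [True, True, False, False, True, True, True, True, True,
                 False, False, False, False],
}


def precision(flags):
    """Fraction of predicted terms that were judged to be right."""
    return sum(flags) / len(flags) if flags else 0.0


for protein, flags in term_correct.items():
    print(f"{protein:12s} {sum(flags):2d}/{len(flags):2d} = {precision(flags):.2f}")

all_flags = [flag for flags in term_correct.values() for flag in flags]
print(f"overall      {sum(all_flags)}/{len(all_flags)} = {precision(all_flags):.2f}")
```

The same function could be applied to the "prediction GOid" column to see how often the exact GO id, and not only the general term, was correct.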
Pfam
We used the Pfam web server for our analysis and decided to trust only the significant Pfam-A matches. To check whether the predictions are correct, we mapped the Pfam ids to GO ids with the help of a mapping website [[1]]. If no mapping was available, we compared the name of the predicted Pfam family with the names of the GO terms. If the names were similar or equal, we decided to trust the prediction.
- HEXA_HUMAN
Graphical representation of the prediction result of Pfam:
Pfam found two significant Pfam-A matches:
Family | E-Value | GO id | prediction |
Glycosyl hydrolase family 20, domain 2 | 3.7e-43 | GO:0004553 | right |
Glycosyl hydrolase family 20, catalytic domain | 1.8e-84 | GO:0005975 | right |
- BACR_HALSA
Graphical representation of the prediction result of Pfam:
Pfam found one significant Pfam-A match:
Family | E-Value | GOid | prediction |
Bacteriorhodopsin-like protein | 2e-88 | GO:0005216 | right |
GO:0006811 | right | ||
GO:0016020 | right |
- RET4_HUMAN
Graphical representation of the prediction result of Pfam:
Pfam found one significant Pfam-A match:
Family | E-Value | GOid | prediction |
Lipocalin/cytosolic fatty-acid binding protein family | 1.7e-22 | GO:0005488 | right |
- INSL5_HUMAN
Graphical representation of the prediction result of Pfam:
Pfam found two significant Pfam-A matches:
Family | E-Value | GOid | prediction |
Insulin/IGF/Relaxin family | 6.7e-08 | GO:0005179 | right |
GO:0005576 | right |
- LAMP1_HUMAN
Graphical representation of the prediction result of Pfam:
Pfam found one significant Pfam-A match:
Family | E-Value | GOid | prediction |
Lysosome-associated membrane glycoprotein (LAMP) | 2.3e-135 | GO:0016020 | right |
- A4_HUMAN
Graphical representation of the prediction result of Pfam:
Pfam found six significant Pfam-A matches:
Family | E-Value | GOid | prediction |
Amyloid A4 N-terminal heparin-binding | 4e-42 | none | right |
Copper-binding of amyloid precursor CuBD | 2.3e-27 | none | right |
Kunitz/Bovine pancreatic trypsin inhibitor domain | 3e-19 | GO:0004867 | right |
E2 domain of amyloid precursor protein | 1.6e-74 | none | right |
Beta-amyloid peptide (beta-APP) | 4.3e-28 | GO:0005488 | right |
GO:0016021 | right | ||
Beta-amyloid precursor protein C-terminus | 1.1e-29 | none | right |
ProtFun 2.2
ProtFun 2.2 does not give a clear prediction of whether the protein belongs to a class or not; instead, it gives probabilities and odds scores.
We decided to use a cutoff of 2, so all classes with an odds score of 2 or higher count as predicted for us. You can also find a "=>" sign in the result file. This sign marks the result with the highest information content. We also take this line as a result, even if its odds score is lower than 2. If all results have an odds score lower than 2, the line with this sign is our only result.
Because the prediction categories are very general, it was not possible to map them to GO ids. Therefore, we checked the known GO annotations. If there was a hint for a category and the protein was predicted to be in this category, we counted the prediction as right; if the known GO annotations and the category conflict, we counted the prediction as wrong.
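Our selection rule for the ProtFun output can be written down as a short sketch. The class `ProtFunRow` and the function `select_predictions` are hypothetical helpers and not part of ProtFun; the example rows are the enzyme class values reported for HEXA_HUMAN on this page, with `marked` standing for the "=>" sign.

```python
from typing import List, NamedTuple


class ProtFunRow(NamedTuple):
    name: str
    probability: float
    odds: float
    marked: bool  # True if the line carries the "=>" sign


def select_predictions(rows: List[ProtFunRow], cutoff: float = 2.0) -> List[str]:
    """Keep classes with odds >= cutoff, plus the line marked with '=>'."""
    return [row.name for row in rows if row.odds >= cutoff or row.marked]


# Enzyme class block for HEXA_HUMAN (values from the result table).
enzyme_class = [
    ProtFunRow("Oxidoreductase", 0.143, 0.685, False),
    ProtFunRow("Transferase", 0.201, 0.582, False),
    ProtFunRow("Hydrolase", 0.329, 1.039, False),
    ProtFunRow("Lyase", 0.054, 1.143, False),
    ProtFunRow("Isomerase", 0.027, 0.856, False),
    ProtFunRow("Ligase", 0.085, 1.661, True),
]

print(select_predictions(enzyme_class))
```

In this block no class reaches the cutoff, so only the marked line is kept. In blocks where several classes have an odds score of 2 or higher, all of them are returned together with the marked line.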
- HEXA_HUMAN
The ProtFun server calculated the following prediction result for HEXA_HUMAN:
Functional category | |||
---|---|---|---|
Functional category | Probability | Odds score | Prediction |
Amino acid biosynthesis | 0.161 | 7.331 | wrong |
Biosynthesis of cofactors | 0.332 | 4.609 | right |
Cell envelope | 0.804 => | 13.186 => | right |
Cellular processes | 0.110 | 1.506 | right |
Central intermediary metabolism | 0.432 | 6.856 | right |
Energy metabolism | 0.113 | 1.259 | right |
Fatty acid metabolism | 0.019 | 1.427 | right |
Purines and Pyrimidines | 0.519 | 2.136 | wrong |
Regulatory functions | 0.018 | 0.111 | right |
Replication and transcription | 0.073 | 0.271 | right |
Translation | 0.040 | 0.904 | right |
Transport and binding | 0.685 | 1.670 | right |
Enzyme/non-enzyme | |||
Enzyme/non-enzyme | Probability | Odds score | Prediction |
Enzyme | 0.792 => | 2.764 => | right |
Nonenzyme | 0.208 | 0.292 | right |
Enzyme class | |||
Enzyme class | Probability | Odds score | Prediction |
Oxidoreductase (EC 1.-.-.-) | 0.143 | 0.685 | right |
Transferase (EC 2.-.-.-) | 0.201 | 0.582 | right |
Hydrolase (EC 3.-.-.-) | 0.329 | 1.039 | wrong |
Lyase (EC 4.-.-.-) | 0.054 | 1.143 | right |
Isomerase (EC 5.-.-.-) | 0.027 | 0.856 | right |
Ligase (EC 6.-.-.-) | 0.085 => | 1.661 => | right |
Gene ontology category | |||
Gene ontology category | Probability | Odds score | Prediction |
Signal transducer | 0.083 | 0.389 | right |
Receptor | 0.105 | 0.617 | right |
Hormone | 0.001 | 0.206 | right |
Structural protein | 0.010 | 0.357 | right |
Transporter | 0.024 | 0.222 | right |
Ion channel | 0.018 | 0.310 | right |
Voltage-gated ion channel | 0.002 | 0.082 | right |
Cation channel | 0.010 | 0.218 | right |
Transcription | 0.058 | 0.453 | right |
Transcription regulation | 0.026 | 0.205 | right |
Stress response | 0.004 | 0.500 | right |
Immune response | 0.014 | 0.167 | right |
Growth factor | 0.005 | 0.372 | right |
Metal ion transport | 0.009 | 0.020 | right |
- BACR_HALSA
The ProtFun server calculated the following prediction result for BACR_HALSA:
Functional category | |||
---|---|---|---|
Functional category | Probability | Odds score | Prediction |
Amino acid biosynthesis | 0.033 | 1.495 | right |
Biosynthesis of cofactors | 0.186 | 2.589 | wrong |
Cell envelope | 0.029 | 0.483 | right |
Cellular processes | 0.051 | 0.698 | right |
Central intermediary metabolism | 0.045 | 0.711 | right |
Energy metabolism | 0.138 | 1.537 | right |
Fatty acid metabolism | 0.016 | 1.265 | right |
Purines and Pyrimidines | 0.302 | 1.244 | right |
Regulatory functions | 0.013 | 0.080 | wrong |
Replication and transcription | 0.019 | 0.073 | right |
Translation | 0.059 | 1.339 | right |
Transport and binding | 0.791 => | 1.929 => | right |
Enzyme/non-enzyme | |||
Enzyme/non-enzyme | Probability | Odds score | Prediction |
Enzyme | 0.199 | 0.696 | right |
Nonenzyme | 0.801 => | 1.122 => | right |
Enzyme class | |||
Enzyme class | Probability | Odds score | Prediction |
Oxidoreductase (EC 1.-.-.-) | 0.114 | 0.549 | right |
Transferase (EC 2.-.-.-) | 0.031 | 0.091 | right |
Hydrolase (EC 3.-.-.-) | 0.057 | 0.180 | right |
Lyase (EC 4.-.-.-) | 0.020 | 0.430 | right |
Isomerase (EC 5.-.-.-) | 0.010 | 0.321 | right |
Ligase (EC 6.-.-.-) | 0.017 | 0.625 | right |
Gene ontology category | |||
Gene ontology category | Probability | Odds score | Prediction |
Signal transducer | 0.258 | 1.205 | wrong |
Receptor | 0.355 | 2.087 | right |
Hormone | 0.001 | 0.206 | right |
Structural protein | 0.006 | 0.200 | right |
Transporter | 0.440 => | 4.036 => | right |
Ion channel | 0.010 | 0.169 | wrong |
Voltage-gated ion channel | 0.004 | 0.172 | right |
Cation channel | 0.078 | 1.689 | right |
Transcription | 0.026 | 0.205 | right |
Transcription regulation | 0.028 | 0.226 | right |
Stress response | 0.012 | 0.139 | right |
Immune response | 0.011 | 0.128 | right |
Growth factor | 0.010 | 0.727 | right |
Metal ion transport | 0.049 | 0.106 | right |
- RET4_HUMAN
The ProtFun server calculated the following prediction result for RET4_HUMAN:
Functional category | |||
---|---|---|---|
Functional category | Probability | Odd score | Prediction |
Amino acid biosynthesis | 0.017 | 0.751 | right |
Biosynthesis of cofactors | 0.044 | 0.610 | right |
Cell envelope | 0.804 => | 13.186 => | right |
Cellular processes | 0.075 | 1.021 | wrong |
Central intermediary metabolism | 0.197 | 3.128 | right |
Energy metabolism | 0.043 | 0.475 | right |
Fatty acid metabolism | 0.016 | 1.265 | right |
Purines and Pyrimidines | 0.275 | 1.131 | right |
Regulatory functions | 0.013 | 0.080 | right |
Replication and transcription | 0.022 | 0.084 | right |
Translation | 0.032 | 0.721 | right |
Transport and binding | 0.800 | 1.951 | wrong |
Enzyme/non-enzyme | |||
Enzyme/non-enzyme | Probability | Odd score | Prediction |
Enzyme | 0.544 => | 1.900 => | right |
Nonenzyme | 0.456 | 0.639 | right |
Enzyme class | |||
Enzyme class | Probability | Odd score | Prediction |
Oxidoreductase (EC 1.-.-.-) | 0.095 | 0.458 | right |
Transferase (EC 2.-.-.-) | 0.038 | 0.109 | right |
Hydrolase (EC 3.-.-.-) | 0.235 | 0.742 | right |
Lyase (EC 4.-.-.-) | 0.059 => | 1.264 => | wrong |
Isomerase (EC 5.-.-.-) | 0.010 | 0.321 | right |
Ligase (EC 6.-.-.-) | 0.017 | 0.326 | right |
Gene ontology category | |||
Gene ontology category | Probability | Odd score | Prediction |
Signal transducer | 0.202 | 0.942 | right |
Receptor | 0.147 | 0.862 | right |
Hormone | 0.004 | 0.667 | right |
Structural protein | 0.002 | 0.058 | right |
Transporter | 0.025 | 0.232 | right |
Ion channel | 0.016 | 0.288 | right |
Voltage-gated ion channel | 0.003 | 0.148 | right |
Cation channel | 0.010 | 0.215 | right |
Transcription | 0.027 | 0.207 | right |
Transcription regulation | 0.025 | 0.196 | right |
Stress response | 0.161 | 1.829 | right |
Immune response | 0.239 => | 2.813 => | wrong |
Growth factor | 0.023 | 1.617 | right |
Metal ion transport | 0.009 | 0.020 | right |
- INSL5_HUMAN
The ProtFun server calculated the following prediction results for INSL5_HUMAN:
Functional category | |||
---|---|---|---|
Functional category | Probability | Odd score | Prediction |
Amino acid biosynthesis | 0.011 | 0.484 | right |
Biosynthesis of cofactors | 0.040 | 0.558 | right |
Cell envelope | 0.756 => | 12.393 => | right |
Cellular processes | 0.033 | 0.448 | right |
Central intermediary metabolism | 0.048 | 0.755 | right |
Energy metabolism | 0.036 | 0.397 | right |
Fatty acid metabolism | 0.016 | 1.265 | right |
Purines and Pyrimidines | 0.144 | 0.592 | right |
Regulatory functions | 0.014 | 0.087 | right |
Replication and transcription | 0.020 | 0.075 | right |
Translation | 0.032 | 0.735 | right |
Transport and binding | 0.834 | 2.033 | right |
Enzyme/non-enzyme | |||
Enzyme/non-enzyme | Probability | Odd score | Prediction |
Enzyme | 0.209 | 0.729 | right |
Nonenzyme | 0.791 => | 1.109 => | right |
Enzyme class | |||
Enzyme class | Probability | Odd score | Prediction |
Oxidoreductase (EC 1.-.-.-) | 0.056 | 0.268 | right |
Transferase (EC 2.-.-.-) | 0.031 | 0.091 | right |
Hydrolase (EC 3.-.-.-) | 0.062 | 0.195 | right |
Lyase (EC 4.-.-.-) | 0.020 | 0.430 | right |
Isomerase (EC 5.-.-.-) | 0.010 | 0.321 | right |
Ligase (EC 6.-.-.-) | 0.017 | 0.327 | right |
Gene ontology category | |||
Gene ontology category | Probability | Odd score | Prediction |
Signal transducer | 0.374 | 1.746 | right |
Receptor | 0.128 | 0.750 | right |
Hormone | 0.247 => | 37.936 => | right |
Structural protein | 0.001 | 0.041 | right |
Transporter | 0.025 | 0.228 | right |
Ion channel | 0.010 | 0.168 | right |
Voltage-gated ion channel | 0.003 | 0.131 | right |
Cation channel | 0.010 | 0.215 | right |
Transcription | 0.054 | 0.425 | right |
Transcription regulation | 0.091 | 0.724 | right |
Stress response | 0.099 | 1.128 | right |
Immune response | 0.178 | 2.090 | wrong |
Growth factor | 0.061 | 4.379 | wrong |
Metal ion transport | 0.009 | 0.020 | right |
- LAMP1_HUMAN
The ProtFun server calculated the following prediction results for LAMP1_HUMAN:
Functional category | |||
---|---|---|---|
Functional category | Probability | Odd score | Prediction |
Amino acid biosynthesis | 0.011 | 0.484 | right |
Biosynthesis of cofactors | 0.053 | 0.735 | right |
Cell envelope | 0.804 => | 13.186 => | right |
Cellular processes | 0.027 | 0.373 | right |
Central intermediary metabolism | 0.138 | 2.188 | right |
Energy metabolism | 0.037 | 0.411 | right |
Fatty acid metabolism | 0.016 | 1.265 | right |
Purines and Pyrimidines | 0.533 | 2.195 | wrong |
Regulatory functions | 0.015 | 0.090 | right |
Replication and transcription | 0.019 | 0.073 | right |
Translation | 0.027 | 0.613 | right |
Transport and binding | 0.834 | 2.033 | right |
Enzyme/non-enzyme | |||
Enzyme/non-enzyme | Probability | Odd score | Prediction |
Enzyme | 0.276 | 0.965 | right |
Nonenzyme | 0.724 => | 1.014 => | right |
Enzyme class | |||
Enzyme class | Probability | Odd score | Prediction |
Oxidoreductase (EC 1.-.-.-) | 0.039 | 0.187 | right |
Transferase (EC 2.-.-.-) | 0.046 | 0.134 | right |
Hydrolase (EC 3.-.-.-) | 0.058 | 0.184 | right |
Lyase (EC 4.-.-.-) | 0.020 | 0.430 | right |
Isomerase (EC 5.-.-.-) | 0.010 | 0.321 | right |
Ligase (EC 6.-.-.-) | 0.017 | 0.326 | right |
Gene ontology category | |||
Gene ontology category | Probability | Odd score | Prediction |
Signal transducer | 0.396 | 1.849 | right |
Receptor | 0.282 | 1.659 | right |
Hormone | 0.001 | 0.206 | right |
Structural protein | 0.011 | 0.408 | right |
Transporter | 0.024 | 0.222 | right |
Ion channel | 0.008 | 0.147 | right |
Voltage-gated ion channel | 0.002 | 0.111 | right |
Cation channel | 0.010 | 0.215 | right |
Transcription | 0.032 | 0.247 | right |
Transcription regulation | 0.018 | 0.142 | right |
Stress response | 0.246 | 2.795 | right |
Immune response | 0.371 => | 4.368 => | right |
Growth factor | 0.013 | 0.956 | right |
Metal ion transport | 0.009 | 0.020 | right |
- A4_HUMAN
The ProtFun server calculated the following prediction results for A4_HUMAN:
Functional category | |||
---|---|---|---|
Functional category | Probability | Odd score | Prediction |
Amino acid biosynthesis | 0.020 | 0.921 | right |
Biosynthesis of cofactors | 0.261 | 3.623 | right |
Cell envelope | 0.804 => | 13.186 => | right |
Cellular processes | 0.053 | 0.070 | right |
Central intermediary metabolism | 0.184 | 2.920 | right |
Energy metabolism | 0.023 | 0.259 | right |
Fatty acid metabolism | 0.016 | 1.265 | right |
Purines and Pyrimidines | 0.417 | 1.716 | right |
Regulatory functions | 0.013 | 0.084 | wrong |
Replication and transcription | 0.029 | 0.109 | right |
Translation | 0.027 | 0.613 | right |
Transport and binding | 0.827 | 2.016 | right |
Enzyme/non-enzyme | |||
Enzyme/non-enzyme | Probability | Odd score | Prediction |
Enzyme | 0.392 => | 1.368 => | right |
Nonenzyme | 0.608 | 0.852 | right |
Enzyme class | |||
Enzyme class | Probability | Odd score | Prediction |
Oxidoreductase (EC 1.-.-.-) | 0.024 | 0.114 | right |
Transferase (EC 2.-.-.-) | 0.208 | 0.603 | right |
Hydrolase (EC 3.-.-.-) | 0.190 | 0.600 | right |
Lyase (EC 4.-.-.-) | 0.020 | 0.430 | right |
Isomerase (EC 5.-.-.-) | 0.010 | 0.324 | right |
Ligase (EC 6.-.-.-) | 0.048 | 0.946 | right |
Gene ontology category | |||
Gene ontology category | Probability | Odd score | Prediction |
Signal transducer | 0.126 | 0.586 | right |
Receptor | 0.036 | 0.211 | right |
Hormone | 0.001 | 0.206 | right |
Structural protein | 0.034 => | 1.205 => | right |
Transporter | 0.024 | 0.222 | right |
Ion channel | 0.009 | 0.162 | right |
Voltage-gated ion channel | 0.002 | 0.108 | right |
Cation channel | 0.010 | 0.215 | right |
Transcription | 0.043 | 0.335 | right |
Transcription regulation | 0.018 | 0.143 | right |
Stress response | 0.076 | 0.862 | right |
Immune response | 0.016 | 0.183 | right |
Growth factor | 0.005 | 0.372 | right |
Metal ion transport | 0.009 | 0.020 | right |
Comparison of the different methods
These methods are difficult to compare. First, GOPET and Pfam rely on homology-based prediction, whereas ProtFun predicts ab initio, so the results are expected to differ. Second, each method has a different prediction focus and names its results slightly differently. Only GOPET predicts exact GO identifiers; the other two methods predict only approximate functions and processes.
Therefore, to compare the results, we calculated the fraction of correct predictions among all predictions of a method and the ratio of correct predictions to annotated GO terms.
Protein | Measure | GOPET terms | GOPET GOids | Pfam | ProtFun |
---|---|---|---|---|---|
HEXA_HUMAN | #true positives | 7 | 7 | 2 | 31 |
HEXA_HUMAN | #wrong predictions | 1 | 1 | 0 | 3 |
HEXA_HUMAN | #predictions | 8 | 8 | 2 | 34 |
HEXA_HUMAN | #annotated GO terms | 25 | 25 | 25 | 25 |
HEXA_HUMAN | fraction of true positives | 0.87 | 0.87 | 1 | 0.91 |
HEXA_HUMAN | ratio true positives/annotated GO terms | 0.28 | 0.28 | 0.08 | not possible |
BACR_HALSA | #true positives | 2 | 1 | 1 | 30 |
BACR_HALSA | #wrong predictions | 1 | 2 | 0 | 4 |
BACR_HALSA | #predictions | 3 | 3 | 1 | 34 |
BACR_HALSA | #annotated GO terms | 12 | 12 | 12 | 12 |
BACR_HALSA | fraction of true positives | 0.66 | 0.33 | 1 | 0.88 |
BACR_HALSA | ratio true positives/annotated GO terms | 0.16 | 0.08 | 0.08 | not possible |
RET4_HUMAN | #true positives | 5 | 5 | 1 | 30 |
RET4_HUMAN | #wrong predictions | 3 | 3 | 0 | 4 |
RET4_HUMAN | #predictions | 8 | 8 | 1 | 34 |
RET4_HUMAN | #annotated GO terms | 41 | 41 | 41 | 41 |
RET4_HUMAN | fraction of true positives | 0.62 | 0.62 | 1 | 0.88 |
RET4_HUMAN | ratio true positives/annotated GO terms | 0.12 | 0.12 | 0.02 | not possible |
INSL5_HUMAN | #true positives | 1 | 1 | 1 | 32 |
INSL5_HUMAN | #wrong predictions | 0 | 0 | 0 | 2 |
INSL5_HUMAN | #predictions | 1 | 1 | 1 | 34 |
INSL5_HUMAN | #annotated GO terms | 4 | 4 | 4 | 4 |
INSL5_HUMAN | fraction of true positives | 1 | 1 | 1 | 0.94 |
INSL5_HUMAN | ratio true positives/annotated GO terms | 0.25 | 0.25 | 0.25 | not possible |
LAMP1_HUMAN | #true positives | 0 | 0 | 1 | 33 |
LAMP1_HUMAN | #wrong predictions | 2 | 2 | 0 | 1 |
LAMP1_HUMAN | #predictions | 2 | 2 | 1 | 34 |
LAMP1_HUMAN | #annotated GO terms | 17 | 17 | 17 | 17 |
LAMP1_HUMAN | fraction of true positives | 0 | 0 | 1 | 0.97 |
LAMP1_HUMAN | ratio true positives/annotated GO terms | 0 | 0 | 0.05 | not possible |
A4_HUMAN | #true positives | 7 | 7 | 6 | 33 |
A4_HUMAN | #wrong predictions | 6 | 6 | 0 | 1 |
A4_HUMAN | #predictions | 13 | 13 | 6 | 34 |
A4_HUMAN | #annotated GO terms | 78 | 78 | 78 | 78 |
A4_HUMAN | fraction of true positives | 0.53 | 0.53 | 1 | 0.97 |
A4_HUMAN | ratio true positives/annotated GO terms | 0.08 | 0.08 | 0.07 | not possible |
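The two measures in the table can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `precision_and_coverage` and its parameter names are our own illustrative choices, and the example uses the GOPET counts for HEXA_HUMAN (7 correct out of 8 predictions, 25 annotated GO terms). The table appears to truncate the values to two decimals rather than round them (0.875 is listed as 0.87).

```python
def precision_and_coverage(true_positives, predictions, annotated_terms=None):
    """Return (fraction of correct predictions, ratio of correct predictions
    to annotated GO terms). The second value is None when no number of
    annotated terms is given, as for ProtFun."""
    # Fraction of the method's predictions that were correct
    precision = true_positives / predictions if predictions else 0.0
    # Share of the annotated GO terms that the method recovered
    coverage = None
    if annotated_terms:
        coverage = true_positives / annotated_terms
    return precision, coverage


# HEXA_HUMAN, GOPET: 7 correct out of 8 predictions, 25 annotated GO terms
p, c = precision_and_coverage(7, 8, 25)
print(f"{p:.3f} {c:.3f}")  # 0.875 0.280
```

For ProtFun the third argument is simply omitted, for example `precision_and_coverage(31, 34)`, which leaves the second value undefined.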
As the table above shows, each method predicts only a small subset of the annotated GO terms. Overall, GOPET seems to be the best method: it is the only method that predicts actual GO terms, it mostly achieves the best ratio of true positives to annotated GO terms, and it generally predicts more GO terms than the other methods.
It was not possible to calculate the ratio of true positives to annotated GO terms for ProtFun, because this method uses a fixed set of predefined categories and only predicts the probability that the protein belongs to each of them.
In general, GO term prediction does not work very well; the prediction results only give hints about the function and localization of the protein.
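Because ProtFun always reports the same 34 categories, its column in the comparison table is just a tally of the "right" and "wrong" verdicts in the ProtFun result tables. The sketch below shows one way to do that tally from the pipe-separated rows; the helper name `tally_protfun` is hypothetical, and the sample rows are the enzyme class rows for RET4_HUMAN copied from this page.

```python
def tally_protfun(rows):
    """Count 'right' and 'wrong' verdicts in pipe-separated ProtFun table rows.
    Header and caption rows are ignored because their last cell is neither."""
    counts = {"right": 0, "wrong": 0}
    for row in rows:
        # Split into cells and drop the empty cell after the trailing pipe
        cells = [cell.strip() for cell in row.split("|")]
        cells = [cell for cell in cells if cell]
        if not cells:
            continue
        verdict = cells[-1]
        if verdict in counts:
            counts[verdict] += 1
    return counts


# Enzyme class rows for RET4_HUMAN, as listed in the ProtFun results
rows = [
    "Enzyme class | Probability | Odd score | Prediction |",
    "Oxidoreductase (EC 1.-.-.-) | 0.095 | 0.458 | right |",
    "Transferase (EC 2.-.-.-) | 0.038 | 0.109 | right |",
    "Hydrolase (EC 3.-.-.-) | 0.235 | 0.742 | right |",
    "Lyase (EC 4.-.-.-) | 0.059 => | 1.264 => | wrong |",
    "Isomerase (EC 5.-.-.-) | 0.010 | 0.321 | right |",
    "Ligase (EC 6.-.-.-) | 0.017 | 0.326 | right |",
]
print(tally_protfun(rows))  # {'right': 5, 'wrong': 1}
```

Applied to all 34 rows of a protein, the two counts correspond to the ProtFun entries of the comparison table.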