Difference between revisions of "Sequence-based predictions HEXA"

Revision as of 12:18, 11 August 2011

General Information

Secondary Structure Prediction

To analyse the secondary structure of our protein, we used three methods: PSIPRED, Jpred3 and DSSP. In the analysis section of this page we compare these methods to see whether they give similar results or differ substantially.

[Here] you can find some general information about these methods.

Prediction of disordered regions

After analysing the secondary structure, we also looked at disordered regions in this protein. For this we used DISOPRED, several POODLE variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we compare the different methods and variants to see whether their predictions are similar, and we decide which method seems best suited for our purpose.

To give more insight into the methods and the theory behind them, we also offer a [general information page].

Prediction of transmembrane helices and signal peptides

The third big analysis section is the prediction of transmembrane helices and signal peptides. We merged these two predictions into one section because several methods can predict both.

We used several methods: some predict only transmembrane helices, some predict only signal peptides, and some combine both.

For a closer look at the different methods we again provide an [information page].

Prediction of GO Terms

The last section is about the analysis of GO terms. As before, we used several methods and compared them to each other.

Again, we also provide a [general information page] about the GO term methods we used in our analysis.

Secondary Structure prediction

PSIPRED

PSIPRED offers several output file formats. The pictures show the PDF output, which displays the secondary structure graphically. PSIPRED predicts 14 alpha-helices and 15 beta-sheets; the remaining residues are predicted as coil.

First part of the PSIPRED output

Second part of the PSIPRED output

Legend for the PSIPRED output

Jpred3

The following alignment shows the output of Jpred3. We marked the secondary structure elements ourselves. Jpred3 predicted 14 alpha-helices and 15 beta-sheets.

Jpred output with colored secondary structure elements

DSSP

We ran DSSP on the web server with the PDB ID. Therefore, we got the secondary structure assignment for the whole protein and not only for the alpha-subunit. The following sequence with the corresponding secondary structure is the output for our sequence (extracted from the complete output). DSSP assigned 16 alpha-helices and no beta-sheets.

                    10        20        30        40        50
                     |         |         |         |         |
   1 -   52 LWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLDEAFQRYRDLLFG
   1 -   52   T  TT    T      TTT      TT TT TT HHHHHHHHHHHHHHH
   1 -   52    * ** *                     *
   1 -   52    A  AAAA AAA   A AA A A  AA   AAA        AA  A   A

                  60        70        80        90       100       110
                   |         |         |         |         |         |
  53 -  112 TLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDDQCLLLSETVWGALRGLETFSQLVWK
  53 -  112       SSSSTT   TTT   TT    SSSSSTTT SSSSSTTHHHHHHHHHHHHHHSSS
  53 -  112           * ***
  53 -  112 AA AA A   AAAA  AAA A A A   A    AAA A A A                AA
                 120       130       140       150       160       170
                   |         |         |         |         |         |
 113 -  172 SAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLSSILDTLDVMAYNKLNVFHWHLVDD
 113 -  172  TT  SSS  SSSSS  T TSSSSSSSTTTT   HHHHHHHHHHHHHTT  SSSSS   T
 113 -  172
 113 -  172  AA      A A A A A A             AAA  AA                   A
                 180       190       200       210       220       230
                   |         |         |         |         |         |
 173 -  232 PSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEYARLRGIRVLAEFDTPGHTLSWGP
 173 -  232 T      TT THHHHHHTT TTTT   HHHHHHHHHHHHHTT SSSSS   TTT TTTTT
 173 -  232
 173 -  232 A      AAA A  AA  A A A    A   AA  AA                  A   A
                 240       250       260       270       280       290
                   |         |         |         |         |         |
 233 -  292 GIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFLEVSSVFPDFYLHLGGDEVDFTC
 233 -  292 TTTT SSSSSTTTTSSSSSSSS TT HHHHHHHHHHHHHHHHH  TTSSS    T  THH
 233 -  292
 233 -  292   AA   A AAAAAAAAAA      AAA  A   A  A   AA A A       A AAA
                 300       310       320       330       340       350
                   |         |         |         |         |         |
 293 -  352 WKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYGKGYVVWQEVFDNKVKIQPDTI
 293 -  352 HHH HHHHHHHHHHT TT THHHHHHHHHHHHHHHHTTT SSSSSHHHHHTT    TT S
 293 -  352         *  ****  *
 293 -  352  AA AA AAA AAAAAAAA AA   A  AA  A  AAAA         A  A A AAA
                 360       370       380       390       400       410
                   |         |         |         |         |         |
 353 -  412 IQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYGPDWKDFYVVEPLAFEGTPEQ
 353 -  412 SSS  TTTTT HHHHHHHHHHTT SSSS TT  TTT  TT THHHHHH  TT TT  HHH
 353 -  412       *
 353 -  412    AAAAAAAAAAA   A  AAA A          A AA A  AA  A   AA A AAA
                 420       430       440       450       460       470
                   |         |         |         |         |         |
 413 -  472 KALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKLTSDLTFAYERLSHFRCELL
 413 -  472 HTTSSSSSSSS TTT TTTTHHHHHTTHHHHHHHHHHT TT   HHHHHHHHHHHHHHHH
 413 -  472                                       * **
 413 -  472 AAA              A   A                AAA AAAAA AA   A  AA
                 480       490
                   |         |
 473 -  492 RRGVQAQPLNVGFCEQEFEQ
 473 -  492 HTT     TTT   TT
 473 -  492
 473 -  492 A A A    AA A AA  AA
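
Counting the assigned elements by eye is error-prone, so the count can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the standard DSSP one-letter codes (H = alpha-helix, E = beta-strand, T = turn, S = bend), and the function name count_segments is hypothetical. Only the first block (residues 1-52) is used as input here.

```python
from itertools import groupby


def count_segments(ss: str, code: str) -> int:
    """Count maximal runs of one DSSP code in a secondary-structure string."""
    return sum(1 for key, _run in groupby(ss) if key == code)


# Secondary-structure line for residues 1-52, copied from the output above.
block_1_52 = "  T  TT    T      TTT      TT TT TT HHHHHHHHHHHHHHH"

helices = count_segments(block_1_52, "H")  # alpha-helix segments in this block
strands = count_segments(block_1_52, "E")  # beta-strand segments in this block
print(helices, strands)
```

For the whole protein, the secondary-structure lines of all blocks would have to be padded to their full block length and joined into one string first, so that a helix that continues across a line break is counted only once.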

Discussion

To determine how successful our secondary structure predictions with PSIPRED and Jpred3 are, we compared them with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets, whereas both prediction methods predict some. Therefore, the main comparison in this case concerns the alpha-helices.

For PSIPRED, the prediction of the alpha-helices was good. In most cases the alpha-helices of DSSP and PSIPRED correspond. Only one helix predicted by PSIPRED is not assigned as a helix by DSSP, and three helices assigned by DSSP were not predicted by PSIPRED. Most of the helices that appear in only one of the two outputs are very short.

For Jpred3, the prediction of the alpha-helices was sufficiently good. In most cases it agrees with DSSP. Only two helices predicted by Jpred3 are not assigned by DSSP. Conversely, three short helices assigned by DSSP are not predicted by Jpred3. In one further case, DSSP assigns two helices separated by a turn where Jpred3 predicts a single long helix.

All in all, the helix predictions are good, because they mostly correspond to the DSSP assignment. The only negative aspect is that both prediction methods predict many sheets that DSSP does not assign at all.

Prediction of disordered regions

Before analysing the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the DisProt database and did not find it, so we assume that our protein has no disordered regions. Another way to find out whether a protein has disordered regions is to check whether its UniProt entry contains a cross-reference to DisProt.

Disopred

Disopred predicts two disordered regions in our protein. The first region is at the beginning of the protein (first two residues) and the second is at the end (last three residues). We consider this prediction wrong, because it is normal for the first and last amino acids to be missing from the electron density map. So our protein, hexosaminidase A, has no disordered regions.

Result of the Disopred prediction. * marks an amino acid that belongs to a disordered region, whereas . marks a residue in a non-disordered region.

POODLE

We decided to test several POODLE variants and to compare the results.

POODLE-I

POODLE-I predicted five disordered regions:

start position	end position	length
1	2	2
14	19	6
83	89	7
105	109	5
527	529	3
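
The length column follows from the inclusive residue ranges (end - start + 1). As a small sketch, the lengths and the total number of residues predicted as disordered can be recomputed from the table above. The protein length of 529 is taken from the TMHMM table for HEXA_HUMAN further down this page, and the names used here are hypothetical.

```python
PROTEIN_LENGTH = 529  # length of HEXA_HUMAN, as in the TMHMM table on this page

# (start, end) pairs copied from the POODLE-I table above.
poodle_i_regions = [(1, 2), (14, 19), (83, 89), (105, 109), (527, 529)]


def region_length(start: int, end: int) -> int:
    """Number of residues in an inclusive residue range."""
    return end - start + 1


lengths = [region_length(start, end) for start, end in poodle_i_regions]
total_disordered = sum(lengths)
fraction = total_disordered / PROTEIN_LENGTH
print(lengths, total_disordered, round(fraction, 3))
```

The same function can be applied to the other POODLE tables to check their length columns.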

POODLE-L

POODLE-L found no disordered regions. Therefore, our protein contains no disordered region longer than 40 aa.

POODLE-S (High B-factor residues)

This POODLE-S variant predicts residues with high B-factor values in the crystal structure, which indicate uncertainty in the assigned atom positions.

POODLE-S predicted five disordered regions:

start position	end position	length
1	2	2
13	19	7
83	88	6
105	109	5
526	529	4

POODLE-S (missing residues)

POODLE-S (missing residues) predicts a disordered region if an amino acid is present in the sequence record but missing from the electron density map.

POODLE-S found six disordered regions:

start position	end position	length
17	18	2
53	61	9
78	109	33
153	153	1
280	280	1
345	345	1

Graphical Output:

Prediction of POODLE-S (High B-factor residues)

Prediction of POODLE-S (missing residues)

Prediction of POODLE-I

Prediction of POODLE-L

Comparison of the different POODLE variants:

POODLE-L does not find any disordered regions. This is the result we expected, because our protein does not possess any disordered regions.

Both POODLE-S variants found several short disordered regions, which are false positives. Interestingly, more residues are predicted as missing from the electron density map than as having high B-factor values.

POODLE-I found almost the same result as POODLE-S with high B-factor, which was expected, because POODLE-I combines POODLE-L and POODLE-S (high B-factor).

Therefore, the predictions of short disordered regions are wrong. Only the prediction of POODLE-L is correct.

In general, these predictions are used when nothing is known about the protein, so normally we would not know that a prediction is wrong. For that reason, we take the results at face value and check whether the disordered regions overlap with functionally important residues, because disordered regions are often functionally important. We check this for POODLE-S with missing residues and for POODLE-I, because POODLE-S with high B-factor values shows almost the same result as POODLE-I.

functional residues			disordered
residue position	amino acid	function	POODLE-S (missing)	POODLE-I
323	E	active site	ordered	ordered
115	N	Glycosylation	ordered	ordered
157	N	Glycosylation	ordered	ordered
259	N	Glycosylation	ordered	ordered
58 (connected with 104)	C	Disulfide bond	disordered	ordered
104 (connected with 58)	C	Disulfide bond	disordered	ordered
277 (connected with 328)	C	Disulfide bond	ordered	ordered
328 (connected with 277)	C	Disulfide bond	ordered	ordered
505 (connected with 522)	C	Disulfide bond	ordered	ordered
522 (connected with 505)	C	Disulfide bond	ordered	ordered

As the table above shows, only one disulfide bond is located in a predicted disordered region; all other functionally important residues are located in ordered regions. This is a further hint that the predictions are wrong.
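
The overlap check behind this table can be sketched in Python. The region lists are copied from the POODLE-S (missing residues) and POODLE-I tables above, and the positions from the table of functional residues. The function names are hypothetical and the code only illustrates the lookup; it is not part of any POODLE tool.

```python
# Predicted disordered regions as inclusive (start, end) pairs.
poodle_s_missing = [(17, 18), (53, 61), (78, 109), (153, 153), (280, 280), (345, 345)]
poodle_i = [(1, 2), (14, 19), (83, 89), (105, 109), (527, 529)]

# Active site, glycosylation sites and disulfide-bond cysteines from the table.
functional_positions = [323, 115, 157, 259, 58, 104, 277, 328, 505, 522]


def is_disordered(position, regions):
    """True if the residue position lies inside any predicted region."""
    return any(start <= position <= end for start, end in regions)


def disordered_positions(positions, regions):
    """Return the functional positions that fall into a disordered region."""
    return [p for p in positions if is_disordered(p, regions)]


print(disordered_positions(functional_positions, poodle_s_missing))
print(disordered_positions(functional_positions, poodle_i))
```

Only the two cysteines of the first disulfide bond fall into a POODLE-S (missing residues) region, and no functional residue falls into a POODLE-I region.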

IUPred

We tested the three different IUPred variants, which are offered by the webserver.

IUPred (short)

Result of the IUPred prediction focused on short disordered regions.

As the picture shows, the IUPred variant focused on short disordered regions found a disordered region only at the beginning and at the end of the protein. This may be wrong, because the beginning and the end of a protein often contain regions without defined secondary structure, but also without function.

IUPred (long)

Next we look at the prediction of long disordered regions:

Result of the IUPred prediction focused on long disordered regions.

The picture above shows the result of this prediction. No disordered region is predicted, not even at the beginning or at the end of the protein. This prediction is good, because the HEXA_HUMAN protein does not possess any disordered regions.

IUPred (with structural information)

Finally, we analysed the prediction of IUPred with the additional use of structural information.

Result of the IUPred prediction with additional structural information

As before, the method did not find any disordered regions. Therefore, two of the three IUPred variants give the right result. Only the variant focused on short disordered regions predicted two disordered regions, located at the beginning and at the end of the protein, which we consider wrong.

Meta-Disorder

Meta-Disorder did not predict any disordered region in our protein. Some of the individual methods that Meta-Disorder combines predicted disordered regions, but Meta-Disorder builds a consensus over all of these methods and therefore did not predict any.

Graphical representation of the result:

The result is very good, because HEXA_HUMAN does not have any disordered regions. Therefore, the prediction of Meta-Disorder is right.

Comparison of the different methods

We decided to compare the results of the different methods. To do so, we counted how many residues each method predicts as disordered, which is wrong in our case.

	methods
	Disopred	POODLE-I	POODLE-L	POODLE-S (missing)	POODLE-S (B-factor)	IUPred (short)	IUPred (long)	IUPred (structure)	Meta-Disorder
# wrongly predicted residues	5	23	0	47	24	3	0	0	0

POODLE-L, IUPred (long), IUPred (structure) and Meta-Disorder predict correctly that there are no disordered regions. The worst result came from POODLE-S (missing), which predicts 47 residues as disordered, followed by POODLE-S (B-factor) (24 wrongly predicted residues) and POODLE-I (23 wrongly predicted residues).
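
The ranking can be reproduced directly from the counts in the table above. The following sketch also expresses each count as a share of the protein, assuming the protein length of 529 residues from the TMHMM table for HEXA_HUMAN on this page. The variable names are hypothetical.

```python
PROTEIN_LENGTH = 529  # length of HEXA_HUMAN, as in the TMHMM table on this page

# Number of residues wrongly predicted as disordered, copied from the table.
wrong_residues = {
    "Disopred": 5,
    "POODLE-I": 23,
    "POODLE-L": 0,
    "POODLE-S (missing)": 47,
    "POODLE-S (B-factor)": 24,
    "IUPred (short)": 3,
    "IUPred (long)": 0,
    "IUPred (structure)": 0,
    "Meta-Disorder": 0,
}

# Sort from the most to the fewest wrongly predicted residues.
ranking = sorted(wrong_residues.items(), key=lambda item: item[1], reverse=True)
# Methods that predict no disordered residue at all.
perfect = [name for name, count in wrong_residues.items() if count == 0]

for name, count in ranking:
    print(f"{name:22s} {count:3d}  {count / PROTEIN_LENGTH:.1%}")
```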

Prediction of transmembrane alpha-helices and signal peptides

Because most of the proteins we used in this practical are not membrane proteins, we got five additional proteins for the transmembrane and signal peptide analyses.

Additional proteins:

name	organism	location	transmembrane protein	sequence
BACR_HALSA	Halobacterium salinarium (Archaea)	Cell membrane	Multi-pass membrane protein	[P02945.fasta]
RET4_HUMAN	Human (Homo sapiens)	extracellular space	No	[P02753.fasta]
INSL5_HUMAN	Human (Homo sapiens)	extracellular region	No	[Q9Y5Q6.fasta]
LAMP1_HUMAN	Human (Homo sapiens)	Cell membrane	Single-pass membrane protein	[P11279.fasta]
A4_HUMAN	Human (Homo sapiens)	Cell membrane	Single-pass membrane protein	[P05067.fasta]

TMHMM

We analysed the six sequences with TMHMM.

HEXA_HUMAN

Prediction of TMHMM for the transmembrane helices of HEXA_HUMAN

start position	end position	location
1	529	outside

TMHMM predicts no transmembrane helix at all. The whole protein is predicted to be on the outside, i.e. in the extracellular space. To evaluate this result, we compared the data from UniProt with our prediction.

Comparison between the actual transmembrane helices and the TMHMM result.

As you can see above, the TMHMM prediction is completely right, except for the signal peptide, which TMHMM cannot predict.

BACR_HALSA

Prediction of TMHMM for the transmembrane helices of BACR_HALSA

start position	end position	location
1	22	outside
23	42	TM Helix
43	54	inside
55	77	TM Helix
78	91	outside
92	114	TM Helix
115	120	inside
121	143	TM Helix
144	147	outside
148	170	TM Helix
171	189	inside
190	212	TM Helix
213	262	outside

TMHMM predicts six transmembrane helices for BACR_HALSA. We compared the TMHMM prediction with the actual transmembrane helices in BACR_HALSA:

Comparison between the actual transmembrane helices and the TMHMM result.

The prediction is very good, especially at the beginning, where predicted and actual helices overlap almost completely. Only at the end of the protein does the TMHMM prediction miss one transmembrane helix: the protein actually has seven transmembrane helices, whereas TMHMM predicts only six. This is a serious error, because it makes a difference for the function whether there are six or seven helices (the number of helices determines on which side of the membrane the C-terminus lies). Apart from that, the TMHMM prediction was quite good.
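
The helix count and the resulting topology can be read from the segment table with a short script. This is a sketch that only re-uses the TMHMM rows for BACR_HALSA listed above; the function names are hypothetical and it does not call TMHMM itself.

```python
# (start, end, location) rows copied from the TMHMM table for BACR_HALSA.
tmhmm_bacr = [
    (1, 22, "outside"), (23, 42, "TM Helix"), (43, 54, "inside"),
    (55, 77, "TM Helix"), (78, 91, "outside"), (92, 114, "TM Helix"),
    (115, 120, "inside"), (121, 143, "TM Helix"), (144, 147, "outside"),
    (148, 170, "TM Helix"), (171, 189, "inside"), (190, 212, "TM Helix"),
    (213, 262, "outside"),
]


def is_contiguous(segments):
    """Check that every segment starts right after the previous one ends."""
    return all(prev[1] + 1 == cur[0] for prev, cur in zip(segments, segments[1:]))


def tm_helices(segments):
    """Return the (start, end) ranges of all transmembrane helices."""
    return [(start, end) for start, end, label in segments if label == "TM Helix"]


helices = tm_helices(tmhmm_bacr)
n_terminus = tmhmm_bacr[0][2]
c_terminus = tmhmm_bacr[-1][2]
print(len(helices), n_terminus, c_terminus, is_contiguous(tmhmm_bacr))
```

With six predicted helices, both termini end up on the same side of the membrane ("outside"), which illustrates why a missing seventh helix changes the predicted topology.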

RET4_HUMAN

Prediction of TMHMM for the transmembrane helices of RET4_HUMAN

start position	end position	location
1	201	outside

TMHMM predicts no transmembrane helices. The whole protein is predicted to be located in the extracellular space.

Comparison with the real structure of the protein:

Comparison between the actual transmembrane helices and the TMHMM result.

The TMHMM prediction is completely right. This shows that TMHMM can also recognize that a protein is not a transmembrane protein.

INSL5_HUMAN

Prediction of TMHMM for the transmembrane helices of INSL5_HUMAN

start position	end position	location
1	135	outside

TMHMM predicts no transmembrane helices. The whole protein is predicted to be located in the extracellular space.

Comparison with the real structure of the protein:

Comparison between the actual transmembrane helices and the TMHMM result.

The TMHMM prediction is again completely right.

LAMP1_HUMAN

Prediction of TMHMM for the transmembrane helices of LAMP1_HUMAN

start position	end position	location
1	10	inside
11	33	TM Helix
34	383	outside
384	406	TM Helix
407	417	inside

TMHMM predicts two transmembrane helices, separated by a very long loop located in the extracellular space.

Comparison with the real structure of the protein:

Comparison between the actual transmembrane helices and the TMHMM result.

The prediction of TMHMM is quite good. Only at the beginning of the protein does TMHMM predict one wrong transmembrane helix (which is actually a signal peptide); the rest of the prediction is correct.

A4_HUMAN

Prediction of TMHMM for the transmembrane helices of A4_HUMAN

start position	end position	location
1	700	outside
701	723	TM Helix
724	770	inside

TMHMM predicts one transmembrane helix near the end of the protein. As we already know, A4_HUMAN is a single-pass transmembrane protein, and therefore the number of transmembrane helices is predicted correctly.

Comparison with the real structure of the protein:

Comparison between the actual transmembrane helices and the TMHMM result.

The result of the TMHMM prediction is good. Except for the first residues at the beginning and the exact start position of the transmembrane helix, the prediction is correct.

Phobius and PolyPhobius

HEXA_HUMAN

Prediction of Phobius for the transmembrane helices and signal peptides of HEXA_HUMAN

Prediction of PolyPhobius for the transmembrane helices and signal peptides of HEXA_HUMAN

Signal peptide prediction
Phobius			PolyPhobius
start position	end position	prediction	start position	end position	prediction
1	5	N-Region	1	5	N-Region
6	17	H-Region	6	15	H-Region
18	22	C-Region	16	19	C-Region
Summary signal peptide
1	22	Signal Peptide	1	19	Signal Peptide
Transmembrane helices prediction
23	529	outside	20	520	outside

Neither method predicts a transmembrane helix, which is correct, because HEXA_HUMAN is located in the lysosome. We compared the results of Phobius and PolyPhobius with the real protein.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein

Comparison between the prediction of PolyPhobius and the real protein

The prediction of Phobius is slightly better than that of PolyPhobius, because Phobius predicts the beginning and the end of the signal peptide exactly, whereas PolyPhobius cuts the signal peptide three residues short (it ends at position 19 instead of 22).

BACR_HALSA

Prediction of Phobius for the transmembrane helices and signal peptides of BACR_HALSA

Prediction of PolyPhobius for the transmembrane helices and signal peptides of BACR_HALSA

Signal peptide prediction
Phobius			PolyPhobius
start position	end position	prediction	start position	end position	prediction
No prediction available
Transmembrane helices prediction
23	42	TM helix	22	43	TM helix
43	53	inside	44	54	inside
54	76	TM helix	55	77	TM helix
77	95	outside	78	94	outside
96	114	TM helix	95	114	TM helix
115	120	inside	115	120	inside
121	142	TM helix	121	141	TM helix
143	147	outside	142	147	outside
148	169	TM helix	148	166	TM helix
170	189	inside	167	186	inside
190	212	TM helix	187	205	TM helix
213	217	outside	206	215	outside
218	237	TM helix	216	237	TM helix
238	262	inside	238	262	inside

Neither method predicts a signal peptide, but both recognize that this protein is a transmembrane protein with seven helices. The predictions differ only in the start and end positions of the helices, in most cases by 1 to 3 residues; the largest difference is 7 residues, at the end of the sixth helix (212 vs. 205).
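
The size of the boundary differences can be checked helix by helix. The following sketch uses the helix ranges from the Phobius and PolyPhobius columns of the table above; the function name boundary_shifts is hypothetical.

```python
# Transmembrane helices as inclusive (start, end) pairs, copied from the table.
phobius = [(23, 42), (54, 76), (96, 114), (121, 142), (148, 169), (190, 212), (218, 237)]
polyphobius = [(22, 43), (55, 77), (95, 114), (121, 141), (148, 166), (187, 205), (216, 237)]


def boundary_shifts(helices_a, helices_b):
    """Absolute start and end differences for helices paired by order."""
    if len(helices_a) != len(helices_b):
        raise ValueError("both predictions must contain the same number of helices")
    return [
        (abs(start_a - start_b), abs(end_a - end_b))
        for (start_a, end_a), (start_b, end_b) in zip(helices_a, helices_b)
    ]


shifts = boundary_shifts(phobius, polyphobius)
largest = max(max(pair) for pair in shifts)
for number, (start_shift, end_shift) in enumerate(shifts, start=1):
    print(f"helix {number}: start differs by {start_shift}, end by {end_shift}")
print("largest difference:", largest)
```

Pairing the helices by order only works because both methods predict the same number of helices; otherwise the helices would have to be matched by overlap first.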

To evaluate the predictions, we compared them with the actual transmembrane helices.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein

Comparison between the prediction of PolyPhobius and the real protein

RET4_HUMAN

Prediction of Phobius for the transmembrane helices and signal peptides of RET4_HUMAN

Prediction of PolyPhobius for the transmembrane helices and signal peptides of RET4_HUMAN

Signal peptide prediction
Phobius			PolyPhobius
start position	end position	prediction	start position	end position	prediction
1	2	N-Region	1	3	N-Region
3	13	H-Region	4	13	H-Region
14	18	C-Region	14	18	C-Region
Summary signal peptide
1	18	secretory signal peptide	1	18	secretory signal peptide
Transmembrane helices prediction
19	201	outside	19	201	outside

Both methods predict a signal peptide for the secretory pathway. This result is correct.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein

Comparison between the prediction of PolyPhobius and the real protein

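Both methods predict the same signal peptide and the same topology; only the boundary between the N-region and the H-region differs by one residue.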

INSL5_HUMAN

Prediction of Phobius for the transmembrane helices and signal peptides of INSL5_HUMAN

Prediction of PolyPhobius for the transmembrane helices and signal peptides of INSL5_HUMAN

Signal peptide prediction
Phobius			PolyPhobius
start position	end position	prediction	start position	end position	prediction
1	5	N-Region	1	4	N-Region
6	17	H-Region	5	16	H-Region
18	22	C-Region	17	22	C-Region
Summary signal peptide
1	22	Secretory signal peptide	1	22	Secretory signal peptide
Transmembrane helices prediction
23	135	outside	23	135	outside

Both methods predict a signal peptide for the secretory pathway, and both predict the same signal peptide (positions 1 to 22); only the boundaries of the N-, H- and C-regions differ slightly.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein

Comparison between the prediction of PolyPhobius and the real protein

The complete prediction is correct.

LAMP1_HUMAN

Prediction of Phobius for the transmembrane helices and signal peptides of LAMP1_HUMAN

Prediction of PolyPhobius for the transmembrane helices and signal peptides of LAMP1_HUMAN

Signal peptide prediction
Phobius			PolyPhobius
start position	end position	prediction	start position	end position	prediction
1	10	N-Region	1	9	N-Region
11	22	H-Region	10	22	H-Region
23	28	C-Region	23	28	C-Region
Summary signal peptide
1	28	secretory signal peptide	1	28	secretory signal peptide
Transmembrane helices prediction
29	381	outside	29	381	outside
382	405	TM helix	382	405	TM helix
406	417	outside	406	417	outside

The results of both methods are almost identical.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein

Comparison between the prediction of PolyPhobius and the real protein

The results of both prediction methods agree with each other and, furthermore, with the real protein.

A4_HUMAN

Prediction of Phobius for the transmembrane helices and signal peptides of A4_HUMAN

Prediction of PolyPhobius for the transmembrane helices and signal peptides of A4_HUMAN

Signal peptide prediction
Phobius			PolyPhobius
start position	end position	prediction	start position	end position	prediction
1	1	N-Region	1	3	N-Region
2	12	H-Region	4	12	H-Region
13	17	C-Region	13	17	C-Region
Summary signal peptide
1	17	secretory signal peptide	1	17	secretory signal peptide
Transmembrane helices prediction
18	700	outside	18	700	outside
701	723	TM helix	701	723	TM helix
724	770	inside	724	770	inside

The results of both methods are almost identical.

Comparison with the real structure of the protein:

Comparison between the prediction of Phobius and the real protein

Comparison between the prediction of PolyPhobius and the real protein

The results of both prediction methods agree with each other and, furthermore, with the real protein.

OCTOPUS and SPOCTOPUS

HEXA_HUMAN

Prediction of OCTOPUS for the transmembrane helices of HEXA_HUMAN

Prediction of SPOCTOPUS for the transmembrane helices of HEXA_HUMAN

OCTOPUS			SPOCTOPUS
start position	end position	prediction	start position	end position	prediction
1	2	inside	1	6	N-terminal of a signal peptide
3	23	TM helix	7	21	signal peptide
24	529	outside	22	529	outside

The results of these two predictions differ. OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts a signal peptide at the same location.
To check which method is right, we compared the predictions with the real protein.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein

Comparison between the prediction of SPOCTOPUS and the real protein

SPOCTOPUS gave us the better result, because it recognizes the signal peptide, whereas OCTOPUS predicts a transmembrane helix instead.
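
How strongly the two conflicting annotations coincide can be quantified by the overlap of the two residue ranges. This is a small sketch using the HEXA_HUMAN rows from the table above; the function name overlap is hypothetical, and the signal peptide range is taken here as the N-terminal part (1-6) together with the signal peptide proper (7-21).

```python
def overlap(range_a, range_b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    return max(0, end - start + 1)


octopus_tm_helix = (3, 23)           # OCTOPUS: TM helix
spoctopus_signal_peptide = (1, 21)   # SPOCTOPUS: rows 1-6 and 7-21 combined

shared = overlap(octopus_tm_helix, spoctopus_signal_peptide)
helix_length = octopus_tm_helix[1] - octopus_tm_helix[0] + 1
print(shared, helix_length, round(shared / helix_length, 2))
```

A helix predicted at the very N-terminus that lies almost entirely inside a predicted signal peptide is a typical sign that a method without a signal peptide model has mistaken the hydrophobic core of the signal peptide for a transmembrane helix.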

BACR_HALSA

Prediction of OCTOPUS for the transmembrane helices of BACR_HALSA

Prediction of SPOCTOPUS for the transmembrane helices of BACR_HALSA

OCTOPUS			SPOCTOPUS
start position	end position	prediction	start position	end position	prediction
1	22	outside	1	22	outside
23	43	TM helix	23	43	TM helix
44	54	inside	44	54	inside
55	75	TM helix	55	75	TM helix
76	95	outside	76	95	outside
96	116	TM helix	96	116	TM helix
117	121	inside	117	120	inside
122	142	TM helix	121	141	TM helix
143	147	outside	142	147	outside
148	168	TM helix	148	168	TM helix
169	185	inside	169	185	inside
186	206	TM helix	186	206	TM helix
207	216	outside	207	216	outside
217	237	TM helix	217	237	TM helix
238	262	inside	238	262	inside

Both methods give very similar results, which are identical except for a few residues. Both predict the seven transmembrane helices, which is a very good result.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein

Comparison between the prediction of SPOCTOPUS and the real protein

RET4_HUMAN

Prediction of OCTOPUS for the transmembrane helices of RET4_HUMAN

Prediction of SPOCTOPUS for the transmembrane helices of RET4_HUMAN

OCTOPUS			SPOCTOPUS
start position	end position	prediction	start position	end position	prediction
1	1	inside	1	5	N-terminal of a signal peptide
2	23	TM helix	6	19	signal peptide
24	201	outside	20	201	outside

As with HEXA_HUMAN, OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts the signal peptide.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein

Comparison between the prediction of SPOCTOPUS and the real protein

INSL5_HUMAN

Prediction of OCTOPUS for the transmembrane helices of INSL5_HUMAN

Prediction of SPOCTOPUS for the transmembrane helices of INSL5_HUMAN

OCTOPUS			SPOCTOPUS
start position	end position	prediction	start position	end position	prediction
1	1	inside	1	5	N-terminal of a signal peptide
2	32	TM helix	6	23	signal peptide
33	135	outside	24	135	outside

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein

Comparison between the prediction of SPOCTOPUS and the real protein

As we have already seen, OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts this region as a signal peptide, which is correct.

LAMP1_HUMAN

Prediction of OCTOPUS for the transmembrane helices of LAMP1_HUMAN

Prediction of SPOCTOPUS for the transmembrane helices of LAMP1_HUMAN

OCTOPUS			SPOCTOPUS
start position	end position	prediction	start position	end position	prediction
1	10	inside	1	11	N-terminal of a signal peptide
11	31	TM helix	12	29	signal peptide
32	383	outside	30	383	outside
384	404	TM helix	384	404	TM helix
405	417	outside	405	417	outside

As with HEXA_HUMAN and RET4_HUMAN, OCTOPUS predicts a transmembrane helix, whereas SPOCTOPUS predicts the signal peptide.

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein

Comparison between the prediction of SPOCTOPUS and the real protein

A4_HUMAN

Prediction of OCTOPUS for the transmembrane helices of A4_HUMAN

Prediction of SPOCTOPUS for the transmembrane helices of A4_HUMAN

OCTOPUS			SPOCTOPUS
start position	end position	prediction	start position	end position	prediction
1	5	outside	1	4	N-terminal of signal peptide
6	11	R	5	18	Signal peptide
12	701	outside	19	701	outside
702	722	TM helix	702	722	TM helix
723	770	inside	723	770	inside

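In contrast to the previous proteins, OCTOPUS does not predict an N-terminal transmembrane helix for A4_HUMAN, but a short region labelled R (positions 6-11, presumably a reentrant region), whereas SPOCTOPUS predicts a signal peptide at the N-terminus. Both methods predict the same transmembrane helix at positions 702-722.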

Comparison with the real structure of the protein:

Comparison between the prediction of OCTOPUS and the real protein

Comparison between the prediction of SPOCTOPUS and the real protein

TargetP

All of our proteins come from human or archaea, so we only used the non-plant option of TargetP.

HEXA_HUMAN

Location	Probability
mitochondrial targeting SP	0.214
secretory pathway SP	0.877
other	0.009

TargetP predicts a secretory pathway signal peptide for this protein, which is correct.
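
The way we read these tables, the predicted location is simply the class with the highest value. The following sketch shows this reading for the HEXA_HUMAN values above. It is an illustration of the table lookup only, not TargetP's internal decision procedure, and the function names are hypothetical.

```python
# Values copied from the TargetP table for HEXA_HUMAN.
hexa_scores = {
    "mitochondrial targeting SP": 0.214,
    "secretory pathway SP": 0.877,
    "other": 0.009,
}


def predicted_location(scores):
    """Return the class with the highest value."""
    return max(scores, key=scores.get)


def margin(scores):
    """Difference between the highest and the second highest value."""
    best, second = sorted(scores.values(), reverse=True)[:2]
    return best - second


print(predicted_location(hexa_scores), round(margin(hexa_scores), 3))
```

The margin between the two highest values gives a rough impression of how clear the decision is; a small margin means the prediction should be treated with more caution.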

BACR_HALSA

Location	Probability
mitochondrial targeting SP	0.019
secretory pathway SP	0.897
other	0.562

TargetP predicts that this protein contains a secretory pathway signal peptide. The value for this signal peptide is very high, but the result is wrong, because BACR_HALSA is a transmembrane protein.

RET4_HUMAN

Location	Probability
mitochondrial targeting SP	0.242
secretory pathway SP	0.928
other	0.020

TargetP predicts a secretory pathway signal peptide for this protein, which is completely correct.

INSL5_HUMAN

Location	Probability
mitochondrial targeting SP	0.074
secretory pathway SP	0.899
other	0.037

As before, TargetP predicts a secretory pathway signal peptide, which is again correct.

LAMP1_HUMAN

Location	Probability
mitochondrial targeting SP	0.043
secretory pathway SP	0.953
other	0.017

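TargetP predicts a secretory pathway signal peptide. Although LAMP1_HUMAN is a transmembrane protein, it carries an N-terminal signal peptide (see the Phobius comparison above), so this prediction agrees with the real protein.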

A4_HUMAN

Location	Probability
mitochondrial targeting SP	0.035
secretory pathway SP	0.937
other	0.084

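TargetP predicts a secretory pathway signal peptide. Although A4_HUMAN is a transmembrane protein, it carries an N-terminal signal peptide (see the Phobius comparison above), so this prediction agrees with the real protein.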

SignalP

For our analysis we used both the hidden Markov model based and the neural network based prediction.
The neural network prediction uses three different scores: the S-score, which is the score for the signal peptide; the C-score, which is the score for the cleavage site; and the Y-score, which combines the S-score and the C-score and is used to predict the cleavage site, because it is more precise than the C-score alone.
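
The relationship between the three scores can be illustrated with a small sketch. It assumes the commonly described form of the Y-score, the geometric mean of the C-score and the drop in the average S-score around a position: Y_i = sqrt(C_i * dS_i), where dS_i is the mean S-score of the d positions before i minus the mean S-score of the d positions starting at i. The window size d = 3, the clipping of negative drops to zero, the handling of the sequence ends and all score values below are assumptions made for illustration; they are not SignalP output.

```python
import math


def y_scores(c_scores, s_scores, d=3):
    """Combine C- and S-scores into Y-scores (assumed geometric-mean form)."""
    result = []
    for i in range(len(s_scores)):
        before = s_scores[max(0, i - d):i]   # up to d positions before i
        after = s_scores[i:i + d]            # up to d positions from i onwards
        if not before:
            result.append(0.0)               # no S-score drop defined at position 0
            continue
        drop = sum(before) / len(before) - sum(after) / len(after)
        result.append(math.sqrt(c_scores[i] * max(drop, 0.0)))
    return result


# Hypothetical toy scores: high S-score for the first five positions, then low.
s = [0.9] * 5 + [0.1] * 5
# Hypothetical C-scores with a small false peak (index 2) and the true peak (index 5).
c = [0.0, 0.0, 0.3, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0]

y = y_scores(c, s, d=3)
best = max(range(len(y)), key=y.__getitem__)
print(best, round(y[best], 3))
```

In this toy example the false C-score peak inside the signal peptide is suppressed, because the S-score does not drop there, while the peak at the end of the high S-score region is kept. This is the sense in which the Y-score is more precise than the C-score alone.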

HEXA_HUMAN

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	22	22	23	signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
signal peptide	1.000	0.000	22	23

Result of the SignalP method based on the neuronal network

Result of the SignalP method based on the hidden markov model

Both methods predict the same start and end position of the cleavage site and also both methods predict a signal peptide, which is correct because HEXA_HUMAN takes part at the secretory pathway.

BACR_HALSA

BACR_HALSA is an archaeal protein. SignalP offers prediction modes for eukaryotic and for bacterial (gram-positive and gram-negative) signal peptides. Therefore, we decided to run all three prediction modes and to compare the results with the known annotation.

eukaryotes

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	38	38	39	signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
signal peptide	0.017	0.859	15	16

Result of the SignalP method based on the neural network for BACR_HALSA with the prediction method for eukaryotes

Result of the SignalP method based on the hidden Markov model for BACR_HALSA with the prediction method for eukaryotes

gram-negative bacteria

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	42	42	43	no signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
Non-secretory protein	0.000	0.000

Result of the SignalP method based on the neural network for BACR_HALSA with the prediction method for gram-negative bacteria

Result of the SignalP method based on the hidden Markov model for BACR_HALSA with the prediction method for gram-negative bacteria

gram-positive bacteria

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	33	33	34	no signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
Non-secretory protein	0.000	0.000

Result of the SignalP method based on the neural network for BACR_HALSA with the prediction method for gram-positive bacteria

Result of the SignalP method based on the hidden Markov model for BACR_HALSA with the prediction method for gram-positive bacteria

Only the eukaryotic prediction method predicts a signal peptide, whereas both bacterial methods predict that this protein has no signal peptide. On the other hand, only the eukaryotic prediction method predicts a signal anchor, which is correct, because BACR_HALSA is a transmembrane protein. Therefore, it seems that the eukaryotic prediction method is better suited for BACR_HALSA.

RET4_HUMAN

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	18	18	19	signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
signal peptide	1.000	0.000	18	19

Result of the SignalP method based on the neural network for RET4_HUMAN

Result of the SignalP method based on the hidden Markov model for RET4_HUMAN

Both methods predict a signal peptide for RET4_HUMAN, which is correct.

INSL5_HUMAN

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	22	22	23	signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
signal peptide	0.999	0.000	22	23

Result of the SignalP method based on the neural network for INSL5_HUMAN

Result of the SignalP method based on the hidden Markov model for INSL5_HUMAN

Both methods predict a signal peptide for INSL5_HUMAN, which is correct.

LAMP1_HUMAN

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	28	28	29	signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
signal peptide	1.000	0.000	28	29

Result of the SignalP method based on the neural network for LAMP1_HUMAN

Result of the SignalP method based on the hidden Markov model for LAMP1_HUMAN

Both methods predict a signal peptide for LAMP1_HUMAN, which is not correct, because LAMP1_HUMAN is a transmembrane protein.

A4_HUMAN

Result of the neural network

Signal peptide		Cleavage site
start position	end position	start position	end position	prediction
1	17	17	18	signal peptide

Result of the hidden Markov model

prediction	signal peptide probability	signal anchor probability	cleavage site start	cleavage site end
signal peptide	1.000	0.000	17	18

Result of the SignalP method based on the neural network for A4_HUMAN

Result of the SignalP method based on the hidden Markov model for A4_HUMAN

Both methods predict a signal peptide for A4_HUMAN, which is not correct, because A4_HUMAN is a transmembrane protein.

Comparison of the different methods

We decided to split the comparison of the methods, because it is unfair to directly compare a method that cannot predict signal peptides with a method that can. Therefore, we split the comparison into three parts: one for transmembrane helices, one for signal peptides and one for the combination of both.

Comparison of transmembrane helix prediction

Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues, which belong to the signal peptide, because all transmembrane-only prediction methods wrongly predicted this region as a transmembrane helix.
For this comparison we counted the residues wrongly predicted as transmembrane, the residues wrongly predicted as outside and the residues wrongly predicted as inside.
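The counting described above can be sketched as follows. This is only an illustration under assumptions: the one-letter topology strings (`M` membrane, `o` outside, `i` inside, `S` signal peptide), the function name and the toy sequences are hypothetical, and a residue is counted under the class that was wrongly predicted for it.

```python
from collections import Counter

def count_wrong(predicted, real, skip=0):
    """Count mispredicted residues per predicted class.

    predicted, real: per-residue topology strings of equal length.
    skip: number of leading residues (the signal peptide) to ignore.
    """
    if len(predicted) != len(real):
        raise ValueError("topology strings must have equal length")
    wrong = Counter()
    for p, r in zip(predicted[skip:], real[skip:]):
        if p != r:
            wrong[p] += 1  # counted under the wrongly predicted class
    return wrong

# Toy example: a 3-residue signal peptide that a transmembrane-only
# method would label as membrane; these residues are skipped.
real      = "SSSoooooMMMMMiiiii"
predicted = "MMMMooooMMMMiiiiii"
wrong = count_wrong(predicted, real, skip=3)
print(dict(wrong), "sum:", sum(wrong.values()))
```

The sum over the three classes corresponds to the "#wrong sum" rows of the table below.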


		methods
		TMHMM	Phobius	PolyPhobius	OCTOPUS	SPOCTOPUS	Transmembrane protein
HEXA_HUMAN	#wrong transmembrane	0	0	0	0	0	no
	#wrong outside	0	0	0	0	0
	#wrong inside	0	0	0	0	0
	#wrong sum	0	0	0	0	0
	%wrong predicted	0%	0%	0%	0%	0%
BACR_HALSA	#wrong transmembrane	24	20	12	16	11	yes (7 transmembrane helices)
	#wrong outside	46	5	3	4	6
	#wrong inside	4	4	2	0	0
	#wrong sum	74	29	17	20	17
	%wrong predicted	29%	11%	6%	8%	6%

RET4_HUMAN	#wrong transmembrane	0	0	0	5	0	no
	#wrong outside	0	0	0	0	0
	#wrong inside	0	0	0	0	0
	#wrong sum	0	0	0	5	0
	%wrong predicted	0%	0%	0%	2%	0%

INSL5_HUMAN	#wrong transmembrane	0	0	0	10	0	no
	#wrong outside	0	0	0	0	0
	#wrong inside	0	0	0	0	0
	#wrong sum	0	0	0	10	0
	%wrong predicted	0%	0%	0%	8%	0%

LAMP1_HUMAN	#wrong transmembrane	5	3	4	3	1	yes (single-spanning)
	#wrong outside	2	0	0	1	1
	#wrong inside	0	0	0	1	1
	#wrong sum	7	3	4	5	3
	%wrong predicted	2%	0%	1%	1%	0%

A4_HUMAN	#wrong transmembrane	0	0	0	0	0	yes (single-spanning)
	#wrong outside	1	1	1	1	2
	#wrong inside	0	0	0	1	1
	#wrong sum	1	1	1	2	3
	%wrong predicted	0%	0%	0%	0%	0%
Average number of wrongly predicted residues
		13.7	5.5	3.7	7.0	3.8
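The averages in the last row can be recomputed from the "#wrong sum" rows of the table. The following sketch copies those six values per method (in the order HEXA_HUMAN, BACR_HALSA, RET4_HUMAN, INSL5_HUMAN, LAMP1_HUMAN, A4_HUMAN); the variable names are our own, and the results agree with the row above to within 0.1.

```python
# "#wrong sum" per protein, copied from the table above.
wrong_sum = {
    "TMHMM":       [0, 74, 0, 0, 7, 1],
    "Phobius":     [0, 29, 0, 0, 3, 1],
    "PolyPhobius": [0, 17, 0, 0, 4, 1],
    "OCTOPUS":     [0, 20, 5, 10, 5, 2],
    "SPOCTOPUS":   [0, 17, 0, 0, 3, 3],
}

# Mean number of wrongly predicted residues over the six proteins.
averages = {method: sum(v) / len(v) for method, v in wrong_sum.items()}

for method, avg in sorted(averages.items(), key=lambda kv: kv[1]):
    print(f"{method:12s} {avg:.1f}")
```

Because BACR_HALSA contributes most of the errors for every method, the ranking is dominated by this single protein.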

TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, where TMHMM is the only prediction method that does not recognize all 7 transmembrane helices. SPOCTOPUS and PolyPhobius are the best prediction methods.

In general the prediction of transmembrane helices works quite well, and almost all predictions are very close to the real protein.

Comparison of signal peptide prediction

Next we compared TargetP and SignalP, which can only predict signal peptides, together with SPOCTOPUS, Phobius and PolyPhobius. TargetP does not predict the start and end position of the signal peptide; it only predicts the location of the protein.


		methods
		real position	Phobius	PolyPhobius	SPOCTOPUS	TargetP	SignalP
HEXA_HUMAN	stop position	22	22	19	21	no prediction	22
	#wrong residues		0	3	3	no prediction	0
	location	secretory pathway	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction
BACR_HALSA	stop position	not available	no prediction	no prediction	no prediction	no prediction	no consensus prediction
	#wrong residues	not available	not available	not available	not available	no prediction	not available
	location	membrane	not available	not available	not available	secretory pathway	non-signal peptide

RET4_HUMAN	stop position	18	18	18	19	no prediction	18
	#wrong residues		0	0	1	no prediction	0
	location	secretory pathway	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction

INSL5_HUMAN	stop position	22	22	22	22	no prediction	22
	#wrong residues		0	0	0	no prediction	0
	location	secretory pathway	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction

LAMP1_HUMAN	stop position	28	28	28	29	no prediction	28
	#wrong residues		0	0	1	no prediction	0
	location	transmembrane helix	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction

A4_HUMAN	stop position	17	17	17	18	no prediction	17
	#wrong residues		0	0	1	no prediction	0
	location	transmembrane helix	secretory pathway	secretory pathway	no prediction	secretory pathway	secretory pathway
Summary of wrong predictions
	sum of wrong predicted residues		0	3	2	no prediction	0
	#right predicted locations / #predicted locations		3/5	3/5	no prediction	3/5	no prediction

SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. In addition, SignalP predicts whether the sequence contains a signal peptide at all. In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.
Therefore, it is difficult to compare the different methods. First of all, Phobius and PolyPhobius are more informative than the other prediction methods, because they predict both the position and the location. On average they predict the location and the position as well as the other prediction methods. None of the methods recognised the transmembrane proteins; all methods assigned them to the secretory pathway. It is therefore useful to use Phobius or PolyPhobius, because they predict more than the other methods and can also predict transmembrane helices. The results of Phobius were a little better than the results of PolyPhobius.
We also want to mention that SignalP lets you choose between prediction methods for eukaryotes, gram-positive bacteria and gram-negative bacteria. In our analysis we also analysed BACR_HALSA, which is an archaeal protein, and tested all three prediction methods on it. BACR_HALSA does not possess a signal peptide, yet the eukaryotic neural network predicted one. Only the eukaryotic prediction method recognised a signal anchor for BACR_HALSA, whereas the two bacterial methods classified the protein as non-secretory and gave no information about its location.

Comparison of the combined methods

The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore we combined our two previous comparisons.


		methods
		Phobius	PolyPhobius	SPOCTOPUS
HEXA_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	0	3	2
	location	right	right	no prediction
BACR_HALSA	#wrong predicted residues (TM)	29	17	17
	#wrong predicted residues (SP)	n.a.	n.a.	n.a.
	location	n.a.	n.a.	no prediction

RET4_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	0	0	0
	location	right	right	no prediction

INSL5_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	0	0	1
	location	right	right	no prediction

LAMP1_HUMAN	#wrong predicted residues (TM)	3	4	3
	#wrong predicted residues (SP)	0	0	0
	location	wrong	wrong	no prediction

A4_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	1	1	3
	location	wrong	wrong	no prediction
Average
	avg(#wrong predicted residues (TM))	5.3	3.5	3.3
	avg(#wrong predicted residues (SP))	0.1	0.6	1
	#location (right predicted) / #location(predicted)	3/5	3/5	no prediction
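The location ratio in the last row counts only proteins for which a location was actually predicted, so the "n.a." entry for BACR_HALSA is excluded from the denominator. A minimal sketch of this bookkeeping, with the calls copied from the table above and `None` standing for "n.a." (the names are our own):

```python
# Location calls per protein, in table order:
# HEXA, BACR, RET4, INSL5, LAMP1, A4. None means "n.a.".
location_calls = {
    "Phobius":     ["right", None, "right", "right", "wrong", "wrong"],
    "PolyPhobius": ["right", None, "right", "right", "wrong", "wrong"],
}

def location_ratio(calls):
    """Return (#right predicted locations, #predicted locations)."""
    predicted = [c for c in calls if c is not None]
    return predicted.count("right"), len(predicted)

for method, calls in location_calls.items():
    right, total = location_ratio(calls)
    print(f"{method}: {right}/{total}")
```

SPOCTOPUS is left out because it does not predict a location at all.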

In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is considerably better than that of Phobius. The predictions of SPOCTOPUS are also good, but SPOCTOPUS does not predict the location of the protein.
Therefore, PolyPhobius seems to be a good choice, because it is on average the best method for combined transmembrane and signal peptide prediction.

Prediction of GO terms

Before we started our analysis, we checked the GO annotations for the six sequences:

HEXA_HUMAN
Process	skeletal system development
	carbohydrate metabolic process
	ganglioside catabolic process
	lysosome organization
	sensory perception of sound
	locomotory behavior
	adult walking behavior
	lipid storage
	sexual reproduction
	glycosaminoglycan metabolic process
	myelination
	cell morphogenesis involved in neuron differentiation
	neuromuscular process controlling posture
	neuromuscular process controlling balance
Function	catalytic activity
	hydrolase activity, hydrolyzing O-glycosyl compounds
	beta-N-acetylhexosaminidase activity
	protein binding
	hydrolase activity
	hydrolase activity, acting on glycosyl bonds
	cation binding
	protein heterodimerization activity
Component	lysosome
Component	membrane
BACR_HALSA
Process	transport
	ion transport
	phototransduction
	proton transport
	protein-chromophore linkage
	response to stimulus
Function	receptor activity
	ion channel activity
	photoreceptor activity
Component	plasma membrane
	membrane
	integral to membrane
RET4_HUMAN
Process	eye development
	gluconeogenesis
	transport
	spermatogenesis
	heart development
	visual perception
	male gonad development
	embryo development
	maintenance of gastrointestinal epithelium
	lung development
	positive regulation of insulin secretion
	response to retinoic acid
	response to insulin stimulus
	retinol transport
	retinol metabolic process
	glucose homeostasis
	response to ethanol
	embryonic organ morphogenesis
	embryonic skeletal system development
	cardiac muscle tissue development
	female genitalia morphogenesis
	detection of light stimulus involved in visual perception
	positive regulation of immunoglobulin secretion
	retina development in camera-type eye
	negative regulation of cardiac muscle cell proliferation
	embryonic retina morphogenesis in camera-type eye
	uterus development
	vagina development
	urinary bladder development
	heart trabecula formation
Function	transporter activity
	binding
	retinoid binding
	protein binding
	retinal binding
	retinol binding
	retinol transporter activity
Component	extracellular region
Component	extracellular space
INSL5_HUMAN
Process	biological_process
Function	hormone activity
Component	cellular_component
Component	extracellular region
LAMP1_HUMAN
Process	autophagy
Component	membrane fraction
	lysosome
	lysosomal membrane
	endosome
	late endosome
	multivesicular body
	plasma membrane
	integral to plasma membrane
	external side of plasma membrane
	cell surface
	endosome membrane
	membrane
	integral to membrane
	vesicle
	sarcolemma
	melanosome
A4_HUMAN
Process	G2 phase of mitotic cell cycle
	suckling behavior
	platelet degranulation
	mRNA polyadenylation
	regulation of translation
	protein phosphorylation
	cellular copper ion homeostasis
	endocytosis
	apoptosis
	induction of apoptosis
	cell adhesion
	regulation of epidermal growth factor receptor activity
	Notch signaling pathway
	axonogenesis
	blood coagulation
	mating behavior
	locomotory behavior
	axon cargo transport
	cell death
	adult locomotory behavior
	visual learning
	negative regulation of peptidase activity
	positive regulation of peptidase activity
	axon midline choice point recognition
	neuron remodeling
	dendrite development
	platelet activation
	extracellular matrix organization
	forebrain development
	neuron projection development
	ionotropic glutamate receptor signaling pathway
	regulation of multicellular organism growth
	innate immune response
	negative regulation of neuron differentiation
	positive regulation of mitotic cell cycle
	positive regulation of transcription from RNA polymerase II promoter
	collateral sprouting in absence of injury
	regulation of synapse structure and activity
	neuromuscular process controlling balance
	synaptic growth at neuromuscular junction
	neuron apoptosis
	smooth endoplasmic reticulum calcium ion homeostasis
Function	DNA binding
	serine-type endopeptidase inhibitor activity
	receptor binding
	binding
	protein binding
	peptidase activator activity
	peptidase inhibitor activity
	acetylcholine receptor binding
	identical protein binding
	metal ion binding
	PTB domain binding
Component	extracellular region
	membrane fraction
	cytoplasm
	Golgi apparatus
	plasma membrane
	integral to plasma membrane
	coated pit
	cell surface
	membrane
	integral to membrane
	synaptosome
	axon
	platelet alpha granule lumen
	cytoplasmic vesicle
	neuromuscular junction
	ciliary rootlet
	neuron projection
	dendritic spine
	dendritic shaft
	intracellular membrane-bounded organelle
	apical part of cell
	synapse
	perinuclear region of cytoplasm
	spindle midzone

A detailed list of the GO annotation terms of each protein can be found [here].

GOPET

We tried to predict the GO annotations with GOPET for our six different proteins.

HEXA_HUMAN

Result of the GOPET prediction for HEXA_HUMAN

The method only predicts functional GO terms. HEXA_HUMAN has 8 annotated GO functions, and the method also predicts 8 GO function terms. Therefore we decided to check whether all predictions are correct. We checked whether the general term is correct and also whether the GO number is correct.

GO term	confidence	prediction term	prediction GOid
hexosaminidase activity	97%	right	wrong
beta-N-acetylhexosaminidase activity	96%	right	right
hydrolase activity	96%	right	right
hydrolase activity acting on glycosyl bonds	96%	right	right
hydrolase activity hydrolyzing O-glycosyl compounds	96%	right	right
catalytic activity	96%	right	right
hydrolase activity hydrolyzing N-glycosyl compounds	78%	wrong	wrong
protein heterodimerization activity	61%	right	right

BACR_HALSA

Result of the GOPET prediction for BACR_HALSA

The method only predicts functional GO terms. BACR_HALSA has 3 annotated GO functions, and the method also predicts 3 GO function terms. Therefore we decided to check whether all predictions are correct.

GO term	confidence	prediction term	prediction GOid
ion channel activity	77%	right	right
G-protein coupled photoreceptor activity	75%	right	wrong
hydrogen ion transmembrane transporter activity	60%	wrong	wrong

RET4_HUMAN

Result of the GOPET prediction for RET4_HUMAN

The method only predicts functional GO terms. RET4_HUMAN has 7 annotated GO functions, and the method predicts 8 GO function terms. Therefore we decided to check whether all predictions are correct.

GO term	confidence	prediction term	prediction GOid
binding	90%	right	right
retinoid binding	81%	right	right
lipid binding	80%	wrong	wrong
retinol binding	78%	right	right
transporter activity	78%	right	right
retinal binding	78%	right	right
lipid transport activity	69%	wrong	wrong
high-density lipoprotein particle binding	60%	wrong	wrong

INSL5_HUMAN

The method only predicts functional GO terms. INSL5_HUMAN has 1 annotated GO function, and the method also predicts 1 GO function term. Therefore we decided to check whether the prediction is correct.

GO term	confidence	prediction term	prediction GOid
hormone activity	80%	right	right

LAMP1_HUMAN

Result of the GOPET prediction for LAMP1_HUMAN

The method only predicts functional GO terms. LAMP1_HUMAN has no annotated GO functions, but the method predicts 2 GO function terms. Therefore we count both predictions as wrong.

A4_HUMAN

Result of the GOPET prediction for A4_HUMAN

The method only predicts functional GO terms. A4_HUMAN has 11 annotated GO functions, and the method predicts 13 GO function terms. Therefore we decided to check whether all predictions are correct.

GO term	confidence	prediction term	prediction GOid
endopeptidase inhibitor activity	87%	right	wrong
serine-type endopeptidase inhibitor activity	86%	right	right
plasmin inhibitor activity	83%	wrong	wrong
trypsin inhibitor activity	83%	wrong	wrong
peptidase inhibitor activity	82%	right	right
binding	79%	right	right
protein binding	74%	right	right
metal ion binding	73%	right	right
DNA binding	71%	right	right
heparin binding	70%	wrong	right
zinc ion binding	69%	wrong	wrong
copper ion binding	69%	wrong	wrong
iron ion binding	67%	wrong	wrong
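The GOPET tables above can be summarised as the fraction of predicted terms that were judged right. The following sketch is our own summary, not part of the original analysis: the counts are read off the "prediction term" columns, and LAMP1_HUMAN is entered as 0 of 2, because both of its predicted terms were counted as wrong.

```python
# (terms judged right, terms predicted) per protein, from the tables above.
gopet_terms = {
    "HEXA_HUMAN":  (7, 8),
    "BACR_HALSA":  (2, 3),
    "RET4_HUMAN":  (5, 8),
    "INSL5_HUMAN": (1, 1),
    "LAMP1_HUMAN": (0, 2),
    "A4_HUMAN":    (7, 13),
}

for protein, (right, predicted) in gopet_terms.items():
    print(f"{protein:12s} {right}/{predicted} = {right / predicted:.2f}")

# Pooled fraction of right terms over all six proteins.
total_right = sum(r for r, _ in gopet_terms.values())
total_predicted = sum(p for _, p in gopet_terms.values())
print(f"overall      {total_right}/{total_predicted} = {total_right / total_predicted:.2f}")
```

Such a summary hides the confidence values; most of the wrong terms in the tables have a confidence of 83% or lower.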

Pfam

We used the Pfam web server for our analysis and decided to trust only the significant Pfam-A matches. To check whether the predictions are correct, we mapped the Pfam ids to GO ids with the help of a mapping website [1]. If a mapping was not possible, we compared the name of the predicted Pfam family with the names of the GO terms. If the names were similar or equal, we decided to trust the match.
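The mapping step can be sketched in a few lines. The sketch assumes a mapping file in the pfam2go style, where each line has the form `Pfam:ACCESSION NAME > GO:term name ; GO:id`; whether the mapping website used here provides exactly this format is an assumption. The accession `PF99999`, the family name and the sample lines are hypothetical and only illustrate the format.

```python
def parse_pfam2go(lines):
    """Map Pfam accessions to sets of GO ids (pfam2go-style lines assumed)."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):   # skip blank and comment lines
            continue
        left, right = line.split(" > ", 1)
        accession = left.split()[0].split(":", 1)[1]
        # GO term names may contain commas, so split at the last " ; ".
        go_id = right.rsplit(" ; ", 1)[1].strip()
        mapping.setdefault(accession, set()).add(go_id)
    return mapping

# Hypothetical sample lines (not copied from a real mapping file).
sample = [
    "!illustrative header line",
    "Pfam:PF99999 Example_domain > GO:hydrolase activity, hydrolyzing O-glycosyl compounds ; GO:0004553",
    "Pfam:PF99999 Example_domain > GO:carbohydrate metabolic process ; GO:0005975",
]
mapping = parse_pfam2go(sample)

# A prediction counts as right if the mapped GO id is among the annotations.
annotated = {"GO:0004553", "GO:0005975"}
confirmed = mapping["PF99999"] & annotated
print(sorted(confirmed))
```

Families without any mapped GO id do not appear in the dictionary; for those the comparison falls back to the family and term names, as described above.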

HEXA_HUMAN

Graphical representation of the prediction result of Pfam:

Pfam found two significant Pfam-A matches:

Family	E-Value	GO id	prediction
Glycosyl hydrolase family 20, domain 2	3.7e-43	GO:0004553	right
Glycosyl hydrolase family 20, catalytic domain	1.8e-84	GO:0005975	right

BACR_HALSA

Graphical representation of the prediction result of Pfam:

Pfam found one significant Pfam-A match:

Family	E-Value	GOid	prediction
Bacteriorhodopsin-like protein	2e-88	GO:0005216	right
		GO:0006811	right
		GO:0016020	right

RET4_HUMAN

Graphical representation of the prediction result of Pfam:

Pfam found one significant Pfam-A match:

Family	E-Value	GOid	prediction
Lipocalin/cytosolic fatty-acid binding protein family	1.7e-22	GO:0005488	right

INSL5_HUMAN

Graphical representation of the prediction result of Pfam:

Pfam found two significant Pfam-A matches:

Family	E-Value	GOid	prediction
Insulin/IGF/Relaxin family	6.7e-08	GO:0005179	right
Insulin/IGF/Relaxin family	6.7e-08	GO:0005576	right

LAMP1_HUMAN

Graphical representation of the prediction result of Pfam:

Pfam found one significant Pfam-A match:

Family	E-Value	GOid	prediction
Lysosome-associated membrane glycoprotein (LAMP)	2.3e-135	GO:0016020	right

A4_HUMAN

Graphical representation of the prediction result of Pfam:

Pfam found six significant Pfam-A matches:

Family	E-Value	GOid	prediction
Amyloid A4 N-terminal heparin-binding	4e-42	none	right
Copper-binding of amyloid precursor CuBD	2.3e-27	none	right
Kunitz/Bovine pancreatic trypsin inhibitor domain	3e-19	GO:0004867	right
E2 domain of amyloid precursor protein	1.6e-74	none	right
Beta-amyloid peptide (beta-APP)	4.3e-28	GO:0005488	right
Beta-amyloid peptide (beta-APP)	4.3e-28	GO:0016021	right
Beta-amyloid precursor protein C-terminus	1.1e-29	none	right

ProtFun 2.2

ProtFun 2.2 does not give a clear yes/no prediction of whether the protein belongs to a class; instead it gives probabilities and odds scores. We decided to use a cutoff of 2, so all classes with an odds score of 2 or higher count as predicted for us. The result file also contains a "=>" sign, which marks the result with the highest information content. We also take this line as a result, even if its odds score is lower than 2. If all results have an odds score lower than 2, the line with this sign is our only result.
Because the prediction categories are very general, it was not possible to map them to GO ids. Therefore, we checked the known GO annotations. If there was a hint for a category and the protein was predicted to be in this category, we counted the prediction as right; if the known GO annotations and the categories conflict, we counted the prediction as wrong.
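The selection rule described above (cutoff of 2 plus the "=>" line) can be written down as a short sketch. The parsing assumes tab-separated rows as they appear in the tables on this page; the function names are our own, and the sample rows are the enzyme class results for HEXA_HUMAN.

```python
def parse_row(line):
    """Parse 'name<TAB>probability [=>]<TAB>score [=>]...' into a tuple."""
    fields = line.split("\t")
    name = fields[0]
    marked = "=>" in fields[1]                      # highest information content
    probability = float(fields[1].replace("=>", ""))
    score = float(fields[2].replace("=>", ""))
    return name, probability, score, marked

def select_classes(rows, cutoff=2.0):
    """Keep classes with a score >= cutoff, plus the '=>' marked class."""
    chosen = [name for name, _, score, _ in rows if score >= cutoff]
    for name, _, _, marked in rows:
        if marked and name not in chosen:
            chosen.append(name)
    return chosen

# Enzyme class rows for HEXA_HUMAN, copied from the table on this page.
lines = [
    "Oxidoreductase (EC 1.-.-.-)\t0.143\t0.685\tright",
    "Transferase (EC 2.-.-.-)\t0.201\t0.582\tright",
    "Hydrolase (EC 3.-.-.-)\t0.329\t1.039\twrong",
    "Lyase (EC 4.-.-.-)\t0.054\t1.143\tright",
    "Isomerase (EC 5.-.-.-)\t0.027\t0.856\tright",
    "Ligase (EC 6.-.-.-)\t0.085 =>\t1.661 =>\tright",
]
rows = [parse_row(line) for line in lines]
print(select_classes(rows))
```

In this example no class reaches the cutoff, so the line marked with "=>" is the only result, exactly the fallback case described above.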

HEXA_HUMAN

The ProtFun Server calculated the following prediction result for HEXA_HUMAN:

Functional category
Functional category	Probability	Odds score	Prediction
Amino acid biosynthesis	0.161	7.331	wrong
Biosynthesis of cofactors	0.332	4.609	right
Cell envelope	0.804 =>	13.186 =>	right
Cellular processes	0.110	1.506	right
Central intermediary metabolism	0.432	6.856	right
Energy metabolism	0.113	1.259	right
Fatty acid metabolism	0.019	1.427	right
Purines and Pyrimidines	0.519	2.136	wrong
Regulatory functions	0.018	0.111	right
Replication and transcription	0.073	0.271	right
Translation	0.040	0.904	right
Transport and binding	0.685	1.670	right
Enzyme/non-enzyme
Enzyme/non-enzyme	Probability	Odds score	Prediction
Enzyme	0.792 =>	2.764 =>	right
Nonenzyme	0.208	0.292	right
Enzyme class
Enzyme class	Probability	Odds score	Prediction
Oxidoreductase (EC 1.-.-.-)	0.143	0.685	right
Transferase (EC 2.-.-.-)	0.201	0.582	right
Hydrolase (EC 3.-.-.-)	0.329	1.039	wrong
Lyase (EC 4.-.-.-)	0.054	1.143	right
Isomerase (EC 5.-.-.-)	0.027	0.856	right
Ligase (EC 6.-.-.-)	0.085 =>	1.661 =>	right
Gene ontology category
Gene ontology category	Probability	Odds score	Prediction
Signal transducer	0.083	0.389	right
Receptor	0.105	0.617	right
Hormone	0.001	0.206	right
Structural protein	0.010	0.357	right
Transporter	0.024	0.222	right
Ion channel	0.018	0.310	right
Voltage-gated ion channel	0.002	0.082	right
Cation channel	0.010	0.218	right
Transcription	0.058	0.453	right
Transcription regulation	0.026	0.205	right
Stress response	0.004	0.500	right
Immune response	0.014	0.167	right
Growth factor	0.005	0.372	right
Metal ion transport	0.009	0.020	right

BACR_HALSA

The ProtFun Server calculated the following prediction result for BACR_HALSA:

Functional category
Functional category	Probability	Odds score	Prediction
Amino acid biosynthesis	0.033	1.495	right
Biosynthesis of cofactors	0.186	2.589	wrong
Cell envelope	0.029	0.483	right
Cellular processes	0.051	0.698	right
Central intermediary metabolism	0.045	0.711	right
Energy metabolism	0.138	1.537	right
Fatty acid metabolism	0.016	1.265	right
Purines and Pyrimidines	0.302	1.244	right
Regulatory functions	0.013	0.080	wrong
Replication and transcription	0.019	0.073	right
Translation	0.059	1.339	right
Transport and binding	0.791 =>	1.929 =>	right
Enzyme/non-enzyme
Enzyme/non-enzyme	Probability	Odds score	Prediction
Enzyme	0.199	0.696	right
Nonenzyme	0.801 =>	1.122 =>	right
Enzyme class
Enzyme class	Probability	Odds score	Prediction
Oxidoreductase (EC 1.-.-.-)	0.114	0.549	right
Transferase (EC 2.-.-.-)	0.031	0.091	right
Hydrolase (EC 3.-.-.-)	0.057	0.180	right
Lyase (EC 4.-.-.-)	0.020	0.430	right
Isomerase (EC 5.-.-.-)	0.010	0.321	right
Ligase (EC 6.-.-.-)	0.017	0.625	right
Gene ontology category
Gene ontology category	Probability	Odds score	Prediction
Signal transducer	0.258	1.205	wrong
Receptor	0.355	2.087	right
Hormone	0.001	0.206	right
Structural protein	0.006	0.200	right
Transporter	0.440 =>	4.036 =>	right
Ion channel	0.010	0.169	wrong
Voltage-gated ion channel	0.004	0.172	right
Cation channel	0.078	1.689	right
Transcription	0.026	0.205	right
Transcription regulation	0.028	0.226	right
Stress response	0.012	0.139	right
Immune response	0.011	0.128	right
Growth factor	0.010	0.727	right
Metal ion transport	0.049	0.106	right

RET4_HUMAN

The ProtFun Server calculated the following prediction result for RET4_HUMAN:

Functional category
Functional category	Probability	Odds score	Prediction
Amino acid biosynthesis	0.017	0.751	right
Biosynthesis of cofactors	0.044	0.610	right
Cell envelope	0.804 =>	13.186 =>	right
Cellular processes	0.075	1.021	wrong
Central intermediary metabolism	0.197	3.128	right
Energy metabolism	0.043	0.475	right
Fatty acid metabolism	0.016	1.265	right
Purines and Pyrimidines	0.275	1.131	right
Regulatory functions	0.013	0.080	right
Replication and transcription	0.022	0.084	right
Translation	0.032	0.721	right
Transport and binding	0.800	1.951	wrong
Enzyme/non-enzyme
Enzyme/non-enzyme	Probability	Odds score	Prediction
Enzyme	0.544 =>	1.900 =>	right
Nonenzyme	0.456	0.639	right
Enzyme class
Enzyme class	Probability	Odds score	Prediction
Oxidoreductase (EC 1.-.-.-)	0.095	0.458	right
Transferase (EC 2.-.-.-)	0.038	0.109	right
Hydrolase (EC 3.-.-.-)	0.235	0.742	right
Lyase (EC 4.-.-.-)	0.059 =>	1.264 =>	wrong
Isomerase (EC 5.-.-.-)	0.010	0.321	right
Ligase (EC 6.-.-.-)	0.017	0.326	right
Gene ontology category
Gene ontology category	Probability	Odds score	Prediction
Signal transducer	0.202	0.942	right
Receptor	0.147	0.862	right
Hormone	0.004	0.667	right
Structural protein	0.002	0.058	right
Transporter	0.025	0.232	right
Ion channel	0.016	0.288	right
Voltage-gated ion channel	0.003	0.148	right
Cation channel	0.010	0.215	right
Transcription	0.027	0.207	right
Transcription regulation	0.025	0.196	right
Stress response	0.161	1.829	right
Immune response	0.239 =>	2.813 =>	wrong
Growth factor	0.023	1.617	right
Metal ion transport	0.009	0.020	right

INSL5_HUMAN

The ProtFun Server calculated the following prediction result for INSL5_HUMAN:

Functional category
Functional category	Probability	Odds score	Prediction
Amino acid biosynthesis	0.011	0.484	right
Biosynthesis of cofactors	0.040	0.558	right
Cell envelope	0.756 =>	12.393 =>	right
Cellular processes	0.033	0.448	right
Central intermediary metabolism	0.048	0.755	right
Energy metabolism	0.036	0.397	right
Fatty acid metabolism	0.016	1.265	right
Purines and Pyrimidines	0.144	0.592	right
Regulatory functions	0.014	0.087	right
Replication and transcription	0.020	0.075	right
Translation	0.032	0.735	right
Transport and binding	0.834	2.033	right
Enzyme/non-enzyme
Enzyme/non-enzyme	Probability	Odds score	Prediction
Enzyme	0.209	0.729	right
Nonenzyme	0.791 =>	1.109 =>	right
Enzyme class
Enzyme class	Probability	Odds score	Prediction
Oxidoreductase (EC 1.-.-.-)	0.056	0.268	right
Transferase (EC 2.-.-.-)	0.031	0.091	right
Hydrolase (EC 3.-.-.-)	0.062	0.195	right
Lyase (EC 4.-.-.-)	0.020	0.430	right
Isomerase (EC 5.-.-.-)	0.010	0.321	right
Ligase (EC 6.-.-.-)	0.017	0.327	right
Gene ontology category
Gene ontology category	Probability	Odds score	Prediction
Signal transducer	0.374	1.746	right
Receptor	0.128	0.750	right
Hormone	0.247 =>	37.936 =>	right
Structural protein	0.001	0.041	right
Transporter	0.025	0.228	right
Ion channel	0.010	0.168	right
Voltage-gated ion channel	0.003	0.131	right
Cation channel	0.010	0.215	right
Transcription	0.054	0.425	right
Transcription regulation	0.091	0.724	right
Stress response	0.099	1.128	right
Immune response	0.178	2.090	wrong
Growth factor	0.061	4.379	wrong
Metal ion transport	0.009	0.020	right

LAMP1_HUMAN

The ProtFun Server calculated the following prediction result for LAMP1_HUMAN:

Functional category
Functional category	Probability	Odds score	Prediction
Amino acid biosynthesis	0.011	0.484	right
Biosynthesis of cofactors	0.053	0.735	right
Cell envelope	0.804 =>	13.186 =>	right
Cellular processes	0.027	0.373	right
Central intermediary metabolism	0.138	2.188	right
Energy metabolism	0.037	0.411	right
Fatty acid metabolism	0.016	1.265	right
Purines and Pyrimidines	0.533	2.195	wrong
Regulatory functions	0.015	0.090	right
Replication and transcription	0.019	0.073	right
Translation	0.027	0.613	right
Transport and binding	0.834	2.033	right
Enzyme/non-enzyme
Enzyme/non-enzyme	Probability	Odds score	Prediction
Enzyme	0.276	0.965	right
Nonenzyme	0.724 =>	1.014 =>	right
Enzyme class
Enzyme class	Probability	Odds score	Prediction
Oxidoreductase (EC 1.-.-.-)	0.039	0.187	right
Transferase (EC 2.-.-.-)	0.046	0.134	right
Hydrolase (EC 3.-.-.-)	0.058	0.184	right
Lyase (EC 4.-.-.-)	0.020	0.430	right
Isomerase (EC 5.-.-.-)	0.010	0.321	right
Ligase (EC 6.-.-.-)	0.017	0.326	right
Gene ontology category
Gene ontology category	Probability	Odds score	Prediction
Signal transducer	0.396	1.849	right
Receptor	0.282	1.659	right
Hormone	0.001	0.206	right
Structural protein	0.011	0.408	right
Transporter	0.024	0.222	right
Ion channel	0.008	0.147	right
Voltage-gated ion channel	0.002	0.111	right
Cation channel	0.010	0.215	right
Transcription	0.032	0.247	right
Transcription regulation	0.018	0.142	right
Stress response	0.246	2.795	right
Immune response	0.371 =>	4.368 =>	right
Growth factor	0.013	0.956	right
Metal ion transport	0.009	0.020	right

A4_HUMAN

The ProtFun server calculated the following prediction result for A4_HUMAN:

Functional category
Functional category	Probability	Odd score	Prediction
Amino acid biosynthesis	0.020	0.921	right
Biosynthesis of cofactors	0.261	3.623	right
Cell envelope	0.804 =>	13.186 =>	right
Cellular processes	0.053	0.070	right
Central intermediary metabolism	0.184	2.920	right
Energy metabolism	0.023	0.259	right
Fatty acid metabolism	0.016	1.265	right
Purines and Pyrimidines	0.417	1.716	right
Regulatory functions	0.013	0.084	wrong
Replication and transcription	0.029	0.109	right
Translation	0.027	0.613	right
Transport and binding	0.827	2.016	right
Enzyme/non-enzyme
Enzyme/non-enzyme	Probability	Odd score	Prediction
Enzyme	0.392 =>	1.368 =>	right
Nonenzyme	0.608	0.852	right
Enzyme class
Enzyme class	Probability	Odd score	Prediction
Oxidoreductase (EC 1.-.-.-)	0.024	0.114	right
Transferase (EC 2.-.-.-)	0.208	0.603	right
Hydrolase (EC 3.-.-.-)	0.190	0.600	right
Lyase (EC 4.-.-.-)	0.020	0.430	right
Isomerase (EC 5.-.-.-)	0.010	0.324	right
Ligase (EC 6.-.-.-)	0.048	0.946	right
Gene ontology category
Gene ontology category	Probability	Odd score	Prediction
Signal transducer	0.126	0.586	right
Receptor	0.036	0.211	right
Hormone	0.001	0.206	right
Structural protein	0.034 =>	1.205 =>	right
Transporter	0.024	0.222	right
Ion channel	0.009	0.162	right
Voltage-gated ion channel	0.002	0.108	right
Cation channel	0.010	0.215	right
Transcription	0.043	0.335	right
Transcription regulation	0.018	0.143	right
Stress response	0.076	0.862	right
Immune response	0.016	0.183	right
Growth factor	0.005	0.372	right
Metal ion transport	0.009	0.020	right
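
The ProtFun tables above can also be read programmatically. The following is a minimal sketch, assuming the tables are available as tab-separated text in the layout shown here; the function names `parse_protfun_table` and `above_cutoff` are hypothetical, and treating an odd score of exactly 2 as passing the cutoff is an assumption. The `=>` marker that ProtFun places next to the top-scoring category is stripped before the numbers are converted.

```python
import csv
import io

# Excerpt of the A4_HUMAN functional-category table shown above (tab-separated).
TABLE = (
    "Functional category\tProbability\tOdd score\tPrediction\n"
    "Amino acid biosynthesis\t0.020\t0.921\tright\n"
    "Biosynthesis of cofactors\t0.261\t3.623\tright\n"
    "Cell envelope\t0.804 =>\t13.186 =>\tright\n"
    "Purines and Pyrimidines\t0.417\t1.716\tright\n"
)


def parse_protfun_table(text):
    """Parse one tab-separated ProtFun table into a list of dicts."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    for row in reader:
        if len(row) != 4:
            continue  # ignore blank or malformed lines
        name, prob, odds, verdict = row
        rows.append({
            "category": name,
            # "=>" flags the top-scoring category; remove it before parsing
            "probability": float(prob.replace("=>", "").strip()),
            "odds": float(odds.replace("=>", "").strip()),
            "top": "=>" in prob,
            "verdict": verdict,
        })
    return rows


def above_cutoff(rows, cutoff=2.0):
    """Return the categories whose odd score reaches the cutoff (assumed inclusive)."""
    return [r["category"] for r in rows if r["odds"] >= cutoff]


rows = parse_protfun_table(TABLE)
selected = above_cutoff(rows)
```

For this excerpt, `selected` contains "Biosynthesis of cofactors" and "Cell envelope", the two categories with an odd score of at least 2.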

Comparison of the different methods

It is difficult to compare these methods. First, GOPET and Pfam are homology-based, whereas ProtFun is an ab initio method, so the results are expected to differ. Second, each method has a different prediction focus and names its results slightly differently. Only GOPET predicts exact GO ids; the other two methods predict only approximate functions and processes.
Therefore, to compare the results, we calculated for each method the fraction of correct predictions among all of its predictions and the ratio between correct predictions and annotated GO terms.
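
The two measures can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the script used for the analysis; the function name `prediction_scores` is hypothetical. The example values are the GOPET counts for HEXA_HUMAN from the table below (7 true positives, 8 predictions, 25 annotated GO terms).

```python
def prediction_scores(true_positives, predictions, annotated_terms=None):
    """Return (fraction of correct predictions, ratio of true positives to annotated GO terms).

    A value of None means the measure cannot be calculated, e.g. the ratio
    for a method that does not predict GO terms from the annotation.
    """
    fraction = true_positives / predictions if predictions else None
    ratio = true_positives / annotated_terms if annotated_terms else None
    return fraction, ratio


# GOPET on HEXA_HUMAN: 7 of 8 predictions are correct, 25 GO terms are annotated.
fraction, ratio = prediction_scores(7, 8, 25)

# ProtFun on HEXA_HUMAN: 31 of 34 category predictions are correct, no ratio available.
protfun_fraction, protfun_ratio = prediction_scores(31, 34)
```

For GOPET this gives 7/8 = 0.875 (listed as 0.87 in the table) and 7/25 = 0.28; for ProtFun the ratio stays undefined.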

		methods
		GOPET terms	GOPET GOids	Pfam	ProtFun
HEXA_HUMAN	#true positive	7	7	2	31
	#false negative	1	1	0	3
	#predictions	8	8	2	34
	#GO terms	25
	true positive (fraction of predictions)	0.87	0.87	1	0.91
	ratio true positive/annotated GO terms	0.28	0.28	0.08	not possible
BACR_HALSA	#true positive	2	1	1	30
	#false negative	1	2	0	4
	#predictions	3	3	1	34
	#GO terms	12
	true positive (fraction of predictions)	0.66	0.33	1	0.88
	ratio true positive/annotated GO terms	0.16	0.08	0.08	not possible
RET4_HUMAN	#true positive	5	5	1	30
	#false negative	3	3	0	4
	#predictions	8	8	1	34
	#GO terms	41
	true positive (fraction of predictions)	0.62	0.62	1	0.88
	ratio true positive/annotated GO terms	0.12	0.12	0.02	not possible
INSL5_HUMAN	#true positive	1	1	1	32
	#false negative	0	0	0	2
	#predictions	1	1	1	34
	#GO terms	4
	true positive (fraction of predictions)	1	1	1	0.94
	ratio true positive/annotated GO terms	0.25	0.25	0.25	not possible
LAMP1_HUMAN	#true positive	0	0	1	33
	#false negative	2	2	0	1
	#predictions	2	2	1	34
	#GO terms	17
	true positive (fraction of predictions)	0	0	1	0.97
	ratio true positive/annotated GO terms	0	0	0.05	not possible
A4_HUMAN	#true positive	7	7	6	33
	#false negative	6	6	0	1
	#predictions	13	13	6	34
	#GO terms	78
	true positive (fraction of predictions)	0.53	0.53	1	0.97
	ratio true positive/annotated GO terms	0.08	0.08	0.07	not possible

As the table above shows, each method predicts only a small subset of the annotated GO terms. Overall, GOPET seems to be the best method: it is the only method that predicts actual GO terms, it has in most cases the best ratio of true positives to annotated GO terms, and it predicts more GO terms than the other methods.
It was not possible to calculate the ratio between true positives and annotated GO terms for ProtFun, because this method works with a fixed set of predefined categories and only predicts the probability that the protein belongs to each of them.
In general, GO term prediction does not work very well, and the prediction results give only hints about the function and localization of the protein.

@@ Line 29: / Line 29: @@
 === Prediction of GO Terms ===
+The last section is about the analysis of GO Terms. As before, we used several methods and compared them to each other.
-''' GOPET (Gene Ontology Term Prediction and Evaluation Tool) '''<br>
+Again, we also provide a [[http://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/GO_terms_general general information page]] about the GO term methods we used in our analysis.
-''Authors:'' Vinayagam A, König R, Moormann J, Schubert F, Eils R, Glatting KH, Suhai S <br>
-''Year:'' 2004<br>
-''Source:'' [[http://www.ncbi.nlm.nih.gov/pmc/articles/PMC517617/?tool=pubmed Applying Support Vector Machines for Gene Ontology based gene function prediction.]]<br>
-''Description:''<br>
-GOPET is a homology-based GO term prediction method. It tries to assign uncharacterised cDNA sequences to GO molecular function terms. In the first step, the method runs a Blast search against GO-mapped proteins in a database. The GO terms and attributes found are then used as input for a Support Vector Machine, which makes the final classification.<br>
-''Input:''<br>
-We used the [[http://genius.embnet.dkfz-heidelberg.de/menu/cgi-bin/w2h-open/w2h.open/w2h.startthis?SIMGO=w2h.welcome&INTRA_CONTINUE=1 Webserver]] for our prediction. It was only necessary to paste our sequences in FASTA format and to submit the job. <br>
-''Output:''<br>
-GOPET returns a table with the predicted GOid, the Aspect (Molecular Function Ontology (F), Biological Process Ontology (P) and Cellular Component Ontology (C)), the confidence for the prediction and the GO term itself.
-<br>
-''' Pfam '''<br>
-''Authors:'' R.D. Finn, J. Mistry, J. Tate, P. Coggill, A. Heger, J.E. Pollington, O.L. Gavin, P. Gunesekaran, G. Ceric, K. Forslund, L. Holm, E.L. Sonnhammer, S.R. Eddy, A. Bateman <br>
-''Year:'' 2010<br>
-''Source:'' [[http://nar.oxfordjournals.org/content/38/suppl_1/D211.full The Pfam protein families database]]<br>
-''Description:'' <br>
-Pfam is also a homology-based prediction method. The domains are stored as hidden Markov models. The method uses a naive Bayes classifier and classifies the proteins with the aid of the hidden Markov models.<br>
-''Input:''<br>
-We used the [[http://pfam.sanger.ac.uk/ Webserver]] for our predictions. We chose the option "Sequence search", pasted the protein sequence in FASTA format and submitted the job. <br>
-''Output:'' <br>
-The webserver shows a graphical representation of the prediction and also the matches. There are two categories of matches, significant and insignificant Pfam A-family matches. These matches are listed with family name, a short description, the entry type, Clan, some information about the HMM and the E-Value. <br>
-''' ProtFun2.2 '''<br>
-''Authors:'' L. Juhl Jensen, R. Gupta, N. Blom, D. Devos, J. Tamames, C. Kesmir, H. Nielsen, H. H. Stærfeldt, K. Rapacki, C. Workman, C. A. F. Andersen, S. Knudsen, A. Krogh, A. Valencia and S. Brunak.<br>
-''Year:'' 2002<br>
-''Source:'' [[http://www.ncbi.nlm.nih.gov/pubmed/12079362 Prediction of human protein function from post-translational modifications and localization features.]]<br>
-''Description:'' <br>
-ProtFun2.2 is an ab initio prediction method, which tries to assign orphan proteins to functional classes. It integrates relevant features which are related to the linear amino acid sequence. Furthermore, it queries a large number of other feature prediction servers (PsiPred, TMHMM and so on). This explains why the prediction with ProtFun is very slow and you have to wait a long time for the prediction result. Technically, this method uses an ensemble of five different neural networks (which are three-layer feed-forward networks).<br>
-''Input:'' <br>
-We used the [[http://www.cbs.dtu.dk/services/ProtFun/ Webserver]] in our prediction. The prediction takes a long time and your request is queued, so you have to wait some hours. For the prediction it is only necessary to paste the sequence in FASTA format into the input field.<br>
-''Output:''<br>
-As output, you get a list of functional categories, each with a probability and an odds score. The probability shows how likely it is that your protein belongs to the class, but it is influenced by the prior probability of the class. The odds score shows whether the sequence belongs to the class or not; we decided to use a cutoff of 2. Furthermore, ProtFun predicts whether your protein is an enzyme, along with the probability and odds score that the protein belongs to each enzyme class. The last section of the result file is the prediction for the gene ontology categories, again with probabilities and odds scores.
-<br>
 == Secondary Structure prediction ==
