Difference between revisions of "Sequence-based predictions HEXA"

Revision as of 21:08, 30 August 2011

General Information

Secondary Structure Prediction

To analyse the secondary structure of our protein, we used PSIPRED and Jpred3 for prediction and DSSP for the secondary structure assignment. In the analysis section of this page we compare these three methods to see whether they give similar results or differ strongly.

[Here] you can find some general information about these methods.

Back to [Tay-Sachs Disease]

Prediction of disordered regions

After analysing the secondary structure, we also wanted to look at disordered regions in this protein. For this we used DISOPRED, POODLE in several variants, IUPred and Meta-Disorder. As with the secondary structure prediction methods, we compare the different methods and variants to see whether their predictions are similar. We also want to decide which method seems to be the best one for our purpose.

To give more insight into the methods and the theory behind them, we also offer a [general information page].

Back to [Tay-Sachs Disease]

Prediction of transmembrane helices and signal peptides

The third big analysis section covers the prediction of transmembrane helices and signal peptides. We merged both predictions into one section, because several prediction methods can predict both.

We used several methods: some predict only transmembrane helices, some predict only signal peptides, and some predict both.

To have a closer look at the different methods we again provide an [information page.]

Back to [Tay-Sachs Disease]

Prediction of GO Terms

The last section is about the analysis of GO Terms. As before, we used several methods and compared them to each other.

Again, we also provide a [general information page] about the GO term methods we used in our analysis.

Back to [Tay-Sachs Disease]

Secondary Structure prediction

Results

The detailed output of the different prediction methods can be found [here]

Here we only present a short summary of the output of the different methods.

Predicted Helices

method	#helices
PSIPRED	14
Jpred3	14
DSSP	16

Predicted Beta-Sheets

method	#sheets
PSIPRED	15
Jpred3	15
DSSP	0

Comparison of the different methods

To determine how successful our secondary structure predictions with PSIPRED and Jpred3 were, we compared them with the secondary structure assignment of DSSP. First of all, DSSP assigns no beta-sheets, whereas both prediction methods predict some. Therefore, the main comparison in this case refers to the alpha-helices.

For PSIPRED the prediction of the alpha-helices was good. In most cases the alpha-helices of DSSP and PSIPRED correspond. Only one helix predicted by PSIPRED is not assigned as a helix by DSSP. In addition, DSSP assigns three helices that PSIPRED does not predict. Most of the helices that appear in only one of the two outputs are very short.

For Jpred3 the prediction of the alpha-helices was sufficiently good. In most cases it agrees with DSSP. Only two helices predicted by Jpred3 are not assigned by DSSP. Conversely, DSSP assigns three short alpha-helices that Jpred3 does not predict. There is one further special case in which DSSP assigns two helices separated by a turn where Jpred3 predicts a single long helix.

All in all, the helix predictions are good, because they mostly agree with the DSSP assignment. The only negative aspect is that both prediction methods predict many beta-sheets, although DSSP assigns none at all.
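The helix-by-helix comparison described above can be sketched in a few lines of Python. This is only an illustration under assumptions: both outputs are taken to be reduced to one three-state string per protein (H = helix, E = strand, C = coil), two helices "correspond" when they share at least one residue, and the strings are invented toy data rather than the HEXA results. The function names are hypothetical.

```python
from itertools import groupby


def segments(ss, state="H"):
    """Return 1-based, inclusive (start, end) runs of `state` in a string."""
    found = []
    pos = 1
    for symbol, run in groupby(ss):
        length = sum(1 for _ in run)
        if symbol == state:
            found.append((pos, pos + length - 1))
        pos += length
    return found


def overlaps(a, b):
    """True if two inclusive (start, end) segments share a residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_helices(predicted, assigned):
    """Summarise helix agreement between a prediction and an assignment."""
    if len(predicted) != len(assigned):
        raise ValueError("both strings must cover the same residues")
    pred = segments(predicted)
    ref = segments(assigned)
    return {
        "predicted": len(pred),
        "assigned": len(ref),
        # helices present in only one of the two outputs
        "only_predicted": [p for p in pred
                           if not any(overlaps(p, r) for r in ref)],
        "only_assigned": [r for r in ref
                          if not any(overlaps(r, p) for p in pred)],
    }


# Toy strings (NOT HEXA data). The second predicted helix covers two
# assigned helices that are separated by a single non-helix residue.
predicted = "CC" + "H" * 5 + "CCC" + "E" * 4 + "CC" + "H" * 9 + "CC"
assigned = "CC" + "H" * 4 + "C" * 10 + "H" * 4 + "C" + "H" * 4 + "CC"
result = compare_helices(predicted, assigned)
print(result)
```

In the toy example the prediction contains two helices and the assignment three, yet no helix is unique to either output. This mirrors the special case above, where one long predicted helix spans two assigned helices.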

Back to [Tay-Sachs Disease]

Prediction of disordered regions

Before analysing the results of the different methods, we checked whether our protein has one or more disordered regions. We searched for our protein in the [DisProt database] and did not find it, so we assume that our protein does not have any disordered regions. Another way to find out whether a protein has disordered regions is to check whether its [UniProt] entry contains a cross-reference to [DisProt].

Results

The detailed results of the different methods can be found [here]

In this section, we only want to give a summary of the output of the different methods.

method	#disordered regions in the protein	#disordered regions on the brink
Disopred	0	2
POODLE-I	3	2
POODLE-L	0	0
POODLE-S (B-factors)	3	2
POODLE-S (missing residues)	4	2
IUPred (short)	0	2
IUPred (long)	0	0
IUPred (structural information)	0	0
Meta-Disorder	0	0

Comparison of the different POODLE variants

POODLE-L does not find any disordered regions. This is the result we expected, because our protein does not possess any disordered regions.

Both POODLE-S variants found several short disordered regions, which are false positives. Interestingly, more residues seem to be missing from the electron density map than have high B-factor values.

POODLE-I gave the same result as POODLE-S (B-factors), which was expected, because POODLE-I combines POODLE-L and POODLE-S (B-factors).

Therefore, the predictions of short disordered regions are wrong. Only the prediction of POODLE-L is correct.

In general, these predictions are used when nothing is known about the protein, so normally we would not know that a prediction is wrong. For that reason we take the results at face value here and check whether the predicted disordered regions overlap with the functionally important residues, because disordered regions often seem to be functionally important. We check this for POODLE-S (missing residues) and POODLE-I, because POODLE-S (B-factors) shows the same result as POODLE-I.

functional residues			disordered
residue position	amino acid	function	POODLE-S (missing)	POODLE-I
323	E	active site	ordered	ordered
115	N	Glycosylation	ordered	ordered
157	N	Glycosylation	ordered	ordered
259	N	Glycosylation	ordered	ordered
58 (connected with 104)	C	Disulfide bond	disordered	ordered
104 (connected with 58)	C	Disulfide bond	disordered	ordered
277 (connected with 328)	C	Disulfide bond	ordered	ordered
328 (connected with 277)	C	Disulfide bond	ordered	ordered
505 (connected with 522)	C	Disulfide bond	ordered	ordered
522 (connected with 505)	C	Disulfide bond	ordered	ordered

As the table above shows, only one disulfide bond (58 to 104, in the POODLE-S prediction based on missing residues) lies in a predicted disordered region; all other functionally important residues lie in ordered regions. This is a further hint that the predictions are wrong.
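The overlap check between functional residues and predicted disordered regions can be sketched as follows. The residue positions are copied from the table above. The disordered region is a hypothetical placeholder, because the exact region boundaries are only given on the detailed result pages; it was chosen so that the example reproduces the "disordered" entries for positions 58 and 104.

```python
# Functional residues of HEXA, copied from the table above.
FUNCTIONAL_RESIDUES = {
    323: "active site",
    115: "glycosylation",
    157: "glycosylation",
    259: "glycosylation",
    58: "disulfide bond",
    104: "disulfide bond",
    277: "disulfide bond",
    328: "disulfide bond",
    505: "disulfide bond",
    522: "disulfide bond",
}

# HYPOTHETICAL region boundaries (1-based, inclusive), for illustration
# only. Replace them with the regions reported by the actual predictor.
EXAMPLE_REGIONS = [(50, 110)]


def is_disordered(position, regions):
    """True if `position` lies inside any inclusive (start, end) region."""
    return any(start <= position <= end for start, end in regions)


def annotate(residues, regions):
    """Map each residue position to 'disordered' or 'ordered'."""
    return {
        pos: "disordered" if is_disordered(pos, regions) else "ordered"
        for pos in sorted(residues)
    }


status = annotate(FUNCTIONAL_RESIDUES, EXAMPLE_REGIONS)
for pos, state in status.items():
    print(pos, FUNCTIONAL_RESIDUES[pos], state)
```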

Comparison of the different methods

We decided to compare the results of the different methods. To do so, we counted how many residues each method predicts as disordered; in our case every such residue is a wrong prediction.

	methods
	Disopred	POODLE-I	POODLE-L	POODLE-S (missing)	POODLE-S (B-factor)	IUPred (short)	IUPred (long)	IUPred (structure)	Meta-Disorder
#wrong predicted residues	5	23	0	47	24	3	0	0	0

POODLE-L, IUPred (long), IUPred (structure) and Meta-Disorder predict the disordered regions correctly, with no residue predicted as disordered. The worst result came from POODLE-S (missing), which predicts 47 residues as disordered, followed by POODLE-S (B-factor) with 24 wrongly predicted residues and POODLE-I with 23.
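As a small illustration, the ranking can be reproduced from the counts in the table above. The helper `residues_in_regions` is a hypothetical function showing how such a count could be derived from a list of predicted regions, assuming the regions are inclusive and do not overlap.

```python
# Residues predicted as disordered, copied from the table above.
WRONG_RESIDUES = {
    "Disopred": 5,
    "POODLE-I": 23,
    "POODLE-L": 0,
    "POODLE-S (missing)": 47,
    "POODLE-S (B-factor)": 24,
    "IUPred (short)": 3,
    "IUPred (long)": 0,
    "IUPred (structure)": 0,
    "Meta-Disorder": 0,
}


def residues_in_regions(regions):
    """Count residues covered by inclusive, non-overlapping regions."""
    return sum(end - start + 1 for start, end in regions)


def rank_methods(counts):
    """Sort methods from most to fewest wrongly predicted residues."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


ranking = rank_methods(WRONG_RESIDUES)
perfect = [method for method, count in WRONG_RESIDUES.items() if count == 0]

for method, count in ranking:
    print(f"{method}: {count}")
```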

Back to [Tay-Sachs Disease]

Prediction of transmembrane alpha-helices and signal peptides

Because most of the proteins we used in this practical are not membrane proteins, we got five additional proteins for the transmembrane and signal peptide analyses.

Additional proteins:

name	organism	location	transmembrane protein	sequence
BACR_HALSA	Halobacterium salinarium (Archaea)	Cell membrane	Multi-pass membrane protein	[P02945.fasta]
RET4_HUMAN	Human (Homo sapiens)	extracellular space	No	[P02753.fasta]
INSL5_HUMAN	Human (Homo sapiens)	extracellular region	No	[Q9Y5Q6.fasta]
LAMP1_HUMAN	Human (Homo sapiens)	Cell membrane	Single-pass membrane protein	[P11279.fasta]
A4_HUMAN	Human (Homo sapiens)	Cell membrane	Single-pass membrane protein	[P05067.fasta]

The detailed output for the different proteins and the different prediction methods can be found here:

Results

Transmembrane Helices

	TMHMM			Phobius			PolyPhobius			OCTOPUS			SPOCTOPUS
protein	start position	end position	location	start position	end position	location	start position	end position	location	start position	end position	location	start position	end position	location
HEXA HUMAN	1	529	outside	23	529	outside	20	520	outside	1	2	inside	22	529	outside
										3	23	TM helix
										24	529	outside
BACR HALSA	1	22	outside							1	22	outside	1	22	outside
	23	42	TM Helix	23	42	TM helix	22	43	TM helix	23	43	TM helix	23	43	TM helix
	43	54	inside	43	53	inside	44	54	inside	44	54	inside	44	54	inside
	55	77	TM Helix	54	76	TM helix	55	77	TM helix	55	75	TM helix	55	75	TM helix
	78	91	outside	77	95	outside	78	94	outside	76	95	outside	76	95	outside
	92	114	TM Helix	96	114	TM helix	95	114	TM helix	96	116	TM helix	96	116	TM helix
	115	120	inside	115	120	inside	115	120	inside	117	121	inside	117	120	inside
	121	143	TM Helix	121	142	TM helix	121	141	TM helix	122	142	TM helix	121	141	TM helix
	144	147	outside	143	147	outside	142	147	outside	143	147	outside	142	147	outside
	148	170	TM Helix	148	169	TM helix	148	166	TM helix	148	168	TM helix	148	168	TM helix
	171	189	inside	170	189	inside	167	186	inside	169	185	inside	169	185	inside
	190	212	TM Helix	190	212	TM helix	187	205	TM helix	186	206	TM helix	186	206	TM helix
	213	262	outside	213	217	outside	206	215	outside	207	216	outside	207	216	outside
				218	237	TM helix	216	237	TM helix	217	237	TM helix	217	237	TM helix
				238	262	inside	238	262	inside	238	262	inside	238	262	inside
RET4 HUMAN										1	1	inside
										2	23	TM helix
	1	201	outside	19	201	outside	19	201	outside	24	201	outside	20	201	outside
INSL5 HUMAN										1	1	inside
										2	32	TM helix
	1	135	outside	23	135	outside	23	135	outside	33	135	outside	24	135	outside
LAMP1 HUMAN	1	10	inside							1	10	inside
	11	33	TM Helix							11	31	TM helix
	34	383	outside	29	381	outside	29	381	outside	32	383	outside	30	383	outside
	384	406	TM Helix	382	405	TM helix	382	405	TM helix	384	404	TM helix	384	404	TM helix
	407	417	inside	406	417	outside	406	417	outside	405	417	outside	405	417	outside
A4 HUMAN										1	5	outside
										6	11	R
	1	700	outside	18	700	outside	18	700	outside	12	701	outside	19	701	outside
	701	723	TM Helix	701	723	TM helix	701	723	TM helix	702	722	TM helix	702	722	TM helix
	724	770	inside	724	770	inside	724	770	inside	723	770	inside	723	770	inside

The table above summarises the results of the different methods that predict transmembrane helices.
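To work with such a segment table programmatically, each prediction can be stored as a list of (start, end, label) rows. The sketch below copies the BACR_HALSA rows for TMHMM and Phobius from the table above, counts the predicted transmembrane helices and expands the rows into a per-residue string. The function names and the one-letter symbols (o, i, M, and "-" for residues without a row) are assumptions made for this example.

```python
# BACR_HALSA rows copied from the table above: (start, end, label).
TMHMM = [
    (1, 22, "outside"), (23, 42, "TM Helix"), (43, 54, "inside"),
    (55, 77, "TM Helix"), (78, 91, "outside"), (92, 114, "TM Helix"),
    (115, 120, "inside"), (121, 143, "TM Helix"), (144, 147, "outside"),
    (148, 170, "TM Helix"), (171, 189, "inside"), (190, 212, "TM Helix"),
    (213, 262, "outside"),
]
PHOBIUS = [
    (23, 42, "TM helix"), (43, 53, "inside"), (54, 76, "TM helix"),
    (77, 95, "outside"), (96, 114, "TM helix"), (115, 120, "inside"),
    (121, 142, "TM helix"), (143, 147, "outside"), (148, 169, "TM helix"),
    (170, 189, "inside"), (190, 212, "TM helix"), (213, 217, "outside"),
    (218, 237, "TM helix"), (238, 262, "inside"),
]

SYMBOLS = {"outside": "o", "inside": "i", "tm helix": "M"}


def count_tm_helices(rows):
    """Number of rows labelled as transmembrane helix (case-insensitive)."""
    return sum(1 for _, _, label in rows if label.lower() == "tm helix")


def to_topology_string(rows, length):
    """Expand rows into one symbol per residue; '-' marks residues
    that are not covered by any row."""
    topology = ["-"] * length
    for start, end, label in rows:
        for pos in range(start, end + 1):
            topology[pos - 1] = SYMBOLS[label.lower()]
    return "".join(topology)


print("TMHMM helices:", count_tm_helices(TMHMM))
print("Phobius helices:", count_tm_helices(PHOBIUS))
print(to_topology_string(PHOBIUS, 262)[:60])
```

With these rows, TMHMM yields six helices and Phobius seven for BACR_HALSA.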

Signal Peptide

	Phobius		PolyPhobius		SPOCTOPUS		TargetP	SignalP
protein	start position	end position	start position	end position	start position	end position	location	start position	end position
HEXA HUMAN	1	22	1	19	7	21	secretory pathway	1	22
BACR HALSA	no prediction available						secretory pathway	1	38
RET4 HUMAN	1	18	1	18	6	19	secretory pathway	1	18
INSL5 HUMAN	1	22	1	22	6	23	secretory pathway	1	22
LAMP1 HUMAN	1	28	1	28	12	29	secretory pathway	1	28
A4 HUMAN	1	17	1	17	5	18	secretory pathway	1	15

The table above lists the signal peptides predicted by the different methods.

Comparison of the different methods

We decided to split the comparison of the methods, because it is unfair to directly compare a method that cannot predict signal peptides with a method that can. We therefore made one comparison for transmembrane helices, one for signal peptides, and one for the methods that predict both.

Comparison of transmembrane helix prediction

Here we compared TMHMM, OCTOPUS and the transmembrane predictions of SPOCTOPUS, Phobius and PolyPhobius. In this comparison we skipped the first residues, which form the signal peptide, because methods that predict only transmembrane helices often predict this region as a transmembrane helix, which is wrong.
For this comparison we counted the wrongly predicted transmembrane residues, the wrongly predicted outside residues and the wrongly predicted inside residues.
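The counting scheme can be sketched as follows, under some assumptions: the reference and the prediction are given as per-residue strings (o = outside, i = inside, M = transmembrane), a wrong residue is attributed to the class that was predicted for it, and the percentage is taken over the residues that were actually compared. The strings are toy data and `count_wrong` is a hypothetical helper, not the procedure used for the table.

```python
LABELS = {"M": "transmembrane", "o": "outside", "i": "inside"}


def count_wrong(reference, predicted, skip=0):
    """Count wrongly predicted residues per predicted class.

    `skip` is the number of leading residues (for example the signal
    peptide) that are left out of the comparison.
    """
    if len(reference) != len(predicted):
        raise ValueError("topology strings must have the same length")
    wrong = {name: 0 for name in LABELS.values()}
    compared = 0
    for ref, pred in zip(reference[skip:], predicted[skip:]):
        compared += 1
        if ref != pred:
            wrong[LABELS[pred]] += 1
    total = sum(wrong.values())
    percent = 100.0 * total / compared if compared else 0.0
    return {**wrong, "sum": total, "percent": percent}


# Toy strings, NOT taken from the proteins analysed here.
reference = "oooMMMMiii"
predicted = "ooMMMMMiio"

print(count_wrong(reference, predicted))
print(count_wrong(reference, predicted, skip=3))
```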


		methods
		TMHMM	Phobius	PolyPhobius	OCTOPUS	SPOCTOPUS	Transmembrane protein
HEXA_HUMAN	#wrong transmembrane	0	0	0	0	0	no
	#wrong outside	0	0	0	0	0
	#wrong inside	0	0	0	0	0
	#wrong sum	0	0	0	0	0
	%wrong predicted	0%	0%	0%	0%	0%
BACR_HALSA	#wrong transmembrane	24	20	12	16	11	yes (7 transmembrane helices)
	#wrong outside	46	5	3	4	6
	#wrong inside	4	4	2	0	0
	#wrong sum	74	29	17	20	17
	%wrong predicted	29%	11%	6%	8%	6%

RET4_HUMAN	#wrong transmembrane	0	0	0	5	0	no
	#wrong outside	0	0	0	0	0
	#wrong inside	0	0	0	0	0
	#wrong sum	0	0	0	5	0
	%wrong predicted	0%	0%	0%	2%	0%

INSL5_HUMAN	#wrong transmembrane	0	0	0	10	0	no
	#wrong outside	0	0	0	0	0
	#wrong inside	0	0	0	0	0
	#wrong sum	0	0	0	10	0
	%wrong predicted	0%	0%	0%	8%	0%

LAMP1_HUMAN	#wrong transmembrane	5	3	4	3	1	yes (single-spanning)
	#wrong outside	2	0	0	1	1
	#wrong inside	0	0	0	1	1
	#wrong sum	7	3	4	5	3
	%wrong predicted	2%	0%	1%	1%	0%

A4_HUMAN	#wrong transmembrane	0	0	0	0	0	yes (single-spanning)
	#wrong outside	1	1	1	1	2
	#wrong inside	0	0	0	1	1
	#wrong sum	1	1	1	2	3
	%wrong predicted	0%	0%	0%	0%	0%
Average number of wrong predicted residues
		13.6	5.5	3.6	7	3.8

TMHMM is the worst prediction method. This can also be seen in the example of BACR_HALSA, because TMHMM is the only prediction method that does not recognise all seven transmembrane helices. SPOCTOPUS and PolyPhobius are the best prediction methods.

In general, the prediction of transmembrane helices works quite well, and almost all predictions are very close to the real protein.
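The averages in the last table row can be recomputed from the "#wrong sum" rows. The sketch below copies those sums from the table above and averages them over the six proteins. The recomputed values round to 13.7 and 3.7 where the table lists 13.6 and 3.6, so the table values appear to be truncated rather than rounded.

```python
from statistics import mean

# "#wrong sum" per protein, copied from the table above, in the order
# HEXA, BACR, RET4, INSL5, LAMP1, A4.
WRONG_SUM = {
    "TMHMM": [0, 74, 0, 0, 7, 1],
    "Phobius": [0, 29, 0, 0, 3, 1],
    "PolyPhobius": [0, 17, 0, 0, 4, 1],
    "OCTOPUS": [0, 20, 5, 10, 5, 2],
    "SPOCTOPUS": [0, 17, 0, 0, 3, 3],
}

averages = {method: mean(sums) for method, sums in WRONG_SUM.items()}
best = min(averages, key=averages.get)
worst = max(averages, key=averages.get)

for method, value in averages.items():
    print(f"{method}: {value:.2f}")
print("best:", best, "worst:", worst)
```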

Comparison of signal peptide prediction

Here we compared TargetP and SignalP, which only predict signal peptides, together with the signal peptide predictions of SPOCTOPUS, Phobius and PolyPhobius. TargetP does not predict the start and end position of the signal peptide; it predicts only the location of the protein.


		methods
		real position	Phobius	PolyPhobius	SPOCTOPUS	TargetP	SignalP
HEXA_HUMAN	stop position	22	22	19	21	no prediction	22
	#wrong residues		0	3	3	no prediction	0
	location	secretory pathway	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction
BACR_HALSA	stop position	not available	no prediction	no prediction	no prediction	no prediction	no consensus prediction
	#wrong predicted	not available	not available	not available	not available	no prediction	not available
	location	membrane	not available	not available	not available	secretory pathway	non-signal peptide

RET4_HUMAN	stop position	18	18	18	19	no prediction	18
	#wrong predicted		0	0	1	no prediction	0
	location	secretory pathway	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction

INSL5_HUMAN	stop position	22	22	22	22	no prediction	22
	#wrong residues		0	0	0	no prediction	0
	location	secretory pathway	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction

LAMP1_HUMAN	stop position	28	28	28	29	no prediction	28
	#wrong residues		0	0	1	no prediction	0
	location	transmembrane helix	secretory pathway	secretory pathway	no prediction	secretory pathway	no prediction

A4_HUMAN	stop position	17	17	17	18	no prediction	17
	#wrong residues		0	0	1	no prediction	0
	location	transmembrane helix	secretory pathway	secretory pathway	no prediction	secretory pathway	secretory pathway
Average number of wrong prediction
	sum of wrong predicted residues		0	3	2	no prediction	0
	#right predicted locations / #predicted locations		3/5	3/5	no prediction	3/5	no prediction

SPOCTOPUS and SignalP do not predict the location of the protein; they only predict the start and stop position of the signal peptide. In addition, SignalP predicts whether the sequence contains a signal peptide at all. In contrast, TargetP only predicts the location of the protein, not the start and stop position of the signal peptide. Only Phobius and PolyPhobius predict both.
This makes it difficult to compare the different methods. Phobius and PolyPhobius offer more than the other prediction methods, because they predict both, and on average they predict the location and the position as well as the other methods do. None of the methods recognised the transmembrane proteins; all methods assigned them to the secretory pathway. It is therefore useful to use Phobius or PolyPhobius, because they predict more than the other methods and can also predict transmembrane helices. The results of Phobius were slightly better than those of PolyPhobius.
We also want to mention that SignalP lets you choose between prediction modes for eukaryotes, gram-positive bacteria and gram-negative bacteria. Our analysis also included BACR_HALSA, which is an archaeal protein. We tested all three modes on this protein, and all three failed: BACR_HALSA does not possess a signal peptide, but every mode predicts one. Only the eukaryotic mode recognised a signal anchor for BACR_HALSA, whereas the other two modes could not give a prediction of the location.
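The number of wrong residues for a signal peptide prediction can be read as the distance between the predicted and the real stop position. The sketch below assumes exactly that definition and uses the stop positions from the comparison table above for Phobius, PolyPhobius and SignalP. BACR_HALSA is left out because no real stop position is available for it.

```python
# Real signal peptide stop positions, copied from the table above.
REAL_STOP = {
    "HEXA_HUMAN": 22,
    "RET4_HUMAN": 18,
    "INSL5_HUMAN": 22,
    "LAMP1_HUMAN": 28,
    "A4_HUMAN": 17,
}

# Predicted stop positions, copied from the same table.
PREDICTED_STOP = {
    "Phobius": {"HEXA_HUMAN": 22, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
    "PolyPhobius": {"HEXA_HUMAN": 19, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                    "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
    "SignalP": {"HEXA_HUMAN": 22, "RET4_HUMAN": 18, "INSL5_HUMAN": 22,
                "LAMP1_HUMAN": 28, "A4_HUMAN": 17},
}


def wrong_residues(real_stop, predicted_stop):
    """Residues by which the predicted stop misses the real stop."""
    return abs(real_stop - predicted_stop)


totals = {
    method: sum(wrong_residues(REAL_STOP[protein], stop)
                for protein, stop in stops.items())
    for method, stops in PREDICTED_STOP.items()
}
print(totals)
```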

Comparison of the combined methods

The last thing we wanted to compare was the combined methods. SPOCTOPUS, Phobius and PolyPhobius can predict transmembrane helices as well as signal peptides. Therefore, we combined our two previous comparisons.


		methods
		Phobius	PolyPhobius	SPOCTOPUS
HEXA_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	0	3	2
	location	right	right	no prediction
BACR_HALSA	#wrong predicted residues (TM)	29	17	17
	#wrong predicted residues (SP)	n.a.	n.a.	n.a.
	location	n.a	n.a	no prediction

RET4_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	0	0	0
	location	right	right	no prediction

INSL5_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	0	0	1
	location	right	right	no prediction

LAMP1_HUMAN	#wrong predicted residues (TM)	3	4	3
	#wrong predicted residues (SP)	0	0	0
	location	wrong	wrong	no prediction

A4_HUMAN	#wrong predicted residues (TM)	0	0	0
	#wrong predicted residues (SP)	1	1	3
	location	wrong	wrong	no prediction
Average
	avg(#wrong predicted residues (TM))	5.3	3.5	3.3
	avg(#wrong predicted residues (SP))	0.1	0.6	1
	#location (right predicted) / #location(predicted)	3/5	3/5	no prediction

In general, PolyPhobius gave the best results. Although it predicts the signal peptide stop position slightly worse than Phobius, its transmembrane prediction is clearly better than that of Phobius. The predictions of SPOCTOPUS are also good, but unfortunately SPOCTOPUS does not predict the location of the protein.
Therefore, PolyPhobius seems a good choice, because on average it is the best method for combined transmembrane and signal peptide prediction.

Prediction of GO terms

Before we started our analysis, we checked the GO annotations for the six sequences, which can be found [here]:

A detailed list of the GO annotation terms of each protein can be found [here].

Results

We created a separate result page for each protein. Unfortunately, the results cannot be summarised briefly, so please see the individual result pages for the detailed output.

Comparison of the different methods

It is difficult to compare these methods. First, two methods are homology-based, whereas ProtFun is an ab initio method, so it is clear that the results differ. Second, each method has a different prediction focus and names its results slightly differently. Only GOPET predicts exact GO numbers; the other two methods only predict approximate functions and processes.
Therefore, to compare the results, we calculated the fraction of correct predictions and the ratio between correct predictions and annotated GO terms.

		methods
		GOPET terms	GOPET GOids	Pfam	ProtFun
HEXA_HUMAN	#true positive	7	7	2	31
	#false positive	1	1	0	3
	#predictions	8	8	2	34
	#GO terms	25
	fraction true positive	0.87	0.87	1	0.91
	ratio true positive/annotated GO terms	0.28	0.28	0.08	not possible
BACR_HALSA	#true positive	2	1	1	30
	#false positive	1	2	0	4
	#predictions	3	3	1	34
	#GO terms	12
	fraction true positive	0.66	0.33	1	0.88
	ratio true positive/annotated GO terms	0.16	0.08	0.08	not possible
RET4_HUMAN	#true positive	5	5	1	30
	#false positive	3	3	0	4
	#predictions	8	8	1	34
	#GO terms	41
	fraction true positive	0.62	0.62	1	0.88
	ratio true positive/annotated GO terms	0.12	0.12	0.02	not possible
INSL5_HUMAN	#true positive	1	1	1	32
	#false positive	0	0	0	2
	#predictions	1	1	1	34
	#GO terms	4
	fraction true positive	1	1	1	0.94
	ratio true positive/annotated GO terms	0.25	0.25	0.25	not possible
LAMP1_HUMAN	#true positive	0	0	1	33
	#false positive	2	2	0	1
	#predictions	2	2	1	34
	#GO terms	17
	fraction true positive	0	0	1	0.97
	ratio true positive/annotated GO terms	0	0	0.05	not possible
A4_HUMAN	#true positive	7	7	6	33
	#false positive	6	6	0	1
	#predictions	13	13	6	34
	#GO terms	78
	fraction true positive	0.53	0.53	1	0.97
	ratio true positive/annotated GO terms	0.08	0.08	0.07	not possible

As you can see in the table above, each method predicts only a small subset of the annotated GO terms. In general, GOPET seems to be the best method, because it is the only method that predicts the GO terms themselves, it mostly has the best ratio of true positives to annotated GO terms, and it predicts more GO terms than the other methods.
It was not possible to calculate the ratio between true positives and annotated GO terms for ProtFun, because this method uses its own predefined terms and only predicts the probability that the protein belongs to each of them.
In general, GO term prediction does not work very well, and the prediction results only give hints about the function and localization of the protein.
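The two ratios in the table above can be reproduced with two small functions. This is a minimal sketch with hypothetical function names; the input numbers are the HEXA_HUMAN values for GOPET and Pfam from the table. The table appears to truncate to two decimals, so 7/8 = 0.875 is listed as 0.87.

```python
def fraction_correct(true_positives, predictions):
    """Fraction of predictions that are correct; None without predictions."""
    if predictions == 0:
        return None
    return true_positives / predictions


def annotated_ratio(true_positives, annotated_terms):
    """Ratio of correct predictions to annotated GO terms."""
    if annotated_terms == 0:
        return None
    return true_positives / annotated_terms


# HEXA_HUMAN values from the table above:
# (true positives, predictions, annotated GO terms)
HEXA = {
    "GOPET terms": (7, 8, 25),
    "Pfam": (2, 2, 25),
}

for method, (tp, predicted, annotated) in HEXA.items():
    print(method,
          fraction_correct(tp, predicted),
          annotated_ratio(tp, annotated))
```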

