Latest revision as of 14:04, 14 June 2011

by Benjamin Drexler and Fabian Grandke

Find additional information and graphics here.

Proteins

Protein	GLA	BACR_HALSA	RET4_HUMAN	INSL5_HUMAN	LAMP1_HUMAN	A4_HUMAN
Organism	Homo sapiens	Halobacterium salinarium	Homo sapiens	Homo sapiens	Homo sapiens	Homo sapiens
Size	429 AA	262 AA	201 AA	135 AA	417 AA	770 AA
Subcellular location	Lysosome	Cell membrane	Secreted	Secreted	Cell membrane	Membrane
Function	Glycosidase	Photoreceptor protein	Sensory transduction	Hormone	Carbohydrate presentation	Serine protease inhibitor
Transmembrane	No	7 transmembrane regions	No	No	1 transmembrane region	1 transmembrane region
UniProt entry	P06280	P02945	P02753	Q9Y5Q6	P11279	P05067

Secondary structure prediction

PSIPRED

http://bioinf.cs.ucl.ac.uk/psipred/

PSIPRED was developed by David T. Jones at the University of Warwick in 1998. Today the server runs at University College London. <ref name=PSIPRED>History of PSIPRED</ref>

PSIPRED predicts secondary structure using feed-forward neural networks with a single hidden layer, trained by back-propagation.<ref name=Praesi>Talk_Task3</ref> The workflow can be split into three stages:

Sequence profile generation: the neural network receives a position-specific scoring matrix from PSI-BLAST as input
Initial secondary structure prediction: the output layer predicts one of the three secondary structure states
Predicted structure filtering: an additional network filters the raw predictions from the previous step
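The first stage can be illustrated with a short sketch of how a position-specific scoring matrix might be turned into per-residue network inputs. This is not PSIPRED's actual code: the function name `window_features`, the zero padding and the window width are assumptions made for illustration only.

```python
from typing import List


def window_features(pssm: List[List[float]], half_window: int = 7) -> List[List[float]]:
    """Build one flat input vector per residue from a sliding window over a PSSM.

    Rows that fall outside the sequence are padded with zeros
    (an assumption of this sketch, not a documented PSIPRED detail).
    """
    n_cols = len(pssm[0]) if pssm else 0
    padding = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vec: List[float] = []
        for j in range(i - half_window, i + half_window + 1):
            row = pssm[j] if 0 <= j < len(pssm) else padding
            vec.extend(row)
        features.append(vec)
    return features


# Toy PSSM with 3 residues and 2 columns (a real PSSM has 20 columns).
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_features = window_features(toy_pssm, half_window=1)
```

Each vector in `toy_features` would then be fed to the network, which outputs one score per secondary structure state for the central residue.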

PSIPRED takes an amino acid sequence as input. The output is the predicted secondary structure, as shown in Figure 1.

Prediction

Figure 1: PSIPRED result for GLA

Figure 1 shows 10 alpha helices, 16 beta strands and 27 coils.

Jpred3

http://www.compbio.dundee.ac.uk/www-jpred/index.html

Jpred3 was developed by C. Cole at the University of Dundee. Like PSIPRED, it uses a neural network to predict the secondary structure. For a single input sequence the program also uses PSI-BLAST sequence profiles. Jpred3 can take multiple sequence alignments as input as well. Both kinds of input are further processed by the Jpred algorithm. <ref name=jpred>Cole et al., "The Jpred 3 secondary structure prediction server.", Nucleic acids research. 2008 Jul 1, PubMed</ref>

The table below shows the PDB entries found by Jpred3 concerning our input sequence:

EBI	Chain	Description	E-value
3hg5	B	Alpha-galactosidase A	0.0
3hg5	A	Alpha-galactosidase A	0.0
3hg4	B	Alpha-galactosidase A	0.0
3hg4	A	Alpha-galactosidase A	0.0
3hg2	B	Alpha-galactosidase A	0.0
3hg2	A	Alpha-galactosidase A	0.0
3gxt	B	Alpha-galactosidase A	0.0
3gxt	A	Alpha-galactosidase A	0.0
3gxp	B	Alpha-galactosidase A	0.0
3gxp	A	Alpha-galactosidase A	0.0
3gxn	B	Alpha-galactosidase A	0.0
3gxn	A	Alpha-galactosidase A	0.0
1r47	B	Alpha-galactosidase A	0.0
1r47	A	Alpha-galactosidase A	0.0
1r46	B	Alpha-galactosidase A	0.0
1r46	A	Alpha-galactosidase A	0.0
3hg3	B	Alpha-galactosidase A	0.0
3hg3	A	Alpha-galactosidase A	0.0
3lxc	B	Alpha-galactosidase A	0.0
3lxc	A	Alpha-galactosidase A	0.0
3lxb	B	Alpha-galactosidase A	0.0
3lxb	A	Alpha-galactosidase A	0.0
3lxa	B	Alpha-galactosidase A	0.0
3lxa	A	Alpha-galactosidase A	0.0
3lx9	B	Alpha-galactosidase A	0.0
3lx9	A	Alpha-galactosidase A	0.0
1ktc	A	alpha-N-acetylgalactosaminidase	e-113
1ktb	A	alpha-N-acetylgalactosaminidase	e-113
3igu	B	Alpha-N-acetylgalactosaminidase	e-100
3igu	A	Alpha-N-acetylgalactosaminidase	e-100
3h55	B	Alpha-N-acetylgalactosaminidase	e-100
3h55	A	Alpha-N-acetylgalactosaminidase	e-100
3h54	B	Alpha-N-acetylgalactosaminidase	e-100
3h54	A	Alpha-N-acetylgalactosaminidase	e-100
3h53	B	Alpha-N-acetylgalactosaminidase	e-100
3h53	A	Alpha-N-acetylgalactosaminidase	e-100

The protein highlighted in light blue is the one that was used as the query sequence.

We ignored those results and forced Jpred to make a prediction anyway. The program's output included, among other things, a prediction of the secondary structure.

Comparison with DSSP

http://swift.cmbi.ru.nl/servers/html/

DSSP was developed by W. Kabsch and C. Sander in 1983. It is a database containing secondary structure assignments for every protein in the PDB. It is not a prediction program itself: the DSSP entries are calculated from the 3D coordinates of the PDB entries, and we use them as the reference against which the predictions are compared.<ref name="dssp">Kabsch et al., "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.", Biopolymers, 1983, PubMed</ref>

Figure 2 is a color-coded picture of the secondary structure assignment by DSSP. The letter code used by DSSP has been translated to the three-state code H = helix, E = strand and C = coil.
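The translation of the DSSP letter code to three states can be sketched as follows. The exact mapping used for Figure 2 is not stated here, so the sketch assumes one commonly used reduction (H, G, I to H; E, B to E; everything else to C). The names `DSSP_TO_3STATE` and `reduce_dssp` are hypothetical.

```python
# Assumed reduction of the DSSP alphabet to three states:
# helices (H, G, I) -> H, strands and bridges (E, B) -> E, all other codes -> C.
DSSP_TO_3STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(dssp: str) -> str:
    """Translate a DSSP assignment string into the H/E/C code."""
    return "".join(DSSP_TO_3STATE.get(symbol, "C") for symbol in dssp)


# Made-up DSSP string for illustration (blank = no assignment).
example_3state = reduce_dssp("  HHHHGGTT EEEEB S")
```

Other reductions exist (for example, treating G or B as coil), and the choice changes the number of elements counted, so the mapping should be reported together with any comparison.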

Figure 2: Result of DSSP

Find a pdf version of Figure 2 here: File:GLA DSSP Comp.pdf

Results

Structural Element	PSIPRED	Jpred3	DSSP
Helices	10	10	12
Strands	16	15	16
Coils	27	26	29

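The results of Jpred3 and PSIPRED are very similar to the reference assignment by DSSP: the numbers of predicted elements differ from DSSP by at most two helices, one strand and three coils. Although they are not identical, the predictions are very close and can thus be considered successful.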
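Element counts like those in the table can be derived from a three-state string by counting contiguous runs. The sketch below also computes the per-residue agreement (often called Q3), which the page does not report. The function names and the toy strings are assumptions for illustration and are not the GLA data.

```python
from itertools import groupby


def count_segments(ss: str) -> dict:
    """Count contiguous runs of each state in a three-state string."""
    counts = {"H": 0, "E": 0, "C": 0}
    for state, _run in groupby(ss):
        counts[state] = counts.get(state, 0) + 1
    return counts


def q3(predicted: str, reference: str) -> float:
    """Fraction of positions at which prediction and reference agree."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


toy_counts = count_segments("CCHHHHCCEEECC")
toy_q3 = q3("CHHHEEC", "CHHHHEC")
```

Counting elements alone can hide shifted or mismatched segments, which is why a per-residue measure is a useful complement.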

Prediction of disordered regions

As GLA was not found in the DisProt database we tried to predict disordered regions with several tools.

DISOPRED

An online version of DISOPRED is available at http://bioinf.cs.ucl.ac.uk/disopred/, but we also ran it locally. To do so, we first had to adapt some paths and then ran it using the command sudo ./runpsipred.

DISOPRED was developed by JJ. Ward, JS. Sodhi, LJ. McGuffin, BF. Buxton and DT. Jones in 2004. A neural network is used to predict disordered regions. As it is a knowledge-based method, the DISOPRED neural network is trained with X-ray structures from the PDB. The program takes a sequence as input and runs PSI-BLAST against a database. The trained neural network then evaluates the residue-wise profiles and classifies each residue as disordered or not disordered. <ref name=disopred>Ward et al., "Prediction and functional analysis of native disorder in proteins from the three kingdoms of life", Journal of Molecular Biology. 2004, PubMed</ref> <ref name=Praesi>Talk_Task3</ref>

Figure 3: Prediction of disordered regions by DISOPRED

Figure 3 shows that DISOPRED predicts two disordered regions: one at the very beginning of the protein sequence and the other at the end. The light grey dotted line shows that another region was predicted at position ~35 as well, but because DISOPRED filters the results, the peak is smoothed out.

POODLE

http://mbs.cbrc.jp/poodle/poodle.html

POODLE was developed by S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi in 2007. It predicts disordered regions using a machine-learning approach.

There are four different variants available:<ref name=poodlehp>POODLE Help Page</ref>

POODLE-L<ref name=poodle_l>Hirose et al., "POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions.", PharmaDesign. 2007, PubMed</ref>: Predicts long disordered regions (>40 consecutive residues).

POODLE-I<ref name=poodle_i>POODLE-I</ref>: Uses structural information predictors based on a work-flow approach.

POODLE-S<ref name=poodle_s>Shimizu et al., "POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix.", Bioinformatics. 2007, PubMed</ref>: Predicts short disordered regions. Has two subversions differing in the preparation of the databases:
- Missing residues: Missing regions in X-ray structures.
- High B-Factor residues: Regions with high B-Factors.

POODLE-W<ref name=poodle_w>Shimizu et al., "Predicting mostly disordered proteins by using structure-unknown protein data.", Bioinformatics. 2007, PubMed</ref>: Specialized on mostly disordered proteins.

Only POODLE-S was used for the analysis: because not even short disordered regions were found in our protein, long disordered regions are even less likely. All variants need only an amino acid sequence as input and provide both a picture and a textual output of the predicted regions.

POODLE-S: Missing residues

Figure 4: Prediction of disordered regions by POODLE-S, with "Missing Residues" parameter

Figure 4 shows two disordered regions, one at each end of the amino acid sequence, predicted by POODLE-S using the "Missing Residues" option. Additionally, there are two peaks, at positions ~35 and ~400, that come close to the disorder cutoff value.

POODLE-S: High B-Factor residues

Figure 5: Prediction of disordered regions by POODLE-S, with "High B-Factor Residues" parameter

Figure 5 shows the prediction by POODLE-S using the "High B-Factor residues" option. Only one disordered region is predicted, at the very beginning of the protein sequence. There are many small peaks in the curve, so the result is very noisy.

IUPRED

http://iupred.enzim.hu/index.html

IUPRED was developed by Zsuzsanna Dosztányi, Veronika Csizmók, Péter Tompa and István Simon in 2005. It is a prediction method for ordered and disordered regions in protein sequences. IUPRED is based on an energy function and the assumption that there are fewer inter-residue interactions in disordered regions. <ref name=poodle_l>Dosztányi et al., "IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.", Bioinformatics. 2005, PubMed</ref> The input of IUPRED is just an amino acid sequence. The output is a file containing a predicted value for each position in the sequence. We used R scripts to create the figures.
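Turning per-position values into disordered regions amounts to thresholding and grouping consecutive positions. The sketch below is a minimal illustration in Python and is not the R script used for the figures. The cutoff of 0.5, the function name `disordered_regions` and the toy scores are assumptions.

```python
from typing import List, Tuple


def disordered_regions(scores: List[float], cutoff: float = 0.5,
                       min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) ranges whose scores are at or above the cutoff."""
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score >= cutoff:
            if start is None:
                start = pos          # a new region opens here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    # Close a region that runs up to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Made-up scores for illustration, not IUPRED output for GLA.
toy_scores = [0.9, 0.8, 0.2, 0.1, 0.6, 0.7, 0.3, 0.55]
toy_regions = disordered_regions(toy_scores)
toy_regions_min2 = disordered_regions(toy_scores, min_length=2)
```

A `min_length` above 1 suppresses isolated single-residue peaks, which matters for noisy curves.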

Short Disorder

Figure 6: Prediction of disordered regions by IUPRED with "Short Disorder" parameter

Figure 6 shows three disordered regions, predicted by IUPRED using the "Short Disorder" option. The first one is at the beginning of the sequence, the next one at position ~107 and the last one at the end of the sequence. Additionally, there are two significant peaks at positions ~275 and ~320, but the predicted values are smaller than the cutoff, so they are not predicted as disordered regions.

Long Disorder

Figure 7: Prediction of disordered regions by IUPRED with "Long Disorder" parameter

Figure 7 shows the noisy prediction of IUPRED in the "Long Disorder" mode. It predicts only one disordered region at position ~107.

META-Disorder

http://www.predictprotein.org/

Hint: You will have to register. Registration is free of charge, but you can submit at most 3 sequences within 12 months.

The program takes an amino acid sequence as input. The resulting file consists of the predictions of the programs described below. We used an R script to create the figure. Metadisorder was developed by Avner Schlessinger and Burkhard Rost in 2005 at Columbia University. It combines different methods and uses various sources of information to predict disordered regions. Metadisorder makes use of the following methods <ref name=MD>Schlessinger et al., "Improved Disorder Prediction by Combination of Orthogonal Approaches.", PLoS ONE. 2009, PLoS ONE</ref> <ref name=rostlab_MD>Rostlab - Metadisorder</ref>:

PROFbval

PROFbval is a residue mobility prediction method based on the amino-acid sequence. <ref name=rostlab_prof>Rostlab - PROFbval</ref>

NORSnet

NORSnet is a method that identifies unstructured loops, based on neural networks. <ref name=rostlab_nors>Rostlab - NORSnet</ref>

UCON

UCON predicts natively unstructured regions through contacts. <ref name=rostlab_ucon>Rostlab - UCON</ref>

Figure 8 shows the results of META-Disorder (black line). Only one disordered region is predicted, at the very beginning of the sequence.

Figure 8: Prediction of disordered regions by META-Disorder

Evaluation of the results

Summary of the disordered-region predictions described for Figures 3 to 8 (x = predicted as disordered, - = not predicted):

Position	Disopred	POODLE-S(MR)	POODLE-S(HBF)	IUPRED(SHORT)	IUPRED(LONG)	META-Disorder
~3	x	x	x	x	-	x
~107	-	-	-	x	x	-
~429	x	x	-	x	-	-

The disordered region at the beginning of the sequence is predicted by all but one method. The one at the end is predicted by 50% of the programs used. Unfortunately, disordered regions are often predicted at the termini of protein sequences, and these predictions are frequently false positives because of the character of electron density maps. The disordered region at position ~107 is predicted only by the two IUPRED modes and cannot be confirmed by other programs, so a true disordered region at that position is very unlikely as well. In summary, there is no disordered region in the protein.
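The agreement between the methods can be tallied with a few lines of code. The per-method regions below are transcribed from the descriptions of Figures 3 to 8. The dictionary layout and the function name `support_counts` are assumptions of this sketch.

```python
from collections import Counter

# Regions reported as disordered by each method (from the figure descriptions).
predictions = {
    "Disopred": {"~3", "~429"},
    "POODLE-S(MR)": {"~3", "~429"},
    "POODLE-S(HBF)": {"~3"},
    "IUPRED(SHORT)": {"~3", "~107", "~429"},
    "IUPRED(LONG)": {"~107"},
    "META-Disorder": {"~3"},
}


def support_counts(preds: dict) -> Counter:
    """Count how many methods predict each region."""
    counts = Counter()
    for regions in preds.values():
        counts.update(regions)
    return counts


support = support_counts(predictions)
fractions = {region: n / len(predictions) for region, n in support.items()}
```

A simple vote like this treats all methods as independent, although the two IUPRED modes and the two POODLE-S modes share their underlying approach and so provide less independent evidence than the raw counts suggest.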

Prediction of transmembrane alpha-helices and signal peptides

Programs

All programs but TMHMM were run on web servers and only take an amino acid sequence as input.

TMHMM

TMHMM is a program that predicts transmembrane helices in proteins. It is based on a hidden Markov model and was developed by Sonnhammer et al. in 1998<ref name=sonnhammer>Sonnhammer et al., "A hidden Markov model for predicting transmembrane helices in protein sequences.", Proc Int Conf Intell Syst Mol Biol. 1998, PubMed</ref>.

We used the web server of TMHMM with the FASTA sequence of the protein as input. The output contains some statistics (e.g. number of TM helices, expected number of AAs in TM helices, probability that the N-terminus is on the cytoplasmic side of the membrane), a listing of the labeled sequence regions (e.g. inside, outside, TM helix) and a plot of the per-residue probabilities. Additionally, if a large number of the first 60 AAs are predicted to be part of a TM helix, TMHMM indicates that there could be a signal peptide in the N-terminal region.

Phobius/Polyphobius

Phobius was developed by Käll, Krogh and Sonnhammer in 2004 at the Stockholm Bioinformatics Center. It is based on HMMs and predicts transmembrane helices and signal peptides at the N-terminus. Phobius takes a protein sequence (FASTA format) as input and outputs diagrams and text files containing the predictions. <ref name=phobius>Käll et al., "A combined transmembrane topology and signal peptide prediction method.", J Mol Biol. 2004, PubMed</ref>

Polyphobius was developed by L. Käll, A. Krogh, EL. Sonnhammer in 2005 at the Stockholm Bioinformatics Center. It is very similar to Phobius and uses HMMs, as well. Additionally Polyphobius uses information from homologous sequences to improve the prediction accuracy. <ref name=polyphobius>Käll et al., "An HMM posterior decoder for sequence feature prediction that includes homology information.", Bioinformatics. 2005, PubMed</ref>

We used the web server for both versions of the program.

Octopus/Spoctopus

Octopus was developed by H. Viklund and A. Elofsson in 2008 at the Stockholm Bioinformatics Center. It combines HMMs and artificial neural networks (ANNs) to predict the topology of transmembrane proteins. First, several ANNs make predictions for every single residue; afterwards, HMMs smooth the results and combine them into a consistent prediction.<ref name=octopus>Viklund et al., "OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar.", Bioinformatics. 2008, PubMed</ref>

Spoctopus was developed by H. Viklund, A. Bernsel, M. Skwark and A. Elofsson in 2008 at the Stockholm Bioinformatics Center. It is almost identical to Octopus, but additionally predicts the signal peptide.<ref name=spoctopus>Viklund et al., "SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology.", Bioinformatics. 2008, PubMed</ref>

We used the web server for both versions of the program.

TargetP

TargetP was developed by O. Emanuelsson, H. Nielsen, S. Brunak and G. von Heijne in 2000 at the Stockholm Bioinformatics Center. It uses neural networks to predict the subcellular location of eukaryotic proteins. The method is based on the predicted presence of any of the N-terminal presequences (i.e. chloroplast transit peptide, mitochondrial targeting peptide or secretory pathway signal peptide).<ref name=targetp>Emanuelsson et al., "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.", J Mol Biol. 2000, PubMed</ref>

We used the web server of the program.

SignalP

SignalP was developed by H. Nielsen, J. Engelbrecht, S. Brunak and G. von Heijne in 1997 at the Stockholm Bioinformatics Center. It predicts the presence of a signal peptide and provides information about the exact location of signal peptide cleavage sites. The most recent version of the method uses both ANNs and HMMs. <ref name=signalp>Nielsen et al., "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.", Protein engineering. 1997, PubMed</ref>

We used the web server of the program.

Proteins

In the following sections we present the results of the programs for the proteins, i.e. the prediction of transmembrane alpha-helices and signal peptides. The sequence annotation of the UniProt entries (see protein overview) was used as the reference. For the colouring of the tables, we used the following scheme:

green (completely correct): the predicted positions may differ from the reference by at most one position at each boundary, e.g. a prediction of positions 19 - 31 is marked green if the reference is 20 - 30.
- exception: TMHMM cannot predict signal peptides and therefore almost always includes the N-terminus of the protein in the first region, even when it is a signal peptide. We therefore also count the prediction of the first region as correct when it starts at the first residue and its end residue is correct.
yellow (partially correct): the vast majority of the residues are assigned to the correct region.
red (wrong): the prediction is completely wrong, e.g. a region is predicted that does not exist.
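The colouring scheme can be written down as a small function. The scheme does not quantify "vast majority", so the sketch assumes an overlap of at least 80% of the combined span of prediction and reference. That threshold, the function name `classify` and the flag `tmhmm_first_region` are hypothetical.

```python
from typing import Optional, Tuple

Region = Optional[Tuple[int, int]]  # (start, end), or None if the region is absent


def classify(pred: Region, ref: Region, majority: float = 0.8,
             tmhmm_first_region: bool = False) -> str:
    """Return 'green', 'yellow' or 'red' for one predicted region."""
    if pred is None or ref is None:
        # A region counts as correct only if it is absent in both.
        return "green" if pred is ref else "red"
    (ps, pe), (rs, re_) = pred, ref
    start_ok = abs(ps - rs) <= 1 or (tmhmm_first_region and ps == 1)
    if start_ok and abs(pe - re_) <= 1:
        return "green"
    overlap = max(0, min(pe, re_) - max(ps, rs) + 1)
    union = max(pe, re_) - min(ps, rs) + 1
    # Assumed reading of "vast majority": overlap covers >= `majority` of the union.
    return "yellow" if overlap / union >= majority else "red"
```

For example, `classify((19, 31), (20, 30))` gives "green" as in the scheme above, and `classify((1, 429), (32, 429), tmhmm_first_region=True)` gives "green" under the TMHMM exception.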

General evaluation

TMHMM does not predict signal peptides and therefore often mistakenly includes the beginning of the protein in the first region. It also missed one transmembrane alpha-helix, and Phobius/Polyphobius often made more accurate predictions.

Octopus has a similar problem to TMHMM: since it also does not predict signal peptides, it predicts a transmembrane alpha-helix in place of the signal peptide. Spoctopus, on the other hand, predicts the transmembrane regions very well, but always misses the beginning of the signal peptide.

Phobius and Polyphobius made very good predictions of the transmembrane alpha-helices, and their signal peptide predictions were completely correct. Since they deliver the whole package (i.e. transmembrane alpha-helix prediction, signal peptide prediction, highest accuracy), we prefer to work with these programs.

GLA

The graphical output of the programs is provided on this page.

Transmembrane alpha-helices

Region	Reference	TMHMM	Phobius	Polyphobius	Octopus	Spoctopus
Cytoplasmic	-	-	-	-	1 - 9	-
Transmembrane	-	-	-	-	10 - 30	-
Non-cytoplasmic	32 - 429	1 - 429	32 - 429	32 - 429	31 - 429	32 - 429

No program predicted a transmembrane alpha-helix except for Octopus, which predicts the signal peptide as a cytoplasmic and a transmembrane region. We observe the same problem with Octopus for the other four proteins that have a signal peptide.

Signal peptides

Region	Reference	Phobius	Polyphobius	Spoctopus	SignalP
Signal peptide	1 - 31	1 - 31	1 - 31	11 - 31	1 - 31
N-Region	-	1 - 9	1 - 12	-	-
H-Region	-	10 - 22	13 - 26	-	-
C-Region	-	23 - 31	27 - 31	-	-

All programs recognize the signal peptide; only Spoctopus misses its beginning.

BACR_HALSA

The graphical output of the programs is provided on this page.

Transmembrane alpha-helices

Region	Reference	TMHMM	Phobius	Polyphobius	Octopus	Spoctopus
Non-cytoplasmic	14 - 23	1 - 22	1 - 22	1 - 21	1 - 22	1 - 22
Transmembrane	24 - 42	23 - 42	23 - 42	22 - 43	23 - 43	23 - 43
Cytoplasmic	43 - 56	43 - 54	43 - 53	44 - 54	44 - 54	44 - 54
Transmembrane	57 - 75	55 - 77	54 - 76	55 - 77	55 - 75	55 - 75
Non-cytoplasmic	76 - 91	78 - 91	77 - 95	78 - 94	76 - 95	76 - 95
Transmembrane	92 - 109	92 - 114	96 - 114	95 - 114	96 - 116	96 - 116
Cytoplasmic	110 - 120	115 - 120	115 - 120	115 - 120	117 - 121	117 - 120
Transmembrane	121 - 140	121 - 143	121 - 142	121 - 141	122 - 142	121 - 141
Non-cytoplasmic	141 - 147	144 - 147	143 - 147	142 - 147	143 - 147	142 - 147
Transmembrane	148 - 167	148 - 170	148 - 169	148 - 166	148 - 168	148 - 168
Cytoplasmic	168 - 185	171 - 189	170 - 189	167 - 186	169 - 185	169 - 185
Transmembrane	186 - 204	190 - 212	190 - 212	187 - 205	186 - 206	186 - 206
Non-cytoplasmic	205 - 216	213 - 262	213 - 217	206 - 215	207 - 216	207 - 216
Transmembrane	217 - 236	-	218 - 237	216 - 237	217 - 237	217 - 237
Cytoplasmic	237 - 262	-	238 - 262	238 - 262	238 - 262	238 - 262

All programs predict the correct number of transmembrane alpha-helices apart from TMHMM, which misses the last one. Polyphobius has the best result, since it has the highest number of completely correctly predicted regions.

Signal peptides

No program predicted a signal peptide, which is correct.

RET4_HUMAN

The graphical output of the programs is provided on this page.

Transmembrane alpha-helices

Region	Reference	TMHMM	Phobius	Polyphobius	Octopus	Spoctopus
Cytoplasmic	-	-	-	-	1 - 1	-
Transmembrane	-	-	-	-	2 - 23	-
Non-cytoplasmic	19 - 201	1 - 201	19 - 201	19 - 201	24 - 201	20 - 201

All programs except Octopus correctly predict that this protein does not have a transmembrane alpha-helix.

Signal peptides

Region	Reference	Phobius	Polyphobius	Spoctopus	SignalP
Signal peptide	1 - 18	1 - 18	1 - 18	6 - 19	1 - 18
N-Region	-	1 - 2	1 - 3	-	-
H-Region	-	3 - 13	4 - 13	-	-
C-Region	-	14 - 18	14 - 18	-	-

All programs recognize the signal peptide; only Spoctopus misses its beginning.

INSL5_HUMAN

The graphical output of the programs is provided on this page.

Transmembrane alpha-helices

Region	Reference	TMHMM	Phobius	Polyphobius	Octopus	Spoctopus
Cytoplasmic	-	-	-	-	1 - 1	-
Transmembrane	-	-	-	-	2 - 32	-
Non-cytoplasmic	23 - 135	1 - 135	23 - 135	23 - 135	33 - 135	24 - 135

All programs except Octopus correctly predict that this protein does not have a transmembrane alpha-helix.

Signal peptides

Region	Reference	Phobius	Polyphobius	Spoctopus	SignalP
Signal peptide	1 - 22	1 - 22	1 - 22	6 - 23	1 - 22
N-Region	-	1 - 5	1 - 4	-	-
H-Region	-	6 - 17	5 - 16	-	-
C-Region	-	18 - 22	17 - 22	-	-

All programs recognize the signal peptide; only Spoctopus misses its beginning.

LAMP1_HUMAN

The graphical output of the programs is provided on this page.

Transmembrane alpha-helices

Region	Reference	TMHMM	Phobius	Polyphobius	Octopus	Spoctopus
Cytoplasmic	-	1 - 10	-	-	1 - 10	-
Transmembrane	-	11 - 33	-	-	11 - 31	-
Non-cytoplasmic	29 - 382	34 - 383	29 - 381	29 - 381	32 - 383	30 - 383
Transmembrane	383 - 405	384 - 406	382 - 405	382 - 405	384 - 404	384 - 404
Cytoplasmic	406 - 417	407 - 417	406 - 417	406 - 417	405 - 417	405 - 417

Phobius, Polyphobius and Spoctopus predict the one transmembrane alpha-helix correctly. Octopus has the common error that it assigns a cytoplasmic and transmembrane region to the signal peptide of the protein. TMHMM also predicts a transmembrane region for the signal peptide. The developers of TMHMM indicate this problem on the instruction page of TMHMM.

Signal peptides

Region	Reference	Phobius	Polyphobius	Spoctopus	SignalP
Signal peptide	1 - 28	1 - 28	1 - 28	12 - 29	1 - 28
N-Region	-	1 - 10	1 - 9	-	-
H-Region	-	11 - 22	10 - 22	-	-
C-Region	-	23 - 28	23 - 28	-	-

All programs recognize the signal peptide; only Spoctopus misses its beginning.

A4_HUMAN

The graphical output of the programs is provided on this page.

Transmembrane alpha-helices

Region	Reference	TMHMM	Phobius	Polyphobius	Octopus	Spoctopus
Cytoplasmic	-	-	-	-	1 - 5	-
Transmembrane	-	-	-	-	6 - 11	-
Non-cytoplasmic	18 - 699	1 - 700	18 - 700	18 - 700	12 - 701	19 - 701
Transmembrane	700 - 723	701 - 723	701 - 723	701 - 723	702 - 722	702 - 722
Cytoplasmic	724 - 770	724 - 770	724 - 770	724 - 770	723 - 770	723 - 770

Every program predicts the single transmembrane alpha-helix of the protein. Octopus, however, additionally predicts a spurious transmembrane region within the signal peptide.

Signal peptides

Region	Reference	Phobius	Polyphobius	Spoctopus	SignalP
Signal peptide	1 - 17	1 - 17	1 - 17	5 - 18	1 - 17
N-Region	-	1 - 1	1 - 3	-	-
H-Region	-	2 - 12	4 - 12	-	-
C-Region	-	13 - 17	13 - 17	-	-

All programs recognize the signal peptide; only Spoctopus misses its beginning.

TargetP

The result of TargetP is shown in the following table. An explanation of the output is given on this page.

Name	Length	mTP	SP	other	Loc	RC
GLA	429	0.041	0.860	0.141	S	2
BACR_HALSA	262	0.019	0.897	0.562	S	4
RET4_HUMAN	201	0.242	0.928	0.020	S	2
INSL5_HUMA	135	0.074	0.899	0.037	S	1
LAMP1_HUMA	417	0.043	0.953	0.017	S	1
A4_HUMAN	770	0.035	0.937	0.084	S	1

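TargetP predicts a signal peptide for all six proteins and therefore assigns each of them to the secretory pathway. This prediction agrees with the reference for the five human proteins. For BACR_HALSA it does not: the reference annotation contains no signal peptide, the score for "other" is comparatively high (0.562), and TargetP is designed for eukaryotic proteins, whereas BACR_HALSA comes from Halobacterium.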
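The Loc and RC columns can be reproduced from the three scores. The sketch below assumes the reliability-class definition from the TargetP documentation as we understand it: RC is derived from the difference between the highest and the second-highest score, in steps of 0.2. The function name `targetp_call` is hypothetical, and the rows are copied from the table above.

```python
def targetp_call(mtp: float, sp: float, other: float):
    """Return (Loc, RC) from the mTP, SP and 'other' scores."""
    scores = {"M": mtp, "S": sp, "_": other}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (loc, best), (_, second) = ranked[0], ranked[1]
    diff = best - second
    # Assumed RC thresholds: 1 is the most reliable class, 5 the least.
    if diff > 0.8:
        rc = 1
    elif diff > 0.6:
        rc = 2
    elif diff > 0.4:
        rc = 3
    elif diff > 0.2:
        rc = 4
    else:
        rc = 5
    return loc, rc


# (name, mTP, SP, other, Loc, RC) as listed in the TargetP table.
rows = [
    ("GLA", 0.041, 0.860, 0.141, "S", 2),
    ("BACR_HALSA", 0.019, 0.897, 0.562, "S", 4),
    ("RET4_HUMAN", 0.242, 0.928, 0.020, "S", 2),
    ("INSL5_HUMA", 0.074, 0.899, 0.037, "S", 1),
    ("LAMP1_HUMA", 0.043, 0.953, 0.017, "S", 1),
    ("A4_HUMAN", 0.035, 0.937, 0.084, "S", 1),
]
calls = {name: targetp_call(mtp, sp, other) for name, mtp, sp, other, _, _ in rows}
```

Under this assumption, the low reliability (RC 4) for BACR_HALSA follows directly from the small gap between its SP and "other" scores.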

Prediction of GO terms

GOPET

GOPET stands for Gene Ontology term Prediction and Evaluation Tool and was developed by Vinayagam et al. in 2006<ref name=vinayagam>Vinayagam et al., "GOPET: a tool for automated predictions of Gene Ontology terms.", BMC Bioinformatics. 2006 Mar 20, PubMed</ref>. It is based on homology searches on GO-mapped protein databases and uses support vector machines for the calculation of the confidence values.

We used the webserver of GOPET with the default settings (GO aspect: molecular function, maximum number of predictions: 20, confidence threshold: 60, GOPET model 2007 june, version 2.0, GOPET database 2007) and the FASTA-sequence of the protein as input. The results only contain GOids of the GO aspect "molecular function", since the other two GO aspects (cellular component and biological process) were not available.

Pfam

Pfam is a database of protein domain families built using profile hidden Markov models (HMMs) and was first described by Sonnhammer et al. in 1997<ref name=sonnhammer>Sonnhammer et al., "Pfam: a comprehensive database of protein domain families based on seed alignments.", Proteins. 1997 Jul, PubMed</ref>. Each protein domain family is represented by a multiple sequence alignment and an HMM. One can search a protein sequence against Pfam to obtain all domains that the query sequence might contain.

The Pfam database consists of two parts, A and B, which contain protein domain families of different quality levels. In the 1.0 release of Pfam, the protein entries in Pfam-A and Pfam-B came from Swissprot (a few initial members of the seed alignments in Pfam-A came from several sources: Swissprot, Prosite, ProDom etc.). In the current release of Pfam, the entries in Pfam-A and Pfam-B come from Pfamseq (UniProtKB) and ADDA, respectively.

Pfam-A contains the well-characterized, annotated entries. Each family starts with a seed alignment built from a few selected representative sequences under manual quality control. The HMM is then applied automatically to build the full alignment and to detect all possible members of the family. The families/domains in Pfam-A are of high quality and can be used as reliable annotation/classification evidence for the query sequence.

Pfam-B is created from sequence alignments of the ADDA entries using HMMs. Entries that already exist in Pfam-A are excluded. The families in Pfam-B have no confirmed annotation and undergo no manual quality checking, so there may be errors (e.g. the members of a family could be aligned merely by chance) and the overall quality is relatively low. However, Pfam-B can still be useful when no domain evidence for the query sequence is found in Pfam-A.

We used the "sequence search" feature of Pfam website with the FASTA-sequence of the protein to determine potential domains or domain families. Afterwards we checked out the corresponding page of the domain (family) for a GO annotation. The search was performed with the default settings (cut-off: use E-Value, threshold 1.0), but we also included Pfam-B in the search. Only one hit in Pfam-B was found which does not have any GO annotation and hence there was no gain in including Pfam-B. The classification in respect to the significance of a hit was done by the Pfam search algorithm.

ProtFun

ProtFun tries to assign a function to the query protein. For this purpose, it uses the prediction of several other features like post-translational modification sites or localization of the protein. The prediction of these features itself is based on other programs like SignalP, TargetP, NetOGlyc, TMHMM and some others. ProtFun was developed by Jensen et al. in 2002<ref name=jensen_1>Jensen et al., "Prediction of human protein function from post-translational modifications and localization features.", J Mol Biol. 2002 Jun 21, PubMed</ref> and the prediction of the Gene Ontology category was added in 2003<ref name=jensen_2>Jensen et al., "Prediction of human protein function according to Gene Ontology categories.", Bioinformatics. 2003 Mar 22, PubMed</ref>.

We used the web server of ProtFun 2.2 with the default settings and the FASTA sequence of the protein as input. The output contains predictions about the functional category, enzyme/nonenzyme, enzyme class and the Gene Ontology category. In our case, only the last of these was relevant. The term 'Prob' represents the probability calculated by ProtFun that the query belongs to the category. This probability depends on the prior probability of the category. 'Odds' describes the odds that the query belongs to the given category and is not influenced by the prior probability.<ref name=ProtFun>Explanation of the ProtFun 2.2 output.</ref> The class with the highest information content and with the highest probability is marked bold. Additionally, we provide a table for each query that contains the categories with the highest information content or probability, respectively, and their associated GO id. For this purpose, we used the search feature of the Gene Ontology website.

Proteins

The results of the programs are listed for the five proteins in the following sections. If a prediction is correct, it will be marked with a green background. For this purpose, we used the GO annotation of the corresponding protein at the EBI website (see list of annotated GO ids).

GLA

GOPET

GOid	Confidence	GO term
GO:0016798	98%	hydrolase activity acting on glycosyl bonds
GO:0004553	98%	hydrolase activity hydrolyzing O-glycosyl compounds
GO:0016787	97%	hydrolase activity
GO:0004557	96%	alpha-galactosidase activity
GO:0008456	89%	alpha-N-acetylgalactosaminidase activity

Pfam

Source	Description	Entry type	Significant	GO aspect	GO description	GO id
Pfam-A	Melibiase	Family	x	Molecular function	hydrolase activity, hydrolyzing O-glycosyl compounds	GO:0004553
Pfam-A	Melibiase	Family	x	Biological process	carbohydrate metabolic process	GO:0005975

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.090    0.419
 Receptor                             0.014    0.083
 Hormone                              0.002    0.318
 Structural_protein                   0.004    0.127
 Transporter                          0.024    0.222
 Ion_channel                          0.010    0.169
 Voltage-gated_ion_channel            0.003    0.127
 Cation_channel                       0.010    0.215
 Transcription                        0.047    0.367
 Transcription_regulation             0.026    0.204
 Stress_response                      0.049    0.552
 Immune_response                      0.012    0.136
 Growth_factor                        0.006    0.412
 Metal_ion_transport                  0.009    0.020

Type	GO category	GO aspect	GO id
Highest probability	Signal transducer	Molecular function	GO:0004871
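Picking the category with the highest probability from the ProtFun text output can be automated. The sketch below parses the Gene Ontology block shown above for GLA. The function name `parse_protfun` is an assumption, and the handling of the "=>" marker is based only on how it appears in the ProtFun output blocks on this page.

```python
PROTFUN_GLA = """\
 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.090    0.419
 Receptor                             0.014    0.083
 Hormone                              0.002    0.318
 Structural_protein                   0.004    0.127
 Transporter                          0.024    0.222
 Ion_channel                          0.010    0.169
 Voltage-gated_ion_channel            0.003    0.127
 Cation_channel                       0.010    0.215
 Transcription                        0.047    0.367
 Transcription_regulation             0.026    0.204
 Stress_response                      0.049    0.552
 Immune_response                      0.012    0.136
 Growth_factor                        0.006    0.412
 Metal_ion_transport                  0.009    0.020
"""


def parse_protfun(text: str):
    """Return a list of (category, prob, odds) tuples from a ProtFun GO block."""
    rows = []
    for line in text.splitlines():
        fields = line.replace("=>", " ").split()  # drop the marker if present
        if len(fields) != 3:
            continue  # skips the header and blank lines
        try:
            prob, odds = float(fields[1]), float(fields[2])
        except ValueError:
            continue
        rows.append((fields[0], prob, odds))
    return rows


gla_rows = parse_protfun(PROTFUN_GLA)
highest_prob = max(gla_rows, key=lambda row: row[1])
```

Here `highest_prob` names the category listed in the summary table for GLA. Note that all probabilities for GLA are low, so the top category is a weak call.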

BACR_HALSA

GOPET

GOid	Confidence	GO term
GO:0005216	77%	ion channel activity
GO:0008020	75%	G-protein coupled photoreceptor activity
GO:0015078	60%	hydrogen ion transmembrane transporter activity

Pfam

Source	Description	Entry type	Significant	GO aspect	GO description	GO id
Pfam-A	Bacteriorhodopsin-like protein	Domain	x	Cellular component	membrane	GO:0016020
Pfam-A	Bacteriorhodopsin-like protein	Domain	x	Molecular function	ion channel activity	GO:0005216
Pfam-A	Bacteriorhodopsin-like protein	Domain	x	Biological process	ion transport	GO:0006811
Pfam-A	Domain of unknown function DUF21	Family		-	-	-

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.258    1.205
 Receptor                             0.355    2.087
 Hormone                              0.001    0.206
 Structural_protein                   0.006    0.200
 Transporter                       => 0.440    4.036
 Ion_channel                          0.010    0.169
 Voltage-gated_ion_channel            0.004    0.172
 Cation_channel                       0.078    1.689
 Transcription                        0.026    0.205
 Transcription_regulation             0.028    0.226
 Stress_response                      0.012    0.139
 Immune_response                      0.011    0.128
 Growth_factor                        0.010    0.727
 Metal_ion_transport                  0.049    0.106

Type	GO category	GO aspect	GO id
Highest information content / highest probability	Transporter	Molecular function	GO:0005215

RET4_HUMAN

GOPET

GOid	Confidence	GO term
GO:0005488	90%	binding
GO:0005501	81%	retinoid binding
GO:0008289	80%	lipid binding
GO:0019841	78%	retinol binding
GO:0005215	78%	transporter activity
GO:0016918	78%	retinal binding
GO:0005319	69%	lipid transporter activity
GO:0008035	60%	high-density lipoprotein particle binding

Pfam

Source	Description	Entry type	Significant	GO aspect	GO description	GO id
Pfam-A	Lipocalin / cytosolic fatty-acid binding protein family	Domain	x	Molecular function	binding	GO:0005488
Pfam-A	DspF/AvrF protein	Family		-	-	-
Pfam-B	PB008544	-	-	-	-	-

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.202    0.942
 Receptor                             0.147    0.862
 Hormone                              0.004    0.667
 Structural_protein                   0.002    0.058
 Transporter                          0.025    0.232
 Ion_channel                          0.016    0.288
 Voltage-gated_ion_channel            0.003    0.148
 Cation_channel                       0.010    0.215
 Transcription                        0.027    0.207
 Transcription_regulation             0.025    0.196
 Stress_response                      0.161    1.829
 Immune_response                   => 0.239    2.813
 Growth_factor                        0.023    1.617
 Metal_ion_transport                  0.009    0.020

Type	GO category	GO aspect	GO id
Highest information content / highest probability	Immune response	Biological process	GO:0006955

INSL5_HUMAN

GOPET

GOid	Confidence	GO term
GO:0005179	80%	hormone activity

Pfam

Source	Description	Entry type	Significant	GO aspect	GO description	GO id
Pfam-A	Insulin/IGF/Relaxin family	Domain	x	Cellular component	extracellular region	GO:0005576
Pfam-A	Insulin/IGF/Relaxin family	Domain	x	Molecular function	hormone activity	GO:0005179

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.374    1.746
 Receptor                             0.128    0.750
 Hormone                           => 0.247   37.936
 Structural_protein                   0.001    0.041
 Transporter                          0.025    0.228
 Ion_channel                          0.010    0.168
 Voltage-gated_ion_channel            0.003    0.131
 Cation_channel                       0.010    0.215
 Transcription                        0.054    0.425
 Transcription_regulation             0.091    0.724
 Stress_response                      0.099    1.128
 Immune_response                      0.178    2.090
 Growth_factor                        0.061    4.379
 Metal_ion_transport                  0.009    0.020

Type	GO category	GO aspect	GO id
Highest information content	Hormone	Molecular function	GO:0005179
Highest probability	Signal transducer	Molecular function	GO:0004871

LAMP1_HUMAN

GOPET

GOid	Confidence	GO term
GO:0004812	60%	aminoacyl-tRNA ligase activity
GO:0005524	60%	ATP binding

Pfam

Source	Description	Entry type	Significant	GO aspect	GO description	GO id
Pfam-A	Lysosome-associated membrane glycoprotein	Family	x	Cellular component	membrane	GO:0016020
Pfam-A	Protein of unknown function DUF1180	Family		-	-	-

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.396    1.849
 Receptor                             0.282    1.659
 Hormone                              0.001    0.206
 Structural_protein                   0.011    0.408
 Transporter                          0.024    0.222
 Ion_channel                          0.008    0.147
 Voltage-gated_ion_channel            0.002    0.111
 Cation_channel                       0.010    0.215
 Transcription                        0.032    0.247
 Transcription_regulation             0.018    0.142
 Stress_response                      0.246    2.795
 Immune_response                   => 0.371    4.368
 Growth_factor                        0.013    0.956
 Metal_ion_transport                  0.009    0.020

Type	GO category	GO aspect	GO id
Highest information content	Immune response	Biological process	GO:0006955
Highest probability	Signal transducer	Molecular function	GO:0004871

A4_HUMAN

GOPET

GOid	Confidence	GO term
GO:0004866	87%	endopeptidase inhibitor activity
GO:0004867	86%	serine-type endopeptidase inhibitor activity
GO:0030568	83%	plasmin inhibitor activity
GO:0030304	83%	trypsin inhibitor activity
GO:0030414	82%	peptidase inhibitor activity
GO:0005488	79%	binding
GO:0005515	74%	protein binding
GO:0046872	73%	metal ion binding
GO:0003677	71%	DNA binding
GO:0008201	70%	heparin binding
GO:0008270	69%	zinc ion binding
GO:0005507	69%	copper ion binding
GO:0005506	67%	iron ion binding

Pfam

Source	Description	Entry type	Significant	GO aspect	GO description	GO id
Pfam-A	Amyloid A4 N-terminal heparin-binding	Domain	x	Cellular component	integral to membrane	GO:0016021
Pfam-A	Amyloid A4 N-terminal heparin-binding	Domain	x	Molecular function	binding	GO:0005488
Pfam-A	Copper-binding of amyloid precursor, CuBD	Domain	x	-	-	-
Pfam-A	Kunitz/Bovine pancreatic trypsin inhibitor domain	Domain	x	Molecular function	serine-type endopeptidase inhibitor activity	GO:0004867
Pfam-A	E2 domain of amyloid precursor protein	Domain	x	-	-	-
Pfam-A	Beta-amyloid peptide	Family	x	Cellular component	integral to membrane	GO:0016021
Pfam-A	Beta-amyloid peptide	Family	x	Molecular function	binding	GO:0005488
Pfam-A	beta-amyloid precursor protein C-terminus	Family	x	-	-	-
Pfam-A	Exonuclease VII, large subunit	Family		-	-	-
Pfam-A	Transcriptional activator TraM	Family		-	-	-

ProtFun

 Gene Ontology category               Prob     Odds
 Signal_transducer                    0.126    0.586
 Receptor                             0.036    0.211
 Hormone                              0.001    0.206
 Structural_protein                => 0.034    1.205
 Transporter                          0.024    0.222
 Ion_channel                          0.009    0.162
 Voltage-gated_ion_channel            0.002    0.108
 Cation_channel                       0.010    0.215
 Transcription                        0.043    0.335
 Transcription_regulation             0.018    0.143
 Stress_response                      0.076    0.862
 Immune_response                      0.016    0.183
 Growth_factor                        0.005    0.372
 Metal_ion_transport                  0.009    0.020

Type	GO category	GO aspect	GO id
Highest information content	Structural protein	Molecular function	GO:0005198
Highest probability	Signal transducer	Molecular function	GO:0004871

Evaluation of the Results

We used the Gene Ontology annotation of the corresponding protein at the EBI website as the reference for the evaluation of the programs (see list of annotated GO ids). We then determined the true positives, false positives, true negatives and false negatives and calculated the sensitivity and specificity. For this, we created Venn diagrams, which we provide on this page. A true negative was defined as a false positive of one of the other two programs. "Overall" is the evaluation of all six proteins. The highest sensitivity and specificity for each protein are marked with a green background.
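
The counting scheme described above can be expressed with set operations. This is a hedged sketch rather than the procedure actually used (the authors worked from Venn diagrams): it assumes that a true negative for a program is a false positive of another program that the program under evaluation did not predict itself. The function name `confusion_counts`, the program labels and the GO ids in the example are hypothetical placeholders.

```python
def confusion_counts(predictions, annotated):
    """Count TP, FP, TN and FN per program.

    predictions: dict mapping program name -> set of predicted GO ids
    annotated:   set of reference GO ids for the protein
    """
    # False positives of every program: predicted but not annotated.
    false_pos = {prog: terms - annotated for prog, terms in predictions.items()}

    counts = {}
    for prog, terms in predictions.items():
        # Pool the false positives of all *other* programs.
        others_fp = set().union(
            *(fp for other, fp in false_pos.items() if other != prog)
        )
        counts[prog] = {
            "TP": len(terms & annotated),
            "FP": len(false_pos[prog]),
            # Assumed definition: wrong terms of the others that this program avoided.
            "TN": len(others_fp - terms),
            "FN": len(annotated - terms),
        }
    return counts


if __name__ == "__main__":
    # Placeholder ids, not real annotations.
    annotated = {"GO:A", "GO:B", "GO:C"}
    predictions = {
        "prog1": {"GO:A", "GO:X"},
        "prog2": {"GO:A", "GO:B"},
        "prog3": {"GO:Y"},
    }
    for prog, c in confusion_counts(predictions, annotated).items():
        print(prog, c)
```

Under this definition the number of true negatives depends on how many wrong terms the other programs produce, so the specificity values are only comparable within this set of three programs.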

Protein	Program	True positives	False positives	True negatives	False negatives	Sensitivity	Specificity
GLA	GOPET	4	1	1	18	0.18	0.5
	Pfam	2	0	2	20	0.09	1
	ProtFun	0	1	1	22	0	0.5
BACR_HALSA	GOPET	1	2	1	11	0.08	0.33
	Pfam	3	0	3	9	0.25	1
	ProtFun	0	1	2	12	0	0.67
RET4_HUMAN	GOPET	5	3	1	36	0.12	0.25
	Pfam	1	0	4	40	0.02	1
	ProtFun	0	1	3	41	0	0.75
INSL5_HUMAN	GOPET	1	0	1	3	0.25	1
	Pfam	2	0	1	2	0.5	1
	ProtFun	1	1	0	3	0.25	0
LAMP1_HUMAN	GOPET	0	2	2	17	0	0.5
	Pfam	1	0	4	16	0.06	1
	ProtFun	0	2	2	17	0	0.5
A4_HUMAN	GOPET	7	6	2	71	0.09	0.25
	Pfam	3	0	8	75	0.04	1
	ProtFun	0	2	6	78	0	0.75
Overall	GOPET	18	14	8	156	0.10	0.36
	Pfam	12	0	22	162	0.07	1
	ProtFun	1	8	14	173	0.01	0.64
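
The sensitivity and specificity columns follow from the four counts in each row. As a small check, the sketch below recomputes the "Overall" values for GOPET and Pfam using the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP); the function names are illustrative only.

```python
def sensitivity(tp, fn):
    """Fraction of annotated GO terms that were predicted."""
    total = tp + fn
    return tp / total if total else float("nan")


def specificity(tn, fp):
    """Fraction of negatives (as defined above) that were not predicted."""
    total = tn + fp
    return tn / total if total else float("nan")


# "Overall" rows for GOPET and Pfam, taken from the table above.
OVERALL = {
    "GOPET": {"tp": 18, "fp": 14, "tn": 8, "fn": 156},
    "Pfam": {"tp": 12, "fp": 0, "tn": 22, "fn": 162},
}

if __name__ == "__main__":
    for program, c in OVERALL.items():
        sens = round(sensitivity(c["tp"], c["fn"]), 2)
        spec = round(specificity(c["tn"], c["fp"]), 2)
        print(program, sens, spec)
```

Rounded to two decimals, this gives 0.10 and 0.36 for GOPET and 0.07 and 1 for Pfam, matching the table.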

Two things are remarkable. First, Pfam does not have a single false positive prediction and hence its specificity is 1. This leads us to the conclusion that the search feature also has a very good specificity and that the annotation of the domains and families in Pfam is very accurate with respect to Gene Ontology. Second, ProtFun achieved only one true positive and thus its sensitivity is close to 0. This can be explained by the fact that ProtFun only predicts very general Gene Ontology categories (e.g. immune response, receptor) and therefore misses a lot of subannotations. Nevertheless, five out of six predictions of the main categories were also wrong, and therefore we would not recommend using ProtFun for the prediction of GO terms.

With respect to sensitivity, GOPET achieves the highest value with 0.10, which is still very low. Since the overall sensitivity of Pfam is only slightly lower (0.07) and the specificity of Pfam is 1, we would prefer Pfam for the prediction of GO terms. Overall the sensitivity is very low, so keep in mind that you will miss a lot of GO terms when using any of these programs.

List of annotated GO ids

References

@@ Line 1: / Line 1: @@
 <sup>by [[User:Drexler|Benjamin Drexler]] and [[User:Grandke|Fabian Grandke]]</sup>
+Find additional information and graphics [[Sequence-based_predictions_GLA_diagrams | here]].
+=Proteins=
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
+! Protein
+! GLA
+! BACR_HALSA
+! RET4_HUMAN
+! INSL5_HUMAN
+! LAMP1_HUMAN
+! A4_HUMAN
+|-
+| Organism || Homo sapiens || Halobacterium salinarium || Homo sapiens || Homo sapiens || Homo sapiens || Homo sapiens
+|-
+| Size || 429 AA || 262 AA || 201 AA || 135 AA || 417 AA || 770 AA
+|-
+| Subcellular location || Lysosome || Cell membrane || Secreted || Secreted || Cell membrane || Membrane
+|-
+| Function || Glycosidase || Photoreceptor protein || Sensory transduction || Hormone || Carbohydrate presentation || Serine protease inhibitor
+|-
+| Transmembrane || No || 7 transmembrane regions || No || No || 1 transmembrane region || 1 transmembrane region
+|-
+| UniProt entry || [http://www.uniprot.org/uniprot/P06280 P06280] || [http://www.uniprot.org/uniprot/P02945 P02945] || [http://www.uniprot.org/uniprot/P02753 P02753] || [http://www.uniprot.org/uniprot/Q9Y5Q6 Q9Y5Q6] || [http://www.uniprot.org/uniprot/P11279 P11279] || [http://www.uniprot.org/uniprot/P05067 P05067]
+|-
+|}
 =Secondary structure prediction=
-[[alpha_galactosidase_reference_amino_acid|GLA sequence]]
 ==PSIPRED==
 http://bioinf.cs.ucl.ac.uk/psipred/
+PSIPRED was developed by David T. Jones at the University of Warwick in 1998. The server now runs at University College London. <ref name=PSIPRED>[http://cms.cs.ucl.ac.uk/typo3/fileadmin/bioinf/PSIPRED/psipred_history.html History of PSIPRED]</ref>
-[[Image:GLA_Psipred.png]]
+PSIPRED predicts secondary structures based on neural networks with a single hidden layer and feed-forward back-propagation.<ref name=Praesi>[https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/images/b/b5/Talk_Task3.pdf Talk_Task3]</ref>
+The workflow can be split into three stages:
+# Sequence profile generation: the neural network receives a position-specific scoring matrix from PSI-BLAST as input
+# Initial secondary structure prediction: the output layer predicts one of the three secondary structure states
+# Predicted structure filtering: an additional network filters the raw predictions from the previous step
+PSIPRED takes an amino acid sequence as input. The output is the predicted secondary structure, as shown in Figure 1.
+===Prediction===
+[[Image:GLA_Psipred.png|200px|thumb|right|Figure 1:PSIPRED result for GLA]]
+Figure 1 shows 10 alpha helices, 16 beta strands and 27 coils.
 ==Jpred3==
 http://www.compbio.dundee.ac.uk/www-jpred/index.html
+Jpred3 was developed by C. Cole at the University of Dundee. Similar to PSIPRED, a neural network is used for the prediction of the secondary structure. For a single input sequence, the program also uses PSI-BLAST sequence profiles. Jpred3 can also take multiple sequence alignments as input. Both are further processed using the Jpred algorithm.
+<ref name=jpred>Cole et al., " The Jpred 3 secondary structure prediction server.", Nucleic acids research. 2008 Jul 1, [http://www.ncbi.nlm.nih.gov/pubmed/18463136 PubMed]</ref>
+The table below shows the PDB entries found by Jpred3 concerning our input sequence:
 {|border="1" style="text-align:center; border-spacing:0;"
 !EBI
@@ Line 89: / Line 133: @@
 |}
 The lightblue colored protein is the protein that was used as query sequence.
+We ignored those results and forced Jpred to make a prediction anyway. The program's output included, among other things, a prediction of the secondary structure.
 ==Comparison with DSSP==
 http://swift.cmbi.ru.nl/servers/html/
+DSSP was developed by W. Kabsch and C. Sander in 1983. It is a database containing secondary structure assignments for each protein in the PDB. As it is not a prediction program itself, it serves as a reference against which the results of a prediction are compared. The DSSP entries are calculated from the 3D coordinates of the PDB entries.<ref name="dssp">Kabsch et al., "Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.", Biopolymers, 1983,[http://www.ncbi.nlm.nih.gov/pubmed/6667333 PubMed]</ref>
-[[File:GLA_DSSP_Comp.png]]
+Figure 2 is a color-labeled picture of the assignment by DSSP. The letter code used by DSSP has been translated to the three-state code of H = helix, E = strand and C = coil.
-Find a pdf version of this image here: [[File:GLA_DSSP_Comp.pdf]]
-=Prediction of disordered regions=
-==DISOPRED==
-http://bioinf.cs.ucl.ac.uk/disopred/
+[[Image:GLA_DSSP_Comp.png|200px|thumb|right|Figure 2: Result of DSSP]]
-[[File:GLA_Diso_graph.png]]
+Find a pdf version of Figure 2 here: [[File:GLA_DSSP_Comp.pdf]]
-==POODLE==
-http://mbs.cbrc.jp/poodle/poodle.html
-===POODLE-S: Missing residues===
-[[File:GLA_Poodle_s_missing.png]]
+===Results===
-===POODLE-S: High B-Factor residues===
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
-[[File:GLA_Poodle_s_high_b.png]]
+|-
+! Structural Element
+! PSIPRED
+! Jpred3
+! DSSP
+|-
+| Helices ||  10 || 10 || 12
+|-
+| Strands ||16 ||  15 ||  16
+|-
+| Coils || 27 ||  26 ||  29
+|-
+|}
+The results of Jpred3 and PSIPRED are very similar to the reference assignment by DSSP. Although the counts are not identical, the predictions are very close and can thus be considered successful.
-==IUPRED==
-http://iupred.enzim.hu/index.html
-===Short Disorder===
-[[File:GLA_Iupred_Short.png]]
-===Long Disorder===
-[[File:GLA_Iupred_Long.png]]
+=Prediction of disordered regions=
-==META-Disorder==
+As GLA was not found in the DisProt database we tried to predict disordered regions with several tools.
-http://www.predictprotein.org/
+==DISOPRED==
-''Hint: You will have to register. It is free of charge, but you can submit max. 3 sequences within the next 12 months!''
+An online version of DISOPRED is available at http://bioinf.cs.ucl.ac.uk/disopred/, but we also ran it locally. To do so, we first had to adapt some paths and then ran it using the command <code>sudo ./runpsipred</code>.
+DISOPRED was developed by JJ. Ward, JS. Sodhi, LJ. McGuffin, BF. Buxton and DT. Jones in 2004. A neural network is used to predict disordered regions. As it is a knowledge-based method, the DISOPRED neural network is trained with X-ray structures from the PDB. The program takes a sequence as input and runs PSI-BLAST against a database. The trained neural network predicts residue-wise profiles and classifies them as disordered or not disordered.
-https://www.rostlab.org/owiki/index.php/Metadisorder
+<ref name=disopred>Ward et al., "Prediction and functional analysis of native disorder in proteins from the three kingdoms of life", Journal of Molecular Biology. 2004, [http://www.ncbi.nlm.nih.gov/pubmed/15019783 PubMed]</ref>
+<ref name=Praesi>[https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/images/b/b5/Talk_Task3.pdf Talk_Task3]</ref>
+[[File:GLA_Diso_graph.png|200px|thumb|right|Figure 3: Prediction of disordered regions by DISOPRED]]
-[[File:GLA_Meta_disorder.png]]
+Figure 3 shows that DISOPRED predicts two disordered regions: one at the very beginning of the protein sequence and one at the end. The light grey dotted line shows that another region at position ~35 was predicted as well, but because DISOPRED filters the results, this peak is smoothed out.
-===PROFbval===
+==POODLE==
+http://mbs.cbrc.jp/poodle/poodle.html
-https://rostlab.org/owiki/index.php/Profbval
-===NORSnet===
-https://www.rostlab.org/owiki/index.php/Norsnet
-===Ucon===
-https://www.rostlab.org/owiki/index.php/UCON
+POODLE was developed by S. Hirose, K. Shimizu, S. Kanai, Y. Kuroda and T. Noguchi in 2007. It predicts disordered regions using a machine-learning approach.
-=Prediction of transmembrane alpha-helices and signal peptides=
-==Additional Proteins==
-* [http://www.uniprot.org/uniprot/P02945 BACR_HALSA]
-* [http://www.uniprot.org/uniprot/P02753 RET4_HUMAN]
-* [http://www.uniprot.org/uniprot/Q9Y5Q6 INSL5_HUMAN]
-* [http://www.uniprot.org/uniprot/P11279 LAMP1_HUMAN]
-* [http://www.uniprot.org/uniprot/P05067 A4_HUMAN]
+Four different variants are available:<ref name=poodlehp>[http://mbs.cbrc.jp/poodle/help.html POODLE Help Page]</ref>
-==TMHMM==
-===GLA===
+* POODLE-L<ref name=poodle_l>Hirose et al., "POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions.", PharmaDesign. 2007, [http://www.ncbi.nlm.nih.gov/pubmed/17545177 PubMed]</ref>: Predicts long disordered regions (>40 consecutive residues).
-===BARC_HALSA===
+* POODLE-I<ref name=poodle_i>[http://www.bioinfo.de/isb/2010/10/0015/ POODLE-I]</ref>: Uses structural information predictors based on a work-flow approach.
-===RET4_HUMAN===
+* POODLE-S<ref name=poodle_s>Shimizu et al., "POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix.", Bioinformatics. 2007, [http://www.ncbi.nlm.nih.gov/pubmed/17599940 PubMed]</ref>: Predicts short disordered regions. Has two subversions differing in the preparation of the databases:
-===INSL5_HUMAN===
+** Missing residues: Missing regions in X-ray structures.
+** High B-Factor residues: Regions with high B-Factors.
+* POODLE-W<ref name=poodle_w>Shimizu et al., "Predicting mostly disordered proteins by using structure-unknown protein data
-===LAMP1_HUMAN===
+.", Bioinformatics. 2007, [http://www.ncbi.nlm.nih.gov/pubmed/17338828 PubMed]</ref>: Specialized on mostly disordered proteins.
+Only POODLE-S was used for the analysis: since not even short disordered regions were found in our protein, long disordered regions are even less likely.
-===A4_HUMAN===
+All variants need only an amino acid sequence as input and provide both a picture and a textual output of the predicted regions.
+===POODLE-S: Missing residues===
+[[File:GLA_Poodle_s_missing.png|200px|thumb|right|Figure 4: Prediction of disordered regions by POODLE-S, with "Missing Residues" parameter]]
+Figure 4 shows two disordered regions, one at each end of the amino acid sequence, predicted by POODLE-S using the "Missing Residues" option. Additionally, there are two peaks, at positions ~35 and ~400, that come close to the disorder cutoff value.
+===POODLE-S: High B-Factor residues===
-==Phobius and PolyPhobius ==
+[[File:GLA_Poodle_s_high_b.png|200px|thumb|right|Figure 5: Prediction of disordered regions by POODLE-S, with "High B-Factor Residues" parameter]]
-http://phobius.sbc.su.se/
+Figure 5 shows the prediction by POODLE-S using the "High B-Factor residues" option. Only one disordered region is predicted, at the very beginning of the protein sequence. There are many small peaks in the curve, so the result is very noisy.
-===GLA===
-====Phobius====
-[[File:GLA_Phob_gla.png]]
+==IUPRED==
-<code>
+http://iupred.enzim.hu/index.html
-SIGNAL        1     31
+IUPRED was developed by Zsuzsanna Dosztányi, Veronika Csizmók, Péter Tompa and István Simon in 2005.
-REGION        1      9       N-REGION.
+It is a prediction method for ordered and disordered regions in protein sequences. IUPRED is based on an energy function and the assumption that there are fewer inter-residue interactions in disordered regions.
+<ref name=iupred>Dosztányi et al., "IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content.", Bioinformatics. 2005, [http://www.ncbi.nlm.nih.gov/pubmed/15955779 PubMed]</ref>
+The input of IUPRED is just an amino acid sequence. The output is a file containing a predicted value for each position in the sequence. We used R scripts to create the figures.
+===Short Disorder===
+[[File:GLA_Iupred_Short.png|200px|thumb|right|Figure 6: Prediction of disordered regions by IUPRED with "Short Disorder" parameter]]
+Figure 6 shows three disordered regions, predicted by IUPRED using the "Short Disorder" option. The first one is at the beginning of the sequence, the next one at position ~107 and the last one at the end of the sequence. Additionally, there are two significant peaks at positions ~275 and ~320, but the predicted values are smaller than the cutoff, so they are not predicted as disordered regions.
+===Long Disorder===
+[[File:GLA_Iupred_Long.png|200px|thumb|right|Figure 7: Prediction of disordered regions by IUPRED with "Long Disorder" parameter]]
+Figure 7 shows the noisy prediction of IUPRED in the "Long Disorder" mode. It predicts only one disordered region at position ~107.
+==META-Disorder==
-REGION       10     22       H-REGION.
+http://www.predictprotein.org/
+''Hint: You will have to register. It is free of charge, but you can submit max. 3 sequences within the next 12 months!''
-REGION       23     31       C-REGION.
+The program takes an amino acid sequence as input. The resulting file consists of predictions of the programs described below. We used an R-script to create the figure.
-TOPO_DOM     32    429       NON CYTOPLASMIC.
+Metadisorder was developed by Avner Schlessinger and Burkhard Rost in 2005 at Columbia University. It combines different methods and uses various sources of information to predict disordered regions. Metadisorder makes use of the methods described below
+<ref name=MD>Schlessinger et al., "Improved Disorder Prediction by Combination of Orthogonal Approaches.", PLoS ONE. 2009, [http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0004433 PLoS ONE]</ref>
+<ref name=rostlab_MD>[https://www.rostlab.org/owiki/index.php/Metadisorder  Rostlab - Metadisorder]</ref>:
+===PROFbval===
+PROFbval is a residue mobility prediction method based on the amino-acid sequence.
+<ref name=rostlab_prof>[https://rostlab.org/owiki/index.php/Profbval  Rostlab - PROFbval]</ref>
+===NORSnet===
-</code>
+NORSnet is a neural-network-based method that identifies unstructured loops.
+<ref name=rostlab_nors>[https://www.rostlab.org/owiki/index.php/Norsnet  Rostlab - NORSnet]</ref>
+===UCON===
+UCON predicts natively unstructured regions through contacts.
+<ref name=rostlab_ucon>[https://www.rostlab.org/owiki/index.php/UCON  Rostlab - UCON]</ref>
+Figure 8 shows the results of META-Disorder (black line). Only one disordered region is predicted, at the very beginning of the sequence.
-====PolyPhobius====
-[[File:GLA_Poly_gla.png]]
+[[File:GLA_Meta_disorder.png|200px|thumb|right|Figure 8: Prediction of disordered regions by META-Disorder]]
-<code>
-SIGNAL        1     31
+==Evaluation of the results==
-REGION        1     12       N-REGION.
+Summary of the predictions of disordered regions:
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
+! Position
+! Disopred
+! POODLE-S(MR)
+! POODLE-S(HBF)
+! IUPRED(SHORT)
+! IUPRED(LONG)
+! META-Disorder
+|-
+| ~3 || style="background: #9ACD32;"| || style="background: #9ACD32;" | || style="background: #9ACD32;"| || style="background: #9ACD32;"| || style="background: #FF4500;"|  ||style="background: #9ACD32;"|
+|-
+| ~107 || style="background: #FF4500;"|  || style="background: #FF4500;" | || style="background: #FF4500;"| || style="background: #9ACD32;"| || style="background: #9ACD32;"|  ||style="background: #FF4500;"|
+|-
+| ~429  || style="background: #9ACD32;"|  || style="background: #9ACD32;" | || style="background: #FF4500;"| || style="background: #9ACD32;"| || style="background: #FF4500;"|  ||style="background: #FF4500;"|
+|}
+The disordered region at the beginning of the sequence is predicted by all but one method. The one at the end is predicted by 50% of the programs used. Unfortunately, predictions of disordered regions at the termini of protein sequences are often positive but wrong because of the character of electron density maps. The disordered region at position ~107 is predicted only by the two IUPRED methods and cannot be confirmed by the other programs. Thus a true disordered region at that position is very unlikely as well. In summary, we conclude that there is no disordered region in the protein.
-REGION       13     26       H-REGION.
+=Prediction of transmembrane alpha-helices and signal peptides=
-REGION       27     31       C-REGION.
+==Programs==
+All programs but TMHMM were run on web servers and only take an amino acid sequence as input.
+===TMHMM===
+TMHMM is a program that predicts transmembrane helices in proteins. It is based on a [http://en.wikipedia.org/wiki/Hidden_Markov_model hidden Markov model] and was developed by Sonnhammer et al. in 1998<ref name=sonnhammer>Sonnhammer et al., "A hidden Markov model for predicting transmembrane helices in protein sequences.", Proc Int Conf Intell Syst Mol Biol. 1998, [http://www.ncbi.nlm.nih.gov/pubmed/9783223 PubMed]</ref>.
+We used the [http://www.cbs.dtu.dk/services/TMHMM/ webserver of TMHMM] with the FASTA sequence of the protein as input. The output contains some statistics (e.g. number of TM helices, expected number of AAs in TM helices, probability that the N-terminus is on the cytoplasmic side of the membrane), a listing of the labeled sequence regions (e.g. inside, outside, TM helix) and a plot of the per-residue probabilities. Additionally, if a large number of the first 60 AAs are predicted to be part of a TM helix, TMHMM indicates that there could be a signal peptide in the N-terminal region.
-TOPO_DOM     32    429       NON CYTOPLASMIC.
-</code>
-===BARC_HALSA===
+===Phobius/Polyphobius===
+Phobius was developed by Käll, Krogh and Sonnhammer in 2004 at the Stockholm Bioinformatics Center. It is based on HMMs and predicts transmembrane helices and signal peptides at the N-terminus. Phobius takes a protein sequence (FASTA format) as input and outputs diagrams and text files containing the predictions.
-====Phobius====
+<ref name=phobius>Käll et al., "A combined transmembrane topology and signal peptide prediction method.", J Mol Biol. 2004,  [http://www.ncbi.nlm.nih.gov/pubmed/15111065 PubMed]</ref>
-[[File:GLA_Phob_barc.png]]
+Polyphobius was developed by L. Käll, A. Krogh, EL. Sonnhammer in 2005 at the Stockholm Bioinformatics Center. It is very similar to Phobius and uses HMMs, as well. Additionally Polyphobius uses information from homologous sequences to improve the prediction accuracy.
-<code>
+<ref name=polyphobius>Käll et al., "An HMM posterior decoder for sequence feature prediction that includes homology information.", Bioinformatics. 2005,  [http://www.ncbi.nlm.nih.gov/pubmed/15961464 PubMed]</ref>
-TOPO_DOM      1     22       NON CYTOPLASMIC.
+We used the [http://phobius.sbc.su.se/ web server] for both versions of the program.
-TRANSMEM     23     42
+===Octopus/Spoctopus===
-TOPO_DOM     43     53       CYTOPLASMIC.
+Octopus was developed by H. Viklund and A. Elofsson in 2008 at the Stockholm Bioinformatics Center. It combines HMMs and artificial neural networks (ANNs) to predict the topology of transmembrane proteins. First, several ANNs make predictions for every single residue; afterwards, HMMs smooth the results and combine them into a consistent prediction.<ref name=octopus>Viklund et al., "OCTOPUS: improving topology prediction by two-track ANN-based preference scores and an extended topological grammar.", Bioinformatics. 2008,  [http://www.ncbi.nlm.nih.gov/pubmed/18474507 PubMed]</ref>
+Spoctopus was developed by H. Viklund, A. Bernsel, M. Skwark and A. Elofsson in 2008 at the Stockholm Bioinformatics Center. It is almost identical to Octopus, but additionally predicts the signal peptide.<ref name=spoctopus>Viklund et al., "SPOCTOPUS: a combined predictor of signal peptides and membrane protein topology.", Bioinformatics. 2008,  [http://www.ncbi.nlm.nih.gov/pubmed/18945683 PubMed]</ref>
-TRANSMEM     54     76
+We used the [http://octopus.cbr.su.se/index.php web server] for both versions of the program.
-TOPO_DOM     77     95       NON CYTOPLASMIC.
-TRANSMEM     96    114
+===TargetP===
-TOPO_DOM    115    120       CYTOPLASMIC.
+TargetP was developed by O. Emanuelsson, H. Nielsen, S. Brunak and G. von Heijne in 2000 at the Stockholm Bioinformatics Center. It uses neural networks to predict the subcellular location of eukaryotic proteins. The method is based on the predicted presence of any of the N-terminal presequences (i.e. chloroplast transit peptide, mitochondrial targeting peptide or secretory pathway signal peptide).<ref name=targetp>Emanuelsson et al., "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.", J Mol Biol. 2000,  [http://www.ncbi.nlm.nih.gov/pubmed/10891285 PubMed]</ref>
+We used the [http://www.cbs.dtu.dk/services/TargetP/ web server] of the program.
-TRANSMEM    121    142
+===SignalP===
-TOPO_DOM    143    147       NON CYTOPLASMIC.
+SignalP was developed by H. Nielsen, J. Engelbrecht, S. Brunak and G. von Heijne in 1997 at the Stockholm Bioinformatics Center. It predicts the presence of a signal peptide and provides information about the exact location of signal peptide cleavage sites. The most recent version of the method uses both ANNs and HMMs. <ref name=signalp>Nielsen et al., "Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.", Protein engineering. 1997,  [http://www.ncbi.nlm.nih.gov/pubmed/10891285 PubMed]</ref>
+We used the [http://www.cbs.dtu.dk/services/SignalP/ web server] of the program.
-TRANSMEM    148    169
+==Proteins==
-TOPO_DOM    170    189       CYTOPLASMIC.
+In the following sections we present the results of the programs for the proteins, i.e. the prediction of transmembrane alpha-helices and signal peptides. The sequence annotation of the UniProt entries (see [[Sequence-based_predictions_GLA#Proteins|protein overview]]) was used as the reference. For the colouring of the tables, we used the following scheme:
+* green (completely correct): the start and the end of the predicted region may each differ from the reference by at most one position, e.g. a prediction of the positions 19 - 31 will be marked green if the reference is 20 - 30.
+** exception: TMHMM cannot predict signal peptides and hence almost always includes the N-terminus of the protein in the first region, even though it is a signal peptide. Therefore we also declare the prediction of the first region correct when it starts at the first residue and its end residue is correct.
+* yellow (partially correct): the vast majority of the residues are assigned to the correct region.
+* red (wrong): the prediction is completely wrong, e.g. a region is predicted which does not exist.
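The colouring scheme above can be sketched as a small function. This is only an illustration under stated assumptions: the names `classify` and `overlap_fraction` are hypothetical, and the text does not quantify "vast majority", so the `majority` threshold of 0.5 (measured as shared residues divided by the span covered by both regions) is an assumption. The tables on this page were coloured by hand and need not agree with this rule in every cell.

```python
from typing import Optional, Tuple

# A region is (start, end), 1-based and inclusive; None means "no such region".
Region = Optional[Tuple[int, int]]


def overlap_fraction(pred: Tuple[int, int], ref: Tuple[int, int]) -> float:
    """Shared residues divided by the span covered by either region."""
    shared = max(0, min(pred[1], ref[1]) - max(pred[0], ref[0]) + 1)
    union = max(pred[1], ref[1]) - min(pred[0], ref[0]) + 1
    return shared / union


def classify(pred: Region, ref: Region,
             ignore_start: bool = False,
             majority: float = 0.5) -> str:
    """Return 'green', 'yellow', 'red' or 'gray' for one table cell.

    ignore_start models the TMHMM exception: a first region that starts
    at residue 1 counts as having a correct start.
    majority is an assumed threshold for "partially correct".
    """
    if pred is None and ref is None:
        return "gray"   # nothing predicted, nothing annotated
    if pred is None or ref is None:
        return "red"    # region missed, or predicted where none exists
    start_ok = abs(pred[0] - ref[0]) <= 1
    end_ok = abs(pred[1] - ref[1]) <= 1
    if ignore_start and pred[0] == 1:
        start_ok = True
    if start_ok and end_ok:
        return "green"
    if overlap_fraction(pred, ref) >= majority:
        return "yellow"
    return "red"


if __name__ == "__main__":
    print(classify((19, 31), (20, 30)))                       # example from the scheme
    print(classify((1, 429), (32, 429), ignore_start=True))   # TMHMM exception
    print(classify((11, 31), (1, 31)))                        # truncated signal peptide
    print(classify((10, 30), None))                           # region that does not exist
```

Keeping the tolerance and the threshold as explicit parameters makes it easy to see how sensitive the colouring is to these choices.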
+===General evaluation===
-TRANSMEM    190    212
+TMHMM does not predict signal peptides and therefore often includes the N-terminus of the protein in the first region by mistake. It also missed one transmembrane alpha-helix, and Phobius/Polyphobius were often more accurate.
+Octopus has a similar problem: it does not predict signal peptides either and hence predicts a transmembrane alpha-helix for the signal peptide. Spoctopus, on the other hand, predicts the transmembrane regions very well but always misses the beginning of the signal peptide.
-TOPO_DOM    213    217       NON CYTOPLASMIC.
+Phobius and Polyphobius made very good predictions of the transmembrane alpha-helices, and their predictions of the signal peptides were completely correct. Since they deliver the whole package (i.e. transmembrane alpha-helix prediction, signal peptide prediction and the highest accuracy), we prefer to work with these programs.
-TRANSMEM    218    237
+===GLA===
-TOPO_DOM    238    262       CYTOPLASMIC.
+The graphical output of the programs is provided on [[Fabry Disease sequence-based prediction of tm-helices and signal peptides GLA|this page]].
-</code>
+====Transmembrane alpha-helices====
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
+! Region
+! Reference
+! TMHMM
+! Phobius
+! Polyphobius
+! Octopus
+! Spoctopus
+|-
+| Cytoplasmic || - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 1 - 9 || style="background: lightgray;" | -
+|-
+| Transmembrane || - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 10 - 30 || style="background: lightgray;" | -
+|-
+| Non-cytoplasmic || 32 - 429 || style="background: #9ACD32;" | 1 - 429 || style="background: #9ACD32;" | 32 - 429 || style="background: #9ACD32;" | 32 - 429 || style="background: #9ACD32;" | 31 - 429 || style="background: #9ACD32;" | 32 - 429
+|-
+|}
+No program predicted a transmembrane alpha-helix except for Octopus, which predicts the signal peptide as a cytoplasmic and a transmembrane region. We also observe this problem of Octopus for all of the other five proteins.
-====PolyPhobius====
-[[File:GLA_Poly_barc.png]]
+====Signal peptides====
-<code>
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
-TOPO_DOM      1     21       NON CYTOPLASMIC.
+|-
+! Region
+! Reference
+! Phobius
+! Polyphobius
+! Spoctopus
+! SignalP
+|-
+| Signal peptide || 1 - 31 || style="background: #9ACD32;" | 1 - 31 || style="background: #9ACD32;" | 1 - 31 || style="background: #FFFF00;" | 11 - 31 || style="background: #9ACD32;" | 1 - 31
+|-
+| N-Region || - || style="background: lightgray;" | 1 - 9 || style="background: lightgray;" | 1 - 12 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+| H-Region || - || style="background: lightgray;" | 10 - 22 || style="background: lightgray;" | 13 - 26 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+| C-Region || - || style="background: lightgray;" | 23 - 31 || style="background: lightgray;" | 27 - 31 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+|}
+All programs recognize the signal peptide, but Spoctopus alone misses the beginning of the sequence.
-TRANSMEM     22     43
+===BACR_HALSA===
-TOPO_DOM     44     54       CYTOPLASMIC.
+The graphical output of the programs is provided on [[Fabry Disease sequence-based prediction of tm-helices and signal peptides BARC_HALSA|this page]].
+====Transmembrane alpha-helices====
-TRANSMEM     55     77
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
-TOPO_DOM     78     94       NON CYTOPLASMIC.
+! Region
+! Reference
-TRANSMEM     95    114
+! TMHMM
+! Phobius
-TOPO_DOM    115    120       CYTOPLASMIC.
+! Polyphobius
+! Octopus
-TRANSMEM    121    141
+! Spoctopus
+|-
+| Non-cytoplasmic || 14 - 23 || style="background: #FFFF00;" | 1 - 22 || style="background: #FFFF00;" | 1 - 22 || style="background: #FFFF00;" | 1 - 21 || style="background: #FFFF00;" | 1 - 22 || style="background: #FFFF00;" | 1 - 22
+|-
+| Transmembrane || 24 - 42 || style="background: #9ACD32;" | 23 - 42 || style="background: #9ACD32;" | 23 - 42 || style="background: #FFFF00;" | 22 - 43 || style="background: #9ACD32;" | 23 - 43 || style="background: #9ACD32;" | 23 - 43
+|-
+| Cytoplasmic || 43 - 56 || style="background: #FFFF00;" | 43 - 54 || style="background: #FFFF00;" | 43 - 53|| style="background: #FFFF00;" | 44 - 54 || style="background: #FFFF00;" | 44 - 54 || style="background: #FFFF00;" | 44 - 54
+|-
+| Transmembrane || 57 - 75 || style="background: #FFFF00;" | 55 - 77 || style="background: #FFFF00;" | 54 - 76 || style="background: #FFFF00;" | 55 - 77 || style="background: #FFFF00;" | 55 - 75 || style="background: #FFFF00;" | 55 - 75
+|-
+| Non-cytoplasmic || 76 - 91 || style="background: #FFFF00;" | 78 - 91 || style="background: #FFFF00;" | 77 - 95 || style="background: #FFFF00;" | 78 - 94 || style="background: #FFFF00;" | 76 - 95 || style="background: #FFFF00;" | 76 - 95
+|-
+| Transmembrane || 92 - 109 || style="background: #FFFF00;" | 92 - 114 || style="background: #FFFF00;" | 96 - 114 || style="background: #FFFF00;" | 95 - 114 || style="background: #FFFF00;" | 96 - 116 || style="background: #FFFF00;" | 96 - 116
+|-
+| Cytoplasmic || 110 - 120 || style="background: #FFFF00;" | 115 - 120 || style="background: #FFFF00;" | 115 - 120|| style="background: #FFFF00;" | 115 - 120 || style="background: #FFFF00;" | 117 - 121 || style="background: #FFFF00;" | 117 - 120
+|-
+| Transmembrane || 121 - 140 || style="background: #FFFF00;" | 121 - 143 || style="background: #FFFF00;" | 121 - 142 || style="background: #9ACD32;" | 121 - 141 || style="background: #FFFF00;" | 122 - 142 || style="background: #9ACD32;" | 121 - 141
+|-
+| Non-cytoplasmic || 141 - 147 || style="background: #FFFF00;" | 144 - 147 || style="background: #FFFF00;" | 143 - 147|| style="background: #9ACD32;" | 142 - 147 || style="background: #FFFF00;" | 143 - 147 || style="background: #9ACD32;" | 142 - 147
+|-
+| Transmembrane || 148 - 167 || style="background: #FFFF00;" | 148 - 170 || style="background: #FFFF00;" | 148 - 169 || style="background: #9ACD32;" | 148 - 166 || style="background: #9ACD32;" | 148 - 168 || style="background: #9ACD32;" | 148 - 168
+|-
+| Cytoplasmic || 168 - 185 || style="background: #FFFF00;" | 171 - 189 || style="background: #FFFF00;" | 170 - 189 || style="background: #9ACD32;" | 167 - 186 || style="background: #9ACD32;" | 169 - 185 || style="background: #9ACD32;" | 169 - 185
+|-
+| Transmembrane || 186 - 204 || style="background: #FFFF00;" | 190 - 212 || style="background: #FFFF00;" | 190 - 212 || style="background: #9ACD32;" | 187 - 205 || style="background: #FFFF00;" | 186 - 206 || style="background: #FFFF00;" | 186 - 206
+|-
+| Non-cytoplasmic || 205 - 216 || style="background: #FFFF00;" | 213 - 262 || style="background: #FFFF00;" | 213 - 217 || style="background: #9ACD32;" | 206 - 215 || style="background: #FFFF00;" | 207 - 216 || style="background: #FFFF00;" | 207 - 216
+|-
+| Transmembrane || 217 - 236 || style="background: #FF4500;" | - || style="background: #9ACD32;" | 218 - 237 || style="background: #9ACD32;" | 216 - 237 || style="background: #9ACD32;" | 217 - 237 || style="background: #9ACD32;" | 217 - 237
+|-
+| Cytoplasmic || 237 - 262 || style="background: #FF4500;" | - || style="background: #9ACD32;" | 238 - 262 || style="background: #9ACD32;" | 238 - 262 || style="background: #9ACD32;" | 238 - 262 || style="background: #9ACD32;" | 238 - 262
+|-
+|}
+All programs predict the correct number of transmembrane alpha-helices apart from TMHMM, which misses the last one. Polyphobius has the best result, since it has the highest number of completely correct regions.
-TOPO_DOM    142    147       NON CYTOPLASMIC.
+====Signal peptides====
-TRANSMEM    148    166
+No program predicted a signal peptide, which is correct.
-TOPO_DOM    167    186       CYTOPLASMIC.
-TRANSMEM    187    205
-TOPO_DOM    206    215       NON CYTOPLASMIC.
-TRANSMEM    216    237
-TOPO_DOM    238    262       CYTOPLASMIC.
-</code>
 ===RET4_HUMAN===
+The graphical output of the programs is provided on [[Fabry Disease sequence-based prediction of tm-helices and signal peptides RET4_HUMAN|this page]].
-====Phobius====
-[[File:GLA_Phob_ret4.png]]
+====Transmembrane alpha-helices====
-<code>
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
-SIGNAL        1     18
+|-
+! Region
+! Reference
+! TMHMM
+! Phobius
+! Polyphobius
+! Octopus
+! Spoctopus
+|-
+| Cytoplasmic || - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 1 - 1 || style="background: lightgray;" | -
+|-
+| Transmembrane || - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 2 - 23 || style="background: lightgray;" | -
+|-
+| Non-cytoplasmic || 19 - 201 || style="background: #9ACD32;" | 1 - 201 || style="background: #9ACD32;" | 19 - 201 || style="background: #9ACD32;" | 19 - 201 || style="background: #FFFF00;" | 24 - 201 || style="background: #9ACD32;" | 20 - 201
+|-
+|}
+All programs except Octopus correctly predict that this protein does not have a transmembrane alpha-helix.
-REGION        1      2       N-REGION.
+====Signal peptides====
-REGION        3     13       H-REGION.
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
+! Region
+! Reference
+! Phobius
+! Polyphobius
+! Spoctopus
+! SignalP
+|-
+| Signal peptide || 1 - 18 || style="background: #9ACD32;" | 1 - 18 || style="background: #9ACD32;" | 1 - 18 || style="background: #FFFF00;" | 6 - 19 || style="background: #9ACD32;" | 1 - 18
+|-
+| N-Region || - || style="background: lightgray;" | 1 - 2 || style="background: lightgray;" | 1 - 3 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+| H-Region || - || style="background: lightgray;" | 3 - 13 || style="background: lightgray;" | 4 - 13 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+| C-Region || - || style="background: lightgray;" | 14 - 18|| style="background: lightgray;" | 14 - 18 || style="background: lightgray;" | -  || style="background: lightgray;" | -
+|-
+|}
+All programs recognize the signal peptide, but Spoctopus alone misses the beginning of the sequence.
-REGION       14     18       C-REGION.
-TOPO_DOM     19    201       NON CYTOPLASMIC.
-</code>
-====PolyPhobius====
-[[File:Poly_ret4.png]]
-<code>
-SIGNAL        1     18
-REGION        1      3       N-REGION.
-REGION        4     13       H-REGION.
-REGION       14     18       C-REGION.
-TOPO_DOM     19    201       NON CYTOPLASMIC.
-</code>
 ===INSL5_HUMAN===
+The graphical output of the programs is provided on [[Fabry Disease sequence-based prediction of tm-helices and signal peptides INSL5_HUMAN|this page]].
-====Phobius====
-[[File:GLA_Phob_insl5.png]]
+====Transmembrane alpha-helices====
-<code>
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
-SIGNAL        1     22
+|-
+! Region
+! Reference
+! TMHMM
+! Phobius
+! Polyphobius
+! Octopus
+! Spoctopus
+|-
+| Cytoplasmic || - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 1 - 1 || style="background: lightgray;" | -
+|-
+| Transmembrane || - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 2 - 32 || style="background: lightgray;" | -
+|-
+| Non-cytoplasmic || 23 - 135 || style="background: #9ACD32;" | 1 - 135 || style="background: #9ACD32;" | 23 - 135 || style="background: #9ACD32;" | 23 - 135 || style="background: #FFFF00;" | 33 - 135 || style="background: #9ACD32;" | 24 - 135
+|-
+|}
+All programs except Octopus correctly predict that this protein does not have a transmembrane alpha-helix.
-REGION        1      5       N-REGION.
+====Signal peptides====
-REGION        6     17       H-REGION.
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
-REGION       18     22       C-REGION.
+! Region
+! Reference
-TOPO_DOM     23    135       NON CYTOPLASMIC.
+! Phobius
-</code>
+! Polyphobius
+! Spoctopus
-====PolyPhobius====
+! SignalP
-[[File:Poly_insl5.png]]
+|-
+| Signal peptide || 1 - 22 || style="background: #9ACD32;" | 1 - 22 || style="background: #9ACD32;" | 1 - 22 || style="background: #FFFF00;" | 6 - 23 || style="background: #9ACD32;" | 1 - 22
-<code>
+|-
-SIGNAL        1     22
+| N-Region || - || style="background: lightgray;" | 1 - 5 || style="background: lightgray;" | 1 - 4 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
-REGION        1      4       N-REGION.
+| H-Region || - || style="background: lightgray;" | 6 - 17 || style="background: lightgray;" | 5 - 16 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
-REGION        5     16       H-REGION.
+| C-Region || - || style="background: lightgray;" | 18 - 22 || style="background: lightgray;" | 17 - 22 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
-REGION       17     22       C-REGION.
+|}
-TOPO_DOM     23    135       NON CYTOPLASMIC.
-</code>
+All programs recognize the signal peptide, but Spoctopus alone misses the beginning of the sequence.
 ===LAMP1_HUMAN===
+The graphical output of the programs is provided on [[Fabry Disease sequence-based prediction of tm-helices and signal peptides LAMP1_HUMAN|this page]].
-====Phobius====
-[[File:GLA_Phob_lamp1.png]]
+====Transmembrane alpha-helices====
-<code>
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
-SIGNAL        1     28
+|-
+! Region
+! Reference
+! TMHMM
+! Phobius
+! Polyphobius
+! Octopus
+! Spoctopus
+|-
+| Cytoplasmic || - || style="background: #FF4500;" | 1 - 10 || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 1 - 10 || style="background: lightgray;" | -
+|-
+| Transmembrane || - || style="background: #FF4500;" | 11 - 33 || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 11 - 31 || style="background: lightgray;" | -
+|-
+| Non-cytoplasmic || 29 - 382 || style="background: #FFFF00;" | 34 - 383 || style="background: #9ACD32;" | 29 - 381 || style="background: #9ACD32;" | 29 - 381 || style="background: #FFFF00;" | 32 - 383 || style="background: #9ACD32;" | 30 - 383
+|-
+| Transmembrane || 383 - 405 || style="background: #9ACD32;" | 384 - 406 || style="background: #9ACD32;" | 382 - 405 || style="background: #9ACD32;" | 382 - 405 || style="background: #9ACD32;" | 384 - 404 || style="background: #9ACD32;" | 384 - 404
+|-
+| Cytoplasmic || 406 - 417 || style="background: #9ACD32;" | 407 - 417 || style="background: #9ACD32;" | 406 - 417 || style="background: #9ACD32;" | 406 - 417 || style="background: #9ACD32;" | 405 - 417 || style="background: #9ACD32;" | 405 - 417
+|-
+|}
+Phobius, Polyphobius and Spoctopus predict the one transmembrane alpha-helix correctly. Octopus has the common error that it assigns a cytoplasmic and transmembrane region to the signal peptide of the protein. TMHMM also predicts a transmembrane region for the signal peptide. The developers of TMHMM indicate this problem on the [http://www.cbs.dtu.dk/services/TMHMM/TMHMM2.0b.guide.php instruction page of TMHMM].
-REGION        1     10       N-REGION.
+====Signal peptides====
-REGION       11     22       H-REGION.
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
-REGION       23     28       C-REGION.
+! Region
+! Reference
-TOPO_DOM     29    381       NON CYTOPLASMIC.
+! Phobius
+! Polyphobius
-TRANSMEM    382    405
+! Spoctopus
+! SignalP
-TOPO_DOM    406    417       CYTOPLASMIC.
+|-
-</code>
+| Signal peptide || 1 - 28 || style="background: #9ACD32;" | 1 - 28 || style="background: #9ACD32;" | 1 - 28 || style="background: #FFFF00;" | 12 - 29 || style="background: #9ACD32;" | 1 - 28
+|-
-====PolyPhobius====
+| N-Region || - || style="background: lightgray;" | 1 - 10 || style="background: lightgray;" | 1 - 9 || style="background: lightgray;" | - || style="background: lightgray;" | -
-[[File:GLA_Poly_lamp1.png]]
+|-
+| H-Region || - || style="background: lightgray;" | 11 - 22 || style="background: lightgray;" | 10 - 22 || style="background: lightgray;" | - || style="background: lightgray;" | -
-<code>
+|-
-SIGNAL        1     28
+| C-Region || - || style="background: lightgray;" | 23 - 28 || style="background: lightgray;" | 23 - 28 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
-REGION        1      9       N-REGION.
+|}
-REGION       10     22       H-REGION.
-REGION       23     28       C-REGION.
-TOPO_DOM     29    381       NON CYTOPLASMIC.
-TRANSMEM    382    405
-TOPO_DOM    406    417       CYTOPLASMIC.
-</code>
+All programs recognize the signal peptide, but Spoctopus alone misses the beginning of the sequence.
 ===A4_HUMAN===
+The graphical output of the programs is provided on [[Fabry Disease sequence-based prediction of tm-helices and signal peptides A4_HUMAN|this page]].
-====Phobius====
-[[File:GLA_Phob_a4.png]]
+====Transmembrane alpha-helices====
-<code>
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
-SIGNAL        1     17
+|-
+! Region
+! Reference
+! TMHMM
+! Phobius
+! Polyphobius
+! Octopus
+! Spoctopus
+|-
+| Cytoplasmic || - || style="background: lightgray;" | -|| style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 1 - 5 || style="background: lightgray;" | -
+|-
+| Transmembrane || - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: lightgray;" | - || style="background: #FF4500;" | 6 - 11 || style="background: lightgray;" | -
+|-
+| Non-cytoplasmic || 18 - 699 || style="background: #9ACD32;" | 1 - 700 || style="background: #9ACD32;" | 18 - 700 || style="background: #9ACD32;" | 18 - 700 || style="background: #FFFF00;" | 12 - 701 || style="background: #FFFF00;" | 19 - 701
+|-
+| Transmembrane || 700 - 723 || style="background: #9ACD32;" | 701 - 723 || style="background: #9ACD32;" | 701 - 723 || style="background: #9ACD32;" | 701 - 723 || style="background: #FFFF00;" | 702 - 722 || style="background: #FFFF00;" | 702 - 722
+|-
+| Cytoplasmic || 724 - 770 || style="background: #9ACD32;" | 724 - 770 || style="background: #9ACD32;" | 724 - 770 || style="background: #9ACD32;" | 724 - 770 || style="background: #9ACD32;" | 723 - 770 || style="background: #9ACD32;" | 723 - 770
+|-
+|}
+Every program predicts the one transmembrane alpha-helix of the protein except for Octopus.
-REGION        1      1       N-REGION.
+====Signal peptides====
-REGION        2     12       H-REGION.
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+|-
+! Region
+! Reference
+! Phobius
+! Polyphobius
+! Spoctopus
+! SignalP
+|-
+| Signal peptide || 1 - 17 || style="background: #9ACD32;" | 1 - 17 || style="background: #9ACD32;" | 1 - 17 || style="background: #FFFF00;" | 5 - 18 || style="background: #9ACD32;" | 1 - 17
+|-
+| N-Region || - || style="background: lightgray;" | 1 - 1 || style="background: lightgray;" | 1 - 3 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+| H-Region || - || style="background: lightgray;" | 2 - 12 || style="background: lightgray;" | 4 - 12 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+| C-Region || - || style="background: lightgray;" | 13 - 17 || style="background: lightgray;" | 13 - 17 || style="background: lightgray;" | - || style="background: lightgray;" | -
+|-
+|}
+All programs recognize the signal peptide, but Spoctopus alone misses the beginning of the sequence.
-REGION       13     17       C-REGION.
+==TargetP==
-TOPO_DOM     18    700       NON CYTOPLASMIC.
+The result of TargetP is shown in the following table. An explanation of the output is given on [http://www.cbs.dtu.dk/services/TargetP-1.1/output.php this page].
-TRANSMEM    701    723
-TOPO_DOM    724    770       CYTOPLASMIC.
-</code>
-====PolyPhobius====
-[[File:GLA_Poly_a4.png]]
-<code>
-SIGNAL        1     17
-REGION        1      3       N-REGION.
-REGION        4     12       H-REGION.
-REGION       13     17       C-REGION.
-TOPO_DOM     18    700       NON CYTOPLASMIC.
-TRANSMEM    701    723
-TOPO_DOM    724    770       CYTOPLASMIC.
-</code>
-==OCTOPUS and SPOCTOPUS==
-http://octopus.cbr.su.se/index.php
-===GLA===
-====Octopus====
-[[File:GLA_Octo_gla.png]]
-====Spoctopus====
-[[File:GLA_Spocto_gla.png]]
-===BARC_HALSA===
-====Octopus====
-[[File:GLA_Octo_bacr.png]]
-====Spoctopus====
-[[File:GLA_Spocto_bacr.png]]
-===RET4_HUMAN===
-====Octopus====
-[[File:GLA_Octo_ret4.png]]
-====Spoctopus====
-[[File:GLA_Spocto_ret4.png]]
-===INSL5_HUMAN===
-====Octopus====
-[[File:GLA_Octo_insl5.png]]
-====Spoctopus====
-[[File:GLA_Spocto_insl5.png]]
-===LAMP1_HUMAN===
-====Octopus====
-[[File:GLA_Octo_lamp1.png]]
-====Spoctopus====
-[[File:GLA_Spocto_lamp1.png]]
-===A4_HUMAN===
-====Octopus====
-[[File:Octo_a4.png]]
-====Spoctopus====
-[[File:GLA_Spocto_a4.png]]
-==SignalP ==
-===GLA===
-===BARC_HALSA===
-===RET4_HUMAN===
-===INSL5_HUMAN===
-===LAMP1_HUMAN===
-===A4_HUMAN===
-==TargetP==
-http://www.cbs.dtu.dk/services/TargetP/
 {|border="1" style="text-align:center; border-spacing:0;"
 !Name
@@ Line 472: / Line 617: @@
 !RC
 |-
-|GLA   ||           429      ||    0.041 || 0.860 ||  0.141 ||  S  ||  2
+|GLA   ||           429      ||    0.041 || '''0.860''' ||  0.141 ||  '''S'''  ||  2
 |-
-|BACR_HALSA || 262    ||      0.019 || 0.897 || 0.562 ||  S  ||  4
+|BACR_HALSA || 262    ||      0.019 || '''0.897''' || 0.562 ||  '''S'''  ||  4
 |-
-|RET4_HUMAN || 201   ||       0.242 || 0.928 || 0.020 || S  ||  2
+|RET4_HUMAN || 201   ||       0.242 || '''0.928''' || 0.020 || '''S'''  ||  2
 |-
-|INSL5_HUMA || 135    ||      0.074 || 0.899 || 0.037 ||  S  ||  1
+|INSL5_HUMA || 135    ||      0.074 || '''0.899''' || 0.037 ||  '''S'''  ||  1
 |-
-|LAMP1_HUMA || 417   ||       0.043 || 0.953 || 0.017 ||  S  ||  1
+|LAMP1_HUMA || 417   ||       0.043 || '''0.953''' || 0.017 ||  '''S'''  ||  1
 |-
-|A4_HUMAN  ||  770    ||      0.035 || 0.937 || 0.084 ||  S  ||  1
+|A4_HUMAN  ||  770    ||      0.035 || '''0.937''' || 0.084 ||  '''S'''  ||  1
 |}
-http://www.cbs.dtu.dk/services/TargetP-1.1/output.php
+TargetP predicts a signal peptide for all six proteins and therefore assigns each of them to the secretory pathway. This prediction is correct for every protein.
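How the location and the reliability class (RC) follow from the scores can be sketched as follows. This is a minimal illustration, not the TargetP implementation: the function names are hypothetical, the three score columns of the table are assumed to be mTP, SP and other (non-plant mode), and the RC thresholds (steps of 0.2 on the difference between the two highest scores) are taken from our reading of the TargetP output description linked above.

```python
def predict_location(scores: dict) -> str:
    """Return the label ('M', 'S' or '_') of the highest-scoring class."""
    return max(scores, key=scores.get)


def reliability_class(scores: dict) -> int:
    """RC from the gap between the two highest scores (1 = most reliable)."""
    best, second = sorted(scores.values(), reverse=True)[:2]
    diff = best - second
    if diff > 0.8:
        return 1
    if diff > 0.6:
        return 2
    if diff > 0.4:
        return 3
    if diff > 0.2:
        return 4
    return 5


# Scores copied from the table above; keys: M = mTP, S = SP, _ = other (assumed).
table = {
    "GLA":        {"M": 0.041, "S": 0.860, "_": 0.141},
    "BACR_HALSA": {"M": 0.019, "S": 0.897, "_": 0.562},
    "RET4_HUMAN": {"M": 0.242, "S": 0.928, "_": 0.020},
    "INSL5_HUMA": {"M": 0.074, "S": 0.899, "_": 0.037},
    "LAMP1_HUMA": {"M": 0.043, "S": 0.953, "_": 0.017},
    "A4_HUMAN":   {"M": 0.035, "S": 0.937, "_": 0.084},
}

if __name__ == "__main__":
    for name, scores in table.items():
        print(name, predict_location(scores), reliability_class(scores))
```

Under these assumptions the sketch yields the same Loc and RC values as the table, which also explains why BACR_HALSA, with its high "other" score of 0.562, is the least reliable of the six predictions.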
 =Prediction of GO terms=
-==Programs==
+==GOPET==
+GOPET stands for Gene Ontology term Prediction and Evaluation Tool and was developed by Vinayagam et al. in 2006<ref name=vinayagam>Vinayagam et al., "GOPET: a tool for automated predictions of Gene Ontology terms.", BMC Bioinformatics. 2006 Mar 20, [http://www.ncbi.nlm.nih.gov/pubmed/16549020 PubMed]</ref>. It is based on homology searches on GO-mapped protein databases and uses [http://en.wikipedia.org/wiki/Support_vector_machine support vector machines] for the calculation of the confidence values.
-===GOPET===
-http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar
-We used the default settings (GO aspect: molecular function, maximum number of predictions: 20, confidence threshold: 60, GOPET model 2007 june, version 2.0, GOPET database 2007). The results only contain GOids of the GO aspect "molecular function", since the other two GO aspects (cellular component and biological process) were not available.
+We used the [http://genius.embnet.dkfz-heidelberg.de/menu/cgi-bin/w2h-open/w2h.open/w2h.startthis?SIMGO=w2h.welcome webserver of GOPET] with the default settings (GO aspect: molecular function, maximum number of predictions: 20, confidence threshold: 60, GOPET model 2007 june, version 2.0, GOPET database 2007) and the FASTA-sequence of the protein as input. The results only contain GOids of the GO aspect "molecular function", since the other two GO aspects (cellular component and biological process) were not available.
-===Pfam===
+==Pfam==
-Pfam is a database composed of the protein domain families that is created by using Hidden Markov Models profiles(HMMs). Each protein domain family is represented by a multiple sequence alignment and a HMMs. One can search one protein sequence against Pfam and obtain all the possible domains that the query sequence might contain.
+Pfam is a database of protein domain families that is built with profile hidden Markov models (HMMs) and was first described by Sonnhammer et al. in 1997<ref name=sonnhammer>Sonnhammer et al., "Pfam: a comprehensive database of protein domain families based on seed alignments.", Proteins. 1997 Jul, [http://www.ncbi.nlm.nih.gov/pubmed/9223186 PubMed]</ref>. Each protein domain family is represented by a multiple sequence alignment and an HMM. One can search a protein sequence against Pfam and obtain all the possible domains that the query sequence might contain.
 The Pfam database consists of two parts, A and B, which contain protein domain families of different quality levels. In the 1.0 release of Pfam, the protein entries in Pfam-A and Pfam-B were from Swissprot (a few initial members of seed alignment in Pfam-A were from several sources: Swissprot, Prosite, ProDom etc.). In the current release of Pfam, the entries in Pfam-A and Pfam-B are from Pfamseq (UniProtKB) and ADDA respectively.
@@ Line 503: / Line 648: @@
 Pfam-B is created based on the sequence alignment of the entries from ADDA by using HMMs. Entries that already exist in Pfam-A are excluded. There is no confirmed annotation and no manual quality checking for the families in Pfam-B, therefore there could be some errors (e.g. the members of one family could be just randomly aligned) and the overall quality is relatively low. However, it can still be useful when no domain evidence for the query sequence can be found in Pfam-A.
-We used the "sequence search" feature of Pfam to determine potential domains or domain families of the protein. Afterwards we checked out the corresponding page of the domain (family) for a GO annotation. The search was performed with the default settings (cut-off: use E-Value, threshold 1.0), but we also included Pfam-B in the search. Only one hit in Pfam-B was found which does not have any GO annotation and hence there was no gain in including Pfam-B. The classification in respect to the significance of a hit was done by the Pfam search algorithm. The results are listed in the tables below.
+We used the "sequence search" feature of the [http://pfam.sanger.ac.uk/ Pfam website] with the FASTA-sequence of the protein to determine potential domains or domain families. Afterwards we checked the corresponding page of the domain (family) for a GO annotation. The search was performed with the default settings (cut-off: use E-Value, threshold 1.0), but we also included Pfam-B in the search. Only one hit in Pfam-B was found, which does not have any GO annotation, and hence there was no gain in including Pfam-B. The classification with respect to the significance of a hit was done by the Pfam search algorithm.
-===ProtFun===
+==ProtFun==
+ProtFun tries to assign a function to the query protein. For this purpose, it uses the prediction of several other features like post-translational modification sites or localization of the protein. The prediction of these features itself is based on other programs like [[Sequence-based_predictions_GLA#SignalP|SignalP]], [[Sequence-based_predictions_GLA#TargetP|TargetP]], NetOGlyc, [[Sequence-based_predictions_GLA#TMHMM|TMHMM]] and some others.
-http://www.cbs.dtu.dk/services/ProtFun/
+ProtFun was developed by Jensen et al. in 2002<ref name=jensen_1>Jensen et al., "Prediction of human protein function from post-translational modifications and localization features.", J Mol Biol. 2002 Jun 21, [http://www.ncbi.nlm.nih.gov/pubmed/12079362 PubMed]</ref> and the prediction of the Gene Ontology category was added in 2003<ref name=jensen_2>Jensen et al., "Prediction of human protein function according to Gene Ontology categories.", Bioinformatics. 2003 Mar 22, [http://www.ncbi.nlm.nih.gov/pubmed/12651722 PubMed]</ref>.
-The results of the Gene Ontology category assignment of ProtFun are listed below. The term 'Prob' represents the calculated probability by ProtFun that the query belongs to the category. This probability is dependent on the prior probability of the category. 'Odds' describes the odds that the query belongs to the certain category and is not influenced by the prior probability.<ref name=ProtFun>[http://www.cbs.dtu.dk/services/ProtFun-2.2/output.php Explanation of the ProtFun 2.2 output.]</ref>  The class with the highest [http://en.wikipedia.org/wiki/Self-information information content] and with the highest probability is marked bold. Additionally we provide a table for each query that contains the categories with the highest information content or probability, respectively, and their associated GO id. For this purpose, we used the search feature of the [http://www.geneontology.org/ Gene Ontology website].
+We used the webserver of [http://www.cbs.dtu.dk/services/ProtFun/ ProtFun 2.2] with the default settings and the FASTA-sequence of the protein as the input. The output contains predictions about the functional category, enzyme/nonenzyme, enzyme class and the Gene Ontology category. In our case, only the result of the latter was relevant. The term 'Prob' represents the probability calculated by ProtFun that the query belongs to the category. This probability depends on the prior probability of the category. 'Odds' describes the odds that the query belongs to that category and is not influenced by the prior probability.<ref name=ProtFun>[http://www.cbs.dtu.dk/services/ProtFun-2.2/output.php Explanation of the ProtFun 2.2 output.]</ref>  The class with the highest [http://en.wikipedia.org/wiki/Self-information information content] and the class with the highest probability are marked bold. Additionally we provide a table for each query that contains the categories with the highest information content or probability, respectively, and their associated GO id. For this purpose, we used the search feature of the [http://www.geneontology.org/ Gene Ontology website].
 ==Proteins==
+The results of the programs are listed for the five proteins in the following sections. If a prediction is correct, it is marked with a green background. For this purpose, we used the GO annotation of the corresponding protein at the [http://www.ebi.ac.uk/ EBI website] (see [[Sequence-based_predictions_GLA#List_of_annotated_GO_ids|list of annotated GO ids]]).
 ===GLA===
 ====GOPET====
 {|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
 |-
 ! GOid
 ! Confidence
 ! GO term
+|- style="background: #9ACD32;"
-|-
 |GO:0016798 || 98% || hydrolase activity acting on glycosyl bonds
+|- style="background: #9ACD32;"
-|-
 |GO:0004553 || 98% || hydrolase activity hydrolyzing O-glycosyl compounds
+|- style="background: #9ACD32;"
-|-
 |GO:0016787 || 97% || hydrolase activity
+|- style="background: #9ACD32;"
-|-
 |GO:0004557 || 96% || alpha-galactosidase activity
+|- style="background: lightgray;"
-|-
 |GO:0008456 || 89% || alpha-N-acetylgalactosaminidase activity
 |-
@@ Line 541: / Line 688: @@
 ! GO description
 ! GO id
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Melibiase || Family || x || Molecular function || hydrolase activity, hydrolyzing O-glycosyl compounds || GO:0004553
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Melibiase || Family || x || Biological process || carbohydrate metabolic process || GO:0005975
 |-
@@ Line 573: / Line 720: @@
 ! GO aspect
 ! GO id
+|- style="background: lightgray;"
-|-
 | Highest probability || Signal transducer || Molecular function || GO:0004871
 |-
 |}
+<br/>
-===BARC_HALSA===
+===BACR_HALSA===
 ====GOPET====
 {|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
@@ Line 585: / Line 733: @@
 ! Confidence
 ! GO term
+|- style="background: #9ACD32;"
-|-
 |GO:0005216 || 77%||  ion channel activity
+|- style="background: lightgray;"
-|-
 |GO:0008020 || 75%||  G-protein coupled photoreceptor activity
+|- style="background: lightgray;"
-|-
 |GO:0015078 || 60%||  hydrogen ion transmembrane transporter activity
-|-
 |-
 |}
@@ Line 605: / Line 752: @@
 ! GO description
 ! GO id
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Bacteriorhodopsin-like protein || Domain || x || Cellular component || membrane || GO:0016020
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Bacteriorhodopsin-like protein || Domain || x || Molecular function || ion channel activity || GO:0005216
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Bacteriorhodopsin-like protein || Domain || x || Biological process || ion transport || GO:0006811
+|- style="background: lightgray;"
-|-
 | Pfam-A || Domain of unknown function DUF21 || Family || || - || - || -
 |-
@@ Line 641: / Line 788: @@
 ! GO aspect
 ! GO id
+|- style="background: lightgray;"
-|-
 | Highest information content / highest probability || Transporter || Molecular function || GO:0005215
 |-
 |}
+<br/>
 ===RET4_HUMAN===
@@ Line 653: / Line 801: @@
 ! Confidence
 ! GO term
+|- style="background: #9ACD32;"
-|-
 |GO:0005488 || 90% || binding
+|- style="background: #9ACD32;"
-|-
 |GO:0005501 || 81% || retinoid binding
+|- style="background: lightgray;"
-|-
 |GO:0008289 || 80% || lipid binding
+|- style="background: #9ACD32;"
-|-
 |GO:0019841 || 78% || retinol binding
+|- style="background: #9ACD32;"
-|-
 |GO:0005215 || 78% || transporter activity
+|- style="background: #9ACD32;"
-|-
 |GO:0016918 || 78% || retinal binding
+|- style="background: lightgray;"
-|-
 |GO:0005319 || 69% || lipid transporter activity
+|- style="background: lightgray;"
-|-
 |GO:0008035 || 60% || high-density lipoprotein particle binding
-|-
 |-
 |}
@@ Line 683: / Line 830: @@
 ! GO description
 ! GO id
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Lipocalin / cytosolic fatty-acid binding protein family || Domain || x || Molecular function || binding || GO:0005488
+|- style="background: lightgray;"
-|-
 | Pfam-A || DspF/AvrF protein || Family || || - || - || -
+|- style="background: lightgray;"
-|-
 | Pfam-B || PB008544 || - || - || - || - || -
 |-
@@ Line 717: / Line 864: @@
 ! GO aspect
 ! GO id
+|- style="background: lightgray;"
-|-
 | Highest information content / highest probability || Immune response || Biological process || GO:0006955
 |-
 |}
+<br/>
 ===INSL5_HUMAN===
@@ Line 729: / Line 877: @@
 ! Confidence
 ! GO term
+|- style="background: #9ACD32;"
-|-
 |GO:0005179 || 80% || hormone activity
 |-
@@ Line 744: / Line 892: @@
 ! GO description
 ! GO id
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Insulin/IGF/Relaxin family || Domain || x || Cellular component || extracellular region || GO:0005576
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Insulin/IGF/Relaxin family || Domain || x || Molecular function || hormone activity || GO:0005179
 |-
@@ Line 776: / Line 924: @@
 ! GO aspect
 ! GO id
+|- style="background: #9ACD32;"
-|-
 | Highest information content || Hormone || Molecular function || GO:0005179
+|- style="background: lightgray;"
-|-
 | Highest probability || Signal transducer || Molecular function || GO:0004871
 |-
 |}
+<br/>
 ===LAMP1_HUMAN===
@@ Line 790: / Line 939: @@
 ! Confidence
 ! GO term
+|- style="background: lightgray;"
-|-
 |GO:0004812 || 60% || aminoacyl-tRNA ligase activity
+|- style="background: lightgray;"
-|-
 |GO:0005524 || 60% || ATP binding
 |-
@@ Line 807: / Line 956: @@
 ! GO description
 ! GO id
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Lysosome-associated membrane glycoprotein || Family || x || Cellular component || membrane || GO:0016020
+|- style="background: lightgray;"
-|-
 | Pfam-A || Protein of unknown function DUF1180 || Family || || - || - || -
 |-
@@ Line 839: / Line 988: @@
 ! GO aspect
 ! GO id
+|- style="background: lightgray;"
-|-
 | Highest information content || Immune response || Biological process || GO:0006955
+|- style="background: lightgray;"
-|-
 | Highest probability || Signal transducer || Molecular function || GO:0004871
 |-
 |}
+<br/>
 ===A4_HUMAN===
@@ Line 853: / Line 1,003: @@
 ! Confidence
 ! GO term
+|-  style="background: lightgray;"
-|-
 |GO:0004866 || 87% || endopeptidase inhibitor activity
+|- style="background: #9ACD32;"
-|-
 |GO:0004867 || 86% || serine-type endopeptidase inhibitor activity
+|- style="background: lightgray;"
-|-
 |GO:0030568 || 83% || plasmin inhibitor activity
+|- style="background: lightgray;"
-|-
 |GO:0030304 || 83% || trypsin inhibitor activity
+|- style="background: #9ACD32;"
-|-
 |GO:0030414 || 82% || peptidase inhibitor activity
+|- style="background: #9ACD32;"
-|-
 |GO:0005488 || 79% || binding
+|- style="background: #9ACD32;"
-|-
 |GO:0005515 || 74% || protein binding
+|- style="background: #9ACD32;"
-|-
 |GO:0046872 || 73% || metal ion binding
+|- style="background: #9ACD32;"
-|-
 |GO:0003677 || 71% || DNA binding
+|- style="background: #9ACD32;"
-|-
 |GO:0008201 || 70% || heparin binding
+|- style="background: lightgray;"
-|-
 |GO:0008270 || 69% || zinc ion binding
+|- style="background: lightgray;"
-|-
 |GO:0005507 || 69% || copper ion binding
+|- style="background: lightgray;"
-|-
 |GO:0005506 || 67% || iron ion binding
+|- style="background: lightgray;"
-|-
 |}
@@ Line 892: / Line 1,042: @@
 ! GO description
 ! GO id
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Amyloid A4 N-terminal heparin-binding || Domain || x || Cellular component || integral to membrane || GO:0016021
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Amyloid A4 N-terminal heparin-binding || Domain || x || Molecular function || binding || GO:0005488
+|- style="background: lightgray;"
-|-
 | Pfam-A || Copper-binding of amyloid precursor, CuBD || Domain || x || - || - || -
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Kunitz/Bovine pancreatic trypsin inhibitor domain || Domain || x || Molecular function || serine-type endopeptidase inhibitor activity || GO:0004867
+|- style="background: lightgray;"
-|-
 | Pfam-A || E2 domain of amyloid precursor protein || Domain || x || - || - || -
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Beta-amyloid peptide || Family || x || Cellular component || integral to membrane || GO:0016021
+|- style="background: #9ACD32;"
-|-
 | Pfam-A || Beta-amyloid peptide || Family || x || Molecular function || binding || GO:0005488
+|- style="background: lightgray;"
-|-
 | Pfam-A || beta-amyloid precursor protein C-terminus || Family || x || - || - || -
+|- style="background: lightgray;"
-|-
 | Pfam-A || Exonuclease VII, large subunit || Family ||  || - || - || -
+|- style="background: lightgray;"
-|-
 | Pfam-A || Transcriptional activator TraM || Family ||  || - || - || -
 |-
@@ Line 940: / Line 1,090: @@
 ! GO aspect
 ! GO id
+|- style="background: lightgray;"
-|-
 | Highest information content || Structural protein || Molecular function || GO:0005198
+|- style="background: lightgray;"
-|-
 | Highest probability || Signal transducer || Molecular function || GO:0004871
 |-
 |}
+==Evaluation of the Results==
+We used the Gene Ontology annotation of the corresponding protein at the [http://www.ebi.ac.uk/ EBI website] as the reference for the evaluation of the programs (see [[Sequence-based_predictions_GLA#List_of_annotated_GO_ids|list of annotated GO ids]]). We then determined the true positives, false positives, true negatives and false negatives and calculated the sensitivity, TP / (TP + FN), and the specificity, TN / (TN + FP). For this, we created Venn diagrams, which we provide on [[Fabry_Disease_sequence_prediction_GO_terms_venn|this page]]. A true negative for a program was defined as a false positive of one of the other two programs. "Overall" is the evaluation over all six proteins. The highest sensitivity and specificity for each protein are marked with a green background.
+{|class="wikitable" border="1" style="text-align:center; border-spacing:0;"
+! Protein
+! Program
+! True positives
+! False positives
+! True negatives
+! False negatives
+! Sensitivity
+! Specificity
+|-
+|rowspan="3" | GLA
+| GOPET || 4 || 1 || 1 || 18 || style="background: #9ACD32;" | 0.18 || 0.5
+|-
+| Pfam || 2 || 0 || 2 || 20 || 0.09 || style="background: #9ACD32;" | 1
+|-
+| ProtFun || 0 || 1 || 1 || 22 || 0 || 0.5
+|-
+|rowspan="3" | BACR_HALSA
+| GOPET || 1 || 2 || 1 || 11 || 0.08 || 0.33
+|-
+| Pfam || 3 || 0 || 3 || 9 || style="background: #9ACD32;" | 0.25 || style="background: #9ACD32;" | 1
+|-
+| ProtFun || 0 || 1 || 2 || 12 || 0 || 0.67
+|-
+|rowspan="3" | RET4_HUMAN
+| GOPET || 5 || 3 || 1 || 36 || style="background: #9ACD32;" | 0.12 || 0.25
+|-
+| Pfam || 1 || 0 || 4 || 40 || 0.02 || style="background: #9ACD32;" | 1
+|-
+| ProtFun || 0 || 1 || 3 || 41 || 0 || 0.75
+|-
+|rowspan="3" | INSL5_HUMAN
+| GOPET || 1 || 0 || 1 || 3 || 0.25 || style="background: #9ACD32;" | 1
+|-
+| Pfam || 2 || 0 || 1 || 2 || style="background: #9ACD32;" | 0.5 || style="background: #9ACD32;" | 1
+|-
+| ProtFun || 1 || 1 || 0 || 3 || 0.25 || 0
+|-
+|rowspan="3" | LAMP1_HUMAN
+| GOPET || 0 || 2 || 2 || 17 || 0 || 0.5
+|-
+| Pfam || 1 || 0 || 4 || 16 || style="background: #9ACD32;" | 0.06 || style="background: #9ACD32;" | 1
+|-
+| ProtFun || 0 || 2 || 2 || 17 || 0 || 0.5
+|-
+|rowspan="3" | A4_HUMAN
+| GOPET || 7 || 6 || 2 || 71 || style="background: #9ACD32;" | 0.09 || 0.25
+|-
+| Pfam || 3 || 0 || 8 || 75 || 0.04 || style="background: #9ACD32;" | 1
+|-
+| ProtFun || 0 || 2 || 6 || 78 || 0 || 0.75
+|-
+|rowspan="3" | '''Overall'''
+| '''GOPET''' || '''18''' || '''14''' || '''8''' || '''156''' || style="background: #9ACD32;" | '''0.10''' || '''0.36'''
+|-
+| '''Pfam''' || '''12''' || '''0''' || '''22''' || '''162''' || '''0.07''' || style="background: #9ACD32;" | '''1'''
+|-
+| '''ProtFun''' || '''1''' || '''8''' || '''14''' || '''173''' || '''0.01''' || '''0.64'''
+|-
+|}
+Two things are remarkable. First, Pfam does not produce a single false positive prediction, and hence its specificity is 1. This leads us to the conclusion that the search feature is very specific and that the annotation of the domains and families in Pfam is very accurate with respect to Gene Ontology. Second, ProtFun achieved only one true positive, and thus its sensitivity is close to 0. This can be explained by the fact that ProtFun only predicts very general Gene Ontology categories (e.g. immune response, receptor, etc.) and therefore misses a lot of more specific annotations. Nevertheless, five out of six predictions of the main categories were also wrong, and therefore we would not recommend ProtFun for the prediction of GO terms.
+With respect to sensitivity, GOPET achieves the highest value with 0.10, which is still very low. Since the overall sensitivity of Pfam is only slightly lower (0.07) and its specificity is 1, we would prefer Pfam for the prediction of GO terms. Overall, the sensitivity is very low, so you have to keep in mind that you will miss a lot of GO terms when using any of these programs.
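The sensitivity and specificity values in the evaluation table can be recomputed from the four counts. The following is a minimal sketch, not the script used for this page: the function names and the dictionary layout are our own assumptions, and only the GOPET counts are copied from the table.

```python
def sensitivity(tp, fn):
    """Fraction of annotated GO terms that were predicted: TP / (TP + FN)."""
    total = tp + fn
    return tp / total if total else 0.0


def specificity(tn, fp):
    """Fraction of wrong terms that were not predicted: TN / (TN + FP)."""
    total = tn + fp
    return tn / total if total else 0.0


def pooled(counts):
    """Sum (TP, FP, TN, FN) tuples over all proteins for the 'Overall' row."""
    return tuple(sum(column) for column in zip(*counts.values()))


# (TP, FP, TN, FN) per protein for GOPET, copied from the evaluation table.
gopet = {
    "GLA": (4, 1, 1, 18),
    "BACR_HALSA": (1, 2, 1, 11),
    "RET4_HUMAN": (5, 3, 1, 36),
    "INSL5_HUMAN": (1, 0, 1, 3),
    "LAMP1_HUMAN": (0, 2, 2, 17),
    "A4_HUMAN": (7, 6, 2, 71),
}

if __name__ == "__main__":
    tp, fp, tn, fn = pooled(gopet)
    print(f"GOPET overall: TP={tp} FP={fp} TN={tn} FN={fn}")
    print(f"sensitivity={sensitivity(tp, fn):.2f}")
    print(f"specificity={specificity(tn, fp):.2f}")
```

Pooling the counts before dividing, rather than averaging the per-protein ratios, matches how the "Overall" row is built: proteins with many annotated GO terms, such as A4_HUMAN, weigh more heavily.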
+===List of annotated GO ids===
+* [http://www.ebi.ac.uk/QuickGO/GProtein?ac=P06280 GLA]
+* [http://www.ebi.ac.uk/QuickGO/GProtein?ac=P02945 BACR_HALSA]
+* [http://www.ebi.ac.uk/QuickGO/GProtein?ac=P02753 RET4_HUMAN]
+* [http://www.ebi.ac.uk/QuickGO/GProtein?ac=Q9Y5Q6 INSL5_HUMAN]
+* [http://www.ebi.ac.uk/QuickGO/GProtein?ac=P11279 LAMP1_HUMAN]
+* [http://www.ebi.ac.uk/QuickGO/GProtein?ac=P05067 A4_HUMAN]
 = References =

Difference between revisions of "Sequence-based predictions GLA"
