Canavan Task 3 - Sequence-based predictions

Protocol

Commands, source code and other methodological details are kept in the protocol.

Secondary Structure Prediction

Information on Proteins

The following table summarizes information on the three proteins that we used, alongside our own protein Aspartoacylase, to predict properties from sequence alone.

**<xr nolink id="info_on_prot"/>**
Information on the proteins used for secondary structure predictions.
Identifier	P10775	Q08209	Q9X0E6
Protein	Ribonuclease inhibitor	Serine/threonine phosphatase (Calcineurin)	Divalent-cation tolerance protein
Organism	Sus scrofa (pig)	Homo sapiens	Thermotoga maritima
Sequence length	456	521	101
Subcellular location	Cytoplasm	Nucleus	Cytoplasm
PDB Identifier	2BNH	1AUI	1O5J
Structure

</figtable>

Consistent nomenclature and sequence issues

Three-state accuracy

We mapped the (slightly differing) secondary structure elements of the three prediction methods onto the three common possible states C (Coil), H (Helix), and E (Extended; Beta-Sheet) to make comparison of methods easier.
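
A minimal sketch of such a mapping, shown here for DSSP output. It assumes the commonly used reduction of DSSP's eight classes (H, G, I to Helix; E, B to Extended; everything else to Coil). The mapping actually used for this task is kept in the protocol, so both the table and the helper name `to_three_state` are assumptions for illustration.

```python
# Assumed reduction of DSSP's eight classes to three states.
# The mapping actually used for this task is documented in the protocol.
DSSP_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha-, 3-10- and pi-helix -> Helix
    "E": "E", "B": "E",            # extended strand, isolated bridge -> Extended
}


def to_three_state(dssp_string):
    """Map a DSSP string to H/E/C; any other symbol (T, S, blank) becomes C."""
    return "".join(DSSP_TO_THREE.get(symbol, "C") for symbol in dssp_string)


print(to_three_state("HHHGGTT EEB S"))
```

Applying the same kind of lookup to each predictor's output alphabet puts all methods on a common footing before accuracies are computed.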

UniProt vs PDB Sequences

DSSP assigns secondary structure based on the given 3D structure of a protein. The PDB entries chosen for the corresponding UniProt sequences are listed in the table above. However, PDB sequences often differ significantly from their corresponding UniProt sequences because of the experimental conditions under which the structure was solved (missing atoms and residues being the main problem). We therefore performed a pairwise alignment of each PDB sequence with its UniProt sequence so that DSSP assignments and predictions can be compared position by position.
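
The idea can be sketched with a plain Needleman-Wunsch global alignment. This is not the aligner used for the task (that is recorded in the protocol). The scoring parameters and the toy sequences below are arbitrary assumptions chosen only to show how missing residues in a PDB sequence turn into gaps.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns both gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy example: the "PDB" sequence lacks residues at both termini.
uniprot_seq, pdb_seq = "MKTAYIAKQR", "TAYIAKQ"
for line in global_align(uniprot_seq, pdb_seq):
    print(line)
```

Once both sequences are gapped to the same length, the DSSP states can be placed under the PDB row and the predictions under the UniProt row, with gap positions marked as a separate state.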

Predictions - Analysis and Comparison

P45381 - 2O53 - Aspartoacylase

Aspartoacylase has a fairly complex structure, consisting of helices, beta-sheets and coil regions with no apparent order or regularity. We therefore expected secondary structure prediction to be difficult. <xr id="sec_str_P45381"/> summarizes the three-state accuracies, while <xr id="reprof_aspa"/> and <xr id="psipred_aspa"/> visualise the prediction results of the two methods. The UniProt and PDB sequences differ considerably, so the alignment contains many gaps and Q3 alone is a misleading measure of performance here. The figures show that both methods capture long gap-free regions fairly well.

**<xr nolink id="sec_str_P45381"/>** State Accuracy (P45381 - 2O53).
P45381 - 2O53	Reprof	PsiPred
Q3	35.03	38.32
QH	42.77	37.95
QE	22.84	32.10
QC	37.91	42.60

</figtable>
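
For reference, a small sketch of how Q3 and the per-state accuracies QH, QE and QC can be computed from two aligned three-state strings. The function name and the handling of gaps are assumptions: here, positions with a gap in either string are skipped, which may differ from the script recorded in the protocol.

```python
def state_accuracies(observed, predicted, gap="-"):
    """Return Q3, QH, QE and QC in percent for two aligned state strings.

    Assumption: positions where either string holds a gap are ignored.
    """
    if len(observed) != len(predicted):
        raise ValueError("strings must be aligned to the same length")
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if o != gap and p != gap]

    def percent(hits, total):
        return 100.0 * hits / total if total else float("nan")

    result = {"Q3": percent(sum(o == p for o, p in pairs), len(pairs))}
    for state in "HEC":
        in_state = [(o, p) for o, p in pairs if o == state]
        result["Q" + state] = percent(sum(o == p for o, p in in_state),
                                      len(in_state))
    return result


# Toy strings, not taken from any of the proteins discussed here.
print(state_accuracies("HHHHEEEECC--", "HHHCEECCCCCC"))
```

Whether gap positions are skipped or counted as errors changes Q3 noticeably for proteins whose PDB and UniProt sequences differ a lot, which is why Q3 should be read together with the figures.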

<xr nolink id="reprof_aspa"/>
Secondary structure state predictions for Aspartoacylase, P45381, predicted by ReProf. The sequence runs along the x-axis, and four states can be found on the y-axis as assigned by DSSP or predicted by reprof. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure> <figure id="psipred_aspa">

<xr nolink id="psipred_aspa"/>
Secondary structure state predictions for Aspartoacylase, P45381, predicted by PsiPred. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure>

P10775 - 2BNH - Ribonuclease Inhibitor

As can be seen from the cartoon picture in <xr id="info_on_prot"/>, the ribonuclease inhibitor P10775 basically consists of helices alternating very regularly with small beta-sheet elements, resembling a horseshoe motif. Also, there are no gaps in the alignment of the PDB and UniProt sequences.

Both prediction methods (reprof and psipred) capture this basic structural motif well (which is also reflected in the scores), even though reprof has a tendency to elongate coils or shift helices. See <xr id="reprof_P10775"/> and <xr id="psipred_P10775"/>.

**<xr nolink id="sec_str_P10775"/>** State Accuracy (P10775 - 2BNH).
P10775 - 2BNH	Reprof	PsiPred
Q3	61.05	91.90
QH	71.94	90.31
QE	21.05	85.96
QC	61.58	95.07

</figtable>

<xr nolink id="reprof_P10775"/>
Secondary structure state predictions for Ribonuclease Inhibitor, P10775, predicted by ReProf. The sequence runs along the x-axis, and four states can be found on the y-axis as assigned by DSSP or predicted by reprof. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure> <figure id="psipred_P10775">

<xr nolink id="psipred_P10775"/>
Secondary structure state predictions for Ribonuclease Inhibitor, P10775, predicted by PsiPred. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure>

Q08209 - 1AUI - Calcineurin

The structure of Calcineurin is again more complex than that of P10775 and, presumably, much harder to predict. This is reflected in both the scores and the figures (<xr id="reprof_Q08209"/> and <xr id="psipred_Q08209"/>). The sequence alignment contains some gaps. Again, PsiPred outperforms ReProf.

**<xr nolink id="sec_str_Q08209"/>** State Accuracy (Q08209 - 1AUI).
Q08209 - 1AUI	Reprof	PsiPred
Q3	45.10	59.54
QH	25.00	52.97
QE	62.32	46.38
QC	63.18	74.06

</figtable>

<xr nolink id="reprof_Q08209"/>
Secondary structure state predictions for Calcineurin, Q08209, predicted by ReProf. The sequence runs along the x-axis, and four states can be found on the y-axis as assigned by DSSP or predicted by reprof. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure> <figure id="psipred_Q08209">

<xr nolink id="psipred_Q08209"/>
Secondary structure state predictions for Calcineurin, Q08209, predicted by PsiPred. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure>

Q9X0E6 - 1O5J - Divalent-cation tolerance protein

The structure of the 'divalent-cation tolerance protein CutA' is again more regular. Consequently, both methods, especially PsiPred, perform well and manage to capture many of the regularly occurring states. See <xr id="reprof_Q9X0E6"/> and <xr id="psipred_Q9X0E6" />.

**<xr nolink id="sec_str_Q9X0E6"/>** State Accuracy (Q9X0E6 - 1O5J).
Q9X0E6 - 1O5J	Reprof	PsiPred
Q3	55.24	84.76
QH	94.74	89.47
QE	26.83	80.49
QC	40.00	84.00

</figtable>

<xr nolink id="reprof_Q9X0E6"/>
Secondary structure state predictions for Q9X0E6, predicted by ReProf. The sequence runs along the x-axis, and four states can be found on the y-axis as assigned by DSSP or predicted by reprof. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure> <figure id="psipred_Q9X0E6">

<xr nolink id="psipred_Q9X0E6"/>
Secondary structure state predictions for Q9X0E6, predicted by PsiPred. 1=Helix, 2=Sheet, 3=Coil, 4=Gap

</figure>

Prediction of disordered regions

IUPred offers three options for predicting disorder in a protein, all of which we applied to every protein:

globular domains: finds globular domains in a protein, i.e. regions that do not contain disordered residues
short disorder: single residues that might lead to disorder (like missing residues in the X-ray structure)
- As it is written on the IUPred website: "As chain termini of globular proteins are often disordered in X-ray structures, this is taken into account by an end-adjustment parameter which favors disorder prediction at the ends"
long disorder: at least 30 consecutive residues predicted to be disordered

IUPred generates a score between 0 and 1. Scores above 0.5 indicate disorder.
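
The thresholding can be illustrated with a short sketch that turns per-residue scores into disordered segments. This is a post-processing step on the scores, not IUPred's internal algorithm. The function name, the toy scores and the use of `min_length=30` to mimic the long-disorder criterion are assumptions.

```python
def disordered_segments(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) runs with score > threshold."""
    segments, start = [], None
    for pos, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = pos          # a new disordered run begins
        elif start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:            # run extends to the C-terminus
        segments.append((start, len(scores)))
    return [(s, e) for s, e in segments if e - s + 1 >= min_length]


toy_scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.7, 0.55, 0.1, 0.95]
print(disordered_segments(toy_scores))                 # every run
print(disordered_segments(toy_scores, min_length=3))   # only longer runs
```

With real IUPred output, `min_length=30` would keep only stretches that satisfy the long-disorder definition given above.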

P45381 - 2O53 - Aspartoacylase

Running IUPred with the parameter glob resulted in the prediction of one globular domain:

Number of globular domains:     1 
          globular domain       1.    1 - 313

IUPred predicts disordered residues at the N- and C-terminus of the protein (~pos 1-10), which can be seen in <xr id="iupred_short_p45381"/>. This tendency to predict the termini as disordered is expected from the end-adjustment described for the method.

IUPred does not predict any long range disorder (see <xr id="iupred_long_p45381"/> ).

<xr nolink id="iupred_short_p45381"/>
IUPred output for short disorder in Aspartoacylase

</figure>

<xr nolink id="iupred_long_p45381"/>
IUPred output for long range disorder in Aspartoacylase

</figure>

There were no direct hits for P45381 in DisProt. A PSI-Blast search with the P45381 sequence identified three hits, but with high E-values. Furthermore, only 40 to 80 residues were aligned, so these hits do not provide any reliable information.

Sequences producing significant alignments        Score (bits)     E Value
        
DP00080                                           29               0.11         
DP00517                                           23               5.3         
DP00102                                           22               8.3

P10775

Running IUPred with the parameter glob resulted in the prediction of one globular domain:

Number of globular domains:     1 
          globular domain       1.    1 - 456

IUPred predicts disordered residues at the N-terminus (pos 1 - 12) and the C-terminus (452-456) of the protein, which can be seen in <xr id="iupred_short_p10775"/>.

IUPred does not predict any long range disorder for P10775 (see <xr id="iupred_long_p10775"/> ).

<xr nolink id="iupred_short_p10775"/>
IUPred output for short disorder in P10775

</figure>

<xr nolink id="iupred_long_p10775"/>
IUPred output for long range disorder in P10775

</figure>

There was no direct hit for P10775 in DisProt. Searching via PSI-Blast yielded one significant hit:

Sequences producing significant alignments    Score(bits) E-Value
DP00554                                       123         5e-30

For this protein, DisProt lists one disordered region at the N-terminus (pos 31-50), which is shown in <xr id="disprot_p10775"/>. IUPred in contrast predicts the first 12 residues to form a disordered region.

<xr nolink id="disprot_p10775"/>
Visualization of the DisProt annotation for P10775: only one disordered region is annotated, at pos 31-50

</figure>

Q08209

Running IUPred with the parameter glob resulted in the prediction of one globular domain. Since the protein has a length of 521 residues, this result implies that the C-terminal part of the protein is not part of the globular domain and contains disordered regions.

Number of globular domains:     1 
          globular domain       1.    5 - 446
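
The residues left outside the predicted domain follow directly from the sequence length and the domain boundaries. A small sketch, with a hypothetical helper name, using the values reported above (length 521, domain 5 - 446):

```python
def outside_domains(length, domains):
    """Return 1-based inclusive ranges not covered by any domain."""
    uncovered, next_free = [], 1
    for start, end in sorted(domains):
        if start > next_free:
            uncovered.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= length:
        uncovered.append((next_free, length))
    return uncovered


# Q08209: 521 residues, one globular domain from 5 to 446.
print(outside_domains(521, [(5, 446)]))
```

For Q08209 this leaves a short N-terminal stretch and a C-terminal tail of 75 residues outside the globular domain.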

For the parameter "short disorder", IUPred predicts disordered residues at the N-terminus (pos 1 - 20) and the C-terminus (460-521) of the protein, which can be seen in <xr id="iupred_short_q08209"/>.

Although IUPred predicts many residues with short range disorder at the C-terminus, it does not predict any long range disorder for Q08209 (see <xr id="iupred_long_q08209"/> ).

<xr nolink id="iupred_short_q08209"/>
IUPred output for short disorder in Q08209

</figure>

<xr nolink id="iupred_long_q08209"/>
IUPred output for long range disorder in Q08209

</figure>

DisProt annotates 31% of the protein as disordered. <xr id="disprot_q08209"/> shows where the annotated disordered regions are located. DisProt characterizes the region starting at position 374 as disordered, whereas IUPred predicts disorder only from residue 460 onwards.

<xr nolink id="disprot_q08209"/>
Visualization of the DisProt annotation for Q08209: 31% of the protein is annotated as disordered

</figure>

Q9X0E6

Running IUPred with the parameter glob resulted in the prediction of one globular domain.

Number of globular domains:     1 
          globular domain       1.    1 - 101

With the parameter "short disorder", IUPred predicts only a few residues at the N- and C-terminus to be disordered. For long range disorder, IUPred predicts no disordered regions either.

<xr nolink id="iupred_short_q9x0e6"/>
IUPred output for short disordered segments in Q9X0E6.

</figure>

<xr nolink id="iupred_long_q9x0e6"/>
IUPred output for long disordered segments in Q9X0E6.

</figure>

No direct hits were found in DisProt. A PSI-Blast search did not work, and a Smith-Waterman search yielded only minimal alignments of about 20 AA. For the hits found, no disordered regions are annotated in DisProt.

Transmembrane Helix Prediction

We analyzed the prediction of transmembrane helices (TMH) for the proteins listed in <xr id="table_TMH_info"/> and for our protein Aspartoacylase. In addition to Polyphobius, we also examined the results of two other TMH predictors, namely TMHMM and PHDhtm.

Information on Proteins

<figtable id="table_TMH_info"> <xr nolink id="table_TMH_info"/> Information on the proteins used for the evaluation of different TMH prediction methods.

Identifier	P35462	Q9YDF8	P47863
Protein	D(3) dopamine receptor	Voltage-gated potassium channel	Aquaporin-4
Organism	Homo sapiens (Human)	Aeropyrum pernix	Rattus norvegicus (Rat)
Sequence length	400	295	323
Subcellular location	Cell membrane; Multi-pass membrane protein	Cell membrane; Multi-pass membrane protein	Membrane; Multi-pass membrane protein
PDB Identifier	3PBL	1ORQ	2D57
Structure

</figtable>

Aspartoacylase

As expected, TMH prediction for our protein yielded no transmembrane helices; all residues were predicted to be cytoplasmic.

P35462

For P35462 there is only one structure listed in UniProt: 3PBL. For this structure, OPM and PDBTM list 7 TMH, which is the same number of TMH found in the UniProt annotation for P35462. There is only a slight difference in the localization of the TMH: the boundaries given by these three references usually differ by about 1-4 amino acid residues.

All prediction methods yield the same number of TMH. Furthermore, Polyphobius, TMHMM and PHDhtm predict the location of the TMH very accurately, with only small deviations from the reference annotations of UniProt, PDBTM and OPM.

<xr id="table_3pbl"/> lists the exact localization of the TMH for the reference sources UniProt, PDBTM and OPM as well as for the prediction methods Polyphobius, TMHMM and PHDhtm. <xr id="CD_tm_3pbl"/> depicts the length distribution of the predicted and annotated TMH. One can see that PDBTM in general finds shorter TMH, whereas Polyphobius and OPM find longer helices. Furthermore, the location of the TMH within the sequence is visualized in <xr id="vis_3pbl"/>.

**<xr nolink id="table_3pbl"/>** AA Position of the predicted/annotated TMH for different methods/sources
UniProt	PDBTM	OPM	Polyphobius	TMHMM	PHDhtm
33-55	35-52	34-52	30-55	32-54	31-55
66-88	68-84	67-91	66-88	67-89	65-90
105-126	109-123	101-126	105-126	104-126	101-130
150-170	152-166	150-170	150-170	150-172	151-170
188-212	191-206	187-209	188-212	192-214	188-213
330-351	334-347	330-351	329-352	331-353	331-353
367-388	368-382	363-386	367-386	368-390	362-387

</figtable>
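
The helix lengths behind the length distribution can be recomputed directly from the positions in the table above. The sketch below copies the PDBTM, OPM and Polyphobius columns; the helper names are hypothetical.

```python
# Helix boundaries copied from the table above (P35462 / 3PBL).
TMH_3PBL = {
    "PDBTM": [(35, 52), (68, 84), (109, 123), (152, 166),
              (191, 206), (334, 347), (368, 382)],
    "OPM": [(34, 52), (67, 91), (101, 126), (150, 170),
            (187, 209), (330, 351), (363, 386)],
    "Polyphobius": [(30, 55), (66, 88), (105, 126), (150, 170),
                    (188, 212), (329, 352), (367, 386)],
}


def helix_lengths(segments):
    """Length of each helix, counting both boundary residues."""
    return [end - start + 1 for start, end in segments]


def mean_length(segments):
    lengths = helix_lengths(segments)
    return sum(lengths) / len(lengths)


for source, segments in TMH_3PBL.items():
    print(f"{source:12s} {helix_lengths(segments)} "
          f"mean={mean_length(segments):.1f}")
```

The PDBTM helices average roughly 16 residues, while the OPM and Polyphobius helices average about 23, which is the pattern visible in the length distribution.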

<xr nolink id="CD_tm_3pbl"/>
Length distribution for the TMH annotated in PDBTM and OPM and the predicted TMH with Polyphobius. PDBTM in general lists shorter TMH.

</figure>

<xr nolink id="vis_3pbl"/>
Visualization of the location of the annotated and predicted TMH for P35462(3PBL)

</figure>

P47863

In UniProt there are several structures listed for P47863:

2D57 X-ray 3.20 A
2ZZ9 X-ray 2.80 A
3IYZ electron microscopy 10.00 A

Since 2ZZ9 is a mutant, we decided to use 2D57 as a reference structure with OPM and PDBTM.

Interestingly, OPM lists 8 TMH for P47863, whereas PDBTM agrees with the UniProt annotation and lists 6 TMH. Yet, the two additional helices in OPM are rather short (<10 AA) and correspond to two loop segments in the PDBTM annotation.

Just as there is disagreement between the reference sources, the different prediction methods yield deviating results. Polyphobius and TMHMM predict 6 helices, which correspond to the 6 helices listed in UniProt, PDBTM and OPM. PHDhtm finds only 5 helices, of which helix 2 is 68 amino acid residues long (70-137) and matches helices 2 and 3 found by the other methods. This long helix also incorporates the loop region annotated in PDBTM and the additional helix listed in OPM. PHDhtm thus merged these 3 structural elements into one helical region.

<xr id="table_2D57"/> lists the exact localization of the TMH for the reference sources UniProt, PDBTM and OPM as well as for the prediction methods Polyphobius, TMHMM and PHDhtm. <xr id="CD_tm_2D57"/> depicts the length distribution of the TMH predicted with Polyphobius and of the annotated TMH. One can see that PDBTM in general finds shorter helices, whereas OPM and Polyphobius find longer ones. Furthermore, the location of the TMH within the sequence is visualized in <xr id="vis_2D57"/>.

**<xr nolink id="table_2D57"/>** AA Position of the predicted/annotated TMH for different methods/sources
UniProt	PDBTM	OPM	Polyphobius	TMHMM	PHDhtm
37-57	39-55	34-56	34-58	33-55	34-56
65-85	72-89	70-88	70-91	70-92	70-137
	95-106(loop)	98-107			70-137(cont)
116-136	116-133	112-136	115-136	112-134	70-137(cont)
156-176	158-177	156-178	156-177	154-176	156-176
185-205	188-205	189-203	188-208	189-211	190-210
	209-222(loop)	214-223
232-252	231-248	231-252	231-252	231-253	224-250

</figtable>
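
A merged helix such as the one predicted by PHDhtm can be detected automatically by checking whether a predicted helix overlaps more than one reference helix. The sketch below uses the UniProt and PHDhtm columns of the table above. The helper names and the minimum overlap of 5 residues are arbitrary assumptions.

```python
def overlap(a, b):
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def merged_helices(predicted, reference, min_overlap=5):
    """Predicted helices that overlap more than one reference helix."""
    merged = []
    for pred in predicted:
        hits = [ref for ref in reference if overlap(pred, ref) >= min_overlap]
        if len(hits) > 1:
            merged.append((pred, hits))
    return merged


# Columns copied from the table above (P47863 / 2D57).
UNIPROT_2D57 = [(37, 57), (65, 85), (116, 136),
                (156, 176), (185, 205), (232, 252)]
PHDHTM_2D57 = [(34, 56), (70, 137), (156, 176), (190, 210), (224, 250)]

for pred, hits in merged_helices(PHDHTM_2D57, UNIPROT_2D57):
    print(f"predicted {pred} covers reference helices {hits}")
```

Only the PHDhtm helix 70-137 is reported, covering the UniProt helices 65-85 and 116-136.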

<xr nolink id="CD_tm_2D57"/>
Length distribution for the TMH annotated in PDBTM and OPM and the predicted TMH with Polyphobius.

</figure>

<xr nolink id="vis_2D57"/>
Visualization of the location of the annotated and predicted TMH for P47863(2D57)

</figure>

Q9YDF8

For Q9YDF8, UniProt annotates 6 TM regions and 2 intramembrane regions and lists four different structures:

1ORQ X-ray 3.20 A 31-253
1ORS X-ray 1.90 A 33-160
2A0L X-ray 3.90 A 20-259
2KYH NMR - 19-160

Since only residues 33-160 have been crystallized in 1ORS, we decided to use 1ORQ for comparison with the output of the prediction methods.

For Q9YDF8, Polyphobius did not find any homologues in the BLAST search, so no homology information could be used for the TMH prediction. The Polyphobius prediction generally coincides with the UniProt annotation: Polyphobius finds 7 TMH, and their overlap with the TMH listed in UniProt is large. OPM and PDBTM, however, list very different results; there is consensus only on TMH 5 and 7. Comparing the OPM annotations for the two structures 1ORQ and 1ORS reveals large differences:

1ORS: C - Tilt: 19° - Segments: 1(25-46), 2(55-78), 3(86-97), 4(100-107), 5(117-148)
1ORQ: C - Tilt: 31° - Segments: 1(153-172), 2(183-195), 3(207-225)

Yet, if one takes into account the sequence shift of 13 AA between the 1ORQ PDB sequence and the Q9YDF8 UniProt sequence (see <xr id=seq_shift />), the two annotations together cover the TMH predicted by Polyphobius and those annotated in UniProt. The same holds for the PDBTM annotations.
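
The effect of the shift can be checked with a few lines. The sketch adds 13 to the OPM segment boundaries listed above. The direction of the shift (PDB numbering plus 13 gives UniProt numbering) and its application to 1ORS as well as 1ORQ are inferred from the fact that the shifted values reproduce the OPM column of the table for this protein.

```python
PDB_TO_UNIPROT_SHIFT = 13  # inferred direction: PDB position + 13 = UniProt position

# OPM segments as listed above, in PDB numbering.
OPM_1ORS = [(25, 46), (55, 78), (86, 97), (100, 107), (117, 148)]
OPM_1ORQ = [(153, 172), (183, 195), (207, 225)]


def shift_segments(segments, offset):
    """Translate inclusive (start, end) ranges by a constant offset."""
    return [(start + offset, end + offset) for start, end in segments]


combined = (shift_segments(OPM_1ORS, PDB_TO_UNIPROT_SHIFT)
            + shift_segments(OPM_1ORQ, PDB_TO_UNIPROT_SHIFT))
for start, end in combined:
    print(f"{start}-{end}")
```

After shifting, the five 1ORS segments and the three 1ORQ segments line up end to end along the UniProt sequence without overlapping.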

In general, the three TMH prediction methods analyzed yield comparable results that agree with the annotated TMH locations (as far as the annotations agree with each other). Larger deviations occur for amino acid positions 100-160. Here, Polyphobius predicts two TMH, TMHMM finds only the first helix, and PHDhtm detects one long helix spanning 43 residues (107-149). Since the annotations for this region differ between UniProt, OPM and PDBTM, it is hard to decide which method gives the most accurate result.

<xr id="table_1orq"/> lists the exact localization of the TMH for the reference sources UniProt, PDBTM and OPM as well as for the prediction methods Polyphobius, TMHMM and PHDhtm. <xr id="CD_tm_Q9YDF8"/> depicts the length distribution of the TMH predicted with Polyphobius and of the annotated TMH. Only for helix five do the three sources agree on the helix length. Furthermore, the location of the TMH within the sequence is visualized in <xr id="vis_1orq"/>.

**<xr nolink id="table_1orq"/>** AA Position of the predicted/annotated TMH for different methods/sources
UniProt	PDBTM	OPM(=1ors&1orq)	Polyphobius	TMHMM	PHDhtm
39 – 63	34-65(1ORS:40-63)	38-59	39-61	42-60	42-64
68 – 92	70-93(1ORS:68-88)	68-91	68-88	68-87	69-88
		99-110			42-60
109 – 125	(1ORS:101-120)	113-120	108-129	107-129	107-149
129 – 145	(1ORS:131-155)	130-161	137-157		107-149(cont)
160 – 184	164-184	166-185	163-184	162-184	162-181
196 – 208(intramembrane)		196-208	196-213	199-218	197-212
222 – 253	222-249	220-238	224-244	225-244	220-247

</figtable>

<xr nolink id="CD_tm_Q9YDF8"/>
Length distribution for the TMH annotated in PDBTM and OPM and the predicted TMH with Polyphobius for Q9YDF8(1ORQ).

</figure>

<xr nolink id="vis_1orq"/>
Visualization of the location of the annotated and predicted TMH for Q9YDF8(1ORQ). For OPM and PDBTM the annotation for 1ORS was also taken into account.

</figure>

Signal peptides

To rule out confusion between TMH and signal peptides, we checked all three proteins with SignalP 4.0:

SignalP4.0 prediction for P35462: no signal peptide was found

SignalP4.0 prediction for Q9YDF8: no signal peptide was found

SignalP4.0 prediction for P47863: no signal peptide was found

Signal Peptide Prediction

Information on Proteins

<figtable id="table_signalp_info"> <xr nolink id="table_signalp_info"/> Information on the proteins used for the prediction of signal peptides.

Identifier	P02768	P11279	P47863
Protein	Serum albumin	Lysosome-associated membrane glycoprotein 1	Aquaporin-4
Organism	Homo sapiens (Human)	Homo sapiens (Human)	Rattus norvegicus (Rat)
Sequence length	609	417	323
Subcellular location	Secreted	Lysosome membrane, Single-pass type I membrane protein	Membrane; Multi-pass membrane protein
PDB Identifier	1E7I	-	2D57
Structure

</figtable>

P02768

<xr nolink id="signalp_p02768"/>
SignalP4.0 output for P02768. A cleavage site is predicted between pos 18 and 19.

</figure>

SignalP3.0 NN predicts a signal peptide for P02768 with a cleavage site between residues 18 and 19: AYS-RG, with a value of 0.880 at a cutoff of 0.43.

SignalP3.0 HMM gives a maximum cleavage site probability of 0.759 between pos. 18 and 19.

SignalP4.0 predicts the cleavage site between pos. 18 and 19: AYS-RG with D=0.848.

<xr id="signalp_p02768"/> shows the graphical output of the SignalP prediction for P02768.

Polyphobius also predicts the signal peptide:

N-REGION: 1 - 2
H-REGION: 3 - 13
C-REGION: 14 - 18
NON CYTOPLASMIC: 19 - 609

We also used TargetP to predict the localization of this secreted protein. TargetP can identify the presence of an N-terminal presequence: chloroplast transit peptides (cTP), mitochondrial targeting peptides (mTP) or secretory pathway signal peptides (SP).

TargetP predicts P02768 to be a secretory protein (Loc = S) with medium reliability (RC=3) and predicts the signal peptide to be 18 residues long. This is in accordance with SignalP (TargetP also uses SignalP for cleavage site predictions).

Name                  Len            mTP     SP  other  Loc  RC  TPlen
----------------------------------------------------------------------
sp_P02768_ALBU_HUMAN  609          0.380  0.873  0.013   S    3     18
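
For batch runs it can be handy to read such a summary row programmatically. The sketch below parses the eight columns shown above. The function name is hypothetical, and the treatment of a non-numeric TPlen entry as "no presequence length" is an assumption about the output format.

```python
def parse_targetp_line(line):
    """Parse one TargetP summary row (Name Len mTP SP other Loc RC TPlen)."""
    name, length, mtp, sp, other, loc, rc, tplen = line.split()
    return {
        "name": name,
        "length": int(length),
        "mTP": float(mtp),
        "SP": float(sp),
        "other": float(other),
        "loc": loc,
        "rc": int(rc),
        # Assumption: a non-numeric entry means no presequence length is given.
        "tplen": int(tplen) if tplen.isdigit() else None,
    }


row = parse_targetp_line(
    "sp_P02768_ALBU_HUMAN  609          0.380  0.873  0.013   S    3     18"
)
print(row["loc"], row["rc"], row["tplen"])
```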

P11279

<xr nolink id="signalp_p11279"/>
SignalP4.0 output for P11279. A cleavage site is predicted between pos 28 and 29.

</figure>

SignalP3.0 NN predicts the cleavage site between pos. 28 and 29: ASA-AM, with a value of 0.931 at a cutoff of 0.43.

SignalP3.0 HMM gives a maximum cleavage site probability of 0.847 between pos. 28 and 29.

SignalP4.0 predicts the cleavage site between pos. 28 and 29: ASA-AM with D=0.952.

<xr id="signalp_p11279"/> shows the graphical output of the SignalP prediction for P11279.

Polyphobius also predicts the signal peptide:

N-REGION: 1 - 10
H-REGION: 11 - 22
C-REGION: 23 - 28
NON CYTOPLASMIC: 29 - 381
TRANSMEM: 382 - 405
CYTOPLASMIC: 406 - 417

P47863

<xr nolink id="signalp_p47863"/>
SignalP4.0 output for P47863. No cleavage site has been predicted. All three values (C, S and Y) are below the D-cutoff at 0.5.

</figure>

SignalP3.0 NN reaches no consensus on whether P47863 has a signal peptide or not. The most likely cleavage site is between pos. 54 and 55: SVG-ST.

SignalP3.0 HMM gives a signal peptide probability of 0.723 and a maximum cleavage site probability of 0.533 between pos. 56 and 57.

SignalP4.0 predicts no signal peptide, with a value of D=0.154 at a D-cutoff of 0.500.

In <xr id="signalp_p47863"/> you can see the graphical output of the SignalP prediction for P47863.

The Polyphobius prediction for this protein was already discussed in the section "Transmembrane Helix Prediction".

Aspartoacylase

<xr nolink id="signalp_p45381"/>
SignalP4.0 output for Aspartoacylase. No cleavage site has been predicted. All three values (C, S and Y) are below the D-cutoff at 0.5.

</figure>

In SignalP3.0 NN, the majority of the scores predict no signal peptide for P45381. Only the C-score is above the cutoff and predicts a cleavage site at pos. 23.

SignalP3.0 HMM predicts no signal peptide (signal peptide probability of 0.000).

SignalP4.0 predicts no signal peptide, with a value of D=0.124 at a D-cutoff of 0.450.

The Polyphobius prediction for this protein was already discussed in the section "Transmembrane Helix Prediction".

<xr id="signalp_p45381"/> shows the graphical output of the SignalP prediction for P45381.

GO terms and Pfam

GO terms

GOPET

GOPET predicts four GO terms for our protein, one of them with very high confidence (96% for hydrolase activity) and three with fairly high confidence (81-82%). We know all four predictions to be accurate. If we were dealing with a truly unknown protein, we would therefore very likely believe in a hydrolase activity. A confidence of 81-82% is still high, so we could either accept those terms as well or, since there are only three of them, try to have them validated experimentally.

The GOPET predictions indicate that either the protein is very specific and its homologs (on which the predictions are based) are not involved in many reactions, or only one type of reaction is known in which the homologs take part.

**GOPET function predictions for Aspartoacylase sequence.**
GoID	Ontology	Confidence	Term name	Found in UniProt
GO:0016787	Molecular Function	96%	hydrolase activity	yes
GO:0004046	Molecular Function	82%	aminoacylase activity	yes
GO:0019807	Molecular Function	82%	aspartoacylase activity	yes
GO:0016788	Molecular Function	81%	hydrolase activity acting on ester bonds	yes

</figtable>

ProtFun

ProtFun output: the arrows indicate the highest information content, not the highest probability in that class. Looking at the output and pretending again that we know nothing about our protein, we would learn that it seems to:

be involved in central intermediary metabolism (highest information content) (true)
have to do with purines and pyrimidines (highest probability) (not true)
be an enzyme (true)
have a higher probability of being a Transferase (0.202) than a Hydrolase (0.115) (wrong, it is a Hydrolase)
definitely not be an Isomerase (highest information content due to high Odds value)

ProtFun Prediction for Aspartoacylase

# Functional category                  Prob     Odds
 Amino_acid_biosynthesis              0.071    3.233
 Biosynthesis_of_cofactors            0.144    2.003
 Cell_envelope                        0.033    0.535
 Cellular_processes                   0.137    1.875
 Central_intermediary_metabolism   => 0.334    5.309
 Energy_metabolism                    0.226    2.511
 Fatty_acid_metabolism                0.022    1.663
 Purines_and_pyrimidines              0.367    1.512
 Regulatory_functions                 0.021    0.128
 Replication_and_transcription        0.167    0.625
 Translation                          0.113    2.559
 Transport_and_binding                0.017    0.042

# Enzyme/nonenzyme                     Prob     Odds
 Enzyme                            => 0.703    2.454
 Nonenzyme                            0.297    0.416

# Enzyme class                         Prob     Odds
 Oxidoreductase (EC 1.-.-.-)          0.111    0.534
 Transferase    (EC 2.-.-.-)          0.202    0.585
 Hydrolase      (EC 3.-.-.-)          0.115    0.363
 Lyase          (EC 4.-.-.-)          0.031    0.662
 Isomerase      (EC 5.-.-.-)       => 0.084    2.637
 Ligase         (EC 6.-.-.-)          0.074    1.460

# Gene Ontology category               Prob     Odds
 Signal_transducer                    0.053    0.246
 Receptor                             0.004    0.024
 Hormone                              0.001    0.206
 Structural_protein                   0.001    0.041
 Transporter                          0.025    0.230
 Ion_channel                          0.015    0.257
 Voltage-gated_ion_channel            0.004    0.173
 Cation_channel                       0.011    0.234
 Transcription                        0.100    0.785
 Transcription_regulation             0.039    0.313
 Stress_response                      0.010    0.117
 Immune_response                      0.061    0.720
 Growth_factor                        0.006    0.450
 Metal_ion_transport                  0.009    0.020
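
The difference between "highest probability" and "highest information content" can be made concrete with the numbers above. The sketch assumes that the arrow marks the row with the highest odds within a block. This matches the output shown here, but ProtFun's actual selection rule may be more involved. The helper name `top_by` is hypothetical.

```python
# (Prob, Odds) pairs copied from the ProtFun output above.
FUNCTIONAL_CATEGORY = {
    "Amino_acid_biosynthesis": (0.071, 3.233),
    "Biosynthesis_of_cofactors": (0.144, 2.003),
    "Cell_envelope": (0.033, 0.535),
    "Cellular_processes": (0.137, 1.875),
    "Central_intermediary_metabolism": (0.334, 5.309),
    "Energy_metabolism": (0.226, 2.511),
    "Fatty_acid_metabolism": (0.022, 1.663),
    "Purines_and_pyrimidines": (0.367, 1.512),
    "Regulatory_functions": (0.021, 0.128),
    "Replication_and_transcription": (0.167, 0.625),
    "Translation": (0.113, 2.559),
    "Transport_and_binding": (0.017, 0.042),
}
ENZYME_CLASS = {
    "Oxidoreductase": (0.111, 0.534),
    "Transferase": (0.202, 0.585),
    "Hydrolase": (0.115, 0.363),
    "Lyase": (0.031, 0.662),
    "Isomerase": (0.084, 2.637),
    "Ligase": (0.074, 1.460),
}
PROB, ODDS = 0, 1


def top_by(table, column):
    """Name of the row with the largest value in the given column."""
    return max(table, key=lambda name: table[name][column])


for label, table in [("category", FUNCTIONAL_CATEGORY),
                     ("enzyme class", ENZYME_CLASS)]:
    print(f"{label}: highest prob = {top_by(table, PROB)}, "
          f"highest odds = {top_by(table, ODDS)}")
```

In both blocks the row with the highest probability differs from the row carrying the arrow, which is why the two columns have to be read separately.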

Function prediction

GOPET provided only a few functional annotations, but they are very specific and have high confidence. ProtFun, in this case, did not give any clear insight into what kind of protein we are dealing with. To get a clearer idea, we would suggest (if there were more time) employing at least two more function prediction tools to compare their results and see whether they all point in a similar direction.

Pfam

<xr nolink id="family"/>
Family tree of AstE_AspA with aspartoacylase on the upper stem. Picture taken from EMBL

</figure>

Searching Pfam with the sequence of the human Aspartoacylase (UniProt ID P45381) produced one significant result (E-Value 7.9e-71). It is the Succinylglutamate desuccinylase / Aspartoacylase family, short AstE_AspA. 33 sequences can be found in EMBL for this family, where <xr id="family"/> shows the family tree.

Succinylglutamate desuccinylases catalyse a reaction very similar to that of aspartoacylase:

N-Succinyl-L-glutamate + H2O <-> Succinate + L-Glutamate

(Recall that aspartoacylase catalyses the reaction:

N-Acetyl-L-aspartate + H2O <-> Acetate + L-Aspartate,

so the difference lies only in succinyl vs acetyl, and glutamate vs aspartate)

The AstE_AspA family is part of the peptidase clan, which in turn has 12 members.
