Difference between revisions of "Canavan Task 3 - Sequence-based predictions"
Identifier | P35462 | Q9YDF8 |
Protein | D(3) dopamine receptor | Voltage-gated potassium channel |
Organism | Homo sapiens (Human) | Aeropyrum pernix |
Sequence length | 400 | 295 |
Subcellular location | Cell membrane; Multi-pass membrane protein | Cell membrane; Multi-pass membrane protein |
PDB Identifier | 3PBL | 1ORQ |
Structure | File:CD_3PBL.jpg | File:CD_1ORQ.jpg |
Revision as of 16:22, 18 May 2012
Oh, I would sing of mackerel skies, And why the sea is wet, Of jelly-fish and conger-eels, And things that I forget. (taken from "The Cumberbunce" by Paul West)
Protocol
Commands, source code and other methodological issues are kept in the protocol.
Secondary Structure Prediction
Information on Proteins
//TODO: pics
Prediction of disordered regions
Transmembrane Helix Prediction
We analyzed the prediction of transmembrane helices (TMH) for the proteins listed in <xr id="table_TMH_info"/> and for our protein Aspartoacylase. In addition to Polyphobius, we also examined the results of two other TMH predictors, namely TMHMM and PHDhtm.
Information on Proteins
<figtable id="table_TMH_info"> <xr nolink id="table_TMH_info"/> Information on the proteins used for the evaluation of different TMH prediction methods.
</figtable>
Aspartoacylase
As expected, the TMH prediction for our protein yielded only cytoplasmic residues and no transmembrane helix.
P35462
For P35462, UniProt lists only one structure: 3PBL. For this structure, OPM and PDBTM each list 7 TMH, the same number as in the UniProt annotation for P35462. The localization of the TMH differs only slightly: the three references usually differ by about 1-4 amino acid residues.
All prediction methods yield the same number of TMH. Furthermore, Polyphobius, TMHMM and PHDhtm predict the location of the TMH very accurately, with only small deviations from the reference annotations of UniProt, PDBTM and OPM.
In <xr id="table_3pbl"/> the exact localization of the TMH is listed for the reference sources UniProt, PDBTM and OPM as well as for the prediction methods Polyphobius, TMHMM and PHDhtm. In <xr id="CD_tm_3pbl"/> the length distribution of the predicted and annotated TMH is depicted. One can see that PDBTM in general finds shorter TMH, whereas Polyphobius and OPM find longer helices. Furthermore, the location of the TMH within the sequence is visualized in <xr id="vis_3pbl"/>.
<figtable id="table_3pbl" >
UniProt | PDBTM | OPM | Polyphobius | TMHMM | PHDhtm |
33-55 | 35-52 | 34-52 | 30-55 | 32-54 | 31-55 |
66-88 | 68-84 | 67-91 | 66-88 | 67-89 | 65-90 |
105-126 | 109-123 | 101-126 | 105-126 | 104-126 | 101-130 |
150-170 | 152-166 | 150-170 | 150-170 | 150-172 | 151-170 |
188-212 | 191-206 | 187-209 | 188-212 | 192-214 | 188-213 |
330-351 | 334-347 | 330-351 | 329-352 | 331-353 | 331-353 |
367-388 | 368-382 | 363-386 | 367-386 | 368-390 | 362-387 |
</figtable>
<figure id="CD_tm_3pbl"></figure> | <figure id="vis_3pbl"></figure> |
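The length comparison described above can be reproduced directly from the table values. The following is a minimal sketch, not the script used for the figures; the dictionary name `tmh_3pbl` and the helper `mean_length` are hypothetical, and the ranges are copied from the table for P35462 (inclusive residue positions).

```python
# TMH boundaries for P35462 / 3PBL, copied from the table above.
# Ranges are inclusive, so a helix from 33 to 55 has 55 - 33 + 1 = 23 residues.
tmh_3pbl = {
    "UniProt":     [(33, 55), (66, 88), (105, 126), (150, 170), (188, 212), (330, 351), (367, 388)],
    "PDBTM":       [(35, 52), (68, 84), (109, 123), (152, 166), (191, 206), (334, 347), (368, 382)],
    "OPM":         [(34, 52), (67, 91), (101, 126), (150, 170), (187, 209), (330, 351), (363, 386)],
    "Polyphobius": [(30, 55), (66, 88), (105, 126), (150, 170), (188, 212), (329, 352), (367, 386)],
    "TMHMM":       [(32, 54), (67, 89), (104, 126), (150, 172), (192, 214), (331, 353), (368, 390)],
    "PHDhtm":      [(31, 55), (65, 90), (101, 130), (151, 170), (188, 213), (331, 353), (362, 387)],
}


def mean_length(helices):
    """Mean helix length in residues for a list of inclusive (start, end) ranges."""
    return sum(end - start + 1 for start, end in helices) / len(helices)


mean_len = {source: mean_length(helices) for source, helices in tmh_3pbl.items()}

# Print the sources from shortest to longest mean helix length.
for source, value in sorted(mean_len.items(), key=lambda item: item[1]):
    print(f"{source:12s} {value:5.1f}")
```

Sorting by mean length puts PDBTM first, which matches the observation that PDBTM annotates the shortest helices for this protein.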
P47863
In UniProt there are several structures listed for P47863:
- 2D57 X-ray 3.20 A
- 2ZZ9 X-ray 2.80 A
- 3IYZ electron microscopy 10.00 A
Since 2ZZ9 is a mutant, we decided to use 2D57 as a reference structure with OPM and PDBTM.
Interestingly, OPM lists 8 TMH for P47863, whereas PDBTM agrees with the UniProt annotation and lists 6 TMH. Yet, the two additional helices in OPM are rather short (<10 AA) and correspond to two loop segments in the PDBTM annotation.
Just as the reference sources disagree, the prediction methods yield deviating results. Polyphobius and TMHMM predict 6 helices, which correspond to the 6 helices listed in UniProt, PDBTM and OPM. PHDhtm finds only 5 helices, of which helix 2 spans residues 70-137 (68 residues) and covers helices 2 and 3 found by the other methods. This long helix also incorporates the loop region annotated in PDBTM and the additional helix listed in OPM. PHDhtm therefore merged these 3 structural elements into one helical region.
In <xr id="table_2D57"/> the exact localization of the TMH is listed for the reference sources UniProt, PDBTM and OPM as well as for the prediction methods Polyphobius, TMHMM and PHDhtm. In <xr id="CD_tm_2D57"/> the length distribution of the TMH predicted with Polyphobius and of the annotated TMH is depicted. One can see that PDBTM in general finds shorter helices, whereas OPM and Polyphobius find longer ones. Furthermore, the location of the TMH within the sequence is visualized in <xr id="vis_2D57"/>.
<figtable id="table_2D57" >
UniProt | PDBTM | OPM | Polyphobius | TMHMM | PHDhtm |
37-57 | 39-55 | 34-56 | 34-58 | 33-55 | 34-56 |
65-85 | 72-89 | 70-88 | 70-91 | 70-92 | 70-137 |
- | 95-106(loop) | 98-107 | - | - | 70-137(cont) |
116-136 | 116-133 | 112-136 | 115-136 | 112-134 | 70-137(cont) |
156-176 | 158-177 | 156-178 | 156-177 | 154-176 | 156-176 |
185-205 | 188-205 | 189-203 | 188-208 | 189-211 | 190-210 |
- | 209-222(loop) | 214-223 | - | - | - |
232-252 | 231-248 | 231-252 | 231-252 | 231-253 | 224-250 |
</figtable>
<figure id="CD_tm_2D57"></figure> | <figure id="vis_2D57"></figure> |
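The merged helix reported for PHDhtm can be detected automatically by checking which predicted segments overlap more than one reference helix. This is a minimal sketch under the assumption that two segments "match" as soon as they share at least one residue; the function and variable names are hypothetical, and the ranges are taken from the UniProt and PHDhtm columns of the table for P47863.

```python
# Inclusive (start, end) ranges from the table above.
uniprot = [(37, 57), (65, 85), (116, 136), (156, 176), (185, 205), (232, 252)]
phdhtm = [(34, 56), (70, 137), (156, 176), (190, 210), (224, 250)]


def overlaps(a, b):
    """True if the inclusive ranges a and b share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


# For every predicted helix, collect the reference helices it overlaps.
matches = {pred: [ref for ref in uniprot if overlaps(pred, ref)] for pred in phdhtm}

# A predicted helix that covers more than one reference helix is a merge candidate.
merged = {pred: refs for pred, refs in matches.items() if len(refs) > 1}

for pred, refs in merged.items():
    print(f"predicted {pred} covers reference helices {refs}")
```

With these values only the PHDhtm segment 70-137 is flagged, covering the UniProt helices 65-85 and 116-136.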
Q9YDF8
For Q9YDF8, UniProt annotates 6 TM regions and 2 intramembrane regions and lists four different structures:
- 1ORQ X-ray 3.20 A 31-253
- 1ORS X-ray 1.90 A 33-160
- 2A0L X-ray 3.90 A 20-259
- 2KYH NMR - 19-160
Since in 1ORS only residues 33-160 have been crystallized, we decided to use 1ORQ for comparison with the output of the prediction methods.
For Q9YDF8, Polyphobius did not find any homologues in its BLAST search, so no homology information could be used for the TMH prediction. The TMH prediction by Polyphobius nevertheless coincides in general with the UniProt annotation: Polyphobius finds 7 TMH, and their overlap with the TMH listed in UniProt is large. However, OPM and PDBTM list very diverse results. There is a consensus only on TMH 5 and 7. Comparing the OPM annotations for the two structures 1ORQ and 1ORS reveals tremendous differences:
- 1ors: C - Tilt: 19° - Segments: 1(25-46), 2(55-78), 3(86-97), 4(100-107), 5(117-148)
- 1orq: C - Tilt: 31° - Segments: 1(153-172), 2(183-195), 3(207-225)
Yet, if one considers the sequence shift of 13 AA between the 1ORQ PDB sequence and the Q9YDF8 UniProt sequence (see <xr id=seq_shift />), both annotations together reproduce the TMH identified by Polyphobius and those annotated in UniProt. The same holds for the PDBTM annotations.
<figure id="seq_shift"> </figure>
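The renumbering can be sketched as follows. The segment lists are the OPM segments quoted above; applying the same 13-residue offset to the 1ORS segments is an assumption made here for illustration (the text states the shift for 1ORQ), and the names `SHIFT` and `to_uniprot` are hypothetical.

```python
# Offset between PDB numbering and UniProt numbering (Q9YDF8), as stated in the text.
SHIFT = 13

# OPM segments in PDB numbering, copied from the list above.
opm_1ors = [(25, 46), (55, 78), (86, 97), (100, 107), (117, 148)]
opm_1orq = [(153, 172), (183, 195), (207, 225)]


def to_uniprot(segments, shift=SHIFT):
    """Translate inclusive (start, end) segments from PDB to UniProt numbering."""
    return [(start + shift, end + shift) for start, end in segments]


# Combine both structures into one list of segments in UniProt numbering.
combined = to_uniprot(opm_1ors) + to_uniprot(opm_1orq)

for start, end in combined:
    print(f"{start}-{end}")
```

Under this assumption the shifted segments reproduce the values in the OPM column of the table for Q9YDF8, for example 38-59 for the first helix and 220-238 for the last one.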
In general, the three analyzed TMH prediction methods yield comparable results that agree with the annotated TMH locations (as far as the annotations agree with each other). Stronger deviations occur for amino acid positions 100-160. Here, Polyphobius predicts two TMH, TMHMM finds only the first helix, and PHDhtm detects one long helix (107-149). Since the annotation for this region differs between UniProt, OPM and PDBTM, it is hard to decide which method gives the most accurate result.
In <xr id="table_1orq"/> the exact localization of the TMH is listed for the reference sources UniProt, PDBTM and OPM as well as for the prediction methods Polyphobius, TMHMM and PHDhtm. In <xr id="CD_tm_Q9YDF8"/> the length distribution of the TMH predicted with Polyphobius and of the annotated TMH is depicted. One can see that PDBTM in general finds shorter helices, whereas OPM and Polyphobius find longer ones. Furthermore, the location of the TMH within the sequence is visualized in <xr id="vis_1orq"/>.
<figtable id="table_1orq" >
UniProt | PDBTM | OPM(=1ors&1orq) | Polyphobius | TMHMM | PHDhtm |
39 – 63 | 34-65(1ORS:40-63) | 38-59 | 39-61 | 42-60 | 42-64 |
68 – 92 | 70-93(1ORS:68-88) | 68-91 | 68-88 | 68-87 | 69-88 |
99-110 | 42-60 | ||||
109 – 125 | (1ORS:101-120) | 113-120 | 108-129 | 107-129 | 107-149 |
129 – 145 | (1ORS:131-155) | 130-161 | 137-157 | - | 107-149(cont) |
160 – 184 | 164-184 | 166-185 | 163-184 | 162-184 | 162-181 |
196 – 208(intramembrane) | - | 196-208 | 196-213 | 199-218 | 197-212 |
222 – 253 | 222-249 | 220-238 | 224-244 | 225-244 | 220-247 |
</figtable>
<figure id="CD_tm_Q9YDF8"></figure> | <figure id="vis_1orq"></figure> |
signal peptides
We checked for possible confusion between TMH and signal peptides with SignalP 4.0.
Signal Peptide Prediction
Information on Proteins
P02768
<figure id="signalp_p02768">
</figure>
SignalP 3.0 NN predicts a signal peptide for P02768 with a cleavage site between residues 18 and 19: AYS-RG, with a value of 0.880 at a cutoff of 0.43.
SignalP 3.0 HMM has a max. cleavage site probability of 0.759 between pos. 18 and 19.
SignalP 4.0 predicts the cleavage site between pos. 18 and 19: AYS-RG with D=0.848.
In <xr id="signalp_p02768"/> you can see the graphical output of the SignalP prediction for P02768.
P11279
<figure id="signalp_p11279">
</figure>
SignalP 3.0 NN predicts the cleavage site between pos. 28 and 29: ASA-AM, with a value of 0.931 at a cutoff of 0.43.
SignalP 3.0 HMM has a max. cleavage site probability of 0.847 between pos. 28 and 29.
SignalP 4.0 predicts the cleavage site between pos. 28 and 29: ASA-AM with D=0.952.
In <xr id="signalp_p11279"/> you can see the graphical output of the SignalP prediction for P11279.
P47863
<figure id="signalp_p47863">
</figure>
SignalP 3.0 NN reaches no consensus on whether P47863 has a signal peptide or not. The most likely cleavage site is between pos. 54 and 55: SVG-ST.
SignalP 3.0 HMM reports a signal peptide probability of 0.723 and a max. cleavage site probability of 0.533 between pos. 56 and 57.
SignalP 4.0 predicts no signal peptide, with a value of D=0.154 at a D-cutoff of 0.500.
In <xr id="signalp_p47863"/> you can see the graphical output of the SignalP prediction for P47863.
Aspartoacylase
<figure id="signalp_p45381">
</figure>
In SignalP 3.0 NN, the majority of the scores predict no signal peptide for P45381. Only the C-score is above the cutoff and predicts a cleavage site at pos. 23.
SignalP 3.0 HMM predicts no signal peptide (probability of 0.000 for a signal peptide).
SignalP 4.0 predicts no signal peptide, with a value of D=0.124 at a D-cutoff of 0.450.
In <xr id="signalp_p45381"/> you can see the graphical output of the SignalP prediction for P45381.
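The SignalP 4.0 calls for the two proteins without a predicted signal peptide follow from comparing the D-score with the reported D-cutoff. The sketch below assumes the decision rule "signal peptide if D is greater than the cutoff"; the names `signalp4` and `has_signal_peptide` are hypothetical, and only the proteins for which the text states both values are included.

```python
# (D-score, D-cutoff) pairs as reported in the text for SignalP 4.0.
signalp4 = {
    "P47863": (0.154, 0.500),
    "P45381": (0.124, 0.450),
}


def has_signal_peptide(d_score, d_cutoff):
    """Assumed decision rule: a signal peptide is called if D exceeds the cutoff."""
    return d_score > d_cutoff


calls = {acc: has_signal_peptide(d, cutoff) for acc, (d, cutoff) in signalp4.items()}

for acc, call in calls.items():
    print(acc, "signal peptide" if call else "no signal peptide")
```

Both D-scores lie well below their cutoffs, so neither protein is called as carrying a signal peptide, in line with the results above.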
GO terms and Pfam
Pfam
AstE_AspA family: Succinylglutamate desuccinylase / Aspartoacylase family