Hemochromatosis: Sequence based predictions
Revision as of 12:47, 30 August 2013
Secondary Structure
In this task, the secondary structure of four proteins is predicted with the programs ReProf and PsiPred. The predictions are then compared to the DSSP[1] secondary structure assignments of the corresponding crystal structures from the PDB.
<figtable id="sequences">
UniProt | | | PDB | | | |
---|---|---|---|---|---|---|
UID | Name | Length | ID | Resolution [Å] | Chain | Length |
Q30201 | Hereditary hemochromatosis protein | 348 | 1A6Z | 2.60 | A | 275 |
P10775 | Ribonuclease inhibitor | 456 | 2BNH | 2.30 | A | 457 |
Q9X0E6 | Divalent-cation tolerance protein CutA | 101 | 1VHF | 1.54 | A | 113 |
Q08209 | CAM-PRP catalytic subunit | 521 | 1M63 | 2.80 | A | 372 |
</figtable>
The crystal structures listed in <xr id="sequences"/> were selected for comparison. The first priority was to obtain the protein in its native state, not bound to another molecule. The other criteria were the quality of the structure and its alignment to the UniProt sequence (covering one continuous segment).
Next, the output of the prediction programs and the DSSP assignments have to be made comparable. DSSP assigns 8 different classes of secondary structure, whereas ReProf and PsiPred only predict helix (H), sheet (E) and loop (L or C). Therefore, the DSSP classes H and G are mapped to H, E is mapped to E, and all other DSSP classes are mapped to C.
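The three-state reduction described above can be sketched in Python. This is a minimal illustration, assuming the DSSP assignment and the predictions are available as one-letter strings; the function names are hypothetical. The mapping follows the text: H and G become H, E stays E, and everything else (including I, B, T, S and blanks) becomes C.

```python
# Sketch: reduce 8-state DSSP strings to the 3 states predicted by
# ReProf and PsiPred, following the mapping given in the text.
DSSP_TO_THREE = {"H": "H", "G": "H", "E": "E"}  # all other symbols -> "C"


def dssp_to_three_state(dssp: str) -> str:
    """Map each DSSP symbol to H, E or C (I, B, T, S, '-', ' ' -> C)."""
    return "".join(DSSP_TO_THREE.get(symbol, "C") for symbol in dssp)


def normalize_prediction(prediction: str) -> str:
    """Treat the loop symbols 'L' (ReProf) and 'C' (PsiPred) alike."""
    return prediction.upper().replace("L", "C")


# Example: dssp_to_three_state("HGIEBTS-") returns "HHCECCCC"
```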
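To assess prediction quality, the class-specific accuracy, coverage and F1-measure were used along with the Q3 and SOV3<ref>Zemla A, et al.; A Modified Definition of Sov, a Segment-Based Measure for Protein Secondary Structure Prediction Assessment; Proteins 34:220–223 (1999)</ref> measures. Q3 is the fraction of residues whose three-state class is predicted correctly:
<math>Q_3 = \frac{1}{N} \sum_{i \in \{H, E, C\}} c_i</math>
where N is the total number of residues and c_i is the number of residues correctly predicted in class i.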
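The per-residue measures can be sketched as follows. This assumes the standard definitions: class-specific accuracy is the fraction of residues predicted in a class that are observed in it (precision), coverage is the fraction of residues observed in a class that are predicted in it (recall), and F1 is their harmonic mean. SOV3 is segment-based and is not reproduced here; the function names are hypothetical.

```python
from collections import Counter

STATES = "HEC"  # three-state alphabet after the DSSP reduction


def _check(observed: str, predicted: str) -> None:
    if not observed or len(observed) != len(predicted):
        raise ValueError("strings must be non-empty and of equal length")


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose three-state class is predicted correctly."""
    _check(observed, predicted)
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def per_class_scores(observed: str, predicted: str) -> dict:
    """Return {state: (accuracy, coverage, F1)} for H, E and C.

    accuracy = correct predictions of a state / all predictions of that state
    coverage = correct predictions of a state / all observations of that state
    """
    _check(observed, predicted)
    n_obs = Counter(observed)
    n_pred = Counter(predicted)
    n_hit = Counter(o for o, p in zip(observed, predicted) if o == p)
    scores = {}
    for s in STATES:
        acc = n_hit[s] / n_pred[s] if n_pred[s] else 0.0
        cov = n_hit[s] / n_obs[s] if n_obs[s] else 0.0
        f1 = 2 * acc * cov / (acc + cov) if acc + cov else 0.0
        scores[s] = (acc, cov, f1)
    return scores
```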
ReProf accepts either a sequence or a PSSM as input. Therefore, in addition to the plain sequence, PSSMs were generated by searching the HFE (Q30201) sequence against the big_80, SwissProt and PDB databases with PsiBlast, using standard parameters (2 iterations and an E-value cutoff of 0.002).
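The PSSM generation step can be sketched as a set of BLAST+ `psiblast` calls. This is a sketch under assumptions: the database paths and the query file name `Q30201.fasta` are hypothetical, and the "E-value cutoff of 0.002" is interpreted as the PSSM inclusion threshold (`-inclusion_ethresh`). The code only builds and prints the commands; each list can be passed to `subprocess.run` on a machine with BLAST+ installed.

```python
import shlex

# Hypothetical database locations; replace with the local BLAST database names.
DATABASES = {
    "big_80": "/path/to/big_80",
    "SwissProt": "/path/to/swissprot",
    "PDB": "/path/to/pdb",
}


def psiblast_pssm_command(query_fasta: str, db: str, pssm_out: str,
                          iterations: int = 2,
                          inclusion_evalue: float = 0.002) -> list:
    """Build a psiblast call that writes an ASCII PSSM.

    Assumption: the E-value cutoff from the text is used as the
    inclusion threshold for building the PSSM (-inclusion_ethresh).
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-num_iterations", str(iterations),
        "-inclusion_ethresh", str(inclusion_evalue),
        "-out_ascii_pssm", pssm_out,
    ]


# One command per database; printed as shell-quoted strings.
COMMANDS = {
    name: psiblast_pssm_command("Q30201.fasta", db, f"Q30201_{name}.pssm")
    for name, db in DATABASES.items()
}
for cmd in COMMANDS.values():
    print(shlex.join(cmd))
```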
<figtable id="prediction quality">
Method | Q3 | SOV3 | H Acc | H Cov | H F1 | C Acc | C Cov | C F1 | E Acc | E Cov | E F1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Reprof + Sequence | 0.76 | 0.66 | 0.63 | 0.33 | 0.44 | 0.57 | 0.69 | 0.63 | 0.79 | 0.63 | 0.70 |
Reprof + Big80 | 0.81 | 0.75 | 0.74 | 0.48 | 0.59 | 0.63 | 0.75 | 0.68 | 0.81 | 0.84 | 0.83 |
Reprof + SwissProt | 0.86 | 0.84 | 0.84 | 0.65 | 0.74 | 0.68 | 0.81 | 0.74 | 0.89 | 0.86 | 0.87 |
Reprof + PDB | 0.84 | 0.84 | 0.85 | 0.52 | 0.64 | 0.64 | 0.83 | 0.72 | 0.88 | 0.85 | 0.86 |
PsiPred | 0.84 | 0.73 | 0.98 | 0.77 | 0.86 | 0.61 | 0.95 | 0.74 | 0.94 | 0.55 | 0.69 |
</figtable>
The deciding criteria for choosing the ReProf input were the Q3, SOV3 and F1-measures. With the SwissProt PSSM, ReProf achieved the highest Q3 (0.86), the highest SOV3 (0.84, tied with the PDB PSSM) and, among the ReProf variants, the highest F1-measure for all three classes (see <xr id="prediction quality"/>). Therefore, ReProf with SwissProt PSSMs was used for all further predictions.
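The choice of the ReProf input can be reproduced from the table values. The ranking rule below (highest SOV3, ties broken by Q3) is an assumption made for illustration; the text names the criteria but does not specify how they are weighted.

```python
# (Q3, SOV3) of the ReProf input variants on HFE (Q30201), from the table.
REPROF_VARIANTS = {
    "Reprof + Sequence": (0.76, 0.66),
    "Reprof + Big80": (0.81, 0.75),
    "Reprof + SwissProt": (0.86, 0.84),
    "Reprof + PDB": (0.84, 0.84),
}


def best_variant(results: dict) -> str:
    """Pick the variant with the highest SOV3, breaking ties by Q3."""
    return max(results, key=lambda name: (results[name][1], results[name][0]))


print(best_variant(REPROF_VARIANTS))
```

Here SwissProt and PDB tie on SOV3 (0.84), and the higher Q3 of the SwissProt variant decides.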
The DSSP assignments and the ReProf and PsiPred predictions for each of the four proteins can be found on the page Hemochromatosis SS Alignments.
<figtable id="comparison">
Protein | Method | Q3 | SOV3 |
---|---|---|---|
Q08209 | ReProf | 0.83 | 0.78 |
Q08209 | PsiPred | 0.87 | 0.79 |
Q30201 | ReProf | 0.86 | 0.84 |
Q30201 | PsiPred | 0.84 | 0.73 |
P10775 | ReProf | 0.91 | 0.93 |
P10775 | PsiPred | 0.93 | 0.94 |
Q9X0E6 | ReProf | 0.75 | 0.65 |
Q9X0E6 | PsiPred | 0.89 | 0.86 |
</figtable>
<xr id="comparison"/> compares the prediction quality of ReProf and PsiPred, measured against the DSSP assignments of the corresponding PDB structures. Q3 and SOV3 need not agree on which of the two methods performs better, but for these four proteins they do. SOV3 was nevertheless considered more trustworthy than Q3, because it evaluates the overlap of predicted and observed segments rather than only per-residue agreement. By SOV3, PsiPred outperforms ReProf in 3 of 4 cases. Notably, the SOV3 values range from 0.65 for Q9X0E6 (101 residues) to 0.94 for P10775 (456 residues). Prediction performance therefore varies considerably between proteins, and possibly with protein length, although four proteins are too few to establish such a trend; overall, PsiPred performed best.
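The win count can be checked directly against the comparison table. This small sketch copies the (Q3, SOV3) values from <xr id="comparison"/>, determines the better method per protein under each measure, and counts how often PsiPred wins by SOV3; the names are illustrative only.

```python
# (Q3, SOV3) per protein and method, copied from the comparison table.
COMPARISON = {
    "Q08209": {"ReProf": (0.83, 0.78), "PsiPred": (0.87, 0.79)},
    "Q30201": {"ReProf": (0.86, 0.84), "PsiPred": (0.84, 0.73)},
    "P10775": {"ReProf": (0.91, 0.93), "PsiPred": (0.93, 0.94)},
    "Q9X0E6": {"ReProf": (0.75, 0.65), "PsiPred": (0.89, 0.86)},
}

Q3, SOV3 = 0, 1  # tuple positions


def winner(scores: dict, measure: int) -> str:
    """Method with the higher value of the given measure."""
    return max(scores, key=lambda method: scores[method][measure])


sov3_winners = {p: winner(s, SOV3) for p, s in COMPARISON.items()}
measures_agree = {p: winner(s, Q3) == winner(s, SOV3) for p, s in COMPARISON.items()}
psipred_wins = sum(w == "PsiPred" for w in sov3_winners.values())

print(psipred_wins, all(measures_agree.values()))
```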
Disorder
We searched DisProt for the four proteins, but only Q08209 has a direct entry. For the other three proteins, we searched DisProt with PsiBlast and Smith-Waterman; the best matches are listed in <xr id="disprot id"/>.
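For reference, the Smith-Waterman algorithm finds the best local alignment score by dynamic programming, resetting negative cell scores to zero so that an alignment can start and end anywhere. The sketch below uses a simple match/mismatch score and a linear gap penalty as placeholder parameters; real protein searches usually use a substitution matrix such as BLOSUM62 and affine gap costs, and the parameters of the DisProt search are not given in the text.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).

    Only two rows of the DP matrix are kept, since the score alone
    (not the alignment itself) is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # a[i-1] aligned to a gap
            left = curr[j - 1] + gap   # b[j-1] aligned to a gap
            curr[j] = max(0, diag, up, left)  # 0 = start a new local alignment
            best = max(best, curr[j])
        prev = curr
    return best
```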
<figtable id="disprot id">
UID | DisProt ID | SeqID | E-value | Method |
---|---|---|---|---|
Q30201 | DP00670 | 0.29 | 2e-30 | psiblast |
P10775 | DP00554 | 0.4 | 7e-30 | psiblast |
Q9X0E6 | DP00564 | 0.25 | 0.36 | smith-waterman |
Q08209 | DP00092 | - | - | direct match |
</figtable>
IUPred is a disorder prediction server that offers three prediction types:
- long disorder
- short disorder
- global (structured, globular domains)
For long and short disorder, the output is a disorder likelihood for each residue; for globular domains, it is the start and end position of each domain.
MetaDisorder (MD) is a predictor that combines the output of several disorder prediction programs from the PredictProtein suite into a single value per residue: the likelihood that the residue is part of a disordered region.
<figure id="disorder pred1">
</figure>
<figure id="disorder pred2">
</figure>
All predictions are combined into plots in <xr id="disorder pred2"/>. A residue is predicted to be disordered if its likelihood is higher than 0.5, indicated by the gray horizontal line.
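Turning per-residue scores into disordered regions with the 0.5 threshold can be sketched as follows. This assumes the IUPred or MD output has already been parsed into a list of floats (one per residue); positions are reported 1-based and inclusive. The `min_length` filter is a hypothetical addition (default 1, i.e. no filtering) for ignoring very short stretches.

```python
def disordered_regions(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) runs with score > threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = i  # a disordered run begins
        elif start is not None:
            if i - start >= min_length:  # run covers start .. i-1
                regions.append((start, i - 1))
            start = None
    # close a run that extends to the C-terminus
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions
```

Note that a score of exactly 0.5 does not count as disordered, matching "higher than 0.5".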
Unfortunately, apart from the direct match for Q08209, we could not find good matches in the DisProt database. The matches for Q30201 and Q9X0E6 have a sequence identity below 30%, and the E-value of 0.36 for Q9X0E6 is much too high. Only the hit for P10775, with a sequence identity of 40% and an E-value of 7e-30, may be suitable for annotation transfer. Nevertheless, we included the annotations of the Q30201 and Q9X0E6 matches for the sake of completeness.
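The reasoning about which DisProt hits can be used can be written as a simple filter over <xr id="disprot id"/>. The thresholds are assumptions for illustration: a minimum sequence identity of 30% (following the text) and a hypothetical maximum E-value of 1e-3, which the text does not specify.

```python
from typing import NamedTuple, Optional


class DisProtHit(NamedTuple):
    uniprot: str
    disprot: str
    seq_id: Optional[float]  # None for a direct DisProt entry
    evalue: Optional[float]  # None for a direct DisProt entry


# Best hits from the DisProt table.
HITS = [
    DisProtHit("Q30201", "DP00670", 0.29, 2e-30),
    DisProtHit("P10775", "DP00554", 0.40, 7e-30),
    DisProtHit("Q9X0E6", "DP00564", 0.25, 0.36),
    DisProtHit("Q08209", "DP00092", None, None),
]


def usable_for_transfer(hit: DisProtHit, min_seq_id: float = 0.30,
                        max_evalue: float = 1e-3) -> bool:
    """Hypothetical rule: a direct entry, or a hit passing both thresholds."""
    if hit.seq_id is None:
        return True  # direct match, no transfer needed
    return hit.seq_id >= min_seq_id and hit.evalue <= max_evalue


print([h.uniprot for h in HITS if usable_for_transfer(h)])
```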
All predictions for the HFE protein (Q30201) show almost no disordered regions. The MD scores are always below 0.5, and IUPred predicts long and short disorder only for a short region after residue 250. The DisProt annotation contains a small disordered region after residue 150, but this region can be neglected because of the low sequence identity to the corresponding DisProt protein. IUPred predicts that HFE consists of a single globular, ordered domain. In summary, HFE is correctly predicted not to be disordered.
P10775 was also predicted not to be disordered by all methods. The region of the DisProt entry DP00465 that matched P10775 does not contain a disordered region, so the predictions are correct in this case as well.
The DisProt annotation of entry DP00564, which matched Q9X0E6, contains a long disordered region. In contrast, IUPred predicts no disordered residues in Q9X0E6; only the MD disorder likelihoods are somewhat higher, always above 0.3. However, we trust the predictions more than the DisProt annotation here, because all IUPred predictions, including the global one, agree that Q9X0E6 is not disordered, and DP00564 was only found with an E-value of 0.36.
In contrast to the first three proteins, Q08209 is clearly disordered. Its DisProt annotation contains several disordered regions near the C-terminus. IUPred's global prediction agrees that the C-terminus is disordered, and the long, short and MD predictions are also above 0.5.
All N- and C-termini of the proteins show short disorder tendencies with values above 0.5, but since the ends of protein chains are generally disordered to some extent, we do not count these as false predictions.
Transmembrane Helices
<figtable id="pdb structures">
UniProt | | | PDB | | | |
---|---|---|---|---|---|---|
UID | Name | Length | ID | Resolution [Å] | Chain | Length |
Q30201 | Hereditary hemochromatosis protein | 348 | 1A6Z | 2.60 | A | 275 |
P35462 | D(3) dopamine receptor | 400 | 3PBL | 2.89 | A | 481 |
Q9YDF8 | Voltage-gated potassium channel | 295 | 1ORQ | 3.20 | C | 223 |
P47863 | Aquaporin-4 | 323 | 2D57 | 3.20 | A | 301 |
</figtable>
HFE
HFE is a single-pass membrane protein, but its crystal structure (1A6Z) covers only the extracellular part; thus, there are no entries for HFE in the OPM and PDBTM databases.
<figtable id="TM hfe">
HFE (Q30201) | |||
---|---|---|---|
Memsat | Polyphobius | OPM | PDBTM |
(image: Hemo_memsat_Q30201.png) | (image: hemo_poly_Q30201.png) | - | - |
</figtable>
<figtable id="TM Q9YDF8">
Voltage-gated potassium channel (Q9YDF8) | |||
---|---|---|---|
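Memsat | Polyphobius | OPM | PDBTM |
(image: hemo_memsat_Q9YDF8.png) | (image: hemo_poly_Q9YDF8.png) | (image: 1orq_opm.png) | (image: Hemo_1orq_lm.png) |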
</figtable>
<figtable id="TM P35462">
D(3) dopamine receptor (P35462) | |||
---|---|---|---|
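Memsat | Polyphobius | OPM | PDBTM |
(image: hemo_memsat_P35462.png) | (image: hemo_poly_P35462.png) | (image: 3pbl_opm.png) | (image: hemo_3pbl_lm.png) |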
</figtable>
<figtable id="TM P47863">
Aquaporin-4 (P47863) | |||
---|---|---|---|
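Memsat | Polyphobius | OPM | PDBTM |
(image: hemo_memsat_P47863.png) | (image: hemo_poly_P47863.png) | (image: 2d57_opm.png) | (image: hemo_2d57_lm.png) |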
</figtable>
Signal Peptides
<figtable id ="signalPep">
</figtable>
GO Terms
GoPet
References
<references />