Sequence-Based Mutation Analysis Hemochromatosis
Revision as of 19:27, 17 June 2012
Short task description
Detailed description: Sequence-based mutation analysis
Protocol
A protocol with a description of the data acquisition and other scripts used for this task is available here.
TODO: Fill in references!!
SNPs
From MSUD: M35T V53M G93R Q127H A162S L183P T217I R224W E277K C282S
Amino acid features
As a first estimate of the SNPs' effects we looked at the physicochemical properties of the wildtype and mutant amino acids (see <xr id="aa_features"/>). We collected the hydropathy index, the polarity, the isoelectric point (charge), and the van der Waals volume of the corresponding amino acids.
The most striking differences can be spotted in the following mutations:
- G93R, which goes from neutral hydrophobicity (-0.4) to highly hydrophilic (-4.5), from uncharged to positive, and triples in size.
- L183P, whose hydrophobicity drops by 5.4 (from 3.8 to -1.6, i.e. from hydrophobic to neutral).
- T217I, the opposite of L183P, which goes from neutral (-0.7) to hydrophobic (4.5).
- R224W, which becomes neutral (formerly hydrophilic) and also loses its positive charge.
- E277K, which changes from acidic (negative charge) to basic (positive charge).
Based on this information alone, these SNPs could be considered disease causing.
<figtable id="aa_features">
Mutation | Hydrophobicity (wt) | Hydrophobicity (mt) | Polarity (wt) | Polarity (mt) | pI (wt) | pI (mt) | v.d.W. volume (wt) | v.d.W. volume (mt) |
---|---|---|---|---|---|---|---|---|
M35T | 1.9 | -0.7 | nonpolar | polar | 5.74 | 5.60 | 124 | 93 |
V53M | 4.2 | 1.9 | nonpolar | nonpolar | 6.00 | 5.74 | 105 | 124 |
G93R | -0.4 | -4.5 | nonpolar | polar | 6.06 | 10.76 | 48 | 148 |
Q127H | -3.5 | -3.2 | polar | polar | 5.65 | 7.60 | 114 | 118 |
A162S | 1.8 | -0.8 | nonpolar | polar | 6.01 | 5.68 | 67 | 73 |
L183P | 3.8 | -1.6 | nonpolar | nonpolar | 6.01 | 6.30 | 124 | 90 |
T217I | -0.7 | 4.5 | polar | nonpolar | 5.60 | 6.05 | 93 | 124 |
R224W | -4.5 | -0.9 | polar | nonpolar | 10.76 | 5.89 | 148 | 163 |
E277K | -3.5 | -3.9 | polar | polar | 3.15 | 9.60 | 109 | 135 |
C282S | 2.5 | -0.8 | polar | polar | 5.05 | 5.68 | 86 | 73 |
</figtable>
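The hydrophobicity columns above match the Kyte-Doolittle hydropathy index, so the wildtype-to-mutant change can be recomputed directly from the mutation labels. The following is a minimal sketch under that assumption; the helper names `parse_mutation` and `hydropathy_change` are illustrative and not part of the original protocol.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

SNPS = ["M35T", "V53M", "G93R", "Q127H", "A162S",
        "L183P", "T217I", "R224W", "E277K", "C282S"]


def parse_mutation(mutation):
    """Split a label such as 'L183P' into (wildtype, position, mutant)."""
    return mutation[0], int(mutation[1:-1]), mutation[-1]


def hydropathy_change(mutation):
    """Return hydropathy(mutant) - hydropathy(wildtype), rounded to 0.1."""
    wildtype, _, mutant = parse_mutation(mutation)
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wildtype], 1)


if __name__ == "__main__":
    # Print the signed change for every SNP, largest absolute change first.
    for snp in sorted(SNPS, key=lambda s: abs(hydropathy_change(s)), reverse=True):
        print(f"{snp}: {hydropathy_change(snp):+.1f}")
```

Sorting by absolute change puts L183P and T217I at the top, which is consistent with the list of striking differences given above.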
Evolutionary analysis
The next step was to look at the evolutionary constraints for each mutation. We started with simple substitution matrices (BLOSUM62, PAM1, and PAM250) and then moved on to more sophisticated methods such as PSI-BLAST's PSSM and multiple sequence alignments.
BLOSUM62/PAM1/PAM250
When looking at all three matrices (cf. <xr id="bpp_matrices"/>), G93R, L183P, and R224W stand out as the most unlikely mutations of all 10 SNPs. M35T and T217I, while still quite rare, don't show such a strong signal. V53M and C282S are special cases: while BLOSUM62 ranks V53M as a more or less common mutation, PAM1 and PAM250 classify it as very rare. For C282S it's the other way around.
<figtable id="bpp_matrices">
Mutation | BLOSUM62 | PAM1 | PAM250 |
---|---|---|---|
M35T | -1 | 6 | 500 |
V53M | 1 | 4 | 200 |
G93R | -2 | 0 | 200 |
Q127H | 0 | 20 | 700 |
A162S | 1 | 28 | 900 |
L183P | -3 | 2 | 300 |
T217I | -1 | 7 | 400 |
R224W | -3 | 2 | 200 |
E277K | 1 | 7 | 800 |
C282S | -1 | 11 | 700 |
</figtable>
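One way to make "stands out in all three matrices" concrete is to flag the mutations that score below the median of the ten SNPs in every matrix. This is only an illustrative criterion (the original analysis does not state a cutoff); the values are copied from the table above.

```python
from statistics import median

# (BLOSUM62, PAM1, PAM250) per mutation, copied from the table above.
SCORES = {
    "M35T": (-1, 6, 500),
    "V53M": (1, 4, 200),
    "G93R": (-2, 0, 200),
    "Q127H": (0, 20, 700),
    "A162S": (1, 28, 900),
    "L183P": (-3, 2, 300),
    "T217I": (-1, 7, 400),
    "R224W": (-3, 2, 200),
    "E277K": (1, 7, 800),
    "C282S": (-1, 11, 700),
}


def below_median_everywhere(scores):
    """Return mutations scoring strictly below the median in every matrix."""
    columns = list(zip(*scores.values()))          # one tuple per matrix
    medians = [median(column) for column in columns]
    return [
        name
        for name, values in scores.items()
        if all(value < mid for value, mid in zip(values, medians))
    ]


OUTLIERS = below_median_everywhere(SCORES)

if __name__ == "__main__":
    print(OUTLIERS)
```

With this assumed cutoff the flagged set is G93R, L183P, and R224W, the same three mutations named in the text.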
PSSM
We calculated the PSSM of our sequence with 5 PSI-BLAST iterations against the big database (TODO: REF).
The important values can be found in <xr id="pssm_matrix_important_positions"/>.
<figtable id="pssm_matrix_important_positions">
Mutation | PSSM value (wt) | Frequency (wt) | PSSM value (mt) | Frequency (mt) |
---|---|---|---|---|
M35T | 3 | 16% | 5 | 78% |
V53M | 5 | 99% | 1 | 1% |
G93R | 3 | 29% | -2 | 1% |
Q127H | 2 | 16% | -2 | 0% |
A162S | 5 | 100% | 1 | 0% |
L183P | 4 | 95% | -3 | 0% |
T217I | 2 | 16% | -2 | 0% |
R224W | 6 | 100% | -3 | 0% |
E277K | 6 | 100% | 0 | 0% |
C282S | 10 | 100% | -1 | 0% |
</figtable>
These values lead to our following conclusions:
M35T would not be predicted as a disease causing mutation, as the mutant occurs frequently (78%).
V53M would most probably be predicted as a disease causing mutation because of the high wildtype conservation (99%) and the low mutant frequency (1%). The only thing that does not fit this prediction is the positive PSSM value of the mutant (1), meaning the mutation is seen more often than expected, which could be a sign of a non-disease-causing mutation.
G93R is hard to predict based on the given numbers, as the wildtype is not very conserved (29%). However, because the mutation gets a value of -2 (meaning it occurs less often than expected), it is more likely to be disease causing. In total, 7 different amino acids were observed at this position (A, R, D, E, G, T, V).
Q127H is difficult to predict from the conservation of wildtype and mutant alone. As for G93R, the conclusion would be that the mutation is disease causing because the mutant occurs less often than expected. This is supported by the fact that only 3 different amino acids are observed at position 127 (Q, E, G), which might indicate the importance of this position for the protein.
A162S would be predicted as a disease causing mutation, based on the 100% conservation of the wildtype.
L183P would be predicted as a disease causing mutation, based on the wildtype conservation (95%) and the lower-than-expected frequency of the mutant.
T217I is another difficult case when only looking at wildtype and mutant conservation. It would probably be predicted as disease causing because of the lower-than-expected frequency of the mutant. This is supported by the fact that only two amino acids were observed at that position, meaning the position is fairly conserved.
R224W would be predicted as a disease causing mutation because of the high wildtype conservation (100%), supported by the lower than expected frequency of the mutant type.
E277K would be predicted as a disease causing mutation because of the high wildtype conservation (100%).
C282S would be predicted as a disease causing mutation because of the high wildtype conservation (100%), supported by the lower than expected frequency of the mutant type.
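The reasoning used in the conclusions above can be condensed into a small rule: a frequent mutant argues against a disease association, while a highly conserved wildtype or a negative mutant PSSM value argues for one. The sketch below encodes this with assumed thresholds (90% wildtype conservation, 50% mutant frequency) that are not stated in the original analysis; the data are copied from the table.

```python
# (wt PSSM value, wt frequency %, mt PSSM value, mt frequency %)
PSSM = {
    "M35T": (3, 16, 5, 78),
    "V53M": (5, 99, 1, 1),
    "G93R": (3, 29, -2, 1),
    "Q127H": (2, 16, -2, 0),
    "A162S": (5, 100, 1, 0),
    "L183P": (4, 95, -3, 0),
    "T217I": (2, 16, -2, 0),
    "R224W": (6, 100, -3, 0),
    "E277K": (6, 100, 0, 0),
    "C282S": (10, 100, -1, 0),
}


def predict(wt_score, wt_freq, mt_score, mt_freq, conserved=90, common=50):
    """Classify one mutation; `conserved` and `common` are assumed cutoffs in %."""
    if mt_freq >= common:
        return "not disease causing"   # mutant residue is common at this position
    if wt_freq >= conserved or mt_score < 0:
        return "disease causing"       # conserved wildtype or disfavoured mutant
    return "unclear"


PREDICTIONS = {name: predict(*values) for name, values in PSSM.items()}

if __name__ == "__main__":
    for name, call in PREDICTIONS.items():
        print(f"{name}: {call}")
```

Under these assumed cutoffs only M35T is called not disease causing, which mirrors the per-mutation conclusions listed above. Borderline cases such as V53M (positive mutant PSSM value but 99% wildtype conservation) show that the outcome depends on which criterion is given priority.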
MSA conservation
<xr id="msa_table"/> was derived from the MSAs of the HFE sequence and its homologs (found via PSI-BLAST). The MSAs can be seen here (Muscle MSA) and here (ClustalW MSA).
<figtable id="msa_table">
Mutation | Conservation score (Muscle) | Consensus (Muscle) | Consensus percentage (Muscle) | Conservation score (Clustal) | Consensus (Clustal) | Consensus percentage (Clustal) |
---|---|---|---|---|---|---|
M35T | 10 | T | 98% | 10 | T | 98% |
V53M | 10 | V | 99% | 10 | V | 99% |
G93R | 5 | A | 58% | 5 | A | 58% |
Q127H | 10 | V | 99% | 10 | V | 99% |
A162S | 11 | A | 100% | 11 | A | 100% |
L183P | 9 | L | 92% | 9 | L | 92% |
T217I | 10 | S | 99% | 10 | S | 99% |
R224W | 11 | R | 100% | 11 | R | 100% |
E277K | 10 | E | 99% | 10 | E | 99% |
C282S | 11 | C | 100% | 11 | C | 100% |
</figtable>
The predictions based on this information would be:
M35T: not disease causing (the consensus residue T, which is the mutant residue, occurs in 98% of the aligned sequences)
V53M: disease causing
Q127H: disease causing
A162S: disease causing
L183P: disease causing
T217I: disease causing
R224W: disease causing
E277K: disease causing
C282S: disease causing
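The list above is compatible with a simple reading of the MSA table: if the consensus residue equals the mutant residue the mutation is considered harmless, and otherwise a highly conserved column points towards a disease association. The sketch below uses an assumed conservation cutoff of 9, chosen only for illustration and not taken from the original analysis. With it, G93R (score 5) gets no call, which matches its absence from the list.

```python
# (conservation score, consensus residue, consensus %) from the Muscle columns;
# the Clustal columns in the table above are identical.
MSA = {
    "M35T": (10, "T", 98),
    "V53M": (10, "V", 99),
    "G93R": (5, "A", 58),
    "Q127H": (10, "V", 99),
    "A162S": (11, "A", 100),
    "L183P": (9, "L", 92),
    "T217I": (10, "S", 99),
    "R224W": (11, "R", 100),
    "E277K": (10, "E", 99),
    "C282S": (11, "C", 100),
}


def predict_from_msa(mutation, score, consensus, min_score=9):
    """Classify a mutation such as 'M35T'; `min_score` is an assumed cutoff."""
    mutant = mutation[-1]
    if consensus == mutant:
        return "not disease causing"   # the mutant is the consensus residue
    if score >= min_score:
        return "disease causing"       # column is highly conserved
    return "no call"


MSA_CALLS = {
    name: predict_from_msa(name, score, consensus)
    for name, (score, consensus, _percentage) in MSA.items()
}

if __name__ == "__main__":
    for name, call in MSA_CALLS.items():
        print(f"{name}: {call}")
```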
Secondary structure analysis
M35T
- Secondary structure assignment: sheet
SEQ (mt):      LRSHSLHYLFTGASEQDLGLS
DSSP (wt):     CCEEEEEEEEEEECCCCCCE
PsiPred (wt):  CCCCCCCEEEEEEECCCCCCC
PsiPred (mt):  CCCCCCCEEEEEEECCCCCCC
<figtable id="M35T_pymol">
</figtable>
M35T is located in the "sheet complex" of the MHC I domain. According to PyMol it forms two hydrogen bonds with the neighboring sheet (cf. <xr id="M35T_pymol"/>), which should stabilize the formation. These bonds are still present in the mutated protein. The right figure also shows that this substitution causes almost no clashes (shown as green blocks). This suggests that M35T is not a disease causing mutation.
V53M
- Secondary structure assignment: sheet
SEQ (mt):      GLSLFEALGYMDDQLFVFYDH
DSSP (wt):     CCECCEEEEEECCEEEEEEEC
PsiPred (wt):  CCCEEEEEEEECCEEEEEEEC
PsiPred (mt):  CCCEEEEEEEECCEEEEEEEC
<figtable id="V53M_pymol">
</figtable>
V53M seems to also have a stabilizing function (cf. hydrogen bonds shown in <xr id="V53M_pymol"/>) in the MHC I domain's sheet complex. Although this stabilization is retained by the mutant, the size of the methionine causes several clashes with one of the helices in the same domain. This means that the whole domain needs to undergo some structural corrections to fit in the mutation which might cause a decrease or loss in function.
G93R
- Secondary structure assignment: helix
SEQ (mt):      MWLQLSQSLKRWDHMFTVDFW
DSSP (wt):     HHHHHHHHHHHHHHHHHHHHH
PsiPred (wt):  HHHHHHHHHHHHHHHHHHHHH
PsiPred (mt):  HHHHHHHHHHHHHHHHHHHHH
<figtable id="G93R_pymol">
</figtable>
G93R is within the first big helix of HFE's MHC I domain. For a helical residue the hydrogen bonds are very important for stability. As shown in <xr id="G93R_pymol"/>, the new residue not only retains these bonds but also forms additional ones. There are also no clashes with other residues, which is very surprising considering the tripling in size (cf. <xr id="aa_features"/>). On the other hand, this residue is located on the outside of the protein. The new bulk might prevent proper complex formation with other proteins, or the new hydrogen bonds could make the protein too stiff.
Q127H
- Secondary structure assignment: coil
SEQ (mt):      TLQVILGCEMHEDNSTEGYWK
DSSP (wt):     EEEEEEEEEECCCCCEEEEEE
PsiPred (wt):  EEEEECCCCCCCCCCCCCEEE
PsiPred (mt):  CEEEECCCEECCCCCCCCCCE
<figtable id="Q127H_pymol">
</figtable>
Q127H is again within the sheet complex of the MHC I domain. It's just behind a sheet and seems to help in forming and stabilizing the loop to the next (neighboring) sheet (see <xr id="Q127H_pymol"/>). The substitution of the glutamine with a histidine causes the loss of one of the hydrogen bonds as well as some clashes with other residues. As the loss of the stabilizing bond might have severe effects on HFE's tertiary structure this mutation could be considered disease causing.
A162S
- Secondary structure assignment: helix
SEQ (mt):      TLDWRAAEPRSWPTKLEWERH
DSSP (wt):     HCEEEECCHHHHHHHHHHHCC
PsiPred (wt):  CCCEECCCCCHHHHHHHHHHH
PsiPred (mt):  CCCEECCCCCHHHHHHHHHHH
<figtable id="A162S_pymol">
</figtable>
A162S is at the beginning of a helix (within MHC I domain). The mutation causes some clashes as well as the formation of several new hydrogen bonds (cf. <xr id="A162S_pymol"/>). Unlike the previous mutations this one is buried within the protein which means that the increase in size does not directly affect the protein surface and therefore is unlikely to directly affect complex formation (although the additional stability through the new hydrogen bonds might). Overall this mutation seems unlikely to be disease causing.
L183P
- Secondary structure assignment: helix (trusting DSSP)
SEQ (mt):      KIRARQNRAYPERDCPAQLQQ
DSSP (wt):     CHHHHHHHHHHHHHHHHHHHH
PsiPred (wt):  HHHHHHHHCCCCCCHHHHHHH
PsiPred (mt):  HHHHHHHHCCCCCCHHHHHHH
<figtable id="L183P_pymol">
</figtable>
L183P seems very likely to be disease causing. It is located in one of the MHC I domain's big helices and causes severe clashes with other residues (cf. <xr id="L183P_pymol"/>). In addition, proline is known as "the helix breaker", which suggests a severe change in the secondary (and perhaps tertiary) structure in this section.
T217I
- Secondary structure assignment: coil
SEQ (mt):      PPLVKVTHHVISSVTTLRCRA
DSSP (wt):     CCEEEEEEEECCCCEEEEEEE
PsiPred (wt):  CCCEEEECCCCCCCCEEEEEE
PsiPred (mt):  CCCEEEECCCCCCCCEEEEEE
<figtable id="T217I_pymol">
</figtable>
T217I is located within a loop between two of the C1 domain's beta sheets. The wildtype residue forms several hydrogen bonds which should help in stabilizing the loop. The mutant loses all but one of these bonds and causes a few clashes (see <xr id="T217I_pymol"/>). While the loss of the hydrogen bonds surely destabilizes this region, what is perhaps the most important bond is retained. We consider it the most important one because it reaches directly across to the other side of the loop and therefore holds the two sheets close together. The other bonds are mainly located at the loop's turning point.
R224W
- Secondary structure assignment: sheet
SEQ (mt):      HHVTSSVTTLWCRALNYYPQN
DSSP (wt):     EEECCCCEEEEEEEEEEECCC
PsiPred (wt):  CCCCCCCCEEEEEECCCCCCC
PsiPred (mt):  CCCCCCCCEEEEEECCCCCCC
<figtable id="R224W_pymol">
</figtable>
R224W is part of one sheet inside the C1 domain and forms two hydrogen bonds with the neighboring sheet, thus stabilizing both sheets. Although the mutant retains these bonds, the increased size of the tryptophan causes several clashes with the other sheet's residues (cf. <xr id="R224W_pymol"/>). This might in turn destabilize the whole sheet formation. The severity of these clashes suggests that this mutation is likely to be disease causing.
E277K
- Secondary structure assignment: helix (trusting DSSP)
SEQ (mt):      WITLAVPPGEKQRYTCQVEHP
DSSP (wt):     EEEEEECCCHHHHEEEEEECC
PsiPred (wt):  EEEEEECCCCCCCEEEEEECC
PsiPred (mt):  EEEEEECCCCCCCEEEEEECC
<figtable id="E277K_pymol">
</figtable>
E277K lies within a short helix between two sheets in the C1 domain. The glutamic acid forms several hydrogen bonds: one to the tyrosine at the start of the following sheet, two within the helix (for the helix stabilization), and one to THR221, which is located at the start of another sheet within the C1 domain (see <xr id="E277K_pymol"/>). This suggests that it is a very important residue for the structure of the C1 domain. Thus the loss of two of these bonds (one helical and the one to THR221) in the mutant residue, as well as the clashes with other residues, strongly suggests classifying E277K as disease causing.
C282S
- Secondary structure assignment: sheet
SEQ (mt):      VPPGEEQRYTSQVEHPGLDQP
DSSP (wt):     ECCCHHHHEEEEEECCCCCCC
PsiPred (wt):  ECCCCCCCEEEEEECCCCCCC
PsiPred (mt):  ECCCCCCCCEEEEECCCCCCC
<figtable id="C282S_pymol">
</figtable>
C282S is located within another of the C1 domain's sheets. At first the mutation seems rather harmless (see <xr id="C282S_pymol"/>). It loses none of its hydrogen bonds, gains an additional one, and produces only a few clashes. It is also buried within the protein (no surface interactions). Nevertheless this mutation should have a severe impact on the tertiary structure of the C1 domain, as it destroys the only disulfide bond within this domain (C225-C282). Therefore it should be considered disease causing.
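The PsiPred predictions for wildtype and mutant listed for each SNP can also be compared position by position to see whether the substitution changes the predicted secondary structure. The following sketch does this for the two SNPs whose predictions differ (Q127H and C282S). The strings are copied from the listings above, and the function name `changed_positions` is hypothetical. In the listed windows the mutated residue appears at the center (index 10 when counting from 0).

```python
# PsiPred predictions (wildtype, mutant) for the 21-residue windows listed above.
PSIPRED = {
    "Q127H": ("EEEEECCCCCCCCCCCCCEEE", "CEEEECCCEECCCCCCCCCCE"),
    "C282S": ("ECCCCCCCEEEEEECCCCCCC", "ECCCCCCCCEEEEECCCCCCC"),
}


def changed_positions(wildtype, mutant):
    """Return the 0-based indices at which two state strings differ.

    Only the overlapping part is compared if the strings differ in length.
    """
    return [
        index
        for index, (wt_state, mt_state) in enumerate(zip(wildtype, mutant))
        if wt_state != mt_state
    ]


if __name__ == "__main__":
    for snp, (wildtype, mutant) in PSIPRED.items():
        positions = changed_positions(wildtype, mutant)
        print(f"{snp}: {len(positions)} changed position(s) at {positions}")
```

For all other SNPs the wildtype and mutant PsiPred strings listed above are identical, so the function would return an empty list.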
Predictions
SIFT
<figtable id="sift_results">
Mutation | Score | Prediction |
---|---|---|
M35T | 1.00 | TOLERATED |
V53M | 0.00 | AFFECT PROTEIN FUNCTION |
G93R | 0.26 | TOLERATED |
Q127H | 0.01 | AFFECT PROTEIN FUNCTION |
A162S | 0.02 | AFFECT PROTEIN FUNCTION |
L183P | 0.00 | AFFECT PROTEIN FUNCTION |
T217I | 0.92 | TOLERATED |
R224W | 0.00 | AFFECT PROTEIN FUNCTION |
E277K | 0.04 | AFFECT PROTEIN FUNCTION |
C282S | 0.00 | AFFECT PROTEIN FUNCTION |
</figtable>
SNAP
<figtable id="snap_results">
Mutation | Prediction | Reliability | Expected Accuracy |
---|---|---|---|
M35T | Neutral | 1 | 53% |
V53M | Neutral | 6 | 49% |
G93R | Neutral | 0 | 51% |
Q127H | Neutral | 7 | 85% |
A162S | Neutral | 8 | 91% |
L183P | Non-neutral | 1 | 60% |
T217I | Non-neutral | 1 | 60% |
R224W | Non-neutral | 6 | 77% |
E277K | Neutral | 1 | 53% |
C282S | Non-neutral | 2 | 63% |
</figtable>
PolyPhen2
<figtable id="polyphen2_results">
Mutation | Prediction |
---|---|
M35T | possibly damaging |
V53M | possibly damaging |
G93R | possibly damaging |
Q127H | benign |
A162S | possibly damaging |
L183P | possibly damaging |
T217I | benign |
R224W | possibly damaging |
E277K | possibly damaging |
C282S | possibly damaging |
</figtable>
Conclusion
<figtable id="consensus_tab">
Mutation | AA features | BLOSUM62 | PAM1 | PAM250 | PSSM | MSA | Secondary structure | SIFT | SNAP | PolyPhen2 | Consensus | Validation | Source |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
M35T | neutral | neutral | neutral | neutral | benign | benign | benign | benign | benign | malign | benign | benign | SNPdbe (cluster) |
V53M | neutral | benign | neutral | malign | neutral | malign | neutral | malign | neutral | malign | malign | malign | HGMD, SNPdbe |
G93R | malign | malign | malign | malign | malign | neutral | neutral | neutral | benign | malign | malign | malign | HGMD, SNPdbe |
Q127H | benign | neutral | benign | benign | malign | neutral | malign | malign | benign | benign | benign | malign | HGMD, SNPdbe |
A162S | neutral | benign | benign | benign | neutral | malign | neutral | malign | benign | malign | benign | benign | SNPdbe (freq) |
L183P | malign | malign | malign | malign | malign | malign | malign | malign | malign | malign | malign | malign | HGMD |
T217I | malign | neutral | benign | neutral | malign | neutral | neutral | benign | malign | benign | neutral | benign | SNPdbe (cluster, freq) |
R224W | malign | malign | malign | malign | malign | malign | malign | malign | malign | malign | malign | benign | SNPdbe (freq) |
E277K | malign | benign | benign | benign | neutral | malign | malign | malign | benign | malign | malign | malign | HGMD |
C282S | neutral | neutral | benign | benign | malign | malign | malign | malign | malign | malign | malign | malign | HGMD |
</figtable>
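The table does not state how the Consensus column was derived. A simple vote that ignores "neutral" calls and returns "neutral" on a tie is consistent with all ten rows, so the sketch below uses that rule as an assumption and then counts how often the consensus agrees with the Validation column. The per-method calls are copied from the table in column order (AA features through PolyPhen2), abbreviated as n, b, and m.

```python
EXPAND = {"n": "neutral", "b": "benign", "m": "malign"}

# mutation: (ten method calls in column order, listed consensus, validation)
TABLE = {
    "M35T": ("n n n n b b b b b m", "benign", "benign"),
    "V53M": ("n b n m n m n m n m", "malign", "malign"),
    "G93R": ("m m m m m n n n b m", "malign", "malign"),
    "Q127H": ("b n b b m n m m b b", "benign", "malign"),
    "A162S": ("n b b b n m n m b m", "benign", "benign"),
    "L183P": ("m m m m m m m m m m", "malign", "malign"),
    "T217I": ("m n b n m n n b m b", "neutral", "benign"),
    "R224W": ("m m m m m m m m m m", "malign", "benign"),
    "E277K": ("m b b b n m m m b m", "malign", "malign"),
    "C282S": ("n n b b m m m m m m", "malign", "malign"),
}


def vote(calls):
    """Majority of malign vs. benign calls; neutral calls are ignored."""
    malign = calls.count("malign")
    benign = calls.count("benign")
    if malign > benign:
        return "malign"
    if benign > malign:
        return "benign"
    return "neutral"


VOTED = {
    name: vote([EXPAND[code] for code in codes.split()])
    for name, (codes, _listed, _validation) in TABLE.items()
}
AGREEMENT = sum(1 for name, (_c, _l, validation) in TABLE.items()
                if VOTED[name] == validation)

if __name__ == "__main__":
    for name, (_codes, listed, validation) in TABLE.items():
        print(f"{name}: voted={VOTED[name]} listed={listed} validation={validation}")
    print(f"Consensus agrees with validation for {AGREEMENT} of {len(TABLE)} SNPs")
```

The disagreements are Q127H (consensus benign, validation malign), R224W (consensus malign, validation benign), and T217I, where the tie yields no clear call.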
References
<references/>