Sequence-Based Mutation Analysis Hemochromatosis

From Bioinformatikpedia

Revision as of 17:04, 18 June 2012

Hemochromatosis>>Task 6: Sequence-based mutation analysis


Short task description

Detailed description: Sequence-based mutation analysis


Protocol

A protocol with a description of the data acquisition and other scripts used for this task is available here.




SNPs

From MSUD: M35T V53M G93R Q127H A162S L183P T217I R224W E277K C282S


Amino acid features

As a first estimate of the SNPs' effects we looked at the physicochemical properties<ref name="wiki:aa">http://en.wikipedia.org/wiki/Amino_acid</ref><ref name="wiki:paa">http://en.wikipedia.org/wiki/Proteinogenic_amino_acid</ref> of the wildtype and mutant amino acids (see <xr id="aa_features"/>). We collected the hydropathy index<ref name="kyte_doolittle">Kyte J, Doolittle RF (May 1982). "A simple method for displaying the hydropathic character of a protein". Journal of Molecular Biology 157 (1): 105–32. DOI:10.1016/0022-2836(82)90515-0. PMID 7108955.</ref>, the polarity, the isoelectric point (charge), and the van der Waals volume of each amino acid.

The most striking differences can be spotted in the following mutations:

  • G93R, which goes from neutral hydropathy (-0.4) to highly hydrophilic (-4.5), from uncharged to positively charged, and roughly triples in size.
  • L183P, which shows a change of 5.4 in hydropathy (from hydrophobic to neutral).
  • T217I, the opposite of L183P, which goes from neutral (-0.7) to hydrophobic (4.5).
  • R224W, which becomes neutral (formerly hydrophilic) and loses its positive charge.
  • E277K, which changes from acidic (negatively charged) to basic (positively charged).

Based on this information alone, these SNPs could be considered disease causing.

<figtable id="aa_features">

Mutation Hydrophobicity (wt) Hydrophobicity (mt) Polarity (wt) Polarity (mt) pI (wt) pI (mt) v.d.W. volume (wt) v.d.W. volume (mt)
M35T 1.9 -0.7 nonpolar polar 5.74 5.60 124 93
V53M 4.2 1.9 nonpolar nonpolar 6.00 5.74 105 124
G93R -0.4 -4.5 nonpolar polar 6.06 10.76 48 148
Q127H -3.5 -3.2 polar polar 5.65 7.60 114 118
A162S 1.8 -0.8 nonpolar polar 6.01 5.68 67 73
L183P 3.8 -1.6 nonpolar nonpolar 6.01 6.30 124 90
T217I -0.7 4.5 polar nonpolar 5.60 6.05 93 124
R224W -4.5 -0.9 polar nonpolar 10.76 5.89 148 163
E277K -3.5 -3.9 polar polar 3.15 9.60 109 135
C282S 2.5 -0.8 polar polar 5.05 5.68 86 73
Table 1: Comparison of the physicochemical properties between the wildtype (wt) and mutant (mt) amino acids. From left to right: the hydropathy index, the polarity, the isoelectric point (pI), and the van der Waals volume.

</figtable>
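The property changes listed above can be recomputed directly from Table 1. The following Python sketch is only an illustration, not the script used for this task: it hard-codes the Kyte-Doolittle hydropathy values and van der Waals volumes from Table 1 for the residues involved, and the helper names (`parse_mutation`, `property_changes`) are hypothetical.

```python
# Kyte-Doolittle hydropathy index and van der Waals volume, copied from Table 1
# (only the residues that occur in the ten mutations are listed).
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9,
}
VOLUME = {
    "A": 67, "C": 86, "E": 109, "G": 48, "H": 118, "I": 124, "K": 135, "L": 124,
    "M": 124, "P": 90, "Q": 114, "R": 148, "S": 73, "T": 93, "V": 105, "W": 163,
}


def parse_mutation(code):
    """Split a mutation code such as 'G93R' into (wildtype, position, mutant)."""
    return code[0], int(code[1:-1]), code[-1]


def property_changes(code):
    """Return the hydropathy difference (mt - wt) and the volume ratio (mt / wt)."""
    wt, _, mt = parse_mutation(code)
    return {
        "delta_hydropathy": round(HYDROPATHY[mt] - HYDROPATHY[wt], 1),
        "volume_ratio": round(VOLUME[mt] / VOLUME[wt], 2),
    }


for mutation in ["G93R", "L183P", "T217I", "R224W", "E277K"]:
    print(mutation, property_changes(mutation))
```

For example, L183P yields a hydropathy difference of -5.4 and G93R a volume ratio of about 3.08, matching the striking cases listed above.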


Evolutionary analysis

The next step was to look at the evolutionary constraints on each mutation. We started with simple substitution matrices such as BLOSUM62<ref name="blosum62">http://www.ncbi.nlm.nih.gov/Class/BLAST/BLOSUM62.txt</ref>, PAM1<ref name="pam1">http://www.icp.ucl.ac.be/~opperd/private/pam1.html</ref>, and PAM250<ref name="pam250">http://www.icp.ucl.ac.be/~opperd/private/pam250.html</ref>, and then moved on to more sophisticated methods such as PSI-BLAST's PSSM and multiple sequence alignments.


BLOSUM62/PAM1/PAM250

When looking at all three matrices (cf. <xr id="bpp_matrices"/>), G93R, L183P, and R224W stand out as the most unlikely substitutions of all 10 SNPs. M35T and T217I, while still quite rare, do not show such a strong signal. V53M and C282S are special cases: while BLOSUM62 ranks V53M as a more or less common substitution, PAM1/PAM250 classify it as very rare. For C282S it's the other way around.

<figtable id="bpp_matrices">

Mutation BLOSUM62 PAM1 PAM250
M35T -1 6 500
V53M 1 4 200
G93R -2 0 200
Q127H 0 20 700
A162S 1 28 900
L183P -3 2 300
T217I -1 7 400
R224W -3 2 200
E277K 1 7 800
C282S -1 11 700
Table 2: Summary of the BLOSUM62, PAM1, and PAM250 scores for the different mutations. For better readability the PAM1 and PAM250 values are multiplied by 10000.

</figtable>
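The BLOSUM62, PAM1, and PAM250 columns of Table 18 in the conclusion can be reproduced from Table 2 with simple cut-offs. These cut-offs are not stated explicitly anywhere in this analysis; they are inferred from the two tables and should be treated as an assumption. The Python sketch below illustrates such a threshold-based labeling; the names `classify`, `label_all`, and `CUTOFFS` are hypothetical.

```python
# Scores from Table 2: (BLOSUM62, PAM1 x 10000, PAM250 x 10000).
SCORES = {
    "M35T": (-1, 6, 500), "V53M": (1, 4, 200), "G93R": (-2, 0, 200),
    "Q127H": (0, 20, 700), "A162S": (1, 28, 900), "L183P": (-3, 2, 300),
    "T217I": (-1, 7, 400), "R224W": (-3, 2, 200), "E277K": (1, 7, 800),
    "C282S": (-1, 11, 700),
}

# Assumed cut-offs, inferred from Table 18 rather than taken from the original analysis:
# malign if score <= first value, benign if score >= second value, neutral otherwise.
CUTOFFS = {"BLOSUM62": (-2, 1), "PAM1": (2, 7), "PAM250": (300, 700)}


def classify(score, malign_max, benign_min):
    """Map a substitution score to 'malign', 'benign', or 'neutral'."""
    if score <= malign_max:
        return "malign"
    if score >= benign_min:
        return "benign"
    return "neutral"


def label_all(scores=SCORES, cutoffs=CUTOFFS):
    """Return {mutation: {matrix: label}}; matrix order follows the CUTOFFS keys."""
    matrices = list(cutoffs)
    return {
        mutation: {
            matrix: classify(value, *cutoffs[matrix])
            for matrix, value in zip(matrices, values)
        }
        for mutation, values in scores.items()
    }


for mutation, labels in label_all().items():
    print(mutation, labels)
```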


PSSM

We calculated the PSSM of our sequence with PSI-BLAST (5 iterations) against the "big" database.

The relevant values can be found in <xr id="pssm_matrix_important_positions"/>.

<figtable id="pssm_matrix_important_positions">

Mutation PSSM-value (wt) frequency (wt) PSSM-value (mt) frequency (mt)
M35T 3 16% 5 78%
V53M 5 99% 1 1%
G93R 3 29% -2 1%
Q127H 2 16% -2 0%
A162S 5 100% 1 0%
L183P 4 95% -3 0%
T217I 2 16% -2 0%
R224W 6 100% -3 0%
E277K 6 100% 0 0%
C282S 10 100% -1 0%
Table 3: Position-specific scores (PSSM-values) and observed frequencies of the wildtype (wt) and mutant (mt) residues at each mutated position, extracted from the PSSM.

</figtable>
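The values in Table 3 can be read from PSI-BLAST's ASCII PSSM output (for example as written by `psiblast -num_iterations 5 -out_ascii_pssm ...`). The following Python sketch assumes the usual layout of that file, in which each data row holds the position, the query residue, 20 log-odds scores, and 20 weighted percentages in the column order ARNDCQEGHILKMFPSTWYV, followed by two summary columns. The function names are hypothetical, and the demo row only mimics the V53M values of Table 3 (all other columns are set to 0).

```python
# Assumed column order of the 20 scores and 20 percentages in PSI-BLAST's ASCII PSSM.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"


def parse_ascii_pssm(text):
    """Return {position: (query_residue, scores, percentages)} from an ASCII PSSM."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a 1-based position followed by a one-letter residue;
        # header and footer lines do not match this pattern and are skipped.
        if len(fields) >= 42 and fields[0].isdigit() and len(fields[1]) == 1:
            scores = dict(zip(AA_ORDER, map(int, fields[2:22])))
            percentages = dict(zip(AA_ORDER, map(int, fields[22:42])))
            rows[int(fields[0])] = (fields[1], scores, percentages)
    return rows


def pssm_entry(rows, mutation):
    """Return (wt score, wt %, mt score, mt %) for a mutation code like 'V53M'."""
    wt, pos, mt = mutation[0], int(mutation[1:-1]), mutation[-1]
    residue, scores, percentages = rows[pos]
    if residue != wt:
        raise ValueError(f"query has {residue} at position {pos}, not {wt}")
    return scores[wt], percentages[wt], scores[mt], percentages[mt]


# Demo row for position 53, mimicking the V53M values of Table 3.
demo_scores = [5 if aa == "V" else 1 if aa == "M" else 0 for aa in AA_ORDER]
demo_percent = [99 if aa == "V" else 1 if aa == "M" else 0 for aa in AA_ORDER]
demo_row = "   53 V " + " ".join(map(str, demo_scores + demo_percent)) + "  0.95 0.40"
print(pssm_entry(parse_ascii_pssm(demo_row), "V53M"))  # (5, 99, 1, 1)
```

Checking the wildtype letter against the query residue guards against off-by-one errors between the SNP numbering and the PSSM positions.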

These values lead to the following conclusions:


M35T would not be predicted as a disease causing mutation, as the mutant residue occurs frequently at this position (78%).

V53M would most probably be predicted as a disease causing mutation, because the wildtype is highly conserved at this position (99%) and the mutant residue is observed in only 1% of the sequences. The only thing that does not fit this prediction is the positive PSSM-value of the mutant (1), meaning that methionine is seen more often than expected, which could be a sign of a non-disease-causing mutation.

G93R is hard to predict based on these numbers alone, as the wildtype is not very conserved (29%). However, the mutant gets a PSSM-value of -2 (meaning that arginine occurs less often than expected at this position), which makes this mutation more likely to be disease causing. In total, 7 different amino acids were observed at this position (A, R, D, E, G, T, V).

Q127H would be quite difficult to predict based only on the conservation of the wildtype and mutant residues. As for G93R, the conclusion would be that the mutation is disease causing, because the mutant residue occurs less often than expected. A further fact supporting this prediction is that only 3 different amino acids are observed at position 127 (Q, E, G), which might indicate that this position is important for the protein.

A162S would be predicted as a disease causing mutation, based on the 100% conservation of the wildtype.

L183P would be predicted as a disease causing mutation, based on the wildtype conservation (95%) and the lower-than-expected frequency of the mutant residue.

T217I is another difficult case when only looking at wildtype and mutant conservation. It would probably be predicted as disease causing because of the lower-than-expected frequency of the mutant residue. A further fact supporting this prediction is that only two amino acids were observed at that position, meaning that the position is fairly conserved.

R224W would be predicted as a disease causing mutation because of the high wildtype conservation (100%), supported by the lower-than-expected frequency of the mutant residue.

E277K would be predicted as a disease causing mutation because of the high wildtype conservation (100%).

C282S would be predicted as a disease causing mutation because of the high wildtype conservation (100%), supported by the lower-than-expected frequency of the mutant residue.



MSA conservation

<xr id="msa_table"/> was derived from the MSAs of the HFE sequence and its homologs (found via PSI-BLAST). The MSAs can be seen here (Muscle MSA) and here (ClustalW MSA).

<figtable id="msa_table">

Mutation Conservation score (Muscle) Consensus (Muscle) Consensus percentage (Muscle) Conservation score (ClustalW) Consensus (ClustalW) Consensus percentage (ClustalW)
M35T 10 T 98% 10 T 98%
V53M 10 V 99% 10 V 99%
G93R 5 A 58% 5 A 58%
Q127H 10 V 99% 10 V 99%
A162S 11 A 100% 11 A 100%
L183P 9 L 92% 9 L 92%
T217I 10 S 99% 10 S 99%
R224W 11 R 100% 11 R 100%
E277K 10 E 99% 10 E 99%
C282S 11 C 100% 11 C 100%
Table 4: Conservation scores, consensus residues, and consensus percentages at each mutated position. The MSAs were computed with Muscle and ClustalW and contain the HFE sequence as well as its homologs (found via PSI-BLAST).

</figtable>
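The consensus residue and consensus percentage in Table 4 are simple column statistics of the alignment. A minimal Python sketch, assuming the MSA is already loaded as a list of equal-length aligned strings with the HFE sequence as the first row; whether gaps count towards the percentage is an assumption exposed as the `ignore_gaps` parameter, and the conservation score (0-11) is not reproduced here. The function names and the toy alignment are hypothetical.

```python
from collections import Counter

GAPS = "-."


def alignment_column(aligned_query, residue_number):
    """Map a 1-based residue number of the (gapped) query row to a 0-based column."""
    seen = 0
    for column, char in enumerate(aligned_query):
        if char not in GAPS:
            seen += 1
            if seen == residue_number:
                return column
    raise ValueError(f"query has fewer than {residue_number} residues")


def column_consensus(aligned_sequences, column, ignore_gaps=True):
    """Return (consensus residue, percentage of sequences carrying it) for a column."""
    residues = [seq[column] for seq in aligned_sequences]
    if ignore_gaps:  # assumption: gaps are excluded from the percentage by default
        residues = [r for r in residues if r not in GAPS]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, 100.0 * count / len(residues)


# Toy alignment (made up); the first row plays the role of the HFE query.
toy_msa = ["MA-T", "MSKT", "MA-S", "LA-T"]
col = alignment_column(toy_msa[0], 3)
print(col, column_consensus(toy_msa, col))  # 3 ('T', 75.0)
```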


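The predictions based on this information would be:

M35T: not disease causing (the mutant residue T is the consensus at this position, occurring in 98% of the sequences)

V53M: disease causing

G93R: unclear (the position is only 58% conserved, and the consensus residue A is neither the wildtype nor the mutant)

Q127H: disease causing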

A162S: disease causing

L183P: disease causing

T217I: disease causing

R224W: disease causing

E277K: disease causing

C282S: disease causing

Secondary structure analysis

We also looked at the secondary structure around every mutation and tried to estimate the effects on the mutated protein structure. To simulate the mutations we used PyMol's "Mutagenesis wizard"<ref name="pymol:mutagenesis">http://www.pymolwiki.org/index.php/Mutagenesis</ref>, choosing for each mutant the rotamer that caused the fewest clashes with other residues. We also generated a SwissModel for each mutation (template 1a6z). In addition we predicted the secondary structure of the mutated sequences with PsiPred, but this did not yield any significant results.
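Each secondary structure block below shows a 21-residue window with the mutated residue in the middle. The following Python sketch assumes exactly that layout (mutated residue at index 10) and reports the assignment at the mutated residue as well as the residue numbers where the wildtype and mutant predictions differ; the helper names are hypothetical.

```python
STATES = {"H": "helix", "E": "sheet", "C": "coil"}
CENTER = 10  # assumed index of the mutated residue in a 21-residue window


def center_state(assignment, center=CENTER):
    """Return the secondary structure state at the mutated residue."""
    return STATES[assignment[center]]


def changed_positions(wt, mt, mutated_residue, center=CENTER):
    """Residue numbers where the wildtype and mutant predictions disagree."""
    if len(wt) != len(mt):
        raise ValueError("windows must have the same length")
    first = mutated_residue - center  # residue number of the first window position
    return [first + i for i, (a, b) in enumerate(zip(wt, mt)) if a != b]


# C282S: DSSP of the wildtype and PsiPred predictions for wildtype and mutant.
dssp_wt = "ECCCHHHHEEEEEECCCCCCC"
psipred_wt = "ECCCCCCCEEEEEECCCCCCC"
psipred_mt = "ECCCCCCCCEEEEECCCCCCC"
print(center_state(dssp_wt))                           # sheet
print(changed_positions(psipred_wt, psipred_mt, 282))  # [280]
```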


M35T

  • Secondary structure assignment: sheet
SEQ (mt):     LRSHSLHYLFTGASEQDLGLS
DSSP (wt):     CCEEEEEEEEEEECCCCCCE
PsiPred (wt): CCCCCCCEEEEEEECCCCCCC
PsiPred (mt): CCCCCCCEEEEEEECCCCCCC

<figtable id="M35T_pymol">

Wildtype.
Mutant.
SwissModel.
Table 5: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for M35T. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

M35T is located in the "sheet complex" of the MHC I domain. According to PyMol it forms two hydrogen bonds with the neighboring sheet (cf. <xr id="M35T_pymol"/>), which should stabilize the sheets. These bonds are still present in the mutated protein. The right figure also shows that this substitution causes almost no clashes (shown as green disks). In the SwissModel an additional hydrogen bond is formed parallel to the old ones, which should stabilize the sheets even further. This suggests that M35T is not a disease causing mutation.


V53M

  • Secondary structure assignment: sheet
SEQ (mt):     GLSLFEALGYMDDQLFVFYDH
DSSP (wt):    CCECCEEEEEECCEEEEEEEC
PsiPred (wt): CCCEEEEEEEECCEEEEEEEC
PsiPred (mt): CCCEEEEEEEECCEEEEEEEC

<figtable id="V53M_pymol">

Wildtype.
Mutant.
SwissModel.
Table 6: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for V53M. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

V53M also seems to have a stabilizing function (cf. the hydrogen bonds shown in <xr id="V53M_pymol"/>) in the MHC I domain's sheet complex. Although the mutant retains this stabilization, the size of the methionine causes several clashes with one of the helices in the same domain. This means that the whole domain needs to undergo some structural adjustments to accommodate the mutation, which might cause a decrease or loss of function. The SwissModel is almost identical to the PyMol mutation, except that a different rotamer was chosen. Since this rotamer would have caused more clashes in the PyMol simulation, this suggests an even bigger structural change.


G93R

  • Secondary structure assignment: helix
SEQ (mt):     MWLQLSQSLKRWDHMFTVDFW
DSSP (wt):    HHHHHHHHHHHHHHHHHHHHH
PsiPred (wt): HHHHHHHHHHHHHHHHHHHHH
PsiPred (mt): HHHHHHHHHHHHHHHHHHHHH

<figtable id="G93R_pymol">

Wildtype.
Mutant.
SwissModel.
Table 7: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for G93R. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

G93R is located within the first big helix of HFE's MHC I domain. For a helical residue, hydrogen bonds are very important for stability. As shown in <xr id="G93R_pymol"/>, the new residue not only retains these bonds but also forms additional ones. There are also no clashes with other residues, which is very surprising considering the roughly threefold increase in size (cf. <xr id="aa_features"/>). In the SwissModel a slightly different rotamer was chosen, which again adds another hydrogen bond. On the other hand, this residue is located on the surface of the protein, so the new "bulk" might still prevent proper complex formation with other proteins.


Q127H

  • Secondary structure assignment: coil
SEQ (mt):     TLQVILGCEMHEDNSTEGYWK
DSSP (wt):    EEEEEEEEEECCCCCEEEEEE
PsiPred (wt): EEEEECCCCCCCCCCCCCEEE
PsiPred (mt): CEEEECCCEECCCCCCCCCCE

<figtable id="Q127H_pymol">

Wildtype.
Mutant.
SwissModel.
Table 8: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for Q127H. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

Q127H is again located within the sheet complex of the MHC I domain. It lies just behind a sheet and seems to help form and stabilize the loop to the next (neighboring) sheet (see <xr id="Q127H_pymol"/>). The substitution of the glutamine with a histidine causes the loss of one of the hydrogen bonds as well as some clashes with other residues. While the SwissModel retains the original number of three hydrogen bonds, it is still missing the hydrogen bond to the preceding glutamic acid (E125). This might destabilize the preceding sheet (cf. DSSP annotation). As the loss of this stabilizing bond might have severe effects on HFE's secondary/tertiary structure, this mutation could be considered disease causing.


A162S

  • Secondary structure assignment: helix
SEQ (mt):     TLDWRAAEPRSWPTKLEWERH
DSSP (wt):    HCEEEECCHHHHHHHHHHHCC
PsiPred (wt): CCCEECCCCCHHHHHHHHHHH
PsiPred (mt): CCCEECCCCCHHHHHHHHHHH

<figtable id="A162S_pymol">

Wildtype.
Mutant.
SwissModel.
Table 9: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for A162S. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

A162S is located at the beginning of a helix (within the MHC I domain). The mutation causes some clashes as well as the formation of several new hydrogen bonds (cf. <xr id="A162S_pymol"/>). Unlike the previous mutations, this one is buried within the protein, which means that the increase in size does not directly affect the protein surface and is therefore unlikely to directly affect complex formation (although the additional stability through the new hydrogen bonds might). Overall this mutation seems unlikely to be disease causing. The SwissModel, which forms fewer additional hydrogen bonds, further supports this classification.


L183P

  • Secondary structure assignment: helix (trusting DSSP)
SEQ (mt):     KIRARQNRAYPERDCPAQLQQ
DSSP (wt):    CHHHHHHHHHHHHHHHHHHHH
PsiPred (wt): HHHHHHHHCCCCCCHHHHHHH
PsiPred (mt): HHHHHHHHCCCCCCHHHHHHH

<figtable id="L183P_pymol">

Wildtype.
Mutant.
SwissModel.
Table 10: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for L183P. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

L183P seems very likely to be disease causing. It is located in one of the MHC I domain's big helices and causes severe clashes with other residues (cf. <xr id="L183P_pymol"/>). In addition, proline is known as a "helix breaker", which suggests a severe change in the secondary (and perhaps tertiary) structure of this section. The consequences of the clashes are visible in the SwissModel: both hydrogen bonds are lost, which should destabilize the helix. Strangely, though, SwissModel still models this region as a helix.


T217I

  • Secondary structure assignment: coil
SEQ (mt):     PPLVKVTHHVISSVTTLRCRA
DSSP (wt):    CCEEEEEEEECCCCEEEEEEE
PsiPred (wt): CCCEEEECCCCCCCCEEEEEE
PsiPred (mt): CCCEEEECCCCCCCCEEEEEE

<figtable id="T217I_pymol">

Wildtype.
Mutant.
SwissModel.
Table 11: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for T217I. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

T217I is located within a loop between two of the C1 domain's beta sheets. The wildtype residue forms several hydrogen bonds which should help stabilize the loop. The mutant loses all but one of these bonds and causes a few clashes (see <xr id="T217I_pymol"/>). While the loss of these hydrogen bonds surely destabilizes this region, the perhaps most important one is retained: this bond reaches directly across to the other side of the loop and therefore holds the two sheets close together, whereas the other bonds are mainly located at the loop's turning point. The SwissModel is almost identical to the PyMol mutation (minor rotamer change).


R224W

  • Secondary structure assignment: sheet
SEQ (mt):     HHVTSSVTTLWCRALNYYPQN
DSSP (wt):    EEECCCCEEEEEEEEEEECCC
PsiPred (wt): CCCCCCCCEEEEEECCCCCCC
PsiPred (mt): CCCCCCCCEEEEEECCCCCCC

<figtable id="R224W_pymol">

Wildtype.
Mutant.
SwissModel.
Table 12: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for R224W. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

R224W is part of a sheet inside the C1 domain and forms two hydrogen bonds with the neighboring sheet, thus stabilizing both sheets. Although the mutant retains these bonds, the larger tryptophan causes several clashes with the other sheet's residues (cf. <xr id="R224W_pymol"/>) as well as a big "bulk" on the protein surface. This might in turn destabilize the whole sheet formation. SwissModel uses a different rotamer, which should cause even more clashes. The severity of these clashes suggests that this mutation is likely to be disease causing.


E277K

  • Secondary structure assignment: helix (trusting DSSP)
SEQ (mt):     WITLAVPPGEKQRYTCQVEHP
DSSP (wt):    EEEEEECCCHHHHEEEEEECC
PsiPred (wt): EEEEEECCCCCCCEEEEEECC
PsiPred (mt): EEEEEECCCCCCCEEEEEECC

<figtable id="E277K_pymol">

Wildtype.
Mutant.
SwissModel.
Table 13: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for E277K. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

E277K lies within a short helix between two sheets in the C1 domain. The glutamic acid forms several hydrogen bonds: one to the tyrosine at the start of the following sheet, two within the helix (stabilizing the helix), and one to THR221, which is located at the start of another sheet within the C1 domain (see <xr id="E277K_pymol"/>). This suggests that it is a very important residue for the structure of the C1 domain. The loss of two of these bonds in the mutant (one helical bond and the one to THR221), together with the clashes with other residues, therefore strongly suggests that E277K should be classified as disease causing. The SwissModel, which should be a better simulation than the PyMol mutation, even loses all but one of the hydrogen bonds.


C282S

  • Secondary structure assignment: sheet
SEQ (mt):     VPPGEEQRYTSQVEHPGLDQP
DSSP (wt):    ECCCHHHHEEEEEECCCCCCC
PsiPred (wt): ECCCCCCCEEEEEECCCCCCC
PsiPred (mt): ECCCCCCCCEEEEECCCCCCC

<figtable id="C282S_pymol">

Wildtype.
Mutant.
SwissModel.
Table 14: Comparison between the wildtype (left), mutant (right), and mutated SwissModel (bottom) residues for C282S. The mutation was generated with PyMol's Mutagenesis wizard. Hydrogen bonds are depicted in yellow, clashes with other residues as green or red disks.

</figtable>

C282S is located within another of the C1 domain's sheets. At first glance the mutation seems rather harmless (see <xr id="C282S_pymol"/>): it loses none of its hydrogen bonds, gains an additional one, and produces only a few clashes. It is also buried within the protein (no surface interactions). Nevertheless, this mutation should have a severe impact on the tertiary structure of the C1 domain, as it destroys the only disulfide bond within this domain (C225-C282). Therefore it should be considered disease causing.


Predictions

For further testing we predicted the impact of the mutations with SIFT<ref name="sift">http://sift.jcvi.org/</ref>, SNAP<ref name="snap">http://www.rostlab.org/services/snap/</ref>, and PolyPhen2<ref name="polyphen2">http://genetics.bwh.harvard.edu/pph2/index.shtml</ref>. The results are shown in the following tables.

SIFT

<figtable id="sift_results">

Mutation Score Prediction
M35T 1.00 TOLERATED
V53M 0.00 AFFECT PROTEIN FUNCTION
G93R 0.26 TOLERATED
Q127H 0.01 AFFECT PROTEIN FUNCTION
A162S 0.02 AFFECT PROTEIN FUNCTION
L183P 0.00 AFFECT PROTEIN FUNCTION
T217I 0.92 TOLERATED
R224W 0.00 AFFECT PROTEIN FUNCTION
E277K 0.04 AFFECT PROTEIN FUNCTION
C282S 0.00 AFFECT PROTEIN FUNCTION
Table 15: The predictions and scores of each mutation (calculated by SIFT).

</figtable>
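SIFT labels a substitution as affecting protein function when its score falls at or below a cut-off, commonly 0.05. The Python sketch below assumes that default cut-off and checks it against the scores in Table 15; `sift_label` is a hypothetical helper, not part of SIFT itself.

```python
# Scores reported by SIFT (Table 15).
SIFT_SCORES = {
    "M35T": 1.00, "V53M": 0.00, "G93R": 0.26, "Q127H": 0.01, "A162S": 0.02,
    "L183P": 0.00, "T217I": 0.92, "R224W": 0.00, "E277K": 0.04, "C282S": 0.00,
}


def sift_label(score, cutoff=0.05):
    """Assumed SIFT rule: scores at or below the cut-off are predicted damaging."""
    return "AFFECT PROTEIN FUNCTION" if score <= cutoff else "TOLERATED"


for mutation, score in SIFT_SCORES.items():
    print(f"{mutation:6} {score:.2f} {sift_label(score)}")
```

With this cut-off all ten labels in Table 15 are reproduced; E277K (0.04) lies closest to the boundary.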


SNAP

<figtable id="snap_results">

Mutation Prediction Reliability Expected Accuracy # of non-neutral mutations at that position
M35T Neutral 1 53% 7
V53M Neutral 6 49% 3
G93R Neutral 0 51% 9
Q127H Neutral 7 85% 1
A162S Neutral 8 91% 1
L183P Non-neutral 1 60% 5
T217I Non-neutral 1 60% 10
R224W Non-neutral 6 77% 18
E277K Neutral 1 53% 1
C282S Non-neutral 2 63% 16
Table 16: The predictions and scores of each mutation (calculated by SNAP).

</figtable>

For SNAP<ref name="snap">http://www.rostlab.org/services/snap/</ref> we also noted the total number of non-neutral mutations predicted at each position. As shown in <xr id="snap_results"/>, the most important positions seem to be 224 and 282. For the other positions the picture is less clear, as this number does not reflect the properties of the amino acids causing the non-neutral mutations.

<figure id="figure_nnm">

Figure 1: Graph showing the number of "Non-neutral" mutations predicted by SNAP<ref name="snap">http://www.rostlab.org/services/snap/</ref> at each position of the HFE protein.

</figure>

When looking at the general distribution of these values (<xr id="figure_nnm"/>), there seem to be no areas in which mutations would have a severe effect, apart from positions 250-260. As no helix or sheet is annotated at these positions, this cluster is most probably random. This also supports our observation in Task 5 that the SNP positions are scattered all over the protein.
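The region around positions 250-260 was identified by eye from Figure 1. One way to make such a judgment more systematic is a sliding-window average over the per-position counts of non-neutral SNAP predictions, sketched below in Python. This is not part of the original analysis: the window width and threshold are arbitrary assumptions, the input is assumed to be a plain list of counts (one per residue, starting at residue 1), and the demo counts are made up.

```python
def enriched_windows(counts, width=5, threshold=8.0):
    """Return (first residue, last residue, mean) for windows whose mean exceeds threshold.

    counts[i] is the number of non-neutral predictions at residue i + 1.
    Width and threshold are arbitrary defaults, not values from the analysis.
    """
    hits = []
    for start in range(len(counts) - width + 1):
        mean = sum(counts[start:start + width]) / width
        if mean > threshold:
            hits.append((start + 1, start + width, mean))
    return hits


# Made-up counts: a block of high values at residues 11-15.
toy_counts = [0] * 10 + [10] * 5 + [0] * 10
print(enriched_windows(toy_counts))  # [(11, 15, 10.0)]
```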


PolyPhen2

<figtable id="polyphen2_results">

Mutation Prediction
M35T possibly damaging
V53M possibly damaging
G93R possibly damaging
Q127H benign
A162S possibly damaging
L183P possibly damaging
T217I benign
R224W possibly damaging
E277K possibly damaging
C282S possibly damaging
Table 17: The prediction of each mutation (calculated by PolyPhen2).

</figtable>

The three methods disagree on most mutations; only L183P, R224W, and C282S are predicted as malign by all of them.

Conclusion

<xr id="consensus_tab"/> summarizes the classifications of the individual methods. Cases we couldn't clearly assign to one of the two categories (malign, benign) were classified as "neutral". This also includes some cases from SIFT and SNAP: G93R (SIFT) was labeled neutral because its score of only 0.26 is rather low for a tolerated substitution, and V53M (SNAP) because the expected accuracy was below 50%. The consensus for each mutation is simply the majority class (no weighting) among all methods.
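The unweighted majority vote described above can be sketched in a few lines of Python. The labels are copied from Table 18 (the ten method columns from "AA features" to "Polyphen2"). How ties would be broken is not specified, so the sketch simply keeps the label that `Counter.most_common` returns first (the first one encountered among equal counts), which is an assumption; none of the ten mutations actually produces a tie.

```python
from collections import Counter

# Labels of the ten methods in Table 18 (AA features ... Polyphen2), per mutation.
LABELS = {
    "M35T": "neutral neutral neutral neutral benign benign benign benign benign malign",
    "V53M": "neutral benign neutral malign neutral malign neutral malign neutral malign",
    "G93R": "malign malign malign malign malign neutral neutral neutral benign malign",
    "Q127H": "benign neutral benign benign malign neutral malign malign benign benign",
    "A162S": "neutral benign benign benign neutral malign neutral malign benign malign",
    "L183P": "malign malign malign malign malign malign malign malign malign malign",
    "T217I": "malign neutral benign neutral malign neutral neutral benign malign benign",
    "R224W": "malign malign malign malign malign malign malign malign malign malign",
    "E277K": "malign benign benign benign neutral malign malign malign benign malign",
    "C282S": "neutral neutral benign benign malign malign malign malign malign malign",
}


def consensus(labels):
    """Unweighted majority vote; ties go to the label encountered first (assumption)."""
    return Counter(labels).most_common(1)[0][0]


for mutation, row in LABELS.items():
    votes = row.split()
    print(mutation, consensus(votes), dict(Counter(votes)))
```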

When comparing the consensus to the actual annotations, 6 out of 10 mutations were predicted correctly. V53M and T217I could be considered partial matches, as they were classified as neutral. The most striking mismatch is R224W, which was predicted as malign by every method but is annotated as benign. On the other hand, there is indeed a malign mutation at this position, R224Q, which is labeled "altered iron status" in HGMD. We suspect that R224W simply hasn't been associated with any disease in SNPdbe yet, which led to its classification as "no effect".

To estimate the performance of the automated prediction methods (SIFT, SNAP, PolyPhen2) in our case study we used their original predictions (without "neutral" labels). SIFT reaches an accuracy of 70%, SNAP of 40%, and PolyPhen2 of 60%. These values should be treated with care, though, as we only had 10 mutations to test.
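The accuracies can be recomputed from the original outputs in Tables 15-17 and the validation column of Table 18. In this Python sketch a prediction counts as malign if SIFT says "AFFECT PROTEIN FUNCTION", SNAP says "Non-neutral", or PolyPhen2 says "possibly damaging"; this binarization is our reading of the tables, and the names (`accuracy`, `unanimous_malign`) are hypothetical. The same mapping also recovers the mutations on which all three tools agree.

```python
# Original predictions (SIFT, SNAP, PolyPhen2) from Tables 15-17.
PREDICTIONS = {
    "M35T": ("TOLERATED", "Neutral", "possibly damaging"),
    "V53M": ("AFFECT PROTEIN FUNCTION", "Neutral", "possibly damaging"),
    "G93R": ("TOLERATED", "Neutral", "possibly damaging"),
    "Q127H": ("AFFECT PROTEIN FUNCTION", "Neutral", "benign"),
    "A162S": ("AFFECT PROTEIN FUNCTION", "Neutral", "possibly damaging"),
    "L183P": ("AFFECT PROTEIN FUNCTION", "Non-neutral", "possibly damaging"),
    "T217I": ("TOLERATED", "Non-neutral", "benign"),
    "R224W": ("AFFECT PROTEIN FUNCTION", "Non-neutral", "possibly damaging"),
    "E277K": ("AFFECT PROTEIN FUNCTION", "Neutral", "possibly damaging"),
    "C282S": ("AFFECT PROTEIN FUNCTION", "Non-neutral", "possibly damaging"),
}
# Validation column of Table 18 (True = malign).
VALIDATION = {
    "M35T": False, "V53M": True, "G93R": True, "Q127H": True, "A162S": False,
    "L183P": True, "T217I": False, "R224W": False, "E277K": True, "C282S": True,
}
# Assumed binarization: these raw labels count as "malign".
MALIGN = {"AFFECT PROTEIN FUNCTION", "Non-neutral", "possibly damaging"}
TOOLS = ("SIFT", "SNAP", "PolyPhen2")


def accuracy(tool_index):
    """Fraction of mutations whose binarized prediction matches the validation."""
    hits = sum(
        (PREDICTIONS[m][tool_index] in MALIGN) == VALIDATION[m] for m in PREDICTIONS
    )
    return hits / len(PREDICTIONS)


def unanimous_malign():
    """Mutations predicted as malign by all three tools."""
    return [m for m, preds in PREDICTIONS.items() if all(p in MALIGN for p in preds)]


for i, tool in enumerate(TOOLS):
    print(f"{tool}: {accuracy(i):.0%}")
print(unanimous_malign())
```

Running it reproduces the 70%, 40%, and 60% stated above and lists L183P, R224W, and C282S as the unanimous malign calls.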

<figtable id="consensus_tab">

Mutation AA features BLOSUM62 PAM1 PAM250 PSSM MSA SS SIFT SNAP Polyphen2 Consensus Validation Source
M35T neutral neutral neutral neutral benign benign benign benign benign malign benign benign SNPdbe (cluster)
V53M neutral benign neutral malign neutral malign neutral malign neutral malign neutral malign HGMD, SNPdbe
G93R malign malign malign malign malign neutral neutral neutral benign malign malign malign HGMD, SNPdbe
Q127H benign neutral benign benign malign neutral malign malign benign benign benign malign HGMD, SNPdbe
A162S neutral benign benign benign neutral malign neutral malign benign malign benign benign SNPdbe (freq)
L183P malign malign malign malign malign malign malign malign malign malign malign malign HGMD
T217I malign neutral benign neutral malign neutral neutral benign malign benign neutral benign SNPdbe (cluster, freq)
R224W malign malign malign malign malign malign malign malign malign malign malign benign SNPdbe (freq)
E277K malign benign benign benign neutral malign malign malign benign malign malign malign HGMD
C282S neutral neutral benign benign malign malign malign malign malign malign malign malign HGMD
Table 18: Summary of the results for each individual analysis. Disease causing mutations are considered "malign", non disease causing ones "benign". Cases that were hard to classify into one of the two categories are labeled "neutral". The consensus is based on the most common category for that mutation. The validation is based on the annotation within the corresponding source database.

</figtable>


References

<references/>