Difference between revisions of "Sequence-based mutation analysis Gaucher Disease"
Latest revision as of 08:13, 19 June 2012
The aim of this task was to carry out a thorough analysis of ten mutations and to classify them as disease-causing or non-disease-causing. The mutations were selected by another group from our set of mutations such that their impact was unknown to us prior to this task. We investigated the provided mutations with respect to their physicochemical properties, structural features, and conservation, and employed the tools SIFT, PolyPhen-2, and SNAP to predict their impact on the phenotype. To quantify to what extent the mutations are disease-causing, we assigned a disease score where -1 means non-disease-causing, 0 ambiguous, and 1 disease-causing. We averaged the disease scores to obtain a final prediction, which we compared with the true impact of the mutation on the phenotype. Technical details are reported in our protocol.
Mutations
<xr id="tab:mutations"/> contains five randomly chosen Gaucher disease-causing and five non-disease-causing mutations. Disease-causing mutations were sampled from the HGMD, whereas non-disease-causing mutations were sampled from a set of mutations that are present in dbSNP but not in the HGMD. The reference sequence was P04062, which has a 39-residue signal peptide. The ten mutations listed in <xr id="tab:mutations"/> are investigated in the following.
<figtable id="tab:mutations">
Nr | Position | From | To |
---|---|---|---|
1 | 99 | H | R |
2 | 211 | V | I |
3 | 150 | E | K |
4 | 236 | L | P |
5 | 248 | W | R |
6 | 509 | L | P |
7 | 351 | W | C |
8 | 423 | A | D |
9 | 482 | D | N |
10 | 83 | R | S |
Randomly selected mutations from HGMD and dbSNP which were used for the sequence-based mutation analysis. </figtable>
Physicochemical properties
We compared the charge, polarity, size, and aromatic character of the wild-type and mutant amino acids and assigned a disease score of 1 to those mutations that have a severe impact on the physicochemical properties (cf. <xr id="tab:props"/>). Mutation number 3 reverses the charge, since glutamate is acidic but lysine is basic. We also considered mutations number 5 and 7 disease-causing, as tryptophan is aromatic and nonpolar, in contrast to the target residues. Substituting alanine, which is small and nonpolar, by the larger and acidic aspartate might also impact the structure and function of the protein.
<figtable id="tab:props">
Nr | Wildtype AA | Wildtype charge | Wildtype polarity | Wildtype size | Wildtype aromatic | Mutant AA | Mutant charge | Mutant polarity | Mutant size | Mutant aromatic | Disease score |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | H | positive | polar | large | no | R | positive | polar | large | no | -1 |
2 | V | neutral | nonpolar | medium | no | I | neutral | nonpolar | medium | no | -1 |
3 | E | negative | polar | large | no | K | positive | polar | large | no | 1 |
4 | L | neutral | nonpolar | medium | no | P | neutral | nonpolar | medium | no | -1 |
5 | W | neutral | nonpolar | large | yes | R | positive | polar | large | no | 1 |
6 | L | neutral | nonpolar | medium | no | P | neutral | nonpolar | medium | no | -1 |
7 | W | neutral | nonpolar | large | yes | C | neutral | polar | small | no | 1 |
8 | A | neutral | nonpolar | small | no | D | negative | polar | medium | no | 1 |
9 | D | negative | polar | medium | no | N | neutral | polar | medium | no | -1 |
10 | R | positive | polar | large | no | S | neutral | polar | small | no | 0 |
Physicochemical properties of the wild-type and mutant amino acids which were used to classify the mutation as severe or non-severe. </figtable>
Structural analysis
We used the HHsearch alignment to map the mutations of <xr id="tab:mutations"/> onto 2nt0_A and investigated for each mutated site its solvent accessibility (buried or exposed), its secondary structure (H = helix, S = sheet, C = coil), and whether it lies in a domain region. Based on these features, we estimated a disease score (cf. <xr id="tab:structure"/>, <xr id="fig:structure_all"/>).
<figtable id="tab:structure">
Location of mutations in 2nt0_A. Blue: wildtype; Red: mutant; Acc: Solvent accessibility. </figtable>
<figure id="fig:structure_all">
</figure>
W248R lies in the hydrophobic core region of the TIM beta/alpha-barrel domain, and inserting a hydrophilic arginine is likely to impair the catalytic activity of the enzyme, although it does not change the secondary structure according to the PSI-PRED prediction. The same holds true for A423D. We considered L509P disease-causing, as proline turned the sheet into a loop region at this site. W351C might change the protein structure due to the formation of disulfide bonds.
Conservation
The conservation of an amino acid indicates its importance for the structure and function of the protein. The log-odds substitution score quantifies the likelihood of a substitution: a negative score indicates that the substitution is observed less frequently than expected by chance. This is primarily due to different physicochemical properties, which cause severe structural changes such that the resulting protein is negatively selected. Hence, substitutions with a negative score are likely to be disease-causing, whereas a positive score indicates that the mutation does not affect the protein.
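The log-odds idea described above can be sketched in a few lines of Python. This is only an illustration: the frequencies are invented numbers, not values from any real substitution matrix, and the function name `log_odds_score` is hypothetical.

```python
import math

def log_odds_score(observed_pair_freq, freq_a, freq_b):
    """Log-odds (base 2) of observing the pair a/b versus chance expectation."""
    # Probability of the pair if both residues occurred independently.
    expected = freq_a * freq_b
    return math.log2(observed_pair_freq / expected)

# Invented frequencies: this pair is seen less often than expected by chance.
rare = log_odds_score(observed_pair_freq=0.001, freq_a=0.05, freq_b=0.08)
# Invented frequencies: this pair is seen more often than expected by chance.
common = log_odds_score(observed_pair_freq=0.012, freq_a=0.05, freq_b=0.08)

print(f"rare pair: {rare:.2f}, common pair: {common:.2f}")
```

A pair observed less often than chance yields a negative score and a pair observed more often yields a positive one, which is the sign convention used in the conservation analysis.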
BLOSUM62 scores
The BLOSUM62 substitution matrix was derived by clustering sequences of the Blocks database at a sequence identity of at least 62% and counting inter-cluster substitutions. The evolutionary distance underlying the BLOSUM62 matrix turned out to be suitable for many applications. We labeled substitutions with a score close to the minimal score as disease-causing (cf. <xr id="tab:subst_blosum"/>).
<figtable id="tab:subst_blosum">
Nr | Mutation | Score mutation | Score min | Score max | Disease score |
---|---|---|---|---|---|
1 | H99R | 0 | -3 | 8 | 0 |
2 | V211I | 3 | -3 | 4 | -1 |
3 | E150K | 1 | -4 | 5 | 0 |
4 | L236P | -3 | -4 | 4 | 1 |
5 | W248R | -3 | -4 | 11 | 1 |
6 | L509P | -3 | -4 | 4 | 1 |
7 | W351C | -2 | -4 | 11 | 1 |
8 | A423D | -2 | -3 | 4 | 1 |
9 | D482N | 1 | -4 | 6 | 0 |
10 | R83S | -1 | -3 | 5 | 0 |
BLOSUM62 scores of the selected mutations. </figtable>
PSSM of all hits
A Position Specific Scoring Matrix (PSSM), or profile, is a matrix which stores the probability P(a|i) of observing amino acid a at position i. It is derived from a sequence alignment, and the position-specific substitution scores S(a,i) = log(P(a|i) / P(a)) are more precise than the general BLOSUM62 scores. We therefore computed an alignment (cf. <xr id="fig:subst_pssm_all_ali"/>) from all significant sequences found by performing five rounds of PSI-BLAST, computed a PSSM (cf. <xr id="fig:subst_pssm_all"/>), and used the position-specific substitution scores to assign a disease score to each mutation (cf. <xr id="tab:subst_pssm_all"/>). Mutations 5-7 were considered disease-causing since their substitution scores were close to the minimum and the sites were highly conserved.
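To make the position-specific score concrete, here is a minimal Python sketch that scores a single alignment column. It rests on several assumptions that are not part of our analysis: the column is a toy example, the background distribution is uniform over the 20 amino acids, and a simple add-one pseudocount is used, which is much cruder than the pseudocount scheme of PSI-BLAST. The function name `pssm_column_scores` is hypothetical.

```python
import math
from collections import Counter

def pssm_column_scores(column, background, pseudocount=1.0):
    """Return log2(P(a|i) / P(a)) for every residue a in `background`."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(background)
    scores = {}
    for residue, p_background in background.items():
        # Smoothed estimate of P(a|i) from the column counts.
        p_observed = (counts[residue] + pseudocount) / total
        scores[residue] = math.log2(p_observed / p_background)
    return scores

# Toy alignment column (invented): a position dominated by tryptophan.
column = list("WWWWWWWWFY")
# Simplified uniform background frequencies (assumption).
background = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}

scores = pssm_column_scores(column, background)
print(f"W: {scores['W']:.2f}, R: {scores['R']:.2f}")
```

In such a conserved column, the dominant residue gets a positive score, while a residue never seen at this position, such as arginine, gets a negative one.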
<figure id="fig:subst_pssm_all_ali">
</figure>
<figure id="fig:subst_pssm_all">
</figure>
<figtable id="tab:subst_pssm_all">
Position-specific substitution scores derived from all significant hits after five rounds of PSI-BLAST. The respective profile column is shown on the right. </figtable>
PSSM of close homologous sequences
By performing five rounds of PSI-BLAST, distantly homologous sequences whose function is not conserved are also recognised, i.e. proteins with different functions are incorporated into the alignment. We therefore built an alignment using only the closest homologous sequences, which probably exhibit the same catalytic activity as the query sequence. For this, we used HHfilter with the option -qsc 1.0 to filter the alignment depicted in <xr id="fig:subst_pssm_all_ali"/> from 1050 sequences down to only 60 sequences. The resulting alignment is shown in <xr id="fig:subst_pssm_best_ali"/> and the corresponding profile in <xr id="fig:subst_pssm_best"/>, which is clearly less diverse than the profile shown in <xr id="fig:subst_pssm_all"/>. Since fewer sequences entered the alignment, the substitution scores also became more extreme (cf. <xr id="tab:subst_pssm_best"/>). However, we assigned the same disease scores, except for L236P, as the leucine was more conserved.
<figure id="fig:subst_pssm_best_ali">
</figure>
<figure id="fig:subst_pssm_best">
</figure>
<figtable id="tab:subst_pssm_best">
Position-specific substitution scores derived from the 60 closest homologous sequences after five rounds of PSI-BLAST. The respective profile column is shown on the right. </figtable>
Scoring Mutants
SIFT
SIFT (Sorting Intolerant From Tolerant) is a sequence-homology-based tool that predicts whether an amino acid substitution will affect protein function. SIFT is based on the premise that protein evolution is correlated with protein function: functionally important positions should be conserved, so mutations at such positions are likely to change the protein function. In contrast, unimportant positions show much more amino acid variation.
The SIFT predictions are shown in <xr id="tab:subst_sift"/>. Substitutions with a score of at most 0.05 are predicted to affect protein function; all others are predicted to be tolerated:
<figtable id="tab:subst_sift">
Nr | Mutation | Prediction | Score | Sequence conservation | Disease score |
---|---|---|---|---|---|
1 | H99R | TOLERATED | 0.74 | 3.11 | -1 |
2 | V211I | TOLERATED | 1.0 | 3.10 | -1 |
3 | E150K | TOLERATED | 0.44 | 3.10 | -1 |
4 | L236P | AFFECT PROTEIN FUNCTION | 0.00 | 3.10 | 1 |
5 | W248R | AFFECT PROTEIN FUNCTION | 0.00 | 3.10 | 1 |
6 | L509P | AFFECT PROTEIN FUNCTION | 0.01 | 3.11 | 1 |
7 | W351C | AFFECT PROTEIN FUNCTION | 0.00 | 3.10 | 1 |
8 | A423D | AFFECT PROTEIN FUNCTION | 0.01 | 3.10 | 1 |
9 | D482N | TOLERATED | 0.77 | 3.10 | -1 |
10 | R83S | AFFECT PROTEIN FUNCTION | 0.05 | 3.10 | 1 |
SIFT prediction results of the selected mutations. </figtable>
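The mapping from SIFT scores to our disease scores can be sketched as follows. The table is consistent with an inclusive cut-off (R83S, with a score of exactly 0.05, is reported as affecting protein function), so this sketch assumes `score <= 0.05`. The function name `sift_disease_score` is hypothetical; the scores are copied from the table above.

```python
def sift_disease_score(sift_score, threshold=0.05):
    """Map a SIFT score to +1 (affects protein function) or -1 (tolerated)."""
    # Inclusive cut-off: assumption based on the R83S row of the table.
    return 1 if sift_score <= threshold else -1

# SIFT scores taken from the table above.
sift_scores = {
    "H99R": 0.74, "V211I": 1.0, "E150K": 0.44, "L236P": 0.00, "W248R": 0.00,
    "L509P": 0.01, "W351C": 0.00, "A423D": 0.01, "D482N": 0.77, "R83S": 0.05,
}

disease_scores = {mut: sift_disease_score(s) for mut, s in sift_scores.items()}
for mutation, score in disease_scores.items():
    print(mutation, score)
```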
Polyphen2
PolyPhen-2 (Polymorphism Phenotyping v2) predicts whether an amino acid substitution will affect protein function by using straightforward physical and comparative considerations. Two pairs of datasets are used to train and test the PolyPhen-2 prediction models. HumDiv uses all damaging alleles with known effects on the molecular function causing human Mendelian diseases from the UniProtKB database. The differences between human proteins and their closely related mammalian homologs are assumed to be non-damaging. HumVar uses all human disease-causing mutations from UniProtKB. The common human nsSNPs (MAF>1%) without annotated involvement in disease are assumed to be non-damaging.
The PolyPhen-2 predictions are shown in <xr id="tab:subst_polyphen2"/>. Mutations with a score close to 0 are predicted as "benign" and those with a score close to 1 are predicted as "probably damaging":
<figtable id="tab:subst_polyphen2">
Nr | Mutation | HumDiv prediction | HumDiv score | HumDiv sensitivity | HumDiv specificity | HumVar prediction | HumVar score | HumVar sensitivity | HumVar specificity | Disease score |
---|---|---|---|---|---|---|---|---|---|---|
1 | H99R | benign | 0.000 | 1.00 | 0.00 | benign | 0.000 | 1.00 | 0.00 | -1 |
2 | V211I | benign | 0.000 | 1.00 | 0.00 | benign | 0.001 | 0.99 | 0.09 | -1 |
3 | E150K | benign | 0.000 | 1.00 | 0.00 | benign | 0.001 | 0.99 | 0.09 | -1 |
4 | L236P | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00 | 1 |
5 | W248R | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 0.999 | 0.09 | 0.99 | 1 |
6 | L509P | probably damaging | 0.992 | 0.70 | 0.97 | probably damaging | 0.988 | 0.53 | 0.95 | 1 |
7 | W351C | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00 | 1 |
8 | A423D | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 0.996 | 0.36 | 0.97 | 1 |
9 | D482N | benign | 0.000 | 1.00 | 0.00 | benign | 0.002 | 0.99 | 0.18 | -1 |
10 | R83S | benign | 0.007 | 0.96 | 0.75 | benign | 0.019 | 0.95 | 0.55 | -1 |
PolyPhen2 prediction results of the selected mutations.
</figtable>
SNAP
SNAP (Screening for Non-Acceptable Polymorphisms) is a tool for evaluating the effects of single amino acid substitutions on protein function. It was developed by Yana Bromberg in the Rost Lab at Columbia University, New York.
The SNAP predictions are shown in <xr id="tab:subst_sift"/>, where the reliability index indicates the confidence in a prediction and the expected accuracy gives the likelihood that a given prediction is correct.
<figtable id="tab:subst_sift">
Nr | Mutation | Prediction | Reliability Index | Expected Accuracy | Disease score |
---|---|---|---|---|---|
1 | H99R | Neutral | 7 | 94% | -1 |
2 | V211I | Neutral | 7 | 94% | -1 |
3 | E150K | Neutral | 6 | 92% | -1 |
4 | L236P | Non-neutral | 0 | 58% | 1 |
5 | W248R | Non-neutral | 2 | 70% | 1 |
6 | L509P | Neutral | 0 | 53% | -1 |
7 | W351C | Non-neutral | 3 | 78% | 1 |
8 | A423D | Non-neutral | 1 | 63% | 1 |
9 | D482N | Neutral | 4 | 85% | -1 |
10 | R83S | Neutral | 3 | 78% | -1 |
SNAP prediction results of the selected mutations. </figtable>
Discussion
Summary
<xr id="tab:discussion"/> is a summary of the sequence-based mutation analysis. It contains the disease scores of all analyses, which are defined as follows:
-1 = non-disease causing
0 = ambiguous
+1 = disease causing
A final disease score is obtained by computing the weighted average of all individual disease scores:
Average disease score = sum over all methods m(weight of m * disease score m) / sum over all methods m (weight of m)
The average disease score is used to obtain the final prediction:
"non-disease causing" if Average disease score <= 0.0
"disease causing" if Average disease score > 0.0
<figtable id="tab:discussion">
Name | Weight | 1: H99R | 2: V211I | 3: E150K | 4: L236P | 5: W248R | 6: L509P | 7: W351C | 8: A423D | 9: D482N | 10: R83S | Prediction accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Physicochemical | 1.0 | -1 | -1 | 1 | -1 | 1 | -1 | 1 | 1 | -1 | 0 | 80% |
Structure | 1.0 | -1 | -1 | -1 | 0 | 1 | 1 | 0 | 1 | -1 | -1 | 60% |
BLOSUM62 | 0.2 | 0 | -1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 50% |
PSSM all | 0.4 | 0 | -1 | 0 | 0 | 1 | 1 | 1 | 1 | -1 | 0 | 50% |
PSSM close | 0.4 | 0 | -1 | 0 | 1 | 1 | 1 | 1 | 1 | -1 | 0 | 60% |
SIFT | 1.0 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | -1 | 1 | 70% |
Polyphen2 | 1.0 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | 80% |
SNAP | 1.0 | -1 | -1 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | -1 | 90% |
Average disease score | | -0.83 | -1.00 | -0.50 | 0.43 | 1.00 | 0.33 | 0.83 | 1.00 | -0.83 | -0.33 | |
Prediction | | !Disease | !Disease | !Disease | Disease | Disease | Disease | Disease | Disease | !Disease | !Disease | 80% |
Verification | | !Disease | !Disease | Disease | Disease | Disease | !Disease | Disease | Disease | !Disease | !Disease | |
Summary of the sequence-based mutation analysis. A final disease score is obtained by computing the weighted average of all individual disease scores. Mutations with an average disease score above 0.0 are considered as disease-causing and non-disease causing otherwise. green: "non-disease causing", red: "disease causing", yellow: "ambiguous". </figtable>
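The weighted average and the final decision rule can be sketched in Python as follows. The weights and the disease scores for L236P and H99R are copied from the summary table; the function names are hypothetical and only serve this illustration.

```python
# Method weights as listed in the summary table.
WEIGHTS = {
    "Physicochemical": 1.0, "Structure": 1.0, "BLOSUM62": 0.2, "PSSM all": 0.4,
    "PSSM close": 0.4, "SIFT": 1.0, "Polyphen2": 1.0, "SNAP": 1.0,
}

def average_disease_score(scores, weights=WEIGHTS):
    """Weighted average of the per-method disease scores (-1, 0 or +1)."""
    weighted_sum = sum(weights[method] * s for method, s in scores.items())
    return weighted_sum / sum(weights[method] for method in scores)

def final_prediction(average):
    """Apply the decision rule: above 0.0 means disease causing."""
    return "disease causing" if average > 0.0 else "non-disease causing"

# Disease scores for two mutations, copied from the summary table.
l236p = {"Physicochemical": -1, "Structure": 0, "BLOSUM62": 1, "PSSM all": 0,
         "PSSM close": 1, "SIFT": 1, "Polyphen2": 1, "SNAP": 1}
h99r = {"Physicochemical": -1, "Structure": -1, "BLOSUM62": 0, "PSSM all": 0,
        "PSSM close": 0, "SIFT": -1, "Polyphen2": -1, "SNAP": -1}

for name, scores in (("L236P", l236p), ("H99R", h99r)):
    avg = average_disease_score(scores)
    print(f"{name}: {avg:.2f} -> {final_prediction(avg)}")
```

For L236P this gives 2.6 / 6.0, i.e. about 0.43, and for H99R it gives -5.0 / 6.0, i.e. about -0.83, in agreement with the corresponding table entries.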
By combining all methods, an overall prediction accuracy of 80% could be achieved. SNAP turned out to be the most accurate method, classifying nine out of ten mutations correctly. The accuracy of PolyPhen-2 is 80%, just as high as that obtained by looking solely at the physicochemical properties. Then come SIFT and the conservation-based predictions. At first glance, taking the conservation into account appears to be less helpful, but the BLOSUM62-based accuracy would be 80% if ambiguous predictions (disease score 0) were treated as non-disease-causing. This would amount to also classifying substitutions with a score close to zero as non-disease-causing, which is reasonable since such mutations are neither positively nor negatively selected.
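The effect of treating ambiguous predictions as non-disease-causing can be checked with a short Python sketch. The score lists are copied from the BLOSUM62, SNAP, and Verification rows of the summary table (in the column order H99R to R83S); the function name `accuracy` and its `ambiguous_as_benign` switch are hypothetical.

```python
# Verification row of the summary table: +1 = disease, -1 = non-disease.
TRUTH = [-1, -1, 1, 1, 1, -1, 1, 1, -1, -1]
# Disease scores from the BLOSUM62 and SNAP rows of the summary table.
BLOSUM62 = [0, -1, 0, 1, 1, 1, 1, 1, 0, 0]
SNAP = [-1, -1, -1, 1, 1, -1, 1, 1, -1, -1]

def accuracy(predicted, truth, ambiguous_as_benign=False):
    """Fraction of mutations whose disease score matches the verification."""
    hits = 0
    for p, t in zip(predicted, truth):
        if ambiguous_as_benign and p == 0:
            p = -1  # count an ambiguous call as non-disease causing
        hits += (p == t)
    return hits / len(truth)

print("BLOSUM62, strict: ", accuracy(BLOSUM62, TRUTH))
print("BLOSUM62, lenient:", accuracy(BLOSUM62, TRUTH, ambiguous_as_benign=True))
print("SNAP:             ", accuracy(SNAP, TRUTH))
```

With the strict rule an ambiguous score never counts as a hit, which is why the BLOSUM62 row reaches only 50%; under the lenient rule only E150K and L509P remain misclassified.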
H99R
- H99R is not disease causing, since it is not listed in the HGMD.
- The final prediction is correct.
- All the individual properties/methods are correct except for the conservation-based approaches.
- With respect to the conservation, we decided to assign a disease score of 0 (ambiguous), since the mutation happens just as often as expected by chance and is not positively selected. Since the score is not negative, i.e. the mutation is not negatively selected, we might also have defined the mutation as non-disease-causing.
- The fact that H99R is non-disease-causing can be attributed primarily to the similar physicochemical properties of histidine and arginine (both basic) and to the fact that the mutation does not affect the core region of the enzyme.
- All three scoring tools correctly classified the mutation as non-disease-causing.
V211I
- V211I is not disease causing, since it is not listed in the HGMD.
- The final prediction is correct.
- All the individual properties/methods are correct.
- Valine and isoleucine are both hydrophobic and are thus interchangeable without changing the structure/function of the enzyme. This is also indicated by the positive substitution score.
- The structural analysis showed that the mutated site is exposed and is not part of a helix or sheet which makes the mutation less likely to impair the structure/function of the protein.
- All three scoring tools correctly classified the mutation as non-disease-causing.
E150K
- Gaucher disease type 1 <ref name="E150K">Rozenberg R, Fox DC, Sobreira E, Pereira LV. (2006). Detection of 12 new mutations in Gaucher disease Brazilian patients. Blood Cells Mol Dis. [1]</ref>
- The final prediction is wrong.
- Only the physicochemical analysis was correct.
- Changing the acidic glutamate into a basic lysine probably impairs the enzymatic activity, although the mutated site is not in the core region of the enzyme. This suggests that mutations outside of the core region can also affect the protein function.
- All three scoring tools classified the mutation as non-disease-causing despite the different physicochemical properties of the residues.
L236P
- Gaucher disease type 1 <ref name="L236P">Beutler E, Gelbart T, Scott CR. (2005). Hematologically important mutations: Gaucher disease. Blood Cells Mol Dis. [2]</ref>
- The final prediction is correct.
- The physicochemical and structural analyses are wrong. The predictions of SIFT, PolyPhen-2, and SNAP are correct.
- With respect to the physicochemical properties, we defined the mutation as non-disease-causing since both amino acids are hydrophobic.
- With respect to the structural features, we defined the mutation as ambiguous, since (1) the mutated site is not part of a helix or a sheet which would be broken by proline due to the limitation of the torsion angles, and (2) the mutated site is not part of one of the two domains. However, the mutated site is in the vicinity of the core region, which might account for the detrimental effect of the mutation.
- The conservation scores are mostly negative which suggests that such a mutation might happen very rarely and might be disease causing.
- All three scoring tools correctly classified the mutation as disease-causing.
W248R
- Gaucher disease <ref name="W248R">Hruska KS, LaMarca ME, Scott CR, Sidransky E. (2008). Gaucher disease: mutation and polymorphism spectrum in the glucocerebrosidase gene (GBA). Hum Mutat. [3]</ref>
- The final prediction is correct.
- All the individual properties/methods are correct.
- Tryptophan is hydrophobic and aromatic, whereas arginine is hydrophilic and aliphatic. These contrary properties are likely to impact the protein structure/function.
- The negative substitution score and the high conservation of tryptophan also suggest an incompatibility of the involved residues.
- All three scoring tools correctly classified the mutation as disease-causing.
L509P
- L509P is not disease causing, since it is not listed in the HGMD.
- The final prediction is wrong.
- Only the physicochemical and SNAP predictions are correct.
- With respect to the physicochemical properties, we defined the mutation as non-disease-causing since both amino acids are hydrophobic.
- With respect to the structural features, we defined the mutation as disease-causing since proline breaks the sheet which is part of a domain region.
- All substitution scores were negative, which suggests a negative effect on the protein structure/function.
- SIFT and PolyPhen-2 classified the mutation as disease-causing, whereas SNAP classified it as non-disease-causing, in agreement with the HGMD-based verification.
- Since there are good reasons to consider this mutation as disease causing, it might be that it is missing in the HGMD.
W351C
- Gaucher disease type 1 <ref name="W351C">Latham TE, Theophilus BD, Grabowski GA, Smith FI. (1991). Heterogeneity of mutations in the acid beta-glucosidase gene of Gaucher disease patients. DNA Cell Biol. [4]</ref>
- The final prediction is correct.
- All the individual properties/methods are correct except for the structural analysis.
- Tryptophan is hydrophobic and aromatic, whereas cysteine is hydrophilic and aliphatic.
- The mutated site is exposed but located in a sheet region of the TIM beta/alpha-barrel domain, where the formation of a disulfide bridge might change the structure/function. We therefore classified the mutation as ambiguous.
- The conservation scores tend to be negative, which suggests a detrimental effect of the mutation.
- All three scoring tools correctly classified the mutation as disease-causing.
A423D
- Gaucher disease <ref name="A423D">Bukina TM, Tsvetkova IV. (2007). Distribution of mutations of acid beta-D-glucosidase gene (GBA) among 68 Russian patients with Gaucher's disease. Biomed Khim. [5]</ref>
- The final prediction is correct.
- All the individual properties/methods are correct.
- Alanine is hydrophobic and small compared to aspartate which is hydrophilic.
- Similar to the mutation W248R, this mutation occurs in the hydrophobic core region of the TIM beta/alpha-barrel domain. Inserting an aspartic acid might influence the catalytic activity of the enzyme.
- The conservation scores tend to be negative, which suggests a detrimental effect of the mutation.
- All three scoring tools correctly classified the mutation as disease-causing.
D482N
- D482N is not disease causing, since it is not listed in the HGMD.
- The final prediction is correct.
- All the individual properties/methods except for the structural analysis and the BLOSUM62 score are correct.
- Asparagine has an amino group instead of a hydroxyl group, which accounts for the similar physicochemical properties of both residues.
- The mutation is far from the core of the protein and not part of a helix or sheet.
- The conservation scores tend to be positive, which suggests no detrimental effect of the mutation.
- All three scoring tools correctly classified the mutation as non-disease-causing.
R83S
- R83S is not disease causing, since it is not listed in the HGMD.
- The final prediction is correct.
- The structural analysis, PolyPhen-2, and SNAP are correct.
- Arginine and serine are both hydrophilic, but serine is neutral and smaller than arginine. The different properties rather suggest a detrimental impact of the mutation, and we classified it as ambiguous.
- The mutation is far from the core of the protein and not part of a helix or sheet.
- The substitution scores are close to zero, such that we classified it as ambiguous.
- PolyPhen-2 and SNAP correctly classified it as non-disease-causing, whereas SIFT classified it as a disease mutation. However, the SIFT score of 0.05 lies right at the threshold, which indicates a rather mild effect of the mutation. Hence, the predictions of the scoring tools are, on the whole, correct.
- In conclusion, this mutation might impact the phenotype only slightly.
References
<references/>