Task 8: Sequence-based mutation analysis

From Bioinformatikpedia
Revision as of 20:46, 25 September 2013 by Betza

Lab journal task 8

Mutation selection

This task is based on ten mutations that were randomly selected from the HGMD and dbSNP based on the results from Task 7: Research SNPs.

<css> table.colBasic2 { margin-left: auto; margin-right: auto; border: 2px solid black; border-collapse:collapse; width: 90%; }

.colBasic2 th,td { padding: 3px; border: 2px solid black; }

.colBasic2 td { text-align:left; }

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}

</css>

<figtable id="mutations">

{| class="colBasic2"
|+ Table 1: List of the 10 selected mutations from HGMD and dbSNP.
! rowspan="2" | database
! rowspan="2" | accession number
! colspan="2" | nucleotide change
! rowspan="2" | amino acid change
|-
! nucleotide position
! codon change
|-
| HGMD || CM994469 || 157 || GTG -> ATG || Val53Met
|-
| HGMD || CM960827 || 187 || CAT -> GAT || His63Asp
|-
| dbSNP || rs139523708 || 200 || CGT -> CAT || Arg67His
|-
| dbSNP || rs376650371 || 291 || ATG -> ATA || Met97Ile
|-
| dbSNP || rs201885016 || 389 || AAC -> AGC || Asn130Ser
|-
| HGMD || CM004810 || 502 || GAG -> CAG || Glu168Gln
|-
| HGMD || CM081301 || 548 || CTG -> CCG || Leu183Pro
|-
| dbSNP || rs4986950 || 650 || ACC -> ATC || Thr217Ile
|-
| HGMD || CM960828 || 845 || TGC -> TAC || Cys282Tyr
|-
| HGMD || CM990722 || 989 || AGG -> ATG || Arg330Met
|}

</figtable>

<xr id="mutations"/> lists all ten mutations, the databases from which they were taken and a specification of each mutation. The following analyses were conducted without knowing which of the mutations are disease causing.
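The entries of the mutation table can be cross-checked with a few lines of Python. The following is only a sketch: it assumes that nucleotide positions are counted from the first base of the start codon (so codon n begins at position 3(n-1)+1), and the codon dictionary contains only the codons that occur in the table, not the full genetic code. The names `CODON_TO_AA`, `MUTATIONS` and `check` are chosen for illustration.

```python
# Partial codon table: only the codons that appear in the mutation table.
CODON_TO_AA = {
    "GTG": "Val", "ATG": "Met", "CAT": "His", "GAT": "Asp",
    "CGT": "Arg", "ATA": "Ile", "AAC": "Asn", "AGC": "Ser",
    "GAG": "Glu", "CAG": "Gln", "CTG": "Leu", "CCG": "Pro",
    "ACC": "Thr", "ATC": "Ile", "TGC": "Cys", "TAC": "Tyr",
    "AGG": "Arg",
}

# (nucleotide position, wild-type codon, mutant codon, amino acid change)
MUTATIONS = [
    (157, "GTG", "ATG", "Val53Met"),
    (187, "CAT", "GAT", "His63Asp"),
    (200, "CGT", "CAT", "Arg67His"),
    (291, "ATG", "ATA", "Met97Ile"),
    (389, "AAC", "AGC", "Asn130Ser"),
    (502, "GAG", "CAG", "Glu168Gln"),
    (548, "CTG", "CCG", "Leu183Pro"),
    (650, "ACC", "ATC", "Thr217Ile"),
    (845, "TGC", "TAC", "Cys282Tyr"),
    (989, "AGG", "ATG", "Arg330Met"),
]


def check(nt_pos, wt_codon, mut_codon, label):
    """Return True if codons, residue number and nucleotide position agree."""
    wt_aa, residue, mut_aa = label[:3], int(label[3:-3]), label[-3:]
    # Positions (0-2) inside the codon where the two codons differ.
    diffs = [i for i in range(3) if wt_codon[i] != mut_codon[i]]
    if len(diffs) != 1:
        return False  # not a single-nucleotide change
    # Assumption: 1-based numbering starting at the first base of codon 1.
    expected_pos = 3 * (residue - 1) + 1 + diffs[0]
    return (
        CODON_TO_AA[wt_codon] == wt_aa
        and CODON_TO_AA[mut_codon] == mut_aa
        and expected_pos == nt_pos
    )


results = {label: check(pos, wt, mut, label) for pos, wt, mut, label in MUTATIONS}
print(results)
```

Under these assumptions every entry should be reported as consistent, i.e. each codon change is a single-base substitution that yields the stated amino acid change at the stated nucleotide position.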

Sequence based mutation analysis

<figtable id="summary">

{| class="colBasic2"
|+ Table 2: Summary of the results of the analysis of all mutations.
! rowspan="2" | mutation
! colspan="2" | physicochemical properties
! colspan="2" | structural properties
! colspan="7" | conservation
! rowspan="2" | consensus
|-
! from
! to
! PyMOL visualisation
! secondary structure
! BLOSUM62
! PAM250
! PSSM score
! PSSM occurrence WT
! PSSM occurrence mutant
! MSA occurrence WT
! MSA occurrence mutant
|-
| Val53Met || branched chain, hydrophobic, nonpolar, neutral charge || sulfur containing, nonpolar, neutral charge || Val53Met mutation with valine in yellow and methionine in red. || E (beta sheet) || 1 || 2 || 1 || 47% || 3% || 7% || 0% || small negative effect
|-
| His63Asp || aromatic ring, basic polar, mainly neutral charge || acidic polar, negative charge || His63Asp mutation with histidine in yellow and aspartic acid in red. || C (loop) || 1 || 2 || -3 || 2% || 2% || 26% || 0% || small negative effect
|-
| Arg67His || aliphatic, basic polar, delocalised positive charge || aromatic ring, basic polar, mainly neutral charge || Arg67His mutation in the center with arginine in yellow and histidine in red. || C (loop) || 0 || 2 || -1 || 33% || 1% || 100% || 0% || neutral
|-
| Met97Ile || sulfur containing, nonpolar, neutral charge || hydrophobic, nonpolar, neutral charge || Met97Ile mutation with methionine in yellow and isoleucine in red. || H (helix) || 1 || 2 || 0 || 5% || 5% || 23% || 4% || negative effect
|-
| Asn130Ser || polar, neutral charge || polar, neutral charge || Asn130Ser mutation with asparagine in yellow and serine in red. || C (loop) || 1 || 1 || -1 || 10% || 5% || 23% || 0% || neutral
|-
| Glu168Gln || acidic polar, negative charge || polar, neutral charge || Glu168Gln mutation with glutamic acid in yellow and glutamine in red. || H (helix) || 2 || 2 || -1 || 7% || 2% || 23% || 0% || negative effect
|-
| Leu183Pro || branched chain, hydrophobic, nonpolar, neutral charge || nonpolar, neutral charge || Leu183Pro mutation with leucine in yellow and proline in red. || H (helix) || -3 || -3 || -4 || 45% || 1% || 96% || 0% || strong negative effect
|-
| Thr217Ile || polar, neutral charge || hydrophobic, nonpolar, neutral charge || Thr217Ile mutation with threonine in yellow and isoleucine in red. || C (loop) || -1 || 0 || -3 || 6% || 1% || 12% || 12% || negative effect
|-
| Cys282Tyr || thiol side chain, nonpolar, neutral charge || polar, neutral charge || Cys282Tyr mutation with cysteine in yellow and tyrosine in red. || E (beta sheet) || -2 || 0 || -5 || 94% || 1% || 100% || 0% || strong negative effect
|-
| Arg330Met || aliphatic, basic polar, delocalised positive charge || sulfur containing, nonpolar, neutral charge || Could not be visualized because this residue is not contained in the PDB structure. || - || -1 || 0 || -3 || 28% || 0% || 30% || 47% || negative effect
|}

</figtable>

All results from the mutation analysis are summarised in <xr id="summary"/>. Physicochemical properties are given as characteristics of the amino acid (aa), side chain polarity and charge. BLOSUM62 and PAM250 were used as substitution matrices. A score around 0 indicates that the aa substitution is neutral. A score below 0 indicates that the substitution is observed less often than expected by chance; it is not conservative and thus probably has a negative effect on the protein. A score above 0 denotes a conservative substitution that is unlikely to affect the protein. The PSSM contains a score for each position in the sequence and each possible aa. A positive score indicates that the aa occurs at this position in the alignment more frequently than would be expected by chance, and a negative score indicates that it occurs less frequently than expected. A large positive score for a residue thus indicates that it is strongly preferred at this position, i.e. that the position is highly conserved and probably important for the protein function. An MSA of the 100 homologous mammalian sequences with the smallest E-values was generated, as well as an MSA of all homologous sequences. <xr id="summary"/> only shows the values from the MSA with 100 sequences. The PSSMs for both alignments can be found in the Lab journal task 8.
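The "MSA occurrence" columns are simply the share of aligned sequences that carry the wild-type or the mutant residue in the alignment column of the mutation. The sketch below illustrates the calculation on a made-up toy alignment; it is not the HFE alignment used in this task, the function name `column_occurrence` is hypothetical, and ignoring gap characters when computing the share is an assumption.

```python
from collections import Counter


def column_occurrence(alignment, column):
    """Return {residue: fraction} for one 0-based alignment column.

    Gap characters ('-') are ignored (an assumption of this sketch).
    """
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    counts = Counter(residues)
    total = len(residues)
    return {aa: n / total for aa, n in counts.items()}


# Hypothetical toy alignment, NOT the alignment used in this task.
toy_alignment = [
    "MCVL",
    "MCIL",
    "MCVL",
    "MC-L",
    "MCVM",
]

occ = column_occurrence(toy_alignment, 2)
wt_share = occ.get("V", 0.0)   # occurrence of the wild-type residue
mut_share = occ.get("M", 0.0)  # occurrence of the mutant residue
print(f"WT: {wt_share:.0%}, mutant: {mut_share:.0%}")
```

A high wild-type share combined with a mutant share near zero is what we interpret as a conserved position in the discussion of the individual mutations.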

Val53Met
The effect of the change from valine to methionine is mainly structural, because both aa are nonpolar and neutral. Methionine has a linear side chain, whereas valine is branched. As can be seen in the picture, this could lead to clashes with the alpha helix above the methionine. Also, valine is hydrophobic whereas methionine is not. The scoring matrices have small positive values, but the mutant residue is rare at this position in the PSSM (3%) and absent from the MSA (0%). We therefore think that this mutation has a small negative effect.
His63Asp
This mutation could be disease causing, because the basic aa histidine is replaced by the acidic aspartic acid. Besides, histidine is mainly neutral whereas aspartic acid is negatively charged. However, the PyMOL picture shows that residue 63 is located in a surface loop. Since neither aa is hydrophobic, the implication of this exchange for the function of the protein is therefore not as severe as it would be if the mutation were located in the interior of the protein. The scoring matrices contain small positive values for the substitution, but the PSSM score is -3 (worst possible score -5) and the mutant residue is not conserved at this position. We thus predict a negative effect of this mutation.
Arg67His
Arginine is able to form multiple H-bonds due to its delocalised positive charge, and histidine is also able to form H-bonds. Both aa are basic and polar. Thus, the main differences are that arginine is positively charged whereas histidine is mainly neutral, and that histidine contains an aromatic ring whereas arginine is aliphatic. The PyMOL visualisation shows that the mutation is located in a surface loop and, since neither aa is hydrophobic, we think that the effect of the mutation should not be too severe. The scoring matrices have the scores 0 and 2, and the PSSM score of -1 is not too bad (-6 worst possible). The wild type is 33% conserved in the PSI-BLAST alignment and 100% conserved in the MSA. Nevertheless, we think that this mutation has no effect.
Met97Ile
The main difference between methionine and isoleucine is that methionine contains a sulfur atom and isoleucine does not. Both aa are nonpolar and neutrally charged, but isoleucine is hydrophobic whereas methionine, due to its sulfur, is not. Residue 97 is part of an alpha helix. The PSI-BLAST PSSM shows that this position is not conserved, because all 20 aa have an occurrence of 5%. Nevertheless, the mutation could disturb the local secondary structure formation and cause a negative effect.
Asn130Ser
Asparagine can form H-bonds with the backbone and serine is also able to form H-bonds. The physicochemical properties of both aa are very similar, therefore we think that this mutation is probably not disease causing. This opinion is also supported by the 3D visualisation of the mutation, which shows that the asparagine is located at the end of a beta sheet on the surface of the protein. The serine at this position does not disrupt the local structure. Besides, the PSSM score is -1 (worst possible -6) and the wild type and the mutant are both poorly conserved.
Glu168Gln
Glutamic acid and glutamine are both polar, but glutamic acid has a negative charge whereas glutamine is neutral. The only difference between the two aa is that glutamine has an NH2 group where glutamic acid has an OH group. Consequently there is no big difference in 3D structure but only in physicochemical properties. This can also be seen in the PyMOL picture. The substitution matrices both have a score of 2 for this mutation, which indicates that the mutation probably does not affect the protein much. On the other hand, the PSSM score is -1, and because the worst score for glutamic acid is -3, this does indicate a negative effect of the substitution.
Leu183Pro
Proline is a unique aa in that it disrupts secondary structure. Since the leucine is located inside an alpha helix (see picture), the mutation to proline disrupts the helix and thus most probably leads to a disturbed function of the protein and causes the disease. This is also indicated by the negative score of -3 in both substitution matrices. Proline is not the worst possible substitution, however, because several aa lead to a score of -4 in BLOSUM62 and PAM250 even gives a -6 for the substitution with cysteine. Nevertheless, the PSSM score of -4 (worst -5) and the high conservation of the wild type in the PSSM and MSA also indicate a strong negative effect.
Thr217Ile
Threonine is polar whereas isoleucine is nonpolar and thus hydrophobic. Since the threonine is located on the surface of the protein, as can be seen in the PyMOL visualisation, a hydrophobic residue at this position might lead to a structural change of the protein. Besides, the PSSM score is -3 (worst possible -6), and we thus believe that this mutation could have a negative effect.
Cys282Tyr
Cysteine can form a disulfide bond with another nearby cysteine residue. Disulfide bonds are very important for protein structure and stability. The picture shows that the cysteine is part of a beta sheet and located in the interior of the protein. The polar tyrosine thus disrupts possible disulfide bonds and the protein structure. This mutation is therefore likely disease causing. The BLOSUM62 matrix has a score of -2, which supports an effect of the SNP, but the PAM250 matrix has a score of 0, which suggests that the mutation is neutral. Nevertheless, the PSSM score of -5 (worst case -9) and the high conservation of the wild type in both the PSSM and the MSA indicate that this mutation has a strong negative effect on the protein function.
Arg330Met
Arginine and methionine have very different physicochemical properties. Arginine is basic, polar and positively charged, whereas methionine is nonpolar and uncharged. This mutation could therefore have an effect on the protein function. Also, the BLOSUM62 substitution matrix has a slightly negative value (-1) and the PSSM an even more negative score of -3 (worst -6). Besides, the wild type is conserved in 28% of the sequences underlying the PSSM. Therefore, we propose a negative effect of this mutation.

Prediction methods

In addition to the analysis based on the physicochemical properties and the conservation scores, we also used four different prediction programs: SIFT, PolyPhen2, MutationTaster and SNAP2. <xr id="predictions"/> contains the predictions for all 10 mutations. The complete results of the four prediction programs can be found in the Lab journal task 8.

<figtable id="predictions">

{| class="colBasic2"
|+ Table 3: Comparison of the consensus based on the physicochemical properties of the aa and the scoring matrices (see <xr id="summary"/>) and the prediction results of SIFT, PolyPhen2, MutationTaster and SNAP2.
! rowspan="2" | database
! rowspan="2" | mutation
! rowspan="2" | consensus <xr id="summary"/>
! colspan="2" | SIFT
! colspan="4" | PolyPhen2
! colspan="2" | MutationTaster
! colspan="2" | SNAP2
! rowspan="2" | consensus
! rowspan="2" | truth
|-
! prediction
! score
! HumDiv prediction
! HumDiv score
! HumVar prediction
! HumVar score
! prediction
! probability
! prediction
! reliability
|-
| HGMD || Val53Met || small negative effect || AFFECT PROTEIN FUNCTION || 0.00 || probably damaging || 0.998 || possibly damaging || 0.841 || disease causing || 0.974610381496817 || Neutral || 0 || disease causing || disease causing
|-
| HGMD || His63Asp || small negative effect || AFFECT PROTEIN FUNCTION || 0.00 || benign || 0.142 || benign || 0.161 || disease causing || 0.974610381496817 || Non-neutral || 2 || disease causing || disease causing
|-
| dbSNP || Arg67His || neutral || AFFECT PROTEIN FUNCTION || 0.00 || benign || 0.145 || benign || 0.031 || polymorphism || 0.999999997930159 || Non-neutral || 4 || neutral || neutral
|-
| dbSNP || Met97Ile || negative effect || TOLERATED || 0.53 || possibly damaging || 0.575 || benign || 0.114 || disease causing || 0.943409356836766 || Neutral || 7 || neutral || neutral
|-
| dbSNP || Asn130Ser || neutral || AFFECT PROTEIN FUNCTION || 0.02 || possibly damaging || 0.883 || benign || 0.282 || polymorphism || 0.999999996637944 || Neutral || 3 || neutral || neutral
|-
| HGMD || Glu168Gln || negative effect || AFFECT PROTEIN FUNCTION || 0.01 || probably damaging || 1.000 || probably damaging || 0.980 || polymorphism || 0.707489599782817 || Neutral || 7 || disease causing || disease causing
|-
| HGMD || Leu183Pro || strong negative effect || AFFECT PROTEIN FUNCTION || 0.00 || probably damaging || 1.000 || probably damaging || 1.000 || disease causing || 0.999999979100498 || Non-neutral || 8 || disease causing || disease causing
|-
| dbSNP || Thr217Ile || negative effect || TOLERATED || 0.92 || benign || 0.118 || benign || 0.097 || polymorphism || 0.999999999993365 || Neutral || 0 || neutral || neutral
|-
| HGMD || Cys282Tyr || strong negative effect || AFFECT PROTEIN FUNCTION || 0.00 || probably damaging || 0.961 || possibly damaging || 0.667 || disease causing || 0.999999999736277 || Non-neutral || 7 || disease causing || disease causing
|-
| HGMD || Arg330Met || negative effect || TOLERATED || 0.11 || probably damaging || 0.997 || possibly damaging || 0.781 || disease causing || 8.23173030693237e-06 || Non-neutral || 7 || disease causing || disease causing
|}

</figtable>

SIFT: The median sequence conservation is 3.03 for all mutations except Arg330Met, where it is 3.01. These values are ideal, because they lie within the range of 2.75 to 3.5 (see SIFT documentation). The number of sequences represented at the position of the mutation is 255 for Arg330Met and 396 for the other mutations. Those numbers are high enough to expect good results from SIFT. The SIFT score ranges from 0 to 1. If it is less than or equal to 0.05, the substitution is predicted to be damaging, otherwise it is predicted to be tolerated. SIFT predicted the effect of seven mutations correctly and of three incorrectly (Arg67His, Asn130Ser and Arg330Met).
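The SIFT decision rule can be written down in a few lines. This is only a sketch of the cutoff described above, applied to the scores and known effects from <xr id="predictions"/>; the names `sift_call` and `SIFT_RESULTS` are hypothetical and are not part of SIFT itself.

```python
SIFT_CUTOFF = 0.05  # scores <= cutoff are called damaging, as described above


def sift_call(score, cutoff=SIFT_CUTOFF):
    """Map a SIFT score in [0, 1] to the two SIFT prediction labels."""
    return "AFFECT PROTEIN FUNCTION" if score <= cutoff else "TOLERATED"


# mutation -> (SIFT score, known effect; True = disease causing)
SIFT_RESULTS = {
    "Val53Met": (0.00, True),
    "His63Asp": (0.00, True),
    "Arg67His": (0.00, False),
    "Met97Ile": (0.53, False),
    "Asn130Ser": (0.02, False),
    "Glu168Gln": (0.01, True),
    "Leu183Pro": (0.00, True),
    "Thr217Ile": (0.92, False),
    "Cys282Tyr": (0.00, True),
    "Arg330Met": (0.11, True),
}

# Mutations for which the SIFT call disagrees with the known effect.
wrong = [
    name
    for name, (score, is_disease) in SIFT_RESULTS.items()
    if (sift_call(score) == "AFFECT PROTEIN FUNCTION") != is_disease
]
print(wrong)  # ['Arg67His', 'Asn130Ser', 'Arg330Met']
```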

PolyPhen2 makes predictions based on two different datasets: HumDiv and HumVar. HumDiv should be used to evaluate rare alleles, and HumVar should be used to distinguish mutations with drastic effects from all other mutations (see PolyPhen2 documentation). Thus, HumVar is more important for our task. PolyPhen2 discriminates between probably damaging, possibly damaging and benign mutations, based on the score. As expected, the predictions based on HumDiv (three wrong) are worse than the predictions based on HumVar (one wrong). PolyPhen2 with HumVar produced the best predictions in comparison to the other three methods.

MutationTaster predicts whether a mutation is disease causing or a harmless polymorphism. It also outputs a probability value between 0 and 1, where a value close to 1 indicates high confidence in the prediction (see MutationTaster documentation). MutationTaster got two predictions wrong (Met97Ile and Glu168Gln).

SNAP2: The reliability ranges from 0 to 9, with 9 indicating high confidence in the prediction (see SNAP help). SNAP2 predicted the effect of three mutations incorrectly (Val53Met, Arg67His and Glu168Gln).

PolyPhen2 achieved the best results with the HumVar dataset; it only predicted the His63Asp mutation incorrectly. The other methods also predicted far more than 50% of the mutations correctly. Therefore, the consensus prediction in <xr id="predictions"/>, which is a majority vote over the consensus from <xr id="summary"/> and the predictions of the four programs, agrees with the actual effect of the mutations in all cases.
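The majority vote and the error counts of the individual methods can be reproduced with a short script. This is a sketch under several assumptions that are not stated explicitly in the text: the PolyPhen2 vote uses the HumVar prediction, "possibly damaging" and "small negative effect" count as damaging, and SNAP2 "Non-neutral" counts as damaging. All calls are encoded as 1 (damaging / disease causing) or 0 (neutral), in the row order of <xr id="predictions"/>.

```python
MUTATIONS = [
    "Val53Met", "His63Asp", "Arg67His", "Met97Ile", "Asn130Ser",
    "Glu168Gln", "Leu183Pro", "Thr217Ile", "Cys282Tyr", "Arg330Met",
]

# 1 = damaging / disease causing, 0 = neutral (encoding is an assumption,
# see the lead-in for how the intermediate labels were mapped).
CALLS = {
    "manual consensus":   (1, 1, 0, 1, 0, 1, 1, 1, 1, 1),
    "SIFT":               (1, 1, 1, 0, 1, 1, 1, 0, 1, 0),
    "PolyPhen2 (HumVar)": (1, 0, 0, 0, 0, 1, 1, 0, 1, 1),
    "MutationTaster":     (1, 1, 0, 1, 0, 0, 1, 0, 1, 1),
    "SNAP2":              (0, 1, 1, 0, 0, 0, 1, 0, 1, 1),
}
TRUTH = (1, 1, 0, 0, 0, 1, 1, 0, 1, 1)


def majority_vote(calls):
    """Call a mutation damaging if more than half of the methods say so."""
    n_methods = len(calls)
    return tuple(int(2 * sum(votes) > n_methods) for votes in zip(*calls.values()))


def count_errors(predicted, truth):
    """Number of mutations where the prediction differs from the truth."""
    return sum(p != t for p, t in zip(predicted, truth))


consensus = majority_vote(CALLS)
error_counts = {name: count_errors(pred, TRUTH) for name, pred in CALLS.items()}

print(dict(zip(MUTATIONS, consensus)))
print(error_counts)
```

With an odd number of voters no ties can occur, so no tie-breaking rule is needed in this sketch.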

Conclusion

In our opinion, the physicochemical properties together with the 3D visualisation give a very good first impression of a mutation. The PSSM score also identifies mutations that are clearly negative. The MSA did not help very much in our case. All in all, our consensus from <xr id="summary"/> differs from the truth in only two cases (Met97Ile and Thr217Ile) and is thus even better than SIFT and SNAP2, which each got three predictions wrong. In particular, we identified all mutations that are listed in the HGMD.