Task 8: Sequence-based mutation analysis
Mutation selection
This task is based on ten mutations that were randomly selected from the HGMD and dbSNP based on the results from Task 7: Research SNPs.
<css> table.colBasic2 { margin-left: auto; margin-right: auto; border: 2px solid black; border-collapse:collapse; width: 90%; }
.colBasic2 th,td { padding: 3px; border: 2px solid black; }
.colBasic2 td { text-align:left; }
.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}
</css>
<figtable id="mutations">
database | accession number | nucleotide position | codon change | amino acid change |
---|---|---|---|---|
HGMD | CM994469 | 157 | GTG -> ATG | Val53Met |
HGMD | CM960827 | 187 | CAT -> GAT | His63Asp |
dbSNP | rs139523708 | 200 | CGT -> CAT | Arg67His |
dbSNP | rs376650371 | 291 | ATG -> ATA | Met97Ile |
dbSNP | rs201885016 | 389 | AAC -> AGC | Asn130Ser |
HGMD | CM004810 | 502 | GAG -> CAG | Glu168Gln |
HGMD | CM081301 | 548 | CTG -> CCG | Leu183Pro |
dbSNP | rs4986950 | 650 | ACC -> ATC | Thr217Ile |
HGMD | CM960828 | 845 | TGC -> TAC | Cys282Tyr |
HGMD | CM990722 | 989 | AGG -> ATG | Arg330Met |
</figtable>
<xr id="mutations"/> lists all ten mutations, the databases from which they were taken, and a specification of each mutation. The following analyses were conducted without knowing which of the mutations are disease causing.
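The table can be checked for internal consistency: the nucleotide position determines the residue number, and the codon change determines the amino acid change. The following is a minimal sketch, assuming that the nucleotide positions are 1-based coordinates in the coding sequence and that the standard genetic code applies. The names `describe_mutation` and `MUTATIONS` are ours and only serve as illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def describe_mutation(nt_position, wt_codon, mut_codon):
    """Derive e.g. 'Val53Met' from a 1-based CDS position and a codon change.

    Assumption: position 1 is the first base of the first codon.
    """
    residue_number = (nt_position - 1) // 3 + 1
    wt = THREE_LETTER[CODON_TABLE[wt_codon]]
    mut = THREE_LETTER[CODON_TABLE[mut_codon]]
    return f"{wt}{residue_number}{mut}"


# (nucleotide position, wild-type codon, mutant codon) as listed in the table.
MUTATIONS = [
    (157, "GTG", "ATG"), (187, "CAT", "GAT"), (200, "CGT", "CAT"),
    (291, "ATG", "ATA"), (389, "AAC", "AGC"), (502, "GAG", "CAG"),
    (548, "CTG", "CCG"), (650, "ACC", "ATC"), (845, "TGC", "TAC"),
    (989, "AGG", "ATG"),
]

derived = [describe_mutation(*m) for m in MUTATIONS]
for (position, wt_codon, mut_codon), name in zip(MUTATIONS, derived):
    print(position, wt_codon, "->", mut_codon, name)
```

Under these assumptions, the derived names can be compared directly with the amino acid change column of the table.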
Sequence-based mutation analysis
<figtable id="summary">
mutation | physicochemical properties (from) | physicochemical properties (to) | PyMOL visualisation | secondary structure | BLOSUM62 | PAM250 | PSSM score | PSSM occurrence WT | PSSM occurrence mutant | MSA occurrence WT | MSA occurrence mutant | consensus |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Val53Met | branched chain, hydrophobic, nonpolar, neutral charge | sulfur containing, nonpolar, neutral charge | (image) | E (beta sheet) | 1 | 2 | 1 | 47% | 3% | 7% | 0% | small negative effect |
His63Asp | aromatic ring, basic polar, mainly neutral charge | acidic polar, negative charge | (image) | C (loop) | 1 | 2 | -3 | 2% | 2% | 26% | 0% | small negative effect |
Arg67His | aliphatic, basic polar, delocalised positive charge | aromatic ring, basic polar, mainly neutral charge | (image) | C (loop) | 0 | 2 | -1 | 33% | 1% | 100% | 0% | neutral |
Met97Ile | sulfur containing, nonpolar, neutral charge | hydrophobic, nonpolar, neutral charge | (image) | H (helix) | 1 | 2 | 0 | 5% | 5% | 23% | 4% | negative effect |
Asn130Ser | polar, neutral charge | polar, neutral charge | (image) | C (loop) | 1 | 1 | -1 | 10% | 5% | 23% | 0% | neutral |
Glu168Gln | acidic polar, negative charge | polar, neutral charge | (image) | H (helix) | 2 | 2 | -1 | 7% | 2% | 23% | 0% | negative effect |
Leu183Pro | branched chain, hydrophobic, nonpolar, neutral charge | nonpolar, neutral charge | (image) | H (helix) | -3 | -3 | -4 | 45% | 1% | 96% | 0% | strong negative effect |
Thr217Ile | polar, neutral charge | hydrophobic, nonpolar, neutral charge | (image) | C (loop) | -1 | 0 | -3 | 6% | 1% | 12% | 12% | negative effect |
Cys282Tyr | thiol side chain, nonpolar, neutral charge | polar, neutral charge | (image) | E (beta sheet) | -2 | 0 | -5 | 94% | 1% | 100% | 0% | strong negative effect |
Arg330Met | aliphatic, basic polar, delocalised positive charge | sulfur containing, nonpolar, neutral charge | Could not be visualised because this residue is not contained in the PDB structure. | - | -1 | 0 | -3 | 28% | 0% | 30% | 47% | negative effect |
</figtable>
All results from the mutation analysis are summarised in <xr id="summary"/>. The physicochemical properties are given as characteristics of the amino acid (aa), its side-chain polarity and its charge. BLOSUM62 and PAM250 were used as substitution matrices. A score around 0 indicates that the aa substitution is neutral. A score below 0 indicates that the substitution is observed less often than expected by chance and thus probably has a negative effect on the protein. A score above 0 denotes a substitution that is observed more often than expected and is therefore unlikely to affect the protein.
The PSSM contains a score for each position in the sequence and each possible aa. A positive score indicates that the aa occurs at this position more frequently in the alignment than would be expected by chance, and a negative score indicates that it occurs less frequently than expected. A large positive score for the wild-type residue thus indicates that the position is highly conserved and probably important for the protein function.
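The idea behind such a score can be illustrated with a simple log-odds calculation for one alignment column. This is only a sketch under stated assumptions: it uses a plain log2 ratio with a single pseudocount, a made-up column and a uniform background over four residues. PSI-BLAST applies its own sequence weighting, pseudocounts and scaling, so the numbers will not match the PSSM values reported here.

```python
import math


def log_odds_scores(column, background, pseudocount=1.0):
    """Return {residue: log2(observed / expected)} for one alignment column.

    `background` maps each residue to its expected frequency. The pseudocount
    is distributed according to the background so that residues that never
    occur in the column still get a finite (negative) score.
    """
    n = len(column)
    scores = {}
    for aa, expected in background.items():
        observed = (column.count(aa) + pseudocount * expected) / (n + pseudocount)
        scores[aa] = math.log2(observed / expected)
    return scores


# Hypothetical toy column and background (not taken from the real alignment).
toy_column = "CCCCCCCY"
toy_background = {"C": 0.25, "Y": 0.25, "S": 0.25, "A": 0.25}
toy_scores = log_odds_scores(toy_column, toy_background)

for aa, score in sorted(toy_scores.items(), key=lambda item: -item[1]):
    print(aa, round(score, 2))
```

In this toy column the dominant residue receives a positive score, while rare or absent residues receive negative scores, which mirrors how the PSSM scores are read in the analysis.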
An MSA of the 100 homologous mammalian sequences with the smallest E-values was generated, as well as an MSA of all homologous sequences. <xr id="summary"/> only shows the values from the MSA with 100 sequences. The PSSMs for both alignments can be found in the Lab journal task 8.
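The MSA occurrence of the wild-type and mutant residue is simply the fraction of aligned sequences that carry that residue in the column of interest. A minimal sketch, assuming the alignment is available as equal-length strings and that gaps count towards the total; the toy alignment below is invented and is not the MSA used in this task.

```python
def residue_occurrence(alignment, column_index, residue):
    """Fraction of aligned sequences that carry `residue` in the given column.

    `column_index` is 0-based and refers to the alignment column, not to the
    residue number in the ungapped protein sequence.
    """
    column = [sequence[column_index] for sequence in alignment]
    return column.count(residue) / len(column)


# Hypothetical toy alignment; column 2 holds the residue of interest.
toy_alignment = ["MKRLA", "MKRLG", "MQRIA", "MKHLA"]
wt_occurrence = residue_occurrence(toy_alignment, 2, "R")
mutant_occurrence = residue_occurrence(toy_alignment, 2, "H")

print(f"WT: {wt_occurrence:.0%}, mutant: {mutant_occurrence:.0%}")
```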
Val53Met
The effect of the change from valine to methionine is mainly structural, because both aa are nonpolar and neutral. Methionine has a linear side chain, whereas valine is branched. As can be seen in the picture, this could lead to clashes with the alpha helix above the methionine. Also, valine is more hydrophobic than methionine. The substitution matrices give small positive values, but according to the PSSM and the MSA the mutant residue is hardly ever observed at this position.
We therefore think that this mutation has a small negative effect.
His63Asp
This mutation could be disease causing, because the basic aa histidine is replaced by the acidic aspartic acid. In addition, histidine is mainly neutral whereas aspartic acid is negatively charged. However, the PyMOL picture shows that residue 63 is located in a surface loop. Since neither aa is hydrophobic, the exchange is less severe for the function of the protein than it would be if the mutation were located in the interior of the protein. The substitution matrices contain small positive values for the substitution, but the PSSM score is -3 (worst possible score -5) and the mutant residue is not conserved. We thus predict a negative effect of this mutation.
Arg67His
Arginine can form multiple H-bonds due to its delocalised positive charge, and histidine is also able to form H-bonds. Both aa are basic and polar. Thus, the main differences are that arginine is positively charged whereas histidine is mainly neutral, and that histidine contains an aromatic ring whereas arginine is aliphatic. The PyMOL visualisation shows that the mutation is located in a surface loop, and since neither aa is hydrophobic, we think that the effect of the mutation should not be too severe. The substitution matrices give scores of 0 (BLOSUM62) and 2 (PAM250), and the PSSM score of -1 is not too bad (worst possible -6). The wild type is 33% conserved in the PSI-BLAST alignment and 100% conserved in the MSA. Nevertheless, we think that this mutation has no effect.
Met97Ile
The main difference between methionine and isoleucine is that methionine contains a sulfur atom and isoleucine does not. Both aa are nonpolar and neutral, but isoleucine is more hydrophobic than methionine. Residue 97 is part of an alpha helix. The PSI-BLAST PSSM shows that this position is not conserved, because all 20 aa have an occurrence of 5%. Still, since the residue lies within a helix, the mutation could disturb the local secondary structure formation and cause a negative effect.
Asn130Ser
Asparagine can form H-bonds with the backbone, and serine is also able to form H-bonds. The physicochemical properties of both aa are very similar; we therefore think that this mutation is probably not disease causing. This is also supported by the 3D visualisation of the mutation, which shows that the asparagine is located at the end of a beta sheet on the surface of the protein. The serine at this position does not disrupt the local structure. In addition, the PSSM score is -1 (worst possible -6), and both the wild type and the mutant are poorly conserved.
Glu168Gln
Glutamic acid and glutamine are both polar, but glutamic acid has a negative charge whereas glutamine is neutral. The only difference between the two aa is that glutamine has an NH2 group where glutamic acid has an OH group. Consequently, there is no big difference in 3D structure but only in the physicochemical properties. This can also be seen in the PyMOL picture. The substitution matrices both have a score of 2 for this mutation, which indicates that it probably does not affect the protein much. On the other hand, the PSSM score is -1, and because the worst score for glutamic acid is -3, this does indicate a negative effect of the substitution.
Leu183Pro
Proline is unique among the aa in that it disrupts secondary structure. Since the leucine is located inside an alpha helix (see picture), the mutation to proline disrupts the helix and thus most probably disturbs the function of the protein and causes the disease. This is also indicated by the negative score of -3 in both substitution matrices (worst case -4). However, proline is not the worst possible substitution, because several aa lead to a score of -4 in BLOSUM62, and PAM250 even gives a -6 for the substitution with cysteine. Nevertheless, the PSSM score of -4 (worst -5) and the high conservation of the wild type in the PSSM and MSA also indicate a strong negative effect.
Thr217Ile
Threonine is polar whereas isoleucine is nonpolar and thus hydrophobic. Since the threonine is located on the surface of the protein, as can be seen in the PyMOL visualisation, a hydrophobic residue at this position might lead to a structural change of the protein. In addition, the PSSM score is -3 (worst possible -6), and we thus believe that this mutation could have a negative effect.
Cys282Tyr
Cysteine can form a disulfide bond with another nearby cysteine residue. Disulfide bonds are very important for protein structure and stability. The picture shows that the cysteine is part of a beta sheet and located in the interior of the protein. The polar tyrosine thus disrupts possible disulfide bonds and the protein structure. This mutation is therefore likely disease causing. The BLOSUM62 matrix has a score of -2, which supports an effect of the SNP, but the PAM250 matrix has a score of 0, which suggests that the mutation is neutral. Nevertheless, the PSSM score of -5 (worst case -9) and the high conservation of the wild type in both the PSSM and the MSA indicate that this mutation has a strong negative effect on the protein function.
Arg330Met
Arginine and methionine have very different physicochemical properties. Arginine is basic, polar and positively charged, whereas methionine is nonpolar and uncharged. This mutation could therefore have an effect on the protein function. Also, the BLOSUM62 substitution matrix has a slightly negative value (-1) and the PSSM an even more negative score of -3 (worst -6). In addition, the wild type occurs in only 28% of the sequences underlying the PSSM. Therefore, we propose a negative effect of this mutation.
Prediction methods
In addition to the analysis based on the physicochemical properties and the conservation scores, we also used four different prediction programs: SIFT, PolyPhen2, MutationTaster and SNAP2. <xr id="predictions"/> contains the predictions for all ten mutations. The complete results of the four prediction programs can be found in the Lab journal task 8.
<figtable id="predictions">
database | mutation | consensus <xr id="summary"/> | SIFT prediction | SIFT score | PolyPhen2 HumDiv prediction | PolyPhen2 HumDiv score | PolyPhen2 HumVar prediction | PolyPhen2 HumVar score | MutationTaster prediction | MutationTaster probability | SNAP2 prediction | SNAP2 reliability | consensus | truth |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HGMD | Val53Met | small negative effect | AFFECT PROTEIN FUNCTION | 0.00 | probably damaging | 0.998 | possibly damaging | 0.841 | disease causing | 0.974610381496817 | Neutral | 0 | disease causing | disease causing |
HGMD | His63Asp | small negative effect | AFFECT PROTEIN FUNCTION | 0.00 | benign | 0.142 | benign | 0.161 | disease causing | 0.974610381496817 | Non-neutral | 2 | disease causing | disease causing |
dbSNP | Arg67His | neutral | AFFECT PROTEIN FUNCTION | 0.00 | benign | 0.145 | benign | 0.031 | polymorphism | 0.999999997930159 | Non-neutral | 4 | neutral | neutral |
dbSNP | Met97Ile | negative effect | TOLERATED | 0.53 | possibly damaging | 0.575 | benign | 0.114 | disease causing | 0.943409356836766 | Neutral | 7 | neutral | neutral |
dbSNP | Asn130Ser | neutral | AFFECT PROTEIN FUNCTION | 0.02 | possibly damaging | 0.883 | benign | 0.282 | polymorphism | 0.999999996637944 | Neutral | 3 | neutral | neutral |
HGMD | Glu168Gln | negative effect | AFFECT PROTEIN FUNCTION | 0.01 | probably damaging | 1.000 | probably damaging | 0.980 | polymorphism | 0.707489599782817 | Neutral | 7 | disease causing | disease causing |
HGMD | Leu183Pro | strong negative effect | AFFECT PROTEIN FUNCTION | 0.00 | probably damaging | 1.000 | probably damaging | 1.000 | disease causing | 0.999999979100498 | Non-neutral | 8 | disease causing | disease causing |
dbSNP | Thr217Ile | negative effect | TOLERATED | 0.92 | benign | 0.118 | benign | 0.097 | polymorphism | 0.999999999993365 | Neutral | 0 | neutral | neutral |
HGMD | Cys282Tyr | strong negative effect | AFFECT PROTEIN FUNCTION | 0.00 | probably damaging | 0.961 | possibly damaging | 0.667 | disease causing | 0.999999999736277 | Non-neutral | 7 | disease causing | disease causing |
HGMD | Arg330Met | negative effect | TOLERATED | 0.11 | probably damaging | 0.997 | possibly damaging | 0.781 | disease causing | 8.23173030693237e-06 | Non-neutral | 7 | disease causing | disease causing |
</figtable>
SIFT: The median sequence conservation is 3.03 for all mutations except Arg330Met, where it is 3.01. Both values are ideal, because they lie inside the range of 2.75 to 3.5 (see SIFT documentation). The number of sequences represented at the position of the mutation is 255 for Arg330Met and 396 for the other mutations. Those numbers are high enough to expect good results from SIFT. The SIFT score ranges from 0 to 1. If it is less than or equal to 0.05, the substitution is predicted to be damaging, otherwise tolerated. SIFT predicted the effect of seven mutations correctly and of three incorrectly.
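The threshold rule can be applied to the SIFT scores from <xr id="predictions"/> to reproduce the calls and to see which mutations SIFT got wrong. This is a small sketch with helper names of our own choosing, assuming that "disease causing" in the truth column corresponds to a damaging call.

```python
SIFT_THRESHOLD = 0.05  # scores <= 0.05 are predicted to be damaging

# (mutation, SIFT score, true effect) as listed in the predictions table.
SIFT_RESULTS = [
    ("Val53Met", 0.00, "disease causing"),
    ("His63Asp", 0.00, "disease causing"),
    ("Arg67His", 0.00, "neutral"),
    ("Met97Ile", 0.53, "neutral"),
    ("Asn130Ser", 0.02, "neutral"),
    ("Glu168Gln", 0.01, "disease causing"),
    ("Leu183Pro", 0.00, "disease causing"),
    ("Thr217Ile", 0.92, "neutral"),
    ("Cys282Tyr", 0.00, "disease causing"),
    ("Arg330Met", 0.11, "disease causing"),
]


def sift_call(score, threshold=SIFT_THRESHOLD):
    """Translate a SIFT score into the label used in the table."""
    return "AFFECT PROTEIN FUNCTION" if score <= threshold else "TOLERATED"


def agrees_with_truth(score, truth):
    """True if the SIFT call matches the known effect of the mutation."""
    predicted_damaging = sift_call(score) == "AFFECT PROTEIN FUNCTION"
    return predicted_damaging == (truth == "disease causing")


wrong = [name for name, score, truth in SIFT_RESULTS
         if not agrees_with_truth(score, truth)]
print(f"{len(SIFT_RESULTS) - len(wrong)} correct, wrong: {wrong}")
```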
PolyPhen2 makes predictions based on two different datasets: HumDiv and HumVar. HumDiv should be used to evaluate rare alleles, and HumVar should be used to distinguish mutations with drastic effects from all other mutations (see PolyPhen2 documentation). Thus, HumVar is more important for our task. PolyPhen2 discriminates between probably damaging, possibly damaging and benign mutations, based on the score. As expected, the predictions based on HumDiv (three wrong) are worse than the predictions based on HumVar (one wrong). PolyPhen2 with HumVar produced the best predictions in comparison to the other three methods.
MutationTaster predicts whether the mutation is disease causing or a harmless polymorphism. It also outputs a probability value between 0 and 1, where 1 indicates a highly confident prediction (see MutationTaster documentation). MutationTaster got two predictions wrong.
SNAP2: The reliability ranges between 0 and 9, with 9 indicating a high confidence in the prediction (see SNAP help). SNAP2 predicted the effect of three mutations incorrectly.
PolyPhen2 achieved the best results with the HumVar dataset. It only predicted the His63Asp mutation incorrectly. But the other methods also predicted well over 50% of the mutations correctly. Therefore, the consensus predictions in <xr id="predictions"/>, which are a majority vote based on the consensus from <xr id="summary"/> and the predictions of the four programs, agree with the actual effect of the mutations in all cases.
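The majority vote can be sketched as follows. The exact voting scheme is not spelled out above, so this sketch makes several assumptions: there are five voters (the manual consensus, SIFT, PolyPhen2, MutationTaster and SNAP2), PolyPhen2 votes with its HumVar prediction, and every label other than neutral, TOLERATED, benign, polymorphism or Neutral counts as a vote for a damaging effect.

```python
# Labels that are counted as a vote for "disease causing" (our assumption).
DAMAGING_LABELS = {
    "small negative effect", "negative effect", "strong negative effect",
    "AFFECT PROTEIN FUNCTION", "probably damaging", "possibly damaging",
    "disease causing", "Non-neutral",
}

# mutation: (manual consensus, SIFT, PolyPhen2 HumVar, MutationTaster, SNAP2, truth)
TABLE = {
    "Val53Met": ("small negative effect", "AFFECT PROTEIN FUNCTION", "possibly damaging", "disease causing", "Neutral", "disease causing"),
    "His63Asp": ("small negative effect", "AFFECT PROTEIN FUNCTION", "benign", "disease causing", "Non-neutral", "disease causing"),
    "Arg67His": ("neutral", "AFFECT PROTEIN FUNCTION", "benign", "polymorphism", "Non-neutral", "neutral"),
    "Met97Ile": ("negative effect", "TOLERATED", "benign", "disease causing", "Neutral", "neutral"),
    "Asn130Ser": ("neutral", "AFFECT PROTEIN FUNCTION", "benign", "polymorphism", "Neutral", "neutral"),
    "Glu168Gln": ("negative effect", "AFFECT PROTEIN FUNCTION", "probably damaging", "polymorphism", "Neutral", "disease causing"),
    "Leu183Pro": ("strong negative effect", "AFFECT PROTEIN FUNCTION", "probably damaging", "disease causing", "Non-neutral", "disease causing"),
    "Thr217Ile": ("negative effect", "TOLERATED", "benign", "polymorphism", "Neutral", "neutral"),
    "Cys282Tyr": ("strong negative effect", "AFFECT PROTEIN FUNCTION", "possibly damaging", "disease causing", "Non-neutral", "disease causing"),
    "Arg330Met": ("negative effect", "TOLERATED", "possibly damaging", "disease causing", "Non-neutral", "disease causing"),
}


def majority_vote(votes):
    """Return 'disease causing' if more than half of the votes are damaging."""
    damaging = sum(vote in DAMAGING_LABELS for vote in votes)
    return "disease causing" if damaging > len(votes) / 2 else "neutral"


consensus = {name: majority_vote(row[:5]) for name, row in TABLE.items()}
for name, row in TABLE.items():
    print(name, consensus[name], "| truth:", row[5])
```

With an odd number of voters no ties can occur. Under these assumptions the vote yields the same call as the truth column for all ten mutations.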
Conclusion
In our opinion, the physicochemical properties together with the 3D visualisation give a very good first impression of the mutation. The PSSM score also reveals mutations that are clearly negative. The MSA did not help very much in our case. All in all, our consensus differs from the truth in only two cases and is thus even better than SIFT and SNAP2, and it identified all mutations that are listed in the HGMD.