Canavan Disease: Task 08 - Sequence-based Mutation Analysis
Sequence-based mutation analysis is important because mutations may affect protein stability or function. The analysis can also be used to predict whether a mutation is disease causing.
Wild Type - Mutant Approach
To get a feeling for interpreting the data of different SNPs, ten amino acid mutations were randomly chosen from HGMD (disease causing) and dbSNP (non-synonymous mutations). They were sorted in ascending order by their original amino acid to mix the HGMD and dbSNP entries, so that it was no longer possible to remember which database a mutation came from. <xr id="data"></xr> gives a short summary of these mutations in terms of sidechain changes and secondary structure:
<figtable id="data">
Comparison of Changes from Wild Type to Mutation Type
Mutation | Sidechain Polarity (from) | Sidechain Polarity (to) | Sidechain Charge (from) | Sidechain Charge (to) | Sec. Struc. | Uniprot Info |
---|---|---|---|---|---|---|
Arg233Trp | basic polar | nonpolar | positive | neutral | LOOP | - |
Asn121Asp | polar | acidic polar | neutral | negative | LOOP | - |
His21Pro | basic polar | nonpolar | neutral | neutral | LOOP | metal binding |
Ile157Thr | nonpolar | polar | neutral | neutral | LOOP | - |
Leu272Pro | nonpolar | nonpolar | neutral | neutral | LOOP | - |
Lys213Glu | basic polar | acidic polar | positive | negative | LOOP | - |
Pro149Ala | nonpolar | nonpolar | neutral | neutral | LOOP near HELIX | - |
Pro257Arg | nonpolar | basic polar | neutral | positive | LOOP | - |
Thr166Ile | polar | nonpolar | neutral | neutral | LOOP near HELIX | in binding region |
Tyr288Cys | polar | nonpolar | neutral | neutral | HELIX | binding site |
</figtable>
From <xr id="data"></xr> above, the following first impressions can be drawn:
- Arg233Trp: The sidechain polarity changes from basic polar to nonpolar, and the charge changes from positive to neutral. The mutation site is located within a LOOP region, and Uniprot gives no information on whether it is a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
- Asn121Asp: The sidechain polarity changes from polar to acidic polar, and the charge changes from neutral to negative. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
- His21Pro: The sidechain polarity changes from basic polar to nonpolar. The charge does not change. The mutation site is located within a LOOP region, but also within the active center: this position is needed for zinc binding. Without further investigation, the first impression is that it is disease causing.
- Ile157Thr: The sidechain polarity changes from nonpolar to polar. The charge does not change. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is not disease causing.
- Leu272Pro: Neither the sidechain polarity nor the charge changes. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is not disease causing.
- Lys213Glu: The sidechain polarity changes from basic polar to acidic polar, and the charge changes from positive to negative. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is disease causing.
- Pro149Ala: Neither the sidechain polarity nor the charge changes. The mutation site is located within a LOOP region quite near to a HELIX, and there is no information on whether it is a functional residue. Since proline is known to be a typical helix breaker, this proline may be structurally necessary at this position. Without further investigation, the first impression is that it is possibly disease causing.
- Pro257Arg: The sidechain polarity changes from nonpolar to basic polar, and the charge changes from neutral to positive. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
- Thr166Ile: The sidechain polarity changes from polar to nonpolar. The charge does not change. The mutation site is located within a LOOP region next to a HELIX. This position is known to be part of the binding region of aspartoacylase. Without further investigation, the first impression is that it is disease causing.
- Tyr288Cys: The sidechain polarity changes from polar to nonpolar. The charge does not change. The mutation site is located within a HELIX. This position is known to be a binding site. Without further investigation, the first impression is that it is disease causing.
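The sidechain comparison behind these first impressions can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this Task: the lookup table `SIDECHAIN` copies the polarity and charge classes from the table above and covers only the residues that occur in the ten mutations, and the function name `describe_mutation` is a hypothetical helper.

```python
import re

# Polarity and charge classes copied from the table above.
# Only residues occurring in the ten selected mutations are listed.
SIDECHAIN = {
    "Arg": ("basic polar", "positive"),
    "Lys": ("basic polar", "positive"),
    "His": ("basic polar", "neutral"),
    "Asp": ("acidic polar", "negative"),
    "Glu": ("acidic polar", "negative"),
    "Asn": ("polar", "neutral"),
    "Thr": ("polar", "neutral"),
    "Tyr": ("polar", "neutral"),
    "Ala": ("nonpolar", "neutral"),
    "Cys": ("nonpolar", "neutral"),
    "Ile": ("nonpolar", "neutral"),
    "Leu": ("nonpolar", "neutral"),
    "Pro": ("nonpolar", "neutral"),
    "Trp": ("nonpolar", "neutral"),
}

# Three-letter wild type, position, three-letter mutant, e.g. "Arg233Trp".
MUTATION_RE = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def describe_mutation(mutation):
    """Return the polarity and charge change for a mutation such as 'Arg233Trp'."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"not a three-letter mutation code: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    wt_polarity, wt_charge = SIDECHAIN[wild_type]
    mut_polarity, mut_charge = SIDECHAIN[mutant]
    return {
        "position": position,
        "polarity": (wt_polarity, mut_polarity),
        "charge": (wt_charge, mut_charge),
        "polarity_changed": wt_polarity != mut_polarity,
        "charge_changed": wt_charge != mut_charge,
    }


if __name__ == "__main__":
    for name in ("Arg233Trp", "Leu272Pro", "Lys213Glu"):
        print(name, describe_mutation(name))
```

Such a lookup only reproduces the sidechain columns of the table; the secondary structure and the functional annotation still have to be taken from the structure and from Uniprot.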
For further investigation, substitution matrices such as BLOSUM 62 and PAM 1/250 and a PSSM were taken into account (compare <xr id="matrices"></xr>), together with the conservation of aspartoacylase among homologous species, as listed in <xr id="msa"></xr>. The PSSM was calculated for aspartoacylase using PSI-BLAST with 5 iterations.
<figtable id="matrices">
Comparison using Matrix Information
Mutation | BLOSUM 62 | PAM 1/250 | PSSM Matrix | PSSM Conservation (WT) | PSSM Conservation (mut) | MSA Conservation (WT) | MSA Conservation (mut) |
---|---|---|---|---|---|---|---|
Arg233Trp | -3 | 2 | -5 | 30% | 0% | 0.95 | 0.0 |
Asn121Asp | 1 | 2 | -1 | 15% | 2% | 1.0 | 0.0 |
His21Pro | -2 | 0 | -9 | 99% | 0% | 1.0 | 0.0 |
Ile157Thr | -1 | 0 | 1 | 4% | 8% | 1.0 | 0.0 |
Leu272Pro | -3 | -3 | -3 | 4% | 2% | 1.0 | 0.0 |
Lys213Glu | 1 | 1 | 1 | 8% | 5% | 0.95 | 0.05 |
Pro149Ala | -1 | 1 | 0 | 10% | 7% | 1.0 | 0.0 |
Pro257Arg | -2 | 0 | 1 | 23% | 8% | 1.0 | 0.0 |
Thr166Ile | -1 | 0 | -3 | 16% | 1% | 1.0 | 0.0 |
Tyr288Cys | -2 | 0 | -4 | 81% | 0% | 1.0 | 0.0 |
A multiple sequence alignment (MSA) was made to examine the evolutionary conservation of the sites compared to homologous sequences from mammalian species. WT = wild type, mut = mutation.
</figtable>
Considering <xr id="matrices"></xr> above, a closer look at the matrices is taken. As a short reminder: in BLOSUM, positive values indicate a common, chemically similar substitution and negative values a rare one. In addition, common amino acids receive a low weight and rare amino acids a high weight. In PAM, highly negative values correspond to a high mismatch penalty for the mutation. If the PSSM conservation of the wild type residue is high and that of the mutant residue is very low in comparison, the PSSM value is negative, which indicates that the mutation is more likely to be disease causing. The more negative the value, the more likely a disease causing effect. The multiple sequence alignment of sequences homologous to aspartoacylase shows the conservation of the original (wild type) and the mutated amino acid. A high value for the mutant residue would indicate a substitution that also occurs among the homologous sequences from other mammalian species.
- Arg233Trp: The first impression was that it is possibly disease causing. BLOSUM gives a negative value, PAM shows no clearly significant value. The PSSM conservation is high for the wild type amino acid and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression therefore changes from possibly disease causing to disease causing.
- Asn121Asp: The first impression was that it is possibly disease causing. BLOSUM gives a positive value, PAM shows no clearly significant value. The PSSM conservation is moderately high for the wild type amino acid and the score is slightly negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is possibly disease causing.
- His21Pro: The first impression was that it is disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation is very high for the wild type amino acid and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is disease causing.
- Ile157Thr: The first impression was that it is not disease causing. BLOSUM gives a slightly negative value, PAM gives zero. The PSSM conservation is higher for the mutant than for the wild type amino acid, so the score is positive. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is not disease causing.
- Leu272Pro: The first impression was that it is not disease causing. BLOSUM gives a negative value, and PAM also gives a negative value. The PSSM conservation is slightly higher for the wild type amino acid and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression therefore changes from not disease causing to possibly disease causing.
- Lys213Glu: The first impression was that it is disease causing. BLOSUM gives a positive value, and PAM also gives a positive value. The PSSM conservation is slightly higher for the wild type amino acid and the score is slightly positive. The mutated amino acid occurs in some homologous sequences. Since the sidechain changes noted in the first impression are so extreme, the second impression remains that it is disease causing.
- Pro149Ala: The first impression was that it is possibly disease causing. BLOSUM gives a negative value, PAM gives a positive value. The PSSM conservation is slightly higher for the wild type amino acid and the score is zero. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is possibly disease causing.
- Pro257Arg: The first impression was that it is possibly disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation is higher for the wild type amino acid and the score is slightly positive. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is possibly disease causing.
- Thr166Ile: The first impression was that it is disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation is high for the wild type amino acid and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is disease causing.
- Tyr288Cys: The first impression was that it is disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation is very high for the wild type amino acid and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is disease causing.
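The reading rules described above (negative BLOSUM value, negative PSSM score, mutant residue absent from the homologs) can be tabulated with a short Python sketch. The values are copied from the matrix table above. The flag names and the function `evidence_flags` are hypothetical, and the flags are only a bookkeeping aid: the second impressions in this Task were formed by judgment, not by this rule set.

```python
from typing import NamedTuple


class MatrixRow(NamedTuple):
    mutation: str
    blosum62: int
    pam: int
    pssm: int
    msa_wt: float   # fraction of homologs with the wild type residue
    msa_mut: float  # fraction of homologs with the mutant residue


# Values copied from the matrix comparison table above.
ROWS = [
    MatrixRow("Arg233Trp", -3, 2, -5, 0.95, 0.0),
    MatrixRow("Asn121Asp", 1, 2, -1, 1.0, 0.0),
    MatrixRow("His21Pro", -2, 0, -9, 1.0, 0.0),
    MatrixRow("Ile157Thr", -1, 0, 1, 1.0, 0.0),
    MatrixRow("Leu272Pro", -3, -3, -3, 1.0, 0.0),
    MatrixRow("Lys213Glu", 1, 1, 1, 0.95, 0.05),
    MatrixRow("Pro149Ala", -1, 1, 0, 1.0, 0.0),
    MatrixRow("Pro257Arg", -2, 0, 1, 1.0, 0.0),
    MatrixRow("Thr166Ile", -1, 0, -3, 1.0, 0.0),
    MatrixRow("Tyr288Cys", -2, 0, -4, 1.0, 0.0),
]


def evidence_flags(row):
    """List which of the three reading rules point towards a damaging mutation."""
    flags = []
    if row.blosum62 < 0:
        flags.append("blosum_negative")
    if row.pssm < 0:
        flags.append("pssm_negative")
    if row.msa_mut == 0.0:
        flags.append("absent_in_homologs")
    return flags


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.mutation:10s} {', '.join(evidence_flags(row)) or 'no flag'}")
```

Lys213Glu raises none of these flags, which matches the observation that only the extreme sidechain change kept it at disease causing in the second impression.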
The following <xr id="msa"></xr> shows the homologous sequences used for the multiple sequence alignment conservation approach. They were obtained from a BLAST search with aspartoacylase against the mammalian database in Uniprot. Only one sequence per species was used to prevent a bias towards any single species.
<figtable id="msa">
Mammalian Homologous Sequences
Homolog | Protein | Organism |
---|---|---|
H2QBW4 | Aspartoacylase (Canavan disease) | Pan troglodytes |
G3QQC1 | Uncharacterized protein | Gorilla gorilla |
Q5R9E0 | Aspartoacylase | Pongo abelii |
G1S5Z4 | Uncharacterized protein | Nomascus leucogenys |
G7PT66 | Aspartoacylase | Macaca fascicularis |
F6WMI4 | Uncharacterized protein | Equus caballus |
B1PK17 | Aspartoacylase | Sus scrofa |
P46446 | Aspartoacylase | Bos taurus |
M3Y3U3 | Uncharacterized protein | Mustela putorius furo |
D2HZN6 | Uncharacterized protein (Fragment) | Ailuropoda melanoleuca |
E2R8M6 | Uncharacterized protein | Canis familiaris |
M3X3I5 | Uncharacterized protein | Felis catus |
G1SPT6 | Uncharacterized protein | Oryctolagus cuniculus |
I3N0V6 | Uncharacterized protein | Spermophilus tridecemlineatus |
G5B939 | Aspartoacylase | Heterocephalus glaber |
G3TAV8 | Uncharacterized protein | Loxodonta africana |
G1P679 | Uncharacterized protein | Myotis lucifugus |
H0WW85 | Uncharacterized protein | Otolemur garnettii |
H0UYA8 | Uncharacterized protein (Fragment) | Cavia porcellus |
Q9R1T5 | Aspartoacylase | Rattus norvegicus |
Q8R3P0 | Aspartoacylase | Mus musculus |
</figtable>
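The MSA conservation values reported above can be read as the fraction of homologous sequences that carry the wild type or the mutant residue at the aligned position. A minimal Python sketch of that calculation follows, under the assumption that every sequence (including any with a gap at that column) counts in the denominator. The function name `column_conservation` and the example column are hypothetical: the real alignment columns are not given in this Task.

```python
from collections import Counter


def column_conservation(column, wild_type, mutant):
    """Return (wt_fraction, mut_fraction) for one alignment column.

    `column` holds one one-letter residue (or '-' for a gap) per homolog.
    """
    if not column:
        raise ValueError("alignment column is empty")
    counts = Counter(column)
    total = len(column)
    return counts[wild_type] / total, counts[mutant] / total


if __name__ == "__main__":
    # Hypothetical column: 20 of the 21 homologs carry Lys (K), one carries Glu (E).
    example_column = ["K"] * 20 + ["E"]
    wt_fraction, mut_fraction = column_conservation(example_column, "K", "E")
    print(f"WT: {wt_fraction:.2f}, mut: {mut_fraction:.2f}")
```

With 21 homologs, such a column would give values that round to 0.95 and 0.05, the pair reported for Lys213Glu.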
Scoring Approach
The next step was to use different methods available online to check whether they predict a mutation to be disease causing or to affect protein function. For this approach SIFT, PolyPhen, MutationTaster and SNAP2 were used. The individual results with the prediction probabilities for each method can be found in the Supplement at the end of this Task. The results concerning protein function are summarized in <xr id="comparison"></xr> in the next section.
Comparison
To give a better overview of all methods used in this Task, the following <xr id="comparison"></xr> shows the prediction for each mutation. The color-coding indicates:
- red (dis.caus.) - predicted to be disease causing (or to have a functional effect)
- yellow (maybe) - predicted to be possibly disease causing
- green (neutral) - predicted to be not disease causing and therefore neutral
<figtable id="comparison">
Prediction of Different Approaches and Validation
Mutation | First Personal Impression | Second Personal Impression | SIFT | PolyPhen | MutationTaster | SNAP | Validation |
---|---|---|---|---|---|---|---|
Arg233Trp | maybe | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | not sure (dbSNP data) |
Asn121Asp | maybe | maybe | dis.caus. | dis.caus. | dis.caus. | dis.caus. | (dbSNP data) but another mutation known to be disease causing |
His21Pro | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitively disease causing (HGMD data) |
Ile157Thr | neutral | neutral | neutral | neutral | dis.caus. | neutral | (dbSNP data) but SNPdbe without a reference to Canavan Disease |
Leu272Pro | neutral | maybe | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitively disease causing (HGMD data) |
Lys213Glu | dis.caus. | dis.caus. | neutral | neutral | dis.caus. | neutral | definitively disease causing (HGMD data) |
Pro149Ala | maybe | maybe | neutral | maybe | dis.caus. | neutral | not sure (dbSNP data) |
Pro257Arg | maybe | maybe | neutral | maybe | dis.caus. | neutral | not sure (dbSNP data) |
Thr166Ile | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitively disease causing (HGMD data) |
Tyr288Cys | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitively disease causing (HGMD data) |
Validation - using the information from Task 07.
</figtable>
As can be seen in <xr id="comparison"></xr>, the personal impressions from the wild type to mutation type approach are quite comparable to the predictions of the available online methods. It is worth running those methods whenever the personal impression remains at possibly disease causing. Using matrices such as BLOSUM, PAM or PSSM clearly moved the personal impression closer to the validation result. Interestingly, Lys213Glu was predicted correctly only by MutationTaster and by the personal impression from the simple approach, which suggests that simply looking at the data can be a good first filter.
A further validation of the positions showed that position 121 (here Asn->Asp) is known to be associated with Canavan Disease in HGMD (Asn->Ile), which leads to the assumption that a mutation at this position is disease causing. The mutation at position 157 (here Ile->Thr) can also be found in SNPdbe, but without any association to Canavan Disease, which leads to the assumption that it is neutral. For all other positions no further validation was possible.
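The agreement between each approach and the validation can be counted directly from the comparison table. The following Python sketch is restricted to the five mutations validated as disease causing through HGMD, since the dbSNP entries have no certain label. It makes one assumption that should be stated plainly: only a "dis.caus." call counts as a hit, so a "maybe" counts as a miss. The names `PREDICTIONS` and `count_hits` are hypothetical.

```python
METHODS = [
    "First impression",
    "Second impression",
    "SIFT",
    "PolyPhen",
    "MutationTaster",
    "SNAP",
]

# Calls copied from the comparison table, HGMD-validated mutations only.
PREDICTIONS = {
    "His21Pro": ["dis.caus."] * 6,
    "Leu272Pro": ["neutral", "maybe", "dis.caus.", "dis.caus.", "dis.caus.", "dis.caus."],
    "Lys213Glu": ["dis.caus.", "dis.caus.", "neutral", "neutral", "dis.caus.", "neutral"],
    "Thr166Ile": ["dis.caus."] * 6,
    "Tyr288Cys": ["dis.caus."] * 6,
}


def count_hits(predictions):
    """Count, per method, how many validated mutations were called 'dis.caus.'."""
    hits = {method: 0 for method in METHODS}
    for calls in predictions.values():
        for method, call in zip(METHODS, calls):
            if call == "dis.caus.":
                hits[method] += 1
    return hits


if __name__ == "__main__":
    total = len(PREDICTIONS)
    for method, n_hits in count_hits(PREDICTIONS).items():
        print(f"{method:18s} {n_hits}/{total}")
```

With only five validated mutations, such counts describe this small sample and should not be read as a general ranking of the methods.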
Supplement