Canavan Disease: Task 08 - Sequence-based Mutation Analysis

From Bioinformatikpedia
Revision as of 13:28, 5 September 2013 by Boehma (talk | contribs) (Comparison)

Sequence-based mutation analysis is important because mutations may affect protein stability or function. The analysis can also be used to predict whether a mutation is disease causing.

Wild Type - Mutant Approach

To get a feeling for interpreting the data on different SNPs, ten amino acid mutations were randomly chosen from HGMD (disease causing) and dbSNP (non-synonymous mutations). They were sorted alphabetically by their original amino acid to mix the HGMD and dbSNP entries, so that it was no longer possible to remember which database each mutation came from. <xr id="data"></xr> gives a short summary of these mutations in terms of sidechain changes and secondary structure:

<figtable id="data">

Comparison of Changes from Wild Type to Mutation Type
Mutation | Sidechain Polarity (from) | Sidechain Polarity (to) | Sidechain Charge (from) | Sidechain Charge (to) | Visualization | Sec. Struc. | Uniprot Info
Arg233Trp | basic polar | nonpolar | positive | neutral | Canavan Mutation Arg233Trp.png | LOOP | -
Asn121Asp | polar | acidic polar | neutral | negative | Canavan Mutation Asn121Asp.png | LOOP | -
His21Pro | basic polar | nonpolar | neutral | neutral | Canavan Mutation His21Pro.png | LOOP | metal binding
Ile157Thr | nonpolar | polar | neutral | neutral | Canavan Mutation Ile157Thr.png | LOOP | -
Leu272Pro | nonpolar | nonpolar | neutral | neutral | Mutation Leu272Pro.png | LOOP | -
Lys213Glu | basic polar | acidic polar | positive | negative | Canavan Mutation Lys213Glu.png | LOOP | -
Pro149Ala | nonpolar | nonpolar | neutral | neutral | Canavan Mutation Pro149Ala.png | LOOP near HELIX | -
Pro257Arg | nonpolar | basic polar | neutral | positive | Canavan Mutation Pro257Arg.png | LOOP | -
Thr166Ile | polar | nonpolar | neutral | neutral | Canavan Mutation Thr166Ile.png | LOOP near HELIX | in binding region
Tyr288Cys | polar | nonpolar | neutral | neutral | Canavan Mutation Tyr288Cys.png | HELIX | binding site
Investigation of changes in sidechain, secondary structure, or functional residue (as listed in Uniprot). For each mutation the wild type is colored green and the mutation is colored blue.

</figtable>

From <xr id="data"></xr> above, the following observations can be made:

  • Arg233Trp: The sidechain polarity changes from basic polar to nonpolar. The charge changes from positive to neutral. The mutation site is located within a LOOP region, and Uniprot gives no indication that it is a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
  • Asn121Asp: The sidechain polarity changes from polar to acidic polar. The charge changes from neutral to negative. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
  • His21Pro: The sidechain polarity changes from basic polar to nonpolar. The charge does not change. The mutation site is located within a LOOP region, but within the active center. This position is needed for zinc binding. Without further investigation, the first impression is that it is disease causing.
  • Ile157Thr: The sidechain polarity changes from nonpolar to polar. The charge does not change. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is not disease causing.
  • Leu272Pro: The sidechain polarity does not change. The charge does not change. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is not disease causing.
  • Lys213Glu: The sidechain polarity changes from basic polar to acidic polar. The charge changes from positive to negative. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is disease causing.
  • Pro149Ala: The sidechain polarity does not change. The charge does not change. The mutation site is located within a LOOP region, quite near a HELIX, and there is no information on whether it is a functional residue. Since proline is known as a typical helix breaker, this proline may be needed at this position. Without further investigation, the first impression is that it is possibly disease causing.
  • Pro257Arg: The sidechain polarity changes from nonpolar to basic polar. The charge changes from neutral to positive. The mutation site is located within a LOOP region, and there is no information on whether it is a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
  • Thr166Ile: The sidechain polarity changes from polar to nonpolar. The charge does not change. The mutation site is located within a LOOP region next to a HELIX. This position is known to be part of the binding region of aspartoacylase. Without further investigation, the first impression is that it is disease causing.
  • Tyr288Cys: The sidechain polarity changes from polar to nonpolar. The charge does not change. The mutation site is located within a HELIX. This position is known to be a binding site. Without further investigation, the first impression is that it is disease causing.
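
The sidechain comparison in the list above can be reproduced with a small lookup. The following Python sketch is only an illustration: the property tables cover just the residues that occur in the ten mutations and follow the classification used in the table, and the helper names `parse_mutation` and `describe_change` are made up for this example.

```python
import re

# Sidechain classes for the residues occurring in the ten mutations,
# following the classification used in the table (not a complete list).
POLARITY = {
    "Arg": "basic polar", "Lys": "basic polar", "His": "basic polar",
    "Asp": "acidic polar", "Glu": "acidic polar",
    "Asn": "polar", "Thr": "polar", "Tyr": "polar",
    "Ile": "nonpolar", "Leu": "nonpolar", "Pro": "nonpolar",
    "Ala": "nonpolar", "Trp": "nonpolar", "Cys": "nonpolar",
}
# Residues not listed here are treated as neutral (His is neutral in the table).
CHARGE = {"Arg": "positive", "Lys": "positive", "Asp": "negative", "Glu": "negative"}


def parse_mutation(label):
    """Split a label such as 'Arg233Trp' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", label)
    if match is None:
        raise ValueError(f"unexpected mutation label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def describe_change(label):
    """Return the polarity and charge of wild type and mutant for one mutation."""
    wild_type, position, mutant = parse_mutation(label)
    polarity = (POLARITY[wild_type], POLARITY[mutant])
    charge = (CHARGE.get(wild_type, "neutral"), CHARGE.get(mutant, "neutral"))
    return {
        "position": position,
        "polarity": polarity,
        "charge": charge,
        "polarity_changed": polarity[0] != polarity[1],
        "charge_changed": charge[0] != charge[1],
    }


for label in ("Arg233Trp", "Leu272Pro", "Lys213Glu"):
    print(label, describe_change(label))
```

Such a lookup only covers the physicochemical columns; the secondary structure and Uniprot annotations still have to be looked up separately.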

For further investigation, the scores from the BLOSUM 62, PAM 1/250 and PSSM matrices were taken into account (compare <xr id="matrices"></xr>), as well as the conservation of aspartoacylase across the homologous sequences listed in <xr id="msa"></xr>. The PSSM was calculated for aspartoacylase using PsiBlast with 5 iterations.
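
As a rough sketch of how such a PSSM could be produced with the BLAST+ command line tool `psiblast`: the query file name, the database name and the output file name below are assumptions and are not taken from this Task; only the 5 iterations are. The command is merely assembled and printed here, and `run_psiblast` has to be called explicitly to execute it.

```python
import subprocess


def build_psiblast_command(query_fasta, database, pssm_out, iterations=5):
    """Assemble a psiblast call that writes the final PSSM as an ASCII table."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(iterations),
        "-out_ascii_pssm", pssm_out,
    ]


def run_psiblast(command):
    """Run the assembled command; needs BLAST+ and a formatted database."""
    subprocess.run(command, check=True)


# Hypothetical file and database names, for illustration only.
command = build_psiblast_command("aspartoacylase.fasta", "nr", "aspartoacylase.pssm")
print(" ".join(command))
```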

<figtable id="matrices">

Comparison using Matrix Information
Mutation | BLOSUM 62 | PAM 1/250 | PSSM Matrix | PSSM Conservation (WT) | PSSM Conservation (mut) | MSA Conservation (WT) | MSA Conservation (mut)
Arg233Trp | -3 | 2 | -5 | 30% | 0% | 0.95 | 0.0
Asn121Asp | 1 | 2 | -1 | 15% | 2% | 1.0 | 0.0
His21Pro | -2 | 0 | -9 | 99% | 0% | 1.0 | 0.0
Ile157Thr | -1 | 0 | 1 | 4% | 8% | 1.0 | 0.0
Leu272Pro | -3 | -3 | -3 | 4% | 2% | 1.0 | 0.0
Lys213Glu | 1 | 1 | 1 | 8% | 5% | 0.95 | 0.05
Pro149Ala | -1 | 1 | 0 | 10% | 7% | 1.0 | 0.0
Pro257Arg | -2 | 0 | 1 | 23% | 8% | 1.0 | 0.0
Thr166Ile | -1 | 0 | -3 | 16% | 1% | 1.0 | 0.0
Tyr288Cys | -2 | 0 | -4 | 81% | 0% | 1.0 | 0.0
Representation of effects of a mutation using standard matrices like BLOSUM and PAM. The PSSM was calculated via PsiBlast.
A multiple sequence alignment (MSA) was made to assess the evolutionary conservation of the sites compared to homologous sequences from mammalian species. WT = wild type, mut = mutation.

</figtable>

Considering <xr id="matrices"></xr> above, a closer look at the matrices is taken. First, a short recap: In BLOSUM, positive values indicate a common, chemically plausible substitution. In addition, common amino acids carry a low weight and rare amino acids a high weight. In PAM, strongly negative values correspond to a high mismatch penalty for the mutation. If the PSSM conservation is high for the wild type and, relative to this, very low for the mutation type, the PSSM score is negative, which indicates that the mutation is more likely to be disease causing. The more negative the score, the higher the likelihood of a disease-causing effect. The multiple sequence alignment of homologous aspartoacylase sequences shows the conservation of the original (wild type) and the mutated amino acid. A high value for the mutation type would indicate a normal variation within the range of homologous sequences from other mammalian species.
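
The MSA conservation values can be read as the fraction of aligned sequences that carry a given residue at the site. The page does not state exactly how they were computed, so the following Python sketch rests on that assumption; ignoring gap characters is a further assumption, and the alignment column is a made-up toy example, not the real alignment.

```python
from collections import Counter


def column_frequencies(column):
    """Relative frequency of each residue in one alignment column, gaps ignored."""
    residues = [residue for residue in column if residue != "-"]
    counts = Counter(residues)
    total = len(residues)
    return {residue: count / total for residue, count in counts.items()}


# Toy column: 19 sequences with Lys (K) and one with Glu (E) at the site.
toy_column = "K" * 19 + "E"
frequencies = column_frequencies(toy_column)

wild_type_conservation = frequencies.get("K", 0.0)
mutant_conservation = frequencies.get("E", 0.0)
print(wild_type_conservation, mutant_conservation)
```

A mutant residue that never appears in the column simply gets a conservation of 0.0, which is the case for most mutations in the table.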

  • Arg233Trp: The first impression was that it is possibly disease causing. BLOSUM gives a negative value, and the PAM value is not really significant. The PSSM conservation of the wild type amino acid is high and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression therefore changes from possibly disease causing to disease causing.
  • Asn121Asp: The first impression was that it is possibly disease causing. BLOSUM gives a positive value, and the PAM value is not really significant. The PSSM conservation of the wild type amino acid is moderately high and the score is slightly negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is possibly disease causing.
  • His21Pro: The first impression was that it is disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation of the wild type amino acid is very high and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is disease causing.
  • Ile157Thr: The first impression was that it is not disease causing. BLOSUM gives a slightly negative value, PAM gives zero. The PSSM conservation is higher for the mutation type than for the wild type, so the score is positive. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is not disease causing.
  • Leu272Pro: The first impression was that it is not disease causing. BLOSUM gives a negative value, and PAM gives a negative value as well. The PSSM conservation is slightly higher for the wild type amino acid and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression therefore changes from not disease causing to possibly disease causing.
  • Lys213Glu: The first impression was that it is disease causing. BLOSUM gives a positive value, and PAM gives a positive value as well. The PSSM conservation is slightly higher for the wild type amino acid and the score is slightly positive. The mutated amino acid occurs in some of the homologous sequences. Since the sidechain changes noted in the first impression are so extreme, the second impression remains that it is disease causing.
  • Pro149Ala: The first impression was that it is possibly disease causing. BLOSUM gives a negative value, PAM gives a positive value. The PSSM conservation is slightly higher for the wild type amino acid and the score is zero. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is possibly disease causing.
  • Pro257Arg: The first impression was that it is possibly disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation is higher for the wild type amino acid and the score is slightly positive. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is possibly disease causing.
  • Thr166Ile: The first impression was that it is disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation of the wild type amino acid is high and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is disease causing.
  • Tyr288Cys: The first impression was that it is disease causing. BLOSUM gives a negative value, PAM gives zero. The PSSM conservation of the wild type amino acid is very high and the score is negative. The mutated amino acid does not occur in any of the homologous sequences. The second impression remains that it is disease causing.
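
To see at a glance how many matrix-based indications point in the same direction for each mutation, the values from the matrix table can be tallied. The three "warning signs" counted below (negative BLOSUM 62 value, negative PSSM score, mutant residue absent from the MSA) are a simplification chosen for this sketch and not the decision rule used for the second impression, which also took the sidechain changes into account.

```python
# (BLOSUM 62, PSSM score, MSA conservation of the mutant residue),
# copied from the matrix table.
SCORES = {
    "Arg233Trp": (-3, -5, 0.0),
    "Asn121Asp": (1, -1, 0.0),
    "His21Pro": (-2, -9, 0.0),
    "Ile157Thr": (-1, 1, 0.0),
    "Leu272Pro": (-3, -3, 0.0),
    "Lys213Glu": (1, 1, 0.05),
    "Pro149Ala": (-1, 0, 0.0),
    "Pro257Arg": (-2, 1, 0.0),
    "Thr166Ile": (-1, -3, 0.0),
    "Tyr288Cys": (-2, -4, 0.0),
}


def count_warning_signs(blosum, pssm, msa_mutant):
    """Count how many of the three illustrative warning signs apply."""
    signs = [blosum < 0, pssm < 0, msa_mutant == 0.0]
    return sum(signs)


tally = {name: count_warning_signs(*values) for name, values in SCORES.items()}
for name, signs in sorted(tally.items(), key=lambda item: -item[1]):
    print(f"{name}: {signs} of 3")
```

With these values Lys213Glu shows none of the three signs, which fits the text: its classification as disease causing rests on the sidechain change, not on the matrices.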

The following <xr id="msa"></xr> lists the homologous sequences used for the multiple sequence alignment conservation approach. They were obtained from a Blast search with aspartoacylase in Uniprot against the mammalian database. Only one sequence per species was used to prevent a bias towards species with several sequences.

<figtable id="msa">

Mammalian Homologous Sequences
Homolog | Protein | Organism
H2QBW4 | Aspartoacylase (Canavan disease) | Pan troglodytes
G3QQC1 | Uncharacterized protein | Gorilla gorilla
Q5R9E0 | Aspartoacylase | Pongo abelii
G1S5Z4 | Uncharacterized protein | Nomascus leucogenys
G7PT66 | Aspartoacylase | Macaca fascicularis
F6WMI4 | Uncharacterized protein | Equus caballus
B1PK17 | Aspartoacylase | Sus scrofa
P46446 | Aspartoacylase | Bos taurus
M3Y3U3 | Uncharacterized protein | Mustela putorius furo
D2HZN6 | Uncharacterized protein (Fragment) | Ailuropoda melanoleuca
E2R8M6 | Uncharacterized protein | Canis familiaris
M3X3I5 | Uncharacterized protein | Felis catus
G1SPT6 | Uncharacterized protein | Oryctolagus cuniculus
I3N0V6 | Uncharacterized protein | Spermophilus tridecemlineatus
G5B939 | Aspartoacylase | Heterocephalus glaber
G3TAV8 | Uncharacterized protein | Loxodonta africana
G1P679 | Uncharacterized protein | Myotis lucifugus
H0WW85 | Uncharacterized protein | Otolemur garnettii
H0UYA8 | Uncharacterized protein (Fragment) | Cavia porcellus
Q9R1T5 | Aspartoacylase | Rattus norvegicus
Q8R3P0 | Aspartoacylase | Mus musculus
List of all homologous sequences to aspartoacylase found by a Blast search against a mammalian database.

</figtable>
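
Keeping only one sequence per species can be sketched as a simple filter over the ranked Blast hits. This is a minimal illustration, assuming the hits are already sorted best first; the function name `one_per_species` and the accessions starting with `HYPO` are invented for the example, while the other accessions come from the table above.

```python
def one_per_species(hits):
    """Keep the first (best-ranked) hit for each organism."""
    seen = set()
    kept = []
    for accession, organism in hits:
        if organism not in seen:
            seen.add(organism)
            kept.append((accession, organism))
    return kept


# Hits in ranked order; the "HYPO..." entries are invented duplicates.
hits = [
    ("H2QBW4", "Pan troglodytes"),
    ("HYPO01", "Pan troglodytes"),
    ("Q5R9E0", "Pongo abelii"),
    ("Q8R3P0", "Mus musculus"),
    ("HYPO02", "Mus musculus"),
]
unique_hits = one_per_species(hits)
for accession, organism in unique_hits:
    print(accession, organism)
```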

Scoring Approach

The next step was to use different methods available online to check whether they predict a mutation to be disease causing or to affect protein function. For this approach SIFT, PolyPhen, MutationTaster and SNAP2 were used. The individual results, with the prediction probabilities of each method, can be found in the Supplement at the end of this Task. The results concerning protein function are summarized in <xr id="comparison"></xr> in the next section.

Comparison

To give a better overview of all methods used in this Task, the following <xr id="comparison"></xr> shows the prediction for each mutation, with the color coding indicating:

  • red - predicted to be disease causing (or having a functional effect for the mutation)
  • yellow - predicted to be possibly disease causing (listed as "maybe")
  • green - predicted to be not disease causing and therefore neutral


<figtable id="comparison">

Prediction of Different Approaches and Validation
Mutation | First Personal Impression | Second Personal Impression | SIFT | PolyPhen | MutationTaster | SNAP | Validation
Arg233Trp | maybe | dis.caus. | dis.caus. dis.caus. dis.caus. (only three tool predictions listed) | not sure (dbSNP data)
Asn121Asp | maybe | maybe | dis.caus. | dis.caus. | dis.caus. | dis.caus. | (dbSNP data) but another mutation known to be disease causing
His21Pro | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitely disease causing (HGMD data)
Ile157Thr | neutral | neutral | neutral | neutral | dis.caus. | neutral | (dbSNP data) but SNPdbe without a reference to Canavan Disease
Leu272Pro | neutral | maybe | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitely disease causing (HGMD data)
Lys213Glu | dis.caus. | dis.caus. | neutral | neutral | dis.caus. | neutral | definitely disease causing (HGMD data)
Pro149Ala | maybe | maybe | neutral | maybe | dis.caus. | neutral | not sure (dbSNP data)
Pro257Arg | maybe | maybe | neutral | maybe | dis.caus. | neutral | not sure (dbSNP data)
Thr166Ile | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitely disease causing (HGMD data)
Tyr288Cys | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | dis.caus. | definitely disease causing (HGMD data)
Resulting predictions for each method. Red = disease causing / functional effect, yellow = possibly disease causing, green = neutral, no functional effect.
Validation - using the information from Task 07.

</figtable>

As can be seen in <xr id="comparison"></xr>, the personal impressions from the wild type to mutation type approach are quite comparable to the predictions of the methods available online. It is worth running those methods if the personal impression remains at "maybe disease causing". Using matrices such as BLOSUM, PAM or PSSM clearly improved the personal impression when compared to the validation result. Interestingly, Lys213Glu was predicted correctly only by MutationTaster and by the personal impression from the simple approach, which shows that a simple approach (looking at the data) can be a good first filter.
A further validation of the positions showed that position 121 (here Asn->Asp) is known to be associated with Canavan Disease in HGMD (Asn->Ile). This supports the assumption that a mutation at this position is disease causing. Position 157 (here Ile->Thr) can also be found in SNPdbe, but without any association to Canavan Disease. This leads to the assumption that it is neutral. For all other positions no further validation was possible.
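
The comparison can be made a little more concrete by counting, for the five mutations validated through HGMD, how often each approach called them disease causing. The calls below are copied from the comparison table; counting "maybe" as a miss is an assumption of this sketch, and five mutations are far too few for a real benchmark.

```python
# The five mutations validated as disease causing through HGMD.
HGMD_MUTATIONS = ["His21Pro", "Leu272Pro", "Lys213Glu", "Thr166Ile", "Tyr288Cys"]

# Calls per approach, in the order of HGMD_MUTATIONS, copied from the table.
CALLS = {
    "First impression": ["dis.caus.", "neutral", "dis.caus.", "dis.caus.", "dis.caus."],
    "Second impression": ["dis.caus.", "maybe", "dis.caus.", "dis.caus.", "dis.caus."],
    "SIFT": ["dis.caus.", "dis.caus.", "neutral", "dis.caus.", "dis.caus."],
    "PolyPhen": ["dis.caus.", "dis.caus.", "neutral", "dis.caus.", "dis.caus."],
    "MutationTaster": ["dis.caus.", "dis.caus.", "dis.caus.", "dis.caus.", "dis.caus."],
    "SNAP": ["dis.caus.", "dis.caus.", "neutral", "dis.caus.", "dis.caus."],
}


def count_recovered(calls):
    """Number of HGMD mutations called disease causing ('maybe' counts as a miss)."""
    return sum(call == "dis.caus." for call in calls)


recovered = {approach: count_recovered(calls) for approach, calls in CALLS.items()}
for approach, hits in recovered.items():
    print(f"{approach}: {hits} of {len(HGMD_MUTATIONS)}")
```

Note that MutationTaster also called every dbSNP mutation in the table disease causing, so recovering all five HGMD entries does not by itself show that it discriminates better than the other approaches.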

Supplement
