Canavan Disease: Task 08 - Sequence-based Mutation Analysis

Sequence-based mutation analysis is important because mutations may affect protein stability or function. The analysis can also be used to predict whether a mutation is disease causing.

Wild Type - Mutant Approach

To get a feeling for interpreting the data of different SNPs, ten amino acid mutations were randomly chosen from HGMD (disease causing) and dbSNP (non-synonymous mutations). They were sorted in ascending order by their original amino acid so that HGMD and dbSNP entries are mixed and it is no longer possible to remember which database a mutation came from. <xr id="data"></xr> gives a short summary of these mutations in terms of sidechain changes and secondary structure:

Comparison of Changes from Wild Type to Mutation Type
Mutation	Sidechain Polarity (from)	Sidechain Polarity (to)	Sidechain Charge (from)	Sidechain Charge (to)	Visualization	Sec. Struc.	Uniprot Info
Arg233Trp	basic polar	nonpolar	positive	neutral		LOOP	-
Asn121Asp	polar	acidic polar	neutral	negative		LOOP	-
His21Pro	basic polar	nonpolar	neutral	neutral		LOOP	metal binding
Ile157Thr	nonpolar	polar	neutral	neutral		LOOP	-
Leu272Pro	nonpolar	nonpolar	neutral	neutral		LOOP	-
Lys213Glu	basic polar	acidic polar	positive	negative		LOOP	-
Pro149Ala	nonpolar	nonpolar	neutral	neutral		LOOP near HELIX	-
Pro257Arg	nonpolar	basic polar	neutral	positive		LOOP	-
Thr166Ile	polar	nonpolar	neutral	neutral		LOOP near HELIX	in binding region
Tyr288Cys	polar	nonpolar	neutral	neutral		HELIX	binding site

Investigation of changes in sidechain, secondary structure and functional residue annotation (as listed in Uniprot). For each mutation the wild type is colored green, the mutation is colored blue.

</figtable>
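The pooling and sorting step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `parse_mutation` and `pooled_and_sorted` are hypothetical, and the assignment of each mutation to HGMD or dbSNP is taken from the Validation column of the comparison table later in this Task.

```python
import random
import re

# Source labels as given in the Validation column of the comparison table.
HGMD = ["His21Pro", "Leu272Pro", "Lys213Glu", "Thr166Ile", "Tyr288Cys"]
DBSNP = ["Arg233Trp", "Asn121Asp", "Ile157Thr", "Pro149Ala", "Pro257Arg"]

# Three-letter wild type, position, three-letter mutant, e.g. "Arg233Trp".
MUTATION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_mutation(name):
    """Split e.g. 'Arg233Trp' into ('Arg', 233, 'Trp')."""
    match = MUTATION.match(name)
    if match is None:
        raise ValueError(f"not a three-letter mutation name: {name!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def pooled_and_sorted(*sources):
    """Pool all mutations and sort by wild-type residue, then by position."""
    pooled = [name for source in sources for name in source]
    random.shuffle(pooled)  # the order of the input lists carries no information
    return sorted(pooled, key=lambda name: parse_mutation(name)[:2])


mutations = pooled_and_sorted(HGMD, DBSNP)
print(mutations)
```

Sorting by the wild-type residue interleaves the two sources, so the position in the list no longer hints at the database of origin.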

Interpreting <xr id="data"></xr> above, the following first impressions can be derived:

Arg233Trp: The sidechain polarity changes from basic polar to nonpolar, and the charge changes from positive to neutral. The mutation site lies within a LOOP region, and Uniprot does not annotate it as a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
Asn121Asp: The sidechain polarity changes from polar to acidic polar, and the charge changes from neutral to negative. The mutation site lies within a LOOP region, and Uniprot does not annotate it as a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
His21Pro: The sidechain polarity changes from basic polar to nonpolar. The charge does not change. The mutation site lies within a LOOP region but within the active center; this position is needed for zinc binding. Without further investigation, the first impression is that it is disease causing.
Ile157Thr: The sidechain polarity changes from nonpolar to polar. The charge does not change. The mutation site lies within a LOOP region, and Uniprot does not annotate it as a functional residue. Without further investigation, the first impression is that it is not disease causing.
Leu272Pro: Neither the sidechain polarity nor the charge changes. The mutation site lies within a LOOP region, and Uniprot does not annotate it as a functional residue. Without further investigation, the first impression is that it is not disease causing.
Lys213Glu: The sidechain polarity changes from basic polar to acidic polar, and the charge changes from positive to negative. The mutation site lies within a LOOP region, and Uniprot does not annotate it as a functional residue. Without further investigation, the first impression is that it is disease causing.
Pro149Ala: Neither the sidechain polarity nor the charge changes. The mutation site lies within a LOOP region quite near a HELIX, and Uniprot does not annotate it as a functional residue. Since proline is known to be a typical HELIX breaker, this proline may be structurally necessary at this position. Without further investigation, the first impression is that it is possibly disease causing.
Pro257Arg: The sidechain polarity changes from nonpolar to basic polar, and the charge changes from neutral to positive. The mutation site lies within a LOOP region, and Uniprot does not annotate it as a functional residue. Without further investigation, the first impression is that it is possibly disease causing.
Thr166Ile: The sidechain polarity changes from polar to nonpolar. The charge does not change. The mutation site lies within a LOOP region next to a HELIX. This position is known to be part of the binding region of aspartoacylase. Without further investigation, the first impression is that it is disease causing.
Tyr288Cys: The sidechain polarity changes from polar to nonpolar. The charge does not change. The mutation site lies within a HELIX. This position is known to be a binding site. Without further investigation, the first impression is that it is disease causing.
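The sidechain comparison used for these first impressions can be written down as a small lookup. The sketch below assumes the same classification as the table above (for example histidine counted as neutral and cysteine as nonpolar) and only covers the residues that occur in the ten mutations; the function name `sidechain_change` is hypothetical.

```python
# Sidechain polarity classes as used in the table above (three-letter codes).
POLARITY = {
    "Ala": "nonpolar", "Arg": "basic polar", "Asn": "polar",
    "Asp": "acidic polar", "Cys": "nonpolar", "Glu": "acidic polar",
    "His": "basic polar", "Ile": "nonpolar", "Leu": "nonpolar",
    "Lys": "basic polar", "Pro": "nonpolar", "Thr": "polar",
    "Trp": "nonpolar", "Tyr": "polar",
}

# Residues not listed here are treated as neutral, as in the table.
CHARGE = {"Arg": "positive", "Lys": "positive", "Asp": "negative", "Glu": "negative"}


def sidechain_change(wild_type, mutant):
    """Describe the polarity and charge change from wild type to mutant."""
    polarity = (POLARITY[wild_type], POLARITY[mutant])
    charge = (CHARGE.get(wild_type, "neutral"), CHARGE.get(mutant, "neutral"))
    return {
        "polarity": polarity,
        "charge": charge,
        "polarity_changed": polarity[0] != polarity[1],
        "charge_changed": charge[0] != charge[1],
    }


print(sidechain_change("Lys", "Glu"))
print(sidechain_change("Leu", "Pro"))
```

A lookup like this only captures polarity and charge. Structural effects such as the helix-breaking behavior of proline still have to be judged separately.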

For further investigation, the scores of substitution matrices such as BLOSUM 62 and PAM 1/250 and of a PSSM were taken into account (compare <xr id="matrices"></xr>), as well as the conservation of aspartoacylase among homologous sequences from the species listed in <xr id="msa"></xr>. The PSSM was calculated for aspartoacylase using PsiBlast with 5 iterations.

Comparison using Matrix Information
Mutation	BLOSUM 62	PAM 1/250	PSSM Matrix	PSSM Conservation (WT)	PSSM Conservation (mut)	MSA Conservation (WT)	MSA Conservation (mut)
Arg233Trp	-3	2	-5	30%	0%	0.95	0.0
Asn121Asp	1	2	-1	15%	2%	1.0	0.0
His21Pro	-2	0	-9	99%	0%	1.0	0.0
Ile157Thr	-1	0	1	4%	8%	1.0	0.0
Leu272Pro	-3	-3	-3	4%	2%	1.0	0.0
Lys213Glu	1	1	1	8%	5%	0.95	0.05
Pro149Ala	-1	1	0	10%	7%	1.0	0.0
Pro257Arg	-2	0	1	23%	8%	1.0	0.0
Thr166Ile	-1	0	-3	16%	1%	1.0	0.0
Tyr288Cys	-2	0	-4	81%	0%	1.0	0.0

Representation of the effects of a mutation using standard matrices like BLOSUM and PAM. The PSSM was calculated via PsiBlast.
A multiple sequence alignment (MSA) was made to look for evolutionary conservation of the sites compared to homologous sequences from mammalian species. WT=wild type, mut=mutation.

</figtable>

Considering <xr id="matrices"></xr> above, a closer look at the matrices is taken. A short reminder first: In BLOSUM, positive values indicate a chemically common substitution; in addition, common amino acids have a low weight and rare amino acids a high weight. In PAM, highly negative values correspond to a high mismatch penalty for the mutation. If the PSSM conservation is high for the wild type and, relative to this value, very low for the mutant, the PSSM score is negative and indicates that the mutation is more likely to be disease causing. The more negative the score, the higher the likelihood of a disease-causing effect. The multiple sequence alignment of homologous aspartoacylase sequences shows the conservation of the original (wild type) and the mutated amino acid. A high value for the mutant would indicate a normal substitution that lies within the range of variation seen in the homologous sequences from other mammalian species.
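Why positive matrix values mark common substitutions and negative values mark rare ones follows from the log-odds form of such scores. BLOSUM 62, as commonly distributed, is scaled in half-bit units. The sketch below uses hypothetical frequencies chosen only to show how the sign arises; they are not taken from the real matrix, and the function name `half_bit_score` is an assumption.

```python
import math


def half_bit_score(p_ij, q_i, q_j):
    """Log-odds score in half-bit units, rounded to the nearest integer."""
    odds = p_ij / (q_i * q_j)  # observed pair frequency vs. chance expectation
    return round(2 * math.log2(odds))


# Hypothetical frequencies, chosen only to illustrate the sign of the score.
rare_pair = half_bit_score(p_ij=0.002, q_i=0.05, q_j=0.05)   # seen less often than by chance
common_pair = half_bit_score(p_ij=0.01, q_i=0.05, q_j=0.05)  # seen more often than by chance
print(rare_pair, common_pair)
```

A pair observed less often than expected by chance gets a negative score, and a pair observed more often gets a positive one.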

Arg233Trp: The first impression was that it is possibly disease causing. BLOSUM gives a negative value; PAM shows no really significant value. The PSSM conservation of the wild type amino acid is high and the score is negative. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression therefore changes from possibly disease causing to disease causing.
Asn121Asp: The first impression was that it is possibly disease causing. BLOSUM gives a positive value; PAM shows no really significant value. The PSSM conservation of the wild type amino acid is moderately high and the score is slightly negative. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression remains that it is possibly disease causing.
His21Pro: The first impression was that it is disease causing. BLOSUM gives a negative value; PAM shows zero. The PSSM conservation of the wild type amino acid is very high and the score is negative. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression remains that it is disease causing.
Ile157Thr: The first impression was that it is not disease causing. BLOSUM gives a slightly negative value; PAM shows zero. The PSSM conservation is higher for the mutant than for the wild type, therefore the score is positive. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression remains that it is not disease causing.
Leu272Pro: The first impression was that it is not disease causing. BLOSUM gives a negative value, and PAM also shows a negative value. The PSSM conservation is slightly higher for the wild type amino acid and the score is negative. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression therefore changes from not disease causing to possibly disease causing.
Lys213Glu: The first impression was that it is disease causing. BLOSUM gives a positive value, and PAM also shows a positive value. The PSSM conservation is slightly higher for the wild type amino acid and the score is slightly positive. The mutated amino acid occurs at this position in some homologous sequences. Since the sidechain changes noted in the first impression are so extreme, the second impression remains that it is disease causing.
Pro149Ala: The first impression was that it is possibly disease causing. BLOSUM gives a negative value; PAM shows a positive value. The PSSM conservation is slightly higher for the wild type amino acid and the score is zero. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression remains that it is possibly disease causing.
Pro257Arg: The first impression was that it is possibly disease causing. BLOSUM gives a negative value; PAM shows zero. The PSSM conservation is higher for the wild type amino acid and the score is slightly positive. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression remains that it is possibly disease causing.
Thr166Ile: The first impression was that it is disease causing. BLOSUM gives a negative value; PAM shows zero. The PSSM conservation of the wild type amino acid is high and the score is negative. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression remains that it is disease causing.
Tyr288Cys: The first impression was that it is disease causing. BLOSUM gives a negative value; PAM shows zero. The PSSM conservation of the wild type amino acid is very high and the score is negative. The mutated amino acid does not occur at this position in any of the homologous sequences. The second impression remains that it is disease causing.
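The kind of evidence weighed in the second impressions can be made explicit with a simple tally. The sketch below is not the procedure used for the impressions above. It is a hypothetical count of four indicators with unweighted, assumed cut-offs, applied to the values from the matrix table. The PAM column is carried along but not used, because most of its values are zero.

```python
# Values from the matrix table:
# (BLOSUM 62, PAM, PSSM score, PSSM cons. WT %, PSSM cons. mut %, MSA WT, MSA mut)
SCORES = {
    "Arg233Trp": (-3, 2, -5, 30, 0, 0.95, 0.0),
    "Asn121Asp": (1, 2, -1, 15, 2, 1.0, 0.0),
    "His21Pro": (-2, 0, -9, 99, 0, 1.0, 0.0),
    "Ile157Thr": (-1, 0, 1, 4, 8, 1.0, 0.0),
    "Leu272Pro": (-3, -3, -3, 4, 2, 1.0, 0.0),
    "Lys213Glu": (1, 1, 1, 8, 5, 0.95, 0.05),
    "Pro149Ala": (-1, 1, 0, 10, 7, 1.0, 0.0),
    "Pro257Arg": (-2, 0, 1, 23, 8, 1.0, 0.0),
    "Thr166Ile": (-1, 0, -3, 16, 1, 1.0, 0.0),
    "Tyr288Cys": (-2, 0, -4, 81, 0, 1.0, 0.0),
}


def evidence_against(blosum, pam, pssm, cons_wt, cons_mut, msa_wt, msa_mut):
    """Count how many of four simple indicators point towards a damaging change."""
    indicators = [
        blosum < 0,          # substitution is uncommon according to BLOSUM 62
        pssm < 0,            # substitution is disfavoured at this position
        cons_wt > cons_mut,  # wild type is the more conserved residue in the PSSM
        msa_mut == 0.0,      # mutant residue never seen in the mammalian homologs
    ]
    return sum(indicators)


tally = {name: evidence_against(*values) for name, values in SCORES.items()}
for name, count in tally.items():
    print(f"{name}: {count} of 4 indicators")
```

Under these assumed cut-offs Lys213Glu collects the fewest indicators of all ten mutations. This matches the observation that the matrices alone make this substitution look tolerated and that the strong sidechain change is what drives the disease-causing impression.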

<xr id="msa"></xr> lists the homologous sequences used for the multiple sequence alignment conservation approach. They result from a Blast search with aspartoacylase in Uniprot against the mammalian database. Only one sequence per species was used to prevent a bias towards species with several sequences.

Mammalian Homologous Sequences
Homolog	Protein	Organism
H2QBW4	Aspartoacylase (Canavan disease)	Pan troglodytes
G3QQC1	Uncharacterized protein	Gorilla gorilla
Q5R9E0	Aspartoacylase	Pongo abelii
G1S5Z4	Uncharacterized protein	Nomascus leucogenys
G7PT66	Aspartoacylase	Macaca fascicularis
F6WMI4	Uncharacterized protein	Equus caballus
B1PK17	Aspartoacylase	Sus scrofa
P46446	Aspartoacylase	Bos taurus
M3Y3U3	Uncharacterized protein	Mustela putorius furo
D2HZN6	Uncharacterized protein (Fragment)	Ailuropoda melanoleuca
E2R8M6	Uncharacterized protein	Canis familiaris
M3X3I5	Uncharacterized protein	Felis catus
G1SPT6	Uncharacterized protein	Oryctolagus cuniculus
I3N0V6	Uncharacterized protein	Spermophilus tridecemlineatus
G5B939	Aspartoacylase	Heterocephalus glaber
G3TAV8	Uncharacterized protein	Loxodonta africana
G1P679	Uncharacterized protein	Myotis lucifugus
H0WW85	Uncharacterized protein	Otolemur garnettii
H0UYA8	Uncharacterized protein (Fragment)	Cavia porcellus
Q9R1T5	Aspartoacylase	Rattus norvegicus
Q8R3P0	Aspartoacylase	Mus musculus

List of all homologous sequences to aspartoacylase found by a Blast search against a mammalian database.

</figtable>
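The MSA conservation values can be understood as per-column residue fractions over these homologs. The sketch below is a minimal illustration under assumptions: the alignment column is hypothetical (not extracted from the real alignment), gaps are ignored, and the function name `column_conservation` is made up for this example.

```python
from collections import Counter


def column_conservation(column, wild_type, mutant):
    """Return the fraction of homologs carrying the wild-type and the mutant residue."""
    residues = [residue for residue in column if residue != "-"]  # ignore gaps
    if not residues:
        raise ValueError("alignment column contains only gaps")
    counts = Counter(residues)
    total = len(residues)
    return counts[wild_type] / total, counts[mutant] / total


# Hypothetical alignment column for 21 homologs (one-letter codes):
# 20 sequences carry lysine (K), one carries glutamate (E).
column = "K" * 20 + "E"
wt_fraction, mut_fraction = column_conservation(column, "K", "E")
print(round(wt_fraction, 2), round(mut_fraction, 2))
```

With 21 sequences, a single deviating homolog already lowers the wild type fraction to about 0.95, which is the step size visible in the MSA conservation columns.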

Scoring Approach

The next step was to use different methods available online to check whether they predict a mutation to be disease causing or to have an effect on protein function. For this approach SIFT, PolyPhen, MutationTaster and SNAP2 were used. The individual results with their prediction probabilities for each method can be found in the Supplement at the end of this Task. The results concerning protein function are summarized in <xr id="comparison"></xr> in the next section.

Comparison

To give a better overview of all methods used in this Task, the following <xr id="comparison"></xr> shows the prediction for each mutation. The color-coding indicates:

red - predicted to be disease causing (or having a functional effect for the mutation)
yellow - predicted to be possibly disease causing (listed as maybe)
green - predicted to be not disease causing and therefore neutral

Prediction of Different Approaches and Validation
Mutation	First Personal Impression	Second Personal Impression	SIFT	PolyPhen	MutationTaster	SNAP2		Validation
Arg233Trp	maybe	dis.caus.	dis.caus.	dis.caus.	dis.caus.	dis.caus.	<--	not sure (dbSNP data)
Asn121Asp	maybe	maybe	dis.caus.	dis.caus.	dis.caus.	dis.caus.	<--	(dbSNP data) but another mutation known to be disease causing
His21Pro	dis.caus.	dis.caus.	dis.caus.	dis.caus.	dis.caus.	dis.caus.	<--	definitively disease causing (HGMD data)
Ile157Thr	neutral	neutral	neutral	neutral	dis.caus.	neutral	<--	(dbSNP data) but SNPdbe without a reference to Canavan Disease
Leu272Pro	neutral	maybe	dis.caus.	dis.caus.	dis.caus.	dis.caus.	<--	definitively disease causing (HGMD data)
Lys213Glu	dis.caus.	dis.caus.	neutral	neutral	dis.caus.	neutral	<--	definitively disease causing (HGMD data)
Pro149Ala	maybe	maybe	neutral	maybe	dis.caus.	neutral	<--	not sure (dbSNP data)
Pro257Arg	maybe	maybe	neutral	maybe	dis.caus.	neutral	<--	not sure (dbSNP data)
Thr166Ile	dis.caus.	dis.caus.	dis.caus.	dis.caus.	dis.caus.	dis.caus.	<--	definitively disease causing (HGMD data)
Tyr288Cys	dis.caus.	dis.caus.	dis.caus.	dis.caus.	dis.caus.	dis.caus.	<--	definitively disease causing (HGMD data)

Resulting predictions for each method. Red = disease causing / functional effect, yellow = possibly disease causing, green = neutral, no functional effect.
Validation - using the information from Task 07.

</figtable>

As can be seen in <xr id="comparison"></xr>, the personal impressions from the wild type - mutant approach are quite comparable to the predictions of the available online methods. Running those methods is advisable whenever the personal impression remains at maybe disease causing. Using matrices such as BLOSUM, PAM or PSSM clearly moved the personal impression closer to the validation result. Interestingly, Lys213Glu was predicted correctly only by MutationTaster and by the personal impression from the simple approach, which shows that a simple approach (looking at the data) can be a good first filter.
A further validation of the positions showed that position 121 (here Asn->Asp) is known to be associated with Canavan Disease in HGMD (Asn->Ile). This leads to the assumption that a mutation at this position is disease causing. Position 157 (here Ile->Thr) can also be found in SNPdbe, but without any association to Canavan Disease. This leads to the assumption that it is neutral. For all other positions no further validation was possible.
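The agreement between the predictions and the validation can be counted directly from the comparison table. The sketch below restricts itself to the five mutations validated as disease causing by HGMD, since the dbSNP entries have no confirmed label. The names `METHODS`, `HGMD_PREDICTIONS` and `hits_per_method` are hypothetical.

```python
METHODS = ["first impression", "second impression", "SIFT",
           "PolyPhen", "MutationTaster", "SNAP"]

# Predictions for the five HGMD mutations, copied from the comparison table
# and listed in the order of METHODS.
HGMD_PREDICTIONS = {
    "His21Pro": ["dis.caus."] * 6,
    "Leu272Pro": ["neutral", "maybe", "dis.caus.", "dis.caus.", "dis.caus.", "dis.caus."],
    "Lys213Glu": ["dis.caus.", "dis.caus.", "neutral", "neutral", "dis.caus.", "neutral"],
    "Thr166Ile": ["dis.caus."] * 6,
    "Tyr288Cys": ["dis.caus."] * 6,
}


def hits_per_method(predictions):
    """Count, per method, how many HGMD mutations were called disease causing."""
    hits = dict.fromkeys(METHODS, 0)
    for calls in predictions.values():
        for method, call in zip(METHODS, calls):
            if call == "dis.caus.":
                hits[method] += 1
    return hits


hits = hits_per_method(HGMD_PREDICTIONS)
for method, count in hits.items():
    print(f"{method}: {count} of {len(HGMD_PREDICTIONS)}")
```

Only MutationTaster recovers all five HGMD mutations. It also calls every dbSNP mutation in the table disease causing, so this count alone says nothing about how well a method recognizes neutral variants.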

Supplement

Tasks

Link to Task 01: Canavan Disease
Link to Task 02: Alignments
Link to Task 03: Sequence-based Predictions
Link to Task 04: Structural Alignments
Link to Task 05: Homology Modelling
Link to Task 06: Protein Structure Prediction from Evolutionary Sequence Variation
Link to Task 07: Researching SNPs
Link to Task 08: Sequence-based Mutation Analysis
Link to Task 09: Structure-based Mutation Analysis
Link to Task 10: Normal Mode Analysis
