Canavan Task 6 - Sequence-based mutation analysis

From Bioinformatikpedia
Revision as of 23:00, 29 August 2012 by Gatzmannf (E235K)

Protocol

Further information can be found in the protocol.

Choosing mutations

The ten chosen mutations are listed below in <xr id="canavan_muts"/>.

<figtable id="canavan_muts"> <xr nolink id="canavan_muts"/> Listed are the 10 chosen mutations used for this task and their database sources. Mutations colored in red are disease causing. Also included is a superposition of the native residue with the mutant residue.

Mutant	E285A	A305E	G123E	R71H	R71K	K213E	V278M	M82T	E235K	I270T
Known effect	<= 1% activity left	0% activity left	<= 25%	Canavan Disease	not disease related	Canavan Disease	not disease related	not disease related	Canavan Disease	not disease related
Sources dbSNP
SNPdbe
OMIM
HGMD
dbSNP
SNPdbe
OMIM
HGMD
SNPdbe
HGMD
dbSNP
SNPdbe
HGMD
SNPdbe HGMD dbSNP
SNPdbe
dbSNP
SNPdbe
dbSNP
SNPdbe
dbSNP

</figtable>

Physico-chemical effects

<xr id="mutations_summary"/> is meant to give a first impression of the nature of the mutations. For each of the ten mutations, we included the following properties of the amino acids:

  • Hydrophobicity scores <ref name="hydrophobicity"> Kyte J, Doolittle RF (May 1982). Journal of Molecular Biology 157 (1): 105–32. </ref>, which range from -4.5 (Arg) to 4.5 (Ile), where arginine is the most hydrophilic and isoleucine the most hydrophobic amino acid
  • the volume <ref name="volume">Volume: A.A. Zamyatin, Prog. Biophys. Mol. Biol., 24(1972)107-123</ref>, measured in Å³
  • the isoelectric point <ref name="pi">JC Biro, Theor Biol Med Model. 2006; 3: 15</ref>, which is the pH at which the amino acid's overall charge is zero
  • the Grantham score <ref name="grantham">Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185: 862–864</ref>, which classifies amino acid substitutions into the classes conservative (0-50, coloured green), moderately conservative (51-100, yellow), moderately radical (101-150, orange), or radical (≥151, red)
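The Grantham classes listed above can be written as a small lookup. This is only a sketch of the class boundaries as stated in this section; the function name `grantham_class` is made up for illustration.

```python
def grantham_class(score):
    """Map a Grantham distance to the four classes used in this section."""
    if score < 0:
        raise ValueError("Grantham scores are non-negative")
    if score <= 50:
        return "conservative"
    if score <= 100:
        return "moderately conservative"
    if score <= 150:
        return "moderately radical"
    return "radical"


# Scores taken from the summary table of this page.
for mutation, score in [("E285A", 107), ("G123E", 98), ("R71H", 29), ("V278M", 21)]:
    print(mutation, score, grantham_class(score))
```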


<figtable id="mutations_summary">


<xr nolink id="mutations_summary"/> Physico-chemical properties of the amino acids involved in the investigated mutations are shown.
Mutation	Amino acids (wild type → mutant)	Hydrophobicity	Diff.	Volume [Å³]	Diff.	Isoelectric point pI (charge)	Diff.	Grantham score
E285A	Glutamic acid → Alanine	-3.5 (polar) → 1.8 (nonpolar)	5.3	138.4 (bulky) → 88.6 (tiny)	49.8	3.2 (negative) → 6.0 (neutral)	2.8	107
A305E	Alanine → Glutamic acid	1.8 (nonpolar) → -3.5 (polar)	5.3	88.6 (tiny) → 138.4 (bulky)	49.8	6.0 (neutral) → 3.2 (negative)	2.8	107
G123E	Glycine → Glutamic acid	-0.4 (nonpolar) → -3.5 (polar)	3.1	60.1 (tiny) → 138.4 (bulky)	78.3	6.0 (neutral) → 3.2 (negative)	2.8	98
R71H	Arginine → Histidine	-4.5 (polar) → -3.2 (polar)	1.3	173.4 (bulky) → 153.2 (bulky)	20.2	10.8 (positive) → 7.6 (neutral)	3.2	29
R71K	Arginine → Lysine	-4.5 (polar) → -3.9 (polar)	0.6	173.4 (bulky) → 168.6 (bulky)	4.8	10.8 (positive) → 9.7 (positive)	1.1	26
K213E	Lysine → Glutamic acid	-3.9 (polar) → -3.5 (polar)	0.4	168.6 (bulky) → 138.4 (bulky)	30.2	9.7 (positive) → 3.2 (negative)	6.5	56
V278M	Valine → Methionine	4.2 (nonpolar) → 1.9 (nonpolar)	2.3	140.0 (small) → 162.9 (bulky)	22.9	6.0 (neutral) → 5.7 (neutral)	0.3	21
M82T	Methionine → Threonine	1.9 (nonpolar) → -0.7 (polar)	2.6	162.9 (bulky) → 116.1 (small)	46.8	5.7 (neutral) → 5.9 (neutral)	0.2	81
E235K	Glutamic acid → Lysine	-3.5 (polar) → -3.9 (polar)	0.4	138.4 (bulky) → 168.6 (bulky)	30.2	3.2 (negative) → 9.7 (positive)	6.5	56
I270T	Isoleucine → Threonine	4.5 (nonpolar) → -0.7 (polar)	5.2	166.7 (bulky) → 116.1 (small)	50.6	5.9 (neutral) → 5.9 (neutral)	0.0	89

</figtable>
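The difference columns of the table are plain absolute differences between the wild-type and the mutant value. A minimal sketch of that arithmetic, assuming the per-residue values from the table and a hypothetical helper name `property_differences`:

```python
# Per-residue values copied from the table: (hydrophobicity, volume in cubic angstrom, pI).
PROPERTIES = {
    "E": (-3.5, 138.4, 3.2),
    "A": (1.8, 88.6, 6.0),
    "G": (-0.4, 60.1, 6.0),
    "R": (-4.5, 173.4, 10.8),
    "K": (-3.9, 168.6, 9.7),
}


def property_differences(mutation):
    """Return the absolute differences for a mutation written like 'E285A'."""
    wild_type, mutant = mutation[0], mutation[-1]
    return tuple(
        round(abs(w - m), 1)
        for w, m in zip(PROPERTIES[wild_type], PROPERTIES[mutant])
    )


print(property_differences("E285A"))
print(property_differences("R71K"))
```

Only a subset of residues is included here; further residues can be added to `PROPERTIES` in the same way.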

Substitution matrices

Blosum62 <ref>Henikoff, S.; Henikoff, J.G. (1992). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (22): 10915–10919</ref>

This substitution matrix can be found in the protocol. The lowest score in Blosum62, which represents a very unlikely substitution, is -4, and the highest score, which represents a likely substitution, is 11. Blosum substitution matrices are derived from ungapped local alignments in the BLOCKS database. Blosum62 is derived from local alignments of sequences with at most 62% sequence identity. The scores in the matrix are log-odds values of a substitution: negative scores mean that the substitution is observed less often than expected by chance, and positive scores mean that it is observed more often than expected by chance.
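The log-odds idea can be sketched as follows. This is a minimal illustration, not the procedure used to build the published matrix: the frequencies in the example are made up, and the scaling (factor 2 with a base-2 logarithm, i.e. half-bit units) is the convention usually stated for Blosum62.

```python
import math


def log_odds_score(observed, expected, scale=2.0):
    """Rounded log-odds score; scale=2 with log2 gives half-bit units."""
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return round(scale * math.log2(observed / expected))


# Hypothetical pair frequencies, chosen only to show the sign of the score.
print(log_odds_score(observed=0.01, expected=0.02))  # seen less often than expected
print(log_odds_score(observed=0.04, expected=0.01))  # seen more often than expected
```

A pair that is observed exactly as often as expected gets a score of 0, which is how the Blosum62 score of 0 for R to H is read further below.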

PAM1<ref>M O Dayhoff, R M Schwartz, B C Orcutt, A model of evolutionary change in proteins, Atlas of protein sequence and structure (1978), Volume: 5, Issue: Suppl 3, Publisher: National Biomedical Research Foundation, Pages: 345-352</ref>

This substitution matrix can also be found in the protocol. The lowest score in PAM1 is 0, for a mutation probability of 0%, and the highest score is 56, for a mutation probability of 0.56% (not counting the diagonal entries, where the amino acid stays unchanged). A score in PAM1 expresses how probable a mutation from amino acid A to amino acid B is, assuming that the two sequences are 99% identical (1% of the amino acids mutated). The scores in the matrix are multiplied by 10000.

This matrix is well suited for our case, as we are only considering variants of the same protein. Mutations are generally very unlikely; the most probable one is Asp to Glu with 0.56%.

PAM250

This substitution matrix can be found in the protocol. It models sequences that are only about 20% identical (250 accepted point mutations per 100 amino acids). The highest score is 72, for the substitution of W to V, and the lowest is 0. The scores in the table are multiplied by 100, meaning that the substitution of W to V has a probability of 72%.
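The two scalings (entries multiplied by 10000 in PAM1 and by 100 in PAM250) and the relation between the matrices can be sketched in a few lines. The sketch assumes the usual construction of PAM250 as the 250th power of the PAM1 mutation probability matrix and uses a made-up two-letter alphabet instead of the real 20 x 20 matrix; all function names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(matrix, power):
    """Raise a square matrix to a non-negative integer power by squaring."""
    n = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in matrix]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def scaled_to_percent(entry, scale):
    """Convert a scaled table entry back to a probability in percent."""
    return 100.0 * entry / scale


# Toy PAM1-like matrix: each letter mutates with probability 1% per step.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam250 = mat_pow(toy_pam1, 250)

print(scaled_to_percent(17, 10000))  # PAM1 entry 17 (E --> A) in percent
print(scaled_to_percent(9, 100))     # PAM250 entry 9 (E --> A) in percent
print(toy_pam250[0][1])              # toy mutation probability after 250 steps
```

In the toy example the off-diagonal probability approaches 50% after 250 steps, which illustrates why PAM250 entries are so much larger than PAM1 entries.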

Mutations

E285A

Location

<figure id="e285a_hb" >

<xr nolink id="e285a_hb"/>
Hydrogen bond interactions of E285: this residue forms multiple hydrogen bonds that cannot be established by the mutant residue alanine.

</figure>

This amino acid is located at the beginning of helix eight in the binding pocket, yet it is not involved in substrate binding. DSSP does not assign a secondary structure state to E285.

As can be seen in <xr id="e285a_hb"/>, E285 forms several hydrogen bonds with the backbone as well as with Y118. After the mutation to alanine these hydrogen bonds can no longer be established, which might destabilize the structure.

physico-chemical properties

Glutamic acid and alanine have different physico-chemical properties, which is reflected in a moderately radical Grantham score of 107. Whereas glutamic acid is negatively charged, alanine is neither charged nor polar. Furthermore, alanine is much smaller than glutamic acid. From these differences one would expect the mutation to have an effect.

substitution matrices

In Blosum62 the substitution E --> A is scored -1. Negative scores refer to events that are observed less often than expected by chance, so a substitution from E to A is unfavoured, although the score is still well above the lowest Blosum62 score of -4.

In PAM1 the substitution E --> A is scored 17 which equals a mutation probability of 0.17%.

In PAM250 the substitution E --> A is scored 9 with the highest mutation probability in PAM250 being 72. This means there is a 9% chance of glutamic acid being substituted for alanine in distantly related proteins.

Therefore all substitution matrices expect this mutation to be rather unlikely.

PSSM

Besides glutamic acid, proline also has a high score at this position. This is surprising because the two amino acids have very different properties. Alanine has the third highest score, which suggests that this substitution is observed relatively often. The high information content at this position (0.92) indicates a well conserved position with only a few specific substitutions.

         A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V     A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
285 E    2  0 -1 -1 -6 -2  5 -4  1 -5 -4 -4 -3 -5  5 -2 -2 -6 -6 -3     17  4   3   3   0   1  31   1   4   0   2   1   1   0  26   2   3   0   0   2  0.92 inf
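A row like the one above can be read programmatically. The following is a minimal sketch that assumes exactly the layout shown on this page (position, residue, 20 scores, 20 percentages, information content, one trailing column); `parse_pssm_row` is a made-up helper name, not part of any BLAST tool.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the PSSM rows on this page

ROW_285 = (
    "285 E 2 0 -1 -1 -6 -2 5 -4 1 -5 -4 -4 -3 -5 5 -2 -2 -6 -6 -3 "
    "17 4 3 3 0 1 31 1 4 0 2 1 1 0 26 2 3 0 0 2 0.92 inf"
)


def parse_pssm_row(line):
    """Split one PSSM row into position, residue, scores, percentages, information."""
    tokens = line.split()
    if len(tokens) != 44:
        raise ValueError("expected 44 whitespace-separated fields")
    position, residue = int(tokens[0]), tokens[1]
    scores = dict(zip(AMINO_ACIDS, map(int, tokens[2:22])))
    percentages = dict(zip(AMINO_ACIDS, map(int, tokens[22:42])))
    information = float(tokens[42])
    return position, residue, scores, percentages, information


position, residue, scores, percentages, information = parse_pssm_row(ROW_285)
# Rank residues by score; ties keep the column order.
ranking = sorted(scores, key=lambda aa: -scores[aa])
print(position, residue, ranking[:3], information)
```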

MSA

In the MSA, 28 of 29 sequences have glutamic acid at position 285, meaning this position is highly conserved. This goes along with the PSSM result.
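The conservation of an alignment column can be quantified as the fraction of sequences that carry the most common residue. A small sketch with a made-up column: the page only states that 28 of 29 sequences have E, so the single deviating residue is represented by the placeholder "X".

```python
from collections import Counter


def column_conservation(column):
    """Return the most common residue of an MSA column and its fraction."""
    counts = Counter(column)
    residue, count = counts.most_common(1)[0]
    return residue, count / len(column)


# Hypothetical column for position 285: 28 x E plus one unspecified residue.
column_285 = "E" * 28 + "X"
residue, fraction = column_conservation(column_285)
print(residue, round(fraction, 3))
```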


SIFT

SIFT predicts an effect of this mutation with a score of 0.00. Amino acids with probabilities < .05 are predicted to be deleterious. The confidence for this prediction is 0.94 and therefore very high, so that the prediction seems reasonable. Furthermore, SIFT predicts ANY substitution of E at this position to be deleterious. Also the conservation of E at this position is very high with a value of 3.01.

Substitution at pos 285 from E to A is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
Median sequence conservation: 3.01
Sequences represented at this position:17
pos 		A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y 
285E 0.94 	0.00 	0.00 	0.00 	1.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00 	0.00
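The SIFT rows shown on this page can be evaluated with the 0.05 threshold mentioned above. This is a sketch under the assumption that each row consists of the position label, one leading value (read on this page as the confidence) and 20 probabilities in the column order of the header; the helper names are invented for illustration.

```python
SIFT_ORDER = "ACDEFGHIKLMNPQRSTVWY"  # column order of the SIFT tables on this page
THRESHOLD = 0.05                     # probabilities below this are called deleterious

ROW_285 = (
    "285E 0.94 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 "
    "0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
)


def parse_sift_row(line):
    """Split a SIFT row into label, leading value and per-residue probabilities."""
    tokens = line.split()
    if len(tokens) != 22:
        raise ValueError("expected a label followed by 21 numbers")
    label = tokens[0]
    first_column = float(tokens[1])
    probabilities = dict(zip(SIFT_ORDER, map(float, tokens[2:])))
    return label, first_column, probabilities


def tolerated(probabilities, threshold=THRESHOLD):
    """Residues whose probability is not below the threshold."""
    return [aa for aa, p in probabilities.items() if p >= threshold]


label, first_column, probabilities = parse_sift_row(ROW_285)
print(label, first_column, tolerated(probabilities))
```

For position 285 only the wild-type residue E passes the threshold, which matches the statement that any substitution is predicted to be deleterious.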

Polyphen2

This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)

Polyphen used 64 sequences of a multiple sequence alignment at position 285.

SNAP

SNAP predicts E285A to be non-neutral with an RI of 2 and an expected accuracy of 63%.

Evaluation

All three SNP effect prediction methods predict this SNP to have an effect; PolyPhen and SIFT do so with higher confidence than SNAP. One would also expect this result from the physico-chemical properties of the two amino acids, which differ considerably. For example, alanine cannot establish the four hydrogen bonds that E285 is involved in. Furthermore, the close proximity to the active site suggests that the mutation could influence the activity of the enzyme.

This mutation is the most common Canavan Disease causing mutation among Ashkenazi Jews. It causes the enzyme to lose 99% of its activity. In this case, the prediction methods predicted correctly.



A305E

location

<figure id="a305e_crowded">

<xr nolink id="a305e_crowded"/>
Presented is a possible orientation for the mutated residue E305. As can be seen from the overlap of the spheres of E305 with neighboring residues there is not enough space for glutamic acid at this position.

</figure>

A305 is located at the end of the 13th beta strand near the C-terminus of the protein (protein length: 313 AA). DSSP also assigns state E to this residue, as the last residue of the respective strand.

As can be seen in <xr id="a305e_crowded"/>, the space at this position is rather crowded, so that alanine, as a small residue, fits very well. Glutamic acid, in contrast, hardly finds space and overlaps with neighboring residues. Therefore the mutation A305E probably interferes with the local secondary structure, as the mutant residue needs more space.

physico-chemical properties

As for E285A, the physico-chemical properties of alanine and glutamic acid are different, which is reflected in a moderately radical Grantham score of 107. Whereas glutamic acid is negatively charged, alanine is neither charged nor polar. Furthermore, alanine is much smaller than glutamic acid. From these differences one would expect the mutation to have an effect.

substitution matrices

As Blosum is a symmetrical substitution matrix, the score for A305E is the same as for E285A: -1. The lowest score in Blosum62, which represents a very unlikely substitution, is -4, and the highest score, which represents a likely substitution, is 11. This means that a substitution from A to E is observed less often than expected by chance.

In PAM1 the substitution A --> E is scored 10, which equals a mutation probability of 0.1%.

In PAM250 the substitution A --> E is scored 5, which equals a mutation probability of 5%.

Therefore all three substitution matrices expect this mutation to be rather unlikely.


PSSM

A substitution to leucine seems to be accepted as well, and isoleucine also has a high score. This is not surprising, since these three amino acids have similar physico-chemical properties. A mutation to glutamic acid is uncommon, but not observed less often than other substitutions. The low information content at this position (0.37) suggests that this position is not well conserved.


        A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V      A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
305 A   2 -4 -4 -4  0 -1 -2 -4 -4  2  2 -3  1  0 -1 -2 -2 -4 -3  3     23   0   0   0   2   3   3   0   0  13  23   0   3   5   2   1   1   0   0  21  0.37 inf

MSA

At this position alanine is conserved in 20 out of 28 sequences. Six sequences have valine at this position and two have isoleucine. Valine and isoleucine have physico-chemical properties similar to those of alanine and thus might be substituted for it. Glutamic acid is not expected as a substitution.

SIFT

SIFT predicts an effect of this mutation with a score of 0.02. Amino acids with probabilities < .05 are predicted to be deleterious. The confidence for this prediction is 0.89 and therefore still very reliable. Furthermore, 13 mutations are predicted to be deleterious at sequence position 305.

Substitution at pos 305 from A to E is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02.
Median sequence conservation: 3.03
Sequences represented at this position:16

pos 		A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
305A 0.89 	1.00 	0.03 	0.01 	0.02 	0.01 	0.05 	0.01 	0.10 	0.02 	0.08 	0.02 	0.01 	0.04 	0.02 	0.01 	0.06 	0.08 	0.72 	0.00 	0.01

Polyphen2

This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)

Polyphen used 44 sequences of a multiple sequence alignment at position 305.

SNAP
SNAP predicts A305E to be a neutral mutation with an RI of 3 and an expected accuracy of 62%.

Evaluation

Both SIFT and PolyPhen give a clear result and predict this SNP to have an effect, while SNAP predicts it to be neutral. The substitution matrices give further confidence that this mutation is unfavoured and might have an effect. The physico-chemical properties suggest the same conclusion, as the two amino acids involved differ in their properties.

Only looking at the location of the mutation almost at the C-terminus one would not expect structural or functional implications, since the mutant residue is far away from the known active site or the dimer interface. Yet, when taking into account the information about the restricted space for an amino acid at this location, one would expect changes in the local structure: neighboring sidechains might move to allow glutamic acid to have enough space. These movements in turn might also influence other parts of the enzyme structure.


This mutation is the most common Canavan Disease causing mutation in the non-Ashkenazi population. It goes along with a complete loss of function of the protein. This severe effect is not expected from the location of the mutation or from the SNAP prediction, but the other two prediction methods were correct.



G123E

Location

<figure id="g123e_space">

<xr nolink id="g123e_space"/>
The mutated residue E123 is presented in red. There is no steric hindrance for this mutated residue.

</figure>

G123 is located at the beginning of the fourth beta strand in aspartoacylase. DSSP also assigns state E to G123, considering it to be part of a sheet. This strand is not buried and is solvent accessible. As can be seen from <xr id="g123e_space"/>, one would not expect any effect of this mutation. Even though glutamic acid is much larger than glycine, there is enough space and no clashes occur. Furthermore, since glycine is not involved in any interactions, no interactions are lost due to the mutation.

physico-chemical properties

The physico-chemical properties of glutamic acid and glycine differ as much as those of alanine and glutamic acid. Whereas glutamic acid is very bulky and negatively charged, glycine is the smallest amino acid and has no charge. These differences are reflected in a Grantham score of 98, which is still moderately conservative. From these different properties one would expect an effect of the mutation.

substitution matrices

The Blosum62 score for the substitution of glycine with glutamic acid is -2. This value means that the substitution occurs less often than expected by chance. The lowest score in Blosum62, which represents a very unlikely substitution, is -4, and the highest score, which represents a likely substitution, is 11.

In PAM1 the substitution G--> E is scored 4 which equals a mutation probability of 0.04%. The highest probability of a mutation in PAM1 is 0.56%.

In PAM250 the substitution G --> E is scored 5 which stands for a mutation probability of 5%.

Therefore all substitution matrices imply that this mutation is unlikely.

PSSM

A substitution to glutamic acid is the most frequently occurring substitution at this position. The low information content at this position (0.20) implies that glycine is not very conserved and has no important function in the protein. This agrees with the insights gained from the analysis of the location of the mutation in the structure.

         A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V      A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
123 G   -1  0  2  1 -3 -1  2  2  1 -2 -2 -1 -1 -1  1 -1 -1 -4  0 -2      6   5  12   6   1   2  16  14   4   2   5   3   2   2   8   2   3   0   3   4  0.20 inf

MSA

At this position glycine is well conserved: 28 out of 29 sequences have glycine and only one has alanine. One would assume that only small amino acids can be placed at this position. This finding differs from the PSSM.

SIFT

SIFT predicts this mutation to have an effect. The confidence for this result is very high (1.00). Furthermore, SIFT predicts any substitution except for glutamine to be deleterious. It is somewhat surprising that glutamine is not predicted to be deleterious, whereas the structurally very similar glutamic acid is. This suggests that the effect is due to the negative charge of glutamic acid.

Substitution at pos 123 from G to E is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02.
Median sequence conservation: 2.99
Sequences represented at this position:18

pos            A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
123G 1.00 	0.04 	0.01 	0.03 	0.02 	0.00 	1.00 	0.01 	0.00 	0.02 	0.01 	0.00 	0.03 	0.02 	0.10 	0.01 	0.03 	0.01 	0.01 	0.00 	0.00

Polyphen2

This mutation is predicted to be probably damaging with a score of 0.994 (sensitivity: 0.46; specificity: 0.96). Polyphen used 75 sequences for the MSA.

SNAP
SNAP predicts G123E to be non-neutral with an RI of 2 and an expected accuracy of 63%.

Evaluation

All three SNP effect prediction methods predict this mutation to be deleterious. The same impact can be deduced from the difference in physico-chemical properties between the two residues. Blosum62 and PAM1 indicate that the substitution from glycine to glutamic acid is unlikely.

A different conclusion follows from the analysis of the structural location of the mutant: no effect is expected from a change from G to E at this solvent accessible position in the structure. Additionally, the PSSM suggests that the substitution of G to E is very likely in related sequences.

As has been found in "Identification and expression of eight novel mutations among non-Jewish patients with Canavan disease.", Kaul R et al, Am. J. Hum. Genet. 59:95-102(1996), the mutation G123E reduces the enzyme activity to at most 25% and leads to Canavan Disease. In contrast to the first two disease causing mutations, where the prediction results were correct and in agreement, the analyses point to different conclusions this time. However, the effect of the first two mutations was very dramatic, resulting in an (almost) complete loss of function. This mutation leaves a residual activity of up to 25%, which may be why the results are ambiguous.



R71H

Location

<figure id="r71h_hbonds">

<xr nolink id="r71h_hbonds"/>
The mutated residue H71 is presented in red. H71 is not able to build the Hbonds with the substrate or neighboring residues as does R71.

</figure>

R71 is located at the end of the fourth helix in the active site of the enzyme. It is involved in substrate binding via one Hbond. It also forms other Hbonds with an active water molecule and D68. DSSP assigns this position the state 'G', which stands for a 3-10 helix.

Due to its positioning in the active site and the several Hbonds R71 is involved in, it seems very likely that any mutation is deleterious. The interactions of R71 and the respective positioning of H71 is shown in <xr id="r71h_hbonds"/>.

physico-chemical properties

Arginine and histidine have fairly similar physico-chemical properties. Both are polar and bulky amino acids, with arginine having a stretched shape whereas histidine has a round shape. Another difference is the positive charge of arginine versus no charge of histidine (histidine only bears a positive charge at a pH below 6). Therefore, from the chemical properties alone one would not expect a huge impact of the mutation.

substitution matrices

The Blosum62 score for the substitution of arginine with histidine is 0. This means that histidine substitutions for arginine occur as often as would be expected by chance.

In PAM1 the substitution R--> H is scored 8 which equals a mutation probability of 0.08%. The highest probability of a mutation in PAM1 is 0.56%.

In PAM250 the substitution R --> H is scored 5 which stands for a mutation probability of 5%.

Therefore one can infer from all substitution matrices that this mutation is not very likely.


PSSM

From the information content of 2.8 at this position one can conclude that arginine is of great importance here. This agrees with the analysis of the location of arginine in the binding site of the enzyme. Obviously no other amino acid is favoured at this position in MSAs of related sequences. Therefore, one would expect a huge effect from the mutation R71H.

        A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
71 R   -4  9 -5 -5 -8 -3 -6 -7 -3 -7 -7 -3 -1 -7 -7 -5 -4 -3 -4 -7    2  88   1   1   0   1   0   0   1   0   0   1   2   0   0   1   1   0   1   0  2.80 inf


MSA

All 29 sequences have arginine at this position. This conservation might be based on the importance of R71 in the binding of substrate. This finding agrees with the PSSM result.


SIFT

SIFT predicts an effect for the mutation R71H. The conservation of R at position 71 is very high with 2.99. Besides Methionine and Lysine, all other substitutions are predicted to be deleterious. This goes along with the results from the PSSM.


Substitution at pos 71 from R to H is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01.
Median sequence conservation: 2.99
Sequences represented at this position:18

pos 		A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
71R 1.00 	0.02 	0.00 	0.00 	0.01 	0.00 	0.01 	0.01 	0.01 	0.06 	0.02 	0.09 	0.01 	0.01 	0.02 	1.00 	0.01 	0.01 	0.01 	0.00 	0.01


Polyphen2

This mutation is predicted to be probably damaging with a score of 0.909 (sensitivity: 0.69; specificity: 0.90). Polyphen used 75 sequences in the MSA.

SNAP
SNAP predicts mutation R71H to be neutral with an RI of 5 and an expected accuracy of 73%.

Evaluation

Both PolyPhen and SIFT predict this mutation to be deleterious, while SNAP predicts it to be neutral. The substitution matrices and PSSM give only low likelihoods and also from analysis of the location of R71 in the binding site one would expect, that any substitution has an impact on the binding of substrate and therefore activity of the enzyme.

Most of these predictions agree with the annotation in HGMD, which associates R71H with Canavan Disease.


R71K

Location

<figure id="r71k_hbonds">

<xr nolink id="r71k_hbonds"/>
The mutated residue K71 is presented in red. K71 almost has the same shape as R71 and might also be able to form an Hbond to the substrate or D68.

</figure>

Again, R71 is located at the end of the fourth helix in the active site of the enzyme. It is involved in substrate binding via one Hbond. It also forms other Hbonds with an active water molecule and D68. DSSP assigns this position the state 'G', which stands for a 3-10 helix.

As can be seen in <xr id="r71k_hbonds"/>, K71 could be oriented in the same way as R71. Therefore it is possible for lysine to interact with the substrate or with D68. In contrast to R71H, this mutation is not expected to have such a big effect, since some of the interactions might be kept.

physico-chemical properties

Both residues have similar physico-chemical properties: both are positively charged and therefore polar, and both are bulky residues with the same overall shape. One difference is the number of amino groups in the side chain: whereas arginine has two amino groups to form Hbonds, lysine has only one. From these similar properties, one would not expect a big impact of this mutation.

substitution matrices

The substitution of R to K gets a score of 2 in the Blosum62 matrix. This means, this substitution is favoured and expected to occur slightly more often than random.

In PAM1 this substitution is scored 37, which is one of the higher scores in this matrix, with the highest value being 56. This means that the probability for this mutation equals 0.37%.

In PAM250 this substitution is scored 18, which is even higher than the score for the substitution of R for itself (score 17). This 18% mutation probability is therefore really high.

Therefore, from the substitution matrices one can conclude that this mutation is fairly likely.

PSSM

As already mentioned for the mutation R71H, this position in the sequence is highly conserved and allows hardly any other amino acid. Even for lysine, with its similar physico-chemical properties, hardly any substitutions are observed in related sequences.

        A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
71 R   -4  9 -5 -5 -8 -3 -6 -7 -3 -7 -7 -3 -1 -7 -7 -5 -4 -3 -4 -7    2  88   1   1   0   1   0   0   1   0   0   1   2   0   0   1   1   0   1   0  2.80 inf


MSA

All 29 sequences have arginine at this position. This conservation might be based on the importance of R71 in the binding of substrate. This finding agrees with the PSSM result.

SIFT

SIFT predicts the mutation R71K to be tolerated with a score of 0.06. This score is just above the threshold of 0.05, below which substitutions are classified as deleterious. Arginine is very highly conserved at this position (2.99), and only lysine and methionine have values above 0.05 and are thus tolerated.


Substitution at pos 71 from R to K is predicted to be TOLERATED with a score of 0.06.
Median sequence conservation: 2.99
Sequences represented at this position:18

pos 		A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
71R 1.00 	0.02 	0.00 	0.00 	0.01 	0.00 	0.01 	0.01 	0.01 	0.06 	0.02 	0.09 	0.01 	0.01 	0.02 	1.00 	0.01 	0.01 	0.01 	0.00 	0.01


Polyphen2

This mutation is predicted to be benign with a score of 0.421 (sensitivity: 0.84; specificity: 0.79). Polyphen used 75 sequences in the MSA.

SNAP
SNAP predicts R71K to be neutral with an RI of 0 and an expected accuracy of 51%, so it is fairly unsure about its prediction.

Evaluation

From the prediction methods as well as from the other analyses one can expect this mutation to be tolerated. Although arginine is involved in substrate binding and is highly conserved among related proteins, the similar physico-chemical properties of arginine and lysine might allow this substitution.

This agrees with the annotation for this mutation, which is not classified as disease causing. Yet there is one source of information <ref name="r71k">Le Coq J., Pavlovsky A., Malik R., Sanishvili R., Xu C., Viola R.E, Examination of the mechanism of human brain aspartoacylase through the binding of an intermediate analogue, Biochemistry 47:3484-3492(2008)</ref> about mutation experiments, from which it was concluded that this mutation reduces the activity of the enzyme by 99%.



K213E

Location

<figure id="k213e">

<xr nolink id="k213e"/>
The mutated residue K213E is presented in blue.

</figure>

Residue 213 is located on the outside of the protein, on an outer loop region, and it is nowhere close to the dimer interaction site, either. It does not form any backbone-backbone interactions.

Physico-chemical properties
Lysine and glutamic acid are fairly similar amino acids: they are both polar and bulky, and only their charge differs (lysine is positively charged, glutamic acid negatively). The Grantham score is 56, the same as for the reverse substitution E235K, so the substitution is considered moderately conservative.

Substitution Matrices
Blosum62 score: 1, i.e., it is expected to occur slightly more often than random.
PAM1 score: 4, so it is only expected to occur with a probability of 0.04%, where 0.19% is the highest probability for mutations of lysine. PAM250 score: 5, meaning there is a 5% mutation probability, where 10% is the highest mutation probability, for lysine to valine.

-> The position-independent substitution matrices consider this substitution as not very likely, but possible.

PSSM
Information content on position 213 is only 0.09, which fits well to the position on an outer loop. A substitution to Glutamic Acid seems not exactly likely, but should not have very large impact either.

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
 213 K    1  0 -1  0 -2  0  0 -1  1 -1 -1  1  0 -1  2  0  0 -2 -1  1   10   5   2   5   1   4   7   4   3   4   6   7   2   3  12   5   7   1   2  10  0.09 inf

MSA

18 out of 29 sequences have lysine at this position. In the other 11 sequences other amino acids can be found, such as arginine, threonine, alanine and methionine. This agrees with the PSSM result that this position is not very highly conserved and substitutions are possible.

SIFT
SIFT predicts the mutation K213E to be tolerated with a score of 0.92 and a confidence of 1.00.

pos 		 A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
213K 1.00 	0.77 	0.08 	0.51 	0.92 	0.09 	0.32 	0.14 	0.33 	1.00 	0.39 	0.17 	0.42 	0.27 	0.54 	0.58 	0.63 	0.67 	0.63 	0.03 	0.09


PolyPhen2
This mutation is predicted to be benign with a score of 0.004 (sensitivity: 0.98; specificity: 0.35)

SNAP
SNAP predicts K213E to be non-neutral with an RI of 2 and an expected accuracy of 63%.

Evaluation
When looking at the position in the protein and the physico-chemical properties, the mutation could be expected to be tolerated. Both PAM matrices, however, suggest that it is not very likely to occur, and the PSSM value is ambiguous.

The prediction methods SIFT and PolyPhen expect this mutation to be neutral, while SNAP predicts it to affect protein function.

According to HGMD, this mutation is associated with Canavan Disease.



V278M

Location

<figure id="V278M">

<xr nolink id="V278M"/>
The mutated residue V278M is presented in blue. Again, this residue is placed on the outside of the protein.

</figure>

Residue V278 is located on the outside of the protein and does not form any Hbonds.

Physico-chemical properties
Valine and Methionine are both hydrophobic (Valine slightly more so), and both are neutrally charged. Valine is smaller and rounder than Methionine. Methionine contains sulfur, but is not able to form disulfide bonds, so this fact does not make a big difference. This is reflected in a conservative Grantham score of 21.

Substitution Matrices
Blosum62 score: 1, i.e., it is expected to occur slightly more often than random.
PAM1 score: 4, i.e., 0.04% probability to occur (where 0.33% is the highest probability here for a substition of valine to isoleucine). PAM250 score: 2, meaning 2% mutation probability (where 13% is the highest probability here for a substitution of valine to leucine).

-> The substitution matrices consider the mutation unlikely, but not impossible.
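
The conversion from matrix entries to percentages used above is simple arithmetic. The sketch below assumes the scaling implied by the numbers in the text (PAM1 entries multiplied by 10000, PAM250 entries by 100); the function name is hypothetical.

```python
def pam_entry_to_percent(entry, scale):
    """Convert a scaled mutation-probability entry to a percentage.

    `scale` is the factor the matrix was multiplied by for display
    (assumed here: 10000 for PAM1, 100 for PAM250).
    """
    return 100.0 * entry / scale


print(pam_entry_to_percent(4, 10000))  # PAM1 entry for V -> M
print(pam_entry_to_percent(2, 100))    # PAM250 entry for V -> M
```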


PSSM
The information content of this position is 0.41, so it is clearly not unconserved. A substitution to Methionine yields a low to very low score compared to other amino acids.

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
 278 V    0 -4 -5 -3  0 -3  2  2 -3  1  0 -4 -3  2 -1 -3 -2 -2 -3  4    6   1   0   1   2   1  14  14   1   7  10   0   0   9   3   2   2   1   0  26  0.41 inf
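
To look up individual values in such a PSSM row, it can be split into its parts. The sketch below assumes the layout shown above: position, native residue, 20 scores, 20 weighted percentages, the information content, and a final column that is ignored here. The function name and the returned dictionary keys are hypothetical.

```python
PSSM_COLUMNS = "A R N D C Q E G H I L K M F P S T W Y V".split()


def parse_pssm_row(row):
    """Parse one row of an ASCII PSSM as shown in this section.

    Assumed layout: position, native residue, 20 log-odds scores,
    20 weighted percentages, information content, and a final column
    that is ignored here.
    """
    fields = row.split()
    if len(fields) != 44:
        raise ValueError("expected 44 whitespace-separated fields")
    n = len(PSSM_COLUMNS)
    return {
        "position": int(fields[0]),
        "native": fields[1],
        "scores": dict(zip(PSSM_COLUMNS, map(int, fields[2:2 + n]))),
        "percent": dict(zip(PSSM_COLUMNS, map(int, fields[2 + n:2 + 2 * n]))),
        "information": float(fields[2 + 2 * n]),
    }


row_278 = (" 278 V    0 -4 -5 -3  0 -3  2  2 -3  1  0 -4 -3  2 -1 -3 -2 -2 -3  4"
           "    6   1   0   1   2   1  14  14   1   7  10   0   0   9   3   2   2"
           "   1   0  26  0.41 inf")
parsed = parse_pssm_row(row_278)
print(parsed["scores"]["M"], parsed["percent"]["M"], parsed["information"])
```

For position 278 this returns a score of -3 and a weighted percentage of 0 for Methionine, with an information content of 0.41.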

MSA

At this position, 26 out of 28 sequences have valine conserved. Only two sequences differ, with methionine and isoleucine at this position. In the PSSM, Methionine is not specified as a very unlikely substituent. The high conservation of valine is in contrast to the PSSM result, which assigns a low information content to position 278.
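
The conservation of a single alignment column can be quantified as the fraction of sequences that carry the most frequent residue. The following is a minimal sketch with a toy column rebuilt from the counts given above; it does not read the actual alignment and does not treat gaps specially.

```python
from collections import Counter


def column_conservation(column):
    """Return the most frequent residue in an alignment column and its fraction."""
    counts = Counter(column)
    residue, count = counts.most_common(1)[0]
    return residue, count / len(column)


# Toy column rebuilt from the counts given in the text (26 V, 1 M, 1 I);
# the order of the sequences is made up.
column_278 = "V" * 26 + "M" + "I"
residue, fraction = column_conservation(column_278)
print(residue, round(fraction, 2))
```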

SIFT
SIFT predicts the mutation V278M to affect function with a score of 0.01 and a confidence of 0.94.

pos 		 A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
278V 0.94 	0.08 	0.01 	0.00 	0.00 	0.01 	0.00 	0.00 	0.34 	0.00 	0.04 	0.01 	0.00 	0.00 	0.00 	0.00 	0.00 	0.02 	1.00 	0.00 	0.00

PolyPhen2
This mutation is predicted to be probably damaging with a score of 0.950 (sensitivity: 0.64; specificity: 0.92)

SNAP
SNAP predicts V278M to be neutral with an RI of 5 and an expected accuracy of 73%.

Evaluation
According to the physico-chemical properties and the substitution matrices, this mutation is not considered to be deleterious. The PSSM, however, suggests that it is fairly unlikely.

SIFT and PolyPhen consider it to be strongly damaging, while SNAP predicts it to be neutral.

This mutation is not related to the Canavan Disease.



M82T

Location

<figure id="M82T">

<xr nolink id="M82T"/>
The mutated residue M82T is presented in blue. It is located on an outer loop of the protein, pointing inwards. The original amino acid does not form any hydrogen bonds.

</figure>

Residue M82T is located on an outer loop of the protein, pointing inwards. It does not form any hydrogen bonds.

Physico-chemical properties
Methionine is nonpolar, whereas Threonine is slightly polar. Methionine is large and elongated, while Threonine is much smaller (volume difference of approx. 40 Å3) and rounder. Threonine has an additional hydroxyl group, while Methionine contains sulfur. Both are neutrally charged.
Still, the other differences are large enough to result in a Grantham score of 81 (moderately conservative).

Substitution Matrices
Blosum62 score: -1, i.e., this mutation is expected slightly less often than random.
PAM1 score: 6, i.e., 0.06% and not very likely (45 being the highest score, for a substitution of methionine to leucine).
PAM250 score: 5, meaning 5% mutation probability (20% being the highest probability for a substitution of methionine to leucine)

-> The matrices consider this substitution as slightly unlikely, PAM1 somewhat less likely than Blosum62.

PSSM
The information content in this position is not very high (0.08) and the substitution to Threonine is amongst the second most likely substitutions.

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  82 M    1 -1  0  1 -1  0  0  1  0 -1 -1  0 -1 -2  1  1  1  0 -1 -1   11   3   3   9   1   4   6  10   2   3   5   4   2   2   7  10  10   1   2   5  0.08 inf

MSA

12 out of 29 sequences have methionine at this position. This implies a poorly conserved position, which agrees with the low information content assigned to this position in the PSSM. Threonine is not among the substitutions at this position in the other 17 sequences. Substitutions found are: valine, lysine, proline, alanine, glycine, glutamic acid.

SIFT
SIFT predicts the mutation M82T to be tolerated with a score of 0.60 and a confidence of 1.00.

pos 		 A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
82M 1.00 	0.80 	0.08 	0.63 	1.00 	0.15 	0.42 	0.20 	0.24 	0.92 	0.38 	0.14 	0.52 	0.29 	0.60 	0.55 	0.75 	0.60 	0.36 	0.05 	0.19

PolyPhen2
This mutation is predicted to be benign with a score of 0.006 (sensitivity: 0.97; specificity: 0.45)

SNAP
M82T is predicted to be neutral with an RI of 3 and an expected accuracy of 62%.

Evaluation
According to the physico-chemical properties (Grantham) and the substitution matrices, this mutation is fairly unlikely and might affect protein function. The PSSM, on the other hand, suggests that it is neutral, which agrees with all three prediction methods.

This mutation is not associated with the Canavan Disease.



E235K

Location

<figure id="E235K">

<xr nolink id="E235K"/>
The mutated residue E235K is presented in blue. The residue is located on an outer loop of the protein, far away from the active or dimer interaction site.

</figure>

Residue 235 is located on an outer loop of Aspartoacylase, pointing outward. It is also far away from the active or dimer interaction site.

Physico-chemical properties
Glutamic acid is (obviously) acidic, while Lysine is basic. Both are polar, and both are charged (Glu negatively, Lys positively). Both are bulky (but Lysine is bigger). Their Grantham score is still moderately conservative at 56.

Substitution Matrices
Blosum62 score: 1 -> slightly more often than random

PAM1 score: 7, so it is fairly unlikely (53 is the highest score for a mutation from glutamic acid to aspartic acid)

PAM250 score: 8, meaning 8% mutation probability (10 is the highest score for a mutation from glutamic acid to aspartic acid)

-> All matrices consider the substitution as not very likely, but not impossible.

PSSM
An information content of 0.41 means this position is not without significance, and the scores for such a substitution are not very high.

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
 235 E    0 -1 -2  3 -3  1  3 -1  2 -3 -3  0 -4 -5  3  1 -1 -6 -4 -3    9   3   2  17   1   5  17   4   4   1   2   5   0   0  12   9   4   0   1   2  0.41 inf

MSA

20 out of 29 sequences have glutamic acid conserved at this position. Substitutions include lysine, alanine, aspartate and proline. So as can be seen in the substitution matrices and the PSSM, a mutation to lysine is not favoured, but possible.

SIFT
SIFT predicts the mutation E235K to be tolerated with a score of 0.79 and a confidence of 1.00.

pos 		 A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
235E 1.00 	0.60 	0.03 	0.47 	1.00 	0.04 	0.22 	0.09 	0.11 	0.79 	0.17 	0.06 	0.27 	0.19 	0.39 	0.35 	0.41 	0.47 	0.17 	0.01 	0.04

PolyPhen2
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00).

SNAP
SNAP predicts this mutation to be neutral with an RI of 2 and an expected accuracy of 59%.

Evaluation
Physico-chemical properties and substitution matrices agree that this mutation is unusual, but not very much so. The PSSM agrees with that.

All three prediction methods do not expect this mutation to affect protein function.

However, this mutation is reported by the HGMD to affect protein function.



I270T

Location


Physico-chemical properties
Isoleucine is strongly nonpolar and bulky, while Threonine is slightly polar and small (volume difference of 50 Å3). Both are neutrally charged. Threonine has an additional hydroxyl group. This results in a still moderately conservative Grantham score of 89.

Substitution Matrices
Blosum62 score: -1, i.e., slightly less often than random.

PAM1 score: 11, so fairly unlikely (57 being the highest probability for a mutation from isoleucine to valine).

PAM250 score: 6, meaning 6% mutation probability (15 being the highest probability for a mutation from isoleucine to leucine)

-> All matrices consider the substitution as unlikely.

PSSM
The information content for this position is at 0.74, indicating that it is conserved and might be sensitive to substitutions. Additionally, the substitution scores for Threonine are amongst the lowest, so this substitution is very unfavourable.

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
 270 I   -3  2 -4 -5 -3 -3 -4 -5 -2  5  2 -2 -1 -3 -2 -4 -4 -4 -1  4    2  11   1   1   0   1   1   1   1  29  17   3   1   1   3   2   1   0   2  22  0.74 inf

MSA

19 out of 28 sequences have Isoleucine conserved at this position. Substitutions include only three different amino acids, which are physicochemically similar to isoleucine: methionine, valine and leucine. This observation is similar to the PSSM result, which favours leucine and valine substitutions. Mutation to threonine might thus be considered unlikely.

SIFT
SIFT predicts the mutation I270T to affect protein function with a score of 0.01 and a confidence of 0.94.

pos 		 A 	C 	D 	E 	F 	G 	H 	I 	K 	L 	M 	N 	P 	Q 	R 	S 	T 	V 	W 	Y
270I 0.94 	0.00 	0.00 	0.00 	0.00 	0.01 	0.00 	0.00 	1.00 	0.00 	0.12 	0.01 	0.00 	0.00 	0.00 	0.00 	0.00 	0.01 	0.23 	0.00 	0.00

PolyPhen2
This mutation is predicted to be probably damaging with a score of 0.997 (sensitivity: 0.27; specificity: 0.98)

SNAP
SNAP predicts I270T to be neutral with an RI of 1 and an expected accuracy of 53%.

Evaluation
The physico-chemical property scores from the Grantham value and the substitution matrices suggest that this mutation is rather unlikely. The PSSM supports the idea that it is a deleterious mutation, as well as the MSA.

SIFT and PolyPhen predict it to affect protein function with a very high confidence, while SNAP predicts it to be neutral with a low confidence.

This mutation is not associated with the Canavan Disease.

Discussion

In <xr id="summary_table"/> we summarize the results of all methods we have analyzed. For the prediction methods SIFT, PolyPhen and SNAP, one can conclude that SIFT and PolyPhen provided the same results, predicting 60% of the SNP effects correctly. SNAP provided different results for some SNPs and also has an accuracy of 60%. One has to take into account that all SNAP results have a low reliability index, with an expected accuracy of usually about 60%.

The analysis of the substitution matrices, the PSSM and the MSA had varying results. In cases where the amino acid is highly conserved, they give a hint about the effect of substitutions. But in cases where the position is not well conserved, it is hard to extract a reliable prediction.


Looking at the physico-chemical properties and the location in the structure often gives a good impression of what is happening structurally in the enzyme. Yet, from these structural implications it is hard to predict a functional effect. For clearly deleterious mutations, the Grantham score is a good measure. But we often felt that it did not represent the actual change in properties for some substitutions (e.g. R71H, K213E).

In general we conclude that for clearly deleterious mutations with complete loss-of-function effects on the enzyme (e.g. E285A, A305E, G123E), one can reliably predict the effect. Less dramatic effects, however, are hard to predict. Furthermore, it is not obvious which method should be used. None of the methods had an outstanding accuracy, and some kind of consensus decision may give the best prediction result.
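
One very simple form of such a consensus decision is a majority vote over the three prediction methods. The following is only a sketch of that idea, not a method used in this task. The calls are transcribed from the summary table, and it assumes that an annotation of "Canavan Disease" counts as an effect and everything else (including R71K, despite its reduced activity) as no effect. All names are hypothetical.

```python
# Calls taken from the summary table: True means the tool predicts an effect.
# Assumption: "Canavan Disease" in the annotation column counts as an effect,
# everything else as no effect.
CALLS = {
    #         SIFT   PolyPhen SNAP   annotated
    "E285A": (True,  True,  True,  True),
    "A305E": (True,  True,  False, True),
    "G123E": (True,  True,  True,  True),
    "R71H":  (True,  True,  False, True),
    "R71K":  (False, False, False, False),
    "K213E": (False, False, True,  True),
    "V278M": (True,  True,  False, False),
    "M82T":  (False, False, False, False),
    "E235K": (False, False, False, True),
    "I270T": (True,  True,  False, False),
}


def majority_vote(votes):
    """Return True if more than half of the tools predict an effect."""
    return sum(votes) > len(votes) / 2


def consensus_agreement(calls):
    """Count how often the majority vote matches the annotated effect."""
    hits = sum(majority_vote(v[:3]) == v[3] for v in calls.values())
    return hits, len(calls)


print(consensus_agreement(CALLS))
```

Because SIFT and PolyPhen agree on every one of the ten mutations, a plain majority vote over these three tools simply reproduces their calls. A useful consensus would therefore need additional, more independent evidence or some weighting of the votes.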


<figtable id="summary_table">



{| class="wikitable"
! Mutation !! Location and Structure !! Phys-Chem Properties (Grantham) !! Substitution Matrices !! PSSM !! SIFT !! PolyPhen !! SNAP !! Annotated effect
|-
| E285A || possible effect || moderately radical (107) || possible effect || no effect || deleterious effect (0.00) || damaging effect (1.00) || non-neutral mutation (RI 2, 63%) || Canavan Disease, <= 1% activity
|-
| A305E || structural effect || moderately radical (107) || possible effect || possible effect || deleterious effect (0.02) || damaging effect (1.00) || neutral mutation (RI 3, 62%) || Canavan Disease, 0% activity
|-
| G123E || no effect || nearly moderately radical (98) || deleterious effect || no effect || deleterious effect (0.02) || damaging effect (0.994) || non-neutral mutation (RI 2, 63%) || Canavan Disease, ~ 25% activity
|-
| R71H || deleterious effect || conservative (29) || possible effect || deleterious effect || deleterious effect (0.02) || damaging effect (0.909) || neutral mutation (RI 5, 73%) || Canavan Disease
|-
| R71K || no effect || conservative (26) || no effect || deleterious effect || tolerated mutation (0.06) || benign (0.421) || neutral mutation (RI 0, 51%) || not disease causing; mutation analysis: 99% reduced activity
|-
| K213E || no effect || conservative (26) || little to no effect || little to no effect || tolerated mutation (0.92) || benign (0.004) || non-neutral mutation (RI 2, 63%) || Canavan Disease
|-
| V278M || no effect || conservative (21) || little to no effect || possible effect || deleterious effect (0.01) || probably damaging effect (0.95) || neutral mutation (RI 5, 73%) || not disease related
|-
| M82T || no effect || moderately conservative (81) || possible effect || no effect || tolerated mutation (0.60) || benign (0.006) || neutral mutation (RI 3, 62%) || not disease related
|-
| E235K || no effect || moderately conservative (56) || little to no effect || possible effect || tolerated mutation (0.79) || benign || neutral mutation (RI 2, 59%) || Canavan Disease
|-
| I270T || structural effect || moderately conservative (89) || possible effect || deleterious effect || deleterious effect (0.01) || probably damaging effect (0.997) || neutral mutation (RI 1, 53%) || not disease related
|}

</figtable>

References

<references/>