Canavan Task 6 - Sequence-based mutation analysis
Protocol
Further information can be found in the protocol.
Choosing mutations
The ten chosen mutations are listed below in <xr id="canavan_muts"/>.
<figtable id="canavan_muts"> <xr nolink id="canavan_muts"/> Listed are the 10 chosen mutations used for this task and their database sources. Mutations colored in red are disease causing. Also included is a superposition of the native residue with the mutated residue.
Mutant | E285A | A305E | G123E | R71H | R71K | K213E | V278M | M82T | E235K | I270T |
Known effect | <= 1% activity left | 0% activity left | <= 25% | Canavan Disease | not disease related | Canavan Disease | not disease related | not disease related | Canavan Disease | not disease related |
Sources | dbSNP SNPdbe OMIM HGMD | dbSNP SNPdbe OMIM HGMD | SNPdbe HGMD | dbSNP SNPdbe HGMD | SNPdbe | HGMD | dbSNP SNPdbe | dbSNP SNPdbe | dbSNP SNPdbe | dbSNP |
</figtable>
Physico-chemical effects
<xr id="mutations_summary"/> is meant to give a first idea of the nature of the mutations. For each of the ten mutations, we included the following properties of the wild-type and the mutant amino acid:
- Hydrophobicity scores <ref name="hydrophobicity">Kyte J, Doolittle RF (May 1982). Journal of Molecular Biology 157 (1): 105–32.</ref>, which range from -4.5 (Arg) to 4.5 (Ile), where arginine is the most hydrophilic and isoleucine the most hydrophobic amino acid
- the volume <ref name="volume">Volume: A.A. Zamyatin, Prog. Biophys. Mol. Biol., 24(1972)107-123</ref>, measured in Å³
- the isoelectric point <ref name="pi">JC Biro, Theor Biol Med Model. 2006; 3: 15</ref>, which is the pH at which the amino acid's overall charge is zero
- the Grantham score <ref name="grantham">Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185: 862–864</ref>, which classifies amino acid substitutions into the classes conservative (0-50, coloured green), moderately conservative (51-100, yellow), moderately radical (101-150, orange), or radical (≥151, red)
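The differences and Grantham classes in <xr id="mutations_summary"/> can be recomputed with a few lines of Python. This is a minimal sketch, not part of the original protocol: the dictionaries only cover the ten amino acids of the chosen mutations (Kyte-Doolittle and Zamyatin values as listed in the table), and the helper names are hypothetical.

```python
from typing import Tuple

# Kyte-Doolittle hydropathy values and Zamyatin volumes (cubic Angstrom),
# restricted to the ten amino acids that occur in the chosen mutations.
HYDROPATHY = {"A": 1.8, "R": -4.5, "E": -3.5, "G": -0.4, "H": -3.2,
              "K": -3.9, "V": 4.2, "M": 1.9, "T": -0.7, "I": 4.5}
VOLUME = {"A": 88.6, "R": 173.4, "E": 138.4, "G": 60.1, "H": 153.2,
          "K": 168.6, "V": 140.0, "M": 162.9, "T": 116.1, "I": 166.7}


def property_differences(wild_type: str, mutant: str) -> Tuple[float, float]:
    """Return the absolute hydropathy and volume differences, rounded to one decimal."""
    d_hydropathy = round(abs(HYDROPATHY[wild_type] - HYDROPATHY[mutant]), 1)
    d_volume = round(abs(VOLUME[wild_type] - VOLUME[mutant]), 1)
    return d_hydropathy, d_volume


def grantham_class(score: int) -> str:
    """Map a Grantham score to the four classes used in this task."""
    if score <= 50:
        return "conservative"
    if score <= 100:
        return "moderately conservative"
    if score <= 150:
        return "moderately radical"
    return "radical"


print(property_differences("M", "T"))  # (2.6, 46.8)
print(grantham_class(107))             # moderately radical
```

The differences are reported as absolute values, matching the table, so the direction of the change (e.g. more or less hydrophobic) has to be read from the individual values.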
<figtable id="mutations_summary">
Mutation | Amino acids (wild type → mutant) | Hydrophobicity | Hydrophobicity diff. | Volume [Å³] | Volume diff. [Å³] | Isoelectric point pI (charge) | pI diff. | Grantham score |
E285A | Glutamic acid → Alanine | -3.5 (polar) → 1.8 (nonpolar) | 5.3 | 138.4 (bulky) → 88.6 (tiny) | 49.8 | 3.2 (negative) → 6.0 (neutral) | 2.8 | 107 |
A305E | Alanine → Glutamic acid | 1.8 (nonpolar) → -3.5 (polar) | 5.3 | 88.6 (tiny) → 138.4 (bulky) | 49.8 | 6.0 (neutral) → 3.2 (negative) | 2.8 | 107 |
G123E | Glycine → Glutamic acid | -0.4 (nonpolar) → -3.5 (polar) | 3.1 | 60.1 (tiny) → 138.4 (bulky) | 78.3 | 6.0 (neutral) → 3.2 (negative) | 2.8 | 98 |
R71H | Arginine → Histidine | -4.5 (polar) → -3.2 (polar) | 1.3 | 173.4 (bulky) → 153.2 (bulky) | 20.2 | 10.8 (positive) → 7.6 (neutral) | 3.2 | 29 |
R71K | Arginine → Lysine | -4.5 (polar) → -3.9 (polar) | 0.6 | 173.4 (bulky) → 168.6 (bulky) | 4.8 | 10.8 (positive) → 9.7 (positive) | 1.1 | 26 |
K213E | Lysine → Glutamic acid | -3.9 (polar) → -3.5 (polar) | 0.4 | 168.6 (bulky) → 138.4 (bulky) | 30.2 | 9.7 (positive) → 3.2 (negative) | 6.5 | 56 |
V278M | Valine → Methionine | 4.2 (nonpolar) → 1.9 (nonpolar) | 2.3 | 140.0 (small) → 162.9 (bulky) | 22.9 | 6.0 (neutral) → 5.7 (neutral) | 0.3 | 21 |
M82T | Methionine → Threonine | 1.9 (nonpolar) → -0.7 (polar) | 2.6 | 162.9 (bulky) → 116.1 (small) | 46.8 | 5.7 (neutral) → 5.9 (neutral) | 0.2 | 81 |
E235K | Glutamic acid → Lysine | -3.5 (polar) → -3.9 (polar) | 0.4 | 138.4 (bulky) → 168.6 (bulky) | 30.2 | 3.2 (negative) → 9.7 (positive) | 6.5 | 56 |
I270T | Isoleucine → Threonine | 4.5 (nonpolar) → -0.7 (polar) | 5.2 | 166.7 (bulky) → 116.1 (small) | 50.6 | 5.9 (neutral) → 5.9 (neutral) | 0.0 | 89 |
</figtable>
Substitution matrices
Blosum62 <ref>Henikoff, S.; Henikoff, J.G. (1992). "Amino Acid Substitution Matrices from Protein Blocks". PNAS 89 (22): 10915–10919</ref>
This substitution matrix can be found in the protocol. The lowest score in Blosum62, representing a very unlikely substitution, is -4, and the highest score, representing a likely one, is 11. Blosum substitution matrices are derived from ungapped local alignments (blocks) of the BLOCKS database. For Blosum62, sequences sharing at least 62% identity are clustered, so the substitution counts reflect sequences that are less than 62% identical. The scores in the matrix are log-odds values of a substitution: negative scores mean that the substitution is observed less often than expected by chance, and positive scores mean that it is observed more often than expected by chance.
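Following Henikoff and Henikoff (1992), a Blosum score compares the observed frequency q_ij of a residue pair with the frequency expected from the background frequencies p_i and p_j (p_i² for identical residues, 2·p_i·p_j otherwise) and reports the rounded log-odds in half-bit units. The sketch below applies this formula to a single pair; the helper name and the frequencies in the example are hypothetical and not taken from the BLOCKS database.

```python
import math


def blosum_score(q_ij: float, p_i: float, p_j: float, identical: bool,
                 units_per_bit: float = 2.0) -> int:
    """Rounded log-odds score of a residue pair, Blosum style.

    q_ij: observed frequency of the pair i/j among all aligned pairs
    p_i, p_j: background frequencies of residues i and j
    identical: True if i == j (expected p_i * p_i); otherwise the pair can
               occur in two orders, so the expected frequency is 2 * p_i * p_j
    units_per_bit: 2.0 gives half-bit units, as used for Blosum62
    """
    expected = p_i * p_j if identical else 2 * p_i * p_j
    return round(units_per_bit * math.log2(q_ij / expected))


# Hypothetical background: both residues occur with a frequency of 5%.
print(blosum_score(0.010, 0.05, 0.05, identical=False))    # 2  (twice as often as expected)
print(blosum_score(0.005, 0.05, 0.05, identical=False))    # 0  (as often as expected)
print(blosum_score(0.00125, 0.05, 0.05, identical=False))  # -4 (a quarter of the expected frequency)
```

A score of -4, the lowest value in Blosum62, therefore corresponds to a pair that is seen roughly four times less often than expected by chance.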
PAM1<ref>M O Dayhoff, R M Schwartz, B C Orcutt, A model of evolutionary change in proteins, Atlas of protein sequence and structure (1978), Volume: 5, Issue: Suppl 3, Publisher: National Biomedical Research Foundation, Pages: 345-352</ref>
This substitution matrix can also be found in the protocol. The lowest score in PAM1 is 0 (0% mutation probability) and the highest score is 56 (0.56% mutation probability), not counting the diagonal entries, which give the probability that a residue remains unchanged. A score in PAM1 expresses how probable a mutation from amino acid A to amino acid B is over an evolutionary distance of 1 PAM, i.e. one accepted point mutation per 100 residues, which corresponds to two sequences that are about 99% identical. Scores in the matrix are multiplied by 10000, so a score of 56 corresponds to a probability of 0.56%.
This matrix is well suited for our case, as we are only considering variants of the same protein. Mutations are generally very unlikely; the most probable one is Asp to Glu with 0.56%.
PAM250
This substitution matrix can be found in the protocol. It corresponds to an evolutionary distance of 250 accepted point mutations per 100 residues, which roughly equals sequences that are about 20% identical. The highest score is 72 for the substitution of W to V and the lowest is 0. The scores in the table are multiplied by 100, so the substitution of W to V has a probability of 72%.
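Dayhoff et al. obtain PAM250 from PAM1 by treating evolution as a Markov process, i.e. by multiplying the PAM1 mutation probability matrix with itself 250 times. The sketch below illustrates this on a hypothetical three-residue matrix with invented values (not Dayhoff's data) and stores the original residue in the rows, whereas Dayhoff's published tables use columns. It also shows how probabilities turn into the integer scores printed in the tables (multiplied by 10000 for PAM1 and by 100 for PAM250).

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_power(pam1: Matrix, n: int) -> Matrix:
    """Extrapolate a PAM1 probability matrix to PAMn by repeated multiplication."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, pam1)
    return result


def scaled_scores(matrix: Matrix, factor: int) -> List[List[int]]:
    """Convert probabilities to integer scores (x10000 for PAM1, x100 for PAM250)."""
    return [[round(p * factor) for p in row] for row in matrix]


# Hypothetical 3-residue PAM1: row i = original residue, entry j = probability
# of being replaced by residue j within 1 PAM; every row sums to 1.
toy_pam1 = [[0.990, 0.006, 0.004],
            [0.005, 0.990, 0.005],
            [0.004, 0.006, 0.990]]

print(scaled_scores(toy_pam1, 10000))                # PAM1 scale
print(scaled_scores(pam_power(toy_pam1, 250), 100))  # PAM250 scale
```

With increasing distance the diagonal shrinks and the off-diagonal probabilities grow, which is why PAM250 scores are so much larger than PAM1 scores.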
Mutations
E285A
Location
<figure id="e285a_hb" >
</figure>
This amino acid is located at the beginning of helix eight in the binding pocket, yet it is not involved in substrate binding. DSSP does not assign a secondary structure state to E285.
As can be seen in <xr id="e285a_hb"/>, E285 forms several hydrogen bonds with the backbone as well as with Y118. When it is mutated to alanine, these hydrogen bonds can no longer be formed, which might destabilize the structure.
physico-chemical properties
Glutamic acid and alanine have different physico-chemical properties, which is reflected in a moderately radical Grantham score of 107. Whereas glutamic acid is negatively charged, alanine has neither charge nor polarity. Furthermore, alanine is much smaller than glutamic acid. From these differences one would expect an effect of the mutation.
substitution matrices
In Blosum62 the substitution E --> A is scored -1. Negative scores refer to substitutions that occur less often than expected by chance; with the lowest score in Blosum62 being -4, a score of -1 means that this substitution is somewhat, but not extremely, unlikely.
In PAM1 the substitution E --> A is scored 17 which equals a mutation probability of 0.17%.
In PAM250 the substitution E --> A is scored 9, with the highest mutation probability in PAM250 being 72. This means there is a 9% chance of glutamic acid being replaced by alanine in distantly related proteins.
Therefore all substitution matrices expect this mutation to be rather unlikely.
PSSM
Besides glutamic acid (score 5), proline also has a high score at this position (score 5). This is surprising because the two amino acids have very different properties. Alanine still has the third highest score (2), which suggests that this substitution is observed relatively often. The comparatively high information content of 0.92 indicates a fairly well conserved position with only a few specific substitutions.
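Position 285 (E) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | 2 | 0 | -1 | -1 | -6 | -2 | 5 | -4 | 1 | -5 | -4 | -4 | -3 | -5 | 5 | -2 | -2 | -6 | -6 | -3 |
Weighted observed percentage | 17 | 4 | 3 | 3 | 0 | 1 | 31 | 1 | 4 | 0 | 2 | 1 | 1 | 0 | 26 | 2 | 3 | 0 | 0 | 2 |
Information per position: 0.92; relative weight of gapless real matches to pseudocounts: inf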
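The information content reported by PSI-BLAST measures how far the residue distribution at a position deviates from a background distribution, as a relative entropy in bits. The sketch below computes this quantity under the simplifying assumption of a uniform background of 1/20 and without the sequence weighting and pseudocounts PSI-BLAST uses, so it will not reproduce the value of 0.92 above exactly; `information_content` is a hypothetical helper.

```python
import math
from typing import Dict, Optional


def information_content(counts: Dict[str, float],
                        background: Optional[Dict[str, float]] = None) -> float:
    """Relative entropy (bits) of one alignment column against a background.

    counts: residue -> count or percentage observed in the column
    background: residue -> background frequency; uniform 1/20 is assumed if None
    """
    total = sum(counts.values())
    ic = 0.0
    for residue, count in counts.items():
        if count == 0:
            continue  # 0 * log(0) is taken as 0
        p = count / total
        q = background[residue] if background else 1 / 20
        ic += p * math.log2(p / q)
    return ic


# A column in which every sequence has arginine reaches the maximum, log2(20):
print(round(information_content({"R": 29}), 2))  # 4.32
```

A perfectly uniform column gives 0 bits, so values such as 0.09 (position 213) and 2.80 (position 71) lie between these two extremes.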
MSA
In the MSA, 28 of 29 sequences have glutamic acid at position 285, meaning this position is highly conserved. This goes along with the PSSM result.
SIFT
SIFT predicts an effect of this mutation with a score of 0.00; substitutions with scaled probabilities below 0.05 are predicted to be deleterious. The confidence for this prediction is 0.94, which is very high, so the prediction seems reasonable. Furthermore, SIFT predicts ANY substitution of E at this position to be deleterious. The median sequence conservation reported for this position is 3.01.
Substitution at pos 285 from E to A is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.01. Sequences represented at this position: 17
pos | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
285E | 0.94 | 0.00 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Polyphen2
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
Polyphen used 64 sequences of a multiple sequence alignment at position 285.
SNAP
SNAP predicts E285A to be non-neutral with a reliability index (RI) of 2 and an expected accuracy of 63%.
Evaluation
All three SNP effect prediction methods predict this SNP to affect protein function; PolyPhen and SIFT do so with a higher confidence than SNAP. One would also expect this result from the physico-chemical properties of the two amino acids, which differ considerably. For example, alanine would not be able to form the four hydrogen bonds that E285 is involved in. Furthermore, the close proximity to the active site suggests that the mutation could influence the activity of the enzyme.
This mutation is the most common Canavan Disease causing mutation among Ashkenazi Jews. It reduces the activity of the enzyme by about 99%. In this case, all three prediction methods were correct.
A305E
location
<figure id="a305e_crowded">
</figure>
A305 is located at the end of the 13th beta strand near the C-terminus of the protein (protein length: 313 AA). DSSP also assigns state E to this residue, as the last residue of the respective strand.
As can be seen in <xr id="a305e_crowded"/>, the space at this position is rather crowded, so the small alanine fits well here. Glutamic acid, in contrast, hardly finds enough space and overlaps with neighboring residues. Therefore the mutation A305E probably disturbs the local secondary structure, as the larger residue needs more space.
physico-chemical properties
As for E285A, the physico-chemical properties of alanine and glutamic acid differ, which is reflected in a moderately radical Grantham score of 107. Whereas glutamic acid is negatively charged, alanine has neither charge nor polarity. Furthermore, alanine is much smaller than glutamic acid. From these differences one would expect an effect of the mutation.
substitution matrices
As Blosum62 is a symmetrical substitution matrix, the score for A305E is the same as for E285A: -1. With the lowest score in Blosum62 being -4 and the highest 11, this means that a substitution from A to E occurs somewhat less often than expected by chance.
In PAM1 the substitution A --> E is scored 10, which equals a mutation probability of 0.1%.
In PAM250 the substitution A --> E is scored 5, which equals a mutation probability of 5%.
Therefore all three substitution matrices consider this mutation to be rather unlikely.
PSSM
Valine (score 3), as well as leucine and isoleucine (score 2 each), score at least as high as alanine (score 2) at this position. This is not surprising, since these hydrophobic, aliphatic amino acids have similar physico-chemical properties. A mutation to glutamic acid (score -2) is uncommon, although it is not observed less often than many other substitutions. The low information content at this position (0.37) suggests that this position is not well conserved.
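Position 305 (A) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | 2 | -4 | -4 | -4 | 0 | -1 | -2 | -4 | -4 | 2 | 2 | -3 | 1 | 0 | -1 | -2 | -2 | -4 | -3 | 3 |
Weighted observed percentage | 23 | 0 | 0 | 0 | 2 | 3 | 3 | 0 | 0 | 13 | 23 | 0 | 3 | 5 | 2 | 1 | 1 | 0 | 0 | 21 |
Information per position: 0.37; relative weight of gapless real matches to pseudocounts: inf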
MSA
At this position alanine is conserved in 20 out of 28 sequences. Six sequences have valine at this position and two have isoleucine. Valine and isoleucine have physico-chemical properties similar to those of alanine and can thus apparently replace it. Glutamic acid does not occur at this position in any of the sequences.
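The conservation figures in the MSA paragraphs are simple counts over one alignment column. A minimal sketch, assuming the column is available as a string with one character per sequence and that gap characters ('-') should be ignored; `column_conservation` is a hypothetical helper name. The example rebuilds column 305 from the counts given above, since the alignment itself is not reproduced here.

```python
from collections import Counter
from typing import List, Tuple


def column_conservation(column: str, wild_type: str) -> Tuple[float, List[Tuple[str, int]]]:
    """Fraction of sequences carrying the wild-type residue, plus all residue counts.

    Gap characters ('-') are skipped, which is an assumption of this sketch.
    """
    counts = Counter(residue for residue in column if residue != "-")
    n_sequences = sum(counts.values())
    return counts[wild_type] / n_sequences, counts.most_common()


# Column 305 as described above: 20 x Ala, 6 x Val, 2 x Ile (order is irrelevant).
fraction, counts = column_conservation("A" * 20 + "V" * 6 + "I" * 2, "A")
print(f"{fraction:.2f}", counts)  # 0.71 [('A', 20), ('V', 6), ('I', 2)]
```

The same helper applied to a column with arginine in all 29 sequences, as at position 71, returns a fraction of 1.0.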
SIFT
SIFT predicts an effect of this mutation with a score of 0.02; substitutions with scaled probabilities below 0.05 are predicted to be deleterious. The confidence for this prediction is 0.89 and therefore still high. Furthermore, 13 of the 19 possible substitutions at sequence position 305 are predicted to be deleterious.
Substitution at pos 305 from A to E is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02. Median sequence conservation: 3.03. Sequences represented at this position: 16
pos | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
305A | 0.89 | 1.00 | 0.03 | 0.01 | 0.02 | 0.01 | 0.05 | 0.01 | 0.10 | 0.02 | 0.08 | 0.02 | 0.01 | 0.04 | 0.02 | 0.01 | 0.06 | 0.08 | 0.72 | 0.00 | 0.01 |
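The classification step SIFT applies to its scaled probabilities can be reproduced with a small helper. This is a minimal sketch that only re-applies the 0.05 threshold to the values in the table (the confidence column is left out); the function name `sift_calls` is hypothetical and not part of SIFT. Applied to position 305, it recovers the 13 deleterious substitutions stated in the text.

```python
from typing import Dict

# Column order of the SIFT output table
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sift_calls(scaled_probabilities: str, threshold: float = 0.05) -> Dict[str, str]:
    """Label each substitution 'deleterious' (score < threshold) or 'tolerated'."""
    values = [float(v) for v in scaled_probabilities.split()]
    if len(values) != len(AMINO_ACIDS):
        raise ValueError("expected one scaled probability per amino acid")
    return {aa: ("deleterious" if v < threshold else "tolerated")
            for aa, v in zip(AMINO_ACIDS, values)}


# Scaled probabilities at position 305 (wild type A), without the confidence column
row_305 = ("1.00 0.03 0.01 0.02 0.01 0.05 0.01 0.10 0.02 0.08 "
           "0.02 0.01 0.04 0.02 0.01 0.06 0.08 0.72 0.00 0.01")
calls = sift_calls(row_305)
deleterious = [aa for aa, call in calls.items() if call == "deleterious"]
print(calls["E"], len(deleterious))  # deleterious 13
```

The wild-type residue drops out automatically because its scaled probability is 1.00, and glycine (exactly 0.05) is counted as tolerated because the threshold is strict.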
Polyphen2
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00).
Polyphen used 44 sequences of a multiple sequence alignment at position 305.
SNAP
SNAP predicts A305E to be neutral with an RI of 3 and an expected accuracy of 62%.
Evaluation
Both SIFT and PolyPhen give a clear result and predict this SNP to affect protein function, while SNAP predicts it to be neutral. The substitution matrices further support that this mutation is disfavored and might have an effect. The physico-chemical properties suggest the same conclusion, as the two amino acids differ considerably.
Looking only at the location of the mutation close to the C-terminus, one would not expect structural or functional implications, since the residue is far away from the known active site and the dimer interface. Yet, when taking into account the restricted space at this location, one would expect changes in the local structure: neighboring side chains might have to move to give glutamic acid enough space. These movements in turn might also influence other parts of the enzyme structure.
This mutation is the most common Canavan Disease causing mutation in the non-Ashkenazi population. It leads to a complete loss of function of the protein. This severe effect is not expected from the location of the mutation or from SNAP, but two of the three prediction methods (SIFT and PolyPhen) were correct.
G123E
Location
<figure id="g123e_space">
</figure>
G123 is located at the beginning of the fourth beta strand in aspartoacylase. DSSP also assigns an E to G123, i.e. it considers the residue to be part of a sheet. This strand is not buried and is solvent accessible. As can be seen from <xr id="g123e_space"/>, one would not expect any effect of this mutation: even though glutamic acid is much larger than glycine, there is enough space and no clashes occur. Furthermore, since glycine is not involved in any interactions, no interactions are lost due to the mutation.
physico-chemical properties
The physico-chemical properties of glutamic acid and glycine differ about as much as those of alanine and glutamic acid. Whereas glutamic acid is very bulky and negatively charged, glycine is the smallest amino acid and has no charge. These differences are reflected in a moderately conservative Grantham score of 98, just below the moderately radical range. From these differences one would expect an effect of the mutation.
substitution matrices
The Blosum62 score for the substitution of glycine with glutamic acid is -2. This means that the substitution is observed less often than expected by chance. The lowest score in Blosum62, representing a very unlikely mutation event, is -4 and the highest score, representing a likely mutation, is 11.
In PAM1 the substitution G--> E is scored 4 which equals a mutation probability of 0.04%. The highest probability of a mutation in PAM1 is 0.56%.
In PAM250 the substitution G --> E is scored 5 which stands for a mutation probability of 5%.
Therefore all substitution matrices imply that this mutation is unlikely.
PSSM
Glutamic acid is the most frequently observed residue at this position (16%, log-odds score 2), even more frequent than glycine itself (14%). The low information content at this position (0.20) implies that glycine is not strongly conserved and may not have an important function in the protein. This goes along with the insights gained from the analysis of the location of the mutation in the structure.
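Position 123 (G) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | -1 | 0 | 2 | 1 | -3 | -1 | 2 | 2 | 1 | -2 | -2 | -1 | -1 | -1 | 1 | -1 | -1 | -4 | 0 | -2 |
Weighted observed percentage | 6 | 5 | 12 | 6 | 1 | 2 | 16 | 14 | 4 | 2 | 5 | 3 | 2 | 2 | 8 | 2 | 3 | 0 | 3 | 4 |
Information per position: 0.20; relative weight of gapless real matches to pseudocounts: inf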
MSA
At this position glycine is well conserved: 28 out of 29 sequences have glycine and only one has alanine. One would assume that only small amino acids fit at this position. This finding contradicts the PSSM.
SIFT
SIFT predicts this mutation to affect protein function with a score of 0.02. The confidence for this result is very high (1.00). Furthermore, SIFT predicts every substitution except for glutamine (Q, 0.10) to be deleterious. It is somewhat surprising that glutamine is tolerated, whereas the structurally very similar glutamic acid is not. This suggests that the predicted effect is due to the negative charge of glutamic acid.
Substitution at pos 123 from G to E is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02. Median sequence conservation: 2.99. Sequences represented at this position: 18
pos | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
123G | 1.00 | 0.04 | 0.01 | 0.03 | 0.02 | 0.00 | 1.00 | 0.01 | 0.00 | 0.02 | 0.01 | 0.00 | 0.03 | 0.02 | 0.10 | 0.01 | 0.03 | 0.01 | 0.01 | 0.00 | 0.00 |
Polyphen2
This mutation is predicted to be probably damaging with a score of 0.994 (sensitivity: 0.46; specificity: 0.96). Polyphen used 75 sequences for the MSA.
SNAP
SNAP predicts G123E to be non-neutral with an RI of 2 and an expected accuracy of 63%.
Evaluation
All three SNP effect prediction methods predict this mutation to be deleterious. The same impact can be deduced from the difference in physico-chemical properties between the two residues. Blosum62 and PAM1 indicate that the substitution from glycine to glutamic acid is unlikely.
The analysis of the structural location of the mutation leads to a different conclusion: no effect is expected from a change from G to E at this solvent accessible position. Additionally, the PSSM suggests that glutamic acid is frequently observed at this position in related sequences.
As has been found in "Identification and expression of eight novel mutations among non-Jewish patients with Canavan disease.", Kaul R et al, Am. J. Hum. Genet. 59:95-102(1996), the mutation G123E reduces enzyme activity to about 25% and causes Canavan Disease. In contrast to the first two disease causing mutations, where most prediction results were correct and in agreement, different conclusions could be drawn this time. However, the effects of the first two mutations were very dramatic, resulting in an almost complete loss of function. This mutation leaves about 25% of the activity, which might explain why the results are less clear.
R71H
Location
<figure id="r71h_hbonds">
</figure>
R71 is located at the end of the fourth helix in the active site of the enzyme. It is involved in substrate binding via one Hbond. It also forms other Hbonds with an active water molecule and D68. DSSP assigns this position the state 'G', which stands for a 3-10 helix.
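The DSSP states referred to in this task (E for A305 and G123, G for R71, no state for E285) are one-letter codes. The lookup table below is a small illustrative sketch of the eight classic DSSP codes; the helper `describe_dssp` is hypothetical, and treating '-' like a blank is an assumption, since some tools write '-' for residues without an assigned state.

```python
# One-letter secondary structure codes of the classic DSSP program
DSSP_STATES = {
    "H": "alpha helix",
    "B": "isolated beta bridge",
    "E": "extended strand (beta sheet)",
    "G": "3-10 helix",
    "I": "pi helix",
    "T": "hydrogen-bonded turn",
    "S": "bend",
    " ": "no assignment (loop or irregular)",
}


def describe_dssp(code: str) -> str:
    """Translate a DSSP code; '-' is treated like a blank (no assigned state)."""
    key = " " if code == "-" else code
    return DSSP_STATES.get(key, "unknown code")


print(describe_dssp("G"))  # 3-10 helix
print(describe_dssp("E"))  # extended strand (beta sheet)
```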
Due to its position in the active site and the several hydrogen bonds R71 is involved in, it seems very likely that any mutation at this position is deleterious. The interactions of R71 and the respective position of H71 are shown in <xr id="r71h_hbonds"/>.
physico-chemical properties
Arginine and histidine have fairly similar physico-chemical properties. Both are polar and bulky amino acids, with arginine having an elongated shape whereas histidine has a compact ring. Another difference is the positive charge of arginine versus no charge of histidine (histidine is only positively charged at pH below 6). Therefore, based on the chemical properties alone, one would not expect a large impact of the mutation.
substitution matrices
The Blosum62 score for the substitution of arginine with histidine is 0. This means that this substitution occurs as often as expected by chance.
In PAM1 the substitution R --> H is scored 8, which equals a mutation probability of 0.08%. The highest probability of a mutation in PAM1 is 0.56%.
In PAM250 the substitution R --> H is scored 5, which stands for a mutation probability of 5%.
Therefore the substitution matrices consider this mutation possible, but not very likely.
PSSM
The information content of 2.80 at this position indicates that arginine is very important here. This goes along with the location of arginine in the binding site of the enzyme. No other amino acid is favored at this position in MSAs of related sequences (arginine accounts for 88% of the observed residues). Therefore, one would expect a large effect from the mutation R71H.
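Position 71 (R) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | -4 | 9 | -5 | -5 | -8 | -3 | -6 | -7 | -3 | -7 | -7 | -3 | -1 | -7 | -7 | -5 | -4 | -3 | -4 | -7 |
Weighted observed percentage | 2 | 88 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
Information per position: 2.80; relative weight of gapless real matches to pseudocounts: inf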
MSA
All 29 sequences have arginine at this position. This conservation might be based on the importance of R71 in the binding of substrate. This finding agrees with the PSSM result.
SIFT
SIFT predicts an effect for the mutation R71H. The conservation of R at position 71 is very high with 2.99. Besides Methionine and Lysine, all other substitutions are predicted to be deleterious. This goes along with the results from the PSSM.
Substitution at pos 71 from R to H is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01. Median sequence conservation: 2.99. Sequences represented at this position: 18
pos | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
71R | 1.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.01 | 0.01 | 0.01 | 0.06 | 0.02 | 0.09 | 0.01 | 0.01 | 0.02 | 1.00 | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 |
Polyphen2
This mutation is predicted to be probably damaging with a score of 0.909 (sensitivity: 0.69; specificity: 0.90). Polyphen used 75 sequences in the MSA.
SNAP
SNAP predicts mutation R71H to be neutral with an RI of 5 and an expected accuracy of 73%.
Evaluation
Both PolyPhen and SIFT predict this mutation to be deleterious, while SNAP predicts it to be neutral. The substitution matrices and PSSM give only low likelihoods and also from analysis of the location of R71 in the binding site one would expect, that any substitution has an impact on the binding of substrate and therefore activity of the enzyme.
Most of these predictions agree with the annotation in HGMD, which associates R71H with Canavan Disease.
R71K
Location
<figure id="r71k_hbonds">
</figure>
Again, R71 is located at the end of the fourth helix in the active site of the enzyme. It is involved in substrate binding via one Hbond. It also forms other Hbonds with an active water molecule and D68. DSSP assigns this position the state 'G', which stands for a 3-10 helix.
As can be seen in <xr id="r71k_hbonds"/>, K71 could be oriented in the same way as R71. Therefore it is possible for lysine to interact with the substrate or with D68. In contrast to R71H, this mutation is not expected to have such a big influence, since some of the interactions might be kept.
physico-chemical properties
Both residues have similar physico-chemical properties. Both are positively charged and therefore polar, and both are bulky residues with a similar overall shape. One difference is the number of amino groups in the side chain: whereas arginine has two terminal amino groups (in its guanidinium group) to form hydrogen bonds, lysine has only one. From these similar properties, one would not expect a big impact of this mutation.
substitution matrices
The substitution of R to K gets a score of 2 in the Blosum62 matrix. This means this substitution is favored and expected to occur slightly more often than random.
In PAM1 this substitution is scored 37, which is one of the higher scores in this matrix, the highest being 56. It means that the probability for this mutation equals 0.37%.
In PAM250 this substitution is scored 18, which is even higher than the score for R remaining R (17). This 18% mutation probability is therefore really high.
Therefore, the substitution matrices consider this mutation to be fairly likely.
PSSM
As already mentioned for the mutation R71H, this position is highly conserved and hardly allows any other amino acid. Even lysine, despite its similar physico-chemical properties, is hardly observed at this position in related sequences (1%).
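Position 71 (R) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | -4 | 9 | -5 | -5 | -8 | -3 | -6 | -7 | -3 | -7 | -7 | -3 | -1 | -7 | -7 | -5 | -4 | -3 | -4 | -7 |
Weighted observed percentage | 2 | 88 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
Information per position: 2.80; relative weight of gapless real matches to pseudocounts: inf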
MSA
All 29 sequences have arginine at this position. This conservation might be based on the importance of R71 in the binding of substrate. This finding agrees with the PSSM result.
SIFT
SIFT predicts the mutation R71K to be tolerated with a score of 0.06. This score is just above the threshold of 0.05, below which substitutions are classified as deleterious. Arginine is very highly conserved at this position (median sequence conservation 2.99), and only lysine and methionine have values above 0.05 and are thus tolerated.
Substitution at pos 71 from R to K is predicted to be TOLERATED with a score of 0.06. Median sequence conservation: 2.99. Sequences represented at this position: 18
pos | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
71R | 1.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.01 | 0.01 | 0.01 | 0.06 | 0.02 | 0.09 | 0.01 | 0.01 | 0.02 | 1.00 | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 |
Polyphen2
This mutation is predicted to be benign with a score of 0.421 (sensitivity: 0.84; specificity: 0.79). Polyphen used 75 sequences in the MSA.
SNAP
SNAP predicts R71K to be neutral with an RI of 0 and an expected accuracy of 51%, so it is fairly unsure about its prediction.
Evaluation
From the prediction methods as well as from the other analyses, one can expect this mutation to be tolerated. Although arginine is involved in substrate binding and is highly conserved among related proteins, the similar physico-chemical properties of arginine and lysine might allow this substitution.
This agrees with the annotation of this mutation, which is not classified as disease causing. Yet there is one source of information <ref name="r71k">Le Coq J., Pavlovsky A., Malik R., Sanishvili R., Xu C., Viola R.E, Examination of the mechanism of human brain aspartoacylase through the binding of an intermediate analogue, Biochemistry 47:3484-3492(2008)</ref> about mutation experiments, according to which this mutation reduces the activity of the enzyme by 99%.
K213E
Location
<figure id="k213e">
</figure>
Residue 213 is located on the outside of the protein, on an outer loop region, and it is nowhere close to the dimer interaction site, either. It does not form any backbone-backbone interactions.
Physico-chemical properties
Lysine and glutamic acid are fairly similar amino acids: they are both polar and bulky, and mainly their charge differs (lysine is positively charged, glutamic acid negatively). The Grantham score is 56, so the substitution is considered moderately conservative.
Substitution Matrices
Blosum62 score: 1, i.e., it is expected to occur slightly more often than random.
PAM1 score: 4, i.e., it is only expected to occur with 0.04% probability, where 0.19% is the highest probability for a mutation of lysine.
PAM250 score: 5, meaning a 5% mutation probability, where 10% is the highest expected mutation probability for lysine (to valine).
-> The position-independent substitution matrices consider this substitution as not very likely, but possible.
PSSM
The information content at position 213 is only 0.09, which fits well with the position on an outer loop. Glutamic acid has a log-odds score of 0 here, so the substitution is neither favored nor disfavored and should not have a very large impact.
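Position 213 (K) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | 1 | 0 | -1 | 0 | -2 | 0 | 0 | -1 | 1 | -1 | -1 | 1 | 0 | -1 | 2 | 0 | 0 | -2 | -1 | 1 |
Weighted observed percentage | 10 | 5 | 2 | 5 | 1 | 4 | 7 | 4 | 3 | 4 | 6 | 7 | 2 | 3 | 12 | 5 | 7 | 1 | 2 | 10 |
Information per position: 0.09; relative weight of gapless real matches to pseudocounts: inf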
MSA
18 out of 29 sequences have lysine at this position, but the other 11 sequences contain other amino acids, such as arginine, threonine, alanine and methionine. This goes along with the PSSM result that this position is not highly conserved and substitutions are possible.
SIFT
SIFT predicts the mutation K213E to be tolerated with a score of 0.92 and a confidence of 1.00.
pos | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
213K | 1.00 | 0.77 | 0.08 | 0.51 | 0.92 | 0.09 | 0.32 | 0.14 | 0.33 | 1.00 | 0.39 | 0.17 | 0.42 | 0.27 | 0.54 | 0.58 | 0.63 | 0.67 | 0.63 | 0.03 | 0.09 |
PolyPhen2
This mutation is predicted to be benign with a score of 0.004 (sensitivity: 0.98; specificity: 0.35)
SNAP
SNAP predicts K213E to be non-neutral with an RI of 2 and an expected accuracy of 63%.
Evaluation
The Grantham score of 56 classifies this substitution as moderately conservative. When looking at the position in the protein and the physico-chemical properties, the mutation could be expected to be tolerated. Both PAM matrices, however, suggest that it is not very likely to occur, and the PSSM value is ambiguous.
The prediction methods SIFT and PolyPhen expect this mutation to be neutral, while SNAP predicts it to affect protein function.
According to the HGMD, this mutation is associated with Canavan Disease.
V278M
Location
<figure id="V278M">
</figure>
Residue V278 is located on the outside of the protein and does not form any hydrogen bonds.
Physico-chemical properties
Valine and Methionine are both hydrophobic (Valine slightly more so), and both are neutrally charged. Valine is smaller and rounder than Methionine. Methionine contains sulfur, but is not able to form disulfide bonds, so this fact does not make a big difference. This is reflected in a conservative Grantham score of 21.
Substitution Matrices
Blosum62 score: 1, i.e., it is expected to occur slightly more often than random.
PAM1 score: 4, i.e., 0.04% probability to occur (where 0.33% is the highest probability here, for a substitution of valine to isoleucine).
PAM250 score: 2, meaning 2% mutation probability (where 13% is the highest probability here for a substitution of valine to leucine).
-> The substitution matrices do not consider the mutation likely, but not impossible.
PSSM
The information content at this position is only 0.41, so the position is not strongly conserved according to the PSSM. Methionine receives a low score (-3) compared to most other amino acids.
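Position 278 (V) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | 0 | -4 | -5 | -3 | 0 | -3 | 2 | 2 | -3 | 1 | 0 | -4 | -3 | 2 | -1 | -3 | -2 | -2 | -3 | 4 |
Weighted observed percentage | 6 | 1 | 0 | 1 | 2 | 1 | 14 | 14 | 1 | 7 | 10 | 0 | 0 | 9 | 3 | 2 | 2 | 1 | 0 | 26 |
Information per position: 0.41; relative weight of gapless real matches to pseudocounts: inf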
MSA
At this position, 26 out of 28 sequences have valine. The remaining two sequences have methionine and isoleucine, respectively, so methionine does occur naturally at this position, even though the PSSM assigns it a low score. The high conservation of valine in the MSA contrasts with the PSSM result, which assigns a low information content to position 278.
SIFT
SIFT predicts the mutation V278M to affect function with a score of 0.01 and a confidence of 0.94.
pos | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
278V | 0.94 | 0.08 | 0.01 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.34 | 0.00 | 0.04 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 1.00 | 0.00 | 0.00 |
PolyPhen2
This mutation is predicted to be probably damaging with a score of 0.950 (sensitivity: 0.64; specificity: 0.92)
SNAP
SNAP predicts V278M to be neutral with an RI of 5 and an expected accuracy of 73%.
Evaluation
According to the physico-chemical properties and the substitution matrices, this mutation is not expected to be deleterious. The PSSM, however, suggests that it is fairly unlikely.
SIFT and PolyPhen consider it to be strongly damaging, while SNAP predicts it to be neutral.
This mutation is not related to the Canavan Disease.
M82T
Location
<figure id="M82T">
</figure>
Residue M82 is located on an outer loop of the protein, pointing inwards. It does not form any hydrogen bonds.
Physico-chemical properties
Methionine is nonpolar, whereas Threonine is slightly polar. Methionine is large and elongated, while Threonine is much smaller (volume difference of approx. 47 Å³) and more compact. Threonine has an additional hydroxyl group, while Methionine contains sulfur. Both are uncharged. Still, the other differences are large enough to result in a Grantham score of 81 (moderately conservative).
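The volume differences quoted in this section are simple subtractions of per-residue volumes. The sketch below assumes the residue volumes commonly tabulated from Zamyatin (1972), the reference cited for volume on this page, and covers only the residues discussed here; the dictionary and function names are hypothetical.

```python
# Amino acid residue volumes in cubic angstroms as commonly tabulated from
# Zamyatin (1972); assumed values, only the residues discussed here.
ZAMYATIN_VOLUME = {
    "V": 140.0, "M": 162.9, "T": 116.1,
    "I": 166.7, "E": 138.4, "K": 168.6,
}


def volume_change(wild_type: str, mutant: str) -> float:
    """Volume of the mutant residue minus that of the wild type (Å³)."""
    return round(ZAMYATIN_VOLUME[mutant] - ZAMYATIN_VOLUME[wild_type], 1)


for wt, mut in [("V", "M"), ("M", "T"), ("E", "K"), ("I", "T")]:
    print(f"{wt}->{mut}: {volume_change(wt, mut):+.1f} Å³")
```

A negative value means the mutant residue is smaller; I270T, for example, loses about 50 Å³.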
Substitution Matrices
Blosum62 score: -1, i.e., this mutation is expected slightly less often than random.
PAM1 score: 6, i.e., 0.06% and not very likely (45 being the highest score, for a substitution of methionine to leucine).
PAM250 score: 5, meaning 5% mutation probability (20% being the highest probability, for a substitution of methionine to leucine).
-> The matrices consider this substitution slightly unlikely; PAM1 rates it somewhat less likely than Blosum62.
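PSSM
The information content at this position is low (0.08), and threonine is among the most favoured substitutions: its log-odds score (1) equals the highest at this position, and its weighted observed percentage (10%) is second only to alanine (11%).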
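Position 82 (M) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | 1 | -1 | 0 | 1 | -1 | 0 | 0 | 1 | 0 | -1 | -1 | 0 | -1 | -2 | 1 | 1 | 1 | 0 | -1 | -1 |
Weighted observed percentage | 11 | 3 | 3 | 9 | 1 | 4 | 6 | 10 | 2 | 3 | 5 | 4 | 2 | 2 | 7 | 10 | 10 | 1 | 2 | 5 |
Information content: 0.08; relative weight of gapless real matches to pseudocounts: inf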
MSA
12 out of 29 sequences have methionine at this position. This indicates a poorly conserved position, in line with the low information content (0.08) assigned to it in the PSSM. Threonine does not occur at this position in any of the other 17 sequences; the substitutions found are valine, lysine, proline, alanine, glycine and glutamic acid.
SIFT
SIFT predicts the mutation M82T to be tolerated with a score of 0.60 and a confidence of 1.00.
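Position | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
82 (M) | 1.00 | 0.80 | 0.08 | 0.63 | 1.00 | 0.15 | 0.42 | 0.20 | 0.24 | 0.92 | 0.38 | 0.14 | 0.52 | 0.29 | 0.60 | 0.55 | 0.75 | 0.60 | 0.36 | 0.05 | 0.19 |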
PolyPhen2
This mutation is predicted to be benign with a score of 0.006 (sensitivity: 0.97; specificity: 0.45)
SNAP
M82T is predicted to be neutral with an RI of 3 and an expected accuracy of 62%.
Evaluation
According to the physico-chemical properties (Grantham) and the substitution matrices, this mutation is fairly unlikely and might affect protein function. The PSSM, on the other hand, suggests that it is neutral, which agrees with all three prediction methods.
This mutation is not associated with the Canavan Disease.
E235K
Location
Physico-chemical properties
Glutamic acid is acidic, while Lysine is basic. Both are polar and charged, but with opposite signs (Glu negatively, Lys positively). Both are bulky, Lysine more so. Despite this charge reversal, their Grantham score of 56 is still moderately conservative.
Substitution Matrices
Blosum62 score: 1 -> slightly more often than random
PAM1 score: 7, so it is fairly unlikely (53 is the highest score for a mutation from glutamic acid to aspartic acid)
PAM250 score: 8, meaning 8% mutation probability (10% is the highest probability, for a mutation from glutamic acid to aspartic acid).
-> All matrices consider the substitution not very likely, but possible.
PSSM
An information content of 0.41 indicates a moderately conserved position. The score for lysine (0) is well below those of the favoured residues (aspartate, glutamate and proline, each 3), so the PSSM does not favour this substitution.
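Position 235 (E) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | 0 | -1 | -2 | 3 | -3 | 1 | 3 | -1 | 2 | -3 | -3 | 0 | -4 | -5 | 3 | 1 | -1 | -6 | -4 | -3 |
Weighted observed percentage | 9 | 3 | 2 | 17 | 1 | 5 | 17 | 4 | 4 | 1 | 2 | 5 | 0 | 0 | 12 | 9 | 4 | 0 | 1 | 2 |
Information content: 0.41; relative weight of gapless real matches to pseudocounts: inf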
MSA
20 out of 29 sequences have glutamic acid conserved at this position. Substitutions include lysine, alanine, aspartate and proline. Consistent with the substitution matrices and the PSSM, a mutation to lysine is therefore not favoured, but possible.
SIFT
SIFT predicts the mutation E235K to be tolerated with a score of 0.79 and a confidence of 1.00.
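Position | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
235 (E) | 1.00 | 0.60 | 0.03 | 0.47 | 1.00 | 0.04 | 0.22 | 0.09 | 0.11 | 0.79 | 0.17 | 0.06 | 0.27 | 0.19 | 0.39 | 0.35 | 0.41 | 0.47 | 0.17 | 0.01 | 0.04 |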
PolyPhen2
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00).
SNAP
SNAP predicts this mutation to be neutral with an RI of 2 and an expected accuracy of 59%.
Evaluation
Physico-chemical properties and substitution matrices agree that this mutation is somewhat unusual, though not strongly so, and the PSSM concurs.
None of the three prediction methods expects this mutation to affect protein function.
However, this mutation is reported by the HGMD to affect protein function.
I270T
Location
Physico-chemical properties
Isoleucine is strongly nonpolar and bulky, while Threonine is slightly polar and small (volume difference of about 50 Å³). Both are uncharged, and Threonine carries an additional hydroxyl group. Together this still yields a moderately conservative Grantham score of 89.
Substitution Matrices
Blosum62 score: -1, i.e., slightly less often than random.
PAM1 score: 11, so fairly unlikely (57 being the highest probability for a mutation from isoleucine to valine).
PAM250 score: 6, meaning 6% mutation probability (15 being the highest probability for a mutation from isoleucine to leucine)
-> All matrices consider the substitution as unlikely.
PSSM
The information content for this position is 0.74, indicating that it is conserved and might be sensitive to substitutions. Additionally, the log-odds score for Threonine (-4) is among the lowest, so this substitution is very unfavourable.
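Position 270 (I) | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
Log-odds score | -3 | 2 | -4 | -5 | -3 | -3 | -4 | -5 | -2 | 5 | 2 | -2 | -1 | -3 | -2 | -4 | -4 | -4 | -1 | 4 |
Weighted observed percentage | 2 | 11 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 29 | 17 | 3 | 1 | 1 | 3 | 2 | 1 | 0 | 2 | 22 |
Information content: 0.74; relative weight of gapless real matches to pseudocounts: inf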
MSA
19 out of 28 sequences have Isoleucine conserved at this position. The substitutions involve only three amino acids, all physico-chemically similar to isoleucine: methionine, valine and leucine. This agrees with the PSSM, which favours leucine and valine substitutions. A mutation to threonine can thus be considered unlikely.
SIFT
SIFT predicts the mutation I270T to affect protein function with a score of 0.01 and a confidence of 0.94.
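Position | Confidence | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
270 (I) | 0.94 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 1.00 | 0.00 | 0.12 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.23 | 0.00 | 0.00 |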
PolyPhen2
This mutation is predicted to be probably damaging with a score of 0.997 (sensitivity: 0.27; specificity: 0.98)
SNAP
SNAP predicts I270T to be neutral with an RI of 1 and an expected accuracy of 53%.
Evaluation
The Grantham score and the substitution matrices suggest that this mutation is rather unlikely.
The PSSM and the MSA both support the view that it is deleterious.
SIFT and PolyPhen predict it to affect protein function with a very high confidence, while SNAP predicts it to be neutral with a low confidence.
This mutation is not associated with the Canavan Disease.
Discussion
<xr id="summary_table"/> summarizes the results of all methods we analyzed. Of the prediction methods, SIFT and PolyPhen gave identical results, predicting 60% of the SNP effects correctly. SNAP differed on some SNPs and also reached an accuracy of 60%. Note, however, that all SNAP predictions have low reliability, with expected accuracies mostly around 60%.
The substitution matrices, the PSSM and the MSA gave mixed results. Where the amino acid is highly conserved, they hint at the effect of a substitution; where the position is poorly conserved, it is hard to extract a reliable prediction.
The physico-chemical properties and the location in the structure often give a good impression of what happens structurally in the enzyme, yet it is hard to infer a functional effect from these structural implications. For clearly deleterious mutations the Grantham score is a good measure, but for some substitutions (e.g. R71H, K213E) we felt that it did not reflect the actual change in properties.
In general, we conclude that the effect of clearly deleterious mutations with a strong loss of enzyme function (e.g. E285A, A305E, G123E) can be predicted reliably, whereas less dramatic effects are hard to predict. Furthermore, it is not obvious which method should be used: none of the methods achieved outstanding accuracy, and some kind of consensus decision may give the best prediction.
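A consensus decision can be as simple as a majority vote over the three predictors. The sketch below transcribes the SIFT, PolyPhen2 and SNAP calls from the summary table as booleans (deleterious, damaging or non-neutral counted as "affects function") and takes the majority. This is a hypothetical illustration of the idea, not a method used in this analysis.

```python
# Binary calls transcribed from the summary table:
# True = predicted to affect function, in the order (SIFT, PolyPhen2, SNAP)
CALLS = {
    "E285A": (True, True, True),
    "A305E": (True, True, False),
    "G123E": (True, True, True),
    "R71H": (True, True, False),
    "R71K": (False, False, False),
    "K213E": (False, False, True),
    "V278M": (True, True, False),
    "M82T": (False, False, False),
    "E235K": (False, False, False),
    "I270T": (True, True, False),
}


def majority_vote(votes: tuple[bool, ...]) -> bool:
    """True if more than half of the methods predict an effect."""
    return 2 * sum(votes) > len(votes)


consensus = {mutation: majority_vote(votes) for mutation, votes in CALLS.items()}
same_as_sift = all(consensus[m] == votes[0] for m, votes in CALLS.items())
print(consensus)
print("consensus identical to SIFT:", same_as_sift)
```

Because SIFT and PolyPhen2 agree on all ten mutations in the table, this plain majority vote simply reproduces their calls and SNAP never changes the outcome; for SNAP to contribute, the votes would have to be weighted differently.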
<figtable id="summary_table">
Mutation | Location and structure | Physico-chemical properties (Grantham) | Substitution matrices | PSSM | SIFT | PolyPhen2 | SNAP | Annotated effect |
E285A | possible effect | moderately radical 107 | possible effect | no effect | deleterious effect 0.00 | damaging effect 1.00 | non-neutral mutation RI 2, 63% | Canavan Disease <= 1% activity |
A305E | structural effect | moderately radical 107 | possible effect | possible effect | deleterious effect 0.02 | damaging effect 1.00 | neutral mutation RI 3, 62% | Canavan Disease 0% activity |
G123E | no effect | nearly moderately radical 98 | deleterious effect | no effect | deleterious effect 0.02 | damaging effect 0.994 | non-neutral mutation RI 2, 63% | Canavan Disease ~ 25% activity |
R71H | deleterious effect | conservative 29 | possible effect | deleterious effect | deleterious effect 0.02 | damaging effect 0.909 | neutral mutation RI 5, 73% | Canavan Disease |
R71K | no effect | conservative 26 | no effect | deleterious effect | tolerated mutation 0.06 | benign 0.421 | neutral mutation RI 0, 51% | not disease causing; mutation analysis: 99% reduced activity |
K213E | no effect | moderately conservative 56 | little to no effect | little to no effect | tolerated mutation 0.92 | benign 0.004 | non-neutral mutation RI 2, 63% | Canavan Disease |
V278M | no effect | conservative 21 | little to no effect | possible effect | deleterious effect 0.01 | probably damaging effect 0.95 | neutral mutation RI 5, 73% | not disease related |
M82T | no effect | moderately conservative 81 | possible effect | no effect | tolerated mutation 0.60 | benign 0.006 | neutral mutation RI 3, 62% | not disease related |
E235K | no effect | moderately conservative 56 | little to no effect | possible effect | tolerated mutation 0.79 | benign 0.000 | neutral mutation RI 2, 59% | Canavan Disease |
I270T | structural effect | moderately conservative 89 | possible effect | deleterious effect | deleterious effect 0.01 | probably damaging effect 0.997 | neutral mutation RI 1, 53% | not disease related |
</figtable>
References
<references/>