Difference between revisions of "Sequence-based mutation analysis GLA"

From Bioinformatikpedia

Revision as of 03:52, 28 August 2011

by Benjamin Drexler and Fabian Grandke

Introduction

Selected Mutations

The selection process and the mutations themselves are described on this page: Selected Mutations

Mutation Analysis

Physicochemical Properties and Changes

The physicochemical properties of an amino acid influence the structure and function of the protein. Therefore, we examine each substitution and the resulting changes in these properties to evaluate whether the mutation is tolerated or not. We extracted the polarity, charge, hydrophobicity index, acidity and aromatic/aliphatic character from the Wikipedia page on amino acids. The charge is given for an environment with a pH of 7.4. The hydrophobicity index is according to Kyte and Doolittle<ref name=hydrophobicity>Kyte J, Doolittle RF (May 1982). "A simple method for displaying the hydropathic character of a protein". Journal of Molecular Biology 157 (1): 105–32. PubMed</ref>.

We applied the following coloring scheme:

  • green: there is no change of this property
  • yellow: there is a small change of this property
  • red: there is a drastic change of this property

Mutation 1

  • Position 42: Methionine -> Threonine
Property | Methionine | Threonine
Polarity | nonpolar | polar
Charge | neutral | neutral
Hydrophobicity index | 1.9 | -0.7
Acidity | - | weak acidic
Aromatic or aliphatic | - | -


Mutation 2

  • Position 65: Serine -> Threonine
Property | Serine | Threonine
Polarity | polar | polar
Charge | neutral | neutral
Hydrophobicity index | -0.8 | -0.7
Acidity | - | weak acidic
Aromatic or aliphatic | - | -


Mutation 3

  • Position 117: Isoleucine -> Serine
Property | Isoleucine | Serine
Polarity | nonpolar | polar
Charge | neutral | neutral
Hydrophobicity index | 4.5 | -0.8
Acidity | - | -
Aromatic or aliphatic | - | -


Mutation 4

  • Position 143: Alanine -> Threonine
Property | Alanine | Threonine
Polarity | nonpolar | polar
Charge | neutral | neutral
Hydrophobicity index | 1.8 | -0.7
Acidity | - | weak acidic
Aromatic or aliphatic | - | -


Mutation 5

  • Position 186: Histidine -> Arginine
Property | Histidine | Arginine
Polarity | polar | nonpolar
Charge | positive (10%), neutral (90%) | positive
Hydrophobicity index | -3.2 | -4.5
Acidity | weak basic | strongly basic
Aromatic or aliphatic | aromatic | -


Mutation 6

  • Position 205: Proline -> Threonine
Property | Proline | Threonine
Polarity | nonpolar | polar
Charge | neutral | neutral
Hydrophobicity index | -1.6 | -0.7
Acidity | - | weak acidic
Aromatic or aliphatic | - | -


Mutation 7

  • Position 244: Aspartic acid -> Histidine
Property | Aspartic acid | Histidine
Polarity | polar | polar
Charge | negative | positive (10%), neutral (90%)
Hydrophobicity index | -3.5 | -3.2
Acidity | acidic | weak basic
Aromatic or aliphatic | - | aromatic


Mutation 8

  • Position 283: Glutamine -> Proline
Property | Glutamine | Proline
Polarity | polar | nonpolar
Charge | neutral | neutral
Hydrophobicity index | -3.5 | -1.6
Acidity | - | -
Aromatic or aliphatic | - | -


Mutation 9

  • Position 321: Glutamine -> Glutamic acid
Property | Glutamine | Glutamic acid
Polarity | polar | polar
Charge | neutral | negative
Hydrophobicity index | -3.5 | -3.5
Acidity | - | acidic
Aromatic or aliphatic | - | -


Mutation 10

  • Position 363: Arginine -> Cysteine
Property | Arginine | Cysteine
Polarity | nonpolar | nonpolar
Charge | positive | neutral
Hydrophobicity index | -4.5 | 2.5
Acidity | strongly basic | acidic
Aromatic or aliphatic | - | -
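
The hydrophobicity comparison used in the property tables can be sketched in a few lines of Python. This is only an illustration, not the procedure the authors ran: the Kyte-Doolittle values are copied from the tables above, while the function names `hydropathy_change` and `sign_flip` are hypothetical helpers introduced here.

```python
# Kyte-Doolittle hydropathy values, copied from the property tables above.
KD = {
    "M": 1.9, "T": -0.7, "S": -0.8, "I": 4.5, "A": 1.8, "H": -3.2,
    "R": -4.5, "P": -1.6, "D": -3.5, "Q": -3.5, "E": -3.5, "C": 2.5,
}

# The ten selected mutations in <wildtype><position><mutant> notation.
MUTATIONS = ["M42T", "S65T", "I117S", "A143T", "H186R",
             "P205T", "D244H", "Q283P", "Q321E", "R363C"]


def hydropathy_change(mutation):
    """Return hydropathy(mutant) - hydropathy(wildtype), rounded to one decimal."""
    wildtype, mutant = mutation[0], mutation[-1]
    return round(KD[mutant] - KD[wildtype], 1)


def sign_flip(mutation):
    """True if the hydropathy index changes sign (hypothetical helper)."""
    return KD[mutation[0]] * KD[mutation[-1]] < 0


if __name__ == "__main__":
    for m in MUTATIONS:
        print(m, hydropathy_change(m), sign_flip(m))
```

A sign flip, as for M42T, corresponds to the "from a positive to a negative value" argument used later in the discussion; whether a given change counts as small or drastic remains a manual judgment.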

Substitution Matrices

In this section, we take a look at substitution matrices to evaluate whether the substitution introduced by the mutation is favorable in a biological context. For this, we use two different kinds of substitution matrices. First, the Blocks of Amino Acid Substitution Matrix (BLOSUM) is an evidence-based matrix which is calculated from alignments between proteins <ref name=blosum>en.wikipedia.org/wiki/BLOSUM</ref>. Second, Point Accepted Mutation or Percent Accepted Mutation (PAM) is a set of matrices that is derived from the amino acid substitutions between closely related proteins <ref name=pam>en.wikipedia.org/wiki/Point_accepted_mutation</ref>. In general, a higher value in a substitution matrix indicates a more likely substitution.


Number | Substitution | BLOSUM62 Mutation | BLOSUM62 Best (1) | BLOSUM62 Worst (2) | PAM1 Mutation | PAM1 Best | PAM1 Worst | PAM250 Mutation | PAM250 Best | PAM250 Worst
1 | Met -> Thr | -1 | 2 | -3 | 2 | 8 | 0 | 1 | 3 | 0
2 | Ser -> Thr | 1 | 1 | -3 | 38 | 38 | 0 | 9 | 9 | 3
3 | Ile -> Ser | -2 | 2 | -4 | 1 | 33 | 0 | 3 | 9 | 1
4 | Ala -> Thr | -1 | 1 | -3 | 32 | 35 | 0 | 11 | 12 | 2
5 | His -> Arg | 0 | 2 | -3 | 8 | 20 | 0 | 5 | 7 | 2
6 | Pro -> Thr | 1 | 1 | -4 | 4 | 13 | 0 | 5 | 7 | 1
7 | Asp -> His | -1 | 2 | -4 | 4 | 53 | 0 | 6 | 10 | 1
8 | Gln -> Pro | -1 | 2 | -3 | 6 | 27 | 0 | 4 | 7 | 1
9 | Gln -> Glu | 2 | 2 | -3 | 27 | 27 | 0 | 7 | 7 | 1
10 | Arg -> Cys | -3 | 2 | -3 | 1 | 19 | 0 | 2 | 9 | 1

(1) Best is the highest value in the corresponding column/row except for the self-substitution (e.g. Met -> Met).
(2) Worst is the lowest value in the corresponding column/row.


The following coloring scheme was applied:

  • green: the substitution value of the mutation is closer to the best value than to the worst value
  • red: the substitution value of the mutation is closer to the worst value than to the best value
  • yellow: the substitution value of the mutation has the same absolute difference to both values
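
The coloring rule above amounts to comparing two absolute differences. The following sketch shows one way to express it; the function name `classify` is an assumption made for illustration, and the example values are taken from the table above.

```python
def classify(mutation, best, worst):
    """Color a substitution score by its distance to the best and worst value.

    green  -> closer to the best value
    red    -> closer to the worst value
    yellow -> same absolute difference to both values
    """
    to_best = abs(best - mutation)
    to_worst = abs(mutation - worst)
    if to_best < to_worst:
        return "green"
    if to_best > to_worst:
        return "red"
    return "yellow"


if __name__ == "__main__":
    # (mutation, best, worst) triples from the BLOSUM62 and PAM1 columns.
    print("Met -> Thr, BLOSUM62:", classify(-1, 2, -3))
    print("Ser -> Thr, BLOSUM62:", classify(1, 1, -3))
    print("Ala -> Thr, BLOSUM62:", classify(-1, 1, -3))
    print("His -> Arg, PAM1:", classify(8, 20, 0))
```

The same rule is applied to the PSSM values in the next section, so the helper could be reused there unchanged.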


The following substitution matrices were used:

PSSM

In this section, we use a position-specific scoring matrix (PSSM) to evaluate how conserved the wildtype and mutant residues are in related proteins. For this, we used PSI-BLAST with the following command:

blastpgp -i GLA.fasta -j 5 -d /data/blast/nr/nr -e 10E-6 -Q psiblast.mat -o psiblast.out

The relevant values are listed in the following table. The full PSSM rows of the relevant positions are provided on this page.

Number | Substitution | PSSM Mutation | PSSM Wildtype | PSSM Best (1) | PSSM Worst (2)
1 | Met -> Thr | -3 | 9 | 9 | -6
2 | Ser -> Thr | 3 | 3 | 5 | -3
3 | Ile -> Ser | -3 | 4 | 5 | -6
4 | Ala -> Thr | -1 | 1 | 2 | -5
5 | His -> Arg | 1 | 0 | 2 | -5
6 | Pro -> Thr | -2 | 2 | 3 | -5
7 | Asp -> His | 1 | 4 | 4 | -5
8 | Gln -> Pro | -5 | 3 | 8 | -5
9 | Gln -> Glu | 1 | 7 | 7 | -5
10 | Arg -> Cys | -1 | 2 | 3 | -4

(1) Best is the highest value in the corresponding row.
(2) Worst is the lowest value in the corresponding row.


The following coloring scheme was applied:

  • green: the substitution value of the mutation is closer to the best value than to the worst value
  • red: the substitution value of the mutation is closer to the worst value than to the best value
  • yellow: the substitution value of the mutation has the same absolute difference to both values

Multiple Sequence Alignment

We take a look at a multiple sequence alignment (MSA) to evaluate the conservation of the residues which are affected by one of the mutations. We used BLAST to get the sequences for the MSA. A table of the sequences is provided on this page. Afterwards we created two MSAs with the locally installed T-Coffee.

  • MSA with 100 sequences: see here
  • MSA with 25 sequences: see figure 1
Figure 1: multiple sequence alignment of the best 25 sequences by T-Coffee in JalView.

We used the conservation index according to Livingstone C.D. and Barton G.J.<ref name=livingstone>Livingstone C.D. and Barton G.J. (1993), "Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation.", CABIOS Vol. 9 No. 6 (745-756), PubMed</ref> to determine whether a residue is conserved or not. The conservation index was calculated by JalView and ranges from 0 to 11.

Number | Position | Conservation (100 sequences) | Conservation (25 sequences)
1 | 42 | 10 | 11
2 | 65 | 8 | 11
3 | 117 | 9 | 9
4 | 143 | 10 | 8
5 | 186 | 4 | 2
6 | 205 | 10 | 11
7 | 244 | 10 | 11
8 | 283 | 11 | 11
9 | 321 | 11 | 11
10 | 363 | 3 | 5

The following coloring scheme was applied:

  • green: conservation is between 8 and 11
  • yellow: conservation is between 5 and 7
  • red: conservation is between 0 and 4
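
As a small illustration, the thresholds above can be applied to the conservation values of both alignments as follows. The function name `conservation_color` is hypothetical; the thresholds and the example values come from the list and table above.

```python
def conservation_color(score):
    """Map a JalView conservation index (0 to 11) to the color scheme above."""
    if not 0 <= score <= 11:
        raise ValueError(f"conservation index out of range: {score}")
    if score >= 8:
        return "green"
    if score >= 5:
        return "yellow"
    return "red"


# position -> (conservation in 100 sequences, conservation in 25 sequences),
# a subset of the table above.
CONSERVATION = {42: (10, 11), 143: (10, 8), 186: (4, 2), 363: (3, 5)}

if __name__ == "__main__":
    for position, scores in CONSERVATION.items():
        print(position, [conservation_color(s) for s in scores])
```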

Secondary Structure

We examined the potential influence of the mutation on the secondary structure of the α-galactosidase. For this, we used two programs that predict the secondary structure, i.e. PSIPRED and JPred3. Please see task 2 - sequence-based predictions for further explanations of the programs.

We performed one run with the wildtype sequence of α-galactosidase and afterwards ten runs, each with a single isolated mutation, so that the simultaneous presence of two or more mutations does not influence the prediction. Such an influence would be unlikely anyway, since the distance between each pair of mutations should be large enough. The results are listed in the table below. As can be seen, none of the mutations influences the prediction of the secondary structure.


Number | Substitution | UniProt | PSIPRED Wildtype | PSIPRED Mutation | JPred3 Wildtype | JPred3 Mutation
1 | Met -> Thr | Beta strand | Coil | Coil | Coil | Coil
2 | Ser -> Thr | Coil | Coil | Coil | Helix | Helix
3 | Ile -> Ser | Helix | Coil | Coil | Helix | Helix
4 | Ala -> Thr | Coil | Coil | Coil | Coil | Coil
5 | His -> Arg | Helix | Helix | Helix | Helix | Helix
6 | Pro -> Thr | Helix | Coil | Coil | Coil | Coil
7 | Asp -> His | Helix | Helix | Helix | Helix | Helix
8 | Gln -> Pro | Helix | Helix | Helix | Helix | Helix
9 | Gln -> Glu | Coil | Coil | Coil | Coil | Coil
10 | Arg -> Cys | Beta strand | Beta strand | Beta strand | Beta strand | Beta strand


The following coloring scheme was applied:

  • green: The prediction of the wildtype is correct and the mutation does not change the prediction
  • yellow: The prediction of the wildtype was wrong, but the mutation does not change the prediction
  • red: The mutation changes the prediction
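
The single-mutant sequences needed for the ten separate prediction runs could be generated with a helper like the one below. This is a sketch under assumptions: the function name `apply_mutation` and the toy sequence are made up for illustration, and the real input would be the α-galactosidase sequence from GLA.fasta.

```python
import re


def apply_mutation(sequence, mutation):
    """Return a copy of `sequence` carrying one point mutation such as "M42T".

    Positions are 1-based. A ValueError is raised if the notation is invalid,
    the position is outside the sequence, or the wildtype residue does not match.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation: {mutation!r}")
    wildtype, mutant = match.group(1), match.group(3)
    index = int(match.group(2)) - 1
    if index < 0 or index >= len(sequence):
        raise ValueError(f"position outside of sequence: {mutation}")
    if sequence[index] != wildtype:
        raise ValueError(
            f"expected {wildtype} at position {index + 1}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


if __name__ == "__main__":
    toy = "MKTAYIAK"  # toy sequence for illustration only, not GLA
    print(apply_mutation(toy, "K2R"))
```

Checking the wildtype residue before substituting guards against off-by-one errors in the position numbering, which would otherwise silently produce the wrong mutant.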

Programs

SNAP

SNAP (screening for non-acceptable polymorphisms) is a program that tries to predict whether a single point mutation is non-neutral or neutral and was established by Bromberg and Rost in 2007 <ref name=snap>Yana Bromberg and Burkhard Rost, "SNAP: predict effect of non-synonymous polymorphisms on function", Nucleic Acids Research, 2007, Vol. 35, No. 11 3823-3835, PubMed</ref>. It is based on feed-forward neural networks and was trained on data from the Protein Mutant Database and a specific subset of SWISS-PROT. SNAP takes the sequence and the proposed mutations as input and assigns the prediction "neutral" or "non-neutral" to each mutation. It also outputs a reliability index, which is a normalization of the difference between the two predictions and ranges from 0 to 9.

We used the locally installed version with the following command:
snapfun -i gla.fasta -m mutations.txt -o gla_snap.out


Number | Substitution | Prediction | Reliability index | Expected accuracy
1 | Met -> Thr | Non-neutral | 6 | 93%
2 | Ser -> Thr | Non-neutral | 3 | 78%
3 | Ile -> Ser | Non-neutral | 4 | 82%
4 | Ala -> Thr | Non-neutral | 3 | 78%
5 | His -> Arg | Neutral | 4 | 85%
6 | Pro -> Thr | Non-neutral | 6 | 93%
7 | Asp -> His | Non-neutral | 6 | 93%
8 | Gln -> Pro | Non-neutral | 5 | 87%
9 | Gln -> Glu | Non-neutral | 6 | 93%
10 | Arg -> Cys | Non-neutral | 1 | 63%

SIFT

SIFT tries to sort intolerant from tolerant amino acid substitutions and was developed by Ng and Henikoff in 2001<ref name=sift>Ng PC, Henikoff S., "Predicting deleterious amino acid substitutions.", Genome Res. 2001 May, PubMed</ref>. It is based on the assumption that protein function is correlated with protein evolution. Hence it builds a multiple sequence alignment of closely related proteins and tries to identify functionally important residues.

We used the webserver of SIFT for our examinations with the default settings and the results are listed in the table below.

Number | Substitution | Prediction | Score | Median sequence conservation | # sequences (1)
1 | Met -> Thr | Affect protein function | 0.00 | 2.99 | 46
2 | Ser -> Thr | Affect protein function | 0.01 | 3.00 | 48
3 | Ile -> Ser | Affect protein function | 0.03 | 3.00 | 52
4 | Ala -> Thr | Affect protein function | 0.01 | 3.00 | 52
5 | His -> Arg | Tolerated | 0.25 | 3.00 | 52
6 | Pro -> Thr | Affect protein function | 0.00 | 3.00 | 52
7 | Asp -> His | Affect protein function | 0.01 | 3.00 | 52
8 | Gln -> Pro | Affect protein function | 0.00 | 3.00 | 52
9 | Gln -> Glu | Affect protein function | 0.00 | 3.00 | 52
10 | Arg -> Cys | Tolerated | 0.18 | 2.99 | 51

(1) This column gives the number of sequences that are present at this position in the multiple sequence alignment. The overall number of sequences was 55.
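
The predictions in the table are consistent with SIFT's usual cutoff, where scores below 0.05 are predicted to affect protein function. The sketch below merely re-derives the labels from the scores listed above; the function name `sift_prediction` is hypothetical and this is not part of the SIFT software.

```python
SIFT_CUTOFF = 0.05  # scores below this are predicted to affect protein function


def sift_prediction(score, cutoff=SIFT_CUTOFF):
    """Translate a SIFT score into the label used in the table above."""
    return "Affect protein function" if score < cutoff else "Tolerated"


# SIFT scores of the ten mutations, copied from the table above.
SCORES = {
    "M42T": 0.00, "S65T": 0.01, "I117S": 0.03, "A143T": 0.01, "H186R": 0.25,
    "P205T": 0.00, "D244H": 0.01, "Q283P": 0.00, "Q321E": 0.00, "R363C": 0.18,
}

if __name__ == "__main__":
    for mutation, score in SCORES.items():
        print(mutation, score, sift_prediction(score))
```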

PolyPhen

PolyPhen tries to predict the influence of an amino acid substitution on the structure and function of the protein and was developed by Adzhubei et al. in 2010<ref name=polyphen>Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR, "A method and server for predicting damaging missense mutations.", Nat Methods 2010 Apr, PubMed</ref>. It creates a multiple alignment of homologous sequences to estimate the functional importance of a residue. It uses a naive Bayes classifier and was trained on two datasets, i.e. HumDiv and HumVar.

We used the webserver of PolyPhen for our examinations with the default settings and the results are listed in the table below.


Number | Substitution | HumDiv Prediction | HumDiv Score | HumDiv Sensitivity | HumDiv Specificity | HumVar Prediction | HumVar Score | HumVar Sensitivity | HumVar Specificity
1 | Met -> Thr | Probably damaging | 0.984 | 0.73 | 0.96 | Probably damaging | 0.983 | 0.52 | 0.95
2 | Ser -> Thr | Probably damaging | 0.984 | 0.73 | 0.96 | Possibly damaging | 0.874 | 0.70 | 0.89
3 | Ile -> Ser | Probably damaging | 0.979 | 0.74 | 0.96 | Possibly damaging | 0.870 | 0.70 | 0.89
4 | Ala -> Thr | Possibly damaging | 0.950 | 0.79 | 0.95 | Possibly damaging | 0.620 | 0.80 | 0.83
5 | His -> Arg | Benign | 0.000 | 1.00 | 0.00 | Benign | 0.000 | 1.00 | 0.00
6 | Pro -> Thr | Probably damaging | 1.000 | 0.00 | 1.00 | Probably damaging | 0.977 | 0.55 | 0.95
7 | Asp -> His | Possibly damaging | 0.735 | 0.86 | 0.92 | Benign | 0.177 | 0.89 | 0.70
8 | Gln -> Pro | Probably damaging | 1.000 | 0.00 | 1.00 | Probably damaging | 0.993 | 0.42 | 0.97
9 | Gln -> Glu | Probably damaging | 0.998 | 0.27 | 0.99 | Probably damaging | 0.908 | 0.67 | 0.90
10 | Arg -> Cys | Possibly damaging | 0.496 | 0.89 | 0.91 | Benign | 0.046 | 0.94 | 0.59

Discussion

Discussion of the Methods in General

We applied several methods to evaluate whether a mutation is neutral or non-neutral. It is necessary to discuss how we weigh these different methods, since not every method will yield the same result.

We can distinguish between three groups of methods. The first group contains methods (comparison of physicochemical properties, PAM and BLOSUM substitution matrices) that only include information about the amino acids themselves; the position of the mutation in the sequence is irrelevant. The methods of the second group can be described as conservation-based (PSSM, MSA). These methods consider the conservation of the position and evaluate how likely the particular substitution is. The last group consists of the applied programs (SNAP, PolyPhen and SIFT), which also rely on information about the conservation and use machine learning approaches.

The methods of the first group are a very general approach. The evaluation is based only on the substitution itself, whereas the methods of the second group put this information into the context of the sequence. They use not only information about the substitution, but also aspects such as conservation and influence on the structure. Since the methods of the second group consider more information, the evaluation becomes more specific. The group of the programs goes even further. These programs use information about the conservation and perform an evaluation based on machine learning methods. These programs have the most information available and hence are likely to be the most specific.

To summarize, the amount of information used for the evaluation is highest in group three, followed by group two, and lowest in group one. Hence, we rely more on the more specific evaluations of groups two and three when the results do not suggest a clear prediction.

An exception to this is the analysis of the secondary structure. Since none of the amino acid substitutions changed the prediction, we do not gain any information from this analysis. Therefore, we discuss its results only if there is something noteworthy, and it has little influence on our evaluation.

Summary of the Mutation Analysis

A summary of the results of the mutation analysis is shown in the table below. The status of a mutation can be "neutral", "non-neutral" or "-" when the result is ambiguous.

Method | M42T | S65T | I117S | A143T | H186R | P205T | D244H | Q283P | Q321E | R363C
Physicochemical properties | non-neutral | neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral
PAM1 | non-neutral | neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral
PAM250 | non-neutral | neutral | non-neutral | neutral | neutral | neutral | neutral | - | neutral | non-neutral
BLOSUM62 | non-neutral | neutral | non-neutral | - | neutral | neutral | - | - | neutral | non-neutral
PSSM | non-neutral | neutral | non-neutral | neutral | neutral | non-neutral | neutral | non-neutral | - | non-neutral
MSA | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral
Secondary structure | neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral
SNAP | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
SIFT | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral
PolyPhen (HumDiv) | non-neutral | non-neutral | non-neutral | - | neutral | non-neutral | - | non-neutral | non-neutral | -
PolyPhen (HumVar) | non-neutral | - | - | - | neutral | non-neutral | neutral | non-neutral | non-neutral | neutral
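
The idea of trusting the three method groups to different degrees can be illustrated with a simple weighted vote. This is not the authors' procedure, which is a manual judgment that also considers the confidence of each prediction. The numeric weights 1, 2 and 3 and the names `GROUP_WEIGHTS`, `METHOD_GROUP` and `weighted_vote` are assumptions chosen purely for illustration; the secondary structure analysis is left out because it carries little weight in the evaluation.

```python
# Hypothetical weights: general methods < conservation-based methods < programs.
GROUP_WEIGHTS = {"general": 1, "conservation": 2, "program": 3}

METHOD_GROUP = {
    "Physicochemical properties": "general",
    "PAM1": "general",
    "PAM250": "general",
    "BLOSUM62": "general",
    "PSSM": "conservation",
    "MSA": "conservation",
    "SNAP": "program",
    "SIFT": "program",
    "PolyPhen (HumDiv)": "program",
    "PolyPhen (HumVar)": "program",
}


def weighted_vote(calls):
    """Sum the group weights per status; ambiguous calls ("-") are ignored."""
    totals = {"neutral": 0, "non-neutral": 0}
    for method, call in calls.items():
        if call in totals:
            totals[call] += GROUP_WEIGHTS[METHOD_GROUP[method]]
    if totals["neutral"] == totals["non-neutral"]:
        return "-"
    return max(totals, key=totals.get)


# Calls for H186R, copied from the summary table above.
H186R = {
    "Physicochemical properties": "non-neutral",
    "PAM1": "non-neutral",
    "PAM250": "neutral",
    "BLOSUM62": "neutral",
    "PSSM": "neutral",
    "MSA": "neutral",
    "SNAP": "neutral",
    "SIFT": "neutral",
    "PolyPhen (HumDiv)": "neutral",
    "PolyPhen (HumVar)": "neutral",
}

if __name__ == "__main__":
    print("H186R:", weighted_vote(H186R))
```

For clear cases such a vote reproduces the manual conclusion, but for close decisions it need not agree with it, because the authors additionally take the reliability of the individual predictions into account.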

Individual Analysis of a Mutation

M42T (Mutation 1)

We assigned the status non-neutral to the mutation with respect to the change of physicochemical properties, because there is a drastic change of the hydrophobicity index from a positive to a negative value and the residue becomes polar. The analysis of the substitution matrices shows that the mutation is unlikely to occur, and the low value in the PSSM indicates that a threonine appears very rarely in related proteins at this position. The latter observation is confirmed by the examination of the conservation in the two MSAs, since the conservation of this position is very high (10 and 11). All the programs (SNAP, SIFT and PolyPhen) also predict that this mutation is non-neutral with very clear scores/indices. To recap, every method besides the secondary structure examination indicates that this mutation is a non-neutral one and therefore we conclude the same.

Prediction: a non-neutral mutation.

S65T (Mutation 2)

Since the residue only gains a weakly acidic character and all the other properties stay the same, there is no relevant change of physicochemical properties and we assigned the status neutral in this category. The examination of the substitution matrices reveals that this change of amino acids has the highest possible value in all three matrices, which is clearly a strong sign of a substitution that occurs with a high frequency. Even though the conservation of this position is high in the two MSAs, the mutation achieves the same value as the wildtype in the PSSM. This could be due to the fact that PSI-BLAST is able to include distantly related proteins.

In contrast to these observations which suggest a neutral mutation, all three programs predict a non-neutral mutation.

...

I117S (Mutation 3)

Isoleucine is a nonpolar amino acid with a strongly positive hydrophobicity index, whereas serine is polar and has a slightly negative hydrophobicity index. Therefore we assume a change in the physicochemical properties by this substitution and assign the status non-neutral. The values in the substitution matrices are not the lowest possible, but are still very low in all three. The same applies to the value in the PSSM and is supported by a high conservation in the MSA. All three programs predict a non-neutral mutation with sufficiently high scores. Since there is only one category that suggests a neutral mutation, we conclude that this mutation is a non-neutral one.

Prediction: a non-neutral mutation.

A143T (Mutation 4)

This substitution introduces a polar, hydrophilic and weakly acidic character at this position. Because of the number of changes, we assign a non-neutral status with respect to the physicochemical properties. Even though there are some drastic changes, this substitution has high values in the substitution matrices. In PAM1 and PAM250, these values are almost the best.

The conservation of this position is high, but in comparison to most of the other mutations it is slightly lower. The substitution seems to occur frequently in distantly related proteins according to the values of the PSSM. Two out of four programs predict a non-neutral mutation. SNAP does this with a mediocre confidence. The two predictions of PolyPhen indicate a possible non-neutral mutation.

To recap, the substitution matrices and the PSSM suggest a neutral mutation. The MSAs reveal that the conservation of this position could be higher. The prediction of the programs leans towards a non-neutral mutation. Because the confidence of the programs is not very high, we give more weight to the observations of the other methods and assume that this mutation is a neutral one.

Prediction: a neutral mutation.

H186R (Mutation 5)

Besides the hydrophobicity index, there is a drastic or slight change of every physicochemical property. Therefore, we consider this substitution as non-neutral with respect to the physicochemical properties. Two out of three substitution matrices indicate a neutral mutation. Even though the value is fairly high in PAM1, we assigned the status non-neutral. It could be the case that the evaluation method (i.e. the assignment of status and color) was not appropriate or sophisticated enough.

The conservation of this residue in closely related proteins is the lowest among the ten mutations, and it seems that the substitution of histidine to arginine occurs with a decent frequency, because the value in the PSSM is very close to the best possible value.

All three programs predict a neutral mutation with very high confidence. To recap, there are only two methods that suggest a non-neutral mutation. Since we cannot rule out that the non-neutral status assigned according to the substitution matrix PAM1 is wrong, and since the changes of the physicochemical properties were very small for the most part, we also conclude a neutral mutation.

Prediction: a neutral mutation.

P205T (Mutation 6)

This substitution introduces only a polar character; the other properties stay largely the same. Interestingly, the proline is part of an α-helix according to the UniProt entry, whereas proline is considered a helix breaker in the literature<ref name=proline_helix>Gunasekaran et al., "Stereochemical punctuation marks in protein structures: glycine and proline containing helix stop signals.". J Mol Biol. 1998 Feb 6. PubMed</ref>. This could explain why the prediction of the secondary structure for the wildtype is wrong.

Just as for the H186R substitution, PAM250 and BLOSUM62 suggest a neutral mutation. The value for PAM1 lies somewhat in the twilight zone, but once again it was classified as non-neutral according to our scheme, which may not be appropriate. The conservation of this residue is very close to the highest possible value in related proteins (MSAs), and a substitution to threonine in distantly related proteins is very unlikely (PSSM). This mutation is one of the four that are predicted to be non-neutral by all three programs.

So, the MSAs, the PSSM and the prediction programs point towards a non-neutral mutation, whereas the substitution matrices and the physicochemical properties indicate the opposite. The latter group evaluates the substitution in general, i.e. based only on the two amino acids. In contrast, the first group is able to take the specific functional importance of the position into account, i.e. the conservation in related proteins. Because of this and the fact that the programs give very clear scores, we weigh the results of the first group higher and assume that this mutation is non-neutral.

Prediction: a non-neutral mutation.
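
The weighing of position-specific against general evidence for P205T can be sketched as a small weighted vote. This is only an illustration: the function name <code>combine_evidence</code> and the weights (2 for position-specific methods, 1 for general ones) are assumptions and not part of our evaluation scheme. The per-method calls follow the description above, with the physicochemical change (only a gain of polarity) counted as neutral.

```python
# Hypothetical weights: methods that look at this specific position count double.
WEIGHTS = {
    "physicochemical": 1,        # general: depends only on the two amino acids
    "substitution_matrices": 1,  # general: depends only on the two amino acids
    "msa_conservation": 2,       # position-specific
    "pssm": 2,                   # position-specific
    "programs": 2,               # SNAP, SIFT and PolyPhen combined
}


def combine_evidence(calls, weights=WEIGHTS):
    """Weighted vote over per-method calls.

    calls maps a method name to 'neutral', 'non-neutral' or 'ambiguous'.
    Ambiguous calls contribute nothing; a tie is reported as 'ambiguous'.
    """
    score = 0
    for method, call in calls.items():
        if call == "non-neutral":
            score += weights[method]
        elif call == "neutral":
            score -= weights[method]
    if score > 0:
        return "non-neutral"
    if score < 0:
        return "neutral"
    return "ambiguous"


# Calls for P205T as described in the text above.
p205t = {
    "physicochemical": "neutral",
    "substitution_matrices": "neutral",
    "msa_conservation": "non-neutral",
    "pssm": "non-neutral",
    "programs": "non-neutral",
}

print(combine_evidence(p205t))
```

A fixed weighting like this is not meant to reproduce every decision on this page, since we also took the confidence of the individual programs into account.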

D244H (Mutation 7)

Due to the substitution of aspartic acid by histidine, the charge and the acidity change and the position becomes aromatic. Because of the number of changes, we assign the status non-neutral with respect to the physicochemical properties.

The examination of the substitution matrices reveals a very interesting result. Even though PAM1 and PAM250 are based on the same assumptions, they give contrary results. The evaluation becomes more complicated, since the value in BLOSUM62 is also ambiguous. This contradiction also shows up in the examination of the conservation, which is very close to the maximum in the MSAs, whereas the substitution to histidine is not that unlikely according to the value of the PSSM.

The programs SNAP and SIFT predict a non-neutral mutation with very high confidence, whereas PolyPhen predicts a neutral or ambiguous mutation.

To sum up, we can argue for both possibilities for this substitution, and it is a very close decision. We declare this mutation non-neutral, because the methods that suggest a neutral mutation, PSSM and PAM250, may apply more to distantly related proteins. Additionally, the programs show a slight tendency towards a non-neutral mutation.

Prediction: a non-neutral mutation.

Q283P (Mutation 8)

Even though there is no drastic change besides the polarity, we assigned the status non-neutral with respect to the change of physicochemical properties, because the loss of polarity might result in the loss of hydrogen bonds. According to the values of the substitution matrices, this substitution seems to be in the twilight zone with a tendency towards being non-neutral. Interestingly, even though proline is considered to be an α-helix breaker, the mutation does not change the prediction of the secondary structure.

The conservation of this position has the maximum value in the MSAs, and proline has the lowest value of all amino acids in the PSSM at this position. All programs predict a non-neutral mutation with high confidence. Since there is no analysis that strongly indicates a neutral mutation, we assume that this mutation is a non-neutral one.

Prediction: a non-neutral mutation.

Q321E (Mutation 9)

Glutamine can be considered the amide of glutamic acid. Hence this substitution introduces an acidic character and the charge changes from neutral to negative. Overall these changes might be sufficient to have an influence, and hence we assign the status non-neutral with respect to the physicochemical properties. Since these two amino acids are quite similar, the substitution obtains high values in the substitution matrices. The position of the substitution shows the highest possible conservation in the MSAs, but the examination of the PSSM reveals that this substitution occurs with an average frequency.
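
For orientation, the BLOSUM62 scores of the six substitutions discussed from H186R to R363C can be looked up with a few lines of Python. This is a minimal sketch: the scores are taken from the standard BLOSUM62 matrix, whereas the names <code>BLOSUM62_SUBSET</code> and <code>blosum62_score</code> are hypothetical, and neither the PAM1 and PAM250 values nor the thresholds of our status assignment are reproduced here.

```python
# BLOSUM62 scores for the residue pairs of H186R, P205T, D244H, Q283P,
# Q321E and R363C. The matrix is symmetric, so each pair is stored as an
# unordered set of the two one-letter codes.
BLOSUM62_SUBSET = {
    frozenset("HR"): 0,
    frozenset("PT"): -1,
    frozenset("DH"): -1,
    frozenset("QP"): -1,
    frozenset("QE"): 2,
    frozenset("RC"): -3,
}


def blosum62_score(mutation):
    """Return the BLOSUM62 score for a mutation written like 'Q321E'."""
    wildtype, mutant = mutation[0], mutation[-1]
    return BLOSUM62_SUBSET[frozenset((wildtype, mutant))]


for mutation in ["H186R", "P205T", "D244H", "Q283P", "Q321E", "R363C"]:
    print(mutation, blosum62_score(mutation))
```

Among these six substitutions, Q321E has the highest and R363C the lowest BLOSUM62 score.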

All programs predict a non-neutral mutation. The prediction of SNAP achieves the highest confidence among the ten chosen mutations. Since two of the previous methods (physicochemical properties and MSAs) also indicate a non-neutral mutation, and since we weigh the results of the programs a little bit higher, we assume that this mutation is a non-neutral one.

Prediction: a non-neutral mutation.

R363C (Mutation 10)

The substitution of arginine by cysteine changes the hydrophobicity of this position from a very negative value to a positive one. In fact, arginine is the amino acid with the lowest hydrophobicity value. In addition, the position becomes acidic instead of strongly basic. Because of these major changes, we assign the status non-neutral with respect to the physicochemical properties. These observations are confirmed by the substitution matrices, in which the substitution is very unlikely due to the low values.
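
The change in hydrophobicity can be illustrated with a short Python sketch. It assumes the Kyte-Doolittle hydropathy scale, which may differ from the index used in our table of physicochemical properties; the names <code>KYTE_DOOLITTLE</code> and <code>hydropathy_change</code> are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(mutation):
    """Return (wildtype value, mutant value) for a mutation like 'R363C'."""
    wildtype, mutant = mutation[0], mutation[-1]
    return KYTE_DOOLITTLE[wildtype], KYTE_DOOLITTLE[mutant]


before, after = hydropathy_change("R363C")
print("R363C:", before, "->", after)

# Residue with the lowest hydropathy value on this scale.
print("lowest:", min(KYTE_DOOLITTLE, key=KYTE_DOOLITTLE.get))
```

On this scale, arginine has the lowest value of all twenty amino acids and cysteine has a positive one, in line with the description above.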

The conservation of this position is very low according to the MSAs; remarkably, it is one of only two positions that show almost no conservation there. With respect to the value of the PSSM, the assignment is non-neutral, but it was very close to being ambiguous.

Two out of four predictions of the programs are neutral, one is ambiguous and one is non-neutral. The latter is the prediction of SNAP, and it has a reliability index of only 1, which is very low. Therefore, we combine the predictions of the programs to neutral.

To summarize, the first group of predictions clearly indicates a non-neutral mutation, whereas the second and third groups lean towards a neutral mutation. As discussed in a previous section, we weigh the predictions of the latter two higher.

Prediction: a neutral mutation.

References

<references />