Sequence-based mutation analysis GLA

From Bioinformatikpedia
Revision as of 01:29, 26 August 2011 by Drexler

by Benjamin Drexler and Fabian Grandke

Introduction

Selected Mutations

We randomly selected ten annotated point mutations of the human gene GLA from a pool consisting of two subsets. The first subset contains mutations that are listed in HGMD; these were already gathered in task 4 (Mapping SNPs). The second subset contains mutations that are listed in dbSNP but not in HGMD, which was the case for only three mutations.

Mutations at amino acid positions 1 to 31 were excluded from the selection, because these residues form the signal peptide (see UniProt entry) and are not present in the reference structure (PDB ID 1R47).

Number AA-Position Codon change Amino acid change Visualization
1 42 ATG-ACG Met -> Thr
Close-up of the residue number 42 of GLA.
2 65 AGT-ACG Ser -> Thr
Close-up of the residue number 65 of GLA.
3 117 ATT-AGT Ile -> Ser
Close-up of the residue number 117 of GLA.
4 143 cGCA-ACA Ala -> Thr
Close-up of the residue number 143 of GLA.
5 186 CAC-CGC His -> Arg
Close-up of the residue number 186 of GLA.
6 205 gCCT-ACT Pro -> Thr
Close-up of the residue number 205 of GLA.
7 244 gGAC-CAC Asp -> His
Close-up of the residue number 244 of GLA.
8 283 CAG-CCG Gln -> Pro
Close-up of the residue number 283 of GLA.
9 321 tCAG-TAG Gln -> Glu
Close-up of the residue number 321 of GLA.
10 363 TATa-TAA Arg -> Cys
Close-up of the residue number 363 of GLA.

The visualization was done with PyMOL, and the mutagenesis of each residue was performed according to this tutorial. The wildtype residue is colored green and the mutated residue is colored red.

Mutation Analysis

Physicochemical Properties and Changes

The physicochemical properties of an amino acid influence the structure and function of the protein. We therefore examined how each substitution changes these properties in order to evaluate whether the mutation is tolerated. We took the polarity, charge, hydrophobicity index, acidity and aromatic/aliphatic character from the Wikipedia page on amino acids. The charge is given for an environment with a pH of 7.4. The hydrophobicity index follows Kyte and Doolittle<ref name=hydrophobicity>Kyte J, Doolittle RF (May 1982). "A simple method for displaying the hydropathic character of a protein". Journal of Molecular Biology 157 (1): 105–32. PubMed</ref>.

We applied the following coloring scheme:

  • green: there is no change of this property
  • yellow: there is a small change of this property
  • red: there is a drastic change of this property
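The change of the hydrophobicity index can also be computed directly. The following is a minimal sketch, assuming the published Kyte and Doolittle values; the names KYTE_DOOLITTLE, MUTATIONS and hydropathy_change are chosen for illustration only. It does not assign colors, because no numeric thresholds for a "small" or a "drastic" change are defined on this page.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982), keyed by three-letter code.
KYTE_DOOLITTLE = {
    "Ala": 1.8, "Arg": -4.5, "Asn": -3.5, "Asp": -3.5, "Cys": 2.5,
    "Gln": -3.5, "Glu": -3.5, "Gly": -0.4, "His": -3.2, "Ile": 4.5,
    "Leu": 3.8, "Lys": -3.9, "Met": 1.9, "Phe": 2.8, "Pro": -1.6,
    "Ser": -0.8, "Thr": -0.7, "Trp": -0.9, "Tyr": -1.3, "Val": 4.2,
}

# The ten substitutions analysed on this page: (position, wildtype, mutant).
MUTATIONS = [
    (42, "Met", "Thr"), (65, "Ser", "Thr"), (117, "Ile", "Ser"),
    (143, "Ala", "Thr"), (186, "His", "Arg"), (205, "Pro", "Thr"),
    (244, "Asp", "His"), (283, "Gln", "Pro"), (321, "Gln", "Glu"),
    (363, "Arg", "Cys"),
]


def hydropathy_change(wildtype, mutant):
    """Return the mutant index minus the wildtype index, rounded to one decimal."""
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wildtype], 1)


for position, wildtype, mutant in MUTATIONS:
    delta = hydropathy_change(wildtype, mutant)
    print(f"{wildtype}{position}{mutant}: {delta:+.1f}")
```

A negative value means that the residue becomes more hydrophilic, a positive value that it becomes more hydrophobic.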

Mutation 1

  • Position 42: Methionine -> Threonine
Property Methionine Threonine
Polarity nonpolar polar
Charge neutral neutral
Hydrophobicity index 1.9 -0.7
Acidity - weak acidic
Aromatic or aliphatic - -


Mutation 2

  • Position 65: Serine -> Threonine
Property Serine Threonine
Polarity polar polar
Charge neutral neutral
Hydrophobicity index -0.8 -0.7
Acidity - weak acidic
Aromatic or aliphatic - -


Mutation 3

  • Position 117: Isoleucine -> Serine
Property Isoleucine Serine
Polarity nonpolar polar
Charge neutral neutral
Hydrophobicity index 4.5 -0.8
Acidity - -
Aromatic or aliphatic - -


Mutation 4

  • Position 143: Alanine -> Threonine
Property Alanine Threonine
Polarity nonpolar polar
Charge neutral neutral
Hydrophobicity index 1.8 -0.7
Acidity - weak acidic
Aromatic or aliphatic - -


Mutation 5

  • Position 186: Histidine -> Arginine
Property Histidine Arginine
Polarity polar nonpolar
Charge positive (10%), neutral (90%) positive
Hydrophobicity index -3.2 -4.5
Acidity weak basic strongly basic
Aromatic or aliphatic aromatic -


Mutation 6

  • Position 205: Proline -> Threonine
Property Proline Threonine
Polarity nonpolar polar
Charge neutral neutral
Hydrophobicity index -1.6 -0.7
Acidity - weak acidic
Aromatic or aliphatic - -


Mutation 7

  • Position 244: Aspartic acid -> Histidine
Property Aspartic acid Histidine
Polarity polar polar
Charge negative positive (10%), neutral (90%)
Hydrophobicity index -3.5 -3.2
Acidity acidic weak basic
Aromatic or aliphatic - aromatic


Mutation 8

  • Position 283: Glutamine -> Proline
Property Glutamine Proline
Polarity polar nonpolar
Charge neutral neutral
Hydrophobicity index -3.5 -1.6
Acidity - -
Aromatic or aliphatic - -


Mutation 9

  • Position 321: Glutamine -> Glutamic acid
Property Glutamine Glutamic acid
Polarity polar polar
Charge neutral negative
Hydrophobicity index -3.5 -3.5
Acidity - acidic
Aromatic or aliphatic - -


Mutation 10

  • Position 363: Arginine -> Cysteine
Property Arginine Cysteine
Polarity nonpolar nonpolar
Charge positive neutral
Hydrophobicity index -4.5 2.5
Acidity strongly basic acidic
Aromatic or aliphatic - -

Substitution Matrices

In this section, we use substitution matrices to evaluate whether the substitution introduced by each mutation is favorable in a biological context. We use two kinds of substitution matrices. First, the Blocks of Amino Acid Substitution Matrix (BLOSUM) is an evidence-based matrix that is calculated from alignments between proteins <ref name=blosum>en.wikipedia.org/wiki/BLOSUM</ref>. Second, Point Accepted Mutation or Percent Accepted Mutation (PAM) is a set of matrices derived from the amino acid substitutions between closely related proteins <ref name=pam>en.wikipedia.org/wiki/Point_accepted_mutation</ref>. In general, a higher value in a substitution matrix indicates a more likely substitution.


Number Substitution BLOSUM62 (Mutation Best1 Worst2) PAM1 (Mutation Best Worst) PAM250 (Mutation Best Worst)
1 Met -> Thr -1 2 -3 2 8 0 1 3 0
2 Ser -> Thr 1 1 -3 38 38 0 9 9 3
3 Ile -> Ser -2 2 -4 1 33 0 3 9 1
4 Ala -> Thr -1 1 -3 32 35 0 11 12 2
5 His -> Arg 0 2 -3 8 20 0 5 7 2
6 Pro -> Thr 1 1 -4 4 13 0 5 7 1
7 Asp -> His -1 2 -4 4 53 0 6 10 1
8 Gln -> Pro -1 2 -3 6 27 0 4 7 1
9 Gln -> Glu 2 2 -3 27 27 0 7 7 1
10 Arg -> Cys -3 2 -3 1 19 0 2 9 1

1 Best is the highest value in the corresponding column/row, excluding the self-substitution (e.g. Met -> Met).
2 Worst is the lowest value in the corresponding column/row.


The following coloring scheme was applied:

  • green: the substitution value of the mutation is closer to the best value than to the worst value
  • red: the substitution value of the mutation is closer to the worst value than to the best value
  • yellow: the substitution value of the mutation has the same absolute difference to both values
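The coloring rule above can be written as a small function. This is only a sketch of the rule as stated in the list; the function name score_color is hypothetical, and the example values are the BLOSUM62 entries from the table above.

```python
def score_color(mutation, best, worst):
    """Color a substitution score by its distance to the best and worst values."""
    distance_to_best = abs(best - mutation)
    distance_to_worst = abs(mutation - worst)
    if distance_to_best < distance_to_worst:
        return "green"
    if distance_to_best > distance_to_worst:
        return "red"
    return "yellow"  # same absolute difference to both values


# BLOSUM62 values (mutation, best, worst) taken from the table above.
blosum62_examples = {
    "Ser -> Thr": (1, 1, -3),
    "Ala -> Thr": (-1, 1, -3),
    "Arg -> Cys": (-3, 2, -3),
}

for substitution, values in blosum62_examples.items():
    print(substitution, score_color(*values))
```

The same function applies unchanged to the PAM1 and PAM250 columns.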


The following substitution matrices were used:

PSSM

In this section, we use a position-specific scoring matrix (PSSM) to evaluate how well the wildtype and the mutant residue are conserved in related proteins. We ran PSI-BLAST with the following command:

blastpgp -i GLA.fasta -j 5 -d /data/blast/nr/nr -e 10E-6 -Q psiblast.mat -o psiblast.out

The relevant values are listed in the following table. The full PSSM rows for these positions are provided on this page.

Number Substitution PSSM (Mutation Wildtype Best1 Worst2)
1 Met -> Thr -3 9 9 -6
2 Ser -> Thr 3 3 5 -3
3 Ile -> Ser -3 4 5 -6
4 Ala -> Thr -1 1 2 -5
5 His -> Arg 1 0 2 -5
6 Pro -> Thr -2 2 3 -5
7 Asp -> His 1 4 4 -5
8 Gln -> Pro -5 3 8 -5
9 Gln -> Glu 1 7 7 -5
10 Arg -> Cys -1 2 3 -4

1 Best is the highest value in the corresponding row.
2 Worst is the lowest value in the corresponding row.


The following coloring scheme was applied:

  • green: the substitution value of the mutation is closer to the best value than to the worst value
  • red: the substitution value of the mutation is closer to the worst value than to the best value
  • yellow: the substitution value of the mutation has the same absolute difference to both values

Multiple Sequence Alignment

We take a look at a multiple sequence alignment (MSA) to evaluate the conservation of the residues which are affected by one of the mutations. We used BLAST to get the sequences for the MSA. A table of the sequences is provided on this page. Afterwards we created two MSAs with the locally installed T-Coffee.

  • MSA with 100 sequences: see here
  • MSA with 25 sequences: see figure 1
Figure 1: multiple sequence alignment of the best 25 sequences by T-Coffee in JalView.

We used the conservation index according to Livingstone C.D. and Barton G.J.<ref name=livingstone>Livingstone C.D. and Barton G.J. (1993), "Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation.", CABIOS Vol. 9 No. 6 (745-756), PubMed</ref> to determine whether a residue is conserved or not. The conservation index was calculated by JalView and ranges from 0 to 11.

Number Position Conservation (100 sequences) Conservation (25 sequences)
1 42 10 11
2 65 8 11
3 117 9 9
4 143 10 8
5 186 4 2
6 205 10 11
7 244 10 11
8 283 11 11
9 321 11 11
10 363 3 5

The following coloring scheme was applied:

  • green: conservation is between 8 and 11
  • yellow: conservation is between 5 and 7
  • red: conservation is between 0 and 4
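The binning of the conservation index can be sketched as follows. The function name conservation_color and the dictionary CONSERVATION are illustrative; the values are copied from the table above as (100 sequences, 25 sequences).

```python
def conservation_color(score):
    """Map a JalView conservation index (0 to 11) to the color scheme of this page."""
    if not 0 <= score <= 11:
        raise ValueError(f"conservation index out of range: {score}")
    if score >= 8:
        return "green"
    if score >= 5:
        return "yellow"
    return "red"


# Position -> (conservation with 100 sequences, conservation with 25 sequences).
CONSERVATION = {
    42: (10, 11), 65: (8, 11), 117: (9, 9), 143: (10, 8), 186: (4, 2),
    205: (10, 11), 244: (10, 11), 283: (11, 11), 321: (11, 11), 363: (3, 5),
}

for position, (msa_100, msa_25) in CONSERVATION.items():
    print(position, conservation_color(msa_100), conservation_color(msa_25))
```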

Secondary Structure

We examined the potential influence of the mutation on the secondary structure of the α-galactosidase. For this, we used two programs that predict the secondary structure, i.e. PSIPRED and JPred3. Please see task 2 - sequence-based predictions for further explanations of the programs.

We performed one run with the wildtype sequence of α-galactosidase and then ten runs, each with a single isolated mutation, so that the co-occurrence of two or more mutations cannot influence the prediction. Such an influence would be unlikely anyway, since the distance between each pair of mutations should be large enough. The results are listed in the table below. None of the mutations changes the predicted secondary structure.
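The single-mutant input sequences for such runs could be generated with a short script. The following is a minimal sketch and not the procedure that was actually used; the function name apply_point_mutation is hypothetical, and the toy sequence is a made-up placeholder, not the GLA sequence. Positions are assumed to be 1-based.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Return the sequence with one substitution such as "M42T" applied (1-based position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wildtype, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wildtype:
        # Guards against off-by-one errors and wrong reference sequences.
        raise ValueError(f"expected {wildtype} at position {position}, found {sequence[position - 1]}")
    return sequence[:position - 1] + mutant + sequence[position:]


# Hypothetical toy sequence and mutations, for illustration only.
toy_wildtype = "MSTAQK"
toy_mutations = ["S2T", "Q5E"]

# One mutant sequence per mutation, each carrying exactly one substitution.
single_mutants = {m: apply_point_mutation(toy_wildtype, m) for m in toy_mutations}
for name, mutant_sequence in single_mutants.items():
    print(name, mutant_sequence)
```

Checking the wildtype residue before substituting catches numbering mismatches, for example when a sequence without the signal peptide is used.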


Number Substitution UniProt PSIPRED (Wildtype Mutation) JPred3 (Wildtype Mutation)
1 Met -> Thr Beta strand Coil Coil Coil Coil
2 Ser -> Thr Coil Coil Coil Helix Helix
3 Ile -> Ser Helix Coil Coil Helix Helix
4 Ala -> Thr Coil Coil Coil Coil Coil
5 His -> Arg Helix Helix Helix Helix Helix
6 Pro -> Thr Helix Coil Coil Coil Coil
7 Asp -> His Helix Helix Helix Helix Helix
8 Gln -> Pro Helix Helix Helix Helix Helix
9 Gln -> Glu Coil Coil Coil Coil Coil
10 Arg -> Cys Beta strand Beta strand Beta strand Beta strand Beta strand


The following coloring scheme was applied:

  • green: The prediction of the wildtype is correct and the mutation does not change the prediction
  • yellow: The prediction of the wildtype was wrong, but the mutation does not change the prediction
  • red: The mutation changes the prediction
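The three cases of this scheme can be expressed as a small function. This is a sketch only; the name prediction_color is hypothetical, and the UniProt annotation is taken as the reference state.

```python
def prediction_color(reference, wildtype_prediction, mutant_prediction):
    """Color one row of the secondary structure table for a single predictor."""
    if mutant_prediction != wildtype_prediction:
        return "red"  # the mutation changes the prediction
    if wildtype_prediction == reference:
        return "green"  # correct wildtype prediction, unchanged by the mutation
    return "yellow"  # wrong wildtype prediction, unchanged by the mutation


# Rows 1 and 5 of the table above (UniProt, PSIPRED wildtype, PSIPRED mutation).
print(prediction_color("Beta strand", "Coil", "Coil"))
print(prediction_color("Helix", "Helix", "Helix"))
```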

Programs

SNAP

SNAP (screening for non-acceptable polymorphisms) is a program that predicts whether a single point mutation is neutral or non-neutral. It was established by Bromberg and Rost in 2007 <ref name=snap>Yana Bromberg and Burkhard Rost, "SNAP: predict effect of non-synonymous polymorphisms on function", Nucleic Acids Research, 2007, Vol. 35, No. 11 3823-3835, PubMed</ref>. It is based on feed-forward neural networks and was trained on data from the Protein Mutant Database and a specific subset of SWISS-PROT. SNAP takes the sequence and the proposed mutations as input and assigns the prediction "neutral" or "non-neutral" to each mutation. It also outputs a reliability index, which is a normalization of the difference between the two predictions and ranges from 0 to 9.

We used the locally installed version with the following command:
snapfun -i gla.fasta -m mutations.txt -o gla_snap.out


Number Substitution Prediction Reliability index Expected accuracy
1 Met -> Thr Non-neutral 6 93%
2 Ser -> Thr Non-neutral 3 78%
3 Ile -> Ser Non-neutral 4 82%
4 Ala -> Thr Non-neutral 3 78%
5 His -> Arg Neutral 4 85%
6 Pro -> Thr Non-neutral 6 93%
7 Asp -> His Non-neutral 6 93%
8 Gln -> Pro Non-neutral 5 87%
9 Gln -> Glu Non-neutral 6 93%
10 Arg -> Cys Non-neutral 1 63%

SIFT

SIFT sorts intolerant from tolerant amino acid substitutions and was developed by Ng and Henikoff in 2001<ref name=sift>Ng PC, Henikoff S., "Predicting deleterious amino acid substitutions.", Genome Res. 2001 May, PubMed</ref>. It is based on the assumption that protein function is correlated with protein evolution. Hence it builds a multiple sequence alignment of closely related proteins and tries to identify functionally important residues.

We used the webserver of SIFT for our examinations with the default settings and the results are listed in the table below.

Number Substitution Prediction Score Median sequence conservation # sequences1
1 Met -> Thr Affect protein function 0.00 2.99 46
2 Ser -> Thr Affect protein function 0.01 3.00 48
3 Ile -> Ser Affect protein function 0.03 3.00 52
4 Ala -> Thr Affect protein function 0.01 3.00 52
5 His -> Arg Tolerated 0.25 3.00 52
6 Pro -> Thr Affect protein function 0.00 3.00 52
7 Asp -> His Affect protein function 0.01 3.00 52
8 Gln -> Pro Affect protein function 0.00 3.00 52
9 Gln -> Glu Affect protein function 0.00 3.00 52
10 Arg -> Cys Tolerated 0.18 2.99 51

1 This column gives the number of sequences that are present at this position in the multiple sequence alignment. The overall number of sequences was 55.

PolyPhen

PolyPhen tries to predict the influence of an amino acid substitution on the structure and function of the protein and was developed by Adzhubei et al. in 2010<ref name=polyphen>Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR, "A method and server for predicting damaging missense mutations.", Nat Methods 2010 Apr, PubMed</ref>. It creates a multiple alignment of homologous sequences to estimate the functional importance of a residue. It uses a naive Bayes classifier and was trained on two datasets, i.e. HumDiv and HumVar.

We used the webserver of PolyPhen for our examinations with the default settings and the results are listed in the table below.


Number Substitution HumDiv (Prediction Score Sensitivity Specificity) HumVar (Prediction Score Sensitivity Specificity)
1 Met -> Thr Probably damaging 0.984 0.73 0.96 Probably damaging 0.983 0.52 0.95
2 Ser -> Thr Probably damaging 0.984 0.73 0.96 Possibly damaging 0.874 0.70 0.89
3 Ile -> Ser Probably damaging 0.979 0.74 0.96 Possibly damaging 0.870 0.70 0.89
4 Ala -> Thr Possibly damaging 0.950 0.79 0.95 Possibly damaging 0.620 0.80 0.83
5 His -> Arg Benign 0.000 1.00 0.00 Benign 0.000 1.00 0.00
6 Pro -> Thr Probably damaging 1.000 0.00 1.00 Probably damaging 0.977 0.55 0.95
7 Asp -> His Possibly damaging 0.735 0.86 0.92 Benign 0.177 0.89 0.70
8 Gln -> Pro Probably damaging 1.000 0.00 1.00 Probably damaging 0.993 0.42 0.97
9 Gln -> Glu Probably damaging 0.998 0.27 0.99 Probably damaging 0.908 0.67 0.90
10 Arg -> Cys Possibly damaging 0.496 0.89 0.91 Benign 0.046 0.94 0.59

Discussion

Summary of the Mutation Analysis

First of all, we summarize the results of the mutation analysis in a table. A mutation is marked "neutral", "non-neutral", or "-" when the result is ambiguous.

Method M42T S65T I117S A143T H186R P205T D244H Q283P Q321E R363C
Physicochemical properties non-neutral neutral non-neutral non-neutral non-neutral neutral non-neutral neutral non-neutral non-neutral
PAM1 non-neutral neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral neutral non-neutral
PAM250 non-neutral neutral non-neutral neutral neutral neutral neutral - neutral non-neutral
BLOSUM62 non-neutral neutral non-neutral - neutral neutral - - neutral non-neutral
PSSM non-neutral neutral non-neutral neutral neutral non-neutral neutral non-neutral - non-neutral
MSA non-neutral non-neutral non-neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral neutral
Secondary structure neutral neutral neutral neutral neutral neutral neutral neutral neutral neutral
SNAP non-neutral non-neutral non-neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral non-neutral
SIFT non-neutral non-neutral non-neutral non-neutral neutral non-neutral non-neutral non-neutral non-neutral neutral
PolyPhen (HumDiv) non-neutral non-neutral non-neutral - neutral non-neutral - non-neutral non-neutral -
PolyPhen (HumVar) non-neutral - - - neutral non-neutral neutral non-neutral non-neutral neutral
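The calls in this table can be tallied per mutation. The following sketch is only a counting aid and not the decision procedure used in the discussion below, where the methods are weighted by judgment. The names MUTATIONS, SUMMARY and tally are chosen for illustration; each string encodes one row of the table, with N for non-neutral, n for neutral and - for ambiguous.

```python
from collections import Counter

MUTATIONS = ["M42T", "S65T", "I117S", "A143T", "H186R",
             "P205T", "D244H", "Q283P", "Q321E", "R363C"]

# One string per method, one character per mutation, in the order of MUTATIONS.
SUMMARY = {
    "Physicochemical properties": "NnNNNnNnNN",
    "PAM1":                       "NnNnNNNNnN",
    "PAM250":                     "NnNnnnn-nN",
    "BLOSUM62":                   "NnN-nn--nN",
    "PSSM":                       "NnNnnNnN-N",
    "MSA":                        "NNNNnNNNNn",
    "Secondary structure":        "nnnnnnnnnn",
    "SNAP":                       "NNNNnNNNNN",
    "SIFT":                       "NNNNnNNNNn",
    "PolyPhen (HumDiv)":          "NNN-nN-NN-",
    "PolyPhen (HumVar)":          "N---nNnNNn",
}

LABELS = {"N": "non-neutral", "n": "neutral", "-": "ambiguous"}


def tally(mutation):
    """Count how many methods call the mutation non-neutral, neutral or ambiguous."""
    column = MUTATIONS.index(mutation)
    return Counter(LABELS[calls[column]] for calls in SUMMARY.values())


for mutation in MUTATIONS:
    counts = tally(mutation)
    print(mutation, counts["non-neutral"], counts["neutral"], counts["ambiguous"])
```

A plain count treats all eleven rows as equally informative, although several of them are not independent (for example the three substitution matrices, or the two PolyPhen data sets).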

Discussion of the Methods in General

Individual Analysis of a Mutation

M42T (Mutation 1)

We assigned the status non-neutral to this mutation with respect to the change of physicochemical properties, because the hydrophobicity index changes drastically from a positive to a negative value and the residue becomes polar. The analysis of the substitution matrices shows that the substitution is unlikely to occur, and the low value in the PSSM indicates that threonine appears very rarely at this position in related proteins. The latter observation is confirmed by the conservation in the two MSAs, which is very high at this position (10 and 11). All the programs (SNAP, SIFT and PolyPhen) also predict that this mutation is non-neutral, with very clear scores and indices. To recap, every method except the secondary structure examination indicates that this mutation is non-neutral, and we therefore conclude the same.

Prediction: a non-neutral mutation.

S65T (Mutation 2)

Since the only change is that the residue becomes weakly acidic while all other properties stay the same, we consider the physicochemical properties unchanged and assigned the status neutral in this category. The examination of the substitution matrices reveals that this substitution has the highest possible value in all three matrices, which is a strong sign that it occurs with a high frequency. Even though the conservation of this position is high in the two MSAs, the mutant residue achieves the same value as the wildtype in the PSSM. This could be because PSI-BLAST is able to include distantly related proteins.

In contrast to these observations that suggest a neutral mutation, all three programs predict a non-neutral mutation.

...

I117S (Mutation 3)

Isoleucine is a nonpolar amino acid with a strongly positive hydrophobicity index, whereas serine is polar and has a slightly negative hydrophobicity index. Therefore we consider the physicochemical properties changed by this substitution and assign the status non-neutral. The values in the substitution matrices are not the lowest possible, but they are still very low in all three. The same applies to the value in the PSSM, which is supported by the high conservation in the MSAs. All three programs predict a non-neutral mutation with sufficiently high scores. Since only one category (secondary structure) suggests a neutral mutation, we conclude that this mutation is non-neutral.

Prediction: a non-neutral mutation.

A143T (Mutation 4)

H186R (Mutation 5)

Apart from the hydrophobicity index, every property changes either slightly or drastically. Therefore, we consider this substitution non-neutral. Two out of three substitution matrices indicate a neutral mutation. Even though the value in PAM1 is fairly high, our scheme assigned the status non-neutral. It may be that the evaluation method (i.e. the assignment of status and color) was not appropriate or sophisticated enough.

The conservation of this residue in closely related proteins is among the lowest of the ten mutations (4 and 2), and the substitution of histidine by arginine seems to occur with a decent frequency, because the value in the PSSM is very close to the best possible value.

All three programs predict a neutral mutation with very high confidence. To recap, only two methods suggest a non-neutral mutation. Since we cannot rule out that the non-neutral status assigned according to the substitution matrix PAM1 is wrong, and since most of the changes in physicochemical properties were very small, we also conclude that this mutation is neutral.

Prediction: a neutral mutation.

P205T (Mutation 6)

This substitution only introduces a polar character; the other properties stay largely the same. Interestingly, this proline is part of an α-helix according to the UniProt entry, whereas proline is considered a helix breaker in the literature<ref name=proline_helix>Gunasekaran et al., "Stereochemical punctuation marks in protein structures: glycine and proline containing helix stop signals.". J Mol Biol. 1998 Feb 6. PubMed</ref>. This could explain why the prediction of the secondary structure for the wildtype is wrong.

As in the case of the H186R substitution, PAM250 and BLOSUM62 suggest a neutral mutation. The value for PAM1 lies somewhat in the twilight zone, but once again it was classified as non-neutral according to our scheme, which may not be appropriate. The conservation of this residue in related proteins is very close to the highest possible value (MSAs), and a substitution to threonine is very unlikely even in distantly related proteins (PSSM). This mutation is one of the four that are predicted to be non-neutral by all three programs.

So the MSAs, the PSSM and the prediction programs point towards a non-neutral mutation, whereas the substitution matrices and the physicochemical properties indicate the opposite. The latter group evaluates the substitution in general, i.e. based only on the two amino acids. In contrast, the first group is able to take the specific functional importance into account, i.e. the conservation in related proteins. Because of this, and because the programs give very clear scores, we weigh the results of the first group higher and assume that this mutation is non-neutral.

Prediction: a non-neutral mutation.

D244H (Mutation 7)

Q283P (Mutation 8)

Q321E (Mutation 9)

R363C (Mutation 10)

References

<references />