Sequence-based mutation analysis GLA
Revision as of 06:13, 4 July 2011
by Benjamin Drexler and Fabian Grandke
Introduction
Selected Mutations
We randomly selected ten annotated point mutations of the human gene GLA from a pool of mutations consisting of two subsets. The first subset contains mutations listed in HGMD, which we had already collected in task 4 (Mapping SNPs). The second subset contains mutations listed in dbSNP but not in HGMD. This applied to only three mutations.
Mutations at amino acid positions 1 to 31 were excluded from the selection because these residues form the signal peptide (see UniProt entry) and are not present in the reference structure (PDB ID 1R47).
Number | AA-Position | Codon change | Amino acid change | Visualization |
---|---|---|---|---|
1 | 42 | ATG-ACG | Met -> Thr | |
2 | 65 | AGT-ACG | Ser -> Thr | |
3 | 117 | ATT-AGT | Ile -> Ser | |
4 | 143 | cGCA-ACA | Ala -> Thr | |
5 | 186 | CAC-CGC | His -> Arg | |
6 | 205 | gCCT-ACT | Pro -> Thr | |
7 | 244 | gGAC-CAC | Asp -> His | |
8 | 283 | CAG-CCG | Gln -> Pro | |
9 | 321 | tCAG-TAG | Gln -> Glu | |
10 | 363 | TATa-TAA | Arg -> Cys | |
The visualization was done with PyMOL, and the mutagenesis of the residue was performed according to this tutorial. The wildtype residue is colored green and the mutated residue is colored red.
Mutation Analysis
Physicochemical Properties and Changes
The physicochemical properties of an amino acid influence the structure and function of the protein. We therefore examine how each substitution changes these properties to evaluate whether the mutation is likely to be tolerated. We took the polarity, charge and hydrophobicity index of each amino acid from the Wikipedia page on amino acids. The charge is given for an environment with a pH of 7.4. The hydrophobicity index follows Kyte and Doolittle<ref name=hydrophobicity>Kyte J, Doolittle RF (May 1982). "A simple method for displaying the hydropathic character of a protein". Journal of Molecular Biology 157 (1): 105–32. PubMed</ref>.
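The hydrophobicity comparison in the tables below can be reproduced in a few lines. The following Python sketch hard-codes the Kyte-Doolittle values; the helper names and the "sign flip" criterion (hydrophobic to hydrophilic or vice versa) are our own illustration and not part of the original analysis.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982), one-letter codes.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# The ten selected GLA substitutions as (wildtype, position, mutant).
MUTATIONS = [("M", 42, "T"), ("S", 65, "T"), ("I", 117, "S"), ("A", 143, "T"),
             ("H", 186, "R"), ("P", 205, "T"), ("D", 244, "H"), ("Q", 283, "P"),
             ("Q", 321, "E"), ("R", 363, "C")]


def hydropathy_change(wildtype: str, mutant: str) -> float:
    """Mutant minus wildtype index; negative means the residue becomes more hydrophilic."""
    return round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wildtype], 2)


def sign_flip(wildtype: str, mutant: str) -> bool:
    """True if the index changes sign (hypothetical criterion for a drastic change)."""
    return (KYTE_DOOLITTLE[wildtype] > 0) != (KYTE_DOOLITTLE[mutant] > 0)


if __name__ == "__main__":
    for wt, pos, mt in MUTATIONS:
        print(f"{wt}{pos}{mt}: {hydropathy_change(wt, mt):+.1f}, sign flip: {sign_flip(wt, mt)}")
```

For M42T the sketch reports a change of -2.6 together with a sign flip, which corresponds to the drastic change discussed for this mutation.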
Mutation 1
- Position 42: Methionine -> Threonine
Property | Methionine | Threonine |
---|---|---|
Polarity | nonpolar | polar |
Charge | neutral | neutral |
Hydrophobicity index | 1.9 | -0.7 |
Acidity | - | weak acidic |
Aromatic or aliphatic | - | - |
Mutation 2
- Position 65: Serine -> Threonine
Property | Serine | Threonine |
---|---|---|
Polarity | polar | polar |
Charge | neutral | neutral |
Hydrophobicity index | -0.8 | -0.7 |
Acidity | - | weak acidic |
Aromatic or aliphatic | - | - |
Mutation 3
- Position 117: Isoleucine -> Serine
Property | Isoleucine | Serine |
---|---|---|
Polarity | nonpolar | polar |
Charge | neutral | neutral |
Hydrophobicity index | 4.5 | -0.8 |
Acidity | - | - |
Aromatic or aliphatic | - | - |
Mutation 4
- Position 143: Alanine -> Threonine
Property | Alanine | Threonine |
---|---|---|
Polarity | nonpolar | polar |
Charge | neutral | neutral |
Hydrophobicity index | 1.8 | -0.7 |
Acidity | - | weak acidic |
Aromatic or aliphatic | - | - |
Mutation 5
- Position 186: Histidine -> Arginine
Property | Histidine | Arginine |
---|---|---|
Polarity | polar | polar |
Charge | positive (10%), neutral (90%) | positive |
Hydrophobicity index | -3.2 | -4.5 |
Acidity | weak basic | strongly basic |
Aromatic or aliphatic | aromatic | - |
Mutation 6
- Position 205: Proline -> Threonine
Property | Proline | Threonine |
---|---|---|
Polarity | nonpolar | polar |
Charge | neutral | neutral |
Hydrophobicity index | -1.6 | -0.7 |
Acidity | - | weak acidic |
Aromatic or aliphatic | - | - |
Mutation 7
- Position 244: Aspartic acid -> Histidine
Property | Aspartic acid | Histidine |
---|---|---|
Polarity | polar | polar |
Charge | negative | positive (10%), neutral (90%) |
Hydrophobicity index | -3.5 | -3.2 |
Acidity | acidic | weak basic |
Aromatic or aliphatic | - | aromatic |
Mutation 8
- Position 283: Glutamine -> Proline
Property | Glutamine | Proline |
---|---|---|
Polarity | polar | nonpolar |
Charge | neutral | neutral |
Hydrophobicity index | -3.5 | -1.6 |
Acidity | - | - |
Aromatic or aliphatic | - | - |
Mutation 9
- Position 321: Glutamine -> Glutamic acid
Property | Glutamine | Glutamic acid |
---|---|---|
Polarity | polar | polar |
Charge | neutral | negative |
Hydrophobicity index | -3.5 | -3.5 |
Acidity | - | acidic |
Aromatic or aliphatic | - | - |
Mutation 10
- Position 363: Arginine -> Cysteine
Property | Arginine | Cysteine |
---|---|---|
Polarity | polar | nonpolar |
Charge | positive | neutral |
Hydrophobicity index | -4.5 | 2.5 |
Acidity | strongly basic | acidic |
Aromatic or aliphatic | - | - |
Substitution Matrices
In this section, we use substitution matrices to evaluate whether the substitution introduced by each mutation is favorable in a biological context. We use two different kinds of substitution matrices. First, the BLOcks SUbstitution Matrix (BLOSUM) is an empirical matrix derived from alignments of conserved regions of related proteins <ref name=blosum>en.wikipedia.org/wiki/BLOSUM</ref>. Second, Point Accepted Mutation or Percent Accepted Mutation (PAM) is a set of matrices derived from the amino acid substitutions observed between closely related proteins <ref name=pam>en.wikipedia.org/wiki/Point_accepted_mutation</ref>. In general, a higher value in a substitution matrix indicates a more likely substitution.
Number | Substitution | BLOSUM62: mutation | BLOSUM62: best¹ | BLOSUM62: worst² | PAM1: mutation | PAM1: best¹ | PAM1: worst² | PAM250: mutation | PAM250: best¹ | PAM250: worst² |
---|---|---|---|---|---|---|---|---|---|---|
1 | Met -> Thr | -1 | 2 | -3 | 2 | 8 | 0 | 1 | 3 | 0 |
2 | Ser -> Thr | 1 | 1 | -3 | 38 | 38 | 0 | 9 | 9 | 3 |
3 | Ile -> Ser | -2 | 2 | -4 | 1 | 33 | 0 | 3 | 9 | 1 |
4 | Ala -> Thr | -1 | 1 | -3 | 32 | 35 | 0 | 11 | 12 | 2 |
5 | His -> Arg | 0 | 2 | -3 | 8 | 20 | 0 | 5 | 7 | 2 |
6 | Pro -> Thr | 1 | 1 | -4 | 4 | 13 | 0 | 5 | 7 | 1 |
7 | Asp -> His | -1 | 2 | -4 | 4 | 53 | 0 | 6 | 10 | 1 |
8 | Gln -> Pro | -1 | 2 | -3 | 6 | 27 | 0 | 4 | 7 | 1 |
9 | Gln -> Glu | 2 | 2 | -3 | 27 | 27 | 0 | 7 | 7 | 1 |
10 | Arg -> Cys | -3 | 2 | -3 | 1 | 19 | 0 | 2 | 9 | 1 |
¹ Best is the highest value in the corresponding row/column, excluding the self-substitution (e.g. Met -> Met).
² Worst is the lowest value in the corresponding row/column.
The following coloring scheme was applied:
- green: the substitution value of the mutation is closer to the best value than to the worst value
- red: the substitution value of the mutation is closer to the worst value than to the best value
- yellow: the substitution value of the mutation has the same absolute difference to both values
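The coloring rule can be written as a small function. The sketch below is our own illustration (the function names are hypothetical); it hard-codes only the Met row of BLOSUM62 and takes "best" as the highest off-diagonal value and "worst" as the lowest value, as defined in the footnotes.

```python
# Met row of BLOSUM62, columns in the usual order A R N D C Q E G H I L K M F P S T W Y V.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"
BLOSUM62_MET = dict(zip(AA_ORDER, [-1, -1, -2, -3, -1, 0, -2, -3, -2, 1,
                                   2, -1, 5, 0, -2, -1, -1, -1, -1, 1]))


def best_and_worst(row: dict, wildtype: str) -> tuple:
    """Best = highest value except the self-substitution; worst = lowest value."""
    off_diagonal = [value for aa, value in row.items() if aa != wildtype]
    return max(off_diagonal), min(row.values())


def color(value: float, best: float, worst: float) -> str:
    """Green if closer to best, red if closer to worst, yellow on a tie."""
    to_best, to_worst = abs(value - best), abs(value - worst)
    if to_best < to_worst:
        return "green"
    if to_best > to_worst:
        return "red"
    return "yellow"
```

For M42T (value -1), the Met row gives best 2 and worst -3, so the function returns red.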
The following substitution matrices were used:
PSSM
In this section, we use a position-specific scoring matrix (PSSM) to evaluate how conserved the wildtype and mutant residues are in related proteins. For this, we ran PSI-BLAST with the following command:
blastpgp -i GLA.fasta -j 5 -d /data/blast/nr/nr -e 10E-6 -Q psiblast.mat -o psiblast.out
The relevant values are listed in the following table. The full PSSM rows for the affected positions are provided on this page.
Number | Substitution | PSSM: mutation | PSSM: wildtype | PSSM: best¹ | PSSM: worst² |
---|---|---|---|---|---|
1 | Met -> Thr | -3 | 9 | 9 | -6 |
2 | Ser -> Thr | 3 | 3 | 5 | -3 |
3 | Ile -> Ser | -3 | 4 | 5 | -6 |
4 | Ala -> Thr | -1 | 1 | 2 | -5 |
5 | His -> Arg | 1 | 0 | 2 | -5 |
6 | Pro -> Thr | -2 | 2 | 3 | -5 |
7 | Asp -> His | 1 | 4 | 4 | -5 |
8 | Gln -> Pro | -5 | 3 | 8 | -5 |
9 | Gln -> Glu | 1 | 7 | 7 | -5 |
10 | Arg -> Cys | -1 | 2 | 3 | -4 |
¹ Best is the highest value in the corresponding row.
² Worst is the lowest value in the corresponding row.
The following coloring scheme was applied:
- green: the substitution value of the mutation is closer to the best value than to the worst value
- red: the substitution value of the mutation is closer to the worst value than to the best value
- yellow: the substitution value of the mutation has the same absolute difference to both values
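The PSSM values can be extracted from the ASCII matrix written by the -Q option. The sketch below assumes the usual blastpgp layout: a header row of one-letter amino acid codes, followed by rows that start with the position, the query residue and 20 whitespace-separated log-odds scores. This layout and the function names are assumptions; real files may need extra handling (e.g. if adjacent columns run together). Best and worst are simply the row maximum and minimum, as in the footnotes.

```python
def parse_ascii_pssm(text: str) -> dict:
    """Map position -> (query residue, {amino acid: log-odds score})."""
    order, rows = None, {}
    for line in text.splitlines():
        tokens = line.split()
        # Header: one-letter codes (log-odds block first, percentages may follow).
        if order is None and len(tokens) >= 20 and all(
                len(t) == 1 and t.isalpha() for t in tokens):
            order = tokens[:20]
            continue
        # Data row: position, residue, then at least 20 log-odds scores.
        if order and len(tokens) >= 22 and tokens[0].isdigit():
            scores = dict(zip(order, map(int, tokens[2:22])))
            rows[int(tokens[0])] = (tokens[1], scores)
    return rows


def pssm_summary(rows: dict, position: int, mutant: str) -> tuple:
    """Return (mutant score, wildtype score, best, worst) for one position."""
    wildtype, scores = rows[position]
    return scores[mutant], scores[wildtype], max(scores.values()), min(scores.values())
```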
Multiple Sequence Alignment
We examine a multiple sequence alignment (MSA) to evaluate the conservation of the residues affected by the mutations. We used BLAST to retrieve the sequences for the MSA; a table of the sequences is provided on this page. We then created two MSAs with a locally installed version of T-Coffee.
- MSA with 100 sequences: see here
- MSA with 25 sequences: see figure 1
We used the conservation index of Livingstone and Barton<ref name=livingstone>Livingstone C.D. and Barton G.J. (1993), "Protein Sequence Alignments: A Strategy for the Hierarchical Analysis of Residue Conservation.", CABIOS Vol. 9 No. 6 (745-756), PubMed</ref> to determine whether a residue is conserved. The conservation index was calculated with Jalview and ranges from 0 to 11.
Number | Position | Conservation (100 sequences) | Conservation (25 sequences) |
---|---|---|---|
1 | 42 | 10 | 11 |
2 | 65 | 8 | 11 |
3 | 117 | 9 | 9 |
4 | 143 | 10 | 8 |
5 | 186 | 4 | 2 |
6 | 205 | 10 | 11 |
7 | 244 | 10 | 11 |
8 | 283 | 11 | 11 |
9 | 321 | 11 | 11 |
10 | 363 | 3 | 5 |
The following coloring scheme was applied:
- green: conservation is between 8 and 11
- yellow: conservation is between 5 and 7
- red: conservation is between 0 and 4
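To illustrate how a property-based index of this kind works, here is a simplified Python sketch: a column scores 11 if all residues are identical, and otherwise one point for each physicochemical property that every residue either has or lacks. The property sets below are an approximation of Taylor's classification and are assumptions; Jalview's exact sets and its treatment of gaps may differ, so this sketch will not necessarily reproduce the Jalview values in the table.

```python
# Approximate amino acid property sets (assumed; after Taylor's classification).
PROPERTIES = {
    "hydrophobic": set("ACFGHIKLMTVWY"),
    "polar": set("DEHKNQRSTWY"),
    "small": set("ACDGNPSTV"),
    "proline": set("P"),
    "tiny": set("AGS"),
    "aliphatic": set("ILV"),
    "aromatic": set("FHWY"),
    "positive": set("HKR"),
    "negative": set("DE"),
    "charged": set("DEHKR"),
}


def conservation_score(column: str) -> int:
    """Score one alignment column from 0 to 11 (11 = all residues identical)."""
    residues = [r for r in column.upper() if r.isalpha()]  # simplification: gaps ignored
    if not residues:
        return 0
    if len(set(residues)) == 1:
        return 11
    # A property is conserved if all residues have it, or all residues lack it.
    return sum(1 for members in PROPERTIES.values()
               if len({residue in members for residue in residues}) == 1)


def conservation_color(score: int) -> str:
    """Green for 8-11, yellow for 5-7, red for 0-4."""
    if score >= 8:
        return "green"
    if score >= 5:
        return "yellow"
    return "red"
```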
Secondary Structure
We examined the potential influence of each mutation on the secondary structure of α-galactosidase. For this, we used two secondary structure prediction programs, PSIPRED and JPred3. Please see task 2 (sequence-based predictions) for further explanations of the programs.
We performed one run with the wildtype sequence of α-galactosidase and then ten runs, each with a single isolated mutation, so that the simultaneous presence of two or more mutations could not influence the prediction. Such an influence would be unlikely anyway, since the mutated positions are well separated in the sequence. The results are listed in the table below. None of the mutations appears to change the secondary structure, at least according to the predictions.
Number | Substitution | UniProt | PSIPRED (wildtype) | PSIPRED (mutation) | JPred3 (wildtype) | JPred3 (mutation) |
---|---|---|---|---|---|---|
1 | Met -> Thr | Beta strand | Coil | Coil | Coil | Coil |
2 | Ser -> Thr | Coil | Coil | Coil | Helix | Helix |
3 | Ile -> Ser | Helix | Coil | Coil | Helix | Helix |
4 | Ala -> Thr | Coil | Coil | Coil | Coil | Coil |
5 | His -> Arg | Helix | Helix | Helix | Helix | Helix |
6 | Pro -> Thr | Helix | Coil | Coil | Coil | Coil |
7 | Asp -> His | Helix | Helix | Helix | Helix | Helix |
8 | Gln -> Pro | Helix | Helix | Helix | Helix | Helix |
9 | Gln -> Glu | Coil | Coil | Coil | Coil | Coil |
10 | Arg -> Cys | Beta strand | Beta strand | Beta strand | Beta strand | Beta strand |
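Preparing the ten single-mutant inputs for the prediction runs can be automated. The following sketch is a hypothetical helper set, not the procedure we actually used; it assumes 1-based positions on the full precursor sequence (the numbering in which positions 1 to 31 form the signal peptide) and checks the wildtype residue so that numbering mismatches are caught early.

```python
def apply_point_mutation(sequence: str, position: int, wildtype: str, mutant: str) -> str:
    """Return `sequence` with one substitution at a 1-based position."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} outside sequence of length {len(sequence)}")
    if sequence[index] != wildtype:
        # Guards against numbering mismatches (e.g. mature vs. precursor numbering).
        raise ValueError(f"expected {wildtype} at {position}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]


def single_mutants(sequence: str, mutations: list) -> dict:
    """Build one variant per mutation; every variant carries exactly one change."""
    return {f"{wt}{pos}{mt}": apply_point_mutation(sequence, pos, wt, mt)
            for wt, pos, mt in mutations}


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a single FASTA record with fixed-width sequence lines."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Each variant can then be written to its own FASTA file with `to_fasta` and submitted as a separate PSIPRED or JPred3 run.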
Programs
SNAP
SNAP (screening for non-acceptable polymorphisms) is a program that predicts whether a single point mutation is neutral or non-neutral; it was developed by Bromberg and Rost in 2007 <ref name=snap>Yana Bromberg and Burkhard Rost, "SNAP: predict effect of non-synonymous polymorphisms on function", Nucleic Acids Research, 2007, Vol. 35, No. 11 3823-3835, PubMed</ref>. It is based on feed-forward neural networks and was trained on data from the Protein Mutant Database and a specific subset of SWISS-PROT. SNAP takes the sequence and the proposed mutations as input and labels each mutation "neutral" or "non-neutral". It also outputs a reliability index, a normalization of the difference between the two predictions, which ranges from 0 to 9.
We used the locally installed version with the following command:
snapfun -i gla.fasta -m mutations.txt -o gla_snap.out
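The mutation file passed with -m can be generated from the mutation list. This sketch assumes that snapfun reads one mutation per line in the form wildtype, position, mutant (e.g. M42T); please check the SNAP documentation before relying on this format. The function names are hypothetical.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# (wildtype, 1-based position, mutant) for the ten selected GLA mutations.
MUTATIONS = [("M", 42, "T"), ("S", 65, "T"), ("I", 117, "S"), ("A", 143, "T"),
             ("H", 186, "R"), ("P", 205, "T"), ("D", 244, "H"), ("Q", 283, "P"),
             ("Q", 321, "E"), ("R", 363, "C")]


def mutation_lines(mutations) -> list:
    """Encode each substitution as wildtype + position + mutant, e.g. 'M42T'."""
    lines = []
    for wt, pos, mt in mutations:
        if wt not in VALID_RESIDUES or mt not in VALID_RESIDUES or wt == mt:
            raise ValueError(f"not a missense substitution: {wt}{pos}{mt}")
        lines.append(f"{wt}{pos}{mt}")
    return lines


def write_mutation_file(path: str, mutations) -> None:
    """Write one mutation per line (assumed snapfun -m input format)."""
    with open(path, "w") as handle:
        handle.write("\n".join(mutation_lines(mutations)) + "\n")
```

Calling `write_mutation_file("mutations.txt", MUTATIONS)` would produce the input file named in the command above.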
Number | Substitution | Prediction | Reliability index | Expected accuracy |
---|---|---|---|---|
1 | Met -> Thr | Non-neutral | 6 | 93% |
2 | Ser -> Thr | Non-neutral | 3 | 78% |
3 | Ile -> Ser | Non-neutral | 4 | 82% |
4 | Ala -> Thr | Non-neutral | 3 | 78% |
5 | His -> Arg | Neutral | 4 | 85% |
6 | Pro -> Thr | Non-neutral | 6 | 93% |
7 | Asp -> His | Non-neutral | 6 | 93% |
8 | Gln -> Pro | Non-neutral | 5 | 87% |
9 | Gln -> Glu | Non-neutral | 6 | 93% |
10 | Arg -> Cys | Non-neutral | 1 | 63% |
SIFT
SIFT (sorting intolerant from tolerant) predicts whether an amino acid substitution is tolerated and was developed by Ng and Henikoff in 2001<ref name=sift>Ng PC, Henikoff S., "Predicting deleterious amino acid substitutions.", Genome Res. 2001 May, PubMed</ref>. It is based on the assumption that protein function is correlated with protein evolution. It therefore builds a multiple sequence alignment of closely related proteins and tries to identify functionally important residues.
We used the SIFT web server with the default settings; the results are listed in the table below.
Number | Substitution | Prediction | Score | Median sequence conservation | # sequences¹ |
---|---|---|---|---|---|
1 | Met -> Thr | Affect protein function | 0.00 | 2.99 | 46 |
2 | Ser -> Thr | Affect protein function | 0.01 | 3.00 | 48 |
3 | Ile -> Ser | Affect protein function | 0.03 | 3.00 | 52 |
4 | Ala -> Thr | Affect protein function | 0.01 | 3.00 | 52 |
5 | His -> Arg | Tolerated | 0.25 | 3.00 | 52 |
6 | Pro -> Thr | Affect protein function | 0.00 | 3.00 | 52 |
7 | Asp -> His | Affect protein function | 0.01 | 3.00 | 52 |
8 | Gln -> Pro | Affect protein function | 0.00 | 3.00 | 52 |
9 | Gln -> Glu | Affect protein function | 0.00 | 3.00 | 52 |
10 | Arg -> Cys | Tolerated | 0.18 | 2.99 | 51 |
¹ This column gives the number of sequences represented at this position in the multiple sequence alignment. The alignment contained 55 sequences in total.
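The SIFT calls in the table follow from the score alone. The sketch below assumes SIFT's usual default cutoff (scores below 0.05 are predicted to affect protein function) and its warning for a median sequence conservation above 3.25; both values are our understanding of the SIFT documentation and should be treated as assumptions.

```python
SIFT_CUTOFF = 0.05           # assumed default: scores below this affect function
MEDIAN_WARNING_LEVEL = 3.25  # assumed: higher median conservation means low confidence


def sift_call(score: float, median_conservation: float) -> str:
    """Reproduce a SIFT-style call from the normalized score."""
    call = "Affect protein function" if score < SIFT_CUTOFF else "Tolerated"
    if median_conservation > MEDIAN_WARNING_LEVEL:
        call += " (low confidence)"
    return call


# (substitution, score, median sequence conservation) from the table above.
SIFT_RESULTS = [("M42T", 0.00, 2.99), ("S65T", 0.01, 3.00), ("I117S", 0.03, 3.00),
                ("A143T", 0.01, 3.00), ("H186R", 0.25, 3.00), ("P205T", 0.00, 3.00),
                ("D244H", 0.01, 3.00), ("Q283P", 0.00, 3.00), ("Q321E", 0.00, 3.00),
                ("R363C", 0.18, 2.99)]
```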
PolyPhen
PolyPhen predicts the influence of an amino acid substitution on the structure and function of a protein. The version we used, PolyPhen-2, was described by Adzhubei et al. in 2010<ref name=polyphen>Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR, "A method and server for predicting damaging missense mutations.", Nat Methods 2010 Apr, PubMed</ref>. It builds a multiple alignment of homologous sequences to estimate the functional importance of a residue. It uses a naive Bayes classifier trained on two datasets, HumDiv and HumVar.
We used the PolyPhen web server with the default settings; the results are listed in the table below.
Number | Substitution | HumDiv prediction | HumDiv score | HumDiv sensitivity | HumDiv specificity | HumVar prediction | HumVar score | HumVar sensitivity | HumVar specificity |
---|---|---|---|---|---|---|---|---|---|
1 | Met -> Thr | Probably damaging | 0.984 | 0.73 | 0.96 | Probably damaging | 0.983 | 0.52 | 0.95 |
2 | Ser -> Thr | Probably damaging | 0.984 | 0.73 | 0.96 | Possibly damaging | 0.874 | 0.70 | 0.89 |
3 | Ile -> Ser | Probably damaging | 0.979 | 0.74 | 0.96 | Possibly damaging | 0.870 | 0.70 | 0.89 |
4 | Ala -> Thr | Possibly damaging | 0.950 | 0.79 | 0.95 | Possibly damaging | 0.620 | 0.80 | 0.83 |
5 | His -> Arg | Benign | 0.000 | 1.00 | 0.00 | Benign | 0.000 | 1.00 | 0.00 |
6 | Pro -> Thr | Probably damaging | 1.000 | 0.00 | 1.00 | Probably damaging | 0.977 | 0.55 | 0.95 |
7 | Asp -> His | Possibly damaging | 0.735 | 0.86 | 0.92 | Benign | 0.177 | 0.89 | 0.70 |
8 | Gln -> Pro | Probably damaging | 1.000 | 0.00 | 1.00 | Probably damaging | 0.993 | 0.42 | 0.97 |
9 | Gln -> Glu | Probably damaging | 0.998 | 0.27 | 0.99 | Probably damaging | 0.908 | 0.67 | 0.90 |
10 | Arg -> Cys | Possibly damaging | 0.496 | 0.89 | 0.91 | Benign | 0.046 | 0.94 | 0.59 |
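In the summary table of the discussion, the three PolyPhen classes appear to be translated as follows: "probably damaging" to non-neutral, "possibly damaging" to ambiguous ("-") and "benign" to neutral. This mapping is our reading of the tables, not a rule defined by PolyPhen; the sketch below simply makes it explicit.

```python
# Our reading of how PolyPhen classes enter the summary table (not a PolyPhen rule).
POLYPHEN_TO_SUMMARY = {
    "probably damaging": "non-neutral",
    "possibly damaging": "-",
    "benign": "neutral",
}


def summary_category(prediction: str) -> str:
    """Map a PolyPhen prediction label to the summary categories."""
    return POLYPHEN_TO_SUMMARY[prediction.strip().lower()]


# HumVar predictions for mutations 1 to 10, in table order.
HUMVAR = ["Probably damaging", "Possibly damaging", "Possibly damaging",
          "Possibly damaging", "Benign", "Probably damaging", "Benign",
          "Probably damaging", "Probably damaging", "Benign"]
```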
Discussion
Summary of the Mutation Analysis
First, we summarize the results of the mutation analysis in a table. A mutation is labelled "neutral", "non-neutral", or "-" when the result is ambiguous.
Method | M42T | S65T | I117S | A143T | H186R | P205T | D244H | Q283P | Q321E | R363C |
---|---|---|---|---|---|---|---|---|---|---|
Physicochemical properties | non-neutral | neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | neutral | non-neutral | non-neutral |
PAM1 | non-neutral | neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral |
PAM250 | non-neutral | neutral | non-neutral | neutral | neutral | neutral | neutral | - | neutral | non-neutral |
BLOSUM62 | non-neutral | neutral | non-neutral | - | neutral | neutral | - | - | neutral | non-neutral |
PSSM | non-neutral | neutral | non-neutral | neutral | neutral | non-neutral | neutral | non-neutral | - | non-neutral |
MSA | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral |
Secondary structure | neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral | neutral |
SNAP | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral |
SIFT | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral |
PolyPhen (HumDiv) | non-neutral | non-neutral | non-neutral | - | neutral | non-neutral | - | non-neutral | non-neutral | - |
PolyPhen (HumVar) | non-neutral | - | - | - | neutral | non-neutral | neutral | non-neutral | non-neutral | neutral |
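One simple way to read the summary table is to count the calls per mutation. The sketch below transcribes the table into compact codes and applies an unweighted majority vote; this vote is our own illustration, not the method used for the individual conclusions below, and it ignores the different reliability of the methods.

```python
from collections import Counter

MUTATIONS = ["M42T", "S65T", "I117S", "A143T", "H186R",
             "P205T", "D244H", "Q283P", "Q321E", "R363C"]

# One character per mutation (same order as MUTATIONS):
# X = non-neutral, N = neutral, - = ambiguous. Transcribed from the summary table.
SUMMARY = {
    "Physicochemical properties": "XNXXXNXNXX",
    "PAM1": "XNXNXXXXNX",
    "PAM250": "XNXNNNN-NX",
    "BLOSUM62": "XNX-NN--NX",
    "PSSM": "XNXNNXNX-X",
    "MSA": "XXXXNXXXXN",
    "Secondary structure": "NNNNNNNNNN",
    "SNAP": "XXXXNXXXXX",
    "SIFT": "XXXXNXXXXN",
    "PolyPhen (HumDiv)": "XXX-NX-XX-",
    "PolyPhen (HumVar)": "X---NXNXXN",
}


def tally(summary: dict) -> dict:
    """Count X / N / - calls per mutation across all methods."""
    counts = {mutation: Counter() for mutation in MUTATIONS}
    for method, calls in summary.items():
        if len(calls) != len(MUTATIONS):
            raise ValueError(f"{method}: expected {len(MUTATIONS)} calls, got {len(calls)}")
        for mutation, call in zip(MUTATIONS, calls):
            counts[mutation][call] += 1
    return counts


def majority(counter: Counter) -> str:
    """Unweighted vote; ambiguous calls are ignored and ties give '-'."""
    if counter["X"] > counter["N"]:
        return "non-neutral"
    if counter["N"] > counter["X"]:
        return "neutral"
    return "-"
```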
Individual Analysis of the Mutations
M42T (Mutation 1)
We assigned the status non-neutral with respect to the physicochemical properties, because the hydrophobicity index changes drastically from a positive to a negative value and the residue becomes polar. The substitution matrices show that this substitution is unlikely, and the low PSSM value indicates that threonine rarely occurs at this position in related proteins. The latter observation is confirmed by the two MSAs, in which this position is highly conserved (10 and 11). All three programs (SNAP, SIFT and PolyPhen) also predict a non-neutral mutation with clear scores and indices. To recap, every method except the secondary structure analysis indicates that this mutation is non-neutral, and we therefore conclude the same.
Prediction: a non-neutral mutation.
S65T (Mutation 2)
Only the hydrophobicity index differs (by 0.1), while polarity and charge are identical, so the physicochemical properties barely change and we assigned the status neutral in this category. The substitution matrices show that this substitution has the highest possible value in all three matrices, a strong sign that it occurs frequently. Even though this position is highly conserved in the two MSAs, the mutant residue achieves the same PSSM value as the wildtype. This could be because PSI-BLAST also includes distantly related proteins.
In contrast to these observations, which suggest a neutral mutation, all three programs predict a non-neutral mutation.
...
I117S (Mutation 3)
Isoleucine is a nonpolar amino acid with a strongly positive hydrophobicity index, whereas serine is polar and has a slightly negative hydrophobicity index. We therefore assume that this substitution changes the physicochemical properties and assign the status non-neutral. The values in the substitution matrices are not the lowest possible, but they are still very low in all three. The same applies to the PSSM value, which is supported by the high conservation in the MSAs. All three programs predict a non-neutral mutation with sufficiently high scores. Since only one category suggests a neutral mutation, we conclude that this mutation is non-neutral.
Prediction: a non-neutral mutation.
A143T (Mutation 4)
H186R (Mutation 5)
P205T (Mutation 6)
D244H (Mutation 7)
Q283P (Mutation 8)
Q321E (Mutation 9)
R363C (Mutation 10)
References
<references />