Sequence based mutation analysis of GBA

From Bioinformatikpedia
Revision as of 21:24, 18 September 2011 by Braunt (talk | contribs) (References)

Introduction

Figure 1: Selected mutations highlighted on the structure of 1OGS. The colors indicate in which database each mutation is listed: dbSNP (blue), HGMD (red), both (green).

The ten SNPs shown in the table below and highlighted in Figure 1 were chosen for the analysis in this task. We tried to include SNPs spread over the whole protein in order to investigate the influence of mutations in different regions. The residues forming hydrogen bonds with the active site of glucocerebrosidase are included, as mutations at these positions should result in either a malfunctioning or a nonfunctional protein. Furthermore, N370S, the most common mutation causing Gaucher Disease among Ashkenazi Jews <ref>http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1288120/</ref>, was included. It was not easy to find missense mutations listed only in dbSNP. The mutations listed only in dbSNP are considered neutral.

Nr. | SNP ID/Accession Number | Database | Position including signal peptide (SP) | Position without SP | Amino Acid Change | Codon Change | Remarks
1 | CM081634 | HGMD | 49 | 10 | Gly - Ser | cGGC-AGC | -
2 | rs74953658, CM050263 | dbSNP, HGMD | 63 | 24 | Asp - Asn | tGAC-AAC | -
3 | rs1141820 | dbSNP | 99 | 60 | His - Arg | CAC-CGC | suspected, status not validated
4 | CM880035 | HGMD | 159 | 120 | Arg - Gln | CGG-CAG | synonymous mutation at this position listed in dbSNP; forming hydrogen bond with active site
5 | rs80205046, CM041347 | dbSNP, HGMD | 221 | 182 | Pro - Leu | CCC-CTC | -
6 | rs74731340, CM970620 | dbSNP, HGMD | 310 | 271 | Ser - Asn | AGT-AAT | -
7 | CM880036 | HGMD | 409 | 370 | Asn - Ser | AAC-AGC | most common mutation found in Gaucher disease type 1 patients
8 | CM993703 | HGMD | 350 | 311 | His - Arg | CAT-CGT | severe form of Gaucher disease 2; forming hydrogen bond with active site
9 | rs80020805, CM052245 | dbSNP, HGMD | 455 | 416 | Met - Val | cATG-GTG | -
10 | rs113825752 | dbSNP | 509 | 470 | Leu - Pro | CTT-CCT | -
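
The two position columns of the table differ by a constant 39 residues in every row, which corresponds to the length of the signal peptide. The following minimal sketch converts between the two numbering schemes. The constant and the function name are illustrative and are inferred from the table, not taken from any tool used on this page.

```python
# Offset between precursor numbering (including the signal peptide) and
# mature-protein numbering. Assumption: inferred from the table, where
# every row differs by exactly 39.
SIGNAL_PEPTIDE_LENGTH = 39


def to_mature_position(precursor_position: int) -> int:
    """Convert a position including the signal peptide to mature numbering."""
    if precursor_position <= SIGNAL_PEPTIDE_LENGTH:
        raise ValueError("position lies inside the signal peptide")
    return precursor_position - SIGNAL_PEPTIDE_LENGTH


# Positions including the signal peptide, in the order of the table.
precursor_positions = [49, 63, 99, 159, 221, 310, 409, 350, 455, 509]
mature_positions = [to_mature_position(p) for p in precursor_positions]
print(mature_positions)
```

This explains why the mutation at precursor position 409 is commonly called N370S.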



Mutations listed in HGMD (Mutations 1, 4, 7 and 8) or in both HGMD and dbSNP (Mutations 2, 5, 6 and 9) are known to cause Gaucher Disease, whereas mutations listed only in dbSNP (Mutations 3 and 10) are considered neutral. In this section we assume that these facts are not known and try to predict whether the mutations are damaging or harmless.

Mutation Analysis

In the following section, the different mutations/SNPs are analyzed to determine whether they are neutral or whether they affect the function of glucocerebrosidase. Several kinds of evidence were taken into account: the physicochemical properties of the amino acids and how they change, the substitution values of the specific amino acids in various substitution matrices, the conservation of the specific positions in a multiple sequence alignment of homologous sequences, and the predictions of different prediction tools (SIFT, SNAP and PolyPhen-2).

Physicochemical Properties and Changes

Figure 2: Comparison of the wildtype (red) and mutated (blue) amino acids mapped to the structure of 1OGS.

The physicochemical properties of the wildtype and mutated amino acids (according to http://en.wikipedia.org/wiki/Amino_acid) and the changes between them are listed below. The superpositions in Figure 2 show the structural differences. Substitutions of amino acids that are structurally and chemically different are more likely to affect the function of a protein than substitutions of very similar amino acids.

Mutation 1

nonpolar, neutral --> polar, neutral

The wildtype amino acid Glycine is nonpolar, whereas the mutated amino acid Serine is polar. This difference in polarity could indicate that the mutation is damaging. In the structure, this residue (position 10 in the mature protein) is situated in a beta strand at the exterior of the protein. Because of this exposed location, the effect of the mutation might not be severe, although the residue is part of a secondary structure element.

Mutation 2

polar, negative --> polar, neutral

In this mutation Aspartic acid, an acidic amino acid, is replaced by its amide derivative Asparagine. The residue therefore loses its acidic character, which could affect the folding of the protein. In the structure, the residue is located neither in the interior of the protein nor in a secondary structure element, so the mutation should not affect the protein's function.

Mutation 3

polar, positive(10%), neutral(90%) --> polar, positive

Histidine is mutated to Arginine. Both amino acids have a basic side chain, although Histidine is protonated only part of the time at physiological pH, whereas Arginine is essentially always positively charged. The structures of the two residues are quite different: Histidine carries an aromatic imidazole ring, whereas Arginine has a long, unbranched side chain. The residue is situated at the exterior of the protein, so the influence of the mutation might not be strong. As the properties are almost the same, the mutation may be tolerated; only the different shape of Arginine may influence the function.

Mutation 4

polar, positive --> polar, neutral

In this mutation the positively charged Arginine is replaced by the polar but uncharged Glutamine, so the positive charge at this position is lost. The mutation is situated in the interior of the protein, so the function and structure of the protein might be affected.

Mutation 5

nonpolar, neutral --> nonpolar, neutral, hydrophobic sidechain

Proline is the only standard amino acid whose side chain forms a ring with the backbone nitrogen, so it has a great influence on the folding of the protein. Proline and Leucine are both nonpolar and neutral, so apart from the hydrophobicity of the Leucine side chain the chemical properties are the same. As the residue is also in the interior of the protein, the mutation could influence the function of the protein.

Mutation 6

polar, neutral --> polar, neutral

Serine, a hydroxyl-containing amino acid, is mutated to Asparagine, which carries an amide group. Both are aliphatic, polar and neutral and therefore share chemical properties. Their structures are also very similar. As the mutation is positioned at the exterior of the protein and both amino acids are structurally and chemically similar, the mutation should have no influence on the mature protein.

Mutation 7

polar, neutral --> polar, neutral

Asparagine carries an amide group, whereas Serine carries a hydroxyl group. This mutation is exactly the inverse of Mutation 6. Unlike Mutation 6, however, this residue is part of a helix and is located in the interior of the protein. Therefore the mutation might have an effect on the mature protein.

Mutation 8

polar, positive(10%), neutral(90%) --> polar, positive

Histidine and Arginine are both basic amino acids. Histidine carries an imidazole ring, which can be protonated. The residue is in the interior of the protein and involved in the protein's function. Therefore the mutation might not be tolerated.

Mutation 9

nonpolar, neutral, hydrophobic sidechain --> nonpolar, neutral, hydrophobic sidechain

Methionine, a sulfur-containing amino acid, is mutated to Valine. Both amino acids are nonpolar and neutral. The sulfur of Methionine is part of a thioether and cannot form disulfide bonds (only Cysteine can), but its long, flexible side chain may still be important for packing, so the folding and/or the function of the protein might be affected. In the structure, the residue is part of a helix.

Mutation 10

nonpolar, neutral, hydrophobic sidechain --> nonpolar, neutral

Leucine is mutated to Proline, whose ring structure makes it a helix breaker. Therefore the structure of the protein might change. Otherwise both amino acids have the same properties: neutral and nonpolar. Only the hydrophobicity changes. The structure in Figure 2 shows that the residue is part of a beta sheet, which might be disrupted by the mutation and therefore affect the protein's function.


PSSM

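The position-specific scoring matrix (PSSM) reflects the conservation of the residues in a multiple alignment. For each position it gives a score for every amino acid: the higher the score, the more often that amino acid is observed at this position in the aligned sequences. A high score for the mutated residue therefore suggests that the substitution is tolerated.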

Usage

  • command line: blastpgp -i gbaseq.fasta -j 5 -d /data/blast/nr/nr -e 10E-6 -Q psiblast.mat -o psiblast.out

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts

          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  49 G    3 -2 -2 -2 -4  1 -1  3 -4 -4  1  1  1  0  0 -1 -3 -4 -4 -3  26   2   2   2   0   6   3  20   0   0  13  10   3   4   4   4   0   0   0   0  0.35 1.25
  63 D    0 -4  0  5 -5  1 -1  0 -4 -1 -1  0 -4 -5 -1 -2 -3 -5 -5  0   8   0   4  39   0   6   3   7   0   4   8   6   0   0   3   2   0   0   0   8  0.58 1.51
  99 H    0 -1  0  1 -5  0  0 -1  1 -1 -3  0  0 -1  3  1  1  2  0 -1   7   4   4   7   0   5   7   6   3   4   3   6   3   3  13  10   8   3   3   4  0.14 1.99
 159 R   -4  8 -3 -4 -5 -2 -3 -5 -3 -5 -3  2 -4 -5 -4 -3 -4  1 -4 -5   0  79   0   0   0   0   0   0   0   0   3  12   0   0   0   0   0   2   0   0  1.73 1.34
 221 P   -3 -5 -4 -5 -6 -3 -4 -4 -5 -3 -5 -4  0 -6  8 -3 -2 -7 -6 -3   1   0   1   0   0   1   0   1   0   1   1   0   2   0  87   1   2   0   0   2  2.52 1.86
 310 S    3  2  2  0 -1  0 -1 -2  1 -4 -4  1 -4 -3 -3  2  0 -5 -1 -3  25  11  11   4   1   4   4   2   3   1   2   7   0   1   1  13   5   0   2   1  0.38 1.86
 409 N    1 -3  2  4  0 -4 -2  0 -1 -2 -2 -4  1  0 -3  1  0  2  3 -3  10   1   8  24   2   0   2   6   1   2   4   0   3   4   1   9   6   2  12   2  0.40 1.84
 350 H   -4 -3 -2 -4 -6  1 -1 -5 10 -6 -5 -3 -4 -4 -5 -4 -2 -5 -1 -6   0   0   0   0   0   5   2   0  87   0   0   0   0   0   0   0   2   0   0   0  2.71 1.66
 455 M    0  2  1  0  1 -1  1 -2  1 -1 -2  1  1 -2  2  1 -1  1 -4 -1   7   9   6   6   4   3  11   3   3   3   5   7   4   2   9   9   3   2   0   4  0.13 1.79
 509 L   -2 -5 -5 -5 -1  0 -5 -5 -5  3  2 -5 -1  3 -6 -4 -1 -3  3  4   3   0   0   1   1   4   0   1   0  17  17   0   1  14   0   1   3   0   9  28  0.71 1.77


The relevant value for each of our mutations is the score in the column of the mutated amino acid. Only the sixth and seventh substitutions (rows 310 and 409) have values greater than zero, which may indicate that these mutations are tolerated. All others have values of zero or below, so these substitutions may not be tolerated.
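
The lookup described above can be sketched as follows. The score strings are the first 20 columns of the rows printed above, and the column order is the one from the matrix header. The helper name and the "score greater than zero" rule are the simple criterion used on this page, not part of blastpgp itself.

```python
# Column order of the score block in the matrix header above.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# (position incl. signal peptide, wildtype, mutant, first 20 score columns)
PSSM_ROWS = [
    (49, "G", "S", "3 -2 -2 -2 -4 1 -1 3 -4 -4 1 1 1 0 0 -1 -3 -4 -4 -3"),
    (63, "D", "N", "0 -4 0 5 -5 1 -1 0 -4 -1 -1 0 -4 -5 -1 -2 -3 -5 -5 0"),
    (99, "H", "R", "0 -1 0 1 -5 0 0 -1 1 -1 -3 0 0 -1 3 1 1 2 0 -1"),
    (159, "R", "Q", "-4 8 -3 -4 -5 -2 -3 -5 -3 -5 -3 2 -4 -5 -4 -3 -4 1 -4 -5"),
    (221, "P", "L", "-3 -5 -4 -5 -6 -3 -4 -4 -5 -3 -5 -4 0 -6 8 -3 -2 -7 -6 -3"),
    (310, "S", "N", "3 2 2 0 -1 0 -1 -2 1 -4 -4 1 -4 -3 -3 2 0 -5 -1 -3"),
    (409, "N", "S", "1 -3 2 4 0 -4 -2 0 -1 -2 -2 -4 1 0 -3 1 0 2 3 -3"),
    (350, "H", "R", "-4 -3 -2 -4 -6 1 -1 -5 10 -6 -5 -3 -4 -4 -5 -4 -2 -5 -1 -6"),
    (455, "M", "V", "0 2 1 0 1 -1 1 -2 1 -1 -2 1 1 -2 2 1 -1 1 -4 -1"),
    (509, "L", "P", "-2 -5 -5 -5 -1 0 -5 -5 -5 3 2 -5 -1 3 -6 -4 -1 -3 3 4"),
]


def substitution_score(score_line: str, mutant: str) -> int:
    """Return the PSSM score of the mutant residue for one matrix row."""
    scores = [int(value) for value in score_line.split()]
    if len(scores) != len(AMINO_ACIDS):
        raise ValueError("expected 20 score columns")
    return scores[AMINO_ACIDS.index(mutant)]


pssm_scores = {pos: substitution_score(line, mut) for pos, _, mut, line in PSSM_ROWS}
# Rule used on this page: a score greater than zero counts as tolerated.
tolerated_positions = sorted(pos for pos, score in pssm_scores.items() if score > 0)

for pos, wildtype, mutant, _ in PSSM_ROWS:
    print(f"{wildtype}{pos}{mutant}: {pssm_scores[pos]}")
print("tolerated:", tolerated_positions)
```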

Substitution Matrices

To analyze whether the chosen amino acid substitutions are common or not, their scores in different amino acid substitution matrices were looked up. The scores reflect how often one amino acid was substituted with another in alignments of related sequences. A high score indicates that the two amino acids have often been exchanged and that the substituted amino acid is compatible with protein structure and function.
Two families of matrices are taken into account in this analysis: the Dayhoff amino acid substitution matrices (Percent Accepted Mutation or PAM matrices), which are based on the differences in closely related proteins and an evolutionary model of protein change, and the Blocks amino acid substitution matrices (BLOSUM matrices), which are derived from the substitutions observed in conserved blocks of aligned protein families.
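
As a brief illustration of what such a score means, log-odds substitution matrices such as BLOSUM62 are commonly written in the following general form. The symbols are introduced here for illustration only and do not appear elsewhere on this page.

```latex
s_{ij} = \frac{1}{\lambda} \, \log \frac{q_{ij}}{p_i \, p_j}
```

Here q_{ij} is the frequency with which amino acids i and j are observed aligned to each other, p_i and p_j are their background frequencies, and \lambda is a scaling factor. A positive score means the pair occurs more often than expected by chance, a negative score less often. This form applies to log-odds matrices; the source does not state whether the PAM values quoted in this section are on the same scale.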


The following matrices have been used:

Nr. Substitution BLOSUM62 PAM1 PAM250
1 Gly - Ser 0 (worst: -4) 16 (worst: 0) 9 (worst: 1)
2 Asp - Asn 1 (worst: -4) 36 (worst: 0) 7 (worst: 1)
3 His - Arg 0 (worst: -3) 10 (worst: 0) 6 (worst: 1)
4 Arg - Gln 1 (worst: -3) 9 (worst: 0) 5 (worst: 1)
5 Pro - Leu -3 (worst: -4) 3 (worst: 0) 5 (worst: 1)
6 Ser - Asn 1 (worst: -3) 20 (worst: 0) 5 (worst: 1)
7 Asn - Ser 1 (worst: -4) 34 (worst: 0) 8 (worst: 1)
8 His - Arg 0 (worst: -3) 10 (worst: 0) 6 (worst: 1)
9 Met - Val 1 (worst: -3) 17 (worst: 0) 4 (worst: 1)
10 Leu - Pro -3 (worst: -4) 2 (worst: 0) 3 (worst: 1)

Multiple Alignment

Figure 3: Multiple alignment of glucocerebrosidase and the mammalian sequences found

The conservation in a multiple sequence alignment of homologous sequences helps to decide whether a mutation at a certain position alters the function or structure of a protein: a highly conserved position indicates that this position is crucial for either the function or the structure of the protein, and one can assume that a mutation at this position is damaging. A mutation at a very variable position in the alignment is, in contrast, more likely to be neutral.
To retrieve homologous sequences, protein BLAST was used with the search restricted to mammals. For reasons of clarity, the resulting sequences are listed in File:Protein blast results for GBA.pdf. A multiple sequence alignment of these sequences is shown in Figure 3. The table below shows the conservation of the positions of interest, calculated once for all homologous sequences and once for the 25 best sequences. The conservation was calculated with JalView, which uses the AMAS method of multiple sequence alignment analysis. A score of 11 indicates 100% conservation at the corresponding position. <ref>http://www.jalview.org/help/html/calculations/conservation.html</ref>


Mutation Nr. Position Amino acid change Conservation (all) Conservation (best 25)
1 49 Gly - Ser 0.0 11.0
2 63 Asp - Asn 4.0 11.0
3 99 His - Arg 3.0 8.0
4 159 Arg - Gln 0.0 11.0
5 221 Pro - Leu 0.0 11.0
6 310 Ser - Asn 0.0 11.0
7 409 Asn - Ser 0.0 11.0
8 350 His - Arg 0.0 9.0
9 455 Met - Val 0.0 11.0
10 509 Leu - Pro 0.0 9.0

Secondary Structure

To investigate whether the mutations influence the secondary structure of the resulting protein, secondary structure predictions with JPred3 and PSIPRED were carried out as described in Task 3. The comparison between the original (wildtype) sequence and the mutated sequence is shown in File:Secondary structure prediction of mutated GBA sequence.pdf. The mutations show no direct influence on the secondary structure elements: the elements are not interrupted or destroyed by the mutations chosen in this analysis. There are only some minor differences in the lengths of the elements between the predicted structures of the original and mutated sequences. As these differences are not only located next to the mutations but all over the protein, they are probably due to the limited accuracy of the predictions. The table below shows the secondary structure assignments and predictions for the positions chosen in this analysis.

Mutation Nr. | Position | Wildtype | Uniprot (wildtype) | PSIPRED (wildtype) | JPred3 (wildtype) | DSSP (wildtype) | Mutation | PSIPRED (mutation) | JPred3 (mutation)
1 | 49 | Gly | Beta sheet | Coil | - | Turn | Ser | Coil | -
2 | 63 | Asp | - | Coil | - | - | Asn | Coil | -
3 | 99 | His | - | Coil | - | - | Arg | Coil | -
4 | 159 | Arg | Beta sheet | Beta sheet | Beta sheet | Bend | Gln | Beta sheet | Beta sheet
5 | 221 | Pro | - | Coil | - | - | Leu | Coil | -
6 | 310 | Ser | - | Coil | - | Turn | Asn | Coil | -
7 | 409 | Asn | Helix | Helix | Helix | Helix | Ser | Helix | Helix
8 | 350 | His | Beta sheet | Beta sheet | - | Bend | Arg | Beta sheet | -
9 | 455 | Met | Helix | - | Helix | Helix | Val | Helix | -
10 | 509 | Leu | Beta sheet | Beta sheet | Beta sheet | Bend | Pro | Beta sheet | Beta sheet


To determine whether a mutation is damaging or neutral, we only take into account whether it is located in a secondary structure element or in a loop. Residues forming secondary structure elements are more crucial to the structure, and therefore to the function, of a protein than those in a loop or disordered region. The annotation as listed in Uniprot is used in the further analysis.
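
The rule above can be written down as a small sketch. The annotations are copied from the Uniprot column of the table, with "-" standing for no assigned element. The function name is illustrative.

```python
# Uniprot secondary structure annotation per mutation number, copied from
# the table above ("-" means no assigned secondary structure element).
UNIPROT_ANNOTATION = {
    1: "Beta sheet", 2: "-", 3: "-", 4: "Beta sheet", 5: "-",
    6: "-", 7: "Helix", 8: "Beta sheet", 9: "Helix", 10: "Beta sheet",
}


def classify_by_secondary_structure(annotation: str) -> str:
    """Residues inside an annotated element count as non-neutral."""
    return "neutral" if annotation == "-" else "non-neutral"


structure_calls = {
    nr: classify_by_secondary_structure(annotation)
    for nr, annotation in UNIPROT_ANNOTATION.items()
}
for nr, call in structure_calls.items():
    print(nr, UNIPROT_ANNOTATION[nr], call)
```

This criterion ignores the kind of element and the solvent exposure of the residue, so it is a coarse indicator at best.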

SNAP

SNAP (screening for non-acceptable polymorphisms) is a neural-network-based method that predicts the functional effects of non-synonymous SNPs. The method only needs sequence information as input, but functional and structural annotations can be included if available. SNAP was introduced by Bromberg Y. and Rost B. in 2007. <ref>Bromberg Y., Rost B. SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res. 2007 June; 35(11): 3823–3835.</ref>

Usage

  • command line: snapfun -i gbaseq.fasta -m mutations.txt -o snapfun_out.out

Results
The predictions made by SNAP are shown in the table below. SNAP identifies the majority of mutations as non-neutral: only three mutations are predicted to be neutral. Additionally, the tool provides a reliability index (RI) reflecting the confidence of each prediction, ranging from 0 (low reliability) to 9 (high reliability). Only half of the predictions have an RI of 5 or higher, and only two have an RI above 5. This indicates that at least half of the predictions are not very reliable.


Mutation Nr. AA change Prediction Reliability Index Expected Accuracy
1 G49S Neutral 3 78%
2 D63N Non-neutral 5 87%
3 H99R Neutral 5 89%
4 R159Q Non-neutral 7 96%
5 P221L Non-neutral 5 87%
6 S310N Neutral 0 53%
7 N409S Non-neutral 1 63%
8 H350R Non-neutral 8 96%
9 M455V Non-neutral 3 78%
10 L509P Non-neutral 1 63%
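
The counts behind the statements about the reliability index can be reproduced with a short sketch. The tuples are copied from the table above. The helper name is illustrative.

```python
# (mutation, SNAP prediction, reliability index) copied from the table above.
SNAP_RESULTS = [
    ("G49S", "Neutral", 3),
    ("D63N", "Non-neutral", 5),
    ("H99R", "Neutral", 5),
    ("R159Q", "Non-neutral", 7),
    ("P221L", "Non-neutral", 5),
    ("S310N", "Neutral", 0),
    ("N409S", "Non-neutral", 1),
    ("H350R", "Non-neutral", 8),
    ("M455V", "Non-neutral", 3),
    ("L509P", "Non-neutral", 1),
]


def with_min_ri(results, min_ri):
    """Names of all mutations whose reliability index is at least min_ri."""
    return [name for name, _, ri in results if ri >= min_ri]


neutral_calls = [name for name, prediction, _ in SNAP_RESULTS if prediction == "Neutral"]
ri_at_least_5 = with_min_ri(SNAP_RESULTS, 5)
ri_above_5 = with_min_ri(SNAP_RESULTS, 6)

print("neutral:", neutral_calls)
print("RI >= 5:", ri_at_least_5)
print("RI > 5:", ri_above_5)
```

Whether "half" of the predictions count as reliable therefore depends on whether an RI of exactly 5 is included.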

SIFT

SIFT (Sorting Intolerant From Tolerant) is a method which predicts whether an amino acid substitution affects protein function or not. The method is based on the assumption that important amino acids are conserved in the protein family and that changes at conserved positions therefore tend to be deleterious. Substitutions with a score less than 0.05 are predicted to be deleterious. SIFT was introduced in 2003 by Ng P. and Henikoff S. <ref>Ng P., Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003 July 1; 31(13): 3812–3814. </ref>

Usage

Results

Figure 4: Multiple Sequence Alignment on which the SIFT predictions are based.

The predictions made by SIFT are based on the multiple alignment shown in Figure 4. The predictions for the chosen mutations are shown in the table below. Half of the mutations are predicted to affect protein function and the other half are predicted to be tolerated.


Mutation Nr. Prediction Score Sequence Conservation
1 tolerated 0.51 3.05
2 tolerated 0.06 3.05
3 tolerated 0.62 3.04
4 affect protein function 0.03 3.01
5 affect protein function 0.00 3.01
6 tolerated 0.54 3.01
7 affect protein function 0.05 3.02
8 affect protein function 0.00 3.11
9 tolerated 0.12 3.01
10 affect protein function 0.01 3.09
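
The cutoff can be applied to the scores in the table as sketched below. Note that Mutation 7 is reported as affecting protein function although its printed score is 0.05. The sketch therefore assumes a "less than or equal" comparison on the rounded scores, which reproduces the table; the unrounded scores are not available here. The tolerance used to flag borderline cases is an arbitrary illustrative choice.

```python
# Cutoff quoted in the text. Assumption: "<=" is used on the rounded scores,
# because this reproduces the call for Mutation 7 (printed score 0.05).
SIFT_CUTOFF = 0.05

# SIFT scores per mutation number, copied from the table above.
SIFT_SCORES = {
    1: 0.51, 2: 0.06, 3: 0.62, 4: 0.03, 5: 0.00,
    6: 0.54, 7: 0.05, 8: 0.00, 9: 0.12, 10: 0.01,
}


def sift_call(score: float, cutoff: float = SIFT_CUTOFF) -> str:
    """Classify a rounded SIFT score against the cutoff."""
    return "affect protein function" if score <= cutoff else "tolerated"


sift_calls = {nr: sift_call(score) for nr, score in SIFT_SCORES.items()}

# Scores close to the cutoff deserve caution (tolerance chosen arbitrarily).
borderline = [nr for nr, score in SIFT_SCORES.items()
              if abs(score - SIFT_CUTOFF) <= 0.015]

for nr, call in sift_calls.items():
    print(nr, SIFT_SCORES[nr], call)
print("borderline:", borderline)
```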

PolyPhen-2

PolyPhen-2 (Polymorphism Phenotyping v2) predicts the possible structural and functional influence of an amino acid substitution in a human protein. The method uses three structure-based and eight sequence-based predictive features. Two datasets were used to train and test PolyPhen-2: HumDiv and HumVar. The method was introduced by Adzhubei et al. in 2010. <ref>Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods 7(4):248-249 (2010).</ref>

Usage

Results

The results of PolyPhen-2 are listed in the table below. The predictions for Mutation 7 differ depending on the training set: with HumDiv it is predicted to be possibly damaging, whereas the prediction based on HumVar indicates that the mutation is benign. For the other mutations the predictions are the same, varying only slightly in score, sensitivity and specificity.


Mutation Nr. | HumDiv Prediction | HumDiv Score | HumDiv Sensitivity | HumDiv Specificity | HumVar Prediction | HumVar Score | HumVar Sensitivity | HumVar Specificity
1 | probably damaging | 0.997 | 0.40 | 0.98 | probably damaging | 0.992 | 0.44 | 0.97
2 | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 0.999 | 0.08 | 1.00
3 | benign | 0.000 | 1.00 | 0.00 | benign | 0.000 | 1.00 | 0.00
4 | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00
5 | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 1.000 | 0.00 | 1.00
6 | benign | 0.100 | 0.94 | 0.85 | benign | 0.120 | 0.91 | 0.67
7 | possibly damaging | 0.573 | 0.88 | 0.91 | benign | 0.131 | 0.90 | 0.68
8 | probably damaging | 1.000 | 0.00 | 1.00 | probably damaging | 0.999 | 0.08 | 1.00
9 | probably damaging | 0.999 | 0.14 | 0.99 | probably damaging | 0.980 | 0.54 | 0.95
10 | probably damaging | 0.978 | 0.75 | 0.96 | probably damaging | 0.966 | 0.59 | 0.94

Discussion

Overview

The following table indicates whether the mutation is rather neutral or non-neutral according to the different analysis steps. The different results are classified as follows:

  • Amino-Acid Properties: neutral, if the properties are the same, non-neutral otherwise.
  • Substitution Matrix Scores: neutral, if the scores are high, non-neutral otherwise.
  • PSSM: neutral, if the score is greater than zero, non-neutral otherwise.
  • Conservation: neutral, if the mutated amino acid appears in the alignment, non-neutral otherwise.
  • Secondary Structure: neutral, if the mutation is situated in a region without an assigned secondary structure element, non-neutral otherwise.
  • SNAP, SIFT, PolyPhen: according to the prediction.

[Detailed descriptions of the applied analysis steps are given in the section above.]

Mutation | Amino-Acid Properties | BLOSUM62 | PAM1 | PAM250 | PSSM | Conservation | Secondary Structure | SNAP | SIFT | PolyPhen-2 HumDiv | PolyPhen-2 HumVar
1 | non-neutral | neutral | neutral | neutral | non-neutral | non-neutral | non-neutral | neutral | neutral | non-neutral | non-neutral
2 | non-neutral | neutral | neutral | neutral | non-neutral | non-neutral | neutral | non-neutral | neutral | non-neutral | non-neutral
3 | neutral | neutral | neutral | neutral | non-neutral | neutral | neutral | neutral | neutral | neutral | neutral
4 | non-neutral | neutral | neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
5 | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral
6 | neutral | neutral | neutral | neutral | neutral | non-neutral | neutral | neutral | neutral | neutral | neutral
7 | neutral | neutral | neutral | neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral
8 | non-neutral | neutral | neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral
9 | non-neutral | neutral | neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | neutral | non-neutral | non-neutral
10 | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral | non-neutral


As one can see, the indications of the different analysis steps disagree for most mutations. Only for Mutation 10 does every step indicate that the mutation is non-neutral. The different steps are not equally significant, and therefore a detailed analysis is given for each mutation in the following section.
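
The degree of agreement between the analysis steps can be tallied with a short sketch. Each row of the overview table is encoded as a string with one character per column ("x" for non-neutral, "n" for neutral). The encoding and the unweighted count are illustrative only. They are not the weighting used in the detailed discussion below.

```python
# Column order of the overview table.
METHODS = [
    "Amino-Acid Properties", "BLOSUM62", "PAM1", "PAM250", "PSSM",
    "Conservation", "Secondary Structure", "SNAP", "SIFT",
    "PolyPhen-2 HumDiv", "PolyPhen-2 HumVar",
]

# One string per mutation: 'x' = non-neutral, 'n' = neutral (illustrative encoding).
OVERVIEW = {
    1: "xnnnxxxnnxx",
    2: "xnnnxxnxnxx",
    3: "nnnnxnnnnnn",
    4: "xnnnxxxxxxx",
    5: "xxxxxxnxxxx",
    6: "nnnnnxnnnnn",
    7: "nnnnnxxxxxn",
    8: "xnnnxxxxxxx",
    9: "xnnxxxxxnxx",
    10: "xxxxxxxxxxx",
}

# Every row must have exactly one entry per method.
assert all(len(row) == len(METHODS) for row in OVERVIEW.values())

# Number of analysis steps that call each mutation non-neutral.
non_neutral_votes = {nr: row.count("x") for nr, row in OVERVIEW.items()}
# Mutations on which all steps agree (all 'x' or all 'n').
unanimous = [nr for nr, row in OVERVIEW.items() if len(set(row)) == 1]

for nr, votes in non_neutral_votes.items():
    print(f"Mutation {nr}: {votes} of {len(METHODS)} steps say non-neutral")
print("unanimous:", unanimous)
```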


Mutation 1: Gly - Ser (Pos. 49/10)

BLOSUM62: 0 (worst: -4)
PAM1: 16 (worst: 0)
PAM250: 9 (worst: 1)
PSSM: -1
Conservation (all/best 25): 0.0/11.0
Secondary structure (Uniprot): Beta sheet
PSIPRED: Coil
JPred3: -
DSSP: Turn
SNAP: Neutral (Reliability Index: 3, Expected Accuracy: 78%)
SIFT: tolerated (Score: 0.51, Sequence Conservation: 3.05)
PolyPhen-2 HumDiv: probably damaging (Score: 0.997, Sensitivity: 0.40, Specificity: 0.98)
PolyPhen-2 HumVar: probably damaging (Score: 0.992, Sensitivity: 0.44, Specificity: 0.97)

Mutation 1 changes the aliphatic, nonpolar and neutral amino acid Glycine to the aliphatic, polar and neutral amino acid Serine. The PAM1 and PAM250 substitution matrices indicate that this substitution is very common, as their values (16 and 9) are high. The BLOSUM62 value of 0 also stands for a common substitution. The PSSM score of -1 is low, which means that the substitution is not common at this position.
The conservation of this position in the multiple sequence alignment is, on the other hand, very high: if the 25 best homologous sequences are aligned, the position is fully conserved and no other protein shows an amino acid exchange. The conservation in the multiple sequence alignment of all proteins found with BLAST is 0 because the alignment has gaps at that position.
In Uniprot this position is annotated as part of a beta sheet, whereas the different tools predicted it as part of a coil or turn. As the residue lies at the exterior of the protein, as can be seen in Figure 2, the mutation should not affect the function of the protein much.
SNAP classifies the mutation as neutral with an expected accuracy of 78% and SIFT classifies it as tolerated with a score of 0.51, whereas PolyPhen-2 predicts it as probably damaging with both datasets. Based on the results listed above, one tends to classify the mutation as neutral: it is located at the exterior of the protein, the substitution matrices indicate that the substitution from Glycine to Serine is common, and two of the prediction tools predict the mutation to be harmless. But this classification is not clear-cut: the high conservation in the multiple sequence alignment and the predictions of PolyPhen-2 indicate otherwise.

According to HGMD, this mutation is associated with Gaucher Disease and is therefore damaging and affects the function of the protein. The prediction made in this analysis was thus not correct.

Mutation 2: Asp - Asn (Pos. 63/24)

BLOSUM62: 1 (worst: -4)
PAM1: 36 (worst: 0)
PAM250: 7 (worst: 1)
PSSM: 0
Conservation (all/best 25): 4.0/11.0
Secondary structure (Uniprot): -
PSIPRED: Coil
JPred3: -
DSSP: -
SNAP: Non-neutral (Reliability Index: 5, Expected Accuracy: 87%)
SIFT: tolerated (Score: 0.06, Sequence Conservation: 3.05)
PolyPhen-2 HumDiv: probably damaging (Score: 1.000, Sensitivity: 0.00, Specificity: 1.00)
PolyPhen-2 HumVar: probably damaging (Score: 0.999, Sensitivity: 0.08, Specificity: 1.00)

Mutation 2 exchanges the aliphatic, polar and acidic Aspartic acid for the aliphatic, polar and neutral amino acid Asparagine. The position thereby loses its acidic character, which may change the structure of the protein.
The PAM1 and PAM250 values for this substitution are relatively high, so it is very common. BLOSUM62 has a value of 1, which also means that the substitution is common. The PSSM value of 0 is not negative, but it is too low to count as neutral. In the alignment the conservation is also very high, and no protein with a substitution to Asparagine was found. So the mutation may be non-neutral.
In the structure the residue is part of a coil at the exterior of the protein, which agrees with our secondary structure predictions. So the mutation may not influence the function of the protein much, because it is not in the functional part of the protein.
SNAP predicts the mutation as non-neutral with an expected accuracy of 87%. PolyPhen-2 also classifies the mutation as probably damaging, with a score as high as 1.0. Only SIFT predicts it as tolerated, but its score of 0.06 is only just above the cutoff of 0.05. So the mutation may be damaging.
Based on these results the mutation is hard to classify. All in all we would tend to classify it as non-neutral, because the change from an acidic to a neutral amino acid is likely to have an effect. The high conservation also indicates the importance of this position, and the majority of the prediction tools classify the mutation as damaging.
As the mutation is listed in HGMD, it affects the function of the protein. It is associated with Gaucher disease 1 and was published in 2005. <ref>Identification and functional characterization of five novel mutant alleles in 58 Italian patients with Gaucher disease type 1. Miocić S, Filocamo M, Dominissini S, Montalvo AL, Vlahovicek K, Deganuto M, Mazzotti R, Cariati R, Bembi B, Pittis MG. Hum Mutat. 2005</ref>

Mutation 3: His - Arg (Pos. 99/60)

BLOSUM62: 0 (worst: -3)
PAM1: 10 (worst: 0)
PAM250: 6 (worst: 1)
PSSM: -1
Conservation (all/best 25): 3.0/8.0
Secondary structure (Uniprot): -
PSIPRED: Coil
JPred3: -
DSSP: -
SNAP: Neutral (Reliability Index: 5, Expected Accuracy: 89%)
SIFT: tolerated (Score: 0.62, Sequence Conservation: 3.04)
PolyPhen-2 HumDiv: benign (Score: 0.000, Sensitivity: 1.00, Specificity: 0.00)
PolyPhen-2 HumVar: benign (Score: 0.000, Sensitivity: 1.00, Specificity: 0.00)

Mutation 3 is the change from the aromatic, polar and slightly basic Histidine to the aliphatic, polar and very basic Arginine. As both are basic and have the same charge, the exchange may not influence the function of the protein much.
The values in the PAM substitution matrices are again relatively high, so the change from Histidine to Arginine is common. The BLOSUM62 value is lower, but still indicates a common substitution. The interesting part here is the alignment: Histidine is not highly conserved, and many sequences have an Arginine at that position. So there are proteins of other mammals that acquired this change during evolution, and the exchange does not seem to influence the function of the protein. Interestingly, the PSSM value does not reflect this: at -1 it is too low to indicate a common substitution.
Histidine is part of a coil in the secondary structure and is situated at the exterior of the protein. So it is not in the functional part of the protein and its influence may be low.
SNAP, SIFT and PolyPhen-2 predict the mutation as neutral, tolerated and benign, respectively. Taken together, these results lead to the assumption that the mutation is neutral. The convincing points are the alignment, in which several sequences carry an Arginine instead of Histidine, and the unanimous results of the prediction tools.
The source of the SNP confirms what we expected: the mutation is not listed in HGMD and is therefore not associated with Gaucher Disease.

Mutation 4: Arg - Gln (Pos. 159/120)

BLOSUM62: 1 (worst: -3)
PAM1: 9 (worst: 0)
PAM250: 5 (worst: 1)
PSSM: -2
Conservation (all/best 25): 0.0/11.0
Secondary structure (Uniprot): Beta sheet
PSIPRED: Beta sheet
JPred3: Beta sheet
DSSP: Bend
SNAP: Non-neutral (Reliability Index: 7, Expected Accuracy: 96%)
SIFT: affect protein function (Score: 0.03, Sequence Conservation: 3.01)
PolyPhen-2 HumDiv: probably damaging (Score: 1.000, Sensitivity: 0.00, Specificity: 1.00)
PolyPhen-2 HumVar: probably damaging (Score: 1.000, Sensitivity: 0.00, Specificity: 1.00)

Mutation 4 is the change from the aliphatic, polar and very basic Arginine to the aliphatic, polar and neutral Glutamine. According to the PAM substitution matrices the exchange is not rare: the values are relatively high, so it is common. BLOSUM62 has a value of 1, which is not as high as in the PAM matrices but also indicates a common change. The PSSM value is low and thus indicates a rare change.
The alignment shows a very high conservation for the best 25 hits. All but two also have an Arginine at that position; the other two show a substitution with Tryptophan, but none has a Glutamine. The low conservation for all sequences is again due to gaps in the alignment.
This Arginine is part of a beta sheet and plays an important role in forming hydrogen bonds with the active site, so a mutation should affect the function of the protein. As the amino acid changes from basic to neutral, it loses chemical properties that are necessary for the protein's function.
SNAP, SIFT and PolyPhen-2 predict the mutation as damaging. The expected accuracy of SNAP is very high at 96%, and the prediction of PolyPhen-2 has a score of 1.000.
All these results together indicate a damaging mutation. The chemical properties of the amino acid change, and as the residue plays an important role in forming hydrogen bonds with the active site, the mutation should influence the function of the protein. The prediction tools also classify the mutation as damaging.
In HGMD it is associated with Gaucher Disease 1, which was already published in 1988. <ref>Gaucher disease type 1: cloning and characterization of a cDNA encoding acid beta-glucosidase from an Ashkenazi Jewish patient. Graves PN, Grabowski GA, Eisner R, Palese P, Smith FI. DNA. 1988 Oct;7(8):521-8.</ref>

Mutation 5: Pro - Leu (Pos. 221/182)

BLOSUM62: -3 (worst: -4)
PAM1: 3 (worst: 0)
PAM250: 5 (worst: 1)
PSSM: -5
Conservation (all/best 25): 0.0/11.0
Secondary structure (Uniprot): -
PSIPRED: Coil
JPred3: -
DSSP: -
SNAP: Non-neutral (Reliability Index: 5, Expected Accuracy: 87%)
SIFT: affect protein function (Score: 0.00, Sequence Conservation: 3.01)
PolyPhen-2 HumDiv: probably damaging (Score: 1.000, Sensitivity: 0.00, Specificity: 1.00)
PolyPhen-2 HumVar: probably damaging (Score: 1.000, Sensitivity: 0.00, Specificity: 1.00)

Mutation number 5 is the change from the heterocyclic, nonpolar and neutral Proline to the aliphatic, nonpolar and neutral Leucine. As Proline forms a ring structure, the mutation may change the structure of the protein. Apart from that, the polarity and thus the basic chemical properties remain the same.
The substitution is not as common as the mutations mentioned before. BLOSUM62 has a very low value of -3, and the PAM matrices, with values of 3 and 5, also show that the substitution is rare. The PSSM value is very low as well. The alignment shows a high conservation: the sequences found share the Proline at this position. So the mutation may not be tolerated.
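As a rough illustration of how the matrix values can be compared across mutations, the following Python sketch collects the BLOSUM62 scores quoted in the tables for mutations 5 to 10 and sorts them from least to most favourable. The names `BLOSUM62_SCORES` and `is_rare` and the cutoff of 0 are assumptions made for this example; they are not part of the original analysis.

```python
# BLOSUM62 scores as quoted in the tables for mutations 5-10 on this page.
BLOSUM62_SCORES = {
    ("Pro", "Leu"): -3,  # mutation 5
    ("Ser", "Asn"): 1,   # mutation 6
    ("Asn", "Ser"): 1,   # mutation 7
    ("His", "Arg"): 0,   # mutation 8
    ("Met", "Val"): 1,   # mutation 9
    ("Leu", "Pro"): -3,  # mutation 10
}


def is_rare(score, cutoff=0):
    """Treat scores below the (assumed) cutoff as rare substitutions."""
    return score < cutoff


# Sort from the least to the most favourable substitution.
ranked = sorted(BLOSUM62_SCORES.items(), key=lambda item: item[1])
for (wild_type, mutant), score in ranked:
    label = "rare" if is_rare(score) else "common"
    print(f"{wild_type} -> {mutant}: {score:+d} ({label})")
```

With this cutoff only the two exchanges involving Proline (mutations 5 and 10) are flagged as rare, which matches the reading of the BLOSUM62 values given in the text.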
The Proline is situated in a coil in the interior of the protein. It is not part of the active site, but as the interior is the functional part of the protein, the mutation may not be tolerated.
SNAP, SIFT and Polyphen2 predict the mutation as non-neutral, "affect protein function" and probably damaging, respectively, which is what we expected. All in all we would classify the mutation as damaging.
In HGMD it is associated with Gaucher Disease 2. The mutation was published in 2004. <ref>Functional analysis of 13 GBA mutant alleles identified in Gaucher disease patients: Pathogenic changes and "modifier" polymorphisms. Montfort M, Chabás A, Vilageliu L, Grinberg D. Hum Mutat. 2004 Jun;23(6):567-75.</ref>

Mutation 6: Ser - Asn (Pos. 310/271)

BLOSUM62 PAM1 PAM250 PSSM Conservation
(all/best 25)
Secondary structure (Uniprot) Psipred Jpred3 DSSP
1 (worst: -3) 20 (worst: 0) 5 (worst: 1) 2 0.0/11.0 - Coil - Turn
SNAP SIFT Polyphen2 HumDiv Polyphen2 HumVar
Prediction: Neutral
Reliability Index: 0
Expected Accuracy: 53%
Prediction: tolerated
Score: 0.54
Sequence Conservation: 3.01
Prediction: benign
Score: 0.100
Sensitivity: 0.94
Specificity: 0.85
Prediction: benign
Score: 0.120
Sensitivity: 0.91
Specificity: 0.67

Mutation number 6 is the change from the aliphatic, polar and neutral Serine to the aliphatic, polar and neutral Asparagine. Both have uncharged polar side chains and therefore share chemical properties. The PAM substitution matrices show relatively high values, so the substitution is common. BLOSUM62 has a lower value, but one that also indicates a common substitution. The PSSM value is greater than zero, so the mutation may be tolerated.
The conservation is very high. The best 25 hits all have Serine at this position. Among the further hits there are also some sequences with a Glycine at this position, but none with an Asparagine.
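The per-position check described above amounts to counting the residues in one alignment column. A minimal Python sketch follows; the column content is a hypothetical toy example and not the real alignment, and `column_conservation` is a name introduced here for illustration only.

```python
from collections import Counter


def column_conservation(column, reference):
    """Return (fraction of residues equal to `reference`, residue counts).

    Gap characters ('-') are ignored, so gaps do not lower the fraction.
    """
    residues = [aa for aa in column if aa != "-"]
    counts = Counter(residues)
    if not residues:
        return 0.0, counts
    return counts[reference] / len(residues), counts


# Hypothetical toy column: mostly Serine (S), a few Glycine (G), two gaps.
toy_column = "SSSSSSSSGGS-S-"
fraction, counts = column_conservation(toy_column, "S")
print(f"Serine fraction: {fraction:.2f}")
print("Asparagine observed:", counts["N"] > 0)
```

Ignoring gaps is a design choice in this sketch. Counting them as mismatches would lower the fraction, which is in line with the explanation given earlier for the low conservation values over all sequences.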
This position is part of a coil on the exterior of the protein and not near the active site. Therefore the mutation might not influence the function of the protein much.
SNAP predicts the mutation as neutral with an expected accuracy of 53%. SIFT predicts it as tolerated with a score of 0.54, and Polyphen2 predicts it as benign.
All these results together would indicate a tolerated mutation. The amino acids have the same properties and even the PSSM value is positive, so we would classify it as a neutral mutation.
However, the mutation is associated with Gaucher Disease, as published in 1997.<ref>Identification and expression of acid beta-glucosidase mutations causing severe type 1 and neurologic type 2 Gaucher disease in non-Jewish patients. Grace ME, Desnick RJ, Pastores GM. J Clin Invest. 1997 May 15;99(10):2530-7</ref> So the prediction is wrong: the mutation is not tolerated and changes the function of the protein. Although the confidence of the predictions was not that high (SNAP's expected accuracy is only 53%), none of the tools predicted it as damaging.

Mutation 7: Asn - Ser (Pos. 409/370)

BLOSUM62 PAM1 PAM250 PSSM Conservation
(all/best 25)
Secondary structure (Uniprot) Psipred Jpred3 DSSP
1 (worst: -4) 34 (worst: 0) 8 (worst: 1) 1 0.0/11.0 Helix Helix Helix Helix
SNAP SIFT Polyphen2 HumDiv Polyphen2 HumVar
Prediction: Non-neutral
Reliability Index: 1
Expected Accuracy: 63%
Prediction: affect protein function
Score: 0.05
Sequence Conservation: 3.02
Prediction: possibly damaging
Score: 0.573
Sensitivity: 0.88
Specificity: 0.91
Prediction: benign
Score: 0.131
Sensitivity: 0.90
Specificity: 0.68

Mutation number 7 is the substitution of the aliphatic, polar and neutral Asparagine by the likewise aliphatic, polar and neutral Serine. This is the most common mutation found in Gaucher disease type 1 patients.
According to the substitution matrices the mutation is very common. PAM1 has a very high value of 34, as does PAM250 with 8. BLOSUM62 shows the same tendency with a value of 1. The PSSM value is positive as well.
The conservation in the alignment is very high. All sequences show an Asparagine at that position, so there is no observed substitution to Serine that would suggest the change is accepted.
The mutation is situated in a helix in the interior of the protein. So it may affect the protein's function and therefore may not be tolerated.
SNAP predicts the mutation as non-neutral and SIFT as affecting protein function. Interestingly, Polyphen2 HumDiv predicts it as only possibly damaging and Polyphen2 HumVar even as benign.
These results are hard to interpret. The amino acids have the same properties, the substitution matrices indicate a neutral change and the PSSM value is positive as well. The only signs of a damaging mutation are the position in the interior of the protein and the results of SNAP and SIFT. As two of our prediction tools classify the mutation as damaging and the conservation is very high, we would tend to classify it as damaging. But it is hard to decide whether there are enough reasons to do so.
We know that the mutation is associated with Gaucher Disease 1, which was first published in 1988<ref>Genetic heterogeneity in type 1 Gaucher disease: multiple genotypes in Ashkenazic and non-Ashkenazic individuals. Tsuji S, Martin BM, Barranger JA, Stubblefield BK, LaMarca ME, Ginns EI. Proc Natl Acad Sci U S A. 1988 Apr;85(7):2349-52.</ref> and is supported by many other sources given in HGMD. It is very interesting that the most common mutation found in Gaucher disease patients is this hard to classify.

Mutation 8: His - Arg (Pos. 350/311)

BLOSUM62 PAM1 PAM250 PSSM Conservation
(all/best 25)
Secondary structure (Uniprot) Psipred Jpred3 DSSP
0 (worst: -3) 10 (worst: 0) 6 (worst: 1) -3 0.0/9.0 Beta sheet Beta sheet - Bend
SNAP SIFT Polyphen2 HumDiv Polyphen2 HumVar
Prediction: Non-neutral
Reliability Index: 8
Expected Accuracy: 96%
Prediction: affect protein function
Score: 0.00
Sequence Conservation: 3.11
Prediction: probably damaging
Score: 1.0
Sensitivity: 0.00
Specificity: 1.00
Prediction: probably damaging
Score: 0.999
Sensitivity: 0.08
Specificity: 1.00

Mutation number 8 is the substitution of the aromatic, polar and slightly basic Histidine by the aliphatic, polar and strongly basic Arginine. It is situated in the interior of the protein and is part of a beta sheet. So the mutation may cause a change in the protein's function.
In BLOSUM62 the replacement has a value of 0, so it is not uncommon. The values for PAM1 and PAM250 are higher, at 10 and 6, which would also indicate a common substitution. The PSSM value is again very low, so the mutation is rare at this position. The conservation is relatively high: there are only two sequences with a substitution to Glutamic acid and none with a change to Arginine. Therefore the mutation may not be accepted.
SNAP predicts the mutation as non-neutral, SIFT as "affect protein function" and Polyphen2 as probably damaging.
All these results together point to a damaging mutation. All prediction tools classify it as damaging, and the change from an aromatic to an aliphatic amino acid in a beta sheet also indicates a non-neutral change. So we would classify it as damaging.
The mutation is indeed associated with Gaucher Disease 2, as published in 1999.<ref>Is the perinatal lethal form of Gaucher disease more common than classic type 2 Gaucher disease? Stone DL, van Diggelen OP, de Klerk JB, Gaillard JL, Niermeijer MF, Willemsen R, Tayebi N, Sidransky E. Eur J Hum Genet. 1999 May-Jun;7(4):505-9.</ref> So the prediction is right.

Mutation 9: Met - Val (Pos. 455/416)

BLOSUM62 PAM1 PAM250 PSSM Conservation
(all/best 25)
Secondary structure (Uniprot) Psipred Jpred3 DSSP
1 (worst: -3) 17 (worst: 0) 4 (worst: 1) -1 0.0/11.0 Helix Coil/Helix Helix Helix
SNAP SIFT Polyphen2 HumDiv Polyphen2 HumVar
Prediction: Non-neutral
Reliability Index: 3
Expected Accuracy: 78%
Prediction: tolerated
Score: 0.12
Sequence Conservation: 3.01
Prediction: probably damaging
Score: 0.999
Sensitivity: 0.14
Specificity: 0.99
Prediction: probably damaging
Score: 0.980
Sensitivity: 0.54
Specificity: 0.95

Mutation number 9 is the substitution of the aliphatic, nonpolar and neutral Methionine by the aliphatic, nonpolar and neutral Valine. Interestingly, Methionine contains a sulfur atom and therefore has chemical properties that Valine lacks. As it is also situated in a helix in an important part of the protein, the change may affect the protein's function.
The BLOSUM62 value indicates a common substitution and PAM1 also shows a relatively high value, whereas PAM250 has a relatively low value, which suggests a rare substitution. The PSSM value is negative. The conservation in the alignment is very high and there is no other amino acid at this position. Taken together, these are signs of a non-neutral mutation.
SNAP and Polyphen2 predict the mutation as non-neutral and probably damaging, respectively. Only SIFT predicts it as tolerated.
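SIFT reports a score between 0 and 1, and its two labels follow from a cutoff on that score. The sketch below assumes the commonly used cutoff of 0.05, with scores at or below it labelled "affect protein function". The cutoff and the function name `sift_label` are assumptions for this example, while the scores are the ones listed in the tables for mutations 5 to 10.

```python
SIFT_CUTOFF = 0.05  # assumed cutoff, not stated on this page


def sift_label(score, cutoff=SIFT_CUTOFF):
    """Map a SIFT score to the label used in the tables."""
    return "affect protein function" if score <= cutoff else "tolerated"


# SIFT scores as listed in the tables for mutations 5-10.
sift_scores = {5: 0.00, 6: 0.54, 7: 0.05, 8: 0.00, 9: 0.12, 10: 0.01}

for number, score in sift_scores.items():
    print(f"Mutation {number}: score {score:.2f} -> {sift_label(score)}")
```

Under this assumption the labels agree with those in the tables. Mutation 9 (0.12) lies fairly close to the cutoff, and mutation 7 (0.05) sits exactly on it.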
All these results together indicate a non-neutral change. We would classify it as damaging because the substitution is rare and the chemical properties of the amino acid change. The majority of the prediction tools also classify it as damaging.
In HGMD it is associated with Gaucher Disease 2, which was published in 2005.<ref>Novel mutations in type 2 Gaucher disease in Chinese and their functional characterization by heterologous expression. Tang NL, Zhang W, Grabowski GA, To KF, Choy FY, Ma SL, Shi HP. Hum Mutat. 2005 Jul;26(1):59-60.</ref> So the prediction of SIFT is wrong, while SNAP and Polyphen2 predict what we expected.

Mutation 10: Leu - Pro (Pos. 509/470)

BLOSUM62 PAM1 PAM250 PSSM Conservation
(all/best 25)
Secondary structure (Uniprot) Psipred Jpred3 DSSP
-3 (worst: -4) 2 (worst: 0) 3 (worst: 1) -6 0.0/9.0 Beta sheet Beta sheet Beta sheet Bend
SNAP SIFT Polyphen2 HumDiv Polyphen2 HumVar
Prediction: Non-neutral
Reliability Index: 1
Expected Accuracy: 63%
Prediction: affect protein function
Score: 0.01
Sequence Conservation: 3.09
Prediction: probably damaging
Score: 0.978
Sensitivity: 0.75
Specificity: 0.96
Prediction: probably damaging
Score: 0.966
Sensitivity: 0.59
Specificity: 0.94

Mutation number 10 is the change from the aliphatic, nonpolar and neutral Leucine to the heterocyclic, nonpolar and neutral Proline. Proline forms a ring structure, so the change might also influence the protein folding. The residue is part of a beta sheet, but on the exterior of the protein, so the change may not affect the function of the protein.
The values in the substitution matrices are very low, in BLOSUM62 as well as in the PAM matrices, so it is not a common substitution. The PSSM value is also very low. The conservation is relatively high, although there are some substitutions to Valine and Phenylalanine. All of this indicates a non-neutral mutation.
SNAP, SIFT and Polyphen2 all predict the mutation as damaging (non-neutral, "affect protein function" and probably damaging, respectively). We would therefore classify it as non-neutral, because all results point in that direction.
However, we do not know whether it really has such an influence on the protein's function. The mutation is only found in dbSNP and not in HGMD, so it may be harmless.

Summary

We tried to classify ten different mutations. To do so, we first compared the chemical properties of the wild-type and mutated amino acids and checked whether the position is part of an important secondary structure element. We also looked at the values in the substitution matrices PAM1, PAM250 and BLOSUM62 as well as in the PSSM. Second, we inspected the multiple alignment with mammalian sequences. Third, we used the prediction methods SNAP, SIFT and PolyPhen2.
If the majority of the methods agree on a prediction, classification is easy. But when the prediction methods disagreed with each other or with the assumptions we made from the substitution matrices or chemical properties, it was hard to decide how to classify the mutation. The most interesting mutation was number seven: the results differ a lot, and they do not clearly indicate that the mutation is damaging. This is surprising, because it is the most common mutation in Gaucher Disease. It is also interesting that we predicted two of the ten mutations incorrectly, namely number 6 and number 10. By all results the first one seems to be tolerated, but according to HGMD it is not. The second one appears to be damaging, but as it is not listed in HGMD it may be tolerated. This shows that all evidence can point to a tolerated mutation although in reality it is not tolerated. We also learned that there is no "master method" to classify a mutation. The most important thing is to consider all results together, weight them correctly and then decide how to classify a mutation.

Additional analyses might help to classify mutations better. For these, see the structure-based mutation analysis and the combined (sequence and structure) mutation analysis.
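The comparison described in this summary can be sketched as a simple majority vote over the three tools, checked against the database listing. The Python sketch below uses the predictions listed in the tables for mutations 5 to 10 (PolyPhen2 column: HumDiv). Several choices are assumptions made for this illustration: counting "possibly damaging" as a damaging vote, treating an HGMD entry as "damaging" and a dbSNP-only entry as "neutral", and the names `majority_vote` and `mismatches`.

```python
# Predictions for mutations 5-10 as listed in the tables on this page
# (PolyPhen2 column: HumDiv).
PREDICTIONS = {
    5: {"SNAP": "Non-neutral", "SIFT": "affect protein function", "PolyPhen2": "probably damaging"},
    6: {"SNAP": "Neutral", "SIFT": "tolerated", "PolyPhen2": "benign"},
    7: {"SNAP": "Non-neutral", "SIFT": "affect protein function", "PolyPhen2": "possibly damaging"},
    8: {"SNAP": "Non-neutral", "SIFT": "affect protein function", "PolyPhen2": "probably damaging"},
    9: {"SNAP": "Non-neutral", "SIFT": "tolerated", "PolyPhen2": "probably damaging"},
    10: {"SNAP": "Non-neutral", "SIFT": "affect protein function", "PolyPhen2": "probably damaging"},
}

# Whether the mutation is listed in HGMD, taken from the mutation table.
IN_HGMD = {5: True, 6: True, 7: True, 8: True, 9: True, 10: False}

# Assumption: these labels count as one vote for "damaging".
DAMAGING_LABELS = {
    "Non-neutral",
    "affect protein function",
    "probably damaging",
    "possibly damaging",
}


def majority_vote(tool_predictions):
    """Return 'damaging' if more than half of the tools vote damaging."""
    votes = sum(label in DAMAGING_LABELS for label in tool_predictions.values())
    return "damaging" if votes > len(tool_predictions) / 2 else "neutral"


def mismatches(predictions, in_hgmd):
    """List the mutations whose majority vote disagrees with the HGMD listing."""
    wrong = []
    for number, tools in predictions.items():
        expected = "damaging" if in_hgmd[number] else "neutral"
        if majority_vote(tools) != expected:
            wrong.append(number)
    return wrong


print(mismatches(PREDICTIONS, IN_HGMD))  # prints [6, 10]
```

Under these assumptions the vote disagrees with the database listing for mutations 6 and 10, the same two cases named in the summary. A plain vote gives every tool the same weight, so it does not capture the weighting of different results that the summary recommends.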

References

<references/>