Difference between revisions of "Gaucher Disease: Task 08 - Sequence-based mutation analysis"

From Bioinformatikpedia

Revision as of 21:17, 2 September 2013

LabJournal

Mutation Set

<figtable id="sele">

{| class="wikitable"
|+ Mutations
! colspan="3" | mRNA
! colspan="3" | Protein
|-
! Reference !! Sequence Position !! Codon change !! Codon Number !! Amino Acid change !! One letter code
|-
| rs368786234 || 656 || AGC ⇒ AGA || 77 || Ser ⇒ Arg || S77R
|-
| rs374003673 || 847 || AAT ⇒ AGT || 141 || Asn ⇒ Ser || N141S
|-
| CM880035 || - || CGG ⇒ CAG || 159 || Arg ⇒ Gln || R159Q
|-
| rs374591570 || 1062 || CTC ⇒ TTC || 213 || Leu ⇒ Phe || L213F
|-
| CM992894 || - || GGA ⇒ GAA || 241 || Gly ⇒ Glu || G241E
|-
| rs371083513 || 1470 || GTA ⇒ ATA || 349 || Val ⇒ Ile || V349I
|-
| CM960697 || - || ACG ⇒ ATG || 408 || Thr ⇒ Met || T408M
|-
| CM880036 || - || AAC ⇒ AGC || 409 || Asn ⇒ Ser || N409S
|-
| CM870010 || - || CTG ⇒ CCG || 483 || Leu ⇒ Pro || L483P
|-
| CM057072 || - || AAC ⇒ AGC || 501 || Asn ⇒ Ser || N501S
|}
Information about 10 randomly selected mutations for glucocerebrosidase taken from HGMD (CM...) and dbSNP (rs...).

</figtable>
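The codon changes listed in <xr id="sele"/> can be checked against the standard genetic code. The following is only a minimal sketch: the helper name <code>one_letter_code</code> is our own choice for illustration, and the codon table is built from the usual compact TCAG ordering of the standard code.

```python
from itertools import product

# Standard genetic code in the usual compact form: codons are enumerated
# with the bases in the order T, C, A, G (third position varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def one_letter_code(wt_codon, mut_codon, codon_number):
    """Translate both codons and build a label such as 'S77R'.

    Hypothetical helper for illustration, not part of any library.
    """
    wt = CODON_TABLE[wt_codon.upper()]
    mut = CODON_TABLE[mut_codon.upper()]
    return f"{wt}{codon_number}{mut}"


# Three rows of the mutation table as (wild-type codon, mutant codon, codon number)
for wt_codon, mut_codon, number in [("AGC", "AGA", 77),
                                    ("ACG", "ATG", 408),
                                    ("CTG", "CCG", 483)]:
    print(one_letter_code(wt_codon, mut_codon, number))
```

Running the same check over all ten rows is a quick way to catch a mistyped codon or amino acid label in the table.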

Mutation Analysis

In our analysis we looked more closely at the amino acid properties and how they change through the mutation. We analysed the structural difference between wild type (WT) and mutant. We also considered the secondary structure and distinguished between helix (H), sheet (E) and loop (C).

We also took two different substitution matrices into account, BLOSUM62 and PAM250. The PAM (Point Accepted Mutation) matrix we used has only positive integer values as scores and is not symmetric. Its score reflects the probability of one amino acid mutating into another. In contrast, the BLOSUM (BLOcks SUbstitution Matrix) also has negative integers and is symmetric. A positive score indicates that a substitution occurs more often than expected by chance. While a score of 0 shows that the substitution occurs as often as expected by chance, a negative score points to a substitution that is less frequent than expected. If one of our selected mutations has the worst possible substitution score for its amino acid, we highlighted the score in red in <xr id="ana"/>.

To also consider evolutionary information, we created two different PSSMs. These position-specific scoring matrices are based on alignments. Just like BLOSUM, a PSSM has positive and negative integer values as scores. A positive value shows that the substitution occurs more often than expected. Critical functional residues, such as active site residues, have high positive scores. One PSSM was created with a PSI-BLAST search. The other one is based on an alignment of all mammalian homologous sequences.
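The sign convention of BLOSUM-style and PSSM scores can be illustrated with a small log-odds calculation. This is a sketch under assumptions: the frequencies below are invented for illustration, the function name <code>log_odds_score</code> is our own, and the scale factor of 2 corresponds to the half-bit units in which BLOSUM62 is usually given.

```python
import math


def log_odds_score(observed_freq, expected_freq, scale=2.0):
    """Rounded log-odds score: scale * log2(observed / expected).

    observed_freq: how often the substitution is seen in the alignments
    expected_freq: how often it would be seen by chance
    scale: 2.0 gives half-bit units (assumption, as used for BLOSUM62)
    """
    return round(scale * math.log2(observed_freq / expected_freq))


# Invented example frequencies, only to show the sign of the score
print(log_odds_score(0.02, 0.01))   # seen twice as often as expected: positive
print(log_odds_score(0.01, 0.01))   # seen as often as expected: zero
print(log_odds_score(0.005, 0.01))  # seen half as often as expected: negative
```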


<figtable id="ana">

{| class="wikitable"
|+ Mutation Analysis
! rowspan="2" | Mutation
! colspan="2" | Changes of Physicochemical Properties
! colspan="2" | Structural Properties
! colspan="7" | Conservation
! rowspan="2" | Effect
|-
! From !! To !! Pymol Visualization !! Secondary Structure !! BLOSUM62 score !! PAM250 score !! PSSM score !! PSSM WT frequency !! PSSM mutant frequency !! MSA WT frequency !! MSA mutant frequency
|-
| S77R || polar, neutral charge, hydroxyl-containing || polar, positive, basic || Mutation of serine (blue) to arginine (orange) on position 77. || E || -1 || 6 || 1 || 11% || 9% || 64% || 2% || slightly negative
|-
| N141S || polar, neutral charge, amide || polar, neutral, hydroxyl-containing || Mutation of asparagine (blue) to serine (orange) on position 141. || H || 1 || 5 || 0 || 10% || 7% || 55% || 3% || neutral
|-
| R159Q || polar, positive charge, basic || polar, neutral, amide || Mutation of arginine (blue) to glutamine (orange) on position 159. || E || 1 || 5 || -4 || 83% || 0% || 86% || 0% || negative
|-
| L213F || nonpolar, neutral charge, aliphatic, hydrophobic || nonpolar, neutral, aromatic, hydrophobic || Mutation of leucine (blue) to phenylalanine (orange) on position 213. || E || 0 || 13 || 3 || 22% || 13% || 100% || 0% ||
|-
| G241E || nonpolar, neutral charge, aliphatic || polar, negative, acidic || Mutation of glycine (blue) to glutamic acid (orange) on position 241. || C || -2 || 9 || -1 || 10% || 3% || 83% || 0% ||
|-
| V349I || nonpolar, neutral charge, aliphatic, hydrophobic || nonpolar, neutral, aliphatic, hydrophobic || Mutation of valine (blue) to isoleucine (orange) on position 349. || E || 3 || 4 || 0 || 14% || 5% || 97% || 3% || neutral
|-
| T408M || polar, neutral charge, hydroxyl-containing || nonpolar, neutral, sulfur-containing || Mutation of threonine (blue) to methionine (orange) on position 408. || H || -1 || 5 || -1 || 4% || 2% || 82% || 0% ||
|-
| N409S || polar, neutral charge, amide || polar, neutral, hydroxyl-containing || Mutation of asparagine (blue) to serine (orange) on position 409. || H || 1 || 5 || 1 || 10% || 9% || 76% || 2% ||
|-
| L483P || nonpolar, neutral charge, aliphatic, hydrophobic || nonpolar, neutral, cyclic || Mutation of leucine (blue) to proline (orange) on position 483. || E || -3 || 5 || -3 || 29% || 1% || 100% || 0% || high negative
|-
| N501S || polar, neutral charge, amide || polar, neutral, hydroxyl-containing || Mutation of asparagine (blue) to serine (orange) on position 501. || E || 1 || 5 || -2 || 87% || 3% || 86% || 1% || negative
|}
Analysis of the chosen mutations of <xr id="sele"/> with respect to their properties, secondary structure and conservation. The secondary structure is classified as helix (H), sheet (E) or loop (C). If a mutation is the worst possible substitution for this amino acid, the substitution matrix score is coloured red. The effect of the mutation is based on our analysis. A detailed description can be read below.

</figtable>
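The WT and mutant frequencies in <xr id="ana"/> are column frequencies of an alignment. A minimal sketch of how such a frequency can be computed is given below. The toy alignment is invented, the function name <code>column_frequencies</code> is our own, and ignoring gap characters is an assumption rather than a documented property of the tools we used.

```python
from collections import Counter


def column_frequencies(sequences, position):
    """Relative residue frequencies of one alignment column.

    sequences: aligned sequences of equal length
    position: 1-based column index
    Gaps ('-') are ignored (assumption made for this sketch).
    """
    residues = [seq[position - 1] for seq in sequences if seq[position - 1] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {residue: n / len(residues) for residue, n in counts.items()}


# Invented toy alignment, not taken from our data
toy_alignment = ["MKLV", "MKLI", "MRLV", "M-LV", "MKFV"]

freqs = column_frequencies(toy_alignment, 3)
wt_frequency = freqs.get("L", 0.0)      # wild-type residue in this toy column
mutant_frequency = freqs.get("F", 0.0)  # mutant residue in this toy column
print(f"WT: {wt_frequency:.0%}, mutant: {mutant_frequency:.0%}")
```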

Based on the analysis summed up in <xr id="ana"/> we interpreted our mutations:

S77R : The biggest change affects the secondary structure. While serine has a short, neutral side chain, arginine has a much longer, positively charged side chain that probably clashes with the flexible loops in its environment. In addition to the change in charge, the residue switches from hydroxyl-containing to basic. This could destabilize the secondary structure, as the parallel sheet may no longer be held against the sheet containing this residue. The PSSM shows no high frequency for either the WT or the mutant. With scores of -1 (BLOSUM62) and 6 (PAM250), the substitution matrices classify the point mutation as not unexpected. We think the only effect comes from the structural change and is slightly negative.

N141S : The mutation causes no change in charge or polarity. The affected residue is located in a helix on the protein surface, which lets us assume that no effect occurs (neutral). The substitution matrices as well as the PSSM score support this opinion, as the scores (1 [BLOSUM62], 5 [PAM250] and 0 [PSSM]) mark the mutation as nearly random. The alignment frequencies also tell us that the mutant is rare (7% and 3%) and the WT is not strongly conserved (10% and 55%).

R159Q : The substitution changes the residue from basic and positively charged into an uncharged amide residue. The mutant has a much longer side chain which extends deep into the protein. These structural characteristics as well as the loss of the positive charge let us assume that a clash or another effect on the structure around this amino acid cannot be avoided. The scores, especially the evolution-based position-specific score (-4), confirm our assumption. Finally, the high frequency of the WT (83% and 86%) and the absence of the mutant in both alignments leave us no doubt about the severity of the mutation.

L213F : The amino acid, located in a sheet, turns from aliphatic to aromatic. Although this is a great structural change, there seem to be no clashes or other influences on the neighbourhood. Both substitution matrices indicate that this substitution occurs at random. Even though the PSSM shows that the mutation appears more often than random and the mutant can be seen in 13% of the aligned sequences, the MSA of homologous sequences contains only leucine at this position. We therefore think that the mutation may be neutral, but uncommon among the homologous sequences. As we are not quite sure, we classified it as possibly damaging.

G241E : There is a great change in the physicochemical properties caused by the mutation from glycine to glutamic acid, especially in charge and acidity. The residue lies in a loop on the protein surface. Its side chain does not extend into the protein. That is why the property change should not have an effect on the protein structure. The substitution scores deviate from each other. While BLOSUM62 shows an occurrence less frequent than random (-2), PAM250 indicates that this mutation happens more frequently (9). The different MSAs also show frequencies that make the data difficult to interpret. Based on the structural and property characteristics, we would assume the mutation to be neutral. But because of the remaining data, we are not sure and declare it as slightly damaging.

V349I : The mutation from valine to isoleucine makes no difference in the properties. Also, both branched-chain amino acids differ only slightly in their structure. PAM250 rates the substitution as rare, as the score of 4 is the worst for valine, although this score occurs for several amino acids. Based on evolutionary information, the PSSM score (0) defines the mutation as random. This is not surprising given their similarity. In the MSAs both amino acids appear at position 349, but valine more often. Based on this observation, we are convinced that this is a neutral mutation between similar amino acids, which occurs once in a while.

T408M : The mutant and the WT differ in their polarity. While threonine contains a hydroxyl group, methionine has a sulfur atom in its side chain. The side chain causes no clashes with other residues. This substitution happens slightly less often than random. Both amino acids are rare at position 408 in the PSSM alignment. Considering only homologous sequences, the WT occurs far more often (82%), and the mutant is not observed. We interpret these data as a neutral mutation.

N409S :

L483P : There seem to be no great changes in the physicochemical properties except the cyclic character of proline. The residue is located at the end of a sheet and may not be important for stabilising the structure, but the ring clearly clashes with the following residues of the adjoining loop, which can be severe for the structure. The BLOSUM62 and PSSM scores in particular (both -3) show that the mutation occurs rarely. The frequencies of wild type and mutant are very dissimilar (29% >> 1%). Looking at the evolutionary information of the homologs, we can see that the WT is present in all sequences. This gives us a hint of the severity of the mutation. A possible reason why the mutant never or only rarely shows up is that the mutation causes not only a disease but death. All information about this mutation leads us to identify a highly negative effect.

N501S :

Comparison of different approaches

First of all we interpreted the data collected in <xr id="ana"/>. After that we ran several predictors of mutation effects. All these results are summarized in <xr id="app"/>. Then we reassessed our analysis by considering the prediction results. In the end we validated our new interpretation (consensus in <xr id="app"/>) against the databases dbSNP and HGMD. In two cases, the two databases contained contradictory information. While HGMD identifies these mutations as disease causing, dbSNP classifies them as non-disease causing. That is why we marked them as possibly damaging. For two mutations (N141S, R159Q) all predictions, our interpretation and the validation agree completely. So we can say with confidence that the mutation of asparagine to serine at position 141 has no effect, whereas the mutation of arginine to glutamine at position 159 is definitely disease causing.


<figtable id="app">

{| class="wikitable"
|+ Summary of different prediction approaches
! Mutation !! Analysis of <xr id="ana"/> !! SIFT !! Polyphen2 (HumDiv) !! MutationTaster !! SNAP !! Consensus !! Validation
|-
| S77R || slightly negative effect || 0.37 || 0.17 || 90% || 2 || pdc || dbSNP
|-
| N141S || no effect || 0.15 || 0.01 || 83% || 5 || ndc || dbSNP
|-
| R159Q || negative effect || 0 || 1 || 100% || 7 || dc || HGMD
|-
| L213F || slightly negative effect || 0 || 0.79 || 100% || 0 || dc || HGMD[dc] vs dbSNP[ndc]
|-
| G241E || slightly negative effect || 0.01 || 0.89 || 100% || 0 || dc || HGMD[dc] vs dbSNP[ndc]
|-
| V349I || no effect || 0.25 || 0.12 || 100% || 5 || ndc || dbSNP
|-
| T408M || no effect || 0.03 || 0.11 || 60% || 5 || ndc || HGMD
|-
| N409S || Analysis of <xr id="ana"/> || 0.05 || 0.23 || 100% || 0 || || HGMD
|-
| L483P || high negative || 0 || 0.85 || 100% || 1 || dc || HGMD
|-
| N501S || Analysis of <xr id="ana"/> || 0 || 0.98 || 100% || 0 || || HGMD
|}
Information about selected mutations from different predictors of amino acid substitution effects as well as our own interpretation based on our data from the previous exercises of task 8. The consensus is our opinion of the effect based on the predictions and our analysis. We distinguish between three different kinds of effects: disease causing (dc), possibly damaging (pdc) and non-disease causing (ndc). The prediction scores are colored according to this. The validation contains the information from the databases (HGMD, dbSNP).

</figtable>
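The validation step can be written down as a small rule. The sketch below only mirrors how we read the validation column in <xr id="app"/>: an entry found only in HGMD counts as dc, only in dbSNP as ndc, and in both as pdc because the databases disagree. The function name and the data layout are our own choices for illustration, and the two mutations without a consensus (N409S, N501S) are left out.

```python
def validation_label(in_hgmd, in_dbsnp):
    """Map database membership to dc / ndc / pdc as read from the summary table."""
    if in_hgmd and in_dbsnp:
        return "pdc"  # contradictory database information
    if in_hgmd:
        return "dc"
    if in_dbsnp:
        return "ndc"
    raise ValueError("mutation is listed in neither database")


# (consensus, listed in HGMD, listed in dbSNP), copied from the summary table
SUMMARY = {
    "S77R": ("pdc", False, True),
    "N141S": ("ndc", False, True),
    "R159Q": ("dc", True, False),
    "L213F": ("dc", True, True),
    "G241E": ("dc", True, True),
    "V349I": ("ndc", False, True),
    "T408M": ("ndc", True, False),
    "L483P": ("dc", True, False),
}

# Mutations whose consensus matches the database-derived label
agreeing = [
    mutation
    for mutation, (consensus, in_hgmd, in_dbsnp) in SUMMARY.items()
    if consensus == validation_label(in_hgmd, in_dbsnp)
]
print(agreeing)
```

This compares only the consensus with the validation. It does not check whether every single predictor agrees as well, which is the stricter criterion used in the text for N141S and R159Q.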