Difference between revisions of "Sequence-based mutation analysis Gaucher Disease"

From Bioinformatikpedia
Revision as of 13:27, 17 June 2012

The aim of this task was to analyse ten mutations thoroughly and to classify each as disease-causing or non-disease-causing. The mutations were selected by another group from our set of mutations, so that their impact was unknown to us before this task. We investigated the mutations with respect to their physicochemical properties, structural features, and conservation, and used the tools SIFT, Polyphen2, and SNAP to predict their impact on the phenotype. To quantify the extent to which a mutation is disease-causing, we assigned a disease score, where -1 means non-disease-causing, 0 ambiguous, and 1 disease-causing. We averaged the disease scores to obtain a final prediction, which we compared with the true impact of the mutation on the phenotype. Technical details are reported in our protocol.

Mutations

<xr id="tab:mutations"/> contains five randomly chosen Gaucher disease-causing and five non-disease-causing mutations. Disease-causing mutations were sampled from the HGMD, whereas non-disease-causing mutations were sampled from the set of mutations present in dbSNP but not in the HGMD. The reference sequence was P04062, which has a 39-residue signal peptide. The ten mutations listed in <xr id="tab:mutations"/> are investigated in the following sections.

<figtable id="tab:mutations">

Nr Position From To
1 99 H R
2 211 V I
3 150 E K
4 236 L P
5 248 W R
6 509 L P
7 351 W C
8 423 A D
9 482 D N
10 83 R S

Randomly selected mutations from HGMD and dbSNP which were used for the sequence-based mutation analysis. </figtable>
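
Because P04062 carries a 39-residue signal peptide, a position can be quoted either in precursor numbering or in mature-protein numbering. The following is a minimal sketch of the conversion. It assumes that the positions in the table count from the first residue of the precursor, which the text does not state explicitly; the function name `to_mature` is hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 39  # length of the signal peptide stated for P04062


def to_mature(position, signal_length=SIGNAL_PEPTIDE_LENGTH):
    """Convert a precursor position to mature-protein numbering.

    Assumption: `position` counts from the first residue of the precursor.
    """
    if position <= signal_length:
        raise ValueError("position lies inside the signal peptide")
    return position - signal_length


# (position, wild-type residue, mutant residue) as listed in the table
mutations = [(99, "H", "R"), (211, "V", "I"), (150, "E", "K"), (236, "L", "P"),
             (248, "W", "R"), (509, "L", "P"), (351, "W", "C"), (423, "A", "D"),
             (482, "D", "N"), (83, "R", "S")]

for pos, wt, mut in mutations:
    print(f"{wt}{pos}{mut} -> {wt}{to_mature(pos)}{mut}")
```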

Physicochemical properties

We compared the charge, polarity, size, and aromatic character of the wild-type and mutant amino acids and assigned a disease score of 1 to mutations that have a severe impact on the physicochemical properties (cf. <xr id="tab:props"/>). Mutation number 3 reverses the charge, since glutamate is acidic but lysine is basic. We also considered mutations number 5 and 7 disease-causing, as tryptophan is aromatic and nonpolar, in contrast to the target residues. Mutation number 8, which substitutes alanine (small and nonpolar) with the larger, acidic aspartate, might also affect the structure and function of the protein.

<figtable id="tab:props">

Nr  Wildtype                                    Mutant                                      Disease score
    AA  Charge    Polarity  Size    Aromatic    AA  Charge    Polarity  Size    Aromatic
1   H   positive  polar     large   no          R   positive  polar     large   no          -1
2   V   neutral   nonpolar  medium  no          I   neutral   nonpolar  medium  no          -1
3   E   negative  polar     large   no          K   positive  polar     large   no          1
4   L   neutral   nonpolar  medium  no          P   neutral   nonpolar  medium  no          -1
5   W   neutral   nonpolar  large   yes         R   positive  polar     large   no          1
6   L   neutral   nonpolar  medium  no          P   neutral   nonpolar  medium  no          -1
7   W   neutral   nonpolar  large   yes         C   neutral   polar     small   no          1
8   A   neutral   nonpolar  small   no          D   negative  polar     medium  no          1
9   D   negative  polar     medium  no          N   neutral   polar     medium  no          -1
10  R   positive  polar     large   no          S   neutral   polar     small   no          0

Physicochemical properties of the wild-type and mutant amino acids, which were used to classify each mutation as severe or non-severe. </figtable>
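
The comparison of wild-type and mutant properties can be sketched as a simple lookup. The example below only lists which properties differ; it does not assign a disease score, because the text gives no exact rule for what counts as a severe change. Charges follow the usual side-chain classification (Arg, Lys, His basic; Asp, Glu acidic), the size classes are taken from the table, and the names `PROPERTIES` and `changed_properties` are hypothetical.

```python
# (charge, polarity, size, aromatic) for the residues in the ten mutations.
PROPERTIES = {
    "H": ("positive", "polar", "large", False),
    "R": ("positive", "polar", "large", False),
    "K": ("positive", "polar", "large", False),
    "E": ("negative", "polar", "large", False),
    "D": ("negative", "polar", "medium", False),
    "N": ("neutral", "polar", "medium", False),
    "S": ("neutral", "polar", "small", False),
    "C": ("neutral", "polar", "small", False),
    "A": ("neutral", "nonpolar", "small", False),
    "V": ("neutral", "nonpolar", "medium", False),
    "I": ("neutral", "nonpolar", "medium", False),
    "L": ("neutral", "nonpolar", "medium", False),
    "P": ("neutral", "nonpolar", "medium", False),
    "W": ("neutral", "nonpolar", "large", True),
}
FIELDS = ("charge", "polarity", "size", "aromatic")


def changed_properties(wildtype, mutant):
    """Return the names of the properties that differ between two residues."""
    wt, mt = PROPERTIES[wildtype], PROPERTIES[mutant]
    return [name for name, a, b in zip(FIELDS, wt, mt) if a != b]


for wt, mut in [("H", "R"), ("E", "K"), ("W", "R"), ("A", "D")]:
    print(wt, "->", mut, changed_properties(wt, mut))
```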

Structural analysis

<figtable id="tab:structure">

Nr  Mutation  Acc      2nd structure  Domain region  Mutation image      Disease score
1   H99R      exposed  C              no             Structure nr1.png   -1
2   V211I     exposed  C              no             Structure nr2.png   -1
3   E150K     exposed  C              no             Structure nr3.png   -1
4   L236P     exposed  C              no             Structure nr4.png   -1
5   W248R     buried   H              yes            Structure nr5.png   1
6   L509P     exposed  S              yes            Structure nr6.png   1
7   W351C     exposed  S              yes            Structure nr7.png   0
8   A423D     buried   C              yes            Structure nr8.png   1
9   D482N     exposed  C              yes            Structure nr9.png   0
10  R83S      exposed  C              yes            Structure nr10.png  -1

Location of mutations in 2nt0_A. Blue: wildtype; Red: mutant; Acc: Solvent accessibility. </figtable>

<figure id="fig:structure_all">

2nt0_A along with the selected wildtype residues from <xr id="tab:mutations"/> in blue.

</figure>


Substitution scores

BLOSUM62 scores

The scores were taken from the BLOSUM62 matrix.

<figtable id="tab:subst_blosum">

Nr  Mutation  Score mutation  Score min  Score max  Disease score
1   H99R      0               -3         8          0
2   V211I     3               -3         4          -1
3   E150K     1               -4         5          0
4   L236P     -3              -4         4          1
5   W248R     -3              -4         11         1
6   L509P     -3              -4         4          1
7   W351C     -2              -4         11         1
8   A423D     -2              -3         4          1
9   D482N     1               -4         6          0
10  R83S      -1              -3         5          0

BLOSUM62 scores of the selected mutations. </figtable>
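
The text does not state how a substitution score is translated into a disease score. The sketch below assumes two symmetric cut-offs (score of -2 or lower gives 1, score of 2 or higher gives -1, anything in between gives 0). These thresholds are an assumption, chosen only because they reproduce the disease-score column of the BLOSUM62 table; the function name `disease_score` is hypothetical.

```python
def disease_score(substitution_score, low=-2, high=2):
    """Map a substitution score to a disease score of -1, 0, or 1.

    The cut-offs `low` and `high` are assumptions, not values from the source.
    """
    if substitution_score <= low:
        return 1    # unfavourable substitution
    if substitution_score >= high:
        return -1   # favourable substitution
    return 0        # ambiguous


# BLOSUM62 scores of the ten mutations, copied from the table.
blosum62 = {"H99R": 0, "V211I": 3, "E150K": 1, "L236P": -3, "W248R": -3,
            "L509P": -3, "W351C": -2, "A423D": -2, "D482N": 1, "R83S": -1}

scores = {name: disease_score(value) for name, value in blosum62.items()}
for name, value in scores.items():
    print(name, value)
```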

PSSM of all hits

<figure id="fig:subst_pssm_all_ali">
Sequence alignment of P04062 derived from all significant hits after 5 rounds of PSI-BLAST.
</figure>
<figure id="fig:subst_pssm_all">
PSSM of P04062 derived from all significant hits after 5 rounds of PSI-BLAST.
</figure>

<figtable id="tab:subst_pssm_all">

Nr  Mutation  Score mutation  Score min  Score max  Conservation                Disease score
1   H99R      0               -4         2          Subst pssm all col99.png    0
2   V211I     4               -4         4          Subst pssm all col211.png   -1
3   E150K     0               -4         5          Subst pssm all col150.png   0
4   L236P     0               -3         1          Subst pssm all col236.png   0
5   W248R     -2              -3         4          Subst pssm all col248.png   1
6   L509P     -6              -6         4          Subst pssm all col509.png   1
7   W351C     -3              -6         9          Subst pssm all col351.png   1
8   A423D     -3              -3         3          Subst pssm all col423.png   1
9   D482N     4               -4         4          Subst pssm all col482.png   -1
10  R83S      0               -3         2          Subst pssm all col83.png    0

Position-specific substitution scores derived from all significant hits after 5 rounds of PSI-BLAST. The respective profile column is shown on the right. </figtable>


PSSM of close homologous sequences

<figure id="fig:subst_pssm_best_ali">
Sequence alignment of P04062 derived from the 60 closest homologous sequences after 5 rounds of PSI-BLAST.
</figure>
<figure id="fig:subst_pssm_best">
PSSM of P04062 derived from the 60 closest homologous sequences after 5 rounds of PSI-BLAST.
</figure>

<figtable id="tab:subst_pssm_best">

Nr  Mutation  Score mutation  Score min  Score max  Conservation                 Disease score
1   H99R      0               -3         8          Subst pssm best col99.png    0
2   V211I     3               -3         4          Subst pssm best col211.png   -1
3   E150K     1               -4         5          Subst pssm best col150.png   0
4   L236P     -3              -4         4          Subst pssm best col236.png   1
5   W248R     -3              -4         11         Subst pssm best col248.png   1
6   L509P     -3              -4         4          Subst pssm best col509.png   1
7   W351C     -2              -4         11         Subst pssm best col351.png   1
8   A423D     -2              -3         4          Subst pssm best col423.png   1
9   D482N     1               -4         6          Subst pssm best col482.png   -1
10  R83S      -1              -3         6          Subst pssm best col83.png    0

Position-specific substitution scores derived from the 60 closest homologous sequences after 5 rounds of PSI-BLAST. The respective profile column is shown on the right. </figtable>

Scoring Mutants

SIFT

The results predicted by SIFT BLink are shown here:

Substitution at pos 83 from R to S is predicted to be TOLERATED with a score of 0.17.
   Median sequence conservation: 2.12
   Sequences represented at this position:77

Substitution at pos 99 from H to R is predicted to be TOLERATED with a score of 0.64.
   Median sequence conservation: 2.14
   Sequences represented at this position:80

Substitution at pos 150 from E to K is predicted to be TOLERATED with a score of 0.76.
   Median sequence conservation: 2.10
   Sequences represented at this position:86

Substitution at pos 211 from V to I is predicted to be TOLERATED with a score of 0.56.
   Median sequence conservation: 2.09
   Sequences represented at this position:86

Substitution at pos 236 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02.
   Median sequence conservation: 2.09
   Sequences represented at this position:86

Substitution at pos 248 from W to R is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 2.09
   Sequences represented at this position:86

Substitution at pos 351 from W to C is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 2.10
   Sequences represented at this position:87

Substitution at pos 423 from A to D is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01.
   Median sequence conservation: 2.10
   Sequences represented at this position:85

Substitution at pos 482 from D to N is predicted to be TOLERATED with a score of 0.69.
   Median sequence conservation: 2.18
   Sequences represented at this position:66

Substitution at pos 509 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 2.10
   Sequences represented at this position:79


The results predicted by SIFT are shown here:

Substitution at pos 83 from R to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.05.
   Median sequence conservation: 3.10
   Sequences represented at this position:15

Substitution at pos 99 from H to R is predicted to be TOLERATED with a score of 0.74.
   Median sequence conservation: 3.11
   Sequences represented at this position:14

Substitution at pos 150 from E to K is predicted to be TOLERATED with a score of 0.44.
   Median sequence conservation: 3.10
   Sequences represented at this position:16

Substitution at pos 211 from V to I is predicted to be TOLERATED with a score of 1.00.
   Median sequence conservation: 3.10
   Sequences represented at this position:16

Substitution at pos 236 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.10
   Sequences represented at this position:16

Substitution at pos 248 from W to R is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.10
   Sequences represented at this position:16

Substitution at pos 351 from W to C is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.10
   Sequences represented at this position:16

Substitution at pos 423 from A to D is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01.
   Median sequence conservation: 3.10
   Sequences represented at this position:16

Substitution at pos 482 from D to N is predicted to be TOLERATED with a score of 0.77.
   Median sequence conservation: 3.10
   Sequences represented at this position:16

Substitution at pos 509 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01.
   Median sequence conservation: 3.11
   Sequences represented at this position:14
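
SIFT output in the format shown above can be collected with a short parser. The sketch below extracts the mutation, the score, and the textual call, and maps the call to a disease score. The mapping (AFFECT PROTEIN FUNCTION gives 1, TOLERATED gives -1) is an assumption, and the names `SIFT_LINE` and `parse_sift` are hypothetical.

```python
import re

# Matches lines such as
# "Substitution at pos 83 from R to S is predicted to be TOLERATED with a score of 0.17."
SIFT_LINE = re.compile(
    r"Substitution at pos (\d+) from (\w) to (\w) is predicted to "
    r"(be TOLERATED|AFFECT PROTEIN FUNCTION) with a score of ([\d.]+)\."
)


def parse_sift(text):
    """Return {mutation: {"score": float, "disease_score": int}}."""
    results = {}
    for pos, wt, mut, call, score in SIFT_LINE.findall(text):
        results[f"{wt}{pos}{mut}"] = {
            "score": float(score),
            # assumed mapping of the SIFT call to the disease score
            "disease_score": 1 if call == "AFFECT PROTEIN FUNCTION" else -1,
        }
    return results


example = """\
Substitution at pos 83 from R to S is predicted to be TOLERATED with a score of 0.17.
   Median sequence conservation: 2.12
   Sequences represented at this position:77

Substitution at pos 236 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02.
   Median sequence conservation: 2.09
   Sequences represented at this position:86
"""

sift_results = parse_sift(example)
for name, entry in sift_results.items():
    print(name, entry)
```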

Polyphen2

H99R
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)

V211I
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.001 (sensitivity: 0.99; specificity: 0.09)

E150K
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.001 (sensitivity: 0.99; specificity: 0.09)

L236P
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)

W248R
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 0.999 (sensitivity: 0.09; specificity: 0.99)

L509P
This mutation is predicted to be probably damaging with a score of 0.992 (sensitivity: 0.70; specificity: 0.97)
This mutation is predicted to be probably damaging with a score of 0.988 (sensitivity: 0.53; specificity: 0.95)

W351C
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)

A423D
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.36; specificity: 0.97)

D482N
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.002 (sensitivity: 0.99; specificity: 0.18)

R83S
This mutation is predicted to be benign with a score of 0.007 (sensitivity: 0.96; specificity: 0.75)
This mutation is predicted to be benign with a score of 0.019 (sensitivity: 0.95; specificity: 0.55)
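
The Polyphen2 results above can be gathered in the same way. The sketch below groups the prediction lines under the preceding mutation name and keeps them in the order listed, since the text does not say what distinguishes the two predictions reported per mutation. The mapping of calls to disease scores in `CALL_TO_SCORE` is an assumption, and all names are hypothetical.

```python
import re

POLYPHEN_LINE = re.compile(
    r"predicted to be (benign|possibly damaging|probably damaging) "
    r"with a score of ([\d.]+) \(sensitivity: ([\d.]+); specificity: ([\d.]+)\)"
)

# Assumed mapping of the Polyphen2 call to the disease score.
CALL_TO_SCORE = {"benign": -1, "possibly damaging": 0, "probably damaging": 1}


def parse_polyphen(text):
    """Return {mutation: [(call, score), ...]} in the order of appearance."""
    results = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = POLYPHEN_LINE.search(line)
        if match is None:
            current = line            # a header line such as "L509P"
            results[current] = []
        elif current is not None:
            call, score, _sensitivity, _specificity = match.groups()
            results[current].append((call, float(score)))
    return results


example = """\
L509P
This mutation is predicted to be probably damaging with a score of 0.992 (sensitivity: 0.70; specificity: 0.97)
This mutation is predicted to be probably damaging with a score of 0.988 (sensitivity: 0.53; specificity: 0.95)

R83S
This mutation is predicted to be benign with a score of 0.007 (sensitivity: 0.96; specificity: 0.75)
This mutation is predicted to be benign with a score of 0.019 (sensitivity: 0.95; specificity: 0.55)
"""

polyphen_results = parse_polyphen(example)
for name, calls in polyphen_results.items():
    print(name, [CALL_TO_SCORE[call] for call, _ in calls])
```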

SNAP

Discussion

<figtable id="tab:discussion">

Property               Weight  H99R  V211I  E150K  L236P  W248R  L509P  W351C  A423D  D482N  R83S
Nr                             1     2      3      4      5      6      7      8      9      10
Physicochemical        1.0     -1    -1     1      -1     1      -1     1      1      -1     0
Structure              0.8     -1    -1     -1     -1     1      1      0      1      0      -1
BLOSUM62               0.2     0     -1     0      1      1      1      1      1      0      0
PSSM all               0.4     0     -1     0      0      1      1      1      1      -1     0
PSSM close             0.4     0     -1     0      1      1      1      1      1      -1     0
SIFT                   1.0     1     1      1      1      1      1      1      1      1      1
Polyphen2              1.0     1     1      1      1      1      1      1      1      1      1
SNAP                   1.0     1     1      1      1      1      1      1      1      1      1
Average disease score          1     1      1      1      1      1      1      1      1      1
Prediction                     1     1      1      1      1      1      1      1      1      1
Verification                   -1    -1     1      1      1      -1     1      1      -1     -1

Summary of the sequence-based mutation analysis. A final disease score is obtained by computing the weighted average of all individual disease scores. </figtable>
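
The weighted average named in the caption can be sketched as follows. The example uses only the first five rows of the table (physicochemical properties, structure, BLOSUM62, and the two PSSMs) with their listed weights; the tool rows are left out for brevity, so the numbers are illustrative rather than the final scores of this analysis. The names `WEIGHTS`, `SCORES`, and `weighted_average` are hypothetical.

```python
MUTATIONS = ["H99R", "V211I", "E150K", "L236P", "W248R",
             "L509P", "W351C", "A423D", "D482N", "R83S"]

# Weights and disease scores copied from the first five rows of the table.
WEIGHTS = {"Physicochemical": 1.0, "Structure": 0.8, "BLOSUM62": 0.2,
           "PSSM all": 0.4, "PSSM close": 0.4}
SCORES = {
    "Physicochemical": [-1, -1, 1, -1, 1, -1, 1, 1, -1, 0],
    "Structure":       [-1, -1, -1, -1, 1, 1, 0, 1, 0, -1],
    "BLOSUM62":        [0, -1, 0, 1, 1, 1, 1, 1, 0, 0],
    "PSSM all":        [0, -1, 0, 0, 1, 1, 1, 1, -1, 0],
    "PSSM close":      [0, -1, 0, 1, 1, 1, 1, 1, -1, 0],
}


def weighted_average(scores, weights):
    """Weighted mean of the per-property disease scores for each mutation."""
    total = sum(weights.values())
    n = len(next(iter(scores.values())))
    return [sum(weights[k] * scores[k][i] for k in weights) / total
            for i in range(n)]


averages = weighted_average(SCORES, WEIGHTS)
for name, value in zip(MUTATIONS, averages):
    print(f"{name}: {value:+.2f}")
```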

H99R

H99R is not disease causing. Not listed in HGMD.

V211I

V211I is not disease causing. Not listed in HGMD.

E150K

Gaucher disease type 1 [1]

L236P

Gaucher disease type 1 [2]

W248R

Gaucher disease [3]

L509P

L509P is not disease causing. Not listed in HGMD.

W351C

Gaucher disease type 1 [4]

A423D

Gaucher disease [5]

D482N

D482N is not disease causing. Not listed in HGMD.

R83S

R83S is not disease causing. Not listed in HGMD.