Difference between revisions of "Sequence-based mutation analysis Gaucher Disease"
Revision as of 15:48, 17 June 2012
The aim of this task was to carry out a thorough analysis of ten mutations and to classify them as disease-causing or non-disease-causing. The mutations were selected by another group from our set of mutations, so that their impact was unknown to us prior to this task. We investigated the provided mutations with respect to their physicochemical properties, structural features, and conservation, and employed the tools SIFT, Polyphen2, and SNAP to predict their impact on the phenotype. To quantify to which extent a mutation is disease-causing, we assigned a disease score, where -1 means non-disease-causing, 0 ambiguous, and 1 disease-causing. We averaged the disease scores to obtain a final prediction, which we compared with the true impact of the mutation on the phenotype. Technical details are reported in our protocol.
Mutations
<xr id="tab:mutations"/> contains five randomly chosen Gaucher disease-causing and five non-disease-causing mutations. Disease-causing mutations were sampled from the HGMD, whereas non-disease-causing mutations were sampled from the set of mutations that are present in dbSNP but not in the HGMD. The reference sequence was P04062, which has a 39-residue signal peptide. The ten mutations listed in <xr id="tab:mutations"/> are investigated in the following sections.
<figtable id="tab:mutations">
Nr | Position | From | To |
---|---|---|---|
1 | 99 | H | R |
2 | 211 | V | I |
3 | 150 | E | K |
4 | 236 | L | P |
5 | 248 | W | R |
6 | 509 | L | P |
7 | 351 | W | C |
8 | 423 | A | D |
9 | 482 | D | N |
10 | 83 | R | S |
Randomly selected mutations from HGMD and dbSNP which were used for the sequence-based mutation analysis. </figtable>
Physicochemical properties
We compared the charge, polarity, size, and aromatic character of the wild-type and mutant amino acids and assigned a disease score of 1 to those mutations that have a severe impact on the physicochemical properties (cf. <xr id="tab:props"/>). Mutation number 3 reverses the charge, since glutamate is acidic but lysine is basic. We also considered mutations number 5 and 7 disease-causing, as tryptophan is aromatic and nonpolar, in contrast to the target residues. Substituting alanine, which is small and nonpolar, by the larger and acidic aspartate (mutation number 8) might also impact the structure and function of the protein.
<figtable id="tab:props">
Nr | Wildtype AA | Wildtype charge | Wildtype polarity | Wildtype size | Wildtype aromatic | Mutant AA | Mutant charge | Mutant polarity | Mutant size | Mutant aromatic | Disease score |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | H | positive | polar | large | no | R | positive | polar | large | no | -1 |
2 | V | neutral | nonpolar | medium | no | I | neutral | nonpolar | medium | no | -1 |
3 | E | negative | polar | large | no | K | positive | polar | large | no | 1 |
4 | L | neutral | nonpolar | medium | no | P | neutral | nonpolar | medium | no | -1 |
5 | W | neutral | nonpolar | large | yes | R | positive | polar | large | no | 1 |
6 | L | neutral | nonpolar | medium | no | P | neutral | nonpolar | medium | no | -1 |
7 | W | neutral | nonpolar | large | yes | C | neutral | polar | small | no | 1 |
8 | A | neutral | nonpolar | small | no | D | negative | polar | medium | no | 1 |
9 | D | negative | polar | medium | no | N | neutral | polar | medium | no | -1 |
10 | R | positive | polar | large | no | S | neutral | polar | small | no | 0 |
Physicochemical properties of the wild-type and mutant amino acids, which were used to classify the mutations as severe or non-severe. </figtable>
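The comparison described above can be sketched as a small lookup. This is only an illustration: the property values are those of the table, with side-chain charges at physiological pH (Asp/Glu negative, Lys/Arg/His positive), and the helper name `changed_properties` is hypothetical. The sketch only lists which properties differ; the final disease score in the table was assigned by judgment and is not derived here.

```python
# Side-chain properties as (charge, polarity, size, aromatic).
# Assumption: charges at physiological pH; histidine is only partly protonated.
PROPERTIES = {
    "A": ("neutral", "nonpolar", "small", False),
    "C": ("neutral", "polar", "small", False),
    "D": ("negative", "polar", "medium", False),
    "E": ("negative", "polar", "large", False),
    "H": ("positive", "polar", "large", False),
    "I": ("neutral", "nonpolar", "medium", False),
    "K": ("positive", "polar", "large", False),
    "L": ("neutral", "nonpolar", "medium", False),
    "N": ("neutral", "polar", "medium", False),
    "P": ("neutral", "nonpolar", "medium", False),
    "R": ("positive", "polar", "large", False),
    "S": ("neutral", "polar", "small", False),
    "V": ("neutral", "nonpolar", "medium", False),
    "W": ("neutral", "nonpolar", "large", True),
}

PROPERTY_NAMES = ("charge", "polarity", "size", "aromatic")


def changed_properties(wildtype, mutant):
    """Return the names of the properties that differ between two residues."""
    return [
        name
        for name, wt_value, mut_value in zip(
            PROPERTY_NAMES, PROPERTIES[wildtype], PROPERTIES[mutant]
        )
        if wt_value != mut_value
    ]


for wt, mut in [("W", "R"), ("V", "I"), ("E", "K")]:
    print(f"{wt}->{mut}: {changed_properties(wt, mut)}")
```

For W248R the sketch reports a change in charge, polarity, and aromatic character, whereas V211I changes none of the four properties.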
Structural analysis
We used the HHsearch alignment to map the mutations of <xr id="tab:mutations"/> onto 2nt0_A and investigated for each mutated site its solvent accessibility (buried or exposed), its secondary structure (H=Helix, S=Sheet, C=Coil), and whether it lies in a domain region. Based on these features, we estimated a disease score (cf. <xr id="tab:structure"/>, <xr id="fig:structure_all"/>).
<figtable id="tab:structure">
Location of mutations in 2nt0_A. Blue: wildtype; Red: mutant; Acc: Solvent accessibility. </figtable>
<figure id="fig:structure_all">
</figure>
W248R lies in the hydrophobic core region of the TIM beta/alpha-barrel domain, and inserting a hydrophilic arginine is likely to impair the catalytic activity of the enzyme, although it does not change the secondary structure according to the PSI-PRED prediction. The same holds true for A423D. We considered L509P disease-causing, as proline turned the sheet into a loop region at this site. W351C might change the protein structure due to the formation of disulfide bonds.
Conservation
The conservation of an amino acid indicates its importance for the structure and function of the protein. The log-odds substitution score quantifies the likelihood of a substitution, where a negative score indicates that the substitution is observed less frequently than expected by chance. This is primarily due to differing physicochemical properties, which cause severe structural changes such that the resulting protein is negatively selected. Hence, substitutions with a negative score are likely to be disease-causing, whereas a positive score suggests that the mutation is unlikely to affect the protein.
BLOSUM62 scores
The BLOSUM62 substitution matrix was derived by clustering sequences of the Blocks database with a minimal identity of 62% and counting inter-cluster substitutions. The evolutionary distance underlying the BLOSUM62 matrix turned out to be suitable for many applications. We labeled substitutions with a score close to the minimal score as disease-causing (cf. <xr id="tab:subst_blosum"/>).
<figtable id="tab:subst_blosum">
Nr | Mutation | Score mutation | Score min | Score max | Disease score |
---|---|---|---|---|---|
1 | H99R | 0 | -3 | 8 | 0 |
2 | V211I | 3 | -3 | 4 | -1 |
3 | E150K | 1 | -4 | 5 | 0 |
4 | L236P | -3 | -4 | 4 | 1 |
5 | W248R | -3 | -4 | 11 | 1 |
6 | L509P | -3 | -4 | 4 | 1 |
7 | W351C | -2 | -4 | 11 | 1 |
8 | A423D | -2 | -3 | 4 | 1 |
9 | D482N | 1 | -4 | 6 | 0 |
10 | R83S | -1 | -3 | 5 | 0 |
BLOSUM62 scores of the selected mutations. </figtable>
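The labeling can be illustrated with a minimal sketch. The scores are copied from the table; the two cutoffs (`low=-2`, `high=2`) are an assumption chosen for illustration and are not stated in the text, although they happen to reproduce the disease scores listed in the table.

```python
# (mutation, BLOSUM62 score of the substitution), copied from the table.
BLOSUM62_SCORES = [
    ("H99R", 0), ("V211I", 3), ("E150K", 1), ("L236P", -3), ("W248R", -3),
    ("L509P", -3), ("W351C", -2), ("A423D", -2), ("D482N", 1), ("R83S", -1),
]


def blosum_disease_score(score, low=-2, high=2):
    """Map a substitution score to a disease score in {-1, 0, 1}.

    Assumption: `low` and `high` are hypothetical cutoffs, not taken
    from the original analysis.
    """
    if score <= low:
        return 1   # rarely observed substitution: labeled disease-causing
    if score >= high:
        return -1  # frequently observed substitution: labeled non-disease-causing
    return 0       # ambiguous


labels = {name: blosum_disease_score(score) for name, score in BLOSUM62_SCORES}
print(labels)
```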
PSSM of all hits
A position-specific scoring matrix (PSSM), or profile, is a matrix which stores the probability P(a|i) of observing amino acid a at position i. It is derived from a sequence alignment, and the position-specific substitution scores S(a,i) = log(P(a|i) / P(a)), where P(a) is the background frequency of amino acid a, are more precise than the general BLOSUM62 scores. We therefore computed an alignment (cf. <xr id="fig:subst_pssm_all_ali"/>) from all significant sequences found by performing five rounds of PSI-BLAST, computed a PSSM (cf. <xr id="fig:subst_pssm_all"/>), and used the position-specific substitution scores to assign a disease score to each mutation (cf. <xr id="tab:subst_pssm_all"/>). Mutations 5-7 were thought of as disease-causing, since their substitution scores were close to the minimum and the sites were highly conserved.
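As a minimal sketch of the log-odds definition above, the following computes the scores for a single alignment column. It is not the PSI-BLAST procedure: the uniform background frequencies, the simple add-one pseudocount, the base-2 logarithm, and the function name `column_log_odds` are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_log_odds(column, background=None, pseudocount=1.0):
    """Return log2(P(a|i) / P(a)) for every amino acid a in one column i.

    Assumptions: uniform background P(a) = 1/20 unless given, and a flat
    pseudocount so that unobserved residues get a finite score.
    """
    if background is None:
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    # Gaps and unknown characters are ignored.
    counts = Counter(c for c in column if c in background)
    total = sum(counts.values()) + pseudocount * len(background)
    return {
        a: math.log2(((counts[a] + pseudocount) / total) / background[a])
        for a in background
    }


# Hypothetical, highly conserved column: nine tryptophans, one phenylalanine.
scores = column_log_odds("WWWWWWWWWF")
print(round(scores["W"], 2), round(scores["R"], 2))
```

In such a conserved column, the conserved residue receives a positive score and an unobserved residue such as arginine a negative one, which is the pattern used above to flag mutations at conserved sites.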
PSSM.
</figure> </figure><figure id="fig:subst_pssm_all_ali"> |
<figure id="fig:subst_pssm_all"> |
<figtable id="tab:subst_pssm_all">
Position-specific substitution scores derived from all significant hits after 5 rounds of PSI-BLAST. The respective profile column is shown on the right. </figtable>
PSSM of close homologous sequences
PSSM.
</figure> </figure><figure id="fig:subst_pssm_best_ali"> |
<figure id="fig:subst_pssm_best"> |
<figtable id="tab:subst_pssm_best">
Position-specific substitution scores derived from the 60 closest homologous sequences after 5 rounds of PSI-BLAST. The respective profile column is shown on the right. </figtable>
Scoring Mutants
SIFT
The results predicted by SIFT Blink are shown here:
Substitution at pos 83 from R to S is predicted to be TOLERATED with a score of 0.17. Median sequence conservation: 2.12 Sequences represented at this position:77
Substitution at pos 99 from H to R is predicted to be TOLERATED with a score of 0.64. Median sequence conservation: 2.14 Sequences represented at this position:80
Substitution at pos 150 from E to K is predicted to be TOLERATED with a score of 0.76. Median sequence conservation: 2.10 Sequences represented at this position:86
Substitution at pos 211 from V to I is predicted to be TOLERATED with a score of 0.56. Median sequence conservation: 2.09 Sequences represented at this position:86
Substitution at pos 236 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02. Median sequence conservation: 2.09 Sequences represented at this position:86
Substitution at pos 248 from W to R is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 2.09 Sequences represented at this position:86
Substitution at pos 351 from W to C is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 2.10 Sequences represented at this position:87
Substitution at pos 423 from A to D is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01. Median sequence conservation: 2.10 Sequences represented at this position:85
Substitution at pos 482 from D to N is predicted to be TOLERATED with a score of 0.69. Median sequence conservation: 2.18 Sequences represented at this position:66
Substitution at pos 509 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 2.10 Sequences represented at this position:79
The results predicted by SIFT are shown here:
Substitution at pos 83 from R to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.05. Median sequence conservation: 3.10 Sequences represented at this position:15
Substitution at pos 99 from H to R is predicted to be TOLERATED with a score of 0.74. Median sequence conservation: 3.11 Sequences represented at this position:14
Substitution at pos 150 from E to K is predicted to be TOLERATED with a score of 0.44. Median sequence conservation: 3.10 Sequences represented at this position:16
Substitution at pos 211 from V to I is predicted to be TOLERATED with a score of 1.00. Median sequence conservation: 3.10 Sequences represented at this position:16
Substitution at pos 236 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.10 Sequences represented at this position:16
Substitution at pos 248 from W to R is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.10 Sequences represented at this position:16
Substitution at pos 351 from W to C is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.10 Sequences represented at this position:16
Substitution at pos 423 from A to D is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01. Median sequence conservation: 3.10 Sequences represented at this position:16
Substitution at pos 482 from D to N is predicted to be TOLERATED with a score of 0.77. Median sequence conservation: 3.10 Sequences represented at this position:16
Substitution at pos 509 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01. Median sequence conservation: 3.11 Sequences represented at this position:14
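Output of this form can be turned into structured records with a regular expression. The following is a minimal sketch that assumes the exact wording shown above; the pattern and the helper name `parse_sift` are hypothetical and not part of SIFT itself.

```python
import re

# Matches one SIFT prediction sentence as printed above.
SIFT_RECORD = re.compile(
    r"Substitution at pos (\d+) from ([A-Z]) to ([A-Z]) is predicted to "
    r"(be TOLERATED|AFFECT PROTEIN FUNCTION) with a score of (\d+\.\d+)"
)


def parse_sift(text):
    """Extract mutation name, verdict, and score from SIFT output text."""
    records = []
    for pos, wildtype, mutant, verdict, score in SIFT_RECORD.findall(text):
        records.append({
            "mutation": f"{wildtype}{pos}{mutant}",
            "tolerated": verdict == "be TOLERATED",
            "score": float(score),
        })
    return records


sample = (
    "Substitution at pos 83 from R to S is predicted to be TOLERATED "
    "with a score of 0.17. Median sequence conservation: 2.12 "
    "Sequences represented at this position:77 "
    "Substitution at pos 236 from L to P is predicted to AFFECT PROTEIN "
    "FUNCTION with a score of 0.02."
)
records = parse_sift(sample)
for record in records:
    print(record)
```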
Polyphen2
H99R
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
V211I
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.001 (sensitivity: 0.99; specificity: 0.09)
E150K
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.001 (sensitivity: 0.99; specificity: 0.09)
L236P
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
W248R
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 0.999 (sensitivity: 0.09; specificity: 0.99)
L509P
This mutation is predicted to be probably damaging with a score of 0.992 (sensitivity: 0.70; specificity: 0.97)
This mutation is predicted to be probably damaging with a score of 0.988 (sensitivity: 0.53; specificity: 0.95)
W351C
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
A423D
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.36; specificity: 0.97)
D482N
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
This mutation is predicted to be benign with a score of 0.002 (sensitivity: 0.99; specificity: 0.18)
R83S
This mutation is predicted to be benign with a score of 0.007 (sensitivity: 0.96; specificity: 0.75)
This mutation is predicted to be benign with a score of 0.019 (sensitivity: 0.95; specificity: 0.55)
SNAP
Discussion
<figtable id="tab:discussion">
Property | Weight | 1: H99R | 2: V211I | 3: E150K | 4: L236P | 5: W248R | 6: L509P | 7: W351C | 8: A423D | 9: D482N | 10: R83S |
---|---|---|---|---|---|---|---|---|---|---|---|
Physicochemical | 1.0 | -1 | -1 | 1 | -1 | 1 | -1 | 1 | 1 | -1 | 0 |
Structure | 0.8 | -1 | -1 | -1 | -1 | 1 | 1 | 0 | 1 | 0 | -1 |
BLOSUM62 | 0.2 | 0 | -1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
PSSM all | 0.4 | 0 | -1 | 0 | 0 | 1 | 1 | 1 | 1 | -1 | 0 |
PSSM close | 0.4 | 0 | -1 | 0 | 1 | 1 | 1 | 1 | 1 | -1 | 0 |
SIFT | 1.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Polyphen2 | 1.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
SNAP | 1.0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Average disease score | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Prediction | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Verification | | -1 | -1 | 1 | 1 | 1 | -1 | 1 | 1 | -1 | -1 |
Summary of the sequence-based mutation analysis. A final disease score is obtained by computing the weighted average of all individual disease scores. </figtable>
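The weighted average mentioned in the caption can be sketched as follows. The weights and the two example columns are copied from the table; the function name `weighted_disease_score` is hypothetical, and the sketch makes no claim about how the "Average disease score" row of this revision was actually filled in.

```python
# Weights per property, copied from the summary table.
WEIGHTS = {
    "Physicochemical": 1.0, "Structure": 0.8, "BLOSUM62": 0.2,
    "PSSM all": 0.4, "PSSM close": 0.4,
    "SIFT": 1.0, "Polyphen2": 1.0, "SNAP": 1.0,
}


def weighted_disease_score(scores, weights=WEIGHTS):
    """Weighted average of the individual disease scores (-1, 0, or 1)."""
    total_weight = sum(weights[name] for name in scores)
    weighted_sum = sum(weights[name] * value for name, value in scores.items())
    return weighted_sum / total_weight


# Table columns for two mutations (values as listed in this revision).
h99r = {"Physicochemical": -1, "Structure": -1, "BLOSUM62": 0, "PSSM all": 0,
        "PSSM close": 0, "SIFT": 1, "Polyphen2": 1, "SNAP": 1}
w248r = {name: 1 for name in WEIGHTS}

print(round(weighted_disease_score(h99r), 3))
print(round(weighted_disease_score(w248r), 3))
```

With these inputs the score for H99R is 1.2 / 5.8, about 0.21, while W248R, for which every method reports 1, reaches the maximum of 1.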
H99R
H99R is not disease causing. Not listed in HGMD.
V211I
V211I is not disease causing. Not listed in HGMD.
E150K
Gaucher disease type 1 [1]
L236P
Gaucher disease type 1 [2]
W248R
Gaucher disease [3]
L509P
L509P is not disease causing. Not listed in HGMD.
W351C
Gaucher disease type 1 [4]
A423D
Gaucher disease [5]
D482N
D482N is not disease causing. Not listed in HGMD.
R83S
R83S is not disease causing. Not listed in HGMD.