Difference between revisions of "Gaucher Disease: Task 08 - LabJournal"

From Bioinformatikpedia

Revision as of 18:13, 31 August 2013

Choosing Mutation set

The cDNA sequence used by HGMD has the accession number NM_001005741.2. Its translation (in one-letter code) is 100% identical to the UniProt reference sequence, which we used for all tasks. Therefore, the mutation positions listed by HGMD can be used directly. dbSNP lists the same accession number, so the correct mutation positions can be read off there as well.

To choose mutations from dbSNP, we selected four point mutations from the SNP GeneView report of glucocerebrosidase for NP_001005741.1, which was already used in task 7. For HGMD, we randomly picked six mutations from the SNP list for NP_001005741.1.

Mutation Analysis

The information about the amino acid properties was taken from Wikipedia.

The mutations were visualised with PyMOL. To mutate the residues, we followed the description in the PyMOLWiki. We again accounted for the sequence shift of 39 residues between the UniProt and PDB sequences.
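The numbering shift mentioned above can be expressed as a small helper. This is only a sketch: it assumes that the PDB numbering starts after the first 39 residues of the UniProt sequence, so that the PDB residue number is the UniProt position minus 39. The function names are hypothetical, and the direction of the shift should be checked against the structure file actually used.

```python
import re

# The journal states a shift of 39 residues between the UniProt and PDB
# sequences. The direction (PDB = UniProt - 39) is an assumption.
UNIPROT_TO_PDB_SHIFT = 39


def parse_mutation(mutation):
    """Split a point mutation such as 'N409S' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a point mutation: {mutation!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def uniprot_to_pdb(position, shift=UNIPROT_TO_PDB_SHIFT):
    """Convert a UniProt position to the assumed PDB residue number."""
    pdb_position = position - shift
    if pdb_position < 1:
        raise ValueError(f"position {position} lies before the PDB sequence")
    return pdb_position


if __name__ == "__main__":
    for name in ["S77R", "N409S", "L483P"]:
        wild_type, position, mutant = parse_mutation(name)
        print(name, "-> PDB residue", uniprot_to_pdb(position))
```

The resulting number can then be used to select the residue in PyMOL before applying the mutagenesis steps.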

We took the secondary structure information from our previous task 3. We always chose the secondary structure type that was predicted by the majority of the prediction tools in that task.
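The majority decision described above could be sketched as follows. The one-letter states, the example predictions and the handling of ties (returning None so that the case is decided by hand) are assumptions for illustration; the journal does not say how ties were resolved.

```python
from collections import Counter


def majority_state(predictions):
    """Return the secondary structure state predicted by most tools.

    `predictions` holds one state per tool, for example 'H' (helix),
    'E' (strand) or 'C' (coil). Returns None for an empty list or when
    two states share the highest number of votes.
    """
    counts = Counter(predictions).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: needs a manual decision
    return counts[0][0]


if __name__ == "__main__":
    # Hypothetical predictions of three tools for one residue.
    print(majority_state(["H", "H", "C"]))
```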

The scores for the mutations were looked up in two substitution matrices: BLOSUM62 and PAM250.

PSSM

We created different PSSMs. The first was generated with PsiBlast:

blastpgp -j 5 -h 10e-6 \
  -i /mnt/home/student/gerkej/gaucher/task8/P04062.fasta \
  -d /mnt/project/pracstrucfunc13/data/big/big_80 \
  -o /mnt/home/student/gerkej/gaucher/task8/psiBl/ev-6/big80_it5.out \
  -Q /mnt/home/student/gerkej/gaucher/task8/psiBl/ev-6/big80_it5.pssm \
  -C /mnt/home/student/gerkej/gaucher/task8/psiBl/ev-6/big80_it5.chk

PSSM by psiblast

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  77 S    0  1  0  1 -1  1  1 -1  0 -1 -2  0 -1 -3 -3  1  1 -4 -2  0    7   9   4   8   1   7  11   3   2   4   4   6   1   1   0  11  11   0   1   8  0.14 inf
 141 N   -1  0  2  4 -2  2  2 -3 -1 -3 -3  1 -2 -5 -4  0  1  1 -2 -3    5   4  10  20   1  10  15   2   1   1   3   7   1   0   0   7   7   2   1   2  0.40 inf
 159 R   -6  8 -6 -7 -8 -4 -5 -7 -6 -8 -4  2 -6 -7 -7 -3 -6  1 -7 -7    0  81   0   0   0   0   0   0   0   0   2  11   0   0   0   2   0   2   0   0  2.56 inf
 213 L   -3 -5 -2 -6 -4 -5 -5 -2 -5  4  2 -5  0  3 -5 -4 -1  3  0  3    2   0   2   0   0   0   0   3   0  23  22   0   2  13   0   0   4   4   2  21  0.72 inf
 241 G   -1  2  0  2 -4  1 -1  1 -1 -2 -1  3 -2 -3 -1 -1 -1  0 -1 -2    5  11   5  11   0   7   3  11   2   2   6  17   1   1   4   3   4   2   3   3  0.19 inf
 349 V    0 -3 -2 -3  0 -4 -3  2 -1  0 -1 -4 -1  4 -5  0  0  5  2  2   10   1   1   1   2   0   1  14   2   5   4   0   2  17   0   6   6   9   7  14  0.42 inf
 408 T   -1  1  2  2  0  2  0  2  2 -2 -2  0 -1 -1 -1  0  0 -4  0 -2    3   7  10  11   2   9   5  19   5   2   3   4   2   2   2   5   4   0   3   3  0.19 inf
 409 N    1 -2  1  1  1 -2 -1  0  1 -1 -1 -1  0  1  0  1  0  2  1 -1   11   0  10   9   4   0   3   6   3   4   4   3   3   6   4   9   6   4   7   4  0.09 inf
 483 L   -2 -3 -2 -3 -4 -3 -3  0 -2  3  3 -2  0  0 -3 -3 -2 -4 -1  3    2   1   2   2   0   1   1   9   1  13  29   3   2   4   1   2   2   0   2  24  0.46 inf
 501 N   -6 -6  9 -3 -8 -4 -3 -6 -5 -6 -8 -6 -3 -8 -5 -2 -2 -8 -7 -6    1   0  87   1   0   0   2   0   0   0   0   0   1   0   1   3   3   0   0   1  2.83 inf
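Rows like the ones above can also be read programmatically, for example to look up the score of the mutant residue at a mutated position. The following is a minimal sketch with a hypothetical function name. It assumes that all 44 fields of a row are separated by whitespace, as in the rows shown here, and it would need adjusting for files in which wide values run together.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the PSSM header


def parse_pssm_row(line):
    """Parse one data row of the ASCII PSSM into a dictionary."""
    fields = line.split()
    if len(fields) != 44:
        raise ValueError(f"expected 44 fields, got {len(fields)}")
    scores = [int(value) for value in fields[2:22]]
    percentages = [int(value) for value in fields[22:42]]
    return {
        "position": int(fields[0]),
        "residue": fields[1],
        "scores": dict(zip(AMINO_ACIDS, scores)),
        "percentages": dict(zip(AMINO_ACIDS, percentages)),
        "information": float(fields[42]),
        "relative_weight": float(fields[43]),  # 'inf' parses as infinity
    }


# Row for position 159, copied from the PsiBlast PSSM above.
ROW_159 = (
    " 159 R   -6  8 -6 -7 -8 -4 -5 -7 -6 -8 -4  2 -6 -7 -7 -3 -6  1 -7 -7"
    "    0  81   0   0   0   0   0   0   0   0   2  11   0   0   0   2   0   2"
    "   0   0  2.56 inf"
)

if __name__ == "__main__":
    row = parse_pssm_row(ROW_159)
    # Scores of the wild type (R) and the mutant (Q) for R159Q.
    print(row["scores"]["R"], row["scores"]["Q"])
```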

The second PSSM is based on an alignment of all mammalian homologs. We identified the homologs on UniProt; the BlastP search was run against the mammalian database. For P04062 we found 140 homologous sequences with an E-value below 10e-4. To generate the MSA of the homologous sequences we used Clustal Omega, the successor of ClustalW, which is recommended on the ClustalW web server. With PsiBlast we then created the PSSM from the MSA.

blastpgp \
  -i /mnt/home/student/gerkej/gaucher/task8/P04062.fasta \
  -d /mnt/project/pracstrucfunc13/data/big/big_80 \
  -B /mnt/home/student/gerkej/gaucher/task8/psiBl/clustal-omegas.clustal \
  -o /mnt/home/student/gerkej/gaucher/task8/psiBl/big80_it5.out \
  -Q /mnt/home/student/gerkej/gaucher/task8/psiBl/big80_it5.pssm

PSSM by Clustal Omega alignment of homologs

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  77 S    1 -1  0  0 -1  0  0  0 -1 -2 -2  0 -1 -2  0  3  1 -2 -1 -1    3   2   2   2   1   1   2   4   1   2   3   2   1   1   2  64   4   0   1   2  0.25 inf
 141 N   -1  0  5  3 -2  0  1 -1  0 -3 -3  0 -2 -3 -1  0  0 -3 -2 -2    1   1  55  16   0   1  10   1   0   1   1   6   0   1   1   3   1   0   0   1  0.42 inf
 159 R   -2  5 -1 -2 -3  1  0 -3 -1 -3 -2  2 -2 -2 -2 -1 -1  5 -1 -3    0  86   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  14   0   0  0.76 inf
 213 L   -2 -2 -4 -4 -1 -2 -3 -4 -3  2  4 -3  2  0 -3 -3 -1 -2 -1  1    0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0   0   0  0.49 inf
 241 G    0  1 -1 -1 -3 -1 -2  5 -2 -4 -4 -1 -3 -3 -2  0 -2 -3 -3 -3    0  17   0   0   0   0   0  83   0   0   0   0   0   0   0   0   0   0   0   0  0.83 inf
 349 V    0 -3 -3 -3 -1 -2 -3 -3 -3  3  1 -2  1 -1 -3 -2  0 -3 -1  4    0   0   0   0   0   0   0   0   0   3   0   0   0   0   0   0   0   0   0  97  0.39 inf
 408 T    0 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 -1  1  4 -2 -1  0    1   1   1   1   0   1   1   1   0   1   2   1   0   1   1   1  82   0   1   1  0.37 inf
 409 N   -1  0  5  1 -2  0  0  0  0 -2 -2  0 -2 -2 -1  0  0 -3 -2 -2    2   1  76   5   0   1   1   2   0   1   2   1   0   1   1   2   1   0   1   1  0.51 inf
 483 L   -2 -2 -4 -4 -1 -2 -3 -4 -3  2  4 -3  2  0 -3 -3 -1 -2 -1  1    0   0   0   0   0   0   0   0   0   0 100   0   0   0   0   0   0   0   0   0  0.50 inf
 501 N   -1  0  6  1 -2  0  0  0  0 -3 -3  0 -2 -3 -2  0  0 -3 -2 -2    1   1  86   1   0   1   1   1   0   1   1   1   0   1   1   1   1   0   0   1  0.61 inf

Prediction approaches

SIFT

Polyphen2

Polyphen2 offers two prediction scores, which were trained and tested on different datasets. As recommended on the website, we decided to focus on the HumVar score, which uses all human disease-causing mutations from UniProtKB as damaging and common human non-synonymous SNPs as non-damaging. For the Polyphen2 nomenclature benign, possibly damaging and probably damaging, we used the identifiers non-disease causing, possibly damaging and disease causing, respectively.


S77R
This mutation is predicted to be benign with a score of 0.170 (sensitivity: 0.89; specificity: 0.72)
N141S
This mutation is predicted to be benign with a score of 0.009 (sensitivity: 0.96; specificity: 0.49)
R159Q
This mutation is predicted to be probably damaging with a score of 0.997 (sensitivity: 0.27; specificity: 0.98)
L213F
This mutation is predicted to be possibly damaging with a score of 0.790 (sensitivity: 0.76; specificity: 0.87)
G241E
This mutation is predicted to be possibly damaging with a score of 0.892 (sensitivity: 0.70; specificity: 0.90)
V349I
This mutation is predicted to be benign with a score of 0.118 (sensitivity: 0.90; specificity: 0.70)
T408M
This mutation is predicted to be benign with a score of 0.113 (sensitivity: 0.90; specificity: 0.69)
N409S
This mutation is predicted to be benign with a score of 0.234 (sensitivity: 0.88; specificity: 0.75)
L483P
This mutation is predicted to be possibly damaging with a score of 0.856 (sensitivity: 0.72; specificity: 0.88)
N501S
This mutation is predicted to be probably damaging with a score of 0.979 (sensitivity: 0.57; specificity: 0.94)
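The relabelling of the Polyphen2 classes described above can be written down as a simple lookup. The sketch below uses the HumVar predictions and scores listed in this section; the variable and function names are hypothetical.

```python
# Polyphen2 label -> identifier used in this journal.
POLYPHEN_TO_IDENTIFIER = {
    "benign": "non-disease causing",
    "possibly damaging": "possibly damaging",
    "probably damaging": "disease causing",
}

# HumVar predictions and scores as listed above.
HUMVAR_PREDICTIONS = {
    "S77R": ("benign", 0.170),
    "N141S": ("benign", 0.009),
    "R159Q": ("probably damaging", 0.997),
    "L213F": ("possibly damaging", 0.790),
    "G241E": ("possibly damaging", 0.892),
    "V349I": ("benign", 0.118),
    "T408M": ("benign", 0.113),
    "N409S": ("benign", 0.234),
    "L483P": ("possibly damaging", 0.856),
    "N501S": ("probably damaging", 0.979),
}


def relabel(predictions):
    """Translate the Polyphen2 labels into the journal's identifiers."""
    return {
        mutation: POLYPHEN_TO_IDENTIFIER[label]
        for mutation, (label, _score) in predictions.items()
    }


if __name__ == "__main__":
    for mutation, identifier in relabel(HUMVAR_PREDICTIONS).items():
        print(mutation, identifier)
```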

MutationTaster

We used the MutationTaster web version. As input we used:

  • Gene: GBA
  • Transcript:
  • Position/snippet refers to coding sequence (ORF)

For the alteration, we calculated the position of the mutated bases.
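One way to obtain the positions of the bases belonging to a mutated residue is sketched below. It assumes that the amino acid position counts from the start codon of the ORF, so that codon 1 covers bases 1 to 3; the function name is hypothetical.

```python
def codon_positions(aa_position):
    """Return the three 1-based ORF base positions of a codon."""
    if aa_position < 1:
        raise ValueError("amino acid positions start at 1")
    first = 3 * (aa_position - 1) + 1
    return (first, first + 1, first + 2)


if __name__ == "__main__":
    for position in [77, 409, 483]:
        print(position, codon_positions(position))
```

Which of the three bases is actually changed depends on the codon and has to be read from the cDNA sequence.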


S77R

Prediction: disease causing
Model: simple_aae, prob: 0.897766357553113 
Summary: amino acid sequence changed, protein features (might be) affected, splice site changes

N141S

Prediction: polymorphism
Model: simple_aae, prob: 0.828715180775458      
Summary: amino acid sequence changed, protein features (might be) affected, splice site changes

R159Q

Prediction: disease causing
Model: simple_aae, prob: 0.999998729728442 (classification due to ClinVar, real probability is shown anyway)
Summary: amino acid sequence changed, known disease mutation at this position (HGMD CM880035), known disease mutation: rs79653797 (pathogenic), protein features (might be) affected, splice site changes

L213F

Prediction: disease causing
Model: simple_aae, prob: 0.99997844587523      
Summary: amino acid sequence changed, known disease mutation at this position (HGMD CM057076), protein features (might be) affected, splice site changes


G241E

Prediction: disease causing
Model: simple_aae, prob: 0.999988349653367      
Summary: amino acid sequence changed, known disease mutation at this position (HGMD CM992894), protein features (might be) affected, splice site changes

V349I

Prediction: disease causing
Model: simple_aae, prob: 0.998220211902893      
Summary: amino acid sequence changed, protein features (might be) affected, splice site changes


T408M

Prediction: polymorphism
Model: simple_aae, prob: 0.595642579948168      
Summary: amino acid sequence changed, heterozygous in TGP, known disease mutation at this position (HGMD CM960697), protein features (might be) affected, splice site changes


N409S

L483P

N501S

SNAP