Difference between revisions of "Lab journal task 8"

From Bioinformatikpedia

Revision as of 20:57, 24 August 2013

Mutation selection

10 mutations were randomly selected from HGMD and dbSNP.

Mutation analysis

The description of the physicochemical properties is based on the Wikipedia entry on amino acids.

The mutations were visualized with PyMOL. Because the PDB structure 1A6Z starts at position 22 of the reference sequence, we subtracted 22 from the codon position to get the position of the mutation in the structure. The mutations were introduced following the description in "Use PyMOL for this". We modeled the first 9 SNPs, but the last one (Arg330Met) could not be visualized because the PDB structure is shorter than the reference sequence and only contains residues 22 to 297. The rotamer for each mutated residue was selected based on its orientation and on the size and color of the discs. If no rotamer was free of red discs, we selected the one with the fewest and smallest red discs. For residues located on the surface of the protein, we also tried to find rotamers that do not point into the solvent.
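The numbering shift described above can be sketched in Python. This is only an illustration, assuming the offset of 22 and the covered range 22 to 297 stated in the text; the function name `structure_position` is hypothetical and not part of any tool used here.

```python
from typing import Optional

OFFSET = 22             # offset between reference and 1A6Z numbering, as stated in the text
FIRST, LAST = 22, 297   # reference positions covered by 1A6Z, as stated in the text


def structure_position(codon_pos: int) -> Optional[int]:
    """Map a reference-sequence position to 1A6Z numbering, or None if not covered."""
    if not FIRST <= codon_pos <= LAST:
        return None
    return codon_pos - OFFSET


for name, pos in [("Val53Met", 53), ("Cys282Tyr", 282), ("Arg330Met", 330)]:
    print(name, structure_position(pos))
```

Arg330Met maps to `None`, which matches the observation that this mutation lies outside the structure.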

The secondary structure of the location of the mutation was taken from the DSSP assignment of the 1A6Z_A structure.

The BLOSUM62 matrix was taken from BLOSUM62 and the PAM250 matrix from PAM250.

PSSM from PSI-BLAST with 5 iterations and default parameters, using the /mnt/project/pracstrucfunc13/data/big/big_80 database:

blastpgp -i /mnt/home/student/betza/data/hfe.fasta -d /mnt/project/pracstrucfunc13/data/big/big_80 -j 5 -o /mnt/home/student/betza/task8/psiblast/iter5_big80.results \
  -Q /mnt/home/student/betza/task8/psiblast/iter5_big80.pssm -C /mnt/home/student/betza/task8/psiblast/iter5_big80.chk

The resulting PSSM is the following (only the 10 mutation positions are shown):

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V	
  53 V   -3 -6 -7 -7 -3 -6 -6 -7 -2  0  4 -6  1 -1 -6 -4 -5 -6 -5  6    2   0   0   0   1   0   0   0   1   4  37   0   3   2   0   1   0   0   0  47  1.32 inf
  63 H   -3 -1  2 -3  1 -5 -4 -1  0 -4 -5  0 -3 -7 -3  6 -1 -2 -7 -5    1   3   8   2   3   0   1   5   2   1   1   5   1   0   1  62   4   1   0   1  1.26 inf
  67 R   -3  5  0 -3 -3  2  0 -3 -1 -2 -3  3 -2 -6 -2  0  2 -1 -5 -4    2  33   4   1   1   7   5   2   1   2   2  16   1   0   2   5  13   1   0   1  0.69 inf
  97 M   -1  0  0  0  2  0  0 -1  2  0 -1  0  2  1  0 -1  0  4  1 -1    5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5   5  0.09 inf
 130 N   -4  0  2  0 -2 -2 -1  5 -2 -2 -4  0 -1 -4 -4 -1  2 -1 -6 -3    1   5  10   4   1   2   4  40   1   3   2   5   1   1   1   5  11   1   0   2  0.67 inf
 168 E   -3  1 -1 -2 -3 -1  0 -3 -2  0  1  4  0  0 -3 -3 -3  0  2  0    2   6   4   2   1   2   7   2   1   6  14  22   2   4   2   3   1   1   7   6  0.29 inf
 183 L   -4 -3 -3 -4 -2 -4 -4 -5 -1  1  5 -2  2 -1 -4 -3 -3 -1  1  2    2   2   1   1   1   1   1   1   1   6  45   3   4   3   1   2   2   1   5  13  0.68 inf
 217 T    0 -3 -2  0 -5 -1 -1  1 -1 -3  1 -2 -1 -3  3  3  0 -6 -1 -1    8   1   1   4   0   3   3  10   1   1  12   2   1   1  15  22   6   0   2   5  0.30 inf
 282 C   -6 -6 -8 -9 12 -8 -9 -8 -8 -7 -7 -8 -7 -5 -8 -5 -6 -8 -5 -6    0   1   0   0  94   0   0   0   0   0   0   0   0   1   0   1   0   0   1   0  4.31 inf
 330 R   -5  5 -4 -6  1 -1 -5 -6  2 -5 -5  3 -3  0 -5 -4 -5  8  4 -6    1  28   1   0   3   2   0   0   4   1   1  18   0   3   1   1   0  23  14   0  1.41 inf

The mutations are marked in purple.
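To read the score of a particular substitution from such a table, one row can be parsed as in the following sketch. It assumes the ASCII PSSM layout shown above (position, wild-type residue, 20 log-odds scores in the header's column order, then 20 percentages); the helper name `parse_pssm_row` is hypothetical.

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # column order of the PSSM header


def parse_pssm_row(line: str):
    """Return (position, wild-type residue, {amino acid: log-odds score}) for one PSSM row."""
    fields = line.split()
    pos, wild_type = int(fields[0]), fields[1]
    # The first 20 numeric columns are the log-odds scores; the rest are percentages.
    scores = dict(zip(AA_ORDER, map(int, fields[2:22])))
    return pos, wild_type, scores


# Row for position 53, copied from the table above.
row = ("53 V   -3 -6 -7 -7 -3 -6 -6 -7 -2  0  4 -6  1 -1 -6 -4 -5 -6 -5  6"
       "    2   0   0   0   1   0   0   0   1   4  37   0   3   2   0   1   0   0   0  47  1.32 inf")
pos, wt, scores = parse_pssm_row(row)
print(pos, wt, "wild type:", scores[wt], "Met:", scores["M"])
```

For Val53Met this gives a score of 6 for the wild-type valine and 1 for methionine.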

For the creation of the MSA, we first searched for homologous mammalian sequences with the NCBI BlastP online tool in the UniProtKB/Swiss-Prot database and restricted the organisms to Mammalia. The E-value cutoff was set to 0.1 and PSI-BLAST was used as the algorithm. The maximum number of target sequences was set to 100, because we only wanted to get the close homologs, but we did a second run with a threshold of 20000 to also get remote homologous sequences. All other parameters were left as default. We performed two iterations and then downloaded all matched sequences in FASTA format. Those sequences were then used as input for ClustalW. The MSA was generated with default parameters. Jalview was then used to save the alignment in FASTA format. The command-line version of BLAST 2.2.25+ was then used to generate a PSSM for the HFE protein Q30201 from the MSA:

psiblast -subject Q30201.fasta -in_msa alignment.fa -out_ascii_pssm pssm.txt

The resulting PSSM for the first 100 sequences is the following:

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  53 V    0 -3 -3 -4 -1 -3 -3 -4 -3  2  1 -3  1 -1 -3 -2  0 -3 -1  5    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 100  0.47 inf
  63 H    0 -1  0 -1 -2  0  0 -1  5 -3 -3 -1 -2 -2 -1  4  1 -3 -1 -2    0   0   0   0   0   0   0   0  26   0   0   0   0   0   0  74   0   0   0   0  0.47 inf
  67 R   -2  6 -1 -2 -4  1  0 -3 -1 -3 -3  2 -2 -3 -2 -1 -1 -3 -2 -3    0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  0.91 inf
  97 M    0 -2 -1 -2 -1 -1 -1 -2 -2  0  0 -1  3 -2 -2  1  4  1 -2  0    1   0   0   0   0   0   0   0   0   4   0   0  23   0   0   7  62   2   0   0  0.37 inf
 130 N   -2  6  3 -1 -4  0 -1 -2  0 -4 -3  1 -2 -3 -1 -1 -1 -3 -2 -3    0  75  23   0   0   0   0   0   0   0   0   0   0   0   2   0   0   0   0   0  0.73 inf
 168 E   -1  1 -1  0 -4  1  3 -2 -1 -3 -3  5 -2 -4 -1 -1 -1 -3 -2 -3    0   0   0   0   0   0  23   0   0   0   0  77   0   0   0   0   0   0   0   0  0.62 inf
 183 L   -2 -2 -4 -4 -1 -2 -3 -4 -3  1  4 -3  2  0 -3 -3 -1 -2 -1  1    0   0   0   0   0   0   0   0   0   0  96   0   4   0   0   0   0   0   0   0  0.51 inf
 217 T    1  2 -2 -2 -1 -1 -1 -2 -2  2  0  0  0 -2 -2  0  1 -3 -2  2   15  25   0   0   0   0   0   0   0  12   0   0   0   0   0   0  12   0   0  35  0.15 inf
 282 C   -1 -4 -3 -4  9 -3 -4 -3 -3 -2 -2 -3 -2 -3 -3 -1 -1 -3 -3 -1    0   0   0   0 100   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  1.41 inf
 330 R   -1  3 -2 -3 -2  0 -2 -3 -2  1  1  0  5 -1 -3 -2 -1 -2 -2  1    0  30   0   0   0   0   0   0   0  14   6   0  47   0   0   0   0   0   0   3  0.31 inf

This is the PSSM for all homologous sequences:

Last position-specific scoring matrix computed, weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V   A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
  53 V    0 -2  1 -2 -3  1  0  0  2  0  0  0 -1  1 -3  0  0  4  1  0    7   0   7   0   0   7   7   7   7   7   7   7   0   7   0   7   7   7   7   7  0.11 inf
  63 H    0  1  0 -1  0  1  0  1  0 -2  0  1 -2 -1  0  1  0 -3 -2 -1    6   7   5   3   3   6   5  10   3   0  10  14   0   3   3  15   4   0   0   4  0.09 inf
  67 R   -2  3 -1 -1 -4  1 -1 -4  1  1 -1  0  1 -3 -1  0  0  4  0  0    2  20   3   3   0   6   2   0   4   9   6   5   4   0   3   7   8   8   4   6  0.26 inf
  97 M   -1 -1 -1 -1  1 -1 -1 -1 -3  1  0 -1  2  0 -1  0  1  2  1  2    4   3   3   4   3   3   3   3   0   7  10   3   7   3   3   5  12   3   5  16  0.10 inf
 130 N   -1  2  2  2 -4  0  1  2  1 -3 -1  0  0 -3  0 -1 -1  0 -3 -2    3  12  12  13   0   2   9  17   4   0   7   5   2   0   5   2   3   2   0   2  0.23 inf
 168 E   -1  0  0  1  2  1  0 -2 -3  1  2  2 -1  1 -3 -2 -1 -3 -2 -2    3   3   5   7   5   5   4   2   0   9  25  19   0   8   0   0   3   0   0   0  0.18 inf
 183 L   -1 -4 -1 -4  3 -1 -4  0 -4  1  4 -4  2 -3  0  0 -3 -4 -4 -1    4   0   3   0   6   3   0   9   0   7  47   0   6   0   4   9   0   0   0   2  0.50 inf
 217 T    1  0  0  0 -2  0  0  0 -2  0  0 -1  1  0  0  0  0 -3 -2  0   15   7   4   5   0   5   5   9   0   6  12   0   6   5   5   4   5   0   0   7  0.04 inf
 282 C    0 -4  1 -4  8  1 -4 -4 -4 -3 -1 -4 -3 -3 -4  1 -3 -4  1  1    7   0   7   0  46   8   0   0   0   0   7   0   0   0   0  10   0   0   6  10  1.16 inf
 330 R    0 -1 -2 -3  2 -1 -3  3 -1 -1  0 -1  0 -1  1 -1  0  4  0  0    6   4   1   0   6   2   0  25   1   2   8   5   3   2   6   2   6   7   3   9  0.22 inf

SIFT

SIFT was executed from the web interface with default parameters.

Val53Met

Substitution at pos 53 from V to M is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

His63Asp

Substitution at pos 63 from H to D is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

Arg67His

Substitution at pos 67 from R to H is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

Met97Ile

Substitution at pos 97 from M to I is predicted to be TOLERATED with a score of 0.53.
   Median sequence conservation: 3.03
   Sequences represented at this position:395

Asn130Ser

Substitution at pos 130 from N to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

Glu168Gln

Substitution at pos 168 from E to Q is predicted to AFFECT PROTEIN FUNCTION with a score of 0.01.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

Leu183Pro

Substitution at pos 183 from L to P is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

Thr217Ile

Substitution at pos 217 from T to I is predicted to be TOLERATED with a score of 0.92.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

Cys282Tyr

Substitution at pos 282 from C to Y is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.03
   Sequences represented at this position:396

Arg330Met

Substitution at pos 330 from R to M is predicted to be TOLERATED with a score of 0.11.
   Median sequence conservation: 3.01
   Sequences represented at this position:255
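The SIFT result lines above follow a fixed pattern, so they can be collected into a table with a small parser. The following is a sketch that assumes exactly the wording shown in this section; the names `SIFT_LINE` and `parse_sift` are hypothetical.

```python
import re

# Pattern for the result sentence as it appears in the SIFT output above.
SIFT_LINE = re.compile(
    r"Substitution at pos (\d+) from (\w) to (\w) is predicted to "
    r"(AFFECT PROTEIN FUNCTION|be TOLERATED) with a score of ([\d.]+)\."
)


def parse_sift(line: str):
    """Extract (position, wild type, mutant, damaging?, score) from one SIFT result line."""
    m = SIFT_LINE.search(line)
    if m is None:
        raise ValueError(f"not a SIFT result line: {line!r}")
    pos, wt, mut, verdict, score = m.groups()
    return int(pos), wt, mut, verdict == "AFFECT PROTEIN FUNCTION", float(score)


lines = [
    "Substitution at pos 130 from N to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.02.",
    "Substitution at pos 330 from R to M is predicted to be TOLERATED with a score of 0.11.",
]
for line in lines:
    print(parse_sift(line))
```

In the results above, every substitution reported as AFFECT PROTEIN FUNCTION has a score below 0.05 and every TOLERATED one has a score above it, which agrees with the cutoff of 0.05 that SIFT uses by default.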

PolyPhen-2

PolyPhen-2 was executed from the web interface using default parameters.

Val53Met

HumDiv
This mutation is predicted to be probably damaging with a score of 0.998 (sensitivity: 0.27; specificity: 0.99)
HumVar
This mutation is predicted to be possibly damaging with a score of 0.841 (sensitivity: 0.73; specificity: 0.88)

His63Asp

HumDiv
This mutation is predicted to be benign with a score of 0.142 (sensitivity: 0.92; specificity: 0.86)
HumVar
This mutation is predicted to be benign with a score of 0.161 (sensitivity: 0.89; specificity: 0.72)

Arg67His

HumDiv
This mutation is predicted to be benign with a score of 0.145 (sensitivity: 0.92; specificity: 0.86)
HumVar
This mutation is predicted to be benign with a score of 0.031 (sensitivity: 0.94; specificity: 0.59)

Met97Ile

HumDiv
This mutation is predicted to be possibly damaging with a score of 0.575 (sensitivity: 0.88; specificity: 0.91)
HumVar
This mutation is predicted to be benign with a score of 0.114 (sensitivity: 0.90; specificity: 0.69)

Asn130Ser

HumDiv
This mutation is predicted to be possibly damaging with a score of 0.883 (sensitivity: 0.82; specificity: 0.94)
HumVar
This mutation is predicted to be benign with a score of 0.282 (sensitivity: 0.87; specificity: 0.76)

Glu168Gln

HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 0.980 (sensitivity: 0.57; specificity: 0.94)

Leu183Pro

HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)

Thr217Ile

HumDiv
This mutation is predicted to be benign with a score of 0.118 (sensitivity: 0.93; specificity: 0.86)
HumVar
This mutation is predicted to be benign with a score of 0.097 (sensitivity: 0.91; specificity: 0.68)

Cys282Tyr

HumDiv
This mutation is predicted to be probably damaging with a score of 0.961 (sensitivity: 0.78; specificity: 0.95)
HumVar
This mutation is predicted to be possibly damaging with a score of 0.667 (sensitivity: 0.79; specificity: 0.84)

Arg330Met

HumDiv
This mutation is predicted to be probably damaging with a score of 0.997 (sensitivity: 0.41; specificity: 0.98)
HumVar
This mutation is predicted to be possibly damaging with a score of 0.781 (sensitivity: 0.76; specificity: 0.87)
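Because HumDiv and HumVar do not always agree, it can help to list the mutations where the two models assign different classes. The sketch below uses the classes copied from the results above; the dictionary `POLYPHEN` and the function `discordant` are hypothetical helper names.

```python
# (HumDiv class, HumVar class) copied from the PolyPhen-2 results above.
POLYPHEN = {
    "Val53Met": ("probably damaging", "possibly damaging"),
    "His63Asp": ("benign", "benign"),
    "Arg67His": ("benign", "benign"),
    "Met97Ile": ("possibly damaging", "benign"),
    "Asn130Ser": ("possibly damaging", "benign"),
    "Glu168Gln": ("probably damaging", "probably damaging"),
    "Leu183Pro": ("probably damaging", "probably damaging"),
    "Thr217Ile": ("benign", "benign"),
    "Cys282Tyr": ("probably damaging", "possibly damaging"),
    "Arg330Met": ("probably damaging", "possibly damaging"),
}


def discordant(results):
    """Return the mutations whose HumDiv and HumVar classes differ, in input order."""
    return [name for name, (humdiv, humvar) in results.items() if humdiv != humvar]


print(discordant(POLYPHEN))
```

For this data set the two models differ for Val53Met, Met97Ile, Asn130Ser, Cys282Tyr and Arg330Met; in each of these cases HumVar assigns the milder class.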

MutationTaster

Val53Met His63Asp Arg67His Met97Ile Asn130Ser Glu168Gln Leu183Pro Thr217Ile Cys282Tyr Arg330Met

SNAP

Val53Met

Prediction 
disease causing

Model: simple_aae, prob: 0.974610381496817 (explain)

Summary 	
   amino acid sequence changed
   known disease mutation at this position (HGMD CM994469)
   protein features (might be) affected
   splice site changes

His63Asp

Prediction	
disease causing

Model: simple_aae, prob: 0.974610381496817 (explain)

Summary
   amino acid sequence changed
   known disease mutation at this position (HGMD CM994469)
   protein features (might be) affected
   splice site changes

Arg67His

Prediction
polymorphism

Model: simple_aae, prob: 0.999999997930159 (explain)

Summary 	
   amino acid sequence changed
   listed as SNP
   protein features (might be) affected
   splice site changes

Met97Ile

Prediction	
disease causing

Model: simple_aae, prob: 0.943409356836766 (explain)

Summary 	
   amino acid sequence changed
   protein features (might be) affected

Asn130Ser

Prediction	
polymorphism

Model: simple_aae, prob: 0.999999996637944 (explain)

Summary 	
   amino acid sequence changed
   listed as SNP
   protein features (might be) affected
   splice site changes

Glu168Gln

Prediction
polymorphism

Model: simple_aae, prob: 0.707489599782817 (explain)

Summary
   amino acid sequence changed
   known disease mutation at this position (HGMD CM004106)
   known disease mutation at this position (HGMD CM004810)
   protein features (might be) affected
   splice site changes

Leu183Pro

Prediction
disease causing

Model: simple_aae, prob: 0.999999979100498 (explain)

Summary
   amino acid sequence changed
   known disease mutation at this position (HGMD CM081301)
   protein features (might be) affected

Thr217Ile

Prediction
polymorphism

Model: simple_aae, prob: 0.999999999993365 (explain)

Summary
   amino acid sequence changed
   listed as SNP
   protein features (might be) affected
   splice site changes

Cys282Tyr

Prediction
disease causing

Model: simple_aae, prob: 0.999999999736277 (classification due to ClinVar, real probability is shown anyway) (explain)

Summary 	
   amino acid sequence changed
   heterozygous in TGP
   known disease mutation at this position (HGMD CM004391)
   known disease mutation at this position (HGMD CM960828)
   known disease mutation: rs1800562 (pathogenic)
   protein features (might be) affected

Arg330Met

Prediction
disease causing

Model: simple_aae, prob: 8.23173030693237e-06 (classification due to ClinVar, real probability is shown anyway) (explain)

Summary 	
   amino acid sequence changed
   known disease mutation at this position (HGMD CM990722)
   known disease mutation: rs111033558 (pathogenic)
   protein features (might be) affected
   splice site changes