Difference between revisions of "Lab Journal - Task 8 (PAH)"

From Bioinformatikpedia

Revision as of 14:55, 27 June 2013

Mutation dataset

To generate the mutation dataset, we used the following five SNPs from the HGMD database (25th June 2013): <figtable id="mutds_hgmd">

Missense mutations (SNPs) from HGMD
Accession Number   Codon change   Sequence position   Amino acid change   Codon number   Disease                  Reference
CM000542           CAG⇒CTG        59                  Gln(Q)-Leu(L)       20             Hyperphenylalaninaemia   Hennermann (2000) Hum Mutat 15, 254
CM045080           GGT⇒AGT        307                 Gly(G)-Ser(S)       103            Phenylketonuria          Lee (2004) J Hum Genet 49, 617
CM910286           GCC⇒GTC        776                 Ala(A)-Val(V)       259            Phenylketonuria          Labrune (1991) Am J Hum Genet 48, 1115
CM010981           AAG⇒ACG        1022                Lys(K)-Thr(T)       341            Phenylketonuria          Tyfield (1997) Am J Hum Genet 60, 388
CM090791           CCA⇒CAA        1247                Pro(P)-Gln(Q)       416            Hyperphenylalaninaemia   Dobrowolski (2009) J Inherit Metab Dis 32, 10

</figtable>


Furthermore, the following five mutations from dbSNP were added (25th June 2013): <figtable id="mutds_dbSNP">

Missense mutations (SNPs) from dbSNP
Reference SNP   Codon change   Sequence position   Amino acid change   Codon number   Disease
rs199475569     CAC⇒AAC        190                 His(H)-Asn(N)       64             ?
rs199475681     AGA⇒ATA        368                 Arg(R)-Ile(I)       123            ?
rs62508752      ACA⇒CCA        796                 Thr(T)-Ala(A)       266            Phenylketonuria
rs199475695     TTT⇒TCT        1175                Phe(F)-Ser(S)       392            ?
rs199475696     ATT⇒ACT        1262                Ile(I)-Thr(T)       421            ?

</figtable>
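
In both tables, "Sequence position" is a nucleotide position in the coding sequence and "Codon number" is the affected residue. The sketch below shows how the two columns relate, assuming 1-based positions counted from the first base of the start codon (an assumption on our side, although it agrees with all ten rows). The helper name `codon_number` is hypothetical.

```python
def codon_number(cds_position: int) -> int:
    """Return the 1-based codon containing a 1-based coding-sequence position."""
    if cds_position < 1:
        raise ValueError("coding-sequence positions are 1-based")
    # Positions 1-3 belong to codon 1, positions 4-6 to codon 2, and so on.
    return (cds_position - 1) // 3 + 1


# (sequence position, codon number) pairs copied from the two tables above.
TABLE_ROWS = [
    (59, 20), (307, 103), (776, 259), (1022, 341), (1247, 416),   # HGMD
    (190, 64), (368, 123), (796, 266), (1175, 392), (1262, 421),  # dbSNP
]

for position, codon in TABLE_ROWS:
    # Every row of the dataset should satisfy the mapping.
    assert codon_number(position) == codon, (position, codon)
    print(f"c.{position} lies in codon {codon_number(position)}")
```

A check like this is a cheap way to catch copy errors when positions are transferred by hand from the database pages.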

Analyze SNPs

PSSM

We created the PSSM using the standard parameters of PSI-BLAST and five iterations.

         A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 20 Q   -2  1  2  3 -4  2  2 -1  2 -3 -1  2 -2 -3 -3 -1 -1 -4 -3 -1    0  7 13 17  0 14 11  3  5  0  6 16  0  0  0  0  2  0  0  6 0.32 inf
 64 H   -5  2 -1 -3 -7  0  0 -6  9 -7 -6  2 -3 -2 -6 -2 -5 -6  1 -7    0 10  2  1  0  3  6  0 58  0  0 11  1  1  0  2  0  0  5  0 1.89 inf  
103 G    0  0  0  0  2  0  0 -1  1  0 -1  0  1  0  0  0  0 -2  1  0    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  0  5  5 0.03 inf
123 R   -3  6 -1 -3 -3  0 -2 -3  0  0 -1  3 -1 -4 -3 -1  1 -4 -3 -3    1 49  1  0  1  3  0  1  2  7  4 16  1  0  0  2 10  0  0  0 0.72 inf
259 A    6 -2 -2 -3 -3 -1 -2 -1 -3 -2 -2 -2 -1 -4 -3  1 -2 -4 -2 -2   70  2  1  0  0  3  1  3  0  2  4  1  1  0  0  9  1  0  1  1 0.93 inf
266 T    3 -1 -1 -3 -3 -3 -3  0 -3 -3 -3  0 -3 -4 -3  0  5 -4  0 -2   29  3  1  0  0  0  0  6  0  0  1  6  0  0  0  5 43  0  3  1 0.70 inf
341 K   -2  4 -1 -2 -1  1  1 -3 -1 -3 -1  4 -1  0 -2 -1 -1 -1 -2 -2    0 28  0  0  2  2 10  0  0  0  8 39  1  7  0  1  1  1  0  1 0.44 inf 
392 F   -4 -5 -5 -5  0 -5 -5 -3 -3  3  2 -5  1  6 -5 -4 -3 -2  2  1    0  0  0  0  2  0  0  2  0 16 17  0  2 49  0  0  0  0  6  7 0.98 inf
416 P    0 -4  1 -3 -5 -3 -2 -3 -2 -2 -5 -2 -4 -1  7 -1 -3 -6 -4 -4    9  0  9  0  0  0  2  1  1  3  0  2  0  4 66  4  0  0  0  0 1.60 inf
421 I   -1 -4 -4 -4 -3 -1 -4 -1 -4  4  1 -3  0 -2 -4 -3 -2 -4 -3  4    5  0  0  0  0  4  0  6  0 32  9  0  1  0  0  0  0  0  0 41 0.62 inf
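
To read the excerpt with respect to our mutations, the wild-type and mutant scores can be looked up per row. The following is a minimal sketch: it assumes the usual PSI-BLAST ASCII layout, in which the first 20 numeric columns are the log-odds scores and the next 20 are the weighted observed percentages. The names `parse_rows` and `score_mutation` are hypothetical, and the rows are copied from the excerpt above (log-odds part only).

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # column order of the PSSM header above

# Log-odds columns of the ten rows shown above.
PSSM_ROWS = """
20 Q -2 1 2 3 -4 2 2 -1 2 -3 -1 2 -2 -3 -3 -1 -1 -4 -3 -1
64 H -5 2 -1 -3 -7 0 0 -6 9 -7 -6 2 -3 -2 -6 -2 -5 -6 1 -7
103 G 0 0 0 0 2 0 0 -1 1 0 -1 0 1 0 0 0 0 -2 1 0
123 R -3 6 -1 -3 -3 0 -2 -3 0 0 -1 3 -1 -4 -3 -1 1 -4 -3 -3
259 A 6 -2 -2 -3 -3 -1 -2 -1 -3 -2 -2 -2 -1 -4 -3 1 -2 -4 -2 -2
266 T 3 -1 -1 -3 -3 -3 -3 0 -3 -3 -3 0 -3 -4 -3 0 5 -4 0 -2
341 K -2 4 -1 -2 -1 1 1 -3 -1 -3 -1 4 -1 0 -2 -1 -1 -1 -2 -2
392 F -4 -5 -5 -5 0 -5 -5 -3 -3 3 2 -5 1 6 -5 -4 -3 -2 2 1
416 P 0 -4 1 -3 -5 -3 -2 -3 -2 -2 -5 -2 -4 -1 7 -1 -3 -6 -4 -4
421 I -1 -4 -4 -4 -3 -1 -4 -1 -4 4 1 -3 0 -2 -4 -3 -2 -4 -3 4
"""

MUTATIONS = ["Q20L", "H64N", "G103S", "R123I", "A259V",
             "T266A", "K341T", "F392S", "P416Q", "I421T"]


def parse_rows(text):
    """Map position -> (wild-type residue, {residue: log-odds score})."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()
        scores = [int(value) for value in fields[2:22]]
        if len(scores) != 20:
            raise ValueError(f"expected 20 scores in row: {line!r}")
        table[int(fields[0])] = (fields[1], dict(zip(AA_ORDER, scores)))
    return table


def score_mutation(table, mutation):
    """Return (wild-type score, mutant score) for a label such as 'Q20L'."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    row_residue, scores = table[position]
    if row_residue != wild_type:
        raise ValueError(f"{mutation}: PSSM row {position} is {row_residue}")
    return scores[wild_type], scores[mutant]


table = parse_rows(PSSM_ROWS)
for mutation in MUTATIONS:
    wt_score, mut_score = score_mutation(table, mutation)
    print(f"{mutation}: wild type {wt_score:+d}, mutant {mut_score:+d}")
```

A strongly negative mutant score next to a high wild-type score (for example F392S) marks a substitution that is rarely seen at that position in the aligned homologs.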

SIFT

  • Q20L
Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16.
   Median sequence conservation: 3.20
   Sequences represented at this position: 42
  • H64N
Substitution at pos 64 from H to N is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • G103S
Substitution at pos 103 from G to S is predicted to be TOLERATED with a score of 0.90.
   Median sequence conservation: 3.02
   Sequences represented at this position: 72
  • R123I
Substitution at pos 123 from R to I is predicted to be TOLERATED with a score of 0.05.
   Median sequence conservation: 3.01
   Sequences represented at this position: 76
  • A259V
Substitution at pos 259 from A to V is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • T266A
Substitution at pos 266 from T to A is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • K341T
Substitution at pos 341 from K to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • F392S
Substitution at pos 392 from F to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • P416Q
Substitution at pos 416 from P to Q is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.00
   Sequences represented at this position: 74
  • I421T
Substitution at pos 421 from I to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.00
   Sequences represented at this position: 74
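
Each SIFT result line names its own substitution, so the mutation label, the call, and the score can be recovered from the text itself. The sketch below assumes the line wording shown above and is not an official SIFT parser; the names `SIFT_LINE` and `parse_sift` are hypothetical.

```python
import re

# Matches one SIFT result sentence in the wording shown above.
SIFT_LINE = re.compile(
    r"Substitution at pos (?P<pos>\d+) from (?P<wt>[A-Z]) to (?P<mut>[A-Z]) "
    r"is predicted to (?P<call>be TOLERATED|AFFECT PROTEIN FUNCTION) "
    r"with a score of (?P<score>\d+\.\d+)\."
)


def parse_sift(text):
    """Map mutation label -> (affects_function, score)."""
    results = {}
    for match in SIFT_LINE.finditer(text):
        label = f"{match['wt']}{match['pos']}{match['mut']}"
        affects_function = match["call"] == "AFFECT PROTEIN FUNCTION"
        results[label] = (affects_function, float(match["score"]))
    return results


# Two result lines copied from the SIFT output above.
EXAMPLE = """
Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16.
Substitution at pos 64 from H to N is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
"""

for label, (affects_function, score) in parse_sift(EXAMPLE).items():
    verdict = "affects function" if affects_function else "tolerated"
    print(f"{label}: {verdict} (score {score:.2f})")
```

Building the label from the sentence avoids relying on the bullet headings when collecting the results.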

PolyPhen2

  • A259V
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
  • R123I
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.807 (sensitivity: 0.84; specificity: 0.93)
HumVar
This mutation is predicted to be possibly damaging with a score of 0.582 (sensitivity: 0.81; specificity: 0.83)
  • Q20L
HumDiv
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
HumVar
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
  • G103S
HumDiv
This mutation is predicted to be benign with a score of 0.003 (sensitivity: 0.98; specificity: 0.44)
HumVar
This mutation is predicted to be benign with a score of 0.006 (sensitivity: 0.97; specificity: 0.45)
  • H64N
HumDiv
This mutation is predicted to be probably damaging with a score of 0.993 (sensitivity: 0.70; specificity: 0.97)
HumVar
This mutation is predicted to be probably damaging with a score of 0.962 (sensitivity: 0.62; specificity: 0.92)
  • I421T
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.667 (sensitivity: 0.86; specificity: 0.91)
HumVar
This mutation is predicted to be probably damaging with a score of 0.913 (sensitivity: 0.69; specificity: 0.90)
  • K341T
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.36; specificity: 0.97)
  • F392S
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
  • P416Q
HumDiv
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.55; specificity: 0.98)
HumVar
This mutation is predicted to be probably damaging with a score of 0.985 (sensitivity: 0.55; specificity: 0.94)
  • T266A
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
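
To see where the two PolyPhen2 models disagree, the output can be collected per mutation and compared. The following is a minimal sketch that assumes the bullet / model name / sentence layout used above; `parse_polyphen2` and `discordant` are hypothetical helper names.

```python
import re

PREDICTION = re.compile(
    r"predicted to be (?P<call>benign|possibly damaging|probably damaging) "
    r"with a score of (?P<score>[01]\.\d+)"
)


def parse_polyphen2(text):
    """Map mutation -> {model: (call, score)} from the layout shown above."""
    results = {}
    mutation = model = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("•"):
            # A bullet starts a new mutation block.
            mutation = line.lstrip("• ").strip()
            results[mutation] = {}
            model = None
        elif line in ("HumDiv", "HumVar"):
            model = line
        else:
            match = PREDICTION.search(line)
            if match and mutation and model:
                results[mutation][model] = (match["call"], float(match["score"]))
    return results


def discordant(results):
    """Return mutations whose HumDiv and HumVar calls differ."""
    return [
        mutation
        for mutation, calls in results.items()
        if len({call for call, _score in calls.values()}) > 1
    ]


# Two entries copied from the PolyPhen2 output above.
EXAMPLE = """
  • Q20L
HumDiv
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
HumVar
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
  • I421T
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.667 (sensitivity: 0.86; specificity: 0.91)
HumVar
This mutation is predicted to be probably damaging with a score of 0.913 (sensitivity: 0.69; specificity: 0.90)
"""

print(discordant(parse_polyphen2(EXAMPLE)))
```

In the output above, I421T is the only mutation for which HumDiv and HumVar assign different classes.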

SNAP

  • A259V
  • R123I
  • Q20L
  • G103S
  • H64N
  • I421T
  • K341T
  • F392S
  • P416Q
  • T266A

MutationTaster

Gene: ENSG00000171759
Transcript: ENST00000553106
Of the affected protein features, only those annotated as "lost" are reported, not those annotated as "might get lost".

  • A259V - C776T
Prediction: disease causing, Model: simple_aae, prob: ~1 (classification due to ClinVar)
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM910286)
•	known disease mutation: rs118203921 (pathogenic)
•	protein features (might be) affected (HELIX: 251-259, lost)
•	splice site changes (Acc marginally increased)
  • R123I - G368T
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected
•	splice site changes (Acc increased)
  • Q20L - A59T
Prediction: disease causing, Model: simple_aae, prob: ~0.95
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM000542)
•	protein features (might be) affected
•	splice site changes (Donor lost, acc marginally increased)
  • G103S - G307A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM045080)
•	protein features (might be) affected (Domain ACT: lost)
•	splice site changes (Donor gained)
  • H64N - C190A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CD993066)
•	protein features (might be) affected (Domain ACT: lost)
  • I421T - T1262C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected (STRAND: 420-424, lost)
  • K341T - A1022C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM010980)
•	known disease mutation at this position (HGMD CM010981)
•	protein features (might be) affected (STRAND: 339-342, lost)
  • F392S - T1175C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected (HELIX: 392-403, lost)
•	splice site changes (Donor increased)
  • P416Q - C1247A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM090791)
•	protein features (might be) affected (TURN: 416-419, lost)
•	splice site changes (Acc marginally increased, donor increased, donor gained)
  • T266A - A796C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected 
•	splice site changes (Acc increased, acc gained, donor increased)
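
The nucleotide labels used in the MutationTaster headings (for example C776T for A259V) follow from the codon change and the codon number in the dataset tables. A minimal sketch, assuming single-base substitutions and 1-based coding-sequence positions; the helper `cdna_change` is hypothetical and not part of MutationTaster.

```python
def cdna_change(codon_number, ref_codon, alt_codon):
    """Build a label such as 'C776T' from a codon number and a codon change."""
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must have exactly three bases")
    changed = [i for i in range(3) if ref_codon[i] != alt_codon[i]]
    if len(changed) != 1:
        raise ValueError("expected exactly one changed base")
    offset = changed[0]
    # First base of codon n sits at coding-sequence position 3 * (n - 1) + 1.
    position = (codon_number - 1) * 3 + offset + 1
    return f"{ref_codon[offset]}{position}{alt_codon[offset]}"


# (protein label, codon number, reference codon, mutant codon) from the tables.
DATASET = [
    ("Q20L", 20, "CAG", "CTG"),
    ("G103S", 103, "GGT", "AGT"),
    ("A259V", 259, "GCC", "GTC"),
    ("K341T", 341, "AAG", "ACG"),
    ("P416Q", 416, "CCA", "CAA"),
    ("H64N", 64, "CAC", "AAC"),
    ("R123I", 123, "AGA", "ATA"),
    ("T266A", 266, "ACA", "CCA"),
    ("F392S", 392, "TTT", "TCT"),
    ("I421T", 421, "ATT", "ACT"),
]

for label, codon, ref, alt in DATASET:
    print(f"{label} - {cdna_change(codon, ref, alt)}")
```

The printed labels can be compared against the headings above to confirm that the variant submitted to MutationTaster matches the dataset entry.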