Lab Journal - Task 8 (PAH)

From Bioinformatikpedia

Mutation dataset

For the mutation dataset, we used the following five SNPs from the HGMD database (25th June 2013): <figtable id="mutds_hgmd">

Missense mutations (SNPs) from HGMD
Accession number | Codon change | Nucleotide position | Amino acid change | Codon number | Disease | Reference
CM000542 | CAG⇒CTG | 59 | Gln(Q)-Leu(L) | 20 | Hyperphenylalaninaemia | Hennermann (2000) Hum Mutat 15, 254
CM045080 | GGT⇒AGT | 307 | Gly(G)-Ser(S) | 103 | Phenylketonuria | Lee (2004) J Hum Genet 49, 617
CM910286 | GCC⇒GTC | 776 | Ala(A)-Val(V) | 259 | Phenylketonuria | Labrune (1991) Am J Hum Genet 48, 1115
CM010981 | AAG⇒ACG | 1022 | Lys(K)-Thr(T) | 341 | Phenylketonuria | Tyfield (1997) Am J Hum Genet 60, 388
CM090791 | CCA⇒CAA | 1247 | Pro(P)-Gln(Q) | 416 | Hyperphenylalaninaemia | Dobrowolski (2009) J Inherit Metab Dis 32, 10
Details of the five SNPs taken from HGMD. Column one gives the HGMD accession number, column two the nucleotide substitution and the resulting codon change, and column three the nucleotide position of the substitution. Columns four and five give the amino acid exchange and its position in the protein sequence. Column six names the associated disease, and the last column gives the literature reference.

</figtable>
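The codon number in the table follows directly from the nucleotide position: codon = ⌊(p − 1) / 3⌋ + 1, and the substituted base sits at position (p − 1) mod 3 + 1 within that codon. A minimal Python sketch of this arithmetic, assuming (as the table's numbers suggest) that p is a 1-based position counted from the first base of the coding sequence; the helper name `codon_position` is our own:

```python
def codon_position(nt_pos):
    """Return (codon number, base within codon) for a 1-based CDS position.

    Assumes position 1 is the first base of the start codon.
    """
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1

# (nucleotide position, codon number) transcribed from the HGMD table above.
HGMD = {
    "CM000542": (59, 20),
    "CM045080": (307, 103),
    "CM910286": (776, 259),
    "CM010981": (1022, 341),
    "CM090791": (1247, 416),
}

for accession, (nt_pos, codon) in HGMD.items():
    number, base = codon_position(nt_pos)
    # The computed codon number should agree with the "Codon number" column.
    assert number == codon, accession
    print(f"{accession}: codon {number}, base {base} of the codon")
```

For all five entries the computed codon numbers match the table; only G103S changes the first base of its codon, the others change the middle base.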


Furthermore, the following five mutations from dbSNP were added (25th June 2013): <figtable id="mutds_dbSNP">

Missense mutations (SNPs) from dbSNP
Reference SNP ID | Codon change | Nucleotide position | Amino acid change | Codon number | Disease
rs199475681 | AGA⇒ATA | 368 | Arg(R)-Ile(I) | 123 | ?
rs192592111 | CAG⇒CAT | 516 | Gln(Q)-His(H) | 172 | ?
rs62508752 | ACA⇒CCA | 796 | Thr(T)-Ala(A) | 266 | Phenylketonuria
rs199475695 | TTT⇒TCT | 1175 | Phe(F)-Ser(S) | 392 | ?
rs199475696 | ATT⇒ACT | 1262 | Ile(I)-Thr(T) | 421 | ?
Details of the five SNPs taken from dbSNP. Column one gives the reference SNP ID, column two the nucleotide substitution and the resulting codon change, and column three the nucleotide position of the substitution. Columns four and five give the amino acid exchange and its position in the protein sequence. The last column gives the associated disease, if annotated; entries marked with '?' have no disease annotation in dbSNP.

</figtable>
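A quick sanity check for both tables is to translate each reference and mutated codon with the standard genetic code and compare the result with the stated amino acid change. The following is a minimal Python sketch of such a check; the names `CODON_TABLE`, `VARIANTS` and `codon_change_matches` are our own, and the codon and amino acid data are transcribed from the two tables above:

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons are enumerated in TCAG order,
# so the i-th codon from product("TCAG", repeat=3) maps to the i-th letter.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# (ID, reference codon, mutated codon, amino acid change) from both tables.
VARIANTS = [
    ("CM000542", "CAG", "CTG", "Q20L"),
    ("CM045080", "GGT", "AGT", "G103S"),
    ("CM910286", "GCC", "GTC", "A259V"),
    ("CM010981", "AAG", "ACG", "K341T"),
    ("CM090791", "CCA", "CAA", "P416Q"),
    ("rs199475681", "AGA", "ATA", "R123I"),
    ("rs192592111", "CAG", "CAT", "Q172H"),
    ("rs62508752", "ACA", "CCA", "T266A"),
    ("rs199475695", "TTT", "TCT", "F392S"),
    ("rs199475696", "ATT", "ACT", "I421T"),
]

def codon_change_matches(ref, alt, change):
    """True if ref/alt translate to the first/last letter of e.g. 'Q20L'."""
    return CODON_TABLE[ref] == change[0] and CODON_TABLE[alt] == change[-1]

mismatches = [vid for vid, ref, alt, change in VARIANTS
              if not codon_change_matches(ref, alt, change)]
print(mismatches)  # ['rs62508752']
```

The check flags only rs62508752: CCA encodes proline, so ACA⇒CCA corresponds to Thr→Pro, whereas Thr→Ala would require ACA⇒GCA. Either the codon change or the amino acid annotation of this row should therefore be re-checked against dbSNP.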

Analyze SNPs

PSSM

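We computed the PSSM with PSI-BLAST (default parameters, five iterations). The excerpt below shows the rows of the ten mutated positions, each labelled with its position and wild-type residue; the left block contains the log-odds scores and the right block the weighted observed percentages. At two positions the wild-type residue is not the best-scoring amino acid and is therefore not conserved: at Q20, Asp scores higher than Gln (3 vs. 2) and both Asp and Lys are observed more often (17% and 16% vs. 14%); at G103, Cys scores higher than Gly (2 vs. -1), and the column carries almost no information (0.03).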

Last position-specific scoring matrix computed, weighted observed percentages rounded down,
information per position, and relative weight of gapless real matches to pseudocounts
         A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 20 Q   -2  1  2  3 -4  2  2 -1  2 -3 -1  2 -2 -3 -3 -1 -1 -4 -3 -1    0  7 13 17  0 14 11  3  5  0  6 16  0  0  0  0  2  0  0  6 0.32 inf
103 G    0  0  0  0  2  0  0 -1  1  0 -1  0  1  0  0  0  0 -2  1  0    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  0  5  5 0.03 inf
123 R   -3  6 -1 -3 -3  0 -2 -3  0  0 -1  3 -1 -4 -3 -1  1 -4 -3 -3    1 49  1  0  1  3  0  1  2  7  4 16  1  0  0  2 10  0  0  0 0.72 inf
172 Q   -1  0  0  3 -4  5  2 -1  0 -3 -3  1 -1 -4 -3 -1  0 -4 -3 -1    5  4  2 16  0 29 11  5  1  1  2  9  1  0  0  3  6  0  0  4 0.43 inf
259 A    6 -2 -2 -3 -3 -1 -2 -1 -3 -2 -2 -2 -1 -4 -3  1 -2 -4 -2 -2   70  2  1  0  0  3  1  3  0  2  4  1  1  0  0  9  1  0  1  1 0.93 inf
266 T    3 -1 -1 -3 -3 -3 -3  0 -3 -3 -3  0 -3 -4 -3  0  5 -4  0 -2   29  3  1  0  0  0  0  6  0  0  1  6  0  0  0  5 43  0  3  1 0.70 inf
341 K   -2  4 -1 -2 -1  1  1 -3 -1 -3 -1  4 -1  0 -2 -1 -1 -1 -2 -2    0 28  0  0  2  2 10  0  0  0  8 39  1  7  0  1  1  1  0  1 0.44 inf 
392 F   -4 -5 -5 -5  0 -5 -5 -3 -3  3  2 -5  1  6 -5 -4 -3 -2  2  1    0  0  0  0  2  0  0  2  0 16 17  0  2 49  0  0  0  0  6  7 0.98 inf
416 P    0 -4  1 -3 -5 -3 -2 -3 -2 -2 -5 -2 -4 -1  7 -1 -3 -6 -4 -4    9  0  9  0  0  0  2  1  1  3  0  2  0  4 66  4  0  0  0  0 1.60 inf
421 I   -1 -4 -4 -4 -3 -1 -4 -1 -4  4  1 -3  0 -2 -4 -3 -2 -4 -3  4    5  0  0  0  0  4  0  6  0 32  9  0  1  0  0  0  0  0  0 41 0.62 inf
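The conservation reading above can be made explicit by parsing the log-odds block and comparing each wild-type score with the row maximum. A minimal Python sketch, assuming the standard column order of PSI-BLAST ASCII PSSMs (ARNDCQEGHILKMFPSTWYV); the helper names are our own, and only the log-odds part of the ten rows above is embedded:

```python
# Column order of the 20 log-odds scores in a PSI-BLAST ASCII PSSM.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"

# Log-odds part of the ten rows shown above (percentages and statistics omitted).
PSSM_ROWS = """
 20 Q   -2  1  2  3 -4  2  2 -1  2 -3 -1  2 -2 -3 -3 -1 -1 -4 -3 -1
103 G    0  0  0  0  2  0  0 -1  1  0 -1  0  1  0  0  0  0 -2  1  0
123 R   -3  6 -1 -3 -3  0 -2 -3  0  0 -1  3 -1 -4 -3 -1  1 -4 -3 -3
172 Q   -1  0  0  3 -4  5  2 -1  0 -3 -3  1 -1 -4 -3 -1  0 -4 -3 -1
259 A    6 -2 -2 -3 -3 -1 -2 -1 -3 -2 -2 -2 -1 -4 -3  1 -2 -4 -2 -2
266 T    3 -1 -1 -3 -3 -3 -3  0 -3 -3 -3  0 -3 -4 -3  0  5 -4  0 -2
341 K   -2  4 -1 -2 -1  1  1 -3 -1 -3 -1  4 -1  0 -2 -1 -1 -1 -2 -2
392 F   -4 -5 -5 -5  0 -5 -5 -3 -3  3  2 -5  1  6 -5 -4 -3 -2  2  1
416 P    0 -4  1 -3 -5 -3 -2 -3 -2 -2 -5 -2 -4 -1  7 -1 -3 -6 -4 -4
421 I   -1 -4 -4 -4 -3 -1 -4 -1 -4  4  1 -3  0 -2 -4 -3 -2 -4 -3  4
"""

# Mutant residue per position, taken from the mutation dataset.
MUTANTS = {20: "L", 103: "S", 123: "I", 172: "H", 259: "V",
           266: "A", 341: "T", 392: "S", 416: "Q", 421: "T"}

def parse_pssm_rows(text):
    """Parse PSSM rows into {position: (wild_type, {amino_acid: score})}.

    Only the first 20 numeric fields are read, so full PSI-BLAST rows with
    percentage columns and trailing statistics are accepted as well.
    """
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        position, wild_type = int(fields[0]), fields[1]
        scores = [int(value) for value in fields[2:22]]
        rows[position] = (wild_type, dict(zip(AA_ORDER, scores)))
    return rows

def outscored_wild_types(rows):
    """Positions where another amino acid scores strictly higher than the wild type."""
    return sorted(pos for pos, (wt, scores) in rows.items()
                  if scores[wt] < max(scores.values()))

rows = parse_pssm_rows(PSSM_ROWS)
print(outscored_wild_types(rows))  # [20, 103]
for pos, mutant in sorted(MUTANTS.items()):
    wt, scores = rows[pos]
    print(f"{wt}{pos}{mutant}: wild type {scores[wt]}, mutant {scores[mutant]}")
```

Besides the two non-conserved positions, the per-substitution scores show that T266A replaces threonine with the second-best-scoring residue (Ala, 3), whereas F392S, for example, introduces a strongly disfavoured residue (Ser, -4).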

SIFT

  • A259V
Substitution at pos 259 from A to V is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position:77
  • R123I
Substitution at pos 123 from R to I is predicted to be TOLERATED with a score of 0.05.
   Median sequence conservation: 3.01
   Sequences represented at this position:76
  • Q20L
Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16.
   Median sequence conservation: 3.20
   Sequences represented at this position:42
  • Q172H
Substitution at pos 172 from Q to H is predicted to AFFECT PROTEIN FUNCTION with a score of 0.03.
   Median sequence conservation: 3.01
   Sequences represented at this position:77
  • G103S
Substitution at pos 103 from G to S is predicted to be TOLERATED with a score of 0.90.
   Median sequence conservation: 3.02
   Sequences represented at this position:72
  • I421T
Substitution at pos 421 from I to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.00
   Sequences represented at this position:74
  • K341T
Substitution at pos 341 from K to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position:77
  • F392S
Substitution at pos 392 from F to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position:77
  • P416Q
Substitution at pos 416 from P to Q is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.00
   Sequences represented at this position:74
  • T266A
Substitution at pos 266 from T to A is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position:77
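The ten SIFT results can be collected into one table by parsing the result lines. A minimal Python sketch; the regular expression is written against the output format quoted above (not an official SIFT API), and the cutoff of 0.05 is the commonly cited SIFT threshold, assumed here rather than read from the output:

```python
import re

# Matches the result lines quoted above; the score pattern stops before the
# sentence-final period.
SIFT_LINE = re.compile(
    r"Substitution at pos (\d+) from ([A-Z]) to ([A-Z]) is predicted to "
    r"(AFFECT PROTEIN FUNCTION|be TOLERATED) with a score of (\d+\.\d+)"
)

SIFT_OUTPUT = """
Substitution at pos 259 from A to V is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
Substitution at pos 123 from R to I is predicted to be TOLERATED with a score of 0.05.
Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16.
Substitution at pos 172 from Q to H is predicted to AFFECT PROTEIN FUNCTION with a score of 0.03.
Substitution at pos 103 from G to S is predicted to be TOLERATED with a score of 0.90.
Substitution at pos 421 from I to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
Substitution at pos 341 from K to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
Substitution at pos 392 from F to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
Substitution at pos 416 from P to Q is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
Substitution at pos 266 from T to A is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
"""

def parse_sift(text):
    """Return {variant: (score, affects_function)} for every SIFT result line."""
    results = {}
    for match in SIFT_LINE.finditer(text):
        pos, wt, mt, verdict, score = match.groups()
        results[f"{wt}{pos}{mt}"] = (float(score), verdict.startswith("AFFECT"))
    return results

sift = parse_sift(SIFT_OUTPUT)
tolerated = sorted(v for v, (_score, hit) in sift.items() if not hit)
# Assumed cutoff: scores below 0.05 are called deleterious.
consistent = all((score < 0.05) == hit for score, hit in sift.values())
print(tolerated, consistent)
```

All ten verdicts agree with a strict "score < 0.05" rule; R123I, at exactly 0.05, is reported as tolerated, so it sits right at the boundary.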

PolyPhen2

  • A259V
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
  • R123I
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.807 (sensitivity: 0.84; specificity: 0.93)
HumVar
This mutation is predicted to be possibly damaging with a score of 0.582 (sensitivity: 0.81; specificity: 0.83)
  • Q20L
HumDiv
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
HumVar
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
  • Q172H
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.705 (sensitivity: 0.86; specificity: 0.92)
HumVar
This mutation is predicted to be benign with a score of 0.170 (sensitivity: 0.89; specificity: 0.72)
  • G103S
HumDiv
This mutation is predicted to be benign with a score of 0.003 (sensitivity: 0.98; specificity: 0.44)
HumVar
This mutation is predicted to be benign with a score of 0.006 (sensitivity: 0.97; specificity: 0.45)
  • I421T
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.667 (sensitivity: 0.86; specificity: 0.91)
HumVar
This mutation is predicted to be probably damaging with a score of 0.913 (sensitivity: 0.69; specificity: 0.90)
  • K341T
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.36; specificity: 0.97)
  • F392S
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
  • P416Q
HumDiv
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.55; specificity: 0.98)
HumVar
This mutation is predicted to be probably damaging with a score of 0.985 (sensitivity: 0.55; specificity: 0.94)
  • T266A
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
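PolyPhen-2 turns its probability score into a qualitative label using model-specific cutoffs, which is why the HumDiv and HumVar labels can differ for the same variant. A minimal Python sketch, assuming the cutoffs we recall from the PolyPhen-2 documentation (HumDiv: probably damaging ≥ 0.957, possibly damaging ≥ 0.453; HumVar: ≥ 0.909 and ≥ 0.447); these values are an assumption and should be checked before reuse:

```python
# Assumed (probably, possibly) cutoffs per PolyPhen-2 model.
CUTOFFS = {"HumDiv": (0.957, 0.453), "HumVar": (0.909, 0.447)}

def classify(score, model):
    """Map a PolyPhen-2 probability to its qualitative label."""
    probably, possibly = CUTOFFS[model]
    if score >= probably:
        return "probably damaging"
    if score >= possibly:
        return "possibly damaging"
    return "benign"

# (HumDiv score, HumVar score) transcribed from the results above.
SCORES = {
    "A259V": (1.000, 1.000),
    "R123I": (0.807, 0.582),
    "Q20L": (0.000, 0.000),
    "Q172H": (0.705, 0.170),
    "G103S": (0.003, 0.006),
    "I421T": (0.667, 0.913),
    "K341T": (1.000, 0.996),
    "F392S": (1.000, 1.000),
    "P416Q": (0.996, 0.985),
    "T266A": (1.000, 1.000),
}

labels = {variant: (classify(hd, "HumDiv"), classify(hv, "HumVar"))
          for variant, (hd, hv) in SCORES.items()}
disagree = sorted(v for v, (hd_label, hv_label) in labels.items()
                  if hd_label != hv_label)
print(disagree)
```

With these cutoffs the sketch reproduces every label reported above, and the two variants whose HumDiv and HumVar labels differ are I421T and Q172H.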

SNAP

A259V          Non-neutral          Reliability Index: 2          Expected Accuracy: 70%
R123I              Neutral          Reliability Index: 0          Expected Accuracy: 53%
Q20L               Neutral          Reliability Index: 0          Expected Accuracy: 53%
Q172H              Neutral          Reliability Index: 4          Expected Accuracy: 85%
G103S              Neutral          Reliability Index: 6          Expected Accuracy: 92%
I421T          Non-neutral          Reliability Index: 2          Expected Accuracy: 70%
K341T          Non-neutral          Reliability Index: 3          Expected Accuracy: 78%
F392S          Non-neutral          Reliability Index: 1          Expected Accuracy: 63%
P416Q          Non-neutral          Reliability Index: 1          Expected Accuracy: 63%
T266A              Neutral          Reliability Index: 2          Expected Accuracy: 69%
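Because SNAP attaches an expected accuracy to each call, the predictions can be filtered by confidence. A minimal Python sketch that parses the lines above and keeps only calls at or above a minimum expected accuracy; the cutoff `min_accuracy=75` is a hypothetical choice for illustration, not a SNAP recommendation:

```python
import re

# Matches the SNAP result lines quoted above.
SNAP_LINE = re.compile(
    r"(\w+)\s+(Non-neutral|Neutral)\s+Reliability Index: (\d+)"
    r"\s+Expected Accuracy: (\d+)%"
)

SNAP_OUTPUT = """
A259V          Non-neutral          Reliability Index: 2          Expected Accuracy: 70%
R123I              Neutral          Reliability Index: 0          Expected Accuracy: 53%
Q20L               Neutral          Reliability Index: 0          Expected Accuracy: 53%
Q172H              Neutral          Reliability Index: 4          Expected Accuracy: 85%
G103S              Neutral          Reliability Index: 6          Expected Accuracy: 92%
I421T          Non-neutral          Reliability Index: 2          Expected Accuracy: 70%
K341T          Non-neutral          Reliability Index: 3          Expected Accuracy: 78%
F392S          Non-neutral          Reliability Index: 1          Expected Accuracy: 63%
P416Q          Non-neutral          Reliability Index: 1          Expected Accuracy: 63%
T266A              Neutral          Reliability Index: 2          Expected Accuracy: 69%
"""

def parse_snap(text):
    """Return {variant: (prediction, reliability_index, expected_accuracy)}."""
    return {m.group(1): (m.group(2), int(m.group(3)), int(m.group(4)))
            for m in SNAP_LINE.finditer(text)}

def confident_calls(results, min_accuracy=75):
    """Keep only predictions whose expected accuracy reaches min_accuracy (%)."""
    return {variant: prediction
            for variant, (prediction, _ri, accuracy) in results.items()
            if accuracy >= min_accuracy}

snap = parse_snap(SNAP_OUTPUT)
print(confident_calls(snap))
```

At this cutoff only three calls remain: K341T (non-neutral) and Q172H and G103S (both neutral); all other SNAP predictions have an expected accuracy of 70% or less.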

MutationTaster

Gene: ENSG00000171759
Transcript: ENST00000553106
For affected protein features, only those annotated as "lost" (not "might get lost") are listed.

  • A259V - C776T
Prediction: disease causing, Model: simple_aae, prob: ~1 (classification due to ClinVar)
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM910286)
•	known disease mutation: rs118203921 (pathogenic)
•	protein features (might be) affected (HELIX: 251-259, lost)
•	splice site changes (Acc marginally increased)
  • R123I - G368T
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected
•	splice site changes (Acc increased)
  • Q20L - A59T
Prediction: disease causing, Model: simple_aae, prob: ~0.95
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM000542)
•	protein features (might be) affected
•	splice site changes (Donor lost, acc marginally increased)
  • Q172H - G516T
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	protein features (might be) affected 
•	splice site changes (Acc marginally increased, donor lost, acc gained)
  • G103S - G307A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM045080)
•	protein features (might be) affected (Domain ACT: lost)
•	splice site changes (Donor gained)
  • I421T - T1262C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected (STRAND: 420-424, lost)
  • K341T - A1022C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM010980)
•	known disease mutation at this position (HGMD CM010981)
•	protein features (might be) affected (STRAND: 339-342, lost)
  • F392S - T1175C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected (HELIX: 392-403, lost)
•	splice site changes (Donor increased)
  • P416Q - C1247A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM090791)
•	protein features (might be) affected (TURN: 416-419, lost)
•	splice site changes (Acc marginally increased, donor increased, donor gained)
  • T266A - A796C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected 
•	splice site changes (Acc increased, acc gained, donor increased)
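To compare the four predictors per variant, their calls can be reduced to "effect / no effect" and counted. A minimal Python sketch, assuming our own binarisation: SIFT "affect protein function", PolyPhen-2 HumDiv "possibly" or "probably damaging", SNAP "non-neutral" and MutationTaster "disease causing" each count as one vote; choosing HumDiv over HumVar is arbitrary. The calls are transcribed from the results above:

```python
# True = the tool predicts an effect for this variant.
CALLS = {
    #         SIFT   PPH2-HumDiv  SNAP   MutationTaster
    "Q20L":  (False, False,       False, True),
    "G103S": (False, False,       False, True),
    "R123I": (False, True,        False, True),
    "Q172H": (True,  True,        False, True),
    "A259V": (True,  True,        True,  True),
    "T266A": (True,  True,        False, True),
    "K341T": (True,  True,        True,  True),
    "F392S": (True,  True,        True,  True),
    "P416Q": (True,  True,        True,  True),
    "I421T": (True,  True,        True,  True),
}

def damaging_votes(calls):
    """Count, per variant, how many of the four tools predict an effect."""
    return {variant: sum(votes) for variant, votes in calls.items()}

votes = damaging_votes(CALLS)
# Most strongly supported variants first, ties broken alphabetically.
for variant, n in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{variant}: {n}/4")
```

Five variants are called damaging by all four tools. Q20L and G103S, both listed in HGMD as disease mutations, receive a single vote (MutationTaster), which matches the PSSM, where the wild-type residue is not the top-scoring amino acid at either position.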