Lab Journal - Task 8 (PAH)
From Bioinformatikpedia
Mutation dataset
The mutation dataset was generated from the following five SNPs taken from the HGMD database (25th June 2013): <figtable id="mutds_hgmd">
Missense mutations (SNPs) from HGMD | ||||||
---|---|---|---|---|---|---|
Accession Number | Codon change | Sequence position | Amino acid change | Codon number | Disease | Reference |
CM000542 | CAG⇒CTG | 59 | Gln(Q)-Leu(L) | 20 | Hyperphenylalaninaemia | Hennermann (2000) Hum Mutat 15, 254 |
CM045080 | GGT⇒AGT | 307 | Gly(G)-Ser(S) | 103 | Phenylketonuria | Lee (2004) J Hum Genet 49, 617 |
CM910286 | GCC⇒GTC | 776 | Ala(A)-Val(V) | 259 | Phenylketonuria | Labrune (1991) Am J Hum Genet 48, 1115 |
CM010981 | AAG⇒ACG | 1022 | Lys(K)-Thr(T) | 341 | Phenylketonuria | Tyfield (1997) Am J Hum Genet 60, 388 |
CM090791 | CCA⇒CAA | 1247 | Pro(P)-Gln(Q) | 416 | Hyperphenylalaninaemia | Dobrowolski (2009) J Inherit Metab Dis 32, 10 |
</figtable>
Furthermore, the following five mutations from dbSNP were added (25th June 2013):
<figtable id="mutds_dbSNP">
Missense mutations (SNPs) from dbSNP | |||||
---|---|---|---|---|---|
Reference SNP | Codon change | Sequence position | Amino acid change | Codon number | Disease |
rs199475681 | AGA⇒ATA | 368 | Arg(R)-Ile(I) | 123 | ? |
rs192592111 | CAG⇒CAT | 516 | Gln(Q)-His(H) | 172 | ? |
rs62508752 | ACA⇒CCA | 796 | Thr(T)-Ala(A) | 266 | Phenylketonuria |
rs199475695 | TTT⇒TCT | 1175 | Phe(F)-Ser(S) | 392 | ? |
rs199475696 | ATT⇒ACT | 1262 | Ile(I)-Thr(T) | 421 | ? |
</figtable>
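The two tables can be cross-checked mechanically. The following is a minimal sketch, assuming that the sequence positions are 1-based coordinates in the coding sequence and that the standard genetic code applies; the function name `find_inconsistencies` is hypothetical. It recomputes the codon number from the sequence position and translates both codons. Under these assumptions all ten codon numbers agree, and the only row flagged is rs62508752, where ACA⇒CCA translates to Thr(T)-Pro(P) rather than the listed Thr(T)-Ala(A).

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# (identifier, reference codon, mutant codon, sequence position,
#  listed wild-type residue, listed mutant residue, listed codon number),
# copied from the two tables above.
MUTATIONS = [
    ("CM000542", "CAG", "CTG", 59, "Q", "L", 20),
    ("CM045080", "GGT", "AGT", 307, "G", "S", 103),
    ("CM910286", "GCC", "GTC", 776, "A", "V", 259),
    ("CM010981", "AAG", "ACG", 1022, "K", "T", 341),
    ("CM090791", "CCA", "CAA", 1247, "P", "Q", 416),
    ("rs199475681", "AGA", "ATA", 368, "R", "I", 123),
    ("rs192592111", "CAG", "CAT", 516, "Q", "H", 172),
    ("rs62508752", "ACA", "CCA", 796, "T", "A", 266),
    ("rs199475695", "TTT", "TCT", 1175, "F", "S", 392),
    ("rs199475696", "ATT", "ACT", 1262, "I", "T", 421),
]


def find_inconsistencies(mutations=MUTATIONS):
    """Return {identifier: [messages]} for rows that fail a check."""
    problems = {}
    for ident, ref, alt, pos, aa_ref, aa_alt, codon_no in mutations:
        messages = []
        # Assumption: 1-based coding-sequence positions, so codon = ceil(pos / 3).
        expected_codon = (pos + 2) // 3
        if expected_codon != codon_no:
            messages.append(
                f"position {pos} lies in codon {expected_codon}, not {codon_no}"
            )
        translated = (CODON_TABLE[ref], CODON_TABLE[alt])
        if translated != (aa_ref, aa_alt):
            messages.append(
                f"{ref}>{alt} translates to {translated[0]}>{translated[1]}, "
                f"listed as {aa_ref}>{aa_alt}"
            )
        if messages:
            problems[ident] = messages
    return problems


if __name__ == "__main__":
    for ident, messages in find_inconsistencies().items():
        print(ident, "; ".join(messages))
```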
Analyze SNPs
PSSM
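We created the PSSM with PSI-BLAST using the default parameters and five iterations. The first table below lists the log-odds scores and the second the weighted observed percentages for the ten mutated positions. In the percentage matrix, some of the ten wild-type residues are not conserved, because other amino acids reach higher values: at position 20, D (17%) and K (16%) are observed more often than Q (14%), and at position 421, V (41%) is observed more often than I (32%). SNPs are marked in red.
Last position-specific scoring matrix computed | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Position | Residue | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V |
20 | Q | -2 | 1 | 2 | 3 | -4 | 2 | 2 | -1 | 2 | -3 | -1 | 2 | -2 | -3 | -3 | -1 | -1 | -4 | -3 | -1 |
103 | G | 0 | 0 | 0 | 0 | 2 | 0 | 0 | -1 | 1 | 0 | -1 | 0 | 1 | 0 | 0 | 0 | 0 | -2 | 1 | 0 |
123 | R | -3 | 6 | -1 | -3 | -3 | 0 | -2 | -3 | 0 | 0 | -1 | 3 | -1 | -4 | -3 | -1 | 1 | -4 | -3 | -3 |
172 | Q | -1 | 0 | 0 | 3 | -4 | 5 | 2 | -1 | 0 | -3 | -3 | 1 | -1 | -4 | -3 | -1 | 0 | -4 | -3 | -1 |
259 | A | 6 | -2 | -2 | -3 | -3 | -1 | -2 | -1 | -3 | -2 | -2 | -2 | -1 | -4 | -3 | 1 | -2 | -4 | -2 | -2 |
266 | T | 3 | -1 | -1 | -3 | -3 | -3 | -3 | 0 | -3 | -3 | -3 | 0 | -3 | -4 | -3 | 0 | 5 | -4 | 0 | -2 |
341 | K | -2 | 4 | -1 | -2 | -1 | 1 | 1 | -3 | -1 | -3 | -1 | 4 | -1 | 0 | -2 | -1 | -1 | -1 | -2 | -2 |
392 | F | -4 | -5 | -5 | -5 | 0 | -5 | -5 | -3 | -3 | 3 | 2 | -5 | 1 | 6 | -5 | -4 | -3 | -2 | 2 | 1 |
416 | P | 0 | -4 | 1 | -3 | -5 | -3 | -2 | -3 | -2 | -2 | -5 | -2 | -4 | -1 | 7 | -1 | -3 | -6 | -4 | -4 |
421 | I | -1 | -4 | -4 | -4 | -3 | -1 | -4 | -1 | -4 | 4 | 1 | -3 | 0 | -2 | -4 | -3 | -2 | -4 | -3 | 4 |
Weighted observed percentages rounded down, information per position, and relative weight of gapless real matches to pseudocounts | |||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Position | Residue | A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | Information | Relative weight |
20 | Q | 0 | 7 | 13 | 17 | 0 | 14 | 11 | 3 | 5 | 0 | 6 | 16 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 6 | 0.32 | inf |
103 | G | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 0 | 5 | 5 | 0.03 | inf |
123 | R | 1 | 49 | 1 | 0 | 1 | 3 | 0 | 1 | 2 | 7 | 4 | 16 | 1 | 0 | 0 | 2 | 10 | 0 | 0 | 0 | 0.72 | inf |
172 | Q | 5 | 4 | 2 | 16 | 0 | 29 | 11 | 5 | 1 | 1 | 2 | 9 | 1 | 0 | 0 | 3 | 6 | 0 | 0 | 4 | 0.43 | inf |
259 | A | 70 | 2 | 1 | 0 | 0 | 3 | 1 | 3 | 0 | 2 | 4 | 1 | 1 | 0 | 0 | 9 | 1 | 0 | 1 | 1 | 0.93 | inf |
266 | T | 29 | 3 | 1 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 1 | 6 | 0 | 0 | 0 | 5 | 43 | 0 | 3 | 1 | 0.70 | inf |
341 | K | 0 | 28 | 0 | 0 | 2 | 2 | 10 | 0 | 0 | 0 | 8 | 39 | 1 | 7 | 0 | 1 | 1 | 1 | 0 | 1 | 0.44 | inf |
392 | F | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 16 | 17 | 0 | 2 | 49 | 0 | 0 | 0 | 0 | 6 | 7 | 0.98 | inf |
416 | P | 9 | 0 | 9 | 0 | 0 | 0 | 2 | 1 | 1 | 3 | 0 | 2 | 0 | 4 | 66 | 4 | 0 | 0 | 0 | 0 | 1.60 | inf |
421 | I | 5 | 0 | 0 | 0 | 0 | 4 | 0 | 6 | 0 | 32 | 9 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 41 | 0.62 | inf |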
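The conservation check on the percentage matrix can be written down explicitly. This is a minimal sketch, not part of the original analysis: the rows are the weighted observed percentages copied from the matrix above, the column order is the PSI-BLAST order A R N D C Q E G H I L K M F P S T W Y V, and the helper name `more_frequent_than_wild_type` is hypothetical. For each position it lists the amino acids observed more often than the wild-type residue.

```python
# Column order of the PSI-BLAST matrix shown above.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"

# (position, wild-type residue) -> weighted observed percentages, copied
# from the PSSM above.
PERCENTAGES = {
    (20, "Q"): [0, 7, 13, 17, 0, 14, 11, 3, 5, 0, 6, 16, 0, 0, 0, 0, 2, 0, 0, 6],
    (103, "G"): [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 5, 5],
    (123, "R"): [1, 49, 1, 0, 1, 3, 0, 1, 2, 7, 4, 16, 1, 0, 0, 2, 10, 0, 0, 0],
    (172, "Q"): [5, 4, 2, 16, 0, 29, 11, 5, 1, 1, 2, 9, 1, 0, 0, 3, 6, 0, 0, 4],
    (259, "A"): [70, 2, 1, 0, 0, 3, 1, 3, 0, 2, 4, 1, 1, 0, 0, 9, 1, 0, 1, 1],
    (266, "T"): [29, 3, 1, 0, 0, 0, 0, 6, 0, 0, 1, 6, 0, 0, 0, 5, 43, 0, 3, 1],
    (341, "K"): [0, 28, 0, 0, 2, 2, 10, 0, 0, 0, 8, 39, 1, 7, 0, 1, 1, 1, 0, 1],
    (392, "F"): [0, 0, 0, 0, 2, 0, 0, 2, 0, 16, 17, 0, 2, 49, 0, 0, 0, 0, 6, 7],
    (416, "P"): [9, 0, 9, 0, 0, 0, 2, 1, 1, 3, 0, 2, 0, 4, 66, 4, 0, 0, 0, 0],
    (421, "I"): [5, 0, 0, 0, 0, 4, 0, 6, 0, 32, 9, 0, 1, 0, 0, 0, 0, 0, 0, 41],
}


def more_frequent_than_wild_type(wild_type, row):
    """Amino acids whose percentage is strictly higher than the wild type's."""
    wild_type_pct = row[AA_ORDER.index(wild_type)]
    return [aa for aa, pct in zip(AA_ORDER, row) if pct > wild_type_pct]


def non_conserved_positions(percentages=PERCENTAGES):
    """Map position -> more frequent residues, for positions that have any."""
    result = {}
    for (position, wild_type), row in percentages.items():
        alternatives = more_frequent_than_wild_type(wild_type, row)
        if alternatives:
            result[position] = alternatives
    return result


if __name__ == "__main__":
    for position, alternatives in non_conserved_positions().items():
        print(position, ", ".join(alternatives))
```

A strict comparison is used on purpose, so position 103, where nearly all residues share the same 5%, is not reported.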
SIFT
- A259V
Substitution at pos 259 from A to V is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.01 Sequences represented at this position:77
- R123I
Substitution at pos 123 from R to I is predicted to be TOLERATED with a score of 0.05. Median sequence conservation: 3.01 Sequences represented at this position:76
- Q20L
Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16. Median sequence conservation: 3.20 Sequences represented at this position:42
- Q172H
Substitution at pos 172 from Q to H is predicted to AFFECT PROTEIN FUNCTION with a score of 0.03. Median sequence conservation: 3.01 Sequences represented at this position:77
- G103S
Substitution at pos 103 from G to S is predicted to be TOLERATED with a score of 0.90. Median sequence conservation: 3.02 Sequences represented at this position:72
- I421T
Substitution at pos 421 from I to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.00 Sequences represented at this position:74
- K341T
Substitution at pos 341 from K to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.01 Sequences represented at this position:77
- F392S
Substitution at pos 392 from F to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.01 Sequences represented at this position:77
- P416Q
Substitution at pos 416 from P to Q is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.00 Sequences represented at this position:74
- T266A
Substitution at pos 266 from T to A is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00. Median sequence conservation: 3.01 Sequences represented at this position:77
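The SIFT calls above follow from the scores and a cutoff of 0.05. The sketch below reproduces them under one assumption that is labelled in the code: the comparison is strict, which is chosen here because R123I is reported as tolerated at a displayed score of exactly 0.05. The displayed scores are rounded, so the strictness cannot be verified from this output alone. The function name `sift_call` is hypothetical.

```python
# Scores copied from the SIFT output above.
SIFT_SCORES = {
    "A259V": 0.00,
    "R123I": 0.05,
    "Q20L": 0.16,
    "Q172H": 0.03,
    "G103S": 0.90,
    "I421T": 0.00,
    "K341T": 0.00,
    "F392S": 0.00,
    "P416Q": 0.00,
    "T266A": 0.00,
}

CUTOFF = 0.05


def sift_call(score, cutoff=CUTOFF):
    """Classify a SIFT score.

    Assumption: scores strictly below the cutoff count as affecting protein
    function, which matches the calls listed above.
    """
    return "AFFECT PROTEIN FUNCTION" if score < cutoff else "TOLERATED"


if __name__ == "__main__":
    for mutation, score in SIFT_SCORES.items():
        print(f"{mutation}\t{score:.2f}\t{sift_call(score)}")
```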
PolyPhen2
- A259V
HumDiv: This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar: This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
- R123I
HumDiv: This mutation is predicted to be possibly damaging with a score of 0.807 (sensitivity: 0.84; specificity: 0.93)
HumVar: This mutation is predicted to be possibly damaging with a score of 0.582 (sensitivity: 0.81; specificity: 0.83)
- Q20L
HumDiv: This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
HumVar: This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
- Q172H
HumDiv: This mutation is predicted to be possibly damaging with a score of 0.705 (sensitivity: 0.86; specificity: 0.92)
HumVar: This mutation is predicted to be benign with a score of 0.170 (sensitivity: 0.89; specificity: 0.72)
- G103S
HumDiv: This mutation is predicted to be benign with a score of 0.003 (sensitivity: 0.98; specificity: 0.44)
HumVar: This mutation is predicted to be benign with a score of 0.006 (sensitivity: 0.97; specificity: 0.45)
- I421T
HumDiv: This mutation is predicted to be possibly damaging with a score of 0.667 (sensitivity: 0.86; specificity: 0.91)
HumVar: This mutation is predicted to be probably damaging with a score of 0.913 (sensitivity: 0.69; specificity: 0.90)
- K341T
HumDiv: This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar: This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.36; specificity: 0.97)
- F392S
HumDiv: This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar: This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
- P416Q
HumDiv: This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.55; specificity: 0.98)
HumVar: This mutation is predicted to be probably damaging with a score of 0.985 (sensitivity: 0.55; specificity: 0.94)
- T266A
HumDiv: This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar: This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
SNAP
Mutation | Prediction | Reliability Index | Expected Accuracy |
---|---|---|---|
A259V | Non-neutral | 2 | 70% |
R123I | Neutral | 0 | 53% |
Q20L | Neutral | 0 | 53% |
Q172H | Neutral | 4 | 85% |
G103S | Neutral | 6 | 92% |
I421T | Non-neutral | 2 | 70% |
K341T | Non-neutral | 3 | 78% |
F392S | Non-neutral | 1 | 63% |
P416Q | Non-neutral | 1 | 63% |
T266A | Neutral | 2 | 69% |
MutationTaster
Gene: ENSG00000171759
Transcript: ENST00000553106
Of the affected protein features, only those annotated as "lost" are reported; features annotated as "might get lost" are omitted.
- A259V - C776T
Prediction: disease causing, Model: simple_aae, prob: ~1 (classification due to ClinVar)
Summary: amino acid sequence changed; known disease mutation at this position (HGMD CM910286); known disease mutation: rs118203921 (pathogenic); protein features (might be) affected (HELIX: 251-259, lost); splice sites: Acc marginally increased
- R123I - G368T
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; listed as SNP; protein features (might be) affected; splice site changes (Acc increased)
- Q20L - A59T
Prediction: disease causing, Model: simple_aae, prob: ~0.95
Summary: amino acid sequence changed; known disease mutation at this position (HGMD CM000542); protein features (might be) affected; splice site changes (Donor lost, acc marginally increased)
- Q172H - G516T
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; protein features (might be) affected; splice site changes (Acc marginally increased, donor lost, acc gained)
- G103S - G307A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; known disease mutation at this position (HGMD CM045080); protein features (might be) affected (Domain ACT: lost); splice site changes (Donor gained)
- I421T - T1262C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; listed as SNP; protein features (might be) affected (STRAND: 420-424, lost)
- K341T - A1022C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; known disease mutation at this position (HGMD CM010980); known disease mutation at this position (HGMD CM010981); protein features (might be) affected (STRAND: 339-342, lost)
- F392S - T1175C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; listed as SNP; protein features (might be) affected (HELIX: 392-403, lost); splice site changes (Donor increased)
- P416Q - C1247A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; known disease mutation at this position (HGMD CM090791); protein features (might be) affected (TURN: 416-419, lost); splice site changes (Acc marginally increased, donor increased, donor gained)
- T266A - A796C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary: amino acid sequence changed; listed as SNP; protein features (might be) affected; splice site changes (Acc increased, acc gained, donor increased)
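To compare the four predictors side by side, the calls reported in the sections above can be tallied per mutation. This is only a sketch under stated assumptions and not a consensus method used in the original analysis: PolyPhen2 is represented by the HumDiv model, "possibly damaging" and "probably damaging" are both counted as a damaging call, and the name `flag_counts` is hypothetical.

```python
# One tuple per mutation: (SIFT, PolyPhen2 HumDiv, SNAP, MutationTaster).
# True means the tool flags the substitution: SIFT "AFFECT PROTEIN FUNCTION",
# PolyPhen2 possibly/probably damaging (assumption: both count), SNAP
# non-neutral, MutationTaster disease causing. Values copied from above.
CALLS = {
    "A259V": (True, True, True, True),
    "R123I": (False, True, False, True),
    "Q20L": (False, False, False, True),
    "Q172H": (True, True, False, True),
    "G103S": (False, False, False, True),
    "I421T": (True, True, True, True),
    "K341T": (True, True, True, True),
    "F392S": (True, True, True, True),
    "P416Q": (True, True, True, True),
    "T266A": (True, True, False, True),
}


def flag_counts(calls=CALLS):
    """Number of tools (out of four) that flag each mutation."""
    return {mutation: sum(flags) for mutation, flags in calls.items()}


if __name__ == "__main__":
    counts = flag_counts()
    # Most frequently flagged mutations first.
    for mutation in sorted(counts, key=counts.get, reverse=True):
        print(f"{mutation}\t{counts[mutation]}/4")
```

Because MutationTaster flags all ten substitutions, every count is at least 1; the differences between mutations come from the other three tools.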