Difference between revisions of "Lab Journal - Task 8 (PAH)"

From Bioinformatikpedia

Revision as of 13:54, 2 July 2013

Mutation dataset

For the generation of the mutation dataset the following five SNPs from the HGMD database were used (25th June 2013): <figtable id="mutds_hgmd">

{| border="1" style="text-align:center;" cellpadding="5" cellspacing="0" align="center"
|-
! colspan="7" | Missense mutations (SNPs) from HGMD
|-
! Accession Number !! Codon change !! Sequence position !! Amino acid change !! Codon number !! Disease !! Reference
|-
| CM000542 || CAG⇒CTG || 59 || Gln(Q)-Leu(L) || 20 || Hyperphenylalaninaemia || Hennermann (2000) Hum Mutat 15, 254
|-
| CM045080 || GGT⇒AGT || 307 || Gly(G)-Ser(S) || 103 || Phenylketonuria || Lee (2004) J Hum Genet 49, 617
|-
| CM910286 || GCC⇒GTC || 776 || Ala(A)-Val(V) || 259 || Phenylketonuria || Labrune (1991) Am J Hum Genet 48, 1115
|-
| CM010981 || AAG⇒ACG || 1022 || Lys(K)-Thr(T) || 341 || Phenylketonuria || Tyfield (1997) Am J Hum Genet 60, 388
|-
| CM090791 || CCA⇒CAA || 1247 || Pro(P)-Gln(Q) || 416 || Hyperphenylalaninaemia || Dobrowolski (2009) J Inherit Metab Dis 32, 10
|}
...

</figtable>
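The codon number and the amino acid change in each row can be recomputed from the codon change and the sequence position. The Python sketch below assumes that the sequence position is 1-based and counted from the first base of the start codon, and that the standard genetic code applies. The helper names (`codon_number`, `protein_change`) are hypothetical and not part of any tool used in this journal.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_number(cds_position):
    """Map a 1-based coding-sequence position to its 1-based codon number."""
    return (cds_position - 1) // 3 + 1


def protein_change(ref_codon, alt_codon, cds_position):
    """Build a label such as 'Q20L' from a codon change and its position."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return f"{ref_aa}{codon_number(cds_position)}{alt_aa}"


# Rows copied from the HGMD table: accession, codon change, sequence position.
HGMD_ROWS = [
    ("CM000542", "CAG", "CTG", 59),
    ("CM045080", "GGT", "AGT", 307),
    ("CM910286", "GCC", "GTC", 776),
    ("CM010981", "AAG", "ACG", 1022),
    ("CM090791", "CCA", "CAA", 1247),
]

for accession, ref, alt, pos in HGMD_ROWS:
    print(accession, protein_change(ref, alt, pos))
```

Running the same check on the dbSNP rows is a cheap way to spot rows where the codon change and the listed amino acid change disagree.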


Furthermore, the following five mutations from dbSNP were added (25th June 2013): <figtable id="mutds_dbSNP">

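{| border="1" style="text-align:center;" cellpadding="5" cellspacing="0" align="center"
|-
! colspan="6" | Missense mutations (SNPs) from dbSNP
|-
! Reference SNP !! Codon change !! Sequence position !! Amino acid change !! Codon number !! Disease
|-
| rs199475569 || CAC⇒AAC || 190 || His(H)-Asn(N) || 64 || ?
|-
| rs199475681 || AGA⇒ATA || 368 || Arg(R)-Ile(I) || 123 || ?
|-
| rs62508752 || ACA⇒CCA || 796 || Thr(T)-Ala(A) || 266 || Phenylketonuria
|-
| rs199475695 || TTT⇒TCT || 1175 || Phe(F)-Ser(S) || 392 || ?
|-
| rs199475696 || ATT⇒ACT || 1262 || Ile(I)-Thr(T) || 421 || ?
|}
...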

</figtable>

Analyze SNPs

PSSM

We created the PSSM with PSI-BLAST, using the default parameters and five iterations. The right-hand matrix (weighted observed percentages) shows that at some of the ten positions the wild-type amino acid is not conserved, because other amino acids reach equal or higher values there. SNPs are marked in red.

Last position-specific scoring matrix computed, weighted observed percentages rounded down,
information per position, and relative weight of gapless real matches to pseudocounts
         A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V    A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 20 Q   -2  1  2  3 -4  2  2 -1  2 -3 -1  2 -2 -3 -3 -1 -1 -4 -3 -1    0  7 13 17  0 14 11  3  5  0  6 16  0  0  0  0  2  0  0  6 0.32 inf
 64 H   -5  2 -1 -3 -7  0  0 -6  9 -7 -6  2 -3 -2 -6 -2 -5 -6  1 -7    0 10  2  1  0  3  6  0 58  0  0 11  1  1  0  2  0  0  5  0 1.89 inf  
103 G    0  0  0  0  2  0  0 -1  1  0 -1  0  1  0  0  0  0 -2  1  0    5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5  0  5  5 0.03 inf
123 R   -3  6 -1 -3 -3  0 -2 -3  0  0 -1  3 -1 -4 -3 -1  1 -4 -3 -3    1 49  1  0  1  3  0  1  2  7  4 16  1  0  0  2 10  0  0  0 0.72 inf
259 A    6 -2 -2 -3 -3 -1 -2 -1 -3 -2 -2 -2 -1 -4 -3  1 -2 -4 -2 -2   70  2  1  0  0  3  1  3  0  2  4  1  1  0  0  9  1  0  1  1 0.93 inf
266 T    3 -1 -1 -3 -3 -3 -3  0 -3 -3 -3  0 -3 -4 -3  0  5 -4  0 -2   29  3  1  0  0  0  0  6  0  0  1  6  0  0  0  5 43  0  3  1 0.70 inf
341 K   -2  4 -1 -2 -1  1  1 -3 -1 -3 -1  4 -1  0 -2 -1 -1 -1 -2 -2    0 28  0  0  2  2 10  0  0  0  8 39  1  7  0  1  1  1  0  1 0.44 inf 
392 F   -4 -5 -5 -5  0 -5 -5 -3 -3  3  2 -5  1  6 -5 -4 -3 -2  2  1    0  0  0  0  2  0  0  2  0 16 17  0  2 49  0  0  0  0  6  7 0.98 inf
416 P    0 -4  1 -3 -5 -3 -2 -3 -2 -2 -5 -2 -4 -1  7 -1 -3 -6 -4 -4    9  0  9  0  0  0  2  1  1  3  0  2  0  4 66  4  0  0  0  0 1.60 inf
421 I   -1 -4 -4 -4 -3 -1 -4 -1 -4  4  1 -3  0 -2 -4 -3 -2 -4 -3  4    5  0  0  0  0  4  0  6  0 32  9  0  1  0  0  0  0  0  0 41 0.62 inf
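To read such rows programmatically, each line can be split into the position, the wild-type residue, the 20 log-odds scores and the 20 weighted percentages. The following Python sketch assumes the column order `A R N D C Q E G H I L K M F P S T W Y V` from the header above. The function names and the returned dictionary keys are hypothetical.

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # column order taken from the PSSM header


def parse_pssm_row(line):
    """Split one PSSM row into position, wild type, scores and percentages."""
    tokens = line.split()
    position, wild_type = int(tokens[0]), tokens[1]
    scores = dict(zip(AA_ORDER, map(int, tokens[2:22])))
    percents = dict(zip(AA_ORDER, map(int, tokens[22:42])))
    return position, wild_type, scores, percents


def assess(line, mutant):
    """Compare the wild-type score with the mutant score at one position."""
    position, wild_type, scores, _ = parse_pssm_row(line)
    not_worse = [aa for aa in AA_ORDER
                 if aa != wild_type and scores[aa] >= scores[wild_type]]
    return {
        "mutation": f"{wild_type}{position}{mutant}",
        "wild_type_score": scores[wild_type],
        "mutant_score": scores[mutant],
        "scoring_at_least_wild_type": not_worse,
    }


# Two rows copied from the matrix above.
ROW_20 = (" 20 Q   -2  1  2  3 -4  2  2 -1  2 -3 -1  2 -2 -3 -3 -1 -1 -4 -3 -1"
          "    0  7 13 17  0 14 11  3  5  0  6 16  0  0  0  0  2  0  0  6 0.32 inf")
ROW_259 = ("259 A    6 -2 -2 -3 -3 -1 -2 -1 -3 -2 -2 -2 -1 -4 -3  1 -2 -4 -2 -2"
           "   70  2  1  0  0  3  1  3  0  2  4  1  1  0  0  9  1  0  1  1 0.93 inf")

print(assess(ROW_20, "L"))
print(assess(ROW_259, "V"))
```

A position where several residues score at least as high as the wild type (as at position 20) is weakly conserved, whereas an empty list (as at position 259) indicates a clearly preferred wild-type residue.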

Mammalian Homologous Sequences

We ran a standard BLAST search against the mammalian database of UniProt and filtered the results by hand to remove duplicate entries. Altogether we found 21 homologous sequences (<xr id="IDs"/>). <figtable id="IDs">

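{| border="1" style="text-align:center;" cellpadding="5" cellspacing="0" align="center"
|-
! colspan="8" style="background:#32CD32;" | UniProt-IDs
|-
|P00439 || H2Q6R0 || G3S964 || G1R3M2 || G7PJC2 || F7HMW9 || F7I717 || F7BKF9
|-
|F6XY00 || E2R366 || G1T8B6 || H2NIF5 || M3YKN3 || M9P0Q7 || H0WTI6 || M3W9R1
|-
|G3TIW0 || G1LIM6 || Q2KIH7 || G1P4I7 || P16331 || M9P0Y5 || ||
|}
<center><small>'''<caption>''' UniProt IDs of the 22 mammalian homologous sequences of the PAH protein.</caption></small></center>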

</figtable>

SIFT

  • Q20L
Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16.
   Median sequence conservation: 3.20
   Sequences represented at this position: 42
  • H64N
Substitution at pos 64 from H to N is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • G103S
Substitution at pos 103 from G to S is predicted to be TOLERATED with a score of 0.90.
   Median sequence conservation: 3.02
   Sequences represented at this position: 72
  • R123I
Substitution at pos 123 from R to I is predicted to be TOLERATED with a score of 0.05.
   Median sequence conservation: 3.01
   Sequences represented at this position: 76
  • A259V
Substitution at pos 259 from A to V is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • T266A
Substitution at pos 266 from T to A is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • K341T
Substitution at pos 341 from K to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • F392S
Substitution at pos 392 from F to S is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.01
   Sequences represented at this position: 77
  • P416Q
Substitution at pos 416 from P to Q is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.00
   Sequences represented at this position: 74
  • I421T
Substitution at pos 421 from I to T is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.
   Median sequence conservation: 3.00
   Sequences represented at this position: 74
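
The mutation label, the verdict and the score can be read directly from each SIFT result line, which avoids assigning labels by hand. The Python sketch below assumes the line wording shown in the output above. The regular expression and the function name `parse_sift` are hypothetical helpers, not part of SIFT.

```python
import re

# Matches lines such as:
# "Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16."
SIFT_LINE = re.compile(
    r"Substitution at pos (\d+) from ([A-Z]) to ([A-Z]) is predicted to "
    r"(be TOLERATED|AFFECT PROTEIN FUNCTION) with a score of (\d+\.\d+)"
)


def parse_sift(line):
    """Return mutation label, verdict and score for one SIFT result line."""
    match = SIFT_LINE.search(line)
    if match is None:
        return None  # not a prediction line (e.g. the conservation lines)
    pos, ref, alt, verdict, score = match.groups()
    return {
        "mutation": f"{ref}{pos}{alt}",
        "tolerated": verdict == "be TOLERATED",
        "score": float(score),
    }


# Two lines copied from the SIFT output above.
LINES = [
    "Substitution at pos 20 from Q to L is predicted to be TOLERATED with a score of 0.16.",
    "Substitution at pos 259 from A to V is predicted to AFFECT PROTEIN FUNCTION with a score of 0.00.",
]

for line in LINES:
    print(parse_sift(line))
```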

PolyPhen2

  • A259V
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
  • R123I
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.807 (sensitivity: 0.84; specificity: 0.93)
HumVar
This mutation is predicted to be possibly damaging with a score of 0.582 (sensitivity: 0.81; specificity: 0.83)
  • Q20L
HumDiv
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
HumVar
This mutation is predicted to be benign with a score of 0.000 (sensitivity: 1.00; specificity: 0.00)
  • G103S
HumDiv
This mutation is predicted to be benign with a score of 0.003 (sensitivity: 0.98; specificity: 0.44)
HumVar
This mutation is predicted to be benign with a score of 0.006 (sensitivity: 0.97; specificity: 0.45)
  • H64N
HumDiv
This mutation is predicted to be probably damaging with a score of 0.993 (sensitivity: 0.70; specificity: 0.97)
HumVar
This mutation is predicted to be probably damaging with a score of 0.962 (sensitivity: 0.62; specificity: 0.92)
  • I421T
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.667 (sensitivity: 0.86; specificity: 0.91)
HumVar
This mutation is predicted to be probably damaging with a score of 0.913 (sensitivity: 0.69; specificity: 0.90)
  • K341T
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.36; specificity: 0.97)
  • F392S
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
  • P416Q
HumDiv
This mutation is predicted to be probably damaging with a score of 0.996 (sensitivity: 0.55; specificity: 0.98)
HumVar
This mutation is predicted to be probably damaging with a score of 0.985 (sensitivity: 0.55; specificity: 0.94)
  • T266A
HumDiv
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
HumVar
This mutation is predicted to be probably damaging with a score of 1.000 (sensitivity: 0.00; specificity: 1.00)
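
To compare the HumDiv and HumVar predictions per mutation, the listing above can be collected into a small dictionary. This Python sketch assumes the layout shown here (a bullet line with the mutation, a dataset line, then the prediction sentence). The names `parse_polyphen` and `SAMPLE` are hypothetical.

```python
import re

PREDICTION = re.compile(
    r"predicted to be (benign|possibly damaging|probably damaging) "
    r"with a score of (\d+\.\d+)"
)


def parse_polyphen(text):
    """Collect {mutation: {dataset: (class, score)}} from the listing."""
    results = {}
    mutation = dataset = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("•"):
            mutation = line.lstrip("•").strip()
        elif line in ("HumDiv", "HumVar"):
            dataset = line
        else:
            match = PREDICTION.search(line)
            if match and mutation and dataset:
                results.setdefault(mutation, {})[dataset] = (
                    match.group(1), float(match.group(2)))
    return results


# Two entries copied from the PolyPhen2 output above.
SAMPLE = """
  • R123I
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.807 (sensitivity: 0.84; specificity: 0.93)
HumVar
This mutation is predicted to be possibly damaging with a score of 0.582 (sensitivity: 0.81; specificity: 0.83)
  • I421T
HumDiv
This mutation is predicted to be possibly damaging with a score of 0.667 (sensitivity: 0.86; specificity: 0.91)
HumVar
This mutation is predicted to be probably damaging with a score of 0.913 (sensitivity: 0.69; specificity: 0.90)
"""

parsed = parse_polyphen(SAMPLE)
# Mutations for which the two models assign different classes.
disagreements = [m for m, r in parsed.items() if r["HumDiv"][0] != r["HumVar"][0]]
print(disagreements)
```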

SNAP

  • A259V
  • R123I
  • Q20L
  • G103S
  • H64N
  • I421T
  • K341T
  • F392S
  • P416Q
  • T266A

MutationTaster

Gene: ENSG00000171759
Transcript: ENST00000553106
For affected protein features, only those annotated as "lost" are reported, not those annotated as "might get lost".

  • A259V - C776T
Prediction: disease causing, Model: simple_aae, prob: ~1 (classification due to ClinVar)
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM910286)
•	known disease mutation: rs118203921 (pathogenic)
•	protein features (might be) affected (HELIX: 251-259, lost)
•	splice site changes (Acc marginally increased)
  • R123I - G368T
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected
•	splice site changes (Acc increased)
  • Q20L - A59T
Prediction: disease causing, Model: simple_aae, prob: ~0.95
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM000542)
•	protein features (might be) affected
•	splice site changes (Donor lost, acc marginally increased)
  • G103S - G307A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM045080)
•	protein features (might be) affected (Domain ACT: lost)
•	splice site changes (Donor gained)
  • H64N - C190A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CD993066)
•	protein features (might be) affected (Domain ACT: lost)
  • I421T - T1262C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected (STRAND: 420-424, lost)
  • K341T - A1022C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM010980)
•	known disease mutation at this position (HGMD CM010981)
•	protein features (might be) affected (STRAND: 339-342, lost)
  • F392S - T1175C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected (HELIX: 392-403, lost)
•	splice site changes (Donor increased)
  • P416Q - C1247A
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	known disease mutation at this position (HGMD CM090791)
•	protein features (might be) affected (TURN: 416-419, lost)
•	splice site changes (Acc marginally increased, donor increased, donor gained)
  • T266A - A796C
Prediction: disease causing, Model: simple_aae, prob: ~1
Summary
•	amino acid sequence changed
•	listed as SNP
•	protein features (might be) affected 
•	splice site changes (Acc increased, acc gained, donor increased)
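
Where MutationTaster reports a lost secondary-structure feature with a residue range, one can check that the mutated position actually lies inside that range. The Python sketch below assumes the summary wording shown above, for example `(HELIX: 251-259, lost)`. The function name `lost_feature_hit` is hypothetical.

```python
import re

LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")
LOST_FEATURE = re.compile(r"\((HELIX|STRAND|TURN): (\d+)-(\d+), lost\)")


def lost_feature_hit(label, summary_line):
    """Return (feature type, position inside range?) or None if no range is given."""
    position = int(LABEL.match(label).group(2))
    match = LOST_FEATURE.search(summary_line)
    if match is None:
        return None  # no ranged feature reported, e.g. "Domain ACT: lost"
    kind, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return kind, start <= position <= end


# Label and feature line pairs copied from the MutationTaster output above.
EXAMPLES = [
    ("A259V", "protein features (might be) affected (HELIX: 251-259, lost)"),
    ("I421T", "protein features (might be) affected (STRAND: 420-424, lost)"),
    ("P416Q", "protein features (might be) affected (TURN: 416-419, lost)"),
    ("G103S", "protein features (might be) affected (Domain ACT: lost)"),
]

for label, line in EXAMPLES:
    print(label, lost_feature_hit(label, line))
```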