Task 5: Mapping point mutations

From Bioinformatikpedia
Revision as of 18:44, 20 June 2011 by Meier (Comparing the annotation of HGMD and SNPdb)

Task description

A detailed task description can be found here: Mapping point mutations

SNP databases

HGMD

  • HGMD
  • Searched for PAH
  • 429 Missense/Nonsense mutations known by HGMD Professional

There are several mutation types known for PAH:

  • Missense - A single-nucleotide point mutation in a codon that changes the encoded amino acid
  • Nonsense - A single-nucleotide point mutation in a codon that turns it into a stop codon, i.e. a translation stop signal
  • Splicing - A mutation that affects the splicing of the gene
  • Regulatory - A mutation that affects the regulation of the gene
  • Small/Gross deletions - A mutation that deletes a few/many nucleotides from the gene
  • Small/Gross insertions - A mutation that inserts a few/many nucleotides into the gene
  • Small indels - A deletion combined with an insertion at the affected nucleotides
  • Gross duplications - A mutation that duplicates a piece of the DNA
  • Complex rearrangements - A mutation that changes the order of sequence parts of a gene

One additional category of mutation is known to HGMD, but is not recorded for PAH:

  • Repeat variations - A mutation that affects a repeated sequence in the gene

Reference Sequence

The reference sequence is given by the accession number NM_000277.1, whose entry contains the following amino acid sequence:

MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEEN DVNLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDI GATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ FADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCG FHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPM YTPEPDICHELLGHVPLFSDRSFAQFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLC KQGDSIKAYGAGLLSSFGELQYCLSEKPKLLPLELEKTAIQNYTVTEFQPLYYVAESF NDAKEKVRNFAATIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQK IK

SNPs

The following missense/nonsense mutations are recorded in HGMD as causing phenylketonuria:

Identifier Type AA-Position Reference Triplet Mutated Triplet Reference Residue Mutated Residue
CM920539 missense 1 ATGt ATA M I
CM890092 missense 1 cATG GTG M V
CM010947 missense 16 cTCT CCT S P
CM971121 nonsense 20 aCAG TAG Q Term
HM972014 missense 22 ACA AAA T K
CM910280 missense 39 TTCt TTG F L
CM961063 missense 40 TCA TTA S L
CM000543 missense 41 CTC CCC L P
CM930535 missense 46 tGGT AGT G S
CM993577 missense 47 GCA GAA A E
CM910281 missense 48 TTG TCG L S
CM981426 missense 52 TTG TCG L S
CM981427 missense 53 CGC CAC R H
CM010948 missense 53 gCGC TGC R C
CM068057 missense 54 TTA TCA L S
CM971122 missense 55 TTTg TTG F L
CM920540 missense 56 GAGg GAT E D
CM992368 missense 59 tGAT TAT D Y
CM024619 missense 61 aAAC GAC N D
CM992369 missense 61 AACc AAG N K
CM981428 missense 65 ATT AAT I N
CM920541 missense 65 ATT ACT I T
CM024620 missense 65 ATT AGT I S
CM010949 missense 65 cATT GTT I V
CM950882 missense 67 aTCT CCT S P
CM920542 missense 68 AGAc AGT R S
CM045074 missense 69 aCCT TCT P S
CM981429 missense 70 tTCT CCT S P
CM010950 missense 76 GAG GCG E A
CM992370 missense 76 GAG GGG E G
CM071057 nonsense 76 tGAG TAG E Term
CM981430 nonsense 77 TATg TAG Y Term
CM973035 missense 78 tGAA AAA E K
CM950883 missense 81 cACC CCC T P
CM961066 missense 84 gGAT TAT D Y
CM961067 missense 87 tAGC CGC S R
CM971123 missense 89 gCCT TCT P S
CM981431 missense 94 ATC AGC I S
CM930537 missense 98 TTG TCG L S
CM992944 missense 100 CAT CGT H R
CM993578 missense 102 ATT ACT I T
CM045080 missense 103 tGGT AGT G S
CM920543 missense 104 GCC GAC A D
CM010951 missense 110 TCA TTA S L
CM920544 nonsense 111 aCGA TGA R Term
CM043044 missense 119 gCCC TCC P S
CM961068 nonsense 120 TGG TAG W Term
CM971124 missense 122 CCA CAA P Q
CM930538 missense 124 ACC ATC T I
CM971125 missense 129 GAC GGC D G
CM010952 missense 129 gGAC TAC D Y
CM981432 missense 132 GCC GTC A V
CM071056 nonsense 134 tCAG TAG Q Term
CM961069 missense 143 GAT GGT D G
CM961070 missense 145 GAC GTC D V
CM983349 missense 146 cCAC TAC H Y
CM971126 missense 147 cCCT TCT P S
CM950884 missense 148 tGGT AGT G S
CM010953 missense 151 aGAT CAT D H
CM971127 missense 151 GAT GGT D G
CM010954 missense 154 gTAC AAC Y N
CM056672 missense 154 gTAC CAC Y H
CM981433 missense 155 CGT CAT R H
CM000544 missense 155 CGT CCT R P
CM056667 missense 157 AGA AAA R K
CM056668 missense 157 AGA ATA R I
CM930539 missense 158 aCGG TGG R W
CM890093 missense 158 CGG CAG R Q
CM015340 missense 158 CGG CCG R P
CM010955 missense 160 CAG CCG Q P
CM920545 missense 161 TTT TCT F S
CM941128 missense 164 ATT ACT I T
CM992945 missense 164 cATT GTT I V
CM971128 missense 165 tGCC ACC A T
CM010956 missense 165 tGCC CCC A P
CM990993 nonsense 166 TACa TAA Y Term
CM994293 nonsense 166 TACa TAG Y Term
CM011945 missense 167 AAC AGC N S
CM981434 missense 167 AAC ATC N I
CM992946 missense 168 cTAC CAC Y H
CM992371 missense 169 CGC CAC R H
CM010957 missense 170 CAT CGT H R
CM011946 missense 170 cCAT GAT H D
CM941129 missense 171 GGG GCG G A
CM010958 missense 171 tGGG AGG G R
CM961071 nonsense 172 gCAG TAG Q Term
CM920546 missense 173 gCCC ACC P T
CM930540 missense 174 ATC ACC I T
CM990994 missense 174 cATC GTC I V
CM961072 missense 175 cCCT GCT P A
CM992372 missense 176 CGA CAA R Q
CM010959 missense 176 CGA CCA R P
CM993954 missense 177 aGTG ATG V M
CM961073 missense 177 aGTG CTG V L
CM941131 missense 178 GAA GGA E G
CM015341 missense 179 aTAC CAC Y H
CM950885 missense 182 GAA GGA E G
CM000545 missense 183 aGAA CAA E Q
CM950886 missense 187 aTGG CGG W R
CM981435 missense 187 TGGg TGC W C
CM981436 missense 190 GTG GCG V A
CM950887 missense 194 CTG CCG L P
CM010960 missense 201 CAT CGT H R
CM981437 missense 201 cCAT TAT H Y
CM981438 missense 203 TGC TAC C Y
CM003162 missense 205 GAG GCG E A
CM068056 missense 205 tGAG AAG E K
CM992373 missense 206 gTAC GAC Y D
CM043311 missense 206 TAC TGC Y C
CM056674 nonsense 206 TACa TAA Y Term
CM981439 nonsense 206 TACa TAG Y Term
CM971129 missense 207 AAT AGT N S
CM981440 missense 207 cAAT GAT N D
CM983350 missense 212 CTT CCT L P
CM961074 missense 213 CTT CCT L P
CM010961 nonsense 216 TACt TAG Y Term
CM034743 missense 217 cTGT CGT C R
CM961075 missense 217 cTGT GGT C G
CM015342 missense 217 TGT TAT C Y
CM930545 missense 218 GGC GTC G V
CM910282 missense 221 GAA GGA E G
CM971130 missense 222 GAT GGT D G
CM010962 missense 222 GAT GTT D V
CM961076 missense 224 ATT ACT I T
CM920547 missense 224 ATTc ATG I M
CM990995 missense 225 CCC CGC P R
CM971131 missense 225 tCCC ACC P T
CM994634 missense 225 tCCC GCC P A
CM004891 missense 226 CAGc CAC Q H
CM024631 nonsense 226 cCAG TAG Q Term
CM081727 missense 230 GTT GCT V A
CM994779 missense 230 GTT GGT V G
CM000546 missense 231 TCT TTT S F
CM950888 missense 231 tTCT CCT S P
CM990996 nonsense 232 tCAA TAA Q Term
CM010963 missense 233 TTCc TTA F L
CM043312 missense 235 CAG CCG Q P
CM920548 missense 238 cACT CCT T P
CM034744 missense 239 GGT GAT G D
CM971132 missense 239 GGT GCT G A
CM990997 missense 239 GGT GTT G V
CM941132 missense 239 tGGT AGT G S
CM003163 missense 240 TTC TCC F S
CM973036 missense 240 tTTC GTC F V
CM930548 missense 241 cCGC TGC R C
CM950889 missense 241 CGC CTC R L
CM920549 missense 242 cCTC TTC L F
CM900176 nonsense 243 cCGA TGA R Term
CM910283 missense 243 CGA CAA R Q
CM993955 missense 243 CGA CTA R L
CM920550 missense 244 CCT CTT P L
CM930549 missense 245 GTG GAG V E
CM961077 missense 245 tGTG CTG V L
CM010964 missense 246 GCT GAT A D
CM981441 missense 246 GCT GTT A V
CM043314 missense 247 GGC GAC G D
CM920551 missense 247 GGC GTC G V
CM043315 missense 247 tGGC AGC G S
CM043313 missense 247 tGGC CGC G R
CM961078 missense 248 CTG CCG L P
CM010965 missense 248 CTG CGG L R
CM973037 missense 249 CTT CAT L H
CM030919 missense 249 CTT CCT L P
CM950890 missense 249 gCTT TTT L F
CM941134 missense 252 CGG CAG R Q
CM920552 missense 252 tCGG GGG R G
CM910284 missense 252 tCGG TGG R W
CM973038 missense 254 tTTC ATC F I
CM920553 missense 255 cTTG GTG L V
CM910285 missense 255 TTG TCG L S
CM010966 missense 257 GGC GAC G D
CM973039 missense 257 GGC GTC G V
CM010968 missense 257 tGGC AGC G S
CM010967 missense 257 tGGC TGC G C
CM910286 missense 259 GCC GTC A V
CM962433 missense 259 gGCC ACC A T
CM910288 nonsense 261 cCGA TGA R Term
CM910287 missense 261 CGA CAA R Q
CM950891 missense 261 CGA CCA R P
CM930550 missense 263 TTCc TTG F L
CM010969 missense 264 CAC CTC H L
CM983351 missense 265 cTGC GGC C G
CM981442 missense 265 TGC TAC C Y
CM056666 missense 267 aCAG GAG Q E
CM056670 missense 267 CAGt CAC Q H
CM010970 missense 268 gTAC CAC Y H
CM981443 missense 269 ATC AAC I N
CM981444 missense 269 cATC CTC I L
CM950892 missense 270 AGA AAA R K
CM930551 missense 270 AGAc AGT R S
CM010971 missense 271 aCAT TAT H Y
CM900177 nonsense 272 tGGA TGA G Term
HM972015 missense 273 aTCC CCC S P
CM910289 missense 273 TCC TTC S F
CM003327 missense 274 cAAG GAG K E
CM992374 missense 275 CCC CGC P R
CM024621 missense 275 CCC CTC P L
CM034746 missense 276 ATG AAG M K
CM034745 missense 276 ATG AGG M R
CM910290 missense 276 ATGt ATT M I
CM930552 missense 276 cATG GTG M V
CM910291 missense 277 gTAT GAT Y D
CM931193 missense 277 TAT TGT Y C
CM045073 missense 278 ACC AGC T S
CM981445 missense 278 ACC ATC T I
CM931194 missense 278 tACC GCC T A
CM890094 missense 280 cGAA AAA E K
CM030920 missense 280 GAA GCA E A
CM045310 missense 280 GAA GGA E G
CM043045 missense 281 aCCT GCT P A
CM992948 missense 281 aCCT TCT P S
CM910292 missense 281 CCT CTT P L
CM043316 missense 282 GAC GGC D G
CM930554 missense 282 tGAC AAC D N
CM981446 missense 283 ATC AAC I N
CM930555 missense 283 cATC TTC I F
CM981447 missense 285 cCAT TAT H Y
CM040236 missense 286 tGAG AAG E K
CM010972 missense 288 TTGg TTC L F
CM055475 missense 290 CAT CGT H R
CM045076 missense 293 cTTG ATG L M
CM990998 nonsense 295 TCA TGA S Term
CM971133 missense 297 CGC CAC R H
CM961079 missense 297 tCGC TGC R C
CM920554 missense 299 TTT TGT F C
CM950893 missense 300 GCC GTC A V
CM981448 missense 303 tTCC CCC S P
CM010973 missense 303 tTCC GCC S A
CM010974 missense 304 CAG CGG Q R
CM920556 missense 306 aATT GTT I V
CM010975 missense 308 cCTT GTT L V
CM941135 missense 309 GCC GAC A D
CM024622 missense 310 TCT TAT S Y
CM010976 missense 310 TCT TTT S F
CM880057 missense 311 CTG CCG L P
CM040237 missense 312 GGT GTT G V
CM992375 missense 313 GCA GTA A V
CM014737 missense 313 tGCA ACA A T
CM040238 missense 314 aCCT ACT P T
CM024623 missense 314 aCCT TCT P S
CM961080 missense 314 CCT CAT P H
CM993956 missense 315 tGAT TAT D Y
CM024624 missense 317 aTAC CAC Y H
CM003164 missense 318 ATT ACT I T
CM961081 missense 320 AAGc AAC K N
CM920558 missense 322 cGCC ACC A T
CM920557 missense 322 GCC GGC A G
CM041815 missense 324 aATT GTT I V
CM000547 missense 325 TAC TGC Y C
CM981449 nonsense 325 TACt TAG Y Term
CM920559 nonsense 326 TGG TAG W Term
CM990999 missense 327 TTTa TTG F L
CM010977 missense 328 tACT GCT T A
CM000548 missense 330 GAGt GAC E D
CM941136 missense 331 gTTT CTT F L
CM961082 missense 331 TTT TGT F C
CM993957 missense 332 GGG GAG G E
CM045077 missense 332 GGG GTG G V
CM950894 missense 333 gCTC TTC L F
CM010978 missense 334 TGC TCC C S
CM930557 nonsense 336 aCAA TAA Q Term
CM043317 missense 336 CAA CGA Q R
CM010979 missense 337 GGA GTA G V
CM941137 missense 338 aGAC TAC D Y
CM034747 missense 340 ATA ACA I T
CM981450 nonsense 341 aAAG TAG K Term
CM010981 missense 341 AAG ACG K T
CM010980 missense 341 AAG AGG K R
CM930558 missense 342 gGCA ACA A T
CM971134 missense 342 gGCA CCA A P
CM068055 missense 343 aTAT GAT Y D
CM010982 missense 343 TAT TTT Y F
CM000550 missense 344 GGT GTT G V
CM981451 missense 344 tGGT AGT G S
CM000549 missense 344 tGGT CGT G R
CM920560 missense 345 tGCT ACT A T
CM010983 missense 345 tGCT TCT A S
CM056675 missense 346 tGGG AGG G R
CM991000 missense 346 tGGG CGG G R
CM961083 missense 347 gCTC TTC L F
CM920561 missense 348 cCTG GTG L V
CM910293 missense 349 gTCA CCA S P
CM056673 missense 349 gTCA GCA S A
CM043318 nonsense 349 TCA TAA S Term
CM981452 missense 349 TCA TTA S L
CM981453 missense 350 aTCC ACC S T
CM015343 missense 350 TCC TAC S Y
CM981454 missense 352 tGGT CGT G R
CM993958 missense 352 tGGT TGT G C
CM991001 nonsense 355 aCAG TAG Q Term
CM962547 missense 356 gTAC CAC Y H
CM941138 nonsense 356 TACt TAA Y Term
CM930560 nonsense 356 TACt TAG Y Term
CM003165 missense 357 cTGC GGC C G
CM930561 nonsense 359 TCA TGA S Term
CM971135 missense 362 gCCA ACA P T
CM068054 missense 363 AAGc AAT K N
CM993959 missense 367 CTG CCG L P
CM056671 missense 367 CTGg CTA L L
CM081726 missense 369 gCTG GTG L V
CM950895 missense 371 AAG AGG K R
CM961084 missense 372 gACA TCA T S
CM994329 missense 373 aGCC ACC A T
CM010984 missense 377 TAC TGC Y C
CM950896 missense 386 TAT TGT Y C
CM061883 nonsense 387 TACg TAA Y Term
CM950897 missense 387 tTAC CAC Y H
CM930564 missense 388 cGTG ATG V M
CM981455 missense 388 cGTG CTG V L
CM993579 missense 389 GCA GAA A E
CM045075 missense 391 AGT ATT S I
CM961085 missense 394 tGAT CAT D H
CM043319 missense 395 GCC GAC A D
CM981456 missense 395 GCC GGC A G
CM930566 missense 395 tGCC CCC A P
CM056669 missense 400 AGG AAG R K
CM034748 missense 400 AGG ACG R T
CM030921 missense 402 cTTT CTT F L
CM992376 missense 406 ATA ACA I T
CM971136 missense 407 aCCT TCT P S
CM993960 missense 407 CCT CTT P L
CM920562 missense 408 CGG CAG R Q
CM870016 missense 408 tCGG TGG R W
CM000551 missense 410 TTC TCC F S
CM068058 missense 410 TTC TGC F C
CM003166 nonsense 411 TCA TGA S Term
CM920563 missense 413 CGC CCC R P
CM981457 missense 413 tCGC AGC R S
CM010985 missense 413 tCGC TGC R C
CM910294 missense 414 TAC TGC Y C
CM973397 missense 417 aTAC AAC Y N
CM024632 missense 417 aTAC CAC Y H
CM043046 missense 418 ACC AAC T N
CM920565 missense 418 cACC CCC T P
CM993580 missense 422 tGAG AAG E K
CM993961 missense 424 TTG TCG L S
CM010986 missense 430 CTT CCT L P
CM043320 missense 434 GCT GAT A D
CM961086 missense 447 GCC GAC A D
CM045079 missense 447 tGCC CCC A P

dbSNP

Clarifications on insertions and deletions

Insertions and deletions in coding regions can have very different effects. Those that introduce a frame shift are usually fatal for the protein. A frame shift occurs when the length of the insertion or deletion is not divisible by 3, in other words when Len mod 3 != 0, where Len is the length of the insertion or deletion in nucleotides.
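The frame shift rule above can be written as a one-line check. The following is a minimal Python sketch; the function name `causes_frameshift` is only chosen for illustration and is not part of our parser.

```python
def causes_frameshift(indel_length: int) -> bool:
    """Return True if an insertion/deletion of this length shifts the reading frame."""
    if indel_length <= 0:
        raise ValueError("indel length must be a positive number of nucleotides")
    # A codon has 3 nucleotides, so only lengths that are multiples of 3 keep the frame.
    return indel_length % 3 != 0


for length in (1, 2, 3, 4, 6):
    label = "frame shift" if causes_frameshift(length) else "in-frame"
    print(f"indel of {length} nt: {label}")
```

An in-frame insertion or deletion still adds or removes whole residues, so it can be harmful as well; the check only tells whether the downstream reading frame is preserved.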

Methodology

Retrieving Data
Figure 1: Query and result of our silent mutation search

We searched dbSNP for silent mutations in coding regions. This means we only considered those SNPs which alter the triplet but not the amino acid.

To do so, we used the Entrez interface of NCBI, which is accessible under this URL:

  •  ://www.ncbi.nlm.nih.gov/sites/entrez?Db=snp

The advantage of this Entrez interface is that we can construct arbitrarily complex queries to restrict our result set.

We constructed the following query to search for SNPs which are considered silent in the coding regions of the human PAH gene (see figure 1):

  • "synonymous-codon"[Function_Class] AND PAH[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS]
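The query above is just a conjunction of field-restricted terms, so it can also be assembled programmatically. The sketch below rebuilds the same query string and wraps it into a URL for NCBI's E-utilities `esearch` endpoint. Note the assumptions: we used the Entrez web interface, not E-utilities, and the helper names `build_query` and `build_esearch_url` are hypothetical. Whether the field tags are still accepted by the current dbSNP is not checked here, and the code does not send any request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint (assumption: used instead of the web form).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(gene, function_class="synonymous-codon", organism="human", snp_class="snp"):
    """Combine the four field-restricted terms with AND, as in the query shown above."""
    terms = [
        f'"{function_class}"[Function_Class]',
        f"{gene}[GENE]",
        f'"{organism}"[ORGN]',
        f'"{snp_class}"[SNP_CLASS]',
    ]
    return " AND ".join(terms)


def build_esearch_url(term, retmax=100):
    """URL-encode the query for the snp database; nothing is fetched here."""
    return EUTILS_ESEARCH + "?" + urlencode({"db": "snp", "term": term, "retmax": retmax})


query = build_query("PAH")
url = build_esearch_url(query)
print(query)
print(url)
```

Changing the `function_class` argument would restrict the search to a different class of SNPs, provided dbSNP knows a function class of that name.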

Results of this query can be accessed directly via the following URL:

We decided to download the results as a FlatFile. This seemed to be the simplest format to process, and it contains almost all the information we need.

Processing Data

A self-written Perl script parses the relevant information out of the FlatFile we downloaded in the previous step. The current version extracts the following fields: identifier, reference/mutated triplet, reference/mutated allele, frame of the mutation, reference/mutated residue and residue position.

All annotations we retrieved refer to the mRNA sequence with the GenBank accession number NM_000277.1. It contains the coding sequence of our PAH protein with the accession number NP_000268.1, which is exactly the same protein, and therefore has the same sequence, as the UniProt entry for PAH with the accession number P00439. Thus, no conversion of residue coordinates was required to place these mutations on our mutation map.

With our Perl script at hand (the code is accessible here: dbSNP Silent Mutations Parser) and the results in FlatFile format, the only thing still missing is the CDS sequence of NM_000277.1, which we need to look up the triplet used for each residue. This CDS sequence can be found in the CCDS database of NCBI under the accession number CCDS9092.1.
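The triplet lookup described above can be sketched as follows. This is a Python illustration, not our Perl parser: the function names and the short `TOY_CDS` sequence are hypothetical (it is not the PAH CDS), and the frame is assumed to be the 1-based position of the SNP within the codon, which is how the Frame column in the results table reads.

```python
def codon_at(cds: str, residue_position: int) -> str:
    """Return the triplet encoding the residue at a 1-based residue position."""
    if residue_position < 1:
        raise IndexError("residue positions start at 1")
    start = 3 * (residue_position - 1)
    codon = cds[start:start + 3]
    if len(codon) != 3:
        raise IndexError(f"residue {residue_position} lies outside the CDS")
    return codon


def apply_snp(codon: str, frame: int, mutated_allele: str) -> str:
    """Replace the base at codon position `frame` (1, 2 or 3) with the mutated allele."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    return codon[:frame - 1] + mutated_allele + codon[frame:]


# Hypothetical toy CDS, used only to show the indexing.
TOY_CDS = "ATGTCCACTGCGGTC"
print(codon_at(TOY_CDS, 1))      # first triplet
print(codon_at(TOY_CDS, 2))      # second triplet

# Reference triplet, frame and mutated allele as listed for rs117308669 and rs75065106.
print(apply_snp("GAA", 3, "G"))
print(apply_snp("CTG", 1, "T"))
```

With the real CDS in place of `TOY_CDS`, `codon_at` would give the reference triplet for a residue and `apply_snp` the mutated triplet.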

Now, with all data at hand, we can run our Perl script in different modes to produce different outputs for different purposes. Currently the following three outputs can be generated with the following commands:

  • Generates a WikiTable of the silent mutations found: ./parse.pl wiki snp_result.txt ccds_pah.fasta
  • Generates CSVs which can be used by our mapping tool to generate the mutation map: ./parse.pl map snp_result.txt ccds_pah.fasta
  • Generates a human-readable one-liner for each silent mutation: ./parse.pl list snp_result.txt ccds_pah.fasta

Results

We found the following silent mutations in dbSNP:

Identifier AA-Position Reference Triplet Mutated Triplet Reference Allele Mutated Allele Frame Reference Residue Mutated Residue
rs117308669 66 GAA GAG A G 3 E E
rs75065106 258 CTG TTG C T 1 L L
rs62651567 323 ACA ACG A G 3 T T
rs62508648 367 CTG CTA G A 3 L L
rs61747292 321 CTC CTT C T 3 L L
rs59326968 426 AAT AAC T C 3 N N
rs17852374 36 TCA TCG A G 3 S S
rs1801152 414 TAC TAT C T 3 Y Y
rs1801151 400 AGG CGG A C 1 R R
rs1801150 399 GTA GTT A T 3 V V
rs1801147 203 TGC TGT C T 3 C C
rs1801146 137 AGC AGT C T 3 S S
rs1801145 10 GGC GGG C G 3 G G
rs1126758 232 CAG CAG A G 3 Q Q
rs1042503 245 GTG GTA G A 3 V V
rs772897 385 CTG CTC G C 3 L L

Comparing the annotation of HGMD and dbSNP

Mapping

position 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
sequence M S T A V L E N P G L G R K L S D F G Q
annotations CM920539,missense,M1I,ATGt-ATA rs1801145,silent,G10G,GGC-GGG CM010947,missense,S16P,cTCT-CCT CM971121,nonsense,Q20Term,aCAG-TAG
annotations CM890092,missense,M1V,cATG-GTG
position 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
sequence E T S Y I E D N C N Q N G A I S L I F S
annotations HM972014,missense,T22K,ACA-AAA rs17852374,silent,S36S,TCA-TCG CM910280,missense,F39L,TTCt-TTG CM961063,missense,S40L,TCA-TTA
position 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
sequence L K E E V G A L A K V L R L F E E N D V
annotations CM000543,missense,L41P,CTC-CCC CM930535,missense,G46S,tGGT-AGT CM993577,missense,A47E,GCA-GAA CM910281,missense,L48S,TTG-TCG CM981426,missense,L52S,TTG-TCG CM981427,missense,R53H,CGC-CAC CM068057,missense,L54S,TTA-TCA CM971122,missense,F55L,TTTg-TTG CM920540,missense,E56D,GAGg-GAT CM992368,missense,D59Y,tGAT-TAT
annotations CM010948,missense,R53C,gCGC-TGC
position 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
sequence N L T H I E S R P S R L K K D E Y E F F
annotations CM024619,missense,N61D,aAAC-GAC CM981428,missense,I65N,ATT-AAT rs117308669,silent,E66E,GAA-GAG CM950882,missense,S67P,aTCT-CCT CM920542,missense,R68S,AGAc-AGT CM045074,missense,P69S,aCCT-TCT CM981429,missense,S70P,tTCT-CCT CM010950,missense,E76A,GAG-GCG CM981430,nonsense,Y77Term,TATg-TAG CM973035,missense,E78K,tGAA-AAA
annotations CM992369,missense,N61K,AACc-AAG CM920541,missense,I65T,ATT-ACT CM992370,missense,E76G,GAG-GGG
annotations CM024620,missense,I65S,ATT-AGT CM071057,nonsense,E76Term,tGAG-TAG
annotations CM010949,missense,I65V,cATT-GTT
position 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
sequence T H L D K R S L P A L T N I I K I L R H
annotations CM950883,missense,T81P,cACC-CCC CM961066,missense,D84Y,gGAT-TAT CM961067,missense,S87R,tAGC-CGC CM971123,missense,P89S,gCCT-TCT CM981431,missense,I94S,ATC-AGC CM930537,missense,L98S,TTG-TCG CM992944,missense,H100R,CAT-CGT
position 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
sequence D I G A T V H E L S R D K K K D T V P W
annotations CM993578,missense,I102T,ATT-ACT CM045080,missense,G103S,tGGT-AGT CM920543,missense,A104D,GCC-GAC CM010951,missense,S110L,TCA-TTA CM920544,nonsense,R111Term,aCGA-TGA CM043044,missense,P119S,gCCC-TCC CM961068,nonsense,W120Term,TGG-TAG
position 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140
sequence F P R T I Q E L D R F A N Q I L S Y G A
annotations CM971124,missense,P122Q,CCA-CAA CM930538,missense,T124I,ACC-ATC CM971125,missense,D129G,GAC-GGC CM981432,missense,A132V,GCC-GTC CM071056,nonsense,Q134Term,tCAG-TAG rs1801146,silent,S137S,AGC-AGT
annotations CM010952,missense,D129Y,gGAC-TAC
position 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
sequence E L D A D H P G F K D P V Y R A R R K Q
annotations CM961069,missense,D143G,GAT-GGT CM961070,missense,D145V,GAC-GTC CM983349,missense,H146Y,cCAC-TAC CM971126,missense,P147S,cCCT-TCT CM950884,missense,G148S,tGGT-AGT CM010953,missense,D151H,aGAT-CAT CM010954,missense,Y154N,gTAC-AAC CM981433,missense,R155H,CGT-CAT CM056667,missense,R157K,AGA-AAA CM930539,missense,R158W,aCGG-TGG CM010955,missense,Q160P,CAG-CCG
annotations CM971127,missense,D151G,GAT-GGT CM056672,missense,Y154H,gTAC-CAC CM000544,missense,R155P,CGT-CCT CM056668,missense,R157I,AGA-ATA CM890093,missense,R158Q,CGG-CAG
annotations CM015340,missense,R158P,CGG-CCG
position 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
sequence F A D I A Y N Y R H G Q P I P R V E Y M
annotations CM920545,missense,F161S,TTT-TCT CM941128,missense,I164T,ATT-ACT CM971128,missense,A165T,tGCC-ACC CM990993,nonsense,Y166Term,TACa-TAA CM011945,missense,N167S,AAC-AGC CM992946,missense,Y168H,cTAC-CAC CM992371,missense,R169H,CGC-CAC CM010957,missense,H170R,CAT-CGT CM941129,missense,G171A,GGG-GCG CM961071,nonsense,Q172Term,gCAG-TAG CM920546,missense,P173T,gCCC-ACC CM930540,missense,I174T,ATC-ACC CM961072,missense,P175A,cCCT-GCT CM992372,missense,R176Q,CGA-CAA CM993954,missense,V177M,aGTG-ATG CM941131,missense,E178G,GAA-GGA CM015341,missense,Y179H,aTAC-CAC
annotations CM992945,missense,I164V,cATT-GTT CM010956,missense,A165P,tGCC-CCC CM994293,nonsense,Y166Term,TACa-TAG CM981434,missense,N167I,AAC-ATC CM011946,missense,H170D,cCAT-GAT CM010958,missense,G171R,tGGG-AGG CM990994,missense,I174V,cATC-GTC CM010959,missense,R176P,CGA-CCA CM961073,missense,V177L,aGTG-CTG
position 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200
sequence E E E K K T W G T V F K T L K S L Y K T
annotations CM950885,missense,E182G,GAA-GGA CM000545,missense,E183Q,aGAA-CAA CM950886,missense,W187R,aTGG-CGG CM981436,missense,V190A,GTG-GCG CM950887,missense,L194P,CTG-CCG
annotations CM981435,missense,W187C,TGGg-TGC
position 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220
sequence H A C Y E Y N H I F P L L E K Y C G F H
annotations CM010960,missense,H201R,CAT-CGT CM981438,missense,C203Y,TGC-TAC CM003162,missense,E205A,GAG-GCG CM992373,missense,Y206D,gTAC-GAC CM971129,missense,N207S,AAT-AGT CM983350,missense,L212P,CTT-CCT CM961074,missense,L213P,CTT-CCT CM010961,nonsense,Y216Term,TACt-TAG CM034743,missense,C217R,cTGT-CGT CM930545,missense,G218V,GGC-GTC
annotations CM981437,missense,H201Y,cCAT-TAT rs1801147,silent,C203C,TGC-TGT CM068056,missense,E205K,tGAG-AAG CM043311,missense,Y206C,TAC-TGC CM981440,missense,N207D,cAAT-GAT CM961075,missense,C217G,cTGT-GGT
annotations CM056674,nonsense,Y206Term,TACa-TAA CM015342,missense,C217Y,TGT-TAT
annotations CM981439,nonsense,Y206Term,TACa-TAG
position 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240
sequence E D N I P Q L E D V S Q F L Q T C T G F
annotations CM910282,missense,E221G,GAA-GGA CM971130,missense,D222G,GAT-GGT CM961076,missense,I224T,ATT-ACT CM990995,missense,P225R,CCC-CGC CM004891,missense,Q226H,CAGc-CAC CM081727,missense,V230A,GTT-GCT CM000546,missense,S231F,TCT-TTT CM990996,nonsense,Q232Term,tCAA-TAA CM010963,missense,F233L,TTCc-TTA CM043312,missense,Q235P,CAG-CCG CM920548,missense,T238P,cACT-CCT CM034744,missense,G239D,GGT-GAT CM003163,missense,F240S,TTC-TCC
annotations CM010962,missense,D222V,GAT-GTT CM920547,missense,I224M,ATTc-ATG CM971131,missense,P225T,tCCC-ACC CM024631,nonsense,Q226Term,cCAG-TAG CM994779,missense,V230G,GTT-GGT CM950888,missense,S231P,tTCT-CCT rs1126758,silent,Q232Q,CAG-CAG CM971132,missense,G239A,GGT-GCT CM973036,missense,F240V,tTTC-GTC
annotations CM994634,missense,P225A,tCCC-GCC CM990997,missense,G239V,GGT-GTT
annotations CM941132,missense,G239S,tGGT-AGT
position 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260
sequence R L R P V A G L L S S R D F L G G L A F
annotations CM930548,missense,R241C,cCGC-TGC CM920549,missense,L242F,cCTC-TTC CM900176,nonsense,R243Term,cCGA-TGA CM920550,missense,P244L,CCT-CTT CM930549,missense,V245E,GTG-GAG CM010964,missense,A246D,GCT-GAT CM043314,missense,G247D,GGC-GAC CM961078,missense,L248P,CTG-CCG CM973037,missense,L249H,CTT-CAT CM941134,missense,R252Q,CGG-CAG CM973038,missense,F254I,tTTC-ATC CM920553,missense,L255V,cTTG-GTG CM010966,missense,G257D,GGC-GAC rs75065106,silent,L258L,CTG-TTG CM910286,missense,A259V,GCC-GTC
annotations CM950889,missense,R241L,CGC-CTC CM910283,missense,R243Q,CGA-CAA CM961077,missense,V245L,tGTG-CTG CM981441,missense,A246V,GCT-GTT CM920551,missense,G247V,GGC-GTC CM010965,missense,L248R,CTG-CGG CM030919,missense,L249P,CTT-CCT CM920552,missense,R252G,tCGG-GGG CM910285,missense,L255S,TTG-TCG CM973039,missense,G257V,GGC-GTC CM962433,missense,A259T,gGCC-ACC
annotations CM993955,missense,R243L,CGA-CTA rs1042503,silent,V245V,GTG-GTA CM043315,missense,G247S,tGGC-AGC CM950890,missense,L249F,gCTT-TTT CM910284,missense,R252W,tCGG-TGG CM010968,missense,G257S,tGGC-AGC
annotations CM043313,missense,G247R,tGGC-CGC CM010967,missense,G257C,tGGC-TGC
position 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280
sequence R V F H C T Q Y I R H G S K P M Y T P E
annotations CM910288,nonsense,R261Term,cCGA-TGA CM930550,missense,F263L,TTCc-TTG CM010969,missense,H264L,CAC-CTC CM983351,missense,C265G,cTGC-GGC CM056666,missense,Q267E,aCAG-GAG CM010970,missense,Y268H,gTAC-CAC CM981443,missense,I269N,ATC-AAC CM950892,missense,R270K,AGA-AAA CM010971,missense,H271Y,aCAT-TAT CM900177,nonsense,G272Term,tGGA-TGA HM972015,missense,S273P,aTCC-CCC CM003327,missense,K274E,cAAG-GAG CM992374,missense,P275R,CCC-CGC CM034746,missense,M276K,ATG-AAG CM910291,missense,Y277D,gTAT-GAT CM045073,missense,T278S,ACC-AGC CM890094,missense,E280K,cGAA-AAA
annotations CM910287,missense,R261Q,CGA-CAA CM981442,missense,C265Y,TGC-TAC CM056670,missense,Q267H,CAGt-CAC CM981444,missense,I269L,cATC-CTC CM930551,missense,R270S,AGAc-AGT CM910289,missense,S273F,TCC-TTC CM024621,missense,P275L,CCC-CTC CM034745,missense,M276R,ATG-AGG CM931193,missense,Y277C,TAT-TGT CM981445,missense,T278I,ACC-ATC CM030920,missense,E280A,GAA-GCA
annotations CM950891,missense,R261P,CGA-CCA CM910290,missense,M276I,ATGt-ATT CM931194,missense,T278A,tACC-GCC CM045310,missense,E280G,GAA-GGA
annotations CM930552,missense,M276V,cATG-GTG
position 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300
sequence P D I C H E L L G H V P L F S D R S F A
annotations CM043045,missense,P281A,aCCT-GCT CM043316,missense,D282G,GAC-GGC CM981446,missense,I283N,ATC-AAC CM981447,missense,H285Y,cCAT-TAT CM040236,missense,E286K,tGAG-AAG CM010972,missense,L288F,TTGg-TTC CM055475,missense,H290R,CAT-CGT CM045076,missense,L293M,cTTG-ATG CM990998,nonsense,S295Term,TCA-TGA CM971133,missense,R297H,CGC-CAC CM920554,missense,F299C,TTT-TGT CM950893,missense,A300V,GCC-GTC
annotations CM992948,missense,P281S,aCCT-TCT CM930554,missense,D282N,tGAC-AAC CM930555,missense,I283F,cATC-TTC CM961079,missense,R297C,tCGC-TGC
annotations CM910292,missense,P281L,CCT-CTT
position 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320
sequence Q F S Q E I G L A S L G A P D E Y I E K
annotations CM981448,missense,S303P,tTCC-CCC CM010974,missense,Q304R,CAG-CGG CM920556,missense,I306V,aATT-GTT CM010975,missense,L308V,cCTT-GTT CM941135,missense,A309D,GCC-GAC CM024622,missense,S310Y,TCT-TAT CM880057,missense,L311P,CTG-CCG CM040237,missense,G312V,GGT-GTT CM992375,missense,A313V,GCA-GTA CM040238,missense,P314T,aCCT-ACT CM993956,missense,D315Y,tGAT-TAT CM024624,missense,Y317H,aTAC-CAC CM003164,missense,I318T,ATT-ACT CM961081,missense,K320N,AAGc-AAC
annotations CM010973,missense,S303A,tTCC-GCC CM010976,missense,S310F,TCT-TTT CM014737,missense,A313T,tGCA-ACA CM024623,missense,P314S,aCCT-TCT
annotations CM961080,missense,P314H,CCT-CAT
position 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
sequence L A T I Y W F T V E F G L C K Q G D S I
annotations rs61747292,silent,L321L,CTC-CTT CM920558,missense,A322T,cGCC-ACC rs62651567,silent,T323T,ACA-ACG CM041815,missense,I324V,aATT-GTT CM000547,missense,Y325C,TAC-TGC CM920559,nonsense,W326Term,TGG-TAG CM990999,missense,F327L,TTTa-TTG CM010977,missense,T328A,tACT-GCT CM000548,missense,E330D,GAGt-GAC CM941136,missense,F331L,gTTT-CTT CM993957,missense,G332E,GGG-GAG CM950894,missense,L333F,gCTC-TTC CM010978,missense,C334S,TGC-TCC CM930557,nonsense,Q336Term,aCAA-TAA CM010979,missense,G337V,GGA-GTA CM941137,missense,D338Y,aGAC-TAC CM034747,missense,I340T,ATA-ACA
annotations CM920557,missense,A322G,GCC-GGC CM981449,nonsense,Y325Term,TACt-TAG CM961082,missense,F331C,TTT-TGT CM045077,missense,G332V,GGG-GTG CM043317,missense,Q336R,CAA-CGA
position 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360
sequence K A Y G A G L L S S F G E L Q Y C L S E
annotations CM981450,nonsense,K341Term,aAAG-TAG CM930558,missense,A342T,gGCA-ACA CM068055,missense,Y343D,aTAT-GAT CM000550,missense,G344V,GGT-GTT CM920560,missense,A345T,tGCT-ACT CM056675,missense,G346R,tGGG-AGG CM961083,missense,L347F,gCTC-TTC CM920561,missense,L348V,cCTG-GTG CM910293,missense,S349P,gTCA-CCA CM981453,missense,S350T,aTCC-ACC CM981454,missense,G352R,tGGT-CGT CM991001,nonsense,Q355Term,aCAG-TAG CM962547,missense,Y356H,gTAC-CAC CM003165,missense,C357G,cTGC-GGC CM930561,nonsense,S359Term,TCA-TGA
annotations CM010981,missense,K341T,AAG-ACG CM971134,missense,A342P,gGCA-CCA CM010982,missense,Y343F,TAT-TTT CM981451,missense,G344S,tGGT-AGT CM010983,missense,A345S,tGCT-TCT CM991000,missense,G346R,tGGG-CGG CM056673,missense,S349A,gTCA-GCA CM015343,missense,S350Y,TCC-TAC CM993958,missense,G352C,tGGT-TGT CM941138,nonsense,Y356Term,TACt-TAA
annotations CM010980,missense,K341R,AAG-AGG CM000549,missense,G344R,tGGT-CGT CM043318,nonsense,S349Term,TCA-TAA CM930560,nonsense,Y356Term,TACt-TAG
annotations CM981452,missense,S349L,TCA-TTA
position 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380
sequence K P K L L P L E L E K T A I Q N Y T V T
annotations CM971135,missense,P362T,gCCA-ACA CM068054,missense,K363N,AAGc-AAT CM993959,missense,L367P,CTG-CCG CM081726,missense,L369V,gCTG-GTG CM950895,missense,K371R,AAG-AGG CM961084,missense,T372S,gACA-TCA CM994329,missense,A373T,aGCC-ACC CM010984,missense,Y377C,TAC-TGC
annotations CM056671,missense,L367L,CTGg-CTA
annotations rs62508648,silent,L367L,CTG-CTA
position 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400
sequence E F Q P L Y Y V A E S F N D A K E K V R
annotations rs772897,silent,L385L,CTG-CTC CM950896,missense,Y386C,TAT-TGT CM061883,nonsense,Y387Term,TACg-TAA CM930564,missense,V388M,cGTG-ATG CM993579,missense,A389E,GCA-GAA CM045075,missense,S391I,AGT-ATT CM961085,missense,D394H,tGAT-CAT CM043319,missense,A395D,GCC-GAC rs1801150,silent,V399V,GTA-GTT CM056669,missense,R400K,AGG-AAG
annotations CM950897,missense,Y387H,tTAC-CAC CM981455,missense,V388L,cGTG-CTG CM981456,missense,A395G,GCC-GGC CM034748,missense,R400T,AGG-ACG
annotations CM930566,missense,A395P,tGCC-CCC rs1801151,silent,R400R,AGG-CGG
position 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420
sequence N F A A T I P R P F S V R Y D P Y T Q R
annotations CM030921,missense,F402L,cTTT-CTT CM992376,missense,I406T,ATA-ACA CM971136,missense,P407S,aCCT-TCT CM920562,missense,R408Q,CGG-CAG CM000551,missense,F410S,TTC-TCC CM003166,nonsense,S411Term,TCA-TGA CM920563,missense,R413P,CGC-CCC CM910294,missense,Y414C,TAC-TGC CM973397,missense,Y417N,aTAC-AAC CM043046,missense,T418N,ACC-AAC
annotations CM993960,missense,P407L,CCT-CTT CM870016,missense,R408W,tCGG-TGG CM068058,missense,F410C,TTC-TGC CM981457,missense,R413S,tCGC-AGC rs1801152,silent,Y414Y,TAC-TAT CM024632,missense,Y417H,aTAC-CAC CM920565,missense,T418P,cACC-CCC
annotations CM010985,missense,R413C,tCGC-TGC
position 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440
sequence I E V L D N T Q Q L K I L A D S I N S E
annotations CM993580,missense,E422K,tGAG-AAG CM993961,missense,L424S,TTG-TCG rs59326968,silent,N426N,AAT-AAC CM010986,missense,L430P,CTT-CCT CM043320,missense,A434D,GCT-GAT
position 441 442 443 444 445 446 447 448 449 450 451 452
sequence I G I L C S A L Q K I K
annotations CM961086,missense,A447D,GCC-GAC
annotations CM045079,missense,A447P,tGCC-CCC

The same information in a more visual form: the sequence is colored by the mutations known at each position. Red means that there are known mutations in HGMD for this residue but not in dbSNP. Green means that there are known mutations in dbSNP but not in HGMD. Blue means that there are known mutations in both databases for this residue. Black means that there are no records for this position in either database.

MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDVNLTHIESRPSRLKKDEYEFF
THLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ
FADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIGLASLGAPDEYIEK
LATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSEKPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVR
NFAATIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQKIK
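The color scheme is a simple set comparison per residue position. A minimal Python sketch of it follows. The dbSNP positions are the 16 residue positions from our results table. The HGMD set is deliberately only an excerpt of the positions in the SNPs table, so positions missing from the excerpt would wrongly come out as black or green. The function name `color_for` is hypothetical.

```python
# Residue positions of the 16 silent mutations found in dbSNP (see Results).
DBSNP_POSITIONS = {10, 36, 66, 137, 203, 232, 245, 258,
                   321, 323, 367, 385, 399, 400, 414, 426}

# Excerpt only: a few HGMD residue positions from the SNPs table, not the full list.
HGMD_POSITIONS_EXCERPT = {1, 16, 20, 22, 203, 232, 245, 367, 400, 414, 447}


def color_for(position, hgmd_positions, dbsnp_positions):
    """Map a residue position to the color used in the sequence view."""
    in_hgmd = position in hgmd_positions
    in_dbsnp = position in dbsnp_positions
    if in_hgmd and in_dbsnp:
        return "blue"    # records in both databases
    if in_hgmd:
        return "red"     # HGMD only
    if in_dbsnp:
        return "green"   # dbSNP only
    return "black"       # no record in either set


for pos in (1, 2, 10, 203):
    print(pos, color_for(pos, HGMD_POSITIONS_EXCERPT, DBSNP_POSITIONS))
```

Intersecting the two sets directly (`HGMD_POSITIONS_EXCERPT & DBSNP_POSITIONS`) lists the positions that would be colored blue.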

Discussion

dbSNP

At first, we were quite surprised that we found only 16 silent mutations for the PAH gene. Normally, silent mutations are expected to occur more frequently in coding regions than missense/nonsense mutations, because they do not change the protein and are therefore largely exempt from purifying (negative) selection. However, we assume that the small number of known silent mutations in PAH is probably due to a lack of data.

In the next step we analyzed the frequencies of the different possible allele mutations, with the following results:

Allele Mutation Absolute Frequency Relative Frequency
A->T 1 0.0625
A->C 1 0.0625
A->G 4 0.25
T->A 0 0
T->C 1 0.0625
T->G 0 0
C->A 0 0
C->T 5 0.3125
C->G 1 0.0625
G->A 2 0.125
G->T 0 0
G->C 1 0.0625
Purine->Purine 6 0.375
Purine->Pyrimidine 3 0.1875
Pyrimidine->Pyrimidine 6 0.375
Pyrimidine->Purine 1 0.0625

We observed that C -> T and A -> G are the most frequent mutations, with relative frequencies of 0.3125 and 0.25 respectively. The most frequent mutation (C -> T) replaces a pyrimidine base (C) with another pyrimidine base (T), and the second most frequent mutation (A -> G) replaces a purine base (A) with another purine base (G). In general, purine -> purine and pyrimidine -> pyrimidine mutations (transitions) are the most frequent ones, each with a relative frequency of 0.375.
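The frequency table can be reproduced from the reference and mutated alleles of the 16 silent mutations in our results table. The following Python sketch is only an illustration of the counting, not the way the table was originally produced; the variable and function names are chosen freely.

```python
from collections import Counter

PURINES = {"A", "G"}

# (reference allele, mutated allele) of the 16 silent mutations, in the order of the results table.
ALLELE_CHANGES = [
    ("A", "G"), ("C", "T"), ("A", "G"), ("G", "A"),
    ("C", "T"), ("T", "C"), ("A", "G"), ("C", "T"),
    ("A", "C"), ("A", "T"), ("C", "T"), ("C", "T"),
    ("C", "G"), ("A", "G"), ("G", "A"), ("G", "C"),
]


def base_class(base):
    """A and G are purines; C and T are pyrimidines."""
    return "Purine" if base in PURINES else "Pyrimidine"


total = len(ALLELE_CHANGES)
by_allele = Counter(f"{ref}->{mut}" for ref, mut in ALLELE_CHANGES)
by_class = Counter(f"{base_class(ref)}->{base_class(mut)}" for ref, mut in ALLELE_CHANGES)

for change, count in by_allele.most_common():
    print(f"{change}\t{count}\t{count / total}")
for change, count in by_class.most_common():
    print(f"{change}\t{count}\t{count / total}")
```

Allele changes that never occur, such as T->A, are simply absent from the counters, where looking them up yields 0.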

This observation roughly matches our expectations, since a wrongly incorporated base is more likely to be replaced by a base of the same type during replication. The reason for this is that the DNA polymerase distinguishes the different nucleobases by shape. Adenine and guanine have almost the same shape, since both have two rings. The same applies to cytosine and thymine, which also hardly differ in shape (both have only one ring). Hence, a DNA polymerase is more likely to mistake a purine base for another purine base, or a pyrimidine base for another pyrimidine base, than to mistake a purine base for a pyrimidine base or vice versa.