Mapping SNPs BCKDHA

From Bioinformatikpedia
Revision as of 22:02, 25 August 2011 by Reisinger (talk | contribs) (SNPs in human)

General

Maple syrup urine disease (MSUD) is an autosomal recessive disorder of amino acid metabolism. The disease is caused by a defect in the branched-chain alpha-keto acid dehydrogenase complex, which blocks oxidative decarboxylation. As a result, the concentration of branched-chain amino acids rises. MSUD can be caused by mutations in the gene coding for the alpha subunit of the branched-chain keto acid dehydrogenase (BCKDHA), which is the gene examined on this page.


Reference Sequences: Reference_Sequence_BCKDHA

HGMD

Searching the HGMD<ref>http://www.hgmd.cf.ac.uk/ac/index.php</ref> for "BCKDHA" returns a total of 39 mutations, comprising the following mutation types:

  • missense/nonsense: 33 mutations
  • small deletions: 3 mutations
  • small insertions: 1 mutation
  • gross deletions: 1 mutation
  • complex rearrangements: 1 mutation

For us the missense/nonsense mutations are the most interesting ones, as a single nucleotide change can lead to the phenotype of Maple Syrup Urine Disease.

Codon change | Amino acid change | Codon number | Position in our reference sequence
gCAG-GAG | Gln-Glu | 80 | 125
ACG-ATG | Thr-Met | 106 | 151
cCGG-TGG | Arg-Trp | 114 | 159
gTAT-AAT | Tyr-Asn | 121 | 166
CGG-CAG | Arg-Gln | 122 | 167
cCAG-AAG | Gln-Lys | 145 | 190
ATC-ACC | Ile-Thr | 168 | 213
GCG-GTG | Ala-Val | 171 | 216
GCG-GTG | Ala-Val | 175 | 220
cGGC-AGC | Gly-Ser | 204 | 249
cGCT-ACT | Ala-Thr | 208 | 254
TGC-TAC | Cys-Tyr | 213 | 258
cCGG-TGG | Arg-Trp | 220 | 265
AAT-AGT | Asn-Ser | 222 | 267
GGC-GAC | Gly-Asp | 238 | 283
tGCA-CCA | Ala-Pro | 240 | 285
aCGA-TGA | Arg-Term | 242 | 287
cGGG-AGG | Gly-Arg | 245 | 290
cCGC-TGC | Arg-Cys | 252 | 297
CGC-CAC | Arg-His | 252 | 297
tGGT-AGT | Gly-Ser | 255 | 300
GAT-GCT | Asp-Ala | 257 | 302
ACA-AGA | Thr-Arg | 265 | 310
cCGA-TGA | Arg-Term | 269 | 314
ATC-ACC | Ile-Thr | 281 | 326
cGAG-AAG | Glu-Lys | 282 | 327
gGCC-ACC | Ala-Thr | 283 | 328
CGC-CAC | Arg-His | 301 | 346
cCGG-TGG | Arg-Trp | 318 | 363
TTC-TGC | Phe-Cys | 364 | 409
cGTG-ATG | Val-Met | 367 | 412
TAT-TGT | Tyr-Cys | 368 | 413
cTAC-AAC | Tyr-Asn | 393 | 438

The mutations are given for a reference sequence, which can be found under the accession number NM_000709.3. This is a nucleotide sequence, which was translated using the Expasy Translate tool<ref>http://expasy.org/tools/dna.html</ref> into a protein sequence.

The following sequence shows the mutations annotated in HGMD:

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK

dbSNP

Searching dbSNP<ref>http://www.ncbi.nlm.nih.gov/projects/SNP/</ref> for SNPs in BCKDHA returns the following numbers of results:

  • all: 742
  • human: 371

After parsing the file for mutations in exons, 16 disease-causing mutations and 14 silent mutations remained. They are listed in the following tables.

SNPs in human

Missense Mutations annotated in dbSNP

dbSNP rs ID | SNP (with flanking sequence) | Reference sequence | Nucleotide position | Nt old | Nt new | Codon number | Reference AA | Mutated AA
rs10853751 | TGGGCTCGGCGCGATGGAGGAGGAGA[C/T]GCATACTGACGCCAAAATCCGTGCT | NP_064543.3 | 14 | C | T | 6 | Thr | Met
rs111855817 | TGCCCTCCTGCTGCTGCGGCAGCCTG[A/G]GGCTCGGGGACTGGCTAGATCTGTG | NP_001158255.1 | 86 | G | A | 29 | Gly | Glu
rs34500671 | TCTGGCCGCGACAGCAGGTTCTGTTC[C/G]CAGGCAAAGTGCCGGAGGCTGCAGC | NP_064543.3 | 99 | C | G | 33 | Cys | Trp
rs34589432 | CCTCTGCTCTCTTCCCCAGCACCCCC[A/C]CAGGCAGCAGCAGCAGTTTTCATCT | NP_001158255.1 | 116 | C | A | 39 | Pro | His
rs11549938 | TCTCTGGAATCCCCATCTACCGCGTC[A/C]TGGACCGGCAAGGCCAGATCATCAA | NP_001158255.1 | 244 | A | C | 82 | Met | Leu
rs34442879 | GGGGAGTGCCGCCGCCCTGGACAACA[C/T]GGACCTGGTGTTTGGCCAGTACCGG | NP_001158255.1 | 452 | C | T | 151 | Thr | Met
rs34956071 | TAGGTGTGCTGATGTATCGGGACTAC[C/T]CCCTGGAACTATTCATGGCCCAGTG | NP_001158255.1 | 508 | C | T | 170 | Pro | Ser
rs28940288 | ACTTCGGCGAGGGGGCAGCCAGTGAG[A/G]GGGACGCCCATGCCGGCTTCAACTT | NP_001158255.1 | 730 | G | A | 244 | Gly | Arg
rs137852874 | CAGCCAGTGAGGGGGACGCCCATGCC[A/G]GCTTCAACTTCGCTGCCACACTTGA | NP_001158255.1 | 745 | G | A | 249 | Gly | Ser
rs137852876 | CTTGAGTGCCCCATCATCTTCTTCTG[C/G]CGGAACAATGGCTACGCCATCTCCA | NP_001158255.1 | 792 | C | G | 264 | Cys | Trp
rs137852873 | TTGAGTGCCCCATCATCTTCTTCTGC[C/T]GGAACAATGGCTACGCCATCTCCAC | NP_001158255.1 | 793 | C | T | 265 | Arg | Trp
rs137852871 | GTGTCCCCACAGCAGCACGAGGCCCC[A/G]GGTATGGCATCATGTCAATCCGCGT | NP_001158255.1 | 865 | G | A | 409 | Phe | Cys
rs137852870 | GCCACCTGCAGACCTACGGGGAGCAC[A/T]ACCCACTGGATCACTTCGATAAGTG | NP_001158255.1 | 1309 | T | A | 438 | Tyr | Asn


The positions of the dbSNP missense mutations in the amino acid sequence are given relative to the RefSeq entries NP_001158255.1 and NP_064543.3, respectively.
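The SNP column in the table above gives each variant as a flanking sequence with the two alleles in square brackets, for example `[C/T]`. A minimal Python sketch of how such an entry could be split into its parts is shown here. The function names are hypothetical and this is not the parser used for this page. Note that the bracket order does not necessarily say which allele is the reference: the table lists G as old and A as new for SNPs written `[A/G]`.

```python
import re

# Flanking sequence, two single-base alleles in brackets, flanking sequence.
FLANK_PATTERN = re.compile(r"^([ACGT]*)\[([ACGT])/([ACGT])\]([ACGT]*)$")


def parse_flanked_snp(flanked):
    """Split 'LEFT[X/Y]RIGHT' into (left flank, (X, Y), right flank)."""
    match = FLANK_PATTERN.match(flanked.strip().upper())
    if match is None:
        raise ValueError(f"unexpected SNP notation: {flanked!r}")
    left, first, second, right = match.groups()
    return left, (first, second), right


def allele_sequences(flanked):
    """Return the two full sequences, one per allele."""
    left, alleles, right = parse_flanked_snp(flanked)
    return [left + allele + right for allele in alleles]


# rs34442879 as listed in the table above
entry = "GGGGAGTGCCGCCGCCCTGGACAACA[C/T]GGACCTGGTGTTTGGCCAGTACCGG"
left, alleles, right = parse_flanked_snp(entry)
print(alleles)
# The three bases centred on the SNP, once for each allele
print(left[-1] + alleles[0] + right[0], "->", left[-1] + alleles[1] + right[0])
```

For rs34442879 the three bases centred on the SNP are ACG and ATG, which agrees with the HGMD entry ACG-ATG (Thr-Met).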

The following sequence shows the mutations annotated in dbSNP which lead to disease phenotype. Mutations indicate a mismatching reference amino acid from dbSNP.

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK


Silent mutations annotated in dbSNP

dbSNP rs ID | Reference codon | Mutated codon | Reference allele | Mutated allele | Mutation frame | Codon number | Reference residue | Mutated residue
rs17173144 | TGC | TGT | C | T | 3 | 5 | I | I
rs34541442 | TTA | ATA | C | A | 1 | 12 | R | R
rs75733136 | ATC | ATA | C | A | 3 | 19 | S | S
rs34169026 | AGC | AGT | C | T | 3 | 32 | A | A
rs62637712 | CTG | CTT | C | T | 3 | 38 | P | P
rs80014754 | CTG | CTA | C | A | 3 | 39 | P | P
rs11549937 | GAC | GAC | G | C | 3 | 97 | L | L
rs10404506 | GAA | GAT | C | T | 3 | 213 | I | I
rs114716391 | TTC | TTT | G | T | 3 | 216 | A | A
rs61737367 | AAC | AAT | C | T | 3 | 280 | R | R
rs284652 | ACA | ACT | C | T | 3 | 324 | F | F
rs55940366 | AAG | AAT | C | T | 3 | 325 | L | L
rs4674 | GCC | GCG | A | G | 3 | 407 | L | L
rs34492894 | CCC | CCT | C | T | 3 | 419 | L | L


MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK


Whether a point mutation affects the encoded amino acid depends on the kind of point mutation. There are two types: synonymous and non-synonymous mutations.

A synonymous point mutation changes only the nucleotide sequence, not the amino acid sequence. This is possible because each amino acid is encoded by a triplet of nucleotides (a codon) and most amino acids are encoded by more than one codon. A mutation in the nucleotide sequence can therefore turn a codon into a different codon that still encodes the same amino acid.

A non-synonymous mutation changes the amino acid sequence. How severe the effect is depends on the properties of the amino acids involved: a replacement by an amino acid with similar properties is usually less grave than a replacement by an amino acid with completely different properties.
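The distinction between the two types can be made concrete with a short Python sketch. It builds the standard genetic code table and compares the amino acids encoded by a reference codon and a mutated codon. The function name `classify_point_mutation` is hypothetical, and the sketch assumes the standard nuclear genetic code with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon, mut_codon):
    """Return (reference aa, mutated aa, type) for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    mut_aa = CODON_TABLE[mut_codon.upper()]
    kind = "synonymous" if ref_aa == mut_aa else "non-synonymous"
    return ref_aa, mut_aa, kind


print(classify_point_mutation("TTC", "TTT"))  # both codons encode Phe
print(classify_point_mutation("CGG", "TGG"))  # Arg is replaced by Trp
print(classify_point_mutation("CGA", "TGA"))  # Arg is replaced by a stop codon
```

A change to a stop codon is reported as non-synonymous here, since the text distinguishes only two types.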

Mutation Map

Reference Sequence Alignments

To map the mutations from the different sources onto the same sequence, the reference sequences first had to be compared. For this purpose we performed pairwise alignments for the following sequences:

  • NP_000700.1 and (source: Uniprot)

The alignment of these two sequences is perfect: the identity is 100%. This indicates that the reference sequence from dbSNP is the same one we were working with before.

  • NP_000700.1 and NP_064543.3

The alignment of these two sequences showed only 9.9% identity and 17.3% similarity, while 63.2% of the positions are gaps. As this alignment is not good enough to assume similar sequences, the SNPs found with reference sequence NP_064543.3 are ignored.
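Identity and gap percentages like the ones quoted here can be derived from an aligned sequence pair. The following Python sketch assumes that both percentages are taken relative to the full alignment length, gap columns included, which is only one possible convention. The function name and the toy sequences are hypothetical. Similarity is left out because it depends on a substitution matrix.

```python
def alignment_stats(aligned_a, aligned_b, gap="-"):
    """Percent identity and percent gaps over all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    columns = list(zip(aligned_a, aligned_b))
    # A column is identical if both residues match and neither is a gap
    identical = sum(1 for a, b in columns if a == b and a != gap)
    # A column counts as a gap if either sequence has a gap character
    gapped = sum(1 for a, b in columns if a == gap or b == gap)
    length = len(columns)
    return {
        "identity": 100.0 * identical / length,
        "gaps": 100.0 * gapped / length,
    }


# Toy alignment, not taken from the sequences on this page
print(alignment_stats("MAVA-AAR", "MAVAIAGR"))
```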

  • NM_000709.3 and translated NP_000700.1

The alignment of the HGMD reference sequence and the translated dbSNP reference sequence shows 97.2% identity. The only difference between the two sequences is a short oligopeptide of 13 amino acids at the beginning of the HGMD reference sequence. At first sight these 13 positions would have to be taken into account when mapping the SNPs onto the same sequence. However, we found that the HGMD codon positions are relative to the start of the protein, not counting the signal peptide. The signal peptide of 45 aa therefore has to be taken into account (add 45 to the codon position), whereas the additional 13 aa by which the sequence differs from the dbSNP reference sequence can be ignored.
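The mapping rule stated here is a constant offset, which can be written as a small Python sketch. The function name is hypothetical. The offset of 45 comes from the text, and the example pairs are rows of the HGMD table on this page.

```python
SIGNAL_PEPTIDE_LENGTH = 45  # offset given in the text


def hgmd_to_reference_position(hgmd_codon_number, offset=SIGNAL_PEPTIDE_LENGTH):
    """Map an HGMD codon number to a 1-based position in the reference protein."""
    if hgmd_codon_number < 1:
        raise ValueError("codon numbers are 1-based")
    return hgmd_codon_number + offset


# (HGMD codon number, position listed in the HGMD table)
examples = [(80, 125), (106, 151), (393, 438)]
for codon, listed in examples:
    mapped = hgmd_to_reference_position(codon)
    print(codon, mapped, mapped == listed)
```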


Disease causing SNPs in HGMD and dbSNP

The following table shows disease causing SNPs which were found in both HGMD and dbSNP.

dbSNP rs ID | Reference codon | Mutated codon | Reference allele | Mutated allele | Mutation frame | Codon number | Reference residue | Mutated residue
rs34442879 | GAG | GTG | C | T | 2 | 151 | T | M
rs137852874 | TAC | AAC | G | A | 1 | 249 | G | S
rs137852873 | AAC | TAC | C | T | 1 | 265 | R | W
rs137852871 | ACC | ACC | G | A | 1 | 290 | G | R
rs137852875 | TCA | TGA | C | G | 2 | 310 | T | R
rs137852872 | GAG | GGG | T | G | 2 | 409 | F | C
rs137852870 | CAG | AAG | T | A | 1 | 438 | Y | N

Mutation Map

The following sequence shows the protein BCKDHA, with disease causing mutation positions coloured as described below.

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK


Colour code:

Mutations listed in both Databases

Mutations listed only in HGMD

Mutations listed only in dbSNP

Mutation Analysis

The extracted mutations from HGMD and dbSNP for the BCKDHA gene are analyzed in the following section:

Figure 1: Histogram showing the number of exchanges for each amino acid

Figure 1 shows how often each amino acid is exchanged. The amino acid mutated in most of the SNPs is arginine, followed by alanine and glycine. Alanine and glycine have small side chains without pronounced biophysical properties and therefore seem to be more prone to amino acid exchanges. Arginine, however, is large and positively charged, so this explanation does not apply to it. The only amino acids which were never exchanged are histidine, lysine and tryptophan. This may be due to the unique structure of these amino acids, especially of histidine and tryptophan.
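The counting behind such a histogram can be sketched in Python. The list below repeats the reference residues of the 33 missense/nonsense entries from the HGMD table on this page, in table order. The dbSNP entries are left out, so the counts cover HGMD only and need not match Figure 1 exactly.

```python
from collections import Counter

# Reference residues of the HGMD missense/nonsense entries, in table order
HGMD_REFERENCE_RESIDUES = """
Gln Thr Arg Tyr Arg Gln Ile Ala Ala Gly Ala
Cys Arg Asn Gly Ala Arg Gly Arg Arg Gly Asp
Thr Arg Ile Glu Ala Arg Arg Phe Val Tyr Tyr
""".split()

counts = Counter(HGMD_REFERENCE_RESIDUES)

# Simple text histogram, most frequently exchanged residue first
for residue, n in counts.most_common():
    print(f"{residue} {'#' * n} {n}")
```

For the HGMD entries alone, arginine is exchanged most often, followed by alanine and glycine, in line with the ranking described for Figure 1.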

Figure 2: Frequency of mutations for the different positions in a codon frame

Figure 2 shows that most of the disease causing amino acid exchanges are due to mutations at the first two positions of a codon, while mutations at the third codon position more often lead to silent mutations. This agrees with the common view that a mutation at one of the first two positions of a codon generally leads to a different amino acid, whereas a mutation at the third position does not always change the amino acid and is therefore often silent. The reason is that the genetic code is degenerate<ref>F.H.C. Crick, Leslie Barnett, S. Brenner and R.J. Watts-Tobin: General Nature of the Genetic Code for Proteins, Nature(192), 1961</ref>: most amino acids are encoded by more than one codon, and these synonymous codons usually differ only in the last position<ref>http://en.wikipedia.org/wiki/Genetic_code#Degeneracy</ref>.
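The codon position of a change can be read directly from the HGMD codon change notation. The Python sketch below assumes that lowercase letters in entries such as `cCGG-TGG` are flanking bases that do not belong to the codon. This is an interpretation of the notation in the HGMD table, and the function name is hypothetical.

```python
from collections import Counter


def changed_codon_position(codon_change):
    """Return the 1-based codon position that differs in e.g. 'cCGG-TGG'."""
    ref, mut = codon_change.split("-")
    # Keep only the uppercase codon bases; lowercase letters are treated as flanks
    ref = "".join(base for base in ref if base.isupper())
    mut = "".join(base for base in mut if base.isupper())
    if len(ref) != 3 or len(mut) != 3:
        raise ValueError(f"expected two codons in {codon_change!r}")
    positions = [i + 1 for i, (a, b) in enumerate(zip(ref, mut)) if a != b]
    if len(positions) != 1:
        raise ValueError(f"expected exactly one changed base in {codon_change!r}")
    return positions[0]


# A few entries from the HGMD table
entries = ["gCAG-GAG", "ACG-ATG", "cCGG-TGG", "CGG-CAG", "TTC-TGC"]
frame_counts = Counter(changed_codon_position(entry) for entry in entries)
print(sorted(frame_counts.items()))
```

Counting these positions over all entries gives the kind of frequency shown in Figure 2.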

In the next step we are going to look at the frequency of each observed amino acid exchange in HGMD and dbSNP.

Figure 3: Heatmap for all missense mutations listed in HGMD and dbSNP, showing the frequency of amino acid exchanges for each pair of amino acids (x-axis: reference aa, y-axis: mutated aa, Ter: stop codon).

Figure 3 shows a heatmap for all amino acid pairs including the mutation leading to a stop codon. The amino acid exchanges which take place most often are:

  • Arg => Trp
  • Gly => Arg
  • Gly => Ser
  • Tyr => Asn

The physicochemical properties of the amino acids in these substitutions are quite different. The first mutation, Arg => Trp, introduces a very bulky, hydrophobic amino acid, and the positive charge of arginine is lost. Mutations of glycine are often harmful for a protein's function, as the amino acid's uniquely small size is advantageous in many positions<ref>M.O. Dayhoff, R.M. Schwartz, B.C. Orcutt: A Model of Evolutionary Change in Proteins, Atlas of Protein Sequence and Structure, 1978</ref>. The Tyr => Asn mutation replaces a hydrophobic aromatic residue with a small polar amino acid. This substitution is also very likely to have an effect on the protein's function. Judging by the biochemical properties, none of these amino acid exchanges is without effect. This is what we expected, as the mutations were taken from HGMD and dbSNP, which list many disease causing mutations.


References

<references/>
