Mapping SNPs BCKDHA
General
Maple syrup urine disease (MSUD) is an autosomal recessive disorder of amino acid metabolism. The disease is caused by a defect in the branched-chain alpha-keto acid dehydrogenase complex, which blocks oxidative decarboxylation. The result is an accumulation of branched-chain amino acids. MSUD can be caused by mutations in the gene coding for the alpha subunit of the branched-chain keto acid dehydrogenase (BCKDHA).
Reference Sequences: Reference_Sequence_BCKDHA
HGMD
A search of HGMD<ref>http://www.hgmd.cf.ac.uk/ac/index.php</ref> for "BCKDHA" returns a total of 39 mutations, comprising the following mutation types:
- missense/nonsense: 33 mutations
- small deletions: 3 mutations
- small insertions: 1 mutation
- gross deletions: 1 mutation
- complex rearrangements: 1 mutation
For us the missense/nonsense mutations are the most interesting ones, as a single nucleotide change can lead to the phenotype of Maple Syrup Urine Disease.
Codon change | Amino Acid change | Codon number | Position in our reference sequence |
---|---|---|---|
gCAG-GAG | Gln-Glu | 80 | 125 |
ACG-ATG | Thr-Met | 106 | 151 |
cCGG-TGG | Arg-Trp | 114 | 159 |
gTAT-AAT | Tyr-Asn | 121 | 166 |
CGG-CAG | Arg-Gln | 122 | 167 |
cCAG-AAG | Gln-Lys | 145 | 190 |
ATC-ACC | Ile-Thr | 168 | 213 |
GCG-GTG | Ala-Val | 171 | 216 |
GCG-GTG | Ala-Val | 175 | 220 |
cGGC-AGC | Gly-Ser | 204 | 249 |
cGCT-ACT | Ala-Thr | 208 | 253 |
TGC-TAC | Cys-Tyr | 213 | 258 |
cCGG-TGG | Arg-Trp | 220 | 265 |
AAT-AGT | Asn-Ser | 222 | 267 |
GGC-GAC | Gly-Asp | 238 | 283 |
tGCA-CCA | Ala-Pro | 240 | 285 |
aCGA-TGA | Arg-Term | 242 | 287 |
cGGG-AGG | Gly-Arg | 245 | 290 |
cCGC-TGC | Arg-Cys | 252 | 297 |
CGC-CAC | Arg-His | 252 | 297 |
tGGT-AGT | Gly-Ser | 255 | 300 |
GAT-GCT | Asp-Ala | 257 | 302 |
ACA-AGA | Thr-Arg | 265 | 310 |
cCGA-TGA | Arg-Term | 269 | 314 |
ATC-ACC | Ile-Thr | 281 | 326 |
cGAG-AAG | Glu-Lys | 282 | 327 |
gGCC-ACC | Ala-Thr | 283 | 328 |
CGC-CAC | Arg-His | 301 | 346 |
cCGG-TGG | Arg-Trp | 318 | 363 |
TTC-TGC | Phe-Cys | 364 | 409 |
cGTG-ATG | Val-Met | 367 | 412 |
TAT-TGT | Tyr-Cys | 368 | 413 |
cTAC-AAC | Tyr-Asn | 393 | 438 |
The mutations are given for a reference sequence, which can be found under the accession number NM_000709.3. This is a nucleotide sequence, which was translated using the Expasy Translate tool<ref>http://expasy.org/tools/dna.html</ref> into a protein sequence.
The following sequence shows the mutations annotated in HGMD:
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK
dbSNP
A search of dbSNP<ref>http://www.ncbi.nlm.nih.gov/projects/SNP/</ref> for SNPs in BCKDHA returns the following numbers of results:
- all: 742
- human: 371
After parsing the file for mutations in exons, 16 disease-causing mutations and 14 silent mutations remained. They are listed in the following tables.
SNPs in human
Missense Mutations annotated in dbSNP
dbSNP ID | SNP with flanking sequence | Reference Sequence | Nucleotide Position | Reference Nt | Mutated Nt | Codon Number | Reference AA | Mutated AA |
---|---|---|---|---|---|---|---|---|
rs10853751 | TGGGCTCGGCGCGATGGAGGAGGAGA[C/T]GCATACTGACGCCAAAATCCGTGCT | NP_064543.3 | 14 | C | T | 6 | Thr | Met |
rs111855817 | TGCCCTCCTGCTGCTGCGGCAGCCTG[A/G]GGCTCGGGGACTGGCTAGATCTGTG | NP_001158255.1 | 86 | G | A | 29 | Gly | Glu |
rs34500671 | TCTGGCCGCGACAGCAGGTTCTGTTC[C/G]CAGGCAAAGTGCCGGAGGCTGCAGC | NP_064543.3 | 99 | C | G | 33 | Cys | Trp |
rs34589432 | CCTCTGCTCTCTTCCCCAGCACCCCC[A/C]CAGGCAGCAGCAGCAGTTTTCATCT | NP_001158255.1 | 116 | C | A | 39 | Pro | His |
rs11549938 | TCTCTGGAATCCCCATCTACCGCGTC[A/C]TGGACCGGCAAGGCCAGATCATCAA | NP_001158255.1 | 244 | A | C | 82 | Met | Leu |
rs34442879 | GGGGAGTGCCGCCGCCCTGGACAACA[C/T]GGACCTGGTGTTTGGCCAGTACCGG | NP_001158255.1 | 452 | C | T | 151 | Thr | Met |
rs34956071 | TAGGTGTGCTGATGTATCGGGACTAC[C/T]CCCTGGAACTATTCATGGCCCAGTG | NP_001158255.1 | 508 | C | T | 170 | Pro | Ser |
rs28940288 | ACTTCGGCGAGGGGGCAGCCAGTGAG[A/G]GGGACGCCCATGCCGGCTTCAACTT | NP_001158255.1 | 730 | G | A | 244 | Gly | Arg |
rs137852874 | CAGCCAGTGAGGGGGACGCCCATGCC[A/G]GCTTCAACTTCGCTGCCACACTTGA | NP_001158255.1 | 745 | G | A | 249 | Gly | Ser |
rs137852876 | CTTGAGTGCCCCATCATCTTCTTCTG[C/G]CGGAACAATGGCTACGCCATCTCCA | NP_001158255.1 | 792 | C | G | 264 | Cys | Trp |
rs137852873 | TTGAGTGCCCCATCATCTTCTTCTGC[C/T]GGAACAATGGCTACGCCATCTCCAC | NP_001158255.1 | 793 | C | T | 265 | Arg | Trp |
rs137852871 | GTGTCCCCACAGCAGCACGAGGCCCC[A/G]GGTATGGCATCATGTCAATCCGCGT | NP_001158255.1 | 865 | G | A | 290 | Gly | Arg |
rs137852870 | GCCACCTGCAGACCTACGGGGAGCAC[A/T]ACCCACTGGATCACTTCGATAAGTG | NP_001158255.1 | 1309 | T | A | 438 | Tyr | Asn |
The positions of the missense dbSNP mutations in the amino acid sequence are relative to the RefSeq entries NP_001158255.1 and NP_064543.3, respectively.
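The relation between the nucleotide position and the codon number in the table can be checked with a few lines of Python. This is a minimal sketch, assuming that nucleotide positions are 1-based and counted from the first base of the start codon; the function name `codon_of` is hypothetical and not part of any tool used here.

```python
def codon_of(nt_pos):
    """Return (codon number, position within the codon), both 1-based,
    for a 1-based nucleotide position in a coding sequence."""
    if nt_pos < 1:
        raise ValueError("nucleotide position must be >= 1")
    codon = (nt_pos - 1) // 3 + 1   # every three bases start a new codon
    frame = (nt_pos - 1) % 3 + 1    # 1, 2 or 3 within that codon
    return codon, frame


# Nucleotide positions taken from rows of the table above.
for nt_pos in (86, 244, 452, 793):
    codon, frame = codon_of(nt_pos)
    print(f"nt {nt_pos}: codon {codon}, codon position {frame}")
```

Rows whose codon number does not match this calculation are worth re-checking against the dbSNP entry.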
The following sequence shows the mutations annotated in dbSNP that lead to a disease phenotype. Mutations indicate a mismatching reference amino acid from dbSNP.
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK
Silent mutations annotated in dbSNP
dbSNP ID | Reference Codon | Mutated Codon | Reference Allele | Mutated Allele | Position in Codon | Codon Number | Reference Residue | Mutated Residue |
---|---|---|---|---|---|---|---|---|
rs17173144 | TGC | TGT | C | T | 3 | 5 | I | I |
rs34541442 | TTA | ATA | C | A | 1 | 12 | R | R |
rs75733136 | ATC | ATA | C | A | 3 | 19 | S | S |
rs34169026 | AGC | AGT | C | T | 3 | 32 | A | A |
rs62637712 | CTG | CTT | C | T | 3 | 38 | P | P |
rs80014754 | CTG | CTA | C | A | 3 | 39 | P | P |
rs11549937 | GAC | GAC | G | C | 3 | 97 | L | L |
rs10404506 | GAA | GAT | C | T | 3 | 213 | I | I |
rs114716391 | TTC | TTT | G | T | 3 | 216 | A | A |
rs61737367 | AAC | AAT | C | T | 3 | 280 | R | R |
rs284652 | ACA | ACT | C | T | 3 | 324 | F | F |
rs55940366 | AAG | AAT | C | T | 3 | 325 | L | L |
rs4674 | GCC | GCG | A | G | 3 | 407 | L | L |
rs34492894 | CCC | CCT | C | T | 3 | 419 | L | L |
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK
The effect of a point mutation on the protein depends on its type. There are two types: synonymous and non-synonymous mutations.
A synonymous point mutation changes the nucleotide sequence but not the amino acid sequence. This is possible because each amino acid is encoded by a triplet of nucleotides (a codon), and most amino acids are encoded by more than one codon. A mutation can therefore turn a codon into another codon that encodes the same amino acid.
A non-synonymous mutation changes the amino acid sequence. How severe the effect is depends on the properties of the amino acids involved: replacing an amino acid with one that has similar properties is usually less severe than replacing it with one that has very different properties.
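The distinction can be illustrated with the standard genetic code. The following is a minimal sketch, not the procedure used to build the tables on this page; the function name `classify` and the category labels are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(ref_codon, mut_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    mut_aa = CODON_TABLE[mut_codon.upper()]
    if ref_aa == mut_aa:
        return "synonymous"
    if mut_aa == "*":
        return "nonsense"      # non-synonymous: introduces a stop codon
    if ref_aa == "*":
        return "stop-loss"     # non-synonymous: removes a stop codon
    return "missense"          # non-synonymous: exchanges the amino acid


# CGG-TGG and CGA-TGA appear in the HGMD table; GCC-GCG is an illustrative silent change.
for ref, mut in (("CGG", "TGG"), ("CGA", "TGA"), ("GCC", "GCG")):
    print(ref, "->", mut, ":", classify(ref, mut))
```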
Mutation Map
Reference Sequence Alignments
To map the mutations from the different sources onto the same sequence, the reference sequences first had to be compared. To this end, we performed pairwise alignments for the following sequences:
- NP_000700.1 and (source: Uniprot)
The alignment of these two sequences is perfect, with 100% identity. This indicates that the reference sequence from dbSNP is the same one we were working with before.
- NP_000700.1 and NP_064543.3
The alignment of these two sequences shows only 9.9% identity and 17.3% similarity, while 63.2% of the alignment consists of gaps. As this is not good enough to assume that the sequences are similar, the SNPs reported relative to the reference sequence NP_064543.3 are ignored.
- translated NM_000709.3 and NP_000700.1
The alignment of the translated HGMD reference sequence and the dbSNP reference sequence shows 97.2% identity. The only difference is a short oligopeptide of 13 amino acids at the beginning of the HGMD reference sequence. These 13 positions would matter for the mapping if the HGMD codon numbers were counted from the start of that sequence. As we found out, however, the HGMD codon positions are numbered without the signal peptide of 45 aa, so the signal peptide has to be taken into account (add 45 to the codon position), while the additional 13 aa by which the sequence differs from the dbSNP reference sequence can be ignored.
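The offset rule can be checked against the protein sequence shown on this page. This is a minimal sketch, assuming the offset of 45 stated above; the helper names `hgmd_to_reference` and `residue_at` are hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 45  # offset stated in the text above

# Protein sequence as shown on this page (445 residues).
BCKDHA = (
    "MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF"
    "IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE"
    "SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN"
    "ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA"
    "SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN"
    "DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI"
    "SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR"
    "KQQESLARHLQTYGEHYPLDHFDK"
)


def hgmd_to_reference(codon_number, offset=SIGNAL_PEPTIDE_LENGTH):
    """Map an HGMD codon number to a position in the full-length sequence."""
    return codon_number + offset


def residue_at(sequence, position):
    """Return the residue at a 1-based position."""
    return sequence[position - 1]


# (HGMD codon number, reference residue) for a few rows of the HGMD table.
for codon, ref_aa in ((106, "T"), (204, "G"), (265, "T"), (364, "F"), (393, "Y")):
    pos = hgmd_to_reference(codon)
    found = residue_at(BCKDHA, pos)
    status = "ok" if found == ref_aa else "MISMATCH"
    print(f"HGMD codon {codon} -> position {pos}: expected {ref_aa}, found {found} ({status})")
```

A mismatch between the expected and the found residue would indicate a wrong offset or a wrong reference sequence.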
Disease causing SNPs in HGMD and dbSNP
The following table shows disease causing SNPs which were found in both HGMD and dbSNP.
dbSNP ID | Reference Codon | Mutated Codon | Reference Allele | Mutated Allele | Position in Codon | Codon Number | Reference Residue | Mutated Residue |
---|---|---|---|---|---|---|---|---|
rs34442879 | GAG | GTG | C | T | 2 | 151 | T | M |
rs137852874 | TAC | AAC | G | A | 1 | 249 | G | S |
rs137852873 | AAC | TAC | C | T | 1 | 265 | R | W |
rs137852871 | ACC | ACC | G | A | 1 | 290 | G | R |
rs137852875 | TCA | TGA | C | G | 2 | 310 | T | R |
rs137852872 | GAG | GGG | T | G | 2 | 409 | F | C |
rs137852870 | CAG | AAG | T | A | 1 | 438 | Y | N |
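Finding the mutations shared by both databases amounts to a set intersection once both lists use the same numbering. The sketch below uses only a small excerpt of the HGMD and dbSNP tables from this page, and the variable names are hypothetical; it is not the script used to produce the table above.

```python
OFFSET = 45  # HGMD codon numbers are shifted by the signal peptide length

# Excerpt of the HGMD table: (codon number, reference residue, mutated residue).
hgmd_raw = [
    (80, "Q", "E"), (106, "T", "M"), (114, "R", "W"),
    (204, "G", "S"), (220, "R", "W"), (393, "Y", "N"),
]
# Excerpt of the dbSNP missense table, already in full-length numbering.
dbsnp = {
    (29, "G", "E"), (82, "M", "L"), (151, "T", "M"),
    (249, "G", "S"), (265, "R", "W"), (438, "Y", "N"),
}

# Bring HGMD onto the same numbering, then compare as sets.
hgmd = {(codon + OFFSET, ref, mut) for codon, ref, mut in hgmd_raw}

in_both = sorted(hgmd & dbsnp)
only_hgmd = sorted(hgmd - dbsnp)
only_dbsnp = sorted(dbsnp - hgmd)

print("both:      ", in_both)
print("HGMD only: ", only_hgmd)
print("dbSNP only:", only_dbsnp)
```

Comparing the full triple (position, reference residue, mutated residue) rather than the position alone keeps different exchanges at the same codon apart.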
Mutation Map
The following sequence shows the protein BCKDHA, with disease causing mutation positions coloured as described below.
MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK
Colour code:
Mutations listed in both Databases
Mutations listed only in HGMD
Mutations listed only in dbSNP
Mutation Analysis
The extracted mutations from HGMD and dbSNP for the BCKDHA gene are analyzed in the following section:
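Figure 1 shows how often each amino acid is exchanged. The amino acid mutated in most of the SNPs is arginine, followed by alanine and glycine. Alanine and glycine have small side chains without charged or polar groups, which might suggest that such residues are more prone to amino acid exchanges. Arginine, in contrast, is positively charged, so its high count needs a different explanation; one possible contribution is that four of the six arginine codons (CGN) contain a CpG dinucleotide, a known mutation hotspot. The only amino acids that never occur as the original residue of a mutation are histidine, lysine and tryptophan. This may be due to the unique side-chain structures of histidine and tryptophan in particular.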
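The ranking can be reproduced for the HGMD part of the data by counting the original residues in the HGMD table. This is a minimal sketch that covers the 33 HGMD missense/nonsense entries only, so the counts are not identical to those in Figure 1, which also includes dbSNP.

```python
from collections import Counter

# Original (reference) residue of each of the 33 HGMD missense/nonsense
# entries, in the order of the HGMD table on this page.
hgmd_reference_residues = [
    "Gln", "Thr", "Arg", "Tyr", "Arg", "Gln", "Ile", "Ala", "Ala", "Gly", "Ala",
    "Cys", "Arg", "Asn", "Gly", "Ala", "Arg", "Gly", "Arg", "Arg", "Gly", "Asp",
    "Thr", "Arg", "Ile", "Glu", "Ala", "Arg", "Arg", "Phe", "Val", "Tyr", "Tyr",
]

counts = Counter(hgmd_reference_residues)

# Most frequently mutated residues first.
for residue, n in counts.most_common():
    print(f"{residue}: {n}")
```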
Figure 2 shows that most of the disease-causing amino acid exchanges are due to mutations at the first two positions of a codon, whereas mutations at the third codon position are more often silent. This agrees with the general expectation that a mutation at the first or second position of a codon almost always changes the encoded amino acid, while a mutation at the third position often does not. The reason is that the genetic code is degenerate<ref>F.H.C. Crick, Leslie Barnett, S. Brenner and R.J. Watts-Tobin: General Nature of the Genetic Code for Proteins, Nature(192), 1961</ref>: most amino acids are encoded by more than one codon, and these synonymous codons usually differ only in the last position<ref>http://en.wikipedia.org/wiki/Genetic_code#Degeneracy</ref>.
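The position dependence follows directly from the standard genetic code and can be computed without any mutation data. The sketch below enumerates every single-nucleotide substitution in the 61 sense codons and reports, per codon position, the fraction that leaves the amino acid unchanged. It assumes that all substitutions are equally likely, which is a simplification; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction_by_position():
    """Fraction of single-base substitutions in sense codons that are
    synonymous, keyed by codon position (1, 2, 3)."""
    synonymous = {1: 0, 2: 0, 3: 0}
    total = {1: 0, 2: 0, 3: 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start from sense codons only
        for index in range(3):
            for base in BASES:
                if base == codon[index]:
                    continue
                mutant = codon[:index] + base + codon[index + 1:]
                total[index + 1] += 1
                if CODON_TABLE[mutant] == aa:
                    synonymous[index + 1] += 1
    return {pos: synonymous[pos] / total[pos] for pos in total}


for pos, fraction in synonymous_fraction_by_position().items():
    print(f"codon position {pos}: {fraction:.1%} of substitutions are synonymous")
```

Under these assumptions no second-position substitution is synonymous, only a few first-position substitutions are (within leucine and arginine), and roughly two thirds of third-position substitutions are.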
In the next step we are going to look at the frequency of each observed amino acid exchange in HGMD and dbSNP.
Figure 3 shows a heatmap for all amino acid pairs including the mutation leading to a stop codon. The amino acid exchanges which take place most often are:
- Arg => Trp
- Gly => Arg
- Gly => Ser
- Tyr => Asn
The physicochemical properties of the amino acids involved in these substitutions are quite different. The first mutation, Arg => Trp, introduces a very bulky, hydrophobic amino acid, and the positive charge of arginine is lost. Mutations of glycine are often harmful to a protein's function, as the amino acid's uniquely small size is advantageous in many positions<ref>M.O. Dayhoff, R.M. Schwartz, B.C. Orcutt: A Model of Evolutionary Change in Proteins, Atlas of Protein Sequence and Structure, 1978</ref>. The Tyr => Asn mutation replaces a hydrophobic aromatic residue with a small polar amino acid. This substitution is also very likely to affect the protein's function. Judging by their biochemical properties, none of these amino acid exchanges is likely to be without effect. This is what we expected, as the mutations were taken from HGMD and dbSNP, which list many disease-causing mutations.
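The comparison of properties can be made explicit with a coarse grouping of the amino acids. The grouping below is an assumption made for illustration (other classifications exist and draw the boundaries differently), and the names `PROPERTY_CLASS` and `changes_class` are hypothetical.

```python
# Coarse physicochemical classes (an illustrative assumption, not a standard).
PROPERTY_CLASS = {
    "Gly": "small nonpolar", "Ala": "small nonpolar",
    "Val": "aliphatic", "Leu": "aliphatic", "Ile": "aliphatic",
    "Met": "aliphatic", "Pro": "aliphatic",
    "Phe": "aromatic", "Tyr": "aromatic", "Trp": "aromatic",
    "Ser": "polar", "Thr": "polar", "Asn": "polar", "Gln": "polar", "Cys": "polar",
    "Arg": "positive", "Lys": "positive", "His": "positive",
    "Asp": "negative", "Glu": "negative",
}


def changes_class(ref, mut):
    """True if the exchange moves the residue into a different class."""
    return PROPERTY_CLASS[ref] != PROPERTY_CLASS[mut]


# The four most frequent exchanges listed above.
for ref, mut in (("Arg", "Trp"), ("Gly", "Arg"), ("Gly", "Ser"), ("Tyr", "Asn")):
    print(f"{ref} => {mut}: {PROPERTY_CLASS[ref]} -> {PROPERTY_CLASS[mut]}, "
          f"class change: {changes_class(ref, mut)}")
```

With this grouping all four exchanges change the class of the residue. A class change alone does not decide whether a mutation is harmful, since exchanges within a class can still disrupt packing or binding.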
References
<references/>