Difference between revisions of "Mapping SNPs BCKDHA"

From Bioinformatikpedia
(Mutation Map)
(Mutation Analysis)
Revision as of 14:42, 19 June 2011

General

Maple syrup urine disease (MSUD) is an autosomal recessive disorder that affects amino acid metabolism. The disease is caused by a defect in the branched-chain alpha-keto acid dehydrogenase complex, which blocks oxidative decarboxylation. As a result, the concentration of branched-chain amino acids rises. MSUD is caused by mutations in the gene coding for the alpha subunit of the branched-chain keto acid dehydrogenase (BCKDHA).


Reference Sequences: Reference_Sequence_BCKDHA

HGMD

A search for "BCKDHA" in HGMD reports a total of 39 mutations, comprising the following mutation types:

  • missense/nonsense: 33 mutations
  • small deletions: 3 mutations
  • small insertions: 1 mutation
  • gross deletions: 1 mutation
  • complex rearrangements: 1 mutation

For our analysis the missense/nonsense mutations are the most interesting ones, since a single nucleotide change can be enough to cause the phenotype of Maple Syrup Urine Disease.

Codon change Amino Acid change Codon number
gCAG-GAG Gln-Glu 80
ACG-ATG Thr-Met 106
cCGG-TGG Arg-Trp 114
gTAT-AAT Tyr-Asn 121
CGG-CAG Arg-Gln 122
cCAG-AAG Gln-Lys 145
ATC-ACC Ile-Thr 168
GCG-GTG Ala-Val 171
GCG-GTG Ala-Val 175
cGGC-AGC Gly-Ser 204
cGCT-ACT Ala-Thr 208
TGC-TAC Cys-Tyr 213
cCGG-TGG Arg-Trp 220
AAT-AGT Asn-Ser 222
GGC-GAC Gly-Asp 238
tGCA-CCA Ala-Pro 240
aCGA-TGA Arg-Term 242
cGGG-AGG Gly-Arg 245
cCGC-TGC Arg-Cys 252
CGC-CAC Arg-His 252
tGGT-AGT Gly-Ser 255
GAT-GCT Asp-Ala 257
ACA-AGA Thr-Arg 265
cCGA-TGA Arg-Term 269
ATC-ACC Ile-Thr 281
cGAG-AAG Glu-Lys 282
gGCC-ACC Ala-Thr 283
CGC-CAC Arg-His 301
cCGG-TGG Arg-Trp 318
TTC-TGC Phe-Cys 364
cGTG-ATG Val-Met 367
TAT-TGT Tyr-Cys 368
cTAC-AAC Tyr-Asn 393

The mutations are given relative to a reference sequence, which can be found under the accession number NM_000709.3. This nucleotide sequence was translated into a protein sequence using the Expasy Translate tool (<ref>http://expasy.org/tools/dna.html</ref>).

The following sequence shows the mutations annotated in HGMD:

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK

dbSNP

Results of the dbSNP search for "BCKDHA":

  • all: 742
  • human: 371

After parsing the file for mutations in exons, 16 disease-causing mutations and 14 silent mutations remained. They are listed in the following tables.

SNPs in human

Missense Mutations annotated in dbSNP

RefSNP ID RefCodon Mutated Codon Reference Allele Mutated Allele Mutation Frame Codon Number Reference Residue Mutated Residue
rs10853751 TGC TTC C T 2 5 T M
rs111855817 CGT CAT G A 2 29 G E
rs34500671 CAG CAG C G 3 31 C W
rs34589432 CTG CAG C A 2 39 P H
rs11549938 ATC CTC A C 1 82 M L
rs34442879 GAG GTG C T 2 151 T M
rs34956071 GGC TGC C T 1 170 P S
rs28940288 AGG AGG G A 1 244 G R
rs137852874 TAC AAC G A 1 249 G S
rs137852876 TTC TTG C G 3 264 C W
rs137852873 AAC TAC C T 1 265 R W
rs137852871 ACC ACC G A 1 289 G R
rs137852875 TCA TGA C G 2 309 T R
rs61736656 CGC GGC A G 1 360 I V
rs137852872 GAG GGG T G 2 409 F C
rs137852870 CAG AAG T A 1 438 Y N


The missense dbSNP mutation positions in the amino acid sequence are relative to the RefSeq entries NP_000700.1 and NP_064543.3, respectively.

The following sequence shows the mutations annotated in dbSNP that lead to a disease phenotype. Mutations indicate a mismatching reference amino acid from dbSNP.

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK

Silent mutations annotated in dbSNP

RefSNP ID RefCodon Mutated Codon Reference Allele Mutated Allele Mutation Frame Codon Number Reference Residue Mutated Residue
rs17173144 TGC TGT C T 3 5 I I
rs34541442 TTA ATA C A 1 12 R R
rs75733136 ATC ATA C A 3 19 S S
rs34169026 AGC AGT C T 3 32 A A
rs62637712 CTG CTT C T 3 38 P P
rs80014754 CTG CTA C A 3 39 P P
rs11549937 GAC GAC G C 3 97 L L
rs10404506 GAA GAT C T 3 213 I I
rs114716391 TTC TTT G T 3 216 A A
rs61737367 AAC AAT C T 3 280 R R
rs284652 ACA ACT C T 3 324 F F
rs55940366 AAG AAT C T 3 325 L L
rs4674 GCC GCG A G 3 407 L L
rs34492894 CCC CCT C T 3 419 L L

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK


Whether a point mutation affects the amino acid sequence depends on its type. There are two different types: synonymous and non-synonymous mutations.

A synonymous point mutation changes only the nucleotide sequence but not the amino acid sequence. This is possible because amino acids are encoded by triplets of nucleotides (codons) and some amino acids are encoded by more than one codon. A mutation in the nucleotide sequence can therefore change the codon while both the original and the mutated codon encode the same amino acid.

A non-synonymous mutation, in contrast, changes the amino acid sequence. The effect of this change can be more or less severe, because amino acids differ in their properties. Replacing an amino acid with one that has similar properties is less grave than replacing it with an amino acid with completely different properties.
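The distinction can be illustrated with a small sketch. It assumes the standard genetic code and uses codon pairs taken from the tables above; the function name `classify_codon_change` is only chosen for this illustration and is not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, mut_codon):
    """Return (changed positions, reference aa, mutated aa, mutation type).

    Positions are 1-based within the codon; '*' denotes a stop codon.
    """
    ref_codon, mut_codon = ref_codon.upper(), mut_codon.upper()
    positions = [
        i + 1
        for i, (ref_base, mut_base) in enumerate(zip(ref_codon, mut_codon))
        if ref_base != mut_base
    ]
    ref_aa = CODON_TABLE[ref_codon]
    mut_aa = CODON_TABLE[mut_codon]
    if ref_aa == mut_aa:
        kind = "synonymous"
    elif mut_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return positions, ref_aa, mut_aa, kind


if __name__ == "__main__":
    for ref, mut in [("CGG", "TGG"), ("CGA", "TGA"), ("GCC", "GCG")]:
        print(ref, mut, classify_codon_change(ref, mut))
```

The function also reports which codon position was changed, which is the quantity summarised per frame position in the dbSNP tables.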

Mutation Map

Reference Sequence Alignments

To map the mutations from the different sources onto the same sequence, the reference sequences first had to be compared. For this purpose we performed pairwise alignments for the following sequences:

  • NP_000700.1 and (source: Uniprot)

The alignment of these two sequences is perfect, with an identity of 100%. This indicates that the reference sequence from dbSNP is the same one we were working with before.

  • NP_000700.1 and NP_064543.3

The alignment of these two sequences showed only 9.9% identity and 17.3% similarity, while 63.2% of the alignment consists of gaps. As this alignment is not good enough to assume similar sequences, the SNPs found with the reference sequence NP_064543.3 are ignored.

  • NM_000709.3 and translated NP_000700.1

The alignment of the HGMD reference sequence and the translated dbSNP reference sequence shows 97.2% identity. The only difference between these sequences is a short oligopeptide of 13 amino acids at the beginning of the HGMD reference sequence. At first sight, these 13 positions would have to be taken into account when mapping the SNPs onto the same sequence. However, as we found out, the HGMD codon positions are counted from the start of the protein without its signal peptide. The 45 aa of the signal peptide therefore have to be taken into account (add 45 to the codon position), whereas the additional 13 aa by which the sequence differs from the dbSNP reference sequence can be ignored.
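The position mapping described above amounts to a constant offset. The following minimal sketch assumes only the 45 aa signal peptide stated in the text; the function name is hypothetical.

```python
# Length of the signal peptide as stated in the text above.
SIGNAL_PEPTIDE_LENGTH = 45


def hgmd_to_dbsnp_position(hgmd_codon_number):
    """Map an HGMD codon number onto the dbSNP reference sequence numbering."""
    if hgmd_codon_number < 1:
        raise ValueError("codon numbers are 1-based")
    return hgmd_codon_number + SIGNAL_PEPTIDE_LENGTH


if __name__ == "__main__":
    # HGMD codon numbers taken from the HGMD table above.
    for codon in (204, 220, 393):
        print(codon, "->", hgmd_to_dbsnp_position(codon))
```

For example, the HGMD entries Gly-Ser 204, Arg-Trp 220 and Tyr-Asn 393 map to positions 249, 265 and 438, where the dbSNP table lists the same exchanges (G to S, R to W and Y to N).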


Mutation Map

The following sequence shows the protein BCKDHA, with the positions of disease-causing mutations coloured as described below.

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF
IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE
SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN
ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA
SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN
DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI
SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR
KQQESLARHLQTYGEHYPLDHFDK


Colour code:

Mutations listed in both databases

Mutations listed only in HGMD

Mutations listed only in dbSNP

Mutation Analysis

The extracted mutations from HGMD and dbSNP for the BCKDHA gene are analyzed in the following section:

Figure 1: Histogram showing the number of exchanges for each amino acid

Figure 1 shows how often each amino acid is exchanged. The amino acid mutated in most of the SNPs is arginine, followed by alanine and glycine. The only amino acids which were not involved in a mutation are histidine, lysine and tryptophan.

Figure 2: Frequency of mutations for the different positions in a codon frame

Figure 2 is consistent with the general observation that a substitution at the first or second position of a codon usually leads to a different amino acid. A substitution at the third position does not always change the amino acid, so these mutations are often silent.

Figure 3: Heatmap for all missense mutations listed in HGMD and dbSNP, showing the frequency of amino acid exchanges for each pair of amino acids (x-axis: reference aa, y-axis: mutated aa, Ter: stop codon).

Figure 3 shows a heatmap for all amino acid pairs, including mutations leading to a stop codon. The amino acid exchanges which occur most often are:

  • Arg => Trp
  • Gly => Arg
  • Gly => Ser
  • Tyr => Asn
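
Counts such as those underlying Figures 1 and 3 can be derived from a list of (reference, mutated) residue pairs. The sketch below is not the script used for the figures; it uses only a small subset of the HGMD table above as example input, and the function names are chosen for illustration.

```python
from collections import Counter

# Subset of the HGMD missense mutations listed above
# (codons 114, 121, 122, 204, 220, 245, 255, 318 and 393).
EXCHANGES = [
    ("Arg", "Trp"),
    ("Tyr", "Asn"),
    ("Arg", "Gln"),
    ("Gly", "Ser"),
    ("Arg", "Trp"),
    ("Gly", "Arg"),
    ("Gly", "Ser"),
    ("Arg", "Trp"),
    ("Tyr", "Asn"),
]


def count_exchange_pairs(pairs):
    """Count how often each (reference, mutated) pair occurs (heatmap cells)."""
    return Counter(pairs)


def count_mutated_residues(pairs):
    """Count how often each reference amino acid is exchanged (histogram bars)."""
    return Counter(ref for ref, _ in pairs)


if __name__ == "__main__":
    for (ref, mut), n in count_exchange_pairs(EXCHANGES).most_common():
        print(f"{ref} => {mut}: {n}")
    print(count_mutated_residues(EXCHANGES))
```

Applied to the complete mutation lists from HGMD and dbSNP, the same counting would yield the values plotted in the histogram and the heatmap.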