Difference between revisions of "Mapping SNPs BCKDHA"

Latest revision as of 21:02, 25 August 2011

General

Maple syrup urine disease (MSUD) is an autosomal recessive disorder that affects amino acid metabolism. The disease is caused by a defect in the branched-chain alpha-keto acid dehydrogenase complex, which blocks the oxidative decarboxylation of branched-chain alpha-keto acids. The result is a rising concentration of branched-chain amino acids. MSUD can be caused by mutations in the gene coding for the alpha subunit of the branched-chain keto acid dehydrogenase (BCKDHA).

Reference Sequences: Reference_Sequence_BCKDHA

HGMD

A search of HGMD<ref>http://www.hgmd.cf.ac.uk/ac/index.php</ref> for "BCKDHA" reports a total of 39 mutations, comprising the following mutation types:

missense/nonsense: 33 mutations
small deletions: 3 mutations
small insertions: 1 mutation
gross deletions: 1 mutation
complex rearrangements: 1 mutation

For us, the missense/nonsense mutations are the most interesting ones, as a single nucleotide change can be enough to cause the phenotype of maple syrup urine disease.

Codon change	Amino Acid change	Codon number	Position in our reference sequence
gCAG-GAG	Gln-Glu	80	125
ACG-ATG	Thr-Met	106	151
cCGG-TGG	Arg-Trp	114	159
gTAT-AAT	Tyr-Asn	121	166
CGG-CAG	Arg-Gln	122	167
cCAG-AAG	Gln-Lys	145	190
ATC-ACC	Ile-Thr	168	213
GCG-GTG	Ala-Val	171	216
GCG-GTG	Ala-Val	175	220
cGGC-AGC	Gly-Ser	204	249
cGCT-ACT	Ala-Thr	208	253
TGC-TAC	Cys-Tyr	213	258
cCGG-TGG	Arg-Trp	220	265
AAT-AGT	Asn-Ser	222	267
GGC-GAC	Gly-Asp	238	283
tGCA-CCA	Ala-Pro	240	285
aCGA-TGA	Arg-Term	242	287
cGGG-AGG	Gly-Arg	245	290
cCGC-TGC	Arg-Cys	252	297
CGC-CAC	Arg-His	252	297
tGGT-AGT	Gly-Ser	255	300
GAT-GCT	Asp-Ala	257	302
ACA-AGA	Thr-Arg	265	310
cCGA-TGA	Arg-Term	269	314
ATC-ACC	Ile-Thr	281	326
cGAG-AAG	Glu-Lys	282	327
gGCC-ACC	Ala-Thr	283	328
CGC-CAC	Arg-His	301	346
cCGG-TGG	Arg-Trp	318	363
TTC-TGC	Phe-Cys	364	409
cGTG-ATG	Val-Met	367	412
TAT-TGT	Tyr-Cys	368	413
cTAC-AAC	Tyr-Asn	393	438
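
The "Codon change" column can be checked against the "Amino Acid change" column with a short script. The following is a minimal sketch in Python, assuming that a leading lower-case letter is a flanking base from the neighbouring codon rather than part of the changed codon, and that the standard genetic code applies. The function names are illustrative and are not part of HGMD.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter codes; "Term" marks a stop codon as in the table.
ONE_LETTER = "ARNDCQEGHILKMFPSTWYV*"
THREE_LETTER = ("Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe "
                "Pro Ser Thr Trp Tyr Val Term").split()
ONE_TO_THREE = dict(zip(ONE_LETTER, THREE_LETTER))


def parse_codon_change(entry):
    """Split e.g. 'gCAG-GAG' into ('CAG', 'GAG').

    Assumption: lower-case letters are flanking bases and are dropped.
    """
    ref, mut = entry.split("-")
    ref = "".join(ch for ch in ref if ch.isupper())
    mut = "".join(ch for ch in mut if ch.isupper())
    if len(ref) != 3 or len(mut) != 3:
        raise ValueError(f"not a codon pair: {entry!r}")
    return ref, mut


def amino_acid_change(entry):
    """Translate both codons and return e.g. 'Gln-Glu'."""
    ref, mut = parse_codon_change(entry)
    return f"{ONE_TO_THREE[CODON_TABLE[ref]]}-{ONE_TO_THREE[CODON_TABLE[mut]]}"


for entry in ["gCAG-GAG", "cCGG-TGG", "aCGA-TGA", "TGC-TAC"]:
    print(entry, amino_acid_change(entry))
```

An entry whose codons do not both have three upper-case bases raises a `ValueError`, which makes transcription errors in the codon column easy to spot.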

The mutations are given relative to a reference sequence, which can be found under the accession number NM_000709.3. This nucleotide sequence was translated into a protein sequence using the Expasy Translate tool<ref>http://expasy.org/tools/dna.html</ref>.

The following sequence shows the mutations annotated in HGMD:

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR KQQESLARHLQTYGEHYPLDHFDK

dbSNP

A search of dbSNP<ref>http://www.ncbi.nlm.nih.gov/projects/SNP/</ref> for SNPs in BCKDHA returns the following numbers of results:

-all: 742

-human: 371

After parsing the file for mutations in exons, 16 disease-causing mutations and 14 silent mutations remained. They are listed in the following tables.

SNPs in human

Missense Mutations annotated in dbSNP

RefSNP ID	SNP	Reference Sequence	Nucleotide Position	Nt old	Nt new	Codon Number	Reference AA	Mutated AA
rs10853751	TGGGCTCGGCGCGATGGAGGAGGAGA[C/T]GCATACTGACGCCAAAATCCGTGCT	NP_064543.3	14	C	T	6	Thr	Met
rs111855817	TGCCCTCCTGCTGCTGCGGCAGCCTG[A/G]GGCTCGGGGACTGGCTAGATCTGTG	NP_001158255.1	86	G	A	29	Gly	Glu
rs34500671	TCTGGCCGCGACAGCAGGTTCTGTTC[C/G]CAGGCAAAGTGCCGGAGGCTGCAGC	NP_064543.3	99	C	G	33	Cys	Trp
rs34589432	CCTCTGCTCTCTTCCCCAGCACCCCC[A/C]CAGGCAGCAGCAGCAGTTTTCATCT	NP_001158255.1	116	C	A	39	Pro	His
rs11549938	TCTCTGGAATCCCCATCTACCGCGTC[A/C]TGGACCGGCAAGGCCAGATCATCAA	NP_001158255.1	244	A	C	82	Met	Leu
rs34442879	GGGGAGTGCCGCCGCCCTGGACAACA[C/T]GGACCTGGTGTTTGGCCAGTACCGG	NP_001158255.1	452	C	T	151	Thr	Met
rs34956071	TAGGTGTGCTGATGTATCGGGACTAC[C/T]CCCTGGAACTATTCATGGCCCAGTG	NP_001158255.1	508	C	T	170	Pro	Ser
rs28940288	ACTTCGGCGAGGGGGCAGCCAGTGAG[A/G]GGGACGCCCATGCCGGCTTCAACTT	NP_001158255.1	730	G	A	244	Gly	Arg
rs137852874	CAGCCAGTGAGGGGGACGCCCATGCC[A/G]GCTTCAACTTCGCTGCCACACTTGA	NP_001158255.1	745	G	A	249	Gly	Ser
rs137852876	CTTGAGTGCCCCATCATCTTCTTCTG[C/G]CGGAACAATGGCTACGCCATCTCCA	NP_001158255.1	792	C	G	264	Cys	Trp
rs137852873	TTGAGTGCCCCATCATCTTCTTCTGC[C/T]GGAACAATGGCTACGCCATCTCCAC	NP_001158255.1	793	C	T	265	Arg	Trp
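rs137852871	GTGTCCCCACAGCAGCACGAGGCCCC[A/G]GGTATGGCATCATGTCAATCCGCGT	NP_001158255.1	865	G	A	290	Gly	Arg
rs137852875	TGATGTGTTTGCCGTATACAACGCCA[C/G]AAAGGAGGCCCGACGGCGGGCTGTG	NP_001158255.1	926	C	G	310	Thr	Arg
rs61736656	ATTACTGGGATAAACAGGACCACCCC[A/G]TCTCCCGGCTGCGGCACTATCTGCT	NP_001158255.1	1078	A	G	361	Ile	Val
rs137852872	GCCCAAACCCAACCCCAACCTACTCT[G/T]CTCAGACGTGTATCAGGAGATGCCC	NP_001158255.1	1223	T	G	409	Phe	Cys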
rs137852870	GCCACCTGCAGACCTACGGGGAGCAC[A/T]ACCCACTGGATCACTTCGATAAGTG	NP_001158255.1	1309	T	A	438	Tyr	Asn

The positions of the dbSNP missense mutations in the amino acid sequence are relative to the RefSeq entries NP_001158255.1 and NP_064543.3, respectively.

The following sequence shows the mutations annotated in dbSNP which lead to a disease phenotype (red in the original page). Positions marked in medium purple in the original page are those where the reference amino acid given by dbSNP does not match this sequence.

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR KQQESLARHLQTYGEHYPLDHFDK

Silent mutations annotated in dbSNP

RefSNP ID	RefCodon	Mutated Codon	Reference Allele	Mutated Allele	Mutation Frame	Codon Number	Reference Residue	Mutated Residue
rs17173144	TGC	TGT	C	T	3	5	I	I
rs34541442	TTA	ATA	C	A	1	12	R	R
rs75733136	ATC	ATA	C	A	3	19	S	S
rs34169026	AGC	AGT	C	T	3	32	A	A
rs62637712	CTG	CTT	C	T	3	38	P	P
rs80014754	CTG	CTA	C	A	3	39	P	P
rs11549937	GAC	GAC	G	C	3	97	L	L
rs10404506	GAA	GAT	C	T	3	213	I	I
rs114716391	TTC	TTT	G	T	3	216	A	A
rs61737367	AAC	AAT	C	T	3	280	R	R
rs284652	ACA	ACT	C	T	3	324	F	F
rs55940366	AAG	AAT	C	T	3	325	L	L
rs4674	GCC	GCG	A	G	3	407	L	L
rs34492894	CCC	CCT	C	T	3	419	L	L

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR KQQESLARHLQTYGEHYPLDHFDK

Depending on its type, a point mutation may or may not change the encoded amino acid. There are two types: synonymous and non-synonymous mutations.

A synonymous point mutation changes only the nucleotide sequence, not the amino acid sequence. This is possible because amino acids are encoded by triplets of nucleotides (codons) and most amino acids are encoded by more than one codon. A mutation can therefore turn a codon into another codon that encodes the same amino acid.

A non-synonymous mutation changes the amino acid sequence. How severe the effect is depends on the properties of the amino acids involved: replacing an amino acid with one of similar properties is usually less grave than replacing it with one of completely different properties.
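
The distinction described above can be expressed in a few lines of code. This is a minimal sketch, assuming the standard genetic code; the function name `classify` is illustrative, and the separate label for changes to a stop codon ("nonsense") follows the missense/nonsense wording used in the HGMD section.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify(ref_codon, mut_codon):
    """Classify a codon change as synonymous, missense or nonsense.

    Changes of a stop codon into a sense codon are not treated
    separately in this sketch and are reported as 'missense'.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    mut_aa = CODON_TABLE[mut_codon.upper()]
    if ref_aa == mut_aa:
        return "synonymous"      # same amino acid, only the DNA differs
    if mut_aa == "*":
        return "nonsense"        # premature stop codon
    return "missense"            # different amino acid


for ref, mut in [("CTG", "CTT"), ("CAG", "GAG"), ("CGA", "TGA")]:
    print(ref, "->", mut, classify(ref, mut))
```

Here CTG and CTT both encode leucine, CAG to GAG replaces glutamine with glutamate, and CGA to TGA replaces arginine with a stop codon.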

Mutation Map

Reference Sequence Alignments

To map the mutations from the different sources onto the same sequence, the reference sequences first had to be compared. To this end, we performed pairwise alignments for the following sequences:

NP_000700.1 and (source: Uniprot)

The alignment of these two sequences is perfect, with 100% identity. This indicates that the reference sequence from dbSNP is the same one we were working with before.

NP_000700.1 and NP_064543.3

The alignment of these two sequences showed only 9.9% identity and 17.3% similarity, while 63.2% of the alignment consists of gaps. As this alignment is not good enough to assume that the sequences are similar, the SNPs reported relative to the reference sequence NP_064543.3 are ignored.

NM_000709.3 and translated NP_000700.1

The alignment of the HGMD reference sequence and the translated dbSNP reference sequence shows 97.2% identity. The only difference between these sequences is a short oligopeptide of 13 amino acids at the beginning of the HGMD reference sequence. At first sight, these 13 positions would have to be taken into account when mapping the SNPs onto the same sequence. As we found out, however, the HGMD codon numbers do not include the signal peptide of 45 aa, so the signal peptide has to be taken into account (add 45 to the codon number), whereas the additional 13 aa by which the sequence differs from the dbSNP reference sequence can be ignored.
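
The offset rule can be checked against the protein sequence shown on this page. The sketch below is illustrative only: the helper name `to_reference_position` and the choice of example codons are assumptions, while the offset of 45 and the sequence are taken from the text.

```python
SIGNAL_PEPTIDE_LENGTH = 45  # offset stated in the text above

# Protein sequence as shown on this page (445 residues).
REFERENCE = (
    "MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF"
    "IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE"
    "SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN"
    "ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA"
    "SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN"
    "DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI"
    "SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR"
    "KQQESLARHLQTYGEHYPLDHFDK"
)


def to_reference_position(hgmd_codon_number):
    """Map an HGMD codon number to a 1-based position in REFERENCE."""
    return hgmd_codon_number + SIGNAL_PEPTIDE_LENGTH


# (HGMD codon number, reference amino acid from the HGMD table, one-letter)
EXAMPLES = [(80, "Q"), (106, "T"), (114, "R"), (204, "G"), (393, "Y")]

for codon, expected in EXAMPLES:
    position = to_reference_position(codon)
    residue = REFERENCE[position - 1]  # Python strings are 0-based
    print(codon, position, residue, residue == expected)
```

If the offset were wrong, the residue found at the mapped position would not match the reference amino acid given by HGMD.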

Disease causing SNPs in HGMD and dbSNP

The following table shows disease-causing SNPs that were found in both HGMD and dbSNP.

RefSNP ID	RefCodon	Mutated Codon	Reference Allele	Mutated Allele	Mutation Frame	Codon Number	Reference Residue	Mutated Residue
rs34442879	GAG	GTG	C	T	2	151	T	M
rs137852874	TAC	AAC	G	A	1	249	G	S
rs137852873	AAC	TAC	C	T	1	265	R	W
rs137852871	ACC	ACC	G	A	1	290	G	R
rs137852875	TCA	TGA	C	G	2	310	T	R
rs137852872	GAG	GGG	T	G	2	409	F	C
rs137852870	CAG	AAG	T	A	1	438	Y	N
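
The comparison of the two databases can be sketched as a set intersection. This is not the script used for this page; it is a minimal illustration that assumes the HGMD codon numbers are shifted by 45 as described above, that only dbSNP entries on NP_001158255.1 are considered, and that two entries match when position, reference residue and mutated residue agree. The variable names are illustrative.

```python
HGMD_OFFSET = 45  # signal peptide length, see the alignment section

# HGMD missense/nonsense entries: (codon number, reference, mutated).
HGMD = [
    (80, "Gln", "Glu"), (106, "Thr", "Met"), (114, "Arg", "Trp"),
    (121, "Tyr", "Asn"), (122, "Arg", "Gln"), (145, "Gln", "Lys"),
    (168, "Ile", "Thr"), (171, "Ala", "Val"), (175, "Ala", "Val"),
    (204, "Gly", "Ser"), (208, "Ala", "Thr"),
    (213, "Cys", "Tyr"),  # TGC-TAC encodes Cys-Tyr
    (220, "Arg", "Trp"), (222, "Asn", "Ser"), (238, "Gly", "Asp"),
    (240, "Ala", "Pro"), (242, "Arg", "Term"), (245, "Gly", "Arg"),
    (252, "Arg", "Cys"), (252, "Arg", "His"), (255, "Gly", "Ser"),
    (257, "Asp", "Ala"), (265, "Thr", "Arg"), (269, "Arg", "Term"),
    (281, "Ile", "Thr"), (282, "Glu", "Lys"), (283, "Ala", "Thr"),
    (301, "Arg", "His"), (318, "Arg", "Trp"), (364, "Phe", "Cys"),
    (367, "Val", "Met"), (368, "Tyr", "Cys"), (393, "Tyr", "Asn"),
]

# dbSNP missense entries on NP_001158255.1: (codon number, reference, mutated).
DBSNP = [
    (29, "Gly", "Glu"), (39, "Pro", "His"), (82, "Met", "Leu"),
    (151, "Thr", "Met"), (170, "Pro", "Ser"), (244, "Gly", "Arg"),
    (249, "Gly", "Ser"), (264, "Cys", "Trp"), (265, "Arg", "Trp"),
    (290, "Gly", "Arg"), (310, "Thr", "Arg"), (361, "Ile", "Val"),
    (409, "Phe", "Cys"), (438, "Tyr", "Asn"),
]

# Bring both lists onto the same numbering before comparing them.
hgmd = {(codon + HGMD_OFFSET, ref, mut) for codon, ref, mut in HGMD}
dbsnp = set(DBSNP)

both = sorted(hgmd & dbsnp)        # listed in both databases
only_hgmd = sorted(hgmd - dbsnp)   # listed only in HGMD
only_dbsnp = sorted(dbsnp - hgmd)  # listed only in dbSNP

for position, ref, mut in both:
    print(position, ref, mut)
```

With the entries listed on this page, the intersection contains the seven positions 151, 249, 265, 290, 310, 409 and 438.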

Mutation Map

The following sequence shows the protein BCKDHA, with disease causing mutation positions coloured as described below.

MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGN DVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLR KQQESLARHLQTYGEHYPLDHFDK

Colour code (as used in the original page):

Red: mutations listed in both databases

Blue: mutations listed only in HGMD

Dark orange: mutations listed only in dbSNP

Mutation Analysis

The extracted mutations from HGMD and dbSNP for the BCKDHA gene are analyzed in the following section:

Figure 1: Histogram showing the number of exchanges for each amino acid

Figure 1 shows how often each amino acid is exchanged. The amino acid mutated in most of the SNPs is arginine, followed by alanine and glycine. Alanine and glycine have small side chains without charged or reactive groups, which may make them more prone to exchanges; arginine, in contrast, is large and positively charged, so its high count needs a different explanation. The only amino acids that were not mutated are histidine, lysine and tryptophan. This may be due to the distinctive structures of these amino acids, especially histidine and tryptophan.

Figure 2: Frequency of mutations for the different positions in a codon frame

Figure 2 shows that most of the disease-causing amino acid exchanges are due to mutations at the first two positions of a codon, whereas mutations at the third codon position more often lead to silent mutations. This agrees with the general observation that a change at the first or second position of a codon almost always leads to a different amino acid, while a change at the third position often does not. The reason is that the genetic code is degenerate<ref>F.H.C. Crick, Leslie Barnett, S. Brenner and R.J. Watts-Tobin: General Nature of the Genetic Code for Proteins, Nature(192), 1961</ref>: most amino acids are encoded by more than one codon, and these synonymous codons usually differ only in the last position<ref>http://en.wikipedia.org/wiki/Genetic_code#Degeneracy</ref>.
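
The effect of the codon position can be illustrated independently of the BCKDHA data. The following sketch, which assumes the standard genetic code and uses an illustrative function name, enumerates every single-nucleotide substitution in every sense codon and counts how many of them are synonymous at each codon position.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_synonymous_by_position():
    """Return (synonymous, total) substitution counts per codon position."""
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only sense codons are used as the reference
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue  # not a substitution
                mutant = codon[:position] + base + codon[position + 1:]
                total[position] += 1
                if CODON_TABLE[mutant] == amino_acid:
                    synonymous[position] += 1
    return synonymous, total


synonymous, total = count_synonymous_by_position()
for position in range(3):
    print(f"codon position {position + 1}: "
          f"{synonymous[position]} of {total[position]} substitutions are synonymous")
```

Under these assumptions, 126 of the 183 possible substitutions at the third position are synonymous, compared with 8 of 183 at the first position and none at the second.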

In the next step we are going to look at the frequency of each observed amino acid exchange in HGMD and dbSNP.

Figure 3: Heatmap for all missense mutations listed in HGMD and dbSNP, showing the frequency of amino acid exchanges for each pair of amino acids (x-axis: reference aa, y-axis: mutated aa, Ter: stop codon).

Figure 3 shows a heatmap for all amino acid pairs including the mutation leading to a stop codon. The amino acid exchanges which take place most often are:

Arg => Trp
Gly => Arg
Gly => Ser
Tyr => Asn
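
The frequencies behind such a heatmap can be tallied with a counter over the amino acid exchanges. The sketch below is an illustration, not the script used for Figure 3. It assumes that the HGMD entries and the dbSNP entries on NP_001158255.1 are simply concatenated, so that a mutation listed in both databases is counted twice; whether the figure was built this way is an assumption.

```python
from collections import Counter

# "Amino Acid change" column of the HGMD table (TGC-TAC read as Cys-Tyr).
HGMD_CHANGES = """
Gln-Glu Thr-Met Arg-Trp Tyr-Asn Arg-Gln Gln-Lys Ile-Thr Ala-Val Ala-Val
Gly-Ser Ala-Thr Cys-Tyr Arg-Trp Asn-Ser Gly-Asp Ala-Pro Arg-Term Gly-Arg
Arg-Cys Arg-His Gly-Ser Asp-Ala Thr-Arg Arg-Term Ile-Thr Glu-Lys Ala-Thr
Arg-His Arg-Trp Phe-Cys Val-Met Tyr-Cys Tyr-Asn
""".split()

# Missense exchanges from dbSNP on NP_001158255.1.
DBSNP_CHANGES = """
Gly-Glu Pro-His Met-Leu Thr-Met Pro-Ser Gly-Arg Gly-Ser Cys-Trp Arg-Trp
Gly-Arg Thr-Arg Ile-Val Phe-Cys Tyr-Asn
""".split()

# Count every listed exchange once per database entry.
pair_counts = Counter(HGMD_CHANGES + DBSNP_CHANGES)

for pair, count in pair_counts.most_common(4):
    ref, mut = pair.split("-")
    print(f"{ref} => {mut}: {count}")
```

Counted this way, Arg => Trp occurs four times and Gly => Arg, Gly => Ser and Tyr => Asn occur three times each; every other exchange occurs at most twice.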

The physicochemical properties of the amino acids in these substitutions differ considerably. The first mutation, Arg => Trp, introduces a very bulky, hydrophobic amino acid, and the positive charge of arginine is lost. Mutations of glycine are often harmful to a protein's function, as the amino acid's unique smallness is advantageous in many positions<ref>M.O. Dayhoff, R.M. Schwartz, B.C. Orcutt: A Model of Evolutionary Change in Proteins, Atlas of Protein Sequence and Structure, 1978</ref>. The Tyr => Asn mutation replaces a hydrophobic aromatic residue with a small polar amino acid; this substitution is also very likely to affect the protein's function. Judging by these biochemical properties, none of these exchanges is likely to be without effect. This is what we expected, as the mutations were taken from HGMD and dbSNP, which list many disease-causing mutations.

References

@@ Line 2: / Line 2: @@
 Maple syrup urine disease is an autosomal recessive disorder that affects the amino acid metabolism. The disease is caused by a defect in the branched-chain alpha-keto acid dehydrogenase complex which blocks oxidative decarboxylation. The result is a rising concentration of branched-chain amino acids. MSUD is caused by mutations in the gene coding for the alpha subunit of the branched-chain keto acid dehydrogenase(BCKDHA).
+Reference Sequences: [[Reference_Sequence_BCKDHA]]
 == HGMD ==
-Searching for "BCKDHA" a total of 39 mutations are reported, comprised of the following mutation types:
+Searching the HGMD <ref>http://www.hgmd.cf.ac.uk/ac/index.php</ref> for "BCKDHA" a total of 39 mutations are reported, comprised of the following mutation types:
 *missense/nonsense: 33 mutations
 *small deletions: 3 mutations
@@ Line 14: / Line 18: @@
 For us the missense/nonsense mutations are the most interesting ones, as a single nucleotide change can lead to the phenotype of Maple Syrup Urine Disease.
-{|border="1"
+{|border="1"  style="text-align:center; border-spacing:0;"
 !Codon change
 !Amino Acid change
 !Codon number
+!Position in our reference sequence
 |-
-| gCAG-GAG || Gln-Glu||80
+| gCAG-GAG || Gln-Glu||80||125
 |-
-|ACG-ATG||THr-Met||106
+|ACG-ATG||Thr-Met||106||151
 |-
-|cCGG-TGG||Arg-Trp||114
+|cCGG-TGG||Arg-Trp||114||159
 |-
-|gTAT-AAAT||Tyr-Asn||121
+|gTAT-AAAT||Tyr-Asn||121||166
 |-
-|CGG-CAG||Arg-Gln||122
+|CGG-CAG||Arg-Gln||122||167
 |-
-|cCAG-AAG||Gln-Lys||145
+|cCAG-AAG||Gln-Lys||145||190
 |-
-|ATC-ACC||Ile-Thr||168
+|ATC-ACC||Ile-Thr||168||213
 |-
-|GCG-GTG||Ala-Val||171
+|GCG-GTG||Ala-Val||171||216
 |-
-|GCG-GTG||Ala-Val||175
+|GCG-GTG||Ala-Val||175||220
 |-
-|cGGC-AGC||Gly-Ser||204
+|cGGC-AGC||Gly-Ser||204||249
 |-
-|cGCT-ACT||Ala-Thr||208
+|cGCT-ACT||Ala-Thr||208||254
 |-
-|TGC-TAC||Cys-Thr||213
+|TGC-TAC||Cys-Thr||213||258
 |-
-|cCGG-TGG||Arg-Trp||220
+|cCGG-TGG||Arg-Trp||220||265
 |-
-|AAT-AGT||Asn-Ser||222
+|AAT-AGT||Asn-Ser||222||267
 |-
-|GGC-GAC||Gly-Asp||238
+|GGC-GAC||Gly-Asp||238||283
 |-
-|tGCA-CCA||Ala-Pro||240
+|tGCA-CCA||Ala-Pro||240||285
 |-
-|aCGA-TGA||Arg-Term||242
+|aCGA-TGA||Arg-Term||242||287
 |-
-|cGGG-AGG||Gly-Arg||245
+|cGGG-AGG||Gly-Arg||245||290
 |-
-|cCGC-TGC||Arg-Cys||252
+|cCGC-TGC||Arg-Cys||252||297
 |-
-|CGC-CAC||Arg-His||252
+|CGC-CAC||Arg-His||252||297
 |-
-|tGGT-AGT||Gly-Ser||255
+|tGGT-AGT||Gly-Ser||255||300
 |-
-|GAT-GCT||Asp-Ala||257
+|GAT-GCT||Asp-Ala||257||302
 |-
-|ACA-AGA||Thr-Arg||265
+|ACA-AGA||Thr-Arg||265||310
 |-
-|cCGA-TGA||Arg-Term||269
+|cCGA-TGA||Arg-Term||269||314
 |-
-|ATC-ACC||Ile-Thr||281
+|ATC-ACC||Ile-Thr||281||326
 |-
-|cGAG-AAG||Glu-Lys||282
+|cGAG-AAG||Glu-Lys||282||327
 |-
-|gGCC-ACC||Ala-Thr||283
+|gGCC-ACC||Ala-Thr||283||328
 |-
-|CGC-CAC||Arg-His||301
+|CGC-CAC||Arg-His||301||346
 |-
-|cCGG-TGG||Arg-Trp||318
+|cCGG-TGG||Arg-Trp||318||363
 |-
-|TTC-TGC||Phe-Cys||364
+|TTC-TGC||Phe-Cys||364||409
 |-
-|cGTG-ATG||Val-Met||367
+|cGTG-ATG||Val-Met||367||412
 |-
-|TAT-TGT||Tyr-Cys||368
+|TAT-TGT||Tyr-Cys||368||413
 |-
-|cTAC-AAC||Tyr-Asn||393
+|cTAC-AAC||Tyr-Asn||393||438
 |}
+The mutations are given for a reference sequence, which can be found under the accession number NM_000709.3. This is a nucleotide sequence, which was translated using the Expasy Translate tool<ref>http://expasy.org/tools/dna.html</ref> into a protein sequence.
+The following sequence shows the <font color =red>mutations</font> annotatied in HGMD:
+<code>
+MAVAIAAARVWRLNRGLSQAALLLLRQPGARGLARSHPPRQQQQFSSLDDKPQFPGASAEF<br>
+IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE<br>
+SQR<font color=red>Q</font>GRISFYMTNYGEEGTHVGSAAALDN<font color=red>T</font>DLVFGQY<font color=red>R</font>EAGVLM<font color=red>Y</font><font color=red>R</font>DYPLELFMAQCYGN<br>
+ISDLGKGR<font color=red>Q</font>MPVHYGCKERHFVTISSPLATQ<font color=red>I</font>PQ<font color=red>A</font>VGA<font color=red>A</font>YAAKRANANRVVICYFGEGAA<br>
+SEGDAHA<font color=red>G</font>FNF<font color=red>A</font>ATLE<font color=red>C</font>PIIFFC<font color=red>R</font>N<font color=red>N</font>GYAISTPTSEQYRGD<font color=red>G</font>I<font color=red>A</font>A<font color=red>R</font>GP<font color=red>G</font>YGIMSI<font color=red>R</font>VD<font color=red>G</font>N<br>
+<font color=red>D</font>VFAVYNA<font color=red>T</font>KEA<font color=red>R</font>RRAVAENQPFL<font color=red>I</font><font color=red>E</font><font color=red>A</font>MTYRIGHHSTSDDSSAY<font color=red>R</font>SVDEVNYWDKQDHPI<br>
+S<font color=red>R</font>LRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLL<font color=red>F</font>SD<font color=red>V</font><font color=red>Y</font>QEMPAQLR<br>
+KQQESLARHLQTYGEH<font color=red>Y</font>PLDHFDK
+</code>
 == dbSNP ==
+Searching dbSNP<ref>http://www.ncbi.nlm.nih.gov/projects/SNP/</ref> for SNPs in BCKDHA one gets the following number of results:
-'''results for SNPs in BCKDHA:'''
 -all: 742
 -human: 371
+After parsing the file for mutations in exons, 16 disease-causing mutations and 14 silent mutations remained. They are listed in the following tables.
 === SNPs in human ===
+''' Missense Mutations annotated in dbSNP'''
+<!--
+{|border="1" style="text-align:center; border-spacing:0;"
+!RefSeq ID
+!SNP
+!Reference Sequence
+!Nucleotide Position
+!Nt old
+!Nt new
+!Codon Number
+!Reference AA
+!Mutated AA
+|-
+| rs10853751 || TGGGCTCGGCGCGATGGAGGAGGAGA[C/T]GCATACTGACGCCAAAATCCGTGCT || NP_064543.3 || 14 || C || T || 5 || Thr || Met
+|-
+| rs111855817 || TGCCCTCCTGCTGCTGCGGCAGCCTG[A/G]GGCTCGGGGACTGGCTAGATCTGTG || NP_000700.1 || 86 || G || A || 29 || Gly || Glu
+|-
+| rs34500671 || TCTGGCCGCGACAGCAGGTTCTGTTC[C/G]CAGGCAAAGTGCCGGAGGCTGCAGC || NP_064543.3 || 99 || C || G || 33 || Cys || Trp
+|-
+| rs34589432 || CCTCTGCTCTCTTCCCCAGCACCCCC[A/C]CAGGCAGCAGCAGCAGTTTTCATCT || NP_000700.1 || 116 || C || A || 39 || Pro || His
+|-
+| rs11549938 || TCTCTGGAATCCCCATCTACCGCGTC[A/C]TGGACCGGCAAGGCCAGATCATCAA || NP_000700.1 || 244 || A || C || 82 || Met || Leu
+|-
+| rs34442879 || GGGGAGTGCCGCCGCCCTGGACAACA[C/T]GGACCTGGTGTTTGGCCAGTACCGG || NP_000700.1 || 452 || C || T || 151 || Thr || Met
+|-
+| rs34956071 || TAGGTGTGCTGATGTATCGGGACTAC[C/T]CCCTGGAACTATTCATGGCCCAGTG || NP_000700.1 || 508 || C || T || 170 || Pro || Ser
+|-
+| rs28940288 || ACTTCGGCGAGGGGGCAGCCAGTGAG[A/G]GGGACGCCCATGCCGGCTTCAACTT || NP_000700.1 || 730 || G || A || 244 || Gly || Arg
+|-
+| rs137852874 || CAGCCAGTGAGGGGGACGCCCATGCC[A/G]GCTTCAACTTCGCTGCCACACTTGA || NP_000700.1 || 745 || G || A || 249 || Gly || Ser
+|-
+| rs137852876 || CTTGAGTGCCCCATCATCTTCTTCTG[C/G]CGGAACAATGGCTACGCCATCTCCA || NP_000700.1 || 792 || C || G || 264 || Cys || Trp
+|-
+| rs137852873 || TTGAGTGCCCCATCATCTTCTTCTGC[C/T]GGAACAATGGCTACGCCATCTCCAC || NP_000700.1 || 793 || C || T || 265 || Arg || Trp
+|-
+| rs137852871 || GTGTCCCCACAGCAGCACGAGGCCCC[A/G]GGTATGGCATCATGTCAATCCGCGT || NP_000700.1 || 865 || G || A || 290 || Gly || Arg
+|-
+| rs137852875 || TGATGTGTTTGCCGTATACAACGCCA[C/G]AAAGGAGGCCCGACGGCGGGCTGTG || NP_000700.1 || 926 || C || G || 310 || Thr || Arg
+|-
+| rs61736656 || ATTACTGGGATAAACAGGACCACCCC[A/G]TCTCCCGGCTGCGGCACTATCTGCT || NP_000700.1 || 1078 || A || G || 361 || Ile || Val
+|-
+| rs137852872 || GCCCAAACCCAACCCCAACCTACTCT[G/T]CTCAGACGTGTATCAGGAGATGCCC || NP_000700.1 || 1223 || T || G || 409 || Phe || Cys
+|-
+| rs137852870 || GCCACCTGCAGACCTACGGGGAGCAC[A/T]ACCCACTGGATCACTTCGATAAGTG || NP_000700.1 || 1309 || T || A || 438 || Tyr || Asn
+|-
+|}
+-->
+{|border="1" style="text-align:center; border-spacing:0;"
+!RefSeq ID
+!SNP
+!Reference Sequence
+!Nucleotide Position
+!Nt old
+!Nt new
+!Codon Number
+!Reference AA
+!Mutated AA
+|-
+| rs10853751 || TGGGCTCGGCGCGATGGAGGAGGAGA[C/T]GCATACTGACGCCAAAATCCGTGCT || NP_064543.3 || 14 || C || T || <!--5-->6 || Thr || Met
+|-
+| rs111855817 || TGCCCTCCTGCTGCTGCGGCAGCCTG[A/G]GGCTCGGGGACTGGCTAGATCTGTG || NP_001158255.1 || 86 || G || A || 29 || Gly || Glu
+|-
+| rs34500671 || TCTGGCCGCGACAGCAGGTTCTGTTC[C/G]CAGGCAAAGTGCCGGAGGCTGCAGC || NP_064543.3 || 99 || C || G || 33 || Cys || Trp
+|-
+| rs34589432 || CCTCTGCTCTCTTCCCCAGCACCCCC[A/C]CAGGCAGCAGCAGCAGTTTTCATCT || NP_001158255.1 || 116 || C || A || 39 || Pro || His
+|-
+| rs11549938 || TCTCTGGAATCCCCATCTACCGCGTC[A/C]TGGACCGGCAAGGCCAGATCATCAA || NP_001158255.1 || 244 || A || C || 82 || Met || Leu
+|-
+| rs34442879 || GGGGAGTGCCGCCGCCCTGGACAACA[C/T]GGACCTGGTGTTTGGCCAGTACCGG || NP_001158255.1 || 452 || C || T || 151 || Thr || Met
+|-
+| rs34956071 || TAGGTGTGCTGATGTATCGGGACTAC[C/T]CCCTGGAACTATTCATGGCCCAGTG || NP_001158255.1 || 508 || C || T || 170 || Pro || Ser
+|-
+| rs28940288 || ACTTCGGCGAGGGGGCAGCCAGTGAG[A/G]GGGACGCCCATGCCGGCTTCAACTT || NP_001158255.1 || 730 || G || A || 244 || Gly || Arg
+|-
+| rs137852874 || CAGCCAGTGAGGGGGACGCCCATGCC[A/G]GCTTCAACTTCGCTGCCACACTTGA || NP_001158255.1 || 745 || G || A || 249 || Gly || Ser
+|-
+| rs137852876 || CTTGAGTGCCCCATCATCTTCTTCTG[C/G]CGGAACAATGGCTACGCCATCTCCA || NP_001158255.1 || 792 || C || G || 264 || Cys || Trp
+|-
+| rs137852873 || TTGAGTGCCCCATCATCTTCTTCTGC[C/T]GGAACAATGGCTACGCCATCTCCAC || NP_001158255.1 || 793 || C || T || 265 || Arg || Trp
+|-
+| rs137852871 || GTGTCCCCACAGCAGCACGAGGCCCC[A/G]GGTATGGCATCATGTCAATCCGCGT || NP_001158255.1 || 865 || G || A || <!--289->>290 || Gly || Arg
+|-
+| rs137852875 || TGATGTGTTTGCCGTATACAACGCCA[C/G]AAAGGAGGCCCGACGGCGGGCTGTG || NP_001158255.1 || 926 || C || G || 310 || Thr || Arg
+|-
+| rs61736656 || ATTACTGGGATAAACAGGACCACCCC[A/G]TCTCCCGGCTGCGGCACTATCTGCT || NP_001158255.1 || 1078 || A || G || 361 || Ile || Val
+|-
+| rs137852872 || GCCCAAACCCAACCCCAACCTACTCT[G/T]CTCAGACGTGTATCAGGAGATGCCC || NP_001158255.1 || 1223 || T || G || <!--410-->409 || Phe || Cys
+|-
+| rs137852870 || GCCACCTGCAGACCTACGGGGAGCAC[A/T]ACCCACTGGATCACTTCGATAAGTG || NP_001158255.1 || 1309 || T || A || 438 || Tyr || Asn
+|-
+|}
+The missense dbSNP mutation positions in the amino acid sequence are relative the RefSeq entry NP_001158255.1 and NP_064543.3, respectively.
+The following sequence shows the <font color=red> mutations</font> annotated in dbSNP which lead to disease phenotype. <font color=mediumpurple>Mutations</font> indicate a mismatching reference amino acid from dbSNP.
+<code>
+MAVA<font color=mediumpurple>I</font>AAARVWRLNRGLSQAALLLLRQP<font color=red>G</font>ARG<font color=mediumpurple>L</font>ARSHP<font color=red>P</font>RQQQQFSSLDDKPQFPGASAEF<br>
+IDKLEFIQPNVISGIPIYRV<font color=red>M</font>DRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE<br>
+SQRQGRISFYMTNYGEEGTHVGSAAALDN<font color=red>T</font>DLVFGQYREAGVLMYRDY<font color=red>P</font>LELFMAQCYGN<br>
+ISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAA<br>
+SE<font color=red>G</font>DAHA<font color=red>G</font>FNFAATLECPIIFF<font color=red>C</font><font color=red>R</font>NNGYAISTPTSEQYRGDGIAARG<font color=mediumpurple>P</font>GYGIMSIRVDGN<br>
+DVFAVYN<font color=mediumpurple>A</font>TKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDH<font color=mediumpurple>P</font>I<br>
+SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLL<font color=red>F</font>SDVYQEMPAQLR<br>
+KQQESLARHLQTYGEH<font color=red>Y</font>PLDHFDK
+</code>
+'''Silent mutations annotated in dbSNP'''
+{|border="1" style="text-align:center; border-spacing:0;"
+!RefSeq ID
+!RefCodon
+!Mutated Codon
+!Reference Allele
+!Mutated Allele
+!Mutation Frame
+!Codon Number
+!Reference Residue
+!Mutated Residue
+|-
+|rs17173144 || TGC || TGT || C || T || 3 || 5 || I || I
+|-
+|rs34541442 || TTA || ATA || C || A || 1 || 12 || R || R
+|-
+|rs75733136 || ATC || ATA || C || A || 3 || 19 || S || S
+|-
+|rs34169026 || AGC || AGT || C || T || 3 || 32 || A || A
+|-
+|rs62637712 || CTG || CTT || C || T || 3 || 38 || P || P
+|-
+|rs80014754 || CTG || CTA || C || A || 3 || 39 || P || P
+|-
+|rs11549937 || GAC || GAC || G || C || 3 || 97 || L || L
+|-
+|rs10404506 || GAA || GAT || C || T || 3 || 213 || I || I
+|-
+|rs114716391 || TTC || TTT || G || T || 3 || 216 || A || A
+|-
+|rs61737367 || AAC || AAT || C || T || 3 || 280 || R || R
+|-
+|rs284652 || ACA || ACT || C || T || 3 || 324 || F || F
+|-
+|rs55940366 || AAG || AAT || C || T || 3 || 325 || L || L
+|-
+|rs4674 || GCC || GCG || A || G || 3 || 407 || L || L
+|-
+|rs34492894 || CCC || CCT || C || T || 3 || 419 || L || L
+|-
+|}
+<code>
+MAVA<font color=lightsalmon>I</font>AAARVW<font color=lightsalmon>R</font>LNRGLS<font color=mediumpurple>Q</font>AALLLLRQPGAR<font color=mediumpurple>G</font>LARSH<font color=lightsalmon>P</font><font color=lightsalmon>P</font>RQQQQFSSLDDKPQFPGASAEF<br>
+IDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPH<font color=lightsalmon>L</font>PKEKVLKLYKSMTLLNTMDRILYE<br>
+SQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGN<br>
+ISDLGKGRQMPVHYGCKERHFVTISSPLATQ<font color=lightsalmon>I</font>PQ<font color=lightsalmon>A</font>VGAAYAAKRANANRVVICYFGEGAA<br>
+SEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQY<font color=lightsalmon>R</font>GDGIAARGPGYGIMSIRVDGN<br>
+DVFAVYNATKEARRRAVAENQP<font color=lightsalmon>F</font><font color=lightsalmon>L</font>IEAMTYRIGHHSTSDDSSAYRSVDEVNYWDKQDHPI<br>
+SRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPN<font color=lightsalmon>L</font>LFSDVYQEMPA<font color=mediumpurple>Q</font>LR<br>
+KQQESLARHLQTYGEHYPLDHFDK
+</code>
+<!--
+'''results for the gene search with BCKDHA'''
 {|border="1"
+!mRNA pos
-!ID
+!aa position
-!mutation in sequence
+!function
-!amino acid
+!mutation
-!position
+!aa change
 |-
+|54||5||synonymous||C/T|| -
-|rs137852876||CTTGAGTGCCCCATCATCTTCTTCTG[C/G]CGGAACAATGGCTACGCCATCTCCA||Ser||pos=251
 |-
+|72 ||12 ||synonymous||C/A || -
-|rs137852875||TGATGTGTTTGCCGTATACAACGCCA[C/G]AAAGGAGGCCCGACGGCGGGCTGTG||Ser||pos=251
 |-
+|125||29||missense||G/A||Gly/Glu
-|rs137852874||CAGCCAGTGAGGGGGACGCCCATGCC[A/G]GCTTCAACTTCGCTGCCACACTTGA||Arg||pos=251
 |-
+|153 || 83 || synonymous || C/G, C/T || -
-|rs137852873||TTGAGTGCCCCATCATCTTCTTCTGC[C/T]GGAACAATGGCTACGCCATCTCCAC||Tyr||pos=251
 |-
+|155|| 39 || missense|| C/A||Pro/His
-|rs137852872||GCCCAAACCCAACCCCAACCTACTCT[G/T]CTCAGACGTGTATCAGGAGATGCCC||Lys||pos=251
 |-
+|156||39 ||synonymous ||C/A || -
-|rs137852871||GTGTCCCCACAGCAGCACGAGGCCCC[A/G]GGTATGGCATCATGTCAATCCGCGT||Arg||pos=251
+|-
+|238||82||missense||A/C||Met/Leu
+|-
+|330||97|| synonymous|| G/C|| -
+|-
+|491||151||missense||C/T||Thr/Met
+|-
+|547||170||missense||C/T||Pro/Ser
+|-
+|678||213||synonymous||C/T || -
+|-
+|687||216||synonymous||G/T|| -
+|-
+|769||244||missense||G/A ||Gly/Arg
+|-
+|879||280||synonymous||C/T|| -
+|-
+|1011||324||synonymous||C/T|| -
+|-
+|1012||325 ||frame shift||C/ || -
+|-
+|1014||325||synonymous 	||C/T|| -
+|-
+|1120||361||missense||A/G||Ile/Val
+|-
+|1260||407||synonymous||A/G|| -
+|-
+|1299||420 ||synonymous||C/T || -
 |}
+-->
+Point mutations can have an influence on the amino acids depending on the kind of the point mutation. There are two different types: synonymous and non-synonymous mutations.
+If a point mutation is synonymous it means that the change occurs only in the nucleotide sequence but not in the amino acid sequence. This is possible because of the fact that amino acids are encoded by three nucleotides (codons) and some of the amino acids are encoded by more than one possible arrangement of nucleotides. So it can happen that when there is a mutation in the nucleotide sequence there is also a change in the codon but both codons encode the same amino acid.
+The other possibility is that a mutation is non-synonymous which means that the mutation has an influence on the amino acid sequence and the amino acid changes. This change can have more or less severe effects because amino acids have several properties. When an amino acid is replaced by an amino acid which has the same properties the change is not as grave as the change to an amino acid with completely different properties.
+== Mutation Map ==
+=== Reference Sequence Alignments ===
+To map the different mutations from different sources onto the same sequence, first the reference sequences needed to be compared. Herefore we performed pairwise alignments for the following sequences:
+* NP_000700.1 and (source: Uniprot)
+The [[Reference_Alignment_BCKDHA#Alignment NP000700.1 and Uniprot |alignment]] for these two sequences is perfect, the identity is 100%. This indicates that the reference sequence from dbSNP is the same one we were working with before.
+* NP_000700.1 and  NP_064543.3
+The [[Reference_Alignment_BCKDHA#Alignment NP000700.1 and NP_064543.3 |alignment]] for these two sequences showed only 9.9% identity and 17.3% similarity, whereas the 63.2% are gaps. As this alignment is not good enough to assume similar sequences, the SNPs found with reference sequence NP_064543.3 are ignored.
+* NM_000709.3 and translated NP_000700.1
+The [[Reference_Alignment_BCKDHA#Alignment NP_000709.3 and NP000700.1|alignment]] for the HGMD reference sequence and the translated dbSNP reference sequence shows 97.2% identity. The only difference in these sequences is a short oligopeptide at the beginning of the HGMD reference sequence. This oligo is 13 amino acids long. These 13 positions have to be taken into account when mapping the SNPs onto the same sequence. As we found out, the HGMD codon positions are relative to the start codon of the protein, so the signal peptide of 45 aa have to be taken into account (add 45 to codon position), but the additional 13 aa, by what the sequence differs from the dbSNP reference sequence can be ignored.
+=== Disease causing SNPs in HGMD and dbSNP ===
+The following table shows disease causing SNPs which were found in both HGMD and dbSNP.
+{|border="1" style="text-align:center; border-spacing:0;"
+!RefSeq ID
+!RefCodon
+!Mutated Codon
+!Reference Allele
+!Mutated Allele
+!Mutation Frame
+!Codon Number
+!Reference Residue
+!Mutated Residue
+|-
+|rs34442879 || GAG || GTG || C || T || 2 || 151 || T || M
+|-
+|rs137852874 || TAC || AAC || G || A || 1 || 249 || G || S
+|-
+|rs137852873 || AAC || TAC || C || T || 1 || 265 || R || W
+|-
+|rs137852871 || ACC || ACC || G || A || 1 || 290 || G || R
+|-
+|rs137852875 || TCA || TGA || C || G || 2 || 310 || T || R
+|-
+|rs137852872 || GAG || GGG || T || G || 2 || 409 || F || C
+|-
+|rs137852870 || CAG || AAG || T || A || 1 || 438 || Y || N
+|-
+|}
+=== Mutation Map ===
+The following sequence shows the protein BCKDHA, with disease causing mutation positions coloured as described below.
+<code>
+MAVAIAAARVWRLNRGLSQAALLLLRQP<font color=darkorange>G</font>ARGLARSHP<font color=darkorange>P</font>RQQQQFSSLDDKPQFPGASAEF<br>
+IDKLEFIQPNVISGIPIYRV<font color=darkorange>M</font>DRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYE<br>
+SQR<font color=blue>Q</font>GRISFYMTNYGEEGTHVGSAAALDN<font color=red>T</font>DLVFGQY<font color=blue>R</font>EAGVLM<font color=blue>Y</font><font color=blue>R</font>DY<font color=darkorange>P</font>LELFMAQCYGN<br>
+ISDLGKGR<font color=blue>Q</font>MPVHYGCKERHFVTISSPLATQ<font color=blue>I</font>PQ<font color=blue>A</font>VGA<font color=blue>A</font>YAAKRANANRVVICYFGEGAA<br>
+SE<font color=darkorange>G</font>DAHA<font color=red>G</font>FNF<font color=blue>A</font>ATLE<font color=blue>C</font>PIIFF<font color=darkorange>C</font><font color=red>R</font>N<font color=blue>N</font>GYAISTPTSEQYRGD<font color=blue>G</font>I<font color=blue>A</font>A<font color=blue>R</font>GP<font color=red>G</font>YGIMSI<font color=blue>R</font>VD<font color=blue>G</font>N<br>
+<font color=blue>D</font>VFAVYNA<font color=red>T</font>KEA<font color=blue>R</font>RRAVAENQPFL<font color=blue>I</font><font color=blue>E</font><font color=blue>A</font>MTYRIGHHSTSDDSSAY<font color=blue>R</font>SVDEVNYWDKQDHP<font color=darkorange>I</font><br>
+S<font color=blue>R</font>LRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLL<font color=red>F</font>SD<font color=blue>V</font><font color=blue>Y</font>QEMPAQLR<br>
+KQQESLARHLQTYGEH<font color=red>Y</font>PLDHFDK
+</code>
+Colour code:
+* <font color=red>Mutations listed in both databases</font>
+* <font color=blue>Mutations listed only in HGMD</font>
+* <font color=darkorange>Mutations listed only in dbSNP</font>
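A coloured map like the one above could be generated from two sets of positions rather than marked up by hand. The following is a minimal sketch, not the procedure actually used for this page; the function name `colour_sequence` and the toy input are assumptions made for illustration.

```python
def colour_sequence(sequence, hgmd_positions, dbsnp_positions):
    """Wrap mutated residues in <font> tags following the colour code.

    Positions are 1-based. Red: listed in both databases,
    blue: HGMD only, darkorange: dbSNP only.
    """
    parts = []
    for position, residue in enumerate(sequence, start=1):
        in_hgmd = position in hgmd_positions
        in_dbsnp = position in dbsnp_positions
        if in_hgmd and in_dbsnp:
            colour = "red"
        elif in_hgmd:
            colour = "blue"
        elif in_dbsnp:
            colour = "darkorange"
        else:
            colour = None
        if colour is None:
            parts.append(residue)
        else:
            parts.append(f"<font color={colour}>{residue}</font>")
    return "".join(parts)


# Toy example with made-up positions, only to show the three colour classes.
print(colour_sequence("MAVA", hgmd_positions={2, 3}, dbsnp_positions={3, 4}))
```

For the real map, the HGMD set would hold the codon numbers shifted by 45 and the dbSNP set the positions taken from the dbSNP records.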
+== Mutation Analysis ==
+The extracted mutations from HGMD and dbSNP for the BCKDHA gene are analyzed in the following section:
+[[File:BCKDHA_AAexchanges.png|thumb|400px|center|Figure 1: Histogram showing the number of exchanges for each amino acid]]
+Figure 1 shows how often each amino acid is exchanged. The amino acid mutated in most of the SNPs is arginine, followed by alanine and glycine. This is probably not because these residues lack important biophysical properties: arginine carries a positive charge and glycine is uniquely small. A more likely contribution, at least for arginine, is that four of its six codons (CGN) contain a CpG dinucleotide, which is a known mutation hotspot.
+The only amino acids that never occur as the replaced residue are histidine, lysine and tryptophan. This might be due to the unique side-chain structures of histidine and tryptophan in particular, although with so few mutations it could also be chance.
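The counts behind a histogram like Figure 1 can be reproduced with a simple tally. The sketch below uses only the reference residues of the 33 HGMD missense/nonsense entries listed at the top of this page, so it does not include the dbSNP-only mutations and will not match the figure exactly. The variable names are chosen for illustration.

```python
from collections import Counter

# (HGMD codon number, reference residue) for the 33 HGMD entries.
HGMD_REFERENCE_RESIDUES = [
    (80, "Gln"), (106, "Thr"), (114, "Arg"), (121, "Tyr"), (122, "Arg"),
    (145, "Gln"), (168, "Ile"), (171, "Ala"), (175, "Ala"), (204, "Gly"),
    (208, "Ala"), (213, "Cys"), (220, "Arg"), (222, "Asn"), (238, "Gly"),
    (240, "Ala"), (242, "Arg"), (245, "Gly"), (252, "Arg"), (252, "Arg"),
    (255, "Gly"), (257, "Asp"), (265, "Thr"), (269, "Arg"), (281, "Ile"),
    (282, "Glu"), (283, "Ala"), (301, "Arg"), (318, "Arg"), (364, "Phe"),
    (367, "Val"), (368, "Tyr"), (393, "Tyr"),
]

# Count how often each amino acid is the residue that gets replaced.
exchange_counts = Counter(residue for _, residue in HGMD_REFERENCE_RESIDUES)

for residue, count in exchange_counts.most_common():
    print(residue, count)
```

In this HGMD-only tally arginine, alanine and glycine are again the three most frequently replaced residues.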
+[[File:BCKDHA_frameMutations.png|thumb|400px|center|Figure 2: Frequency of mutations for the different positions in a codon frame]]
+Figure 2 shows that most of the disease causing amino acid exchanges are due to mutations at the first two positions of a codon, whereas mutations at the third position are more often silent. This agrees with the general expectation: a change at the first or second codon position almost always leads to a different amino acid, while a change at the third position often leaves the amino acid unchanged. The reason is that the genetic code is degenerate<ref>F.H.C. Crick, Leslie Barnett, S. Brenner and R.J. Watts-Tobin: General Nature of the Genetic Code for Proteins, Nature(192), 1961</ref>: most amino acids are encoded by more than one codon, and these synonymous codons usually differ only in the last position<ref>http://en.wikipedia.org/wiki/Genetic_code#Degeneracy</ref>.
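The degeneracy argument can be checked directly on the standard genetic code. The sketch below counts, for each codon position, how many single-nucleotide substitutions in the 61 sense codons leave the encoded amino acid unchanged. It is independent of the BCKDHA data, and the helper name `synonymous_counts` is chosen for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_counts():
    """Count synonymous single-base substitutions per codon position."""
    counts = [0, 0, 0]
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # only sense codons are used as starting points
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                if CODON_TABLE[mutant] == amino_acid:
                    counts[position] += 1
    return counts


print(synonymous_counts())
```

Each position allows 183 substitutions in total (61 sense codons times 3 alternative bases). The third position accounts for by far the most synonymous ones.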
+In the next step we are going to look at the frequency of each observed amino acid exchange in HGMD and dbSNP.
+[[File:BCKDHA_heatmap.png|thumb|400px|center|Figure 3: Heatmap for all missense mutations listed in HGMD and dbSNP, showing the frequency of amino acid exchanges for each pair of amino acids (x-axis: reference aa, y-axis: mutated aa, Ter: stop codon).]]
+Figure 3 shows a heatmap for all amino acid pairs including the mutation leading to a stop codon. The amino acid exchanges which take place most often are:
+* Arg => Trp
+* Gly => Arg
+* Gly => Ser
+* Tyr => Asn
+The amino acids involved in these substitutions differ considerably in their physicochemical properties. The first mutation, Arg => Trp, introduces a very bulky, hydrophobic amino acid, and the positive charge of arginine is lost. Mutations of glycine are often harmful for a protein's function, as the uniquely small size of this amino acid is advantageous in many positions<ref>M.O. Dayhoff, R.M. Schwartz, B.C. Orcutt: A Model of Evolutionary Change in Proteins, Atlas of Protein Sequence and Structure, 1978</ref>. The Tyr => Asn mutation replaces a large aromatic residue with a small polar amino acid, so this substitution is also very likely to affect the protein's function. Judging by their biochemical properties, none of these exchanges is likely to be without effect. This is what we expected, as the mutations were taken from HGMD and dbSNP, which list many disease causing mutations.
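One of the property changes discussed above, the change in side-chain charge, is easy to make explicit. The sketch below uses a deliberately coarse model that is an assumption of this example: Arg and Lys count as +1, Asp and Glu as -1, and every other residue (including His) as neutral. Size and hydrophobicity are not covered.

```python
# Coarse side-chain charge near neutral pH (assumption: His treated as neutral).
SIDE_CHAIN_CHARGE = {"Arg": 1, "Lys": 1, "Asp": -1, "Glu": -1}


def charge_change(reference, mutated):
    """Return the net change in side-chain charge caused by a substitution."""
    return SIDE_CHAIN_CHARGE.get(mutated, 0) - SIDE_CHAIN_CHARGE.get(reference, 0)


# The four most frequent exchanges named in the text.
for reference, mutated in [("Arg", "Trp"), ("Gly", "Arg"), ("Gly", "Ser"), ("Tyr", "Asn")]:
    print(f"{reference} => {mutated}: charge change {charge_change(reference, mutated):+d}")
```

Arg => Trp removes a positive charge and Gly => Arg adds one, while Gly => Ser and Tyr => Asn are neutral in this model. Their effect comes from size and polarity rather than charge.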
+=== References ===
+<references/>
+go back to [[Maple syrup urine disease]] Main page
+go back to Task 4 [[Homology_based_structure_predictions_BCKDHA| Homology based structure predictions]]
+go to Task 6 [[Sequence-based_mutation_analysis_BCKDHA | Sequence based mutation analysis]]