Difference between revisions of "Glucocerebrosidase mapping snps"

Revision as of 11:14, 17 August 2011

General

HGMD

The HGMD is the Human Gene Mutation Database, which contains germline mutations that are linked to human diseases. There are several types of mutations:

missense/nonsense: a codon change that results in a different amino acid (missense) or a premature stop codon (nonsense)
splicing: a mutation that affects mRNA splicing
regulatory: a mutation that affects the regulation of gene expression
small/gross deletions: mutation that deletes residues
small/gross insertions: mutation that inserts residues
small indels: a combined insertion and deletion at the same site
duplications: duplicated sequence pieces
complex rearrangements: part of the sequence is placed somewhere else
repeat variations: changes in the number of copies of a repeated sequence unit

dbSNP

dbSNP is the Single Nucleotide Polymorphism Database, run by the NCBI together with the National Human Genome Research Institute (NHGRI) and established in 1998. <ref>http://en.wikipedia.org/wiki/DbSNP</ref> It contains several types of mutations for 55 organisms, including Homo sapiens:

SNPs (single nucleotide polymorphisms)
MNPs (multinucleotide polymorphisms)
small deletions
small insertions
small indels
short tandem repeats (STRs)

HGMD: Mutations for GBA

Overview

To get the different mutation types for the GBA gene, in which mutations cause Gaucher disease, we searched HGMD for GBA. As a result, we got a list of the mutation types found for GBA:

mutation type	number of mutations
missense/nonsense	236
splicing	13
regulatory	0
small deletions	23
small insertions	13
small indels	2
gross deletions	3
gross insertions/duplications	0
complex rearrangements	13
repeat variations	0
public total (HGMD Professional 2011.1 total)	303 (353)

In this case, the missense/nonsense mutations are of interest, as they cause a change in the amino acid sequence. Such single point mutations seem to be responsible for Gaucher Disease, so the analysis is focused on them.

Missense/nonsense mutations given for GBA

The following table provides a detailed overview of the 236 missense/nonsense mutations found in GBA:

Codon change	Amino acid change	Codon number
AAG-AGG	Lys-Arg	-27
TGGg-TGA	Trp-Term	-4
cGGC-AGC	Gly-Ser	10
AGC-ATC	Ser-Ile	12
gGTG-ATG	Val-Met	15
gGTG-CTG	Val-Leu	15
TGT-TCT	Cys-Ser	16
tGAC-AAC	Asp-Asn	24
tGGT-AGT	Gly-Ser	35
cTTC-GTC	Phe-Val	37
tGAG-AAG	Glu-Lys	41
AGT-AAT	Ser-Asn	42
ACA-ATA	Thr-Ile	43
GGG-GAG	Gly-Glu	46
gCGA-TGA	Arg-Term	47
aCGG-TGG	Arg-Trp	48
CGG-CAG	Arg-Gln	48
CTA-CCA	Leu-Pro	66
aCAG-TAG	Gln-Term	73
gAAG-TAG	Lys-Term	74
GTG-GCG	Val-Ala	78
AAGg-AAC	Lys-Asn	79
ATG-ACG	Met-Thr	85
tGCT-ACT	Ala-Thr	90
CTT-CGT	Leu-Arg	105
TCG-TTG	Ser-Leu	107
cTTC-GTC	Phe-Val	109
tGAA-AAA	Glu-Lys	111
GGA-GAA	Gly-Glu	113
tAAC-GAC	Asn-Asp	117
ATC-ACC	Ile-Thr	119
ATC-AGC	Ile-Ser	119
cCGG-TGG	Arg-Trp	120
CGG-CAG	Arg-Gln	120
GTA-GCA	Val-Ala	121
aCCC-TCC	Pro-Ser	122
CCC-CTC	Pro-Leu	122
ATG-ACG	Met-Thr	123
cATG-GTG	Met-Val	123
GAC-GTC	Asp-Val	127
cCGC-TGC	Arg-Cys	131
CGC-CTC	Arg-Leu	131
ACC-ATC	Thr-Ile	134
cACC-CCC	Thr-Pro	134
TATg-TAG	Tyr-Term	135
GCA-GAA	Ala-Glu	136
tGAT-CAT	Asp-His	140
GAA-GCA	Glu-Ala	152
AAGa-AAT	Lys-Asn	157
cAAG-CAG	Lys-Gln	157
aCCC-ACC	Pro-Thr	159
CCC-CTC	Pro-Leu	159
ATT-AAT	Ile-Asn	161
ATT-AGT	Ile-Ser	161
CAC-CCC	His-Pro	162
cCGA-TGA	Arg-Term	163
cCAG-TAG	Gln-Term	169
CGT-CCT	Arg-Pro	170
gCGT-TGT	Arg-Cys	170
TCA-TGA	Ser-Term	173
aCTC-TTC	Leu-Phe	174
CTC-CCC	Leu-Pro	174
GCC-GAC	Ala-Asp	176
cCCC-TCC	Pro-Ser	178
TGG-TAG	Trp-Term	179
gACA-CCA	Thr-Pro	180
aCCC-ACC	Pro-Thr	182
CCC-CTC	Pro-Leu	182
tTGG-CGG	Trp-Arg	184
gCTC-TTC	Leu-Phe	185
AAT-AGT	Asn-Ser	188
AATg-AAG	Asn-Lys	188
GGA-GTA	Gly-Val	189
aGCG-ACG	Ala-Thr	190
GCG-GAG	Ala-Glu	190
GTG-GAG	Val-Glu	191
GTG-GGG	Val-Gly	191
GGG-GAG	Gly-Glu	195
gGGG-TGG	Gly-Trp	195
gTCA-CCA	Ser-Pro	196
aCTC-TTC	Leu-Phe	197
CTC-CCC	Leu-Pro	197
AAG-ACG	Lys-Thr	198
cAAG-GAG	Lys-Glu	198
cGGA-AGA	Gly-Arg	202
GGA-GAA	Gly-Glu	202
TAC-TGC	Tyr-Cys	205
cTGG-CGG	Trp-Arg	209
GCC-GTC	Ala-Val	210
aTAC-CAC	Tyr-His	212
cTTT-ATT	Phe-Ile	213
TTT-TGT	Phe-Cys	213
gTTC-GTC	Phe-Val	216
TTC-TAC	Phe-Tyr	216
TAT-TGT	Tyr-Cys	220
ACA-AGA	Thr-Arg	231
GAAa-GAC	Glu-Asp	233
tGAA-TAA	Glu-Term	233
tTCT-CCT	Ser-Pro	237
GGG-GTG	Gly-Val	239
GGA-GTA	Gly-Val	243
aTAC-CAC	Tyr-His	244
CCC-CAC	Pro-His	245
TTCa-TTA	Phe-Leu	251
CATc-CAG	His-Gln	255
CGA-CAA	Arg-Gln	257
gCGA-TGA	Arg-Term	257
TTCa-TTA	Phe-Leu	259
ATT-ACT	Ile-Thr	260
GGT-GAT	Gly-Asp	265
CCT-CGT	Pro-Arg	266
CCT-CTT	Pro-Leu	266
tCCT-GCT	Pro-Ala	266
AGT-AAT	Ser-Asn	271
CTC-CCC	Leu-Pro	279
aCGC-TGC	Arg-Cys	285
CGC-CAC	Arg-His	285
CCC-CTC	Pro-Leu	289
aCTG-TTG	Leu-Leu	296
AAA-ATA	Lys-Ile	303
TAT-TGT	Tyr-Cys	304
TATg-TAG	Tyr-Term	304
tGTT-CTT	Val-Leu	305
GCT-GTT	Ala-Val	309
CAT-CGT	His-Arg	311
TGGt-TGT	Trp-Cys	312
tTGG-CGG	Trp-Arg	312
gTAC-CAC	Tyr-His	313
gGAC-CAC	Asp-His	315
GCT-GAT	Ala-Asp	318
tCCA-GCA	Pro-Ala	319
ACC-ATC	Thr-Ile	323
CTA-CAA	Leu-Gln	324
CTA-CCA	Leu-Pro	324
aGGG-AGG	Gly-Arg	325
aGGG-TGG	Gly-Trp	325
gGAG-AAG	Glu-Lys	326
cCGC-TGC	Arg-Cys	329
TTC-TCC	Phe-Ser	331
CTC-CCC	Leu-Pro	336
gGCC-ACC	Ala-Thr	341
cTGT-CGT	Cys-Arg	342
cTGT-GGT	Cys-Gly	342
TGT-TAT	Cys-Tyr	342
cTGG-GGG	Trp-Gly	348
gGAG-AAG	Glu-Lys	349
gCAG-TAG	Gln-Term	350
tGTG-CTG	Val-Leu	352
gCGG-GGG	Arg-Gly	353
gCGG-TGG	Arg-Trp	353
GGC-GAC	Gly-Asp	355
TCC-TTC	Ser-Phe	356
TGG-TAG	Trp-Term	357
CGA-CAA	Arg-Gln	359
tCGA-TGA	Arg-Term	359
ATGc-ATA	Met-Ile	361
TAC-TGC	Tyr-Cys	363
AGC-AAC	Ser-Asn	364
AGC-ACC	Ser-Thr	364
cAGC-CGC	Ser-Arg	364
AGC-AAC	Ser-Asn	366
AGC-ACC	Ser-Thr	366
cAGC-GGC	Ser-Gly	366
ACG-ATG	Thr-Met	369
AAC-AGC	Asn-Ser	370
AACc-AAA	Asn-Lys	370
cCTC-GTC	Leu-Val	371
tGTG-TTG	Val-Leu	375
cGGC-AGC	Gly-Ser	377
cTGG-GGG	Trp-Gly	378
TGG-TAG	Trp-Term	378
cGAC-AAC	Asp-Asn	380
cGAC-CAC	Asp-His	380
GAC-GCC	Asp-Ala	380
TGG-TAG	Trp-Term	381
AACc-AAA	Asn-Lys	382
CTT-CGT	Leu-Arg	383
CTG-CCG	Leu-Pro	385
CCC-CTC	Pro-Leu	387
cGAA-TAA	Glu-Term	388
GGA-GAA	Gly-Glu	389
aGGA-AGA	Gly-Arg	390
CCC-CTC	Pro-Leu	391
AAT-ATT	Asn-Ile	392
TGG-TTG	Trp-Leu	393
tTGG-AGG	Trp-Arg	393
gGTG-TTG	Val-Leu	394
CGT-CCT	Arg-Pro	395
gCGT-TGT	Arg-Cys	395
AAC-ACC	Asn-Thr	396
TTT-TCT	Phe-Ser	397
tGTC-ATC	Val-Ile	398
tGTC-CTC	Val-Leu	398
tGTC-TTC	Val-Phe	398
cGAC-AAC	Asp-Asn	399
cGAC-TAC	Asp-Tyr	399
CCC-CTC	Pro-Leu	401
ATC-ACC	Ile-Thr	402
cATC-TTC	Ile-Phe	402
GAC-GGC	Asp-Gly	409
GAC-GTC	Asp-Val	409
gGAC-CAC	Asp-His	409
gTTT-ATT	Phe-Ile	411
tTAC-CAC	Tyr-His	412
cAAA-CAA	Lys-Gln	413
aCAG-TAG	Gln-Term	414
CAG-CGG	Gln-Arg	414
CCC-CGC	Pro-Arg	415
cATG-GTG	Met-Val	416
gTTC-GTC	Phe-Val	417
TAC-TGC	Tyr-Cys	418
GGC-GAC	Gly-Asp	421
cAAG-GAG	Lys-Glu	425
AGAg-AGT	Arg-Ser	433
gAGA-GGA	Arg-Gly	433
CTG-CCG	Leu-Pro	444
CTG-CGG	Leu-Arg	444
cGCA-CCA	Ala-Pro	446
CAT-CGT	His-Arg	451
tGCT-CCT	Ala-Pro	456
cGTG-ATG	Val-Met	460
CTA-CCA	Leu-Pro	461
AAC-AGC	Asn-Ser	462
AACc-AAG	Asn-Lys	462
cCGC-TGC	Arg-Cys	463
CGC-CAC	Arg-His	463
CGC-CCC	Arg-Pro	463
gGAT-TAT	Asp-Tyr	474
gGGC-AGC	Gly-Ser	478
CTG-CCG	Leu-Pro	480
cTCC-CCC	Ser-Pro	488
ATT-ACT	Ile-Thr	489
ACC-ATC	Thr-Ile	491
CGC-CAC	Arg-His	496
tCGC-TGC	Arg-Cys	496
CAG-CGG	Gln-Arg	497
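
The "Codon change" column can be read mechanically. The following Python sketch is not the Perl script used for this page. It assumes that lowercase letters in an entry such as tGAC-AAC are flanking bases that only show the reading frame, and that the three uppercase letters are the codon itself. The function name parse_codon_change is hypothetical. The codon table is the standard genetic code, with "*" for a stop codon.

```python
from itertools import product

# Standard genetic code, built in the usual TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def parse_codon_change(change):
    """Split an entry like 'tGAC-AAC' into codons, changed positions and residues.

    Assumption: lowercase letters are flanking context and are ignored.
    """
    wild_raw, mutant_raw = change.split("-")
    wild = "".join(ch for ch in wild_raw if ch.isupper())
    mutant = "".join(ch for ch in mutant_raw if ch.isupper())
    if len(wild) != 3 or len(mutant) != 3:
        raise ValueError(f"cannot find two codons in {change!r}")
    # 1-based codon positions at which the two codons differ
    changed = [i + 1 for i, (a, b) in enumerate(zip(wild, mutant)) if a != b]
    return wild, mutant, changed, CODON_TABLE[wild], CODON_TABLE[mutant]


if __name__ == "__main__":
    for entry in ("tGAC-AAC", "TGGg-TGA", "aCTG-TTG"):
        wild, mutant, changed, aa_from, aa_to = parse_codon_change(entry)
        kind = "synonymous" if aa_from == aa_to else "non-synonymous"
        print(entry, wild, mutant, changed, aa_from, aa_to, kind)
```

Translating both codons also gives a quick consistency check of the table: an entry whose two codons translate to the same residue, such as aCTG-TTG (Leu-Leu, codon 296), is flagged as synonymous.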

Sequence

To map the mutations onto the sequence, we used the sequence belonging to the given accession number, NM_001005741.1. This is exactly the sequence we used for our earlier interpretations. With the help of a Perl script we generated the following sequences and marked the given mutations.

Positions where mutations occur

>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ

Positions for possible missense/nonsense mutations are marked red.

Possible mutated amino acid residues

The following sequence shows the different possibilities for mutated residues. As there are different mutations for the same position, all changed residues are shown, each in a separate line.

>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPRPLSRVSIMAGSLTGLLLLQAVS!ASGARPCIPKSFSYISVMSVCNATYCNSFDPPTFPALSTVSRYKN
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVLCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES

TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
IRSE!WMELSMGPIQANHTGTGLPLTLQPE!!FQKANGFGGATTDAATLNILALSPPAQNLLRKLYVSKEEIGYDITWAST
TRSGRQMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNISQVLV
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM

ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSICTYI!EDTPHDFQLHNFSLPEADTKLNITLNP!ALQLA!PPV!FLDSS!PSTTRFKTSVTENGKEPFTGQPRDI
ASCDFSILTYPYADTPDDFQLHNFSLPEEDTKLQILLSHRALQLAQCPVSPLASPWTSLTWLKTKGEGNGKWSPEGQPEDI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI

YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTRVRHIVKVLDACAEHKLQFWAVRADNEPPAVLLSVHHFQCLGLTPEQQQDLTARDLDRTLANNTHHNVRLPMLDDQC
YHQTWARYCVKYLDAYAEHKLQFWAVTA!NEPSAGLLSGYPFQCLGFTPEHQ!DFIARDLGLTLANSTHHNVRLLMLDDQH
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGATLANSTHHNVRLLMLDDQR

LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLLHWAKVVLTDPEAAICLHGIVVRCHLHFLDAAKAIQRKTHCLSPNTMPFASETRVGSKFGK!SLGLDF!DQGIQCNHN
LLLPHWAKVVLTDPEAAK!VHGIAVHRYLDFLAPAKATPWETHRLFPNTMLFASEAGVGSKFWEQSVWLGSWD!GMQYTHT
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEAYVGSKFWEQSVRLGSWDRGMQYRHG

IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMSVLYHLVSGTN!KRAPNL!ERLILLPTSINSLTIVDITKGTIHQ!RVVCHLDHFSEFIPEGSQSVGLVASQKNDPDPV
IITKLLYHVVG!THWNLALNPEGGPNRVCNFLYSPFIVDITKVTFYKRPMFYHLGHFSKFIPEGSQGVGLVASQKNDRDAV
IITNLLYHVVGWTAWNLALNPEGGPNWVRNFFDSPIIVDITKHTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV

ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSPVVVMPSCSSKDVPLTIKYPAVSFPETISPGYPTHIYLWRHR
ALMHPDGSAVVVVLKHSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRCQ
ALMHPDGSAVVVVLNPSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ

For each segment of the sequence, the first row shows the original residues and the second, third and fourth rows show mutated residues. An exclamation mark (!) stands for a premature stop codon. Positions for possible missense/nonsense mutations are marked red.
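
The stacked rows can be produced with a few lines of code. The following Python sketch is only an illustration of the idea, not the Perl script we used. The function name mutated_rows and the layout of the variants dictionary are assumptions. Positions are 1-based and refer to the precursor sequence, and "!" is used for a stop codon as in the rows shown here.

```python
def mutated_rows(segment, start, variants):
    """Return one copy of `segment` per alternative residue.

    segment:  original residues of one sequence line
    start:    1-based position of the first residue of `segment`
    variants: dict mapping a 1-based position to a list of replacement residues
    """
    depth = max((len(alts) for alts in variants.values()), default=0)
    rows = [list(segment) for _ in range(depth)]
    for position, alts in variants.items():
        index = position - start
        if 0 <= index < len(segment):
            # the first alternative goes into the first row, the second into the next, ...
            for row, alt in zip(rows, alts):
                row[index] = alt
    return ["".join(row) for row in rows]


if __name__ == "__main__":
    # First 40 residues of the precursor; Lys13 -> Arg and Trp36 -> stop
    # correspond to HGMD codons -27 and -4.
    segment = "MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGA"
    for row in mutated_rows(segment, 1, {13: ["R"], 36: ["!"]}):
        print(row)
```

Rows that have no alternative at a position simply repeat the original residue, which is why the lower rows often look identical to the first one.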

dbSNP: Mutations for GBA

dbSNP was searched for synonymous mutations as well as for missense mutations. Synonymous mutations have no influence on the resulting amino acid, which means that the residue remains the same after the mutation. The output of dbSNP was also parsed with a Perl script, for which we used the FlatFile output and the gene map. The positions were the same as in our reference sequence, so we could use them directly.

Synonymous Mutations in GBA

In the following table the synonymous mutations for GBA are listed:

ID	mutated allele	amino acid	codon position	amino acid position
rs78297361	T	R	3	535
rs77130994	A	G	3	517
rs1135675	C	V	3	499
rs12747811	A	Q	3	471
rs79226895	A	K	3	464
rs78346899	T	Y	3	451
rs75034092	A	G	3	416
rs74498117	G	L	3	410
rs1141826	A	T	3	408
rs75391747	A	E	3	388
rs80317710	A	E	3	365
rs79311125	T	Y	3	352
rs1064647	T	G	3	346
rs1064646	G	K	3	342
rs74486098	A	K	3	237
rs76158190	C	S	3	235
rs75370695	A	A	3	229
rs76682322	T	L	3	224
rs76727497	A	P	3	221
rs76717906	T	P	3	217
rs78659905	T	L	3	213
rs77916306	A	P	3	198
rs77191198	A	T	3	173
rs74572011	T	R	3	170
rs79767521	T	P	3	161
rs75249684	C	R	3	159
rs79175920	A	A	3	129
rs1141821	C	T	3	100
rs1141816	A	G	3	93
rs78669556	C	R	3	87
rs1141810	C	S	3	81
rs76337315	A	E	3	80
rs1141807	C	Y	3	79

Sequence

>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ

Positions for possible synonymous mutations are marked blue.

Missense Mutations in GBA

In the following table the missense mutations are listed:

ID	mutated allele	amino acid	codon position	amino acid position
rs75822236	A	H	2	535
rs78016673	T	I	2	530
rs77409925	G	E	3	513
rs113825752	C	P	2	509
rs76071730	G	R	2	490
rs74752878	G	C	2	457
rs79185870	G	L	3	456
rs80020805	A	I	3	455
rs77035024	A	L	3	450
rs78802049	G	E	3	448
rs75564605	C	T	2	441
rs75090908	G	E	3	438
rs75243000	C	S	2	436
rs75385858	C	T	2	435
rs77738682	T	I	2	431
rs76910485	T	L	2	430
rs78715199	A	E	3	419
rs77284004	C	A	2	419
rs76014919	T	C	3	417
rs2230289	T	M	2	408
rs75528494	A	R	3	405
rs76228122	G	C	2	402
rs74979486	A	Q	2	398
rs11558184	A	Q	2	392
rs1064648	A	H	2	368
rs78188205	A	D	2	357
rs77321207	G	C	2	343
rs77714449	T	I	2	342
rs79696831	A	H	2	324
rs74731340	A	N	2	310
rs79215220	G	R	2	305
rs80116658	A	D	2	304
rs76725886	G	R	2	270
rs79945741	A	L	3	252
rs76026102	G	C	2	244
rs77451368	A	E	2	241
rs74462743	A	E	2	234
rs75636769	A	E	2	229
rs78911246	T	V	2	228
rs80205046	T	L	2	221
rs76500263	C	P	2	201
rs80222298	T	L	2	198
rs78446355	C	N	3	196
rs79660787	A	E	2	175
rs78657146	T	I	2	173
rs75690705	T	L	2	170
rs79796061	T	V	2	166
rs77959976	A	I	3	162
rs79637617	T	L	2	161
rs77834747	G	S	2	158
rs77019233	A	K	3	156
rs1141820	G	R	2	99
rs1141818	T	Y	1	99
rs78769774	A	Q	2	87
rs1141812	A	S	1	83
rs1141808	A	K	1	80
rs75954905	G	L	3	76
rs74953658	A	E	3	63

Sequence

>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCESFDPPTFPALGTLSRYKS

TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
TSSGRQMELSMGPIQANYTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYKISRVLI

ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSILTYIYEDTPDDFQLHNFSLPEEDTKLNILLIPRALQLAQRPVSLLASPWTSLTWLKTNVEVNGKESLKGQPEDI

YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTWARYLVKFLDAYAEHKLQFWAVRAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLDRTLANNTHHNVRLLMLDDQH

LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLPHWAKVVLTDPEAAICVHGIAVHWYLDFLDPAKATLGETHHLFPNTMLFASEACVGSKFWEQSVQLGSWDQGMQCSHR

IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMNLLYHVVGCTAWNLALNPEGGLIWVRTSVESPTIVDITKETLYKQPILCHLGHFSKFIPEGSQRVGLVASQKNDLDAV

ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSAVVVVLNRSSKDVPPTIKEPAVGFLETISPGYSIHIYLWRHQ

Positions for possible missense mutations are marked red.

Sequence with synonymous and missense mutations

The following sequence shows the synonymous and the missense mutations found with dbSNP. For each segment of the sequence, the first line shows the original residues, the second line the missense mutations and the third line the synonymous ones.

>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCESFDPPTFPALGTLSRYKS
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES

TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
TSSGRQMELSMGPIQANYTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYKISRVLI
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM

ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSILTYIYEDTPDDFQLHNFSLPEEDTKLNILLIPRALQLAQRPVSLLASPWTSLTWLKTNVEVNGKESLKGQPEDI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI

YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTWARYLVKFLDAYAEHKLQFWAVRAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLDRTLANNTHHNVRLLMLDDQH
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR

LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLPHWAKVVLTDPEAAICVHGIAVHWYLDFLDPAKATLGETHHLFPNTMLFASEACVGSKFWEQSVQLGSWDQGMQCSHR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS

IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMNLLYHVVGCTAWNLALNPEGGLIWVRTSVESPTIVDITKETLYKQPILCHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV

ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSAVVVVLNRSSKDVPPTIKEPAVGFLETISPGYSIHIYLWRHQ
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ

Positions for possible synonymous mutations are marked blue, positions for possible missense mutations are marked red.

Mutation map

To create the mutation map with all missense and synonymous mutations listed in dbSNP and HGMD, we mapped the corresponding sequence positions onto each other. This was quite simple, as both databases use the same sequence and only the numbering differs slightly.

>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ

Positions for possible missense mutations are marked:

red, if the mutation is only listed in HGMD
blue, if the mutation is only listed in dbSNP
green, if the mutation is listed in dbSNP and HGMD
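
The colour scheme above is a plain set comparison. A minimal Python sketch, assuming both position lists have already been converted to the same numbering. The function name classify_positions and the example positions are hypothetical and not taken from our data.

```python
def classify_positions(hgmd_positions, dbsnp_positions):
    """Assign a colour to every position that is mutated in at least one database."""
    hgmd = set(hgmd_positions)
    dbsnp = set(dbsnp_positions)
    colours = {}
    for position in sorted(hgmd | dbsnp):
        if position in hgmd and position in dbsnp:
            colours[position] = "green"   # listed in both databases
        elif position in hgmd:
            colours[position] = "red"     # listed only in HGMD
        else:
            colours[position] = "blue"    # listed only in dbSNP
    return colours


if __name__ == "__main__":
    # Hypothetical example positions, for illustration only.
    print(classify_positions([13, 36, 49], [49, 63]))
```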

Underlined residues represent the active site and the residues forming hydrogen bonds with the active site. <ref>Kim et al., Crystal Structure of the Salmonella enterica Serovar Typhimurium Virulence Factor SrfJ, a Glycoside Hydrolase Family Enzyme. Journal of Bacteriology, 2009, p. 6550-6554, Vol. 191, No. 21 </ref>

The mutation map shows that no mutations of the active-site residues Glu235 and Glu340 are known, whereas two of the residues forming hydrogen bonds with the active site (Arg120, Asp282, His311) are listed in the mutation databases. [Note that the positions of the residues of interest are given for the mature protein, which does not contain the signal peptide. The mutation map contains the 39-residue signal peptide.]
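
The shift between the two numberings can be written down explicitly. The following Python sketch assumes that positive numbers count from the first residue of the mature protein, that negative numbers count backwards from the last residue of the signal peptide (-1), and that there is no position 0. This assumption agrees with the HGMD entries at codons -27 (Lys) and -4 (Trp), which then fall on Lys13 and Trp36 of the precursor sequence. The function name mature_to_precursor is hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 39  # residues, as stated for the mutation map


def mature_to_precursor(codon_number, signal_length=SIGNAL_PEPTIDE_LENGTH):
    """Convert mature-protein numbering to a 1-based precursor position.

    Assumption: -1 is the last signal peptide residue and 0 is not used.
    """
    if codon_number == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if codon_number > 0:
        return codon_number + signal_length
    return codon_number + signal_length + 1


if __name__ == "__main__":
    for number in (-27, -4, 10, 235, 340):
        print(number, "->", mature_to_precursor(number))
```

With this shift, the active-site residues Glu235 and Glu340 correspond to positions 274 and 379 of the sequence shown in the mutation map.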

Statistical analyses

We now analyse our results statistically to see whether some amino acids mutate more often than others, which amino acids they are replaced by, and so on.

Synonymous mutations

Figure 1: Diagram that shows how often which amino acid was mutated for synonymous mutations

First we look at the synonymous mutations. In a synonymous mutation the codon changes but the amino acid does not, because the mutated codon codes for the same amino acid. In our case it was always the third codon position that mutated. We expected this, because for many amino acids the codons differ only in the third position.

The amino acids with the most synonymous mutations are arginine, glycine and proline (four each), as figure 1 shows. According to the codon table <ref>http://en.wikipedia.org/wiki/Genetic_code</ref>, arginine has six codons, glycine four and proline four. Glutamic acid, leucine, lysine, threonine and tyrosine each have three synonymous mutations and are encoded by two, six, two, four and two codons, respectively. The amino acids with the most synonymous mutations thus also have many codons. For the amino acids with three mutations the relation is less obvious than for those with four.

Asparagine (two codons), aspartic acid (two codons), cysteine (two codons), histidine (two codons), isoleucine (three codons), methionine (one codon), phenylalanine (two codons) and tryptophan (one codon) have no synonymous mutations. As methionine and tryptophan have only one codon each, a synonymous mutation is not possible for them. All the others have at most three codons, so only one or two synonymous changes are possible per codon.

The diagram shows what we expected: amino acids with more codons should also have more synonymous mutations, and this is the case here. The relation is less clear if we look, for example, at valine (four codons) and glutamic acid (two codons). There might be a mutation bias, or the number of mutations may simply be too small.
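
The codon counts used in this comparison can be derived from the standard genetic code instead of being read off the codon table by hand. The following Python sketch does this and prints them next to the number of synonymous mutations per amino acid, counted from the dbSNP table of synonymous mutations in GBA. The variable names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of codons per amino acid (one-letter code).
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())

# Synonymous mutations per amino acid, counted from the dbSNP table.
OBSERVED_SYNONYMOUS = Counter({"R": 4, "G": 4, "P": 4, "E": 3, "L": 3, "K": 3,
                               "T": 3, "Y": 3, "A": 2, "S": 2, "Q": 1, "V": 1})

if __name__ == "__main__":
    for residue in sorted(CODONS_PER_RESIDUE):
        if residue == "*":
            continue  # skip stop codons
        print(residue, CODONS_PER_RESIDUE[residue], OBSERVED_SYNONYMOUS[residue])
```

A Counter returns 0 for amino acids without an entry, so residues without synonymous mutations are printed with a count of zero.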

Non-synonymous mutations

After analysing the synonymous mutations, we turn to the non-synonymous mutations.

Figure 2: Diagram that shows which position in the codon was mutated in non-synonymous mutations

Figure 2 shows how often each codon position was mutated. The first and the second position show almost the same frequency, whereas the third position is mutated in fewer than twenty cases. The reason is that a mutation at the third codon position is much more likely to be synonymous than one at the first or second position.

Figure 3: Diagram that shows which amino acids were mutated in non-synonymous mutations

Figure 3 shows which amino acids were mutated in non-synonymous mutations and how often. Arginine, glycine, leucine and proline are mutated most frequently, and cysteine, methionine and histidine least often. The reason could be a mutation bias towards cytosine and guanine, which occur more frequently in the codons of the most often mutated amino acids.

Figure 4: Diagram that shows which amino acids occurred because of non-synonymous mutations

Figure 4 shows how often each amino acid arose through mutation. Arginine, leucine, proline and also the termination codon occurred most often. Methionine, tyrosine and tryptophan were rarely the result of a mutation. The reason could be that the amino acids that occurred most often have more possible codons, whereas methionine, for example, has only one, so the probability of mutating to them is higher. Interestingly, the termination codon also occurred very often as a result of mutation.

Figure 5: Heatmap that shows how often one amino acid was mutated to another

Figure 5 shows a heatmap in which amino acid replacements with high frequency are marked red and replacements that did not occur are white. The replacements that occurred most often are:

Asn -> Lys (4 times)
Gln -> Term (4 times)
Glu -> Lys (4 times)
Ile -> Thr (4 times)
Leu -> Pro (10 times)
Phe -> Val (4 times)
Pro -> Leu (8 times)
Thr -> Ile (4 times)
Trp -> Term (5 times)
Val -> Leu (6 times)
Tyr -> Cys (5 times)

The exchanges leucine to proline and proline to leucine occur very often. There seems to be a bias towards this mutation. One reason could be that the codons are very similar: four of the six leucine codons (CTN) differ from the proline codons (CCN) only at the second position.
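
Replacement frequencies of this kind can be tallied with a counter over (original, mutated) pairs. The following Python sketch shows the idea on a small hand-made sample. The sample pairs and the function names are hypothetical and do not reproduce the data behind figure 5.

```python
from collections import Counter


def substitution_counts(pairs):
    """Count how often each (original, mutated) amino acid pair occurs."""
    return Counter(pairs)


def heatmap_matrix(counts):
    """Return the sorted residue names and a square matrix of counts.

    Rows are original residues, columns are mutated residues.
    """
    residues = sorted({residue for pair in counts for residue in pair})
    matrix = [[counts.get((source, target), 0) for target in residues]
              for source in residues]
    return residues, matrix


if __name__ == "__main__":
    # Hypothetical sample in the style of the "Amino acid change" column.
    sample = [("Leu", "Pro"), ("Pro", "Leu"), ("Leu", "Pro"),
              ("Val", "Leu"), ("Trp", "Term"), ("Leu", "Pro")]
    counts = substitution_counts(sample)
    for (source, target), number in counts.most_common():
        print(f"{source} -> {target} ({number} times)")
    residues, matrix = heatmap_matrix(counts)
    print(residues)
    for source, row in zip(residues, matrix):
        print(source, row)
```

Cells with a count of zero correspond to the white fields of the heatmap.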

References
