Glucocerebrosidase mapping snps
Revision as of 08:22, 18 August 2011
General
HGMD
The HGMD is the Human Gene Mutation Database, which contains germline mutations that are linked to human diseases. There are several types of mutations:
- missense/nonsense: codon codes for a different amino acid/premature stop codon
- splicing: mutation that affects the splicing of the transcript
- regulatory: mutation affecting the regulation of gene expression
- small/gross deletions: mutation that deletes residues
- small/gross insertions: mutation that inserts residues
- small indels: combined insertion and deletion at the same site
- duplications: duplicated sequence pieces
- complex rearrangements: part of the sequence is placed somewhere else
- repeat variations: change in the number of copies of a repeated sequence
dbSNP
dbSNP is the Single Nucleotide Polymorphism Database maintained by the NCBI together with the National Human Genome Research Institute (NHGRI); it was established in 1998. <ref>http://en.wikipedia.org/wiki/DbSNP</ref> It contains several types of variation for 55 organisms, including Homo sapiens:
- SNPs (single nucleotide polymorphisms)
- MNPs (multinucleotide polymorphisms)
- small deletions
- small insertions
- small indels
- short tandem repeats (STRs)
HGMD: Mutations for GBA
Overview
To get the different mutation types for the GBA gene, in which mutations cause Gaucher disease, we searched HGMD for GBA. As a result, we got a list of the mutation types found for GBA:
mutation type | number of mutations |
---|---|
missense/nonsense | 236 |
splicing | 13 |
regulatory | 0 |
small deletions | 23 |
small insertions | 13 |
small indels | 2 |
gross deletions | 3 |
gross insertions/duplications | 0 |
complex rearrangements | 13 |
repeat variations | 0 |
public total (HGMD Professional 2011.1 total) | 303 (353) |
In this case, the missense/nonsense mutations are of interest, as they cause a change in the amino acid sequence. Such single point mutations seem to be responsible for Gaucher Disease, so the analysis is focused on them.
Missense/nonsense mutations given for GBA
The following table provides a detailed overview of the 236 missense/nonsense mutations found in GBA:
Codon change | Amino acid change | Codon number |
---|---|---|
AAG-AGG | Lys-Arg | -27 |
TGGg-TGA | Trp-Term | -4 |
cGGC-AGC | Gly-Ser | 10 |
AGC-ATC | Ser-Ile | 12 |
gGTG-ATG | Val-Met | 15 |
gGTG-CTG | Val-Leu | 15 |
TGT-TCT | Cys-Ser | 16 |
tGAC-AAC | Asp-Asn | 24 |
tGGT-AGT | Gly-Ser | 35 |
cTTC-GTC | Phe-Val | 37 |
tGAG-AAG | Glu-Lys | 41 |
AGT-AAT | Ser-Asn | 42 |
ACA-ATA | Thr-Ile | 43 |
GGG-GAG | Gly-Glu | 46 |
gCGA-TGA | Arg-Term | 47 |
aCGG-TGG | Arg-Trp | 48 |
CGG-CAG | Arg-Gln | 48 |
CTA-CCA | Leu-Pro | 66 |
aCAG-TAG | Gln-Term | 73 |
gAAG-TAG | Lys-Term | 74 |
GTG-GCG | Val-Ala | 78 |
AAGg-AAC | Lys-Asn | 79 |
ATG-ACG | Met-Thr | 85 |
tGCT-ACT | Ala-Thr | 90 |
CTT-CGT | Leu-Arg | 105 |
TCG-TTG | Ser-Leu | 107 |
cTTC-GTC | Phe-Val | 109 |
tGAA-AAA | Glu-Lys | 111 |
GGA-GAA | Gly-Glu | 113 |
tAAC-GAC | Asn-Asp | 117 |
ATC-ACC | Ile-Thr | 119 |
ATC-AGC | Ile-Ser | 119 |
cCGG-TGG | Arg-Trp | 120 |
CGG-CAG | Arg-Gln | 120 |
GTA-GCA | Val-Ala | 121 |
aCCC-TCC | Pro-Ser | 122 |
CCC-CTC | Pro-Leu | 122 |
ATG-ACG | Met-Thr | 123 |
cATG-GTG | Met-Val | 123 |
GAC-GTC | Asp-Val | 127 |
cCGC-TGC | Arg-Cys | 131 |
CGC-CTC | Arg-Leu | 131 |
ACC-ATC | Thr-Ile | 134 |
cACC-CCC | Thr-Pro | 134 |
TATg-TAG | Tyr-Term | 135 |
GCA-GAA | Ala-Glu | 136 |
tGAT-CAT | Asp-His | 140 |
GAA-GCA | Glu-Ala | 152 |
AAGa-AAT | Lys-Asn | 157 |
cAAG-CAG | Lys-Gln | 157 |
aCCC-ACC | Pro-Thr | 159 |
CCC-CTC | Pro-Leu | 159 |
ATT-AAT | Ile-Asn | 161 |
ATT-AGT | Ile-Ser | 161 |
CAC-CCC | His-Pro | 162 |
cCGA-TGA | Arg-Term | 163 |
cCAG-TAG | Gln-Term | 169 |
CGT-CCT | Arg-Pro | 170 |
gCGT-TGT | Arg-Cys | 170 |
TCA-TGA | Ser-Term | 173 |
aCTC-TTC | Leu-Phe | 174 |
CTC-CCC | Leu-Pro | 174 |
GCC-GAC | Ala-Asp | 176 |
cCCC-TCC | Pro-Ser | 178 |
TGG-TAG | Trp-Term | 179 |
gACA-CCA | Thr-Pro | 180 |
aCCC-ACC | Pro-Thr | 182 |
CCC-CTC | Pro-Leu | 182 |
tTGG-CGG | Trp-Arg | 184 |
gCTC-TTC | Leu-Phe | 185 |
AAT-AGT | Asn-Ser | 188 |
AATg-AAG | Asn-Lys | 188 |
GGA-GTA | Gly-Val | 189 |
aGCG-ACG | Ala-Thr | 190 |
GCG-GAG | Ala-Glu | 190 |
GTG-GAG | Val-Glu | 191 |
GTG-GGG | Val-Gly | 191 |
GGG-GAG | Gly-Glu | 195 |
gGGG-TGG | Gly-Trp | 195 |
gTCA-CCA | Ser-Pro | 196 |
aCTC-TTC | Leu-Phe | 197 |
CTC-CCC | Leu-Pro | 197 |
AAG-ACG | Lys-Thr | 198 |
cAAG-GAG | Lys-Glu | 198 |
cGGA-AGA | Gly-Arg | 202 |
GGA-GAA | Gly-Glu | 202 |
TAC-TGC | Tyr-Cys | 205 |
cTGG-CGG | Trp-Arg | 209 |
GCC-GTC | Ala-Val | 210 |
aTAC-CAC | Tyr-His | 212 |
cTTT-ATT | Phe-Ile | 213 |
TTT-TGT | Phe-Cys | 213 |
gTTC-GTC | Phe-Val | 216 |
TTC-TAC | Phe-Tyr | 216 |
TAT-TGT | Tyr-Cys | 220 |
ACA-AGA | Thr-Arg | 231 |
GAAa-GAC | Glu-Asp | 233 |
tGAA-TAA | Glu-Term | 233 |
tTCT-CCT | Ser-Pro | 237 |
GGG-GTG | Gly-Val | 239 |
GGA-GTA | Gly-Val | 243 |
aTAC-CAC | Tyr-His | 244 |
CCC-CAC | Pro-His | 245 |
TTCa-TTA | Phe-Leu | 251 |
CATc-CAG | His-Gln | 255 |
CGA-CAA | Arg-Gln | 257 |
gCGA-TGA | Arg-Term | 257 |
TTCa-TTA | Phe-Leu | 259 |
ATT-ACT | Ile-Thr | 260 |
GGT-GAT | Gly-Asp | 265 |
CCT-CGT | Pro-Arg | 266 |
CCT-CTT | Pro-Leu | 266 |
tCCT-GCT | Pro-Ala | 266 |
AGT-AAT | Ser-Asn | 271 |
CTC-CCC | Leu-Pro | 279 |
aCGC-TGC | Arg-Cys | 285 |
CGC-CAC | Arg-His | 285 |
CCC-CTC | Pro-Leu | 289 |
aCTG-TTG | Leu-Leu | 296 |
AAA-ATA | Lys-Ile | 303 |
TAT-TGT | Tyr-Cys | 304 |
TATg-TAG | Tyr-Term | 304 |
tGTT-CTT | Val-Leu | 305 |
GCT-GTT | Ala-Val | 309 |
CAT-CGT | His-Arg | 311 |
TGGt-TGT | Trp-Cys | 312 |
tTGG-CGG | Trp-Arg | 312 |
gTAC-CAC | Tyr-His | 313 |
gGAC-CAC | Asp-His | 315 |
GCT-GAT | Ala-Asp | 318 |
tCCA-GCA | Pro-Ala | 319 |
ACC-ATC | Thr-Ile | 323 |
CTA-CAA | Leu-Gln | 324 |
CTA-CCA | Leu-Pro | 324 |
aGGG-AGG | Gly-Arg | 325 |
aGGG-TGG | Gly-Trp | 325 |
gGAG-AAG | Glu-Lys | 326 |
cCGC-TGC | Arg-Cys | 329 |
TTC-TCC | Phe-Ser | 331 |
CTC-CCC | Leu-Pro | 336 |
gGCC-ACC | Ala-Thr | 341 |
cTGT-CGT | Cys-Arg | 342 |
cTGT-GGT | Cys-Gly | 342 |
TGT-TAT | Cys-Tyr | 342 |
cTGG-GGG | Trp-Gly | 348 |
gGAG-AAG | Glu-Lys | 349 |
gCAG-TAG | Gln-Term | 350 |
tGTG-CTG | Val-Leu | 352 |
gCGG-GGG | Arg-Gly | 353 |
gCGG-TGG | Arg-Trp | 353 |
GGC-GAC | Gly-Asp | 355 |
TCC-TTC | Ser-Phe | 356 |
TGG-TAG | Trp-Term | 357 |
CGA-CAA | Arg-Gln | 359 |
tCGA-TGA | Arg-Term | 359 |
ATGc-ATA | Met-Ile | 361 |
TAC-TGC | Tyr-Cys | 363 |
AGC-AAC | Ser-Asn | 364 |
AGC-ACC | Ser-Thr | 364 |
cAGC-CGC | Ser-Arg | 364 |
AGC-AAC | Ser-Asn | 366 |
AGC-ACC | Ser-Thr | 366 |
cAGC-GGC | Ser-Gly | 366 |
ACG-ATG | Thr-Met | 369 |
AAC-AGC | Asn-Ser | 370 |
AACc-AAA | Asn-Lys | 370 |
cCTC-GTC | Leu-Val | 371 |
tGTG-TTG | Val-Leu | 375 |
cGGC-AGC | Gly-Ser | 377 |
cTGG-GGG | Trp-Gly | 378 |
TGG-TAG | Trp-Term | 378 |
cGAC-AAC | Asp-Asn | 380 |
cGAC-CAC | Asp-His | 380 |
GAC-GCC | Asp-Ala | 380 |
TGG-TAG | Trp-Term | 381 |
AACc-AAA | Asn-Lys | 382 |
CTT-CGT | Leu-Arg | 383 |
CTG-CCG | Leu-Pro | 385 |
CCC-CTC | Pro-Leu | 387 |
cGAA-TAA | Glu-Term | 388 |
GGA-GAA | Gly-Glu | 389 |
aGGA-AGA | Gly-Arg | 390 |
CCC-CTC | Pro-Leu | 391 |
AAT-ATT | Asn-Ile | 392 |
TGG-TTG | Trp-Leu | 393 |
tTGG-AGG | Trp-Arg | 393 |
gGTG-TTG | Val-Leu | 394 |
CGT-CCT | Arg-Pro | 395 |
gCGT-TGT | Arg-Cys | 395 |
AAC-ACC | Asn-Thr | 396 |
TTT-TCT | Phe-Ser | 397 |
tGTC-ATC | Val-Ile | 398 |
tGTC-CTC | Val-Leu | 398 |
tGTC-TTC | Val-Phe | 398 |
cGAC-AAC | Asp-Asn | 399 |
cGAC-TAC | Asp-Tyr | 399 |
CCC-CTC | Pro-Leu | 401 |
ATC-ACC | Ile-Thr | 402 |
cATC-TTC | Ile-Phe | 402 |
GAC-GGC | Asp-Gly | 409 |
GAC-GTC | Asp-Val | 409 |
gGAC-CAC | Asp-His | 409 |
gTTT-ATT | Phe-Ile | 411 |
tTAC-CAC | Tyr-His | 412 |
cAAA-CAA | Lys-Gln | 413 |
aCAG-TAG | Gln-Term | 414 |
CAG-CGG | Gln-Arg | 414 |
CCC-CGC | Pro-Arg | 415 |
cATG-GTG | Met-Val | 416 |
gTTC-GTC | Phe-Val | 417 |
TAC-TGC | Tyr-Cys | 418 |
GGC-GAC | Gly-Asp | 421 |
cAAG-GAG | Lys-Glu | 425 |
AGAg-AGT | Arg-Ser | 433 |
gAGA-GGA | Arg-Gly | 433 |
CTG-CCG | Leu-Pro | 444 |
CTG-CGG | Leu-Arg | 444 |
cGCA-CCA | Ala-Pro | 446 |
CAT-CGT | His-Arg | 451 |
tGCT-CCT | Ala-Pro | 456 |
cGTG-ATG | Val-Met | 460 |
CTA-CCA | Leu-Pro | 461 |
AAC-AGC | Asn-Ser | 462 |
AACc-AAG | Asn-Lys | 462 |
cCGC-TGC | Arg-Cys | 463 |
CGC-CAC | Arg-His | 463 |
CGC-CCC | Arg-Pro | 463 |
gGAT-TAT | Asp-Tyr | 474 |
gGGC-AGC | Gly-Ser | 478 |
CTG-CCG | Leu-Pro | 480 |
cTCC-CCC | Ser-Pro | 488 |
ATT-ACT | Ile-Thr | 489 |
ACC-ATC | Thr-Ile | 491 |
CGC-CAC | Arg-His | 496 |
tCGC-TGC | Arg-Cys | 496 |
CAG-CGG | Gln-Arg | 497 |
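The codon-change column can be read mechanically. The following is a minimal sketch, assuming that upper-case letters are the three bases of the codon and that lower-case letters are flanking bases shown only for context. The function name `parse_codon_change` is hypothetical and is not taken from the Perl script used for this page.

```python
import re


def parse_codon_change(change: str):
    """Split a codon change such as 'gGTG-ATG' into its parts.

    Assumption: upper-case letters form the codon, lower-case letters
    are flanking context and are dropped.
    Returns (wild_codon, mutant_codon, position_in_codon, ref_base, alt_base),
    where position_in_codon is 1, 2 or 3.
    """
    wild_raw, mutant_raw = change.split("-")
    wild = re.sub(r"[a-z]", "", wild_raw)
    mutant = re.sub(r"[a-z]", "", mutant_raw)
    if len(wild) != 3 or len(mutant) != 3:
        raise ValueError(f"unexpected codon format: {change!r}")
    # Collect the codon positions at which the two codons differ.
    diffs = [i for i in range(3) if wild[i] != mutant[i]]
    if len(diffs) != 1:
        raise ValueError(f"expected exactly one changed base: {change!r}")
    i = diffs[0]
    return wild, mutant, i + 1, wild[i], mutant[i]


if __name__ == "__main__":
    for row in ["gGTG-ATG", "TGGg-TGA", "AGC-ATC"]:
        print(row, parse_codon_change(row))
```

The sketch rejects rows with more than one changed base rather than guessing, since the table is described as a list of single point mutations.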
Sequence
To map the mutations to the sequence, we used the sequence with the accession number NM_001005741.1, which is the same sequence we used for our earlier interpretations. With the help of a Perl script we generated the following sequences and marked the given mutations.
Positions where mutations occur
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible missense/nonsense mutations are marked red.
Possible mutated amino acid residues
The following sequence shows the different possibilities for mutated residues. As there are different mutations for the same position, all changed residues are shown, each in a separate line.
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPRPLSRVSIMAGSLTGLLLLQAVS!ASGARPCIPKSFSYISVMSVCNATYCNSFDPPTFPALSTVSRYKN
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVLCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
IRSE!WMELSMGPIQANHTGTGLPLTLQPE!!FQKANGFGGATTDAATLNILALSPPAQNLLRKLYVSKEEIGYDITWAST
TRSGRQMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNISQVLV
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSICTYI!EDTPHDFQLHNFSLPEADTKLNITLNP!ALQLA!PPV!FLDSS!PSTTRFKTSVTENGKEPFTGQPRDI
ASCDFSILTYPYADTPDDFQLHNFSLPEEDTKLQILLSHRALQLAQCPVSPLASPWTSLTWLKTKGEGNGKWSPEGQPEDI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTRVRHIVKVLDACAEHKLQFWAVRADNEPPAVLLSVHHFQCLGLTPEQQQDLTARDLDRTLANNTHHNVRLPMLDDQC
YHQTWARYCVKYLDAYAEHKLQFWAVTA!NEPSAGLLSGYPFQCLGFTPEHQ!DFIARDLGLTLANSTHHNVRLLMLDDQH
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGATLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLLHWAKVVLTDPEAAICLHGIVVRCHLHFLDAAKAIQRKTHCLSPNTMPFASETRVGSKFGK!SLGLDF!DQGIQCNHN
LLLPHWAKVVLTDPEAAK!VHGIAVHRYLDFLAPAKATPWETHRLFPNTMLFASEAGVGSKFWEQSVWLGSWD!GMQYTHT
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEAYVGSKFWEQSVRLGSWDRGMQYRHG
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMSVLYHLVSGTN!KRAPNL!ERLILLPTSINSLTIVDITKGTIHQ!RVVCHLDHFSEFIPEGSQSVGLVASQKNDPDPV
IITKLLYHVVG!THWNLALNPEGGPNRVCNFLYSPFIVDITKVTFYKRPMFYHLGHFSKFIPEGSQGVGLVASQKNDRDAV
IITNLLYHVVGWTAWNLALNPEGGPNWVRNFFDSPIIVDITKHTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSPVVVMPSCSSKDVPLTIKYPAVSFPETISPGYPTHIYLWRHR
ALMHPDGSAVVVVLKHSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRCQ
ALMHPDGSAVVVVLNPSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
The first row shows the original sequence, the second, third and fourth line show mutated residues. Positions for possible missense/nonsense mutations are marked red.
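The stacked layout can be produced with a few lines of code. This is a minimal sketch in Python, not the Perl script used for this page. The function name `stack_variants` and the input format (1-based position plus mutant residue) are assumptions made for illustration, and the sketch creates as many rows as needed instead of a fixed three.

```python
def stack_variants(reference: str, variants):
    """Place each mutant residue into the first row whose slot is still free.

    reference: the original amino acid sequence.
    variants:  iterable of (1-based position, mutant residue) pairs.
    Returns a list of mutated rows; the reference itself is not included.
    """
    rows = []
    for pos, residue in variants:
        idx = pos - 1
        for row in rows:
            # A slot is free while it still holds the reference residue.
            if row[idx] == reference[idx]:
                row[idx] = residue
                break
        else:
            # Every existing row already has a mutation here: open a new row.
            new_row = list(reference)
            new_row[idx] = residue
            rows.append(new_row)
    return ["".join(row) for row in rows]


if __name__ == "__main__":
    ref = "ASGV"
    print(ref)
    for line in stack_variants(ref, [(4, "M"), (4, "L"), (2, "T")]):
        print(line)
```

Positions with a single known mutation end up in the first mutated row, and further rows fill only where several mutations share a position.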
dbSNP: Mutations for GBA
dbSNP was searched for synonymous as well as missense mutations. A synonymous mutation has no influence on the encoded amino acid: the residue remains the same after the mutation. The output of dbSNP was also parsed with a Perl script, for which we used the FlatFile format and the gene map. The positions were the same as in our reference sequence, so we could use them directly.
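The distinction between synonymous, missense and nonsense mutations follows directly from the standard genetic code. The sketch below classifies a single codon change. It is an illustration only: the names `CODON_TABLE` and `classify` are hypothetical, and the wild-type codon is assumed to be a sense codon.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify(wild: str, mutant: str) -> str:
    """Classify a codon change; '*' stands for a termination codon.

    Assumption: the wild-type codon encodes an amino acid (is not a stop).
    """
    before, after = CODON_TABLE[wild], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"


if __name__ == "__main__":
    for wild, mutant in [("CTG", "TTG"), ("TGG", "TGA"), ("AAC", "AGC")]:
        print(wild, mutant, classify(wild, mutant))
```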
Synonymous Mutations in GBA
In the following table the synonymous mutations for GBA are listed:
ID | mutated allele | amino acid | codon position | amino acid position |
---|---|---|---|---|
rs78297361 | T | R | 3 | 535 |
rs77130994 | A | G | 3 | 517 |
rs1135675 | C | V | 3 | 499 |
rs12747811 | A | Q | 3 | 471 |
rs79226895 | A | K | 3 | 464 |
rs78346899 | T | Y | 3 | 451 |
rs75034092 | A | G | 3 | 416 |
rs74498117 | G | L | 3 | 410 |
rs1141826 | A | T | 3 | 408 |
rs75391747 | A | E | 3 | 388 |
rs80317710 | A | E | 3 | 365 |
rs79311125 | T | Y | 3 | 352 |
rs1064647 | T | G | 3 | 346 |
rs1064646 | G | K | 3 | 342 |
rs74486098 | A | K | 3 | 237 |
rs76158190 | C | S | 3 | 235 |
rs75370695 | A | A | 3 | 229 |
rs76682322 | T | L | 3 | 224 |
rs76727497 | A | P | 3 | 221 |
rs76717906 | T | P | 3 | 217 |
rs78659905 | T | L | 3 | 213 |
rs77916306 | A | P | 3 | 198 |
rs77191198 | A | T | 3 | 173 |
rs74572011 | T | R | 3 | 170 |
rs79767521 | T | P | 3 | 161 |
rs75249684 | C | R | 3 | 159 |
rs79175920 | A | A | 3 | 129 |
rs1141821 | C | T | 3 | 100 |
rs1141816 | A | G | 3 | 93 |
rs78669556 | C | R | 3 | 87 |
rs1141810 | C | S | 3 | 81 |
rs76337315 | A | E | 3 | 80 |
rs1141807 | C | Y | 3 | 79 |
Sequence
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible synonymous mutations are marked blue.
Missense Mutations in GBA
In the following table the missense mutations are listed:
ID | mutated allele | amino acid | codon position | amino acid position |
---|---|---|---|---|
rs75822236 | A | H | 2 | 535 |
rs78016673 | T | I | 2 | 530 |
rs77409925 | G | E | 3 | 513 |
rs113825752 | C | P | 2 | 509 |
rs76071730 | G | R | 2 | 490 |
rs74752878 | G | C | 2 | 457 |
rs79185870 | G | L | 3 | 456 |
rs80020805 | A | I | 3 | 455 |
rs77035024 | A | L | 3 | 450 |
rs78802049 | G | E | 3 | 448 |
rs75564605 | C | T | 2 | 441 |
rs75090908 | G | E | 3 | 438 |
rs75243000 | C | S | 2 | 436 |
rs75385858 | C | T | 2 | 435 |
rs77738682 | T | I | 2 | 431 |
rs76910485 | T | L | 2 | 430 |
rs78715199 | A | E | 3 | 419 |
rs77284004 | C | A | 2 | 419 |
rs76014919 | T | C | 3 | 417 |
rs2230289 | T | M | 2 | 408 |
rs75528494 | A | R | 3 | 405 |
rs76228122 | G | C | 2 | 402 |
rs74979486 | A | Q | 2 | 398 |
rs11558184 | A | Q | 2 | 392 |
rs1064648 | A | H | 2 | 368 |
rs78188205 | A | D | 2 | 357 |
rs77321207 | G | C | 2 | 343 |
rs77714449 | T | I | 2 | 342 |
rs79696831 | A | H | 2 | 324 |
rs74731340 | A | N | 2 | 310 |
rs79215220 | G | R | 2 | 305 |
rs80116658 | A | D | 2 | 304 |
rs76725886 | G | R | 2 | 270 |
rs79945741 | A | L | 3 | 252 |
rs76026102 | G | C | 2 | 244 |
rs77451368 | A | E | 2 | 241 |
rs74462743 | A | E | 2 | 234 |
rs75636769 | A | E | 2 | 229 |
rs78911246 | T | V | 2 | 228 |
rs80205046 | T | L | 2 | 221 |
rs76500263 | C | P | 2 | 201 |
rs80222298 | T | L | 2 | 198 |
rs78446355 | C | N | 3 | 196 |
rs79660787 | A | E | 2 | 175 |
rs78657146 | T | I | 2 | 173 |
rs75690705 | T | L | 2 | 170 |
rs79796061 | T | V | 2 | 166 |
rs77959976 | A | I | 3 | 162 |
rs79637617 | T | L | 2 | 161 |
rs77834747 | G | S | 2 | 158 |
rs77019233 | A | K | 3 | 156 |
rs1141820 | G | R | 2 | 99 |
rs1141818 | T | Y | 1 | 99 |
rs78769774 | A | Q | 2 | 87 |
rs1141812 | A | S | 1 | 83 |
rs1141808 | A | K | 1 | 80 |
rs75954905 | G | L | 3 | 76 |
rs74953658 | A | E | 3 | 63 |
Sequence
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCESFDPPTFPALGTLSRYKS
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
TSSGRQMELSMGPIQANYTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYKISRVLI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSILTYIYEDTPDDFQLHNFSLPEEDTKLNILLIPRALQLAQRPVSLLASPWTSLTWLKTNVEVNGKESLKGQPEDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTWARYLVKFLDAYAEHKLQFWAVRAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLDRTLANNTHHNVRLLMLDDQH
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLPHWAKVVLTDPEAAICVHGIAVHWYLDFLDPAKATLGETHHLFPNTMLFASEACVGSKFWEQSVQLGSWDQGMQCSHR
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMNLLYHVVGCTAWNLALNPEGGLIWVRTSVESPTIVDITKETLYKQPILCHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSAVVVVLNRSSKDVPPTIKEPAVGFLETISPGYSIHIYLWRHQ
Positions for possible missense mutations are marked red.
Sequence with synonymous and missense mutations
The following sequence shows the synonymous and the missense mutations found in dbSNP. For each block, the first line shows the original sequence, the second line the missense mutations and the third line the synonymous ones.
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCESFDPPTFPALGTLSRYKS
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
TSSGRQMELSMGPIQANYTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYKISRVLI
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSILTYIYEDTPDDFQLHNFSLPEEDTKLNILLIPRALQLAQRPVSLLASPWTSLTWLKTNVEVNGKESLKGQPEDI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTWARYLVKFLDAYAEHKLQFWAVRAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLDRTLANNTHHNVRLLMLDDQH
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLPHWAKVVLTDPEAAICVHGIAVHWYLDFLDPAKATLGETHHLFPNTMLFASEACVGSKFWEQSVQLGSWDQGMQCSHR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMNLLYHVVGCTAWNLALNPEGGLIWVRTSVESPTIVDITKETLYKQPILCHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSAVVVVLNRSSKDVPPTIKEPAVGFLETISPGYSIHIYLWRHQ
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible synonymous mutations are marked blue, positions for possible missense mutations are marked red.
Mutation map
To create the mutation map with all missense and synonymous mutations listed in dbSNP and HGMD, the corresponding sequence positions were mapped onto each other. This was quite simple, as both databases use the same sequence and only the numbering differs slightly.
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible missense mutations are marked:
- red, if the mutation is only listed in HGMD
- blue, if the mutation is only listed in dbSNP
- green, if the mutation is listed in dbSNP and HGMD
Underlined residues represent active site and residues forming hydrogen bonds with the active site. <ref>Kim et al., Crystal Structure of the Salmonella enterica Serovar Typhimurium Virulence Factor SrfJ, a Glycoside Hydrolase Family Enzyme. Journal of Bacteriology, 2009, p. 6550-6554, Vol. 191, No. 21 </ref>
The mutation map shows that no mutations of the active-site residues Glu235 and Glu340 are known, whereas two of the residues forming hydrogen bonds with the active site (Arg120, Asp282, His311) are listed in the mutation databases. [Note that the positions of these residues are given for the mature protein, which does not contain the signal peptide. The mutation map contains the 39-residue signal peptide.]
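The numbering difference can be handled with a small conversion. The sketch below assumes that HGMD codon numbers refer to the mature protein, that negative numbers count backwards into the 39-residue signal peptide, and that there is no codon 0. This is consistent with the first rows of the HGMD table (Lys at -27, Trp at -4, Gly at 10), but it is an inference from the data shown here, not a documented rule. The function name `hgmd_to_precursor` is hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 39  # length of the signal peptide, as stated in the text

# First line of the precursor sequence (P04062) as printed on this page.
PRECURSOR_START = (
    "MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES"
)


def hgmd_to_precursor(codon_number: int) -> int:
    """Convert a mature-protein codon number to a 1-based precursor position.

    Assumption: numbering skips 0, so -1 is the last signal-peptide residue
    and 1 is the first residue of the mature protein.
    """
    if codon_number == 0:
        raise ValueError("codon 0 does not exist under this numbering")
    if codon_number > 0:
        return codon_number + SIGNAL_PEPTIDE_LENGTH
    return codon_number + SIGNAL_PEPTIDE_LENGTH + 1


if __name__ == "__main__":
    for codon in (-27, -4, 10, 16):
        pos = hgmd_to_precursor(codon)
        print(codon, pos, PRECURSOR_START[pos - 1])
```

Under the same assumption, the mature-protein residues Glu235 and Glu340 correspond to precursor positions 274 and 379.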
Statistical analyses
We now analyse our results statistically to see whether some amino acids mutate more often than others, and which amino acids replace them.
Synonymous mutations
First we look at the synonymous mutations, in which the codon is mutated but the mutated codon still encodes the same amino acid. In our case it was always the third codon position that mutated. We expected this, because for many amino acids the synonymous codons differ only in the third position.
The amino acids with the most synonymous mutations are arginine, glycine and proline (four each), as shown in figure 1. According to the codon table <ref>http://en.wikipedia.org/wiki/Genetic_code</ref> there are six codons for arginine, four for glycine and four for proline. Glutamic acid, leucine, lysine, threonine and tyrosine each mutated three times; they are encoded by two, six, two, four and two codons, respectively. The amino acids that mutate most often in synonymous mutations thus also have many codons. For the amino acids that mutate three times the relation is less obvious than for those that mutate four times.
Asparagine (two codons), aspartic acid (two codons), cysteine (two codons), histidine (two codons), isoleucine (three codons), methionine (one codon), phenylalanine (two codons) and tryptophan (one codon) have no synonymous mutations. As methionine and tryptophan have only one codon, a synonymous mutation is not possible for them. All the others are encoded by at most three codons, so only one or two synonymous alternatives exist for each codon.
The diagram shows what we expected: amino acids with more codons tend to have more synonymous mutations. The trend is less clear for individual pairs such as valine (four codons) and glutamic acid (two codons). There might be a mutation bias, or the number of observed mutations may be too small to show the effect clearly.
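The comparison between codon count and observed synonymous mutations can be reproduced from the table above. The following sketch derives the number of codons per amino acid from the standard genetic code and tallies the amino acid column of the synonymous dbSNP table (copied in table order). The variable and function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Amino acid column of the synonymous dbSNP table, in table order.
SYNONYMOUS_RESIDUES = "RGVQKYGLTEEYGKKSALPPLPTRPRATGRSEY"
OBSERVED = Counter(SYNONYMOUS_RESIDUES)


def summary():
    """Return (amino acid, codon count, observed count), most mutated first."""
    rows = [(aa, DEGENERACY[aa], OBSERVED[aa]) for aa in DEGENERACY]
    return sorted(rows, key=lambda row: (-row[2], row[0]))


if __name__ == "__main__":
    for aa, n_codons, n_observed in summary():
        print(f"{aa}  codons={n_codons}  synonymous mutations={n_observed}")
```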
Non-synonymous mutations
After analysing the synonymous mutations we want to analyse the non-synonymous mutations.
Figure 2 shows how often each codon position was mutated in non-synonymous mutations. The first and second positions show almost the same frequency, whereas the third position is mutated in fewer than twenty cases. The reason is that a mutation at the third codon position is much more likely to be synonymous than one at the first or second position. In contrast, a mutation at the first or second position changes the amino acid in the majority of cases.
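The argument about codon positions can be checked against the standard genetic code. The sketch below enumerates every single-base change of every sense codon and reports, per codon position, the fraction of changes that leave the amino acid unchanged. It weights all codons and all base changes equally, which is a simplifying assumption and ignores codon usage and mutation bias. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def synonymous_fraction_by_position():
    """Fraction of single-base changes that are synonymous, per codon position."""
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only changes starting from sense codons are counted
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODON_TABLE[mutant] == aa:
                    synonymous[pos] += 1
    return [s / t for s, t in zip(synonymous, total)]


if __name__ == "__main__":
    for pos, fraction in enumerate(synonymous_fraction_by_position(), start=1):
        print(f"codon position {pos}: {fraction:.1%} synonymous")
```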
Figure 3 shows which amino acids were mutated in non-synonymous mutations and how often. Arginine, glycine, leucine and proline are mutated most frequently, and cysteine, methionine and histidine least frequently. The reason could be a mutation bias towards cytosine and guanine, which occur more frequently in the codons of arginine, glycine, leucine and proline.
Figure 4 shows how often each amino acid arose through mutation. Arginine, leucine, proline and also the termination codon arose most often. Methionine, tyrosine and tryptophan were rarely the result of a mutation. The reason could be that the amino acids that arose most often have more possible codons, whereas methionine, for example, has only one. A mutated codon is therefore more likely to encode an amino acid that has many codons. Interestingly, the termination codon also arose very often; there are three stop codons, so this number is also high.
Figure 5 shows a heatmap in which amino acid replacements with high frequency are marked red and replacements that did not occur are marked white. Mutations from an amino acid to itself are not marked in the heatmap because they are discussed in the section above. The replacements that occurred with the highest frequency are:
- Asn -> Lys (4 times)
- Gln -> Term (4 times)
- Glu -> Lys (4 times)
- Ile -> Thr (4 times)
- Leu -> Pro (10 times)
- Phe -> Val (4 times)
- Pro -> Leu (8 times)
- Thr -> Ile (4 times)
- Trp -> Term (5 times)
- Val -> Leu (6 times)
- Tyr -> Cys (5 times)
The exchange of leucine for proline, or proline for leucine, occurs very often. There seems to be a bias towards this mutation. The codons are also very similar, which could be another reason.
There are also mutations that occur often in only one direction. An example is valine to leucine. Four codons encode valine (GU[AUCG]) and four of the six leucine codons (CU[AUCG]) differ from them only in the first position. The mutation from guanine to cytosine seems to be more common than that from cytosine to guanine.
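Replacement frequencies and their direction can be tallied directly from the "Amino acid change" column of the HGMD table. The sketch below is illustrative only: the function names are hypothetical, and the sample list contains just the valine/leucine exchanges from that table (six Val-Leu rows and one Leu-Val row), not the full data set.

```python
from collections import Counter


def tally(changes):
    """Count replacements given strings such as 'Val-Leu' (from-to)."""
    return Counter(tuple(change.split("-")) for change in changes)


def direction_excess(counts, a, b):
    """How many more times a was replaced by b than b by a."""
    return counts[(a, b)] - counts[(b, a)]


if __name__ == "__main__":
    # Valine/leucine exchanges taken from the HGMD table above.
    sample = ["Val-Leu"] * 6 + ["Leu-Val"]
    counts = tally(sample)
    print(counts.most_common())
    print("Val->Leu excess:", direction_excess(counts, "Val", "Leu"))
```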
Tryptophan mutated five times to a termination codon. The reason may again be the very similar codons: tryptophan is encoded by UGG and the termination codons are UGA, UAG and UAA. A single mutation at the third position of the tryptophan codon gives UGA, and one at the second position gives UAG.
Another frequent mutation is tyrosine to cysteine. This is another example of a mutation that occurs frequently in one direction. Here again the mutation of one codon position is sufficient, as tyrosine has the codons UAC and UAU and cysteine the codons UGU and UGC. There seems to be a bias towards the mutation of adenine to guanine.
On the other hand, there are replacements that do not occur, for example alanine to arginine. For this replacement two nucleotide changes would be necessary, so the probability is lower. Of course there is also a correlation between the number of codons and the number of mutations: methionine has only one codon, so there are fewer possible mutations.
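How many nucleotide changes a replacement needs can be computed from the standard genetic code. The sketch below returns the smallest number of differing bases over all codon pairs of two amino acids. It uses one-letter amino acid codes, with `*` for a termination codon and DNA letters (T instead of U). The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(aa: str):
    """All codons that encode the given amino acid (or '*' for stop)."""
    return [codon for codon, encoded in CODON_TABLE.items() if encoded == aa]


def min_changes(aa_from: str, aa_to: str) -> int:
    """Smallest number of base changes that turns aa_from into aa_to."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


if __name__ == "__main__":
    for pair in [("A", "R"), ("W", "*"), ("L", "P"), ("Y", "C")]:
        print(pair, min_changes(*pair))
```

Replacements with a minimum of one change are reachable by a single point mutation; a minimum of two or three means several independent changes in the same codon would be required.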
All in all we draw two conclusions. First, the more codons encode an amino acid, the more mutations it has, and the fewer codons, the fewer mutations. Second, the more similar the codons of two amino acids are, the more probable and frequent a mutation between them is. But there must also be something like a mutation bias, because sometimes this holds in only one direction.