Glucocerebrosidase mapping snps
Revision as of 11:14, 17 August 2011
General
HGMD
The HGMD is the Human Gene Mutation Database, which contains germline mutations that are linked to human diseases. There are several types of mutations:
- missense/nonsense: the mutated codon codes for a different amino acid (missense) or becomes a premature stop codon (nonsense)
- splicing: mutation that affects mRNA splicing
- regulatory: mutation affecting the regulation of gene expression
- small/gross deletions: mutation that deletes nucleotides
- small/gross insertions: mutation that inserts nucleotides
- small indels: combined insertion and deletion at the same site
- duplications: duplicated sequence pieces
- complex rearrangements: part of the sequence is placed somewhere else
- repeat variations: the number of copies of a repeated sequence unit varies
dbSNP
dbSNP is the Single Nucleotide Polymorphism Database, run by the NCBI together with the National Human Genome Research Institute (NHGRI) and established in 1998. <ref>http://en.wikipedia.org/wiki/DbSNP</ref> It contains several types of mutations for 55 organisms, including Homo sapiens:
- SNPs (single nucleotide polymorphisms)
- MNPs (multinucleotide polymorphisms)
- small deletions
- small insertions
- small indels
- short tandem repeats (STRs)
HGMD: Mutations for GBA
Overview
To get the different mutation types for the GBA gene, in which mutations cause Gaucher disease, we searched HGMD for GBA. As a result, we got a list of the different types of mutations found for GBA:
mutation type | number of mutations |
---|---|
missense/nonsense | 236 |
splicing | 13 |
regulatory | 0 |
small deletions | 23 |
small insertions | 13 |
small indels | 2 |
gross deletions | 3 |
gross insertions/duplications | 0 |
complex rearrangements | 13 |
repeat variations | 0 |
public total (HGMD Professional 2011.1 total) | 303 (353) |
In this case, the missense/nonsense mutations are of interest, as they cause a change in the amino acid sequence. Such single point mutations seem to be responsible for Gaucher Disease, so the analysis is focused on them.
Missense/nonsense mutations given for GBA
The following table provides a detailed overview of the 236 missense/nonsense mutations found in GBA:
Codon change | Amino acid change | Codon number |
---|---|---|
AAG-AGG | Lys-Arg | -27 |
TGGg-TGA | Trp-Term | -4 |
cGGC-AGC | Gly-Ser | 10 |
AGC-ATC | Ser-Ile | 12 |
gGTG-ATG | Val-Met | 15 |
gGTG-CTG | Val-Leu | 15 |
TGT-TCT | Cys-Ser | 16 |
tGAC-AAC | Asp-Asn | 24 |
tGGT-AGT | Gly-Ser | 35 |
cTTC-GTC | Phe-Val | 37 |
tGAG-AAG | Glu-Lys | 41 |
AGT-AAT | Ser-Asn | 42 |
ACA-ATA | Thr-Ile | 43 |
GGG-GAG | Gly-Glu | 46 |
gCGA-TGA | Arg-Term | 47 |
aCGG-TGG | Arg-Trp | 48 |
CGG-CAG | Arg-Gln | 48 |
CTA-CCA | Leu-Pro | 66 |
aCAG-TAG | Gln-Term | 73 |
gAAG-TAG | Lys-Term | 74 |
GTG-GCG | Val-Ala | 78 |
AAGg-AAC | Lys-Asn | 79 |
ATG-ACG | Met-Thr | 85 |
tGCT-ACT | Ala-Thr | 90 |
CTT-CGT | Leu-Arg | 105 |
TCG-TTG | Ser-Leu | 107 |
cTTC-GTC | Phe-Val | 109 |
tGAA-AAA | Glu-Lys | 111 |
GGA-GAA | Gly-Glu | 113 |
tAAC-GAC | Asn-Asp | 117 |
ATC-ACC | Ile-Thr | 119 |
ATC-AGC | Ile-Ser | 119 |
cCGG-TGG | Arg-Trp | 120 |
CGG-CAG | Arg-Gln | 120 |
GTA-GCA | Val-Ala | 121 |
aCCC-TCC | Pro-Ser | 122 |
CCC-CTC | Pro-Leu | 122 |
ATG-ACG | Met-Thr | 123 |
cATG-GTG | Met-Val | 123 |
GAC-GTC | Asp-Val | 127 |
cCGC-TGC | Arg-Cys | 131 |
CGC-CTC | Arg-Leu | 131 |
ACC-ATC | Thr-Ile | 134 |
cACC-CCC | Thr-Pro | 134 |
TATg-TAG | Tyr-Term | 135 |
GCA-GAA | Ala-Glu | 136 |
tGAT-CAT | Asp-His | 140 |
GAA-GCA | Glu-Ala | 152 |
AAGa-AAT | Lys-Asn | 157 |
cAAG-CAG | Lys-Gln | 157 |
aCCC-ACC | Pro-Thr | 159 |
CCC-CTC | Pro-Leu | 159 |
ATT-AAT | Ile-Asn | 161 |
ATT-AGT | Ile-Ser | 161 |
CAC-CCC | His-Pro | 162 |
cCGA-TGA | Arg-Term | 163 |
cCAG-TAG | Gln-Term | 169 |
CGT-CCT | Arg-Pro | 170 |
gCGT-TGT | Arg-Cys | 170 |
TCA-TGA | Ser-Term | 173 |
aCTC-TTC | Leu-Phe | 174 |
CTC-CCC | Leu-Pro | 174 |
GCC-GAC | Ala-Asp | 176 |
cCCC-TCC | Pro-Ser | 178 |
TGG-TAG | Trp-Term | 179 |
gACA-CCA | Thr-Pro | 180 |
aCCC-ACC | Pro-Thr | 182 |
CCC-CTC | Pro-Leu | 182 |
tTGG-CGG | Trp-Arg | 184 |
gCTC-TTC | Leu-Phe | 185 |
AAT-AGT | Asn-Ser | 188 |
AATg-AAG | Asn-Lys | 188 |
GGA-GTA | Gly-Val | 189 |
aGCG-ACG | Ala-Thr | 190 |
GCG-GAG | Ala-Glu | 190 |
GTG-GAG | Val-Glu | 191 |
GTG-GGG | Val-Gly | 191 |
GGG-GAG | Gly-Glu | 195 |
gGGG-TGG | Gly-Trp | 195 |
gTCA-CCA | Ser-Pro | 196 |
aCTC-TTC | Leu-Phe | 197 |
CTC-CCC | Leu-Pro | 197 |
AAG-ACG | Lys-Thr | 198 |
cAAG-GAG | Lys-Glu | 198 |
cGGA-AGA | Gly-Arg | 202 |
GGA-GAA | Gly-Glu | 202 |
TAC-TGC | Tyr-Cys | 205 |
cTGG-CGG | Trp-Arg | 209 |
GCC-GTC | Ala-Val | 210 |
aTAC-CAC | Tyr-His | 212 |
cTTT-ATT | Phe-Ile | 213 |
TTT-TGT | Phe-Cys | 213 |
gTTC-GTC | Phe-Val | 216 |
TTC-TAC | Phe-Tyr | 216 |
TAT-TGT | Tyr-Cys | 220 |
ACA-AGA | Thr-Arg | 231 |
GAAa-GAC | Glu-Asp | 233 |
tGAA-TAA | Glu-Term | 233 |
tTCT-CCT | Ser-Pro | 237 |
GGG-GTG | Gly-Val | 239 |
GGA-GTA | Gly-Val | 243 |
aTAC-CAC | Tyr-His | 244 |
CCC-CAC | Pro-His | 245 |
TTCa-TTA | Phe-Leu | 251 |
CATc-CAG | His-Gln | 255 |
CGA-CAA | Arg-Gln | 257 |
gCGA-TGA | Arg-Term | 257 |
TTCa-TTA | Phe-Leu | 259 |
ATT-ACT | Ile-Thr | 260 |
GGT-GAT | Gly-Asp | 265 |
CCT-CGT | Pro-Arg | 266 |
CCT-CTT | Pro-Leu | 266 |
tCCT-GCT | Pro-Ala | 266 |
AGT-AAT | Ser-Asn | 271 |
CTC-CCC | Leu-Pro | 279 |
aCGC-TGC | Arg-Cys | 285 |
CGC-CAC | Arg-His | 285 |
CCC-CTC | Pro-Leu | 289 |
aCTG-TTG | Leu-Leu | 296 |
AAA-ATA | Lys-Ile | 303 |
TAT-TGT | Tyr-Cys | 304 |
TATg-TAG | Tyr-Term | 304 |
tGTT-CTT | Val-Leu | 305 |
GCT-GTT | Ala-Val | 309 |
CAT-CGT | His-Arg | 311 |
TGGt-TGT | Trp-Cys | 312 |
tTGG-CGG | Trp-Arg | 312 |
gTAC-CAC | Tyr-His | 313 |
gGAC-CAC | Asp-His | 315 |
GCT-GAT | Ala-Asp | 318 |
tCCA-GCA | Pro-Ala | 319 |
ACC-ATC | Thr-Ile | 323 |
CTA-CAA | Leu-Gln | 324 |
CTA-CCA | Leu-Pro | 324 |
aGGG-AGG | Gly-Arg | 325 |
aGGG-TGG | Gly-Trp | 325 |
gGAG-AAG | Glu-Lys | 326 |
cCGC-TGC | Arg-Cys | 329 |
TTC-TCC | Phe-Ser | 331 |
CTC-CCC | Leu-Pro | 336 |
gGCC-ACC | Ala-Thr | 341 |
cTGT-CGT | Cys-Arg | 342 |
cTGT-GGT | Cys-Gly | 342 |
TGT-TAT | Cys-Tyr | 342 |
cTGG-GGG | Trp-Gly | 348 |
gGAG-AAG | Glu-Lys | 349 |
gCAG-TAG | Gln-Term | 350 |
tGTG-CTG | Val-Leu | 352 |
gCGG-GGG | Arg-Gly | 353 |
gCGG-TGG | Arg-Trp | 353 |
GGC-GAC | Gly-Asp | 355 |
TCC-TTC | Ser-Phe | 356 |
TGG-TAG | Trp-Term | 357 |
CGA-CAA | Arg-Gln | 359 |
tCGA-TGA | Arg-Term | 359 |
ATGc-ATA | Met-Ile | 361 |
TAC-TGC | Tyr-Cys | 363 |
AGC-AAC | Ser-Asn | 364 |
AGC-ACC | Ser-Thr | 364 |
cAGC-CGC | Ser-Arg | 364 |
AGC-AAC | Ser-Asn | 366 |
AGC-ACC | Ser-Thr | 366 |
cAGC-GGC | Ser-Gly | 366 |
ACG-ATG | Thr-Met | 369 |
AAC-AGC | Asn-Ser | 370 |
AACc-AAA | Asn-Lys | 370 |
cCTC-GTC | Leu-Val | 371 |
tGTG-TTG | Val-Leu | 375 |
cGGC-AGC | Gly-Ser | 377 |
cTGG-GGG | Trp-Gly | 378 |
TGG-TAG | Trp-Term | 378 |
cGAC-AAC | Asp-Asn | 380 |
cGAC-CAC | Asp-His | 380 |
GAC-GCC | Asp-Ala | 380 |
TGG-TAG | Trp-Term | 381 |
AACc-AAA | Asn-Lys | 382 |
CTT-CGT | Leu-Arg | 383 |
CTG-CCG | Leu-Pro | 385 |
CCC-CTC | Pro-Leu | 387 |
cGAA-TAA | Glu-Term | 388 |
GGA-GAA | Gly-Glu | 389 |
aGGA-AGA | Gly-Arg | 390 |
CCC-CTC | Pro-Leu | 391 |
AAT-ATT | Asn-Ile | 392 |
TGG-TTG | Trp-Leu | 393 |
tTGG-AGG | Trp-Arg | 393 |
gGTG-TTG | Val-Leu | 394 |
CGT-CCT | Arg-Pro | 395 |
gCGT-TGT | Arg-Cys | 395 |
AAC-ACC | Asn-Thr | 396 |
TTT-TCT | Phe-Ser | 397 |
tGTC-ATC | Val-Ile | 398 |
tGTC-CTC | Val-Leu | 398 |
tGTC-TTC | Val-Phe | 398 |
cGAC-AAC | Asp-Asn | 399 |
cGAC-TAC | Asp-Tyr | 399 |
CCC-CTC | Pro-Leu | 401 |
ATC-ACC | Ile-Thr | 402 |
cATC-TTC | Ile-Phe | 402 |
GAC-GGC | Asp-Gly | 409 |
GAC-GTC | Asp-Val | 409 |
gGAC-CAC | Asp-His | 409 |
gTTT-ATT | Phe-Ile | 411 |
tTAC-CAC | Tyr-His | 412 |
cAAA-CAA | Lys-Gln | 413 |
aCAG-TAG | Gln-Term | 414 |
CAG-CGG | Gln-Arg | 414 |
CCC-CGC | Pro-Arg | 415 |
cATG-GTG | Met-Val | 416 |
gTTC-GTC | Phe-Val | 417 |
TAC-TGC | Tyr-Cys | 418 |
GGC-GAC | Gly-Asp | 421 |
cAAG-GAG | Lys-Glu | 425 |
AGAg-AGT | Arg-Ser | 433 |
gAGA-GGA | Arg-Gly | 433 |
CTG-CCG | Leu-Pro | 444 |
CTG-CGG | Leu-Arg | 444 |
cGCA-CCA | Ala-Pro | 446 |
CAT-CGT | His-Arg | 451 |
tGCT-CCT | Ala-Pro | 456 |
cGTG-ATG | Val-Met | 460 |
CTA-CCA | Leu-Pro | 461 |
AAC-AGC | Asn-Ser | 462 |
AACc-AAG | Asn-Lys | 462 |
cCGC-TGC | Arg-Cys | 463 |
CGC-CAC | Arg-His | 463 |
CGC-CCC | Arg-Pro | 463 |
gGAT-TAT | Asp-Tyr | 474 |
gGGC-AGC | Gly-Ser | 478 |
CTG-CCG | Leu-Pro | 480 |
cTCC-CCC | Ser-Pro | 488 |
ATT-ACT | Ile-Thr | 489 |
ACC-ATC | Thr-Ile | 491 |
CGC-CAC | Arg-His | 496 |
tCGC-TGC | Arg-Cys | 496 |
CAG-CGG | Gln-Arg | 497 |
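The codon changes in the table can be checked against the standard genetic code. The following Python sketch is only an illustration and not the script used for this page. It assumes that lowercase letters in the "Codon change" column are flanking bases outside the codon, so only the uppercase triplet is translated. The function names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Three-letter names as used in the table above ('Term' for a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Term",
}


def parse_codon_change(change):
    """Split e.g. 'cGGC-AGC' into ('GGC', 'AGC').

    Assumption: lowercase letters are flanking bases and are dropped.
    """
    wild, mutant = change.split("-")
    return re.sub(r"[a-z]", "", wild), mutant


def describe(change):
    """Return (wild-type residue, mutant residue, changed codon position 1..3)."""
    wild, mutant = parse_codon_change(change)
    position = next(i + 1 for i in range(3) if wild[i] != mutant[i])
    return (THREE_LETTER[CODON_TABLE[wild]],
            THREE_LETTER[CODON_TABLE[mutant]],
            position)


for entry in ("cGGC-AGC", "TGGg-TGA", "CTA-CCA", "aCTG-TTG"):
    print(entry, describe(entry))
```

Applied to the row aCTG-TTG (codon 296), the sketch translates both codons to Leu, in agreement with the Leu-Leu entry in the table.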
Sequence
To map the mutations to the sequence, we used the sequence with the given accession number NM_001005741.1. This is exactly the sequence we used for our earlier interpretations. With the help of a Perl script we generated the following sequences and marked the given mutations.
Positions where mutations occur
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible missense/nonsense mutations are marked red.
Possible mutated amino acid residues
The following sequence shows the different possibilities for mutated residues. As there are different mutations for the same position, all changed residues are shown, each in a separate line.
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPRPLSRVSIMAGSLTGLLLLQAVS!ASGARPCIPKSFSYISVMSVCNATYCNSFDPPTFPALSTVSRYKN
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVLCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
IRSE!WMELSMGPIQANHTGTGLPLTLQPE!!FQKANGFGGATTDAATLNILALSPPAQNLLRKLYVSKEEIGYDITWAST
TRSGRQMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNISQVLV
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSICTYI!EDTPHDFQLHNFSLPEADTKLNITLNP!ALQLA!PPV!FLDSS!PSTTRFKTSVTENGKEPFTGQPRDI
ASCDFSILTYPYADTPDDFQLHNFSLPEEDTKLQILLSHRALQLAQCPVSPLASPWTSLTWLKTKGEGNGKWSPEGQPEDI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTRVRHIVKVLDACAEHKLQFWAVRADNEPPAVLLSVHHFQCLGLTPEQQQDLTARDLDRTLANNTHHNVRLPMLDDQC
YHQTWARYCVKYLDAYAEHKLQFWAVTA!NEPSAGLLSGYPFQCLGFTPEHQ!DFIARDLGLTLANSTHHNVRLLMLDDQH
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGATLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLLHWAKVVLTDPEAAICLHGIVVRCHLHFLDAAKAIQRKTHCLSPNTMPFASETRVGSKFGK!SLGLDF!DQGIQCNHN
LLLPHWAKVVLTDPEAAK!VHGIAVHRYLDFLAPAKATPWETHRLFPNTMLFASEAGVGSKFWEQSVWLGSWD!GMQYTHT
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEAYVGSKFWEQSVRLGSWDRGMQYRHG
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMSVLYHLVSGTN!KRAPNL!ERLILLPTSINSLTIVDITKGTIHQ!RVVCHLDHFSEFIPEGSQSVGLVASQKNDPDPV
IITKLLYHVVG!THWNLALNPEGGPNRVCNFLYSPFIVDITKVTFYKRPMFYHLGHFSKFIPEGSQGVGLVASQKNDRDAV
IITNLLYHVVGWTAWNLALNPEGGPNWVRNFFDSPIIVDITKHTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSPVVVMPSCSSKDVPLTIKYPAVSFPETISPGYPTHIYLWRHR
ALMHPDGSAVVVVLKHSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRCQ
ALMHPDGSAVVVVLNPSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
In each block, the first row shows the original sequence, and the second, third and fourth rows show the mutated residues ("!" stands for a stop codon). Positions for possible missense/nonsense mutations are marked red.
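The row layout described above can be reproduced with a few lines of code. The Perl script used for this page is not shown, so the following Python sketch is only an illustration under stated assumptions: positions are 1-based indices into the precursor sequence, "!" stands for a stop codon, and the function name `variant_rows` is hypothetical. The example data are the first 56 residues of the sequence and the first HGMD entries of the table (codons -27 to 16).

```python
def variant_rows(sequence, substitutions):
    """Build one row per alternative residue.

    substitutions maps a 1-based position to a list of replacement residues.
    Row i carries the i-th replacement of every position that has one and the
    wild-type residue everywhere else.
    """
    n_rows = max((len(alts) for alts in substitutions.values()), default=0)
    rows = []
    for i in range(n_rows):
        row = list(sequence)
        for position, alts in substitutions.items():
            if i < len(alts):
                row[position - 1] = alts[i]
        rows.append("".join(row))
    return rows


# First 56 residues of the GBA precursor (P04062).
fragment = "MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCV"

# Precursor positions of the HGMD codons -27, -4, 10, 12, 15 and 16.
substitutions = {13: ["R"], 36: ["!"], 49: ["S"], 51: ["I"], 54: ["M", "L"], 55: ["S"]}

print(fragment)
for row in variant_rows(fragment, substitutions):
    print(row)
```

Position 54 has two known replacements (Val to Met and Val to Leu), so two mutated rows are produced for this fragment.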
dbSNP: Mutations for GBA
dbSNP was searched for synonymous mutations as well as for missense mutations. Synonymous mutations do not influence the resulting amino acid, which means that the residue remains the same after the mutation. The output of dbSNP was also parsed with a Perl script, for which we used the FlatFile and the gene map. The positions were the same as in our reference sequence, so we could use them directly.
Synonymous Mutations in GBA
In the following table the synonymous mutations for GBA are listed:
ID | mutated allele | amino acid | codon position | amino acid position |
---|---|---|---|---|
rs78297361 | T | R | 3 | 535 |
rs77130994 | A | G | 3 | 517 |
rs1135675 | C | V | 3 | 499 |
rs12747811 | A | Q | 3 | 471 |
rs79226895 | A | K | 3 | 464 |
rs78346899 | T | Y | 3 | 451 |
rs75034092 | A | G | 3 | 416 |
rs74498117 | G | L | 3 | 410 |
rs1141826 | A | T | 3 | 408 |
rs75391747 | A | E | 3 | 388 |
rs80317710 | A | E | 3 | 365 |
rs79311125 | T | Y | 3 | 352 |
rs1064647 | T | G | 3 | 346 |
rs1064646 | G | K | 3 | 342 |
rs74486098 | A | K | 3 | 237 |
rs76158190 | C | S | 3 | 235 |
rs75370695 | A | A | 3 | 229 |
rs76682322 | T | L | 3 | 224 |
rs76727497 | A | P | 3 | 221 |
rs76717906 | T | P | 3 | 217 |
rs78659905 | T | L | 3 | 213 |
rs77916306 | A | P | 3 | 198 |
rs77191198 | A | T | 3 | 173 |
rs74572011 | T | R | 3 | 170 |
rs79767521 | T | P | 3 | 161 |
rs75249684 | C | R | 3 | 159 |
rs79175920 | A | A | 3 | 129 |
rs1141821 | C | T | 3 | 100 |
rs1141816 | A | G | 3 | 93 |
rs78669556 | C | R | 3 | 87 |
rs1141810 | C | S | 3 | 81 |
rs76337315 | A | E | 3 | 80 |
rs1141807 | C | Y | 3 | 79 |
Sequence
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible synonymous mutations are marked blue.
Missense Mutations in GBA
In the following table the missense mutations are listed:
ID | mutated allele | amino acid | codon position | amino acid position |
---|---|---|---|---|
rs75822236 | A | H | 2 | 535 |
rs78016673 | T | I | 2 | 530 |
rs77409925 | G | E | 3 | 513 |
rs113825752 | C | P | 2 | 509 |
rs76071730 | G | R | 2 | 490 |
rs74752878 | G | C | 2 | 457 |
rs79185870 | G | L | 3 | 456 |
rs80020805 | A | I | 3 | 455 |
rs77035024 | A | L | 3 | 450 |
rs78802049 | G | E | 3 | 448 |
rs75564605 | C | T | 2 | 441 |
rs75090908 | G | E | 3 | 438 |
rs75243000 | C | S | 2 | 436 |
rs75385858 | C | T | 2 | 435 |
rs77738682 | T | I | 2 | 431 |
rs76910485 | T | L | 2 | 430 |
rs78715199 | A | E | 3 | 419 |
rs77284004 | C | A | 2 | 419 |
rs76014919 | T | C | 3 | 417 |
rs2230289 | T | M | 2 | 408 |
rs75528494 | A | R | 3 | 405 |
rs76228122 | G | C | 2 | 402 |
rs74979486 | A | Q | 2 | 398 |
rs11558184 | A | Q | 2 | 392 |
rs1064648 | A | H | 2 | 368 |
rs78188205 | A | D | 2 | 357 |
rs77321207 | G | C | 2 | 343 |
rs77714449 | T | I | 2 | 342 |
rs79696831 | A | H | 2 | 324 |
rs74731340 | A | N | 2 | 310 |
rs79215220 | G | R | 2 | 305 |
rs80116658 | A | D | 2 | 304 |
rs76725886 | G | R | 2 | 270 |
rs79945741 | A | L | 3 | 252 |
rs76026102 | G | C | 2 | 244 |
rs77451368 | A | E | 2 | 241 |
rs74462743 | A | E | 2 | 234 |
rs75636769 | A | E | 2 | 229 |
rs78911246 | T | V | 2 | 228 |
rs80205046 | T | L | 2 | 221 |
rs76500263 | C | P | 2 | 201 |
rs80222298 | T | L | 2 | 198 |
rs78446355 | C | N | 3 | 196 |
rs79660787 | A | E | 2 | 175 |
rs78657146 | T | I | 2 | 173 |
rs75690705 | T | L | 2 | 170 |
rs79796061 | T | V | 2 | 166 |
rs77959976 | A | I | 3 | 162 |
rs79637617 | T | L | 2 | 161 |
rs77834747 | G | S | 2 | 158 |
rs77019233 | A | K | 3 | 156 |
rs1141820 | G | R | 2 | 99 |
rs1141818 | T | Y | 1 | 99 |
rs78769774 | A | Q | 2 | 87 |
rs1141812 | A | S | 1 | 83 |
rs1141808 | A | K | 1 | 80 |
rs75954905 | G | L | 3 | 76 |
rs74953658 | A | E | 3 | 63 |
Sequence
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCESFDPPTFPALGTLSRYKS
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
TSSGRQMELSMGPIQANYTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYKISRVLI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSILTYIYEDTPDDFQLHNFSLPEEDTKLNILLIPRALQLAQRPVSLLASPWTSLTWLKTNVEVNGKESLKGQPEDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTWARYLVKFLDAYAEHKLQFWAVRAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLDRTLANNTHHNVRLLMLDDQH
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLPHWAKVVLTDPEAAICVHGIAVHWYLDFLDPAKATLGETHHLFPNTMLFASEACVGSKFWEQSVQLGSWDQGMQCSHR
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMNLLYHVVGCTAWNLALNPEGGLIWVRTSVESPTIVDITKETLYKQPILCHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSAVVVVLNRSSKDVPPTIKEPAVGFLETISPGYSIHIYLWRHQ
Positions for possible missense mutations are marked red.
Sequence with synonymous and missense mutations
The following sequence shows the synonymous and the missense mutations found with dbSNP. As there are positions with possible synonymous or missense mutations the second line shows the missense mutations and the third one the synonymous ones.
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCESFDPPTFPALGTLSRYKS
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
TSSGRQMELSMGPIQANYTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYKISRVLI
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
ASCVFSILTYIYEDTPDDFQLHNFSLPEEDTKLNILLIPRALQLAQRPVSLLASPWTSLTWLKTNVEVNGKESLKGQPEDI
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
CHQTWARYLVKFLDAYAEHKLQFWAVRAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLDRTLANNTHHNVRLLMLDDQH
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
LLLPHWAKVVLTDPEAAICVHGIAVHWYLDFLDPAKATLGETHHLFPNTMLFASEACVGSKFWEQSVQLGSWDQGMQCSHR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IIMNLLYHVVGCTAWNLALNPEGGLIWVRTSVESPTIVDITKETLYKQPILCHLGHFSKFIPEGSQRVGLVASQKNDLDAV
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
ALMRPDGSAVVVVLNRSSKDVPPTIKEPAVGFLETISPGYSIHIYLWRHQ
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible synonymous mutations are marked blue, positions for possible missense mutations are marked red.
Mutation map
To create the mutation map with all missense and synonymous mutations listed in dbSNP and HGMD, the corresponding sequence positions were mapped onto each other. This was quite simple, as both databases use the same sequence and only the numbering differs slightly.
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNATYCDSFDPPTFPALGTFSRYES
TRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGFGGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPM
ASCDFSIRTYTYADTPDDFQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQPGDI
YHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIARDLGPTLANSTHHNVRLLMLDDQR
LLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAKATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHS
IITNLLYHVVGWTDWNLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQKNDLDAV
ALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
Positions for possible missense mutations are marked:
- red, if the mutation is only listed in HGMD
- blue, if the mutation is only listed in dbSNP
- green, if the mutation is listed in dbSNP and HGMD
Underlined residues represent active site and residues forming hydrogen bonds with the active site. <ref>Kim et al., Crystal Structure of the Salmonella enterica Serovar Typhimurium Virulence Factor SrfJ, a Glycoside Hydrolase Family Enzyme. Journal of Bacteriology, 2009, p. 6550-6554, Vol. 191, No. 21 </ref>
The mutation map shows that no mutations of the active-site residues Glu235 and Glu340 are known, whereas two of the residues forming hydrogen bonds with the active site (Arg120, Asp282, His311) are listed in the mutation databases. [Note that the positions of these residues refer to the mature protein, which does not contain the signal peptide. The mutation map contains the 39-residue signal peptide.]
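The renumbering and the colour assignment of the mutation map can be sketched as follows. This Python example is an illustration, not the script used here. It assumes that HGMD codon numbers refer to the mature protein, with negative numbers counting backwards into the 39-residue signal peptide and no codon 0 (inferred from the negative codon numbers in the HGMD table), and that dbSNP positions refer to the precursor sequence. The function names are hypothetical.

```python
SIGNAL_PEPTIDE_LENGTH = 39  # stated in the note above


def mature_to_precursor(codon):
    """Convert an HGMD codon number to a 1-based precursor position.

    Assumption: codon 1 is the first mature residue, codon -1 the last
    residue of the signal peptide, and codon 0 does not exist.
    """
    if codon == 0:
        raise ValueError("codon 0 is not defined in this numbering")
    return codon + SIGNAL_PEPTIDE_LENGTH + (1 if codon < 0 else 0)


def classify(hgmd_positions, dbsnp_positions):
    """Assign the colours used in the mutation map to precursor positions."""
    hgmd, dbsnp = set(hgmd_positions), set(dbsnp_positions)
    colours = {}
    for position in sorted(hgmd | dbsnp):
        if position in hgmd and position in dbsnp:
            colours[position] = "green"  # listed in both databases
        elif position in hgmd:
            colours[position] = "red"    # only listed in HGMD
        else:
            colours[position] = "blue"   # only listed in dbSNP
    return colours


# A few entries from the tables above: HGMD codons -4, 120, 122 and
# dbSNP missense positions 83 and 161.
hgmd = [mature_to_precursor(c) for c in (-4, 120, 122)]
dbsnp = [83, 161]
print(classify(hgmd, dbsnp))
```

With this numbering, the active-site residues Glu235 and Glu340 correspond to positions 274 and 379 of the precursor sequence shown in the map.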
Statistical analyses
Next we analyse our results statistically, to see whether some amino acids mutate more often than others and which amino acids they are replaced by.
Synonymous mutations
First we look at the synonymous mutations, in which the codon is mutated but still codes for the same amino acid. In our case the mutation always occurred at the third codon position. We expected this, because for many amino acids the synonymous codons differ only in the third position.
The amino acids with the most synonymous mutations are arginine, glycine and proline (four each), as shown in figure 1. According to the codon table <ref>http://en.wikipedia.org/wiki/Genetic_code</ref> there are six codons for arginine, four for glycine and four for proline. Glutamic acid, leucine, lysine, threonine and tyrosine each have three synonymous mutations and are coded by two, six, two, four and two codons, respectively. The amino acids with the most synonymous mutations therefore also have many codons. For the amino acids with three mutations the relation is less obvious than for those with four.
Asparagine (two codons), aspartic acid (two codons), cysteine (two codons), histidine (two codons), isoleucine (three codons), methionine (one codon), phenylalanine (two codons) and tryptophan (one codon) have no synonymous mutations. As methionine and tryptophan have only one codon, a synonymous mutation is not possible for them. All other amino acids in this group have at most three codons, so there are only one or two possible mutations that keep the same amino acid.
The diagram shows what we expected: amino acids with more codons should also have more synonymous mutations, which is the case here. The relation is less clear if we look, for example, at valine (four codons) and glutamic acid (two codons). This may be due to a mutation bias, or the number of mutations may simply be too small.
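The comparison between the number of codons and the number of synonymous mutations can be recomputed from the dbSNP table of synonymous mutations. The following Python sketch is an illustration only. The string `SYNONYMOUS_RESIDUES` is the "amino acid" column of that table, read from top to bottom, and the codon table is the standard genetic code. The variable names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# "amino acid" column of the synonymous dbSNP table, top to bottom (33 entries).
SYNONYMOUS_RESIDUES = "RGVQKYGLTEEYGKKSALPPLPTRPRATGRSEY"
observed = Counter(SYNONYMOUS_RESIDUES)

# List amino acids by decreasing number of observed synonymous mutations.
print("aa  codons  synonymous mutations")
for aa in sorted(DEGENERACY, key=lambda a: (-observed[a], a)):
    print(f"{aa}   {DEGENERACY[aa]}       {observed[aa]}")
```

Amino acids that never appear in the table (for example methionine and tryptophan) are reported with a count of zero, because a `Counter` returns 0 for missing keys.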
Non-synonymous mutations
After analysing the synonymous mutations, we now analyse the non-synonymous mutations.
Figure 2 shows how often each codon position was mutated. The first and the second position show almost the same frequency, whereas the third position is mutated in fewer than twenty cases. The reason is that a mutation at the third codon position is much more likely to be synonymous than a mutation at the first or second position.
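The argument about the third codon position can be checked by enumerating all single-nucleotide substitutions in the standard genetic code. The following Python sketch is an illustration under a simplifying assumption: every substitution in every sense codon is treated as equally likely, which ignores codon usage and any mutation bias in GBA. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_counts():
    """Count synonymous and total single-base substitutions per codon position.

    Only sense codons are mutated; a change into a stop codon counts as
    non-synonymous.
    """
    synonymous = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODON_TABLE[mutant] == aa:
                    synonymous[pos] += 1
    return synonymous, total


synonymous, total = synonymous_counts()
for pos in range(3):
    print(f"position {pos + 1}: {synonymous[pos]} of {total[pos]} substitutions are synonymous")
```

Under this uniform model, 8, 0 and 126 of the 183 possible substitutions at the first, second and third position are synonymous, which is consistent with the low number of non-synonymous mutations at the third position.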
Figure 3 shows which amino acids were mutated in non-synonymous mutations and how often. Arginine, glycine, leucine and proline are mutated most frequently, and cysteine, methionine and histidine most rarely. The reason could be a mutation bias towards cytosine and guanine, which occur more frequently in the codons of the most often mutated amino acids.
Figure 4 shows how often each amino acid was the result of a mutation. Arginine, leucine, proline and also the terminator codon occurred most often. Methionine, tyrosine and tryptophan were rarely the result of a mutation. The reason could be that the amino acids that occurred most often have more possible codons, whereas methionine, for example, has only one, so the probability of mutating into them is higher. Interestingly, the terminator codon also occurred very often as the result of a mutation.
Figure 5 shows a heatmap in which amino acid replacements with high frequency are marked red and amino acid replacements which did not occur are white. The replacements that occurred with the highest frequency are:
- Asn -> Lys (4 times)
- Gln -> Term (4 times)
- Glu -> Lys (4 times)
- Ile -> Thr (4 times)
- Leu -> Pro (10 times)
- Phe -> Val (4 times)
- Pro -> Leu (8 times)
- Thr -> Ile (4 times)
- Trp -> Term (5 times)
- Val -> Leu (6 times)
- Tyr -> Cys (5 times)
The exchanges leucine to proline and proline to leucine occur very often, so there seems to be a bias towards this mutation. The similarity of the codons could also be a reason: the four leucine codons CTN differ from the proline codons CCN only in the second position.
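How similar the leucine and proline codons are can be listed explicitly. The following Python sketch is an illustration based on the standard genetic code. It enumerates all pairs of codons for two amino acids that differ in exactly one base. The function name `single_step_pairs` is hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_step_pairs(aa_from, aa_to):
    """List (codon_from, codon_to, position) for codons one base apart.

    Amino acids are given in one-letter code; position is 1-based.
    """
    pairs = []
    for codon_from, a1 in CODON_TABLE.items():
        if a1 != aa_from:
            continue
        for codon_to, a2 in CODON_TABLE.items():
            if a2 != aa_to:
                continue
            diffs = [i for i in range(3) if codon_from[i] != codon_to[i]]
            if len(diffs) == 1:
                pairs.append((codon_from, codon_to, diffs[0] + 1))
    return pairs


for codon_from, codon_to, position in single_step_pairs("L", "P"):
    print(f"{codon_from} -> {codon_to} (codon position {position})")
```

Four of the six leucine codons can turn into a proline codon by a single T to C exchange at the second position. The leucine codons TTA and TTG need two changes.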