Gaucher Disease: Task 07 - Research SNPs

From Bioinformatikpedia
Revision as of 01:42, 28 August 2013 by Kalemanovm (talk | contribs) (Mutation map)





HGMD Professional was last updated in March 2013. The free public version of HGMD provides the data of the 2008 release of HGMD Professional. The database records the first reported example of each disease-causing or disease-associated mutation as well as disease-associated/functional polymorphisms. The information is taken from the literature or from reported functional studies. Only non-silent mutations in the coding region are considered, and each mutation is stored only once. In addition to the information shown in the second table below, each mutation entry includes the associated disease phenotype, the chromosomal location, the gene symbol and the reference of its first literature report. HGMD Professional also provides a mutation viewer. More than 250 journals are screened by a combination of computerized and manual procedures to find newly reported germline mutations that cause human genetic diseases.

Glucocerebrosidase has the gene symbol GBA and the accession number NM_001005741.2. A search for GBA in the Human Gene Mutation Database gives the results shown in <xr id="hgmd"/>.

<figtable id="hgmd">

Table 1: GBA in HGMD
Mutation type | Number of mutations | Effect of mutation
Missense/nonsense | 256 | single base-pair substitutions in coding regions that cause an amino acid change or a stop/start codon change
Splicing | 16 | mutations that influence mRNA splicing
Regulatory | 0 | altered regulation caused by substitutions
Small deletions | 26 | micro-deletions (< 21 bp)
Small insertions | 13 | micro-insertions (< 21 bp)
Small indels | 4 | micro-indels (< 21 bp)
Gross deletions | 3 | deletions (> 20 bp)
Gross insertions/duplications | 1 | insertions (> 20 bp)
Complex rearrangements | 16 | rearrangements of DNA fragments within the sequence
Repeat variations | 0 | changes in the number of repeats
Total | 335 |
Mutations found for the gene symbol GBA in the HGMD database.
</figtable>
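The total in Table 1 can be re-derived from the per-category counts. The following is a minimal Python sketch; the counts are copied from the table, while the variable names are chosen here purely for illustration.

```python
# Mutation counts per category, copied from Table 1 (HGMD public, gene GBA).
hgmd_counts = {
    "Missense/nonsense": 256,
    "Splicing": 16,
    "Regulatory": 0,
    "Small deletions": 26,
    "Small insertions": 13,
    "Small indels": 4,
    "Gross deletions": 3,
    "Gross insertions/duplications": 1,
    "Complex rearrangements": 16,
    "Repeat variations": 0,
}

# Sum over all categories; this should reproduce the "Total" row.
total = sum(hgmd_counts.values())

# Categories for which no GBA mutation is recorded.
absent = [name for name, n in hgmd_counts.items() if n == 0]

# Fraction of all recorded mutations that are missense/nonsense.
missense_share = hgmd_counts["Missense/nonsense"] / total

print(total)
print(absent)
print(f"{missense_share:.1%}")
```

The sum agrees with the total of 335 given in the table, and roughly three quarters of the recorded mutations are missense/nonsense substitutions.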


No regulatory mutations or repeat variations are recorded for GBA. Mutations in GBA do not only cause Gaucher's disease or influence its phenotype, but may also affect the phenotype of Parkinson's disease and Alzheimer's disease. HGMD public (2008 update) lists 335 mutations for GBA (Table 1). HGMD Professional 2013.1 lists 380 mutations.

The following table shows 10 of the 256 missense/nonsense mutations known for GBA (<xr id="hgmd"/>) in the public version of HGMD. The large number of mutations shows how susceptible GBA is to Gaucher-causing mutations. Some sequence positions appear to be more susceptible than others, as more than one missense/nonsense mutation occurs at the same position (for example at codon 54 in <xr id="miss"/>).

<figtable id="miss">

Table 2: Missense/nonsense mutations on GBA
Accession number | Codon change | Amino acid change | Codon number
CM081634 | cGGC-AGC | Gly-Ser | 49
CM057078 | AGC-ATC | Ser-Ile | 51
CM044630 | gGTG-ATG | Val-Met | 54
CM960691 | gGTG-CTG | Val-Leu | 54
HM971738 | TGT-TCT | Cys-Ser | 55
CM081630 | AGT-AAT | Ser-Asn | 81
CM950560 | ACA-ATA | Thr-Ile | 82
CM960692 | GGG-GAG | Gly-Glu | 85
CM016030 | gCGA-TGA | Arg-Term | 86
CM950561 | aCGG-TGG | Arg-Trp | 87
First 10 missense/nonsense mutations (out of 256) listed in HGMD for GBA.
</figtable>
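Positions that carry more than one mutation can be found by counting the entries per codon number. The sketch below does this for the ten rows of Table 2; the tuple layout and variable names are assumptions made for this illustration and are not an HGMD export format.

```python
from collections import Counter

# (accession, codon change, amino acid change, codon number), copied from Table 2.
table2 = [
    ("CM081634", "cGGC-AGC", "Gly-Ser", 49),
    ("CM057078", "AGC-ATC", "Ser-Ile", 51),
    ("CM044630", "gGTG-ATG", "Val-Met", 54),
    ("CM960691", "gGTG-CTG", "Val-Leu", 54),
    ("HM971738", "TGT-TCT", "Cys-Ser", 55),
    ("CM081630", "AGT-AAT", "Ser-Asn", 81),
    ("CM950560", "ACA-ATA", "Thr-Ile", 82),
    ("CM960692", "GGG-GAG", "Gly-Glu", 85),
    ("CM016030", "gCGA-TGA", "Arg-Term", 86),
    ("CM950561", "aCGG-TGG", "Arg-Trp", 87),
]

# Count how many listed mutations fall on each codon number.
per_codon = Counter(codon for *_, codon in table2)

# Keep only the codons hit by more than one mutation.
recurrent = {codon: n for codon, n in per_codon.items() if n > 1}

print(recurrent)
```

For these ten rows only codon 54 is reported, with two different substitutions. Applied to all 256 entries, the same count would list every recurrently mutated position.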



The data currently available in dbSNP (version 138) was released on April 25, 2013. It provides mutation information for three organisms: Homo sapiens, Mus musculus and Bos taurus. The database contains single-nucleotide substitutions as well as short deletion and insertion polymorphisms. For the human gene GBA, 432 mutations were found in the database, 59 of which are silent. Some selected silent mutations are listed in <xr id="dbsnp"/>.

<figtable id="dbsnp">

Table 3: Silent mutations on GBA
mRNA allele change | mRNA sequence position | Protein residue | Protein codon position | refNumber
TCT ⇒ TCC | 1937 | Ser | 504 | rs141710041
CCC ⇒ CCT | 646, 872, 1036 | Pro | 161, 123, 210 | rs201615998
TTG ⇒ CTG | 245 | Leu | 28 | rs201330214
Randomly chosen silent mutations of GBA, with accession numbers NM_001005741.2 (mRNA) and NP_001005741.1 (protein), from dbSNP.
</figtable>
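Whether a codon change is silent can be checked by translating both codons with the standard genetic code. The following Python sketch does this for the three allele changes in the table above and, for comparison, for two HGMD codon changes; the function name `classify` is hypothetical and the approach is only an illustration, not the procedure used by dbSNP.

```python
from itertools import product

# Standard genetic code, written in the usual T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(ref_codon: str, alt_codon: str) -> str:
    """Classify a single codon change by comparing the encoded residues."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# Allele changes from the dbSNP table of silent mutations.
dbsnp_changes = {
    "rs141710041": ("TCT", "TCC"),
    "rs201615998": ("CCC", "CCT"),
    "rs201330214": ("TTG", "CTG"),
}
dbsnp_classes = {rs: classify(ref, alt) for rs, (ref, alt) in dbsnp_changes.items()}
print(dbsnp_classes)

# Two codon changes from the HGMD table, for comparison.
print(classify("GTG", "ATG"))  # Val to Met
print(classify("CGA", "TGA"))  # Arg to stop
```

All three dbSNP entries translate to the same residue before and after the change, whereas the two HGMD examples are classified as missense and nonsense.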


"SNP Geneview Report" of the "gene" search of the GBA human gene: [[1]].


GBA on SNPedia

5 SNPs were found


  • The following information (if available) is given on each SAAS (single amino acid substitution):
    • Experimentally derived functional and structural impact
    • Predicted functional effect
    • Associated disease
    • Average heterozygosity
    • Experimental evidence of the nsSNP
    • Evolutionary conservation of wildtype and mutant amino acid
    • Link-outs to external databases
  • Last update: 2012-02-20 (updated to recent Swiss-Prot release (2012-01))
  • The information comes from various databases (SwissProt, PMD, dbSNP, 1000 Genomes) and SNP effect is predicted with SIFT and SNAP
  • Currently (2013-06-23) 159142 protein sequences from 2985 organisms are covered in SNPdbe and 1691464 SAASs are referenced, consisting of natural variants, SAASs from mutagenesis experiments and sequencing conflicts
Table 4: Human variants
Effect | Number | Percent
Observed functional effect | 23121 | 2%
Disease associations | 26842 | 3%
Observed functional effect and disease | 1629 | 0.17%
Overall | 967879 | 100%
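The percentages in Table 4 follow directly from the counts. A short Python sketch of the arithmetic, with the numbers copied from the table and the variable names chosen for illustration:

```python
# Counts of human variants in SNPdbe, copied from Table 4.
overall = 967879
human_variants = {
    "Observed functional effect": 23121,
    "Disease associations": 26842,
    "Observed functional effect and disease": 1629,
}

# Share of each category among all human variants, in percent.
percent = {effect: 100 * n / overall for effect, n in human_variants.items()}

for effect, value in percent.items():
    print(f"{effect}: {value:.2f}%")
```

Rounded, these shares match the 2%, 3% and 0.17% given in the table, so only a small fraction of the human variants carries an observed effect or a disease association.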

A search for Gaucher disease returned 174 entries, 147 of them for the human gene GBA and the protein P04062. Note: as some of the mutations occur at the same position, the subset of 147 covers only 121 distinct mutated positions.

Most SNPs are not validated:

Table 5: Experimental evidence
Number of mutations | Evidence type
120 | Not validated
20 | by cluster
3 | by cluster,freq
2 | 1000Genome,freq,cluster
1 | by freq
1 | HapMap,freq,cluster
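As a quick consistency check, the evidence counts in Table 5 can be summed and compared with the 147 GBA entries found in SNPdbe. The Python sketch below also computes the share of non-validated entries; the variable names are illustrative only.

```python
# Number of GBA mutations per evidence type, copied from Table 5.
evidence = {
    "Not validated": 120,
    "by cluster": 20,
    "by cluster,freq": 3,
    "1000Genome,freq,cluster": 2,
    "by freq": 1,
    "HapMap,freq,cluster": 1,
}

# The sum should equal the number of GBA entries found in SNPdbe.
total_entries = sum(evidence.values())

# Share of entries without any experimental validation.
not_validated_share = evidence["Not validated"] / total_entries

print(total_entries)
print(f"{not_validated_share:.1%}")
```

The counts add up to 147, and about four out of five entries have no experimental validation.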

Conservation score: the likelihood of observing either the wt (wild type; green bar) or the mt (mutant; red bar) residue at a given position in the sequence. The longer the bar, the higher the likelihood. The conservation score can serve as a first, simple estimate of whether a mutation has an effect (is disease-causing) or no effect. There are three types of conservation scores: Pssm, Perc and Psic. The length of the bars correlates directly with the Perc scores, but it is difficult to find a threshold that separates effect from no effect. In contrast, a Pssm mt score below 0 combined with a Pssm wt score above 0 means that the mutant residue occurs very rarely compared with the wild-type residue, which suggests that the mutation is deleterious (has a negative effect). With this criterion we obtained 46 no-effect and 101 effect mutations.
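The Pssm criterion described above can be written as a small decision rule. The following Python sketch is only an illustration: the function name `predict_effect` and the example scores are hypothetical and are not taken from SNPdbe.

```python
def predict_effect(pssm_wt: float, pssm_mt: float) -> str:
    """Apply the rule from the text: a mutant Pssm score below 0 together
    with a wild-type Pssm score above 0 is taken as 'effect'."""
    if pssm_mt < 0 and pssm_wt > 0:
        return "effect"
    return "no-effect"


# Hypothetical (wild-type score, mutant score) pairs for illustration.
examples = [
    (5.0, -3.0),   # mutant rarely observed, wild type conserved
    (2.0, 1.0),    # mutant also observed at this position
    (-1.0, -2.0),  # wild type itself not conserved
]

predictions = [predict_effect(wt, mt) for wt, mt in examples]
print(predictions)
```

Only the first pair satisfies both conditions. Every other combination, including a score of exactly 0, falls into the no-effect class in this sketch, which is a design choice made here rather than a rule stated by SNPdbe.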


  • updated daily
  • 21 844 entries (updated 11 June 2013)
  • gene entries → allelic variants (only selected mutations)
  • disease entries
  • relationship between genotype and phenotype (diseases)

For the GBA gene, 48 mutations are listed in OMIM (41 of them are cross-referenced to dbSNP), which is a small number in comparison to the other databases. See the OMIM table view of GBA allelic variants.

SNP Databases Summary


Table 6: SNP Databases
Information | HGMD | dbSNP | SNPedia | SNPdbe | OMIM
Type of information | Only non-silent mutations in the coding region: disease-causing/associated mutations, disease-associated/functional polymorphisms. Mutation type/number/effect, associated disease phenotype, chromosomal location, gene symbol, reference of its first literature report | Mutation information for Homo sapiens, Mus musculus and Bos taurus: SNPs, short deletion and insertion polymorphisms | ... | For each SNP: experimentally derived functional and structural impact, predicted functional effect, associated disease, average heterozygosity, experimental evidence of the nsSNP, evolutionary conservation of wildtype and mutant amino acid, link-outs to external databases | Gene entries: information about mutations from GWAS, allelic variants (only selected examples), disease entries, relationship between genotype and phenotype (diseases)
Source of information | Literature, reported functional studies | ... | ... | Various databases (SwissProt, PMD, dbSNP, 1000 Genomes), SNP effect is predicted with SIFT and SNAP | Full-text, referenced overviews
Last update | Professional version: March 2013, free public version: from professional release 2008 | April 25, 2013 | ... | February 20, 2012 | updated daily
GBA mutations | Total: 335, of them missense/nonsense SNPs: 256 | Total: 432, of them silent SNPs: 59 | 5 SNPs | Total: 147 SNPs (46 no-effect, 101 effect) | At least 48 deleterious SNPs (in "allelic variants" table)

Mutation map

We extracted information about the point mutations in GBA, the protein affected in Gaucher disease, from three of the databases described above: HGMD, dbSNP and SNPdbe. For dbSNP, the contig NT_004487.19 was used. We mapped the mutations to their positions on the reference protein sequence (NP_001005741.1, identical to the UniProt entry P04062) and combined the information about identical mutations. Three different types of mutations were extracted:

  • missense, which lead to an amino acid exchange (from all databases)
  • nonsense, which lead to a premature termination of translation and a truncated protein (only from HGMD)
  • synonymous, which lead to a different codon encoding the same amino acid (only from dbSNP)

As the next step, the mutations were classified into

  • pathogenic (i.e. Gaucher disease causing) and
  • not pathogenic (non-disease causing).

All mutations from HGMD are pathogenic. dbSNP provides no such annotation. For SNPdbe, we decided whether a mutation is pathogenic according to the conservation scores, as described above. When the databases were combined, a mutation with at least one annotation as Gaucher disease causing was classified as pathogenic, otherwise as not pathogenic.
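The combination rule described above (pathogenic as soon as one source says so) can be sketched as follows. This is not the script used for the analysis: the record layout, the function name `merge_annotations` and the example records are assumptions made for this illustration, and the positions are not mapped to the reference numbering.

```python
from collections import defaultdict


def merge_annotations(records):
    """Merge identical mutations reported by several databases.

    Each record is (position, wild type, mutant, source, pathogenic);
    pathogenic may be None when the source gives no annotation.
    A merged mutation is pathogenic if at least one record says so.
    """
    merged = defaultdict(lambda: {"sources": set(), "pathogenic": False})
    for position, wt, mt, source, pathogenic in records:
        entry = merged[(position, wt, mt)]
        entry["sources"].add(source)
        if pathogenic:
            entry["pathogenic"] = True
    return dict(merged)


# Illustrative records only; the SNPdbe annotations here are hypothetical.
records = [
    (54, "V", "M", "HGMD", True),      # HGMD entries count as pathogenic
    (54, "V", "M", "SNPdbe", False),   # same mutation, predicted no-effect
    (300, "A", "T", "SNPdbe", False),  # hypothetical no-effect mutation
    (28, "L", "L", "dbSNP", None),     # synonymous, no annotation in dbSNP
]

merged = merge_annotations(records)
n_pathogenic = sum(entry["pathogenic"] for entry in merged.values())
print(len(merged), n_pathogenic)
```

In this toy input the four records collapse to three distinct mutations, one of which is pathogenic because its HGMD annotation overrides the no-effect prediction.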

Altogether, 366 point mutations were collected for our protein sequence, 246 of them disease-causing and 120 non-disease-causing. Finally, we plotted the mutations by class and type using R.

As can be seen in the map, there are some mutation hot-spots, i.e. mutations concentrated in some area of the sequence. For example, around the positions 150 and 200-250 there are mutation hot-spots. The longest hot-spot lies in the area between the positions 400 and 450.
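Hot-spots of this kind can also be located numerically by counting mutations in fixed-size windows along the sequence. The Python sketch below illustrates the idea; the mutated positions are hypothetical placeholders rather than the 366 collected mutations, and the sequence length of 536 residues and the window size of 50 are assumptions of this example.

```python
def window_counts(positions, seq_len, window=50):
    """Count mutated positions in consecutive, non-overlapping windows.

    Returns a dict mapping (start, end) with 1-based inclusive bounds
    to the number of positions that fall inside that window.
    """
    counts = {}
    for start in range(1, seq_len + 1, window):
        end = min(start + window - 1, seq_len)
        counts[(start, end)] = sum(start <= p <= end for p in positions)
    return counts


# Hypothetical mutated positions, for illustration only.
positions = [12, 148, 150, 152, 155, 210, 230, 401, 410, 415, 420, 433, 447]

# Assumed length of the reference protein sequence.
counts = window_counts(positions, seq_len=536, window=50)

# Window containing the most mutations.
densest = max(counts, key=counts.get)
print(densest, counts[densest])
```

With the real position list, the windows with the highest counts should correspond to the hot-spots visible in the map. Overlapping (sliding) windows would give a smoother picture at the cost of slightly more code.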


Adrienne Kitts and Stephen Sherry. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. Chapter 5 in: The NCBI Handbook. Created: October 9, 2002; Last Update: February 2, 2011

Christian Schaefer, Alice Meier, Burkhard Rost, Yana Bromberg (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics 28(4):601-602.

OMIM FAQs report about GBA