Gaucher Disease: Task 07 - Research SNPs
HGMD
HGMD Professional was last updated in March 2013. The free public version of HGMD provides the data from the 2008 release of HGMD Professional. The database records the first reported example of each disease-causing or disease-associated mutation as well as disease-associated/functional polymorphisms. The information is taken from the literature or from reported functional studies. Only non-silent mutations in the coding region are considered, and each mutation is stored only once. In addition to the information shown in the second table below, each entry includes the associated disease phenotype, the chromosomal location, the gene symbol and the reference to the first literature report. HGMD Professional also provides a mutation viewer. More than 250 journals are screened by a combination of computerized and manual procedures to find newly reported germline mutations that cause human genetic diseases.
Glucocerebrosidase has the gene symbol GBA and the accession number NM_001005741.2. A search of the Human Gene Mutation Database for GBA yields the results in <xr id="hgmd"/>.
<figtable id="hgmd">
Table 1: GBA in HGMD | ||
---|---|---|
Mutation Type | Number of Mutations | Effect of Mutation |
Missense/nonsense | 256 | single base-pair substitutions in coding regions that cause an amino acid change or a stop/start codon change |
Splicing | 16 | mutations that influence mRNA splicing |
Regulatory | 0 | altered regulation caused by a substitution |
Small deletions | 26 | micro-deletions (< 21 bp) |
Small insertions | 13 | micro-insertions (<21 bp) |
Small indels | 4 | micro-indels (<21 bp) |
Gross deletions | 3 | deletion (>20 bp) |
Gross insertions/duplications | 1 | insertion (>20 bp) |
Complex rearrangements | 16 | rearrangement of DNA fragments within the sequence |
Repeat variations | 0 | different number of repeats |
Total | 335 |
</figtable>
No regulatory mutations or repeat variations are recorded for GBA. Mutations in GBA not only cause Gaucher disease or influence its phenotype, but may also affect the phenotype of Parkinson's disease and Alzheimer's disease. HGMD public (2008 update) lists 335 mutations for GBA (Table 1). HGMD Professional 2013.1 lists 380 mutations.
The following table lists 10 of the 256 missense/nonsense mutations of GBA (<xr id="hgmd"/>) recorded in the public version of HGMD. The large number of mutations shows how susceptible GBA is to Gaucher-causing mutations. Some sequence positions appear particularly susceptible, as more than one missense/nonsense mutation is reported for them (for example position 54 in <xr id="miss"/>).
<figtable id="miss">
Table 2: Missense/nonsense mutations on GBA | |||
---|---|---|---|
Accession Number | Codon change | Amino acid change | Codon Number |
CM081634 | cGGC-AGC | Gly-Ser | 49 |
CM057078 | AGC-ATC | Ser-Ile | 51 |
CM044630 | gGTG-ATG | Val-Met | 54 |
CM960691 | gGTG-CTG | Val-Leu | 54 |
HM971738 | TGT-TCT | Cys-Ser | 55 |
CM081630 | AGT-AAT | Ser-Asn | 81 |
CM950560 | ACA-ATA | Thr-Ile | 82 |
CM960692 | GGG-GAG | Gly-Glu | 85 |
CM016030 | gCGA-TGA | Arg-Term | 86 |
CM950561 | aCGG-TGG | Arg-Trp | 87 |
</figtable>
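The observation that some codons carry more than one reported substitution can be checked by grouping the entries by codon number. The following Python sketch does this for the ten rows of Table 2 only (not for all 256 HGMD entries); the function and variable names are illustrative and are not part of HGMD.

```python
from collections import defaultdict

# Rows of Table 2: (HGMD accession, amino acid change, codon number)
TABLE_2 = [
    ("CM081634", "Gly-Ser", 49),
    ("CM057078", "Ser-Ile", 51),
    ("CM044630", "Val-Met", 54),
    ("CM960691", "Val-Leu", 54),
    ("HM971738", "Cys-Ser", 55),
    ("CM081630", "Ser-Asn", 81),
    ("CM950560", "Thr-Ile", 82),
    ("CM960692", "Gly-Glu", 85),
    ("CM016030", "Arg-Term", 86),
    ("CM950561", "Arg-Trp", 87),
]


def recurrent_codons(rows):
    """Return {codon number: [amino acid changes]} for codons hit more than once."""
    by_codon = defaultdict(list)
    for _accession, change, codon in rows:
        by_codon[codon].append(change)
    return {codon: changes for codon, changes in by_codon.items() if len(changes) > 1}


print(recurrent_codons(TABLE_2))  # {54: ['Val-Met', 'Val-Leu']}
```

Applied to the complete HGMD list instead of this excerpt, the same grouping would give the number of reported substitutions per codon.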
dbSNP
The dbSNP data used here (version 138) were released on April 25, 2013. It provides mutation information for three organisms: Homo sapiens, Mus musculus and Bos taurus. The database contains single-nucleotide substitutions as well as short deletion and insertion polymorphisms. For the human gene GBA, 432 mutations were found in the database, 59 of which are silent. Some selected silent mutations are listed in <xr id="dbsnp"/>.
<figtable id="dbsnp">
Table 3: Silent mutations on GBA | ||||
---|---|---|---|---|
Allele change (mRNA) | Sequence position (mRNA) | Residue (protein) | Codon position (protein) | refNumber |
TCT ⇒ TCC | 1937 | Ser | 504 | rs141710041 |
CCC ⇒ CCT | 646, 872, 1036 | Pro | 161, 123, 210 | rs201615998 |
TTG ⇒ CTG | 245 | Leu | 28 | rs201330214 |
</figtable>
"SNP Geneview Report" of the "gene" search of the GBA human gene: [[1]].
SNPedia
Five SNPs were found for GBA.
SNPdbe
- The following information (if available) is given on each SAAS (single amino acid substitution):
  - Experimentally derived functional and structural impact
  - Predicted functional effect
  - Associated disease
  - Average heterozygosity
  - Experimental evidence of the nsSNP
  - Evolutionary conservation of wildtype and mutant amino acid
  - Link-outs to external databases
- Last update: 2012-02-20 (updated to recent Swiss-Prot release (2012-01))
- The information comes from various databases (SwissProt, PMD, dbSNP, 1000 Genomes) and SNP effect is predicted with SIFT and SNAP
- Currently (2013-06-23) 159142 protein sequences from 2985 organisms are covered in SNPdbe and 1691464 SAASs are referenced, consisting of natural variants, SAASs from mutagenesis experiments and sequencing conflicts
Table 4: Human variants | ||
---|---|---|
Effect | Number | Percent |
Observed functional effect | 23121 | 2% |
Disease associations | 26842 | 3% |
Observed functional effect and disease | 1629 | 0.17% |
Overall | 967879 | 100% |
A search for Gaucher disease returned 174 entries, 147 of them for the human gene GBA (protein P04062). Note: as some of the mutations occur at the same position, the subset of 147 covers only 121 distinct mutated positions.
Most SNPs are not validated:
Table 5: Experimental evidence | |
---|---|
Number of mutations | Evidence type |
120 | Not validated |
20 | by cluster |
3 | by cluster,freq |
2 | 1000Genome,freq,cluster |
1 | by freq |
1 | HapMap,freq,cluster |
Conservation score: the likelihood of observing either the wt (wild type; green bar) or the mt (mutant; red bar) residue at a given position in the sequence. The longer the bar, the higher the likelihood. The conservation score can be used as a first, simple estimate of whether a mutation has an effect (is disease causing) or not. There are three types of conservation scores: Pssm, Perc and Psic. The length of the bars correlates directly with the Perc scores; however, it is difficult to find a threshold that discriminates effect from no effect. In contrast, a Pssm mt score below 0 combined with a Pssm wt score above 0 implies that the mutant residue occurs very rarely compared with the wild-type residue, which suggests that the mutation is deleterious (causes a negative effect). With this rule we obtained 101 effect and 46 no-effect mutations.
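The Pssm rule described above can be written as a short function. This is a minimal Python sketch under stated assumptions: the function name `pssm_effect` and the example score pairs are hypothetical and are not taken from SNPdbe. Only the thresholds (mt below 0, wt above 0) come from the text.

```python
def pssm_effect(pssm_wt: float, pssm_mt: float) -> str:
    """Apply the rule from the text: mutant Pssm < 0 and wild-type Pssm > 0 means 'effect'."""
    if pssm_mt < 0 and pssm_wt > 0:
        return "effect"
    return "no-effect"


# Made-up (wt, mt) score pairs, used only to exercise the rule.
examples = {
    "variant_A": (5, -3),   # wt favoured, mt disfavoured
    "variant_B": (2, 1),    # both residues tolerated
    "variant_C": (-1, -4),  # wt itself not conserved
}

for name, (wt, mt) in examples.items():
    print(name, pssm_effect(wt, mt))
```

A score of exactly 0 on either side is treated as no-effect here, because the text only names values strictly below and strictly above 0.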
OMIM
- updated daily
- 21 844 entries (updated 11 June 2013)
- gene entries → allelic variants (only selected mutations)
- disease entries
- relationship between genotype and phenotype (diseases)
For the GBA gene, 48 mutations are found in OMIM (41 of them are cross-referenced to dbSNP), which is a small number compared with HGMD, dbSNP and SNPdbe. They are listed in the OMIM table view of GBA allelic variants.
SNP Databases Summary
under construction
Table 6: SNP Databases | |||||
---|---|---|---|---|---|
Information | HGMD | dbSNP | SNPedia | SNPdbe | OMIM |
Type of information | Only non-silent mutations in the coding region: disease-causing/associated mutations, disease-associated/functional polymorphisms. Mutation type/number/effect, associated disease phenotype, chromosomal location, gene symbol, reference of its first literature report | mutation information for Homo sapiens, Mus musculus and Bos taurus: SNPs, short deletion and insertion polymorphisms | ... | For each SNP: experimentally derived functional and structural impact, predicted functional effect, associated disease, average heterozygosity, experimental evidence of the nsSNP, evolutionary conservation of wildtype and mutant amino acid, link-outs to external databases | Gene entries - information about mutations from GWAs, allelic variants (only selected examples), disease entries, relationship between genotype and phenotype (diseases) |
Source of information | Literature, reported functional studies | ... | ... | Various databases (SwissProt, PMD, dbSNP, 1000 Genomes), SNP effect is predicted with SIFT and SNAP | Full-text, referenced overviews |
Last update | Professional version: March 2013; free public version: based on the 2008 Professional release | April 25, 2013 | ... | February 20, 2012 | updated daily |
GBA mutations | Total: 335, of which 256 are missense/nonsense SNPs | Total: 432, of which 59 are silent SNPs | 5 SNPs | Total: 147 SNPs (46 no-effect, 101 effect) | At least 48 deleterious SNPs (in "allelic variants" table) |
Mutation map
We extracted information about point mutations in our Gaucher disease protein GBA from three of the databases described above: HGMD, dbSNP and SNPdbe. For dbSNP, the contig NT_004487.19 was used. We mapped the mutations to their locations on the reference protein sequence (NP_001005741.1, identical to the UniProt entry P04062) and combined the information about identical mutations. Three different types of mutations were extracted:
- missense, which leads to an amino acid exchange (from all databases)
- nonsense, which leads to a premature termination of translation and a truncated protein (only from HGMD)
- synonymous, which changes the codon but not the encoded amino acid (only from dbSNP)
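The three mutation types can be told apart by translating the reference and the mutated codon with the standard genetic code. The following Python sketch illustrates this under stated assumptions: the function name `mutation_type` is hypothetical, the codon pairs are taken from Tables 2 and 3 (the lowercase letter that precedes some codons in Table 2 is omitted), and changes that remove a stop codon are not handled separately.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mutation_type(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, nonsense or missense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"  # any other amino acid exchange


print(mutation_type("GTG", "ATG"))  # Val-Met, codon 54 in Table 2
print(mutation_type("CGA", "TGA"))  # Arg-Term, codon 86 in Table 2
print(mutation_type("TCT", "TCC"))  # Ser, rs141710041 in Table 3
```

The three calls print `missense`, `nonsense` and `synonymous`, in agreement with the amino acid changes listed in the tables.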
As the next step, the mutations were classified into
- pathogenic (i.e. Gaucher disease causing) and
- not pathogenic (non-disease causing).
All mutations from HGMD are pathogenic. In dbSNP, no such annotation is found. In SNPdbe we decided whether a mutation is pathogenic or not according to the conservation scores, as described above. If a mutation had at least one annotation that it causes Gaucher disease, we classified it as pathogenic, otherwise as not pathogenic.
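The "at least one annotation" rule for combining identical mutations from several databases can be sketched as follows. This Python example is only an illustration: the record layout, the function name `merge_pathogenicity` and all entries are hypothetical, and `None` stands for a record without a pathogenicity annotation (as for dbSNP).

```python
from collections import defaultdict

# Hypothetical records: (database, mutation label, pathogenic flag or None)
records = [
    ("HGMD", "mutation_1", True),     # HGMD entries are taken as pathogenic
    ("SNPdbe", "mutation_1", False),  # conflicting call, overridden by HGMD
    ("dbSNP", "mutation_2", None),    # dbSNP carries no such annotation
    ("SNPdbe", "mutation_2", False),
    ("SNPdbe", "mutation_3", True),
]


def merge_pathogenicity(rows):
    """Label a mutation pathogenic if any database record annotates it as such."""
    flags = defaultdict(list)
    for _database, mutation, flag in rows:
        flags[mutation].append(flag)
    return {
        mutation: "pathogenic" if any(f is True for f in fs) else "not pathogenic"
        for mutation, fs in flags.items()
    }


for mutation, label in merge_pathogenicity(records).items():
    print(mutation, label)
```

With this rule a single pathogenic annotation outweighs any number of non-pathogenic or missing ones, so mutations seen only in dbSNP always end up as not pathogenic.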
Altogether, 366 point mutations were collected for our protein sequence, 246 of them disease causing and 120 non-disease causing. Finally, we plotted the mutations according to their classes and types using R (see Figure ...). Are there mutation hot-spots?