Task 5 - Mapping SNPs Canavan
First impression
Protocol
Further information can be found in the protocol.
Sources for mutations
HGMD
The aim of the Human Gene Mutation Database (HGMD) is to collect known gene variants that are associated with human inherited diseases. All listed mutations are taken from published results, which serve as their validation. The database is manually curated and relies on the information given by the authors in their publications. The HGMD website states that "Many published mutation searches identify more than one genetic change in a single patient. In such cases, the relationship between a given lesion and the clinical phenotype has not always been immediately clear,[..]. The possibility of unintentional inclusion of some lesions with little or no pathological significance can therefore not be ruled out." Listed mutations are therefore not necessarily causal for a disease.
For Canavan Disease there are 74 mutations listed (79 in the 2012 professional version).
SNPdbe
The nsSNP database of functional effects (SNPdbe) collects information on SNPs from a variety of accessible web services and additionally aims at providing a functional annotation. For each variant, SNPdbe lists its predicted effect (SIFT, SNAP) as well as experimentally derived functional effects.
For Aspartoacylase (P45381 and NP_000040), there are 55 results in total:
- 29 labelled as involved in Canavan Disease
- 12 with experimental evidence
dbSNP
The Single Nucleotide Polymorphism database (dbSNP) aims at being a "central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms". It also lists synonymous variants, which are not listed in any of the other web resources.
For the gene "aspa" there are 508 SNPs annotated in the whole gene region (including 5' and 3' flanking regions):
- 458 for NP_000040.1 (coding SNPs: 35)
- 493 for NP_001121557.1 (coding SNPs: 35)
- "synonymous-codon"[Function_Class] AND ASPA[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS] yields only 9 results
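The field-tagged query in the last bullet can be assembled programmatically. The following is a minimal sketch: the function name `build_dbsnp_query` is hypothetical, the field tags are copied from the query above, and nothing here contacts dbSNP or checks whether the tags are still accepted.

```python
def build_dbsnp_query(gene, function_class=None, organism="human", snp_class="snp"):
    """Assemble a field-tagged search term like the one used above.

    Only builds the string; running the search is left to the reader.
    """
    parts = []
    if function_class is not None:
        parts.append(f'"{function_class}"[Function_Class]')
    parts.append(f"{gene}[GENE]")
    parts.append(f'"{organism}"[ORGN]')
    parts.append(f'"{snp_class}"[SNP_CLASS]')
    return " AND ".join(parts)


# Reproduces the query for synonymous ASPA variants.
synonymous_query = build_dbsnp_query("ASPA", function_class="synonymous-codon")
# Without a function class the term covers all SNPs of the gene.
all_snps_query = build_dbsnp_query("ASPA")
print(synonymous_query)
```

Leaving out `function_class` gives the broader gene-level term, which makes it easy to compare result counts between function classes.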
[http://www.snpedia.com/index.php/SNPedia SNPedia]
SNPedia is organized as a wiki and collects information on human genetics from other web resources, citing peer-reviewed scientific publications. One can browse for genes, genomes, phenotypes and even medical substances.
For the gene "ASPA" it lists nine variants, three of which are associated with Canavan disease.
OMIM
OMIM stands for Online Mendelian Inheritance in Man and represents a "compendium of human genes and genetic phenotypes". It provides extensive information on human Mendelian diseases and their associated genes, all of which is based on referenced publications. The database is updated daily and also links to other genetic resources.
For Aspartoacylase there are 12 allelic variants listed, which are all linked to Canavan Disease.
Coding SNPs
<figure id="venny">
</figure>
From the above-mentioned SNP databases we extracted all coding SNPs for Aspartoacylase. The Venn diagram in <xr id="venny"/> shows the overlap of the SNPs contained in the three largest database sources.
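The region sizes of such a three-set Venn diagram follow from plain set operations. The sketch below is illustrative only: the helper name `venn3_regions` is hypothetical, the choice of dbSNP, SNPdbe and HGMD as the three sets is an assumption, and the example sets contain just five mutations taken from the table further down rather than the full data.

```python
def venn3_regions(a, b, c):
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b": len((a & b) - c),   # shared by a and b, absent from c
        "a_c": len((a & c) - b),
        "b_c": len((b & c) - a),
        "a_b_c": len(a & b & c),   # present in all three sources
    }


# Small illustrative subsets, not the complete database content.
dbsnp = {"C4R", "E24G", "T26T"}
snpdbe = {"C4R", "I16T", "E24G"}
hgmd = {"V14G", "I16T", "E24G"}

regions = venn3_regions(dbsnp, snpdbe, hgmd)
print(regions)
```

The seven region sizes always add up to the size of the union, which is a quick sanity check when the sets are filled from real exports.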
We used the sequence position of the listed SNPs as an identifier to create a unique list of known SNPs. <xr id="coding_snp_table"/> shows the resulting list of unique SNPs. For each polymorphism, the source is given (i.e., the database from which the SNP was extracted), as well as the annotated validation evidence, if available.
The resulting table contains 79 SNPs in total, 48 of which are reported to cause Canavan disease.
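One way to build such a unique list is to merge the per-database records by position and amino acid exchange. This is a sketch under assumptions, not the procedure actually used: the function name `merge_snps` and the record layout `(database, position, mutation)` are hypothetical, and the `_2`, `_3` suffixes for further exchanges at the same position are modelled on the row labels of the table.

```python
def merge_snps(records):
    """Merge (database, position, mutation) records into unique table rows.

    Returns a list of (row_label, mutation, databases) tuples sorted by position.
    """
    merged = {}
    # sorted() is stable, so records at the same position keep their input order.
    for db, pos, mut in sorted(records, key=lambda r: r[1]):
        sources = merged.setdefault((pos, mut), [])
        if db not in sources:
            sources.append(db)

    rows = []
    seen_at_pos = {}
    for (pos, mut), sources in merged.items():
        n = seen_at_pos.get(pos, 0) + 1
        seen_at_pos[pos] = n
        # First exchange at a position gets the bare position, later ones a suffix.
        label = str(pos) if n == 1 else f"{pos}_{n}"
        rows.append((label, mut, sources))
    return rows


# A few records taken from the table, for illustration only.
records = [
    ("dbSNP", 71, "R71H"), ("SNPdbe", 71, "R71H"), ("HGMD", 71, "R71H"),
    ("SNPdbe", 71, "R71K"),
    ("HGMD", 14, "V14G"),
    ("SNPdbe", 16, "I16T"), ("HGMD", 16, "I16T"),
]
rows = merge_snps(records)
for row in rows:
    print(row)
```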
<figtable id="coding_snp_table">
Residue Position | Identifier | Reference DB | Validation evidence | SNP Type | Mutation |
4 | rs142041344 | dbSNP SNPdbe | 1000 Genome project | missense | C4R |
14 | CM063852 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | V14G |
16 | CM960084 | SNPdbe HGMD | Kaul (1996) Am J Hum Genet 59 | missense | I16T |
18 | CM067343 | HGMD | Zeng (2006) Adv Exp Med Biol 576 | missense | G18R |
21 | CM001608 | SNPdbe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | H21P |
24 | rs104894551 CM023602 | dbSNP SNPdbe OMIM HGMD | Multiple independent submissions to the refSNP cluster Zeng (2002) J Inherit Metab Dis 25 | missense | E24G |
26 | rs145616193 | dbSNP | | synonymous | T26T |
27 | CM960085 | SNPdbe HGMD | Kaul (1996) Am J Hum Genet 59 | missense | G27R |
33 | rs138158568 | dbSNP SNPdbe | | missense | H33R |
53 | rs17850703 | dbSNP SNPdbe | | missense | T53A |
57 | CM001609 | SNPdbe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | A57T |
68 | CM023603 | SNPdbe HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | D68A |
71 | rs104894553 CM060201 | dbSNP SNPdbe HGMD | Multiple independent submissions to the refSNP cluster Janson (2006) Ann Neurol 59 | missense | R71H |
71_2 | | SNPdbe | | missense | R71K |
82 | rs80099330 | dbSNP SNPdbe | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | missense | M82T |
93 | rs144639820 | dbSNP | Validated by frequency or genotype data | synonymous | A93A |
109 | CM990192 | HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | nonsense | Y109Ter |
111 | rs181347986 | dbSNP SNPdbe | 1000 Genome project | missense | I111V |
114 | CM023014 | SNPdbe HGMD | Olsen (2002) J Med Genet 39 | missense | D114Y |
114_2 | CM960086 | HGMD | Kaul (1996) Am J Hum Genet 59 | missense | D114E |
121 | rs148451498 | dbSNP SNPdbe | | missense | N121D |
121_2 | CM063846 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | N121I |
123 | CM960087 | SNPdbe HGMD | Kaul (1996) Am J Hum Genet 59 | missense | G123E |
143 | rs199565861 | dbSNP SNPdbe | 1000 Genome project | missense | I143V |
143_2 | CM063849 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | I143F |
143_3 | CM980125 | HGMD | Kobayashi (1998) Hum Mutat S1, S308 | missense | I143T |
152 | rs104894548 CM950102 | dbSNP SNPdbe OMIM HGMD | Multiple independent submissions to the refSNP cluster Kaul (1995) Hum Mutat 5 | missense | C152R |
152_2 | CM023604 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | C152W |
152_3 | CM960088 | HGMD | Kaul (1996) Am J Hum Genet 59, 95 | missense | C152Y |
153 | rs141755746 | dbSNP | | synonymous | Y153Y |
154 | rs147193431 rs2228435 | dbSNP SNPdbe | | missense | V154I |
157 | rs140357187 | dbSNP SNPdbe | | missense | I157T |
164 | | SNPdbe | | missense | Y164F |
166 | CM063847 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | T166I |
168 | CM001610 | SNPdbe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | R168H |
168_2 | CM960089 | HGMD | Kaul (1996) Am J Hum Genet 59 | missense | R168C |
170 | rs144321760 | dbSNP SNPdbe | Validated by frequency or genotype data | missense | I170T |
178 | | SNPdbe | | missense | E178A |
181 | CM001611 | SNPdbe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | P181T |
181_2 | CM063850 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | P181L |
183 | CM990193 | SNPdbe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P183H |
184 | CM023605 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | nonsense | Q184Ter |
186 | CM990194 | SNPdbe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | V186F |
195 | CM990195 | SNPdbe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | M195R |
202 | rs147763700 | dbSNP SNPdbe | | missense | A202S |
213 | CM055097 | HGMD | Tacke (2005) Neuropediatrics 36 | missense | K213E |
214 | CM023606 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | nonsense | E214Ter |
218 | rs104894549 CM950103 | dbSNP HGMD | Multiple independent submissions to the refSNP cluster Shaag (1995) Am J Hum Genet 57 | nonsense | C218Ter |
220 | rs139053885 | dbSNP SNPdbe | | missense | I220T |
226 | rs201887670 | dbSNP | | missense | I226K |
226_2 | CM086530 | HGMD | Di Pietro (2008) Clin Biochem 41 | missense | I226T |
231 | rs104894550 CM994594 | dbSNP SNPdbe OMIM HGMD | Multiple independent submissions to the refSNP cluster Rady (1999) Am J Med Genet 87 | missense | Y231C |
231_2 | CM940123 | HGMD | Kaul (1994) Am J Hum Genet 55 | nonsense | Y231Ter |
235 | rs149842031 | dbSNP SNPdbe | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | missense | E235K |
236 | rs149189911 | dbSNP | | synonymous | N236N |
239 | rs145085349 | dbSNP SNPdbe | | missense | I239T |
244 | CM023607 | SNPdbe HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | H244R |
244_2 | CM063848 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | H244L |
249 | rs104894552 CM023015 | dbSNP SNPdbe HGMD | Multiple independent submissions to the refSNP cluster Olsen (2002) J Med Genet 39 | missense | D249V |
270 | rs200126822 | dbSNP SNPdbe | 1000 Genome project | missense | I270T |
272 | CM063851 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | L272P |
274 | CM950104 | SNPdbe HGMD | Shaag (1995) Am J Hum Genet 57 | missense | G274R |
277 | rs78677072 | dbSNP | Multiple independent submissions to the refSNP cluster 1000 Genome project | synonymous | T277T |
278 | rs140581464 | dbSNP SNPdbe | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | missense | V278M |
279 | rs145717248 | dbSNP SNPdbe | | missense | Y279H |
280 | rs148081446 | dbSNP | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | synonymous | P280P |
280_2 | CM990197 | HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P280S |
280_3 | VAR_039088 | SNPdbe | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P280L |
281 | rs141858640 | dbSNP SNPdbe | | missense | V281M |
285 | rs28940279 CM930046 | dbSNP SNPdbe OMIM HGMD | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data Kaul (1993) Nat Genet 5 | missense | E285A |
285_2 | | SNPdbe | | missense | E285D |
286 | rs138062143 | dbSNP | | synonymous | A286A |
287 | CM990198 | SNPdbe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | A287T |
288 | | SNPdbe | | missense | Y288F |
288_2 | CM034717 | HGMD | Surendran (2003) Mol Genet Metab 80 | missense | Y288C |
295 | CM950105 | SNPdbe HGMD | Shaag (1995) Am J Hum Genet 57 | missense | F295S |
305 | rs28940574 CM940124 | dbSNP SNPdbe OMIM HGMD | Multiple independent submissions to the refSNP cluster Kaul (1994) Am J Hum Genet 55 | missense | A305E |
310 | | SNPdbe | | missense | C310G |
314 | CM023608 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | Ter314W |
</figtable>
SNP Visualization
The coding SNPs are written below the reference sequence, each variant residue directly under the position it replaces. An 'X' denotes a nonsense mutation that introduces a stop codon.
 MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKP 50
    R         G T R  P  G TR     R
 FITNPRAVKKCTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLF 100
   A   T          A  H          T          A
                     K
 GPKDSEDSYDIIFDLHNTTSNMGCTLILEDSRNNFLIQMFHYIKTSLAPL 150
         X V  Y      D E                   V
              E      I                     F
                                           T
 PCYVYLIEHPSLKYATTRSIAKYPVGIEVGPQPQGVLRADILDQMRKMIK 200
  RYI  T      F I H T       A  T HX F        R
  W               C            L
  Y
 HALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIAAIIHPNLQDQ 250
  S          EX   X T     K    C   KN  T    R    V
                          T    X            L
 DWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK 300
                    T P R  TMHPM   AATF      S
                              S    D  C
 LTLNAKSIRCCLH- 314
     E    G   W
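A layout of this kind, with variant letters stacked under the reference residues, can be generated from a list of (position, variant residue) pairs. The sketch below is one possible way to do it and not necessarily how the visualization above was produced; the function names `variant_tracks` and `render` are hypothetical, and the example uses only the first 50 residues together with the variants listed for them in the table.

```python
def variant_tracks(variants, start, end):
    """Stack variant letters under residues start..end (1-based, inclusive).

    Each variant goes into the first track whose column is still free, so
    several exchanges at one position end up on separate lines.
    """
    tracks = []
    for pos, letter in variants:
        if not start <= pos <= end:
            continue
        col = pos - start
        for track in tracks:
            if track[col] == " ":
                track[col] = letter
                break
        else:
            # No existing track has room at this column: open a new one.
            new_track = [" "] * (end - start + 1)
            new_track[col] = letter
            tracks.append(new_track)
    return ["".join(track).rstrip() for track in tracks]


def render(sequence, variants, width=50):
    """Return the sequence in blocks of `width` with variant tracks below."""
    lines = []
    for start in range(1, len(sequence) + 1, width):
        end = min(start + width - 1, len(sequence))
        lines.append(f"{sequence[start - 1:end]} {end}")
        lines.extend(variant_tracks(variants, start, end))
    return "\n".join(lines)


first_block = "MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKP"
variants_1_50 = [(4, "R"), (14, "G"), (16, "T"), (18, "R"), (21, "P"),
                 (24, "G"), (26, "T"), (27, "R"), (33, "R")]
print(render(first_block, variants_1_50))
```

Nonsense mutations can be passed with the letter 'X', following the convention stated above.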
Hotspots
Visual inspection
For the visual inspection, we looked at two kinds of reported SNPs: first, those listed in HGMD, which are reported to cause Canavan disease; second, SNPs from dbSNP and xx that are not reported to cause Canavan disease (although none of them is explicitly reported not to cause it either).
For the HGMD data, the SNPs were scattered all over the structure. We were surprised that many SNPs on the surface of the protein, not even close to the dimer interaction site, are also reported to cause Canavan disease. See <xr id="hgmd_all" /> and <xr id ="hgmd_surface"/>.
<figtable id="HGMD">
<figure id="hgmd_all"></figure> | <figure id="hgmd_surface"></figure> |
</figtable>
The SNPs not reported to cause Canavan disease were again scattered all over the structure. We had expected to see mutations concentrated in certain regions that are apparently functionally unimportant or structurally flexible, but could not confirm this expectation with the given data, at least not by visual inspection. See <xr id="others_all" /> and <xr id ="others_surface"/>.
<figtable id="others">
<figure id="others_all"></figure> | <figure id="others_surface"></figure> |
</figtable>
Frequency Distribution
Since visual inspection did not reveal any pattern, we looked at the frequency distribution of the disease-causing and non-disease-causing SNPs along the sequence, shown in <xr id = "can"/> and <xr id="no_can"/>. We could not identify distinct hotspot regions for either group. The only exception might be the last block of mutations in <xr id = "no_can"/>, close to the end of the sequence. It is curious, though, that this part of the sequence should be evolutionarily flexible, since it includes residue 288, which is part of the binding site. Keep in mind, however, that the mutation at this residue may simply not have been annotated as causing Canavan disease, although it might still do so.
<figure id="can"></figure> |
<figure id="no_can"></figure> |
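A frequency distribution like the ones in the figures can be approximated by counting SNP positions in fixed-width bins along the sequence. The following is a sketch only: the function name `bin_counts` and the bin width of 25 residues are assumptions (the text does not say how the distributions were computed), and the example positions are a handful of C-terminal entries from the table that are listed only in dbSNP or SNPdbe.

```python
from collections import Counter


def bin_counts(positions, seq_len, bin_size=25):
    """Count SNP positions (1-based) per bin of `bin_size` residues."""
    counts = Counter((pos - 1) // bin_size for pos in positions)
    n_bins = (seq_len + bin_size - 1) // bin_size  # ceiling division
    return [counts.get(i, 0) for i in range(n_bins)]


# Illustrative subset of positions without an HGMD entry, near the C-terminus.
example_positions = [270, 277, 278, 279, 280, 281, 286, 288]
histogram = bin_counts(example_positions, seq_len=313)

# Simple text rendering: one line per bin, one '#' per SNP.
for i, count in enumerate(histogram):
    first = i * 25 + 1
    last = min((i + 1) * 25, 313)
    print(f"{first:>3}-{last:<3} {'#' * count}")
```

A bin that stands out clearly against its neighbours is a hotspot candidate, but with so few SNPs per bin the result depends strongly on the chosen bin width and bin boundaries.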