Task 5 - Mapping SNPs Canavan
Contents
First impression
Protocol
Further information can be found in the protocol.
First impression
HGMD
74 mutations in total (79 in HGMD Professional 2012) for cDNA sequence NM_000049.2 and amino acid sequence NP_000040.1, of which
- 47 missense/nonsense
- 5 splicing
- 12 small deletions
- 2 small insertions
- 1 indel
- 7 gross deletions
SNPdbe
- 55 total, includes predicted functional effect. 29 of these 55 labelled as involved in Canavan Disease.
dbSNP
- same identifiers for sequence as HGMD
- "synonymous-codon"[Function_Class] AND ASPA[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS] yields only 9 results
- 505 results for SNPs in general in human
- 458 for NP_000040.1 (coding: 23)
- 493 for NP_001121557.1 (coding: 23)
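The dbSNP search term quoted above can be assembled from its parts, which makes it easier to vary the function class. This is a minimal sketch: `build_dbsnp_term` is a hypothetical helper, not part of any library, and the second term is only an assumption about how an unfiltered ASPA query might look (the page does not state the exact query behind the 505 results).

```python
def build_dbsnp_term(gene, function_class=None, organism="human", snp_class="snp"):
    """Assemble an Entrez-style dbSNP search term from its parts (hypothetical helper)."""
    parts = []
    if function_class is not None:
        parts.append(f'"{function_class}"[Function_Class]')
    parts.append(f"{gene}[GENE]")
    parts.append(f'"{organism}"[ORGN]')
    parts.append(f'"{snp_class}"[SNP_CLASS]')
    return " AND ".join(parts)


# Term quoted in the list above (synonymous SNPs in ASPA).
synonymous_term = build_dbsnp_term("ASPA", function_class="synonymous-codon")

# Assumed form of an unfiltered query; not taken from the page.
all_snps_term = build_dbsnp_term("ASPA")

print(synonymous_term)
print(all_snps_term)
```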
SNPedia
links to other pages
OMIM
- 12 allelic variants listed for Aspartoacylase => all linked to Canavan Disease
Coding SNPs
From the above-mentioned SNP databases we extracted all coding SNPs for Aspartoacylase. We used the sequence position of each listed SNP as an identifier to build a unique list of known SNPs; where several different substitutions affect the same residue, the additional entries carry a suffix (e.g. 71_2). <xr id="coding_snp_table"/> shows the resulting list of unique SNPs. For each polymorphism, the source database(s) from which it was extracted and the annotated validation evidence are given.
<figtable id="coding_snp_table">
Residue Position | Identifier | Reference DB | Validation evidence | SNP Type | Mutation |
4 | rs142041344 | dbSNP SNPDBe | 1000 Genome project | missense | C4R |
14 | CM063852 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | V14G |
16 | CM960084 | SNPDBe HGMD | Kaul (1996) Am J Hum Genet 59 | missense | I16T |
18 | CM067343 | HGMD | Zeng (2006) Adv Exp Med Biol 576 | missense | G18R |
21 | CM001608 | SNPDBe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | H21P |
24 | rs104894551 CM023602 | dbSNP SNPDBe OMIM HGMD | Multiple independent submissions to the refSNP cluster Zeng (2002) J Inherit Metab Dis 25 | missense | E24G |
26 | rs145616193 | dbSNP | | synonymous | T26T |
27 | CM960085 | SNPDBe HGMD | Kaul (1996) Am J Hum Genet 59 | missense | G27R |
33 | rs138158568 | dbSNP SNPDBe | | missense | H33R |
53 | rs17850703 | dbSNP SNPDBe | | missense | T53A |
57 | CM001609 | SNPDBe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | A57T |
68 | CM023603 | SNPDBe HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | D68A |
71 | rs104894553 CM060201 | dbSNP SNPDBe HGMD | multiple independent submissions to the refSNP cluster Janson (2006) Ann Neurol 59 | missense | R71H |
71_2 | | SNPDBe | | missense | R71K |
82 | rs80099330 | dbSNP SNPDBe | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | missense | M82T |
93 | rs144639820 | dbSNP | Validated by frequency or genotype data | synonymous | A93A |
109 | CM990192 | HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | nonsense | Y109Ter |
111 | rs181347986 | dbSNP SNPDBe | 1000 Genome project | missense | I111V |
114 | CM023014 | SNPDBe HGMD | Olsen (2002) J Med Genet 39 | missense | D114Y |
114_2 | CM960086 | HGMD | Kaul (1996) Am J Hum Genet 59 | missense | D114E |
121 | rs148451498 | dbSNP SNPDBe | | missense | N121D |
121_2 | CM063846 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | N121I |
123 | CM960087 | SNPDBe HGMD | Kaul (1996) Am J Hum Genet 59 | missense | G123E |
143 | rs199565861 | dbSNP SNPDBe | 1000 Genome project | missense | I143V |
143_2 | CM063849 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | I143F |
143_3 | CM980125 | HGMD | Kobayashi (1998) Hum Mutat S1, S308 | missense | I143T |
152 | rs104894548 CM950102 | dbSNP SNPDBe OMIM HGMD | Multiple independent submissions to the refSNP cluster Kaul (1995) Hum Mutat 5 | missense | C152R |
152_2 | CM023604 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | C152W |
152_3 | CM960088 | HGMD | Kaul (1996) Am J Hum Genet 59, 95 | missense | C152Y |
153 | rs141755746 | dbSNP | | synonymous | Y153Y |
154 | rs147193431 rs2228435 | dbSNP SNPDBe | | missense | V154I |
157 | rs140357187 | dbSNP SNPDBe | | missense | I157T |
164 | | SNPDBe | | missense | Y164F |
166 | CM063847 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | T166I |
168 | CM001610 | SNPDBe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | R168H |
168_2 | CM960089 | HGMD | Kaul (1996) Am J Hum Genet 59 | missense | R168C |
170 | rs144321760 | dbSNP SNPDBe | Validated by frequency or genotype data | missense | I170T |
178 | | SNPDBe | | missense | E178A |
181_2 | CM063850 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | P181L |
181 | CM001611 | SNPDBe HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | P181T |
183 | CM990193 | SNPDBe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P183H |
184 | CM023605 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | nonsense | Q184Ter |
186 | CM990194 | SNPDBe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | V186F |
195 | CM990195 | SNPDBe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | M195R |
202 | rs147763700 | dbSNP SNPDBe | | missense | A202S |
213 | CM055097 | HGMD | Tacke (2005) Neuropediatrics 36 | missense | K213E |
214 | CM023606 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | nonsense | E214Ter |
218 | rs104894549 CM950103 | dbSNP HGMD | Multiple independent submissions to the refSNP cluster Shaag (1995) Am J Hum Genet 57 | nonsense | C218Ter |
220 | rs139053885 | dbSNP SNPDBe | | missense | I220T |
226_2 | CM086530 | HGMD | Di Pietro (2008) Clin Biochem 41 | missense | I226T |
226 | rs201887670 | dbSNP | | missense | I226K |
231 | rs104894550 CM994594 | dbSNP SNPDBe OMIM HGMD | Multiple independent submissions to the refSNP cluster Rady (1999) Am J Med Genet 87 | missense | Y231C |
231_2 | CM940123 | HGMD | Kaul (1994) Am J Hum Genet 55 | nonsense | Y231Ter |
235 | rs149842031 | dbSNP SNPDBe | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | missense | E235K |
236 | rs149189911 | dbSNP | | synonymous | N236N |
239 | rs145085349 | dbSNP SNPDBe | | missense | I239T |
244 | CM023607 | SNPDBe HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | H244R |
244_2 | CM063848 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | H244L |
249 | rs104894552 CM023015 | dbSNP SNPDBe HGMD | Multiple independent submissions to the refSNP cluster Olsen (2002) J Med Genet 39 | missense | D249V |
270 | rs200126822 | dbSNP SNPDBe | 1000 Genome project | missense | I270T |
272 | CM063851 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | L272P |
274 | CM950104 | SNPDBe HGMD | Shaag (1995) Am J Hum Genet 57 | missense | G274R |
277 | rs78677072 | dbSNP | Multiple independent submissions to the refSNP cluster 1000 Genome project | synonymous | T277T |
278 | rs140581464 | dbSNP SNPDBe | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | missense | V278M |
279 | rs145717248 | dbSNP SNPDBe | | missense | Y279H |
280 | rs148081446 | dbSNP | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data 1000 Genome project | synonymous | P280P |
280_2 | CM990197 | HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P280S |
281 | rs141858640 | dbSNP SNPDBe | | missense | V281M |
285 | rs28940279 CM930046 | dbSNP SNPDBe OMIM HGMD | Multiple independent submissions to the refSNP cluster Validated by frequency or genotype data Kaul (1993) Nat Genet 5 | missense | E285A |
285_2 | | SNPDBe | | missense | E285D |
286 | rs138062143 | dbSNP | | synonymous | A286A |
287 | CM990198 | SNPDBe HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | A287T |
288 | | SNPDBe | | missense | Y288F |
288_2 | CM034717 | HGMD | Surendran (2003) Mol Genet Metab 80 | missense | Y288C |
295 | CM950105 | SNPDBe HGMD | Shaag (1995) Am J Hum Genet 57 | missense | F295S |
305 | rs28940574 CM940124 | dbSNP SNPDBe OMIM HGMD | Multiple independent submissions to the refSNP cluster Kaul (1994) Am J Hum Genet 55 | missense | A305E |
310 | | SNPDBe | | missense | C310G |
314 | CM023608 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | Ter314W |
</figtable>
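The merging step behind the table can be sketched as follows. This is a minimal illustration, not the procedure actually used: the record format, `merge_records` and `label_rows` are hypothetical, and the suffix order produced here (alphabetical by mutation) is an assumption that does not necessarily match the order of the `_2`/`_3` entries in the table.

```python
import re
from collections import defaultdict

# Protein-level mutation such as "R71H", "Y109Ter" or "Ter314W".
MUTATION = re.compile(r"^([A-Z]|Ter)(\d+)([A-Z]|Ter)$")

# (database, identifier or None, mutation); a small excerpt from the table.
records = [
    ("dbSNP", "rs104894553", "R71H"),
    ("SNPdbe", None, "R71H"),
    ("HGMD", "CM060201", "R71H"),
    ("SNPdbe", None, "R71K"),
    ("HGMD", "CM063852", "V14G"),
]


def merge_records(records):
    """Collect identifiers and source databases per (position, mutation)."""
    merged = defaultdict(lambda: {"ids": [], "dbs": []})
    for db, identifier, mutation in records:
        match = MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation!r}")
        entry = merged[(int(match.group(2)), mutation)]
        if db not in entry["dbs"]:
            entry["dbs"].append(db)
        if identifier and identifier not in entry["ids"]:
            entry["ids"].append(identifier)
    return merged


def label_rows(merged):
    """Turn merged entries into rows; repeated positions get a _2, _3, ... suffix."""
    rows = []
    seen = defaultdict(int)
    for position, mutation in sorted(merged):
        seen[position] += 1
        n = seen[position]
        label = str(position) if n == 1 else f"{position}_{n}"
        entry = merged[(position, mutation)]
        rows.append((label, " ".join(entry["ids"]), " ".join(entry["dbs"]), mutation))
    return rows


rows = label_rows(merge_records(records))
for row in rows:
    print(" | ".join(row))
```

Keying on position plus mutation, rather than position alone, keeps distinct substitutions at the same residue (such as R71H and R71K) as separate rows while still collapsing the same variant reported by several databases.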
SNP Visualization
The coding SNPs are written below the reference sequence. An 'X' denotes a nonsense mutation that introduces a stop codon.
MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKP 50
   R         G T R  P  G TR     R
FITNPRAVKKCTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLF 100
  A   T          A  H          T          A
                    K
GPKDSEDSYDIIFDLHNTTSNMGCTLILEDSRNNFLIQMFHYIKTSLAPL 150
        X V  Y      D E                   V
             E      I                     F
                                          T
PCYVYLIEHPSLKYATTRSIAKYPVGIEVGPQPQGVLRADILDQMRKMIK 200
 RYI  T      F I H T       A  T HX F        R
 W               C            L
 Y
HALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIAAIIHPNLQDQ 250
 S          EX   X T     K    C   KN  T    R    V
                         T    X            L
DWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK 300
                   T P R  TMHPM   AATF      S
                             S    D  C
LTLNAKSIRCCLH- 314
    E    G   W
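A layout of this kind, with each variant residue placed under the position it replaces, can be generated rather than typed by hand. The following is a minimal sketch; `render_tracks` is a hypothetical helper, and the example uses only the first 50 residues and the substitutions listed for them in the table above.

```python
def render_tracks(sequence, mutations, width=50):
    """Return text lines: sequence rows, each followed by rows of variant residues.

    mutations is a list of (position, symbol) pairs with 1-based positions.
    Several variants at the same position are stacked on additional rows.
    """
    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        lines.append(f"{chunk} {start + len(chunk)}")
        tracks = []
        for position, symbol in mutations:
            offset = position - 1 - start
            if not 0 <= offset < len(chunk):
                continue  # variant belongs to another row
            for track in tracks:
                if track[offset] == " ":
                    track[offset] = symbol
                    break
            else:
                # every existing track is occupied here, so open a new one
                track = [" "] * len(chunk)
                track[offset] = symbol
                tracks.append(track)
        lines.extend("".join(track).rstrip() for track in tracks)
    return lines


first_row = "MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKP"
first_row_snps = [(4, "R"), (14, "G"), (16, "T"), (18, "R"), (21, "P"),
                  (24, "G"), (26, "T"), (27, "R"), (33, "R")]

lines = render_tracks(first_row, first_row_snps)
print("\n".join(lines))
```

The output only lines up in a monospaced font, which is also a requirement for the hand-written version.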
Hotspots
Visual inspection
For the visual inspection, we looked at two kinds of reported SNPs: first, those listed in HGMD, which are reported to cause Canavan Disease; second, SNPs from dbSNP and xx that are not reported to cause Canavan Disease (although nothing reports them to be harmless either).
For the HGMD data, the SNPs were scattered all over the structure. We were surprised that many SNPs on the surface of the protein, not even close to the dimer interaction site, were also reported to cause Canavan Disease. See <xr id="hgmd_all" /> and <xr id ="hgmd_surface"/>.
<figtable id="HGMD">
<figure id="hgmd_all"></figure> | <figure id="hgmd_surface"></figure> |
</figtable>
The SNPs not reported to cause Canavan Disease were again scattered all over the structure. We had expected to see some concentration of these mutations in certain regions that are apparently functionally unimportant or structurally flexible, but could not confirm this expectation with the given data, at least not visually. See <xr id="others_all" /> and <xr id ="others_surface"/>.
<figtable id="others">
<figure id="others_all"></figure> | <figure id="others_surface"></figure> |
</figtable>
Frequency Distribution
Since visual inspection did not yield further insight, we looked at the frequency distribution of the disease-causing and non-disease-causing SNPs along the sequence. These can be found in <xr id = "can"/> and <xr id="no_can"/>. For neither the disease-causing nor the non-disease-causing SNPs were we able to identify notable hotspot regions (except for the region near the end of the sequence, which is curious since it involves residue 288).
<figtable id="freq">
<figure id="can"></figure> | <figure id="no_can"></figure> |
</figtable>
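A frequency distribution like the ones plotted above can be obtained by counting SNP positions in fixed-size windows along the sequence. This is a minimal sketch under stated assumptions: `bin_counts` is a hypothetical helper, the bin size is arbitrary (the one used for the figures is not given), and the example positions are only the first few HGMD entries from the coding SNP table.

```python
from collections import Counter


def bin_counts(positions, bin_size, length):
    """Count 1-based residue positions per consecutive window of bin_size residues."""
    counts = Counter((position - 1) // bin_size for position in positions)
    n_bins = (length + bin_size - 1) // bin_size  # round up to cover the tail
    return [counts.get(index, 0) for index in range(n_bins)]


# First few HGMD positions from the table; illustrative subset only.
example_positions = [14, 16, 18, 21, 24, 27]

# Windows 1-10, 11-20, 21-30 over the first 30 residues.
example_counts = bin_counts(example_positions, bin_size=10, length=30)
print(example_counts)
```

Running the same function separately on the disease-causing and the non-disease-causing positions, with `length=313` for the full protein, gives two count vectors that can be compared window by window.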