Task 5 - Mapping SNPs Canavan

From Bioinformatikpedia
Revision as of 13:47, 30 August 2012 by Vorbergs (talk | contribs) (dbSNP)

First impression

Protocol

Further information can be found in the protocol.


Sources for mutations

HGMD

The aim of the Human Gene Mutation Database (HGMD) is to collect known gene variants that are associated with human inherited diseases. All listed mutations are taken from published results, which serve as their validation. The database is manually curated and relies on the information given by authors in their work. The HGMD website states that "Many published mutation searches identify more than one genetic change in a single patient. In such cases, the relationship between a given lesion and the clinical phenotype has not always been immediately clear,[..]. The possibility of unintentional inclusion of some lesions with little or no pathological significance can therefore not be ruled out." Listed mutations are therefore not necessarily causal for any disease.

For Canavan Disease there are 74 mutations listed (79 in the 2012 professional version):

link to search

Type | Description | Count
missense/nonsense | Base-pair substitutions that result in a triplet change | 47
splicing | Mutations that alter splicing of the gene | 5
small deletions | Micro-deletions (20 bp or less) | 12
small insertions | Micro-insertions (20 bp or less) | 2
small indels | Micro-indels (20 bp or less) | 1
gross deletions | Deletions of >20 bp | 7


SNPdbe

The nsSNP database of functional effects (SNPdbe) collects information on SNPs from a variety of accessible web services and additionally aims at providing a functional annotation. For each variant, SNPdbe lists its predicted effect (SIFT, SNAP) as well as experimentally derived functional effects.

For Aspartoacylase (P45381 and NP_000040), there are 55 results in total:

  • 29 labelled as involved in Canavan Disease
  • 12 with experimental evidence

link to search

dbSNP

The Single Nucleotide Polymorphism database (dbSNP) aims at being a "central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms". It also lists synonymous variants, which are not listed in any of the other web resources.

For the gene "aspa" there are 508 SNPs annotated in the whole gene region (including 5' and 3' flanking regions):

  • 458 for NP_000040.1 (coding SNPs: 35)
  • 493 for NP_001121557.1 (coding SNPs: 35)
  • "synonymous-codon"[Function_Class] AND ASPA[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS] yields only 9 results

link to search
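The synonymous-variant query quoted above can also be assembled programmatically. The following is a minimal sketch that only builds the search term and a request URL for the NCBI E-utilities `esearch` endpoint; it does not send the request. The field tags are copied from the query as written on this page, and they may differ in current dbSNP releases. The helper names `build_dbsnp_term` and `esearch_url` are hypothetical.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint; "db", "term" and "retmax" are standard esearch parameters.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_dbsnp_term(gene, function_class=None, organism="human"):
    """Assemble an Entrez search term in the style used on this page.

    The field tags ([Function_Class], [GENE], [ORGN], [SNP_CLASS]) are taken
    from the page's query and are an assumption for current dbSNP versions.
    """
    parts = []
    if function_class is not None:
        parts.append(f'"{function_class}"[Function_Class]')
    parts.append(f"{gene}[GENE]")
    parts.append(f'"{organism}"[ORGN]')
    parts.append('"snp"[SNP_CLASS]')
    return " AND ".join(parts)


def esearch_url(term, retmax=0):
    """Return the URL of an esearch request against the dbSNP database ("snp")."""
    return EUTILS_ESEARCH + "?" + urlencode({"db": "snp", "term": term, "retmax": retmax})


term = build_dbsnp_term("ASPA", function_class="synonymous-codon")
print(term)
print(esearch_url(term))
```

With `retmax=0` the response would carry only the hit count, which is all that is needed to compare against the number reported above.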

SNPedia

SNPedia is organized as a wiki and collects information on human genetics from other web resources, citing peer-reviewed scientific publications. One can browse for genes, genomes, phenotypes and even medical substances.

For the gene "ASPA" it lists 9 variants, three of which are associated with Canavan disease.

links to search

OMIM

OMIM stands for Online Mendelian Inheritance in Man and represents a "compendium of human genes and genetic phenotypes". It provides extensive information on human Mendelian diseases and associated genes, all based on referenced publications. The database is updated daily and also lists links to other genetic resources.

For Aspartoacylase there are 12 allelic variants listed, which are all linked to Canavan Disease.

link to ASPA variants

Coding SNPs

<figure id="venny">

<xr nolink id="venny"/>
Overlap of contained SNPs between the databases.

</figure>

From the above-mentioned SNP databases we extracted all coding SNPs for aspartoacylase. The Venn diagram in <xr id="venny"/> shows the overlap of the SNPs contained in the three largest database sources.
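The overlap shown in such a diagram amounts to plain set arithmetic. The sketch below assumes that the three sources are dbSNP, SNPdbe and HGMD (the page does not name them) and uses only a handful of variants from the first residues of the table as illustrative input. The function name `venn_regions` is hypothetical.

```python
def venn_regions(named_sets):
    """Group items by the exact combination of sets they occur in.

    Returns a dict mapping a frozenset of set names to the items that are
    found in exactly those sets (one entry per non-empty Venn region).
    """
    regions = {}
    all_items = set().union(*named_sets.values())
    for item in all_items:
        members = frozenset(name for name, items in named_sets.items() if item in items)
        regions.setdefault(members, set()).add(item)
    return regions


# Illustrative subset only; the full lists come from the database exports.
sources = {
    "dbSNP": {"C4R", "E24G", "T26T", "H33R"},
    "SNPdbe": {"C4R", "I16T", "E24G", "H33R"},
    "HGMD": {"V14G", "I16T", "G18R", "E24G"},
}

for members, items in sorted(venn_regions(sources).items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(members)), "->", sorted(items))
```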

We used the sequence position of the listed SNPs as an identifier to create a unique list of known SNPs. <xr id="coding_snp_table"/> shows the resulting list of unique SNPs. For each polymorphism, the source is given (i.e., from which DB the SNP was extracted), as well as annotated validation, if available.
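One way to build such a unique list is sketched below. It assumes each database export has been reduced to `(position, mutation, source)` records, and it assigns the suffixes `_2`, `_3`, ... to further distinct mutations at the same position in order of appearance. Both the record layout and that ordering rule are assumptions for illustration, not a description of how the table was actually produced.

```python
def merge_by_position(records):
    """Merge (position, mutation, source) records into a unique SNP list.

    The first mutation seen at a position gets the identifier "<position>",
    every further distinct mutation at that position gets "<position>_<n>".
    Sources reporting the same mutation are collected in one entry.
    """
    merged = {}        # identifier -> {"mutation": ..., "sources": [...]}
    identifiers = {}   # (position, mutation) -> identifier
    per_position = {}  # position -> number of distinct mutations so far
    for position, mutation, source in records:
        key = (position, mutation)
        if key not in identifiers:
            n = per_position.get(position, 0) + 1
            per_position[position] = n
            identifiers[key] = str(position) if n == 1 else f"{position}_{n}"
            merged[identifiers[key]] = {"mutation": mutation, "sources": []}
        entry = merged[identifiers[key]]
        if source not in entry["sources"]:
            entry["sources"].append(source)
    return merged


records = [
    (4, "C4R", "dbSNP"), (4, "C4R", "SNPdbe"),
    (71, "R71H", "dbSNP"), (71, "R71H", "SNPdbe"), (71, "R71H", "HGMD"),
    (71, "R71K", "SNPdbe"),
]
for identifier, entry in merge_by_position(records).items():
    print(identifier, entry["mutation"], ", ".join(entry["sources"]))
```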

The resulting table contains 79 SNPs in total, 48 of which are reported to cause Canavan disease.


<figtable id="coding_snp_table">









<xr nolink id="coding_snp_table"/> This table lists all SNPs for aspartoacylase that could be found in HGMD, dbSNP, SNPdbe and OMIM. Mutations in red are reported to cause Canavan disease.
Residue position | Identifier | Reference DB | Validation evidence | SNP type | Mutation
4 | rs142041344 | dbSNP; SNPdbe | 1000 Genomes Project | missense | C4R
14 | CM063852 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | V14G
16 | CM960084 | SNPdbe; HGMD | Kaul (1996) Am J Hum Genet 59 | missense | I16T
18 | CM067343 | HGMD | Zeng (2006) Adv Exp Med Biol 576 | missense | G18R
21 | CM001608 | SNPdbe; HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | H21P
24 | rs104894551; CM023602 | dbSNP; SNPdbe; OMIM; HGMD | Multiple independent submissions to the refSNP cluster; Zeng (2002) J Inherit Metab Dis 25 | missense | E24G
26 | rs145616193 | dbSNP | | synonymous | T26T
27 | CM960085 | SNPdbe; HGMD | Kaul (1996) Am J Hum Genet 59 | missense | G27R
33 | rs138158568 | dbSNP; SNPdbe | | missense | H33R
53 | rs17850703 | dbSNP; SNPdbe | | missense | T53A
57 | CM001609 | SNPdbe; HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | A57T
68 | CM023603 | SNPdbe; HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | D68A
71 | rs104894553; CM060201 | dbSNP; SNPdbe; HGMD | Multiple independent submissions to the refSNP cluster; Janson (2006) Ann Neurol 59 | missense | R71H
71_2 | | SNPdbe | | missense | R71K
82 | rs80099330 | dbSNP; SNPdbe | Multiple independent submissions to the refSNP cluster; Validated by frequency or genotype data; 1000 Genomes Project | missense | M82T
93 | rs144639820 | dbSNP | Validated by frequency or genotype data | synonymous | A93A
109 | CM990192 | HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | nonsense | Y109Ter
111 | rs181347986 | dbSNP; SNPdbe | 1000 Genomes Project | missense | I111V
114 | CM023014 | SNPdbe; HGMD | Olsen (2002) J Med Genet 39 | missense | D114Y
114_2 | CM960086 | HGMD | Kaul (1996) Am J Hum Genet 59 | missense | D114E
121 | rs148451498 | dbSNP; SNPdbe | | missense | N121D
121_2 | CM063846 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | N121I
123 | CM960087 | SNPdbe; HGMD | Kaul (1996) Am J Hum Genet 59 | missense | G123E
143 | rs199565861 | dbSNP; SNPdbe | 1000 Genomes Project | missense | I143V
143_2 | CM063849 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | I143F
143_3 | CM980125 | HGMD | Kobayashi (1998) Hum Mutat S1, S308 | missense | I143T
152 | rs104894548; CM950102 | dbSNP; SNPdbe; OMIM; HGMD | Multiple independent submissions to the refSNP cluster; Kaul (1995) Hum Mutat 5 | missense | C152R
152_2 | CM023604 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | C152W
152_3 | CM960088 | HGMD | Kaul (1996) Am J Hum Genet 59, 95 | missense | C152Y
153 | rs141755746 | dbSNP | | synonymous | Y153Y
154 | rs147193431; rs2228435 | dbSNP; SNPdbe | | missense | V154I
157 | rs140357187 | dbSNP; SNPdbe | | missense | I157T
164 | | SNPdbe | | missense | Y164F
166 | CM063847 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | T166I
168 | CM001610 | SNPdbe; HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | R168H
168_2 | CM960089 | HGMD | Kaul (1996) Am J Hum Genet 59 | missense | R168C
170 | rs144321760 | dbSNP; SNPdbe | Validated by frequency or genotype data | missense | I170T
178 | | SNPdbe | | missense | E178A
181 | CM001611 | SNPdbe; HGMD | Sistermans (2000) Eur J Hum Genet 8 | missense | P181T
181_2 | CM063850 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | P181L
183 | CM990193 | SNPdbe; HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P183H
184 | CM023605 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | nonsense | Q184Ter
186 | CM990194 | SNPdbe; HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | V186F
195 | CM990195 | SNPdbe; HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | M195R
202 | rs147763700 | dbSNP; SNPdbe | | missense | A202S
213 | CM055097 | HGMD | Tacke (2005) Neuropediatrics 36 | missense | K213E
214 | CM023606 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | nonsense | E214Ter
218 | rs104894549; CM950103 | dbSNP; HGMD | Multiple independent submissions to the refSNP cluster; Shaag (1995) Am J Hum Genet 57 | nonsense | C218Ter
220 | rs139053885 | dbSNP; SNPdbe | | missense | I220T
226 | rs201887670 | dbSNP | | missense | I226K
226_2 | CM086530 | HGMD | Di Pietro (2008) Clin Biochem 41 | missense | I226T
231 | rs104894550; CM994594 | dbSNP; SNPdbe; OMIM; HGMD | Multiple independent submissions to the refSNP cluster; Rady (1999) Am J Med Genet 87 | missense | Y231C
231_2 | CM940123 | HGMD | Kaul (1994) Am J Hum Genet 55 | nonsense | Y231Ter
235 | rs149842031 | dbSNP; SNPdbe | Multiple independent submissions to the refSNP cluster; Validated by frequency or genotype data; 1000 Genomes Project | missense | E235K
236 | rs149189911 | dbSNP | | synonymous | N236N
239 | rs145085349 | dbSNP; SNPdbe | | missense | I239T
244 | CM023607 | SNPdbe; HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | H244R
244_2 | CM063848 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | H244L
249 | rs104894552; CM023015 | dbSNP; SNPdbe; HGMD | Multiple independent submissions to the refSNP cluster; Olsen (2002) J Med Genet 39 | missense | D249V
270 | rs200126822 | dbSNP; SNPdbe | 1000 Genomes Project | missense | I270T
272 | CM063851 | HGMD | Zeng (2006) Mol Genet Metab 89 | missense | L272P
274 | CM950104 | SNPdbe; HGMD | Shaag (1995) Am J Hum Genet 57 | missense | G274R
277 | rs78677072 | dbSNP | Multiple independent submissions to the refSNP cluster; 1000 Genomes Project | synonymous | T277T
278 | rs140581464 | dbSNP; SNPdbe | Multiple independent submissions to the refSNP cluster; Validated by frequency or genotype data; 1000 Genomes Project | missense | V278M
279 | rs145717248 | dbSNP; SNPdbe | | missense | Y279H
280 | rs148081446 | dbSNP | Multiple independent submissions to the refSNP cluster; Validated by frequency or genotype data; 1000 Genomes Project | synonymous | P280P
280_2 | CM990197 | HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P280S
280_3 | VAR_039088 | SNPdbe | Elpeleg (1999) J Inherit Metab Dis 22 | missense | P280L
281 | rs141858640 | dbSNP; SNPdbe | | missense | V281M
285 | rs28940279; CM930046 | dbSNP; SNPdbe; OMIM; HGMD | Multiple independent submissions to the refSNP cluster; Validated by frequency or genotype data; Kaul (1993) Nat Genet 5 | missense | E285A
285_2 | | SNPdbe | | missense | E285D
286 | rs138062143 | dbSNP | | synonymous | A286A
287 | CM990198 | SNPdbe; HGMD | Elpeleg (1999) J Inherit Metab Dis 22 | missense | A287T
288 | | SNPdbe | | missense | Y288F
288_2 | CM034717 | HGMD | Surendran (2003) Mol Genet Metab 80 | missense | Y288C
295 | CM950105 | SNPdbe; HGMD | Shaag (1995) Am J Hum Genet 57 | missense | F295S
305 | rs28940574; CM940124 | dbSNP; SNPdbe; OMIM; HGMD | Multiple independent submissions to the refSNP cluster; Kaul (1994) Am J Hum Genet 55 | missense | A305E
310 | | SNPdbe | | missense | C310G
314 | CM023608 | HGMD | Zeng (2002) J Inherit Metab Dis 25 | missense | Ter314W

</figtable>

SNP Visualization

The coding SNPs are written below the reference sequence. An 'X' denotes a nonsense mutation that introduces a stop codon.


MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKP  50
   R         G T R  P  G TR     R                 
                                                  
FITNPRAVKKCTRYIDCDLNRIFDLENLGKKMSEDLPYEVRRAQEINHLF  100
  A   T          A  H          T          A       
                    K                             
GPKDSEDSYDIIFDLHNTTSNMGCTLILEDSRNNFLIQMFHYIKTSLAPL  150
        X V  Y      D E                   V       
             E      I                     F       
                                          T
PCYVYLIEHPSLKYATTRSIAKYPVGIEVGPQPQGVLRADILDQMRKMIK  200
 RYI  T      F I H T       A  T HX F        R     
 W               C            L                   
 Y
HALDFIHHFNEGKEFPPCAIEVYKIIEKVDYPRDENGEIAAIIHPNLQDQ  250
 S          EX   X T     K    C   KN  T    R    V 
                         T    X            L      
DWKPLHPGDPMFLTLDGKTIPLGGDCTVYPVFVNEAAYYEKKEAFAKTTK  300
                   T P R  TMHPM   AATF      S     
                             S    D  C            
LTLNAKSIRCCLH-  314
    E    G   W
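A layout like the one above can be generated from the mutation labels in the table. The following sketch assumes labels of the form `C4R` or `Y109Ter`; it writes each substituted residue under its position and stacks several variants at the same position on additional lines. The function names are hypothetical, and stop-loss labels such as `Ter314W` are not handled by this simple parser.

```python
import re


def parse_mutation(label):
    """Split a label such as 'C4R' or 'Y109Ter' into (ref, position, alt)."""
    match = re.fullmatch(r"([A-Z])(\d+)(Ter|[A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label}")
    ref, pos, alt = match.groups()
    # A stop codon is drawn as 'X', following the convention used above.
    return ref, int(pos), "X" if alt == "Ter" else alt


def variant_tracks(sequence, labels, width=50):
    """Return text lines: sequence blocks, each followed by its variant tracks."""
    alts = {}  # 1-based position -> list of alternative residues
    for label in labels:
        ref, pos, alt = parse_mutation(label)
        if sequence[pos - 1] != ref:
            raise ValueError(f"{label}: reference residue does not match the sequence")
        alts.setdefault(pos, []).append(alt)

    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        positions = range(start + 1, start + len(chunk) + 1)
        lines.append(f"{chunk}  {start + len(chunk)}")
        depth = max((len(alts.get(p, [])) for p in positions), default=0)
        for level in range(depth):
            track = "".join(
                alts[p][level] if len(alts.get(p, [])) > level else " "
                for p in positions
            )
            lines.append(track.rstrip())
    return lines


# First 50 residues of the reference sequence and the variants listed for them.
sequence = "MTSCHIAEEHIQKVAIFGGTHGNELTGVFLVKHWLENGAEIQRTGLEVKP"
labels = ["C4R", "V14G", "I16T", "G18R", "H21P", "E24G", "T26T", "G27R", "H33R"]
print("\n".join(variant_tracks(sequence, labels)))
```

Checking the reference residue against the sequence while parsing catches numbering mismatches between databases before anything is drawn.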

Hotspots

Visual inspection

For the visual inspection, we looked at two kinds of reported SNPs: first, those listed in HGMD, which are reported to cause Canavan disease; second, SNPs from dbSNP and xx that are not reported to cause Canavan disease (although they are not reported to be harmless either).

For the HGMD data, the SNPs were scattered across the whole structure. We were surprised that many SNPs on the protein surface, far from the dimer interaction site, are also reported to cause Canavan disease. See <xr id="hgmd_all" /> and <xr id ="hgmd_surface"/>.

<figtable id="HGMD">

<figure id="hgmd_all">
<xr nolink id="hgmd_all"/>
Residues of aspartoacylase whose mutations cause Canavan disease are coloured in red. Many of these mutations are close to the binding site (around the zinc ion), as can be expected.
</figure>
<figure id="hgmd_surface">
<xr nolink id="hgmd_surface"/>
Mutations of aspartoacylase in red, surface view. Many of the mutations that cause Canavan disease lie on the surface, far away from both the binding site and the dimer interaction site, which we found surprising.
</figure>

</figtable>


For SNPs not reported to cause Canavan disease, the variants were again scattered across the whole structure. We had expected to see mutations concentrated in certain regions that are apparently functionally unimportant or structurally flexible, but the given data did not confirm this expectation, at least not visually. See <xr id="others_all" /> and <xr id ="others_surface"/>.

<figtable id="others">

<figure id="others_all">
<xr nolink id="others_all"/>
Mutated residues of aspartoacylase that are not reported to cause Canavan disease are coloured in blue. Visually, we cannot detect a pattern or hotspots.
</figure>
<figure id="others_surface">
<xr nolink id="others_surface"/>
Mutations of aspartoacylase in blue, surface view. Mutations that are not reported to cause Canavan disease can be found both on the surface, as seen in this picture, and on the inside of the protein (as seen on the left).
</figure>

</figtable>

Frequency Distribution

Since visual inspection did not reveal any pattern, we looked at the frequency distribution of the disease-causing and non-disease-causing SNPs along the sequence. These can be found in <xr id = "can"/> and <xr id="no_can"/>. We could not identify distinct hotspot regions for either group. The only exception might be the last block of mutations in <xr id = "no_can"/>, close to the end of the sequence. It is curious, though, that this part of the sequence should be evolutionarily flexible, since it includes residue 288, which is part of the binding site. Keep in mind, however, that the mutation at this residue may simply not have been annotated as causing Canavan disease; it might still do so.
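A frequency distribution of this kind can be computed by counting variant positions in fixed-width windows along the sequence. The sketch below is a minimal example under stated assumptions: the window size of 25 residues is arbitrary, the sequence length of 313 is read off the sequence display above (position 314 being the stop codon), and the input positions are a small illustrative selection of dbSNP entries from the table, not the full data set behind the figures.

```python
from collections import Counter


def bin_positions(positions, seq_len, bin_size=25):
    """Count variant positions per window of bin_size residues.

    Returns a list of (first_residue, last_residue, count) tuples that
    covers the whole sequence, including empty windows.
    """
    counts = Counter((p - 1) // bin_size for p in positions)
    n_bins = (seq_len + bin_size - 1) // bin_size  # ceiling division
    bins = []
    for i in range(n_bins):
        first = i * bin_size + 1
        last = min((i + 1) * bin_size, seq_len)
        bins.append((first, last, counts.get(i, 0)))
    return bins


# Illustrative input only: a few positions of dbSNP entries from the table.
positions = [4, 26, 33, 277, 278, 279, 280, 281, 286]
for first, last, count in bin_positions(positions, seq_len=313):
    print(f"{first:>3}-{last:<3} {'#' * count}")
```

The choice of window size influences how pronounced a cluster appears, so it is worth trying more than one value before calling a region a hotspot.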

<figure id="can">
<xr nolink id="can"/>
Frequency distribution of disease-causing mutations along the sequence. Mutations are fairly evenly distributed.
</figure>


<figure id="no_can">
<xr nolink id="no_can"/>
Frequency distribution of mutations not reported to cause Canavan disease. Again, the distribution is fairly even, except for a slight accumulation towards the end of the sequence, in the region 277-288.
</figure>