Canavan Disease: Task 07 - Researching SNPs

From Bioinformatikpedia
Revision as of 17:12, 4 September 2013 by Boehma (talk | contribs)

Researching SNPs: As already established in Task 01, Canavan Disease is primarily caused by point mutations. Point mutations are either synonymous or non-synonymous, and almost all of those with an effect are non-synonymous, so non-synonymous SNPs are generally the variants of interest. Here, all known disease-causing SNPs for Canavan Disease were looked up. Insertions and especially deletions can also be of interest if they occur in specific parts of the protein. For example, deleting residues that form the binding pocket may disrupt the function of the protein, and inserting a proline within a helix may have a significant impact on secondary structure. In contrast, deletions and insertions in loop regions or near the start or end of the amino acid chain are not expected to have a severe effect.

Overview of Databases

Five databases were used to find SNPs associated with the protein aspartoacylase or directly with Canavan Disease. <xr id="data"></xr> gives an overview of the results, including a brief description of each database:

<figtable id="data">

{| class="wikitable"
|+ Results for database searches
! Database !! Further information !! Mutation type !! Mutations !! Positions !! Canavan Disease
|-
| rowspan="3" | HGMD
| rowspan="3" | refers to BIOBASE; collates published gene lesions responsible for human inherited disease; the public version is 4 years out of date (task results are from August 7th, 2013)
| missense, nonsense || 49 || 40 || all
|-
| indels || 23 || ||
|-
| splicing || 5 || ||
|-
| rowspan="2" | dbSNP
| rowspan="2" | dbSNP build 137 (newer version not completely released); contains short genetic variations (task results are from August 7th, 2013)
| SNP (silent mutations) || 12 || 12 || none
|-
| SNP (Canavan) || 10 || 9 || all
|-
| rowspan="2" | SNPdbe
| rowspan="2" | refers to dbSNP, SwissProt, SwissVar, PMD and 1000Genomes; offers information such as experimental derivation and evidence, prediction of functional effects, disease associations, heterozygosity, evolutionary conservation and links to external databases (task results are from August 7th, 2013)
| SNP (no association) || 26 || 22 || none
|-
| SNP (Canavan) || 29 || 24 || all
|-
| rowspan="2" | SNPedia
| rowspan="2" | wiki with information about risk alleles and effects of DNA variation; often refers to dbSNP (task results are from June 21st, 2013)
| SNP (no association) || 6 || 5 || none
|-
| SNP (Canavan) || 4 || 4 || all
|-
| rowspan="2" | OMIM
| rowspan="2" | refers to dbSNP; updated daily (task results are from June 21st, 2013)
| SNP || 9 || 8 || all
|-
| indels || 3 || ||
|}
Overview of database searches to find SNPs in aspartoacylase, whether or not they are associated with Canavan Disease. Different mutations often occur at the same position of the protein. The Positions column therefore gives the number of distinct positions covered by all mutations found.

</figtable>

HGMD lists only mutations associated with Canavan Disease. For dbSNP, two searches were made: one for silent mutations and one for SNPs associated with Canavan Disease. In SNPdbe, the search was performed against aspartoacylase, and the entries associated with Canavan Disease were then separated from the rest. SNPedia also allows searching with different inputs, so two searches were done, one for aspartoacylase and one for Canavan Disease. Since OMIM is organized by disease, the search was restricted to mutations specific to Canavan Disease.
Detailed results for some of the databases are given in the following sections. To keep this wiki entry clear, the list of all specific mutations is given in the Supplement.
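The difference between the Mutations and Positions columns can be illustrated with a minimal Python sketch. It assumes that missense mutations are written in one-letter notation (wild type, position, mutant); this notation and the helper name `group_by_position` are illustrative choices, not part of the original workflow. The example list contains only the HGMD annotations named in the Hot-Spots section of this page, not the full result set.

```python
import re
from collections import defaultdict

# HGMD annotations named in the Hot-Spots section of this page,
# rewritten in one-letter notation (assumed format: wild type, position, mutant).
MUTATIONS = ["E24G", "C152R", "C152Y", "C152W", "Y231C", "D249V", "E285A", "A305E"]

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z*])$")


def group_by_position(mutations):
    """Map each sequence position to the set of (wild type, mutant) pairs seen there."""
    by_position = defaultdict(set)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation format: {mutation!r}")
        wild_type, position, mutant = match.groups()
        by_position[int(position)].add((wild_type, mutant))
    return dict(by_position)


grouped = group_by_position(MUTATIONS)
n_mutations = sum(len(pairs) for pairs in grouped.values())  # value for "Mutations"
n_positions = len(grouped)                                   # value for "Positions"
print(n_mutations, n_positions)
```

In this small example the three substitutions at position 152 count as three mutations but only one position, which is why the Positions column can be smaller than the Mutations column.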

HGMD

<xr id="hgmd"></xr> gives a more detailed view of the information HGMD provides when searching for aspartoacylase:

<figtable id="hgmd">

{| class="wikitable"
|+ HGMD Data for ASPA
! Mutation Type !! Explanation !! Number of Mutations
|-
| Missense (Nonsense) || Single base-pair substitutions in coding regions (resulting in a stop codon) || 49 (5)
|-
| Splicing || Mutations with consequences for mRNA splicing || 5
|-
| Regulatory || Substitutions causing regulatory abnormalities || 0
|-
| Small Deletions || Micro-deletions (20 bp or less) || 12
|-
| Small Insertions || Micro-insertions (20 bp or less) || 2
|-
| Small Indels || Micro-indels (20 bp or less) || 1
|-
| Gross Deletions || Information regarding the nature and location of each lesion || 8
|-
| Gross Insertions / Duplications || Information regarding the nature and location of each lesion || 0
|-
| Complex Rearrangements || Information regarding the nature and location of each lesion || 0
|-
| Repeat Variations || Information regarding the nature and location of each lesion || 0
|-
| Total || (see the HGMD website) || 77
|}
Result list from the HGMD search for aspartoacylase

</figtable>

SNPdbe

Since SNPdbe allows searching by the experimental evidence behind its data, <xr id="snpdbe"></xr> shows each kind of experimental evidence and its number of entries (multiple entries per mutation are possible):

<figtable id="snpdbe">

{| class="wikitable"
|+ Experimental Evidence in SNPdbe
! Experimental Evidence !! Number of entries
|-
| 1000Genomes || 4
|-
| by cluster || 10
|-
| by frequency || 5
|-
| not validated || 43
|}
Experimental Evidence of SNPdbe entries for ASPA

</figtable>

The data provided by SNPdbe can also be used to roughly estimate whether a mutation has an effect. A conservation score can be built by averaging the PSSM (position-specific scoring matrix) and PERC (percentage) scores over all SNPs found in ASPA, separately for the wild-type and the mutant residue. A SNP is assumed to be disease-causing if all four of the following conditions hold:

  • the PSSM score of the wild type is larger than the average wild-type PSSM score (of SNPs found in ASPA)
  • the PSSM score of the mutant is smaller than the average mutant PSSM score (of SNPs found in ASPA)
  • the PERC score of the wild type is larger than the average wild-type PERC score (of SNPs found in ASPA)
  • the PERC score of the mutant is smaller than the average mutant PERC score (of SNPs found in ASPA)

This method predicted 12 SNPs (of the original 55) to have an effect. Nine of those twelve SNPs are already associated with Canavan Disease. Three already have a known dbSNP id.
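The four conditions above can be sketched in Python as follows. This is a minimal illustration, not the script actually used for this task: the `SnpScores` record, the function name `predict_effect` and all score values are hypothetical, and the toy numbers are made up purely to exercise the filter. They are not SNPdbe data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SnpScores:
    """Hypothetical record holding the four SNPdbe-style scores of one SNP."""
    name: str
    pssm_wt: float   # PSSM score of the wild-type residue
    pssm_mut: float  # PSSM score of the mutant residue
    perc_wt: float   # PERC score of the wild-type residue
    perc_mut: float  # PERC score of the mutant residue


def predict_effect(snps):
    """Return the names of SNPs that satisfy all four conservation conditions."""
    avg_pssm_wt = mean(s.pssm_wt for s in snps)
    avg_pssm_mut = mean(s.pssm_mut for s in snps)
    avg_perc_wt = mean(s.perc_wt for s in snps)
    avg_perc_mut = mean(s.perc_mut for s in snps)
    return [
        s.name
        for s in snps
        if s.pssm_wt > avg_pssm_wt       # wild type better conserved than average
        and s.pssm_mut < avg_pssm_mut    # mutant less favourable than average
        and s.perc_wt > avg_perc_wt      # wild type more frequent than average
        and s.perc_mut < avg_perc_mut    # mutant less frequent than average
    ]


# Made-up toy values, NOT taken from SNPdbe.
toy_snps = [
    SnpScores("snp_a", 8, -4, 90, 1),
    SnpScores("snp_b", 2, 1, 30, 20),
    SnpScores("snp_c", 7, 3, 80, 2),
    SnpScores("snp_d", -1, 0, 10, 25),
]
predicted = predict_effect(toy_snps)
print(predicted)
```

Because the thresholds are averages over the SNPs found in ASPA, the prediction for a single SNP depends on which other SNPs are included in the set.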

Comparison

For a better overview, the two Venn diagrams <xr id="vennSNPA"></xr> and <xr id="vennSNPB"></xr> show the number of SNPs and the number of SNP positions shared among the databases. Only SNPs associated with Canavan Disease are counted:

<figure id="vennSNPA">
Figure 1: Venn Diagram, showing the number of common SNPs associated with Canavan Disease using different databases.
</figure>
<figure id="vennSNPB">
Figure 2: Venn Diagram, showing the number of common SNP positions associated with Canavan Disease using different databases.
</figure>

Hot-Spots

The hot-spots associated with Canavan Disease can be read off from the overlapping regions of the <xr id="vennSNPB">Venn Diagram</xr>. This yields six hot-spot positions.
One position was found in all database searches:

  • position 24 of the aspartoacylase sequence, which is important for binding the zinc ion in the active center (annotated in UniProt). HGMD lists a mutation from glutamic acid to glycine at this position

Five positions were found in at least three databases (dbSNP, SNPdbe and HGMD). Some of them are part of a secondary structure element. UniProt provides no annotation for these positions:

  • position 152: beginning of a beta sheet, in HGMD three annotations are listed: cysteine to arginine, tyrosine or tryptophan
  • position 231: loop region, in HGMD one annotation is listed: tyrosine to cysteine
  • position 249: loop region, in HGMD one annotation is listed: aspartic acid to valine
  • position 285: part of a helix, in HGMD one annotation is listed: glutamic acid to alanine
  • position 305: end of a beta sheet, in HGMD one annotation is listed: alanine to glutamic acid
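Reading hot-spots off the overlaps amounts to counting in how many databases each position occurs. The following Python sketch illustrates this under stated assumptions: the per-database sets contain only the six hot-spot positions named above (the real sets are larger), the five positions 152 to 305 are assumed absent from SNPedia and OMIM because the text does not specify otherwise, and the function name `hot_spots` is hypothetical.

```python
from collections import Counter

# Partial, illustrative position sets: only the hot-spot positions named in the
# text are included. Absence of 152-305 from SNPedia and OMIM is an assumption.
positions_by_db = {
    "HGMD": {24, 152, 231, 249, 285, 305},
    "dbSNP": {24, 152, 231, 249, 285, 305},
    "SNPdbe": {24, 152, 231, 249, 285, 305},
    "SNPedia": {24},
    "OMIM": {24},
}


def hot_spots(positions_by_db, min_databases=3):
    """Return {position: number of databases} for positions in >= min_databases."""
    counts = Counter(
        position
        for positions in positions_by_db.values()
        for position in positions
    )
    return {pos: n for pos, n in sorted(counts.items()) if n >= min_databases}


shared = hot_spots(positions_by_db)
in_all = [pos for pos, n in shared.items() if n == len(positions_by_db)]
print(shared)
print(in_all)
```

The threshold of three databases mirrors the criterion used in this section. A stricter or looser threshold would change which positions count as hot-spots.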

Mutation Map

To give an overview of the mutations in aspartoacylase, <xr id="mutation"></xr> shows disease-causing mutations in red and silent mutations in green:
<figure id="mutation">

Figure: Mutation Map of disease causing mutations (red) and silent mutations (green) in aspartoacylase.

</figure>

Supplement

Tasks