Canavan Disease: Task 07 - Researching SNPs
As established in Task 01, Canavan Disease is primarily caused by point mutations. These point mutations are either synonymous or non-synonymous, and almost all of those with an effect are non-synonymous. For this task, all known disease-causing SNPs for Canavan Disease were looked up. Non-synonymous mutations are generally the SNPs of interest. However, insertions and especially deletions can also be relevant if they occur in specific parts of the protein. For example, deleting residues that make up the binding pocket may disrupt the function of the protein, and inserting a proline within a helix may have a significant impact on secondary structure. In contrast, deletions and insertions in loop regions or near the start or end of the amino acid chain are generally not expected to have a severe effect.
Overview of Databases
Five databases were used to find SNPs associated with the protein aspartoacylase or directly associated with Canavan Disease. <xr id="data"></xr> summarizes the results and briefly describes each database:
<figtable id="data">
Results for Database Searches | |||||
---|---|---|---|---|---|
Database | Further Information | Mutation Type | Mutations | Positions | Canavan Disease |
HGMD | refers to BIOBASE; collates published gene lesions responsible for human inherited disease; public version is 4 years out of date (task results are from August 7th, 2013) | missense, nonsense | 49 | 40 | all |
 | | indels | 23 | | |
 | | splicing | 5 | | |
dbSNP | dbSNP build 137 (newer version not completely released); contains short genetic variations (task results are from August 7th, 2013) | SNP (silent mutations) | 12 | 12 | none |
 | | SNP (Canavan) | 10 | 9 | all |
SNPdbe | refers to dbSNP, SwissProt, SwissVar, PMD and 1000Genomes; offers information such as experimental derivation and evidence, prediction of functional effects, disease associations, heterozygosity, evolutionary conservation and links to external databases (task results are from August 7th, 2013) | SNP (no association) | 26 | 22 | none |
 | | SNP (Canavan) | 29 | 24 | all |
SNPedia | wiki with information about risk alleles and effects of DNA variation; often refers to dbSNP (task results are from June 21st, 2013) | SNP (no association) | 6 | 5 | none |
 | | SNP (Canavan) | 4 | 4 | all |
OMIM | refers to dbSNP; updated daily (task results are from June 21st, 2013) | SNP | 9 | 8 | all |
 | | indels | 3 | | |
Several mutations can affect the same position. The Positions column therefore gives the number of distinct sequence positions covered by the listed mutations.
</figtable>
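The difference between the Mutations and Positions columns can be illustrated with a small counting sketch. It is not the procedure used for the table. It only assumes that substitutions are written in the common one-letter form (wild type, position, mutant, e.g. `C152R`), and it uses the HGMD substitutions named in the Hot-Spots section of this page as input. The function names are hypothetical.

```python
import re

# One-letter substitution label: wild-type residue, position, mutant residue.
# This notation is an assumption made for the sketch.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'C152R' into ('C', 152, 'R')."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def count_mutations_and_positions(labels):
    """Return (number of distinct mutations, number of distinct positions)."""
    parsed = {parse_substitution(label) for label in labels}
    positions = {position for _, position, _ in parsed}
    return len(parsed), len(positions)


# HGMD substitutions named in the Hot-Spots section of this page.
hotspot_annotations = [
    "E24G", "C152R", "C152Y", "C152W", "Y231C", "D249V", "E285A", "A305E",
]
n_mutations, n_positions = count_mutations_and_positions(hotspot_annotations)
print(n_mutations, n_positions)  # 8 mutations at 6 positions
```

Position 152 carries three different substitutions, which is why the mutation count exceeds the position count.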
In HGMD, only mutations associated with Canavan Disease are listed. For dbSNP, two searches were made: one for silent mutations and one for SNPs associated with Canavan Disease. In SNPdbe, the search was performed against aspartoacylase, and the hits associated with Canavan Disease were then separated from the rest. SNPedia also allows searching by different inputs, so two searches were run, one against aspartoacylase and one against Canavan Disease. Since OMIM is organized by disease, the search was restricted to mutations specific to Canavan Disease.
Detailed results per database are given in the following sections. To keep this wiki entry concise, the full list of specific mutations is provided in the Supplement.
HGMD
<xr id="hgmd"></xr> gives a more detailed view of the information HGMD provides when searching for aspartoacylase:
<figtable id="hgmd">
HGMD Data for ASPA | ||
---|---|---|
Mutation Type | Explanation | Number of Mutations |
Missense (Nonsense) | Single base-pair substitutions in coding regions (resulting in a stop codon) | 49 (5) |
Splicing | Mutations with consequences for mRNA splicing | 5 |
Regulatory | Substitutions causing regulatory abnormalities | 0 |
Small Deletions | Micro-deletions (20 bp or less) | 12 |
Small Insertions | Micro-insertions (20 bp or less) | 2 |
Small Indels | Micro-indels (20 bp or less) | 1 |
Gross Deletions | Information regarding the nature and location of each lesion | 8 |
Gross Insertions / Duplications | Information regarding the nature and location of each lesion | 0 |
Complex Rearrangements | Information regarding the nature and location of each lesion | 0 |
Repeat Variations | Information regarding the nature and location of each lesion | 0 |
Total | (see on HGMD website) | 77 |
</figtable>
SNPdbe
Since SNPdbe allows searching for the experimental evidence behind its data, <xr id="snpdbe"></xr> shows the types of experimental evidence and the number of entries for each (multiple entries per mutation are possible):
<figtable id="snpdbe">
Experimental Evidence in SNPdbe | |
---|---|
Experimental Evidence | Number of Entries |
1000Genomes | 4 |
by cluster | 10 |
by frequency | 5 |
not validated | 43 |
</figtable>
The data provided by SNPdbe can also be used to roughly estimate whether a mutation has an effect. A simple conservation criterion can be built from the PSSM (position-specific scoring matrix) and PERC (percentage) scores of the wild-type and mutant residues, each compared with its average over all SNPs found in ASPA. A SNP is assumed to be disease-causing if all four of the following conditions hold:
- the PSSM score of the wild type is larger than the average wild-type PSSM score (of SNPs found in ASPA)
- the PSSM score of the mutant is smaller than the average mutant PSSM score (of SNPs found in ASPA)
- the PERC score of the wild type is larger than the average wild-type PERC score (of SNPs found in ASPA)
- the PERC score of the mutant is smaller than the average mutant PERC score (of SNPs found in ASPA)
This method predicts 12 of the 55 SNPs to have an effect. Nine of these twelve SNPs are already associated with Canavan Disease. Three have a known dbSNP ID.
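The four conditions can be sketched as a simple filter. This is a minimal illustration, not the script used for this task. The record layout (`id`, `pssm_wt`, `pssm_mut`, `perc_wt`, `perc_mut`) and the function name are hypothetical, and the scores in `toy_snps` are invented values chosen only to exercise the logic.

```python
from statistics import mean

SCORE_KEYS = ("pssm_wt", "pssm_mut", "perc_wt", "perc_mut")


def predict_effect(snps):
    """Return the ids of SNPs that satisfy all four conditions."""
    # Averages are taken over all SNPs passed in (here: all SNPs of one protein).
    avg = {key: mean(snp[key] for snp in snps) for key in SCORE_KEYS}
    return [
        snp["id"]
        for snp in snps
        if snp["pssm_wt"] > avg["pssm_wt"]      # wild type better conserved than average
        and snp["pssm_mut"] < avg["pssm_mut"]   # mutant less favoured than average
        and snp["perc_wt"] > avg["perc_wt"]     # wild type more frequent than average
        and snp["perc_mut"] < avg["perc_mut"]   # mutant less frequent than average
    ]


# Invented scores for illustration only; these are not SNPdbe values.
toy_snps = [
    {"id": "snp_a", "pssm_wt": 8, "pssm_mut": -4, "perc_wt": 90, "perc_mut": 1},
    {"id": "snp_b", "pssm_wt": 2, "pssm_mut": 1, "perc_wt": 20, "perc_mut": 15},
    {"id": "snp_c", "pssm_wt": 7, "pssm_mut": 3, "perc_wt": 80, "perc_mut": 2},
    {"id": "snp_d", "pssm_wt": 1, "pssm_mut": 0, "perc_wt": 10, "perc_mut": 30},
]
print(predict_effect(toy_snps))  # ['snp_a']
```

In the toy data, `snp_c` has a well-conserved wild type but is rejected because its mutant PSSM score lies above the average, which shows that all four conditions must hold at once.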
Comparison
For a better overview, the two Venn diagrams in <xr id="vennSNPA"></xr> and <xr id="vennSNPB"></xr> show the number of SNPs and the number of SNP positions shared among the databases, in both cases restricted to those associated with Canavan Disease:
<figure id="vennSNPA"></figure> | <figure id="vennSNPB"></figure> |
Hot-Spots
The hot-spots associated with Canavan Disease can be read off from the overlapping regions in <xr id="vennSNPB"></xr>. This yields six hot-spot positions.
Possibly the most interesting position, found in all database searches, is:
- position 24 of the aspartoacylase protein sequence, which is important for binding the zinc ion in the active center (annotated in UniProt). HGMD lists a mutation from Glutamic acid to Glycine at this position
The other five positions are also found in all databases. Some of them lie within secondary structure elements; UniProt provides no further annotation for them:
- position 152: beginning of a beta sheet, in HGMD three annotations are listed: Cysteine to Arginine, Tyrosine or Tryptophan
- position 231: loop region, in HGMD one annotation is listed: Tyrosine to Cysteine
- position 249: loop region, in HGMD one annotation is listed: Aspartic acid to Valine
- position 285: part of a helix, in HGMD one annotation is listed: Glutamic acid to Alanine
- position 305: end of a beta sheet, in HGMD one annotation is listed: Alanine to Glutamic acid
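Reading positions off the overlap of a Venn diagram corresponds to intersecting the position sets of the individual searches. The following sketch shows that operation under stated assumptions: the function name and the database labels `db_a` to `db_c` are hypothetical, and the membership of each set is invented for illustration. Only the six shared positions are taken from the list above; the values above 900 are placeholders and not real database content.

```python
def shared_positions(positions_by_database):
    """Return the sorted positions that occur in every database's set."""
    sets = [set(positions) for positions in positions_by_database.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Illustrative input: the six hot-spot positions plus invented placeholders.
toy_positions = {
    "db_a": {24, 152, 231, 249, 285, 305, 901, 902},
    "db_b": {24, 152, 231, 249, 285, 305, 903},
    "db_c": {24, 152, 231, 249, 285, 305, 901, 904},
}
print(shared_positions(toy_positions))  # [24, 152, 231, 249, 285, 305]
```

Position 901 appears in two of the three toy sets but not in all of them, so it is excluded from the result.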
Mutation Map
To give an overview of the mutations in aspartoacylase, <xr id="mutation"></xr> shows disease mutations in red and silent mutations in green: <figure id="mutation">
</figure>
Supplement
Tasks
- Link to Task 01: Canavan Disease
- Link to Task 02: Alignments
- Link to Task 03: Sequence-based Predictions
- Link to Task 04: Structural Alignments
- Link to Task 05: Homology Modelling
- Link to Task 06: Protein Structure Prediction from Evolutionary Sequence Variation
- Link to Task 07: Researching SNPs
- Link to Task 08: Sequence-based Mutation Analysis
- Link to Task 09: Structure-based Mutation Analysis
- Link to Task 10: Normal Mode Analysis