Difference between revisions of "Canavan Disease: Task 07 - Researching SNPs"

From Bioinformatikpedia
(Mutation Map)
(Hot-Spots)
 
(40 intermediate revisions by 2 users not shown)
Line 1: Line 1:
'''Researching SNPs:''' Since it is already known from Task 1, Canavan Disease is also caused by point mutations. These point mutations are either synonymous or non-synonymous. Those with an effect almost all refer to non-sysnonymous SNPs. Here all known disease-causing SNPs concerning Canavan Disease were looked up. Generally non-synonymous mutations are the SNPs of interest. However insertion and especially deletions can be of interest if they occur in specific parts of the protein. Deletions of residues that make up the binding pocket may for example disrupt the function of the protein. Insertions of a Proline within a helix may have a significant impact on secondary structure. Deletions and insertions in loop regions or near the end or start of the amino acid chain are supposed to have no severe effect however.
+
'''Researching SNPs:''' It is already known from '''[[Canavan_Disease|Task 01]]''', that Canavan Disease is primarily caused by point mutations. These point mutations are either synonymous or non-synonymous. Those with an effect almost all refer to non-synonymous SNPs. Here all known disease-causing SNPs concerning Canavan Disease were looked up. Generally non-synonymous mutations are the SNPs of interest. However insertion and especially deletions can be of interest if they occur in specific parts of the protein. Deletions of residues that make up the binding pocket may for example disrupt the function of the protein. Insertions of a Proline within a helix may have a significant impact on secondary structure. Deletions and insertions in loop regions or near the end or start of the amino acid chain are supposed to have no severe effect however.
 
== [[Canavan_Disease:_Task_07_-_Journal|LabJournal]] ==
 
   
 
== Overview of Databases ==
 
== Overview of Databases ==
   
Five databases were used to find SNPs associated to aspartoacylase or even Canavan Disease. The following '''<xr id="data">Table</xr>''' should give an overview of the resulting information from those databases:
+
Five databases were used to find SNPs associated with the protein aspartoacylase or directly associated with Canavan Disease. '''<xr id="data"></xr>''' gives an overview of the resulting information including a small description of the databases:
   
 
<figtable id="data">
 
<figtable id="data">
 
{| border="1" cellpadding="5" cellspacing="0" align="center"
 
{| border="1" cellpadding="5" cellspacing="0" align="center"
 
|-
 
|-
! colspan="6" style="background:#87cefa;" | Results for database searches
+
! colspan="6" style="background:#87cefa;" | Results for Database Searches
 
|-
 
|-
 
! style="background:#BFBFBF;" align="center" | Database
 
! style="background:#BFBFBF;" align="center" | Database
! style="background:#BFBFBF;" align="center" | further Information
+
! style="background:#BFBFBF;" align="center" | Further Information
! style="background:#BFBFBF;" | Mutation type
+
! style="background:#BFBFBF;" | Mutation Type
 
! style="background:#BFBFBF;" | Mutations
 
! style="background:#BFBFBF;" | Mutations
 
! style="background:#BFBFBF;" | Positions
 
! style="background:#BFBFBF;" | Positions
Line 20: Line 18:
 
|-
 
|-
 
| rowspan="3" | [http://www.hgmd.org/ HGMD]
 
| rowspan="3" | [http://www.hgmd.org/ HGMD]
| rowspan="3" | refers to BIOBASE<br> collate published gene lesions responsible for human inherited disease <br>public version is out of date for 4 years <br> (for this task results are from August, 7th 2013)
+
| rowspan="3" | refers to BIOBASE<br> collates published gene lesions responsible for human inherited disease <br>public version is out of date for 4 years <br> (Task results are from August, 7th 2013)
 
|| missense, nonsense || 49 || 40
 
|| missense, nonsense || 49 || 40
 
| rowspan="3" | all
 
| rowspan="3" | all
Line 29: Line 27:
 
|-
 
|-
 
| rowspan="2" | [http://www.ncbi.nlm.nih.gov/projects/SNP/ dbSNP]
 
| rowspan="2" | [http://www.ncbi.nlm.nih.gov/projects/SNP/ dbSNP]
| rowspan="2" | dbSNP build 137 (newer version not completely released) <br> contains short genetic variations <br>(for this task results are from August, 7th 2013)
+
| rowspan="2" | dbSNP build 137 (newer version not completely released) <br> contains short genetic variations <br>(Task results are from August, 7th 2013)
 
|| SNP (silent mutations) || 12 || 12 || none
 
|| SNP (silent mutations) || 12 || 12 || none
 
|-
 
|-
Line 35: Line 33:
 
|-
 
|-
 
| rowspan="2" | [http://www.rostlab.org/services/snpdbe/ SNPdbe]
 
| rowspan="2" | [http://www.rostlab.org/services/snpdbe/ SNPdbe]
| rowspan="2" | refers to dbSNP, SwissProt, SwissVar, PMD and 1000Genomes <br> offers information like experimentally derivation and evidences,<br>prediction of functional effects, disease associations, heterozygosity,<br>evolutionary conservation, links to external databases <br>(for this task results are from August, 7th 2013)
+
| rowspan="2" | refers to dbSNP, SwissProt, SwissVar, PMD and 1000Genomes <br> offers information like experimentally derivation and evidences,<br>prediction of functional effects, disease associations, heterozygosity,<br>evolutionary conservation, links to external databases <br>(Task results are from August, 7th 2013)
 
|| SNP (no association) || 26 || 22 || none
 
|| SNP (no association) || 26 || 22 || none
 
|-
 
|-
Line 41: Line 39:
 
|-
 
|-
 
| rowspan="2" | [http://www.snpedia.com/index.php/SNPedia SNPedia]
 
| rowspan="2" | [http://www.snpedia.com/index.php/SNPedia SNPedia]
| rowspan="2" | wiki with informations about risk alleles and effects of DNA variation <br> refers often to dbSNP <br>(for this task results are from June, 21st 2013)
+
| rowspan="2" | wiki with informations about risk alleles and effects of DNA variation <br> refers often to dbSNP <br>(Task results are from June, 21st 2013)
 
|| SNP (no association) || 6 || 5 || none
 
|| SNP (no association) || 6 || 5 || none
 
|-
 
|-
Line 47: Line 45:
 
|-
 
|-
 
| rowspan="2" | [http://omim.org/ OMIM]
 
| rowspan="2" | [http://omim.org/ OMIM]
| rowspan="2" | refers to dbSNP <br> updated daily <br> (for this task results are from June, 21st 2013)
+
| rowspan="2" | refers to dbSNP <br> updated daily <br> (Task results are from June, 21st 2013)
 
|| SNP || 9 || 8
 
|| SNP || 9 || 8
 
| rowspan="2" | all
 
| rowspan="2" | all
Line 54: Line 52:
 
|-
 
|-
 
|}
 
|}
<center><small>'''<caption>''' Overview of database searches to find SNPs in aspartoacylase either or not associated with Canavan Disease. There are often different mutations on the same position of the protein. Therefore the column ''positions'' should give information about the number of positions found in all mutations. </caption></small></center>
+
<center><small>'''<caption>''' Overview of database searches to find SNPs in aspartoacylase either or not associated with Canavan Disease. There are often different mutations on the same position of the protein.<br>Therefore the column ''positions'' should give information about the number of positions found in all mutations. </caption></small></center>
 
</figtable>
 
</figtable>
   
In HGMD only mutations associatied with Canavan Disease are listed. For dbSNP two searches were made: one for silent mutations in dbSNP and one for SNPs associated with Canavan Disease. In SNPdbe the search was against asartoacylase, those associated with Canavan Disease were filtered. SNPedia had also the possibility in searching for different inputs. Therefore two searches, one against aspartoacylae and one against Canavan Disease were done. Since OMIM refers to diseases, the search was restricted to Canavan Disease specific mutations.<br>
+
In HGMD only mutations associated with Canavan Disease are listed. For dbSNP two searches were made: one for silent mutations in dbSNP and one for SNPs associated with Canavan Disease. In SNPdbe the search was performed against aspartoacylase, those associated with Canavan Disease were filtered. SNPedia had also the possibility to search for different inputs. Therefore two searches, one against aspartoacylase and one against Canavan Disease were done. Since OMIM refers to diseases, the search was restricted to Canavan Disease specific mutations.<br>
Some detailed results can be found in the following sections per database. A list of all specific mutations can be found in the [https://i12r-studfilesrv.informatik.tu-muenchen.de/wiki/index.php/Canavan_Disease:_Task_07_-_Supplement Supplement], to keep the content complete.
+
Some detailed results can be found in the following sections per database. A list of all specific mutations can be found in the '''[[Canavan_Disease:_Task_07_-_Researching_SNPs#Supplement|Supplement]]''', to keep the content of this wiki entry clear.
   
 
=== HGMD ===
 
=== HGMD ===
'''<xr id="hgmd">Table</xr>''' should give a more detailed view on which information HGMD provides searching for aspartoacylase:
+
'''<xr id="hgmd"></xr>''' gives a more detailed view on which information HGMD provides searching for aspartoacylase:
   
 
<figtable id="hgmd">
 
<figtable id="hgmd">
Line 95: Line 93:
 
|-
 
|-
 
|}
 
|}
<center><small>'''<caption>''' Resultlist from HGMD in search for ASPA </caption></small></center>
+
<center><small>'''<caption>''' Result list from HGMD searching for aspartoacylase (ASPA).</caption></small></center>
 
</figtable>
 
</figtable>
   
 
=== SNPdbe ===
 
=== SNPdbe ===
   
Since SNPdbe gives the opportunity to search for experimental evidence of the data, '''<xr id="snpdbe">Table</xr>''' shows the kind of experimental evidence and its number of entries (multiple entries per mutation are possible)
+
Since SNPdbe provides the opportunity to search for experimental evidence of the data, '''<xr id="snpdbe"></xr>''' shows the kind of experimental evidence and its number of entries (multiple entries per mutation are possible)
   
 
<figtable id="snpdbe">
 
<figtable id="snpdbe">
Line 108: Line 106:
 
|-
 
|-
 
! style="background:#BFBFBF;" align="center" | Experimental Evidence
 
! style="background:#BFBFBF;" align="center" | Experimental Evidence
! style="background:#BFBFBF;" | Number of entries
+
! style="background:#BFBFBF;" | Number of Entries
 
|-
 
|-
 
| 1000Genomes || 4
 
| 1000Genomes || 4
Line 119: Line 117:
 
|-
 
|-
 
|}
 
|}
<center><small>'''<caption>''' Experimental Evidence of SNPdbe entries </caption></small></center>
+
<center><small>'''<caption>''' Experimental Evidence of SNPdbe entries for ASPA </caption></small></center>
 
</figtable>
 
</figtable>
   
The data provided by SNPdbe can also be used to calculate quick and dirty if a mutation has an effect: When calculation the average of the PSSM (position specific scoring matrix) and PERC (percentage) scores per wild type and mutated type, a ''conservational score'' can be build. A SNP is assumed to be disease causing if the following three assumptions are true:
+
The data provided by SNPdbe can also be used to calculate a rough estimate if a mutation has an effect: Calculating the average of the PSSM (position specific scoring matrix) and PERC (percentage) scores per wild type and mutated type, a '''conservational score''' can be build. A SNP is assumed to be disease causing if the following four assumptions are true:
* the PSSM score of the wildtype is larger to its average
+
* the PSSM score of the wildtype is larger than the average wildtype PSSM score (of SNPs found in ASPA)
* the PSSM score of the mutation type is smaller then its average
+
* the PSSM score of the mutation type is smaller than the average mutation PSSM score (of SNPs found in ASPA)
* the PERC score of the wildtype is larger to its average
+
* the PERC score of the wildtype is larger than the average wildtype PERC score (of SNPs found in ASPA)
* the PERC score of the mutation type is smaller then its average
+
* the PERC score of the mutation type is smaller than the average PERC score (of SNPs found in ASPA)
Using this ''method''' brought 12 SNPs (of originally 55) from which three have an already known dbSNP id.
+
Using this ''method'' estimated 12 SNPs (of originally 55) to have an effect. '''Nine''' of those twelve SNPs '''are already associated''' with Canavan Disease. '''Three have an already known dbSNP id'''.
   
 
== Comparison ==
 
== Comparison ==
   
For a better overview the following two Venn Diagrams '''<xr id="vennSNPA"></xr>''' and '''<xr id="vennSNPB"></xr>''' shows the number of common SNPs among the databases as well as the number of common SNP positions, both associated with Canavan Disease:
+
For a better overview the following two Venn Diagrams '''<xr id="vennSNPA"></xr>''' and '''<xr id="vennSNPB"></xr>''' show the number of common SNPs among the databases as well as the number of common SNP positions, both associated with Canavan Disease:
   
 
{| border="0" cellpadding="5" cellspacing="0" align="center"
 
{| border="0" cellpadding="5" cellspacing="0" align="center"
 
|-
 
|-
| align="center" | <figure id="vennSNPA">[[Image:Venn_Canavan_SNPs.png|thumb|359px|'''Figure 1:''' Venn Diagram, showing the number of common SNPs associated with Canavan Disease using different databases]]</figure>
+
| align="center" | <figure id="vennSNPA">[[Image:Venn_Canavan_SNPs.png|thumb|359px|'''Figure 1:''' Venn Diagram, showing the number of common SNPs associated with Canavan Disease using different databases.]]</figure>
| align="center" | <figure id="vennSNPB">[[Image:venn_Position_Canavan_SNPs.png|thumb|370px|'''Figure 2:''' Venn Diagram, showing the number of common SNP positions associated with Canavan Disease using different databases]]</figure>
+
| align="center" | <figure id="vennSNPB">[[Image:venn_Position_Canavan_SNPs.png|thumb|370px|'''Figure 2:''' Venn Diagram, showing the number of common SNP positions associated with Canavan Disease using different databases.]]</figure>
 
|-
 
|-
 
|}
 
|}
   
  +
===Hot-Spots===
 
From '''<xr id="vennSNPB"> Venn Diagram </xr>''', the hotspots associated with Canavan Disease can be read off from the overlapping regions. This brings 24 hot-spot positions. One position is part of all database searches: It is position 24 of the protein sequence in aspartoacylase, which is important for binding the zinc atom in the active center (referenced in [http://www.uniprot.org/uniprot/P45381 Uniprot])
+
From '''<xr id="vennSNPB"></xr>''', the hotspots associated with Canavan Disease can be read off from the overlapping regions. This brings 6 hot-spot positions:<br>
  +
Possibly the most interesting of the positions that is part of all database searches:
  +
* It is position 24 of the protein sequence in aspartoacylase, which is important for binding the zinc ion in the active center (referenced in [http://www.uniprot.org/uniprot/P45381 Uniprot]). Position 24 is a mutation from Glutamic acid to Glycin (HGMD data)<br>
  +
The other five positions that are part of all databases, whereas some of them are part of a secondary structure element. There is no information referring to Uniprot:
  +
* position 152: beginning of a beta sheet, in HGMD three annotations are listed: Cysteine to Arginine, Tyrosine or Tryptophan
  +
* position 231: loop region, in HGMD one annotation is listed: Tyrosine to Cysteine
  +
* position 249: loop region, in HGMD one annotation is listed: Aspatic acid to Valine
  +
* position 285: part of helix, in HGMD one annotation is listed: Glutamic acid to Alanine
  +
* position 305: ending of a beta sheet, in HGMD one annotation is listed: Alanine to Glutamic acid
   
 
== Mutation Map ==
 
== Mutation Map ==
To get an overview of the mutations concerning aspartoacylase the following '''<xr id="mutation"></xr>''' shows disease mutations in red and silent mutations in green:
+
To get an overview of the mutations concerning aspartoacylase '''<xr id="mutation"></xr>''' shows disease mutations in red and silent mutations in green:
 
<figure id="mutation">
 
<figure id="mutation">
[[Image:Canavan_PositionMap.png|centre|thumb|694px|'''Figure:''' mutation Map]]
+
[[Image:Canavan_PositionMap.png|centre|thumb|694px|'''<caption>''' Mutation Map of disease causing mutations (red) and silent mutations (green) in aspartoacylase.</caption>]]
 
</figure>
 
</figure>
   

Latest revision as of 10:48, 24 October 2013

Researching SNPs: As already known from Task 01, Canavan Disease is primarily caused by point mutations. These point mutations are either synonymous or non-synonymous, and almost all of those with an effect are non-synonymous SNPs. Here, all known disease-causing SNPs for Canavan Disease were looked up. Non-synonymous mutations are generally the SNPs of interest. However, insertions and especially deletions can also be of interest if they occur in specific parts of the protein. Deletions of residues that make up the binding pocket may, for example, disrupt the function of the protein, and the insertion of a proline within a helix may have a significant impact on secondary structure. Deletions and insertions in loop regions or near the start or end of the amino acid chain, in contrast, are not expected to have a severe effect.
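The distinction between synonymous and non-synonymous point mutations can be illustrated with a minimal Python sketch. It is not part of the original task; the function name `classify_point_mutation` and the example codons are assumptions chosen for illustration, and only the standard genetic code is taken as given.

```python
# Standard genetic code, built from the usual T, C, A, G ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(ref_codon, alt_codon):
    """Classify a single-base substitution within one codon (hypothetical helper)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
        raise ValueError("codons must be three DNA bases (A, C, G, T)")
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if differences != 1:
        raise ValueError("a point mutation changes exactly one base")

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # silent: the encoded amino acid is unchanged
    if alt_aa == "*":
        return "nonsense"     # non-synonymous: introduces a stop codon
    if ref_aa == "*":
        return "stop-loss"    # non-synonymous: removes a stop codon
    return "missense"         # non-synonymous: amino acid exchange


# Illustrative codons only; the actual ASPA codons are not given in this text.
print(classify_point_mutation("GAA", "GGA"))  # Glu -> Gly
print(classify_point_mutation("GAA", "GAG"))  # Glu -> Glu
```

Insertions and deletions are deliberately not covered by this sketch, since they change the length of the sequence rather than a single base.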

Overview of Databases

Five databases were used to find SNPs associated with the protein aspartoacylase or directly associated with Canavan Disease. <xr id="data"></xr> gives an overview of the results, including a short description of each database:

<figtable id="data">

{| border="1" cellpadding="5" cellspacing="0" align="center"
|-
! colspan="6" | Results for Database Searches
|-
! Database !! Further Information !! Mutation Type !! Mutations !! Positions !! Canavan Disease
|-
| rowspan="3" | HGMD
| rowspan="3" | refers to BIOBASE<br>collates published gene lesions responsible for human inherited disease<br>public version is 4 years out of date<br>(Task results are from August 7th, 2013)
| missense, nonsense || 49 || 40
| rowspan="3" | all
|-
| indels || 23 ||
|-
| splicing || 5 ||
|-
| rowspan="2" | dbSNP
| rowspan="2" | dbSNP build 137 (newer version not completely released)<br>contains short genetic variations<br>(Task results are from August 7th, 2013)
| SNP (silent mutations) || 12 || 12 || none
|-
| SNP (Canavan) || 10 || 9 || all
|-
| rowspan="2" | SNPdbe
| rowspan="2" | refers to dbSNP, SwissProt, SwissVar, PMD and 1000Genomes<br>offers information such as experimental derivation and evidence,<br>prediction of functional effects, disease associations, heterozygosity,<br>evolutionary conservation, links to external databases<br>(Task results are from August 7th, 2013)
| SNP (no association) || 26 || 22 || none
|-
| SNP (Canavan) || 29 || 24 || all
|-
| rowspan="2" | SNPedia
| rowspan="2" | wiki with information about risk alleles and effects of DNA variation<br>often refers to dbSNP<br>(Task results are from June 21st, 2013)
| SNP (no association) || 6 || 5 || none
|-
| SNP (Canavan) || 4 || 4 || all
|-
| rowspan="2" | OMIM
| rowspan="2" | refers to dbSNP<br>updated daily<br>(Task results are from June 21st, 2013)
| SNP || 9 || 8
| rowspan="2" | all
|-
| indels || 3 ||
|}
Overview of database searches to find SNPs in aspartoacylase, whether or not they are associated with Canavan Disease. Different mutations often occur at the same position of the protein.
The column Positions therefore gives the number of distinct positions covered by all mutations.

</figtable>
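
The difference between the Mutations and Positions columns can be made concrete with a small Python sketch. The one-letter notation (for example `C152R`) and the helper name `count_mutations_and_positions` are assumptions for illustration; the example mutations are taken from the hot-spot list of this page.

```python
import re


def count_mutations_and_positions(mutations):
    """Return (number of distinct mutations, number of distinct positions).

    Each mutation is expected in one-letter notation such as 'C152R'
    (wild type residue, position, mutated residue); '*' denotes a stop.
    """
    unique_mutations = set(mutations)
    positions = set()
    for mutation in unique_mutations:
        match = re.fullmatch(r"[A-Z*](\d+)[A-Z*]", mutation)
        if match is None:
            raise ValueError(f"unexpected mutation format: {mutation!r}")
        positions.add(int(match.group(1)))
    return len(unique_mutations), len(positions)


# Position 152 carries three different exchanges, so it counts three times
# as a mutation but only once as a position.
example = ["C152R", "C152Y", "C152W", "E24G", "Y231C"]
print(count_mutations_and_positions(example))  # (5, 3)
```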

In HGMD, only mutations associated with Canavan Disease are listed. For dbSNP, two searches were made: one for silent mutations and one for SNPs associated with Canavan Disease. In SNPdbe, the search was performed against aspartoacylase, and the results were then filtered for those associated with Canavan Disease. SNPedia also allows searching with different inputs, so two searches were done: one against aspartoacylase and one against Canavan Disease. Since OMIM is organized by disease, the search was restricted to mutations specific to Canavan Disease.
Some detailed results can be found in the following sections for each database. To keep this wiki entry clear, a list of all specific mutations can be found in the Supplement.

HGMD

<xr id="hgmd"></xr> gives a more detailed view of the information HGMD provides when searching for aspartoacylase:

<figtable id="hgmd">

{| border="1" cellpadding="5" cellspacing="0" align="center"
|-
! colspan="3" | HGMD Data for ASPA
|-
! Mutation Type !! Explanation !! Number of Mutations
|-
| Missense (Nonsense) || Single base-pair substitutions in coding regions (resulting in a stop codon) || 49 (5)
|-
| Splicing || Mutations with consequences for mRNA splicing || 5
|-
| Regulatory || Substitutions causing regulatory abnormalities || 0
|-
| Small Deletions || Micro-deletions (20 bp or less) || 12
|-
| Small Insertions || Micro-insertions (20 bp or less) || 2
|-
| Small Indels || Micro-indels (20 bp or less) || 1
|-
| Gross Deletions || Information regarding the nature and location of each lesion || 8
|-
| Gross Insertions / Duplications || Information regarding the nature and location of each lesion || 0
|-
| Complex Rearrangements || Information regarding the nature and location of each lesion || 0
|-
| Repeat Variations || Information regarding the nature and location of each lesion || 0
|-
| colspan="2" | Total (see HGMD website) || 77
|}
Result list from HGMD searching for aspartoacylase (ASPA).

</figtable>

SNPdbe

Since SNPdbe provides the opportunity to search for experimental evidence of the data, <xr id="snpdbe"></xr> shows the kinds of experimental evidence and the number of entries for each (multiple entries per mutation are possible):

<figtable id="snpdbe">

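{| border="1" cellpadding="5" cellspacing="0" align="center"
|-
! colspan="2" | Experimental Evidence in SNPdbe
|-
! Experimental Evidence !! Number of Entries
|-
| 1000Genomes || 4
|-
| by cluster || 10
|-
| by frequency || 5
|-
| not validated || 43
|}
Experimental Evidence of SNPdbe entries for ASPA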

</figtable>

The data provided by SNPdbe can also be used to obtain a rough estimate of whether a mutation has an effect: by calculating the averages of the PSSM (position specific scoring matrix) and PERC (percentage) scores for the wild type and the mutated type, a conservational score can be built. A SNP is assumed to be disease causing if the following four conditions are all true:

  • the PSSM score of the wildtype is larger than the average wildtype PSSM score (of SNPs found in ASPA)
  • the PSSM score of the mutation type is smaller than the average mutation PSSM score (of SNPs found in ASPA)
  • the PERC score of the wildtype is larger than the average wildtype PERC score (of SNPs found in ASPA)
  • the PERC score of the mutation type is smaller than the average mutation PERC score (of SNPs found in ASPA)

With this method, 12 SNPs (of originally 55) were estimated to have an effect. Nine of those twelve SNPs are already associated with Canavan Disease. Three have an already known dbSNP id.
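
The four conditions above can be sketched in Python as follows. This is a minimal illustration rather than the script actually used for this task: the class `SnpScores`, the function `predict_effect` and all score values are assumptions, and the way the scores are exported from SNPdbe is not shown.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SnpScores:
    """PSSM and PERC scores of one SNP (hypothetical container)."""
    name: str
    pssm_wt: float   # PSSM score of the wild type residue
    pssm_mut: float  # PSSM score of the mutated residue
    perc_wt: float   # PERC score of the wild type residue
    perc_mut: float  # PERC score of the mutated residue


def predict_effect(snps):
    """Return the names of SNPs that satisfy all four conditions."""
    if not snps:
        return []
    # Averages are taken over all SNPs found in the protein.
    avg_pssm_wt = mean(s.pssm_wt for s in snps)
    avg_pssm_mut = mean(s.pssm_mut for s in snps)
    avg_perc_wt = mean(s.perc_wt for s in snps)
    avg_perc_mut = mean(s.perc_mut for s in snps)
    return [
        s.name
        for s in snps
        if s.pssm_wt > avg_pssm_wt      # wild type more conserved than average
        and s.pssm_mut < avg_pssm_mut   # mutation less favoured than average
        and s.perc_wt > avg_perc_wt
        and s.perc_mut < avg_perc_mut
    ]


# Invented toy values, not SNPdbe data.
toy = [
    SnpScores("snp_a", 8, -4, 90, 1),
    SnpScores("snp_b", 2, 1, 30, 20),
    SnpScores("snp_c", 5, -1, 60, 10),
]
print(predict_effect(toy))  # ['snp_a']
```

Because every threshold is an average over the same set of SNPs, the outcome depends on which SNPs are included, which is one reason the method gives only a rough estimate.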

Comparison

For a better overview, the two Venn diagrams <xr id="vennSNPA"></xr> and <xr id="vennSNPB"></xr> show the number of SNPs shared among the databases and the number of shared SNP positions, in both cases restricted to those associated with Canavan Disease:

<figure id="vennSNPA">
Figure 1: Venn Diagram, showing the number of common SNPs associated with Canavan Disease using different databases.
</figure>
<figure id="vennSNPB">
Figure 2: Venn Diagram, showing the number of common SNP positions associated with Canavan Disease using different databases.
</figure>

Hot-Spots

From <xr id="vennSNPB"></xr>, the hot-spots associated with Canavan Disease can be read off from the overlapping regions. This yields 6 hot-spot positions.
Possibly the most interesting of the positions found in all database searches:

  • Position 24 of the aspartoacylase protein sequence, which is important for binding the zinc ion in the active center (referenced in Uniprot). The mutation listed at position 24 is Glutamic acid to Glycine (HGMD data).

The other five positions are also found in all databases, and some of them are part of a secondary structure element. Uniprot gives no further information on these positions:

  • position 152: beginning of a beta sheet, in HGMD three annotations are listed: Cysteine to Arginine, Tyrosine or Tryptophan
  • position 231: loop region, in HGMD one annotation is listed: Tyrosine to Cysteine
  • position 249: loop region, in HGMD one annotation is listed: Aspartic acid to Valine
  • position 285: part of helix, in HGMD one annotation is listed: Glutamic acid to Alanine
  • position 305: ending of a beta sheet, in HGMD one annotation is listed: Alanine to Glutamic acid
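
Reading hot-spots off the Venn diagram corresponds to intersecting the sets of positions reported by each database. A minimal Python sketch, assuming the positions per database are available as plain lists: the function name `hot_spots` is hypothetical, and the positions 100, 200, 300 and 310 are invented placeholders that stand in for positions found in only one database.

```python
def hot_spots(positions_by_database):
    """Return the sorted positions that occur in every database."""
    position_sets = [set(p) for p in positions_by_database.values()]
    if not position_sets:
        return []
    return sorted(set.intersection(*position_sets))


# The six shared positions are those named in the list above; every other
# number is a placeholder, not a real database hit.
shared = [24, 152, 231, 249, 285, 305]
example = {
    "HGMD": shared + [100, 200],
    "dbSNP": shared + [200],
    "SNPdbe": shared + [100, 300],
    "SNPedia": shared,
    "OMIM": shared + [310],
}
print(hot_spots(example))  # [24, 152, 231, 249, 285, 305]
```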

Mutation Map

To get an overview of the mutations concerning aspartoacylase, <xr id="mutation"></xr> shows disease mutations in red and silent mutations in green:
<figure id="mutation">

Mutation Map of disease causing mutations (red) and silent mutations (green) in aspartoacylase.

</figure>

Supplement

Tasks