Difference between revisions of "Researching SNPs Gaucher Disease"

From Bioinformatikpedia
 
We searched SNPdbe by gene symbol ([http://www.rostlab.org/services/snpdbe/dosearch.php?id=name&val=GBA&organism2=human&organism1= GBA]), UniProt protein ID ([http://www.rostlab.org/services/snpdbe/dosearch.php?id=name&val=P04062&organism2=human&organism1= P04062]), and disease name ([http://www.rostlab.org/services/snpdbe/dosearch.php?id=disease&val=gaucher gaucher]). Each query returned a different number of SNPs (GBA: 563, P04062: 163, gaucher: 174). All result sets also included SAASs located on similar protein sequences from different databases (e.g. NCBI or UniProt).
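The three query links above follow one URL pattern, which can be assembled programmatically. The following is a minimal sketch using only Python's standard library; the helper name `snpdbe_search_url` is hypothetical, the parameter names are copied from the links above, and no request is sent because the service may no longer be reachable at this address.

```python
from urllib.parse import urlencode

# Base address copied from the links in this section.
BASE = "http://www.rostlab.org/services/snpdbe/dosearch.php"


def snpdbe_search_url(field, value, organism=None):
    """Assemble a SNPdbe search URL (hypothetical helper; builds a string only)."""
    params = {"id": field, "val": value}
    if organism is not None:
        # The gene and protein links above also carry these two organism fields.
        params["organism2"] = organism
        params["organism1"] = ""
    return BASE + "?" + urlencode(params)


urls = [
    snpdbe_search_url("name", "GBA", organism="human"),
    snpdbe_search_url("name", "P04062", organism="human"),
    snpdbe_search_url("disease", "gaucher"),
]
for url in urls:
    print(url)
```

Keeping the query fields in a dictionary makes it easy to see that the gene and protein searches differ only in the `val` field, while the disease search uses a different `id`.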
   
In SNPdbe, two evolutionary conservation scores are available for each SAAS: one for the wildtype residue and one for the mutant residue. If the score of the wildtype residue is much higher than that of the mutant, the mutant is more likely to be harmful and could therefore be disease causing. Restricting the search results to the protein sequence P04062 leaves 65 SAASs, shown in red in the following map. None of them has experimental evidence. SAASs with a low conservation score are labeled in blue:
 
<br/>
 
<br/>
   
 
<figtable id="tab:snpdbe_map">
 
<figtable id="tab:snpdbe_map">
 
<code>
 
<code>
<span style='font-weight:bold'>SAASs </span> ND<span style='color: red; font-weight: bold'>R</span>D<span style='color: red; font-weight: bold'>P</span>VALMH PDGSAV<span style='color: red; font-weight: bold'>L</span>VV<span style='color: red; font-weight: bold'>P</span> <span style='color: red; font-weight: bold'>K</span><span style='color: red; font-weight: bold'>P</span>SSKDVPLT IK<span style='color: red; font-weight: bold'>Y</span>PAVGFLE TISPGYSIHT YLWR<span style='color: red; font-weight: bold'>C</span>Q
 
 
<span style='font-weight:bold'> </span> <span style='font-weight: bold'>1 </span> <span style='font-weight: bold'>11 </span> <span style='font-weight: bold'>21 </span> <span style='font-weight: bold'>31 </span> <span style='font-weight: bold'>41 </span> <span style='font-weight: bold'>51 </span> <span style='font-weight: bold'>61 </span> <span style='font-weight: bold'>71 </span>
 
 
<span style='font-weight:bold'>Wildtype</span> MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVV<span style='color: red; font-weight: bold'>C</span>VCNAT YC<span style='color: red; font-weight: bold'>D</span>SFDPPTF PALGT<span style='color: red; font-weight: bold'>F</span>SRYE
 
   
 
</code>
 
</code>
<caption>Genetic variants of P04062 annotated in SNPdbe. Red: missense mutations (SAASs); Blue: SAASs with low conservation score.</caption>
 
</figtable>
 
</figtable>
<figure id="fig:snpdbe_2nt0_A">
{|
|[[File:snpdbe_2nt0_A.png|thumb|300px|left|<caption>Genetic variants of P04062 annotated in SNPdbe mapped to 2nt0_A. Red: non-synonymous mutations</caption>]]
|}
</figure>
   
 
== OMIM ==
 
 
[http://omim.org/ OMIM] (Online Mendelian Inheritance in Man) is a database containing information on all known genetic disorders and the corresponding genes in the human genome, with a particular focus on the molecular relationship between phenotype and genotype. In the [http://omim.org/statistics/entry current version] (8 June 2012), there are 19971 autosomal, 1171 X-linked, 59 Y-linked and 65 mitochondrial entries. Each entry is either a gene description with or without phenotype information, or a phenotype description with or without a known molecular basis.
   
Searching OMIM for "GBA" returned 27 entries. Seven of them are clearly related to Gaucher disease; the others are related to Parkinson disease, prostate cancer, etc. Among them, [http://omim.org/entry/606463 *606463] is the most representative entry; it contains 48 variants (see <xr id="OMIM_SNPs"/>, where the entries are ordered by OMIM entry number rather than by sequence position). Most of them link to dbSNP. Most are missense mutations, but other mutation types are also included (colored in red), such as splice-site mutations (e.g. <span style='color: red'>IVS10DS, G-A, -1</span>), insertions (e.g. <span style='color: red'>84insG</span>) and deletions (e.g. <span style='color: red'>1-BP DEL, CODON 139C</span>). However, these variant entries refer to different reference sequences, so it is not easy to map them onto one single protein sequence. For each entry, OMIM cites numerous scientific publications, which provide evidence for annotation and validation.
<br style="clear:both;">
<figtable id="OMIM_SNPs">
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 1px 0 1px 0" align="left" width="1200px"
|-
| style="border-style: solid; border-width: 0 0 1px 0" | HGVS Name
| style="border-style: solid; border-width: 0 0 1px 0" | dbSNP id
| style="border-style: solid; border-width: 0 0 1px 0" | HGVS Name
| style="border-style: solid; border-width: 0 0 1px 0" | dbSNP id
| style="border-style: solid; border-width: 0 0 1px 0" | HGVS Name
| style="border-style: solid; border-width: 0 0 1px 0" | dbSNP id
| style="border-style: solid; border-width: 0 0 1px 0" | HGVS Name
| style="border-style: solid; border-width: 0 0 1px 0" | dbSNP id
|-
| L444P || rs421016 || P415R || rs121908295 || N370S || rs76763715 || R119Q || rs79653797
|-
| V394L || rs80356769 || D409H || rs1064651 || D409V|| rs77369218 || R463C || rs80356771
|-
| V460V || rs421016 || F216Y || rs74500255 || E326K || rs2230288 || E326K || rs121908297
|-
| F213I || rs381737 || <span style='color: red'>84insG</span>|| - || <span style='color: red'>IVS2DS+1G-A</span> || rs104886460 || P289L || rs121908298
|-
| R463C || rs76539814 || <span style='color: red'>72delC</span> || - || P122S || rs121908299 || Y212H || rs121908300
|-
| G478S || rs121908301 || R496H || rs80356773 || <span style='color: red'>55-bp deletion</span> || rs80356768 || V15L || rs121908302
|-
| G46E || rs77829017 || N188S || rs364897 || F216V || rs121908303|| A309V || rs78396650
|-
| W312C || rs121908304 || G325R || rs121908305 || C342G || rs121908306 || S364T || rs121908307
|-
| <span style='color: red'>259C-T transition </span>|| - || <span style='color: red'>1-BP DEL, CODON 139C</span> || - || R353G || rs121908308|| P401L || rs74598136
|-
| H311R || rs78198234 || R359X || rs121908309 || V398F || rs121908310|| G377S || rs121908311
|-
| R257U || - || R131L || rs80356763 || K79N || rs121908312 || F251L || rs121908313
|-
| L371V || rs121908314 || <span style='color: red'>IVS10DS, G-A, -1</span> || - || H255Q,D409H || rs1064651|| D443N|| -
|-
|}
<br style="clear:both;">
<caption>SNP entries in OMIM for Gaucher disease.</caption>
</figtable>
<br style="clear:both;">
   
 
== SNPedia ==
 
 
[http://www.snpedia.com/index.php/SNPedia SNPedia] is a wiki-style database providing information about the effects of SNPs. Each entry contains a description of a SNP, together with links to scientific publications and other related genomics web sites. The SNP data are supported by several personal genomics companies, e.g. 23andMe, Navigenics, deCODEme or Knome. The [http://www.snpedia.com/index.php/SNPedia:FAQ current version] of SNPedia contains 29135 SNPs.
   
There is [http://www.snpedia.com/index.php/GBA one entry for gene "GBA"] and [http://www.snpedia.com/index.php/Gaucher_disease one entry for "Gaucher Disease"]. Both list only the most important SNPs, which were experimentally confirmed to be strongly associated with Gaucher disease. <xr id="SNPedia_SNPs"/> lists the SNPs related to the GBA gene in SNPedia.
   
<br style="clear:both;">
<figtable id="SNPedia_SNPs">
{| style="border-collapse: separate; border-style: solid; border-spacing: 0; border-width: 1px 0 1px 0" align="left" width="550px"
|-
| style="border-style: solid; border-width: 0 0 1px 0" | OMIM id
| style="border-style: solid; border-width: 0 0 1px 0" | HGVS Name
| style="border-style: solid; border-width: 0 0 1px 0" | dbSNP id
| style="border-style: solid; border-width: 0 0 1px 0" | verified by
|-
| 606463.0001 || L444P || rs35095275,rs421016 ||23andMe v3, HumanOmni1Quad
|-
| 606463.0006 || D409H || rs1064651 || 23andMe v1, 23andMe v3
|-
| 606463.0013 || F213I || rs381737 || -
|-
| 606463.0026 || N188S || rs364897 || 23andMe v3
|-
| - || E326K || rs2230288 || 23andMe v3, Illumina Human 1M
|-
|}
<br style="clear:both;">
<caption>SNP entries in SNPedia for the GBA gene.</caption>
</figtable>
<br style="clear:both;">

== Mutation map and Comparison ==
   
The following map shows SNPs (non-synonymous only) from different data sources (HGMD, dbSNP, SNPdbe). The SNP data vary between databases. This might suggest that most SNPs, even though they are missense, are evolutionarily neutral and therefore have no obvious impact on the function of the gene product. <xr id="fig:compare"/> shows the PDB structure with annotated/predicted SNPs from the different databases. Residues in purple are SNPs found in at least two different databases; residues in red, green and blue are SNPs unique to HGMD, dbSNP and SNPdbe, respectively. <xr id="fig:venn_compare"/> shows the intersections and unique parts of the SNP sets. Overall, 200 SNPs (non-synonymous only) are reported by HGMD, dbSNP and SNPdbe. HGMD returned the most SNPs and SNPdbe the fewest. Only 20 SNPs are reported by all three databases; these deserve further investigation.
 
 
 
<figtable id="tab:snpdbe_map">
 
<figtable id="tab:snpdbe_map">
 
<span style='font-weight:bold'>dbsnp__Missense</span> N<span style='color: green; font-weight: bold'>N</span>LDAVALM<span style='color: green; font-weight: bold'>R</span> PDGSAVVVVL NRS<span style='color: green; font-weight: bold'>F</span>KDVP<span style='color: green; font-weight: bold'>P</span>T IK<span style='color: green; font-weight: bold'>E</span>PAVGFLE TISPGYSIH<span style='color: green; font-weight: bold'>I</span> YLW<span style='color: green; font-weight: bold'>C</span><span style='color: green; font-weight: bold'>H</span>Q
 
 
<span style='font-weight:bold'>SNPdbe_SAASs </span> ND<span style='color: blue; font-weight: bold'>R</span>D<span style='color: blue; font-weight: bold'>P</span>VALMH PDGSAV<span style='color: blue; font-weight: bold'>L</span>VV<span style='color: blue; font-weight: bold'>P</span> <span style='color: blue; font-weight: bold'>K</span><span style='color: blue; font-weight: bold'>P</span>SSKDVPLT IK<span style='color: blue; font-weight: bold'>Y</span>PAVGFLE TISPGYSIHT YLWR<span style='color: blue; font-weight: bold'>C</span>Q
 
 
 
</code>
 
</code>
<caption>Genetic variants of P04062 from different data sources. Purple: SNPs appearing in at least two different databases; Red: SNPs from HGMD; Green: SNPs from dbSNP; Blue: SNPs from SNPdbe.</caption>
 
</figtable>
 
</figtable>
   
<figure id="fig:compare">
{|
|[[File:compare.png|thumb|300px|left|<caption>Genetic variants of P04062 annotated in different databases mapped to 2nt0_A. Purple: SNPs with overlaps from at least two different databases, Red: SNPs from HGMD, Green: SNPs from dbSNP, Blue: SNPs from SNPdbe</caption>]]
|}
</figure>
   
 
<figure id="fig:venn_compare">[[File:venn_compare.png|thumb|300px|left|<caption> The Venn diagram to compare the SNPs in protein P04062 from different data source.</caption>]]</figure>
 
 
<br style="clear:both;">
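The kind of overlap summarised in the Venn diagram can be reproduced with plain set operations. The sketch below is only an illustration: it uses the last row (positions 481 to 536) of the three sequence maps on this page, treats a substitution as a (position, wildtype, mutant) tuple, and uses the hypothetical helper name `substitutions`. The source does not state whether the Venn diagram compares positions or full substitutions, and the maps show only one mutant residue per position, so the counts here are not meant to match the diagram.

```python
def substitutions(wildtype, variant, start):
    """Return the set of (position, wildtype residue, mutant residue) differences."""
    wt = wildtype.replace(" ", "")
    var = variant.replace(" ", "")
    if len(wt) != len(var):
        raise ValueError("rows must have the same length")
    return {(start + i, a, b) for i, (a, b) in enumerate(zip(wt, var)) if a != b}


# Last row of each map on this page (positions 481-536).
WILDTYPE = "NDLDAVALMH PDGSAVVVVL NRSSKDVPLT IKDPAVGFLE TISPGYSIHT YLWRRQ"
HGMD = "NDRDPVALMR PDGSPVVVMP KPSSKDVPLT IKYPAVSFPE TISPGYPTHI YLWRCR"
DBSNP = "NNLDAVALMR PDGSAVVVVL NRSFKDVPPT IKEPAVGFLE TISPGYSIHI YLWCHQ"
SNPDBE = "NDRDPVALMH PDGSAVLVVP KPSSKDVPLT IKYPAVGFLE TISPGYSIHT YLWRCQ"

hgmd = substitutions(WILDTYPE, HGMD, 481)
dbsnp = substitutions(WILDTYPE, DBSNP, 481)
snpdbe = substitutions(WILDTYPE, SNPDBE, 481)

shared_by_all = hgmd & dbsnp & snpdbe          # reported by all three sources
in_two_or_more = (hgmd & dbsnp) | (hgmd & snpdbe) | (dbsnp & snpdbe)
only_hgmd = hgmd - dbsnp - snpdbe              # unique to HGMD

print(len(hgmd), len(dbsnp), len(snpdbe))
print(sorted(in_two_or_more))
```

Comparing full tuples is stricter than comparing positions: positions 513 and 535 are mutated in all three rows, but the mutant residue differs between sources, so they do not count as shared by all three here.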
 

== Discussion ==

In HGMD, the free version returns well-organized SNP entries for the given search keywords, i.e. the mutation type of each entry is clearly defined. Each entry also links to the relevant scientific publications. However, the free version is less up to date than the professional version.

dbSNP is a powerful online tool provided by NCBI. Because it is integrated into the NCBI online workbench, it offers abundant links to related items, which help the user find further information. Unlike the other databases, dbSNP also provides synonymous SNPs.

SNPdbe is the only database that attempts to annotate experimental SNP data automatically. The layout of the database is well designed and the search engine is powerful. However, the output format could be made more user-friendly.

OMIM contains only verified SNP data, together with abundant scientific articles that provide further information about the related genetic diseases. Similarly, the SNPs in SNPedia are verified by private genomics companies.
   
 
== References ==
 

Latest revision as of 18:22, 12 June 2012

The aim of this task was to map non-synonymous SNPs known for the Gaucher disease-causing gene GBA onto the corresponding protein sequence P04062. Several databases were taken into account, namely HGMD<ref name="hgmd">Stenson et al. (2009). The Human Gene Mutation Database (HGMD): 2008 Update. Genome Med</ref>, dbSNP<ref name="dbsnp">Sherry ST, Ward M, and Sirotkin, K. (1999). dbSNP - database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Research</ref>, SNPdbe <ref name="snpdbe">C Schaefer, A Meier, B Rost. (2012). SNPdbe: constructing an nsSNP functional impacts database. Bioinformatics</ref>, and OMIM <ref name="omim">Hamosh, A.; Scott, A.; Amberger, J.; Bocchini, C.; McKusick, V. (2004). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research</ref>. Technical details are reported in our protocol.

HGMD

The Human Gene Mutation Database (HGMD) is a collection of observed mutations in the human genome that are associated with various diseases. The mutations are extracted from publications, both manually and automatically [1]. There is a public version with 88745 entries (9th June, 2012) and a professional version with 123565 entries. Since 2009, the public version has been extended by only a few new entries, so it no longer contains all currently known mutations [2]. HGMD encompasses the following types of mutations:

; Missense : Mutations which result in another amino acid
; Nonsense : Mutations which result in a stop codon
; Splicing : Mutations which alter splice sites
; Regulatory : Mutations which alter gene regulation
; Small/gross deletions : Deletions which remove residues
; Small/gross insertions : Insertions which give rise to new residues
; Small indels : Insertions or deletions
; Duplications : Duplication of gene fragments
; Complex rearrangements : Change in the location of gene fragments
; Repeat variations : Varying number of sequence copies

<xr id="tab:hgmd_number"/> lists the number of mutations annotated in HGMD for GBA. A comprehensive list of all missense/nonsense mutations can be obtained by following the respective hyperlink.

<figtable id="tab:hgmd_number">

Type Count
Missense/nonsense 250
Splicing 15
Regulatory 0
Small deletions 26
Small insertions 13
Small indels 2
Gross deletions 3
Gross insertions/duplications 1
Complex rearrangements 15
Public total 325

Number of mutations in HGMD for GBA. </figtable>

<xr id="tab:hgmd_map"/> shows the wildtype version of P04062 with non-synonymous mutations highlighted in red. Since the transcript NM_001005741.1 already referred to P04062, the amino acid positions could be used directly for creating <xr id="tab:hgmd_map"/> without any additional translation.

<figtable id="tab:hgmd_map">

               1          11         21         31         41         51         61         71        
Wildtype       MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVVCVCNAT YCDSFDPPTF PALGTFSRYE
Non-synonymous MEFSSPSREE CPRPLSRVSI MAGSLTGLLL LQAVS!ASGA RPCIPKSFSY ISVLSVCNAT YCNSFDPPTF PALSTVSRYK
               81         91         101        111        121        131        141        151       
Wildtype       STRSGRRMEL SMGPIQANHT GTGLLLTLQP EQKFQKVKGF GGAMTDAAAL NILALSPPAQ NLLLKSYFSE EGIGYNIIRV
Non-synonymous NIRSE!QMEL SMGPIQANHT GTGLPLTLQP E!!FQKANGF GGATTDAATL NILALSPPAQ NLLRKLCVSK EEIGYDISQA
               161        171        181        191        201        211        221        231       
Wildtype       PMASCDFSIR TYTYADTPDD FQLHNFSLPE EDTKLKIPLI HRALQLAQRP VSLLASPWTS PTWLKTNGAV NGKGSLKGQP
Non-synonymous LVASCVFSIL TYP!EDTPHD FQLHNFSLPE ADTKLQILLS PQALQLA!CP V!PLDSS!PS LTRFKTKVEG NGKWPPEGQP
               241        251        261        271        281        291        301        311       
Wildtype       GDIYHQTWAR YFVKFLDAYA EHKLQFWAVT AENEPSAGLL SGYPFQCLGF TPEHQRDFIA RDLGPTLANS THHNVRLLML
Non-synonymous EDICHQTRVR HCVKYLDACA EHKLQFWAVR A!NETPAVLL SVHHFQCQGL TPEQQ!DLTA HDLDATLANG THHNVRLPML
               321        331        341        351        361        371        381        391       
Wildtype       DDQRLLLPHW AKVVLTDPEA AKYVHGIAVH WYLDFLAPAK ATLGETHRLF PNTMLFASEA CVGSKFWEQS VRLGSWDRGM
Non-synonymous DDQHLLPLHW AKVVLTDPEA AI!LHGIVVR RHLHFLDSAK AIPWKTHCLS PNTMPFASET YVGSKFGK!S LWLDF!D!GI
               401        411        421        431        441        451        461        471       
Wildtype       QYSHSIITNL LYHVVGWTDW NLALNPEGGP NWVRNFVDSP IIVDITKDTF YKQPMFYHLG HFSKFIPEGS QRVGLVASQK
Non-synonymous QCRHGIIMKV LYHLVS!TA! KRDPNL!ERL IRLCTSFYSL FIVDITKHTI HQRRVVCHLD HFSEVIPEGS QGVGLVASQK
               481        491        501        511        521        531       
Wildtype       NDLDAVALMH PDGSAVVVVL NRSSKDVPLT IKDPAVGFLE TISPGYSIHT YLWRRQ
Non-synonymous NDRDPVALMR PDGSPVVVMP KPSSKDVPLT IKYPAVSFPE TISPGYPTHI YLWRCR

Genetic variants of P04062 annotated in HGMD. Red: non-synonymous mutations; </figtable>
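Because the map uses UniProt positions directly, the individual substitutions can be read off by comparing the two rows character by character. The following is a minimal sketch of that comparison for the first row of the map (positions 1 to 80); the helper name `list_substitutions` is hypothetical, and treating the `!` symbol as a stop codon (written `*`) is an assumption, since the page does not define the symbol.

```python
def list_substitutions(wildtype, variant, start=1, stop_symbol="!"):
    """List differences between two aligned rows in short form, e.g. 'K13R'."""
    wt = wildtype.replace(" ", "")
    var = variant.replace(" ", "")
    if len(wt) != len(var):
        raise ValueError("rows must have the same length")
    subs = []
    for offset, (ref, alt) in enumerate(zip(wt, var)):
        if ref != alt:
            # Assumption: '!' in the map marks a stop codon (nonsense mutation).
            alt = "*" if alt == stop_symbol else alt
            subs.append(f"{ref}{start + offset}{alt}")
    return subs


# First row of the HGMD map above (positions 1-80).
WT_ROW = ("MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA "
          "RPCIPKSFGY SSVVCVCNAT YCDSFDPPTF PALGTFSRYE")
HGMD_ROW = ("MEFSSPSREE CPRPLSRVSI MAGSLTGLLL LQAVS!ASGA "
            "RPCIPKSFSY ISVLSVCNAT YCNSFDPPTF PALSTVSRYK")

hgmd_first_row = list_substitutions(WT_ROW, HGMD_ROW)
print(hgmd_first_row)
```

The map stores a single mutant residue per position, so positions with several reported substitutions appear only once in such a list.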

<figure id="fig:hgmd_2nt0_A">

Genetic variants of P04062 annotated in HGMD mapped to 2nt0_A. Red: non-synonymous mutations

</figure>

In order to get an impression of which structural elements are subject to mutations, we mapped the mutations of <xr id="tab:hgmd_map"/> from P04062 onto its closest crystal structure, 2nt0. The mutations were scattered all over the structure without major clusters in particular domains (cf. <xr id="fig:hgmd_2nt0_A"/>).


dbSNP

dbSNP is a publicly available collection of genetic variations from different species. Genetic variations include (1) SNPs, (2) short deletion and insertion polymorphisms (indels/DIPs), (3) microsatellite markers or short tandem repeats (STRs), (4) multinucleotide polymorphisms (MNPs), (5) heterozygous sequences, and (6) named variants. Genetic variants are submitted by, amongst others, research institutions, large scale genome sequencing centers, and other SNP databases. Statistics about the latest release can be found here.

Variations of GBA in dbSNP included 92 missense mutations, 47 synonymous mutations and 3 deletions. Missense and synonymous mutations are visualized in <xr id="tab:dbsnp_map"/>. All three deletions cause a frame-shift due to the loss of a single nucleotide, which severely alters the amino acid sequence downstream of the mutated position. The effect would be less severe if a complete codon were inserted or deleted, which would change the length of the protein sequence but leave the remaining residues unchanged.
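The difference between a single-nucleotide deletion and the deletion of a complete codon can be illustrated with a short translation sketch. The DNA string below is a made-up toy sequence, not part of GBA, and the helper names `translate` and `delete` are hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons from the first base; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete(dna, start, length):
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return dna[:start] + dna[start + length:]


toy = "ATGGCTAAAGGTCTG"              # toy sequence, not taken from GBA

original = translate(toy)                 # unchanged reading frame
frameshift = translate(delete(toy, 4, 1)) # one nucleotide lost: frame shifts
in_frame = translate(delete(toy, 3, 3))   # whole codon lost: frame preserved

print(original, frameshift, in_frame)
```

After the single-nucleotide deletion every downstream residue can change, whereas the codon deletion removes one residue and leaves the others as they were.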

<figtable id="tab:dbsnp_map">

               1          11         21         31         41         51         61         71        
Wildtype       MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVVCVCNAT YCDSFDPPTF PALGTFSRYE
Non-synonymous MEFSSPSREE CPRPSGGVSV MAGSLTGLLL LQAVSWASGA CPCIPESFGY SSVVCVCNAT YYESFDPRTF PALGTLSCYK
Synonymous     MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVVCVCNAT YCDSFDPPTF PALGTFSRYE
               81         91         101        111        121        131        141        151       
Wildtype       STRSGRRMEL SMGPIQANHT GTGLLLTLQP EQKFQKVKGF GGAMTDAAAL NILALSPPAQ NLLLKSYFSE EGIGYNIIRV
Non-synonymous SRSSGQWMEL STGPIQANYT GTGLLLTLQP EQKFQKVKGF GGAMTDAAAL NILALSPPAQ NLLLKSYFSE EGIGYKISWV
Synonymous     STRSGRRMEL SMGPIQANHT GTGLLLTLQP EQKFQKVKGF GGAMTDAAAL NILALSPPAQ NLLLKSYFSE EGIGYNIIRV
               161        171        181        191        201        211        221        231       
Wildtype       PMASCDFSIR TYTYADTPDD FQLHNFSLPE EDTKLKIPLI HRALQLAQRP VSLLASPWTS PTWLKTNGAV NGKGSLKGQP
Non-synonymous LIASCVFSIL TYIYEDTPHD FQLHNFSLPE EDTKLNILLN PRALQLAQRP ISLLASPWTS LTRLKTKVEE NGKESLKGQP
Synonymous     PMASCDFSIR TYTYADTPDD FQLHNFSLPE EDTKLKIPLI HRALQLAQRP VSLLASPWTS PTWLKTNGAV NGKGSLKGQP
               241        251        261        271        281        291        301        311       
Wildtype       GDIYHQTWAR YFVKFLDAYA EHKLQFWAVT AENEPSAGLL SGYPFQCLGF TPEHQRDFIA RDLGPTLANS THHNVRLLML
Non-synonymous EDICHQTWAR YLVKFLDAYA EHKLQFWAVR AENEPSAGLL NGYPFQCLGF TPEHQRDFIA HDLDRTLANG THHNVRLLML
Synonymous     GDIYHQTWAR YFVKFLDAYA EHKLQFWAVT AENEPSAGLL SGYPFQCLGF TPEHQRDFIA RDLGPTLANS THHNVRLLML
               321        331        341        351        361        371        381        391       
Wildtype       DDQRLLLPHW AKVVLTDPEA AKYVHGIAVH WYLDFLAPAK ATLGETHRLF PNTMLFASEA CVGSKFWEQS VRLGSWDRGM
Non-synonymous DDQHLLLPQW AKVVLTDPEA AICVYGIAVH WYLDFLDPAK ATLGETHHLF PNTMLFASEA CVGSKFWEQS VQLGSWDQGI
Synonymous     DDQRLLLPHW AKVVLTDPEA AKYVHGIAVH WYLDFLAPAK ATLGETHRLF PNTMLFASEA CVGSKFWEQS VRLGSWDRGM
               401        411        421        431        441        451        461        471       
Wildtype       QYSHSIITNL LYHVVGWTDW NLALNPEGGP NWVRNFVDSP IIVDITKDTF YKQPMFYHLG HFSKFIPEGS QRVGLVASQK
Non-synonymous QCSHRIIMNL LYHVVGCTAW NIALNPKGGL IWVRTSVESP TIVDITKETL YKQPILCHLG HFSKFIPEGS QRVGLVASPK
Synonymous     QYSHSIITNL LYHVVGWTDW NLALNPEGGP NWVRNFVDSP IIVDITKDTF YKQPMFYHLG HFSKFIPEGS QRVGLVASQK
               481        491        501        511        521        531       
Wildtype       NDLDAVALMH PDGSAVVVVL NRSSKDVPLT IKDPAVGFLE TISPGYSIHT YLWRRQ
Non-synonymous NNLDAVALMR PDGSAVVVVL NRSFKDVPPT IKEPAVGFLE TISPGYSIHI YLWCHQ
Synonymous     NDLDAVALMH PDGSAVVVVL NRSSKDVPLT IKDPAVGFLE TISPGYSIHT YLWRRQ

Genetic variants of P04062 annotated in dbSNP. Red: missense mutations; Green: synonymous mutations; Purple: missense and synonymous mutations. </figtable>

<figure id="fig:dbsnp_2nt0_A">

Genetic variants of P04062 annotated in dbSNP mapped to 2nt0_A. Red: missense mutations; Green: synonymous mutations; Purple: missense and synonymous mutations.

</figure>

<figure id="fig:hgmd_dbsnp">

Overlap of non-synonymous mutations between HGMD and dbSNP.

</figure>


<xr id="fig:dbsnp_2nt0_A"/> visualises the genetic variants of <xr id="tab:dbsnp_map"/> in 2nt0_A. As in <xr id="fig:hgmd_2nt0_A"/>, the mutations were distributed fairly evenly, with a slightly higher concentration in the glycosyl hydrolase domain (the domain on the left-hand side).

<xr id="fig:hgmd_dbsnp"/> depicts the overlap of non-synonymous mutations between HGMD and dbSNP. It shows that HGMD comprises considerably more non-synonymous mutations than dbSNP, although its public version has received only a few new entries since 2009.

SNPdbe

SNPdbe (nsSNP database of functional effects) is a database and web interface that provides computationally predicted functional impacts of SNPs. Instead of a SNP, each entry represents a SAAS (single amino acid substitution), i.e. a non-synonymous SNP (nsSNP) that changes one amino acid in the gene product. The impact of each SAAS is predicted with the SNAP and SIFT algorithms and is augmented with information and disease associations from PMD, OMIM and UniProt. In the current version (2012/03/05), SNPdbe contains 1691464 entries/SAASs belonging to 2985 organisms and 159142 protein sequences.

We searched SNPdbe by gene symbol (GBA), UniProt protein ID (P04062), and disease name (gaucher). Each query returned a different number of SNPs (GBA: 563, P04062: 163, gaucher: 174). All result sets also included SAASs located on similar protein sequences from different databases (e.g. NCBI or UniProt).

In SNPdbe, two evolutionary conservation scores are available for each SAAS: one for the wildtype residue and one for the mutant residue. If the score of the wildtype residue is much higher than that of the mutant, the mutant is more likely to be harmful and could therefore be disease causing. Restricting the search results to the protein sequence P04062 leaves 65 SAASs, shown in red in the following map. None of them has experimental evidence. SAASs with a low conservation score are labeled in blue:

<figtable id="tab:snpdbe_map">

         1          11         21         31         41         51         61         71        
Wildtype MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVVCVCNAT YCDSFDPPTF PALGTFSRYE
SAASs    MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVVSVCNAT YCNSFDPPTF PALGTVSRYE
         81         91         101        111        121        131        141        151       
Wildtype STRSGRRMEL SMGPIQANHT GTGLLLTLQP EQKFQKVKGF GGAMTDAAAL NILALSPPAQ NLLLKSYFSE EGIGYNIIRV
SAASs    STRSGRRMEL SMGPIQANHT GTGLLLTLQP EQKFQKVKGF GGAMTDAATL NILALSPPAQ NLLLKLYFSE EEIGYDITRV
         161        171        181        191        201        211        221        231       
Wildtype PMASCDFSIR TYTYADTPDD FQLHNFSLPE EDTKLKIPLI HRALQLAQRP VSLLASPWTS PTWLKTNGAV NGKGSLKGQP
SAASs    PVASCDFSIC TYPYADTPDD FQLHNFSLPE EDTKLKITLI HRALQLAQPP VSFLDSSWTS TTWFKTNGTV NEKWSLEGQP
         241        251        261        271        281        291        301        311       
Wildtype GDIYHQTWAR YFVKFLDAYA EHKLQFWAVT AENEPSAGLL SGYPFQCLGF TPEHQRDFIA RDLGPTLANS THHNVRLLML
SAASs    RDIYHQTWAR YFVKFLDAYA EHKLQFWAVT AENEPPAGLL SGYPFQCLGF TPEQQRDLIA RDIGPTLANS THHNVRLLML
         321        331        341        351        361        371        381        391       
Wildtype DDQRLLLPHW AKVVLTDPEA AKYVHGIAVH WYLDFLAPAK ATLGETHRLF PNTMLFASEA CVGSKFWEQS VRLGSWDRGM
SAASs    DDQCLLLPHW AKVVLTDPEA AKYVHGIAVH WHLHFLAPAK ATPGETHRLF PNTMLFASET CVGSKFWKQS LWLGSWDRGM
         401        411        421        431        441        451        461        471       
Wildtype QYSHSIITNL LYHVVGWTDW NLALNPEGGP NWVRNFVDSP IIVDITKDTF YKQPMFYHLG HFSKFIPEGS QRVGLVASQK
SAASs    QYSHNIITNL LYHLVGGTHW KLALNLEERP NRVRNFLNSP FIVDITKDTI HQQPVVYHLD HFSEFIPEGS QRVGLVASQK
         481        491        501        511        521        531       
Wildtype NDLDAVALMH PDGSAVVVVL NRSSKDVPLT IKDPAVGFLE TISPGYSIHT YLWRRQ
SAASs    NDRDPVALMH PDGSAVLVVP KPSSKDVPLT IKYPAVGFLE TISPGYSIHT YLWRCQ

Genetic variants of P04062 annotated in SNPdbe. Red: missense mutations (SAASs), Blue: SAASs with low conservation score. </figtable>
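The conservation-score heuristic described for SNPdbe can be sketched in a few lines of Python. This is only an illustration: the record layout, the score scale, the threshold `min_gap` and the two example variants are assumptions made for this sketch and are not values taken from SNPdbe.

```python
from typing import NamedTuple


class SAAS(NamedTuple):
    """One single amino acid substitution with its two conservation scores."""
    name: str
    wildtype_score: float  # conservation score of the wildtype residue
    mutant_score: float    # conservation score of the mutant residue


def likely_harmful(saas: SAAS, min_gap: float = 0.5) -> bool:
    """Flag a SAAS whose wildtype score exceeds the mutant score by min_gap.

    min_gap is a hypothetical threshold chosen for illustration only.
    """
    return (saas.wildtype_score - saas.mutant_score) >= min_gap


# Made-up example records (NOT real SNPdbe scores).
examples = [
    SAAS("variant_a", wildtype_score=0.9, mutant_score=0.1),
    SAAS("variant_b", wildtype_score=0.4, mutant_score=0.3),
]

flagged = [s.name for s in examples if likely_harmful(s)]
print(flagged)
```

A single difference threshold is the simplest reading of "much bigger"; a ratio or a rank-based cut-off would be equally defensible.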

<figure id="fig:snpdbe_2nt0_A">

Genetic variants of P04062 annotated in SNPdbe mapped to 2nt0_A. Red: non-synonymous mutations

</figure>

OMIM

OMIM (Online Mendelian Inheritance in Man) is a database containing information on all known genetic disorders and the corresponding genes in the human genome, with a particular focus on the molecular relationship between phenotype and genotype. In the current version (8 June 2012), there are 19971 autosomal, 1171 X-linked, 59 Y-linked and 65 mitochondrial entries. Each entry is either a gene description with or without phenotype information, or a phenotype description with or without a known molecular basis.

Searching OMIM for "GBA" returned 27 entries. Seven of them are clearly related to Gaucher disease; the others are related to Parkinson disease, prostate cancer, etc. Among them, *606463 is the most representative entry and contains 48 variants (see <xr id="OMIM_SNPs"/>, where the entries are ordered by OMIM entry number rather than by sequence position). Most of them have links to dbSNP. Most are missense mutations, but other mutation types are also included (colored in red), such as splice-site mutations (e.g. IVS10DS, G-A, -1), insertions (e.g. 84insG) and deletions (e.g. 1-BP DEL, CODON 139C). However, the variant entries refer to different reference sequences, so it is not easy to map them onto a single protein sequence. For each entry, OMIM cites numerous scientific publications, which provide confirmed evidence for annotation and validation purposes.


<figtable id="OMIM_SNPs">

{| class="wikitable"
! HGVS Name !! dbSNP id !! HGVS Name !! dbSNP id !! HGVS Name !! dbSNP id !! HGVS Name !! dbSNP id
|-
| L444P || rs421016 || P415R || rs121908295 || N370S || rs76763715 || R119Q || rs79653797
|-
| V394L || rs80356769 || D409H || rs1064651 || D409V || rs77369218 || R463C || rs80356771
|-
| V460V || rs421016 || F216Y || rs74500255 || E326K || rs2230288 || E326K || rs121908297
|-
| F213I || rs381737 || 84insG || - || IVS2DS+1G-A || rs104886460 || P289L || rs121908298
|-
| R463C || rs76539814 || 72delC || - || P122S || rs121908299 || Y212H || rs121908300
|-
| G478S || rs121908301 || R496H || rs80356773 || 55-bp deletion || rs80356768 || V15L || rs121908302
|-
| G46E || rs77829017 || N188S || rs364897 || F216V || rs121908303 || A309V || rs78396650
|-
| W312C || rs121908304 || G325R || rs121908305 || C342G || rs121908306 || S364T || rs121908307
|-
| 259C-T transition || - || 1-BP DEL, CODON 139C || - || R353G || rs121908308 || P401L || rs74598136
|-
| H311R || rs78198234 || R359X || rs121908309 || V398F || rs121908310 || G377S || rs121908311
|-
| R257U || - || R131L || rs80356763 || K79N || rs121908312 || F251L || rs121908313
|-
| L371V || rs121908314 || IVS10DS, G-A, -1 || - || H255Q,D409H || rs1064651 || D443N || -
|}


SNP entries in OMIM for Gaucher disease. </figtable>
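The difficulty of mapping OMIM variant names onto one protein sequence can be made concrete with a small consistency check. The sketch below parses simple substitution names such as N370S and tests whether the wildtype residue matches P04062 at the named position plus a numbering offset. The offset of 39 (legacy names counted on the mature protein, i.e. without the signal peptide) is an assumption that does not come from this page, and the function names are hypothetical.

```python
import re

# Residues 401-480 of P04062, copied from the wildtype row of the
# comparison map on this page.
SEGMENT_START = 401
SEGMENT = (
    "QYSHSIITNL" "LYHVVGWTDW" "NLALNPEGGP" "NWVRNFVDSP"
    "IIVDITKDTF" "YKQPMFYHLG" "HFSKFIPEGS" "QRVGLVASQK"
)


def parse_substitution(name):
    """Split a name like 'N370S' into (wildtype, position, mutant).

    Returns None for names that are not simple substitutions
    (e.g. '84insG' or 'IVS10DS, G-A, -1').
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


def matches_reference(name, offset):
    """Check the wildtype residue of `name` against SEGMENT.

    `offset` is added to the position in the name before the lookup.
    Returns True/False, or None if the name cannot be parsed or the
    shifted position lies outside the stored segment.
    """
    parsed = parse_substitution(name)
    if parsed is None:
        return None
    wildtype, position, _mutant = parsed
    index = position + offset - SEGMENT_START
    if not 0 <= index < len(SEGMENT):
        return None
    return SEGMENT[index] == wildtype


# Assumed offset of 39 residues; 0 would mean "already in P04062 numbering".
for variant in ("N370S", "V394L", "D409H"):
    print(variant, matches_reference(variant, 39), matches_reference(variant, 0))
```

A check like this only shows that an offset is consistent with the wildtype residue; it does not prove that the offset is the right one for every entry, and non-substitution entries still have to be mapped by hand.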

SNPedia

SNPedia is a wiki-style database providing information about the effects of SNPs. Each entry contains a description of a SNP, together with links to scientific publications and other related genomics web sites. The SNP data are supported by several personal genomics companies, e.g. 23andMe, Navigenics, deCODEme or Knome. The current version of SNPedia contains 29135 SNPs.

There is one entry for the gene "GBA" and one for "Gaucher Disease". Both list only the most important SNPs, which were experimentally confirmed to be strongly associated with Gaucher disease. <xr id="SNPedia_SNPs"/> lists the SNPs related to the GBA gene in SNPedia.


<figtable id="SNPedia_SNPs">

{| class="wikitable"
! OMIM id !! HGVS Name !! dbSNP id !! verified by
|-
| 606463.0001 || L444P || rs35095275, rs421016 || 23andMe v3, HumanOmni1Quad
|-
| 606463.0006 || D409H || rs1064651 || 23andMe v1, 23andMe v3
|-
| 606463.0013 || F213I || rs381737 || -
|-
| 606463.0026 || N188S || rs364897 || 23andMe v3
|-
| - || E326K || rs2230288 || 23andMe v3, Illumina Human 1M
|}


SNP entries in SNPedia for the GBA gene. </figtable>

Mutation map and Comparison

The following map shows the SNPs (non-synonymous only) from the different data sources (HGMD, dbSNP, SNPdbe). The SNP data vary between the databases. This might suggest that most SNPs, even though they are missense, are evolutionarily neutral and therefore have no obvious impact on the function of the gene product. <xr id="fig:compare"/> shows the PDB structure with the annotated/predicted SNPs from the different databases. The residues in purple are SNPs found in at least two different databases. The residues in red are SNPs unique to HGMD, those in green are unique to dbSNP and those in blue are unique to SNPdbe. <xr id="fig:venn_compare"/> shows the intersections and the unique parts of the SNP sets from the different databases. Overall, 200 SNPs (non-synonymous only) are reported by HGMD, dbSNP and SNPdbe. HGMD returns the most SNPs and SNPdbe the fewest. Only 20 SNPs are reported by all three databases; these deserve further investigation.

<figtable id="tab:snpdbe_map">

                1          11         21         31         41         51         61         71        
Wildtype        MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVVCVCNAT YCDSFDPPTF PALGTFSRYE
HGMD_Missense   MEFSSPSREE CPRPLSRVSI MAGSLTGLLL LQAVS!ASGA RPCIPKSFSY ISVLSVCNAT YCNSFDPPTF PALSTVSRYK
dbsnp__Missense MEFSSPSREE CPRPSGGVSV MAGSLTGLLL LQAVSWASGA CPCIPESFGY SSVVCVCNAT YYESFDPRTF PALGTLSCYK
SNPdbe_SAASs    MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA RPCIPKSFGY SSVVSVCNAT YCNSFDPPTF PALGTVSRYE
                81         91         101        111        121        131        141        151       
Wildtype        STRSGRRMEL SMGPIQANHT GTGLLLTLQP EQKFQKVKGF GGAMTDAAAL NILALSPPAQ NLLLKSYFSE EGIGYNIIRV
HGMD_Missense   NIRSE!QMEL SMGPIQANHT GTGLPLTLQP E!!FQKANGF GGATTDAATL NILALSPPAQ NLLRKLCVSK EEIGYDISQA
dbsnp__Missense SRSSGQWMEL STGPIQANYT GTGLLLTLQP EQKFQKVKGF GGAMTDAAAL NILALSPPAQ NLLLKSYFSE EGIGYKISWV
SNPdbe_SAASs    STRSGRRMEL SMGPIQANHT GTGLLLTLQP EQKFQKVKGF GGAMTDAATL NILALSPPAQ NLLLKLYFSE EEIGYDITRV
                161        171        181        191        201        211        221        231       
Wildtype        PMASCDFSIR TYTYADTPDD FQLHNFSLPE EDTKLKIPLI HRALQLAQRP VSLLASPWTS PTWLKTNGAV NGKGSLKGQP
HGMD_Missense   LVASCVFSIL TYP!EDTPHD FQLHNFSLPE ADTKLQILLS PQALQLA!CP V!PLDSS!PS LTRFKTKVEG NGKWPPEGQP
dbsnp__Missense LIASCVFSIL TYIYEDTPHD FQLHNFSLPE EDTKLNILLN PRALQLAQRP ISLLASPWTS LTRLKTKVEE NGKESLKGQP
SNPdbe_SAASs    PVASCDFSIC TYPYADTPDD FQLHNFSLPE EDTKLKITLI HRALQLAQCP VSFLDSSWTS TTWFKTNGTV NEKWSLEGQP
                241        251        261        271        281        291        301        311       
Wildtype        GDIYHQTWAR YFVKFLDAYA EHKLQFWAVT AENEPSAGLL SGYPFQCLGF TPEHQRDFIA RDLGPTLANS THHNVRLLML
HGMD_Missense   EDICHQTRVR HCVKYLDACA EHKLQFWAVR A!NETPAVLL SVHHFQCQGL TPEQQ!DLTA HDLDATLANG THHNVRLPML
dbsnp__Missense EDICHQTWAR YLVKFLDAYA EHKLQFWAVR AENEPSAGLL NGYPFQCLGF TPEHQRDFIA HDLDRTLANG THHNVRLLML
SNPdbe_SAASs    RDIYHQTWAR YFVKFLDAYA EHKLQFWAVT AENEPPAGLL SGYPFQCLGF TPEQQRDLIA RDIGPTLANS THHNVRLLML
                321        331        341        351        361        371        381        391       
Wildtype        DDQRLLLPHW AKVVLTDPEA AKYVHGIAVH WYLDFLAPAK ATLGETHRLF PNTMLFASEA CVGSKFWEQS VRLGSWDRGM
HGMD_Missense   DDQHLLPLHW AKVVLTDPEA AI!LHGIVVR RHLHFLDSAK AIPWKTHCLS PNTMPFASET YVGSKFGK!S LWLDF!D!GI
dbsnp__Missense DDQHLLLPQW AKVVLTDPEA AICVYGIAVH WYLDFLDPAK ATLGETHHLF PNTMLFASEA CVGSKFWEQS VQLGSWDQGI
SNPdbe_SAASs    DDQCLLLPHW AKVVLTDPEA AKYVHGIAVH WHLHFLAPAK ATPGETHRLF PNTMLFASET CVGSKFWKQS LWLGSWDRGM
                401        411        421        431        441        451        461        471       
Wildtype        QYSHSIITNL LYHVVGWTDW NLALNPEGGP NWVRNFVDSP IIVDITKDTF YKQPMFYHLG HFSKFIPEGS QRVGLVASQK
HGMD_Missense   QCRHGIIMKV LYHLVS!TA! KRDPNL!ERL IRLCTSFYSL FIVDITKHTI HQRRVVCHLD HFSEVIPEGS QGVGLVASQK
dbsnp__Missense QCSHRIIMNL LYHVVGCTAW NIALNPKGGL IWVRTSVESP TIVDITKETL YKQPILCHLG HFSKFIPEGS QRVGLVASPK
SNPdbe_SAASs    QYSHGIITNL LYHLVGGTHW KLALNLEERP NRVRNFLNSP FIVDITKDTI HQQPVVYHLD HFSEFIPEGS QRVGLVASQK
                481        491        501        511        521        531       
Wildtype        NDLDAVALMH PDGSAVVVVL NRSSKDVPLT IKDPAVGFLE TISPGYSIHT YLWRRQ
HGMD_Missense   NDRDPVALMR PDGSPVVVMP KPSSKDVPLT IKYPAVSFPE TISPGYPTHI YLWRCR
dbsnp__Missense NNLDAVALMR PDGSAVVVVL NRSFKDVPPT IKEPAVGFLE TISPGYSIHI YLWCHQ
SNPdbe_SAASs    NDRDPVALMH PDGSAVLVVP KPSSKDVPLT IKYPAVGFLE TISPGYSIHT YLWRCQ

Genetic variants of P04062 from different data sources. Purple: SNPs appearing in at least two different databases, Red: SNPs from HGMD, Green: SNPs from dbSNP, Blue: SNPs from SNPdbe. </figtable>
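Because the color coding of the map is easily lost in a plain-text copy, the substitutions can also be recovered by comparing a database row with the wildtype row position by position. The following Python sketch does this for residues 1 to 80 of the SNPdbe row; the function name is hypothetical. Note that the map shows one letter per position, so a position with several reported variants yields only the one that was drawn.

```python
def list_substitutions(wildtype_row, variant_row, start=1):
    """Return substitutions such as 'C55S' by comparing two map rows.

    Spaces between the blocks of ten residues are ignored.
    `start` is the sequence position of the first residue in the rows.
    """
    wildtype = wildtype_row.replace(" ", "")
    variant = variant_row.replace(" ", "")
    if len(wildtype) != len(variant):
        raise ValueError("rows must cover the same residues")
    return [
        f"{wt}{position}{var}"
        for position, (wt, var) in enumerate(zip(wildtype, variant), start=start)
        if wt != var
    ]


# Residues 1-80, copied from the Wildtype and SNPdbe_SAASs rows of the map.
wildtype_1_80 = (
    "MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA "
    "RPCIPKSFGY SSVVCVCNAT YCDSFDPPTF PALGTFSRYE"
)
snpdbe_1_80 = (
    "MEFSSPSREE CPKPLSRVSI MAGSLTGLLL LQAVSWASGA "
    "RPCIPKSFGY SSVVSVCNAT YCNSFDPPTF PALGTVSRYE"
)

snpdbe_substitutions = list_substitutions(wildtype_1_80, snpdbe_1_80)
print(snpdbe_substitutions)
```

For later blocks of the map, `start` has to be set to the position printed above the first column (81, 161, and so on).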

<figure id="fig:compare">

Genetic variants of P04062 annotated in different databases mapped to 2nt0_A. Purple: SNPs with overlaps from at least two different databases, Red: SNPs from HGMD, Green: SNPs from dbSNP, Blue: SNPs from SNPdbe

</figure>

<figure id="fig:venn_compare">

Venn diagram comparing the SNPs in protein P04062 from the different data sources.

</figure>
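The regions of such a Venn diagram can be computed with plain set operations. The Python sketch below is a minimal illustration, not the procedure used to draw the figure: the helper `venn_regions` is hypothetical, and the three input sets cover only residues 1 to 80, read off the corresponding rows of the comparison map (the "!" symbol is copied from the HGMD row as it appears there).

```python
def venn_regions(sets_by_name):
    """Group items by the exact combination of databases that report them.

    Returns a dict mapping a frozenset of database names to the set of
    items found in exactly those databases (empty regions are omitted).
    """
    regions = {}
    all_items = set().union(*sets_by_name.values())
    for item in all_items:
        key = frozenset(
            name for name, items in sets_by_name.items() if item in items
        )
        regions.setdefault(key, set()).add(item)
    return regions


# Substitutions in residues 1-80 only, read off the comparison map.
snps = {
    "HGMD": {"K13R", "W36!", "G49S", "S51I", "V54L",
             "C55S", "D63N", "G74S", "F76V", "E80K"},
    "dbSNP": {"K13R", "L15S", "S16G", "R17G", "I20V", "R41C", "K46E",
              "C62Y", "D63E", "P68R", "F76L", "R78C", "E80K"},
    "SNPdbe": {"C55S", "D63N", "F76V"},
}

regions = venn_regions(snps)
for databases, items in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(databases), len(items), sorted(items))
```

Comparing full substitution names (position plus both residues) treats D63N and D63E as different SNPs; comparing positions only would give larger overlaps, so the choice of key should be stated alongside any such diagram.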


Discussion

For the given search keywords, the free version of HGMD returns well-organized SNP entries, i.e. the mutation type of each entry is clearly defined. Each entry also provides links to the relevant scientific publications. However, the free version is not as up to date as the professional version. dbSNP is a powerful online tool provided by NCBI. Because it is integrated into the NCBI online workbench, it offers abundant links to related items, which help the user to find further information. Unlike the other databases, dbSNP also provides synonymous SNPs. SNPdbe is the only database that attempts to annotate experimental SNP data automatically. The layout of the database is well designed and the search engine is powerful; however, the output format could be made more user friendly. OMIM contains only verified SNP data, together with abundant scientific articles that provide further information about the related genetic diseases. Similar to OMIM, SNPedia contains verified SNPs, in this case verified by private genomics companies.

References

<references/>