Researching SNPs TSD

From Bioinformatikpedia
Revision as of 12:12, 10 June 2012 by Reeb

Oh it was gorgeousness and gorgeosity made flesh. The trombones crunched redgold under my bed, and behind my gulliver the trumpets three-wise silverflamed, and there by the door the timps rolling through my guts and out again crunched like candy thunder. Oh, it was wonder of wonders. And then, a bird of like rarest spun heavenmetal, or like silvery wine flowing in a spaceship, gravity all nonsense now, came the violin solo above all the other strings, and those strings were like a cage of silk round my bed. Then flute and oboe bored, like worms of like platinum, into the thick thick toffee gold and silver. I was in such bliss, my brothers.

-A Clockwork Orange

The journal for this task can be found here.

Sequence mapping

The different databases use different sequences as the basis for the indices of their SNP data. In the following, the reference protein sequence remains P06865; however, all databases base their annotations on nucleotide sequences as well. While the final annotations will only be displayed mapped onto the protein sequence, NM_000520.4 will be used as the nucleotide reference sequence in the background. This entry describes an mRNA of HEXA and is also linked from the UniProt entry of P06865.
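Mapping a nucleotide-based annotation onto the protein sequence comes down to simple index arithmetic. The following is a minimal sketch, assuming 1-based positions and that the position of the start codon within the mRNA is known. The function names and the `cds_start` parameter are illustrative; the value used in the example is made up and is not the actual CDS start of NM_000520.4.

```python
def cds_position_to_residue(cds_pos):
    """Map a 1-based coding-sequence position to (residue number, position in codon)."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    residue = (cds_pos - 1) // 3 + 1      # every three nucleotides form one codon
    codon_pos = (cds_pos - 1) % 3 + 1     # 1, 2 or 3 within that codon
    return residue, codon_pos


def mrna_position_to_residue(mrna_pos, cds_start):
    """Map a 1-based mRNA position to a residue, given the 1-based mRNA
    position of the first nucleotide of the start codon (cds_start)."""
    cds_pos = mrna_pos - cds_start + 1
    if cds_pos < 1:
        raise ValueError("position lies upstream of the coding sequence")
    return cds_position_to_residue(cds_pos)


if __name__ == "__main__":
    # Hypothetical cds_start, for illustration only.
    print(mrna_position_to_residue(mrna_pos=10, cds_start=8))
```

Positions downstream of the stop codon are not handled here; a real mapping would also need the CDS end.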

HGMD lists NM_000520.3 as its reference, which is the previous version of NM_000520.4, the entry chosen as reference for this task. A Needleman-Wunsch pairwise sequence alignment of the two nucleotide sequences shows two single-nucleotide differences in the last third of the sequence, and that the more recent version of the entry is 117 nucleotides longer at the beginning of the sequence. Since this region is annotated as belonging to an exon, the question remains whether this has an effect on the protein sequence. A short comparison shows that there is a single differing residue at position 436, where a Val in NM_000520.3 is substituted by an Ile in NM_000520.4. However, since HGMD does not list a SNP at this position, this is not an issue.
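To illustrate the kind of comparison described above, here is a minimal sketch of a Needleman-Wunsch global alignment followed by a listing of the differing columns. It is not the tool or the parameter set that was used for this task: the scoring scheme (match 1, mismatch -1, linear gap -1) is an assumption, and the two short sequences are invented toy data, not the NM_000520 entries.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty.
    Returns the two aligned strings, with '-' marking gaps."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def differences(aln_a, aln_b):
    """List (alignment column, char in a, char in b) for all differing columns."""
    return [(k + 1, x, y) for k, (x, y) in enumerate(zip(aln_a, aln_b)) if x != y]


# Toy data: the "new" version has two extra leading bases and one substitution.
old_version = "ATGACCGTAGGC"
new_version = "GGATGACCATAGGC"
aligned_old, aligned_new = needleman_wunsch(old_version, new_version)

if __name__ == "__main__":
    print(aligned_old)
    print(aligned_new)
    print(differences(aligned_old, aligned_new))
```

The quadratic score matrix is fine for mRNA-sized sequences, but an established alignment tool is preferable for real data, since it offers affine gap penalties and tested substitution matrices.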


A table here, with the different reference sequences per database? SNPdbe has a lot, though! HGMD lists 'NM_000520.3' as


Mutations

Mutations are described by the simple scheme of TODO. Nonsense mutations, if present in the database at all, were ignored, since it would not make sense to predict an effect for these in the following tasks.

HGMD

Database description

The Human Gene Mutation Database (HGMD) <ref name="hgmd">Stenson,P.D. et al. (2009) The Human Gene Mutation Database: 2008 update. Genome medicine, 1, 13.</ref> is freely available to non-profit organisations and academic users. This free version is updated with a delay of three years after inclusion in the database <ref name="hgmd_inclusiondelay">http://www.hgmd.cf.ac.uk/docs/disclaimer.html</ref>. Indeed, the most recent mutation linked to HEXA was published in 2008. However, in the whole database there seems to be a small number of entries with publication dates from 2010 to 2012 that are also available in the free version <ref name="hgmd_statistics">http://www.hgmd.cf.ac.uk/ac/hahaha.php</ref>; the exact mechanism is therefore not entirely clear.
HGMD is updated semi-automatically, amongst other means by screening the PubMed database. In contrast to other databases like dbSNP, the same mutation is recorded only once and attributed to the publication that first mentioned it <ref name="hgmd"/>. The entries are not limited to SNPs but also include splice site changes, small and larger insertions and deletions, as well as changes affecting regulation and complex rearrangements like inversions. Synonymous SNPs, however, are not recorded <ref name="hgmd"/>.
Currently HGMD contains 88745 entries.

HGMD Professional

A 3-day trial for the professional version of HGMD has been requested. This section will be updated once a reply is received.

dbSNP

dbSNP stores information on diverse DNA variations such as single nucleotide substitutions and short deletion and insertion polymorphisms. It was established by the National Center for Biotechnology Information and is currently available as build 135, last updated in October 2011. To date it comprises 292,031,791 submissions <ref name="dbSNPsummary"> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi </ref>.

A search for all mutations returns 579 entries. Some rs IDs are obsolete or redundant, so there are 526 unique and up-to-date SNPs altogether for HEXA in Homo sapiens. Of those, 406 lie in intron regions, 14 in mRNA UTRs, and 14 are nonsense (stop gained) mutations. There are 18 unique non-synonymous mutations and 51 unique missense mutations. It remains unclear what kind of mutations the remaining 23 entries are.
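The breakdown above can be checked with a few lines of Python. This is only a bookkeeping sketch: the numbers are copied from the paragraph, and the dictionary keys are informal labels rather than official dbSNP function classes.

```python
# Counts as reported in the text for HEXA in dbSNP (build 135).
total_entries = 579
unique_snps = 526

counts_by_class = {
    "intron": 406,
    "mRNA UTR": 14,
    "nonsense / stop gained": 14,
    "non-synonymous": 18,
    "missense": 51,
}

removed_ids = total_entries - unique_snps        # obsolete or redundant rs IDs
classified = sum(counts_by_class.values())       # SNPs with a reported class
unclassified = unique_snps - classified          # SNPs without a reported class

if __name__ == "__main__":
    print("obsolete/redundant:", removed_ids)
    print("classified:", classified)
    print("unclassified:", unclassified)
```

The 23 unclassified entries mentioned in the text are exactly the remainder left after subtracting the five listed classes from the 526 unique SNPs.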

SNPdbe

SNPdbe is a database of non-synonymous SNPs in the form of single amino acid substitutions (SAASs). It combines the data from dbSNP, SwissProt (including SwissVar), PMD and 1000 Genomes. Its web interface was designed to provide combined annotation such as disease information and functional prediction. The database currently holds 1,691,464 entries and was last updated in March 2012 <ref name="snpdbe"> C Schaefer, A Meier, B Rost, Y Bromberg (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics 28(4):601-602.</ref>.


The search for hexosaminidase A in human yields 76 entries. Of those, 55 are present in dbSNP with the protein identifier NP_000511 and 21 stem solely from SwissProt or SwissVar with the accession number P06865. 18 SNPs experimentally validated, conservation?

OMIM

don't know yet where to look

SNPedia

Database description

SNPedia is a freely available wiki-based database of SNPs <ref name="snpedia">http://en.wikipedia.org/wiki/SNPedia (since SNPedia references this article themselves (http://www.snpedia.com/index.php/SNPedia:FAQ#What_is_SNPedia.3F) Wikipedia will be accepted as a citation)</ref>. The bulk of the information consists of entries cross-linked to dbSNP via rs identifiers; therefore anything in dbSNP can potentially be present in SNPedia as well. In addition there are pages for identifiers from 23andMe <ref name="23andme">https://www.23andme.com/</ref>, haplotypes and even pages describing information gained from complete genomes of specific people <ref name="snpedia_faq_onlysnps">http://www.snpedia.com/index.php/SNPedia:FAQ#SNPs_only.3F_what_about_CNVs.2C_indels.2C_inversions.2C_epigenetics_..._.3F</ref>.
Due to the nature of the wiki system, every user is able to add information at any time. In addition, periodic updates based on text mining are fed into the database as well <ref name="snpedia_nar">Cariaso,M. and Lennon,G. (2012) SNPedia: a wiki supporting personal genome annotation, interpretation and analysis. Nucleic acids research, 40, D1308-12.</ref>. Quality of the data is ensured on the one hand by manual curation by users and editors and on the other hand by automated external programs that check for inconsistencies or missing information on a regular basis <ref name="snpedia_nar"/>.
At the time of writing SNPedia claims to contain 29135 SNPs <ref name="snpedia_faq_howmany">http://www.snpedia.com/index.php/SNPedia:FAQ#How_many_SNPs_are_in_SNPedia.3F</ref>; however, this number was last updated in August 2011 <ref name="snpedia_faq_history">http://www.snpedia.com/index.php?title=SNPedia:FAQ&action=history</ref>. Given the previous trends <ref name="snpedia_nar"/>, this number should have increased significantly by now.


HEXA mutations

SNPedia does contain a dedicated subpage for TSD; however, only a few SNPs are listed there. More importantly, SNPedia does not contain a dedicated page for HEXA. Therefore SNPs were searched with the query 'Gene = HEXA'. This results in 36 entries, most of which were last updated in February 2012 by the automated SNPediaBot. One entry (Rs28940871) contains additional, user-added information; all others are empty apart from the cross-references to other databases. During retrieval one entry was already excluded, since it did not contain information on the change at the protein level in dbSNP. Additionally, 6 more entries were excluded because they described neither missense nor synonymous mutations. The final set of SNPs from SNPedia therefore consists of 29 missense mutations and one synonymous mutation, all of which are also contained in the SNPs that were retrieved directly from dbSNP <ref name="snpedia_dbsnp_intersection">Intersection SNPedia and dbSNP</ref>.
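The filtering and the comparison against dbSNP described above can be sketched with plain Python sets. The record layout (`rs`, `effect`, `protein_change`), the function names and all identifiers in the example are hypothetical placeholders; they do not reflect the actual SNPedia export format or real HEXA entries.

```python
KEPT_EFFECTS = {"missense", "synonymous"}


def select_coding_snps(entries):
    """Keep entries that carry a protein-level annotation and describe
    a missense or synonymous mutation."""
    return [e for e in entries
            if e.get("protein_change") and e.get("effect") in KEPT_EFFECTS]


def missing_from(reference_ids, entries):
    """Return the rs IDs of entries that do not occur in reference_ids."""
    return sorted({e["rs"] for e in entries} - set(reference_ids))


# Placeholder records, for illustration only.
snpedia_entries = [
    {"rs": "rsA", "effect": "missense", "protein_change": "R100Q"},
    {"rs": "rsB", "effect": "synonymous", "protein_change": "L50L"},
    {"rs": "rsC", "effect": "intron", "protein_change": None},
    {"rs": "rsD", "effect": "missense", "protein_change": None},
]
dbsnp_ids = {"rsA", "rsB", "rsC", "rsD", "rsE"}

selected = select_coding_snps(snpedia_entries)
not_in_dbsnp = missing_from(dbsnp_ids, selected)

if __name__ == "__main__":
    print([e["rs"] for e in selected])
    print("missing from dbSNP:", not_in_dbsnp)
```

An empty `not_in_dbsnp` list corresponds to the situation described in the text, where every retained SNPedia entry is also part of the set retrieved directly from dbSNP.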

Mutation map

Conclusion?

SNPedia was not of great help: all of these SNPs were already known, and the entries did not really contain additional information (apart from one), but the possibility is there. (Not a lot of active users; link to this on the statistics subpage.)

References

<references/>