Researching SNPs TSD
Oh it was gorgeousness and gorgeosity made flesh. The trombones crunched redgold under my bed, and behind my gulliver the trumpets three-wise silverflamed, and there by the door the timps rolling through my guts and out again crunched like candy thunder. Oh, it was wonder of wonders. And then, a bird of like rarest spun heavenmetal, or like silvery wine flowing in a spaceship, gravity all nonsense now, came the violin solo above all the other strings, and those strings were like a cage of silk round my bed. Then flute and oboe bored, like worms of like platinum, into the thick thick toffee gold and silver. I was in such bliss, my brothers.
-A Clockwork Orange
The journal for this task can be found here.
Sequence mapping
The databases index their SNP data against different reference sequences. In the following, P06865 remains the reference protein sequence; however, all databases also base their annotations on nucleotide sequences. Although the final annotations will only be displayed mapped onto the protein sequence, NM_000520.4 serves as the nucleotide reference sequence in the background. This entry describes an mRNA of HEXA and is also linked from the UniProt entry of P06865.
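The relation between nucleotide and protein coordinates that underlies this mapping can be sketched as follows. This is only a minimal illustration in Python, not the procedure used by any of the databases. The function names and the `cds_start` parameter (the 1-based mRNA position of the A of the start codon) are assumptions introduced here; the actual value for NM_000520.4 has to be taken from the CDS annotation of the entry.

```python
def mrna_to_cds(mrna_pos, cds_start):
    """Convert a 1-based mRNA position to a 1-based coding position.

    cds_start is the 1-based mRNA position of the first coding nucleotide
    (an assumption of this sketch; read it from the entry's CDS feature).
    """
    cds_pos = mrna_pos - cds_start + 1
    if cds_pos < 1:
        raise ValueError("position lies in the 5' UTR")
    return cds_pos


def cds_to_residue(cds_pos):
    """Map a 1-based coding position to (residue number, position in codon)."""
    if cds_pos < 1:
        raise ValueError("coding positions start at 1")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def residue_to_cds(residue):
    """Return the first and last 1-based coding position of a residue's codon."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    first = 3 * (residue - 1) + 1
    return first, first + 2
```

For instance, `residue_to_cds(436)` yields the coding positions 1306 to 1308, so a nucleotide change anywhere in that codon maps to residue 436 of the protein.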
HGMD lists NM_000520.3 as its reference, a previous version of NM_000520.4, which was chosen as the reference for this task. A Needleman-Wunsch pairwise alignment of the two nucleotide sequences shows two single-nucleotide differences in the last third of the sequence and that the more recent version is 117 nucleotides longer at the beginning of the sequence. Since this region is annotated as belonging to an exon, the question remains whether this affects the protein sequence. A short comparison shows a single differing residue at position 436, where a Val in NM_000520.3 is substituted by an Ile in NM_000520.4. However, since HGMD does not list a SNP at this position, this is not an issue.
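The kind of comparison described above can be reproduced with a small global alignment. The following is a minimal Needleman-Wunsch sketch in pure Python. The scoring values (match +1, mismatch -1, linear gap penalty -1), the function names, and the toy sequences are illustrative assumptions, not the parameters or data of the alignment reported above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return both sequences with '-' for gaps."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def summarize(aligned_old, aligned_new):
    """Return (leading gap columns in the old sequence, mismatch columns)."""
    leading = len(aligned_old) - len(aligned_old.lstrip("-"))
    mismatches = [k for k, (x, y) in enumerate(zip(aligned_old, aligned_new))
                  if x != y and "-" not in (x, y)]
    return leading, mismatches


# Toy data only: the "new" version has a 3-nt prefix and one substitution.
old_version = "ATGGTCAAGT"
new_version = "CCC" + "ATGATCAAGT"
aligned_old, aligned_new = needleman_wunsch(old_version, new_version)
print(aligned_old)
print(aligned_new)
print(summarize(aligned_old, aligned_new))
```

The leading gap count corresponds to the extra nucleotides at the start of the newer entry, and the mismatch columns correspond to the single-nucleotide differences. The table grows with the product of the two sequence lengths, which is still manageable for two mRNA entries of a few thousand nucleotides.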
A table here, with the different reference sequences per database? SNPdbe has a lot, though! HGMD lists 'NM_000520.3' as
Mutations
Mutations are described by the simple scheme of TODO. Nonsense mutations, if present in a database at all, were ignored, since it would not make sense to predict an effect for them in the following tasks.
HGMD
Database description
The Human Gene Mutation Database (HGMD) <ref name="hgmd">Stenson, P.D. et al. (2009) The Human Gene Mutation Database: 2008 update. Genome Medicine, 1, 13.</ref> is freely available to non-profit organisations and academic users. Entries appear in this free version with a delay of three years after their inclusion in the database <ref name="hgmd_inclusiondelay">http://www.hgmd.cf.ac.uk/docs/disclaimer.html</ref>. Indeed, the most recent mutation linked to HEXA was published in 2008. However, across the whole database a small number of entries with publication dates from 2010 to 2012 also seem to be available in the free version <ref name="hgmd_statistics">http://www.hgmd.cf.ac.uk/ac/hahaha.php</ref>, so the exact mechanism is not entirely clear.
HGMD is updated semi-automatically, among other means by screening the PubMed database. In contrast to other databases such as dbSNP, each mutation is recorded only once and attributed to the publication that first mentioned it <ref name="hgmd"/>. The entries are not limited to SNPs but also include splice site changes, small and larger insertions and deletions, changes affecting regulation, and complex rearrangements such as inversions. Synonymous SNPs, however, are not recorded <ref name="hgmd"/>.
Currently HGMD contains 88745 entries.
HGMD Professional
A 3-day trial for the professional version of HGMD has been requested. This section will be updated once a reply is received.
dbSNP
dbSNP stores information on diverse DNA variations such as single-nucleotide substitutions and short deletion and insertion polymorphisms. It was established by the National Center for Biotechnology Information and is currently available as build 135, last updated in October 2011. To date it comprises 292,031,791 submissions <ref name="dbSNPsummary"> http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi </ref>.
A search for all mutations returns 579 entries. Some rs IDs are obsolete or redundant, which leaves 526 unique and up-to-date SNPs altogether for HEXA in Homo sapiens. Of those, 406 lie in intron regions, 14 in mRNA UTRs, and 14 are nonsense (stop gained) mutations. There are 18 unique non-synonymous mutations and 51 unique missense mutations. What kind of mutations the remaining 23 entries are remains unclear.
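The counts above can be cross-checked with a short tally. The sketch below only restates the numbers given in the text; the dictionary keys and variable names are labels chosen here for illustration.

```python
# Numbers as reported in the text above.
total_entries = 579
unique_snps = 526

counts = {
    "intron": 406,
    "mRNA UTR": 14,
    "nonsense / stop gained": 14,
    "non-synonymous": 18,
    "missense": 51,
}

obsolete_or_redundant = total_entries - unique_snps
classified = sum(counts.values())
unclassified = unique_snps - classified

print(f"obsolete or redundant rs IDs: {obsolete_or_redundant}")
print(f"classified: {classified}, unclassified: {unclassified}")
```

The listed categories account for 503 of the 526 unique SNPs, which leaves the 23 entries mentioned above, and 53 of the 579 hits are obsolete or redundant rs IDs.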
SNPdbe
SNPdbe is a database of non-synonymous SNPs in the form of single amino acid substitutions (SAASs). It combines data from dbSNP, SwissProt (including SwissVar), PMD and 1000 Genomes. <ref name="snpdbe"> C Schaefer, A Meier, B Rost, Y Bromberg (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics 28(4):601-602.</ref>
For human hexosaminidase A, SNPdbe contains 76 entries: 55 are present in dbSNP with the protein identifier NP_000511, and 21 come from SwissProt or SwissVar with the ID P06865.
OMIM
don't know yet where to look
SNPedia
TODO which way? something missing
37 entries were retrieved this way. After excluding entries that are neither missense nor synonymous mutations and removing entry rs121907980, which does not contain information on the change at the protein level, 30 entries remain, one of which is a synonymous change. These 30 entries are a subset of those found by querying dbSNP <ref name="snpedia_dbsnp_intersection">Intersection SNPedia and dbSNP</ref>.
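A subset relation of this kind can be checked with a simple set comparison. The following sketch assumes that the rs IDs of both result sets have been exported as plain lists of strings. The function name is hypothetical and the IDs in the example are made-up placeholders, not actual query results.

```python
def missing_from(reference_ids, query_ids):
    """Return the query IDs that do not occur among the reference IDs.

    IDs are compared case-insensitively after stripping whitespace.
    """
    reference = {rs.strip().lower() for rs in reference_ids}
    query = {rs.strip().lower() for rs in query_ids}
    return sorted(query - reference)


# Placeholder IDs for illustration only (not real rs numbers).
dbsnp_ids = ["rs_placeholder_1", "rs_placeholder_2", "rs_placeholder_3"]
snpedia_ids = ["rs_placeholder_2", "RS_placeholder_1 "]

missing = missing_from(dbsnp_ids, snpedia_ids)
if missing:
    print("not found in dbSNP:", missing)
else:
    print("all SNPedia entries are contained in the dbSNP result")
```

An empty result means that every SNPedia entry also occurs in the dbSNP result; any ID that is returned would need to be inspected by hand.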
References
<references/>