Researching SNPs (PKU)

Mutants walk the earth.

There are mutants on your block. There's a mutant right next door. There's one in your shoes.

We are, face it, all mutants. Humanity amounts to a string of mutations millennia-old, tweaks that gave us an edge over the next brute down the line. Upright posture; opposable thumb; prefrontal lobes.

The all-time specifically human thing is language. Familiar yet mysterious, language is the angelic part of us angel/beasts. It is spiritual, evanescent, fugitive - even gorgeous words like those have an unearthly shimmer. - from philly.com

Short Task Description

In this week's task, we research SNPs in the PAH gene and distinguish those that change the phenotype of phenylalanine hydroxylase from those that do not. The databases chosen are the public release of the Human Gene Mutation Database (HGMD), dbSNP, SNPdbe, OMIM and SNPedia. You may find a detailed task description here. A very short journal, which gives a brief overview of what we did, can be found here.

Databases

Overview

<figtable id="tab:modelling_scores"> Key Values of the different SNP databases

Database	Last Update	Number of Entries	Number of Entries concerning PAH	Type of information	Sources	Curation/Verification	Comment
HGMD	public after 3 years (quarterly updated)	50,129 (only mis-/nonsense)	397 (only mis-/nonsense)	all types of mutations	current literature	manual and computerised search in current literature	too much advertising
dbSNP	Oct 2011	292,031,791	2,590	SNPs, short in/dels, polymorphisms, others	submitted by registered sources (labs, institutes, ...)	clustering of identical submissions by NCBI
SNPdbe	Mar 2012	1,691,464	328	nonsyn. SNPs	Swissprot, dbSNP, PMD, OMIM, 1000 genomes	cf. sources	Predictions of functional effect, experimental evidence if available in source
OMIM	June 2012	21,257 (Summary entries)	1 (64 selected SNPs)	catalog of human genes and genetic disorders and traits	current literature	manually curated
SNPedia	continuous, Wiki-style	29,058	53	SNPs	publicly edited	publicly edited	get genotyped and predicted

</figtable>

HGMD

Click here to get the HGMD entry for the PAH gene. We extracted 397 disease-causing missense and nonsense SNPs from HGMD. The reference sequence is NM_000277.1.c.

dbSNP

Search for missense SNPs or for synonymous SNPs in the human PAH gene.

Because dbSNP offers separate search options for each mutation type, we extracted 29 synonymous SNPs and 358 missense SNPs separately. The reference sequence is again NM_000277.1.c.

OMIM

OMIM has a detailed entry on PAH. We will use OMIM to select common variants and interesting SNPs for next week's task.

SNPdbe

SNPdbe provides 328 results for PAH. Following the task description, we looked at the conservation score to determine quickly whether a mutation found here is disease causing or not. Our reasoning can be found at this subpage. In short, we take low PSSM scores and a low percentage of conservation as an indicator of no disease effect.
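The rule of thumb in the last sentence can be sketched in a few lines of Python. This is only an illustration, not the procedure we actually ran: the field names `pssm` and `conservation_pct` and both cutoff values are assumptions made for the example, and the real criteria are described on the subpage.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only for illustration.
PSSM_CUTOFF = 0              # PSSM scores below this count as "low"
CONSERVATION_CUTOFF = 50.0   # percent conservation below this counts as "low"


@dataclass
class Variant:
    """One SNPdbe entry, reduced to the fields the heuristic needs."""
    name: str                # free-text identifier of the substitution
    pssm: int                # PSSM score (assumed field name)
    conservation_pct: float  # percentage of conservation (assumed field name)


def probably_neutral(v: Variant) -> bool:
    """Low PSSM score AND low conservation -> taken as no disease effect."""
    return v.pssm < PSSM_CUTOFF and v.conservation_pct < CONSERVATION_CUTOFF


def label(v: Variant) -> str:
    """Translate the heuristic into the two labels used in our map."""
    if probably_neutral(v):
        return "probably not disease causing"
    return "probably disease causing"
```

With these assumed cutoffs, a variant is only labelled as probably harmless when both values are low; a low PSSM score at a highly conserved position still counts as probably disease causing.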

SNPedia

The SNPs concerning PAH in SNPedia all appear in dbSNP, and no additional information is attached to them, so we do not investigate them separately.

Distribution of Mutations

<figtable id="tab:mutation_distri"> Distribution of SNPs from different databases on the PAH gene.

Distribution of SNPs from HGMD on the PAH gene.

Distribution of SNPs from SNPdbe on the PAH gene.

Distribution of SNPs from dbSNP on the PAH gene.

</figtable> <figtable id="tab:type_mutation_distri"> Distribution of silent and missense SNPs from dbSNP on the PAH gene.

Distribution of missense SNPs from dbSNP on the PAH gene.

Distribution of silent SNPs from dbSNP on the PAH gene.

</figtable>

Looking only at the histograms of table 2 and not at individual positions, there appear to be fewer disease-causing mutations at the termini of the protein sequence. This is as expected, since the termini are functionally less critical. Hotspots appear around positions 55-70, 165-180 and roughly from 250 to 400. The first region lies in the center of the regulatory domain; the latter two regions span the whole catalytic domain.
Unfortunately, there are too few silent mutations to draw valid conclusions (see table 3), but they appear to be distributed quite randomly, and the spikes in frequency visible in their histogram do not coincide with the spikes of the other mutation types pictured.
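The comparison of histogram spikes described above can be sketched as follows. This is a minimal example under stated assumptions: the bin width of 10 residues, the `min_count` threshold and the function names are all chosen for illustration and are not taken from the way our plots were produced.

```python
from collections import Counter


def bin_positions(positions, bin_width=10):
    """Count SNPs per bin of `bin_width` residues.

    Residue positions are 1-based; the first bin covers residues
    1..bin_width. Returns a dict mapping (start, end) residue ranges
    to the number of SNPs that fall into them.
    """
    counts = Counter((pos - 1) // bin_width for pos in positions)
    return {
        (b * bin_width + 1, (b + 1) * bin_width): n
        for b, n in sorted(counts.items())
    }


def shared_peaks(hist_a, hist_b, min_count=2):
    """Return the bins in which both histograms reach `min_count`."""
    return sorted(
        bin_ for bin_, n in hist_a.items()
        if n >= min_count and hist_b.get(bin_, 0) >= min_count
    )
```

Calling `bin_positions` once per mutation type and passing two of the results to `shared_peaks` gives the bins where spikes coincide; an empty list corresponds to the observation that the silent spikes do not line up with the others.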

Mapping

Our map of SNPs on the PAH gene in <xr id="fig:SNPMapping"/> is meant to present as much information as possible. Because it combines different sources and multiple annotations for the same positions, some of the annotations may appear to conflict. In these cases, consult the descriptions above and the raw material for more information. The map lists all the gathered information and indicates its (sometimes inferred) severity by color coding.

Mapping to Sequence

Mapping of SNP entries found in the different databases. The bar on the left symbolizes the sequence of the PAH gene. Every horizontal line refers to at least one entry from any of the databases. The numbers on the left of the bar give the amino acid position. Color coding in the middle section, which contains the different mutations:
red: disease causing according to HGMD
orange: missense, but no entry in HGMD
green: silent mutation
The color coding in the last column of the picture gives a quick overview and follows the effect we inferred from SNPdbe:
red: probably disease causing
green: probably not disease causing

</figure>

Mapping to Structure

To see where the SNPs already mapped to the sequence lie in the structure, we used the PDB structure 2PAH and marked the residues according to the color coding used above: red is a residue known to be disease causing, orange is a missense SNP with no entry in HGMD, and green is a silent mutation. Keep in mind that several residues carry different annotations depending on the substitution, and that the PDB structure does not cover the complete protein: it starts with residue number 118 of the original sequence.

Mapping of the residues which are known to be disease causing.

</figure><figure id="fig:SNPMappingstructuremutation">

Mapping of the residues which are not silent, but no entry in the HGMD was found.

</figure><figure id="fig:SNPMappingstructuresilent">

Mapping of the residues with silent variants.

</figure><figure id="fig:SNPMappingstructureall">

Mapping of all residues from figures 2, 3 and 4 with the same color coding. Note that for each residue we only used the color with the strongest impact, that is, disease causing > mutation > silent.

</figure>
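The precedence rule used for the combined figure, together with the fact that the structure only starts at residue 118, can be written down as a short sketch. The class names `"disease"`, `"mutation"` and `"silent"` and the function name are assumptions for this example; the figures themselves were not necessarily produced this way.

```python
# Annotation classes ordered by impact: disease causing > mutation > silent.
PRIORITY = {"disease": 2, "mutation": 1, "silent": 0}
COLOR = {"disease": "red", "mutation": "orange", "silent": "green"}

# First residue present in the PDB structure, as stated in the text above.
STRUCTURE_START = 118


def residue_colors(annotations, first_residue=STRUCTURE_START):
    """Pick one color per residue from (position, annotation_class) pairs.

    Residues before `first_residue` are skipped because they are absent
    from the structure. If a residue carries several annotations, the
    one with the strongest impact determines the color.
    """
    strongest = {}
    for pos, cls in annotations:
        if pos < first_residue:
            continue
        if pos not in strongest or PRIORITY[cls] > PRIORITY[strongest[pos]]:
            strongest[pos] = cls
    return {pos: COLOR[cls] for pos, cls in sorted(strongest.items())}
```

The resulting position-to-color mapping could then be handed to whatever molecular viewer is used to color the residues.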

For the silent SNPs and the SNPs not found in HGMD, the structure holds no surprises: most of them lie on the outside of the protein or in the coiled regions, as can be seen in <xr id="fig:SNPMappingstructuremutation"/> and <xr id="fig:SNPMappingstructuresilent"/>. We also expected the low number of silent and non-effective mutations, because disease-causing mutations (visible in patients with the disease phenotype) draw more research attention.
For the disease-causing residues, on the other hand, we were quite surprised to find so many, and to find them in the coiled regions almost as often as in the structural regions (<xr id="fig:SNPMappingstructuredisease"/>). Most of these SNPs lie in the interior of the protein, where the cosubstrate-, iron- and substrate-binding sites are located, so in this respect we found exactly what we expected.
