Mapping SNPs
by Robert Greil and Cedric Landerer
HGMD
The Human Gene Mutation Database (HGMD)<ref>http://link.springer.de/link/service/journals/00439/papers/6098005/60980629.pdf</ref> is a collection of disease-related genes and mutations maintained by Cardiff University.
Mutation types:
- Missense/nonsense: a codon change so that the codon encodes a different amino acid (missense) or a premature stop codon (nonsense).
- Splicing: mutations that alter splicing and thereby the number and composition of exons.
- Regulatory: mutations that affect the regulation of gene expression.
- Small deletions: deletion of a few nucleotides, which removes amino acids or shifts the reading frame.
- Small insertions: insertion of a few nucleotides, which adds amino acids or shifts the reading frame.
- Small indels: deletion and insertion of a few nucleotides at the same site, which replaces amino acids.
- Gross deletions: large deletions caused by structural changes of the DNA.
- Gross insertions/duplications: large insertions and duplications caused by structural changes of the DNA.
- Complex rearrangements: complex structural changes of the DNA, e.g. a change of gene position.
- Repeat variations: variation in the number of microsatellite repeats, which can differ between homologous proteins, splice variants and expression types.
HFE
Main article: Hemochromatosis
Mutation type | # of mutations |
---|---|
Missense/nonsense | 24 |
Splicing | 3 |
Regulatory | 1 |
Small deletions | 4 |
Small insertions | 1 |
Small indels | 0 |
Gross deletions | 1 |
Gross insertions/duplications | 0 |
Complex rearrangements | 1 |
Repeat variations | 0 |
Public total | 35 |
HGMD uses the DNA sequence that encodes the HFE protein. We used the 'DNA-RNA-protein' translator (somewhat buggy, and functional only in Internet Explorer) to transcribe the DNA sequence into RNA and translate it into the protein sequence. We then used Jalview 2.6.1 to align the translated protein sequence against the UniProt FASTA amino acid sequence of HFE_HUMAN. The alignment shows a 100% match, so we use the UniProt sequence to visualize the SNPs.
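For reference, the transcription and translation step can also be done locally. The following is a minimal Python sketch, not the translator we used: it assumes a coding-strand DNA sequence read from its first base with the standard genetic code, and the helper names (`transcribe`, `translate`, `percent_identity`) are hypothetical. `percent_identity` only compares two already aligned, gap-free sequences of equal length, which is enough to confirm a 100% match but is no substitute for the Jalview alignment.

```python
# Standard genetic code; codon index = 16*first + 4*second + third (U, C, A, G order).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(rna: str, to_stop: bool = True) -> str:
    """Translate an mRNA from its first base; unknown codons become 'X'."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(a: str, b: str) -> float:
    """Identity (in %) of two already aligned, gap-free sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Start of the HFE coding sequence (as in the rs114758821 flank below),
# with a TGA stop codon appended for illustration.
protein = translate(transcribe("ATGGGCCCGCGAGCCAGGTGA"))
print(protein)  # MGPRAR
print(percent_identity(protein, "MGPRAR"))  # 100.0
```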
dbSNP<ref>http://www.ncbi.nlm.nih.gov/pmc/articles/PMC29783/?tool=pubmed</ref>
synonymous
Synonymous (silent) mutations change the nucleotide sequence but not the amino acid sequence: the mutated codon still encodes the same amino acid, so the protein stays the same. In special cases, however, a silent mutation can alter an intron/exon splice site and thereby change the protein, even though this is not visible at the level of the affected codon.
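The degeneracy of the genetic code that makes silent mutations possible can be checked directly. This minimal Python sketch (standard genetic code assumed; `silent_substitutions` is a hypothetical helper) lists every single-base substitution of a codon that still encodes the same amino acid. It looks at the codon in isolation and therefore cannot detect the splice-site effects mentioned above.

```python
# Standard genetic code; codon index = 16*first + 4*second + third (T, C, A, G order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def silent_substitutions(codon: str) -> list[str]:
    """All single-base substitutions of `codon` that encode the same amino acid."""
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    silent = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a substitution
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == amino_acid:
                silent.append(mutant)
    return silent


print(silent_substitutions("CCA"))  # ['CCT', 'CCC', 'CCG'] (Pro: third position fully degenerate)
print(silent_substitutions("TGG"))  # [] (Trp has a single codon)
```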
We found 4 SNPs for Homo sapiens in dbSNP with the following query:
- "synonymous-codon"[Function_Class] AND HFE[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS]
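The query can also be sent programmatically through NCBI's E-utilities `esearch` endpoint. The Python sketch below assembles the request and, optionally, fetches the result; the helper names are hypothetical, and the field names and values (`Function_Class`, `synonymous-codon`, `SNP_CLASS`) are copied from the query above and may no longer be accepted by the current dbSNP index. The missense query further down corresponds to `build_term("missense")`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(function_class: str, gene: str = "HFE") -> str:
    """Rebuild the dbSNP query used in this article for one function class."""
    return (f'"{function_class}"[Function_Class] AND {gene}[GENE] '
            f'AND "human"[ORGN] AND "snp"[SNP_CLASS]')


def esearch_url(term: str, retmax: int = 100) -> str:
    """URL of an esearch request against the dbSNP ('snp') database."""
    params = {"db": "snp", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_rs_ids(term: str) -> list[str]:
    """Run the search (needs network access); assumes the UIDs are rs numbers without 'rs'."""
    with urlopen(esearch_url(term)) as response:
        result = json.load(response)
    return ["rs" + uid for uid in result["esearchresult"]["idlist"]]


if __name__ == "__main__":
    print(esearch_url(build_term("synonymous-codon")))
```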
ID | Sequence | Mutation | Position |
---|---|---|---|
rs114758821 | GGGGAAATGGGCCCGCGAGCCAGGCC[A/G]GCGCTTCTCCTCCTGATGCTTTTGC | A/G | 7 |
rs114038675 | CTTTAACTTGCTTTTTCTGTTTTAGA[A/G]CCCTCACCGTCTGGCACCCTAGTCA | A/G | 26 |
rs62625342 | GTGGAGCCCCGAACTCCATGGGTTTC[C/T]AGTAGAATTTCAAGCCAGATGTGGC | C/T | 76 |
rs35201683 | TCCACAGGAGGAGCCATGGGGCACTA[C/T]GTCTTAGCTGAACGTGAGTGACACG | C/T | 70 |
dbSNP assigns no position to rs62625342. We therefore translated its nucleotide sequence into an amino acid sequence and aligned it with the reference sequence to find the correct position in the correct reading frame.
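The position mapping can be sketched as follows. This is a simplified stand-in for the alignment we used, not a reproduction of it: each reading frame of the flanking sequence is translated and anchored to the reference by its longest exact common substring, and the SNP is accepted only if its codon lies inside that anchor. The sketch assumes single-nucleotide alleles, flanks on the coding strand, and an exonic region around the SNP; `map_snp` and the other helpers are hypothetical, and `HFE_N_TERMINUS` holds the first 92 residues of the UniProt sequence shown in the mutation map.

```python
# Standard genetic code; codon index = 16*first + 4*second + third (T, C, A, G order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 92 residues of HFE_HUMAN (Q30201), copied from the mutation map.
HFE_N_TERMINUS = (
    "MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF"
    "YDHESRRVEPRTPWVSSRISSQMWLQLSQSLK"
)


def translate(dna: str, frame: int) -> str:
    """Translate `dna` starting at offset `frame`; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start in a, start in b, length) of the longest exact common run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common run ending here
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def map_snp(left, alleles, right, reference, min_anchor=5):
    """Return (1-based residue position, residue per allele), or None if unmapped."""
    left, right = left.upper(), right.upper()
    best = None  # (anchor length, position, residues)
    for allele in alleles:
        seq = left + allele + right
        for frame in range(3):
            q_start, r_start, size = longest_common_substring(translate(seq, frame), reference)
            codon_idx = (len(left) - frame) // 3  # codon that contains the SNP
            if size < min_anchor or not q_start <= codon_idx < q_start + size:
                continue
            if best is None or size > best[0]:
                start = frame + 3 * codon_idx
                residues = [CODON_TABLE.get((left + a + right)[start:start + 3], "X")
                            for a in alleles]
                best = (size, r_start + (codon_idx - q_start) + 1, residues)
    return None if best is None else (best[1], best[2])


print(map_snp("GTGGAGCCCCGAACTCCATGGGTTTC", ("C", "T"),
              "AGTAGAATTTCAAGCCAGATGTGGC", HFE_N_TERMINUS))  # (76, ['S', 'S'])
```

With the flanks of rs62625342 from the table above, the sketch places the SNP at residue 76 in frame 0, and both alleles (TCC/TCT) encode serine, consistent with the position listed in the table.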
non-synonymous
Non-synonymous (missense) mutations change both the nucleotide and the amino acid sequence: a codon is changed into a codon that encodes a different amino acid, so the protein sequence changes. This can alter the protein's function or abolish it completely.
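The distinction between synonymous, missense and nonsense changes comes down to comparing the amino acids encoded by the two codons. Below is a minimal Python sketch, assuming the standard genetic code; `classify_substitution` is a hypothetical helper, and "stop-loss" is an extra category not used in this article. The result uses the same "reference/mutant" notation as the table below. The example codons TGC→TAC and CAG→CCG are what we read from the flanks of rs1800562 and rs111033563 in the reading frame that matches the UniProt sequence.

```python
# Standard genetic code; codon index = 16*first + 4*second + third (T, C, A, G order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> tuple[str, str]:
    """Classify a codon change; return (kind, 'reference/mutant' residues)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"   # premature stop codon
    elif ref_aa == "*":
        kind = "stop-loss"  # stop codon turned into a sense codon
    else:
        kind = "missense"
    return kind, f"{ref_aa}/{alt_aa}"


print(classify_substitution("TGC", "TAC"))  # ('missense', 'C/Y')
print(classify_substitution("CAG", "CCG"))  # ('missense', 'Q/P')
print(classify_substitution("TGG", "TGA"))  # ('nonsense', 'W/*')
```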
We found 13 SNPs for Homo sapiens in dbSNP with the following query:
- "missense"[Function_Class] AND HFE[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS]
ID | Sequence | Mutation | Residue change | Position |
---|---|---|---|---|
rs111033563 | TGGGGAAGAGCAGAGATATACGTGCC[A/C]GGTGGAGCACCCAGGCCTGGATCAG | A/C | Q/P | 283 |
rs111033558 | CATTGGAATTTTGTTCATAATATTAA[G/T]GAAGAGGCAGGGTTCAAGTGAGTAG | G/T | R/M | 58 |
rs111033557 | TGGGCTACGTGGATGACCAGCTGTTC[A/G]TGTTCTATGATCATGAGAGTCGCCG | A/G | V/M | 59 |
rs62625346 | TGTGACCTCTTCAGTGACCACTCTAC[A/G]GTGTCGGGCCTTGAACTACTACCCC | A/G | R/Q | 224 |
rs28934889 | TTTCCTTGTTTGAAGCTTTGGGCTAC[A/G]TGGATGACCAGCTGTTCGTGTTCTA | A/G | V/M | 53 |
rs28934597 | GGCTGCAGCTGAGTCAGAGTCTGAAA[C/G]GGTGGGATCACATGTTCACTGTTGA | C/G | G/R | 93 |
rs28934596 | CATGTTCACTGTTGACTTCTGGACTA[C/T]TATGGAAAATCACAACCACAGCAAG | C/T | I/T | 105 |
rs28934595 | CAGGTCATCCTGGGCTGTGAAATGCA[A/C]GAAGACAACAGTACCGAGGGCTACT | A/C | Q/H | 127 |
rs4986950 | TTTGGTGAAGGTGACACATCATGTGA[C/T]CTCTTCAGTGACCACTCTACGGTGT | C/T | T/I | 217 |
rs2242956 | TTCACACTCTCTGCACTACCTCTTCA[C/T]GGGTGCCTCAGAGCAGGACCTTGGT | C/T | M/T | 35 |
rs1800730 | AGCTGTTCGTGTTCTATGATCATGAG[A/T]GTCGCCGTGTGGAGCCCCGAACTCC | A/T | S/C | 65 |
rs1800562 | CCCTGGGGAAGAGCAGAGATATACGT[A/G]CCAGGTGGAGCACCCAGGCCTGGAT | A/G | C/Y | 282 |
rs1799945 | ATGACCAGCTGTTCGTGTTCTATGAT[C/G]ATGAGAGTCGCCGTGTGGAGCCCCG | C/G | H/D | 63 |
Residue exchanges marked in red are not yet annotated in dbSNP. We determined these changes by translating the nucleotide sequence into the amino acid sequence and mapping the three reading frames onto the reference sequence with a global alignment method (Needleman-Wunsch). If a frame matches the reference, we assume that this frame shows the amino acid exchange. We annotated each mutation with the UniProt sequence as reference: G/R means that G is found in the UniProt sequence and R in the mutated sequence.
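The frame selection can be sketched with a plain Needleman-Wunsch scoring function. In this minimal Python sketch the scoring parameters (match +1, mismatch -1, linear gap -1) and the hand-picked reference window are assumptions, and all function names are hypothetical; it only scores each frame and does not perform a traceback. `residue_change` then reads the codon that contains the SNP in the chosen frame and writes the exchange in the UniProt/mutant notation described above.

```python
# Standard genetic code; codon index = 16*first + 4*second + third (T, C, A, G order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int) -> str:
    """Translate `dna` starting at offset `frame`; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def best_frame(dna, reference_window):
    """Return the frame whose translation aligns best, plus all three scores."""
    scores = [needleman_wunsch_score(translate(dna, f), reference_window) for f in range(3)]
    return max(range(3), key=scores.__getitem__), scores


def residue_change(left, ref_allele, alt_allele, right, frame):
    """'X/Y': X = residue of the UniProt allele, Y = residue of the mutant allele."""
    start = frame + 3 * ((len(left) - frame) // 3)  # codon containing the SNP
    ref_codon = (left + ref_allele + right)[start:start + 3]
    alt_codon = (left + alt_allele + right)[start:start + 3]
    return f"{CODON_TABLE[ref_codon]}/{CODON_TABLE[alt_codon]}"


# rs1800562; the window is the UniProt stretch around residue 282 (hand-picked).
left, right = "CCCTGGGGAAGAGCAGAGATATACGT", "CCAGGTGGAGCACCCAGGCCTGGAT"
frame, scores = best_frame(left + "G" + right, "PPGEEQRYTCQVEHPGLDQ")
print(frame, residue_change(left, "G", "A", right, frame))  # 1 C/Y
```

Because end gaps are penalized, the reference window should be about as long as the translated flank; a semi-global variant with free end gaps would remove that restriction.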
Mutation map
Three of the 24 missense/nonsense mutations listed in HGMD affect a codon that is already affected by another listed mutation, so only the remaining 21 positions are marked below. dbSNP contains 5 annotated silent SNPs, but rs62625348 refers to rs35201683, and the annotation of rs62625342 is incomplete. For rs62625342 we therefore translated the flanking sequence into an amino acid sequence and mapped the SNP onto the protein sequence according to the reading frame.
>sp|Q30201|HFE_HUMAN Hereditary hemochromatosis protein OS=Homo sapiens GN=HFE PE=1 SV=1
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLK
GWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLE
RDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGE
EQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
Legend: HGMD, dbSNP, dbSNP/HGMD
The overlap between the two databases is shown in Figure 1.
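The color coding of the map does not survive in plain text. Below is a minimal Python sketch of a text-only alternative: each row of the sequence is followed by a marker row, using hypothetical marker letters (`H` for HGMD only, `D` for dbSNP only, `B` for both) instead of colors, and `overlap` returns the positions shared by both databases. The dbSNP positions are the ones listed in the two tables above; the HGMD positions are not listed in this article and are left empty rather than guessed.

```python
def mutation_map(sequence, hgmd, dbsnp, width=60):
    """Return alternating sequence/marker rows for 1-based residue positions.

    Markers (hypothetical stand-ins for the colors of the original map):
    'H' = HGMD only, 'D' = dbSNP only, 'B' = both databases.
    """
    rows = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        marks = []
        for pos in range(start + 1, start + len(chunk) + 1):
            in_hgmd, in_dbsnp = pos in hgmd, pos in dbsnp
            if in_hgmd and in_dbsnp:
                marks.append("B")
            elif in_hgmd:
                marks.append("H")
            elif in_dbsnp:
                marks.append("D")
            else:
                marks.append(" ")
        rows.extend([chunk, "".join(marks).rstrip()])
    return rows


def overlap(hgmd, dbsnp):
    """Residue positions reported by both databases (the dbSNP/HGMD class)."""
    return sorted(set(hgmd) & set(dbsnp))


# dbSNP positions from the synonymous and missense tables; HGMD positions
# still have to be filled in from the HGMD entry.
dbsnp_positions = {7, 26, 76, 70, 283, 58, 59, 224, 53, 93, 105, 127, 217, 35, 65, 282, 63}
hgmd_positions = set()
sequence = "MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF"  # residues 1-60
for row in mutation_map(sequence, hgmd_positions, dbsnp_positions):
    print(row)
print(overlap(hgmd_positions, dbsnp_positions))  # []
```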
We currently have no information about functional residues, but I-TASSER predicted a binding site in a beta-sheet/helix region, and most of the SNPs annotated in HGMD and dbSNP lie in this region. Because HGMD contains only mutations with a functional effect, the accumulation of these mutations in the alpha/beta region suggests that it is a functional region. The PDB entry 1DE4 also shows that HFE binds the transferrin receptor in this region, while the beta region appears to interact with β2-microglobulin (B2M). The damaging mutations could therefore affect binding affinity.
References
<references />