Mapping SNPs

From Bioinformatikpedia

by Robert Greil and Cedric Landerer

HGMD

The Human Gene Mutation Database (HGMD)<ref>http://link.springer.de/link/service/journals/00439/papers/6098005/60980629.pdf</ref> is a collection of mutations in disease-related genes, maintained at Cardiff University.
Mutation types:

  • Missense/nonsense: a codon change that codes for a different amino acid (missense) or for a premature stop codon (nonsense).
  • Splicing: mutations that change the number or composition of exons.
  • Regulatory: mutations that affect the regulation of gene expression.
  • Small deletions: deletion of a few nucleotides (20 bp or fewer in HGMD).
  • Small insertions: insertion of a few nucleotides (20 bp or fewer in HGMD).
  • Small indels: a combined deletion and insertion that replaces a few nucleotides.
  • Gross deletions: large deletions caused by changes in DNA structure.
  • Gross insertions/duplications: large insertions and duplications caused by changes in DNA structure.
  • Complex rearrangements: changes in gene position.
  • Repeat variations: differences in the number of microsatellite repeats, which can vary between homologous proteins, splicing variants and expression types.


HFE

Main article: Hemochromatosis

Mutation type # of mutations
Missense/nonsense 24
Splicing 3
Regulatory 1
Small deletions 4
Small insertions 1
Small indels 0
Gross deletions 1
Gross insertions/duplications 0
Complex rearrangements 1
Repeat variations 0
Public total 35

The sequence provided by HGMD is the DNA sequence coding for the HFE protein. We used the 'DNA-RNA-protein' translator (which is somewhat buggy and works only in Internet Explorer) to transcribe the DNA sequence into RNA and translate it into the protein sequence. We then used Jalview 2.6.1 to align the translated protein sequence against the UniProt FASTA sequence HFE_HUMAN. The alignment shows a 100% match, so we use the UniProt sequence to visualize the SNPs.
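The translate-and-compare step can also be reproduced without a web tool. The following is a minimal sketch, not the procedure used above: it assumes the standard genetic code and replaces the Jalview alignment with a simple ungapped, position-by-position comparison. The DNA fragment is taken from the rs114758821 flanking sequence listed in the dbSNP section (G allele, starting at the ATG); the function names are our own.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def transcribe(dna):
    """Return the mRNA that corresponds to a coding-strand DNA sequence."""
    return dna.upper().replace("T", "U")


def translate(dna):
    """Translate codon by codon and stop at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def percent_identity(a, b):
    """Ungapped identity of two equally long sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("sequences differ in length; a real alignment is needed")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# First 45 coding nucleotides of HFE, from the rs114758821 flank (G allele).
fragment = "ATGGGCCCGCGAGCCAGGCCGGCGCTTCTCCTCCTGATGCTTTTG"
# First 15 residues of the UniProt sequence HFE_HUMAN.
reference = "MGPRARPALLLLMLL"

translated = translate(fragment)
print(transcribe(fragment))
print(translated, percent_identity(translated, reference))
```

For full-length sequences that may differ in length, a proper pairwise alignment (as done with Jalview) is still the safer choice.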

dbSNP<ref>http://www.ncbi.nlm.nih.gov/pmc/articles/PMC29783/?tool=pubmed</ref>

synonymous

'Synonymous' or 'silent' mutations change only the nucleotide sequence, not the amino acid sequence. The protein stays the same because the mutated codon encodes the same amino acid as the original one. In special cases, however, a silent mutation can alter an intron/exon splice site and thereby change the protein, even though the affected codon itself still encodes the same amino acid.

We found 4 SNPs in dbSNP for Homo sapiens by using

  • "synonymous-codon"[Function_Class] AND HFE[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS]

as query.
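Such a query can also be assembled programmatically. The sketch below only builds the search term and an NCBI E-utilities ESearch URL; it does not send a request. The field tags are copied from the query above and may have changed in current dbSNP releases, and the function names and the retmax default are our own assumptions.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(function_class, gene="HFE"):
    """Assemble the dbSNP search term in the form used in this article."""
    return (f'"{function_class}"[Function_Class] AND {gene}[GENE] '
            'AND "human"[ORGN] AND "snp"[SNP_CLASS]')


def esearch_url(term, retmax=100):
    """Return an ESearch URL for the snp database (no request is sent)."""
    return ESEARCH_URL + "?" + urlencode({"db": "snp", "term": term, "retmax": retmax})


term = build_term("synonymous-codon")
url = esearch_url(term)
print(term)
print(url)
```

Passing "missense" instead of "synonymous-codon" yields the term for the non-synonymous search.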

ID Sequence Mutation Position
rs114758821 GGGGAAATGGGCCCGCGAGCCAGGCC[A/G]GCGCTTCTCCTCCTGATGCTTTTGC A/G 7
rs114038675 CTTTAACTTGCTTTTTCTGTTTTAGA[A/G]CCCTCACCGTCTGGCACCCTAGTCA A/G 26
rs62625342 GTGGAGCCCCGAACTCCATGGGTTTC[C/T]AGTAGAATTTCAAGCCAGATGTGGC C/T 76
rs35201683 TCCACAGGAGGAGCCATGGGGCACTA[C/T]GTCTTAGCTGAACGTGAGTGACACG C/T 70

For rs62625342, no position is assigned. We translated the nucleotide sequence into an amino acid sequence and aligned it with the reference sequence to find the correct position with respect to the reading frame.

non synonymous

'Non-synonymous' or 'missense' mutations change both the nucleotide and the amino acid sequence. This happens when a codon is changed into one that encodes a different amino acid, so the sequence of the protein changes. The consequences range from an altered function to a complete loss of function.
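The distinction between silent and non-silent changes can be checked mechanically for a single codon. The following minimal sketch assumes the standard genetic code; the function name and the extra 'nonsense' category (a change to a stop codon) are our own additions for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def classify(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the encoded residue."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


# rs35201683: TAC -> TAT, both encode tyrosine.
print(classify("TAC", "TAT"))  # synonymous
# rs1800562: TGC -> TAC, cysteine becomes tyrosine.
print(classify("TGC", "TAC"))  # missense
```

The check looks only at the codon itself, so it cannot detect the splice-site effects that a silent mutation may have.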


We found 13 SNPs in dbSNP for Homo sapiens by using

  • "missense"[Function_Class] AND HFE[GENE] AND "human"[ORGN] AND "snp"[SNP_CLASS]

as query.

ID Sequence Mutation Residue change Position
rs111033563 TGGGGAAGAGCAGAGATATACGTGCC[A/C]GGTGGAGCACCCAGGCCTGGATCAG A/C Q/P 283
rs111033558 CATTGGAATTTTGTTCATAATATTAA[G/T]GAAGAGGCAGGGTTCAAGTGAGTAG G/T R/M 58
rs111033557 TGGGCTACGTGGATGACCAGCTGTTC[A/G]TGTTCTATGATCATGAGAGTCGCCG A/G V/M 59
rs62625346 TGTGACCTCTTCAGTGACCACTCTAC[A/G]GTGTCGGGCCTTGAACTACTACCCC A/G R/Q 224
rs28934889 TTTCCTTGTTTGAAGCTTTGGGCTAC[A/G]TGGATGACCAGCTGTTCGTGTTCTA A/G V/M 53
rs28934597 GGCTGCAGCTGAGTCAGAGTCTGAAA[C/G]GGTGGGATCACATGTTCACTGTTGA C/G G/R 93
rs28934596 CATGTTCACTGTTGACTTCTGGACTA[C/T]TATGGAAAATCACAACCACAGCAAG C/T I/T 105
rs28934595 CAGGTCATCCTGGGCTGTGAAATGCA[A/C]GAAGACAACAGTACCGAGGGCTACT A/C Q/H 127
rs4986950 TTTGGTGAAGGTGACACATCATGTGA[C/T]CTCTTCAGTGACCACTCTACGGTGT C/T T/I 217
rs2242956 TTCACACTCTCTGCACTACCTCTTCA[C/T]GGGTGCCTCAGAGCAGGACCTTGGT C/T M/T 35
rs1800730 AGCTGTTCGTGTTCTATGATCATGAG[A/T]GTCGCCGTGTGGAGCCCCGAACTCC A/T S/C 65
rs1800562 CCCTGGGGAAGAGCAGAGATATACGT[A/G]CCAGGTGGAGCACCCAGGCCTGGAT A/G C/Y 282
rs1799945 ATGACCAGCTGTTCGTGTTCTATGAT[C/G]ATGAGAGTCGCCGTGTGGAGCCCCG C/G H/D 63

Residue exchanges marked in red are not yet annotated in dbSNP. We determined these changes by translating the nucleotide sequence in each reading frame and mapping the translations onto the protein sequence with a global alignment method (Needleman-Wunsch). If a frame matches the sequence, we assume that this frame shows the amino acid exchange. Mutations are annotated with the UniProt sequence as reference: G/R means that G is found in the UniProt sequence and R in the mutated sequence.
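The frame-mapping idea can be sketched in a few lines. Note that this sketch swaps the Needleman-Wunsch alignment for an exact substring search, so it is a simplification and not the method described above: it only works when the whole flanking sequence lies inside the coding region (no intron, UTR or stop codon in the flank). It assumes the standard genetic code and forward-strand flanks in the LEFT[X/Y]RIGHT notation of the table; the function names are our own.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# UniProt Q30201 (HFE_HUMAN), copied from the FASTA record in this article.
HFE = (
    "MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLK"
    "GWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLE"
    "RDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGE"
    "EQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE"
)


def translate(dna):
    """Translate complete codons only; stop codons become '*'."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def map_snp(flank, protein):
    """Return (position, reference residue, variant residue), or None if no frame matches."""
    m = re.fullmatch(r"([ACGT]+)\[([ACGT])/([ACGT])\]([ACGT]+)", flank)
    if m is None:
        raise ValueError("expected the notation LEFT[X/Y]RIGHT")
    left, allele1, allele2, right = m.groups()
    for frame in range(3):
        # Index of the codon that contains the SNP within this frame's translation.
        codon_index = (len(left) - frame) // 3
        # Either allele may be the one present in the reference protein.
        for ref, alt in ((allele1, allele2), (allele2, allele1)):
            ref_pep = translate((left + ref + right)[frame:])
            start = protein.find(ref_pep)
            if start != -1:
                alt_pep = translate((left + alt + right)[frame:])
                return start + codon_index + 1, ref_pep[codon_index], alt_pep[codon_index]
    return None


# rs1800562 from the table above.
print(map_snp("CCCTGGGGAAGAGCAGAGATATACGT[A/G]CCAGGTGGAGCACCCAGGCCTGGAT", HFE))
# -> (282, 'C', 'Y')
```

A flank that extends into an intron or past the stop codon returns None here, which is exactly the case where an alignment that tolerates mismatches and gaps is needed.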

Mutation map

Figure 1: Overlap of SNPs annotated in HGMD and dbSNP

3 of the 24 missense mutations listed in HGMD share a codon with another listed mutation, so we marked only the remaining 21 positions below. There are 5 annotated silent SNPs in dbSNP, but rs62625348 refers to rs35201683 and the annotation of rs62625342 is incomplete. For the latter, we therefore translated the nucleotide sequence into an amino acid sequence and mapped the SNP onto the protein sequence according to the reading frame.

>sp|Q30201|HFE_HUMAN Hereditary hemochromatosis protein OS=Homo sapiens GN=HFE PE=1 SV=1
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDHESRRVEPRTPWVSSRISSQMWLQLSQSLK
GWDHMFTVDFWTIMENHNHSKESHTLQVILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNRAYLE
RDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWLKDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGE
EQRYTCQVEHPGLDQPLIVIWEPSPSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE


HGMD
dbSNP
dbSNP/HGMD
The overlap between the two databases is shown in Figure 1.
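A plain-text version of such a map can be produced with a marker line under the sequence. The sketch below is only an illustration: it uses the first 92 residues of the UniProt sequence and a subset of the dbSNP missense positions from the table above (35, 53, 59, 63, 65). The HGMD positions are placeholder values, because the HGMD position list is not reproduced in this article; the function name and the marker letters are our own.

```python
def mutation_map(protein, hgmd, dbsnp, width=60):
    """Return sequence lines, each followed by a marker line.

    H = HGMD only, D = dbSNP only, B = both databases, . = no annotated SNP.
    Positions are 1-based residue numbers.
    """
    marks = []
    for pos in range(1, len(protein) + 1):
        in_hgmd, in_dbsnp = pos in hgmd, pos in dbsnp
        if in_hgmd and in_dbsnp:
            marks.append("B")
        elif in_hgmd:
            marks.append("H")
        elif in_dbsnp:
            marks.append("D")
        else:
            marks.append(".")
    marks = "".join(marks)
    lines = []
    for start in range(0, len(protein), width):
        lines.append(protein[start:start + width])
        lines.append(marks[start:start + width])
    return "\n".join(lines)


# First 92 residues of HFE_HUMAN (first line of the FASTA record above).
HFE_N_TERM = ("MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVFYDH"
              "ESRRVEPRTPWVSSRISSQMWLQLSQSLK")
dbsnp_positions = {35, 53, 59, 63, 65}  # subset of the dbSNP missense table
hgmd_positions = {63, 65}               # placeholder values, for illustration only

print(mutation_map(HFE_N_TERM, hgmd_positions, dbsnp_positions, width=46))
```

Keeping the two position sets separate makes it easy to read off the overlap as a set intersection.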

At the moment we have no information about functional residues, but I-TASSER predicted a binding site in a beta-sheet/helix region. Most of the SNPs annotated in HGMD and dbSNP lie in this region. Because HGMD contains only SNPs with a functional effect, their accumulation in this alpha/beta region suggests that the region is functionally important. The PDB structure 1DE4 also shows that the HFE protein binds the transferrin receptor in this region, while the beta region seems to interact with B2M. The damaging mutations could therefore affect the binding affinity.

References

<references />