Task 7: Research SNPs

From Bioinformatikpedia
Revision as of 23:18, 1 September 2013 by Betza (talk | contribs) (Databases comparison)

Lab journal Task 7

<css> table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; width: 60%; }

.colBasic2 th,td { padding: 3px; border: 1px solid black; }

.colBasic2 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56 for blue try #005fbf and #aad4ff

maria's style blue: #adceff grey: #efefef

*/

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;} </css>

<css> table.colBasic3 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; width: 30%; }

.colBasic3 th,td { padding: 3px; border: 1px solid black; }

.colBasic3 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56 for blue try #005fbf and #aad4ff

maria's style blue: #adceff grey: #efefef

*/

.colBasic3 tr th { background-color:#efefef; color: black;} .colBasic3 tr:first-child th { background-color:#adceff; color:black;}

</css>

HGMD (The Human Gene Mutation Database)

The search results for the HFE gene contain the different types of mutations that are specified in <xr id="hgmd"/>:

<figtable id="hgmd">

{| class="colBasic2"
! mutation type !! definition !! number
|-
| missense, nonsense || mutation that leads to a change of amino acid or a stop codon || 28
|-
| splicing || mutation that affects mRNA splicing || 3
|-
| regulatory || substitution causing abnormal regulation || 1
|-
| small deletion || micro deletion (<= 20 bp) || 4
|-
| small insertion || micro insertion (<= 20 bp) || 1
|-
| small indel || micro indel (<= 20 bp) || 0
|-
| gross deletion || deletion > 20 bp || 2
|-
| gross insertions/duplications || insertion > 20 bp || 0
|-
| complex rearrangements || rearrangements of stretches of the DNA sequence || 1
|-
| repeat variations || differences in repeat length || 0
|}
Table 1: The different mutation types that were found for HFE in the HGMD.

</figtable>

In total, we found 40 mutations in the public version of the database and 49 in the non-public version.
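As a quick consistency check, the per-type counts from Table 1 can be summed. The following Python sketch (the dictionary name and keys are illustrative, the numbers are copied from the table) reproduces the total for the public version:

```python
# Counts per mutation type for HFE, copied from Table 1 (public HGMD).
hgmd_counts = {
    "missense/nonsense": 28,
    "splicing": 3,
    "regulatory": 1,
    "small deletion": 4,
    "small insertion": 1,
    "small indel": 0,
    "gross deletion": 2,
    "gross insertion/duplication": 0,
    "complex rearrangement": 1,
    "repeat variation": 0,
}

total = sum(hgmd_counts.values())
print(total)  # 40
```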

<figtable id="hgmd missense">

{| class="colBasic2"
! accession number !! codon change !! aa change !! codon number
|-
| CM032270 || AGGc-AGC || Arg-Ser || 6
|-
| CM091838 || TTG-TGG || Leu-Trp || 46
|-
| CM994469 || cGTG-ATG || Val-Met || 53
|-
| CM994470 || cGTG-ATG || Val-Met || 59
|-
| HM971246 || CATg-CAC || His-His || 63
|-
| CM960827 || tCAT-GAT || His-Asp || 63
|-
| CM990718 || gAGT-TGT || Ser-Cys || 65
|-
| CM033969 || tCGC-TGC || Arg-Cys || 66
|-
| CM020721 || cCGA-TGA || Arg-Term || 71
|-
| CM990719 || aGGG-CGG || Gly-Arg || 93
|-
| CM990720 || ATT-ACT || Ile-Thr || 105
|-
| CM990721 || CAAg-CAC || Gln-His || 127
|-
| CM091839 || aGAC-AAC || Asp-Asn || 129
|-
| CM091840 || TACg-TAG || Tyr-Term || 138
|-
| CM004810 || gGAG-CAG || Glu-Gln || 168
|-
| CM004106 || gGAG-TAG || Glu-Term || 168
|-
| CM004107 || TGG-TAG || Trp-Term || 169
|-
| CM015326 || GCC-GTC || Ala-Val || 176
|-
| CM081301 || CTG-CCG || Leu-Pro || 183
|-
| CM034097 || CGG-CAG || Arg-Gln || 224
|-
| CM101181 || cCAG-TAG || Gln-Term || 233
|-
| CM024530 || tGTA-TTA || Val-Leu || 272
|-
| CM994771 || aGAG-AAG || Glu-Lys || 277
|-
| CM960828 || TGC-TAC || Cys-Tyr || 282
|-
| CM004391 || TGC-TCC || Cys-Ser || 282
|-
| CM032271 || CAG-CCG || Gln-Pro || 283
|-
| HM030028 || GTG-GCG || Val-Ala || 295
|-
| CM990722 || AGG-ATG || Arg-Met || 330
|}
Table 2: 28 missense and nonsense mutations for HFE from the HGMD. All of them are disease causing.

</figtable>

The 28 missense and nonsense mutations for HFE are listed in <xr id="hgmd missense"/> together with the amino acid (aa) change and the codon number. They are all connected with the hemochromatosis phenotype.
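The codon changes in Table 2 can be translated programmatically to check the listed amino acid changes. The sketch below uses the standard genetic code and assumes that lowercase letters in the HGMD notation (e.g. the `t` in `tCAT-GAT`) are flanking bases that do not belong to the codon. The function name `classify_codon_change` is illustrative and not part of any HGMD tool.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(change: str) -> str:
    """Classify an HGMD-style codon change such as 'tCAT-GAT'."""
    ref, alt = change.split("-")
    # Assumption: lowercase letters are flanking bases, not part of the codon.
    ref = "".join(base for base in ref if base.isupper())
    alt = "".join(base for base in alt if base.isupper())
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


for change in ["tCAT-GAT", "TGC-TAC", "cCGA-TGA", "CATg-CAC"]:
    print(change, classify_codon_change(change))
```

Under this classification the entry HM971246 (`CATg-CAC`, His-His) comes out as synonymous, in agreement with its listed amino acid change.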

dbSNP

dbSNP was searched for non-synonymous and silent (synonymous) mutations of the HFE gene. Silent mutations are mutations in the nucleotide sequence that do not lead to a change in the amino acid sequence of the protein.

<figtable id="dbSNP all">

{| class="colBasic2"
! cluster ID !! function !! codon number !! codon pos !! nucleotide change !! aa change
|-
| rs149342416 || missense || 6 || 3 || G -> C || Arg -> Ser
|-
| rs114758821 || synonymous || 7 || 3 || G -> A || Pro -> Pro
|-
| rs368895240 || synonymous || 10 || 3 || C -> T || Leu -> Leu
|-
| rs201657128 || missense || 14 || 1 || C -> G || Leu -> Val
|-
| rs143662783 || missense || 17 || 2 || C -> T || Thr -> Ile
|-
| rs148161858 || missense || 23 || 2 || G -> A || Arg -> His
|-
| rs2242956 || missense || 35 || 2 || T -> C || Met -> Thr
|-
| rs377254261 || missense || 37 || 2 || C -> T || Ala -> Val
|-
| rs147297176 || synonymous || 58 || 3 || C -> T || Phe -> Phe
|-
| rs147426902 || synonymous || 63 || 3 || T -> C || His -> His
|-
| rs139523708 || missense || 67 || 2 || G -> A || Arg -> His
|-
| rs62625342 || synonymous || 76 || 3 || C -> T || Ser -> Ser
|-
| rs376650371 || missense || 97 || 3 || G -> A || Met -> Ile
|-
| rs199988202 || missense || 106 || 2 || T -> C || Met -> Thr
|-
| rs200706856 || missense || 129 || 1 || G -> A || Asp -> Asn
|-
| rs201885016 || missense || 130 || 2 || A -> G || Asn -> Ser
|-
| rs369790080 || synonymous || 132 || 3 || C -> T || Thr -> Thr
|-
| rs372789940 || missense || 141 || 1 || G -> A || Asp -> Asn
|-
| rs199879669 || missense || 157 || 1 || G -> C || Ala -> Pro
|-
| rs145475682 || missense || 162 || 1 || G -> T || Ala -> Ser
|-
| rs148480830 || synonymous || 162 || 3 || C -> G || Ala -> Ala
|-
| rs144170531 || missense || 166 || 1 || A -> G || Lys -> Glu
|-
| rs146519482 || missense || 168 || 1 || G -> C || Glu -> Gln
|-
| rs199916850 || missense || 183 || 2 || T -> C || Leu -> Pro
|-
| rs140957442 || nonsense || 192 || 1 || C -> T || Gln -> Ter
|-
| rs4986950 || missense || 217 || 2 || C -> T || Thr -> Ile
|-
| rs144797937 || missense || 224 || 1 || C -> T || Arg -> Trp
|-
| rs62625346 || missense || 224 || 2 || G -> A || Arg -> Gln
|-
| rs140515012 || missense || 245 || 1 || C -> G || Pro -> Ala
|-
| rs150402693 || missense || 251 || 3 || C -> A || Phe -> Leu
|-
| rs182920795 || synonymous || 253 || 3 || T -> A || Pro -> Pro
|-
| rs202068193 || missense || 256 || 1 || G -> A || Val -> Ile
|-
| rs143846467 || missense || 259 || 2 || A -> G || Asn -> Ser
|-
| rs140080192 || missense || 277 || 1 || G -> A || Glu -> Lys
|-
| rs369354634 || synonymous || 281 || 3 || G -> A || Thr -> Thr
|-
| rs201310322 || synonymous || 292 || 3 || C -> T || Pro -> Pro
|-
| rs143175221 || missense || 295 || 2 || T -> C || Val -> Ala
|-
| rs114038675 || synonymous || 298 || 3 || G -> A || Glu -> Glu
|-
| rs372856303 || synonymous || 301 || 3 || G -> A || Pro -> Pro
|-
| rs147519426 || missense || 315 || 2 || T -> G || Val -> Gly
|-
| rs148632352 || synonymous || 315 || 3 || T -> C || Val -> Val
|-
| rs371192232 || synonymous || 317 || 3 || C -> T || Val -> Val
|-
| rs141229562 || missense || 318 || 1 || G -> A || Val -> Ile
|-
| rs150716212 || missense || 322 || 2 || T -> C || Ile -> Thr
|-
| rs138993448 || missense || 327 || 2 || T -> C || Ile -> Thr
|-
| rs368122334 || missense || 340 || 2 || G -> C || Gly -> Ala
|-
| rs35201683 || synonymous || 342 || 3 || C -> T || Tyr -> Tyr
|-
| rs370285936 || missense || 343 || 2 || T -> A || Val -> Asp
|-
| rs146508927 || missense || 347 || 2 || G -> A || Arg -> His
|}
Table 3: All 49 mutations in the HFE gene from dbSNP.

</figtable>

In total, we found 49 SNPs in transcript variant 1 of the HFE gene. They are listed in <xr id="dbSNP all"/>. The column "Function" states whether the mutation is synonymous or non-synonymous.
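The codon number and the position within the codon together determine the position of the changed nucleotide in the coding sequence. A minimal sketch, assuming that codon 1 is the start codon and that coding positions are counted from its first base (the helper name `cds_position` is hypothetical):

```python
def cds_position(codon_number: int, codon_pos: int) -> int:
    """Return the 1-based coding-sequence position of a nucleotide.

    Assumes codon 1 is the start codon and codon_pos is 1, 2 or 3.
    """
    if codon_number < 1 or codon_pos not in (1, 2, 3):
        raise ValueError("codon_number must be >= 1 and codon_pos in 1..3")
    return (codon_number - 1) * 3 + codon_pos


# Two rows from the dbSNP table: (cluster ID, codon number, codon pos).
for cluster_id, codon, pos in [("rs147426902", 63, 3), ("rs62625346", 224, 2)]:
    print(cluster_id, cds_position(codon, pos))
```

For example, the synonymous SNP rs147426902 (codon 63, position 3) maps to coding position 189 and rs62625346 (codon 224, position 2) to position 671.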

SNPdbe

35 mutations that are associated with the human HFE protein were found in SNPdbe.

<figtable id="exp evidence">

{| class="colBasic3"
! exp. evidence !! count
|-
| 1000Genome,freq,cluster || 6
|-
| by cluster || 6
|-
| by cluster,freq || 2
|-
| Not validated || 13
|-
| by freq || 8
|}
Table 4: Types of experimental evidence and their occurrence among the 35 mutations associated with HFE.

</figtable>

Not all mutations have experimental evidence: over one third (13 of 35) are not validated, see <xr id="exp evidence"/>.
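The share of unvalidated entries follows directly from the counts in Table 4. A short sketch with the counts copied from the table (variable names are illustrative):

```python
# Experimental evidence categories and counts, copied from Table 4.
evidence_counts = {
    "1000Genome,freq,cluster": 6,
    "by cluster": 6,
    "by cluster,freq": 2,
    "Not validated": 13,
    "by freq": 8,
}

n_mutations = sum(evidence_counts.values())
not_validated_fraction = evidence_counts["Not validated"] / n_mutations
print(n_mutations)                      # 35
print(f"{not_validated_fraction:.1%}")  # 37.1%
```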

<figtable id="snpdbe">

{| class="colBasic2"
! dbSNP !! Mutation !! Disease association !! Experimental evidence
|-
| rs1799945 || H63D || In hereditary haemochromatosis (HH) (PMD) || 1000Genome,freq,cluster
|-
| rs1800562 || C282Y || hemochromatosis (SwissVar) || 1000Genome,freq,cluster
|-
| rs1800730 || S65C || hemochromatosis (SwissVar). In hereditary hemochromatosis patient who had resulted positive to screening for iron overload (PMD) || 1000Genome,freq,cluster
|-
| rs2242956 || M35T || N/A || by cluster
|-
| rs4986950 || T217I || N/A || by cluster,freq
|-
| rs28934595 || Q127H || hemochromatosis (SwissVar). In variegate porphyria (VP) (PMD) || by cluster
|-
| rs28934596 || I105T || hemochromatosis (SwissVar). In hemochromatosis (PMD) || by cluster
|-
| rs28934597 || G93R || hemochromatosis (SwissVar). In hemochromatosis (PMD) || by cluster
|-
| rs28934889 || V53M || N/A || 1000Genome,freq,cluster
|-
| rs62625346 || R224Q || N/A || by cluster,freq
|-
| rs28934890 || V59M || N/A || Not validated
|-
| rs111033558 || R330M || hemochromatosis (SwissVar). In hereditary haemochromatosis (HH) (PMD) || by cluster
|-
| rs111033563 || Q283P || hemochromatosis (SwissVar) || by cluster
|-
| rs149342416 || R6S || hemochromatosis (SwissVar) || by freq
|-
| rs140080192 || E277K || N/A || 1000Genome,freq,cluster
|-
| rs143175221 || V295A || hemochromatosis (SwissVar) || by freq
|-
| rs148161858 || R23H || N/A || 1000Genome,freq,cluster
|-
| N/A || M106T || N/A || Not validated
|-
| rs146519482 || E168Q || N/A || Not validated
|-
| N/A || L183P || N/A || Not validated
|-
| rs138176635 || E252G || N/A || Not validated
|-
| rs138993448 || I327T || N/A || Not validated
|-
| rs139523708 || R67H || N/A || Not validated
|-
| rs140515012 || P245A || N/A || Not validated
|-
| rs141229562 || V318I || N/A || Not validated
|-
| rs143662783 || T17I || N/A || by freq
|-
| rs143846467 || N259S || N/A || Not validated
|-
| rs144170531 || K166E || N/A || by freq
|-
| rs144797937 || R224W || N/A || by freq
|-
| rs145475682 || A162S || N/A || by freq
|-
| rs146508927 || R347H || N/A || by freq
|-
| rs147519426 || V315G || N/A || by freq
|-
| rs149662565 || P160T || N/A || Not validated
|-
| rs150402693 || F251L || N/A || Not validated
|-
| rs150716212 || I322T || N/A || Not validated
|}
Table 5: Mutations in the HFE protein from SNPdbe.

</figtable>

A list of the 35 mutations and their experimental evidence can be found in <xr id="snpdbe"/>.
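SNPdbe writes amino acid substitutions in one-letter notation (e.g. H63D), whereas HGMD and OMIM use three-letter codes. To compare entries across databases, the notations have to be converted. The following sketch shows one possible conversion; the function names are illustrative and it only handles simple substitutions of the 20 standard amino acids.

```python
import re

# One-letter to three-letter codes for the 20 standard amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def parse_substitution(mutation: str) -> tuple[str, int, str]:
    """Split a substitution such as 'H63D' into ('H', 63, 'D')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


def to_three_letter(mutation: str) -> str:
    """Convert 'H63D' to 'His63Asp'."""
    ref, position, alt = parse_substitution(mutation)
    return f"{THREE_LETTER[ref]}{position}{THREE_LETTER[alt]}"


print(to_three_letter("H63D"))           # His63Asp
print(to_three_letter("C282Y").upper())  # CYS282TYR
```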

OMIM

OMIM (Online Mendelian Inheritance in Man) also contains information about HFE, but only a small fraction of all known mutations can be found there.

<figtable id="omim">

{| class="colBasic2"
! dbSNP accession !! Phenotype !! Mutation
|-
| rs1799945 || HEMOCHROMATOSIS || HIS63ASP
|-
| rs1800730 || HEMOCHROMATOSIS || SER65CYS
|-
| rs28934889 || HFE POLYMORPHISM || VAL53MET
|-
| rs28934595 || HEMOCHROMATOSIS || GLN127HIS
|-
| rs111033558 || HEMOCHROMATOSIS || ARG330MET
|-
| rs28934596 || HEMOCHROMATOSIS || ILE105THR
|-
| rs28934597 || HEMOCHROMATOSIS || GLY93ARG
|-
| rs111033563 || HEMOCHROMATOSIS || GLN283PRO
|}
Table 6: Mutations from OMIM that are associated with the HFE protein.

</figtable>

A complete list of the mutations can be found in <xr id="omim"/>. Two of the mutations are polymorphisms of the HFE protein and eight of them cause the disease hemochromatosis.
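Because both OMIM and SNPdbe report dbSNP accessions, the two result sets can be compared with simple set operations. The sketch below uses the rs numbers listed in Table 6 and the ten disease-associated rs numbers from Table 5; it only reflects the entries as listed on this page.

```python
# dbSNP accessions listed for OMIM (Table 6).
omim_ids = {
    "rs1799945", "rs1800730", "rs28934889", "rs28934595",
    "rs111033558", "rs28934596", "rs28934597", "rs111033563",
}

# dbSNP accessions of the disease-associated SNPdbe entries (Table 5).
snpdbe_disease_ids = {
    "rs1799945", "rs1800562", "rs1800730", "rs28934595", "rs28934596",
    "rs28934597", "rs111033558", "rs111033563", "rs149342416", "rs143175221",
}

shared = omim_ids & snpdbe_disease_ids
only_omim = omim_ids - snpdbe_disease_ids
only_snpdbe = snpdbe_disease_ids - omim_ids

print(len(shared))          # 7
print(sorted(only_omim))    # ['rs28934889']
print(sorted(only_snpdbe))  # ['rs143175221', 'rs149342416', 'rs1800562']
```

The one accession found only in the OMIM list, rs28934889 (V53M), is the entry annotated as an HFE polymorphism rather than as disease causing.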

Databases comparison

<figtable id="comparison">

{| class="colBasic2"
! database !! last update !! version !! information contained !! data source !! # entries (Homo sapiens) !! # HFE mutations
|-
| HGMD || spring 2013 || public 2013.1 (mainly 3 year old data) || Collection of published gene lesions in the human genome that cause inherited diseases. || Only from publications. Journals are searched manually and by computational means each week. || 99869 || 28 missense/nonsense
|-
| dbSNP || 26.06.2012 || Build 137 || Short nucleotide sequence variations in different organisms (common and rare) || Submissions from laboratories but also private research companies. || 192,678,553 || 10 synonymous, 41 non-synonymous, 10 disease causing SNPs and 162 SNPs in the UTR
|-
| SNPdbe || 05.03.2012 || - || Annotations for single amino acid substitutions (SAASs), e.g. functional effect (experimental, predicted), associated disease, evol. conservation,... || Based on entries from SwissProt, dbSNP, 1000 Genomes, PMD || 967879 || 10 disease associated, 25 other
|-
| OMIM || daily || - || Compendium of human genes, genetic phenotypes and diseases. 3,035 genes with phenotype-causing mutations known. || Information from publications and databases is reviewed and summed up in texts by scientists. || 21,934 2 || 8 disease causing mutations, 2 other
|}
Table 7: Comparison of HGMD, dbSNP, SNPdbe and OMIM with respect to the date of the last update, the information contained in the databases and the number of entries.

</figtable>

<xr id="comparison"/> compares the four databases used in this task. OMIM is the most up to date, and the private version of HGMD is also from 2013, whereas dbSNP, SNPdbe and especially the public HGMD have not been updated for over a year. dbSNP is very large and is the only one of the four databases that also contains synonymous SNPs.
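The age of the dated releases can be computed from the last-update column. The sketch below assumes that the dates are written as DD.MM.YYYY and takes the date of this page revision (1 September 2013) as the reference date; both are assumptions made for illustration.

```python
from datetime import date, datetime

# Last-update dates as given in Table 7, read as DD.MM.YYYY (an assumption).
last_update = {"dbSNP": "26.06.2012", "SNPdbe": "05.03.2012"}

# Reference date: the date of this page revision (assumed as "today").
reference = date(2013, 9, 1)

age_in_days = {
    name: (reference - datetime.strptime(stamp, "%d.%m.%Y").date()).days
    for name, stamp in last_update.items()
}
print(age_in_days)  # {'dbSNP': 432, 'SNPdbe': 545}
```

Both values exceed 365 days, which matches the statement that these databases had not been updated for over a year.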

Mutation map

In total, we collected a set of 72 point mutations from the different databases, comprising missense and nonsense mutations as well as synonymous SNPs. The complete list can be viewed in the Lab journal Task 7.
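Merging the result sets requires a common key, so that the same mutation reported by several databases is counted only once. How the set of 72 was assembled is documented in the lab journal; the following is only a minimal sketch of the idea, using a handful of entries from the tables above and a hypothetical key of (reference amino acid, codon number, new amino acid).

```python
# A point mutation is keyed by (reference aa, codon number, new aa).
# Each set holds only a small sample of the entries of one database.
hgmd_sample = {("R", 6, "S"), ("H", 63, "D"), ("C", 282, "Y")}
dbsnp_sample = {("R", 6, "S"), ("M", 35, "T"), ("H", 63, "H")}
snpdbe_sample = {("R", 6, "S"), ("M", 35, "T"), ("H", 63, "D"), ("C", 282, "Y")}

# The set union removes entries reported by more than one database.
merged = hgmd_sample | dbsnp_sample | snpdbe_sample

for ref, pos, alt in sorted(merged, key=lambda m: (m[1], m[0], m[2])):
    kind = "synonymous" if ref == alt else "non-synonymous"
    print(f"{ref}{pos}{alt}\t{kind}")
```

In this sample, ten database entries collapse to five distinct mutations, one of which (H63H) is synonymous.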

<figure id="mutation map">

Figure 1: Mutation map for the HFE protein. Disease causing mutations are marked in red and non disease causing mutations in blue. The protein sequence with the MHC antigen-recognition domain (green) and the Immunoglobulin domain (blue) is shown below. dc: disease causing

</figure>

<xr id="mutation map"/> shows the mutation map for HFE. Disease causing (dc) mutations (red) and non disease causing mutations (blue) are distributed almost evenly over the protein sequence. Nevertheless, there are some regions where only dc mutations are located. These regions are probably especially relevant for the protein function and cannot be mutated without a decrease in function.

<figtable id="mutations struc">

Mutations front HFE.png
Mutations top hfe.png
Mutations side hfe.png
Figure 2: Visualisation of the location of the different mutations in the PDB structure 1A6Z, chain A (HFE). The structure is shown in three different orientations. The protein domains are coloured according to <xr id="mutation map"/> with the MHC antigen-recognition domain in green and the Immunoglobulin domain in blue. Disease causing mutations are marked in red and non disease causing mutations in grey.

</figtable>

In addition to the mutation map, <xr id="mutations struc"/> shows a 3D visualisation of the location of the mutated amino acids in the protein structure (1A6Z, chain A). Most disease causing mutations are located in secondary structure regions, but some are located in loops. Therefore, it is difficult to tell which regions of the structure are important for the function when looking only at the location of the mutations.

References

Stenson et al (2009), The Human Gene Mutation Database (HGMD®): 2008 Update. Genome Med 1(1):13
http://www.ncbi.nlm.nih.gov/books/NBK3848/
ftp://ftp.ncbi.nih.gov/pub/factsheets/Factsheet_SNP.pdf
https://www.rostlab.org/services/snpdbe/