Researching And Mapping Point Mutations Hemochromatosis

From Bioinformatikpedia
Latest revision as of 09:34, 12 June 2012

Short task description

Detailed description: Researching and mapping point mutations

In this task we've searched several SNP databases to find SNPs for HFE. These SNPs will then be used in the next assignment.


Protocol

A protocol with a description of the data acquisition and other scripts used for this task is available here.


HGMD

The public version of HGMD contains a total of 88745 mutations<ref name="hgmd:stats">http://www.hgmd.cf.ac.uk/ac/hahaha.php</ref>. Most of these (50129) are missense/nonsense mutations. HGMD also lists regulatory changes, small and large deletions and insertions, splicing changes, repeat variations, and even complex rearrangements. The database entries are taken from the literature and are manually curated. In contrast to the professional version of HGMD, the public version is updated with a considerable delay (2176 additions in the last three years, compared to 34236). The most recent reference for HFE is from 2008, the second most recent from 2004. All of HFE's missense/nonsense mutations are reported to be associated with hemochromatosis or possible side effects thereof (altered iron status, diabetes).

Mutation statistics for HFE in HGMD (non-professional):

  • Missense/nonsense: 24
  • Splicing: 3
  • Regulatory: 1
  • Small deletions: 4
  • Small insertions: 1
  • Small indels: 0
  • Gross deletions: 2
  • Gross insertions/duplications: 0
  • Complex rearrangements: 1
  • Repeat variations: 0
  • Total: 36


The alignment between the sequence used by HGMD (NM_000410.3) and the UniProt sequence for Q30201 showed 100% identity. The SNP positions therefore do not have to be adjusted.
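The identity check described above can be sketched as follows. This is a minimal illustration, not the script used for the task: the function names are hypothetical, the input is assumed to be two already-aligned sequences with `-` as the gap character, and the example string is just the first 20 residues of Q30201 as listed further down this page.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def numbering_is_transferable(aligned_a: str, aligned_b: str) -> bool:
    """Positions carry over unchanged only if the alignment is gap-free and identical."""
    gap_free = "-" not in aligned_a and "-" not in aligned_b
    return gap_free and percent_identity(aligned_a, aligned_b) == 100.0


# First 20 residues of Q30201, compared with themselves as a stand-in for the real alignment.
prefix = "MGPRARPALLLLMLLQTAVL"
print(percent_identity(prefix, prefix), numbering_is_transferable(prefix, prefix))
```

Checking for gaps in addition to identity matters because an insertion or deletion shifts all downstream positions even when the matched residues are identical.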

The missense/nonsense SNPs can be found here.


dbSNP

We searched dbSNP for silent mutations and retrieved 9 entries. The search was done using these search options. We also searched for missense and nonsense mutations, retrieving 34 missense mutations (1 redundant) and 2 nonsense mutations. Tables for the mutations can be found here.


The dbSNP web query was introduced Oct 13, 2011, and new submissions (or curations) happen constantly (with the last submission listed as 04/27/2012). Therefore this database should be up to date.

dbSNP lists a wide variety of methods for its entries, mostly PCR and DNA sequencing performed by different groups. This means it should be possible to exclude methods or groups you don't trust.


SNPdbe

Unlike HGMD, SNPdbe is up to date (2012/03/05). It combines 1691464 SAASs (Single Amino Acid Substitutions) from 159142 protein sequences and 2985 organisms<ref name="snpdbe:stats">http://www.rostlab.org/services/snpdbe/statistics.php</ref>. Over half (57%) of these are from Homo sapiens. The SAASs are collected from SwissProt, dbSNP, 1000Genomes, and PMD. In addition, SNPdbe provides effect predictions (SNAP, SIFT), conservation statistics, and references to experimental evidence (if available) for these SNPs.

We searched SNPdbe with NP_000401 and Q30201 for SNPs in the human genome, which returned 35 and 26 results respectively. Some of these hits overlapped, leaving 46 unique SNPs. The two query sequences are 100% identical to each other, which means that the annotated positions can be used unaltered.
A further search with "HFE" as query yielded another 193 SNPs. The positions of these hits refer to sequences other than Q30201, so we created an MSA with the T-Coffee server and adjusted the positions accordingly. This resulted in a final number of 51 unique SNPs for our SNPdbe search.
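Adjusting positions through an alignment can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, both inputs are assumed to be rows taken from the same alignment (equal length, `-` for gaps), positions are 1-based, and the toy strings in the usage lines are made up rather than taken from the T-Coffee output.

```python
from typing import Dict, Optional


def residue_to_column(aligned: str) -> Dict[int, int]:
    """Map 1-based residue positions of one aligned row to 0-based alignment columns."""
    mapping: Dict[int, int] = {}
    position = 0
    for column, char in enumerate(aligned):
        if char != "-":
            position += 1
            mapping[position] = column
    return mapping


def map_position(position: int, aligned_query: str, aligned_ref: str) -> Optional[int]:
    """Translate a 1-based position in the query row to the reference row.

    Returns None if the reference has a gap at that alignment column.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("rows must come from the same alignment")
    column = residue_to_column(aligned_query)[position]
    if aligned_ref[column] == "-":
        return None
    # Count the non-gap characters in the reference up to and including this column.
    return len(aligned_ref[: column + 1].replace("-", ""))


# Toy rows (made up): the reference has one extra residue after position 2.
print(map_position(3, "MG-PRAR", "MGAPRAR"))
```

Variants that land on a gap in the reference cannot be placed on Q30201 and would have to be dropped or handled separately.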

The SNPs can be found here. We labeled a SNP as "disease causing" if the wild-type residue had a conservation of over 40 or if hemochromatosis was listed as an associated disease.
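The labeling rule above can be written as a small predicate. This is a sketch, not the original script: the function name and parameters are hypothetical, and the case-insensitive substring match on the disease name is an assumption about how the annotation text would be compared.

```python
from typing import Iterable, Optional


def is_disease_causing(conservation: Optional[float], diseases: Iterable[str]) -> bool:
    """Apply the rule: wild-type conservation over 40 OR hemochromatosis listed as a disease."""
    highly_conserved = conservation is not None and conservation > 40
    # Assumption: disease annotations are free text, matched case-insensitively.
    hemochromatosis_listed = any("hemochromatosis" in d.lower() for d in diseases)
    return highly_conserved or hemochromatosis_listed


print(is_disease_causing(55, []))
print(is_disease_causing(None, ["Hemochromatosis"]))
print(is_disease_causing(40, []))
```

Note that the threshold is strict: a conservation of exactly 40 does not qualify on its own.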


OMIM

OMIM contains information on all known Mendelian disorders and over 12,000 genes and is updated daily<ref name="omim:stats">http://omim.org/statistics/entry</ref>. Its focus is on the relation between phenotype and genotype. The entries are based on literature references and are manually curated<ref name="omim:about">http://omim.org/about</ref>.

OMIM lists the following allelic variants for HFE:

  • VAL-53-MET
  • VAL-59-MET
  • HIS-63-ASP
  • SER-65-CYS
  • GLY-93-ARG
  • ILE-105-THR
  • GLN-127-HIS
  • CYS-282-TYR
  • GLN-283-PRO
  • ARG-330-MET
  • 5569G-A


All of the amino acid mutations have been listed before. The only new information is the single nucleotide polymorphism (5569G-A) in the 4th intron of HFE.
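To compare OMIM's variant labels with the other databases, it helps to parse them into (wild type, position, mutant) tuples. The following is a minimal sketch: the function name is hypothetical, the regular expression only covers the three-letter `XXX-123-YYY` style used in the list above, and anything else (such as the intronic 5569G-A) is returned as `None`.

```python
import re
from typing import Optional, Tuple

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_variant(label: str) -> Optional[Tuple[str, int, str]]:
    """Parse e.g. 'CYS-282-TYR' into ('C', 282, 'Y'); return None for other formats."""
    match = re.fullmatch(r"([A-Za-z]{3})-(\d+)-([A-Za-z]{3})", label.strip())
    if match is None:
        return None
    wild_type = THREE_TO_ONE.get(match.group(1).upper())
    mutant = THREE_TO_ONE.get(match.group(3).upper())
    if wild_type is None or mutant is None:
        return None
    return wild_type, int(match.group(2)), mutant


omim_labels = [
    "VAL-53-MET", "VAL-59-MET", "HIS-63-ASP", "SER-65-CYS", "GLY-93-ARG",
    "ILE-105-THR", "GLN-127-HIS", "CYS-282-TYR", "GLN-283-PRO", "ARG-330-MET",
    "5569G-A",
]
for label in omim_labels:
    print(label, parse_variant(label))
```

Because the match is case-insensitive, the same function also accepts the mixed-case labels (for example Cys-282-Tyr) used elsewhere on this page.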


SNPedia

SNPedia currently (June 10, 2012) contains 29135 SNPs<ref name="snpedia:faq">http://www.snpedia.com/index.php/SNPedia:FAQ</ref>. It is based on peer-reviewed scientific publications and is updated constantly. SNPedia focuses on SNPs that have significant medical or genealogical consequences, are common, are reproducible, and/or have other historic or medical significance.

SNPedia lists 14 SNPs for HFE. Two of them are redundant as they have been merged into one entry. All amino acid mutations have already been listed in the dbSNP table. The only exception is Rs9366637 which can't be mapped onto the protein sequence as it is located within the first intron.


Database comparison

<figure id="overlap">

Figure 1: Overlap between the SNPs annotated by HGMD, SNPdbe, and dbSNP.

</figure>

The Venn diagram in <xr id="overlap"/> (created with Venny) shows that only 15 of the 66 SNPs are common to HGMD, SNPdbe, and dbSNP. SNPdbe has the most database-specific SNPs (15) and misses only 15 SNPs in total. dbSNP contributes the second-largest number of SNPs (43, 9 of them unique to dbSNP). The two most significant mutations for hemochromatosis (Cys-282-Tyr and His-63-Asp) are present in all three databases.

SNPs contained in all databases:

  • Arg-6-Ser
  • Val-53-Met
  • Val-59-Met
  • His-63-Asp
  • Ser-65-Cys
  • Gly-93-Arg
  • Ile-105-Thr
  • Gln-127-His
  • Glu-168-Gln
  • Arg-224-Gln
  • Glu-277-Lys
  • Cys-282-Tyr
  • Gln-283-Pro
  • Val-295-Ala
  • Arg-330-Met
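The regions of such a Venn diagram can also be computed directly with set operations. The sketch below is not how the figure was produced (that was done with Venny); the function name is hypothetical and the variant identifiers `v1` to `v4` are made-up placeholders, not real database contents.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(databases: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Assign every variant to the exact combination of databases that contain it."""
    regions: Dict[FrozenSet[str], Set[str]] = {}
    for variant in set().union(*databases.values()):
        key = frozenset(name for name, members in databases.items() if variant in members)
        regions.setdefault(key, set()).add(variant)
    return regions


# Placeholder data for illustration only.
toy = {
    "HGMD": {"v1", "v2"},
    "SNPdbe": {"v1", "v3"},
    "dbSNP": {"v1", "v2", "v4"},
}
for combination, variants in venn_regions(toy).items():
    print(sorted(combination), sorted(variants))
```

Each variant appears in exactly one region, so the region sizes add up to the number of unique variants across all databases.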


Mapping

<figure id="3dmap">

Figure 2: Mapping of all SNPs in the range of 26 to 297 onto the 3D structure of 1a6zA. SNP positions that are only synonymous are green, missense orange, and nonsense red. Positions that are both synonymous and missense are cyan, while positions that are both missense and nonsense are magenta.

</figure>

A list of all 66 unique SNPs can be found here.

<xr id="3dmap"/> shows a 3D representation of all SNPs that could be mapped onto the 3D structure of HFE (PDB: 1a6zA). This includes all SNPs in the range of 26 to 297, as only these residues are present in the 3D model. Most of the SNPs lie within the helices of the MHC I domain and the sheets of the C1 domain. The sheets of the MHC I domain contain far fewer SNPs.
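Selecting the SNPs that fall inside the structure can be sketched as a simple range filter. This is an illustration only: the function name is hypothetical, the range 26 to 297 is the one given in the caption of Figure 2, and the example positions are those of the 15 SNPs listed above as shared by all databases, not the full set of 66.

```python
from typing import Iterable, List, Tuple


def split_by_structure_coverage(
    positions: Iterable[int], first: int = 26, last: int = 297
) -> Tuple[List[int], List[int]]:
    """Split positions into those inside and outside the residue range of the 3D model."""
    positions = list(positions)
    mapped = sorted(p for p in positions if first <= p <= last)
    unmapped = sorted(p for p in positions if not first <= p <= last)
    return mapped, unmapped


# Positions of the 15 SNPs shared by HGMD, SNPdbe, and dbSNP.
shared_positions = [6, 53, 59, 63, 65, 93, 105, 127, 168, 224, 277, 282, 283, 295, 330]
mapped, unmapped = split_by_structure_coverage(shared_positions)
print(mapped)
print(unmapped)
```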

<figure id="distribution">

Figure 3: Distribution of SNPs along HFE's sequence. One column spans about 10 residues.

</figure>

The distribution of all SNPs along the protein sequence of HFE is shown in <xr id="distribution"/>. Each column spans about 10 residues. The SNPs are more frequent at the beginning and the end of the protein, with only 6 SNPs between residues 80 and 150. The distribution also appears spiky rather than even.
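A histogram with columns of 10 residues can be computed as sketched below. The function name and the exact bin boundaries (1 to 10, 11 to 20, and so on) are assumptions, since the page only says that each column spans "about 10 residues". The example uses only the positions of the 15 SNPs shared by all databases, so it does not reproduce Figure 3.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_positions(positions: Iterable[int], bin_width: int = 10) -> Dict[Tuple[int, int], int]:
    """Count 1-based positions per bin; keys are inclusive (start, end) residue ranges."""
    counts = Counter((p - 1) // bin_width for p in positions)
    return {
        (b * bin_width + 1, (b + 1) * bin_width): n
        for b, n in sorted(counts.items())
    }


# Positions of the 15 SNPs shared by HGMD, SNPdbe, and dbSNP.
shared_positions = [6, 53, 59, 63, 65, 93, 105, 127, 168, 224, 277, 282, 283, 295, 330]
for (start, end), count in bin_positions(shared_positions).items():
    print(f"{start:>3}-{end:<3} {'#' * count}")
```

Empty bins are omitted from the result; for plotting, they would need to be filled in with zero counts.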


A representation of the SNPs mapped onto the protein sequence of HFE is shown below. The color coding is as follows:

  • green: synonymous only
  • orange: missense only
  • red: nonsense only
  • cyan: synonymous/missense
  • magenta: missense/nonsense
  • blue: synonymous/nonsense

>sp|Q30201|HFE_HUMAN Hereditary hemochromatosis protein OS=Homo sapiens GN=HFE PE=1 SV=1
MGPRARPALLLLMLLQTAVLQGRLLRSHSLHYLFMGASEQDLGLSLFEALGYVDDQLFVF
YDHESRRVEPRTPWVSSRISSQMWLQLSQSLKGWDHMFTVDFWTIMENHNHSKESHTLQV
ILGCEMQEDNSTEGYWKYGYDGQDHLEFCPDTLDWRAAEPRAWPTKLEWERHKIRARQNR
AYLERDCPAQLQQLLELGRGVLDQQVPPLVKVTHHVTSSVTTLRCRALNYYPQNITMKWL
KDKQPMDAKEFEPKDVLPNGDGTYQGWITLAVPPGEEQRYTCQVEHPGLDQPLIVIWEPS
PSGTLVIGVISGIAVFVVILFIGILFIILRKRQGSRGAMGHYVLAERE
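The color coding listed above can be expressed as a lookup from the set of effect types observed at a position to a color. This is a minimal sketch with hypothetical names; the legend does not define a color for positions that show all three effect types, so that case returns `None` here.

```python
from typing import Iterable, Optional

COLOR_BY_EFFECTS = {
    frozenset({"synonymous"}): "green",
    frozenset({"missense"}): "orange",
    frozenset({"nonsense"}): "red",
    frozenset({"synonymous", "missense"}): "cyan",
    frozenset({"missense", "nonsense"}): "magenta",
    frozenset({"synonymous", "nonsense"}): "blue",
}


def position_color(effects: Iterable[str]) -> Optional[str]:
    """Return the legend color for the effect types seen at one position, or None."""
    return COLOR_BY_EFFECTS.get(frozenset(effects))


print(position_color(["missense"]))
print(position_color(["missense", "synonymous", "missense"]))
print(position_color([]))
```

Using a frozenset as the key makes the lookup independent of the order and multiplicity of the effects reported for a position.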


References

<references/>