Task 5 - Researching SNPs

From Bioinformatikpedia
Revision as of 01:12, 5 June 2012 by Kloppmann

All proteins studied in this practical are involved in monogenic diseases. These diseases can be caused by single point mutations, so-called missense and nonsense mutations, which are often loosely termed SNPs (single-nucleotide polymorphisms). Strictly speaking, a SNP is a variation at the nucleotide level. However, the term is also commonly applied to point mutations in protein sequences. On the protein level, we are interested in non-synonymous SNPs, i.e. those SNPs that lead to a change in the protein sequence. Here, we are especially interested in disease-causing SNPs.
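
To make the distinction between synonymous and non-synonymous changes concrete, the following minimal Python sketch classifies a single-base substitution within one codon. It assumes the standard genetic code; the function name `classify_substitution` and the example codons are illustrative and do not come from any of the databases.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(codon, pos, new_base):
    """Classify a single-base change at 0-based position `pos` of `codon`."""
    mutated = codon[:pos] + new_base + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if ref_aa == alt_aa:
        return "silent"      # synonymous: protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"    # amino acid replaced by a stop codon
    if ref_aa == "*":
        return "stop-loss"   # stop codon replaced by an amino acid
    return "missense"        # one amino acid replaced by another


print(classify_substitution("GGT", 2, "C"))  # GGT and GGC both encode Gly
print(classify_substitution("GGA", 0, "T"))  # GGA (Gly) becomes TGA (stop)
print(classify_substitution("GAG", 1, "T"))  # GAG (Glu) becomes GTG (Val)
```

Only the missense and nonsense cases are non-synonymous in the sense used here.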

This week you will research your protein again, this time focusing on its known point mutations. Next week you will predict the effect of point mutations from the protein sequence.


Introductory talk

The talk introduces:

  • Genes and mutations
  • The databases: OMIM, HGMD, dbSNP, SNPdbe


Tasks and questions

Research the known point mutations for your protein using the following databases:

  • Human Gene Mutation Database (HGMD): to access HGMD, you have to create a login for the public version (which is about 3 years out of date).
  • NCBI's dbSNP
  • nsSNP database of functional effects SNPdbe
  • Online Mendelian Inheritance in Man (OMIM)
  • SNPedia

Research the databases:

  • What information is given?
  • How recent?
  • Where does the information come from?

Extract information for your protein/disease from the databases and map approximately 100 point mutations (both disease-causing and non-disease-causing) onto your protein sequence.

In SNPdbe, look for disease-causing and non-disease-causing mutations. In HGMD, look for missense and nonsense mutations (HGMD does not list silent point mutations). In dbSNP, look for silent (point) mutations. In most cases, significantly more than 100 mutations are known. Note that the residue numbers used by the different databases may not correspond to each other.
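
One way to detect a numbering mismatch is to check whether the wild-type residue reported for each mutation actually occurs at that position in your sequence. The sketch below is only an illustration under stated assumptions: mutations are given as (wild type, position, mutant) tuples with 1-based positions, any mismatch is a constant shift, and the sequence and mutations are toy data. The helper names `matching_fraction` and `best_offset` are hypothetical.

```python
def matching_fraction(sequence, mutations, offset=0):
    """Fraction of mutations whose wild-type residue matches the sequence."""
    hits = 0
    for wild_type, position, _mutant in mutations:
        index = position + offset - 1  # 1-based numbering to 0-based index
        if 0 <= index < len(sequence) and sequence[index] == wild_type:
            hits += 1
    return hits / len(mutations)


def best_offset(sequence, mutations, max_shift=50):
    """Constant shift that makes the most wild-type residues match.

    Ties are broken in favour of the smallest absolute shift.
    """
    shifts = range(-max_shift, max_shift + 1)
    return max(
        shifts,
        key=lambda s: (matching_fraction(sequence, mutations, s), -abs(s)),
    )


# Toy data: the mutation list counts two residues more than the sequence.
sequence = "MKTAYIAKQR"
mutations = [("K", 4, "E"), ("A", 6, "V"), ("Q", 11, "H")]

print(matching_fraction(sequence, mutations))  # no position matches as given
shift = best_offset(sequence, mutations)
print(shift, matching_fraction(sequence, mutations, shift))
```

A constant shift does not cover every case (for example, differing isoforms); if no shift yields a high matching fraction, a sequence alignment is the safer route.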

In the following, the task is described in more detail using cystic fibrosis as an example. You are free to use a different approach!


Example: Cystic fibrosis

Cystic fibrosis (CF), or mucoviscidosis, is a recessive monogenic disease that affects the respiratory, digestive and reproductive systems. CF involves the production of abnormally thick mucus linings in the lungs [WHO http://www.who.int]. CF is caused by a mutation in the gene coding for the cystic fibrosis transmembrane conductance regulator (CFTR).


1. HGMD

  • Search for "gene symbol" CFTR
  • Which mutation types occur? Give a short definition.
  • Get the "missense/nonsense" mutations. How many are given in the non-professional version of HGMD?
  • The protein sequence can be obtained via the accession number, here "NM_000492.3", that links to the NCBI. At NCBI, you can download the protein sequence in FASTA format.
  • Alternatively, you can obtain the protein sequence via the cDNA sequence. Using the "switch view" link, the protein sequence in three-letter code is given.
  • If you use the cDNA/switch view option, you have to find a tool to translate the three-letter to the one-letter amino acid code, for example: http://molbiol.ru/eng/scripts/01_17.html. You can also translate the DNA sequence directly to the (one-letter) amino acid code. You can also write simple scripts yourself.
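
As an illustration of such a script, here is a minimal Python sketch that converts a three-letter-code sequence into one-letter code. It assumes that residues are written as capitalised three-letter codes (e.g. "Met"), optionally separated by spaces, digits or line breaks, and that a stop is written as "Ter"; the exact layout of the HGMD view may differ, and the input string below is only a toy example.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",  # assumption: stop codons are written as "Ter"
}


def three_to_one(text):
    """Convert three-letter amino acid codes in `text` to one-letter code."""
    # A residue is one capital letter followed by two lowercase letters.
    tokens = re.findall(r"[A-Z][a-z]{2}", text)
    unknown = sorted({t for t in tokens if t not in THREE_TO_ONE})
    if unknown:
        raise ValueError(f"unknown residue codes: {unknown}")
    return "".join(THREE_TO_ONE[t] for t in tokens)


print(three_to_one("Met Gln Arg Ser Pro"))
print(three_to_one("MetGlnArgSerProTer"))
```

Comparing the length of the result with the expected protein length is a quick way to check that no residues were lost during parsing.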


2. dbSNP

  • Search for "SNP" and "gene" CFTR
  • Take care that you look at the results for Homo sapiens.
  • The results of the "SNP" search can be displayed for example as "Graphic Summary" and "FASTA". Choose the display that is most helpful to you.
  • From the "gene" search, you can go to the "SNP Geneview Report". The link is found in the section "Genotypes".
  • In dbSNP, you may find many deletions and insertions. If a single nucleotide is deleted or inserted, this in principle constitutes a point mutation. Why can the effect of deletions and insertions be significant? When would it be less severe? Do not include deletions and insertions in your map.


3. Mutation map

  • Generate a map of your protein sequence showing (about 100) point mutations. Mark silent and disease-causing mutations.
  • To map the mutations from HGMD and dbSNP onto the same protein sequence, you may have to align the two protein sequences you obtain from HGMD and dbSNP.
  • The results can be presented in a table or graphically.
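
A simple text-based map can be sketched as follows. This is only one possible presentation, assuming that all positions have already been brought into the numbering of a single reference sequence; the sequence, the positions and the marker symbols ("D" for disease-causing, "s" for silent) are toy values chosen for illustration.

```python
def mutation_map(sequence, marks, width=60):
    """Return the sequence in blocks with a marker line beneath each block.

    `marks` maps 1-based residue positions to a one-character symbol.
    """
    symbols = [" "] * len(sequence)
    for position, symbol in marks.items():
        symbols[position - 1] = symbol
    lines = []
    for start in range(0, len(sequence), width):
        # Sequence line: start position of the block, then the residues.
        lines.append(f"{start + 1:>5} {sequence[start:start + width]}")
        # Marker line: indented to line up with the residues above.
        lines.append(" " * 6 + "".join(symbols[start:start + width]).rstrip())
    return "\n".join(lines)


# Toy example: position 2 is disease-causing, position 9 is silent.
print(mutation_map("MKTAYIAKQR", {2: "D", 9: "s"}, width=5))
```

For about 100 mutations, several positions will carry more than one mutation; a table with one row per mutation (position, wild type, mutant, source database, effect) then complements the map.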