Task 5 - Mapping SNPs

From Bioinformatikpedia

All the proteins studied in this practical are involved in monogenic diseases. These diseases can be caused by single point mutations, the so-called missense and nonsense mutations.

Introductory talk

The talk introduces:

  • Genes and mutations
  • The databases: OMIM, dbSNP, HGMD, SNPdbe

Tasks and questions

This week you will once again research your protein, this time focusing on its known point mutations. Imagine somebody has just realized that this protein sequence is related to a specific (your) disease.

Your task is to map approximately 100 point mutations onto your protein sequence. You will have to create a login for the free version of "The Human Gene Mutation Database" (HGMD).

In HGMD, look for missense and nonsense mutations; HGMD does not list silent point mutations. In dbSNP (Short Genetic Variations), look for silent point mutations. In most cases, significantly more than 100 mutations are known. The residue numbers in the two databases may not correspond to each other, so you need to find a way to reconcile the numbering between HGMD and dbSNP. In the following, the task is described in more detail using cystic fibrosis as an example. You are free to use a different approach!
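As a rough illustration of the three categories mentioned above, the following Python sketch classifies a single-nucleotide change within one codon as silent, missense or nonsense using the standard genetic code. The function name `classify_point_mutation` and the example codons are made up for illustration and are not taken from HGMD or dbSNP.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon, alt_codon):
    """Classify a single-base substitution inside one codon (hypothetical helper)."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must have exactly three bases")
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")

    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or stop) as before
    if alt_aa == "*":
        return "nonsense"    # amino acid replaced by a stop codon
    if ref_aa == "*":
        return "stop-loss"   # outside the three categories discussed here
    return "missense"        # one amino acid replaced by another


# Illustrative codons only:
print(classify_point_mutation("GAA", "GAG"))  # Glu -> Glu
print(classify_point_mutation("GAA", "GTA"))  # Glu -> Val
print(classify_point_mutation("GAA", "TAA"))  # Glu -> stop
```

The sketch looks at one codon in isolation; applying it to a real entry requires knowing the reading frame of the cDNA sequence.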

Example: Cystic fibrosis

Cystic fibrosis (CF), or mucoviscidosis, is a recessive monogenic disease that affects the respiratory, digestive and reproductive systems. CF involves the production of abnormally thick mucus linings in the lungs (WHO, http://www.who.int). CF is caused by a mutation in the gene coding for the cystic fibrosis transmembrane conductance regulator (CFTR).


1. HGMD

  • Search for "gene symbol" CFTR
  • Which mutation types occur? Give a short definition.
  • Get the "missense/nonsense" mutations. How many are given in the non-professional version of HGMD?
  • The protein sequence can be obtained via the accession number, here "NM_000492.3", which links to NCBI. At NCBI, you can download the protein sequence in FASTA format.
  • Alternatively, you can obtain the protein sequence via the cDNA sequence. Using the "switch view" link, the protein sequence is shown in three-letter code.
  • If you use the cDNA/switch view option, you have to find a tool that converts the three-letter amino acid code to the one-letter code, for example: http://molbiol.ru/eng/scripts/01_17.html. You can also translate the DNA sequence directly into the (one-letter) amino acid code, or write simple scripts yourself.
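A self-written conversion script could look roughly like the Python sketch below. It assumes that the three-letter sequence is plain text in which residues are either concatenated or separated by spaces, hyphens or digits, and that a stop codon is written as "Ter". Both are assumptions about the input, not a description of the exact HGMD output. The function name `three_to_one` is hypothetical.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
# "Ter" for a stop codon is an assumption about the input format.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}


def three_to_one(text):
    """Convert a three-letter-code protein sequence to one-letter code."""
    # Drop everything that is not a letter (spaces, hyphens, digits, newlines).
    letters = re.sub(r"[^A-Za-z]", "", text)
    if len(letters) % 3 != 0:
        raise ValueError("input length is not a multiple of three letters")

    residues = []
    for start in range(0, len(letters), 3):
        code = letters[start:start + 3].capitalize()
        if code not in THREE_TO_ONE:
            raise ValueError(f"unknown residue code: {code!r}")
        residues.append(THREE_TO_ONE[code])
    return "".join(residues)


# Made-up fragment for illustration:
print(three_to_one("MetGlnArgSerPro"))  # MQRSP
```

Raising an error on unknown codes, instead of skipping them, makes it easier to notice when header text or annotations were pasted along with the sequence.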

2. dbSNP

  • Search for "SNP" and "gene" CFTR
  • Take care that you look at the results for Homo sapiens.
  • The results of the "SNP" search can be displayed for example as "Graphic Summary" and "FASTA". Choose the display that is most helpful to you.
  • From the "gene" search, you can go to the "SNP Geneview Report". The link is found in the section "Genotypes".
  • In dbSNP, you may find many deletions and insertions. If a single nucleotide is deleted or inserted, this in principle constitutes a point mutation. Why can the effect of deletions and insertions be significant? When would it be less severe? Do not include deletions and insertions on your map.

3. Mutation map

  • Generate a map of your protein sequence showing (about 100) point mutations. Mark silent and disease-causing mutations.
  • To map the mutations from HGMD and dbSNP onto the same protein sequence, you may have to align the two protein sequences you obtain from HGMD and dbSNP.
  • The results can be presented in a table or graphically.
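
One possible way to bring both mutation sets onto a common numbering and draw a simple text map is sketched below in Python. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real pairwise alignment, so only residues inside identical stretches are mapped; for sequences that differ more strongly, a proper alignment tool is the better choice. The sequences, positions and the helper names `position_map` and `mutation_track` are invented for illustration.

```python
from difflib import SequenceMatcher


def position_map(source_seq, target_seq):
    """Map 1-based positions in source_seq to 1-based positions in target_seq.

    Only residues inside identical matching blocks are mapped.
    """
    matcher = SequenceMatcher(None, source_seq, target_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


def mutation_track(sequence, mutations):
    """Return a marker line: 'D' disease-causing, 's' silent, '.' no entry."""
    track = ["."] * len(sequence)
    for position, kind in mutations:
        if kind == "disease":
            track[position - 1] = "D"      # disease-causing takes priority
        elif track[position - 1] == ".":
            track[position - 1] = "s"
    return "".join(track)


# Hypothetical toy data: the second sequence has two extra leading residues.
hgmd_seq = "MQRSPLEKASVV"
dbsnp_seq = "GSMQRSPLEKASVV"
hgmd_mutations = [(4, "disease")]   # numbered on hgmd_seq
dbsnp_mutations = [(9, "silent")]   # numbered on dbsnp_seq

# Renumber the second set onto the first sequence, skipping unmapped positions.
to_hgmd = position_map(dbsnp_seq, hgmd_seq)
combined = hgmd_mutations + [
    (to_hgmd[pos], kind) for pos, kind in dbsnp_mutations if pos in to_hgmd
]

print(hgmd_seq)
print(mutation_track(hgmd_seq, combined))
```

For a full-length protein, the sequence and the marker line would be printed in blocks of, say, 60 residues, or the `combined` list could be written out as a table instead.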