Task 5 - Mapping SNPs 2011

From Bioinformatikpedia

All the proteins studied in this practical are involved in monogenetic diseases. These diseases can be caused by single point mutations, namely so-called missense and nonsense mutations.

Note: Due to the Whitsun holiday (Pfingstferien), this week's meeting is postponed. The introductory talks will be given next week, on June 21st. The following tasks should be completed by then.

Introductory talks

The first talk (Biological_Databases.pdf) gives background information on:

  • DSSP
  • HSSP
  • UniProt (if there is time)

The second talk (Biological_Databases_Mutations.pdf) introduces:

  • Genes and mutations
  • HGMD
  • OMIM

Tasks and questions

Your task for this week is to map approximately 100 point mutations onto your protein sequence. In HGMD (The Human Gene Mutation Database), look for missense and nonsense mutations; HGMD does not list silent point mutations. In dbSNP (Short Genetic Variations), look for silent point mutations. In most cases, significantly more than 100 mutations are known. The residue numbers in the two databases may not correspond to each other, so you need to find a way to match positions between HGMD and dbSNP. Below, the task is described in more detail using cystic fibrosis as an example. You are free to use a different approach!
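
Once mutations are collected, each one has to be sorted into one of the three categories above. A minimal Python sketch of that step, assuming the amino acid changes are copied as three-letter protein notation such as "Arg117His" (with "Ter", "Term" or "*" for a stop codon and the HGVS-style "=" for an unchanged residue). The regular expression and the name `classify_change` are illustrative assumptions, not part of HGMD or dbSNP; real entries may use other formats and need adjusting.

```python
import re

# Assumed notation: three-letter codes, optional "p." prefix, stop as Ter/Term/*,
# and "=" for an unchanged residue, e.g. "Arg117His", "p.Gly542Ter", "Ala30=".
CHANGE_RE = re.compile(r"^(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2,3}|\*|=)$")
STOP_CODES = {"Ter", "Term", "*"}


def classify_change(change):
    """Return (type, ref, residue_number, alt) for a protein-level change."""
    match = CHANGE_RE.match(change.strip())
    if match is None:
        raise ValueError(f"unrecognised notation: {change!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if alt == "=":
        alt = ref  # "=" marks a synonymous change: same amino acid
    if alt == ref:
        kind = "silent"
    elif alt in STOP_CODES:
        kind = "nonsense"
    else:
        kind = "missense"
    return kind, ref, pos, alt


for example in ["Arg117His", "Gly542Term", "p.Ala30="]:
    print(example, classify_change(example))
```

Entries that do not match the pattern raise an error instead of being silently misclassified, so unusual notations surface early.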

Example: Cystic fibrosis

Cystic fibrosis (CF), or mucoviscidosis, is a recessive monogenetic disease that affects the respiratory, digestive and reproductive systems. CF involves the production of abnormally thick mucus linings in the lungs (WHO, http://www.who.int). CF is caused by mutations in the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR).

1. HGMD

  • In HGMD, search by "gene symbol" for CFTR.
  • Which mutation types occur? Give a short definition.
  • Get the "missense/nonsense" mutations. How many are listed in the public (non-professional) version of HGMD?
  • The protein sequence can be obtained via the accession number, here "NM_000492.3", which links to NCBI. At NCBI, you can download the protein sequence in FASTA format.
  • Alternatively, you can obtain the protein sequence via the cDNA sequence: the "switch view" link displays the protein sequence in three-letter code.
  • If you use the cDNA/switch view option, you need a tool to convert the three-letter to the one-letter amino acid code, for example: http://molbiol.ru/eng/scripts/01_17.html. You can also translate the DNA sequence directly into the one-letter amino acid code, or write simple scripts yourself.
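
Such scripts can be very short. A minimal Python sketch, assuming the switch-view output is a string of three-letter codes, possibly separated by spaces or interleaved with residue numbers, and that a stop is written as "Ter"; the names `three_to_one` and `translate` are hypothetical. `translate` uses the standard genetic code and assumes the input starts at the first base of the start codon.

```python
# One-letter codes for the 20 standard amino acids; "Ter" (stop) maps to "*".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def three_to_one(seq3):
    """Convert e.g. 'Met Gln Arg Ser' or '1 MetGlnArgSer' to 'MQRS'."""
    letters = "".join(ch for ch in seq3 if ch.isalpha())  # drop spaces and numbers
    if len(letters) % 3 != 0:
        raise ValueError("letter count is not a multiple of three")
    return "".join(
        THREE_TO_ONE[letters[i:i + 3].capitalize()] for i in range(0, len(letters), 3)
    )


def translate(cds):
    """Translate a coding sequence, read from its first base, up to the first stop codon."""
    cds = "".join(cds.split()).upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # "X" for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(three_to_one("Met Gln Arg Ser"))
print(translate("ATG CAG AGG TCG TAA"))
```

Both routes should give the same one-letter sequence; comparing them is a quick check that nothing was lost while copying.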

2. dbSNP

  • Search the "SNP" and "gene" databases for CFTR.
  • Make sure you look at the results for Homo sapiens.
  • The results of the "SNP" search can be displayed, for example, as "Graphic Summary" or "FASTA". Choose the display that is most helpful to you.
  • From the "gene" search, you can go to the "SNP Geneview Report". The link is found in the section "Genotypes".
  • In dbSNP, you may find many deletions and insertions. If a single nucleotide is deleted or inserted, this in principle constitutes a point mutation. Why can the effect of deletions and insertions be significant? When would it be less severe? Do not include deletions and insertions in your map.
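
When going through dbSNP entries, a small helper can separate single-nucleotide substitutions from insertions and deletions, and report whether a coding-region indel keeps the reading frame (net length change divisible by three) or shifts it. This is a sketch assuming each variant is available as a pair of reference and alternative allele strings, with "-" for an empty allele; that representation, the helper names and the toy variants are assumptions, not the dbSNP export format.

```python
NUCLEOTIDES = set("ACGT")


def _clean(allele):
    """Treat '-' (or an empty string) as an empty allele; normalise case."""
    return allele.replace("-", "").upper()


def is_point_substitution(ref, alt):
    """True for a single-nucleotide substitution such as C>T."""
    ref, alt = _clean(ref), _clean(alt)
    return len(ref) == 1 and len(alt) == 1 and {ref, alt} <= NUCLEOTIDES and ref != alt


def indel_effect(ref, alt):
    """Classify a coding-region indel by its net length change."""
    net = len(_clean(alt)) - len(_clean(ref))
    if net == 0:
        return "no length change"
    # A net change that is not a multiple of three shifts the reading frame.
    return "in-frame" if net % 3 == 0 else "frameshift"


# Toy variants as (label, ref, alt); labels are placeholders, not dbSNP IDs.
variants = [("v1", "C", "T"), ("v2", "CTT", "-"), ("v3", "-", "A")]
snvs = [v for v in variants if is_point_substitution(v[1], v[2])]
indels = {
    label: indel_effect(ref, alt)
    for label, ref, alt in variants
    if not is_point_substitution(ref, alt)
}
print(snvs)
print(indels)
```

Only the `snvs` list would go on to the map; the `indels` summary is useful for thinking about the questions above.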

3. Mutation map

  • Generate a map of your protein sequence showing (about 100) point mutations. Mark silent and disease-causing mutations.
  • To map the mutations from HGMD and dbSNP onto the same protein sequence, you may have to align the two protein sequences you obtain from HGMD and dbSNP.
  • The results can be presented in a table or graphically.
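
A sketch of the alignment and table steps in Python. It uses the standard library's `difflib.SequenceMatcher` as a simple stand-in for a proper pairwise alignment (such as Needleman-Wunsch): only residues inside identical stretches are mapped, which is usually adequate for two versions of the same protein, and anything outside those stretches is reported for manual checking. The sequences, mutation labels and the `(residue, type, label)` tuple layout are toy assumptions for illustration.

```python
from difflib import SequenceMatcher


def position_map(seq_a, seq_b):
    """Map 1-based residue numbers in seq_a to seq_b, using identical stretches only."""
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        # Each block says seq_a[a:a+size] == seq_b[b:b+size] (0-based).
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


def mutation_table(mutations, mapping):
    """Re-number (residue, type, label) tuples from seq_a to seq_b and sort them."""
    rows = []
    for pos_a, kind, label in mutations:
        pos_b = mapping.get(pos_a)
        if pos_b is None:
            print(f"residue {pos_a} ({label}) not in a matching stretch; check by hand")
            continue
        rows.append((pos_b, kind, label))
    return sorted(rows)


def render(rows):
    """Format the rows as a plain-text table."""
    lines = [f"{'Pos':>5}  {'Type':<9}  Change"]
    for pos, kind, label in rows:
        lines.append(f"{pos:>5}  {kind:<9}  {label}")
    return "\n".join(lines)


# Toy example: the second sequence carries two extra residues (KK) after position 4.
hgmd_seq = "MQRSPLEKAS"
dbsnp_seq = "MQRSKKPLEKAS"
mapping = position_map(hgmd_seq, dbsnp_seq)
rows = mutation_table([(5, "missense", "Pro5Leu"), (2, "silent", "Gln2=")], mapping)
print(render(rows))
```

The labels keep their original numbering while the `Pos` column uses the second sequence's numbering, so the table shows directly where the two databases disagree.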