Task 5 - Researching SNPs
All proteins studied in this practical are involved in monogenic diseases. These diseases can be caused by single point mutations, so-called missense and nonsense mutations, which are often loosely referred to as SNPs (single-nucleotide polymorphisms). Strictly speaking, a SNP is a single-nucleotide variation at the genetic level. However, the term is also commonly applied to point mutations in protein sequences. On the protein level, we are interested in non-synonymous SNPs, i.e. those SNPs that lead to a change in protein sequence. Here, we are especially interested in disease-causing SNPs.
This week you will again research the available information for your protein, this time focusing on its known point mutations. Next week you will predict the effect of point mutations from the protein sequence.
The talk introduces:
- Genes and mutations
- The databases: OMIM, HGMD, dbSNP, SNPdbe
Tasks and questions
Research the known point mutations for your protein using the following databases:
- Human Gene Mutation Database (HGMD): to access HGMD, you have to create a login for the public version, which is about three years out of date.
- NCBI's dbSNP
- nsSNP database of functional effects SNPdbe
- Online Mendelian Inheritance in Man OMIM
Research the databases:
- What information is given?
- How recent is it?
- Where does the information come from?
Extract information for your protein/disease from the databases. Are there mutation hot-spots? Map approximately 100 point mutations (both disease-causing and non-disease-causing) onto your protein sequence. In SNPdbe, look for disease-causing mutations and for those that have no effect (the latter are often not verified). Please disregard the two columns "Predicted functional effect" for this week. In HGMD, look for missense and nonsense mutations (HGMD does not list silent point mutations). In dbSNP, look for silent (point) mutations. In most cases, significantly more than 100 mutations are known. Note that the residue numbers in the different databases may not correspond to each other.
The task is described in more detail below, using cystic fibrosis as an example. You are free to use a different approach!
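One simple way to approach the hot-spot question is to count how many known mutations fall into each sliding window along the sequence. The following is a minimal sketch, not a prescribed method: the function names, the window size and the threshold are arbitrary assumptions, and the positions in the example are made up rather than taken from any database.

```python
from collections import Counter


def window_counts(positions, seq_len, window=10):
    """Count mutations in every sliding window of `window` residues.

    `positions` are 1-based residue numbers; a position that occurs more
    than once (e.g. two different substitutions) is counted each time.
    Returns a list of (start, end, count) with 1-based inclusive bounds.
    """
    per_residue = Counter(positions)
    results = []
    for start in range(1, seq_len - window + 2):
        end = start + window - 1
        count = sum(per_residue[p] for p in range(start, end + 1))
        results.append((start, end, count))
    return results


def hot_spots(positions, seq_len, window=10, min_count=3):
    """Return the windows that contain at least `min_count` mutations."""
    return [
        (start, end, count)
        for start, end, count in window_counts(positions, seq_len, window)
        if count >= min_count
    ]


if __name__ == "__main__":
    # Hypothetical mutated positions, for illustration only.
    example_positions = [5, 7, 8, 9, 30, 55, 56, 58, 59, 60]
    for start, end, count in hot_spots(example_positions, seq_len=70, window=5):
        print(f"residues {start}-{end}: {count} mutations")
```

Overlapping windows will report the same cluster several times. Whether a cluster counts as a hot-spot depends on the window size and threshold you choose, so it is worth trying a few values.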
Example: Cystic fibrosis
Cystic fibrosis (CF), or mucoviscidosis, is a recessive monogenic disease that affects the respiratory, digestive and reproductive systems. CF involves the production of abnormally thick mucus linings in the lungs (WHO). CF is caused by mutations in the gene coding for the cystic fibrosis transmembrane conductance regulator (CFTR).
- Search for "gene symbol" CFTR
- Which mutation types occur? Give a short definition.
- Get the "missense/nonsense" mutations. How many are given in the non-professional version of HGMD?
- The protein sequence can be obtained via the accession number, here "NM_000492.3", which links to NCBI. At NCBI, you can download the protein sequence in FASTA format.
- Alternatively, you can obtain the protein sequence via the cDNA sequence. Using the "switch view" link, the protein sequence in three-letter code is given.
- If you use the cDNA/switch view option, you have to find a tool that converts the three-letter amino acid code to the one-letter code, for example: http://molbiol.ru/eng/scripts/01_17.html. You can also translate the DNA sequence directly to the (one-letter) amino acid code, or write simple scripts yourself.
- Search for "SNP" and "gene" CFTR
- Take care that you look at the results for Homo sapiens.
- The results of the "SNP" search can be displayed for example as "Graphic Summary" and "FASTA". Choose the display that is most helpful to you.
- From the "gene" search, you can go to the "SNP Geneview Report". The link is found in the section "Genotypes".
- In dbSNP, you may find many deletions and insertions. If a single nucleotide is deleted or inserted, this in principle constitutes a point mutation. Why can the effect of deletions and insertions be significant? When would it be less severe? Do not include deletions and insertions in your map.
- Hint: How to search dbSNP
- You can search, for example, for your protein, gene or disease.
- Look at results for human.
- Is experimental evidence available?
- Use the conservation score as a first (simple) estimate of whether a mutation has an effect (disease-causing) or no effect.
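For the step above in which the protein sequence is only available in three-letter code, a small conversion script could look like the sketch below. It assumes that the input consists of standard three-letter residue codes, optionally separated by spaces, hyphens or line breaks, and that a stop is written as "Ter". The actual layout of the database output may differ, so the parsing may need adjusting. The function name `three_to_one` is a hypothetical choice.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(text):
    """Convert a string of three-letter residue codes to one-letter code.

    Assumption: everything that is not a letter (spaces, hyphens, digits,
    line breaks) is a separator and can be dropped. "Ter" becomes "*".
    """
    letters = re.sub(r"[^A-Za-z]", "", text)
    if len(letters) % 3 != 0:
        raise ValueError("input length is not a multiple of three letters")
    one_letter = []
    for i in range(0, len(letters), 3):
        code = letters[i:i + 3].capitalize()
        if code == "Ter":
            one_letter.append("*")
        elif code in THREE_TO_ONE:
            one_letter.append(THREE_TO_ONE[code])
        else:
            raise ValueError(f"unknown residue code: {code!r}")
    return "".join(one_letter)


if __name__ == "__main__":
    # Short illustrative fragment, not a full database record.
    print(three_to_one("Met Gln Arg Ser Pro Leu"))
```

Raising an error on unknown codes, rather than silently skipping them, makes it easier to notice when the input contains headers or annotations that should have been removed first.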
4. Mutation map
- Generate a map of your protein sequence showing (about 100) point mutations. Mark silent and disease-causing mutations.
- To map the mutations from different databases onto the same protein sequence, you may have to align the protein sequences used in the respective databases.
- How does the information, e.g. number and type of mutations, compare between the different databases?
- The results can be presented in a table or graphically, i.e. visualized on the sequence and also in the structure.
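The renumbering and mapping steps above can be sketched as follows. This is only an illustration under several assumptions: `difflib.SequenceMatcher` from the Python standard library is used as a stand-in for a real pairwise alignment, which is only reasonable when the two sequences are nearly identical (e.g. one has a few extra residues at a terminus). The function names, the marker letters ("D" for disease-causing, "s" for silent) and the example sequences are all hypothetical.

```python
from difflib import SequenceMatcher


def position_map(source_seq, target_seq):
    """Map 1-based positions in source_seq to 1-based positions in target_seq.

    Only residues inside identical matching blocks are mapped; positions
    without a counterpart are simply absent from the returned dict.
    """
    matcher = SequenceMatcher(None, source_seq, target_seq, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return mapping


def render_map(sequence, marks, width=60):
    """Render the sequence in lines of `width` residues with a marker track.

    `marks` maps a 1-based position to a single-character label.
    """
    lines = []
    for start in range(0, len(sequence), width):
        chunk = sequence[start:start + width]
        track = "".join(marks.get(start + i + 1, " ") for i in range(len(chunk)))
        lines.append(f"{start + 1:>5} {chunk}")
        lines.append(f"{'':>5} {track}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    reference = "MQRSPLEKASVV"        # made-up reference fragment
    other_db = "GSMQRSPLEKASVV"       # same fragment with two extra residues
    to_reference = position_map(other_db, reference)

    # Hypothetical mutations numbered according to the other database.
    mutations = {4: "D", 9: "s"}
    marks = {to_reference[pos]: label
             for pos, label in mutations.items() if pos in to_reference}
    print(render_map(reference, marks))
```

For sequences that differ by more than a short offset, a proper pairwise alignment tool is the safer choice. The same position dictionary can then be built from the aligned columns instead.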
Adrienne Kitts and Stephen Sherry. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. Chapter 5 in: The NCBI Handbook. Created: October 9, 2002; Last Update: February 2, 2011
Christian Schaefer, Alice Meier, Burkhard Rost, Yana Bromberg (2012). SNPdbe: Constructing an nsSNP functional impacts database. Bioinformatics 28(4):601-602.