Pairwise sequence alignments

From Protein Prediction 1 Summer Semester 2016 For Informaticians

This page is organized as follows:

Keywords are terms that you need to understand to follow the lecture. To test your knowledge, try to define and explain these keywords (in a few sentences). If you cannot think of anything to say about a keyword, read up on that topic.

Under Sources you will find literature we suggest (textbooks, web pages, articles) that will help you to understand the topic. You can use this as a resource to complete your knowledge of the keywords and to help you answer the questions and solve the tasks. You are not required to read and study any of these, but they provide more detailed knowledge on the topic and are a good complement to the lecture. Of course, feel free to use any other source you like.

In the section Exercise we provide Questions and Hands-on tasks that allow you to test and further your knowledge of the given topic. During the exercise session you can ask questions pertaining to the topic (keywords and exercises).


Keywords

  • Pairwise alignments: global, local (Needleman-Wunsch / Smith-Waterman)
  • Dynamic programming as a solution for cascading recursion; backtracking
  • Substitution matrix: PAM, BLOSUM
  • Match, mismatch, gap
  • Sequence identity, similarity, conservation
  • E-value
  • FASTA format (basic principle)
  • One-letter code for amino acids (you do not have to memorize the code, just know what it means)
  • Homology, homologues/homologs
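The difference between global and local alignment shows up directly in the dynamic-programming recursion. The following is a minimal Smith-Waterman sketch in Python, not a reference implementation: it assumes a simple match/mismatch score (+2/-1) and a linear gap penalty (-2) in place of a substitution matrix such as BLOSUM62, and the function name smith_waterman is hypothetical. Compared with Needleman-Wunsch, every cell is floored at 0, and backtracking starts at the highest-scoring cell and stops at the first cell whose score is 0.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay 0, so unaligned leading residues cost nothing.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,  # a negative-scoring prefix is dropped: start afresh
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # a[i-1] against a gap
                          H[i][j - 1] + gap)     # b[j-1] against a gap
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backtrack from the best cell until the first cell with score 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("PPPWHATQQ", "KKWHATKK"))  # → (8, 'WHAT', 'WHAT')
```

In this example the unrelated flanks (PPP/QQ and KK/KK) are simply left out of the local alignment, whereas a global alignment would have to score them as mismatches or gaps. Swapping in a substitution matrix would only change how s is computed.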


Sources


Exercise

Questions

  • How can you define similarity between two protein sequences?
  • What does "conservation" mean in the context of sequence alignments?
  • Why are sequence alignments useful?
  • What are the main differences in the algorithms of Global and Local alignment?
    • Why does it make sense not to always perform a global alignment?
  • Which amino acids can (with high likelihood) be substituted for leucine without affecting protein function?
  • According to PAM250 and according to BLOSUM62, which of the following substitutions is more probable?
    • W <> F
    • H <> R


Hands on tasks

  • Find the best alignment between the sequences “WHAT” and “WHY” using the Needleman-Wunsch algorithm, with +1 for a match, -1 for a mismatch, and -2 for a gap.
  • Write a small program that calculates a global alignment by implementing the Needleman-Wunsch algorithm and prints out the alignment.
  • Go to the UniProt website
    • Look up UniProt ID P02144 (which protein is this?) and use UniProt to run BLAST.
    • First, start a default BLAST run. What are the parameters used? What do they mean? Look at the result. In which organisms do you find hits? Do you believe that you found all possible hits in UniProt?
    • Now start an advanced BLAST run. Which parameters can you adjust? Try different parameters. Look at the result. In which organisms do you find hits? Do you believe that you found all possible hits in UniProt?
    • You can also try different proteins.
  • Write a tool that retrieves all UniProt entries with a given EC number and calculates pairwise sequence similarities.
  • Compare the species horse (Equus caballus), minke whale (Balaenoptera acutorostrata) and red kangaroo (Macropus rufus) based on two of their proteins (do pairwise alignments and compare sequence identities). What would you expect? Compare the resulting similarities between these species. Use the following proteins:
    • Cytochrome b from the mitochondria
    • pancreatic ribonuclease
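For the Needleman-Wunsch tasks above, the following Python sketch may serve as a starting point. It assumes a linear gap penalty and the simple +1/-1/-2 scheme from the WHAT/WHY task rather than a substitution matrix. The function names needleman_wunsch and percent_identity are hypothetical, and percent identity is computed here as identical positions divided by alignment length, which is only one of several definitions in use.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # match or mismatch
                          F[i - 1][j] + gap,    # a[i-1] against a gap
                          F[i][j - 1] + gap)    # b[j-1] against a gap

    # Backtracking from the bottom-right cell; ties prefer diagonal, then vertical.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Identical (non-gap) positions divided by alignment length, in percent."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x) if x else 0.0


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("WHAT", "WHY")
    print(score)
    print(top)
    print(bottom)
    print(f"{percent_identity(top, bottom):.1f}% identity")
```

With the tie-breaking chosen here (diagonal before vertical gap), the sketch reports a score of -1 and the alignment WHAT / WH-Y (50.0% identity). Note that WHAT / WHY- also scores -1, so the optimal global alignment is not unique; a different tie-breaking rule in the backtracking would return that alignment instead.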