Task Structural Alignments

From Bioinformatikpedia
Latest revision as of 13:16, 28 May 2013

In order to evaluate the similarity between protein structures, the structures have to be superimposed in 3D. A multitude of methods are available to achieve this task. Also, there are many different measures to quantify structural similarity. In this task we will explore different methods and compare different measures to get a feeling for the structural similarity they imply. We will then apply structural alignment to evaluate some sequence-based alignments generated in Task 2 (Run sequence searches on the disease gene product and produce alignments).

Theoretical background talk

The introductory talks should give an overview of

  • short review of SCOP and CATH
  • Alignment methods:
    • the one used by CATH (SSAP / CATHEDRAL)
    • Topmatch
    • SAP or CE
    • LGA (http://proteinmodel.org/AS2TS/LGA/lga.html; see http://proteinmodel.org/AS2TS/LGA/lga_format.html for documentation)
  • Modelling scores:
    • RMSD
    • GDT
    • LCS

The slides can be found here: File:Presentation structuralAlignments.pdf
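
To get a feeling for how RMSD and GDT react to the same deviation, the following minimal Python sketch computes both for coordinates that are assumed to be already superimposed. The function names and the toy coordinates are made up for illustration. Note that `gdt_ts_fixed` is a simplification: the GDT reported by LGA searches over many superpositions for each distance cutoff, so a value computed under one fixed superposition can only underestimate it.

```python
import math


def distances(model, reference):
    """Per-residue C_alpha distances between two equally long coordinate lists."""
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(model, reference):
    """Root mean square deviation over all residue pairs (no superposition done here)."""
    d = distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts_fixed(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS under ONE fixed superposition (assumption, see text).

    For each cutoff, take the fraction of residues closer than the cutoff,
    then average the fractions and express the result as a percentage.
    """
    d = distances(model, reference)
    fractions = [sum(x <= c for x in d) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy C_alpha traces (invented): identical except for one residue moved by 10 A.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 10.0)]

print("RMSD:", rmsd(model, reference))
print("simplified GDT_TS:", gdt_ts_fixed(model, reference))
```

In this toy case the single displaced residue pushes the RMSD to 5 Å, while the simplified GDT stays at 75 because three of the four residues are within every cutoff.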

Explore structural alignments

  • Assemble a set of 8 to 9 structures related to your protein. These structures should span the range of similarities from almost identical to completely unrelated. You can take structures found in the sequence search as well as structures from CATH, e.g.:
    • one reference structure of your protein
    • one or two structures with identical sequence (ideally one with a filled binding site and one with an unfilled binding site, so that you can form one pair with the same binding site status and one pair with different status)
    • one similar sequence (>60% seq. identity)
    • one rather unrelated sequence (<30% seq. identity)
    • one arbitrary structure for each of these levels, with a CATH code that is identical to that of your protein up to that level:
      • CAT
      • CA
      • C
    • one arbitrary structure from a different CATH class
  • Apply different structural alignment methods to these structures (only superimpose to your reference structure, not all against all):
    • use PyMOL tools (align / super -- will only work on more closely related structures)
      • if you have a defined binding site, see what changes if you use all atoms / only C_alpha / only binding site atoms
    • LGA
    • the one used by CATH
    • Topmatch
    • SAP or CE
  • List the alignment scores the methods give you (e.g. RMSD)
  • If scores that should be numerically comparable (e.g. RMSD) differ between methods, think about why -- e.g. different sets of atoms used for superimposition.
  • Qualitatively evaluate which methods give you the best feeling for structural relatedness. This might depend on the level of relatedness of the structures. In order to do this, look at some of the alignments in 3D.
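
Why the same pair of structures can yield different RMSD values depending on the atoms used can be sketched with a small least-squares (Kabsch) superposition in Python with NumPy. This is not how PyMOL or LGA work internally; the function names, the toy coordinates and the choice of "core" atoms are assumptions made purely for illustration.

```python
import numpy as np


def kabsch(P, Q):
    """Rotation R and translation t minimising sum |R p_i + t - q_i|^2."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_center, q_center = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_center).T @ (Q - q_center)  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_center - R @ p_center
    return R, t


def rmsd_after_fit(mobile, target, fit_idx, eval_idx):
    """Superimpose using the atoms in fit_idx, report RMSD over the atoms in eval_idx."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    R, t = kabsch(mobile[fit_idx], target[fit_idx])
    moved = mobile @ R.T + t
    diff = moved[eval_idx] - target[eval_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy coordinates (invented): four "core" atoms plus two atoms of a flexible part.
target = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                   [0.0, 1.5, 1.0], [3.0, 3.0, 1.0], [4.0, 3.0, 2.0]])
angle = np.radians(40.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
# The mobile structure is a rotated and shifted copy whose last two atoms moved by 3 A.
mobile = target @ rot_z.T + np.array([5.0, -2.0, 1.0])
mobile[4:] += np.array([0.0, 0.0, 3.0])

core = [0, 1, 2, 3]
everything = list(range(len(target)))

core_core = rmsd_after_fit(mobile, target, core, core)
core_all = rmsd_after_fit(mobile, target, core, everything)
all_all = rmsd_after_fit(mobile, target, everything, everything)
print("fit on core, RMSD over core:", core_core)
print("fit on core, RMSD over all :", core_all)
print("fit on all,  RMSD over all :", all_all)
```

Fitting on the rigid core gives an RMSD of essentially zero for the core, whereas fitting on all atoms spreads the error of the two displaced atoms over the whole structure. Two programs that report "RMSD" may therefore disagree simply because they superimpose or evaluate different atom sets.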

Use structural alignments to evaluate sequence alignments

In the HHblits package, there is a Perl script called hhmakemodel.pl, which you can use to make very crude models out of your alignments by simply copying the C_alpha coordinates of the aligned residues.

  • Apply this tool to generate models of your protein based on the PDB structures found in Task 2 (Run sequence searches on the disease gene product and produce alignments).
  • Compare the models to the experimental structure (with LGA) and see whether there is any correlation between model similarity and any of the alignment scores (e.g. E-value, probability, sequence identity).
  • In order for LGA to work you will probably have to make sure that the residue numbering is the same as in your experimental structure. Maria will probably be able to help with this.
  • If anyone has time and likes programming, you could prepare an extra task that would help me in my research:
    • Based on hhmakemodel.pl, write a script that does the same kind of modelling for (Psi-)Blast alignments.
    • Once you have that, everyone could apply the new script to evaluate the (Psi-)Blast alignments.
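
For the residue numbering issue mentioned above, a minimal Python sketch of one possible fix is shown below. It assumes that the model and the experimental structure differ only by a constant offset in residue numbers, which may not hold for your protein (gaps or insertion codes would need an explicit residue-by-residue mapping). The function names, the helper `atom_line`, the toy records and the offset of 24 are all hypothetical.

```python
def shift_residue_numbers(pdb_lines, offset):
    """Shift the residue number of every ATOM/HETATM record by a constant offset.

    In the fixed-width PDB format the residue number occupies columns 23-26,
    i.e. the Python slice [22:26]. Other record types are passed through as-is.
    """
    shifted = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            new_number = int(line[22:26]) + offset
            if not -999 <= new_number <= 9999:
                raise ValueError(f"residue number {new_number} does not fit in 4 columns")
            line = line[:22] + f"{new_number:4d}" + line[26:]
        shifted.append(line)
    return shifted


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal ATOM record (hypothetical helper, used only for the toy input)."""
    return ("ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}"
            .format(serial, name, res_name, chain, res_seq, x, y, z))


# Toy model numbered from 1 (invented coordinates).
model_lines = [
    atom_line(1, " CA", "MET", "A", 1, 27.340, 24.430, 2.614),
    atom_line(2, " CA", "LYS", "A", 2, 26.266, 25.413, 2.842),
    "END",
]

# Assumption: the experimental structure starts at residue 25, so the offset is 24.
renumbered = shift_residue_numbers(model_lines, 24)
for line in renumbered:
    print(line)
```

Only the four residue number columns are rewritten, so coordinates and atom names stay untouched and the renumbered file can be compared with the experimental structure directly.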