Task Homology Modelling

From Bioinformatikpedia

For the sequences used in this practical, protein structures have been determined. However, in real-life projects, you often do not have structures. Therefore, we will use structure prediction methods to predict the 3D structures of our sequences. We will also check whether and how the SNPs change the predicted structures.

Theoretical background talks

We will look at two parts of homology modelling:

  • Identifying suitable templates and producing an alignment
  • Calculating the actual models and evaluating the results

Please include these programs in your talks:

  • Swissmodel
  • Modeller
  • iTasser

The talks should cover:

  • how the methods work behind the scenes,
  • their performance, strengths and weaknesses (e.g. as seen in CASP),
  • how models are scored blindly, i.e. without an experimental structure.

The slides can be found here: File:Homology modeling maria kalemanov.pdf.

Calculation of models

  • Build on the ensemble of structures you assembled in the task "Structural alignment, evaluation of alignments using structures".
  • Divide your homologous structures into two groups by sequence identity:
    • > 60% sequence identity
    • < 30% sequence identity (ideally go towards 20%)
  • If possible (i.e. if there are structures at that level of sequence identity), create models using one template from each group with:
    • Modeller (command line)
      • The students from 2011 wrote a basic tutorial on all necessary steps for using Modeller for this task.
    • Swissmodel (online - you can specify a template to use, even in the "automatic" mode)
    • iTasser (online - use "Option II" to exclude homologous templates for the low-similarity template group)
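
The split into the two identity groups can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it expects pairwise alignments of the target to each template (gaps written as "-"), and it defines identity as identical residues divided by aligned, non-gap columns, which is just one of several common conventions. All function names and template names are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: identity = identical residues / aligned (non-gap) columns.
    Other definitions (e.g. dividing by the shorter sequence) give other values.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def identity_group(identity):
    """Map a percent identity to the groups used in this task."""
    if identity > 60.0:
        return "close"
    if identity < 30.0:
        return "distant"
    return "intermediate"  # not used for modelling in this task


def group_templates(pairwise_alignments):
    """pairwise_alignments: {template_name: (aligned_target, aligned_template)}."""
    groups = {"close": [], "intermediate": [], "distant": []}
    for name, (target, template) in pairwise_alignments.items():
        identity = percent_identity(target, template)
        groups[identity_group(identity)].append((name, round(identity, 1)))
    return groups


if __name__ == "__main__":
    # Hypothetical toy alignments, not real templates.
    toy = {
        "tmpA": ("ACDEFGHIKL", "ACDEFGHIKL"),
        "tmpB": ("ACDEFGHIKL", "AC--WWWWWW"),
    }
    print(group_templates(toy))
```

Templates that land in the "intermediate" bin are simply left out here; whether you want to report them anyway is up to you.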

Note: Karolina and Sonja noticed on Wednesday that the iTasser server is overloaded and therefore extremely slow (60 h for one job); it also allows only one job. (ek)

  • Try out what happens if you change the input settings. To do so:
    • Use the default settings of the methods, i.e. use the standard workflow and directly feed the alignments to the modelling step
    • In addition: Have a look at the alignments you use for modelling.
      • Consider the sequence-based information you have collected so far (important residues, sequence family profiles, secondary structure prediction, etc.) to check the alignment.
      • Edit the alignment.
      • Then, proceed with modelling.
      • Document what you changed and why.
    • In addition (if suitable templates are available): When modelling with Modeller, use more than one template in one modelling step. Explore different combinations of templates:
      • several close homologues (> 60% sequence identity)
      • several distant homologues (< 30% sequence identity)
      • one or more close and one or more distant homologues
      • What would you expect with respect to model quality? In the evaluation (see below), check whether you can see the expected trend.
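
For the edited alignments and the multi-template runs, Modeller reads a single alignment file in PIR format that contains the target and all templates. The sketch below writes such a file from an alignment you have already edited. It rests on assumptions you should check against the Modeller manual: each record is taken to be a ">P1;code" line, a header with ten colon-separated fields, and the sequence terminated by "*". All codes, file names and residue ranges are hypothetical placeholders, and -1.00 is used as a stand-in for undefined resolution and R-factor.

```python
from typing import List, NamedTuple, Tuple


class Template(NamedTuple):
    code: str         # alignment code, hypothetical (e.g. "tmpA")
    pdb_file: str     # PDB code or file name of the template structure
    chain: str        # chain identifier in the PDB file
    first_res: int    # first residue number used from that chain
    last_res: int     # last residue number used from that chain
    aligned_seq: str  # template row of the alignment, gaps as "-"


def wrap(text: str, width: int = 75) -> List[str]:
    """Cut a sequence into fixed-width lines."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def pir_record(code: str, header_fields: List[str], aligned_seq: str) -> str:
    """Format one PIR record; the sequence is terminated by '*'."""
    if len(header_fields) != 10:
        raise ValueError("the PIR header needs exactly ten fields")
    lines = [">P1;" + code, ":".join(header_fields)]
    lines.extend(wrap(aligned_seq + "*"))
    return "\n".join(lines) + "\n"


def build_pir(target_code: str, target_seq: str,
              templates: List[Template]) -> Tuple[str, List[str]]:
    """Return the PIR text and the list of template codes."""
    lengths = {len(target_seq)} | {len(t.aligned_seq) for t in templates}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    records = []
    for t in templates:
        fields = ["structureX", t.pdb_file, str(t.first_res), t.chain,
                  str(t.last_res), t.chain, "", "", "-1.00", "-1.00"]
        records.append(pir_record(t.code, fields, t.aligned_seq))
    target_fields = ["sequence", target_code, "", "", "", "", "", "",
                     "-1.00", "-1.00"]
    records.append(pir_record(target_code, target_fields, target_seq))
    return "\n".join(records), [t.code for t in templates]


if __name__ == "__main__":
    # Hypothetical toy alignment with one close and one distant template.
    templates = [
        Template("tmpA", "tmpA.pdb", "A", 1, 10, "ACDEFGHIKL"),
        Template("tmpB", "tmpB.pdb", "A", 5, 12, "AC--WWWWWW"),
    ]
    text, knowns = build_pir("target", "ACDEFGHIKL", templates)
    print(text)
    print("knowns =", knowns)
```

The returned list of template codes corresponds to what Modeller's AutoModel class takes as its knowns argument, next to the alignment file (alnfile) and the target code (sequence). Writing one file per template combination keeps the runs easy to compare later.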

Now, you should have quite a large number of models.

Evaluate your models

  • Check the numeric evaluation of your models (scores given by the modelling programs)
  • Compare the models to the experimental structure (if there are several experimental structures, select one apo and one complex structure, and document your choice of reference)
    • Look at your models!
    • Calculate the GDT scores of the models.
    • Calculate the C-alpha RMSD of the models (use /mnt/project/pracstrucfunc13/bin/sap).
    • Extra diligence task: define a radius of 6 Å around the catalytic centre / binding site and calculate the all-atom RMSD in that region.
  • Discuss your results (you do not need to calculate correlation coefficients; a qualitative estimate is enough):
    • How do RMSD and GDT correlate? Is one score more helpful in finding meaningful models?
    • Do you see any correlation between the quality scores provided by the modelling tools and the RMSD/GDT?
    • Is any method systematically better at predicting the structure?
    • Does this depend on the similarity of the template?
    • Does refining the alignments by hand help?
    • Can you imagine any other kind of information that might improve the models?
    • For Modeller: How does including more templates change the model quality?
    • Hint: You can make PyMOL calculate the RMSD over C-alpha atoms only by calculating the RMSD between two selections: [1]
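
To make the three measures above concrete, here is a minimal Python sketch of RMSD, a GDT_TS-like score and the RMSD within a radius around a site. It is a simplification, not a replacement for sap or a proper GDT program: it assumes the model and the reference are already superposed and that the coordinate lists are matched one-to-one (same atoms in the same order). A real GDT calculation searches over many superpositions to maximise the number of residues under each cutoff, which this sketch does not do. All function names are hypothetical.

```python
import math


def pair_distances(model, reference):
    """Distances between matched atoms of two superposed coordinate lists."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    return [math.dist(p, q) for p, q in zip(model, reference)]


def rmsd(model, reference):
    """Root-mean-square deviation over all matched atoms."""
    d = pair_distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-like score in percent for ONE fixed superposition.

    Average over the cutoffs of the fraction of atoms (usually C-alpha)
    that lie within the cutoff distance of their reference position.
    """
    d = pair_distances(model, reference)
    fractions = [sum(x <= c for x in d) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def site_rmsd(model, reference, centre, radius=6.0):
    """RMSD over atoms whose reference position is within radius of centre."""
    pairs = [(p, q) for p, q in zip(model, reference)
             if math.dist(q, centre) <= radius]
    if not pairs:
        raise ValueError("no reference atoms within the given radius")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs))


if __name__ == "__main__":
    # Toy coordinates: three atoms fit reasonably, one is far off.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 0.0, 3.0), (3.0, 9.0, 0.0)]
    print("RMSD:", round(rmsd(model, reference), 2))
    print("GDT_TS-like:", round(gdt_ts(model, reference), 2))
    print("Site RMSD:", round(site_rmsd(model, reference, (0.0, 0.0, 0.0), 1.5), 2))
```

In the toy data a single badly placed atom dominates the RMSD, while the GDT-like score only loses that atom's contribution. Keep this difference in mind when you compare the two scores for your models.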