Difference between revisions of "Task homologyModelling"

From Bioinformatikpedia
Latest revision as of 21:56, 30 May 2012

For the sequences used in this practical, protein structures have been determined. However, in real-life projects, you often do not have structures. Therefore, we will use structure prediction methods to predict the 3D structures of our sequences. We will also check whether and how the SNPs change the predicted structures.

Theoretical background talks

We will be looking at two parts of homology modelling

  • Identifying suitable templates and producing an alignment
  • Calculating the actual models and evaluating the results

Please include these programs in your talks

  • Swissmodel
  • Modeller
  • iTasser

The talks should cover

  • how the methods work behind the scenes,
  • some information on their performance, strengths and weaknesses (as e.g. seen in CASP),
  • some information about model scoring (blindly, without experimental structure)
  • some information about quality criteria like RMSD and TM Score

Slides on homology modelling: File:Homology-modeling angermueller.pdf

Calculation of models

  • Get an overview of available homologous structures based on the sequence searches and alignments. -- Here you can build on your searches from the alignment task (2).
    • If you have not found remote homologues before, use HHpred (http://toolkit.tuebingen.mpg.de/hhpred) and/or COMA (http://bioinformatics.ibt.lt:8085/coma/) to check whether you can extend your list towards more remotely similar structures.
  • Divide your homologous structures into three groups by sequence identity:
    • > 80% sequence identity
    • 40% - 80% sequence identity
    • < 30% sequence identity (ideally go towards 20%)
  • If possible (i.e. if there are structures at that level of sequence identity) create models using one template from each of the groups with
    • Modeller (command line). NOTE: the Modeller 9.10 installation seems to be working, at least if you run it with "python yourModellerScript.py" or if you set MODINSTALL9v10 to the correct path (which I haven't worked out yet).
      • The students from last year wrote a basic tutorial on all necessary steps for using Modeller for this task.
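    • Swissmodel (online at http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1&userid=USERID&token=TOKEN - you can specify a template to use, even in the "automatic" mode)
    • iTasser (online at http://zhanglab.ccmb.med.umich.edu/I-TASSER/ - use "Option II" to exclude homologous templates for the low similarity template groups)
  • Try out what happens if you change the input settings. To do so: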
    • Use the default settings of the methods, i.e. use the standard workflow and directly feed the alignments to the modelling step
    • In addition: Have a look at the alignments you use for modelling.
      • Consider the sequence-based information you have collected so far (important residues, sequence family profiles, secondary structure prediction, etc.) to check the alignment.
      • Edit the alignment.
      • Then, proceed with modelling.
      • Document what you changed and why.
    • In addition (if the available templates allow it): when modelling with Modeller, use more than one template in a single modelling step and explore different combinations of templates:
      • several close homologues (> 80% sequence identity)
      • several distant homologues (< 30% sequence identity)
      • one close and one distant homologue
      • What would you expect with respect to model quality? -- In the evaluation (see below), check whether you can see the expected trend.
  • Feed your 5 (subjectively best) models into 3D-Jigsaw (http://bmm.cancerresearchuk.org/~populus/populus_submit.html) to obtain recombined, optimised (?) models. Do this separately for the different categories of templates. The script repairPDB can be downloaded from https://github.com/offmarc/AGroS.
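
The grouping of templates by sequence identity described above can be sketched in a few lines of Python. This is only an illustration, not part of the task infrastructure: the function names and group labels are made up here, and identity is assumed to be counted over aligned (non-gap) columns of a pairwise alignment. Other tools may use a different denominator, so the numbers can differ slightly from what your search tool reports.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and
    use '-' as the gap character.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        aligned += 1
        if a == b:
            matches += 1
    return 100.0 * matches / aligned if aligned else 0.0


def identity_group(identity):
    """Map a percent identity to one of the three template groups.

    The labels "high", "medium" and "low" are hypothetical names for
    the > 80%, 40% - 80% and < 30% groups. Values between 30% and 40%
    belong to none of the groups and yield None.
    """
    if identity > 80:
        return "high"
    if 40 <= identity <= 80:
        return "medium"
    if identity < 30:
        return "low"
    return None


if __name__ == "__main__":
    # Toy alignment, not real data.
    ident = percent_identity("ACDE-F", "ACDQGF")
    print(ident, identity_group(ident))
```

Keeping the group label next to each template makes it easier to run the later comparisons (3D-Jigsaw, evaluation) separately per category.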

Now, you should have quite a large number of models.

Evaluate your models

  • Check the numeric evaluation of your models (scores given by the modelling programs)
  • Compare the models to the experimental structure (Select one apo and one complex structure if there are several experimental structures, document your choice of reference)
    • Look at your models!
    • Calculate the TM_score, GDT_HA and GDT_TS of the models (use /mnt/project/pracstrucfunc12/bin/TMscore).
    • Calculate the C_alpha RMSD of the models (use /mnt/project/pracstrucfunc12/bin/sap).
    • Extra diligence task: define a radius of 6 Angstrom around the catalytic centre and calculate the all atom RMSD in that region
  • Discuss your results (You do not need to calculate correlation coefficients, a qualitative estimation is enough.):
    • How do the RMSD and the TM_score or GDT correlate? Is one score more helpful in finding meaningful models?
    • Do you see any correlation between the quality scores and the RMSD/TM_score/GDT?
    • Is any method systematically better at predicting the structure?
    • Does this depend on the similarity of the template?
    • Does refining the alignments by hand help?
    • Can you imagine any other kind of information that might improve the models?
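
To get a feeling for how RMSD, TM_score and GDT relate to each other, it can help to compute them from the same set of per-residue C_alpha distances. The sketch below assumes that model and reference are already superimposed and that the distances (in Angstrom) between equivalent C_alpha atoms are given. The TMscore program additionally searches for the superposition that maximises each score, so its values will generally be somewhat higher than what a single fixed superposition gives. All function names are made up for this illustration.

```python
import math


def rmsd(distances):
    """Root mean square deviation over the given per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    Uses the length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 and
    normalises by the target length L. Assumes L > 15.
    """
    if target_length <= 15:
        raise ValueError("this sketch assumes a target length above 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def gdt(distances, target_length, cutoffs):
    """Average percentage of residues within each distance cutoff."""
    fractions = [
        sum(1 for d in distances if d <= c) / target_length for c in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)

if __name__ == "__main__":
    # Toy example: 90 well-modelled residues and a badly placed 10-residue loop.
    dists = [1.0] * 90 + [20.0] * 10
    print("RMSD    ", round(rmsd(dists), 2))
    print("TM_score", round(tm_score(dists, 100), 2))
    print("GDT_TS  ", round(gdt(dists, 100, GDT_TS_CUTOFFS), 1))
    print("GDT_HA  ", round(gdt(dists, 100, GDT_HA_CUTOFFS), 1))
```

In this toy case the ten badly placed residues push the RMSD to roughly 6.4 Angstrom, while the TM_score stays at about 0.84, because each residue contributes at most 1/L to the TM_score no matter how far off it is. This is one reason why the two measures can rank the same models differently.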