Gaucher Disease: Task 05 - Homology Modelling
This page is still under construction.
Calculation of models
For information on the selected structure set see Lab journal.
We created two single-template models of our protein sequence, P04062, with each of the three tools Modeller, Swiss-Model and iTasser: one based on a template with high sequence identity, 2XWD_A, and one based on a template with low sequence identity, 2WNW_A.
For Modeller we additionally ran the multiple-template modelling mode. In this mode, Modeller first aligns the user-selected templates, then adds the target to the MSA, which is finally used for modelling. We tried the following template combinations:
- close homologues (> 60% sequence identity): all four (3KE0_A, 2XWD_A, 2WKL_A and 2NSX_A)
- distant homologues (< 30% sequence identity): all three (2WNW_A, 1VFF_A and 3II1_A)
- close and distant homologues: 2XWD_A and 2WNW_A
Evaluation of models
In the following we present the results for the created models. We compare the models to two reference structures, 1OGS_A and 2V3E_B, by superimposing them and calculating the C_alpha RMSD with Pymol as well as the all-atom RMSD, GDT-TS score and TM-score with the TMscore tool. In addition, we compare the main scores reported by the modelling programs.
We selected 1OGS_A for consistency, as we already used it in task 04. Moreover, as there are many reference structures for glucocerebrosidase, we chose a second structure for comparison: 2V3E_B. Both structures have a resolution of 2.0 Å; however, 2V3E was solved at neutral pH, whereas 1OGS was solved at an acidic pH of 4.5. (Much more detailed information about the structures is given in the lab journal of task 09.)
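As a rough illustration of two of the scores used below, the following sketch computes an RMSD and a GDT-TS value from residue-matched C_alpha coordinates that are assumed to be already superimposed. The function names and the toy coordinates are hypothetical. Pymol and the TMscore tool additionally search for an optimal superposition, which is left out here, so this is not a reimplementation of either tool. GDT-TS is taken as the mean fraction of residues within 1, 2, 4 and 8 Å of their reference position.

```python
import math


def distances(model, reference):
    """Per-residue distances between two equally long lists of (x, y, z) tuples."""
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def rmsd(model, reference):
    """Root mean square deviation over all matched positions (no superposition)."""
    d = distances(model, reference)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean fraction of positions within each distance cutoff (in Angstrom)."""
    d = distances(model, reference)
    fractions = [sum(1 for x in d if x <= c) / len(d) for c in cutoffs]
    return sum(fractions) / len(cutoffs)


# Toy data (hypothetical): four positions deviating by 0.5, 1.5, 3 and 10 Angstrom.
reference = [(float(i), 0.0, 0.0) for i in range(4)]
model = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 10.0, 0.0)]

print("RMSD:  ", round(rmsd(model, reference), 3))
print("GDT-TS:", round(gdt_ts(model, reference), 4))
```

In the toy data a single badly placed position dominates the RMSD, while GDT-TS still credits the three positions that lie within the cutoffs.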
Modeller
<figtable id="single_templates">
High sequence identity | ||||
---|---|---|---|---|
Template | 2XWD_A | |||
Alignment method | malign | salign | ||
DOPE score | -64821.74609 | -64821.746094 | ||
Reference targets | 1OGS_A | 2V3E_B | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 0.284 (408) | 0.194 (444) | 0.284 (408) | 0.194 (444) |
RMSD (of 497 common residues) | 23.987 | 24.032 | 23.987 | 24.032 |
TM score | 0.2160 | 0.2167 | 0.2160 | 0.2167 |
GDT-TS score (%) | 0.0664 | 0.0669 | 0.0664 | 0.0669 |
Pymol visualization | ||||
Low sequence identity | ||||
Template | 2WNW_A | |||
Alignment method | malign | salign | ||
DOPE score | -57316.70703 | -53548.45703 | ||
Reference targets | 1OGS_A | 2V3E_B | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 1.217 (340) | 1.116 (336) | 1.523 (337) | 1.523 (337) |
RMSD (of 497 common residues) | 22.979 | 22.980 | 22.972 | 22.949 |
TM score | 0.2301 | 0.2307 | 0.2257 | 0.2249 |
GDT-TS score (%) | 0.0659 | 0.0664 | 0.0634 | 0.0649 |
Pymol visualization |
</figtable>
<figtable id="multiple_templates">
High sequence identity | ||
---|---|---|
Templates | 3KE0_A, 2XWD_A, 2WKL_A, 2NSX_A | |
DOPE score | NA | |
Reference targets | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 0.272 (439) | 0.337 (413) |
RMSD (of 497 common residues) | 24.370 | 24.402 |
TM score | 0.2173 | 0.2175 |
GDT-TS score (%) | 0.0664 | 0.0674 |
Pymol visualization | ||
Low sequence identity | ||
Templates | 2WNW_A, 1VFF_A, 3II1_A | |
DOPE score | NA | |
Reference targets | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 20.935 (471) | 21.012 (473) |
RMSD (of 497 common residues) | 22.759 | 22.875 |
TM score | 0.1931 | 0.1924 |
GDT-TS score (%) | 0.0508 | 0.0523 |
Pymol visualization | ||
Mixed sequence identity | ||
Templates | 2XWD_A, 2WNW_A | |
DOPE score | NA | |
Reference targets | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 0.328 (418) | 0.228 (450) |
RMSD (of 497 common residues) | 23.625 | 23.659 |
TM score | 0.2170 | 0.2175 |
GDT-TS score (%) | 0.0664 | 0.0669 |
Pymol visualization |
</figtable>
TODO: explanation of DOPE score
Swiss-Model
<figtable id="swiss-model">
High sequence identity | ||
---|---|---|
Template | 2XWD_A | |
QMEAN Z-score | -1.37 | |
Reference targets | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 0.305 (407) | 0.198 (447) |
RMSD (of # common residues) | 23.688 (458) | 23.737 (458) |
TM score | 0.2137 | 0.2143 |
GDT-TS score (%) | 0.0654 | 0.0664 |
Pymol visualization | ||
Low sequence identity | ||
Template | 2WNW_A | |
QMEAN Z-score | -4.1 | |
Reference targets | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 1.105 (368) | 1.032 (372) |
RMSD (of # common residues) | 22.058 (422) | 22.022 (422) |
TM score | 0.2129 | 0.2138 |
GDT-TS score (%) | 0.0634 | 0.0653 |
Pymol visualization |
</figtable>
Swiss-Model's QMEAN4 score is explained on the results page:
The QMEAN4 score is a composite score consisting of a linear combination of 4 statistical potential terms:
- C_beta interaction energy
- All-atom pairwise energy
- Solvation energy
- Torsion angle energy
It is interpreted as an estimate of model reliability (between 0 and 1). The pseudo-energies of the contributing terms are calculated together with their Z-scores with respect to scores obtained for high-resolution experimental structures of similar size solved by X-ray crystallography.
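The two ideas in this description, a weighted sum of four terms and a Z-score relative to experimental structures, can be sketched as follows. This is only an illustration: the term names, the weights and all numbers are hypothetical placeholders and not QMEAN's actual parameters, and the population standard deviation is an arbitrary choice for the sketch.

```python
import statistics

# Hypothetical identifiers for the four statistical potential terms.
TERMS = ("cbeta", "all_atom", "solvation", "torsion")


def composite_score(term_values, weights):
    """Linear combination of the four terms (weights are placeholders)."""
    return sum(weights[name] * term_values[name] for name in TERMS)


def z_score(value, reference_values):
    """Number of standard deviations between a value and the reference mean."""
    mean = statistics.mean(reference_values)
    std = statistics.pstdev(reference_values)
    return (value - mean) / std


# Toy numbers (hypothetical): a model scoring below a small reference set.
reference_scores = [0.70, 0.75, 0.80]
model_score = 0.60
print("Z-score:", round(z_score(model_score, reference_scores), 2))
```

A negative Z-score in this sketch means that the model scores below the average of the reference set, which matches the way the QMEAN Z-scores in the table above are read.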
iTasser
<figtable id="itasser">
High sequence identity | ||
---|---|---|
Template | 2XWD_A | |
C-score | -0.33 | |
Predicted RMSD | 8.2±4.5 | |
Predicted TM score | 0.67±0.13 | |
Reference targets | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 0.692 (461) | 0.755 (467) |
RMSD (of # common residues) | 24.427 (497) | 24.487 (497) |
TM score | 0.2171 | 0.2175 |
GDT-TS score (%) | 0.0674 | 0.0674 |
Pymol visualization | ||
Low sequence identity | ||
Template | 2WNW_A | |
C-score | -0.40 | |
Predicted RMSD | 8.4±4.5 | |
Predicted TM score | 0.66±0.13 | |
Reference targets | 1OGS_A | 2V3E_B |
C_alpha RMSD (# atom pairs) | 1.232 (431) | 1.185 (439) |
RMSD (of # common residues) | 24.087 (497) | 24.147 (497) |
TM score | 0.2196 | 0.2208 |
GDT-TS score (%) | 0.0694 | 0.0699 |
Pymol visualization |
</figtable>
iTasser scores are explained on the results page:
C-score is a confidence score for estimating the quality of models predicted by I-TASSER. It is calculated based on the significance of threading template alignments and the convergence parameters of the structure assembly simulations. C-score is typically in the range of [-5,2], where a higher C-score signifies a model with higher confidence and vice versa.
TM-score and RMSD are known standards for measuring structural similarity between two structures, and they are usually used to measure the accuracy of structure modelling when the native structure is known. When the native structure is not known, it becomes necessary to predict the quality of the model, i.e. what is the distance between the predicted model and the native structure? To answer this question, I-TASSER predicts the TM-score and RMSD of the predicted models relative to the native structures based on the C-score.
TM-score is a scale proposed for measuring the structural similarity between two structures (see Zhang and Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710). Its purpose is to solve a problem of RMSD, which is sensitive to local errors. Because RMSD is an average distance over all residue pairs in two structures, a local error (e.g. a misorientation of the tail) will give rise to a large RMSD value although the global topology is correct. In the TM-score, however, small distances are weighted more strongly than large distances, which makes the score insensitive to local modelling errors. A TM-score > 0.5 indicates a model of correct topology and a TM-score < 0.17 means random similarity. These cutoffs do not depend on the protein length.
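The weighting described above can be made concrete with a small sketch. It evaluates the TM-score formula from Zhang and Skolnick (2004), TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 (L - 15)^(1/3) - 1.8, for one fixed superposition. The real TMscore tool additionally maximises over superpositions, which is omitted here. The function names, the lower bound on d0 and the toy distances are assumptions made for illustration.

```python
import math


def d0(target_length):
    """Length-dependent distance scale; the 0.5 lower bound is an assumption
    that keeps this toy function defined for very short chains."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(pair_distances, target_length):
    """TM-score for one fixed superposition.

    pair_distances: distances (Angstrom) of the aligned C_alpha pairs.
    target_length: number of residues in the target (native) structure.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in pair_distances) / target_length


def rmsd_from_distances(pair_distances):
    return math.sqrt(sum(d * d for d in pair_distances) / len(pair_distances))


# Toy case (hypothetical): 95 well-placed residues and a misoriented 5-residue tail.
toy = [0.5] * 95 + [30.0] * 5
print("RMSD:    ", round(rmsd_from_distances(toy), 2))
print("TM-score:", round(tm_score(toy, 100), 3))
```

In this toy case the five tail residues push the RMSD above 6 Å, while the TM-score stays well above the 0.5 threshold for a correct topology.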
C_alpha RMSD summary
The following table summarizes the C_alpha RMSD values computed by Pymol with 1OGS_A as reference.
<figtable id="models_RMSD">
Program | High PIDE (2XWD) | Low PIDE (2WNW) | High PIDE (4 templates) | Low PIDE (3 templates) | Mixed PIDE (2XWD & 2WNW) |
---|---|---|---|---|---|
Modeller | 0.284 (408) | 1.523 (337) | 0.272 (439) | 20.935 (471) | 0.328 (418) |
Swiss-Model | 0.305 (407) | 1.105 (368) | | | |
iTasser | 0.692 (461) | 1.232 (431) | | | |
</figtable>
TODO:
- Select one apo and one complex structure if there are several experimental structures -> document your choice of reference
- Extra diligence task: define a radius of 6 Angstrom around the catalytic centre / binding site and calculate the all atom RMSD in that region
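One possible way to approach the extra task is sketched below: atoms of the reference structure within 6 Å of a chosen centre point are selected, and the all-atom RMSD is computed over those atoms that are also present in the model. The data layout (a dictionary keyed by residue number and atom name), the centre point and the toy coordinates are hypothetical, and the coordinates are assumed to be already superimposed.

```python
import math


def local_rmsd(model_atoms, reference_atoms, centre, radius=6.0):
    """All-atom RMSD restricted to a sphere around `centre`.

    model_atoms, reference_atoms: dict mapping (residue_number, atom_name)
    to (x, y, z). Atoms are selected by their position in the reference.
    Returns the RMSD and the number of atoms used.
    """
    selected = [
        key
        for key, xyz in reference_atoms.items()
        if math.dist(xyz, centre) <= radius and key in model_atoms
    ]
    if not selected:
        raise ValueError("no common atoms within the radius")
    squared = sum(
        math.dist(model_atoms[key], reference_atoms[key]) ** 2 for key in selected
    )
    return math.sqrt(squared / len(selected)), len(selected)


# Toy data (hypothetical): three atoms inside the sphere, one outside.
reference_atoms = {
    (1, "CA"): (0.0, 0.0, 0.0),
    (1, "CB"): (1.5, 0.0, 0.0),
    (2, "CA"): (3.0, 4.0, 0.0),
    (3, "CA"): (10.0, 0.0, 0.0),
}
model_atoms = {
    (1, "CA"): (0.0, 0.0, 1.0),
    (1, "CB"): (1.5, 0.0, 1.0),
    (2, "CA"): (3.0, 4.0, 1.0),
    (3, "CA"): (10.0, 0.0, 9.0),
}

value, n_atoms = local_rmsd(model_atoms, reference_atoms, centre=(0.0, 0.0, 0.0))
print(f"local RMSD: {value:.3f} over {n_atoms} atoms")
```

The centre could, for example, be the centroid of the catalytic residues in the reference structure; which residues to use is left open here.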
Discussion
TODO:
- Discuss your results (You do not need to calculate correlation coefficients, a qualitative estimation is enough.):
- How do the RMSD and GDT correlate? Is one score more helpful in finding meaningful models?
- Do you see any correlation between the quality scores provided by the modelling tools and the RMSD/GDT?
- Is any method systematically better at predicting the structure?
- Does this depend on the similarity of the template?
- Can you imagine any other kind of information that might improve the models?
- For Modeller: How does including more templates change the model quality?
Sources
Modeller:
1) N. Eswar, M. A. Marti-Renom, B. Webb, M. S. Madhusudhan, D. Eramian, M. Shen, U. Pieper, A. Sali. Comparative Protein Structure Modeling With MODELLER. Current Protocols in Bioinformatics, John Wiley & Sons, Inc., Supplement 15, 5.6.1-5.6.30, 2006.
2) M.A. Marti-Renom, A. Stuart, A. Fiser, R. Sánchez, F. Melo, A. Sali. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.
3) A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993.
4) A. Fiser, R.K. Do, & A. Sali. Modeling of loops in protein structures. Protein Science 9, 1753-1773, 2000.
Swiss-Model:
1) Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modeling. Bioinformatics, 22, 195-201.
2) Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Research 31: 3381-3385.
3) Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Electrophoresis 18: 2714-2723.
4) Benkert P, Biasini M, Schwede T. (2011). "Toward the estimation of the absolute quality of individual protein structure models." Bioinformatics, 27(3):343-50. (QMEAN-score)
iTasser:
1) Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010).
2) Yang Zhang. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9:40 (2008).
3) Ambrish Roy, Jianyi Yang, Yang Zhang. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Research, vol 40, W471-W477 (2012).