Gaucher Disease: Task 05 - Homology Modelling

From Bioinformatikpedia

<css>

table.colBasic2 { margin-left: auto; margin-right: auto; border: 1px solid black; border-collapse:collapse; }

.colBasic2 th,td { padding: 3px; border: 1px solid black; }

.colBasic2 td { text-align:left; }

/* for orange try #ff7f00 and #ffaa56; for blue try #005fbf and #aad4ff

maria's style blue: #adceff grey: #efefef */

.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}

</css>

Calculation of models

For information on the selected structure set see lab journal.

We created two single-template models of our protein sequence, P04062, with each of the three tools (Modeller, Swiss-Model and iTasser): one using a template with high sequence identity, 2XWD_A, and one using a template with low sequence identity, 2WNW_A.

For Modeller we additionally ran the multiple-template modeling mode. In this mode, Modeller first aligns the user-selected templates, then adds the target to the MSA, which is finally used for modeling. We tried the following template combinations:

  • close homologues (> 60% sequence identity): all four (3KE0_A, 2XWD_A, 2WKL_A and 2NSX_A)
  • distant homologues (< 30% sequence identity): all three (2WNW_A, 1VFF_A and 3II1_A)
  • close and distant homologues: 2XWD_A and 2WNW_A
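The grouping above depends on the percent sequence identity between target and template. The following is a minimal sketch of how such a value could be computed from a pairwise alignment and mapped onto the two thresholds used here. It assumes identity is counted over columns where both sequences have a residue (other definitions exist), and the two aligned strings are toy sequences made up for illustration, not the real P04062 alignments.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def template_class(identity: float) -> str:
    """Map an identity value onto the thresholds used in this task."""
    if identity > 60.0:
        return "close"
    if identity < 30.0:
        return "distant"
    return "intermediate"


# Toy alignment (hypothetical sequences, for illustration only).
toy_target = "MKT-AYIAKQR"
toy_template = "MKTQAYLA-QR"
toy_identity = percent_identity(toy_target, toy_template)
print(round(toy_identity, 1), template_class(toy_identity))
```

Templates between 30% and 60% identity fall into neither group and were not used in the combinations listed above.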

Evaluation of models

In the following we present the results for the created models. We compare the models to two reference structures, 1OGS_A and 2V3E_B, by superimposing them and calculating the C_alpha RMSD with Pymol as well as the all-atom RMSD, GDT-TS score and TM-score with the TMscore tool. In addition, we compare the main scores reported by the modelling programs.

  • Select one apo and one complex structure if there are several experimental structures -> document your choice of reference

We selected 1OGS_A for consistency (we already used it in task 04). Moreover, as there are many reference structures for glucocerebrosidase, we chose a second structure for comparison: 2V3E_B. Both structures have a resolution of 2.0 Å; however, 2V3E was resolved at neutral pH, whereas 1OGS was resolved at an acidic pH of 4.5. (Much more detailed information about the structures is given in the lab journal of task 09.)

For a brief explanation of how the programs were run and how the scores were calculated, please consult the lab journal.

Modeller

<figtable id="single_templates">

{| class="colBasic2"
! colspan="5" | High sequence identity
|-
! Template
| colspan="4" | 2XWD_A
|-
! Alignment method
| colspan="2" | malign
| colspan="2" | salign
|-
! DOPE score
| colspan="2" | -64821.74609
| colspan="2" | -64821.746094
|-
! Reference targets
| 1OGS_A || 2V3E_B || 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 0.284 (408) || 0.194 (444) || 0.284 (408) || 0.194 (444)
|-
! RMSD in Å (of 497 common residues)
| 23.987 || 24.032 || 23.987 || 24.032
|-
! TM score
| 0.2160 || 0.2167 || 0.2160 || 0.2167
|-
! GDT-TS score (0-1)
| 0.0664 || 0.0669 || 0.0664 || 0.0669
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green), the high sequence identity template 2XWD_A (blue) and the Modeller model (purple), created with "malign" method.
| Visualization of the reference target structure 2V3E_B (limegreen), the high sequence identity template 2XWD_A (blue) and the Modeller model (purple), created with "malign" method.
| Visualization of the reference target structure 1OGS_A (green), the high sequence identity template 2XWD_A (blue) and the Modeller model (purple), created with "salign" method.
| Visualization of the reference target structure 2V3E_B (limegreen), the high sequence identity template 2XWD_A (blue) and the Modeller model (purple), created with "salign" method.
|-
! colspan="5" | Low sequence identity
|-
! Template
| colspan="4" | 2WNW_A
|-
! Alignment method
| colspan="2" | malign
| colspan="2" | salign
|-
! DOPE score
| colspan="2" | -57316.70703
| colspan="2" | -53548.45703
|-
! Reference targets
| 1OGS_A || 2V3E_B || 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 1.217 (340) || 1.116 (336) || 1.523 (337) || 1.523 (337)
|-
! RMSD in Å (of 497 common residues)
| 22.979 || 22.980 || 22.972 || 22.949
|-
! TM score
| 0.2301 || 0.2307 || 0.2257 || 0.2249
|-
! GDT-TS score (0-1)
| 0.0659 || 0.0664 || 0.0634 || 0.0649
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green), the low sequence identity template 2WNW_A (blue) and the Modeller model (purple), created with "malign" method.
| Visualization of the reference target structure 2V3E_B (limegreen), the low sequence identity template 2WNW_A (blue) and the Modeller model (purple), created with "malign" method.
| Visualization of the reference target structure 1OGS_A (green), the low sequence identity template 2WNW_A (blue) and the Modeller model (purple), created with "salign" method.
| Visualization of the reference target structure 2V3E_B (limegreen), the low sequence identity template 2WNW_A (blue) and the Modeller model (purple), created with "salign" method.
|}
Modeller results of the modeling with single templates and comparison with two reference template structures. Two alignment methods were used: "malign" (classical pairwise sequence alignment) and "salign" (inclusion of 2D-structural information).

</figtable>


<figtable id="multiple_templates">

{| class="colBasic2"
! colspan="3" | High sequence identity
|-
! Templates
| colspan="2" | 3KE0_A, 2XWD_A, 2WKL_A, 2NSX_A
|-
! DOPE score
| colspan="2" | NA
|-
! Reference targets
| 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 0.272 (439) || 0.337 (413)
|-
! RMSD in Å (of 497 common residues)
| 24.370 || 24.402
|-
! TM score
| 0.2173 || 0.2175
|-
! GDT-TS score (0-1)
| 0.0664 || 0.0674
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green) and the Modeller model created from high sequence identity structures 3KE0_A, 2XWD_A, 2WKL_A and 2NSX_A (purple).
| Visualization of the reference target structure 2V3E_B (limegreen) and the Modeller model created from high sequence identity structures 3KE0_A, 2XWD_A, 2WKL_A and 2NSX_A (purple).
|-
! colspan="3" | Low sequence identity
|-
! Templates
| colspan="2" | 2WNW_A, 1VFF_A, 3II1_A
|-
! DOPE score
| colspan="2" | NA
|-
! Reference targets
| 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 20.935 (471) || 21.012 (473)
|-
! RMSD in Å (of 497 common residues)
| 22.759 || 22.875
|-
! TM score
| 0.1931 || 0.1924
|-
! GDT-TS score (0-1)
| 0.0508 || 0.0523
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green) and the Modeller model created from low sequence identity structures 2WNW_A, 1VFF_A and 3II1_A (purple).
| Visualization of the reference target structure 2V3E_B (limegreen) and the Modeller model created from low sequence identity structures 2WNW_A, 1VFF_A and 3II1_A (purple).
|-
! colspan="3" | Mixed sequence identity
|-
! Templates
| colspan="2" | 2XWD_A, 2WNW_A
|-
! DOPE score
| colspan="2" | NA
|-
! Reference targets
| 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 0.328 (418) || 0.228 (450)
|-
! RMSD in Å (of 497 common residues)
| 23.625 || 23.659
|-
! TM score
| 0.2170 || 0.2175
|-
! GDT-TS score (0-1)
| 0.0664 || 0.0669
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green) and the Modeller model created from high sequence identity 2XWD_A and low sequence identity 2WNW_A (purple).
| Visualization of the reference target structure 2V3E_B (limegreen) and the Modeller model created from high sequence identity 2XWD_A and low sequence identity 2WNW_A (purple).
|}
Modeller results of the modeling with multiple templates and comparison with two reference template structures. The "salign" method was used (alignment using 2D information).

</figtable>

The DOPE score is an energy score, so the model with the lowest (most negative) DOPE score is considered the best. (Modeller scores are also explained in the tutorial.)
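As a small illustration of "lowest DOPE is best", the sketch below ranks the four single-template Modeller models by the DOPE values reported in the table above. The dictionary layout and variable names are our own choice for illustration; the comparison is only meaningful because all models describe the same target sequence.

```python
# DOPE scores of the single-template Modeller models, copied from the table above.
dope_scores = {
    ("2XWD_A", "malign"): -64821.74609,
    ("2XWD_A", "salign"): -64821.746094,
    ("2WNW_A", "malign"): -57316.70703,
    ("2WNW_A", "salign"): -53548.45703,
}

# Sort ascending: the most negative (best) energy comes first.
dope_ranking = sorted(dope_scores.items(), key=lambda item: item[1])
for (template, method), score in dope_ranking:
    print(f"{score:14.3f}  {template}  {method}")

best_template = dope_ranking[0][0][0]
worst_model = dope_ranking[-1][0]
```

Both 2XWD_A models have practically the same DOPE score, so the two alignment methods are indistinguishable for the high sequence identity template, whereas for 2WNW_A the "malign" model scores better than the "salign" model.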

Swiss-Model

<figtable id="swiss-model">

{| class="colBasic2"
! colspan="3" | High sequence identity
|-
! Template
| colspan="2" | 2XWD_A
|-
! QMEAN Z-score
| colspan="2" | -1.37
|-
! Reference targets
| 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 0.305 (407) || 0.198 (447)
|-
! RMSD in Å (of # common residues)
| 23.688 (458) || 23.737 (458)
|-
! TM score
| 0.2137 || 0.2143
|-
! GDT-TS score (0-1)
| 0.0654 || 0.0664
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green) and the Swiss-model model created from the high sequence identity structure 2XWD_A (purple).
| Visualization of the reference target structure 2V3E_B (limegreen) and the Swiss-model model created from high sequence identity structure 2XWD_A (purple).
|-
! colspan="3" | Low sequence identity
|-
! Template
| colspan="2" | 2WNW_A
|-
! QMEAN Z-score
| colspan="2" | -4.1
|-
! Reference targets
| 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 1.105 (368) || 1.032 (372)
|-
! RMSD in Å (of # common residues)
| 22.058 (422) || 22.022 (422)
|-
! TM score
| 0.2129 || 0.2138
|-
! GDT-TS score (0-1)
| 0.0634 || 0.0653
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green) and the Swiss-model model created from the low sequence identity structure 2WNW_A (purple).
| Visualization of the reference target structure 2V3E_B (limegreen) and the Swiss-model model created from low sequence identity structure 2WNW_A (purple).
|}
Swiss-model results of the modeling with a template with high and low sequence identity and comparison with two reference template structures.

</figtable>

Swiss-model's QMEAN4 score is explained on the result page:
The QMEAN4 score is a composite score consisting of a linear combination of 4 statistical potential terms:

  • C_beta interaction energy
  • All-atom pairwise energy
  • Solvation energy
  • Torsion angle energy

The score is an estimate of model reliability between 0 and 1. The pseudo-energies of the contributing terms are calculated together with their Z-scores with respect to scores obtained for high-resolution experimental structures of similar size solved by X-ray crystallography.
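For orientation, a Z-score in the usual statistical sense relates a model's score to the distribution of scores in a reference set. The symbols below are our own notation, not taken from the Swiss-Model result page: S_model is the model's score, and mu_ref and sigma_ref are the mean and standard deviation of the same score over the reference structures.

```latex
Z = \frac{S_{\mathrm{model}} - \mu_{\mathrm{ref}}}{\sigma_{\mathrm{ref}}}
```

Read this way, the QMEAN Z-score of -1.37 for the 2XWD_A model would place it 1.37 standard deviations below the mean of the reference structures, and the -4.1 of the 2WNW_A model about four standard deviations below it.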

iTasser

<figtable id="itasser">

{| class="colBasic2"
! colspan="3" | High sequence identity
|-
! Template
| colspan="2" | 2XWD_A
|-
! C-score
| colspan="2" | -0.33
|-
! Predicted RMSD
| colspan="2" | 8.2±4.5
|-
! Predicted TM score
| colspan="2" | 0.67±0.13
|-
! Reference targets
| 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 0.692 (461) || 0.755 (467)
|-
! RMSD in Å (of # common residues)
| 24.427 (497) || 24.487 (497)
|-
! TM score
| 0.2171 || 0.2175
|-
! GDT-TS score (0-1)
| 0.0674 || 0.0674
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green) and the best iTasser model created from the high sequence identity structure 2XWD_A (purple).
| Visualization of the reference target structure 2V3E_B (limegreen) and the best iTasser model created from high sequence identity structure 2XWD_A (purple).
|-
! colspan="3" | Low sequence identity
|-
! Template
| colspan="2" | 2WNW_A
|-
! C-score
| colspan="2" | -0.40
|-
! Predicted RMSD
| colspan="2" | 8.4±4.5
|-
! Predicted TM score
| colspan="2" | 0.66±0.13
|-
! Reference targets
| 1OGS_A || 2V3E_B
|-
! C_alpha RMSD in Å (# atom pairs)
| 1.232 (431) || 1.185 (439)
|-
! RMSD in Å (of # common residues)
| 24.087 (497) || 24.147 (497)
|-
! TM score
| 0.2196 || 0.2208
|-
! GDT-TS score (0-1)
| 0.0694 || 0.0699
|-
! Pymol visualization
| Visualization of the reference target structure 1OGS_A (green) and the iTasser model created from the low sequence identity structure 2WNW_A (purple).
| Visualization of the reference target structure 2V3E_B (limegreen) and the iTasser model created from low sequence identity structure 2WNW_A (purple).
|}
iTasser results of the modeling with a template with high and low sequence identity and comparison with two reference template structures.

</figtable>

iTasser scores are explained on the results page:

C-score is a confidence score for estimating the quality of predicted models by I-TASSER. It is calculated based on the significance of threading template alignments and the convergence parameters of the structure assembly simulations. C-score is typically in the range of [-5,2], where a C-score of higher value signifies a model with a high confidence and vice-versa.

TM-score and RMSD are known standards for measuring structural similarity between two structures and are usually used to measure the accuracy of structure modeling when the native structure is known. When the native structure is not known, it becomes necessary to predict the quality of the modeling prediction, i.e. what is the distance between the predicted model and the native structure? To answer this question, we tried to predict the TM-score and RMSD of the predicted models relative to the native structures based on the C-score.

TM-score is a recently proposed scale for measuring the structural similarity between two structures (see Zhang and Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710). The purpose of TM-score is to solve the problem that RMSD is sensitive to local errors. Because RMSD is an average distance over all residue pairs in two structures, a local error (e.g. a misorientation of the tail) gives rise to a large RMSD value although the global topology is correct. In TM-score, however, small distances are weighted more strongly than large distances, which makes the score insensitive to local modeling errors. A TM-score>0.5 indicates a model of correct topology and a TM-score<0.17 means a random similarity. These cutoffs do not depend on the protein length.
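The different sensitivity of RMSD and TM-score to a local error can be shown with a few lines of Python. This is only a sketch: it evaluates both measures for one fixed superposition, whereas the TMscore tool searches for the superposition that maximises the score. The distance list is a made-up example (95 well-placed residues and a misplaced tail of 5 residues), not data from our models.

```python
import math


def rmsd(distances):
    """Root mean square of per-residue distances (in Angstrom)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score_fixed(distances, target_length):
    """TM-score sum for one fixed superposition, normalised by the target length.

    d0 is the length-dependent distance scale of Zhang and Skolnick (2004);
    the formula is only meaningful for target lengths well above 15 residues.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical example: 95 residues within 1 A, a tail of 5 residues 30 A away.
toy_distances = [1.0] * 95 + [30.0] * 5
toy_rmsd = rmsd(toy_distances)
toy_tm = tm_score_fixed(toy_distances, target_length=100)
print(f"RMSD = {toy_rmsd:.2f} A, TM-score = {toy_tm:.2f}")
```

In this toy case the five tail residues push the RMSD to almost 7 Å, while the TM-score stays far above the 0.5 cutoff for a correct topology.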

C_alpha RMSD summary

<xr id="models_RMSD"/> summarizes the C_alpha RMSD values computed by Pymol with 1OGS_A as the reference.

<figtable id="models_RMSD">

{| class="colBasic2"
! rowspan="2" | Program
! colspan="5" | Templates
|-
! high PIDE (2XWD) !! low PIDE (2WNW) !! high PIDE (4 temp.) !! low PIDE (3 temp.) !! mixed PIDE (2XWD & 2WNW)
|-
! Modeller
| 0.284 (408) || 1.523 (337) || 0.272 (439) || 20.935 (471) || 0.328 (418)
|-
! Swiss-Model
| 0.305 (407)
| 1.105 (368)
|
|
|
|-
! iTasser
| 0.692 (461)
| 1.232 (431)
|
|
|
|}
Pymol C_alpha RMSD (in Å) of alignments between 1OGS_A and the models of P04062 created with the different programs and templates. Only the Modeller models created with the "salign" method are considered here. The number of atoms considered in the calculation of the RMSD is given in brackets.

</figtable>

All atom RMSD calculation near the binding sites (extra diligence task)

Defining a radius of 6 Å around the ligands of 1OGS_A and 2V3E_B (the catalytic centers / binding sites), we calculated the all-atom RMSD in that region between the best model (Modeller, multiple templates with high PIDE) and the reference structures. The structure 1OGS_A has one ligand, NAG, which binds to a loop on the outer side of the protein. The structure 2V3E_B has multiple ligands that bind to three different binding sites:

  1. the same binding site as in 1OGS_A: 2 NAG molecules, BMA, FUC
  2. a loop on the other outer side: third NAG molecule
  3. a groove in the center of the protein: NND

<figure id="contact_ras" >

Pymol visualization of the binding sites within 5 Angstrom (sticks) of the ligands (spheres). 1OGS_A: green, 2V3E_B: limegreen, ligand NAG and binding site atoms of 1OGS_A: orange, ligands (3 NAG, BMA, FUC, NND) and binding site atoms of 2V3E_B: yelloworange, binding site atoms of the model (Modeller multiple templates with high PIDE): blue (to 1OGS_A ligand) and cyan (to 2V3E_B ligands).

</figure>

The resulting RMSD values are summarized in <xr id="binding_site_rmsd"/> and the Pymol visualization is shown in <xr id="contact_ras"/>.

<figtable id="binding_site_rmsd">

{| class="colBasic2"
! Reference !! 1OGS_A !! 2V3E_B
|-
! RMSD in Å within 6 Å of ligand(s) (# atoms)
| 0.266 (29) || 0.469 (108)
|}
All-atom RMSD between the atoms of the best model (Modeller, multiple templates with high PIDE) and the atoms of each reference (1OGS_A and 2V3E_B) that lie within a distance of 6 Angstrom of the bound ligand(s).

</figtable>

The all-atom RMSD values for the binding sites are low (0.266 and 0.469 Å), much lower than the respective all-atom RMSD values for the entire molecules (24.370 and 24.402 Å), and resemble the C-alpha RMSD values for the entire molecules (0.272 and 0.337 Å). The RMSD for the binding sites of 2V3E_B is higher than for the binding site of 1OGS_A, because in 2V3E_B all three binding sites are included and the number of atom pairs is higher (108 vs. 29). Overall, the binding site residues of the Modeller model align well with the references.
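The binding-site calculation can be sketched in plain Python as follows. This is not the Pymol procedure we ran; it is a simplified illustration that assumes the model and the reference are already superimposed and that atoms can be matched by an identical key. All coordinates and atom keys below are invented toy values.

```python
import math


def atoms_near_ligand(atoms, ligand_coords, radius=6.0):
    """Keys of all atoms within `radius` Angstrom of any ligand atom."""
    return {
        key for key, xyz in atoms.items()
        if any(math.dist(xyz, lig) <= radius for lig in ligand_coords)
    }


def binding_site_rmsd(model, reference, ligand_coords, radius=6.0):
    """All-atom RMSD over reference atoms near the ligand that also exist in the model."""
    site = atoms_near_ligand(reference, ligand_coords, radius)
    common = sorted(key for key in site if key in model)
    if not common:
        raise ValueError("no common atoms within the given radius")
    squared = [math.dist(model[key], reference[key]) ** 2 for key in common]
    return math.sqrt(sum(squared) / len(squared)), len(common)


# Toy data (hypothetical): one ligand atom at the origin, three protein atoms.
toy_ligand = [(0.0, 0.0, 0.0)]
toy_reference = {"A": (1.0, 0.0, 0.0), "B": (0.0, 5.0, 0.0), "C": (10.0, 0.0, 0.0)}
toy_model = {"A": (1.0, 0.3, 0.0), "B": (0.0, 5.0, 0.4), "C": (10.0, 3.0, 0.0)}

site_value, site_atoms = binding_site_rmsd(toy_model, toy_reference, toy_ligand)
print(f"{site_value:.3f} ({site_atoms})")
```

Atom "C" lies 10 Å from the ligand and is therefore excluded, so its large deviation does not affect the binding-site RMSD, which mirrors why the values in the table are so much lower than the whole-molecule all-atom RMSD.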

Discussion

All-atom vs. C-alpha RMSD

For both kinds of RMSD, some distant atom pairs beyond a threshold were omitted from the calculation. The all-atom RMSD is high for all models (about 22 to 24.5 Å) and is even higher for the models created with high sequence identity templates than for those created with low sequence identity templates. The Pymol C-alpha RMSD is much lower, and it is lower for models created with high sequence identity templates, which also superimpose well on the reference structures. Therefore, whenever we write "RMSD" in the following, we mean the C-alpha RMSD.
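The effect of omitting distant atom pairs can be illustrated with a simplified outlier-rejection loop. The sketch assumes a scheme like the one documented for Pymol's `align` command, which by default runs up to 5 refinement cycles and rejects pairs deviating by more than 2.0 times the current RMSD; unlike Pymol, the sketch does not re-superimpose the structures after each cycle. The distance list is a made-up example.

```python
import math


def rmsd(distances):
    """Root mean square of per-pair distances (in Angstrom)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def rmsd_with_rejection(distances, cutoff=2.0, cycles=5):
    """Iteratively drop pairs farther than cutoff * current RMSD, then report RMSD.

    Simplification: the superposition is kept fixed between cycles.
    """
    kept = list(distances)
    for _ in range(cycles):
        current = rmsd(kept)
        survivors = [d for d in kept if d <= cutoff * current]
        if len(survivors) == len(kept) or not survivors:
            break
        kept = survivors
    return rmsd(kept), len(kept)


# Hypothetical example: 400 pairs fit closely, 97 pairs are far apart.
toy_pairs = [0.3] * 400 + [25.0] * 97
rmsd_all = rmsd(toy_pairs)
rmsd_refined, pairs_kept = rmsd_with_rejection(toy_pairs)
print(f"all pairs: {rmsd_all:.2f} A, after rejection: {rmsd_refined:.2f} A ({pairs_kept})")
```

This is why the number of atom pairs is always reported next to the C-alpha RMSD in the tables: a low RMSD over few pairs is weaker evidence than the same RMSD over many pairs.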

Reference structures

In most cases, the scores hardly differ between 1OGS_A and 2V3E_B as the reference structure. In the following, we refer to the RMSD values relative to 1OGS_A, but the same trends also apply with 2V3E_B as the reference.

How do the RMSD and GDT correlate? Is one score more helpful in finding meaningful models?

In general, the lower the C_alpha RMSD, the higher the GDT-TS score, i.e. a good quality model has a high GDT score. The GDT score ranges between 0 and 1. However, the GDT scores we obtained only range between 0.0508 and 0.0699.
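To make the 0 to 1 range of the GDT-TS score concrete, here is a minimal sketch of the score for one fixed superposition: the average fraction of reference residues whose C-alpha atoms lie within 1, 2, 4 and 8 Å of the model. The real GDT procedure additionally searches over many superpositions for each threshold, which this sketch omits. The distances are invented for illustration.

```python
def gdt_ts_fixed(distances, n_reference):
    """GDT-TS for one fixed superposition, as a fraction between 0 and 1."""
    thresholds = (1.0, 2.0, 4.0, 8.0)
    fractions = [
        sum(1 for d in distances if d <= t) / n_reference for t in thresholds
    ]
    return sum(fractions) / len(thresholds)


# Hypothetical example with 100 reference residues.
toy_distances = [0.5] * 50 + [3.0] * 25 + [20.0] * 25
toy_gdt = gdt_ts_fixed(toy_distances, n_reference=100)
print(toy_gdt)
```

A score around 0.05 to 0.07, as obtained for our models, would mean that on average only about 5 to 7% of the residues fall within the thresholds for the residue pairing that was evaluated.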

Do you see any correlation between the quality scores provided by the modelling tools and the RMSD/GDT?

Modeller's DOPE score becomes lower (i.e. better energy) with falling RMSD and rising GDT score. (No DOPE score was reported for the multiple-template models.)

The Swiss-Model QMEAN Z-score becomes more negative with higher RMSD and lower GDT score. This means that models with a higher (less negative) QMEAN Z-score are better. The same relationship holds for the iTasser C-score. ("C-score is typically in the range of [-5,2], where a C-score of higher value signifies a model with a high confidence and vice-versa.")
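The relationship between DOPE and C_alpha RMSD can be checked numerically with a Pearson correlation over the four single-template Modeller models (values copied from the Modeller table, reference 1OGS_A). With only four data points, two of which are practically identical, this is an illustration of the trend rather than a statistical test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (DOPE score, C_alpha RMSD vs. 1OGS_A) for 2XWD_A malign/salign and 2WNW_A malign/salign.
dope = [-64821.74609, -64821.746094, -57316.70703, -53548.45703]
ca_rmsd = [0.284, 0.284, 1.217, 1.523]

r_dope_rmsd = pearson(dope, ca_rmsd)
print(f"r = {r_dope_rmsd:.3f}")
```

A coefficient close to +1 means that a higher (worse) DOPE score goes along with a higher RMSD in these four models.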

Is any method systematically better at predicting the structure and does this depend on the similarity of the template?

Looking again at the C_alpha RMSD of all models (see <xr id="models_RMSD"/>), Modeller produced the two best models: one using multiple high PIDE templates (RMSD=0.272) and one using the high PIDE structure 2XWD as a template (RMSD=0.284). The model using mixed PIDE templates was also relatively good (RMSD=0.328). The low PIDE template (2WNW) yielded worse results (RMSD=1.523), and using 3 low PIDE templates led to an RMSD of 20.935.

Swiss-Model produced the third best model (RMSD=0.305), also using the high PIDE template. Remarkably, it was the best method with the low PIDE template (RMSD=1.105): the superposed model looks very similar to the references and is the best among the models created with a low PIDE template.

The iTasser high PIDE model is less accurate (RMSD=0.692). However, the low PIDE model is relatively good and comes second after Swiss-Model (RMSD=1.232).
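The ranking discussed above can be reproduced from the summary table with a short script. The values are the C_alpha RMSD values against 1OGS_A from the table; the dictionary layout is our own choice for illustration.

```python
# C_alpha RMSD (Angstrom) vs. 1OGS_A, copied from the summary table.
ca_rmsd_by_model = {
    ("Modeller", "high PIDE (2XWD)"): 0.284,
    ("Modeller", "low PIDE (2WNW)"): 1.523,
    ("Modeller", "high PIDE (4 temp.)"): 0.272,
    ("Modeller", "low PIDE (3 temp.)"): 20.935,
    ("Modeller", "mixed PIDE (2XWD & 2WNW)"): 0.328,
    ("Swiss-Model", "high PIDE (2XWD)"): 0.305,
    ("Swiss-Model", "low PIDE (2WNW)"): 1.105,
    ("iTasser", "high PIDE (2XWD)"): 0.692,
    ("iTasser", "low PIDE (2WNW)"): 1.232,
}

# Sort ascending: the lowest RMSD (best model) comes first.
model_ranking = sorted(ca_rmsd_by_model.items(), key=lambda item: item[1])
for (program, templates), value in model_ranking:
    print(f"{value:7.3f}  {program:<12} {templates}")

# Best program for the single low PIDE template.
low_pide = {k: v for k, v in ca_rmsd_by_model.items() if k[1] == "low PIDE (2WNW)"}
best_low_pide_program = min(low_pide, key=low_pide.get)[0]
```

Note that the ranking ignores the number of atom pairs behind each RMSD value, which differs between the models.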

For Modeller: How does including more templates change the model quality?

Using 4 high PIDE templates did not improve the results as much as we had expected (RMSD 0.272 vs. 0.284 with the single template). Combining the templates in one alignment may have introduced some "noise". In the equivalent case of low PIDE templates, this noise became much worse and the RMSD rose from 1.523 (1 low PIDE template) to 20.935 (3 low PIDE templates). However, adding one high PIDE template to a low PIDE template significantly improved the model quality (from 1.523 to 0.328).

Sources

Modeller:

1) N. Eswar, M. A. Marti-Renom, B. Webb, M. S. Madhusudhan, D. Eramian, M. Shen, U. Pieper, A. Sali. Comparative Protein Structure Modeling With MODELLER. Current Protocols in Bioinformatics, John Wiley & Sons, Inc., Supplement 15, 5.6.1-5.6.30, 2006.

2) M.A. Marti-Renom, A. Stuart, A. Fiser, R. Sánchez, F. Melo, A. Sali. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.

3) A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993.

4) A. Fiser, R.K. Do, & A. Sali. Modeling of loops in protein structures. Protein Science 9, 1753-1773, 2000.

5) Tutorial [1]

Swiss-Model:

1) Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modeling. Bioinformatics, 22,195-201.

2) Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Research 31: 3381-3385.

3) Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Electrophoresis 18: 2714-2723.

4) Benkert P, Biasini M, Schwede T. (2011). "Toward the estimation of the absolute quality of individual protein structure models." Bioinformatics, 27(3):343-50. (QMEAN-score)

5) Server [2]

iTasser:

1) Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010).

2) Yang Zhang. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9:40 (2008).

3) Ambrish Roy, Jianyi Yang, Yang Zhang. COFACTOR: an accurate comparative algorithm for structure-based protein function annotation. Nucleic Acids Research, vol 40, W471-W477 (2012).

4) Server [3]