Canavan Task 4 - Homology based structure predictions
- 1 Protocol
- 2 Template Identification
- 3 Comparison of Aspartoacylase Structures
- 4 Modeller
- 5 SwissModel
- 6 I-Tasser
- 7 Modeller - Multimodel from templates with < 30% seq ID
- 8 Editing the Alignments
- 9 Jigsaw
- 10 Evaluation of Methods
- 11 General conclusions
- 12 References
Protocol
Commands, source code and other methodological details are kept in the protocol.
Template Identification
Since we had not identified suitable homologues so far, we used HHPred and COMA to search for them. The results of both searches are summarized in <xr id="templates"/>.
With HHPred we received 42 hits using standard parameters. Changing the E-value threshold for MSA generation and the number of sequences shown per HMM did not result in more hits. There is at least one hit for each sequence identity category.
COMA yielded 22 results, of which only one structure has a sequence identity of more than 40%. Interestingly, COMA did not find the highest-ranked hit from the HHPred output (2GU2, 87%). In general, COMA generates longer alignments with correspondingly lower sequence identities. Running COMA with less restrictive E-value thresholds results in more diverse hits, e.g. several carboxypeptidases.
Bitto et al. (2007) <ref name="pnas_aspa_structure">Eduard Bitto, Craig A. Bingman, Gary E. Wesenberg, Jason G. McCoy, and George N. Phillips, Jr., Structure of aspartoacylase, the brain enzyme impaired in Canavan disease, Proc Natl Acad Sci U S A. 2007 January 9; 104(2): 456–461. </ref> state that the "N-terminal domain of aspartoacylase adopts a protein fold similar to that of zinc-dependent hydrolases related to carboxypeptidases A. The catalytic site of aspartoacylase shows close structural similarity to those of carboxypeptidases despite only 10–13% sequence identity between these proteins". It is therefore plausible to find several carboxypeptidases as hits with low sequence identity in the results of HHPred as well as of COMA.
We decided to use those templates that were found by both methods (see <xr id="templates"/>), plus the rat homologue found by HHPred.
|Category||HHPred: PDB ID||Organism||Protein Name||Seq ID||Alignment length||COMA: PDB ID||Organism||Protein Name||Seq ID||Alignment length|
|Seq Id > 80 %||2GU2||Rattus norvegicus||ASPA protein||87%||306|
|Seq Id 40 - 80%||3NH4||Mus musculus||ASPA protein||43%||304||3NFZ||Mus musculus||Aspartoacylase-2||42%||300|
|Seq Id < 30%||1YW4||Chromobacterium violaceum||Succinylglutamate desuccinylase||15%||250||1YW4||Chromobacterium violaceum||Succinylglutamate desuccinylase||12%||331|
|Seq Id < 30%||3CDX||Rhodobacter sphaeroides 2||Succinylglutamate desuccinylase/aspartoacylase||15%||251||3CDX||Rhodobacter sphaeroides 2||Succinylglutamate desuccinylase/aspartoacylase||12%||330|
|Seq Id < 30%||2QJ8||Mesorhizobium loti||Hydrolase||21%||261||2QJ8||Mesorhizobium loti||Mlr6093 protein||15%||314|
|Seq Id < 30%||1KWM||Homo sapiens (Human)||Procarboxypeptidase B||11%||192||3GLJ||Sus scrofa (Pig)||Carboxypeptidase B||8%||262|
Comparison of Aspartoacylase Structures
To be able to assess the quality of the homology models, we briefly introduce the Aspartoacylase structure. The PDB contains several structures of human aspartoacylase:
- Apo-structure: 2O53: Resolution: 2.7 Å, R-free: 0.269, chains: A, B
- Holo-structure: 2O4H: Resolution: 2.7 Å, R-free: 0.271, chains: A, B, intermediate substrate analog: N-phosphonomethyl-L-aspartate
- Apo-structure: 2I3C: Resolution: 2.8 Å, R-free: 0.243, chains: A, B
- Ensemble refinement: 2Q51: Resolution: 2.8 Å, R-free: 0.239, chains: A, B
For 2O4H, PDBSum states: "Ligand matches this enzyme's product L-aspartate with similarity 69.23%".
Superposing the four crystal structures results in low RMSD values. Visual inspection of the superposition also reveals hardly any differences (see <xr id="aspa_superpos"/>, <xr id="aspa_superpos_binding"/>). In 2Q51 the beta sheet formed by residues 218-223 and 299-306 is not rendered as a sheet in PyMOL, which indicates slight deviations of the backbone angles from the strict definition of a beta sheet.
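For reference, the RMSD used throughout this page is the square root of the mean squared distance between matched atoms. The following is a minimal sketch, assuming the two structures are already superposed and their atoms are paired; the function name `rmsd` and the toy coordinates are illustrative and not taken from the protocol.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that
    coords_a[i] and coords_b[i] describe the same atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy example with made-up coordinates: two atoms, each displaced by 1 Angstrom.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
print(rmsd(reference, model))
```

Because the value is an average over the matched atoms only, it always has to be read together with the number of atoms that entered the calculation.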
<figtable id="aspa_structures_superposed" >
|<figure id="aspa_superpos">||<figure id="aspa_superpos_binding">|
Major differences occur only for residues 158-164, which form a loop that is involved in opening and closing the channel. In <xr id="aspa_superpos_loop"/> the different conformations of this gating loop are emphasized. 2O53 and 2O4H both represent the closed conformation (see <xr id="aspa_superpos_closed"/>), whereas 2I3C and 2Q51 represent the open conformation (see <xr id="aspa_superpos_open"/>).
|<figure id="aspa_superpos_loop">||<figure id="aspa_superpos_closed">||<figure id="aspa_superpos_open">|
We decided to use 2O53 and 2O4H, since they are the two latest structures with the best resolution. Furthermore, they were solved by the same group, once with bound ligand and once in the apo form. As the reference structure we chose 2O53: the residues in and around the binding sites of 2O53 and 2O4H differ slightly, and because most templates do not have a co-crystallized ligand, the models based on these templates are expected to be more similar to the Aspartoacylase structure without bound ligand.
For taking a closer look at both crystal structures of Aspartoacylase, which might be helpful for the following analysis, please refer to Task 1 > Protein.
Modeller
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
2GU2 was crystallized as a dimer. We only used the monomer (chain A) as a template for modelling Aspartoacylase.
The two alignment methods that Modeller provides yielded almost identical results (they differ only for the first 6 residues, see protocol for details). We chose the align2d alignment.
The model and the native structure can be superimposed with an RMSD of 0.371. Visual inspection of the superposition of the model with the Aspartoacylase structure shows that there are only minor differences between the two structures (compare <xr id="2gu2_modeller_overall"/>). Some loop regions deviate from the original structure, for example the loop formed by residues 158-164. This is the loop involved in opening and closing the channel, which needs to be very flexible for this purpose. In the most important part, the binding site, the agreement is very good (compare <xr id="2gu2_modeller_binding"/>). Only Y164, which is situated on the flexible loop, has a completely different position in the model.
|<figure id="2gu2_modeller_overall">||<figure id="2gu2_modeller_binding">|
The Ramachandran plot (see: File:2gu2 model rama.pdf) for the model detects only one outlier residue: Asn-236. The overall backbone geometry of the model therefore appears to be sound.
|DOPE Score||Weighted Rmsd (302 Ca-atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (18 residues)|
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
The two different alignment methods provided by Modeller yielded completely identical results.
Again, the model and the native structure are very similar (compare <xr id="3nfz_modeller_overall"/>). Yet, compared to the model based on template 2GU2, there are more deviations, especially in the loop regions. When taking a closer look at the binding site, one can identify some small differences. R71 and R168 have different orientations as well as Y164 (compare <xr id="3nfz_modeller_binding"/>).
The Ramachandran Plot (see File:3nfz model rama.pdf) for the model identifies 5 outlier residues (131 SER, 148 ALA, 161 SER, 174 PRO, 227 GLU), but still more than 98.4% of the residues lie in allowed regions of the plot.
|<figure id="3nfz_modeller_overall">||<figure id="3nfz_modeller_binding">|
The generated model has a DOPE score of -34539.4, which is only slightly higher (i.e. worse) than for the model based on the 2GU2 template. The TM, GDT_HA and GDT_TS scores are also very high, yet lower than for the 87% sequence identity template 2GU2. Both GDT scores drop considerably more than the TM-score, which shows only a small decrease.
|DOPE Score||Weighted Rmsd (302 Ca-atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (16 residues)|
|-34539.4||0.775||0.9641||0.7152||0.8932||0.370|
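Why the GDT scores react more strongly than the TM-score can be seen from their definitions. The sketch below is a simplified illustration, assuming the per-residue C-alpha distances of one fixed superposition are already known; the actual TM-score program additionally searches for the superposition that maximizes each score, which is not reproduced here. The function names and the example distances are made up for illustration.

```python
def d0(target_length):
    """Length-dependent distance scale used by the TM-score."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_length):
    """TM-score of one fixed superposition; distances in Angstrom."""
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length

def gdt(distances, target_length, cutoffs):
    """Mean fraction of target residues lying within each distance cutoff."""
    fractions = [sum(d <= c for d in distances) / target_length for c in cutoffs]
    return sum(fractions) / len(cutoffs)

def gdt_ts(distances, target_length):
    return gdt(distances, target_length, (1.0, 2.0, 4.0, 8.0))

def gdt_ha(distances, target_length):
    return gdt(distances, target_length, (0.5, 1.0, 2.0, 4.0))

# Made-up example: 100 residues, 80 of them 0.8 A off and 20 of them 3.0 A off.
distances = [0.8] * 80 + [3.0] * 20
print(round(tm_score(distances, 100), 3))
print(round(gdt_ts(distances, 100), 3))
print(round(gdt_ha(distances, 100), 3))
```

In this toy case the sub-Angstrom deviations barely affect the TM-score, whereas GDT_HA loses its entire 0.5 Å bin. This matches the pattern in the tables, where GDT_HA separates the 2GU2 and 3NFZ models much more clearly than the TM-score does.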
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
Protein 1YW4 is a succinylglutamate desuccinylase, i.e., it belongs to the same Pfam family as Aspartoacylase. Still, the sequence identity is only 15% over an alignment length of 250 residues (HHPred), and 12% over 331 residues (COMA). Considering the HSSP curve as first presented by Sander and Schneider in 1991 (<xr id="hssp"/>), we are definitely in an unsafe region for homology modelling.
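The HSSP curve is commonly quoted as a length-dependent identity threshold: 290.15 · L^(-0.562) percent for alignment lengths L between 10 and 80 residues, and a constant 24.8% for longer alignments. The following minimal sketch assumes this form of the curve and applies it to some of the identities and alignment lengths from the template table; the function names are illustrative.

```python
def hssp_threshold(alignment_length):
    """Percent identity above which structural similarity is considered safe."""
    if alignment_length < 10:
        raise ValueError("the curve is not defined for alignments shorter than 10 residues")
    if alignment_length <= 80:
        return 290.15 * alignment_length ** -0.562
    return 24.8

def is_safe(identity_percent, alignment_length):
    return identity_percent > hssp_threshold(alignment_length)

# (template, search method): (percent identity, alignment length) from the template table
hits = {
    ("2GU2", "HHPred"): (87, 306),
    ("3NFZ", "COMA"): (42, 300),
    ("1YW4", "HHPred"): (15, 250),
    ("1YW4", "COMA"): (12, 331),
    ("1KWM", "HHPred"): (11, 192),
}
for (template, method), (identity, length) in hits.items():
    verdict = "safe" if is_safe(identity, length) else "unsafe"
    print(template, method, verdict)
```

All alignments here are longer than 80 residues, so the constant 24.8% threshold applies: 2GU2 and 3NFZ lie well above it, 1YW4 and 1KWM well below.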
1YW4 was also crystallized as a dimer, and again, we only used the monomer (chain A) as a template for modelling Aspartoacylase. There were some minor differences in the alignment methods, and we chose the align-2D alignment to include secondary structure information.
A superposition of the model and the reference can be found in <xr id="1yw4_modeller_betasheets"/>. It is easy to see that the modelling failed. Even the apparent similarity, the buried beta-sheets, cannot be taken as correct, since the sheets are formed by different sequence regions in model and reference. The binding site residues are not even located in the same region but are scattered all over the structure (<xr id="1yw4_modeller_binding"/>).
|<figure id="1yw4_modeller_betasheets">||<figure id="1yw4_modeller_binding">|
The failure of model building is also reflected in the scores. The DOPE score is about 10,000 points higher than for 2GU2, and the RMSD almost reaches 2 Å. Furthermore, the TM and GDT scores are very low.
|DOPE Score||Weighted Rmsd (257 Ca-atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (16 residues)|
|-26201.18750||1.899||0.1760||0.0364||0.0704||no conserved binding site|
Template < 30% Seq Id: 1KWM (Human)
1KWM is a procarboxypeptidase of type B. Since, as stated before, a strong structural relationship between carboxypeptidases and Aspartoacylase has been reported, we wanted to find out about possible similarities between the two types of proteins and therefore also performed homology modelling with 1KWM. Its sequence identity is only 11% over an alignment length of 192 amino acids.
However, we found the similarities to be far from striking: again, with a template so dissimilar to the target, homology modelling fails. See <xr id="1kwm_modeller_full"/> and <xr id= "1kwm_modeller_binding"/>.
We tried to improve the model by comparing the alignments calculated by Modeller to those of BLAST, and by inspecting the alignments and trying to place binding site residues of the ASPA sequence at equivalent residues of 1KWM. However, the differences to BLAST were small, identifying corresponding binding site residues was often difficult, and including the new alignments in the prediction did not improve the results.
|<figure id="1kwm_modeller_full">||<figure id="1kwm_modeller_binding">|
Note that, even though the weighted RMSD provided by SAP does not look too bad at first glance, it is calculated for only 187 superimposed residues. The RMSD calculated by TM-score is 19.529 for 302 superimposed residues. As for the other low sequence identity template, the TM and GDT scores are very low.
|DOPE Score||Weighted Rmsd (187 Ca-atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (16 residues)|
|-16439.97461||2.146||0.1885||0.0430||0.0745||no conserved binding site|
SwissModel
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
SwissModel automatically identifies 2GU2 as a dimer and also creates a dimer model from the Aspartoacylase sequence using chain B of the 2GU2 structure:
"To build the complex the following chains of the complex has been additionally identified: 2gu2B"
SwissModel also included the Zinc ligand:
- All the residues interacting with the ligand are completely conserved between model and template.
- The RMSD between the interacting residues of model and template is lesser than two: 0.080
The model created by SwissModel is almost identical to the native structure. The secondary structure elements can be superposed very well, and there are some deviations only in the loop regions (compare <xr id="2gu2_sw_overall"/>). Once again, the channel gating loop formed by residues 158-164 shows the largest deviations. The agreement between the two binding sites is also very good. As in the Modeller model, Y164 is located on the flexible channel entrance loop and has a completely different position compared to the reference crystal structure of Aspartoacylase. Arg71 has a slightly different orientation as well (compare <xr id="2gu2_sw_binding"/>).
|<figure id="2gu2_sw_overall">||<figure id="2gu2_sw_binding">|
SwissModel provides its own SwissModel score, which is -30164.225 kJ/mol for the model. Both the weighted RMSD and the binding site RMSD are very low and not significantly higher than for the Modeller model based on 2GU2. The same holds true for the TM and GDT scores.
|SwissModel Score||Weighted Rmsd (301 Ca - atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (18 residues)|
|-30164.225||0.412||0.9788||0.8957||0.9652||0.255|
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
In this case, the model was created as a monomer, although the template is annotated as a dimer:
"The target and template sequences are too diverse (seqid: 42.295) to infer a conservation of the oligomeric state"
SwissModel does not include any ligands from the template for modelling either:
- "CL321: The ligand is farther than 3 Angstroem from the template, so it is assumed that they are not interacting."
- "Given the properties calculated previously, the ligand A.CL321 will not be included in the final model."
There are some deviations between the model and the native structure, especially in loop regions. Yet, the overall fold and shape are modelled correctly (see <xr id="3nfz_sw_overall"/>). A closer look at the binding site reveals some residues with altered orientations: Y164, E178 and R71 (see <xr id="3nfz_sw_binding"/>). In the 3NFZ model generated by Modeller, the deviations are similar (residues Y164, R71 and R168).
|<figure id="3nfz_sw_overall">||<figure id="3nfz_sw_binding">|
The total energy of the model is calculated to be -10093.196 kJ/mol, which is only about a third of the magnitude obtained for the 2GU2 model (-30164.225 kJ/mol). For the models generated by Modeller, the difference in DOPE score was much smaller (about -36900 vs. -34500). Yet, the RMSD values are almost identical to the Modeller result, and the TM and GDT scores are similar as well.
|SwissModel Score||Weighted Rmsd (302 Ca-atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (21 residues)|
|-10093.196||0.750||0.9665||0.7285||0.9015||0.414|
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
With default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.
Template < 30% Seq Id: 1KWM (Human)
Again, with default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.
I-Tasser
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
I-Tasser modelled Aspartoacylase as a dimer, since the given template 2GU2 was crystallized as a dimer as well.
In general, the model looks very similar to the reference structure of Aspartoacylase, except for the already mentioned loop regions. A closer look at the binding site, however, shows that the orientation of all of the side chains differs slightly from the reference structure. This is in contrast to the models created by Modeller and SwissModel, where only some of the binding site residues deviate.
|<figure id="superposition_itasser_2gu2_aspa">||<figure id="itasser_2gu2_aspa2_binding">|
The deviation of residues in the binding site is also reflected in the RMSD values, which are higher for this model than for the other two models: the binding site RMSD is 0.297 compared to 0.248 and 0.255 for Modeller and SwissModel, respectively, and the weighted RMSD is 0.55 compared to 0.41 for both Modeller and SwissModel. Yet, the TM and GDT scores do not reflect this loss in accuracy.
|C-Score||Weighted Rmsd (302 Ca - atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (18 residues)|
|1.81||0.552||0.9747||0.8278||0.9470||0.297|
Template > 40%, <80% Seq Id: 3NFZ (Mus musculus)
I-Tasser generated a monomeric model that already shows some deviations from the reference structure 2O53 on visual inspection (see <xr id="superposition_itasser_3nfz_aspa"/>). Loop regions in particular deviate, and some secondary structure elements cannot be superposed without errors either.
A closer look at the binding site makes the deviations even more obvious (see <xr id="itasser_3nfz_aspa_binding"/>). Y164 is once again located far from the native Y164 position. This is due to the incorrect modelling of the channel closing loop on which Y164 is located. The other binding site residues are aligned and positioned correctly, but with different side chain conformations. Thus the binding site already has a different shape compared to the reference structure 2O53.
|<figure id="superposition_itasser_3nfz_aspa">||<figure id="itasser_3nfz_aspa_binding">|
As for the rat template, I-Tasser generates worse results than Modeller and SwissModel. The weighted RMSD (0.92) is higher than for the Modeller (0.78) and SwissModel (0.75) models, as is the binding site RMSD (0.71 compared to 0.37 and 0.41, respectively). The same trend can be seen in the GDT_HA score, which is clearly lower than for the models generated by the other two methods (0.67 compared to 0.72 and 0.73). GDT_TS and TM-score, in contrast, do not reflect this observation: both are similar to those of the other two models.
|C-Score||Weighted Rmsd (302 Ca - atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (21 residues)|
|1.55||0.915||0.9592||0.6714||0.8725||0.709|
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
Given the low sequence similarity and the results of the methods applied before, we were surprised to find that the I-Tasser models managed to capture a large part of the secondary structure arrangement of an Aspartoacylase monomer, as can be seen in <xr id="1yw4_itasser_aspa"/>. We re-ran I-Tasser to make sure we had not forgotten to exclude helpful templates with >30% sequence identity, and obtained the same, very decent, models again. A close-up of the binding site shows, however, that the exact positions of the residues involved in substrate binding are not modelled correctly.
<figtable id = "1yw4_itasser">
|<figure id="1yw4_itasser_aspa">||<figure id="1yw4_itasser_aspa_binding">|
|C-Score||Weighted Rmsd (209 Ca - atoms)||TM-Score||GDT_HA||GDT_TS||Binding Site RMSD|
|-0.35||4.685||0.6353||0.2404||0.4321||3.835|
Template < 30% Seq Id: 1KWM (Human)
For the template 1KWM, the model is not as good as the one based on 1YW4 but rather what could have been expected from such a low sequence identity. <xr id = "1kwm_itasser_aspa"/> shows that some helices and a few beta sheets are in a similar position, but the overall correctness is not high enough for the model to be usable. The binding site in <xr id ="1kwm_itasser_aspa_binding"/> is not captured, which is reflected in the high RMSD for the binding site residues (table below). However, the model is still much better than the one from Modeller, since some secondary structure elements are similar and the binding site residues are at least positioned somewhere close to the binding site rather than scattered all over the structure.
|<figure id="1kwm_itasser_aspa">||<figure id="1kwm_itasser_aspa_binding">|
|C-Score||Weighted Rmsd (186 Ca - atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (21 residues)|
|-2.98||9.525||0.50||0.2136||0.3510||6.03|
Modeller - Multimodel from templates with < 30% seq ID
We did not expect the model combined from the < 30% templates to improve on the individual models. When first superposing the combined model onto our reference structure, we were surprised to find that many structural elements are actually captured in the model (<xr id="multimodel30_all"/>). Looking at the binding residues (<xr id="multimodel_binding"/>), however, we found that the apparent similarities do not include the binding site residues and are therefore of little practical value.
|<figure id="multimodel30_all">||<figure id="multimodel_binding">|
Editing the Alignments
For 2GU2 and 3NFZ, we did not perform manual alignment modifications, since the models can hardly be improved: they differ from the reference mostly in loop and side chain orientations, which cannot be improved by changing the alignment.
We had higher hopes of achieving a usable model for 1KWM, since carboxypeptidases are reported to be structurally related to aspartoacylases. We therefore tried to improve the alignment manually, including a BLAST alignment of the two sequences and trying to map catalytic center positions of Aspartoacylase to the carboxypeptidase sequence. However, when feeding this manual alignment to Modeller, the resulting model looked worse than before, having more loop regions. Of the scores, only the TM-score improved slightly (0.1960 compared to 0.1885), while the DOPE and GDT scores did not, so manual alignment editing cannot be claimed to be a safe way to improve performance.
|DOPE Score||Weighted Rmsd (234 Ca - atoms)||TM-Score||GDT_HA||GDT_TS||binding site RMSD (18 residues)|
|-6289.75244||5.259||0.1960||0.0331||0.0654||no conserved binding site|
Jigsaw
We fed Jigsaw our 5 best results, i.e., the 2GU2 models from the three different methods and the 3NFZ models from Modeller and SwissModel. Jigsaw returned 5 new models that differ only slightly from each other. The resulting model energies ranged from -438.03 to -431.33 kJ/mol, and the sequence coverage ranged from 313 amino acids (the complete Aspartoacylase sequence) down to 311 amino acids.
Since four of the 5 models were very similar, we only chose one of them for the pictures below for easier comparison, shown in blue. The fifth model differed only in loop regions, and differed more from the original Aspartoacylase than the other four, and is shown in yellow.
In the binding site, the orientations of Tyr164 and Arg168 were the least accurate. Note that Tyr164 is part of the flexible loop that gives access to the binding site.
|<figure id="jigsaw_all">||<figure id="jigsaw_binding">|
Evaluation of Methods
For the 2GU2 template, all three methods generated very accurate models, with weighted RMSD values of 0.406 (Modeller), 0.412 (SwissModel) and 0.552 (I-Tasser). Based on the RMSD, Modeller generates the best model, whereas I-Tasser generates the least accurate one. The same ranking is reflected by the other scores (see <xr id="eval_2gu2_distri"/>). The TM-score and the GDT scores are very high as well, which supports the quality of the models.
Also a visual inspection of the models shows their accuracy, as can be seen in <xr id="eval_2gu2_overall"/>. Here, all models are superposed with the reference structure 2O53. Only in some loop regions, and especially for the very flexible channel gating loop, there are some deviations from the crystal structure (see <xr id="eval_2gu2_loop"/> ).
|<figure id="eval_2gu2_distri">||<figure id="eval_2gu2_overall">||<figure id="eval_2gu2_loop">|
A closer look at the binding site shows that all models chose a different conformer for Y164 (see <xr id="eval_2gu2_y164"/> ). This correlates with the different positioning of the channel gating loop, on which Y164 is located. For the other binding site residues, there are only minor deviations. Still, the binding site residues in the I-Tasser model clearly deviate most from the reference structure (see <xr id="eval_2gu2_r63"/> - <xr id="eval_2gu2_e178"/>). This is also reflected in the binding site RMSD, which is highest for the I-Tasser model (Modeller: 0.248, SwissModel: 0.255, I-Tasser: 0.297).
|<figure id="eval_2gu2_r63">||<figure id="eval_2gu2_r71">||<figure id="eval_2gu2_r168">||<figure id="eval_2gu2_y164">||<figure id="eval_2gu2_e178">|
In conclusion, all generated homology models based on 2GU2 are very accurate. This is in line with expectations, since the template shares over 80% of its residues with the target sequence. The differences between the models are very small. Yet, the I-Tasser model shows the largest differences for the binding site residues, which is also reflected in the RMSD and GDT_HA values.
For 3NFZ, the template from Mus musculus with about 42% sequence identity, Modeller and SwissModel generated very accurate results. I-Tasser generated a model that shares all secondary structure elements with the reference structure overall but deviates in the details, such as loops and side chain orientations. The RMSD values for the 3NFZ models from Modeller and SwissModel are very low (Modeller: 0.775, SwissModel: 0.750), and the value for I-Tasser is a bit higher (0.915). Compared to the models based on 2GU2, they are about 0.35 higher for all three methods, which indicates slightly less accurate models.
As can be seen in <xr id="eval_3nfz_overall"/>, visually there are only some deviations in the loop regions. Again, as for the 2GU2 template, residues 158-164 forming this flexible channel entrance loop, show the largest deviation compared to the reference structure of Aspartoacylase.
In the binding site, the orientation of some residues differs from the reference. Accordingly, the binding site RMSD is slightly higher than for the 2GU2 models, where there were almost no deviations (about 0.12 to 0.16 higher for Modeller and SwissModel, and about 0.41 higher for the I-Tasser model). For example, Y164 is positioned on the channel closing loop, which is incorrectly modelled in all three cases (see <xr id="eval_3nfz_binding"/>). Furthermore, I-Tasser has a different rotamer for R71, and Modeller predicts a different orientation for R168 (see <xr id="eval_3nfz_binding2"/>).
This loss in quality of the models based on 3NFZ compared to those based on 2GU2 is also reflected in the GDT_HA scores, which are much lower for all 3NFZ models than for the 2GU2 models. In contrast, the TM-score is about the same for both templates (average TM-score 2GU2: 0.978, 3NFZ: 0.963), and the GDT_TS score is only moderately lower for the 3NFZ models than for the 2GU2 models (average GDT_TS 2GU2: 0.959, 3NFZ: 0.889).
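The per-template averages can be recomputed directly from the result tables. The sketch below uses only the rows listed on this page; the Modeller row for 2GU2 is not listed and is therefore left out, so the 2GU2 averages here cover SwissModel and I-Tasser only. Figures recomputed this way may differ slightly from averages quoted in the running text, depending on rounding and on which models were included.

```python
from statistics import mean

# (TM-score, GDT_HA, GDT_TS) per method, copied from the result tables.
# The Modeller/2GU2 row is not listed on this page and is therefore left out.
scores = {
    "2GU2": {
        "SwissModel": (0.9788, 0.8957, 0.9652),
        "I-Tasser": (0.9747, 0.8278, 0.9470),
    },
    "3NFZ": {
        "Modeller": (0.9641, 0.7152, 0.8932),
        "SwissModel": (0.9665, 0.7285, 0.9015),
        "I-Tasser": (0.9592, 0.6714, 0.8725),
    },
}

def averages(template):
    """Mean (TM-score, GDT_HA, GDT_TS) over the models of one template."""
    rows = scores[template].values()
    return tuple(mean(column) for column in zip(*rows))

for template in scores:
    tm, gdt_ha, gdt_ts = averages(template)
    print(f"{template}: TM={tm:.3f} GDT_HA={gdt_ha:.3f} GDT_TS={gdt_ts:.3f}")
```

The drop from 2GU2 to 3NFZ is largest for GDT_HA, moderate for GDT_TS and smallest for the TM-score, which is the ordering described in the text.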
|<figure id="eval_3nfz_distri">||<figure id="eval_3nfz_overall">||<figure id="eval_3nfz_binding">||<figure id="eval_3nfz_binding2">|
1KWM and 1YW4
For 1KWM and 1YW4, SwissModel does not create a model at all, given the low similarity of template and target.
The I-Tasser model for 1YW4 was very good given the low sequence identity, but it would be problematic for applications that require exact simulations, because the binding site residues are not modelled correctly. The I-Tasser model for 1KWM was correct only at the very rough level of overall secondary structure elements and otherwise showed major differences to the original.
So, the Modeller and SwissModel results already indicate what one could expect: if the chosen template is not of good quality, homology modelling is unreliable and not a safe tool to use.
Regarding the methods
SwissModel and Modeller generated very similar models for the chosen templates. The scores indicate that the models from Modeller are marginally better than those from SwissModel. For I-Tasser we have not received all results yet, so it may be premature to draw a final conclusion. Overall, the I-Tasser models are similar to the models generated by SwissModel and Modeller. Yet, as the scores indicate as well, the I-Tasser models show larger deviations in side chain orientations. This result is somewhat disappointing, since I-Tasser is regarded as the best method available.
Regarding the scores
In our case, the RMSD values are a good measure of model quality, as is the more restrictive RMSD over the 6 Å binding site residues. One should note that, while the models from poor templates (< 30%) seem to have good RMSDs at first sight, these are only based on the residues that could actually be superimposed, which can be as few as 187 residues. The TM-score hardly discriminates between the models based on the > 80% template and those based on the roughly 40% template. In contrast, the GDT scores discriminate better, especially the GDT_HA score (see <xr id="comparison_scores_rat_mouse"/>).
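To make the caveat about superimposed residues concrete, the weighted RMSD can be reported together with the fraction of the target that it covers. This is a minimal sketch using values from the result tables and the target length of 313 residues stated in the Jigsaw section; the `coverage` helper is illustrative.

```python
TARGET_LENGTH = 313  # complete Aspartoacylase sequence

# model: (weighted RMSD in Angstrom, number of superimposed C-alpha atoms), from the tables
models = {
    "Modeller 2GU2": (0.406, 302),
    "Modeller 3NFZ": (0.775, 302),
    "Modeller 1YW4": (1.899, 257),
    "Modeller 1KWM": (2.146, 187),
    "I-Tasser 1YW4": (4.685, 209),
    "I-Tasser 1KWM": (9.525, 186),
}

def coverage(n_superimposed, target_length=TARGET_LENGTH):
    """Fraction of target residues that entered the RMSD calculation."""
    return n_superimposed / target_length

for name, (rmsd_value, n_atoms) in models.items():
    print(f"{name}: RMSD {rmsd_value:.3f} A over {coverage(n_atoms):.0%} of the target")
```

For the Modeller model based on 1KWM, the seemingly moderate RMSD of 2.146 describes only about 60% of the target, whereas the values for 2GU2 and 3NFZ cover about 96%.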
For 2GU2 and 3NFZ, the C-score gives a good indication of which model is better:
- 2GU2: 1.81
- 3NFZ: 1.55
Since the C-score typically ranges from -5 to 2, with higher values indicating higher confidence in the model, scores above 1.5 suggest that both models are very accurate, which is correct in our case.
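The I-Tasser documentation quotes a C-score above -1.5 as indicating a model with correct global topology. The sketch below applies this cutoff to the four C-scores from the I-Tasser tables and checks whether the C-score ranks the models in the same order as the TM-score measured against 2O53; the variable and function names are illustrative.

```python
C_SCORE_CUTOFF = -1.5  # cutoff quoted in the I-Tasser documentation for a correct fold

# C-scores and TM-scores of the I-Tasser models, copied from the result tables
c_scores = {"2GU2": 1.81, "3NFZ": 1.55, "1YW4": -0.35, "1KWM": -2.98}
tm_scores = {"2GU2": 0.9747, "3NFZ": 0.9592, "1YW4": 0.6353, "1KWM": 0.50}

def expected_correct_fold(c_score, cutoff=C_SCORE_CUTOFF):
    """True if the C-score predicts a correct global topology."""
    return c_score > cutoff

def ranking(values):
    """Template names ordered from best to worst score."""
    return sorted(values, key=values.get, reverse=True)

for template in ranking(c_scores):
    print(template, c_scores[template], expected_correct_fold(c_scores[template]))

print(ranking(c_scores) == ranking(tm_scores))
```

Under this cutoff, only the 1KWM model is flagged as unreliable, which agrees with the visual assessment of the four models.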
For 2GU2 and 3NFZ, the SwissModel score also gives a good indication of which model is better:
- 2GU2: -30164.225
- 3NFZ: -10093.196
In magnitude, the score for the 2GU2 model is almost three times that of the 3NFZ model. This ratio seems disproportionate given the good quality of both models.
Modeller Dope Score: The Dope Score also describes the quality of the models very well:
- 2GU2: -36987.5
- 3NFZ: -34539.4
- 1YW4: -26201.2
- 1KWM: -16440
The models based on 2GU2 and 3NFZ are both very accurate and thus have similarly good DOPE scores, whereas the other two models are of very poor quality, which is well reflected in the DOPE score.
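The different spread of the two energy-like scores can be made explicit by comparing the 2GU2 and 3NFZ values as ratios. This is a small worked example using the numbers listed above; the two scores come from different scoring functions, so only the ratios within one score are compared, not the scores with each other.

```python
# Scores of the 2GU2 and 3NFZ models, copied from the lists above
scores = {
    "SwissModel score": {"2GU2": -30164.225, "3NFZ": -10093.196},
    "Modeller DOPE score": {"2GU2": -36987.5, "3NFZ": -34539.4},
}

def ratio(pair):
    """How many times larger in magnitude the 2GU2 score is than the 3NFZ score."""
    return pair["2GU2"] / pair["3NFZ"]

for name, pair in scores.items():
    print(f"{name}: 2GU2 / 3NFZ = {ratio(pair):.2f}")
```

The SwissModel score changes by a factor of about 3 between the two templates, the DOPE score by only about 7%, although the structural measures (RMSD, TM-score) of the corresponding models are very similar in both programs.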
Regarding the strategy
It is known that homology modelling relies heavily on choosing a good template. It was therefore not very surprising that the models built from templates with little similarity to our protein were sometimes plainly wrong, even though the I-Tasser models were surprisingly good. For the Modeller results, manual alignment editing did nothing to improve this, at least in our case and with the approach we tried.