Canavan Task 4 - Homology based structure predictions

From Bioinformatikpedia
Revision as of 19:50, 4 June 2012 by Gatzmannf (talk | contribs) (General conclusions)
So you see, if you fall into a lion's pit, the reason the lion will tear you to pieces is not because it's hungry—be assured, zoo animals are 
amply fed—or because it's bloodthirsty, but because you've invaded its territory. 

As an aside, that is why a circus trainer must always enter the lion ring first, and in full sight of the lions. In doing so, he establishes that 
the ring is his territory, not theirs, a notion that he reinforces by shouting, by stomping about, by snapping his whip. The lions are impressed. 
Their disadvantage weighs heavily on them. Notice how they come in: mighty predators though they are, "kings of beasts", they crawl in with
their tails low and they keep to the edges of the ring, which is always round so that they have nowhere to hide. They are in the presence of a 
strongly dominant male, a super-alpha male, and they must submit to his dominance rituals. So they open their jaws wide, they sit up, they
jump through paper-covered hoops, they crawl through tubes, they walk backwards, they roll over. 
"He's a queer one," they think dimly. "Never seen a top lion like him. But he runs a good pride. The larder's always full and—let's be honest, 
mates—his antics keep us busy. Napping all the time does get a bit boring. At least we're not riding bicycles like the brown bears or catching 
flying plates like the chimps." 

From 'Life of Pi' by Yann Martel.

Protocol

Commands, source code and other methodological issues are kept in the protocol.


Template Identification

<figtable id="templates">

Category       Method  PDB ID  Organism                   Protein name                                    Seq ID  Alignment length
----------------------------------------------------------------------------------------------------------------------------------
Seq ID > 80%   HHPred  2GU2    Rattus norvegicus          ASPA protein                                    87%     306
Seq ID 40-80%  HHPred  3NH4    Mus musculus               ASPA protein                                    43%     304
Seq ID 40-80%  COMA    3NFZ    Mus musculus               Aspartoacylase-2                                42%     300
Seq ID < 30%   HHPred  1YW4    Chromobacterium violaceum  Succinylglutamate desuccinylase                 15%     250
Seq ID < 30%   COMA    1YW4    Chromobacterium violaceum  Succinylglutamate desuccinylase                 12%     331
Seq ID < 30%   HHPred  3CDX    Rhodobacter sphaeroides 2  Succinylglutamate desuccinylase/aspartoacylase  15%     251
Seq ID < 30%   COMA    3CDX    Rhodobacter sphaeroides 2  Succinylglutamate desuccinylase/aspartoacylase  12%     330
Seq ID < 30%   HHPred  2QJ8    Mesorhizobium loti         hydrolase                                       21%     261
Seq ID < 30%   COMA    2QJ8    Mesorhizobium loti         Mlr6093 protein                                 15%     314
Seq ID < 30%   HHPred  1KWM    Human                      Procarboxypeptidase B                           11%     192
Seq ID < 30%   COMA    3GLJ    Sus scrofa (pig)           Carboxypeptidase B                              8%      262

</figtable>

With HHPred we received 42 hits using standard parameters. Changing the E-value threshold for MSA generation and the number of sequences shown per HMM did not result in more hits. There is at least one hit for each sequence identity category.

COMA yielded 22 results, of which only one structure has a sequence identity of more than 40%. Interestingly, COMA did not find the highest ranked hit from the HHPred output (2GU2, 87%). In general, COMA generates longer alignments with correspondingly lower sequence identities. Running COMA with less restrictive E-values results in more diverse hits, e.g. several carboxypeptidases.

It is stated in <ref name="pnas_aspa_structure">Eduard Bitto, Craig A. Bingman, Gary E. Wesenberg, Jason G. McCoy, and George N. Phillips, Jr., Structure of aspartoacylase, the brain enzyme impaired in Canavan disease, Proc Natl Acad Sci U S A. 2007 January 9; 104(2): 456–461. </ref>, that the "N-terminal domain of aspartoacylase adopts a protein fold similar to that of zinc-dependent hydrolases related to carboxypeptidases A. The catalytic site of aspartoacylase shows close structural similarity to those of carboxypeptidases despite only 10–13% sequence identity between these proteins". It is therefore plausible that both HHPred and COMA report several carboxypeptidases as hits with low sequence identity.


We decided to use those templates that were found by both methods (see <xr id="templates"/>).

Comparison of Aspartoacylase Structures

To be able to assess the quality of the homology models, we briefly introduce the Aspartoacylase structure. The PDB contains several structures of human aspartoacylase.

  • Apo structure 2O53: resolution 2.7 Å, R-free 0.269, chains A and B
  • Holo structure 2O4H: resolution 2.7 Å, R-free 0.271, chains A and B, intermediate substrate analog: N-phosphonomethyl-L-aspartate
  • Apo structure 2I3C: resolution 2.8 Å, R-free 0.243, chains A and B
  • Ensemble refinement 2Q51: resolution 2.8 Å, R-free 0.239, chains A and B


Superposition of the four crystal structures results in low RMSD values. Visual inspection of the superposition also reveals hardly any differences. In 2Q51 the beta sheet formed by residues 218-223 and 299-306 is not displayed as a sheet in PyMOL, which means that the backbone angles deviate slightly from the regular definition of a beta sheet.

<figtable id="aspa_structures_superposed" >

<figure id="aspa_superpos">
<xr nolink id="aspa_superpos"/>
Superposition of the four available Aspartoacylase crystal structures. 2O53 in pink, 2O4H in green, 2I3C in cyan and 2Q51 in yellow. There are small deviations in the loop regions only.
</figure>
<figure id="aspa_superpos_binding">
<xr nolink id="aspa_superpos_binding"/>
Closeup of the binding site. The annotated binding site residues have the same orientation and conformation. Only R71 has a different orientation in 2O4H, since it interacts with the ligand that was cocrystallized in this structure. 2O53 in pink, 2O4H in green, 2I3C in cyan and 2Q51 in yellow.
</figure>

</figtable>

Major differences occur only for residues 158-164, which form a loop that is involved in opening and closing the channel. 2O53 and 2O4H both represent the closed conformation, whereas 2I3C and 2Q51 represent the open conformation.



<figtable id="open_and_close">

<figure id="aspa_superpos_loop">
<xr nolink id="aspa_superpos_loop"/>
Different orientations of the loop formed by residues 158-164. 2O53 and 2O4H show the protein in the closed conformation, whereas 2Q51 and 2I3C show the protein in the open conformation. 2O53 in pink, 2O4H in green, 2I3C in cyan and 2Q51 in yellow.
</figure>
<figure id="aspa_superpos_closed">
<xr nolink id="aspa_superpos_closed"/>
Surface representation of 2O4H: entrance to binding site is closed.
</figure>
<figure id="aspa_superpos_open">
<xr nolink id="aspa_superpos_open"/>
Surface representation of 2Q51: entrance to binding site is open. The ligand in the binding site of 2O4H is shown in green.
</figure>

</figtable>

We decided to use 2O53 and 2O4H, since they are the two latest structures and have the best resolution. Furthermore, they were solved by the same group, once with a bound ligand and once in the apo form. As the reference structure we chose 2O53. When comparing the residues in and around the binding sites of 2O53 and 2O4H, there are slight differences. Because most templates do not have a cocrystallized ligand, the models based on these templates are expected to be more similar to the Aspartoacylase structure without bound ligand.

Analysis of the Aspartoacylase structures

<figtable id="aspa_structure">

<figure id="aspa_domains">
<xr nolink id="aspa_domains"/>(2O53)
Aspartoacylase homodimer. The N-terminal domain of one monomer is colored in green, the C-terminal domain is colored in blue.
</figure>

Concerning <xr id="aspa_domains"/>:

  • Aspartoacylase functions as a homodimer.
  • A monomer consists of two domains
    • N-terminal domain (N-domain): residues 1–212 (green)
    • C-terminal domain (C-domain): residues 213–313, globular domain: a two-stranded β-sheet linker wraps around the N-terminal domain (blue)
<figure id="aspa_zinc">
<xr nolink id="aspa_zinc"/>(2O53)
Closeup of the Aspartoacylase zinc binding site. The N-terminal and C-terminal domains are colored in green and blue, respectively. The zinc ion is represented as a grey sphere. The coordinating residues are colored red. The flexible loop that mediates opening and closing of the channel is colored in yellow.
</figure>
<figure id="aspa_zinc_ligplot">
<xr nolink id="aspa_zinc_ligplot"/>(2O53)
Ligplot of the interactions between metal binding residues and the Zinc ion.
</figure>

Concerning <xr id="aspa_zinc"/>:

  • N-domain and C-domain of ASPA form a deep narrow channel that leads to the active site
  • residues 158–164 may undergo a conformational change that results in opening and partial closing of the channel entrance (yellow loop in <xr id="aspa_zinc"/>)
  • Coordination of Zinc Ion: H21, E24, H116, R63, E178, Y288
<figure id="aspa_binding">
<xr nolink id="aspa_binding"/>(2O4H)
Closeup of the Aspartoacylase binding site. The protein backbone is colored green and the binding site residues are colored blue. The zinc ion is represented as a grey sphere and the intermediate analog N-phosphonomethyl-L-aspartate is colored red. The polar interactions between this substrate analog and the binding site residues are depicted in orange.
</figure>
<figure id="aspa_ligplot">
<xr nolink id="aspa_ligplot"/>(2O4H)
Ligplot of the interactions between the binding site residues and the substrate analog in 2O4H.
</figure>

Concerning <xr id="aspa_binding"/>:

  • Active site residues involved in substrate binding: H21, E24, R63, N70, R71, H116, Y164, R168, E178, Y288
  • there are multiple H-bonds between the intermediate analog and the active site residues
    • PDBSum: Ligand matches this enzyme's product L-aspartate with similarity 69.23%
  • glycosylated at a consensus N×T glycosylation motif at position N117

</figtable>

Modeller

Template > 80% Seq Id: 2GU2 (Rattus Norvegicus)

2GU2 was crystallized as a dimer. We only used the monomer (chain A) as a template for modelling Aspartoacylase.

The two alignment methods that Modeller provides yielded almost identical results (they differ only in the first 6 residues, see the protocol for details). We chose the align-2D alignment.


The model and the native structure can be superimposed with an RMSD of 0.371 Å. Visual inspection of the superposition shows that there are only minor differences between the two structures (compare <xr id="2gu2_modeller_overall"/>). Some loop regions deviate from the original structure, for example the loop formed by residues 158-164. This is the loop involved in opening and closing the channel, which needs to be very flexible for this purpose. In the most important part, the binding site, the agreement is very good (compare <xr id="2gu2_modeller_binding"/>). Only Y164, which is situated on the flexible loop, has a completely different position in the model.


Filename                          molpdf     DOPE score    GA341 score
----------------------------------------------------------------------
p45381.B99990001.pdb          1640.80847   -36987.52734        1.00000


<figtable id="2gu2_modeller">

<figure id="2gu2_modeller_overall">
<xr nolink id="2gu2_modeller_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 2GU2 in blue. There are hardly any differences in the structure
</figure>
<figure id="2gu2_modeller_binding">
<xr nolink id="2gu2_modeller_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. Except for Y164, these important residues have the same location and orientation.
</figure>

</figtable>

The Ramachandran plot (see: File:2gu2 model rama.pdf) for the model shows only one outlier residue, Asn-236. The overall backbone geometry of the model is therefore plausible.

Scores

<figtable id="modeller_2gu2_scores">

<xr nolink id="modeller_2gu2_scores"/> Different Scores for the model created by Modeller based on the template 2GU2
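DOPE score    Weighted RMSD (302 Cα atoms)    TM-score    GDT_HA    GDT_TS    Binding site RMSD (18 residues)
-------------------------------------------------------------------------------------------------------------
-36987.5      0.406                           0.9815      0.8982    0.9669    0.248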

</figtable>
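The GDT_TS and GDT_HA values in the score tables are fractions of residues whose Cα atoms lie within certain distance cutoffs of the reference. The following is a minimal sketch of that idea, assuming a single fixed superposition and a list of per-residue Cα deviations in Å. The real GDT procedure searches over many superpositions to maximise each fraction, so this sketch only illustrates the cutoffs (1, 2, 4, 8 Å for GDT_TS and 0.5, 1, 2, 4 Å for GDT_HA). The function names and the example deviations are hypothetical and not taken from the protocol.

```python
def gdt(distances, cutoffs, target_length=None):
    """Mean fraction of target residues whose C-alpha deviation is within each cutoff.

    distances: per-residue C-alpha deviations in Angstrom for ONE fixed superposition
    (assumption: the real GDT algorithm optimises the superposition per cutoff).
    """
    n = target_length if target_length is not None else len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances, target_length=None):
    # Total score: cutoffs of 1, 2, 4 and 8 Angstrom
    return gdt(distances, (1.0, 2.0, 4.0, 8.0), target_length)


def gdt_ha(distances, target_length=None):
    # High accuracy: cutoffs halved to 0.5, 1, 2 and 4 Angstrom
    return gdt(distances, (0.5, 1.0, 2.0, 4.0), target_length)


# Hypothetical deviations for an 8-residue toy model (not real ASPA data)
example = [0.3, 0.4, 0.7, 0.9, 1.5, 3.0, 6.0, 9.0]
print("GDT_TS:", gdt_ts(example))
print("GDT_HA:", gdt_ha(example))
```

Because GDT_HA uses the tighter cutoffs, it can never exceed GDT_TS for the same set of deviations, which is why it reacts more strongly to small side-chain and loop shifts.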







Template >40%, <80% Seq Id: 3NFZ (Mus musculus)

The two different alignment methods provided by Modeller yielded completely identical results.

The generated model has a DOPE score of -34539.4, which is only slightly higher (i.e. slightly worse) than the -36987.5 obtained for the model based on the 2GU2 template.

Filename                          molpdf     DOPE score    GA341 score
----------------------------------------------------------------------
p45381.B99990001.pdb          1969.36560   -34539.44141        1.00000


Again, the model and the native structure are very similar (compare <xr id="3nfz_modeller_overall"/>). Yet, compared to the model based on template 2GU2, there are more deviations, especially in the loop regions. When taking a closer look at the binding site, one can identify some small differences. R71 and R168 have different orientations as well as Y164 (compare <xr id="3nfz_modeller_binding"/>).


<figtable id="3nfz_modeller">

<figure id="3nfz_modeller_overall">
<xr nolink id="3nfz_modeller_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 3NFZ in yellow. There are some differences for loop regions.
</figure>
<figure id="3nfz_modeller_binding">
<xr nolink id="3nfz_modeller_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. Some of these residues have different rotamers compared to the native structure.
</figure>

</figtable>


The Ramachandran plot (see File:3nfz model rama.pdf) for the model identifies 5 outlier residues, but more than 98.4% of the residues still lie in allowed regions of the plot:

  • 131 SER
  • 148 ALA
  • 161 SER
  • 174 PRO
  • 227 GLU

Scores

<figtable id="modeller_3nfz_scores">

<xr nolink id="modeller_3nfz_scores"/> Different Scores for the model created by Modeller based on the template 3NFZ
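DOPE score    Weighted RMSD (302 Cα atoms)    TM-score    GDT_HA    GDT_TS    Binding site RMSD (16 residues)
-------------------------------------------------------------------------------------------------------------
-34539.4      0.775                           0.9641      0.7152    0.8932    0.370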

</figtable>









Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)

<figure id="hssp">

<xr nolink id="hssp"/>
Schneider and Sander curve regarding safety of homology modelling. We are either at 12% seq ID and 331 alignment length (COMA), or 15% and 250 (HHPred).

</figure>


Protein 1YW4 is a succinylglutamate desuccinylase, i.e. it belongs to the same Pfam family as Aspartoacylase. Still, the sequence identity is only 15% at an alignment length of 250 residues (HHPred) and 12% at an alignment length of 331 residues (COMA). Considering the HSSP curve as first presented by Schneider and Sander in 1990 (<xr id="hssp"/>), we are definitely in an unsafe region for homology modelling.
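The position of a template relative to the HSSP curve can also be checked numerically. The sketch below uses the threshold in the form in which it is commonly quoted for the HSSP curve: t(L) = 290.15 · L^(-0.562) percent identity for alignment lengths L between 10 and 80, and a constant 24.8% for longer alignments. The exact constants are quoted from memory and should be treated as an assumption, and the function names are hypothetical. The identities and alignment lengths are those of 1YW4 from the template table.

```python
def hssp_threshold(alignment_length):
    """Percent identity above which structural similarity is assumed to be safe.

    Assumed form of the curve: 290.15 * L**-0.562 for 10 <= L <= 80,
    and a constant 24.8 % for longer alignments.
    """
    if alignment_length < 10:
        raise ValueError("curve is not defined for alignments shorter than 10 residues")
    if alignment_length <= 80:
        return 290.15 * alignment_length ** -0.562
    return 24.8


def is_safe(percent_identity, alignment_length):
    """True if the alignment lies on or above the threshold curve."""
    return percent_identity >= hssp_threshold(alignment_length)


# 1YW4 as reported by the two search methods: (method, % identity, alignment length)
for method, identity, length in (("HHPred", 15, 250), ("COMA", 12, 331)):
    verdict = "safe" if is_safe(identity, length) else "unsafe"
    print(f"{method}: {identity}% over {length} residues -> {verdict} "
          f"(threshold {hssp_threshold(length):.1f}%)")
```

Both alignments are long enough to fall on the flat part of the curve, so the comparison reduces to checking the identity against roughly 25%.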

1YW4 was also crystallized as a dimer, and again, we only used the monomer (chain A) as a template for modelling Aspartoacylase. There were some minor differences in the alignment methods, and we chose the align-2D alignment to include secondary structure information.

A superposition of the model and the reference can be found in <xr id="1yw4_modeller_betasheets"/>. It is easy to see that correct structure modelling failed, and even a seeming similarity - the buried beta-sheets - cannot be taken as correct since they are not the same corresponding sequence regions. The binding site is not even in the same region, but scattered all over the structure (<xr id="1yw4_modeller_binding"/>).

<figtable id="1yw4_modeller">

<figure id="1yw4_modeller_betasheets">
<xr nolink id="1yw4_modeller_betasheets"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 1YW4 in blue. Buried beta sheets are coloured in yellow and orange.
</figure>
<figure id="1yw4_modeller_binding">
<xr nolink id="1yw4_modeller_binding"/>
Binding site residues are coloured in red. In the model (blue), they are scattered all over the structure.
</figure>

</figtable>

Scores

<figtable id="modeller_1yw4_scores">

<xr nolink id="modeller_1yw4_scores"/> Different Scores for the model created by Modeller based on the template 1YW4
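DOPE score    Weighted RMSD (257 Cα atoms)    TM-score    GDT_HA    GDT_TS    Binding site RMSD (16 residues)
-------------------------------------------------------------------------------------------------------------
-26201.18750  1.899                           0.1760      0.0364    0.0704    no conserved binding site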

</figtable>









Template < 30% Seq Id: 1KWM (Human)

1KWM is a procarboxypeptidase of type B. Since, as stated before, a strong structural relationship between carboxypeptidases and aspartoacylase has been reported, we wanted to find out about possible similarities between the two types of proteins. Therefore, we also performed homology modelling for 1KWM. It has a sequence identity of only 11% at an alignment length of 192 amino acids.

However, we found the similarities to be far from striking: again, with a template so dissimilar to the target, homology modelling fails. See <xr id="1kwm_modeller_full"/> and <xr id= "1kwm_modeller_binding"/>.

Regarding the scores: note that, even though the weighted RMSD provided by SAP does not look too bad at first glance, it is only calculated for 187 superimposed residues. The RMSD calculated by TM-Score is 19.529 Å for 302 superimposed residues.
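The effect described here, an RMSD that looks acceptable only because it is computed over a subset of residues, can be illustrated with a small sketch. It assumes that the coordinate pairs are already superimposed and only computes the root-mean-square deviation over whichever pairs are passed in. The toy coordinates and function names are hypothetical; only the residue counts 187 and 302 are taken from the text above.

```python
import math


def rmsd(pairs):
    """RMSD over already superimposed (model, reference) coordinate pairs."""
    total = sum(math.dist(a, b) ** 2 for a, b in pairs)
    return math.sqrt(total / len(pairs))


def coverage(n_superimposed, n_reference):
    """Fraction of reference residues that entered the RMSD calculation."""
    return n_superimposed / n_reference


# Hypothetical toy data: three pairs that fit well and one that is far off
well_fitting = [
    ((0, 0, 0), (0, 0, 1)),
    ((1, 0, 0), (1, 1, 0)),
    ((2, 0, 0), (3, 0, 0)),
]
all_pairs = well_fitting + [((3, 0, 0), (3, 0, 20))]

print("RMSD over well-fitting subset:", rmsd(well_fitting))
print("RMSD over all pairs:", rmsd(all_pairs))
print("Coverage for 187 of 302 residues:", coverage(187, 302))
```

An RMSD value is therefore best read together with the number of residues it was computed over.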

We tried to improve the model by comparing the alignments calculated by Modeller to those of BLAST, and by manually adjusting the alignments so that binding site residues of the ASPA sequence are placed at the corresponding residues of 1KWM. However, the differences to the BLAST alignments were small, corresponding binding site residues were often difficult to identify, and including the new alignments in the prediction did not improve the results.

<figtable id="1kwm_modeller">

<figure id="1kwm_modeller_full">
<xr nolink id="1kwm_modeller_full"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 1KWM in blue.
</figure>
<figure id="1kwm_modeller_binding">
<xr nolink id="1kwm_modeller_binding"/>
ASPA binding residues in red, model residues in magenta. Again, the model residues are scattered over the structure instead of forming the catalytic centre.
</figure>

</figtable>

Scores

<figtable id="modeller_1kwm_scores">

<xr nolink id="modeller_1kwm_scores"/> Different Scores for the model created by Modeller based on the template 1KWM
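DOPE score    Weighted RMSD (187 Cα atoms)    TM-score    GDT_HA    GDT_TS    Binding site RMSD (16 residues)
-------------------------------------------------------------------------------------------------------------
-16439.97461  2.146                           0.1885      0.0430    0.0745    no conserved binding site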

</figtable>









SwissModel

Template > 80% Seq Id: 2GU2 (Rattus Norvegicus)

Since the template is a homodimer, SwissModel also creates a dimer model from the Aspartoacylase sequence:

  • 2gu2 is annotated as DIMER
  • The quaternary structure of the target can be assumed to be identical
  • To build the complex the following chains of the complex has been additionally identified: 2gu2B

SwissModel also included the Zinc ligand:

  • All the residues interacting with the ligand are completely conserved between model and template.
  • The RMSD between the interacting residues of model and template is lesser than two: 0.080

The total energy of the model is calculated to be -30164.225 kJ/mol.

The model created by SwissModel is almost identical to the native structure. The secondary structure elements can be superposed very well and there are some deviations in the loop regions only (compare <xr id="2gu2_sw_overall"/>). Once again, the channel gating loop formed by residues 158-164 shows the largest deviations. The agreement between the two binding sites is also very good. As in the Modeller model, Y164, which is located on the flexible channel entrance loop, has a completely different position than in the reference crystal structure of Aspartoacylase. Arg71 has a slightly different orientation as well (compare <xr id="2gu2_sw_binding"/>).

<figtable id="2gu2_sw">

<figure id="2gu2_sw_overall">
<xr nolink id="2gu2_sw_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by SwissModel based on the template 2GU2 in blue. There are hardly any differences in the structure
</figure>
<figure id="2gu2_sw_binding">
<xr nolink id="2gu2_sw_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. Except for Arg71 and Y164, these important residues have about the same location and orientation.
</figure>

</figtable>

Scores

<figtable id="sm_2gu2_scores">

<xr nolink id="sm_2gu2_scores"/> Different Scores for the model created by SwissModel based on the template 2GU2
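SwissModel score    Weighted RMSD (301 Cα atoms)    TM-score    GDT_HA    GDT_TS    Binding site RMSD (18 residues)
-------------------------------------------------------------------------------------------------------------------
-30164.225          0.412                           0.9788      0.8957    0.9652    0.255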

</figtable>









Template >40%, <80% Seq Id: 3NFZ (Mus musculus)

In this case, the model was created as a monomer, although the template is annotated as a dimer.

  • The target and template sequences are too diverse (seqid: 42.295) to infer a conservation of the oligomeric state


SwissModel does not include any ligands from the template for modelling.

  • CL321: The ligand is farther than 3 Angstroem from the template, so it is assumed that they are not interacting.
  • Given the properties calculated previously, the ligand A.CL321 will not be included in the final model.

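The total energy of the model is calculated to be -10093.196 kJ/mol. In absolute terms this is only about one third of the energy of the 2GU2 model (-30164.225 kJ/mol), i.e. the energy is considerably higher. Note, however, that the 2GU2 model was built as a dimer and this model as a monomer, so the two totals are not directly comparable.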

<figtable id="3nfz_sw">

<figure id="3nfz_sw_overall">
<xr nolink id="3nfz_sw_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by SwissModel based on the template 3nfz in yellow. There are some deviations between the structures especially in loop regions.
</figure>
<figure id="3nfz_sw_binding">
<xr nolink id="3nfz_sw_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. R71, R168, E178 have altered orientations in the model. Y164 even has a different location due to altered loop positioning.
</figure>

</figtable>

Scores

<figtable id="sm_3nfz_scores">

<xr nolink id="sm_3nfz_scores"/> Different Scores for the model created by SwissModel based on the template 3NFZ
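SwissModel score    Weighted RMSD (302 Cα atoms)    TM-score    GDT_HA    GDT_TS    Binding site RMSD (21 residues)
-------------------------------------------------------------------------------------------------------------------
-10093.196          0.750                           0.9665      0.7285    0.9015    0.414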

</figtable>









Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)

With default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.

Template < 30% Seq Id: 1KWM (Human)

Again, with default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.

iTasser

Template > 80% Seq Id: 2GU2 (Rattus Norvegicus)

I-Tasser modelled Aspartoacylase as a monomer, although the given template 2GU2 was crystallized as a dimer.

In general, the model looks very similar to the reference structure of Aspartoacylase, except for the already mentioned loop regions. But when taking a closer look at the binding site, one finds that the orientation of all of the side chains differs slightly from the reference structure. This is in contrast to the models created by Modeller and SwissModel, where only some of the binding site residues deviate. This is also reflected in the binding site RMSD, which is higher for this model than for the other two models.

<figtable id="2gu2_itasser">

<figure id="superposition_itasser_2gu2_aspa">
<xr nolink id="superposition_itasser_2gu2_aspa"/>
Superposition of Aspartoacylase (PDB:2O53) with N-terminal in green and C-terminal in blue, and the model created by SwissModel based on the template 1KWM in magenta. The 1KWM Human procarboxypeptidase B can be seen as the subset (N-terminal) of Aspartoacylase.
</figure>
<figure id="itasser_2gu2_aspa2_binding">
<xr nolink id="itasser_2gu2_aspa2_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. There are slight deviations in orientation for all of the binding site residues when compared to the reference binding site residues of Aspartoacylase.
</figure>

</figtable>


Scores

<figtable id="it_2gu2_scores">

<xr nolink id="it_2gu2_scores"/> Different Scores for the model created by I-Tasser based on the template 2GU2
C-score    Weighted RMSD (302 Cα atoms)    TM-score    GDT_HA    GDT_TS    Binding site RMSD (18 residues)
----------------------------------------------------------------------------------------------------------
1.81       0.552                           0.9747      0.8278    0.9470    0.297

</figtable>










Template > 40%, <80% Seq Id: 3NFZ (Mus musculus)

Submitted some time last week; still running.

Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)

Submitted on Friday evening, still running (Monday, 8.30am).

Template < 30% Seq Id: 1KWM (Human)

Submitted on Friday evening, still running (Monday, 8.30am).

Modeller - Multimodel from templates with < 30% seq ID

We did not expect the combined model from the < 30% templates to improve the overall result. When first aligning the combined model to our reference structure, we were surprised to find that many structural elements are actually captured in the model (<xr id="multimodel30_all"/>). But looking at the binding residues (<xr id="multimodel_binding"/>), we found that the apparent similarities do not include the binding site residues and are therefore practically meaningless.

<figtable id="multimodel">

<figure id="multimodel30_all">
<xr nolink id="multimodel30_all"/>
Multimodel in blue, X-Ray aspartoacylase structure in green. Some overall structural elements seem to be well-captured. However, the figure on the right shows that the binding site residues are again scattered over the entire sequence.
</figure>
<figure id="multimodel_binding">
<xr nolink id="multimodel_binding"/>
Residues involved in binding the substrate are coloured in red for X-Ray aspartoacylase (green), and in magenta for the multimodel (blue).
</figure>

</figtable>

Evaluation of Methods

2GU2

All three methods generated very accurate models, with RMSD values of 0.406 Å (Modeller), 0.412 Å (SwissModel) and 0.552 Å (I-Tasser). Based on the RMSD, Modeller generates the best model, whereas I-Tasser generates the least accurate one. The same ranking is reflected by the other scores (see <xr id="eval_2gu2_distri"/>). The TM-score and the GDT scores are very high as well, which supports the quality of the models.

A visual inspection of the models also shows their accuracy, as can be seen in <xr id="eval_2gu2_overall"/>. Here, all models are superposed with the reference structure 2O53. Only in some loop regions, and especially in the very flexible channel gating loop, are there some deviations from the crystal structure (see <xr id="eval_2gu2_loop"/>).


<figtable id="eval_2gu2">

<figure id="eval_2gu2_distri">
<xr nolink id="eval_2gu2_distri"/>
Distribution of the TM- and GDT Scores for the models of Modeller, SwissModel and I-tasser based on template 2GU2.
</figure>
<figure id="eval_2gu2_overall">
<xr nolink id="eval_2gu2_overall"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53. yellow=2O53, green=I-tasser, magenta=SwissModel, cyan=Modeller
</figure>
<figure id="eval_2gu2_loop">
<xr nolink id="eval_2gu2_loop"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53. The loop gating the entrance to the binding site (residues 158-164) is emphasized. yellow=2O53, green=I-tasser, magenta=SwissModel, cyan=Modeller
</figure>

</figtable>


When taking a closer look at the binding site, one can see that all models chose a different conformer for Y164 (see <xr id="eval_2gu2_y164"/>). This correlates with the different positioning of the channel gating loop, on which Y164 is located. For the other binding site residues, there are only minor deviations. Yet, the binding site residues in the I-tasser model clearly deviate most from the reference structure (see <xr id="eval_2gu2_r63"/> - <xr id="eval_2gu2_e178"/>). This is also reflected in the binding site RMSD, which is the highest for I-tasser (Modeller: 0.248, SwissModel: 0.255, I-tasser: 0.297).


<figtable id="eval_2gu2_bindingsite">

<figure id="eval_2gu2_r63">
<xr nolink id="eval_2gu2_r63"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on R63. green=2O53, yellow=I-tasser, magenta=SwissModel, cyan=Modeller
</figure>
<figure id="eval_2gu2_r71">
<xr nolink id="eval_2gu2_r71"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on R71. green=2O53, yellow=I-tasser, magenta=SwissModel, cyan=Modeller
</figure>
<figure id="eval_2gu2_r168">
<xr nolink id="eval_2gu2_r168"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on R168. green=2O53, yellow=I-tasser, magenta=SwissModel, cyan=Modeller
</figure>
<figure id="eval_2gu2_y164">
<xr nolink id="eval_2gu2_y164"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on Y164. green=2O53, yellow=I-tasser, magenta=SwissModel, cyan=Modeller
</figure>
<figure id="eval_2gu2_e178">
<xr nolink id="eval_2gu2_e178"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on E178. green=2O53, yellow=I-tasser, magenta=SwissModel, cyan=Modeller
</figure>

</figtable>

In conclusion, all homology models based on 2GU2 are very accurate. This matches the expectations, since the template shares over 80% of its residues with the target sequence. The differences between the models are very small. Yet, I-tasser shows the largest differences for the binding site residues, which is also reflected in all of the evaluation scores.

3NFZ

For this template from Mus musculus with 42% sequence identity, Modeller and SwissModel generated very accurate results. We have not yet received the results from iTasser. The RMSD values for both 3NFZ models are very low (Modeller: 0.775, SwissModel: 0.75), yet they are about 0.3 to 0.4 Å higher than for the models based on template 2GU2, which indicates slightly less accurate models.

As can be seen in <xr id="eval_3nfz_overall"/>, visually there are only some deviations in the loop regions. Again, as for the 2GU2 template, residues 158-164 forming this flexible channel entrance loop, show the largest deviation compared to the reference structure of Aspartoacylase.

In the binding site, the orientation of some residues differs from the orientation in the reference structure. Accordingly, the binding site RMSD is slightly higher (by about 0.15) than for the 2GU2 models, where there were almost no deviations.

This loss in quality of the models based on 3NFZ compared to the models based on 2GU2 is also reflected in the GDT_HA scores. The GDT_HA score is much lower for both 3NFZ models than for the 2GU2 models. In contrast, the TM-score for the models of both templates is about the same (average TM-score 2GU2: 0.978, 3NFZ: 0.965) and the GDT_TS score is only slightly lower for the 3NFZ models than for the 2GU2 models (average GDT_TS 2GU2: 0.960, 3NFZ: 0.897).
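The averages in this comparison can be recomputed from the score tables of the individual models. The following sketch does this for the TM-score, GDT_TS and GDT_HA values of the 2GU2 models (Modeller, SwissModel, I-Tasser) and of the 3NFZ models (Modeller, SwissModel). The values are copied from the score tables on this page; the dictionary layout and variable names are only chosen for illustration.

```python
from statistics import mean

# Scores copied from the score tables (order: Modeller, SwissModel, I-Tasser)
scores = {
    "2GU2": {
        "TM": [0.9815, 0.9788, 0.9747],
        "GDT_TS": [0.9669, 0.9652, 0.9470],
        "GDT_HA": [0.8982, 0.8957, 0.8278],
    },
    # I-Tasser results for 3NFZ were not available yet
    "3NFZ": {
        "TM": [0.9641, 0.9665],
        "GDT_TS": [0.8932, 0.9015],
        "GDT_HA": [0.7152, 0.7285],
    },
}

# Average every score per template
averages = {
    template: {name: mean(values) for name, values in per_score.items()}
    for template, per_score in scores.items()
}

for name in ("TM", "GDT_TS", "GDT_HA"):
    high = averages["2GU2"][name]
    mid = averages["3NFZ"][name]
    print(f"{name}: 2GU2 {high:.3f}, 3NFZ {mid:.3f}, drop {high - mid:.3f}")
```

The drop between the two templates is smallest for the TM-score and largest for GDT_HA, which is the pattern the paragraph describes.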


<figtable id="eval_3nfz">

<figure id="eval_3nfz_distri">
<xr nolink id="eval_3nfz_distri"/>
Distribution of the TM and GDT Scores for the models based on template 3NFZ.
</figure>
<figure id="eval_3nfz_overall">
<xr nolink id="eval_3nfz_overall"/>
Superposition of the Modeller and SwissModel models based on template 3NFZ with the Aspartoacylase reference structure. green=2O53, magenta=SwissModel, blue=Modeller
</figure>
<figure id="eval_3nfz_binding">
<xr nolink id="eval_3nfz_binding"/>
Superposition of the Modeller and SwissModel models with the Aspartoacylase reference structure 2O53. The orientation of the residues in the binding site differs in many cases from the original. green=2O53, magenta=SwissModel, blue=Modeller
</figure>

</figtable>

1KWM

For the moment, only the results of Modeller are available. SwissModel does not create a model at all, given the low similarity of template and target, and we are still waiting for the iTasser results. However, the Modeller and SwissModel results already indicate what one could expect: if the chosen template is not of good quality, homology modelling is bound to fail. See <xr id = "1kwm_modeller_full"/> for details and scores.

1YW4

The same as for 1KWM holds true for 1YW4 - bad template, bad results. See <xr id="1yw4_modeller_betasheets"/> for details.

General conclusions

Regarding the methods:

SwissModel and Modeller generated very similar models for the chosen templates. The scores indicate that the models from Modeller are marginally better than those from SwissModel. For iTasser we have not yet received all results, so any conclusion is preliminary. Overall, the iTasser models are similar to the models generated by SwissModel and Modeller. Yet, as the scores indicate as well, the iTasser models have larger deviations when it comes to residue orientations. This result is somewhat disappointing, since iTasser is considered to be the best method available.

Regarding the scores:

In our case, the RMSD values are a good measure of model quality, as is the more restrictive RMSD over the 6 Angstrom binding site residues. One should note that, while the models based on the bad templates (<30%) seem to have good RMSDs at first sight, these are only based on the actual common residues, which can go down to xxx. The TM-score does not really discriminate between the models based on an 80% template and those based on a 40% template. In contrast, the GDT scores discriminate better, especially the GDT_HA score.
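One way to see why the TM-score separates the 80% and 40% template models less clearly than GDT_HA is to look at the distance scale each score uses. The TM-score weights every residue by 1 / (1 + (d / d0)^2) with the length-dependent scale d0 = 1.24 · (L - 15)^(1/3) - 1.8 Å, whereas the tightest GDT_HA cutoff is a hard 0.5 Å. The sketch below assumes a fixed superposition (the real TM-score program optimises the superposition) and uses hypothetical uniform deviations for a 302-residue target. The function names are made up for illustration.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score in Angstrom."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition (assumption: no superposition search)."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def fraction_within(distances, cutoff, target_length):
    """Fraction of target residues within a hard distance cutoff (one GDT term)."""
    return sum(1 for d in distances if d <= cutoff) / target_length


# Hypothetical models of a 302-residue target:
# every C-alpha is off by 0.4 Angstrom in one model and by 0.8 Angstrom in the other
length = 302
close_model = [0.4] * length
looser_model = [0.8] * length

print("d0 for 302 residues:", tm_d0(length))
print("TM-score:", tm_score(close_model, length), "vs", tm_score(looser_model, length))
print("within 0.5 A:", fraction_within(close_model, 0.5, length),
      "vs", fraction_within(looser_model, 0.5, length))
```

With a d0 of more than 6 Å for a protein of this size, sub-Angstrom differences barely change the TM-score, while a hard 0.5 Å cutoff can switch from fully satisfied to not satisfied at all.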

References

<references/>