Canavan Task 4 - Homology based structure predictions

Protocol

Commands, source code and other methodological details are kept in the protocol.

Template Identification

Since we had not identified suitable homologues so far, we used HHPred and COMA to search for them. The results of both searches are summarized in <xr id="templates"/>.

With HHPred we received 42 hits using standard parameters. Changing the E-value threshold for MSA generation and the number of sequences shown per HMM did not result in more hits. There is at least one hit for each sequence identity category.

COMA yielded 22 results, of which only one structure has a sequence identity of more than 40%. Interestingly, COMA did not find the highest-ranked hit from the HHPred output (2GU2, 87%). In general, COMA generates longer alignments with correspondingly lower sequence identities. Running COMA with less restrictive E-values results in more diverse hits, e.g. several carboxypeptidases.

Bitto et al. (2007) <ref name="pnas_aspa_structure">Eduard Bitto, Craig A. Bingman, Gary E. Wesenberg, Jason G. McCoy, and George N. Phillips, Jr., Structure of aspartoacylase, the brain enzyme impaired in Canavan disease, Proc Natl Acad Sci U S A. 2007 January 9; 104(2): 456–461. </ref> state that the "N-terminal domain of aspartoacylase adopts a protein fold similar to that of zinc-dependent hydrolases related to carboxypeptidases A. The catalytic site of aspartoacylase shows close structural similarity to those of carboxypeptidases despite only 10–13% sequence identity between these proteins". It is therefore plausible to find several carboxypeptidases as hits with low sequence identity in the results of HHPred as well as of COMA.

We decided to use those templates that were found by both methods (see <xr id="templates"/>), plus the rat homologue found by HHPred.

	HHPRED					COMA
	PDB ID	Organism	Protein Name	Seq ID	Alignment length	PDB ID	Organism	Protein Name	Seq ID	Alignment length
Seq Id > 80 %	2GU2	Rattus norvegicus	ASPA protein	87%	306
Seq Id 40 - 80%	3NH4	Mus musculus	ASPA protein	43%	304	3NFZ	Mus musculus	Aspartoacylase-2	42%	300
Seq Id < 30%	1YW4	Chromobacterium violaceum	Succinylglutamate desuccinylase	15%	250	1YW4	Chromobacterium violaceum	Succinylglutamate desuccinylase	12%	331
	3CDX	Rhodobacter sphaeroides 2	Succinylglutamatedesuccinylase/ aspartoacylase	15%	251	3CDX	Rhodobacter sphaeroides 2	Succinylglutamatedesuccinylase/ aspartoacylase	12%	330
	2QJ8	Mesorhizobium loti	hydrolase	21%	261	2QJ8	Mesorhizobium loti	Mlr6093 protein	15%	314
	1KWM	Human	Procarboxypeptidase B	11%	192	3GLJ	Sus scrofa (Pig)	Carboxypeptidase B	8%	262

</figtable>

Comparison of Aspartoacylase Structures

To be able to assess the quality of the homology models, we briefly introduce the Aspartoacylase structure. The PDB contains several structures of human aspartoacylase.

Apo structure 2O53: resolution 2.7 Å, R-free 0.269, chains A, B
Holo structure 2O4H: resolution 2.7 Å, R-free 0.271, chains A, B, intermediate substrate analog N-phosphonomethyl-L-aspartate
Apo structure 2I3C: resolution 2.8 Å, R-free 0.243, chains A, B
Ensemble refinement 2Q51: resolution 2.8 Å, R-free 0.239, chains A, B

For 2O4H, PDBsum states: "Ligand matches this enzyme's product L-aspartate with similarity 69.23%".

Superposition of the four crystal structures results in low RMSD values. Visual inspection of the superposition also reveals hardly any differences (see <xr id="aspa_superpos"/>, <xr id="aspa_superpos_binding"/>). In 2Q51 the beta sheet formed by residues 218-223 and 299-306 is not displayed as a sheet in PyMOL, which means that its backbone angles deviate slightly from those required by the secondary structure definition of a beta sheet.

<xr nolink id="aspa_superpos"/>
Superposition of the four available Aspartoacylase crystal structures. Pink = 2O53, Green = 2O4H, Cyan = 2I3C, Yellow = 2Q51. Small deviations occur only in loop regions.

</figure>

<xr nolink id="aspa_superpos_binding"/>
Closeup of the binding site. The annotated binding site residues have the same orientation and conformation. Only R71 has a different orientation in 2O4H, since it interacts with the ligand that was cocrystallized in the structure. Green = 2O4H, Cyan = 2I3C, Yellow = 2Q51.

</figure>

</figtable>

Major differences occur only for residues 158-164, which form a loop involved in opening and closing the channel. The different conformations of this gating loop are emphasized in <xr id="aspa_superpos_loop"/>. 2O53 and 2O4H both represent the closed conformation (see <xr id="aspa_superpos_closed"/>), whereas 2I3C and 2Q51 represent the open conformation (see <xr id="aspa_superpos_open"/>).

<xr nolink id="aspa_superpos_loop"/>
The different orientations of the loop formed by residues 158-164. 2O53 and 2O4H show the protein in the closed conformation, whereas 2Q51 and 2I3C show the protein in the open conformation. 2O53 in pink, 2O4H in green, 2I3C in cyan and 2Q51 in yellow. Small deviations occur only in loop regions.

</figure>

<xr nolink id="aspa_superpos_closed"/>
Surface representation of 2O4H: entrance to binding site is closed.

</figure>

<xr nolink id="aspa_superpos_open"/>
Surface representation of 2Q51: entrance to binding site is open. The ligand in the binding site of 2O4H is shown in green.

</figure>

</figtable>

We decided to use 2O53 and 2O4H, since they are the two latest structures with the best resolution. Furthermore, they were solved by the same group, once with bound ligand and once in the apo form. As the reference structure we chose 2O53. The residues in and around the binding sites of 2O53 and 2O4H differ slightly. Because most templates do not have a cocrystallized ligand, the models based on these templates are expected to be more similar to the Aspartoacylase structure without bound ligand.

For taking a closer look at both crystal structures of Aspartoacylase, which might be helpful for the following analysis, please refer to Task 1 > Protein.

Modeller

Template > 80% Seq Id: 2GU2 (Rattus norvegicus)

2GU2 was crystallized as a dimer. We only used the monomer (chain A) as a template for modelling Aspartoacylase.

The two different alignment methods that Modeller provides yielded almost identical results (differences for only the first 6 residues, see protocol for details). We chose the align-2D alignment.

The model and the native structure can be superimposed with an RMSD of 0.371 Å. Visual inspection of the superposition of the model with the Aspartoacylase structure shows that there are only minor differences between the two structures (compare <xr id="2gu2_modeller_overall"/>). Some loop regions deviate from the original structure, for example the loop formed by residues 158-164. This is the loop involved in opening and closing the channel, which needs to be very flexible for this purpose. In the most important part, the binding site, the agreement is very good (compare <xr id="2gu2_modeller_binding"/>). Only Y164, which is situated on the flexible loop, has a completely different position in the model.

<xr nolink id="2gu2_modeller_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 2GU2 in blue. There are hardly any differences in the structure.

</figure>

<xr nolink id="2gu2_modeller_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. Except for Y164, these important residues have the same location and orientation.

</figure>

</figtable>

The Ramachandran plot (see: File:2gu2 model rama.pdf) for the model detects only one outlier residue: Asn-236. This suggests that the overall backbone geometry of the model is sound.

Scores

**<xr nolink id="modeller_2gu2_scores"/>** Different Scores for the model created by Modeller based on the template 2GU2
DOPE Score	Weighted Rmsd (302 Ca-atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (18 residues)
-36987.5	0.406	0.9815	0.8982	0.9669	0.248

</figtable>

Template >40%, <80% Seq Id: 3NFZ (Mus musculus)

The two different alignment methods provided by Modeller yielded completely identical results.

Again, the model and the native structure are very similar (compare <xr id="3nfz_modeller_overall"/>). Compared to the model based on template 2GU2, however, there are more deviations, especially in the loop regions. A closer look at the binding site reveals some small differences: R71, R168 and Y164 have different orientations (compare <xr id="3nfz_modeller_binding"/>).

The Ramachandran Plot (see File:3nfz model rama.pdf) for the model identifies 5 outlier residues (131 SER, 148 ALA, 161 SER, 174 PRO, 227 GLU), but still more than 98.4% of the residues lie in allowed regions of the plot.

<xr nolink id="3nfz_modeller_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 3NFZ in yellow. There are some differences for loop regions.

</figure>

<xr nolink id="3nfz_modeller_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. Some of these residues have different rotamers compared to the native structure.

</figure>

</figtable>

Scores

The generated model has a DOPE score of -34539.4, which is only slightly higher than for the 2GU2 template. The TM, GDT_HA and GDT_TS scores are also very high, yet lower than for the 87% sequence identity template 2GU2. Both GDT scores, and GDT_HA in particular, drop far more relative to the 2GU2 model than the TM-score, which decreases only slightly.
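The different sensitivity of these scores follows from their definitions. The following is a minimal sketch, assuming a fixed superposition and a made-up list of per-residue Cα distances (the real TM-score and GDT programs additionally search over many superpositions, which is not done here). The function names and the example distances are illustrative only and are not taken from our models.

```python
# Simplified score definitions for a FIXED superposition.
# The distances below are invented for illustration, not measured values.

def gdt(distances, cutoffs):
    """Average fraction of residues within each distance cutoff (in Angstrom)."""
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)

def gdt_ts(distances):
    # GDT_TS uses the cutoffs 1, 2, 4 and 8 Angstrom
    return gdt(distances, (1.0, 2.0, 4.0, 8.0))

def gdt_ha(distances):
    # GDT_HA ("high accuracy") uses the stricter cutoffs 0.5, 1, 2 and 4 Angstrom
    return gdt(distances, (0.5, 1.0, 2.0, 4.0))

def tm_score(distances, target_length):
    """TM-score term sum with the length-dependent scale d0."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Hypothetical model: most residues fit very well, some deviate by ~1 Angstrom.
example = [0.3] * 200 + [0.8] * 80 + [1.5] * 22   # 302 residues

print(f"GDT_TS   = {gdt_ts(example):.4f}")
print(f"GDT_HA   = {gdt_ha(example):.4f}")
print(f"TM-score = {tm_score(example, len(example)):.4f}")
```

In this toy case, deviations below 2 Å barely affect the TM-score (d0 is above 6 Å for a chain of about 300 residues) but are already penalized by the 0.5 Å and 1 Å cutoffs of GDT_HA, which matches the trend seen in the tables.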

**<xr nolink id="modeller_3nfz_scores"/>** Different Scores for the model created by Modeller based on the template 3NFZ
DOPE Score	Weighted Rmsd (302 Ca-atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (16 residues)
-34539.4	0.775	0.9641	0.7152	0.8932	0.370

</figtable>

Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)

<xr nolink id="hssp"/>
Schneider and Sander curve regarding safety of homology modelling. We are either at 12% seq ID and 331 alignment length (COMA), or 15% and 250 (HHPred).

</figure>

Protein 1YW4 is a succinylglutamate desuccinylase, i.e., it belongs to the same Pfam family as Aspartoacylase. Still, the sequence identity is only 15% at an alignment length of 250 residues (HHPred), and 12% at a length of 331 (COMA). Considering the HSSP curve as first presented by Schneider and Sander in 1990 (<xr id="hssp"/>), we are clearly in an unsafe region for homology modelling.
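As a rough numerical check of this statement, the sketch below uses the commonly cited closed form of the HSSP threshold, t(L) = 290.15 · L^(-0.562) for alignment lengths between 10 and 80 residues and a constant 24.8% for longer alignments. This formula is quoted from the literature as we recall it, not derived from our figure, and the function names are our own.

```python
def hssp_threshold(alignment_length):
    """Percent identity above which structural similarity is considered safe
    (commonly cited form of the HSSP curve; an assumption, see text)."""
    if alignment_length < 10:
        raise ValueError("the curve is not defined for alignments shorter than 10")
    if alignment_length <= 80:
        return 290.15 * alignment_length ** -0.562
    return 24.8

def is_safe(identity_percent, alignment_length):
    """True if the alignment lies on or above the HSSP curve."""
    return identity_percent >= hssp_threshold(alignment_length)

# Identity and alignment length of 1YW4 as reported by HHPred and COMA
hits_1yw4 = {"HHPred": (15, 250), "COMA": (12, 331)}

for method, (identity, length) in hits_1yw4.items():
    verdict = "safe" if is_safe(identity, length) else "unsafe"
    print(f"{method}: {identity}% over {length} residues, "
          f"threshold {hssp_threshold(length):.1f}% -> {verdict}")
```

Both alignments are long enough to fall on the flat part of the curve, so the identities of 12% and 15% are roughly 10 to 13 percentage points below the threshold.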

1YW4 was also crystallized as a dimer, and again, we only used the monomer (chain A) as a template for modelling Aspartoacylase. There were some minor differences in the alignment methods, and we chose the align-2D alignment to include secondary structure information.

A superposition of the model and the reference can be found in <xr id="1yw4_modeller_betasheets"/>. It is easy to see that the modelling failed. Even the apparent similarity, the buried beta sheets, cannot be taken as correct, since the sheets are formed by different sequence regions in the model and the reference. The binding site residues are not even located in the same region, but are scattered all over the structure (<xr id="1yw4_modeller_binding"/>).

<xr nolink id="1yw4_modeller_betasheets"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 1YW4 in blue. Buried beta sheets are coloured in yellow and orange.

</figure>

<xr nolink id="1yw4_modeller_binding"/>
Binding site residues are coloured in red. In the model (blue), they are scattered all over the structure.

</figure>

</figtable>

Scores

The failure of model building is also reflected in the scores. The DOPE score is about 10,000 points higher than for 2GU2, and the RMSD almost reaches 2 Å. Furthermore, the TM and GDT scores are very low.

**<xr nolink id="modeller_1yw4_scores"/>** Different Scores for the model created by Modeller based on the template 1YW4
DOPE Score	Weighted Rmsd (257 Ca-atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (16 residues)
-26201.18750	1.899	0.1760	0.0364	0.0704	no conserved binding site

</figtable>

Template < 30% Seq Id: 1KWM (Human)

1KWM is a procarboxypeptidase of type B. Since, as stated before, a strong structural relationship between carboxypeptidases and aspartoacylases has been reported, we wanted to find out about possible similarities between the two types of proteins. Therefore, we also performed homology modelling for 1KWM. It has a sequence identity of only 11% at an alignment length of 192 amino acids.

However, we found the similarities to be unremarkable: again, with a template so dissimilar to the target, homology modelling fails. See <xr id="1kwm_modeller_full"/> and <xr id= "1kwm_modeller_binding"/>.

We tried to improve the model by comparing the alignments calculated by Modeller to those of BLAST, and by inspecting the alignments and trying to place binding site residues of the ASPA sequence at the corresponding residues of 1KWM. However, the differences to BLAST were small, corresponding binding site residues were often difficult to identify, and including the new alignments in the prediction did not improve the results.

<xr nolink id="1kwm_modeller_full"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 1KWM in blue.

</figure>

<xr nolink id="1kwm_modeller_binding"/>
Aspartoacylase binding residues in red, model residues in magenta. The model residues are again scattered over the structure rather than located in the catalytic centre.

</figure>

</figtable>

Scores

Note that the weighted RMSD provided by SAP does not look too bad at first glance, but it is calculated for only 187 superimposed residues. The RMSD calculated by TM-score is 19.529 for 302 superimposed residues. As for the other low sequence identity template, the TM and GDT scores are very low.

**<xr nolink id="modeller_1kwm_scores"/>** Different Scores for the model created by Modeller based on the template 1KWM
DOPE Score	Weighted Rmsd (187 Ca-atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (16 residues)
-16439.97461	2.146	0.1885	0.0430	0.0745	no conserved binding site

</figtable>

SwissModel

Template > 80% Seq Id: 2GU2 (Rattus norvegicus)

SwissModel automatically identifies 2GU2 as a dimer and also creates a dimer model from the Aspartoacylase sequence using chain B of the 2GU2 structure:

"To build the complex the following chains of the complex has been additionally identified: 2gu2B"

SwissModel also included the Zinc ligand:

All the residues interacting with the ligand are completely conserved between model and template.
The RMSD between the interacting residues of model and template is lesser than two: 0.080

The model created by SwissModel is almost identical to the native structure. The secondary structure elements can be superposed very well, and there are some deviations only in the loop regions (compare <xr id="2gu2_sw_overall"/>). Once again, the channel gating loop formed by residues 158-164 shows the largest deviations. The agreement between the two binding sites is also very good. As in the Modeller model, Y164 is located on the flexible channel entrance loop and has a completely different position than in the reference crystal structure of Aspartoacylase. Arg71 has a slightly different orientation as well (compare <xr id="2gu2_sw_binding"/>).

<xr nolink id="2gu2_sw_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by SwissModel based on the template 2GU2 in blue. There are hardly any differences in the structure.

</figure>

<xr nolink id="2gu2_sw_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. Except for Arg71 and Y164, these important residues have about the same location and orientation.

</figure>

</figtable>

Scores

SwissModel provides its own SwissModel score, which is -30164.225 kJ/mol for this model. Both RMSD values are very low and only marginally higher than for the Modeller model based on 2GU2. The same holds true for the TM and GDT scores.

**<xr nolink id="sm_2gu2_scores"/>** Different Scores for the model created by SwissModel based on the template 2GU2
SwissModel Score	Weighted Rmsd (301 Ca - atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (18 residues)
-30164.225	0.412	0.9788	0.8957	0.9652	0.255

</figtable>

Template >40%, <80% Seq Id: 3NFZ (Mus musculus)

In this case, the model was created as a monomer, although the template is annotated as a dimer:

"The target and template sequences are too diverse (seqid: 42.295) to infer a conservation of the oligomeric state"

SwissModel does not include any ligands from the template for modelling either:

"CL321: The ligand is farther than 3 Angstroem from the template, so it is assumed that they are not interacting."
"Given the properties calculated previously, the ligand A.CL321 will not be included in the final model."

There are some deviations between the model and the native structure, especially in loop regions. Yet, the overall fold and shape are modelled correctly (see <xr id="3nfz_sw_overall"/>). A closer look at the binding site reveals some residues with altered orientations: Y164, E178 and R71 (see <xr id="3nfz_sw_binding"/>). In the 3NFZ model generated by Modeller, the deviations are similar (residues Y164, R71 and R168).

<xr nolink id="3nfz_sw_overall"/>
Superposition of Aspartoacylase (PDB:2O53) in green and the model created by SwissModel based on the template 3NFZ in yellow. There are some deviations between the structures, especially in loop regions.

</figure>

<xr nolink id="3nfz_sw_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. R71, R168, E178 have altered orientations in the model. Y164 even has a different location due to altered loop positioning.

</figure>

</figtable>

Scores

The total energy of the model is calculated to be -10093.196 kJ/mol, which is only about a third of the value obtained for the 2GU2 model (-30164.225 kJ/mol). For the models generated by Modeller, the difference in the DOPE score was far smaller (about -36990 vs. -34540). Yet, the RMSD values are almost identical to the Modeller results, and the TM and GDT scores are similar as well.
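The size of this drop can be made explicit with a short calculation on the values from the score tables. This is only an illustrative sketch: the helper name is our own, and the SwissModel energy and the DOPE score are different scoring functions, so only the ratios within one method are compared.

```python
# Energies/scores of the 2GU2- and 3NFZ-based models, taken from the score tables.
swissmodel_energy = {"2GU2": -30164.225, "3NFZ": -10093.196}   # kJ/mol
modeller_dope = {"2GU2": -36987.5, "3NFZ": -34539.4}           # DOPE score

def relative_magnitude(scores, model, reference):
    """Score of `model` as a fraction of the score of `reference`."""
    return scores[model] / scores[reference]

sm_ratio = relative_magnitude(swissmodel_energy, "3NFZ", "2GU2")
dope_ratio = relative_magnitude(modeller_dope, "3NFZ", "2GU2")

print(f"SwissModel: 3NFZ model reaches {sm_ratio:.1%} of the 2GU2 energy")
print(f"Modeller:   3NFZ model reaches {dope_ratio:.1%} of the 2GU2 DOPE score")
```

The SwissModel energy of the 3NFZ model is roughly one third of that of the 2GU2 model, whereas the DOPE score retains more than 90% of its magnitude.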

**<xr nolink id="sm_3nfz_scores"/>** Different Scores for the model created by SwissModel based on the template 3NFZ
SwissModel Score	Weighted Rmsd (302 Ca-atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (21 residues)
-10093.196	0.750	0.9665	0.7285	0.9015	0.414

</figtable>

Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)

With default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.

Template < 30% Seq Id: 1KWM (Human)

Again, with default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.

I-Tasser

Template > 80% Seq Id: 2GU2 (Rattus norvegicus)

I-Tasser modelled Aspartoacylase as a dimer, since the given template 2GU2 was crystallized as a dimer as well.

In general, the model looks very similar to the reference structure of Aspartoacylase, except for the loop regions already mentioned. A closer look at the binding site, however, shows that the orientation of all side chains is slightly different from the reference structure. This is in contrast to the models created by Modeller and SwissModel, where only some of the binding site residues deviate.

<xr nolink id="superposition_itasser_2gu2_aspa"/>
Superposition of Aspartoacylase (PDB:2O53) with N-terminal in green and C-terminal in blue, and the model created by SwissModel based on the template 1KWM in magenta. The 1KWM Human procarboxypeptidase B can be seen as the subset (N-terminal) of Aspartoacylase.

</figure>

<xr nolink id="itasser_2gu2_aspa2_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. There are slight deviations in orientation for all of the binding site residues when compared to the reference binding site residues of Aspartoacylase.

</figure>

</figtable>

Scores

The deviation of the binding site residues is also reflected in the RMSD values, which are higher for this model than for the other two models. The binding site RMSD is 0.297 compared to 0.248 and 0.255 for Modeller and SwissModel respectively, and the weighted RMSD is 0.55 compared to 0.41 for both Modeller and SwissModel. The TM-score and GDT_TS hardly reflect this loss in accuracy, whereas GDT_HA is noticeably lower (0.83 compared to 0.90).

**<xr nolink id="it_2gu2_scores"/>** Different Scores for the model created by I-Tasser based on the template 2GU2
C-Score	Weighted Rmsd (302 Ca - atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (18 residues)
1.81	0.552	0.9747	0.8278	0.9470	0.297

</figtable>

Template > 40%, <80% Seq Id: 3NFZ (Mus musculus)

I-Tasser generated a monomeric model that already shows some deviations from the reference structure 2O53 on visual inspection (see <xr id="superposition_itasser_3nfz_aspa"/>). Loop regions in particular deviate, and some secondary structure elements cannot be superposed without errors either.

A closer look at the binding site makes the deviations even more obvious (see <xr id="itasser_3nfz_aspa_binding"/>). Y164 is once again located far from the native Y164 position. This is due to the incorrect modelling of the channel closing loop, on which Y164 is located. The other binding site residues are aligned and positioned correctly, but have different side chain conformations. Thus the binding site already has a different shape than in the reference structure 2O53.

<xr nolink id="superposition_itasser_3nfz_aspa"/>
Superposition of Aspartoacylase (PDB:2O53) in green with the model generated by I-Tasser based on template 3NFZ. There are some larger deviations in loop regions. Yet, the overall structure with its secondary structure elements is the same.

</figure>

<xr nolink id="itasser_3nfz_aspa_binding"/>
Closeup of the binding site. The known residues involved in substrate and zinc binding are shown. There are major deviations in orientation for all of the binding site residues when compared to the reference binding site residues of Aspartoacylase.

</figure>

</figtable>

Scores

As for the rat template, I-Tasser generates worse results than Modeller and SwissModel. The weighted RMSD (0.92) is higher than for the Modeller (0.78) and SwissModel (0.75) models, as is the binding site RMSD (0.71 compared to 0.37 and 0.41 respectively). The same trend can be seen in the GDT_HA score, which is clearly smaller than for the models generated by the other two methods. GDT_TS and TM-score, in contrast, barely reflect this observation: both scores are similar to those of the other two models.

**<xr nolink id="it_3nfz_scores"/>** Different Scores for the model created by I-Tasser based on the template 3NFZ
C-Score	Weighted Rmsd (302 Ca - atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (21 residues)
1.55	0.915	0.9592	0.6714	0.8725	0.709

</figtable>

Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)

Given the low sequence similarity and the results of the methods applied before, we were surprised to find that the I-Tasser models managed to capture a large part of the secondary structure arrangement of an Aspartoacylase monomer, as can be seen in <xr id="1yw4_itasser_aspa"/>. We re-ran I-Tasser to make sure we had not forgotten to exclude helpful templates with >30% sequence identity, and again obtained the same, quite decent, models. A closeup of the binding site shows, however, that the exact positions of the residues involved in substrate binding are not modelled correctly.

<xr nolink id="1yw4_itasser_aspa"/>
Superposition of Aspartoacylase (PDB:2O53) in green with the model generated by I-tasser based on template 1YW4. Many overall secondary structure elements are modelled correctly.

</figure>

<xr nolink id="1yw4_itasser_aspa_binding"/>
Closeup of the binding site of Aspartoacylase. When it comes to exact position of binding residues, however, the model is not reliable.

</figure>

</figtable>

Scores

**<xr nolink id="it_1yw4_scores"/>** Different Scores for the model created by I-Tasser based on the template 1YW4
C-Score	Weighted Rmsd (209 Ca - atoms)	TM-Score	GDT_HA	GDT_TS	Binding Site RMSD
-0.35	4.685	0.6353	0.2404	0.4321	3.835

</figtable>

Template < 30% Seq Id: 1KWM (Human)

For the template 1KWM, the model is not as good as for 1YW4, but rather what could be expected at such a low sequence identity. <xr id = "1kwm_itasser_aspa"/> shows that some helices and a few beta sheets are in a similar position, but the overall correctness is not high enough to use it as a model. The binding site in <xr id ="1kwm_itasser_aspa_binding"/> is not captured, which is reflected in the high RMSD for the binding site residues (table below). However, the model is still much better than the one from Modeller, since some secondary structure elements are similar and the binding site residues are at least positioned close to the binding site rather than scattered all over the structure.

<xr nolink id="1kwm_itasser_aspa"/>
Superposition of Aspartoacylase (PDB:2O53) in green with the model generated by I-tasser based on template 1KWM. The models differ greatly from each other.

</figure>

<xr nolink id="1kwm_itasser_aspa_binding"/>
Closeup of the binding site of Aspartoacylase.

</figure>

</figtable>

Scores

**<xr nolink id="it_1kwm_scores"/>** Different Scores for the model created by I-Tasser based on the template 1KWM
C-Score	Weighted Rmsd (186 Ca - atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (21 residues)
-2.98	9.525	0.50	0.2136	0.3510	6.03

</figtable>

Modeller - Multimodel from templates with < 30% seq ID

We did not expect the combined model from the < 30% templates to improve the overall result. When first aligning the combined model to our reference structure, we were surprised to find that many structural elements were actually captured in the model (<xr id="multimodel30_all"/>). But looking at the binding residues (<xr id="multimodel_binding"/>), we found that the apparent similarities did not include the binding site residues and are therefore practically meaningless.

<xr nolink id="multimodel30_all"/>
Multimodel in blue, X-Ray aspartoacylase structure in green. Some overall structural elements seem to be well-captured. However, the figure on the right shows that the binding site residues are again scattered over the entire sequence.

</figure>

<xr nolink id="multimodel_binding"/>
Residues involved in binding the substrate are coloured in red for X-Ray aspartoacylase (green), and in magenta for the multimodel (blue).

</figure>

</figtable>

Editing the Alignments

For 2GU2 and 3NFZ, we did not perform manual alignment modifications since the models can hardly be improved: they differ from the reference mostly in loop and side chain orientations, which cannot be improved by changing the alignment.

We had higher hopes of achieving a usable model for 1KWM, since carboxypeptidases are reported to be structurally related to aspartoacylases. We therefore tried to improve the alignment manually, including a BLAST alignment of the two sequences and trying to map catalytic centre positions of Aspartoacylase to the carboxypeptidase sequence. However, when feeding this manual alignment to Modeller, the resulting model looked worse than before, having more loop regions. Of the scores, only the TM-score improved marginally (0.1960 compared to 0.1885), whereas the DOPE score and both GDT scores got worse, so manual alignment editing cannot be claimed to be a safe way to improve performance.

**<xr nolink id="1kwm_manual"/>** Different Scores for the model created by Modeller based on the template 1KWM and manually edited alignment
DOPE Score	Weighted Rmsd (234 Ca - atoms)	TM-Score	GDT_HA	GDT_TS	binding site RMSD (18 residues)
-6289.75244	5.259	0.1960	0.0331	0.0654	no conserved binding site

</figtable>

Jigsaw

We fed Jigsaw our 5 best results, i.e., the 2GU2 models from the three different methods, and the 3NFZ models from Modeller and SwissModel. Jigsaw provided 5 new models, among which the differences were small. The resulting model energies ranged from -438.03 to -431.33 kJ/mol, and the sequence coverage ranged from 313 amino acids (the complete Aspartoacylase sequence) down to 311 amino acids.

Since four of the 5 models were very similar, we chose only one of them for the pictures below for easier comparison, shown in blue. The fifth model differed only in loop regions, deviated more from the original Aspartoacylase than the other four, and is shown in yellow.

In the binding site, the orientations of Tyr164 and Arg168 were least correct. Note that Tyr164 is part of the flexible loop that allows entrance to the binding site.

<xr nolink id="jigsaw_all"/>
Superposition of two jigsaw models (blue and yellow) with the Aspartoacylase reference structure 2O53 (green).

</figure>

<xr nolink id="jigsaw_binding"/>
Binding site closeup. Colours as before.

</figure>

</figtable>

Evaluation of Methods

2GU2

All three methods generated very accurate models, with RMSD values of 0.406 (Modeller), 0.412 (SwissModel) and 0.552 (I-Tasser). Based on the RMSD, Modeller generates the best model and I-Tasser the least accurate one. The same ranking is reflected by the other scores (see <xr id="eval_2gu2_distri"/>). The TM-score and the GDT scores are very high as well, confirming the quality of the models.

A visual inspection of the models also shows their accuracy, as can be seen in <xr id="eval_2gu2_overall"/>. Here, all models are superposed with the reference structure 2O53. Deviations from the crystal structure occur only in some loop regions, especially in the very flexible channel gating loop (see <xr id="eval_2gu2_loop"/>).
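The ranking can be checked directly against the score tables of the three 2GU2-based models. The snippet below is a small sketch with our own variable and function names; it only sorts the values reported above and treats RMSD values as "lower is better" and TM/GDT scores as "higher is better".

```python
# Scores of the 2GU2-based models, copied from the score tables above.
scores_2gu2 = {
    "Modeller":   {"weighted_rmsd": 0.406, "binding_rmsd": 0.248,
                   "tm": 0.9815, "gdt_ha": 0.8982, "gdt_ts": 0.9669},
    "SwissModel": {"weighted_rmsd": 0.412, "binding_rmsd": 0.255,
                   "tm": 0.9788, "gdt_ha": 0.8957, "gdt_ts": 0.9652},
    "I-Tasser":   {"weighted_rmsd": 0.552, "binding_rmsd": 0.297,
                   "tm": 0.9747, "gdt_ha": 0.8278, "gdt_ts": 0.9470},
}

LOWER_IS_BETTER = {"weighted_rmsd", "binding_rmsd"}

def rank(scores, metric):
    """Return the method names ordered from best to worst for one metric."""
    best_is_high = metric not in LOWER_IS_BETTER
    return sorted(scores, key=lambda m: scores[m][metric], reverse=best_is_high)

metrics = ["weighted_rmsd", "binding_rmsd", "tm", "gdt_ha", "gdt_ts"]
for metric in metrics:
    print(f"{metric:14s} {' > '.join(rank(scores_2gu2, metric))}")
```

Every metric yields the same order, Modeller before SwissModel before I-Tasser, although the gap between Modeller and SwissModel is very small.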

<xr nolink id="eval_2gu2_distri"/>
Distribution of the TM- and GDT Scores for the models of Modeller, SwissModel and I-tasser based on template 2GU2.

</figure>

<xr nolink id="eval_2gu2_overall"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53. yellow = 2O53, green = I-tasser, magenta = SwissModel, cyan = Modeller

</figure>

<xr nolink id="eval_2gu2_loop"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53. The loop gating the entrance to the binding site (residues 158-164) is emphasized. yellow = 2O53, green = I-tasser, magenta = SwissModel, cyan = Modeller

</figure>

</figtable>

A closer look at the binding site shows that all models chose a different conformer for Y164 (see <xr id="eval_2gu2_y164"/>). This correlates with the different positioning of the channel gating loop, on which Y164 is located. For the other binding site residues, there are only minor deviations. Yet, the binding site residues in the I-Tasser model clearly deviate most from the reference structure (see <xr id="eval_2gu2_r63"/> - <xr id="eval_2gu2_e178"/>). This is also reflected in the binding site RMSD, which is highest for the I-Tasser model (Modeller: 0.248, SwissModel: 0.255, I-Tasser: 0.297).

<xr nolink id="eval_2gu2_r63"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on R63. green = 2O53, yellow = I-tasser, magenta = SwissModel, cyan = Modeller

</figure>

<xr nolink id="eval_2gu2_r71"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on R71. green = 2O53, yellow = I-tasser, magenta = SwissModel, cyan = Modeller

</figure>

<xr nolink id="eval_2gu2_r168"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on R168. green = 2O53, yellow = I-tasser, magenta = SwissModel, cyan = Modeller

</figure>

<xr nolink id="eval_2gu2_y164"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on Y164. green=2O53, yellow=I-tasser, magenta=SwissModel, cyan=Modeller

</figure>

<xr nolink id="eval_2gu2_e178"/>
Superposition of the three models with the Aspartoacylase reference structure 2O53 and focus on E178. green=2O53, yellow=I-tasser, magenta=SwissModel, cyan=Modeller

</figure>

</figtable>

In conclusion, all homology models generated from 2GU2 are very accurate. This matches the expectations, since the template has over 80% sequence identity to the target sequence. The differences between the models are very small. Yet, I-tasser has the largest differences for the binding site residues, which is also reflected in all of the evaluation scores.

3NFZ

For this template from Mus musculus with 40% sequence identity, Modeller and SwissModel generated very accurate results. I-tasser generated a model that overall shares all secondary structure elements with the reference structure but deviates in the details, such as the loops and residue orientations. The RMSD values for the 3NFZ models from Modeller and SwissModel are very low (Modeller: 0.775, SwissModel: 0.75), and the value for I-tasser is a bit higher (0.915). Yet, compared to the models based on template 2GU2, they are about 0.3 (respectively 0.5) higher, which indicates slightly less accurate models.

As can be seen in <xr id="eval_3nfz_overall"/>, visually there are only some deviations in the loop regions. As with the 2GU2 template, residues 158-164, which form the flexible channel entrance loop, show the largest deviation from the reference structure of Aspartoacylase.

In the binding site, the orientation of some residues differs from that in the reference structure. Accordingly, the binding site RMSD is slightly higher than for the 2GU2 models, where there were almost no deviations (about 0.15 higher, respectively 0.45 higher for the I-tasser model). For example, Y164 is positioned on the channel closing loop, which is incorrectly modelled in all three cases (see <xr id="eval_3nfz_binding"/>). Furthermore, I-tasser has a different rotamer for R71, and Modeller predicts a different orientation for R168 (see <xr id="eval_3nfz_binding2"/>).

This loss in quality of the models based on 3NFZ compared to the models based on 2GU2 is also reflected in the GDT_HA scores. The GDT_HA score is much lower for all 3NFZ models than for the 2GU2 models. In contrast, the TM-score is about the same for both templates (average TM-score 2GU2: 0.978, 3NFZ: 0.965), and the GDT_TS score is only slightly lower for the 3NFZ models than for the 2GU2 models (average GDT_TS 2GU2: 0.959, 3NFZ: 0.87).
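The different sensitivity of GDT_TS and GDT_HA can be illustrated with a simplified sketch. It assumes a single fixed superposition and one C-alpha distance per residue, whereas the actual GDT calculation searches over many superpositions. The cutoffs are the commonly used ones (1, 2, 4, 8 Å for GDT_TS and 0.5, 1, 2, 4 Å for GDT_HA). The distance lists are hypothetical and are not our measured values.

```python
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # Angstrom
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)  # Angstrom, stricter "high accuracy" variant


def gdt(distances, cutoffs):
    """Average fraction of residues within each distance cutoff (scale 0 to 1).

    Simplification: uses one fixed superposition instead of maximising
    the fraction separately for every cutoff.
    """
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)


# Hypothetical per-residue deviations for a close and a more distant template.
close_template = [0.3, 0.4, 0.4, 0.6, 0.9, 1.5]
distant_template = [0.8, 0.9, 1.2, 1.6, 1.9, 3.5]

ts_drop = gdt(close_template, GDT_TS_CUTOFFS) - gdt(distant_template, GDT_TS_CUTOFFS)
ha_drop = gdt(close_template, GDT_HA_CUTOFFS) - gdt(distant_template, GDT_HA_CUTOFFS)
print(f"GDT_TS drop: {ts_drop:.3f}, GDT_HA drop: {ha_drop:.3f}")
```

In this toy case most deviations stay below 2 Å, so the loose GDT_TS cutoffs still count nearly all residues, while the 0.5 Å and 1 Å cutoffs of GDT_HA register the loss of precision.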

<xr nolink id="eval_3nfz_distri"/>
Distribution of the TM and GDT Scores for the models based on template 3NFZ.

</figure>

<xr nolink id="eval_3nfz_overall"/>
Superposition of the Modeller and SwissModel models based on template 3NFZ with the Aspartoacylase reference structure. green=2O53, magenta=SwissModel, blue=Modeller

</figure>

<xr nolink id="eval_3nfz_binding"/>
Superposition of 3NFZ-based models with the Aspartoacylase reference structure 2O53. The orientation of the residues in the binding site differs in many cases from the original. green=2O53, magenta=SwissModel, blue=Modeller, yellow=I-tasser

</figure>

<xr nolink id="eval_3nfz_binding2"/>
Superposition of 3NFZ-based models with the Aspartoacylase reference structure 2O53. The orientation of the residues in the binding site differs in many cases from the original. green=2O53, magenta=SwissModel, blue=Modeller, yellow=I-tasser

</figure>

</figtable>

1KWM and 1YW4

For 1KWM and 1YW4, SwissModel does not create a model at all, given the low similarity of template and target.

The I-Tasser model for 1YW4 was very good given the low sequence identity, but it is problematic if exact simulations need to be done, because the binding site residues are not modelled correctly. The I-Tasser model for 1KWM was correct only at the rough level of overall secondary structure elements and otherwise showed major differences to the original.

So, the Modeller and SwissModel results already indicate what one could expect: if the chosen template is not of good quality, homology modelling is unreliable and not a safe tool to use.

General conclusions

Regarding the methods

SwissModel and Modeller generated very similar models for the chosen templates. The scores indicate that the models from Modeller are marginally better than those from SwissModel. For I-Tasser, we have not received all results yet, so any conclusion is preliminary. Overall, the I-Tasser models are similar to the models generated by SwissModel and Modeller. Yet, as the scores also indicate, the I-Tasser models deviate more in their residue orientations. This result is somewhat disappointing, since I-Tasser is considered to be the best method available.

Regarding the scores

In our case, the RMSD values are a good measure of model quality, as is the more restrictive RMSD of the 6 Å binding site residues. One should note that, while the models from the poor templates (<30% sequence identity) seem to have good RMSDs at first sight, these are computed only over the residues actually in common, which can be as few as 187 residues. The TM-score does not really discriminate between the models based on an 80% template and those based on a 40% template. In contrast, the GDT scores discriminate better, especially the GDT_HA score (see <xr id="comparison_scores_rat_mouse"/>).
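The caveat about RMSD over common residues can be illustrated with the TM-score formula, which normalises by the full target length rather than by the number of aligned residues. The sketch below is a simplification: it evaluates the formula for one fixed superposition, whereas the TM-score program maximises it over superpositions. The target length of 300 and the uniform 1 Å deviations are hypothetical values chosen for illustration; only the count of 187 common residues comes from the text.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: per-residue deviations (Angstrom) of the aligned residues only.
    target_length: full length of the target; must be well above 15 residues
    for the distance scale d0 to be positive.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


TARGET_LENGTH = 300  # hypothetical target length, for illustration only

# Both models deviate by 1 Angstrom at every aligned residue, so their RMSD
# over the aligned residues is identical (1.0), but the coverage differs.
full_coverage = tm_score([1.0] * TARGET_LENGTH, TARGET_LENGTH)
partial_coverage = tm_score([1.0] * 187, TARGET_LENGTH)
print(f"full: {full_coverage:.3f}, partial: {partial_coverage:.3f}")
```

Residues that are missing from the alignment contribute nothing to the sum, so a model covering only 187 residues is penalised by the TM-score even though its RMSD over those residues looks good.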

<xr nolink id="comparison_scores_rat_mouse"/>
Comparison of the average TM- and GDT Scores of the three generated models for the templates 2GU2 and 3NFZ. 3NFZ has lower average scores than 2GU2, yet the TM-scores are nearly identical and only the GDT scores differ significantly.

</figure>

</figtable>

I-tasser C-Score:

For 2GU2 and 3NFZ, the C-score gives a good indication of which model is better:

2GU2: 1.81
3NFZ: 1.55

Since 2 is defined as the score for an identical model and the C-score ranges down to -5, a score above 1.5 indicates a very accurate model, which is correct in our case.

SwissModel Score:

For 2GU2 and 3NFZ, the SwissModel score also gives a good indication of which model is better:

2GU2: -30164.225
3NFZ: -10093.196

The score for the 2GU2 model is almost three times as negative as the score for the 3NFZ model. This ratio seems disproportionate given the good quality of both models.

Modeller Dope Score:

The Dope Score also describes the quality of the models very well:

2GU2: -36987.5
3NFZ: -34539.4
1YW4: -26201.2
1KWM: -16440

The models based on 2GU2 and 3NFZ are both very accurate and thus have similarly good Dope Scores, whereas the other two models are of really bad quality, which is well reflected in the Dope Score.
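The relative spread of the two tool-specific scores can be checked with a few lines of arithmetic. The sketch below uses the scores listed above and assumes, as in the discussion, that a more negative value indicates a better model. The helper name `ratio_to_best` is hypothetical.

```python
# Scores as reported above, keyed by template.
swissmodel_scores = {"2GU2": -30164.225, "3NFZ": -10093.196}
dope_scores = {"2GU2": -36987.5, "3NFZ": -34539.4, "1YW4": -26201.2, "1KWM": -16440.0}


def ratio_to_best(scores):
    """Ratio of the best (most negative) score to each template's score."""
    best = min(scores.values())
    return {template: best / score for template, score in scores.items()}


swissmodel_ratios = ratio_to_best(swissmodel_scores)
dope_ratios = ratio_to_best(dope_scores)

for template in ("2GU2", "3NFZ"):
    print(
        f"{template}: SwissModel ratio {swissmodel_ratios[template]:.2f}, "
        f"Dope ratio {dope_ratios[template]:.2f}"
    )
```

For the same pair of templates, the SwissModel scores differ by a factor of almost three, while the Dope Scores differ by well under 10%, which fits the similar quality of the two models better.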

Regarding the strategy

It is known that homology modelling relies heavily on choosing a good template. It was therefore not very surprising to find that the models produced from templates with little similarity to our protein were sometimes plainly wrong, even though the I-Tasser models were surprisingly good. For the Modeller results, manual alignment did nothing to improve this, at least in our case and with the approach we tried.

References
