Canavan Task 4 - Homology based structure predictions
So you see, if you fall into a lion's pit, the reason the lion will tear you to pieces is not because it's hungry—be assured, zoo animals are amply fed—or because it's bloodthirsty, but because you've invaded its territory. As an aside, that is why a circus trainer must always enter the lion ring first, and in full sight of the lions. In doing so, he establishes that the ring is his territory, not theirs, a notion that he reinforces by shouting, by stomping about, by snapping his whip. The lions are impressed. Their disadvantage weighs heavily on them. Notice how they come in: mighty predators though they are, "kings of beasts", they crawl in with their tails low and they keep to the edges of the ring, which is always round so that they have nowhere to hide. They are in the presence of a strongly dominant male, a super-alpha male, and they must submit to his dominance rituals. So they open their jaws wide, they sit up, they jump through paper-covered hoops, they crawl through tubes, they walk backwards, they roll over. "He's a queer one," they think dimly. "Never seen a top lion like him. But he runs a good pride. The larder's always full and—let's be honest, mates—his antics keep us busy. Napping all the time does get a bit boring. At least we're not riding bicycles like the brown bears or catching flying plates like the chimps." From 'Life of Pi' by Yann Martel.
Protocol
Commands, source code and other methodological issues are kept in the protocol.
Template Identification
<figtable id="templates">
Category | HHPred: PDB ID | HHPred: Organism | HHPred: Protein name | HHPred: Seq Id (%) | HHPred: Alignment length | COMA: PDB ID | COMA: Organism | COMA: Protein name | COMA: Seq Id (%) | COMA: Alignment length |
Seq Id > 80% | 2GU2 | Rattus norvegicus | ASPA protein | 87 | 306 | | | | | |
Seq Id 40 - 80% | 3NH4 | Mus musculus | ASPA protein | 43 | 304 | 3NFZ | Mus musculus | Aspartoacylase-2 | 42 | 300 |
Seq Id < 30% | 1YW4 | Chromobacterium violaceum | Succinylglutamate desuccinylase | 15 | 250 | 1YW4 | Chromobacterium violaceum | Succinylglutamate desuccinylase | 12 | 331 |
Seq Id < 30% | 3CDX | Rhodobacter sphaeroides 2 | Succinylglutamate desuccinylase/aspartoacylase | 15 | 251 | 3CDX | Rhodobacter sphaeroides 2 | Succinylglutamate desuccinylase/aspartoacylase | 12 | 330 |
Seq Id < 30% | 2QJ8 | Mesorhizobium loti | hydrolase | 21 | 261 | 2QJ8 | Mesorhizobium loti | Mlr6093 protein | 15 | 314 |
Seq Id < 30% | 1KWM | Human | Procarboxypeptidase B | 11 | 192 | 3GLJ | Sus scrofa (pig) | Carboxypeptidase B | 8 | 262 |
</figtable>
With HHPred we obtained 42 hits using standard parameters. Changing the E-value threshold for MSA generation and the number of sequences shown per HMM did not result in more hits. There is at least one hit for each sequence identity category.
COMA yielded 22 results, of which only one structure has a sequence identity of more than 40%. Interestingly, COMA did not find the highest ranked hit from the HHPred output (2GU2, 87%). In general, COMA generates longer alignments with correspondingly lower sequence identities. Running COMA with less restrictive E-values results in more diverse hits, e.g. several carboxypeptidases.
Bitto et al. <ref name="pnas_aspa_structure">Eduard Bitto, Craig A. Bingman, Gary E. Wesenberg, Jason G. McCoy, and George N. Phillips, Jr., Structure of aspartoacylase, the brain enzyme impaired in Canavan disease, Proc Natl Acad Sci U S A. 2007 January 9; 104(2): 456–461. </ref> state that the "N-terminal domain of aspartoacylase adopts a protein fold similar to that of zinc-dependent hydrolases related to carboxypeptidases A. The catalytic site of aspartoacylase shows close structural similarity to those of carboxypeptidases despite only 10–13% sequence identity between these proteins". It is therefore reasonable to find several carboxypeptidases as hits with low sequence identity in the results of HHPred as well as of COMA.
We decided to use those templates that were found by both methods (see <xr id="templates"/>).
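The selection rule can be sketched in a few lines of Python. This is only an illustration: the two dictionaries are typed in by hand from the template table, and the helper names `identity_class` and `shared_templates` are hypothetical, not part of HHPred or COMA.

```python
# Hits copied by hand from the template table: PDB ID -> sequence identity (%).
hhpred = {"2GU2": 87, "3NH4": 43, "1YW4": 15, "3CDX": 15, "2QJ8": 21, "1KWM": 11}
coma = {"3NFZ": 42, "1YW4": 12, "3CDX": 12, "2QJ8": 15, "3GLJ": 8}


def identity_class(seq_id):
    """Assign a hit to one of the identity categories used in the table."""
    if seq_id > 80:
        return "> 80%"
    if seq_id >= 40:
        return "40 - 80%"
    if seq_id < 30:
        return "< 30%"
    return "30 - 40%"  # not a table category; kept so every value gets a label


def shared_templates(a, b):
    """Return the PDB IDs reported by both methods, sorted alphabetically."""
    return sorted(set(a) & set(b))


shared = shared_templates(hhpred, coma)
for pdb_id in shared:
    print(pdb_id, identity_class(hhpred[pdb_id]), identity_class(coma[pdb_id]))
```

With the table values, the intersection contains only templates from the lowest identity category; 2GU2 and 3NFZ were each reported by a single method.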
Comparison of Aspartoacylase Structures
In order to assess the quality of the homology models, we briefly introduce the Aspartoacylase structure. The PDB contains several structures of the human aspartoacylase.
- Apo-structure: 2O53: Resolution: 2.7 Å, R-free: 0.269, chains: A, B
- Holo-structure: 2O4H: Resolution: 2.7 Å, R-free: 0.271, chains: A, B, intermediate substrate analog: N-phosphonomethyl-L-aspartate
- Apo-structure: 2I3C: Resolution: 2.8 Å, R-free: 0.243, chains: A, B
- Ensemble refinement: 2Q51: Resolution: 2.8 Å, R-free: 0.239, chains: A, B
Superposition of the four crystal structures results in low RMSD values, and visual inspection of the superposition likewise reveals hardly any differences. In 2Q51 the beta sheet formed by residues 218-223 and 299-306 is not displayed as a sheet in PyMOL, which suggests slight deviations of the backbone angles from the standard definition of a beta sheet.
<figtable id="aspa_structures_superposed" >
<figure id="aspa_superpos"></figure> | <figure id="aspa_superpos_binding"></figure> |
</figtable>
Major differences occur only for residues 158-164, which form a loop involved in opening and closing the channel. 2O53 and 2O4H both represent the closed conformation, whereas 2I3C and 2Q51 represent the open conformation.
<figtable id="open_and_close">
<figure id="aspa_superpos_loop"></figure> | <figure id="aspa_superpos_closed"></figure> | <figure id="aspa_superpos_open"></figure> |
</figtable>
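A comparison such as the one above (low overall RMSD, but a deviating loop at residues 158-164) can be quantified by computing the RMSD once over all residues and once over the loop only. The following is a minimal sketch, assuming the two structures have already been superposed and that one coordinate per residue (for example the Cα atom) is available. The function names and the toy coordinates are invented for illustration; they are not taken from the crystal structures.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    The coordinates are assumed to be superposed already; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


def subset_rmsd(coords_a, coords_b, residue_ids, subset):
    """RMSD restricted to the residues whose number is in `subset`."""
    picked = [i for i, res in enumerate(residue_ids) if res in subset]
    return rmsd([coords_a[i] for i in picked], [coords_b[i] for i in picked])


# Toy example (invented numbers): three residues, only residue 158 moves by 3 A.
residues = [157, 158, 159]
closed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
opened = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 0.0)]

overall = rmsd(closed, opened)
loop_only = subset_rmsd(closed, opened, residues, {158})
print(round(overall, 3), round(loop_only, 3))
```

The overall value is diluted by the residues that do not move, which is why a local loop rearrangement can hide behind a low global RMSD.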
We decided to use 2O53 and 2O4H, since they are the two latest structures with the best resolution. Furthermore, they have been solved by the same group, once with a bound ligand and once in the apo form. As the reference structure we chose 2O53. When comparing the residues in and around the binding sites of 2O53 and 2O4H, there are slight differences. Because most templates do not have a co-crystallized ligand, the models based on these templates are expected to be more similar to the Aspartoacylase structure without bound ligand.
Analysis of the Aspartoacylase structures
<figtable id="aspa_structure">
<figure id="aspa_domains"></figure> |
Concerning <xr id="aspa_domains"/>:
|
|
<figure id="aspa_zinc"></figure> | <figure id="aspa_zinc_ligplot"></figure> |
Concerning <xr id="aspa_zinc"/>:
|
<figure id="aspa_binding"></figure> | <figure id="aspa_ligplot"></figure> |
Concerning <xr id="aspa_binding"/>:
|
</figtable>
Modeller
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
2GU2 was crystallized as a dimer. We only used the monomer (chain A) as a template for modelling Aspartoacylase.
The two alignment methods that Modeller provides yielded almost identical results (differences only in the first 6 residues, see the protocol for details). We chose the align-2D alignment.
The model and the native structure can be superimposed with an RMSD of 0.371 Å. Visual inspection of the superposition of the model with the Aspartoacylase structure shows that there are only minor differences between the two structures (compare <xr id="2gu2_modeller_overall"/>). Some loop regions deviate from the original structure, but in the most important part, the binding site, the agreement is very good (compare <xr id="2gu2_modeller_binding"/>).
Filename                  molpdf        DOPE score     GA341 score
------------------------------------------------------------------
p45381.B99990001.pdb      1640.80847    -36987.52734   1.00000
<figtable id="2gu2_modeller">
<figure id="2gu2_modeller_overall"></figure> | <figure id="2gu2_modeller_binding"></figure> |
</figtable>
The Ramachandran plot (see: File:2gu2 model rama.pdf) for the model shows only one outlier residue, Asn-236. The overall backbone geometry of the model is therefore plausible.
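A Ramachandran plot is a scatter plot of the backbone dihedral angles phi and psi of each residue. As a minimal sketch of the underlying geometry (not the procedure of whichever tool produced the plot), the signed dihedral angle of four atoms can be computed as follows. The function name `dihedral` and the test points are chosen for illustration only.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# For residue i: phi = dihedral(C[i-1], N[i], CA[i], C[i])
#                psi = dihedral(N[i], CA[i], C[i], N[i+1])
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(angle)
```

Whether a given (phi, psi) pair counts as an outlier depends on the region definitions of the validation tool, which are not reproduced here.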
Scores
<figtable id="modeller_2gu2_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
-36987.5 | 0.406 | 0.9815 | 0.8982 | 0.9669 | 0.248 |
</figtable>
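For orientation, GDT_TS and GDT_HA both average the fraction of residues that lie within a set of distance cutoffs of the reference structure; GDT_HA uses the stricter cutoffs and can therefore never exceed GDT_TS. The sketch below assumes a single fixed superposition and a precomputed list of per-residue Cα distances, whereas the actual GDT procedure searches over many superpositions, so it should be read as a simplified approximation. The distances in the example are invented.

```python
def gdt(distances, cutoffs):
    """Mean, over all cutoffs, of the fraction of residues within the cutoff."""
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances):
    """GDT total score: cutoffs of 1, 2, 4 and 8 Angstrom."""
    return gdt(distances, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances):
    """GDT high accuracy: cutoffs of 0.5, 1, 2 and 4 Angstrom."""
    return gdt(distances, (0.5, 1.0, 2.0, 4.0))


# Invented per-residue distances (in Angstrom) for a five-residue toy model.
toy = [0.3, 0.8, 1.5, 3.0, 9.0]
ts = gdt_ts(toy)
ha = gdt_ha(toy)
print(ts, ha)
```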
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
The two different alignment methods provided by Modeller yielded completely identical results.
The generated model has a DOPE score of -34539.4, which is only slightly higher (that is, slightly less favourable) than the -36987.5 obtained with the 2GU2 template.
Filename                  molpdf        DOPE score     GA341 score
------------------------------------------------------------------
p45381.B99990001.pdb      1969.36560    -34539.44141   1.00000
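The two Modeller summaries (for the 2GU2-based and the 3NFZ-based model) can be compared programmatically. The snippet below is a small sketch that parses the summary rows and ranks the models by DOPE score, where lower means better. The template labels and the helper names `parse_row` and `rank_by_dope` are added here for illustration and are not part of the Modeller output.

```python
# Summary rows copied from the two Modeller runs; the template labels are
# added by hand because both runs write the same model file name.
SUMMARIES = {
    "2GU2": "p45381.B99990001.pdb 1640.80847 -36987.52734 1.00000",
    "3NFZ": "p45381.B99990001.pdb 1969.36560 -34539.44141 1.00000",
}


def parse_row(row):
    """Split one summary row into file name, molpdf, DOPE and GA341."""
    name, molpdf, dope, ga341 = row.split()
    return {"file": name, "molpdf": float(molpdf),
            "dope": float(dope), "ga341": float(ga341)}


def rank_by_dope(summaries):
    """Return the template labels ordered from lowest (best) to highest DOPE."""
    parsed = {label: parse_row(row) for label, row in summaries.items()}
    return sorted(parsed, key=lambda label: parsed[label]["dope"])


ranking = rank_by_dope(SUMMARIES)
print(ranking)
```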
The Aspartoacylase model and the native Aspartoacylase structure can be superimposed with an RMSD of 0.938 Å. Again, the model and the native structure are very similar (compare <xr id="3nfz_modeller_overall"/>). Yet, compared to the model based on template 2GU2, there are more deviations, especially in the loop regions. A closer look at the binding site reveals some small differences: Arg71, Arg168 and Arg178 adopt different orientations (compare <xr id="3nfz_modeller_binding"/>).
<figtable id="3nfz_modeller">
<figure id="3nfz_modeller_overall"></figure> | <figure id="3nfz_modeller_binding"></figure> |
</figtable>
The Ramachandran plot (see File:3nfz model rama.pdf) for the model identifies 5 outlier residues; still, more than 98.4% of the residues lie in allowed regions of the plot:
- 131 SER
- 148 ALA
- 161 SER
- 174 PRO
- 227 GLU
Scores
<figtable id="modeller_3nfz_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (16 residues) |
-34539.4 | 0.775 | 0.9641 | 0.7152 | 0.8932 | 0.370 |
</figtable>
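The TM-score reported in the score tables weights each aligned residue pair by 1 / (1 + (d / d0)^2) and normalises by the target length, with a length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below evaluates this formula for one fixed superposition; the TM-score program itself additionally searches for the superposition that maximises the score, so this is a simplified illustration. The function names are hypothetical and the inputs are toy values.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 is only evaluated here for targets longer than 15 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` holds the Ca-Ca distance of every aligned residue pair;
    residues of the target that are not aligned contribute nothing.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy checks for a 302-residue target (the number of Ca atoms in the table).
perfect = tm_score([0.0] * 302, 302)
half_covered = tm_score([0.0] * 151, 302)
print(round(tm_d0(302), 2), perfect, half_covered)
```

Because the sum is divided by the full target length, a model that covers only part of the target is penalised even if the covered part fits perfectly.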
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
<figure id="hssp">
</figure>
Protein 1YW4 is a succinylglutamate desuccinylase, i.e., it belongs to the same Pfam family as Aspartoacylase. Still, the sequence identity is only 15% over an alignment length of 250 residues (HHPred) and 12% over 331 residues (COMA). Considering the HSSP curve first presented by Sander and Schneider in 1991 (<xr id="hssp"/>), we are clearly in an unsafe region for homology modelling.
1YW4 was also crystallized as a dimer, and again we only used the monomer (chain A) as a template for modelling Aspartoacylase. There were some minor differences between the alignment methods, and we chose the align-2D alignment to include secondary structure information.
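The HSSP criterion can be written as a length-dependent identity threshold. The sketch below uses the form usually quoted for the original curve, t(L) = 290.15 · L^(-0.562) percent for alignment lengths between 10 and 80 residues and a constant 24.8% for longer alignments. These constants are stated here as an assumption and should be checked against the original publication before being relied on; the function names are hypothetical.

```python
def hssp_threshold(length):
    """Identity threshold (%) above which structural similarity is considered safe.

    Assumed form of the HSSP curve: 290.15 * L**-0.562 for 10 <= L <= 80,
    constant 24.8 for longer alignments.
    """
    if length < 10:
        raise ValueError("the curve is not defined for alignments shorter than 10 residues")
    if length <= 80:
        return 290.15 * length ** -0.562
    return 24.8


def is_safe(seq_id_percent, length):
    """True if the alignment lies on or above the threshold curve."""
    return seq_id_percent >= hssp_threshold(length)


# Alignment lengths and identities taken from the template table.
print(is_safe(15, 250))  # 1YW4, HHPred
print(is_safe(12, 331))  # 1YW4, COMA
print(is_safe(87, 306))  # 2GU2, HHPred
```

Under these assumed constants both 1YW4 alignments fall well below the threshold, while the 2GU2 alignment lies far above it.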
SwissModel
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
Since the template is a homodimer, SwissModel also creates a dimer model from the Aspartoacylase sequence:
- 2gu2 is annotated as DIMER
- The quaternary structure of the target can be assumed to be identical
- To build the complex the following chains of the complex has been additionally identified: 2gu2B
SwissModel also included the Zinc ligand:
- All the residues interacting with the ligand are completely conserved between model and template.
- The RMSD between the interacting residues of model and template is lesser than two: 0.080
The total energy of the model is calculated to be -30164.225 kJ/mol.
The model created by SwissModel is almost identical to the native structure. The secondary structure elements can be superposed very well and only in the loop regions there are some deviations (compare <xr id="2gu2_sw_overall"/>). The RMSD for the superposition of model and original structure is 0.383. The agreement is even better in the binding site, where only Arg71 has a different orientation compared to the crystal structure of the human Aspartoacylase (compare <xr id="2gu2_sw_binding"/>).
<figtable id="2gu2_sw">
<figure id="2gu2_sw_overall"></figure> | <figure id="2gu2_sw_binding"></figure> |
</figtable>
Scores
<figtable id="sm_2gu2_scores">
SwissModel Score | Weighted Rmsd (301 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
-30164.225 | 0.412 | 0.9788 | 0.8957 | 0.9652 | 0.255 |
</figtable>
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
In this case, the model was created as a monomer, although the template is annotated as a dimer.
- The target and template sequences are too diverse (seqid: 42.295) to infer a conservation of the oligomeric state
SwissModel does not include any ligands from the template for modelling.
- CL321: The ligand is farther than 3 Angstroem from the template, so it is assumed that they are not interacting.
- Given the properties calculated previously, the ligand A.CL321 will not be included in the final model.
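The total energy of the model is calculated to be -10093.196 kJ/mol, which is only about one third of the magnitude obtained for the 2GU2 model (-30164.225 kJ/mol), i.e. the energy is considerably higher. Note, however, that the 2GU2 model was built as a dimer and this model as a monomer, so the two totals are not directly comparable.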
<figtable id="3nfz_sw">
<figure id="3nfz_sw_overall"></figure> | <figure id="3nfz_sw_binding"></figure> |
</figtable>
Scores
<figtable id="sm_3nfz_scores">
SwissModel Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (21 residues) |
-10093.196 | 0.750 | 0.9665 | 0.7285 | 0.9015 | 0.414 |
</figtable>
I-Tasser
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
2GU2 was crystallized as a dimer. We only used the monomer (chain A) as a template for modelling Aspartoacylase.
Scores
<figtable id="it_2gu2_scores">
C-Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
1.81 | 0.552 | 0.9747 | 0.8278 | 0.9470 |
</figtable>