Difference between revisions of "Canavan Task 4 - Homology based structure predictions"
Revision as of 15:24, 2 June 2012
Protocol
Commands, source code and other methodological issues are kept in the protocol.
Template Identification
<figtable id="templates">
 | HHPred | | | | | COMA | | | | |
Sequence identity range | PDB ID | Organism | Protein Name | Seq ID | Alignment length | PDB ID | Organism | Protein Name | Seq ID | Alignment length |
Seq Id > 80% | 2GU2 | Rattus norvegicus | ASPA protein | 87% | 306 | | | | | |
Seq Id 40 - 80% | 3NH4 | Mus musculus | ASPA protein | 43% | 304 | 3NFZ | Mus musculus | Aspartoacylase-2 | 42% | 300 |
Seq Id < 30% | 1YW4 | Chromobacterium violaceum | Succinylglutamate desuccinylase | 15% | 250 | 1YW4 | Chromobacterium violaceum | Succinylglutamate desuccinylase | 12% | 331 |
Seq Id < 30% | 3CDX | Rhodobacter sphaeroides 2 | Succinylglutamate desuccinylase/aspartoacylase | 15% | 251 | 3CDX | Rhodobacter sphaeroides 2 | Succinylglutamate desuccinylase/aspartoacylase | 12% | 330 |
Seq Id < 30% | 2QJ8 | Mesorhizobium loti | hydrolase | 21% | 261 | 2QJ8 | Mesorhizobium loti | Mlr6093 protein | 15% | 314 |
Seq Id < 30% | 1KWM | Human | Procarboxypeptidase B | 11% | 192 | 3GLJ | Sus scrofa (pig) | Carboxypeptidase B | 8% | 262 |
</figtable>
With HHPred we obtained 42 hits using standard parameters. Changing the E-value threshold for MSA generation and the number of sequences shown per HMM did not result in more hits. There is at least one hit for each sequence identity category.
COMA yielded 22 results, of which only one structure has a sequence identity of more than 40%. Interestingly, COMA did not find the highest-ranked hit from the HHPred output (2GU2, 87%). In general, COMA generates longer alignments with correspondingly lower sequence identities. Running COMA with less restrictive E-values results in more diverse hits, e.g. several carboxypeptidases.
It is stated in <ref name="pnas_aspa_structure">Eduard Bitto, Craig A. Bingman, Gary E. Wesenberg, Jason G. McCoy, and George N. Phillips, Jr., Structure of aspartoacylase, the brain enzyme impaired in Canavan disease, Proc Natl Acad Sci U S A. 2007 January 9; 104(2): 456–461. </ref> that the "N-terminal domain of aspartoacylase adopts a protein fold similar to that of zinc-dependent hydrolases related to carboxypeptidases A. The catalytic site of aspartoacylase shows close structural similarity to those of carboxypeptidases despite only 10–13% sequence identity between these proteins". It is therefore reasonable to find several carboxypeptidases as hits with low sequence identity in the results of both HHPred and COMA.
We decided to use those templates that were found by both methods (see <xr id="templates"/>).
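The selection of templates found by both methods can be sketched as a simple set intersection. This is only an illustration: the PDB IDs are the ones listed in the template table, not the complete HHPred and COMA hit lists, and the helper name `shared_templates` is hypothetical.

```python
# PDB IDs copied from the template table (HHPred and COMA columns).
# The full hit lists (42 and 22 hits) are not reproduced here.
hhpred_hits = {"2GU2", "3NH4", "1YW4", "3CDX", "2QJ8", "1KWM"}
coma_hits = {"3NFZ", "1YW4", "3CDX", "2QJ8", "3GLJ"}


def shared_templates(a, b):
    """Return the PDB IDs reported by both searches, sorted for stable output."""
    # Upper-case the IDs so that e.g. "3glj" and "3GLJ" compare equal.
    return sorted({x.upper() for x in a} & {x.upper() for x in b})


common = shared_templates(hhpred_hits, coma_hits)
print(common)
```

Because only the tabulated rows are used, the result covers the low-identity templates that appear in both columns of the table.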
Comparison of Aspartoacylase Structures
In order to assess the quality of the homology models, we briefly introduce the Aspartoacylase structure. The PDB contains several structures of human aspartoacylase.
- Apo-structure: 2O53: Resolution: 2.7 Å, R-free: 0.269, chains: A, B
- Holo-structure: 2O4H: Resolution: 2.7 Å, R-free: 0.271, chains: A, B, intermediate substrate analog: N-phosphonomethyl-L-aspartate
- Apo-structure: 2I3C: Resolution: 2.8 Å, R-free: 0.243, chains: A, B
- Ensemble Refinement 2Q51: Resolution: 2.8 Å, R-free: 0.239, chains: A, B
Superposition of the four crystal structures results in low RMSD values, and visual inspection of the superposition reveals hardly any differences. In 2Q51 the beta sheet formed by residues 218-223 and 299-306 is not displayed as a sheet in PyMOL, which indicates slight deviations of the backbone angles from the standard definition of a beta sheet.
<figtable id="aspa_structures_superposed" >
<figure id="aspa_superpos"></figure> | <figure id="aspa_superpos_binding"></figure> |
</figtable>
Major differences occur only for residues 158-164, which form a loop that is involved in opening and closing the channel. 2O53 and 2O4H both represent the closed conformation, whereas 2I3C and 2Q51 represent the open conformation.
<figtable id="open_and_close">
<figure id="aspa_superpos_loop"></figure> | <figure id="aspa_superpos_closed"></figure> | <figure id="aspa_superpos_open"></figure> |
</figtable>
We decided to use 2O53 and 2O4H, since they are the two latest structures with the best resolution. Furthermore, they have been solved by the same group, once with bound ligand and once in the apo form. As the reference structure we chose 2O53. When comparing the residues in and around the binding sites of 2O53 and 2O4H, there are slight differences. Because most templates do not have a co-crystallized ligand, the models based on these templates are expected to be more similar to the Aspartoacylase structure without bound ligand.
Analysis of the Aspartoacylase structures
<figtable id="aspa_structure">
<figure id="aspa_domains"></figure> |
Concerning <xr id="aspa_domains"/>:
|
|
<figure id="aspa_zinc"></figure> | <figure id="aspa_zinc_ligplot"></figure> |
Concerning <xr id="aspa_zinc"/>:
|
<figure id="aspa_binding"></figure> | <figure id="aspa_ligplot"></figure> |
Concerning <xr id="aspa_binding"/>:
|
</figtable>
Modeller
Template > 80% Seq Id: 2GU2 (Rattus Norvegicus)
2GU2 was crystallized as a dimer. We used only the monomer (chain A) as a template for modelling Aspartoacylase.
The two alignment methods that Modeller provides yielded almost identical results (they differ only in the first 6 residues; see the protocol for details). We chose the align-2D alignment.
The model and the native structure can be superimposed with an RMSD of 0.371. Visual inspection of the superposition of the model with the Aspartoacylase structure shows that there are only minor differences between the two structures (compare <xr id="2gu2_modeller_overall"/>). Some loop regions deviate from the original structure, for example the loop formed by residues 158-164. This is the loop involved in opening and closing the channel, which needs to be very flexible for this purpose. In the most important part, the binding site, the agreement is very good (compare <xr id="2gu2_modeller_binding"/>). Only Y164, which is situated on the flexible loop, has a completely different position in the model.
Filename                  molpdf        DOPE score      GA341 score
----------------------------------------------------------------------
p45381.B99990001.pdb      1640.80847    -36987.52734    1.00000
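When several models are generated, the summary above can be read programmatically to pick the model with the lowest (most favourable) DOPE score. The following is a minimal sketch that only parses text in the layout shown above; the function name `parse_modeller_summary` is hypothetical and not part of Modeller.

```python
# Summary text in the layout shown above (values copied from this page).
summary = """\
Filename                  molpdf        DOPE score      GA341 score
----------------------------------------------------------------------
p45381.B99990001.pdb      1640.80847    -36987.52734    1.00000
"""


def parse_modeller_summary(text):
    """Parse model rows: a .pdb filename followed by molpdf, DOPE and GA341."""
    models = []
    for line in text.splitlines():
        parts = line.split()
        # Model rows have exactly four fields and start with a .pdb filename;
        # the header and the separator line do not match this pattern.
        if len(parts) == 4 and parts[0].endswith(".pdb"):
            name, molpdf, dope, ga341 = parts
            models.append({
                "file": name,
                "molpdf": float(molpdf),
                "dope": float(dope),
                "ga341": float(ga341),
            })
    return models


models = parse_modeller_summary(summary)
best = min(models, key=lambda m: m["dope"])  # lower DOPE = more favourable
print(best["file"], best["dope"])
```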
<figtable id="2gu2_modeller">
<figure id="2gu2_modeller_overall"></figure> | <figure id="2gu2_modeller_binding"></figure> |
</figtable>
The Ramachandran plot (see: File:2gu2 model rama.pdf) for the model shows only one outlier residue, Asn-236. The overall backbone geometry of the model is therefore plausible.
Scores
<figtable id="modeller_2gu2_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
-36987.5 | 0.406 | 0.9815 | 0.8982 | 0.9669 | 0.248 |
</figtable>
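For orientation, the superposition-based scores in the table can be sketched from per-residue Cα distances between model and reference. This is a simplified illustration under stated assumptions: the distances are taken from one fixed superposition, whereas the actual TM-score and GDT programs search for the superposition that maximises the score, and the length check in `tm_score` is a choice made for this sketch. All function names are hypothetical. The sketch also shows why the GDT values are fractions between 0 and 1 and why GDT_HA can never exceed GDT_TS.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over per-residue C-alpha distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for one fixed superposition; d0 depends on the target length."""
    if target_length < 22:
        # Choice of this sketch: below this length the d0 formula is not positive.
        raise ValueError("d0 formula is only used here for lengths >= 22")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt(distances, target_length, cutoffs):
    """Mean fraction of residues within each distance cutoff (range 0..1)."""
    fractions = [sum(d <= c for d in distances) / target_length for c in cutoffs]
    return sum(fractions) / len(fractions)


def gdt_ts(distances, target_length):
    """GDT_TS uses cutoffs of 1, 2, 4 and 8 Angstrom."""
    return gdt(distances, target_length, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances, target_length):
    """GDT_HA uses the stricter cutoffs of 0.5, 1, 2 and 4 Angstrom."""
    return gdt(distances, target_length, (0.5, 1.0, 2.0, 4.0))


# Hypothetical distances for a five-residue toy example, not data from this page.
toy = [0.3, 0.8, 1.5, 3.0, 9.0]
print(round(rmsd(toy), 3), gdt_ts(toy, 5), gdt_ha(toy, 5))
```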
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
The two different alignment methods provided by Modeller yielded completely identical results.
The generated model has a DOPE score of -34539.4, which is only slightly higher (less favourable) than for the model based on the 2GU2 template.
Filename                  molpdf        DOPE score      GA341 score
----------------------------------------------------------------------
p45381.B99990001.pdb      1969.36560    -34539.44141    1.00000
The Aspartoacylase model and the native Aspartoacylase structure can be superimposed with an RMSD of 0.938. Again, the model and the native structure are very similar (compare <xr id="3nfz_modeller_overall"/>). Yet, compared to the model based on template 2GU2, there are more deviations, especially in the loop regions. A closer look at the binding site reveals some small differences: Arg71, Arg168 and Arg178 have different orientations (compare <xr id="3nfz_modeller_binding"/>).
<figtable id="3nfz_modeller">
<figure id="3nfz_modeller_overall"></figure> | <figure id="3nfz_modeller_binding"></figure> |
</figtable>
The Ramachandran plot (see File:3nfz model rama.pdf) for the model identifies 5 outlier residues, but more than 98.4% of the residues still lie in allowed regions of the plot:
- 131 SER
- 148 ALA
- 161 SER
- 174 PRO
- 227 GLU
Scores
<figtable id="modeller_3nfz_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (16 residues) |
-34539.4 | 0.775 | 0.9641 | 0.7152 | 0.8932 | 0.370 |
</figtable>
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
<figure id="hssp">
</figure>
Protein 1YW4 is a succinylglutamate desuccinylase, i.e., it belongs to the same Pfam family as Aspartoacylase. Still, the sequence identity is only 15% over an alignment length of 250 residues (HHPred) and 12% over 331 residues (COMA). Considering the HSSP curve as first presented by Schneider and Sander in 1990 (<xr id="hssp"/>), we are clearly in an unsafe region for homology modelling.
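The position of these alignments relative to the HSSP curve can be checked with a few lines of code. The sketch below assumes the commonly quoted form of the threshold, 290.15 · L^(-0.562) percent identity for alignments of 10 to 80 residues and a constant 24.8% for longer alignments; this formula is not given on this page and should be verified against the original publication. The function name `hssp_threshold` is hypothetical.

```python
def hssp_threshold(alignment_length):
    """Percent identity above which structural similarity is considered safe.

    Assumed form of the HSSP curve: 290.15 * L**-0.562 for alignments of
    10 to 80 residues, and a constant 24.8 % for longer alignments.
    """
    if alignment_length < 10:
        raise ValueError("curve is not defined for alignments shorter than 10")
    if alignment_length <= 80:
        return 290.15 * alignment_length ** -0.562
    return 24.8


# Sequence identities and alignment lengths for 1YW4 as stated above.
for method, identity, length in (("HHPred", 15.0, 250), ("COMA", 12.0, 331)):
    margin = identity - hssp_threshold(length)
    # A negative margin means the alignment lies below the curve.
    print(method, round(margin, 1))
```

Under this assumption both alignments lie roughly 10 to 13 percentage points below the threshold.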
1YW4 was also crystallized as a dimer, and again we used only the monomer (chain A) as a template for modelling Aspartoacylase. There were some minor differences between the alignment methods, and we chose the align-2D alignment in order to include secondary structure information.
A superposition of the model and the reference can be found in <xr id="1yw4_modeller_betasheets"/>. It is easy to see that the modelling failed: even the apparent similarity, the buried beta-sheets, cannot be taken as correct, since the sheets do not stem from corresponding sequence regions. The binding site residues are not even located in one region, but are scattered all over the structure (<xr id="1yw4_modeller_binding"/>).
<figtable id="1yw4_modeller">
<figure id="1yw4_modeller_betasheets"></figure> | <figure id="1yw4_modeller_binding"></figure> |
</figtable>
Scores
<figtable id="modeller_1yw4_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (16 residues) |
-xx | xx | 0.1760 | 0.0364 | 0.0704 | x |
</figtable>
Template < 30% Seq Id: 1KWM (Human)
<figtable id="1kwm_modeller">
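<figure id="1kwm_modeller_full">Superposition of Aspartoacylase (PDB:2O53) in green and the model created by Modeller based on the template 1KWM in blue.</figure> | <figure id="1kwm_modeller_binding">Binding site.</figure> |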
</figtable>
Scores
<figtable id="modeller_1kwm_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (16 residues) |
-xx | xx | 0.1885 | 0.0430 | 0.0745 | x |
</figtable>
SwissModel
Template > 80% Seq Id: 2GU2 (Rattus Norvegicus)
Since the template is a homodimer, SwissModel also creates a dimer model from the Aspartoacylase sequence:
- 2gu2 is annotated as DIMER
- The quaternary structure of the target can be assumed to be identical
- To build the complex the following chains of the complex has been additionally identified: 2gu2B
SwissModel also included the Zinc ligand:
- All the residues interacting with the ligand are completely conserved between model and template.
- The RMSD between the interacting residues of model and template is lesser than two: 0.080
The total energy of the model is calculated to be -30164.225 kJ/mol.
The model created by SwissModel is almost identical to the native structure. The secondary structure elements can be superposed very well, and there are some deviations only in the loop regions (compare <xr id="2gu2_sw_overall"/>). Once again, the channel gating loop formed by residues 158-164 shows the largest deviations. The agreement between the two binding sites is also very good. As in the Modeller model, Y164, which is located on the flexible channel entrance loop, has a completely different position than in the reference crystal structure of Aspartoacylase. Arg71 has a slightly different orientation as well (compare <xr id="2gu2_sw_binding"/>).
<figtable id="2gu2_sw">
<figure id="2gu2_sw_overall"></figure> | <figure id="2gu2_sw_binding"></figure> |
</figtable>
Scores
<figtable id="sm_2gu2_scores">
SwissModel Score | Weighted Rmsd (301 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
-30164.225 | 0.412 | 0.9788 | 0.8957 | 0.9652 | 0.255 |
</figtable>
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
In this case, the model was created as a monomer, although the template is annotated as a dimer.
- The target and template sequences are too diverse (seqid: 42.295) to infer a conservation of the oligomeric state
SwissModel does not include any ligands from the template for modelling.
- CL321: The ligand is farther than 3 Angstroem from the template, so it is assumed that they are not interacting.
- Given the properties calculated previously, the ligand A.CL321 will not be included in the final model.
The total energy of the model is calculated to be -10093.196 kJ/mol. Thus, the energy is considerably higher (less favourable) than for the 2GU2 model; its magnitude is only about one third.
<figtable id="3nfz_sw">
<figure id="3nfz_sw_overall"></figure> | <figure id="3nfz_sw_binding"></figure> |
</figtable>
Scores
<figtable id="sm_3nfz_scores">
SwissModel Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (21 residues) |
-10093.196 | 0.750 | 0.9665 | 0.9015 | 0.7285 | 0.414 |
</figtable>
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
With default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.
Template < 30% Seq Id: 1KWM (Human)
SwissModel expected this model to be of poor quality, assigning it a QMEAN Z-score of -6.62. However, looking at the results, we found that the model created by SwissModel agrees well with the N-terminal domain of Aspartoacylase (see <xr id="sm_1kwm_full"/>).
<figtable id="1kwm_sw">
<figure id="sm_1kwm_full"></figure> | <figure id="1kwm_binding"></figure> |
</figtable>
I-Tasser
Template > 80% Seq Id: 2GU2 (Rattus Norvegicus)
I-Tasser modelled Aspartoacylase as a monomer, although the given template 2GU2 was crystallized as a dimer.
In general, the model looks very similar to the reference structure of Aspartoacylase, except for the already mentioned loop regions. A closer look at the binding site, however, shows that the orientation of all side chains differs slightly from the reference structure. This is in contrast to the models created by Modeller and SwissModel, where only some of the binding site residues deviate. It is also reflected in the binding site RMSD, which is higher for this model than for the other two models.
<figtable id="2gu2_itasser">
<figure id="superposition_itasser_2gu2_aspa"></figure> | <figure id="itasser_2gu2_aspa2_binding"></figure> |
</figtable>
Scores
<figtable id="it_2gu2_scores">
C-Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
1.81 | 0.552 | 0.9747 | 0.8278 | 0.9470 |
</figtable>
Modeller - Multimodel from templates with < 30% seq ID
<xr id="multimodel30_all"/>
<figtable id="multimodel">
<figure id="multimodel30_all"></figure> | <figure id="multimodel_binding"></figure> |
</figtable>
Evaluation of Methods
2GU2
All three methods generated very accurate models, with weighted RMSD values between 0.41 and 0.55. The TM-score and the GDT scores are likewise very high (close to 1), which supports the quality of the models. Visual inspection of the models also shows their accuracy, as can be seen in figure (??). Here, all models are superposed with the reference structure 2O53. Only in some loop regions, and especially in the very flexible channel gating loop, are there some deviations from the crystal structure.
A closer look at the binding site shows that all models chose a different conformer for Y164. This correlates with the different positioning of the channel gating loop, on which Y164 is located. For the other binding site residues, there are only minor deviations. Yet the binding site residues in the I-Tasser model clearly show the largest deviations from the reference structure.
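The comparison of the three 2GU2-based models can be summarised by ranking the methods on each score. The sketch below uses only the values from the score tables on this page; the helper `rank` is hypothetical, and `None` marks the binding site RMSD that is missing for the I-Tasser model.

```python
# Scores for the 2GU2-based models, copied from the score tables on this page.
# None marks a value that is missing in the source table.
scores = {
    "Modeller": {"rmsd": 0.406, "tm": 0.9815, "gdt_ha": 0.8982,
                 "gdt_ts": 0.9669, "site_rmsd": 0.248},
    "SwissModel": {"rmsd": 0.412, "tm": 0.9788, "gdt_ha": 0.8957,
                   "gdt_ts": 0.9652, "site_rmsd": 0.255},
    "I-Tasser": {"rmsd": 0.552, "tm": 0.9747, "gdt_ha": 0.8278,
                 "gdt_ts": 0.9470, "site_rmsd": None},
}


def rank(table, key, higher_is_better):
    """Order method names by one score, skipping methods without a value."""
    present = [(m, s[key]) for m, s in table.items() if s[key] is not None]
    present.sort(key=lambda item: item[1], reverse=higher_is_better)
    return [m for m, _ in present]


print(rank(scores, "tm", higher_is_better=True))
print(rank(scores, "rmsd", higher_is_better=False))
print(rank(scores, "site_rmsd", higher_is_better=False))
```

RMSD values are ranked in ascending order, whereas TM-score and GDT values are ranked in descending order, since higher values indicate better agreement with the reference.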