Canavan Task 4 - Homology based structure predictions
Revision as of 07:27, 4 June 2012
So you see, if you fall into a lion's pit, the reason the lion will tear you to pieces is not because it's hungry—be assured, zoo animals are amply fed—or because it's bloodthirsty, but because you've invaded its territory. As an aside, that is why a circus trainer must always enter the lion ring first, and in full sight of the lions. In doing so, he establishes that the ring is his territory, not theirs, a notion that he reinforces by shouting, by stomping about, by snapping his whip. The lions are impressed. Their disadvantage weighs heavily on them. Notice how they come in: mighty predators though they are, "kings of beasts", they crawl in with their tails low and they keep to the edges of the ring, which is always round so that they have nowhere to hide. They are in the presence of a strongly dominant male, a super-alpha male, and they must submit to his dominance rituals. So they open their jaws wide, they sit up, they jump through paper-covered hoops, they crawl through tubes, they walk backwards, they roll over. "He's a queer one," they think dimly. "Never seen a top lion like him. But he runs a good pride. The larder's always full and—let's be honest, mates—his antics keep us busy. Napping all the time does get a bit boring. At least we're not riding bicycles like the brown bears or catching flying plates like the chimps." From 'Life of Pi' by Yann Martel.
Protocol
Commands, source code and other methodological details are documented in the protocol.
Template Identification
<figtable id="templates">
Category | HHPred: PDB ID | Organism | Protein name | Seq ID | Alignment length | COMA: PDB ID | Organism | Protein name | Seq ID | Alignment length |
Seq Id > 80% | 2GU2 | Rattus norvegicus | ASPA protein | 87% | 306 | - | - | - | - | - |
Seq Id 40-80% | 3NH4 | Mus musculus | ASPA protein | 43% | 304 | 3NFZ | Mus musculus | Aspartoacylase-2 | 42% | 300 |
Seq Id < 30% | 1YW4 | Chromobacterium violaceum | Succinylglutamate desuccinylase | 15% | 250 | 1YW4 | Chromobacterium violaceum | Succinylglutamate desuccinylase | 12% | 331 |
Seq Id < 30% | 3CDX | Rhodobacter sphaeroides 2 | Succinylglutamate desuccinylase/aspartoacylase | 15% | 251 | 3CDX | Rhodobacter sphaeroides 2 | Succinylglutamate desuccinylase/aspartoacylase | 12% | 330 |
Seq Id < 30% | 2QJ8 | Mesorhizobium loti | Hydrolase | 21% | 261 | 2QJ8 | Mesorhizobium loti | Mlr6093 protein | 15% | 314 |
Seq Id < 30% | 1KWM | Homo sapiens (human) | Procarboxypeptidase B | 11% | 192 | 3GLJ | Sus scrofa (pig) | Carboxypeptidase B | 8% | 262 |
</figtable>
With HHPred we received 42 hits using standard parameters. Changing the E-value threshold for MSA generation and the number of sequences shown per HMM did not yield more hits. There is at least one hit in each sequence identity category.
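The sequence identity categories of the template table can be written down as a simple binning rule. This is a minimal sketch, assuming each hit is given as a (PDB ID, sequence identity in %) pair taken from the HHPred columns of the table; the function names are hypothetical, and hits between 30% and 40% fall into none of the categories used here.

```python
def identity_category(seq_id):
    """Map a sequence identity (in %) to the template categories used in the table."""
    if seq_id > 80:
        return "> 80%"
    if 40 <= seq_id <= 80:
        return "40-80%"
    if seq_id < 30:
        return "< 30%"
    return None  # 30-40% is not covered by the categories in the table


def group_by_category(hits):
    """Group (pdb_id, seq_id) pairs by their identity category."""
    groups = {}
    for pdb_id, seq_id in hits:
        groups.setdefault(identity_category(seq_id), []).append(pdb_id)
    return groups


# HHPred hits from the template table: (PDB ID, sequence identity in %)
hhpred_hits = [("2GU2", 87), ("3NH4", 43), ("1YW4", 15),
               ("3CDX", 15), ("2QJ8", 21), ("1KWM", 11)]

print(group_by_category(hhpred_hits))
```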
COMA yielded 22 hits, of which only one structure has a sequence identity of more than 40%. Interestingly, COMA did not find the highest-ranked hit of the HHPred output (2GU2, 87%). In general, COMA produces longer alignments with correspondingly lower sequence identities. Running COMA with less restrictive E-values results in more diverse hits, e.g. several carboxypeptidases.
Bitto et al.<ref name="pnas_aspa_structure">Eduard Bitto, Craig A. Bingman, Gary E. Wesenberg, Jason G. McCoy, and George N. Phillips, Jr., Structure of aspartoacylase, the brain enzyme impaired in Canavan disease, Proc Natl Acad Sci U S A. 2007 January 9; 104(2): 456–461. </ref> state that the "N-terminal domain of aspartoacylase adopts a protein fold similar to that of zinc-dependent hydrolases related to carboxypeptidases A. The catalytic site of aspartoacylase shows close structural similarity to those of carboxypeptidases despite only 10–13% sequence identity between these proteins". It is therefore plausible that several carboxypeptidases appear as low-identity hits in the results of both HHPred and COMA.
We decided to use templates that were found by both methods (see <xr id="templates"/>).
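Which table entries were returned by both HHPred and COMA can be checked with a plain set intersection. A minimal sketch, assuming the PDB IDs are copied from the template table and normalised to upper case (the variable names are illustrative only):

```python
# PDB IDs listed in the template table for each search method
hhpred = {"2GU2", "3NH4", "1YW4", "3CDX", "2QJ8", "1KWM"}
coma = {"3NFZ", "1YW4", "3CDX", "2QJ8", "3GLJ"}

found_by_both = sorted(hhpred & coma)   # hits reported by both methods
hhpred_only = sorted(hhpred - coma)     # hits only HHPred reported
coma_only = sorted(coma - hhpred)       # hits only COMA reported

print("both:", found_by_both)
print("HHPred only:", hhpred_only)
print("COMA only:", coma_only)
```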
Comparison of Aspartoacylase Structures
To be able to assess the quality of the homology models, we first briefly introduce the structure of Aspartoacylase. The PDB contains several structures of human aspartoacylase:
- Apo structure: 2O53 (resolution: 2.7 Å, R-free: 0.269, chains: A, B)
- Holo structure: 2O4H (resolution: 2.7 Å, R-free: 0.271, chains: A, B; intermediate substrate analog: N-phosphonomethyl-L-aspartate)
- Apo structure: 2I3C (resolution: 2.8 Å, R-free: 0.243, chains: A, B)
- Ensemble refinement: 2Q51 (resolution: 2.8 Å, R-free: 0.239, chains: A, B)
Superposition of the four crystal structures yields low RMSD values, and visual inspection of the superposition reveals hardly any differences. In 2Q51, the beta sheet formed by residues 218-223 and 299-306 is not displayed as a sheet in PyMOL, which indicates that its backbone geometry deviates slightly from the criteria PyMOL uses to assign beta sheets.
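The RMSD after superposition is the minimum over all rigid-body rotations and translations. As an illustration of that idea, the sketch below computes this minimal RMSD in pure Python using Horn's quaternion key matrix (the formulation also used by the QCP method): the largest eigenvalue of a 4x4 matrix built from the correlation of the centred coordinates gives the optimal overlap. It assumes Cα coordinates are given as lists of (x, y, z) tuples in corresponding residue order; it is not necessarily the procedure behind the values reported above (PyMOL's commands additionally align the sequences and can reject outliers).

```python
import math


def _centered(coords):
    """Translate a coordinate list so that its centroid is at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def _largest_eigenvalue(a, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (rotate columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rotate rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(mobile, target):
    """Minimal RMSD between two coordinate sets after optimal superposition."""
    if len(mobile) != len(target) or not mobile:
        raise ValueError("need two non-empty coordinate lists of equal length")
    x = _centered(mobile)
    y = _centered(target)
    # correlation matrix S[i][j] = sum over atoms of x_i * y_j
    S = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(key)
    g = sum(v * v for p in x for v in p) + sum(v * v for q in y for v in q)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(x)))
```

A rotated and translated copy of a structure should give an RMSD of (numerically) zero, while a mirror image of a chiral arrangement cannot be superposed and gives a positive value.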
<figtable id="aspa_structures_superposed" >
<figure id="aspa_superpos"></figure> | <figure id="aspa_superpos_binding"></figure> |
</figtable>
Major differences occur only for residues 158-164, which form a loop involved in opening and closing the channel. 2O53 and 2O4H both represent the closed conformation, whereas 2I3C and 2Q51 represent the open conformation.
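Regions such as the 158-164 loop can be located by comparing per-residue Cα distances of two superposed structures. A minimal sketch, assuming both structures are already superposed and given in the same residue order; the 2.0 Å cutoff and the coordinates in the usage example are hypothetical illustration values, not data from 2O53 or 2I3C.

```python
import math


def flag_deviating_residues(residue_ids, coords_a, coords_b, cutoff=2.0):
    """Return residue numbers whose C-alpha atoms are more than `cutoff` Angstrom apart.

    Both coordinate lists must already be superposed and in the same residue order.
    """
    flagged = []
    for res_id, a, b in zip(residue_ids, coords_a, coords_b):
        if math.dist(a, b) > cutoff:
            flagged.append(res_id)
    return flagged


# hypothetical coordinates: residues 158-164 are displaced by 3 Angstrom
ids = list(range(156, 167))
closed = [(float(i), 0.0, 0.0) for i in ids]
open_ = [(x, 3.0 if 158 <= i <= 164 else 0.0, z) for i, (x, y, z) in zip(ids, closed)]

print(flag_deviating_residues(ids, closed, open_))
```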
<figtable id="open_and_close">
<figure id="aspa_superpos_loop"></figure> | <figure id="aspa_superpos_closed"></figure> | <figure id="aspa_superpos_open"></figure> |
</figtable>
We decided to use 2O53 and 2O4H, since they are the two latest structures and have the best resolution. Furthermore, they were solved by the same group, once with bound ligand and once in the apo form. Of the two, we used 2O53 as the reference structure. When comparing the residues in and around the binding sites of 2O53 and 2O4H, there are slight differences. Because most templates do not have a cocrystallized ligand, the models based on these templates are expected to be more similar to the Aspartoacylase structure without bound ligand.
Analysis of the Aspartoacylase structures
<figtable id="aspa_structure">
<figure id="aspa_domains"></figure> |
Concerning <xr id="aspa_domains"/>:
|
|
<figure id="aspa_zinc"></figure> | <figure id="aspa_zinc_ligplot"></figure> |
Concerning <xr id="aspa_zinc"/>:
|
<figure id="aspa_binding"></figure> | <figure id="aspa_ligplot"></figure> |
Concerning <xr id="aspa_binding"/>:
|
</figtable>
Modeller
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
2GU2 was crystallized as a dimer. We only used the monomer (chain A) as a template for modelling Aspartoacylase.
The two alignment methods provided by Modeller yielded almost identical results (they differ only in the first 6 residues; see the protocol for details). We chose the align-2D alignment.
The model and the native structure can be superimposed with an RMSD of 0.371 Å. Visual inspection of the superposition of the model with the Aspartoacylase structure shows only minor differences between the two structures (compare <xr id="2gu2_modeller_overall"/>). Some loop regions deviate from the original structure, for example the loop formed by residues 158-164. This loop opens and closes the channel and therefore needs to be very flexible. In the most important part, the binding site, the agreement is very good (compare <xr id="2gu2_modeller_binding"/>). Only Y164, which is located on the flexible loop, has a completely different position in the model.
Filename | molpdf | DOPE score | GA341 score |
p45381.B99990001.pdb | 1640.80847 | -36987.52734 | 1.00000 |
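When several models are generated, the summary rows can be parsed to pick the model with the lowest (best) DOPE score. A minimal sketch, assuming one whitespace-separated row per model with the four columns shown above (filename, molpdf, DOPE, GA341); the exact log layout may differ between Modeller versions, and the function name is hypothetical.

```python
def parse_modeller_summary(lines):
    """Parse 'filename molpdf DOPE GA341' rows from a Modeller model summary."""
    models = []
    for line in lines:
        fields = line.split()
        # skip the header, the dashed separator and any non-data line
        if len(fields) != 4 or not fields[0].endswith(".pdb"):
            continue
        name, molpdf, dope, ga341 = fields
        models.append({"file": name, "molpdf": float(molpdf),
                       "dope": float(dope), "ga341": float(ga341)})
    return models


summary = """\
Filename                          molpdf     DOPE score    GA341 score
----------------------------------------------------------------------
p45381.B99990001.pdb          1640.80847   -36987.52734        1.00000
""".splitlines()

models = parse_modeller_summary(summary)
best = min(models, key=lambda m: m["dope"])  # lower (more negative) DOPE is better
print(best["file"], best["dope"])
```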
<figtable id="2gu2_modeller">
<figure id="2gu2_modeller_overall"></figure> | <figure id="2gu2_modeller_binding"></figure> |
</figtable>
The Ramachandran plot of the model (see File:2gu2 model rama.pdf) shows only one outlier residue, Asn-236. The backbone geometry of the model is therefore largely correct.
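A Ramachandran plot is built from the backbone dihedral angles phi (C(i-1), N, Cα, C) and psi (N, Cα, C, N(i+1)). The sketch below shows the dihedral calculation in pure Python, assuming atom coordinates are given as (x, y, z) tuples; the classification of (phi, psi) pairs into allowed and outlier regions is done by dedicated validation tools and is not reproduced here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, range -180..180) defined by four points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal of the plane (p1, p2, p3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(prev_c, n, ca, c):
    """Backbone phi angle of residue i."""
    return dihedral(prev_c, n, ca, c)


def psi(n, ca, c, next_n):
    """Backbone psi angle of residue i."""
    return dihedral(n, ca, c, next_n)
```

For example, four points in a cis arrangement give 0 degrees and a trans arrangement gives 180 degrees.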
Scores
<figtable id="modeller_2gu2_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
-36987.5 | 0.406 | 0.9815 | 0.8982 | 0.9669 | 0.248 |
</figtable>
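The TM-score normalises per-residue Cα distances d_i by a length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 and averages 1/(1 + (d_i/d0)^2) over the L residues of the reference structure, so values close to 1 indicate nearly identical folds. A minimal sketch, assuming the distances come from one fixed superposition; the actual TM-score program additionally searches for the superposition that maximises the score, so this value is a lower bound. The lower limit of 0.5 Å on d0 is an assumption about the reference implementation.

```python
def tm_score_fixed(distances, target_length):
    """TM-score for one fixed superposition.

    distances: C-alpha distances (Angstrom) of the aligned residue pairs.
    target_length: number of residues of the reference (native) structure;
    unaligned residues simply contribute nothing to the sum.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumed lower bound on d0 for very short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is divided by the length of the reference structure, a model that only covers part of the target (for example 187 aligned Cα atoms) cannot reach a high TM-score even if the aligned part fits well.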
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
The two different alignment methods provided by Modeller yielded completely identical results.
The generated model has a DOPE score of -34539.4, which is only slightly higher (i.e. slightly worse, since lower DOPE scores are better) than that of the model based on the 2GU2 template.
Filename | molpdf | DOPE score | GA341 score |
p45381.B99990001.pdb | 1969.36560 | -34539.44141 | 1.00000 |
Again, the model and the native structure are very similar (compare <xr id="3nfz_modeller_overall"/>). Compared to the model based on template 2GU2, however, there are more deviations, especially in the loop regions. A closer look at the binding site reveals some small differences: R71, R168 and Y164 have different orientations (compare <xr id="3nfz_modeller_binding"/>).
<figtable id="3nfz_modeller">
<figure id="3nfz_modeller_overall"></figure> | <figure id="3nfz_modeller_binding"></figure> |
</figtable>
The Ramachandran plot of the model (see File:3nfz model rama.pdf) identifies 5 outlier residues, but still more than 98.4% of the residues lie in allowed regions of the plot:
- 131 SER
- 148 ALA
- 161 SER
- 174 PRO
- 227 GLU
Scores
<figtable id="modeller_3nfz_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (16 residues) |
-34539.4 | 0.775 | 0.9641 | 0.7152 | 0.8932 | 0.370 |
</figtable>
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
<figure id="hssp">
</figure>
Protein 1YW4 is a succinylglutamate desuccinylase, i.e. it belongs to the same Pfam family as Aspartoacylase. Still, the sequence identity is only 15% over an alignment length of 250 residues (HHPred) and 12% over 331 residues (COMA). According to the HSSP curve first presented by Sander and Schneider in 1991 (<xr id="hssp"/>), these values lie well within the unsafe region for homology modelling.
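The HSSP curve defines a length-dependent sequence identity threshold above which structural similarity can be inferred. A minimal sketch of the original curve as we recall it from Sander and Schneider (1991): 100% for very short alignments, 290.15 · L^(-0.562) for alignment lengths up to 80 residues and a plateau of 24.8% beyond that. The exact breakpoints are an assumption and should be checked against the original paper; the function names are hypothetical.

```python
def hssp_threshold(alignment_length):
    """Sequence identity threshold (%) of the original HSSP curve (as assumed here)."""
    L = alignment_length
    if L < 10:
        return 100.0
    if L <= 80:
        return 290.15 * L ** -0.562
    return 24.8


def above_hssp_curve(identity, alignment_length):
    """True if an alignment lies in the 'safe' zone above the HSSP curve."""
    return identity >= hssp_threshold(alignment_length)


# values from the template table: (template, identity in %, alignment length)
for name, identity, length in [("2GU2", 87, 306), ("3NFZ", 42, 300),
                               ("1YW4", 15, 250), ("1YW4", 12, 331)]:
    print(name, identity, length, above_hssp_curve(identity, length))
```

With these values, 2GU2 and 3NFZ lie above the curve, whereas both 1YW4 alignments fall below it.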
1YW4 was also crystallized as a dimer, and again we only used the monomer (chain A) as a template for modelling Aspartoacylase. The two alignment methods differed slightly, and we chose the align-2D alignment, which includes secondary structure information.
A superposition of the model and the reference structure is shown in <xr id="1yw4_modeller_betasheets"/>. Structure modelling clearly failed: even the apparent similarity of the buried beta sheets is misleading, since they are formed by different sequence regions in the model and in the reference. The binding site residues are not even located in the same region but are scattered all over the structure (<xr id="1yw4_modeller_binding"/>).
<figtable id="1yw4_modeller">
<figure id="1yw4_modeller_betasheets"></figure> | <figure id="1yw4_modeller_binding"></figure> |
</figtable>
Scores
<figtable id="modeller_1yw4_scores">
DOPE Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (16 residues) |
-xx | xx | 0.1760 | 0.0364 | 0.0704 | x |
</figtable>
Template < 30% Seq Id: 1KWM (Human)
1KWM is a procarboxypeptidase, and we wanted to explore possible similarities between the two types of proteins. Its sequence identity to Aspartoacylase is only 11% over an alignment length of 192 amino acids.
As expected, homology modelling fails again with such an unsuitable template. See <xr id="1kwm_modeller_full"/>, <xr id="1kwm_modeller_binding"/> and the scores below for details.
<figtable id="1kwm_modeller">
<figure id="1kwm_modeller_full"></figure> | <figure id="1kwm_modeller_binding"></figure> |
</figtable>
Scores
<figtable id="modeller_1kwm_scores">
DOPE Score | Weighted Rmsd (187 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (16 residues) |
-xx | 2.146 | 0.1885 | 0.0430 | 0.0745 | x |
</figtable>
SwissModel
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
Since the template is a homodimer, SwissModel also creates a dimer model from the Aspartoacylase sequence:
- 2gu2 is annotated as DIMER
- The quaternary structure of the target can be assumed to be identical
- To build the complex the following chains of the complex has been additionally identified: 2gu2B
SwissModel also included the zinc ligand:
- All the residues interacting with the ligand are completely conserved between model and template.
- The RMSD between the interacting residues of model and template is lesser than two: 0.080
The total energy of the model is calculated to be -30164.225 kJ/mol.
The model created by SwissModel is almost identical to the native structure. The secondary structure elements superpose very well, and there are some deviations only in the loop regions (compare <xr id="2gu2_sw_overall"/>). Once again, the channel gating loop formed by residues 158-164 shows the largest deviations. The agreement between the two binding sites is also very good. As in the Modeller model, Y164, which is located on the flexible channel entrance loop, has a completely different position compared to the reference crystal structure of Aspartoacylase. Arg71 also has a slightly different orientation (compare <xr id="2gu2_sw_binding"/>).
<figtable id="2gu2_sw">
<figure id="2gu2_sw_overall"></figure> | <figure id="2gu2_sw_binding"></figure> |
</figtable>
Scores
<figtable id="sm_2gu2_scores">
SwissModel Score | Weighted Rmsd (301 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
-30164.225 | 0.412 | 0.9788 | 0.8957 | 0.9652 | 0.255 |
</figtable>
Template >40%, <80% Seq Id: 3NFZ (Mus musculus)
In this case, the model was created as a monomer, although the template is annotated as a dimer.
- The target and template sequences are too diverse (seqid: 42.295) to infer a conservation of the oligomeric state
SwissModel does not include any ligands from the template for modelling.
- CL321: The ligand is farther than 3 Angstroem from the template, so it is assumed that they are not interacting.
- Given the properties calculated previously, the ligand A.CL321 will not be included in the final model.
The total energy of the model is calculated to be -10093.196 kJ/mol. It is thus considerably higher (less negative) than the energy of the 2GU2 model (-30164.225 kJ/mol), which is about three times lower.
<figtable id="3nfz_sw">
<figure id="3nfz_sw_overall"></figure> | <figure id="3nfz_sw_binding"></figure> |
</figtable>
Scores
<figtable id="sm_3nfz_scores">
SwissModel Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (21 residues) |
-10093.196 | 0.750 | 0.9665 | 0.9015 | 0.7285 | 0.414 |
</figtable>
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
With default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.
Template < 30% Seq Id: 1KWM (Human)
Again, with default parameters, SwissModel stated that the "Alignment quality between target and specified template is too low" and could therefore not provide a model.
iTasser
Template > 80% Seq Id: 2GU2 (Rattus norvegicus)
I-Tasser modelled Aspartoacylase as a monomer, although the given template 2GU2 was crystallized as a dimer.
In general, the model looks very similar to the reference structure of Aspartoacylase, except for the loop regions mentioned above. A closer look at the binding site, however, reveals that the orientation of all side chains differs slightly from the reference structure. This is in contrast to the models created by Modeller and SwissModel, where only some of the binding site residues deviate. This is also reflected in the binding site RMSD, which is higher for this model than for the other two models.
<figtable id="2gu2_itasser">
<figure id="superposition_itasser_2gu2_aspa"></figure> | <figure id="itasser_2gu2_aspa2_binding"></figure> |
</figtable>
Scores
<figtable id="it_2gu2_scores">
C-Score | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD (18 residues) |
1.81 | 0.552 | 0.9747 | 0.8278 | 0.9470 | 0.297 |
</figtable>
Template < 30% Seq Id: 1YW4 (Chromobacterium violaceum)
Submitted on Friday evening; still running as of Monday, 8:30 am.
Template < 30% Seq Id: 1KWM (Human)
Submitted on Friday evening; still running as of Monday, 8:30 am.
Modeller - Multimodel from templates with < 30% seq ID
See <xr id="multimodel30_all"/>.
<figtable id="multimodel">
<figure id="multimodel30_all"></figure> | <figure id="multimodel_binding"></figure> |
</figtable>
Evaluation of Methods
2GU2
All three methods generated very accurate models, with RMSD values of 0.406 (Modeller), 0.412 (SwissModel) and 0.552 (I-Tasser).
The other scores reflect the same ranking (see <xr id="eval_2gu2"/>). The TM-scores and GDT scores are all close to 1, confirming the quality of the models.
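GDT scores report the average fraction of residues whose Cα atoms lie within a set of distance cutoffs: 1, 2, 4 and 8 Å for GDT_TS and 0.5, 1, 2 and 4 Å for GDT_HA, so both range from 0 to 1. A minimal sketch, assuming per-residue distances from a single fixed superposition; the reference implementation (LGA) searches for the best superposition for each cutoff, so these numbers can be lower. The distances in the usage example are hypothetical.

```python
def gdt(distances, cutoffs, n_residues):
    """Average fraction of residues within each distance cutoff (one fixed superposition)."""
    fractions = [sum(1 for d in distances if d <= c) / n_residues for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances, n_residues):
    return gdt(distances, (1.0, 2.0, 4.0, 8.0), n_residues)


def gdt_ha(distances, n_residues):
    return gdt(distances, (0.5, 1.0, 2.0, 4.0), n_residues)


# hypothetical C-alpha distances (Angstrom) for six residues
dists = [0.3, 0.8, 1.5, 3.0, 6.0, 10.0]
print(round(gdt_ts(dists, 6), 4), round(gdt_ha(dists, 6), 4))
```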
Also a visual inspection of the models shows their accuracy, as can be seen in <xr id="eval_2gu2_overall"/>. Here, all models are superposed with the reference structure 2O53. Only in some loop regions, and especially for the very flexible channel gating loop, there are some deviations from the crystal structure (see <xr id="eval_2gu2_loop"/> ).
<figtable id="eval_2gu2">
<figure id="eval_2gu2_distri"></figure> | <figure id="eval_2gu2_overall"></figure> | <figure id="eval_2gu2_loop"></figure> |
</figtable>
A closer look at the binding site shows that all models chose a different conformer for Y164 (see <xr id="eval_2gu2_y164"/>). This correlates with the different positioning of the channel gating loop, on which Y164 is located. For the other binding site residues, there are only minor deviations. Still, the binding site residues of the I-Tasser model clearly deviate most from the reference structure (see <xr id="eval_2gu2_r63"/> - <xr id="eval_2gu2_e178"/>). This is also reflected in the binding site RMSD, which is highest for I-Tasser (Modeller: 0.248, SwissModel: 0.255, I-Tasser: 0.297).
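A binding site RMSD restricts the deviation measure to the atoms of a selected set of residues. A minimal sketch, assuming both structures are already superposed as a whole and stored as dictionaries mapping (residue number, atom name) to coordinates; the exact residue and atom selection behind the values above is not specified here, so the residue numbers and coordinates in the usage example are hypothetical.

```python
import math


def subset_rmsd(coords_model, coords_ref, residues):
    """RMSD over the atoms of the selected residues, without refitting.

    coords_model, coords_ref: dict mapping (residue number, atom name) -> (x, y, z).
    Only atoms present in both structures are compared.
    """
    keys = [k for k in coords_ref if k[0] in residues and k in coords_model]
    if not keys:
        raise ValueError("no common atoms for the selected residues")
    sq = sum(math.dist(coords_model[k], coords_ref[k]) ** 2 for k in keys)
    return math.sqrt(sq / len(keys))


# hypothetical, already superposed coordinates: only Y164 has moved
ref = {(63, "CA"): (0.0, 0.0, 0.0), (71, "CA"): (1.0, 0.0, 0.0), (164, "CA"): (2.0, 0.0, 0.0)}
model = dict(ref)
model[(164, "CA")] = (2.0, 2.0, 0.0)

print(subset_rmsd(model, ref, {63, 71, 164}))
print(subset_rmsd(model, ref, {63, 71}))
```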
<figtable id="eval_2gu2_bindingsite">
<figure id="eval_2gu2_r63"></figure> | <figure id="eval_2gu2_r71"></figure> | <figure id="eval_2gu2_r168"></figure> | <figure id="eval_2gu2_y164"></figure> | <figure id="eval_2gu2_e178"></figure> |
</figtable>
In conclusion, all homology models based on 2GU2 are very accurate. This meets our expectations, since the template shares more than 80% sequence identity with the target. The differences between the models are very small; I-Tasser shows the largest deviations for the binding site residues.
3NFZ
For this template from Mus musculus with 42% sequence identity, Modeller and SwissModel generated very accurate models. We have not yet received the results from iTasser.
As can be seen in <xr id="eval_3nfz_overall"/>, visual inspection reveals deviations only in the loop regions. Again, as for the 2GU2 template, residues 158-164, which form the flexible channel entrance loop, show the largest deviation from the reference structure of Aspartoacylase.
<figtable id="eval_3nfz">
<figure id="eval_3nfz_overall"></figure> | <figure id="eval_3nfz_binding"></figure> |
</figtable>
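Method | Weighted Rmsd (302 Ca-atoms) | TM-Score | GDT_HA | GDT_TS | binding site RMSD |
SwissModel | 0.750 | 0.9665 | 0.9015 | 0.7285 | 0.414 (21 residues) |
Modeller | 0.775 | 0.9641 | 0.7152 | 0.8932 | 0.370 (16 residues) |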
1KWM
For the moment, only the Modeller results are available. SwissModel does not create a model at all, given the low similarity between template and target, and the iTasser results are still pending. However, the Modeller and SwissModel results already indicate what one would expect: if the chosen template is not suitable, homology modelling is bound to fail. See <xr id="1kwm_modeller_full"/> for details and scores.
1YW4
The same holds true for 1YW4: an unsuitable template leads to poor results. See <xr id="1yw4_modeller_betasheets"/> for details.