Canavan Disease: Task 05 - Homology Modelling

From Bioinformatikpedia
Revision as of 19:36, 29 August 2013 by Boehma

Homology modelling is another important step: since an experimentally determined structure is not always available for the protein of interest, models can help us understand it. Even SNPs in the sequence can make a difference in these models. Let's investigate!

LabJournal

Dataset

The models were calculated with three different modelling tools: Modeller, SwissModel and iTasser. To compare the tools, two template structures were chosen for each sequence identity range (high and low), as listed below together with the reference structure 2O4H:

<figtable id="dataset">

{| class="wikitable"
|+ Dataset composition
! PDB-id !! Description !! Criterion
|-
| 2O4H || ASPA from Human with bound N-phosphonomethyl-L-aspartate || reference structure
|-
| 2O53 || Crystal structure of apo-Aspartoacylase from human brain || sequence identity 100%
|-
| 2GU2 || ASPA from Rat || sequence identity 84%
|-
| 2QJ8 || ASPA family protein from ''Mesorhizobium loti'' || sequence identity 16%
|-
| 1YW4 || Succinylglutamate Desuccinylase from ''Chromobacterium violaceum'' || sequence identity 14%
|}
Overview of the dataset composition for Task 05, containing a brief description of the chosen structures and their sequence identity to the reference ACY2 protein.

</figtable>
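The sequence identities in the table depend on how identity is counted. A minimal sketch, assuming a pairwise alignment is already available as two gapped strings of equal length (the function name `percent_identity` is hypothetical, and the alignment itself would have to come from a separate aligner), counts identical residues over the columns where neither sequence has a gap:

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences given as gapped strings.

    Only columns where neither sequence has a gap ('-') are counted.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences have a residue.
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a.upper() == b.upper())
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment for illustration, not taken from the ASPA structures.
    target = "ACDEFGHIKLMN"
    template = "ACDEF-HIKLWN"
    print(f"identity: {percent_identity(target, template):.1f}%")
```

Other conventions divide by the length of the shorter sequence or of the whole alignment instead, so identity values reported by different tools are not always directly comparable.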

Model creation

Each modelling algorithm was used to produce models for 2O4H based on the four different template proteins. All models are shown in the following sections, except for the model SwissModel should have built from 1YW4, since SwissModel was unable to perform this task.

Modeller with Single Sequence Template

Modeller produced extremely accurate models of the target protein from templates with high sequence identity. Both 2O53 and 2GU2 already look highly similar in structure to 2O4H on visual comparison, and a structural alignment of these templates to 2O4H results in an RMSD below 1 Å. It is therefore to be expected that the generated models are very accurate, and this is exactly what can be observed (see <xr id="2O53_md">Figure</xr> and <xr id="2GU2_md">Figure</xr>).
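The RMSD after structural alignment mentioned above can be sketched with the Kabsch algorithm: both Cα sets are centred, the optimal rotation is obtained from a singular value decomposition of their covariance matrix, and the RMSD is computed on the superposed coordinates. This is a minimal sketch assuming NumPy and two coordinate arrays whose rows already correspond to equivalent residues (the function name `kabsch_rmsd` is hypothetical). It is not what Pymol's `align` command does internally, which additionally performs a sequence alignment and, by default, iteratively rejects outlier atoms.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """C-alpha RMSD of P onto Q after optimal rigid superposition (Kabsch).

    P and Q are (N, 3) arrays whose rows are equivalent residues in the
    same order; establishing that correspondence is not handled here.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of the same shape")
    # Remove the translation by centring both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


if __name__ == "__main__":
    # Synthetic check: a rotated and translated copy should give RMSD ~ 0.
    rng = np.random.default_rng(0)
    P = rng.normal(size=(50, 3))
    theta = np.deg2rad(30.0)
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
    print(f"RMSD after superposition: {kabsch_rmsd(P, Q):.6f} Å")
```

The reflection check matters: without it, the SVD could return an improper rotation that "superposes" a structure onto its mirror image.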

<figure id="2O53_md">

Representation of the target (2O4H), the template (2O53) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="2GU2_md">

Representation of the target (2O4H), the template (2GU2) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

The models generated for 2O4H from the templates 2QJ8 and 1YW4, which both share less than 20% sequence identity with the target, are still very good. There are visible differences between the target and the models, such as larger loop regions or secondary structure elements whose conformations are slightly mispredicted. However, when the aligned target and models are compared with their respective template, a large difference to the template becomes apparent (compare <xr id="2QJ8_md">Figure</xr> and <xr id="1YW4_md">Figure</xr>).

<figure id="2QJ8_md">

Representation of the target (2O4H), the template (2QJ8) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="1YW4_md">

Representation of the target (2O4H), the template (1YW4) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

Modeller with Multiple Sequence Template

text

Modeller with Manual Alignment Refinement

text

Swissmodel

Examining the SwissModel models built from the high sequence identity templates in Pymol, together with the templates and the target, reveals that SwissModel creates very accurate models as well. One visible difference from the Modeller models is that SwissModel seems to build the model only over the length of the target, whereas in the Modeller models, for example, the N- and C-termini of the polypeptide extend well beyond the actual target (see <xr id="2O53_sm">Figure</xr> and <xr id="2GU2_sm">Figure</xr>).

<figure id="2O53_sm">

Representation of the target (2O4H), the template (2O53) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="2GU2_sm">

Representation of the target (2O4H), the template (2GU2) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

Regarding the models created from templates with less than 30% sequence identity to 2O4H, it should be noted that SwissModel was not able to build a model with 1YW4 as template. Modelling with 2QJ8 as template was successful, however. A closer look at the model created from 2QJ8 shows that, at least for this specific example, SwissModel does not perform as well as Modeller when the template has low sequence identity. The overall positioning of the atoms is correct, but the secondary structure elements are predicted much worse, and some residues do not even have a predicted spatial position (see <xr id="2QJ8_sm">Figure</xr>).
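The differences in model length described for SwissModel and Modeller (termini extending beyond the target, residues without coordinates) can be checked by comparing the residue numbers that carry a Cα atom in the target and in the model. A minimal sketch, assuming both PDB files use the same residue numbering and ignoring insertion codes; the helper names `ca_residue_numbers` and `compare_residue_coverage` are hypothetical:

```python
from typing import Iterable, List, Optional, Tuple


def ca_residue_numbers(pdb_text: str, chain: Optional[str] = None) -> List[int]:
    """Residue numbers that carry a C-alpha ATOM record in PDB-format text.

    Uses the fixed PDB columns (atom name 13-16, chain 22, resSeq 23-26).
    Insertion codes are ignored; alternate locations can repeat a number.
    """
    resids = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            if chain is None or line[21] == chain:
                resids.append(int(line[22:26]))
    return resids


def compare_residue_coverage(target_resids: Iterable[int],
                             model_resids: Iterable[int]) -> Tuple[List[int], List[int], float]:
    """Residues missing from the model, residues added beyond the target,
    and the fraction of target residues that have a position in the model.

    Assumes target and model share the same residue numbering.
    """
    target, model = set(target_resids), set(model_resids)
    missing = sorted(target - model)   # target residues without coordinates in the model
    extra = sorted(model - target)     # e.g. termini extended beyond the target
    coverage = len(target & model) / len(target) if target else 0.0
    return missing, extra, coverage


if __name__ == "__main__":
    # Toy residue numbers for illustration: target 1-10, model 0-12 without 5 and 6.
    missing, extra, coverage = compare_residue_coverage(
        range(1, 11), [r for r in range(0, 13) if r not in (5, 6)])
    print("missing:", missing, "extra:", extra, f"coverage: {coverage:.0%}")
```

Modelling servers may renumber residues starting from 1, so in practice the numbering would have to be mapped through the target-template alignment before this comparison is meaningful.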

<figure id="2QJ8_sm">

Representation of the target (2O4H), the template (2QJ8) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

iTasser

iTasser continues the trend that is already observable in the results of Modeller and SwissModel. The models produced from the high sequence identity templates show that iTasser creates accurate models of the target sequence (see <xr id="2O53_it">Figure</xr> and <xr id="2GU2_it">Figure</xr>). However, compared with the Modeller and SwissModel results in Pymol, iTasser seems to be a little less precise: the spatial orientation of the secondary structure elements seems slightly off compared to the models created by the two other methods.

<figure id="2O53_it">

Representation of the target (2O4H), the template (2O53) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="2GU2_it">

Representation of the target (2O4H), the template (2GU2) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

For the models calculated from 2QJ8 and 1YW4, iTasser clearly produces much worse results than Modeller and SwissModel. Comparing target and template structures with respect to the atom positions, the model seems to fall between the two structures, leaning more towards the template (compare <xr id="2QJ8_it">Figure</xr> and <xr id="1YW4_it">Figure</xr>).
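Whether a model lies closer to the target or to the template can be put into numbers once all three structures are superposed in one common frame (for example onto the target in Pymol) and the equivalent Cα positions have been extracted; for low-identity templates, the residue equivalence would come from the target-template alignment used for modelling. A minimal sketch, assuming such pre-superposed, residue-matched coordinate lists; no fitting is performed, so the result depends on the chosen superposition, and the function names are hypothetical:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd_no_fit(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between residue-matched coordinates, without any superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def closer_to(model, target, template):
    """Report whether the model lies closer to the target or to the template.

    All three coordinate lists must already be superposed in one common
    frame and list equivalent C-alpha atoms in the same order.
    """
    d_target = rmsd_no_fit(model, target)
    d_template = rmsd_no_fit(model, template)
    if d_target < d_template:
        label = "target"
    elif d_template < d_target:
        label = "template"
    else:
        label = "equidistant"
    return label, d_target, d_template


if __name__ == "__main__":
    # Toy coordinates for illustration: the model sits 3 Å from the target and 1 Å from the template.
    target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    template = [(0.0, 0.0, 4.0), (1.0, 0.0, 4.0)]
    model = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
    print(closer_to(model, target, template))
```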

<figure id="2QJ8_it">

Representation of the target (2O4H), the template (2QJ8) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="1YW4_it">

Representation of the target (2O4H), the template (1YW4) and the generated model in Pymol. 2O4H is displayed in orange including the bound zinc atom and compound at the active site of ASPA. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

Therefore, the conclusion from examining the structures in Pymol is that Modeller's overall performance is exceptional and is only beaten by SwissModel for templates with high sequence identity. iTasser, however, performs worse in every example, and its computing time is also much longer than that of the other methods.

Model evaluation

To compare the models and the methods that created them, the GDT score and the RMSD were calculated for each model. As <xr id="model">Table</xr> shows, the GDT score is a better measure than the RMSD for grading the models. Models created from a template with high sequence identity to the target have a high GDT score, and the GDT score drops sharply as the sequence identity decreases. This trend is not visible in the RMSD values: apart from iTasser, they are good overall, even though the models clearly differ in quality. The GDT scores confirm the performance ranking derived from the visual inspection in Pymol. Modeller and SwissModel generate very good models, especially when the template has a high sequence identity to the target, whereas iTasser disappoints both in model quality and in the computing time needed to create a model.

The scores reported by the individual modelling algorithms can also be a good indicator of model accuracy. Modeller's combined Z-score and SwissModel's QMEAN Z-score, for example, generally correlate with the RMSD and the GDT score and are therefore a good measure of the actual model quality. iTasser's C-score, which is meant to give a confidence measure of the model quality, seems to correlate with neither the RMSD nor the GDT score.

<figtable id="model">

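{| class="wikitable"
|+ Comparison of modelling algorithms
! rowspan="2" | Measure !! colspan="4" | Modeller !! colspan="4" | SwissModel !! colspan="4" | iTasser
|-
! 2O53 !! 2GU2 !! 2QJ8 !! 1YW4 !! 2O53 !! 2GU2 !! 2QJ8 !! 1YW4 !! 2O53 !! 2GU2 !! 2QJ8 !! 1YW4
|-
| Algorithm-dependent score || -11.92 || -12.10 || -4.83 || -4.90 || -0.44 || -0.44 || -5.96 || not computed || 1.64 || 1.76 || 1.62 || -0.63
|-
| GDT score || 100.00 || 54.65 || 6.40 || 6.65 || 100.00 || 54.65 || 6.89 || not computable || 89.95 || 46.68 || 7.81 || 7.56
|-
| Cα RMSD (Å) || 0.19 || 0.34 || 1.19 || 0.97 || 0.07 || 0.06 || 2.68 || not computable || 1.31 || not computable || 7.17 || 10.23
|}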
Comparison of the calculated models, using the GDT score and the Cα RMSD as quality measures; the second header row gives the template used for each model. The algorithm-dependent confidence scores are given in addition (Modeller: combined Z-score, SwissModel: QMEAN Z-score, iTasser: C-score).

</figtable>
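Unlike the RMSD, which averages squared deviations over the atoms it is computed on, the GDT score counts how many residues are placed close to the reference, so it directly reflects the fraction of the structure that is modelled correctly. Assuming the GDT score in the table is the common GDT_TS variant, it averages the percentage of Cα atoms within 1, 2, 4 and 8 Å of the reference. The reference implementation (LGA) searches over many superpositions and keeps the best count per cutoff; this minimal sketch instead assumes a single, fixed superposition with residue-matched Cα coordinates, so it will typically give an equal or lower value. Residues missing from the model (`None`) are counted as outside every cutoff. The function name `gdt_ts` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def gdt_ts(model: Sequence[Optional[Coord]], reference: Sequence[Coord],
           cutoffs: Tuple[float, ...] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS (0-100) for one fixed superposition.

    model[i] is the superposed C-alpha of reference residue i, or None if
    the model has no coordinates for that residue.
    """
    if len(model) != len(reference) or not reference:
        raise ValueError("model and reference must be non-empty and residue-matched")
    # Residues missing from the model never fall within any cutoff.
    distances = [math.dist(m, r) if m is not None else math.inf
                 for m, r in zip(model, reference)]
    # Fraction of reference residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(reference) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


if __name__ == "__main__":
    # Toy example: deviations of 0.5, 1.5 and 3.0 Å, plus one residue without coordinates.
    reference = [(float(i), 0.0, 0.0) for i in range(4)]
    model = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), None]
    print(f"GDT_TS = {gdt_ts(model, reference):.2f}")
```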

Tasks