Canavan Disease: Task 05 - Homology Modelling

From Bioinformatikpedia
Revision as of 11:24, 5 September 2013 by Mahlich

Homology modelling is an important step: since an experimentally determined structure is not always available for the protein of interest, models can help in understanding it. Even single SNPs in the sequence can make a difference in such models, which makes this investigation indispensable.

Dataset

The models were calculated with three different homology modelling tools: Modeller (as explained in the description of the Task), SwissModel and iTasser. To compare the modelling algorithms, two templates were chosen for each sequence identity range (high and low). For a description of the dataset see <xr id="dataset"></xr>.

<figtable id="dataset">

{| class="wikitable"
|+ Dataset Composition
! PDB-id !! Description !! Criterion
|-
| 2O4H || ASPA from human with bound N-phosphonomethyl-L-aspartate || reference structure
|-
| 2O53 || Crystal structure of apo-aspartoacylase from human brain || sequence identity 100%
|-
| 2GU2 || ASPA from rat || sequence identity 84%
|-
| 2QJ8 || ASPA family protein from ''Mesorhizobium loti'' || sequence identity 16%
|-
| 1YW4 || Succinylglutamate desuccinylase from ''Chromobacterium violaceum'' || sequence identity 14%
|}
Overview of the dataset composition for Task 05, containing a brief description of the chosen structures and their sequence identity to the reference ASPA protein.

</figtable>
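The page does not state how the sequence identities in the table were computed. As a rough illustration, here is a minimal Python sketch that assumes identity is counted over aligned columns where neither sequence has a gap; other conventions (for example dividing by the length of the shorter sequence) give different percentages. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Assumption: identity is counted only over columns where neither
    sequence has a gap ('-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap in either sequence.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# Toy example (not the real ASPA sequences): 7 ungapped columns, 6 identical.
print(percent_identity("MKT-AYIA", "MKTQAFIA"))
```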

Model creation

Each modelling algorithm was used to produce models for 2O4H (as a representative of aspartoacylase) based on four different template proteins. These models are examined in the following sections. The only exception is the model that SwissModel should have created based on 1YW4, as SwissModel was not able to perform this task.

Modeller with Single Sequence Template

Modeller produced extremely accurate models of the target protein from templates with high sequence similarity. Visually, both 2O53 and 2GU2 are already highly similar in structure to 2O4H, and a structural alignment of either template to 2O4H results in an RMSD below 1 Å. It is therefore to be expected that the generated models are very accurate, and this is exactly what can be observed (see <xr id="2O53_md"></xr> and <xr id="2GU2_md"></xr>).
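The RMSD after structural alignment mentioned above is the root-mean-square deviation of equivalent atoms after an optimal rigid-body superposition. A minimal sketch of this using the Kabsch algorithm in Python with NumPy, assuming the two coordinate arrays already contain equivalent Cα atoms in the same order (the residue pairing itself, which PyMOL's `align` command derives from a sequence alignment, is not shown). `kabsch_rmsd` is a hypothetical helper, not the procedure used for this page.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumes P[i] and Q[i] are equivalent atoms (e.g. paired C-alpha atoms).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix yields the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P (row vectors) onto Q and average the squared deviations.
    P_rot = P_c @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q_c) ** 2, axis=1))))


if __name__ == "__main__":
    square = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
    print(kabsch_rmsd(square, square + 5.0))  # close to 0 for a pure translation
```

In practice, PyMOL's `align`/`super` commands or Biopython's `Bio.PDB.Superimposer` perform this superposition directly on structures.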

<figure id="2O53_md">

Representation of target (2O4H), template (2O53) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="2GU2_md">

Representation of target (2O4H), template (2GU2) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

The models generated for 2O4H using 2QJ8 and 1YW4 as templates, both of which have a sequence similarity below 20%, are still very good. There are visible differences between the target and the models, such as larger loop regions or secondary structure elements whose conformations are slightly mispredicted. However, when the aligned target and models are compared with their respective template, large differences become apparent (compare <xr id="2QJ8_md"></xr> and <xr id="1YW4_md"></xr>).

<figure id="2QJ8_md">

Representation of target (2O4H), template (2QJ8) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="1YW4_md">

Representation of target (2O4H), template (1YW4) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

Modeller with Multiple Sequence Template

To calculate another type of model, more templates were included to check whether this has an effect on model quality. Since the number of available templates is very limited, a multiple sequence alignment of 2GU2, 2QJ8 and 1YW4 was created, whose consensus has 46.77% sequence identity to the reference sequence. As expected, the resulting model does not improve on the model created from 2GU2 alone. Comparing the model built from the multiple sequence alignment with the models created from 2GU2 and 2QJ8 respectively, there is a clear improvement over 2QJ8 (see <xr id="MSA_md_2QJ8"></xr>) but a slight decrease in quality compared with 2GU2 (see <xr id="MSA_md_2GU2"></xr>).
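The page does not describe how the 46.77% consensus identity was obtained. One plausible approach, assumed here purely for illustration, is a column-wise majority consensus of the aligned template sequences followed by an identity count against the aligned reference. The helper names and toy sequences below are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned):
    """Column-wise majority consensus of equal-length aligned sequences.

    Ties go to the residue encountered first in the column (Counter orders
    equal counts by first occurrence); a gap can win a column and is kept as '-'.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def identity_to_reference(consensus, reference):
    """Percent identity over columns where neither consensus nor reference has a gap."""
    pairs = [(c, r) for c, r in zip(consensus, reference) if c != "-" and r != "-"]
    return 100.0 * sum(c == r for c, r in pairs) / len(pairs) if pairs else 0.0


# Toy example with made-up aligned sequences (not the real templates).
templates = ["MKTAYIA", "MKSAYLA", "MRTAFIA"]
consensus = majority_consensus(templates)
print(consensus, identity_to_reference(consensus, "MKTAYLA"))
```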

<figure id="MSA_md_2GU2">

Representation of target (2O4H), the generated model from the MSA template and the model generated from 2GU2 in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The produced model from the MSA is displayed in blue. The model produced from 2GU2 as template is displayed for comparison (turquoise).

</figure>

<figure id="MSA_md_2QJ8">

Representation of target (2O4H), the generated model from the MSA template and the model generated from 2QJ8 in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The produced model from the MSA is displayed in blue. The model produced from 2QJ8 as template is displayed for comparison (turquoise).

</figure>

Modeller with Manual Alignment Refinement

Although Modeller's predictions work quite well even at low sequence similarity, a manual refinement of the alignment was attempted. The aim was to check whether a short helix of about 6 amino acids could be predicted, since Modeller consistently missed it when building the model with 1YW4 as template. The alignment was therefore changed in the loop region (positions 320-325) where the helix should have been. The results showed that the refinement did not improve the model; in particular, the specific helix was still not formed.
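A manual refinement like the one described above is typically done by editing the gapped sequences in the alignment file that Modeller reads, which is usually in PIR format. A minimal Python sketch for writing such entries, assuming the ten colon-separated header fields used in the Modeller tutorial examples (type, code, first residue, chain, last residue, chain, name, source, resolution, R-factor). The helper `pir_entry`, the codes and the sequences are hypothetical; check the Modeller manual before relying on the field values.

```python
def pir_entry(code, aligned_seq, structure, first="", last="", chain="", width=75):
    """Format one entry of a Modeller-style PIR alignment file.

    Assumed header fields: type, code, first residue, chain, last residue,
    chain, name, source, resolution, R-factor.
    """
    if structure:
        fields = ["structureX", code, str(first), chain, str(last), chain,
                  "undefined", "undefined", "-1.00", "-1.00"]
    else:
        fields = ["sequence", code, "", "", "", "", "", "", "0.00", "0.00"]
    body = aligned_seq + "*"  # '*' terminates the sequence in PIR format
    # Wrap the gapped sequence into lines of at most `width` characters.
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([f">P1;{code}", ":".join(fields), *lines]) + "\n"


# Toy usage with made-up codes and gapped strings; a manual refinement would
# edit these strings (e.g. shift gaps within a loop) before writing the file.
alignment = (pir_entry("tmpl", "AC-DEF", structure=True, first=1, last=5, chain="A")
             + "\n" + pir_entry("target", "ACGDE-", structure=False))
print(alignment)
```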

SwissModel

Examining the models created by SwissModel from high sequence similarity templates in Pymol, together with the templates and the target, reveals that SwissModel creates very accurate models as well. One visible difference compared with the Modeller models is that SwissModel appears to build the model only over the length of the target, whereas in the Modeller models, for example, the N- and C-termini of the polypeptide extend well beyond the actual target (see <xr id="2O53_sm"></xr> and <xr id="2GU2_sm"></xr>).

<figure id="2O53_sm">

Representation of target (2O4H), template (2O53) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="2GU2_sm">

Representation of target (2O4H), template (2GU2) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

Among the models created from templates with less than 30% sequence similarity to 2O4H, it has to be noted that SwissModel was not able to build a model with 1YW4 as template. The modelling process with 2QJ8 as template was successful, however. A closer look at the model created from 2QJ8 shows that SwissModel, at least in this specific example, does not perform as well as Modeller when the template has low sequence similarity. The overall positioning of the atoms is correct, but the prediction of secondary structure elements is much worse, and some residues do not even have a predicted spatial position (see <xr id="2QJ8_sm"></xr>).
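Whether a model covers the full target, and whether any residues lack coordinates, can be checked directly from the PDB file. A minimal Python sketch, assuming the model uses the same residue numbering as the target (not guaranteed for every server) and reading the fixed-width ATOM records of the PDB format (atom name in columns 13-16, chain in column 22, residue number in columns 23-26). The function names are hypothetical; Biopython's `Bio.PDB.PDBParser` is a more robust alternative for real files.

```python
from typing import List, Optional, Set


def ca_residue_numbers(pdb_text: str, chain: Optional[str] = None) -> Set[int]:
    """Residue numbers that have a C-alpha ATOM record in PDB-format text."""
    numbers = set()
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, chain 22, residue number 23-26.
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        if line[12:16].strip() == "CA" and (chain is None or line[21] == chain):
            numbers.add(int(line[22:26]))
    return numbers


def missing_residues(pdb_text: str, first: int, last: int,
                     chain: Optional[str] = None) -> List[int]:
    """Residue numbers in [first, last] that have no C-alpha in the model."""
    present = ca_residue_numbers(pdb_text, chain)
    return [n for n in range(first, last + 1) if n not in present]
```

For a real model, one would pass the file contents together with the target's first and last residue numbers, for example `missing_residues(text, first, last, "A")`.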

<figure id="2QJ8_sm">

Representation of target (2O4H), template (2QJ8) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

iTasser

iTasser continues the trend already observed for Modeller and SwissModel. The models produced from the high sequence similarity templates show that iTasser creates accurate models of the target sequence (see <xr id="2O53_it"></xr> and <xr id="2GU2_it"></xr>). However, compared with the results of Modeller and SwissModel in Pymol, iTasser seems slightly less precise: the spatial orientation of the secondary structure elements appears somewhat off compared with the models created by the two other methods.

<figure id="2O53_it">

Representation of target (2O4H), template (2O53) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="2GU2_it">

Representation of target (2O4H), template (2GU2) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

For the models based on 2QJ8 and 1YW4, iTasser clearly produces much worse results than Modeller and SwissModel. Comparing the atom positions in the target and template structures, the model seems to fall in between the two, leaning towards the template structure (compare <xr id="2QJ8_it"></xr> and <xr id="1YW4_it"></xr>).

<figure id="2QJ8_it">

Representation of target (2O4H), template (2QJ8) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

<figure id="1YW4_it">

Representation of target (2O4H), template (1YW4) and generated model in Pymol. 2O4H is displayed in orange including the bound zinc ion and compound at the active site. The template used to generate the model is displayed in green. The produced model is displayed in blue.

</figure>

Based on the examination of the structures in Pymol, the conclusion is therefore that Modeller's overall performance is exceptional, being beaten only by SwissModel for templates with high sequence similarity. iTasser performs worse in every example, and in addition it took much longer to obtain results than with the other methods. This was because the iTasser server queue was always full of jobs.

Model evaluation

To compare the models and the methods that created them, the GDT score and the RMSD were calculated for each computed model. As displayed in <xr id="model"></xr>, the GDT score is a better measure than the RMSD for grading the models. Models created from a template with high sequence similarity to the target have a high GDT score, and the GDT score decreases rapidly with decreasing sequence similarity. This trend cannot be observed in the RMSD values: they are good overall, with the exception of iTasser, although the models observably differ in quality.

The performance ranking of the modelling algorithms concluded from the visual examination in Pymol is confirmed by the GDT scores. Modeller and SwissModel generate very good models, especially if the template has high sequence similarity to the target. iTasser disappoints both in model quality and in the computing time needed to create the model. Additionally, the scores reported by the individual modelling algorithms can be a good indicator of model accuracy. Modeller's combined Z-score and SwissModel's QMEAN Z-score, for example, generally correlate with the RMSD and the GDT score and are therefore a good measure of the actual model quality. iTasser's C-score, which should give a confidence measure of model quality, seems to correlate with neither the RMSD nor the GDT score.
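GDT_TS is commonly defined as the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in the model that lie within that cutoff of the corresponding target atom. The following Python sketch computes a simplified version on coordinates that are assumed to be already superposed and paired residue by residue. The reference implementation (LGA) searches many superpositions to maximise each percentage, so this single-superposition variant can underestimate the score. The function name `gdt_ts` is hypothetical.

```python
import numpy as np


def gdt_ts(model_ca, target_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for superposed, residue-paired C-alpha coordinates (N, 3).

    Uses one fixed superposition; LGA's GDT maximises each fraction over many
    superpositions, so this value can be lower than the official score.
    """
    model = np.asarray(model_ca, dtype=float)
    target = np.asarray(target_ca, dtype=float)
    dist = np.linalg.norm(model - target, axis=1)  # per-residue deviation in Å
    fractions = [np.mean(dist <= c) for c in cutoffs]  # fraction within each cutoff
    return float(100.0 * np.mean(fractions))


# Toy example: four residues displaced along x by 0.5, 1.5, 3 and 10 Å.
target = np.zeros((4, 3))
model = target + np.array([[0.5, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
print(gdt_ts(model, target))  # (1/4 + 2/4 + 3/4 + 3/4) / 4 * 100 = 56.25
```

If a model lacks residues, those residues should count as outside every cutoff, i.e. the fractions should be taken over all target residues; this sketch assumes a complete pairing.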

<figtable id="model">

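{| class="wikitable"
|+ Comparison of Modelling Algorithms
! rowspan="2" | Measure !! colspan="5" | Modeller !! colspan="4" | SwissModel !! colspan="4" | iTasser
|-
! 2O53 !! 2GU2 !! 2QJ8 !! 1YW4 !! MSA !! 2O53 !! 2GU2 !! 2QJ8 !! 1YW4 !! 2O53 !! 2GU2 !! 2QJ8 !! 1YW4
|-
! Algorithm-dependent score
| -11.92 || -12.10 || -4.83 || -4.90 || -8.77 || -0.44 || -0.44 || -5.96 || not computed || 1.64 || 1.76 || 1.62 || -0.63
|-
! GDT score
| 100.00 || 54.65 || 6.40 || 6.65 || 11.407 || 100.00 || 54.65 || 6.89 || not computable || 89.95 || 46.68 || 7.81 || 7.56
|-
! Cα RMSD (Å)
| 0.19 || 0.34 || 1.19 || 0.97 || 3.06 || 0.07 || 0.06 || 2.68 || not computable || 1.31 || not computable || 7.17 || 10.23
|}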
Comparison of the calculated models, using the GDT score and the RMSD as quality measures. Additionally, the algorithm-dependent confidence scores are given (Modeller: combined Z-score, SwissModel: QMEAN Z-score, iTasser: C-score).

</figtable>
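Whether an algorithm's own score correlates with the GDT score can be quantified with a rank correlation. A minimal Python sketch computing Spearman's rank correlation (the Pearson correlation of average ranks) for the five Modeller models in the comparison table; the helper names are hypothetical, and with only five data points the result is descriptive rather than statistically meaningful.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Modeller columns of the table (templates 2O53, 2GU2, 2QJ8, 1YW4, MSA).
modeller_score = [-11.92, -12.10, -4.83, -4.90, -8.77]
modeller_gdt = [100.00, 54.65, 6.40, 6.65, 11.407]
print(round(spearman(modeller_score, modeller_gdt), 3))  # -0.9
```

The value is negative because, in this table, more negative Modeller scores go together with higher GDT scores.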
