Canavan Disease: Task 05 - Homology Modelling
Homology modelling is an important step: since an experimental structure is not always available for the protein of interest, models can help in understanding the protein. Even SNPs in the sequence can make a difference in those models, so this kind of investigation is indispensable.
Dataset
The models were calculated with three different homology modelling tools: Modeller (as explained in the description of the task), SwissModel and iTasser. To compare the modelling algorithms, two templates were chosen for each sequence similarity set:
<figtable id="dataset">
Dataset composition

PDB-id | Description | Criterion
---|---|---
2O4H | ASPA from human with bound N-phosphonomethyl-L-aspartate | reference structure
2O53 | Crystal structure of apo-aspartoacylase from human brain | sequence identity 100%
2GU2 | ASPA from rat | sequence identity 84%
2QJ8 | ASPA family protein from Mesorhizobium loti | sequence identity 16%
1YW4 | Succinylglutamate desuccinylase from Chromobacterium violaceum | sequence identity 14%

PDB structures used and the sequence identity to the reference ASPA protein.
</figtable>
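The sequence identities listed in the dataset table can be recomputed from a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned and that identity is counted over the columns in which neither sequence has a gap. Other conventions for the denominator exist, and the source does not state which one was used. The function name `percent_identity` and the toy sequences are illustrative only and are not the ASPA sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and use '-'
    as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical, not taken from the ASPA sequences)
example = percent_identity("MTSC-HIAE", "MTSCVHLA-")
```

With the toy alignment, 6 of the 7 gap-free columns match, so `example` is about 85.7.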
Model creation
Each modelling algorithm was used to produce models for 2O4H (as the representative for aspartoacylase) based on four different template proteins. These models are examined in the following sections. The exception is the SwissModel model based on 1YW4, which SwissModel was not able to produce.
Modeller with Single Sequence Template
Modeller produced extremely accurate models for the target protein when given templates with a high sequence similarity. Both 2O53 and 2GU2 are already highly similar in structure to 2O4H on visual comparison. A structural alignment of the template structures to 2O4H results in an RMSD below 1Å. It is therefore to be expected that the generated models are very accurate, and this is exactly what can be observed (see <xr id="2O53_md"></xr> and <xr id="2GU2_md"></xr>).
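The RMSD is the root mean square distance between paired atoms after superposition. As a minimal sketch, assuming the two structures are already superposed (for example by a structural alignment in Pymol) and that the C-alpha coordinates are given as equally long lists of paired (x, y, z) tuples, it can be computed as follows. The function name `ca_rmsd` and the toy coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in Angstrom between two paired, already superposed coordinate lists.

    Assumption: coords_a[i] and coords_b[i] belong to equivalent residues.
    No superposition is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom of the second structure is shifted by 1 Angstrom in z
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = ca_rmsd(toy_a, toy_b)
```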
</figure>
</figure>
<figure id="2O53_md"> |
<figure id="2GU2_md"> |
Looking at the models generated for 2O4H with 2QJ8 and 1YW4 as templates, both of which have a sequence similarity below 20%, the results are still very good. There are visible differences between the target and the models, such as larger loop regions or secondary structure elements with slightly mispredicted conformations. However, when the aligned target and models are compared to their original templates, a large difference is detectable (compare <xr id="2QJ8_md">Figure</xr> and <xr id="1YW4_md">Figure</xr>).
</figure>
</figure>
<figure id="2QJ8_md"> |
<figure id="1YW4_md"> |
Modeller with Multiple Sequence Template
To calculate another type of model, more templates were included to check whether this has an effect on the model quality. Since the number of templates is very restricted, a multiple sequence alignment of 2GU2, 2QJ8 and 1YW4 was created, which resulted in a consensus with 46.77% sequence identity to the reference sequence. The expectation that the model does not improve compared to the model created from 2GU2 alone is confirmed. Comparing the model created from the multiple sequence input with the models created from 2GU2 and 2QJ8 respectively, there is a clear improvement over 2QJ8 (see <xr id="MSA_md_2QJ8">Figure</xr>) but a slight decrease in quality compared to 2GU2 (see <xr id="MSA_md_2GU2">Figure</xr>).
</figure>
</figure>
<figure id="MSA_md_2GU2"> |
<figure id="MSA_md_2QJ8"> |
Modeller with Manual Alignment Refinement
Although the prediction from Modeller works quite well even with low sequence similarity, an approach using a manual refinement was also tried. The aim was to check whether a short helix of about 6 amino acids could be predicted, since Modeller always missed it when creating the model with 1YW4 as template. To this end, the loop region (positions 320 - 325), where the helix should have been, was changed. The refinement did not lead to any improvement of the model, and in particular that specific helix was still not created.
SwissModel
Examining the models created by SwissModel from high sequence similarity templates in Pymol, together with the templates and the target, reveals that SwissModel creates very accurate models as well. One visible difference compared to the models created by Modeller is that SwissModel seems to create the model only for the length of the target, as opposed to Modeller, where for example the N- and C-termini of the polypeptide extend well beyond the length of the actual target (see <xr id="2O53_sm">Figure</xr> and <xr id="2GU2_sm">Figure</xr>).
</figure>
</figure>
<figure id="2O53_sm"> |
<figure id="2GU2_sm"> |
For the templates with a sequence similarity of less than 30% to 2O4H, it has to be noted that SwissModel was not able to build a model with 1YW4 as template. The modelling process with 2QJ8 as template was successful, however. A closer look at the model created from 2QJ8 shows that SwissModel, at least for this specific example, does not perform as well as Modeller when using a template with low sequence similarity. The overall positioning of the atoms is correct, but the prediction of secondary structure elements is much worse. Some residues do not even have a predicted spatial position (see <xr id="2QJ8_sm">Figure</xr>).
<figure id="2QJ8_sm">
</figure>
iTasser
iTasser continues the trend that is already observable in the results of Modeller and SwissModel. For the high sequence similarity templates, iTasser creates accurate models for the target sequence (see <xr id="2O53_it">Figure</xr> and <xr id="2GU2_it">Figure</xr>). However, when the visualization in Pymol is compared to the results of Modeller and SwissModel, iTasser seems to be a little less precise. The spatial orientation of the secondary structure elements seems a bit off compared to the models created by the two other methods.
</figure>
</figure>
<figure id="2O53_it"> |
<figure id="2GU2_it"> |
Looking at the models based on 2QJ8 and 1YW4, iTasser produces much worse results than Modeller and SwissModel. Comparing the atom positions with those of the target and template structures, the model seems to fall in between the two structures, leaning more towards the template structure (compare <xr id="2QJ8_it">Figure</xr> and <xr id="1YW4_it">Figure</xr>).
</figure>
</figure>
<figure id="2QJ8_it"> |
<figure id="1YW4_it"> |
The conclusion based on examining the structures in Pymol is therefore that Modeller's overall performance is exceptional, being beaten only by SwissModel for templates with a high sequence similarity. iTasser, however, performs worse in every example, and in addition its computing time is much longer than that of the other methods. This is because the iTasser server was always full of jobs.
Model evaluation
To compare the models and the methods that created them, the GDT score and the RMSD were calculated for each computed model. As displayed in <xr id="model"></xr>, the GDT score is a better measure than the RMSD for grading the models. Models created from a template with high sequence similarity to the target have a high GDT score, and with decreasing sequence similarity the GDT score decreases rapidly. This cannot be observed for the RMSD: the RMSD values are good overall, with the exception of iTasser, although the models observably differ in quality.
The ranking of the modelling algorithms concluded from the visual examination in Pymol is confirmed by the GDT scores. Modeller and SwissModel generate very good models, especially if the template has a high sequence similarity to the target. iTasser disappoints in terms of both model quality and the computing time needed to create the model.
Additionally, the model scores given by the individual modelling algorithms can be a good indicator of how accurate the model is. Modeller's combined Z-score and SwissModel's QMEAN Z-score, for example, generally correlate with the RMSD and the GDT score and are therefore a good measure of the actual model quality. iTasser's C-score, which should give a confidence measure of the model quality, seems to correlate with neither the RMSD nor the GDT score.
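The different behaviour of GDT and RMSD can be illustrated with a small sketch. It assumes, since the source does not say how the scores were computed, that the RMSD is taken only over the residue pairs that could be aligned, while the GDT score counts residues within 1, 2, 4 and 8 Å and is normalised by the full target length. It also uses a single fixed superposition, whereas actual GDT implementations search over many superpositions. The names `gdt_ts` and `aligned_rmsd` and the toy distances are hypothetical.

```python
import math


def gdt_ts(distances, target_length, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-style score (0 to 100) from per-residue C-alpha distances.

    Assumption: `distances` holds only the aligned residue pairs under one
    fixed superposition; unaligned target residues count as not matched.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


def aligned_rmsd(distances):
    """RMSD over the aligned residue pairs only."""
    if not distances:
        raise ValueError("no aligned residue pairs")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Toy case: only 10 of 100 target residues are aligned, each 0.5 Angstrom apart
toy_distances = [0.5] * 10
toy_gdt = gdt_ts(toy_distances, target_length=100)
toy_rmsd = aligned_rmsd(toy_distances)
```

Under these assumptions the toy values give a GDT score of about 10 together with an RMSD of 0.5 Å, which shows how a model can have a low RMSD and still a poor GDT score when only a small part of the target is covered.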
<figtable id="model">
Comparison of Modelling Algorithms

Score | Modeller 2O53 | Modeller 2GU2 | Modeller 2QJ8 | Modeller 1YW4 | Modeller MSA | SwissModel 2O53 | SwissModel 2GU2 | SwissModel 2QJ8 | SwissModel 1YW4 | iTasser 2O53 | iTasser 2GU2 | iTasser 2QJ8 | iTasser 1YW4
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Algorithm-dependent score | -11.92 | -12.10 | -4.83 | -4.90 | -8.77 | -0.44 | -0.44 | -5.96 | not computed | 1.64 | 1.76 | 1.62 | -0.63
GDT score | 100.00 | 54.65 | 6.40 | 6.65 | 11.407 | 100.00 | 54.65 | 6.89 | not computable | 89.95 | 46.68 | 7.81 | 7.56
C-alpha RMSD | 0.19Å | 0.34Å | 1.19Å | 0.97Å | 3.06Å | 0.07Å | 0.06Å | 2.68Å | not computable | 1.31Å | not computable | 7.17Å | 10.23Å
</figtable>
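Whether an algorithm's own score tracks model quality can be checked with a correlation coefficient. The sketch below computes the Pearson correlation between Modeller's score and the GDT score, using the five Modeller values from the table above. The choice of Pearson correlation and the function name `pearson` are assumptions for illustration, not the procedure used in the source. With only five data points the result is indicative at best.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Modeller columns of the comparison table: 2O53, 2GU2, 2QJ8, 1YW4, MSA
modeller_score = [-11.92, -12.10, -4.83, -4.90, -8.77]
modeller_gdt = [100.00, 54.65, 6.40, 6.65, 11.407]
r_modeller_gdt = pearson(modeller_score, modeller_gdt)
```

A negative coefficient here means that more negative Modeller scores go along with higher GDT scores.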