Difference between revisions of "Task 5: Homology Modeling"
Latest revision as of 01:57, 2 September 2013
From the two available PDB structures, 1A6Z chain A was used as modeling target for all three methods.
Modeller
We used Modeller to create models based on a single template and also multiple templates.
Single template
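<figtable id="Modeller single">
Template | Seq. identity | DOPE score (std alignment) | RMSD (std alignment) | GDT score (std alignment) | DOPE score (2d alignment) | RMSD (2d alignment) | GDT score (2d alignment) |
---|---|---|---|---|---|---|---|
1QVO_A | 39% | -27772 | 3.647 | 0.6241 | -20467 | 18.789 | 0.3410 |
1S7X_A | 29% | -19941 | 15.806 | 0.3355 | -22681 | 18.101 | 0.3649 |
1CD1_A | 21% | -19034 | 18.066 | 0.3640 | -18938 | 21.670 | 0.2822 |
Table 1: Template structures and their sequence identity to the target, as computed by Blast. The DOPE score, RMSD and GDT score are given as a quality measure. The different models were created based on pairwise sequence alignments with dynamic programming (std alignment) and pairwise sequence alignment with additional secondary structure information (2d alignment).
</figtable>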
<xr id="Modeller single"/> lists the selected templates and the Modeller results for the different template structures and alignment methods. In addition to the standard pairwise sequence alignment based on dynamic programming, we also used Modeller's align2d() method to improve the alignment by including secondary structure information of the template. The method tries to place gaps outside secondary structure segments. This is especially useful for alignments between more distantly related sequences, because those alignments usually contain more gaps than alignments between more closely related sequences.
As Modeller quality score, we chose the DOPE score (http://salilab.org/modeller/9.11/manual/node253.html), which is a statistical potential that was optimized for the assessment of model quality. The DOPE score has an arbitrary scale, but scores for structures of the same protein are comparable and can be used to select the best model from a collection of structures. The lower the score, the better the model.
In addition to the DOPE score, we also computed the RMSD and GDT scores. The RMSD is the root-mean-square distance between all pairs of corresponding atoms in two superimposed structures. Therefore, the lower the RMSD the better. For the GDT score, the fraction of target residues that lie within a given distance cutoff of their counterpart in the model is computed for four cutoffs and then averaged. Normally, 1, 2, 4 and 8 Å are used as distance thresholds. The GDT score ranges between 0 and 1, with random superpositions of unrelated structures having a score of 0.1 to 0.2.
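As a rough illustration of the two structure-based measures, the following Python sketch computes an RMSD and a simplified GDT score for two coordinate lists that are assumed to be already superimposed. The function names and the toy coordinates are assumptions made for this example. A real GDT calculation additionally searches over many superpositions, which is left out here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance of corresponding points (already superimposed)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def gdt_score(coords_a, coords_b, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT: mean fraction of point pairs within each distance cutoff.

    Uses one fixed superposition only (an assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return sum(fractions) / len(fractions)


# Hypothetical toy data: four positions displaced by 0.5, 1.5, 3.0 and 9.0 Å.
target = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (5.0, 1.5, 0.0), (10.0, 3.0, 0.0), (15.0, 9.0, 0.0)]

print("RMSD:", round(rmsd(target, model), 3))
print("GDT:", round(gdt_score(target, model), 4))
```

The single large displacement of 9.0 Å dominates the RMSD, while the GDT score only registers that this position misses every cutoff. This is one reason why the two measures can rank models differently.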
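<figure id="pymol modeller 1QVO">
Figure 1: Superposition of the target 1A6Z_A (green), the template 1QVO_A (red) and the model (purple). Two different alignment methods were used to create the input alignment for Modeller.
</figure>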
<xr id="pymol modeller 1QVO"/> shows a visualisation of the two models (purple) created from the template 1QVO_A (red), which has the closest homology to the target (green). The first model a) is very good and much better than the second model b): its secondary structure elements match the target well, whereas the positions of the alpha helices in the second model differ considerably from the target.
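<figure id="pymol modeller 1S7X">
Figure 2: Superposition of the target 1A6Z_A (green), the template 1S7X_A (red) and the model (purple). Two different alignment methods were used to create the input alignment for Modeller.
</figure>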
The models created from the template 1S7X_A are worse than those from the more closely related 1QVO_A. The 3D representation in <xr id="pymol modeller 1S7X"/> shows several regions in both models where the secondary structure elements could not be superimposed correctly onto the reference. The beta sheets in the second model b) match those of the immunoglobulin domain (lower part of the protein) in the target, but there are no beta sheets in that region in the first model a). Although the RMSD is better for model a), we agree with the DOPE score, which implies that model b) is better.
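<figure id="pymol modeller 1CD1">
Figure 3: Superposition of the target 1A6Z_A (green), the template 1CD1_A (red) and the model (purple). Two different alignment methods were used to create the input alignment for Modeller.
</figure>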
<xr id="pymol modeller 1CD1"/> shows 3D representations of the models created from the most distantly related homolog 1CD1. Both models a) and b) are very different from the target and also from the template. The 21% sequence identity between the two proteins is apparently too low to create good alignments. Modeller's secondary structure guided alignment method was designed for exactly this task, the alignment of sequences with low identity. Nevertheless, including the secondary structure information only slightly improved the alignment for the template 1S7X.
<figure id="1CD1 alignments">
Figure 4: Standard alignment (left) and 2d alignment (right) of 1QVO_A and 1A6Z_A.
</figure>
The alignments of 1QVO_A and the target in <xr id="1CD1 alignments"/> show that the standard alignment is already quite good and contains only a few gaps. Including the 2D information did not improve the alignment but led to a fragmented alignment with an increased number of gaps. Thus, the resulting model is of decreased quality.
The DOPE score agrees with both the GDT score and the RMSD: lower (better) DOPE scores tend to go along with higher GDT scores and lower RMSDs. The scores differ a bit in some cases, but all in all, they agree that the model created from 1QVO_A and the standard alignment is the best. This is not surprising, since the 3D visualisations show that 1QVO_A is already very similar to the target, whereas the other two templates differ much more.
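The relationship between the three scores can be checked with a correlation coefficient. The sketch below computes Pearson's r for the six single-template Modeller models, using the values from the table above. The helper `pearson` is written out for this example and is not part of Modeller. With only six data points, the coefficients should be read as a rough indication only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Values from the single-template table, in the order
# 1QVO_A, 1S7X_A, 1CD1_A (std alignment), then the same templates (2d alignment).
dope = [-27772, -19941, -19034, -20467, -22681, -18938]
rmsd_values = [3.647, 15.806, 18.066, 18.789, 18.101, 21.670]
gdt_values = [0.6241, 0.3355, 0.3640, 0.3410, 0.3649, 0.2822]

dope_vs_rmsd = pearson(dope, rmsd_values)
dope_vs_gdt = pearson(dope, gdt_values)

# The sign of each coefficient gives the direction of the relationship.
print("DOPE vs RMSD:", round(dope_vs_rmsd, 2))
print("DOPE vs GDT: ", round(dope_vs_gdt, 2))
```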
Multiple templates
We also created models using more than one template in a single modeling step. For this purpose, we created three sets of structures: one with close homologs, one with distant homologs and one mixed set.
<figtable id="multiple sets">
Template (close homology) | Seq. identity | Template (distant homology) | Seq. identity | Template (mixed) | Seq. identity |
---|---|---|---|---|---|
1QVO_A | 39% | 3HUJ_C | 23% | 1QVO_A | 39% |
1ZAG_A | 36% | 1CD1_A | 21% | 1CD1_A | 21% |
1RJZ_D | 34% | 1VZY_A | 14% | | |
</figtable>
<figtable id="multiple sets">
Set | Templates | DOPE score | RMSD | GDT score |
---|---|---|---|---|
close homology | 1QVO_A, 1ZAG_A | -28073 | 3.432 | 0.6553 |
close homology | 1QVO_A, 1ZAG_A, 1RJZ_D | -27460 | 2.431 | 0.7638 |
distant homology | 3HUJ_C, 1CD1_A | -25967 | 4.130 | 0.5607 |
distant homology | 3HUJ_C, 1CD1_A, 1VZY_A | -20588 | 7.741 | 0.3814 |
mixed homology | 1QVO_A, 1CD1_A | -25894 | 3.974 | 0.5846 |
</figtable>
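Including more templates generally improves the quality of the models, but a third template did not help consistently. In the distant set, the three-template model is clearly worse than the two-template model (RMSD 7.741 vs. 4.130). In the close set, the three-template model has a slightly worse DOPE score than the two-template model (-27460 vs. -28073), but a better RMSD (2.431 vs. 3.432) and GDT score (0.7638 vs. 0.6553). The three-template models are still better than nearly all models created from a single template.
Surprisingly, the two templates with low sequence identity to the target led to a good model with an RMSD of 4.130, which is nearly as good as the model created with the mixed set (RMSD 3.974).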
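Which multi-template model counts as the best depends on the score used for the ranking. A minimal Python sketch with the values from the table above makes this explicit. The data layout (a list of dictionaries) is an assumption chosen for this example.

```python
# Multi-template Modeller results, copied from the table above.
models = [
    {"templates": "1QVO_A, 1ZAG_A", "dope": -28073, "rmsd": 3.432, "gdt": 0.6553},
    {"templates": "1QVO_A, 1ZAG_A, 1RJZ_D", "dope": -27460, "rmsd": 2.431, "gdt": 0.7638},
    {"templates": "3HUJ_C, 1CD1_A", "dope": -25967, "rmsd": 4.130, "gdt": 0.5607},
    {"templates": "3HUJ_C, 1CD1_A, 1VZY_A", "dope": -20588, "rmsd": 7.741, "gdt": 0.3814},
    {"templates": "1QVO_A, 1CD1_A", "dope": -25894, "rmsd": 3.974, "gdt": 0.5846},
]

# Lower is better for DOPE and RMSD, higher is better for GDT.
best_by_dope = min(models, key=lambda m: m["dope"])
best_by_rmsd = min(models, key=lambda m: m["rmsd"])
best_by_gdt = max(models, key=lambda m: m["gdt"])

print("best DOPE:", best_by_dope["templates"])
print("best RMSD:", best_by_rmsd["templates"])
print("best GDT: ", best_by_gdt["templates"])
```

The DOPE score needs only the model itself, whereas RMSD and GDT require the known target structure. In a real modeling scenario, only the DOPE ranking would be available.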
Swiss-Model
We used Swiss-Model to create models using 1QVO_A, 1S7X_A and 1CD1_A as templates.
Swiss-Model outputs a raw score and also a Z-score that represents an absolute measure of the model quality. It relates the model's raw score to the scores that high-resolution X-ray structures get and thus gives an estimate of how likely it is that the model has a quality comparable to an experimental structure. A low quality model is indicated by a strongly negative Z-score, which means that the raw score is several standard deviations lower than the scores of experimental structures of similar size (see the Swiss-Model help, http://swissmodel.expasy.org/workspace/index.php?func=special_help&=#A).
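The idea behind such a Z-score can be sketched in a few lines of Python. This is only an illustration of the general formula, not Swiss-Model's actual implementation. The function name, the reference scores and the use of the population standard deviation are all assumptions made for this example.

```python
import statistics


def z_score(raw_score, reference_scores):
    """Number of standard deviations by which raw_score differs from the reference mean."""
    mean = statistics.mean(reference_scores)
    spread = statistics.pstdev(reference_scores)  # population standard deviation (assumption)
    if spread == 0:
        raise ValueError("reference scores must not all be identical")
    return (raw_score - mean) / spread


# Hypothetical raw scores of experimental structures of similar size.
reference = [0.70, 0.75, 0.80, 0.85, 0.90]

# A model scoring below the reference mean gets a negative Z-score.
print(round(z_score(0.60, reference), 3))
```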
Swiss-Model also provides plots that help to analyse the local energy of the model. For this, the atomic empirical mean force potential (Anolea) and the Gromos simulation package are used. Both are used to calculate the energy of each amino acid in the sequence. The two plots show the protein sequence on the x-axis and the calculated energy of each residue on the y-axis. A low energy corresponds to a favorable energy environment for an amino acid and a positive energy represents an unfavorable energy environment.
<figtable id="swiss-model">
Template | 1QVO_A | 1S7X_A | 1CD1_A |
---|---|---|---|
Seq. identity | 39% | 29% | 21% |
Z-score | -1.977 | -2.005 | -2.707 |
RMSD | 2.847 | 2.757 | 3.604 |
GDT score | 0.6774 | 0.7086 | 0.6121 |
Pymol visualisation | | | |
Anolea and Gromos energy | | | |
</figtable>
<xr id="swiss-model"/> contains the Swiss-Model results. The models from 1QVO_A and 1S7X_A are both very good, with RMSDs of 2.847 and 2.757. The GDT score agrees with the RMSD that the model from 1S7X_A is slightly better than the one from 1QVO_A. However, the Z-score decreases with the sequence identity. Looking at the two models in 3D does not clearly reveal the best model. Some secondary structure segments are more correct in one model and some are better modeled in the other one. We therefore rank both models as equally good.
The Anolea and Gromos plots for each model are also given in the table. For the first model, they show that the residues at the N and C termini are in favorable energy environments, but there are some less favorable segments in the middle of the model. Since the first two models are better than the third, we expected to see this trend in the energy landscape as well. However, the energy plot of the third model is not very different from the other two.
I-TASSER
I-Tasser was used to create models from two different templates. Because of I-Tasser's very long runtime of over 60 h for one protein, and because we were only allowed to run one job at a time, we created only two models.
I-Tasser uses threading in the first step to search for several template structures with high secondary structure similarity to the target, in addition to the user-specified template(s). Fragments of those structures are then reassembled to create several models for the target. The models are clustered and the lowest energy models are reported. We selected only the first and best model to keep the results comparable to those from Modeller and Swiss-Model, since both of these methods report only a single model.
The results contain a confidence score (C-score) as quality measure for the created models. It ranges between -5 and 2, with a high score indicating a high confidence (see cscore.txt).
<figtable id="i-tasser">
Template | 1QVO_A | 1CD1_A |
---|---|---|
Seq. identity | 39% | 21% |
C-score | 1.73 | 1.38 |
RMSD | 3.062 | 3.478 |
GDT score | 0.6719 | 0.6103 |
Pymol visualisation | | |
Table 5: Overview of the I-Tasser results for the two different templates.
</figtable>
The two models and the different quality measures are listed in <xr id="i-tasser"/>. Both models are quite good, and the second model from 1CD1 is especially remarkable, since the sequence identity of the template is only 21%. The model has an RMSD of 3.478, which is very close to the RMSD of the first model (3.062). The GDT score and the C-score both also rank the first model as the better one.
In the Pymol visualisation, it is clearly visible that the second model is worse, especially in the immunoglobulin domain, where it does not consist only of beta sheets but contains an alpha helix instead.
Summary
The GDT score is negatively correlated with the RMSD in nearly all cases. If the RMSD gets lower, then the GDT score gets higher and vice versa. There are only a few exceptions among the models created by Modeller.
The DOPE score from Modeller is also suited to get a first impression of how good the models are. However, there are some models where it indicates a medium quality, but the RMSD and GDT score are actually bad. Swiss-Model's Z-score and I-Tasser's C-score are both more strongly correlated with the RMSD and GDT score than the DOPE score.
All methods were able to create good models. Modeller created the best one, from three closely homologous templates, with an RMSD of 2.431. Nevertheless, the Modeller models overall have the highest RMSDs and lowest GDT scores and are thus the worst; only the models created with multiple templates are all good. Swiss-Model created the second best model with an RMSD of 2.757. I-Tasser is also very good, especially at creating models from templates with low sequence identity.