Difference between revisions of "Fabry:Homology based structure predictions"
Rackersederj (talk | contribs) (→Evaluation: DOPE scores) |
Rackersederj (talk | contribs) m (→Calculation of models) |
||
Line 289: | Line 289: | ||
== Modeller == |
== Modeller == |
||
=== Calculation of models === |
=== Calculation of models === |
||
− | With this command line tool, we created 10 models (see [[Fabry:Homology_based_structure_predictions/Journal#Modeller | Journal]]). The first three were produced with the standard settings and workflow of Modeller. The subsequent four models were computed from multiple target files in different combinations and in the last three models we rearranged the alignment files in order to test the quality of the alignment and the influence of the two types of alignment. |
+ | With this command line tool, we created 10 models (see [[Fabry:Homology_based_structure_predictions/Journal#Modeller | Journal]]). The first three were produced with the standard settings and workflow of ''Modeller''. The subsequent four models were computed from multiple target files in different combinations and in the last three models we rearranged the alignment files in order to test the quality of the alignment and the influence of the two types of alignment. |
==== Default settings ==== |
==== Default settings ==== |
||
Line 297: | Line 297: | ||
<caption>Model 1, visual comparison</caption> |
<caption>Model 1, visual comparison</caption> |
||
{| style="border-style: solid; border-width: 1px" |
{| style="border-style: solid; border-width: 1px" |
||
− | | [[File:FABRY_1R46_3HG3_model_only.png|right|280px|thumb| Model 1 (red), created with Modeller with the template 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
+ | | [[File:FABRY_1R46_3HG3_model_only.png|right|280px|thumb| Model 1 (red), created with ''Modeller'' with the template 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
| [[File:FABRY_1R46_3HG3.png|right|280px|thumb|Model 1 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow)]] |
| [[File:FABRY_1R46_3HG3.png|right|280px|thumb|Model 1 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow)]] |
||
|- |
|- |
||
Line 304: | Line 304: | ||
</div> |
</div> |
||
− | For the first model we used the template with the highest sequence identity. According to HHPred, the identity is 100%, Modeller only calculates an identity of 96% (see <xr id="tab:Modeller_scores_3hg3_2"/>). This discrepancy might be due to the way of the comparison - 1R46 is completely inclosed in 3HG3, but 3HG3 has a longer sequence (404 residues) and thus only 96% of it can be congruent to 1R46 (398 residues without signal peptide). In the left picture in <xr id="tab:pics_1R46_3HG3"/> the superimposition of the computed Model 1 and the actual target structure are shown. The right picture additionally displays the template structure. One can see, that the three structure almost perfectly superimpose, which is underlined by the scores derived from Modeller (see <xr id="tab:Modeller_scores_3hg3_2"/>). The GA341 score of 1.0 indicates a "native like" model (see basic [http://salilab.org/modeller/tutorial/basic.html tutorial]) and the Compactness <ref name="Compactness"> Foldit Wiki, Compactness (October 23, |
+ | For the first model we used the template with the highest sequence identity. According to HHPred, the identity is 100%, ''Modeller'' only calculates an identity of 96% (see <xr id="tab:Modeller_scores_3hg3_2"/>). This discrepancy might be due to the way of the comparison - 1R46 is completely inclosed in 3HG3, but 3HG3 has a longer sequence (404 residues) and thus only 96% of it can be congruent to 1R46 (398 residues without signal peptide). In the left picture in <xr id="tab:pics_1R46_3HG3"/> the superimposition of the computed Model 1 and the actual target structure are shown. The right picture additionally displays the template structure. One can see, that the three structure almost perfectly superimpose, which is underlined by the scores derived from ''Modeller'' (see <xr id="tab:Modeller_scores_3hg3_2"/>). The GA341 score of 1.0 indicates a "native like" model (see basic [http://salilab.org/modeller/tutorial/basic.html tutorial]) and the Compactness <ref name="Compactness"> Foldit Wiki, Compactness (October 23, |
2011), [http://foldit.wikia.com/wiki/Compactness http://foldit.wikia.com/wiki/Compactness]; May 26, 2012</ref> as well as the DOPE score (see basic [http://salilab.org/modeller/tutorial/basic.html tutorial]) are the second highest and lowest of all calculated models, respectively.<br> |
2011), [http://foldit.wikia.com/wiki/Compactness http://foldit.wikia.com/wiki/Compactness]; May 26, 2012</ref> as well as the DOPE score (see basic [http://salilab.org/modeller/tutorial/basic.html tutorial]) are the second highest and lowest of all calculated models, respectively.<br> |
||
− | The only parts that can not be modelled correctly are both ends of the sequence. Those parts are highlighted blue in the pictures. From our background knowledge we know that the first 31 residues form the signal peptide, that is cleaved off and thus can not be found in the tertiary structure of the target protein. This can not be modelled by the Modeller tool and thus it would be a good amendment to the modelling pipeline to add sequence based analyses like Signal peptide prediction, similiar to the predictions we made in [[Fabry:Sequence-based_analyses | Task 2]]. The lack of modellation of the last bit of the sequence can be pinned to the longer sequence of the 3HG3 structure, since the last 6 residues are craning and the template is 6 amino acids longer than the target.<br> |
+ | The only parts that can not be modelled correctly are both ends of the sequence. Those parts are highlighted blue in the pictures. From our background knowledge we know that the first 31 residues form the signal peptide, that is cleaved off and thus can not be found in the tertiary structure of the target protein. This can not be modelled by the ''Modeller'' tool and thus it would be a good amendment to the modelling pipeline to add sequence based analyses like Signal peptide prediction, similiar to the predictions we made in [[Fabry:Sequence-based_analyses | Task 2]]. The lack of modellation of the last bit of the sequence can be pinned to the longer sequence of the 3HG3 structure, since the last 6 residues are craning and the template is 6 amino acids longer than the target.<br> |
Inspecting the problematic residues (see <xr id="tab:Modeller_scores_3hg3_1"/>), with a distance of more than 8 angstrom, manually in pymol, we discovered that two of them lie in loop regions (91 and 101) which are hard to model. On the other hand two of the residues are located in a helix (160 and 318) and seem to fit perfectly to the target.<br> |
Inspecting the problematic residues (see <xr id="tab:Modeller_scores_3hg3_1"/>), with a distance of more than 8 angstrom, manually in pymol, we discovered that two of them lie in loop regions (91 and 101) which are hard to model. On the other hand two of the residues are located in a helix (160 and 318) and seem to fit perfectly to the target.<br> |
||
For further evaluation of the model, please see [[Fabry:Homology_based_structure_predictions#Evaluation | Modeller Evaluation]] |
For further evaluation of the model, please see [[Fabry:Homology_based_structure_predictions#Evaluation | Modeller Evaluation]] |
||
Line 366: | Line 366: | ||
<caption>Model 2, visual comparison</caption> |
<caption>Model 2, visual comparison</caption> |
||
{| style="border-style: solid; border-width: 1px" |
{| style="border-style: solid; border-width: 1px" |
||
− | | [[File:FABRY_1R46_1KTB_model_only.png|right|280px|thumb| Model 2 (red), created with Modeller with the template 1ktb, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
+ | | [[File:FABRY_1R46_1KTB_model_only.png|right|280px|thumb| Model 2 (red), created with ''Modeller'' with the template 1ktb, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
| [[File:FABRY_1R46_1KTB.png|right|280px|thumb|Model 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 1ktb (yellow)]] |
| [[File:FABRY_1R46_1KTB.png|right|280px|thumb|Model 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 1ktb (yellow)]] |
||
|- |
|- |
||
Line 429: | Line 429: | ||
<caption>Model 3, visual comparison</caption> |
<caption>Model 3, visual comparison</caption> |
||
{| style="border-style: solid; border-width: 1px" |
{| style="border-style: solid; border-width: 1px" |
||
− | | [[File:FABRY_1R46_3CC1_model_only.png|right|280px|thumb| Model 3 (red), created with Modeller with the template 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
+ | | [[File:FABRY_1R46_3CC1_model_only.png|right|280px|thumb| Model 3 (red), created with ''Modeller'' with the template 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
| [[File:FABRY_1R46_3CC1.png|right|280px|thumb|Model 3 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow)]] |
| [[File:FABRY_1R46_3CC1.png|right|280px|thumb|Model 3 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow)]] |
||
|- |
|- |
||
Line 436: | Line 436: | ||
</div> |
</div> |
||
For the last of the basic models we used the structure 3CC1 as a template, which is rather far related with a sequence identity of roughly 25%. Here the emerging problems are obvious. Of course, the signal peptide can not be modelled again, but the real problem is that Model 3 is tilted approximately 90° compared to the target structure 1R46.<br> |
For the last of the basic models we used the structure 3CC1 as a template, which is rather far related with a sequence identity of roughly 25%. Here the emerging problems are obvious. Of course, the signal peptide can not be modelled again, but the real problem is that Model 3 is tilted approximately 90° compared to the target structure 1R46.<br> |
||
− | Additionally analyzing the quality scores of the model, we see that the model must be rejected. For example the GA341 score of 0.33 indicates that the model should be disarded, since any good model will give a score near 1.0, and according to Salilab any model with a score less than 0.6 should not be considered as helpfull <ref name="GA341"> Salilab - Modeller Usage, modeller ga341 score (February 21, 2006), [http://salilab.org/archives/modeller_usage/2006/msg00060.html http://salilab.org/archives/modeller_usage/2006/msg00060.html]; May 26, 2012</ref>. By all means, the z-scores are also not good, because these statistical potentials contribute to the GA341 (see [http://modbase.compbio.ucsf.edu/modeval/help.cgi?type=help&style=helplink#z-pair Modeller help]). |
+ | Additionally analyzing the quality scores of the model, we see that the model must be rejected. For example the GA341 score of 0.33 indicates that the model should be disarded, since any good model will give a score near 1.0, and according to Salilab any model with a score less than 0.6 should not be considered as helpfull <ref name="GA341"> Salilab - ''Modeller'' Usage, modeller ga341 score (February 21, 2006), [http://salilab.org/archives/modeller_usage/2006/msg00060.html http://salilab.org/archives/modeller_usage/2006/msg00060.html]; May 26, 2012</ref>. By all means, the z-scores are also not good, because these statistical potentials contribute to the GA341 (see [http://modbase.compbio.ucsf.edu/modeval/help.cgi?type=help&style=helplink#z-pair ''Modeller'' help]). |
<br> |
<br> |
||
For further evaluation of the model, please see [[Fabry:Homology_based_structure_predictions#Evaluation | Modeller Evaluation]] |
For further evaluation of the model, please see [[Fabry:Homology_based_structure_predictions#Evaluation | Modeller Evaluation]] |
||
Line 508: | Line 508: | ||
</figtable> |
</figtable> |
||
</div> |
</div> |
||
− | Model 1, which is based on the structures 3HG3 and 1KTB is the subjectively second best of the Modeller computed models. The superimposition of the model and the target structure (see <xr id="tab:pics_1R46_multi1"/>) shows a similiar results to [[Fabry:Homology_based_structure_predictions#Default_settings | Model 1]], but comparing the quality scores, we observe a slightly worse DOPE score a really bad Compactness value. According to Salilab the DOPE score is the score to distinguish between two good models <ref name="GA341"> Salilab - Modeller Usage, modeller ga341 score (February 21, 2006), [http://salilab.org/archives/modeller_usage/2006/msg00060.html http://salilab.org/archives/modeller_usage/2006/msg00060.html]; May 26, 2012</ref>. This model has the second lowest score of all, as well as a almost 100% sequence identity to the target.<br> |
+ | Model 1, which is based on the structures 3HG3 and 1KTB is the subjectively second best of the ''Modeller'' computed models. The superimposition of the model and the target structure (see <xr id="tab:pics_1R46_multi1"/>) shows a similiar results to [[Fabry:Homology_based_structure_predictions#Default_settings | Model 1]], but comparing the quality scores, we observe a slightly worse DOPE score a really bad Compactness value. According to Salilab the DOPE score is the score to distinguish between two good models <ref name="GA341"> Salilab - ''Modeller'' Usage, modeller ga341 score (February 21, 2006), [http://salilab.org/archives/modeller_usage/2006/msg00060.html http://salilab.org/archives/modeller_usage/2006/msg00060.html]; May 26, 2012</ref>. This model has the second lowest score of all, as well as a almost 100% sequence identity to the target.<br> |
On the other hand, the Compactness seems to be really low, even smaller than the value of Model 3. Thus we think, that this quality score cannot really be considered reliable, in contrast to the DOPE score.<br> |
On the other hand, the Compactness seems to be really low, even smaller than the value of Model 3. Thus we think, that this quality score cannot really be considered reliable, in contrast to the DOPE score.<br> |
||
What also becomes obvious in this model, is that the more distant related 1KTB structure (orange) seems to fit better than the really close related 3HG3 (yellow), which can be seen in the visual comparison in the right picture of <xr id="tab:pics_1R46_multi1"/>. |
What also becomes obvious in this model, is that the more distant related 1KTB structure (orange) seems to fit better than the really close related 3HG3 (yellow), which can be seen in the visual comparison in the right picture of <xr id="tab:pics_1R46_multi1"/>. |
||
Line 572: | Line 572: | ||
<caption>Model MULTI 2, visual comparison</caption> |
<caption>Model MULTI 2, visual comparison</caption> |
||
{| style="border-style: solid; border-width: 1px" |
{| style="border-style: solid; border-width: 1px" |
||
− | | [[File:FABRY_1R46_multi2_model_only.png|right|280px|thumb| Model MULTI 2 (red), created with Modeller on basis of the templates 3HG3, 1KTB and 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
+ | | [[File:FABRY_1R46_multi2_model_only.png|right|280px|thumb| Model MULTI 2 (red), created with ''Modeller'' on basis of the templates 3HG3, 1KTB and 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
| [[File:FABRY_1R46_multi2.png|right|280px|thumb| Model MULTI 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow), 1KTB (orange) and 3CC1 (lightorange)]] |
| [[File:FABRY_1R46_multi2.png|right|280px|thumb| Model MULTI 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow), 1KTB (orange) and 3CC1 (lightorange)]] |
||
|- |
|- |
||
Line 646: | Line 646: | ||
<caption>Model MULTI 3, visual comparison</caption> |
<caption>Model MULTI 3, visual comparison</caption> |
||
{| style="border-style: solid; border-width: 1px" |
{| style="border-style: solid; border-width: 1px" |
||
− | | [[File:FABRY_1R46_multi3_model_only.png|right|280px|thumb| Model MULTI 3 (red), created with Modeller on basis of the templates 3CC1, 3ZSS and 3A24, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
+ | | [[File:FABRY_1R46_multi3_model_only.png|right|280px|thumb| Model MULTI 3 (red), created with ''Modeller'' on basis of the templates 3CC1, 3ZSS and 3A24, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
| [[File:FABRY_1R46_multi3.png|right|280px|thumb| Model MULTI 3 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow), 3ZSS (orange) and 3A24 (lightorange)]] |
| [[File:FABRY_1R46_multi3.png|right|280px|thumb| Model MULTI 3 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow), 3ZSS (orange) and 3A24 (lightorange)]] |
||
|- |
|- |
||
Line 720: | Line 720: | ||
<caption>Model MULTI 4, visual comparison</caption> |
<caption>Model MULTI 4, visual comparison</caption> |
||
{| style="border-style: solid; border-width: 1px" |
{| style="border-style: solid; border-width: 1px" |
||
− | | [[File:FABRY_1R46_multi4_model_only.png|right|280px|thumb| Model MULTI 4 (red), created with Modeller on basis of the templates 3CC1 and 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
+ | | [[File:FABRY_1R46_multi4_model_only.png|right|280px|thumb| Model MULTI 4 (red), created with ''Modeller'' on basis of the templates 3CC1 and 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)]] |
| [[File:FABRY_1R46_multi4.png|right|280px|thumb| Model MULTI 4 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow) and 3HG3 (orange)]] |
| [[File:FABRY_1R46_multi4.png|right|280px|thumb| Model MULTI 4 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow) and 3HG3 (orange)]] |
||
|- |
|- |
||
Line 914: | Line 914: | ||
{| style="border-style: solid; border-width: 1px" |
{| style="border-style: solid; border-width: 1px" |
||
| [[File:FABRY_1R46_3HG3_CHAS3_model_new.png|right|280px|thumb| Model CHAS 3 (red), with active site shifted right to next D (7 and 1 positions) in both alignment files and the substrate binding region (position 203-207, highlighted in blue and cyan) forced to be consecutive, superimposed on of α-Galactosidase A (green)]] |
| [[File:FABRY_1R46_3HG3_CHAS3_model_new.png|right|280px|thumb| Model CHAS 3 (red), with active site shifted right to next D (7 and 1 positions) in both alignment files and the substrate binding region (position 203-207, highlighted in blue and cyan) forced to be consecutive, superimposed on of α-Galactosidase A (green)]] |
||
− | | [[File:FABRY_1R46_3HG3_CHAS3_model_old.png|right|280px|thumb| For comparison with Model CHAS 3, Model 1 (orange) which was basis for the edited alignments, created with Modeller on basis of the templates 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green); binding region highlighted in blue and cyan ]] |
+ | | [[File:FABRY_1R46_3HG3_CHAS3_model_old.png|right|280px|thumb| For comparison with Model CHAS 3, Model 1 (orange) which was basis for the edited alignments, created with ''Modeller'' on basis of the templates 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green); binding region highlighted in blue and cyan ]] |
|- |
|- |
||
|} |
|} |
Revision as of 16:59, 29 May 2012
The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.
Dataset preparation and target comparison
Datasets
<figtable id="tab:datasetHHpred"> Dataset HHpred, E-value cutoff 1e-15
pdb ID | E-value | Identity in % |
---|---|---|
> 80% sequence identity | ||
3hg3 | 8.6e-90 | 100 |
40% - 80% sequence identity | ||
1ktb | 4.2e-85 | 53 |
< 30% sequence identity | ||
3cc1 | 5.5e-74 | 25 |
1zy9 | 3.1e-48 | 13 |
3a24 | 7.8e-40 | 17 |
2xn2 | 5.3e-37 | 15 |
2d73 | 5.7e-36 | 14 |
3mi6 | 1.4e-31 | 15 |
2yfo | 9.1e-30 | 13 |
2f2h | 2.7e-20 | 17 |
2g3m | 2.2e-20 | 16 |
3nsx | 6e-20 | 13 |
3lpp | 2.2e-18 | 15 |
3l4y | 1.9e-18 | 15 |
3top | 3.6e-18 | 12 |
2xvl | 3.2e-18 | 16 |
2x2h | 4.9e-16 | 13 |
</figtable>
<figtable id="tab:datasetHHpred"> Additional sequences HHpred, E-value cutoff 0.002
pdb ID | E-value | Identity in % |
---|---|---|
3zss | 0.00062 | 10 |
1j0h | 0.0011 | 15 |
1ea9 | 0.00098 | 12 |
</figtable>
<figtable id="tab:datasetCOMA"> Dataset COMA, E-value cutoff 0.002
pdb ID | E-value | Identity in % |
---|---|---|
> 80% sequence identity | ||
- | - | - |
40% - 80% sequence identity | ||
1ktb | 1.7e-61 | 52 |
< 30% sequence identity | ||
3lrk | 1.2e-66 | 23 |
3a21 | 2.7e-65 | 26 |
1szn | 3.7e-59 | 22 |
3cc1 | 5.2e-58 | 19 |
1zy9 | 1.7e-39 | 9 |
3mi6 | 4.3e-38 | 11 |
2yfn | 4.4e-35 | 10 |
2d73 | 1.9e-32 | 9 |
3a24 | 5.6e-30 | 10 |
1xsi | 1.9e-12 | 10 |
2g3m | 2.4e-11 | 10 |
3pha | 2.9e-10 | 6 |
3lpo | 4.7e-09 | 8 |
2x2h | 8.2e-09 | 8 |
3mo4 | 1.2e-08 | 7 |
2xvg | 2.4e-08 | 8 |
3ton | 4.3e-08 | 8 |
2xib | 1e-07 | 7 |
3eyp | 1.6e-06 | 8 |
3k1d | 3.5e-06 | 9 |
2zwy | 8.8e-06 | 9 |
3gza | 1.8e-05 | 8 |
3m07 | 2.3e-05 | 7 |
1eh9 | 0.00013 | 6 |
1gvi | 0.00035 | 8 |
1aqh | 0.00039 | 5 |
1mwo | 0.00058 | 7 |
3vmn | 0.0018 | 9 |
1bf2 | 0.0019 | 6 |
3aml | 0.0019 | 8 |
</figtable>
We ran both an HHpred and a COMA search to generate three distinct datasets. Since COMA did not find any homologous structures with a similarity above 41% (see <xr id="tab:datasetCOMA"/>), we used the dataset created with the HHpred search and the script described in the journal. This gave one structure with a similarity above 80%, one with a similarity between 40% and 80%, and 15 with a sequence similarity below 30%, of which 14 had a similarity below 20% (see <xr id="tab:datasetHHpred" />). All HHpred matches had an E-value below 1e-15; for the COMA homologues we tried a less strict threshold of 0.002.
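The grouping into identity classes can be sketched in a few lines of Python. This is only an illustration and not the script from the journal: the function names `identity_class` and `bin_hits` are made up for this example, and the hit list is copied by hand from the HHpred table above.

```python
# Hits copied from the HHpred table above: (PDB ID, E-value, identity in %).
HHPRED_HITS = [
    ("3hg3", 8.6e-90, 100), ("1ktb", 4.2e-85, 53), ("3cc1", 5.5e-74, 25),
    ("1zy9", 3.1e-48, 13), ("3a24", 7.8e-40, 17), ("2xn2", 5.3e-37, 15),
    ("2d73", 5.7e-36, 14), ("3mi6", 1.4e-31, 15), ("2yfo", 9.1e-30, 13),
    ("2f2h", 2.7e-20, 17), ("2g3m", 2.2e-20, 16), ("3nsx", 6e-20, 13),
    ("3lpp", 2.2e-18, 15), ("3l4y", 1.9e-18, 15), ("3top", 3.6e-18, 12),
    ("2xvl", 3.2e-18, 16), ("2x2h", 4.9e-16, 13),
]


def identity_class(identity):
    """Map a sequence identity in % to one of the three dataset classes."""
    if identity > 80:
        return "> 80%"
    if 40 <= identity <= 80:
        return "40% - 80%"
    if identity < 30:
        return "< 30%"
    # Hypothetical fallback: the classes above leave the 30-40% range open.
    return "unassigned"


def bin_hits(hits, evalue_cutoff=1e-15):
    """Group the PDB IDs of all hits below the E-value cutoff by class."""
    bins = {}
    for pdb_id, evalue, identity in hits:
        if evalue >= evalue_cutoff:
            continue
        bins.setdefault(identity_class(identity), []).append(pdb_id)
    return bins


if __name__ == "__main__":
    for name, members in bin_hits(HHPRED_HITS).items():
        print(name, len(members), members)
```

Running the sketch on the table values reproduces the class sizes given in the text (1, 1 and 15 structures).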
In most cases we used the structures 3hg3, 1ktb and 3cc1 for modelling, because they are either the only representatives of their class or, in the case of 3cc1, the sequence identity did not seem too low. For Model MULTI 3 we also used the structures 3a24 and 3zss. The latter has an E-value of 0.00062. We added this structure to examine how a template performs whose E-value is worse than that of all our other structures, but which would still fulfil the restrictions of a usual BLAST search (threshold of 0.003).
Note that although the identity of 3hg3 is 100%, it is not the PDB structure annotated for the AGAL protein, but the structure of the substrate-bound catalytic mechanism, hence the high similarity.
1ktb is the X-ray structure of the already mentioned α-N-acetylgalactosaminidase from chicken, which might in future be used for enzyme replacement therapy in the treatment of Fabry Disease.
The last of the frequently used structures, 3cc1, is the X-ray structure of a putative α-N-acetylgalactosaminidase from Bacillus halodurans.
Target comparison
<figure id="fig:GAL:1R47">
</figure>
As an initial step of the evaluation, we compared the apo structure 1R46 with the complex structure 1R47 (with bound α-galactose). The alignment of chain A of 1R46 and chain A of 1R47 in PyMOL (see <xr id="tab:compare"/>) gave an RMSD of 0.248, and the positions and orientations of the residues involved in binding the sugar (see <xr id="fig:GAL:1R47"/>) do not differ significantly. We therefore used only the 1R46 structure for visualization, but computed all values and statistics for both structures.
In the right figure in <xr id="tab:compare"/>, the residues Asp92A, Asp93A, Lys168A, Arg227A and Asp231A are depicted in sticks representation (thicker); they are responsible for binding the sugar in the complex structure, which is shown in magenta. There is clearly little difference in this region between 1R46 and 1R47.
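For readers unfamiliar with the RMSD value reported by PyMOL, the following minimal sketch shows how such a value is defined for two sets of paired atom coordinates. It assumes that the structures are already superimposed and that the atoms are already paired; PyMOL's own alignment additionally performs the superposition and rejects outliers, which is not reproduced here. The function name `rmsd` is chosen for this example only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) tuples. Assumes the coordinates are already superimposed
    and that coords_a[i] corresponds to coords_b[i]."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates, not taken from 1R46 or 1R47.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
    print(rmsd(a, b))
```

A small RMSD such as the one observed between 1R46 and 1R47 means that corresponding atoms lie, on average, very close to each other after superposition.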
Modeller
Calculation of models
With this command line tool, we created 10 models (see Journal). The first three were produced with the standard settings and workflow of Modeller. The subsequent four models were computed from multiple template files in different combinations, and in the last three models we rearranged the alignment files in order to test the quality of the alignment and the influence of the two types of alignment.
Default settings
Model 1
For the first model we used the template with the highest sequence identity. According to HHpred the identity is 100%, whereas Modeller calculates an identity of only 96% (see <xr id="tab:Modeller_scores_3hg3_2"/>). This discrepancy might be due to the way the comparison is made: 1R46 is completely enclosed in 3HG3, but 3HG3 has a longer sequence (404 residues) and thus only 96% of it can be congruent to 1R46 (398 residues without signal peptide). The left picture in <xr id="tab:pics_1R46_3HG3"/> shows the superimposition of the computed Model 1 and the actual target structure. The right picture additionally displays the template structure. The three structures superimpose almost perfectly, which is underlined by the scores derived from Modeller (see <xr id="tab:Modeller_scores_3hg3_2"/>). The GA341 score of 1.0 indicates a "native like" model (see basic tutorial). The Compactness <ref name="Compactness"> Foldit Wiki, Compactness (October 23, 2011), http://foldit.wikia.com/wiki/Compactness; May 26, 2012</ref> is the second highest and the DOPE score (see basic tutorial) the lowest of all calculated models.
The only parts that cannot be modelled correctly are the two ends of the sequence, which are highlighted in blue in the pictures. From our background knowledge we know that the first 31 residues form the signal peptide, which is cleaved off and thus cannot be found in the tertiary structure of the target protein. Modeller cannot model this, so it would be a good amendment to the modelling pipeline to add sequence-based analyses such as signal peptide prediction, similar to the predictions we made in Task 2. The missing model for the last part of the sequence can be attributed to the longer sequence of the 3HG3 structure: the template is 6 amino acids longer than the target, and these last 6 residues protrude.
Inspecting the problematic residues with a distance of more than 8 Å (see <xr id="tab:Modeller_scores_3hg3_1"/>) manually in PyMOL, we discovered that two of them (91 and 101) lie in loop regions, which are hard to model. On the other hand, two of the residues (160 and 318) are located in a helix and seem to fit the target perfectly.
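How residues beyond a distance cutoff can be picked out is sketched below. This is an assumption about the general idea, not the procedure that produced the distance tables: the function `flag_distant_residues`, its input format (residue position mapped to a Cα coordinate) and the toy coordinates are all hypothetical.

```python
import math


def flag_distant_residues(model_ca, target_ca, cutoff=8.0):
    """Return (position, distance) pairs for residues whose C-alpha atoms
    are further apart than the cutoff (in Å). Both arguments map a residue
    position to an (x, y, z) tuple; only positions present in both
    structures are compared."""
    flagged = []
    for pos in sorted(set(model_ca) & set(target_ca)):
        distance = math.dist(model_ca[pos], target_ca[pos])
        if distance > cutoff:
            flagged.append((pos, round(distance, 3)))
    return flagged


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    model = {1: (0.0, 0.0, 0.0), 2: (0.0, 0.0, 0.0), 3: (5.0, 5.0, 5.0)}
    target = {1: (0.0, 0.0, 9.0), 2: (1.0, 0.0, 0.0), 4: (0.0, 0.0, 0.0)}
    print(flag_distant_residues(model, target))
```

Positions that exist in only one of the two structures, such as the cleaved signal peptide, are skipped by this sketch rather than reported.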
For further evaluation of the model, please see Modeller Evaluation
<figtable id="tab:Modeller_scores_3hg3_1"> Modeller scores Model 3hg3, Distances
Model | Distances > 8.0 Å in 2d alignment | Distances > 8.0 Å | ||||||
---|---|---|---|---|---|---|---|---|
3hg3 | Pos: 428 Dist 76.568 | Pos: 1 Dist 28.357 | Pos: 91 Dist 8.810 | Pos: 101 Dist 17.314 | Pos: 112 Dist 25.386 | Pos: 160 Dist 32.647 | Pos: 318 Dist 27.449 | Pos: 333 Dist 42.457 |
</figtable> <figtable id="tab:Modeller_scores_3hg3_2"> Modeller scores Model 3hg3
% sequID | Sequ length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
95.570999 | 429 | 0.215183 | -213.518650 | -9.487873 | -5.603112 | -10.125743 | -6.381974 | -11.484159 | 1.000000 | -52607.89844 |
</figtable>
Model 2
For this model we used the template 1KTB, which has a sequence identity of 53%. The superimposed structures of 1R46 and Model 2 are shown in <xr id="tab:pics_1R46_1KTB"/>, together with the structural alignment of 1KTB with the model and the target. This model shows only one of the problems of Model 1, namely the unmodelled signal peptide, for the reasons mentioned above. The end of the sequence seems to be modelled well, although the template sequence (405 residues) is even longer than 3HG3. Despite the slightly worse Compactness and DOPE score (see <xr id="tab:Modeller_scores_1ktb_2"/>), Model 2 seems to be very good: there are no residues with a distance greater than 8 Å (see <xr id="tab:Modeller_scores_1ktb_1"/>), and according to the GA341 score the model is also "native like". A value greater than 0.7 generally indicates a reliable model, defined as ≥ 95% probability of correct fold <ref name="Melo2002"> Melo F, Sánchez R, Sali A. (2002). Statistical potentials for fold assessment. Protein Sci. 2002 Feb;11(2):430-48. PMCID: PMC2373452</ref>.
For further evaluation of the model, please see Modeller Evaluation
<figtable id="tab:Modeller_scores_1ktb_1"> Modeller scores Model 1ktb, Distances
Model | Distances > 8.0 Å in 2d alignment | Distances > 8.0 Å |
---|---|---|
1ktb | Pos: 0 Dist 0 | Pos: 0 Dist 0 |
</figtable>
<figtable id="tab:Modeller_scores_1ktb_2"> Modeller scores Model 1ktb
% sequID | Sequ length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
53.351002 | 429 | 0.176840 | -107.285679 | -7.043755 | -3.262988 | -8.593054 | -6.151719 | -10.076556 | 1.000000 | -49267.35156 |
</figtable>
Model 3
For the last of the basic models we used the structure 3CC1 as a template, which is only distantly related, with a sequence identity of roughly 25%. Here the emerging problems are obvious. As before, the signal peptide cannot be modelled, but the real problem is that Model 3 is tilted by approximately 90° compared to the target structure 1R46.
Analyzing the quality scores as well, we see that the model must be rejected. For example, the GA341 score of 0.33 indicates that the model should be discarded, since any good model will give a score near 1.0, and according to Salilab any model with a score below 0.6 should not be considered helpful <ref name="GA341"> Salilab - Modeller Usage, modeller ga341 score (February 21, 2006), http://salilab.org/archives/modeller_usage/2006/msg00060.html; May 26, 2012</ref>. Accordingly, the z-scores are also poor, since these statistical potentials contribute to the GA341 score (see Modeller help).
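The two GA341 thresholds quoted on this page (below 0.6: not helpful; above 0.7: reliable) can be turned into a small decision helper. The sketch below is only an illustration: the function name `assess_ga341` and the label "borderline" for the range between the two thresholds are our own assumptions, not part of Modeller.

```python
def assess_ga341(score):
    """Classify a GA341 score using the two thresholds quoted in the text.
    The 'borderline' label for 0.6 to 0.7 is an assumption of this sketch."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("GA341 scores are expected to lie between 0 and 1")
    if score > 0.7:
        return "reliable"
    if score < 0.6:
        return "discard"
    return "borderline"


if __name__ == "__main__":
    # GA341 scores copied from the score tables of Models 1 to 3.
    ga341 = {"Model 1": 1.000000, "Model 2": 1.000000, "Model 3": 0.332343}
    for model, score in ga341.items():
        print(model, score, assess_ga341(score))
```

Applied to the tabulated values, Models 1 and 2 are classified as reliable and Model 3 is discarded, in line with the discussion above.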
For further evaluation of the model, please see Modeller Evaluation
<figtable id="tab:Modeller_scores_3cc1_1"> Modeller scores Model 3cc1, Distances
Model | Distances > 8.0 Å in 2d alignment | Distances > 8.0 Å | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
3cc1 | Pos: 433 Dist 63.967 | Pos: 147 Dist 25.085 | Pos: 290 Dist 19.238 | Pos: 374 Dist 24.356 | Pos: 395 Dist 15.007 | Pos: 412 Dist 61.733 | Pos: 452 Dist 23.680 | Pos: 631 Dist 23.283 | Pos: 659 Dist 8.421 | Pos: 684 Dist 10.763 | Pos: 703 Dist 10.204 | Pos: 762 Dist 10.753 |
</figtable> <figtable id="tab:Modeller_scores_3cc1_2"> Modeller scores Model 3cc1
% sequID | Sequ length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
24.242001 | 429 | 0.139850 | 198.134571 | 24.857669 | 7.885459 | -3.800148 | -1.528096 | -3.572654 | 0.332343 | -38190.22656 |
</figtable>
Multiple templates
MULTI 1
Model MULTI 1, which is based on the structures 3HG3 and 1KTB, is subjectively the second best of the models computed with Modeller. The superimposition of the model and the target structure (see <xr id="tab:pics_1R46_multi1"/>) shows a result similar to Model 1, but comparing the quality scores we observe a slightly worse DOPE score and a really bad Compactness value. According to Salilab, the DOPE score is the score to use for distinguishing between two good models <ref name="GA341"> Salilab - Modeller Usage, modeller ga341 score (February 21, 2006), http://salilab.org/archives/modeller_usage/2006/msg00060.html; May 26, 2012</ref>. This model has the second lowest DOPE score of all, as well as an almost 100% sequence identity to the target.
The Compactness, on the other hand, is really low, even lower than the value of Model 3. We therefore think that this quality score cannot be considered reliable, in contrast to the DOPE score.
What also becomes obvious in this model, is that the more distant related 1KTB structure (orange) seems to fit better than the really close related 3HG3 (yellow), which can be seen in the visual comparison in the right picture of <xr id="tab:pics_1R46_multi1"/>.
For further evaluation of the model, please see Modeller Evaluation
<figtable id="tab:Modeller_scores_MULTI1_1"> Modeller scores Model MULTI1, Distances
Model | Distances > 6.0 Å | |||||||
---|---|---|---|---|---|---|---|---|
MULTI1 | Pos 211 Dist 6.509 |
Pos 212 Dist 7.237 |
Pos 379 Dist 6.608 |
Pos 380 Dist 10.056 |
Pos 426 Dist 6.538 |
Pos 427 Dist 8.423 |
Pos 428 Dist 7.683 |
Pos 429 Dist 9.226 |
</figtable>
<figtable id="tab:Modeller_scores_MULTI1_2"> Modeller scores Model MULTI1
% sequence identity | Sequence length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
99.747002 | 429 | 0.137662 | -298.484703 | -14.285927 | -7.811892 | -10.018935 | -7.069398 | -12.033570 | 1.000000 | -51741.65625 |
</figtable>
MULTI 2
This model (MULTI 2) builds on MULTI 1 but adds the structure 3CC1, which is rather distantly related to 1R46, so that it contains one structure of each sequence identity group. As a result, we observe a decrease in model quality. In <xr id="tab:pics_1R46_multi2"/> one can see that the signal peptide is somewhat nested inside the molecule. The DOPE score increased, indicating a less reliable model, although the Compactness is the highest of all models (see <xr id="tab:Modeller_scores_MULTI2_2" />), which underlines our statement above that this score is not really trustworthy.
Another sign of the poor quality of the model is the 556 residues with a distance greater than 6 Å among the three template structures in the alignment (see <xr id="tab:Modeller_scores_MULTI2_1" />).
For further evaluation of the model, please see Modeller Evaluation
<figtable id="tab:Modeller_scores_MULTI2_1"> Modeller scores Model MULTI2, Distances
Model | Distances > 6.0 Å | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
MULTI2 | Pos 216 Dist 6.509 |
Pos 217 Dist 7.237 |
Pos 386 Dist 6.608 |
Pos 387 Dist 10.056 |
Pos 433 Dist 6.538 |
Pos 434 Dist 8.423 |
Pos 435 Dist 7.683 |
Pos 436 Dist 9.226 |
Pos 4 Dist 14.846 |
Pos 5 Dist 15.230 |
... 556 in total |
</figtable>
<figtable id="tab:Modeller_scores_MULTI2_2"> Modeller scores Model MULTI2
% sequence identity | Sequence length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
71.139000 | 429 | 0.229647 | 145.809979 | 8.120285 | 3.810054 | -7.131847 | -4.691505 | -7.820324 | 1.000000 | -44783.55469 |
</figtable>
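Counting the residues above a distance cutoff, as done for the distance tables, can be sketched as follows. We assume here that the entries look like they do in the tables of this page ("Pos 216 Dist 6.509"); the layout of the actual Modeller log may differ, and the helper names are hypothetical.

```python
import re

# Matches entries such as "Pos 216 Dist 6.509" or "Pos: 433 Dist 63.967".
ENTRY = re.compile(r"Pos:?\s*(\d+)\s+Dist\s+([0-9.]+)")


def parse_distances(text):
    """Return a list of (position, distance in Angstrom) tuples found in text."""
    return [(int(pos), float(dist)) for pos, dist in ENTRY.findall(text)]


def exceeding(entries, cutoff):
    """Keep only the entries whose distance is larger than cutoff (Angstrom)."""
    return [(pos, dist) for pos, dist in entries if dist > cutoff]


# The ten entries listed for MULTI2 above (the full list has 556 entries).
multi2_listed = """
Pos 216 Dist 6.509   Pos 217 Dist 7.237   Pos 386 Dist 6.608
Pos 387 Dist 10.056  Pos 433 Dist 6.538   Pos 434 Dist 8.423
Pos 435 Dist 7.683   Pos 436 Dist 9.226   Pos 4 Dist 14.846
Pos 5 Dist 15.230
"""

entries = parse_distances(multi2_listed)
print(len(exceeding(entries, 6.0)), "listed entries above 6 Angstrom")
print(len(exceeding(entries, 8.0)), "listed entries above 8 Angstrom")
```

Because the cutoff is a parameter, the same helper covers both the 6 Å and the 8 Å tables.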
MULTI 3
This model combines three structures with a sequence identity of less than 30%. One of them (3ZSS - see section Datasets) even has a worse E-value than all the others; it was included to underline the need for a strict threshold in the dataset preparation.
<xr id="tab:pics_1R46_multi3"/> demonstrates that the model (red) overhangs 1R46 on both sides and that the signal peptide is again nested inside the structure. The right picture in the table explains this: 3ZSS (orange) also takes up more space at the end than all other compared structures.
In this model the number of very distant residues is even higher than in MULTI 2: as <xr id="tab:Modeller_scores_MULTI3_1"/> shows, there are 1488 poorly aligned residues in the multiple sequence alignment of the four structures. The quality scores, except for the Compactness, emphasize the poor quality of the model. The GA341 score is especially low (0.004 - see <xr id="tab:Modeller_scores_MULTI3_2"/>).
For further evaluation of the model, please see Modeller Evaluation
<figtable id="tab:Modeller_scores_MULTI3_1"> Modeller scores Model MULTI3, Distances
Model | Distances > 6.0 Å | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
MULTI3 | Pos 1 Dist 49.061 |
Pos 3 Dist 43.970 |
Pos 4 Dist 40.851 |
Pos 5 Dist 37.634 |
Pos 6 Dist 37.456 |
Pos 7 Dist 36.846 |
Pos 8 Dist 33.387 |
Pos 9 Dist 27.318 |
Pos 10 Dist 23.922 |
Pos 11 Dist 20.175 |
... 1488 in total |
</figtable>
<figtable id="tab:Modeller_scores_MULTI3_2"> Modeller scores Model MULTI3
% sequence identity | Sequence length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
10.956000 | 429 | 0.194156 | 867.151766 | 54.742537 | 22.941240 | -0.829229 | -0.350130 | -0.844565 | 0.004241 | -24762.13672 |
</figtable>
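The sequence identity groups referred to above rest on a percent identity between two aligned sequences. The following is a minimal sketch of one common definition (identical residues divided by the number of aligned, gap-free columns). Modeller may use a different denominator for its "% sequID" value, and the two short sequences are invented for illustration only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over all columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, not taken from the real alignment.
target_fragment = "MKT-LLDG"
template_fragment = "MRTALL-G"
identity = percent_identity(target_fragment, template_fragment)
print(f"{identity:.1f}% identity")
```

With this definition, a template would fall into the lowest identity group used on this page if the value is below 30%.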
MULTI 4
The last of the multiple-template models is based on one structure of the best identity group (3HG3) and one of the worst group (3CC1). Here we can confirm our assumption that 3CC1 increases the Compactness of the model by modelling the signal peptide inside the structure and thereby decreases the quality of the model (see <xr id="tab:pics_1R46_multi4"/>). This gain in Compactness comes at the expense of the DOPE score (<xr id="tab:Modeller_scores_MULTI4_2"/>) and the number of closely aligned residues (<xr id="tab:Modeller_scores_MULTI4_1"/>).
For further evaluation of the model, please see Modeller Evaluation
<figtable id="tab:Modeller_scores_MULTI4_1"> Modeller scores Model MULTI4, Distances
Model | Distances > 6.0 Å | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
MULTI4 | Pos 4 Dist 15.126 |
Pos 5 Dist 15.516 |
Pos 6 Dist 12.989 |
Pos 7 Dist 6.244 |
Pos 9 Dist 8.576 |
Pos 26 Dist 8.554 |
Pos 27 Dist 9.271 |
Pos 28 Dist 9.807 |
Pos 29 Dist 13.110 |
Pos 30 Dist 14.283 |
... 295 in total |
</figtable>
<figtable id="tab:Modeller_scores_MULTI4_2"> Modeller scores Model MULTI4
% sequence identity | Sequence length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
72.152000 | 429 | 0.227838 | 108.065012 | 3.797955 | 2.874986 | -7.874591 | -4.542697 | -8.455902 | 1.000000 | -44794.44531 |
</figtable>
Edited Alignment input
CHAS and CHAS 2
In these two models we tried to elucidate the influence of the two alignment types on the model creation process. To do so, we changed the alignment of the active site in the 2d alignment such that each of the two aspartic acid residues (positions 170 and 231) is aligned with the next Asp in the template sequence. From this, we created model CHAS.
As a second step, we performed the same adjustment in the normal alignment and created CHAS 2.
Comparing the results with Model 1, we see that there is no difference at all between Model 1 and CHAS in any score or value, but a huge difference to CHAS 2. We conclude that, the way we perform the model creation, the 2d alignment has no influence at all, which makes this step dispensable.
In the pictures in <xr id="tab:pics_1R46_3HG3_CHAS"/> a visual comparison of Model 1, CHAS and CHAS 2 was performed, with special attention to the modelling of the active site, highlighted in blue (target) and cyan (model).
<figtable id="tab:Modeller_scores_CHAS_1"> Modeller scores Model CHAS, Distances
Model | Distances > 8.0 Å | ||||||
---|---|---|---|---|---|---|---|
CHAS | Pos 1 Dist 28.357 |
Pos 91 Dist 8.810 |
Pos 101 Dist 17.314 |
Pos 112 Dist 25.386 |
Pos 160 Dist 32.647 |
Pos 318 Dist 27.449 |
Pos 333 Dist 42.457 |
</figtable> <figtable id="tab:Modeller_scores_CHAS_2"> Modeller scores Model CHAS
% sequence identity | Sequence length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
95.570999 | 429 | 0.215183 | -213.518650 | -9.487873 | -5.603112 | -10.125743 | -6.381974 | -11.484159 | 1.000000 | -52607.89844 |
</figtable>
<figtable id="tab:Modeller_scores_CHAS2_1"> Modeller scores Model CHAS2, Distances
Model | Distances > 8.0 Å | ||||||||
---|---|---|---|---|---|---|---|---|---|
CHAS2 | Pos 1 Dist 28.357 |
Pos 91 Dist 8.810 |
Pos 101 Dist 17.314 |
Pos 112 Dist 25.386 |
Pos 160 Dist 32.647 |
Pos 318 Dist 27.449 |
Pos 333 Dist 42.457 |
Pos 538 Dist 10.921 |
Pos 601 Dist 10.536 |
</figtable> <figtable id="tab:Modeller_scores_CHAS2_2"> Modeller scores Model CHAS2
% sequence identity | Sequence length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
40.326000 | 429 | 0.204903 | 122.562212 | 22.166245 | 5.785874 | -4.522039 | -1.645839 | -4.686339 | 0.995974 | -40807.12109 |
</figtable>
CHAS 3
In the last model, we tried to improve the quality of a subjectively poorly aligned region, the substrate binding region (positions 203-207, highlighted in <xr id="tab:pics_1R46_3HG3_CHAS3"/>), by forcing it to be consecutive in the alignment. Looking at the comparison of Model 1 and the 1R46 structure in the right picture in <xr id="tab:pics_1R46_3HG3_CHAS3"/>, we see that the region is modelled nearly perfectly despite the "poor" alignment, whereas adjusting the alignment produces a very poorly modelled substrate binding site, as one can see in the left picture.
Surprisingly, the calculated quality scores do not differ from those of CHAS 2 (the adjustment in the alignment of the active site was maintained), although the binding site looks different.
<figtable id="tab:Modeller_scores_CHAS3_1"> Modeller scores Model CHAS3, Distances
Model | Distances > 8.0 Å | ||||||||
---|---|---|---|---|---|---|---|---|---|
CHAS3 | Pos 1 Dist 28.357 |
Pos 91 Dist 8.810 |
Pos 101 Dist 17.314 |
Pos 112 Dist 25.386 |
Pos 160 Dist 32.647 |
Pos 318 Dist 27.449 |
Pos 333 Dist 42.457 |
Pos 538 Dist 10.921 |
Pos 601 Dist 10.536 |
</figtable> <figtable id="tab:Modeller_scores_CHAS3_2"> Modeller scores Model CHAS3
% sequence identity | Sequence length | Compactness | Native energy (pair) | Native energy (surface) | Native energy (combined) | Z score (pair) | Z score (surface) | Z score (combined) | GA341 score | DOPE score |
---|---|---|---|---|---|---|---|---|---|---|
40.326000 | 429 | 0.204903 | 122.562212 | 22.166245 | 5.785874 | -4.522039 | -1.645839 | -4.686339 | 0.995974 | -40807.12109 |
</figtable>
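Since the DOPE score is used above to choose between models, the values from the score tables can be ranked directly. This is a minimal sketch: the numbers are copied from the tables on this page, lower (more negative) values count as better, and the helper name `rank_by_dope` is our own. The raw DOPE score is only comparable between models of the same sequence, which holds here because all models have the same length.

```python
# DOPE scores copied from the score tables above (lower is better).
dope = {
    "3cc1": -38190.22656,
    "MULTI1": -51741.65625,
    "MULTI2": -44783.55469,
    "MULTI3": -24762.13672,
    "MULTI4": -44794.44531,
    "CHAS": -52607.89844,
    "CHAS2": -40807.12109,
    "CHAS3": -40807.12109,
}


def rank_by_dope(scores):
    """Return model names sorted from the lowest (best) to the highest DOPE score."""
    return sorted(scores, key=scores.get)


ranking = rank_by_dope(dope)
for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {dope[name]:.2f}")
```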
Evaluation
TM-score
<figtable id="tab:TMscore_1R46"> TM-score of Modeller models compared to 1R46
Model | Number of residues in common | RMSD of the common residues (Å) | TM-score | GDT-TS-score | GDT-HA-score |
---|---|---|---|---|---|
Model 1 | 390 | 1.115 | 0.9841 | 0.9667 | 0.8558 |
Model 2 | 390 | 2.098 | 0.9596 | 0.9071 | 0.7635 |
Model 3 | 390 | 22.707 | 0.4087 | 0.2699 | 0.1814 |
MULTI 1 | 390 | 0.575 | 0.9938 | 0.9910 | 0.9128 |
MULTI 2 | 390 | 12.625 | 0.7364 | 0.6949 | 0.6404 |
MULTI 3 | 390 | 21.196 | 0.2048 | 0.0673 | 0.0314 |
MULTI 4 | 390 | 10.798 | 0.7405 | 0.6737 | 0.5833 |
CHAS | 390 | 1.115 | 0.9841 | 0.9667 | 0.8558 |
CHAS 2 | 390 | 15.292 | 0.4651 | 0.3622 | 0.3038 |
CHAS 3 | 390 | 15.292 | 0.4651 | 0.3622 | 0.3038 |
</figtable>
<figtable id="tab:TMscore_1R47"> TM-score of Modeller models compared to 1R47
Model | Number of residues in common | RMSD of the common residues (Å) | TM-score | GDT-TS-score | GDT-HA-score |
---|---|---|---|---|---|
Model 1 | 390 | 1.119 | 0.9840 | 0.9654 | 0.8519 |
Model 2 | 390 | 2.093 | 0.9600 | 0.9083 | 0.7647 |
Model 3 | 390 | 22.713 | 0.4092 | 0.2731 | 0.1821 |
MULTI 1 | 390 | 0.575 | 0.9938 | 0.9897 | 0.9115 |
MULTI 2 | 390 | 12.609 | 0.7363 | 0.6942 | 0.6378 |
MULTI 3 | 390 | 21.191 | 0.2058 | 0.0679 | 0.0314 |
MULTI 4 | 390 | 10.793 | 0.7405 | 0.6744 | 0.5846 |
CHAS | 390 | 1.119 | 0.9840 | 0.9654 | 0.8519 |
CHAS 2 | 390 | 15.290 | 0.4652 | 0.3635 | 0.3019 |
CHAS 3 | 390 | 15.290 | 0.4652 | 0.3635 | 0.3019 |
</figtable>
The TM-score was computed with the command line tool TMscore. It provides the RMSD (root-mean-square deviation), the TM-score (template modeling score), the GDT-TS (global distance test, total score) and the GDT-HA (global distance test, high accuracy). Of these metrics, the RMSD is considered the least accurate, because a few badly modelled regions can inflate it even when the model as a whole is quite good.
<ref name="GDT-TS"> Wikipedia, Global distance test (March 2, 2012), http://en.wikipedia.org/wiki/Global_distance_test; May 28, 2012</ref>
The TM-score is intended to be the most accurate measurement of all these values, whereas the GDT-TS and the more rigorous GDT-HA lie in the middle.
<ref name="TM_1">Zhang Y and Skolnick J (2004). Scoring function for automated assessment of protein structure template quality. Proteins 57 (4): 702–710. doi:10.1002/prot.20264.</ref>
<ref name="GDT-HA"> Read, Randy J.; Chavali, Gayatri (2007).Assessment of CASP7 predictions in the high accuracy template-based modeling category. Proteins 69 (S8): 27–37. doi:10.1002/prot.21662.</ref>
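To illustrate why the RMSD reacts so strongly to a few badly modelled residues while the TM-score and the GDT scores do not, the four metrics can be sketched for a fixed list of per-residue distances. This is a simplified illustration, not a replacement for TMscore: the real tools search over many superpositions, whereas here the distances are simply given, and the example distances are invented. The formula for d0 follows Zhang and Skolnick (2004); the GDT cutoffs are the commonly used ones (1, 2, 4, 8 Å for GDT-TS and 0.5, 1, 2, 4 Å for GDT-HA).

```python
import math

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def rmsd(distances):
    """Root-mean-square deviation of per-residue distances (Angstrom)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for one fixed superposition (TMscore maximises this value)."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt(distances, cutoffs):
    """Mean fraction of residues that lie within each distance cutoff."""
    n = len(distances)
    return sum(sum(d <= c for d in distances) / n for c in cutoffs) / len(cutoffs)


# Hypothetical model: 380 residues fit well, 10 residues are far off.
distances = [0.5] * 380 + [60.0] * 10

print(f"RMSD     {rmsd(distances):.3f}")
print(f"TM-score {tm_score(distances, len(distances)):.4f}")
print(f"GDT-TS   {gdt(distances, GDT_TS_CUTOFFS):.4f}")
print(f"GDT-HA   {gdt(distances, GDT_HA_CUTOFFS):.4f}")
```

In this invented case the ten outliers dominate the RMSD, while the other three scores stay close to 1 because each residue can contribute at most a bounded amount.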
In our case, the metrics correlate well with each other as well as with the quality scores of Modeller. A few differences, however, stand out. First, Model 1 scores slightly worse than MULTI 1, which is also supported by the RMSD in the later evaluation. Subjectively, the model based on the two highest scoring structures looked better, since in our opinion it combined the advantages of Model 1 and Model 2. The RMSD itself nevertheless appears to be very inconsistent in its final value across the scores calculated by TMscore, SAP and Pymol. Even the ranks these three tools assign to the different models are not consistent.
Taking a closer look at the TM-scores, we see that all models are assumed to have about the same fold as the target (score greater than 0.5), except for those that involve structures with a sequence identity below 30% (Model 3, MULTI 3) and those where we manually adjusted the alignment (CHAS 2 and 3). The model MULTI 3 is even considered to be based on almost randomly chosen unrelated proteins (TM-score ≈ 0.2)<ref name="TM_2">Zhang Y and Skolnick J (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 33 (7): 2302–2309. doi:10.1093/nar/gki524. PMC 1084323.</ref>. MULTI 1 is considered nearly perfect, since its TM-score is only slightly less than 1.
The GDT scores and the TM-score rank all models in the same order except for MULTI 2 and MULTI 4, where the latter has a slightly greater TM-score but smaller GDT scores (see <xr id="tab:TMscore_1R46"/> and <xr id="tab:TMscore_1R47"/>, highlighted red). The two global distance tests do not disagree on any of the models.
The values of the models compared to 1R46 and 1R47, respectively, differ only slightly, usually in the third decimal place.
Based on the observed values, we would in future rely on the TM-score, since the literature considers it the most accurate of the metrics and it coincides with our intuition.
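The two observations above (which models pass the fold threshold, and where TM-score and GDT-TS disagree on the ranking) can be checked against the table values with a few lines. This is a minimal sketch: the numbers are copied from the comparison against 1R46, the threshold of 0.5 is the one used in the text, and the helper names are our own.

```python
from itertools import combinations

# (TM-score, GDT-TS) per model, copied from the comparison against 1R46.
scores = {
    "Model 1": (0.9841, 0.9667),
    "Model 2": (0.9596, 0.9071),
    "Model 3": (0.4087, 0.2699),
    "MULTI 1": (0.9938, 0.9910),
    "MULTI 2": (0.7364, 0.6949),
    "MULTI 3": (0.2048, 0.0673),
    "MULTI 4": (0.7405, 0.6737),
    "CHAS": (0.9841, 0.9667),
    "CHAS 2": (0.4651, 0.3622),
    "CHAS 3": (0.4651, 0.3622),
}

SAME_FOLD = 0.5  # TM-score threshold used in the text


def below_fold_threshold(table, threshold=SAME_FOLD):
    """Models whose TM-score does not exceed the same-fold threshold."""
    return [name for name, (tm, _) in table.items() if tm <= threshold]


def discordant_pairs(table):
    """Pairs of models that TM-score and GDT-TS rank in opposite order."""
    return [
        (a, b)
        for a, b in combinations(table, 2)
        if (table[a][0] - table[b][0]) * (table[a][1] - table[b][1]) < 0
    ]


print("below threshold:", below_fold_threshold(scores))
print("ranked differently:", discordant_pairs(scores))
```

Models with identical values, such as Model 1 and CHAS, count as ties and are not reported as disagreements.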
RMSD with SAP and Pymol
<figtable id="tab:SAP_1R46_mod"> RMSD of Modeller models compared to 1R46
Model | Number of residues in common | Weighted RMSD (Å) | Unweighted RMSD (Å) | RMSD Pymol (Å) | RMSD around cat. site (Å) |
---|---|---|---|---|---|
Model 1 | 390 | 0.532 | 1.115 | 0.616 | 0.592 |
Model 2 | 390 | 0.571 | 1.574 | 0.740 | 0.596 |
Model 3 | 376 | 1.833 | 20.273 | 21.890 | 2.881 |
MULTI 1 | 390 | 0.396 | 0.575 | 0.515 | 0.425 |
MULTI 2 | 390 | 0.479 | 2.689 | 0.768 | 0.583 |
MULTI 3 | 385 | 11.003 | 17.580 | 21.486 | 6.613 |
MULTI 4 | 380 | 0.904 | 3.833 | 1.023 | 0.603 |
CHAS | 390 | 0.532 | 1.115 | 0.616 | 0.592 |
CHAS 2 | 378 | 0.613 | 1.492 | 13.318 | 0.856 |
CHAS 3 | 378 | 0.613 | 1.492 | 13.318 | 0.856 |
</figtable>
<figtable id="tab:SAP_1R47_mod"> RMSD of Modeller models compared to 1R47
Model | Number of residues in common | Weighted RMSD (Å) | Unweighted RMSD (Å) | RMSD Pymol (Å) | RMSD around cat. site (Å) |
---|---|---|---|---|---|
Model 1 | 391 | nan | nan | 0.623 | 0.604 |
Model 2 | 391 | 0.717 | 1.569 | 0.731 | 0.511 |
Model 3 | 376 | 1.817 | 20.281 | 22.099 | 2.021 |
MULTI 1 | 390 | 0.396 | 0.575 | 0.513 | 0.379 |
MULTI 2 | 391 | 0.472 | 2.693 | 0.792 | 0.614 |
MULTI 3 | 383 | 9.297 | 17.430 | 21.484 | 7.646 |
MULTI 4 | 380 | 0.912 | 3.836 | 1.048 | 0.623 |
CHAS | 391 | nan | nan | 0.623 | 0.604 |
CHAS 2 | 378 | 0.618 | 1.498 | 13.338 | 0.541 |
CHAS 3 | 378 | 0.618 | 1.498 | 13.338 | 0.541 |
</figtable>
<figure id="fig:RMSD_model1">
</figure> <figure id="fig:RMSD_multi1">
</figure> <figure id="fig:RMSD_model3">
</figure>
Although the RMSD values calculated with SAP and the metrics computed with TMscore do not agree perfectly, both favour the model MULTI 1 and assign poor values to MULTI 3 (see <xr id="tab:SAP_1R46_mod"/> and <xr id="tab:SAP_1R47_mod"/>). What is really surprising is that SAP assigns a very good RMSD to the models CHAS 2 and 3, whereas TMscore does not.
The all-atom RMSD within a radius of 6 Å around the catalytic site is, as we expected, low and in several cases even lower than the weighted RMSD, which is to the credit of Modeller. The only unexpected result is that changing the alignment of the active site in CHAS 2 and 3 improves the RMSD around the catalytic center, but only when comparing it to the structure 1R47, which is the galactose-bound one. This can only be explained by the fact that one of the residues involved in binding the β-D-galactose (D 231 - see <xr id="fig:GAL:1R47"/>) is also part of the active center, so that the fold of the catalytic site is changed by the binding of the sugar.
All in all, based on these evaluations we believe that the RMSD is not a reliable measure of model quality.
Please note that the unexpectedly high Pymol values for certain models are due to many unrecognized residues such as EDO; applying the script repairPDB to the corresponding pdb file did not help. We also do not know why SAP did not calculate numbers for Model 1 and CHAS (which are identical) when compared to 1R47 (<xr id="tab:SAP_1R47_mod"/>).
<xr id="fig:RMSD_model1"/>, <xr id="fig:RMSD_multi1"/> and <xr id="fig:RMSD_model3"/>, which show the two best models and one of the worst, present the RMSD visually. They point out where the problems of Model 3 are located: there is a region with very poor RMSD values, indicated by the long green sticks (<xr id="fig:RMSD_model3"/>). Our assumption that MULTI 1 is the best model is supported again, since hardly any green lines are visible between the model and the target structure (<xr id="fig:RMSD_multi1"/>).
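The RMSD in a 6 Å radius around the catalytic site can be sketched as follows. This is only an illustration of the idea, not the procedure that was actually used: we assume that target and model atoms are already superimposed and paired by index, the coordinates are invented, and the function name `local_rmsd` is hypothetical.

```python
import math


def local_rmsd(target_atoms, model_atoms, site_atoms, radius=6.0):
    """RMSD over atom pairs whose target atom lies within radius of the site.

    target_atoms and model_atoms are equally long lists of (x, y, z) tuples,
    already superimposed and paired by index; site_atoms holds the target
    coordinates of the catalytic residues. All values are in Angstrom.
    """
    if len(target_atoms) != len(model_atoms):
        raise ValueError("atom lists must be paired one to one")
    squared = [
        math.dist(t, m) ** 2
        for t, m in zip(target_atoms, model_atoms)
        if any(math.dist(t, s) <= radius for s in site_atoms)
    ]
    if not squared:
        raise ValueError("no atoms within the radius")
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical coordinates: two atoms near the site, one far away and badly modelled.
target = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (3.0, 0.0, 0.5), (30.0, 20.0, 0.0)]
site = [(1.0, 0.0, 0.0)]

near_site = local_rmsd(target, model, site)
whole_model = local_rmsd(target, model, site, radius=math.inf)
print(f"RMSD around the site: {near_site:.3f}")
print(f"RMSD over all atoms:  {whole_model:.3f}")
```

Restricting the selection to the neighbourhood of the site removes the distant, badly modelled atom from the sum, which is why a local RMSD can be much lower than the global one.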
DOPE score
<figure id="fig:DOPE_Model">
</figure> <figure id="fig:DOPE_MULTI">
</figure>
With the help of another Modeller script, the per-residue DOPE score can be computed and then plotted. In all of the pictures described below, the target structure 1R46 is shown in green, and the start of its record, after the 31-residue signal peptide, is indicated by the dashed vertical line. In <xr id="fig:DOPE_Model"/> we can see that the curves of Model 1 (red) and Model 2 (orange) both fit the green curve very well, with only small irregularities. Model 3, on the other hand, has more regions where it deviates from the DOPE score of 1R46 than regions where it follows its curve.
<xr id="fig:DOPE_MULTI"/> shows the comparison of all models that are based on multiple template files. It is not surprising that MULTI 3 (pink) performs worst, since it is based on three templates with less than 30% sequence identity. MULTI 2 and 4 (yellow and purple) are very poorly modelled in the first 190 residues, but improve in the latter half of the protein, which suggests that this is the part of the molecule that is easier to model.
Comparing the three CHAS models (<xr id="fig:DOPE_CHAS"/>), we again observe that CHAS 2 and 3 are identical and perform worse than CHAS, which is identical to Model 1. This worsening, however, can only be observed after the first modification of the alignment at position 170.
The last comparison focuses on the two best models computed with Modeller (see <xr id="fig:DOPE_Best2"/>). Once more our assumption is confirmed that these two model our protein fold very well, each with strengths and weaknesses at some positions. Importantly, the active site of the protein is modelled well; its position is again indicated by the vertical dashed lines.
<figure id="fig:DOPE_CHAS">
</figure> <figure id="fig:DOPE_Best2">
</figure>
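The visual comparison of the DOPE profiles can be complemented by a simple numerical check of where a model profile deviates from the target profile. This is a minimal sketch under our own assumptions: the per-residue values are invented, both profiles are assumed to be aligned position by position, and the window size and tolerance are arbitrary choices rather than Modeller settings.

```python
def smooth(profile, window=15):
    """Centred moving average; the window shrinks at both ends of the profile."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def deviating_positions(model_profile, target_profile, tolerance):
    """1-based positions where the two scores differ by more than tolerance."""
    return [
        i + 1
        for i, (m, t) in enumerate(zip(model_profile, target_profile))
        if abs(m - t) > tolerance
    ]


# Hypothetical per-residue DOPE values; real profiles would come from Modeller.
target_profile = [-0.04, -0.05, -0.05, -0.04, -0.05, -0.04]
model_profile = [-0.04, -0.05, -0.01, -0.01, -0.05, -0.04]

raw = deviating_positions(model_profile, target_profile, 0.01)
smoothed = deviating_positions(
    smooth(model_profile, window=3), smooth(target_profile, window=3), 0.012
)
print("deviating positions (raw):     ", raw)
print("deviating positions (smoothed):", smoothed)
```

Smoothing spreads a local deviation over neighbouring positions, so the flagged region becomes wider but less noisy.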
Swissmodel
Calculation of models
Evaluation
TM-score
<figtable id="tab:TMscore_1R46_sm"> TM-score Swissmodel 1R46
Model | Number of residues in common | RMSD of the common residues (Å) | TM-score | GDT-TS-score | GDT-HA-score |
---|---|---|---|---|---|
output_TMscore/out/1R46_Model_2.out | 390 | 0.512 | 0.9950 | 0.9917 | 0.9218 |
output_TMscore/out/1R46_Model_3.out | 390 | 1.551 | 0.9660 | 0.9032 | 0.7538 |
</figtable>
<figtable id="tab:TMscore_1R47_sm"> TM-score Swissmodel 1R47
Model | Number of residues in common | RMSD of the common residues (Å) | TM-score | GDT-TS-score | GDT-HA-score |
---|---|---|---|---|---|
output_TMscore/out/1R47_Model_2.out | 390 | 0.515 | 0.9950 | 0.9923 | 0.9231 |
output_TMscore/out/1R47_Model_3.out | 390 | 1.532 | 0.9667 | 0.9058 | 0.7545 |
</figtable>
RMSD with SAP
iTasser
3D-Jigsaw
References
<references/>