Fabry:Homology based structure predictions

From Bioinformatikpedia
Revision as of 23:18, 4 June 2012 by Staniewski (3D-Jigsaw)




The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.

Dataset preparation and target comparison

Datasets

<figtable id="tab:datasetHHpred"> Dataset HHpred,
E-value cutoff 1e-15

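{| class="wikitable"
! pdb ID !! E-value !! Identity in %
|-
! colspan="3" | > 80% sequence identity
|-
| 3hg3 || 8.6e-90 || 100
|-
! colspan="3" | 40% - 80% sequence identity
|-
| 1ktb || 4.2e-85 || 53
|-
! colspan="3" | < 30% sequence identity
|-
| 3cc1 || 5.5e-74 || 25
|-
| 1zy9 || 3.1e-48 || 13
|-
| 3a24 || 7.8e-40 || 17
|-
| 2xn2 || 5.3e-37 || 15
|-
| 2d73 || 5.7e-36 || 14
|-
| 3mi6 || 1.4e-31 || 15
|-
| 2yfo || 9.1e-30 || 13
|-
| 2f2h || 2.7e-20 || 17
|-
| 2g3m || 2.2e-20 || 16
|-
| 3nsx || 6e-20 || 13
|-
| 3lpp || 2.2e-18 || 15
|-
| 3l4y || 1.9e-18 || 15
|-
| 3top || 3.6e-18 || 12
|-
| 2xvl || 3.2e-18 || 16
|-
| 2x2h || 4.9e-16 || 13
|}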

</figtable>

<figtable id="tab:datasetHHpred"> Additional sequences HHpred,
E-value cutoff 0.002

{| class="wikitable"
! pdb ID !! E-value !! Identity in %
|-
| 3zss || 0.00062 || 10
|-
| 1j0h || 0.0011 || 15
|-
| 1ea9 || 0.00098 || 12
|}

</figtable>

<figtable id="tab:datasetCOMA"> Dataset COMA,
E-value cutoff 0.002

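{| class="wikitable"
! pdb ID !! E-value !! Identity in %
|-
! colspan="3" | > 80% sequence identity
|-
| - || - || -
|-
! colspan="3" | 40% - 80% sequence identity
|-
| 1ktb || 1.7e-61 || 52
|-
! colspan="3" | < 30% sequence identity
|-
| 3lrk || 1.2e-66 || 23
|-
| 3a21 || 2.7e-65 || 26
|-
| 1szn || 3.7e-59 || 22
|-
| 3cc1 || 5.2e-58 || 19
|-
| 1zy9 || 1.7e-39 || 9
|-
| 3mi6 || 4.3e-38 || 11
|-
| 2yfn || 4.4e-35 || 10
|-
| 2d73 || 1.9e-32 || 9
|-
| 3a24 || 5.6e-30 || 10
|-
| 1xsi || 1.9e-12 || 10
|-
| 2g3m || 2.4e-11 || 10
|-
| 3pha || 2.9e-10 || 6
|-
| 3lpo || 4.7e-09 || 8
|-
| 2x2h || 8.2e-09 || 8
|-
| 3mo4 || 1.2e-08 || 7
|-
| 2xvg || 2.4e-08 || 8
|-
| 3ton || 4.3e-08 || 8
|-
| 2xib || 1e-07 || 7
|-
| 3eyp || 1.6e-06 || 8
|-
| 3k1d || 3.5e-06 || 9
|-
| 2zwy || 8.8e-06 || 9
|-
| 3gza || 1.8e-05 || 8
|-
| 3m07 || 2.3e-05 || 7
|-
| 1eh9 || 0.00013 || 6
|-
| 1gvi || 0.00035 || 8
|-
| 1aqh || 0.00039 || 5
|-
| 1mwo || 0.00058 || 7
|-
| 3vmn || 0.0018 || 9
|-
| 1bf2 || 0.0019 || 6
|-
| 3aml || 0.0019 || 8
|}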

</figtable>

We performed an HHpred search as well as a COMA search to generate three distinct datasets. Since COMA did not find any homologous structure in the > 80% sequence identity class (see <xr id="tab:datasetCOMA"/>), we used the dataset created with the HHpred search and the script described in the journal. This dataset contains one structure with a sequence identity above 80%, one with an identity between 40 and 80%, and 15 with an identity below 30%, 14 of which have an identity below 20% (see <xr id="tab:datasetHHpred" />). All HHpred matches have an E-value below 1e-15; for the COMA homologues we applied the less strict threshold of 0.002.
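The selection step above (E-value cutoff, then grouping by sequence identity) can be sketched as a small filter. This is not the script from the journal, whose code is not reproduced here; the `Hit` class, the function name `bin_hits` and the extra bucket for the 30-40% range (which the tables do not use) are our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    pdb_id: str
    evalue: float
    identity: float  # sequence identity in percent


def bin_hits(hits, evalue_cutoff):
    """Drop hits above the E-value cutoff and sort the rest into identity classes."""
    bins = {">80%": [], "40-80%": [], "<30%": [], "30-40%": []}
    for hit in hits:
        if hit.evalue > evalue_cutoff:
            continue  # not significant enough for this dataset
        if hit.identity > 80:
            bins[">80%"].append(hit.pdb_id)
        elif hit.identity >= 40:
            bins["40-80%"].append(hit.pdb_id)
        elif hit.identity < 30:
            bins["<30%"].append(hit.pdb_id)
        else:
            bins["30-40%"].append(hit.pdb_id)  # class not used in the tables above
    return bins


# A few HHpred hits copied from the tables above.
hits = [
    Hit("3hg3", 8.6e-90, 100),
    Hit("1ktb", 4.2e-85, 53),
    Hit("3cc1", 5.5e-74, 25),
    Hit("1zy9", 3.1e-48, 13),
    Hit("3zss", 0.00062, 10),
]
print(bin_hits(hits, evalue_cutoff=1e-15))
print(bin_hits(hits, evalue_cutoff=0.002))
```

With the strict cutoff of 1e-15, 3zss is dropped; with the looser cutoff of 0.002 it enters the lowest identity class, which is exactly the case examined with MULTI 3.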
In most cases we used the structures 3hg3, 1ktb and 3cc1 for modelling, because they are either the only representatives of their identity class or, in the case of 3cc1, have a sequence identity that did not seem too low. For Model MULTI 3 we additionally used the structures 3a24 and 3zss. The latter has an E-value of 0.00062; we added it to examine how a template performs whose E-value is worse than that of all our other structures but which would still pass the threshold of a usual BLAST search (0.003).
It is important to mention that, although 3hg3 has an identity of 100%, it is not the PDB structure annotated for the AGAL protein but a structure used to study its substrate-bound catalytic mechanism, hence the high identity.
1ktb is the X-ray structure of the already mentioned chicken α-N-acetylgalactosaminidase, which might in future be used for enzyme replacement therapy in the treatment of Fabry Disease.
The last of the frequently used structures, 3cc1, is the X-ray structure of a putative α-N-acetylgalactosaminidase from Bacillus halodurans.


Target comparison

As an initial step of the evaluation, we compared the apo structure 1R46 with the complex structure 1R47 (with bound α-galactose). The alignment of chain A of 1R46 and 1R47 in PyMOL (see <xr id="tab:compare"/>) yielded an RMSD of 0.248 Å, and the position and orientation of the residues involved in binding the sugar (see <xr id="fig:GAL:1R47"/>) do not differ significantly. We therefore used only the 1R46 structure for visualization, but computed all values and statistics for both structures.
In the right figure of <xr id="tab:compare"/>, the residues Asp92, Asp93, Lys168, Arg227 and Asp231 of chain A are depicted as thicker sticks; they are responsible for binding the sugar (shown in magenta) in the complex structure. There is clearly little difference in this region between 1R46 and 1R47.
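The RMSD quoted above is the root-mean-square of the distances between paired atoms after superposition. The sketch below shows only that final formula, on coordinates that are assumed to be already superimposed and matched one-to-one; PyMOL's `align` additionally performs a sequence alignment, the superposition itself and, by default, several cycles of outlier rejection, so its values are not directly comparable to this plain formula. The coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in the units of the input, here Å) over paired, already superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example: one atom displaced by 0.5 Å, the other unchanged.
apo = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
holo = [(0.0, 0.0, 0.5), (1.5, 0.0, 0.0)]
print(round(rmsd(apo, holo), 3))
```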

<figtable id="tab:compare"> Comparison of apo and complex structure

Superimposed structures of 1R46 (blue) and 1R47 (green) in cartoon representation. Obviously, the structures do not differ much.
Comparison of the residues involved in the binding of α-galactose in the apo structure (blue) and the complex structure (green)

</figtable>

<figure id="fig:GAL:1R47">

Residues involved in the binding of α-galactose in 1R47 [1]

</figure>


Modeller

Using the command line tool, we created 10 models (see Journal). The first three were produced with the standard settings and workflow of Modeller. The next four models were computed from multiple templates in different combinations, and for the last three models we edited the alignment files in order to test the quality of the alignment and the influence of the two types of alignment.
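For the multiple-template models, all templates and the target share one alignment file. The following is a minimal sketch of writing such a file in the PIR layout shown in the Modeller tutorials (a `>P1;code` line, a colon-separated description line, and the gapped sequence terminated by `*`). The helper names, the chain ID `A` and the sequence fragments are hypothetical, and we do not claim these are the exact files from the journal.

```python
def pir_entry(code, aligned_seq, kind="structureX", chain="A"):
    """Return one entry of a PIR alignment file.

    kind="structureX" marks a template with known coordinates,
    kind="sequence" marks the target to be modelled.
    """
    if kind == "structureX":
        # type:code:first residue:chain:last residue:chain:name:source:resolution:R-factor
        header = f"structureX:{code}:FIRST:{chain}:LAST:{chain}::::"
    elif kind == "sequence":
        header = f"sequence:{code}:::::::0.00: 0.00"
    else:
        raise ValueError(f"unknown entry type: {kind}")
    return f">P1;{code}\n{header}\n{aligned_seq}*\n"


def pir_alignment(templates, target_code, target_seq):
    """Join several template entries and one target entry into a multiple-template file."""
    entries = [pir_entry(code, seq) for code, seq in templates]
    entries.append(pir_entry(target_code, target_seq, kind="sequence"))
    return "\n".join(entries)


# Hypothetical, shortened aligned fragments; the real sequences are much longer.
text = pir_alignment(
    [("3hg3", "LDNGLA-RTPTMG"), ("1ktb", "LENGLARTPPMG-")],
    "1r46",
    "LDNGLA-RTPTMG",
)
print(text)
```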

Default settings

Model 1

<figtable id="tab:pics_1R46_3HG3"> Model 1, visual comparison

Model 1 (red), created with Modeller with the template 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)
Model 1 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow)

</figtable>

For the first model we used the template with the highest sequence identity. According to HHpred, the identity is 100%, whereas Modeller calculates an identity of only 96% (see <xr id="tab:Modeller_scores_3hg3_2"/>). This discrepancy might be due to the way the comparison is made: 1R46 is completely contained in 3HG3, but 3HG3 has a longer sequence (404 residues), so only part of it can be congruent with 1R46 (398 residues without the signal peptide). The left picture in <xr id="tab:pics_1R46_3HG3"/> shows the superimposition of Model 1 and the actual target structure; the right picture additionally displays the template structure. The three structures superimpose almost perfectly, which is underlined by the scores derived from Modeller (see <xr id="tab:Modeller_scores_3hg3_2"/>). The GA341 score of 1.0 indicates a "native like" model (see basic tutorial), the Compactness <ref name="Compactness"> Foldit Wiki, Compactness (October 23, 2011), http://foldit.wikia.com/wiki/Compactness; May 26, 2012</ref> is the third highest and the DOPE score (see basic tutorial) is the lowest of all calculated models.
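Percent identity depends on what it is normalized by. The toy sketch below shows that the same gapped alignment gives 100% when counted over aligned columns but less when divided by the length of a longer template. Which denominator HHpred and Modeller actually use is an assumption we have not checked here; the function name and the sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b, denominator="aligned"):
    """Percent identity of two gapped, equal-length alignment rows.

    denominator: "aligned" = columns where both rows have a residue,
                 "a" / "b" = number of residues in row a / row b.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    pairs = list(zip(aln_a, aln_b))
    identical = sum(1 for x, y in pairs if x == y and x != "-")
    if denominator == "aligned":
        n = sum(1 for x, y in pairs if x != "-" and y != "-")
    elif denominator == "a":
        n = sum(1 for x in aln_a if x != "-")
    elif denominator == "b":
        n = sum(1 for y in aln_b if y != "-")
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * identical / n


# Toy rows: the target is fully contained in the longer template.
target = "--ACDEFG--"
template = "MMACDEFGKK"
print(percent_identity(target, template))       # over aligned columns
print(percent_identity(target, template, "b"))  # over the template length
```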
The only parts that cannot be modelled correctly are the two ends of the sequence, which are highlighted in blue in the pictures. From our background knowledge we know that the first 31 residues form the signal peptide, which is cleaved off and thus cannot be found in the tertiary structure of the target protein. Modeller cannot account for this, so it would be a good amendment to the modelling pipeline to add sequence-based analyses such as signal peptide prediction, similar to the predictions we made in Task 2. The poorly modelled C-terminus can be attributed to the longer sequence of the 3HG3 structure: the last 6 residues of the model stick out, and the template is 6 amino acids longer than the target.
Inspecting the problematic residues with a distance of more than 8 Å (see <xr id="tab:Modeller_scores_3hg3_1"/>) manually in PyMOL, we discovered that two of them lie in loop regions (positions 91 and 101), which are hard to model. On the other hand, two of the residues are located in helical regions (positions 160 and 318) and seem to fit the target perfectly.
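The distance tables list positions whose paired atoms lie more than a cutoff apart. We have not reproduced how Modeller computes its list; the sketch below only illustrates the idea, assuming superimposed Cα coordinates paired residue by residue. The function name `flag_distant` and the toy coordinates are hypothetical.

```python
import math


def flag_distant(model_ca, target_ca, cutoff=8.0, first_position=1):
    """Return (position, distance) for paired Cα atoms farther apart than `cutoff` Å.

    Assumes both lists are already superimposed and matched residue by residue.
    """
    flagged = []
    for pos, (m, t) in enumerate(zip(model_ca, target_ca), start=first_position):
        d = math.dist(m, t)
        if d > cutoff:
            flagged.append((pos, round(d, 3)))
    return flagged


model = [(0.0, 0.0, 0.0), (0.0, 0.0, 9.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(flag_distant(model, target))
```

The same function with `cutoff=6.0` corresponds to the 6 Å lists used for the multiple-template models.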
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_3hg3_1"> Modeller scores Model 3hg3, Distances

{| class="wikitable"
! Model !! Template !! Distances > 8.0 Å in 2d alignment !! Distances > 8.0 Å
|-
| Model 1 || 3hg3 || Pos 428: 76.568 Å || Pos 1: 28.357 Å<br />Pos 91: 8.810 Å<br />Pos 101: 17.314 Å<br />Pos 112: 25.386 Å<br />Pos 160: 32.647 Å<br />Pos 318: 27.449 Å<br />Pos 333: 42.457 Å
|}

</figtable> <figtable id="tab:Modeller_scores_3hg3_2"> Modeller scores Model 3hg3

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 95.570999 || 429 || 0.215183 || -213.518650 || -9.487873 || -5.603112 || -10.125743 || -6.381974 || -11.484159 || 1.000000 || -52607.89844
|}

</figtable>


Model 2

<figtable id="tab:pics_1R46_1KTB"> Model 2, visual comparison

Model 2 (red), created with Modeller with the template 1ktb, superimposed on the x-ray structure of α-Galactosidase A (green)
Model 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 1ktb (yellow)

</figtable>

For this model we used the template 1KTB, which has a sequence identity of 53%. The superimposed structures of 1R46 and Model 2 are shown in <xr id="tab:pics_1R46_1KTB"/>, together with the structural alignment of 1KTB with the model and the target. This model shares only one of the problems of Model 1, namely the unmodelled signal peptide, for the same reasons as mentioned above. The end of the sequence seems to be modelled just fine, although the template sequence (405 residues) is even longer than 3HG3. Despite the slightly worse Compactness and DOPE score (see <xr id="tab:Modeller_scores_1ktb_2"/>), Model 2 seems to be very good: there are no residues with a distance greater than 8 Å (see <xr id="tab:Modeller_scores_1ktb_1"/>), and according to the GA341 score the model is also "native like"; a value greater than 0.7 generally indicates a reliable model, defined as a ≥ 95% probability of a correct fold.<ref name="Melo2002"> Melo F, Sánchez R, Sali A. (2002). Statistical potentials for fold assessment. Protein Sci. 2002 Feb;11(2):430-48. PMCID: PMC2373452</ref>
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_1ktb_1"> Modeller scores Model 1ktb, Distances

{| class="wikitable"
! Model !! Template !! Distances > 8.0 Å in 2d alignment !! Distances > 8.0 Å
|-
| Model 2 || 1ktb || none || none
|}

</figtable>

<figtable id="tab:Modeller_scores_1ktb_2"> Modeller scores Model 1ktb

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 53.351002 || 429 || 0.176840 || -107.285679 || -7.043755 || -3.262988 || -8.593054 || -6.151719 || -10.076556 || 1.000000 || -49267.35156
|}

</figtable>


Model 3

<figtable id="tab:pics_1R46_3CC1"> Model 3, visual comparison

Model 3 (red), created with Modeller with the template 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)
Model 3 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow)

</figtable>

For the last of the basic models we used the structure 3CC1 as a template, which is rather distantly related, with a sequence identity of roughly 25%. Here the emerging problems are obvious. Again, the signal peptide cannot be modelled, but the real problem is that Model 3 is tilted by approximately 90° compared to the target structure 1R46.
The quality scores also show that the model must be rejected. For example, the GA341 score of 0.33 indicates that the model should be discarded, since any good model will give a score near 1.0, and according to Salilab any model with a score below 0.6 should not be considered helpful <ref name="GA341"> Salilab - Modeller Usage, modeller ga341 score (February 21, 2006), http://salilab.org/archives/modeller_usage/2006/msg00060.html; May 26, 2012</ref>. The z-scores are poor as well, which is consistent with the low GA341 score, since these statistical potentials contribute to it (see Modeller help).
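The two GA341 thresholds cited for Model 2 and Model 3 can be combined into a simple triage. This is a sketch of our reading of those sources, not a Modeller function; the label "borderline" for scores between 0.6 and 0.7 is our own assumption. The scores are copied from the Modeller tables in this section.

```python
def ga341_verdict(score):
    """Classify a Modeller GA341 score using the thresholds cited in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("GA341 scores lie between 0 and 1")
    if score > 0.7:
        return "reliable"    # Melo et al. 2002: > 0.7 indicates a reliable fold
    if score < 0.6:
        return "discard"     # Salilab mailing list: < 0.6 is not helpful
    return "borderline"      # our own label for the range in between


ga341 = {"Model 1": 1.000000, "Model 3": 0.332343, "MULTI 3": 0.004241, "CHAS 2": 0.995974}
for name, score in ga341.items():
    print(name, ga341_verdict(score))
```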
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_3cc1_1"> Modeller scores Model 3cc1, Distances

{| class="wikitable"
! Model !! Template !! Distances > 8.0 Å in 2d alignment !! Distances > 8.0 Å
|-
| Model 3 || 3cc1 || Pos 433: 63.967 Å || Pos 147: 25.085 Å<br />Pos 290: 19.238 Å<br />Pos 374: 24.356 Å<br />Pos 395: 15.007 Å<br />Pos 412: 61.733 Å<br />Pos 452: 23.680 Å<br />Pos 631: 23.283 Å<br />Pos 659: 8.421 Å<br />Pos 684: 10.763 Å<br />Pos 703: 10.204 Å<br />Pos 762: 10.753 Å
|}

</figtable> <figtable id="tab:Modeller_scores_3cc1_2"> Modeller scores Model 3cc1

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 24.242001 || 429 || 0.139850 || 198.134571 || 24.857669 || 7.885459 || -3.800148 || -1.528096 || -3.572654 || 0.332343 || -38190.22656
|}

</figtable>


Multiple templates

MULTI 1

<figtable id="tab:pics_1R46_multi1"> Model MULTI 1, visual comparison

Model MULTI 1 (red) (templates 3HG3 and 1KTB), superimposed on the x-ray structure of α-Galactosidase A (green)
Model MULTI 1 (red) superimposed on α-Galactosidase A (green) and the structure of 3HG3 (yellow) and 1KTB (orange)

</figtable>

MULTI 1, which is based on the structures 3HG3 and 1KTB, is subjectively the second best of the models computed with Modeller. The superimposition of the model and the target structure (see <xr id="tab:pics_1R46_multi1"/>) shows results similar to Model 1, but the quality scores show a slightly worse DOPE score and a very poor Compactness value. According to Salilab, the DOPE score is the score to use for distinguishing between two good models <ref name="GA341" />. This model has the second lowest DOPE score of all, as well as an almost 100% sequence identity to the target.
On the other hand, the Compactness is very low, even lower than that of Model 3. We therefore think that this quality score cannot be considered reliable, in contrast to the DOPE score.
This model also makes obvious that the more distantly related 1KTB structure (orange) seems to fit better than the very closely related 3HG3 (yellow), as can be seen in the right picture of <xr id="tab:pics_1R46_multi1"/>.
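Following the Salilab advice that GA341 separates good from bad models while DOPE distinguishes between good ones, the Modeller scores can be combined into a two-step ranking. This is a sketch of our reading of that advice, not a Modeller function; the 0.7 cutoff is taken from the Melo et al. reference cited for Model 2, and the values are copied from the score tables of the seven non-CHAS models.

```python
# (GA341, DOPE) per model, copied from the Modeller score tables in this section.
SCORES = {
    "Model 1": (1.000000, -52607.89844),
    "Model 2": (1.000000, -49267.35156),
    "Model 3": (0.332343, -38190.22656),
    "MULTI 1": (1.000000, -51741.65625),
    "MULTI 2": (1.000000, -44783.55469),
    "MULTI 3": (0.004241, -24762.13672),
    "MULTI 4": (1.000000, -44794.44531),
}


def rank_by_dope(scores, ga341_min=0.7):
    """Keep models whose GA341 exceeds ga341_min, then sort by DOPE (lower is better)."""
    passing = [(name, dope) for name, (ga341, dope) in scores.items() if ga341 > ga341_min]
    return [name for name, _ in sorted(passing, key=lambda item: item[1])]


print(rank_by_dope(SCORES))
```

Model 3 and MULTI 3 are filtered out by GA341; among the rest, Model 1 has the lowest DOPE score and MULTI 1 the second lowest, matching the text.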
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_MULTI1_1"> Modeller scores Model MULTI1, Distances

{| class="wikitable"
! Model !! Templates !! Distances > 6.0 Å
|-
| MULTI1 || 3HG3<br />1KTB || Pos 211: 6.509 Å<br />Pos 212: 7.237 Å<br />Pos 379: 6.608 Å<br />Pos 380: 10.056 Å<br />Pos 426: 6.538 Å<br />Pos 427: 8.423 Å<br />Pos 428: 7.683 Å<br />Pos 429: 9.226 Å
|}

</figtable>

<figtable id="tab:Modeller_scores_MULTI1_2"> Modeller scores Model MULTI1

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 99.747002 || 429 || 0.137662 || -298.484703 || -14.285927 || -7.811892 || -10.018935 || -7.069398 || -12.033570 || 1.000000 || -51741.65625
|}

</figtable>


MULTI 2

<figtable id="tab:pics_1R46_multi2"> Model MULTI 2, visual comparison

Model MULTI 2 (red), created with Modeller on basis of the templates 3HG3, 1KTB and 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)
Model MULTI 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow), 1KTB (orange) and 3CC1 (lightorange)

</figtable>

MULTI 2 is based on MULTI 1 but additionally uses the structure 3CC1, which is rather distantly related to 1R46; hence it contains one structure from each sequence identity group. With this addition we observe a decrease in model quality. <xr id="tab:pics_1R46_multi2"/> shows that the signal peptide is somewhat nested inside the molecule. The DOPE score increased, indicating a less reliable model, although the Compactness is the highest of all models (see <xr id="tab:Modeller_scores_MULTI2_2" />), which underlines our statement above that this score is not really trustworthy.
Another sign of the poor quality of the model are the 556 positions with a distance greater than 6 Å among the three template structures in the alignment (see <xr id="tab:Modeller_scores_MULTI2_1" />).
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_MULTI2_1"> Modeller scores Model MULTI2, Distances

{| class="wikitable"
! Model !! Templates !! Distances > 6.0 Å
|-
| MULTI2 || 3HG3<br />1KTB<br />3CC1 || Pos 216: 6.509 Å<br />Pos 217: 7.237 Å<br />Pos 386: 6.608 Å<br />Pos 387: 10.056 Å<br />Pos 433: 6.538 Å<br />Pos 434: 8.423 Å<br />Pos 435: 7.683 Å<br />Pos 436: 9.226 Å<br />Pos 4: 14.846 Å<br />Pos 5: 15.230 Å<br />...<br />556 in total
|}

</figtable>

<figtable id="tab:Modeller_scores_MULTI2_2"> Modeller scores Model MULTI2

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 71.139000 || 429 || 0.229647 || 145.809979 || 8.120285 || 3.810054 || -7.131847 || -4.691505 || -7.820324 || 1.000000 || -44783.55469
|}

</figtable>



MULTI 3

<figtable id="tab:pics_1R46_multi3"> Model MULTI 3, visual comparison

Model MULTI 3 (red), created with Modeller on basis of the templates 3CC1, 3ZSS and 3A24, superimposed on the x-ray structure of α-Galactosidase A (green)
Model MULTI 3 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow), 3ZSS (orange) and 3A24 (lightorange)

</figtable>

This model combines three structures with a sequence identity of less than 30%. One of them (3ZSS, see section Datasets) has a worse E-value than all the other templates; we included it to underline the need for a strict threshold in the dataset preparation.
<xr id="tab:pics_1R46_multi3"/> shows that Model MULTI 3 (red) overhangs 1R46 on both sides and that the signal peptide is again nested inside the structure. The right picture in the table explains this: 3ZSS (orange) also takes up more space at the end than all other compared structures.
In this model the number of very distant residues is even higher than in MULTI 2: as <xr id="tab:Modeller_scores_MULTI3_1"/> shows, there are 1488 poorly aligned positions in the multiple sequence alignment of the four structures. All quality scores except the Compactness emphasize the poor quality of the model; the GA341 score is especially low (0.004, see <xr id="tab:Modeller_scores_MULTI3_2"/>).
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_MULTI3_1"> Modeller scores Model MULTI3, Distances

{| class="wikitable"
! Model !! Templates !! Distances > 6.0 Å
|-
| MULTI3 || 3CC1<br />3A24<br />3ZSS || Pos 1: 49.061 Å<br />Pos 3: 43.970 Å<br />Pos 4: 40.851 Å<br />Pos 5: 37.634 Å<br />Pos 6: 37.456 Å<br />Pos 7: 36.846 Å<br />Pos 8: 33.387 Å<br />Pos 9: 27.318 Å<br />Pos 10: 23.922 Å<br />Pos 11: 20.175 Å<br />...<br />1488 in total
|}

</figtable>

<figtable id="tab:Modeller_scores_MULTI3_2"> Modeller scores Model MULTI3

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 10.956000 || 429 || 0.194156 || 867.151766 || 54.742537 || 22.941240 || -0.829229 || -0.350130 || -0.844565 || 0.004241 || -24762.13672
|}

</figtable>


MULTI 4

<figtable id="tab:pics_1R46_multi4"> Model MULTI 4, visual comparison

Model MULTI 4 (red), created with Modeller on basis of the templates 3CC1 and 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)
Model MULTI 4 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow) and 3HG3 (orange)

</figtable>

The last member of the group of multiple-template models is based on one structure from the best identity group (3HG3) and one from the worst (3CC1). Here we can confirm our assumption that 3CC1 increases the Compactness of the model by modelling the signal peptide inside the structure and thereby decreases the quality of the model (see <xr id="tab:pics_1R46_multi4"/>). This gain in Compactness comes at the expense of the DOPE score (<xr id="tab:Modeller_scores_MULTI4_2"/>) and the number of closely aligned residues (<xr id="tab:Modeller_scores_MULTI4_1"/>).
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_MULTI4_1"> Modeller scores Model MULTI4; Distances

{| class="wikitable"
! Model !! Templates !! Distances > 6.0 Å
|-
| MULTI4 || 3HG3<br />3CC1 || Pos 4: 15.126 Å<br />Pos 5: 15.516 Å<br />Pos 6: 12.989 Å<br />Pos 7: 6.244 Å<br />Pos 9: 8.576 Å<br />Pos 26: 8.554 Å<br />Pos 27: 9.271 Å<br />Pos 28: 9.807 Å<br />Pos 29: 13.110 Å<br />Pos 30: 14.283 Å<br />...<br />295 in total
|}

</figtable>

<figtable id="tab:Modeller_scores_MULTI4_2"> Modeller scores Model MULTI4

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 72.152000 || 429 || 0.227838 || 108.065012 || 3.797955 || 2.874986 || -7.874591 || -4.542697 || -8.455902 || 1.000000 || -44794.44531
|}

</figtable>


Edited Alignment input

CHAS and CHAS 2

<figtable id="tab:pics_1R46_3HG3_CHAS"> Models CHAS and CHAS 2, visual comparison

Model CHAS (red), with active site shifted right to next D (7 and 1 positions) in 2d alignment file, superimposed on the x-ray structure of α-Galactosidase A (green); active site highlighted in blue (target) and cyan (model)
For comparison with Model CHAS and CHAS 2, Model 1 (orange) which was basis for the edited alignments, superimposed on α-Galactosidase A (green); active site highlighted in blue (target) and cyan (model)
Model CHAS 2 (red), with active site shifted right to next D (7 and 1 positions) in both alignment files, superimposed on the x-ray structure of α-Galactosidase A (green); active site highlighted in blue (target) and cyan (model)

</figtable>

With these two models we tried to illuminate the influence of both alignment types on the model creation process. To do so, we changed the alignment of the active site in the 2d alignment so that each of the two aspartic acid residues (positions 170 and 231) is aligned with the next Asp in the template sequence. From this we created model CHAS.
As a second step, we made the same adjustment in the normal alignment and created CHAS 2.
Comparing the results with Model 1, we see that there is no difference at all between Model 1 and CHAS in any score or value, but a large difference to CHAS 2. This leads to the conclusion that, in the way we performed the model creation, the 2d alignment has no influence, which makes this step dispensable.
The pictures in <xr id="tab:pics_1R46_3HG3_CHAS"/> compare Model 1, CHAS and CHAS 2 visually, with special attention to the modelling of the active site, highlighted in blue (target) and cyan (model).

<figtable id="tab:Modeller_scores_CHAS_1"> Modeller scores Model CHAS, Distances

{| class="wikitable"
! Model !! Distances > 8.0 Å
|-
| CHAS || Pos 1: 28.357 Å<br />Pos 91: 8.810 Å<br />Pos 101: 17.314 Å<br />Pos 112: 25.386 Å<br />Pos 160: 32.647 Å<br />Pos 318: 27.449 Å<br />Pos 333: 42.457 Å
|}

</figtable> <figtable id="tab:Modeller_scores_CHAS_2"> Modeller scores Model CHAS

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 95.570999 || 429 || 0.215183 || -213.518650 || -9.487873 || -5.603112 || -10.125743 || -6.381974 || -11.484159 || 1.000000 || -52607.89844
|}

</figtable>

<figtable id="tab:Modeller_scores_CHAS2_1"> Modeller scores Model CHAS2, Distances

{| class="wikitable"
! Model !! Distances > 8.0 Å
|-
| CHAS2 || Pos 1: 28.357 Å<br />Pos 91: 8.810 Å<br />Pos 101: 17.314 Å<br />Pos 112: 25.386 Å<br />Pos 160: 32.647 Å<br />Pos 318: 27.449 Å<br />Pos 333: 42.457 Å<br />Pos 538: 10.921 Å<br />Pos 601: 10.536 Å
|}

</figtable> <figtable id="tab:Modeller_scores_CHAS2_2"> Modeller scores Model CHAS2

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 40.326000 || 429 || 0.204903 || 122.562212 || 22.166245 || 5.785874 || -4.522039 || -1.645839 || -4.686339 || 0.995974 || -40807.12109
|}

</figtable>


CHAS 3

<figtable id="tab:pics_1R46_3HG3_CHAS3"> Model CHAS 3, visual comparison

Model CHAS 3 (red), with active site shifted right to next D (7 and 1 positions) in both alignment files and the substrate binding region (positions 203-207, highlighted in blue and cyan) forced to be consecutive, superimposed on the X-ray structure of α-Galactosidase A (green)
For comparison with Model CHAS 3, Model 1 (orange), which was the basis for the edited alignments and was created with Modeller on the basis of the template 3HG3, superimposed on the X-ray structure of α-Galactosidase A (green); binding region highlighted in blue and cyan

</figtable>

In the last model, we tried to improve the quality of a region that we subjectively judged to be poorly aligned, the substrate binding region (positions 203-207, highlighted in <xr id="tab:pics_1R46_3HG3_CHAS3"/>), by forcing it to be consecutive in the alignment. The comparison of Model 1 and the 1R46 structure in the right picture of <xr id="tab:pics_1R46_3HG3_CHAS3"/> shows that this region is modelled nearly perfectly despite the "poor" alignment, whereas adjusting the alignment produces a very badly modelled substrate binding site, as can be seen in the left picture.
Surprisingly, the calculated quality scores do not differ from those of CHAS 2 (the adjustment in the alignment of the active site was maintained), although the binding site looks different.

<figtable id="tab:Modeller_scores_CHAS3_1"> Modeller scores Model CHAS3, Distances

{| class="wikitable"
! Model !! Distances > 8.0 Å
|-
| CHAS3 || Pos 1: 28.357 Å<br />Pos 91: 8.810 Å<br />Pos 101: 17.314 Å<br />Pos 112: 25.386 Å<br />Pos 160: 32.647 Å<br />Pos 318: 27.449 Å<br />Pos 333: 42.457 Å<br />Pos 538: 10.921 Å<br />Pos 601: 10.536 Å
|}

</figtable> <figtable id="tab:Modeller_scores_CHAS3_2"> Modeller scores Model CHAS3

{| class="wikitable"
! Sequence identity (%) !! Sequence length !! Compactness !! Native energy (pair) !! Native energy (surface) !! Native energy (combined) !! Z score (pair) !! Z score (surface) !! Z score (combined) !! GA341 score !! DOPE score
|-
| 40.326000 || 429 || 0.204903 || 122.562212 || 22.166245 || 5.785874 || -4.522039 || -1.645839 || -4.686339 || 0.995974 || -40807.12109
|}

</figtable>



Evaluation

TM-score

<figtable id="tab:TMscore_1R46"> TM-score

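{| class="wikitable"
! Model !! Number of residues in common !! RMSD of the common residues (Å) !! TM-score !! GDT-TS-score !! GDT-HA-score
|-
| Model 1 || 390 || 1.115 || 0.9841 || 0.9667 || 0.8558
|-
| Model 2 || 390 || 2.098 || 0.9596 || 0.9071 || 0.7635
|-
| Model 3 || 390 || 22.707 || 0.4087 || 0.2699 || 0.1814
|-
| MULTI 1 || 390 || 0.575 || 0.9938 || 0.9910 || 0.9128
|-
| MULTI 2 || 390 || 12.625 || 0.7364 || 0.6949 || 0.6404
|-
| MULTI 3 || 390 || 21.196 || 0.2048 || 0.0673 || 0.0314
|-
| MULTI 4 || 390 || 10.798 || 0.7405 || 0.6737 || 0.5833
|-
| CHAS || 390 || 1.115 || 0.9841 || 0.9667 || 0.8558
|-
| CHAS 2 || 390 || 15.292 || 0.4651 || 0.3622 || 0.3038
|-
| CHAS 3 || 390 || 15.292 || 0.4651 || 0.3622 || 0.3038
|}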

</figtable>

<figtable id="tab:TMscore_1R47"> TM-score

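{| class="wikitable"
! Model !! Number of residues in common !! RMSD of the common residues (Å) !! TM-score !! GDT-TS-score !! GDT-HA-score
|-
| Model 1 || 390 || 1.119 || 0.9840 || 0.9654 || 0.8519
|-
| Model 2 || 390 || 2.093 || 0.9600 || 0.9083 || 0.7647
|-
| Model 3 || 390 || 22.713 || 0.4092 || 0.2731 || 0.1821
|-
| MULTI 1 || 390 || 0.575 || 0.9938 || 0.9897 || 0.9115
|-
| MULTI 2 || 390 || 12.609 || 0.7363 || 0.6942 || 0.6378
|-
| MULTI 3 || 390 || 21.191 || 0.2058 || 0.0679 || 0.0314
|-
| MULTI 4 || 390 || 10.793 || 0.7405 || 0.6744 || 0.5846
|-
| CHAS || 390 || 1.119 || 0.9840 || 0.9654 || 0.8519
|-
| CHAS 2 || 390 || 15.290 || 0.4652 || 0.3635 || 0.3019
|-
| CHAS 3 || 390 || 15.290 || 0.4652 || 0.3635 || 0.3019
|}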

</figtable>

The TM-score was computed with the command line tool TMscore. It provides the RMSD (root-mean-square deviation), the TM-score (Template Modeling score), GDT-TS (Global Distance Test, total score) and GDT-HA (Global Distance Test, high accuracy). Of these metrics, the RMSD is considered the least accurate, because poor modelling of only a few regions can inflate it even if the model as a whole is quite good. <ref name="GDT-TS"> Wikipedia, Global distance test (March 2, 2012), http://en.wikipedia.org/wiki/Global_distance_test; May 28, 2012</ref> The TM-score is intended to be the most accurate of these measures, whereas GDT-TS and the more rigorous GDT-HA lie in between. <ref name="TM_1">Zhang Y and Skolnick J (2004). Scoring function for automated assessment of protein structure template quality. Proteins 57 (4): 702–710. doi:10.1002/prot.20264.</ref> <ref name="GDT-HA"> Read, Randy J.; Chavali, Gayatri (2007). Assessment of CASP7 predictions in the high accuracy template-based modeling category. Proteins 69 (S8): 27–37. doi:10.1002/prot.21662.</ref>
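The scores reported by TMscore can be illustrated for a single, fixed superposition. In this sketch the distance scale d0 follows Zhang & Skolnick (2004), and the GDT cutoffs are the usual 1/2/4/8 Å (TS) and 0.5/1/2/4 Å (HA). Unlike the TMscore program, which searches for the superposition that maximizes each score, it evaluates only the superposition implied by the given distances, so it yields a lower bound; the 0.5 Å floor for d0 is an assumption modelled on common implementations. The distances are toy values.

```python
import math


def tm_d0(l_target):
    """Distance scale d0 (Å) of Zhang & Skolnick (2004), with a lower bound of 0.5 Å."""
    if l_target <= 15:
        return 0.5
    return max(1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, l_target):
    """TM-score for ONE fixed superposition; distances are per aligned residue pair in Å."""
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


def gdt(distances, l_target, cutoffs):
    """Mean fraction of target residues within each cutoff (fixed superposition)."""
    fractions = [sum(1 for d in distances if d <= c) / l_target for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances, l_target):
    return gdt(distances, l_target, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances, l_target):
    return gdt(distances, l_target, (0.5, 1.0, 2.0, 4.0))


dists = [0.5, 1.5, 3.0, 7.0]
print(round(tm_score(dists, 4), 4), gdt_ts(dists, 4), gdt_ha(dists, 4))
print(round(tm_d0(390), 2))  # d0 for a target of about the size of 1R46
```

Because every residue contributes a bounded term between 0 and 1, a few badly placed residues lower the TM-score only slightly, whereas they can dominate the RMSD; this is the sensitivity difference described above.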
In our case, the metrics seem to correlate well with each other, as well as with the quality scores of Modeller. A few differences, however, stand out. First, Model 1 scores slightly worse than MULTI 1, which is also supported by the RMSD in the later evaluation. Subjectively, the model based on the two highest scoring structures also looked better, since in our opinion it combines the advantages of Model 1 and Model 2. The RMSD values themselves, however, are very inconsistent between TMscore, SAP and PyMOL; even the rank that each of these three tools assigns to the models differs.
Looking more closely at the TM-scores, we see that all models are assumed to have about the same fold as the target (score greater than 0.5), except for those built only from templates with a sequence identity below 30% (Model 3, MULTI 3) and those for which we manually adjusted the alignment (CHAS 2 and 3). The TM-score of MULTI 3 (≈ 0.2) is even in the range expected for randomly chosen unrelated proteins<ref name="TM_2">Zhang Y and Skolnick J (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 33 (7): 2302–2309. doi:10.1093/nar/gki524. PMC 1084323.</ref>. MULTI 1 is considered nearly perfect, since its TM-score is only slightly below 1.
The GDT scores and the TM-score rank the models identically, except for MULTI 2 and MULTI 4: MULTI 4 has a slightly greater TM-score but smaller GDT scores than MULTI 2 (see <xr id="tab:TMscore_1R46"/> and <xr id="tab:TMscore_1R47"/>). The two global distance tests agree on the ranking of all models.
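The claim that TM-score and GDT-TS disagree only on MULTI 2 and MULTI 4 can be checked by listing all discordant model pairs. A small sketch, using the values against 1R46 copied from the table above; tied pairs (e.g. Model 1 and CHAS) are not counted as discordant, and the function name is our own.

```python
from itertools import combinations

# TM-score and GDT-TS of each model against 1R46, copied from the table above.
TM = {"Model 1": 0.9841, "Model 2": 0.9596, "Model 3": 0.4087, "MULTI 1": 0.9938,
      "MULTI 2": 0.7364, "MULTI 3": 0.2048, "MULTI 4": 0.7405, "CHAS": 0.9841,
      "CHAS 2": 0.4651, "CHAS 3": 0.4651}
GDT_TS = {"Model 1": 0.9667, "Model 2": 0.9071, "Model 3": 0.2699, "MULTI 1": 0.9910,
          "MULTI 2": 0.6949, "MULTI 3": 0.0673, "MULTI 4": 0.6737, "CHAS": 0.9667,
          "CHAS 2": 0.3622, "CHAS 3": 0.3622}


def discordant_pairs(score_a, score_b):
    """Model pairs that the two scores rank in opposite order (ties are ignored)."""
    return [(m, n) for m, n in combinations(score_a, 2)
            if (score_a[m] - score_a[n]) * (score_b[m] - score_b[n]) < 0]


print(discordant_pairs(TM, GDT_TS))
```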
The values obtained against 1R46 and against 1R47 differ only slightly, usually in the third decimal place.
Based on the observed values, we would in future rely on the TM-score, since in literature it is considered the most accurate of the metrics and it coincides with our intuition.

RMSD with SAP and Pymol

<figure id="fig:RMSD_model1">

Depiction of the RMSD (green) of Model 1 (magenta) from Modeller and 1R46 (cyan)

</figure> <figure id="fig:RMSD_multi1">

Depiction of the RMSD (green) of MULTI 1 (magenta) from Modeller and 1R46 (cyan)

</figure>

Although the RMSD values calculated with SAP and the metrics computed with TMscore do not agree perfectly, both favour the model MULTI 1 and assign poor values to MULTI 3 (see <xr id="tab:SAP_1R46_mod"/> and <xr id="tab:SAP_1R47_mod"/>). Surprisingly, SAP assigns a very good RMSD to the models CHAS 2 and 3, whereas TMscore does not.

As we expected, the all-atom RMSD within a radius of 6 Å around the catalytic site is even lower than the un-weighted RMSD, which speaks for the quality of Modeller. The only unexpected result is that changing the alignment of the active site in CHAS 2 and 3 improves the RMSD around the catalytic centre, but only in comparison with the galactose-bound structure 1R47. This can be explained by the fact that one of the residues involved in the binding of β-D-galactose (Asp231, see <xr id="fig:GAL:1R47"/>) is also part of the active centre, and that the binding of the sugar changes the fold of the catalytic site.
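The catalytic-site RMSD restricts the calculation to atoms within 6 Å of the site. A minimal sketch, assuming superimposed lists of matched atom coordinates; how the site was defined for the values in the tables (which residues or atoms serve as the centre) is not specified here, so `centre` is a placeholder and the coordinates are toy values.

```python
import math


def atoms_within(reference, centre, radius):
    """Indices of reference atoms within `radius` Å of any atom in `centre`."""
    return [i for i, p in enumerate(reference)
            if any(math.dist(p, c) <= radius for c in centre)]


def local_rmsd(model, reference, centre, radius=6.0):
    """All-atom RMSD restricted to atoms near a site.

    Assumes `model` and `reference` are superimposed lists of matched atom coordinates.
    """
    idx = atoms_within(reference, centre, radius)
    if not idx:
        raise ValueError("no atoms within the given radius")
    squared = sum(math.dist(model[i], reference[i]) ** 2 for i in idx)
    return math.sqrt(squared / len(idx))


reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
model = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
site = [(0.0, 0.0, 0.0)]  # placeholder for the coordinates of the catalytic residues
print(local_rmsd(model, reference, site))
```

The distant, badly placed third atom is excluded by the radius, which is why a local RMSD can be much lower than the RMSD over the whole structure.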

All in all, we believe based on these evaluations, that the RMSD is not a reliable measure for model quality.

Please note that the unexpectedly high PyMOL values for certain models are due to many unrecognized residues such as EDO; applying the script repairPDB to the corresponding PDB file did not help. Also, we do not know why SAP did not calculate values for Model 1 and CHAS (which are identical) when compared to 1R47 (<xr id="tab:SAP_1R47_mod"/>).

<xr id="fig:RMSD_model1"/>, <xr id="fig:RMSD_multi1"/> and <xr id="fig:RMSD_model3"/> visualize the RMSD of the two best models and of one of the worst. They show where the problems of Model 3 are located: one region has very poor RMSD values, indicated by the long green sticks (<xr id="fig:RMSD_model3"/>). Again, our assumption that MULTI 1 is the best model is supported, since hardly any green lines are visible between the model and the target structure (<xr id="fig:RMSD_multi1"/>).

<figtable id="tab:SAP_1R46_mod"> RMSD of Modeller models compared to 1R46

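{| class="wikitable"
! Model !! Residues in common !! Weighted RMSD (Å) !! Un-weighted RMSD (Å) !! RMSD PyMOL (Å) !! RMSD around cat. site (Å)
|-
| Model 1 || 390 || 0.532 || 1.115 || 0.616 || 0.592
|-
| Model 2 || 390 || 0.571 || 1.574 || 0.740 || 0.596
|-
| Model 3 || 376 || 1.833 || 20.273 || 21.890 || 2.881
|-
| MULTI 1 || 390 || 0.396 || 0.575 || 0.515 || 0.425
|-
| MULTI 2 || 390 || 0.479 || 2.689 || 0.768 || 0.583
|-
| MULTI 3 || 385 || 11.003 || 17.580 || 21.486 || 6.613
|-
| MULTI 4 || 380 || 0.904 || 3.833 || 1.023 || 0.603
|-
| CHAS || 390 || 0.532 || 1.115 || 0.616 || 0.592
|-
| CHAS 2 || 378 || 0.613 || 1.492 || 13.318 || 0.856
|-
| CHAS 3 || 378 || 0.613 || 1.492 || 13.318 || 0.856
|}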

</figtable>

<figtable id="tab:SAP_1R47_mod"> RMSD of Modeller models compared to 1R47

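{| class="wikitable"
! Model !! Residues in common !! Weighted RMSD (Å) !! Un-weighted RMSD (Å) !! RMSD PyMOL (Å) !! RMSD around cat. site (Å)
|-
| Model 1 || 391 || nan || nan || 0.623 || 0.604
|-
| Model 2 || 391 || 0.717 || 1.569 || 0.731 || 0.511
|-
| Model 3 || 376 || 1.817 || 20.281 || 22.099 || 2.021
|-
| MULTI 1 || 390 || 0.396 || 0.575 || 0.513 || 0.379
|-
| MULTI 2 || 391 || 0.472 || 2.693 || 0.792 || 0.614
|-
| MULTI 3 || 383 || 9.297 || 17.430 || 21.484 || 7.646
|-
| MULTI 4 || 380 || 0.912 || 3.836 || 1.048 || 0.623
|-
| CHAS || 391 || nan || nan || 0.623 || 0.604
|-
| CHAS 2 || 378 || 0.618 || 1.498 || 13.338 || 0.541
|-
| CHAS 3 || 378 || 0.618 || 1.498 || 13.338 || 0.541
|}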

</figtable>

<figure id="fig:RMSD_model3">

Depiction of the RMSD (green) of Model 3 (magenta) from Modeller and 1R46 (cyan)

</figure>


DOPE score

<figure id="fig:DOPE_Model">

Per residue DOPE score comparison of 1R46 (green) with Model 1, 2 and 3 (red, orange and pink)

</figure> <figure id="fig:DOPE_MULTI">

Per residue DOPE score comparison of 1R46 (green) with MULTI 1-4 (red, orange, pink and purple)

</figure>

With another Modeller script, the per-residue DOPE score can be computed and then plotted. In all plots described below, the target structure 1R46 is shown in green, and the start of its PDB record, after the 31-residue signal peptide, is marked by the dashed vertical line. <xr id="fig:DOPE_Model"/> shows that the curves of Model 1 (red) and Model 2 (orange) both fit the green curve very well, with only small irregularities. Model 3, on the other hand, deviates from the DOPE profile of 1R46 in more regions than it follows it.
<xr id="fig:DOPE_MULTI"/> compares all models based on multiple templates. Unsurprisingly, MULTI 3 (pink) performs worst, since it is based on three templates with less than 30% sequence identity. MULTI 2 and 4 (orange and purple) are modelled very poorly in the first 190 residues but improve in the second half of the protein, suggesting that this is the easier part of the molecule to model.
Comparing the three CHAS models (<xr id="fig:DOPE_CHAS"/>), we again observe that CHAS 2 and 3 are identical and perform worse than CHAS, which is identical to Model 1. This deterioration, however, only appears after the first modification of the alignment at position 170.
The last comparison focuses on the two best models computed with Modeller (see <xr id="fig:DOPE_Best2"/>). Once more, this supports our assumption that both model the protein fold very well, each with strengths and weaknesses at certain positions. Importantly, the active site of the protein, again marked by vertical dashed lines, is modelled well.
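Instead of reading deviating regions off the plots, two per-residue profiles can also be compared numerically. The sketch below is hypothetical: it assumes a profile layout in which comment lines start with '#', the first column holds the residue number and the last column the energy (modelled on the plotting script of the Modeller tutorial), and that model and reference share the same residue numbering. The functions `read_profile` and `deviating_regions` and the threshold are illustrative, not values or tools we used.

```python
def read_profile(lines):
    """Parse per-residue energies into {residue number: energy}.

    Assumed layout: comment lines start with '#', the first column is the
    residue number and the last column is the energy.
    """
    profile = {}
    for line in lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 2:
            continue  # skip comments and blank lines
        profile[int(fields[0])] = float(fields[-1])
    return profile


def deviating_regions(reference, model, threshold):
    """Return (first, last) residue ranges where |model - reference| > threshold.

    Only residues present in both profiles are compared; a gap in the common
    numbering also ends a region.
    """
    regions, start, prev = [], None, None
    for res in sorted(set(reference) & set(model)):
        off = abs(model[res] - reference[res]) > threshold
        if off and start is not None and res == prev + 1:
            prev = res  # extend the current region
            continue
        if start is not None:
            regions.append((start, prev))  # close the current region
            start = None
        if off:
            start = prev = res  # open a new region
    if start is not None:
        regions.append((start, prev))
    return regions


# Toy profiles (made-up values, not our data).
ref = read_profile(["# residue  name  energy", "32 x -0.030", "33 x -0.031", "34 x -0.029", "35 x -0.030"])
mod = read_profile(["32 x -0.030", "33 x -0.010", "34 x -0.012", "35 x -0.030"])
print(deviating_regions(ref, mod, threshold=0.01))
```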

<figure id="fig:DOPE_CHAS">

Per residue DOPE score comparison of 1R46 (green) with CHAS 1, 2 and 3 (red, orange and pink)

</figure> <figure id="fig:DOPE_Best2">

Per residue DOPE score comparison of 1R46 (green) with the two subjectively best models Model 1 and MULTI 1 (red and orange)

</figure>


Swissmodel

Calculation of models

With SwissModel <ref name="SM1"> Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref> <ref name="SM2"> Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Research 31: 3381-3385.</ref> <ref name="SM3"> Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modelling. Electrophoresis 18: 2714-2723.</ref> we calculated only two basic models, since the sequence identity of all templates except 3HG3 and 1KTB is too low.

Model SM1

<figtable id="tab:SM_scores_Model1"> SwissModel quality scores Model SM1

{| class="wikitable"
! Model !! Template !! Modelled residue range !! Sequence identity (%) !! E-value !! QMEAN Z-score !! QMEAN4 score !! Cβ interaction !! All-atom interaction !! Solvation !! Torsion !! Final total energy (kJ/mol)
|-
| SM1 || 3HG3 || 32 - 426 || 99.747 || 0 || -0.826 || 0.718 || 0.33 || -0.05 || -0.59 || -0.57 || -49620.699
|}

</figtable>

<figtable id="tab:pics_SM_1R46_3HG3"> SM1, visual comparison

SM1 (red), created with SwissModel with the template 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)
SM1 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow)

</figtable>

<figure id="fig:SM1_ModelQuality">

Local Model Quality Estimation of model SM1: Depiction of ANOLEA, QMEAN and Gromos score.

</figure>

For this model we used the template 3HG3, as for Model 1. SwissModel only models residues 32 to 426, so it may incorporate knowledge about the signal peptide (residues 1 to 31). Therefore, there is no overhanging signal peptide as observed in all models built with Modeller (see visualisation in <xr id="tab:pics_SM_1R46_3HG3"/>). One problem SwissModel does encounter as well is the C-terminal end of the sequence, where the template's sequence is longer than the target's. Again, this results in a small protruding helix of about 9 amino acids.
The server itself provides some very helpful visualisations and evaluations of the model and its scores. <xr id="fig:SM1_ModelQuality"/> compares the scores ANOLEA, QMEAN and Gromos. The local QMEAN score <ref name="QMEAN"> Benkert P, Biasini M, Schwede T. (2011). Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics, 27(3):343-50. </ref> reflects the expected structural inaccuracy, Gromos <ref name="GROMOS">van Gunsteren, W. F., S. R. Billeter, et al. (1996). Biomolecular Simulations: The GROMOS96 Manual and User Guide. Zürich, VdF Hochschulverlag ETHZ. </ref> shows the energy of each amino acid in the protein chain, and ANOLEA reflects energy profiles derived from an atomic mean force potential (AMFP), in which high scores correlate with point errors and misalignments in the models.<ref name="ANOLEA"> Melo, F. and E. Feytmans (1998). Assessing protein structures with a non-local atomic interaction energy. J Mol Biol 277(5): 1141-1152.; May 30, 2012</ref>
Considering that ANOLEA and Gromos should be negative and the QMEAN score should tend towards 0, the three measures correlate quite well. Comparing ANOLEA and Gromos, which are both based on energy profiles, the Gromos score appears to take only a single amino acid into account: very often, especially in poorly modelled regions, the scores of adjacent residues are not even similar. ANOLEA, in contrast, uses a sliding-window approach, and its scores form a rather smooth curve. All in all, there are very few regions with bad scores, which indicates the good quality of our model.
<xr id="tab:pics_SM1_error"/> shows several error scores as well as comparisons to reference sets. Only one of the four scores contributing to QMEAN is better than the average Z-score of 0, but since the other three are not far below zero, the resulting QMEAN Z-score of -0.826 and the QMEAN4 score of 0.718, which estimates the model quality (see <xr id="tab:SM_scores_Model1"/>), indicate a reasonable model. The local error plot gives a mixed picture: many residues seem to be modelled quite well, with an estimated error of only 0.62 Å on average, but the deviations are high at important residues such as the active site (positions 170 and 231) and the substrate binding site (positions 203 to 206). Overall, however, the deviation does not seem to be very high, as shown by the fourth picture, in which the model is colored by residue error and almost the whole chain is dark blue, with only a few light blue and cyan parts. Both comparisons to the reference sets support the conclusion that the model is rather close to the native protein.

<figtable id="tab:pics_SM1_error"> SM1, error and comparison to reference set

Z-Score of the individual components of QMEAN. The average Z-Score of high-quality structures is 0.
Local Model reliability of SM1 with estimated per-residue inaccuracies along the sequence of SM1
SM1 (chain A) colored by estimated per-residue error from blue (more reliable regions) to red (potentially unreliable regions)
The QMEAN score of the Model SM1 is compared to scores of high-resolution reference structures
Comparing the model's QMEAN score to the density plot of the scores of the high-resolution reference set

</figtable>
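The Z-scores in this table, like the QMEAN Z-score in <xr id="tab:SM_scores_Model1"/>, relate a model's score to a reference set of high-resolution structures. In general form, a Z-score expresses the distance of the model's score S from the reference mean μ_ref in units of the reference standard deviation σ_ref:

<math>Z = \frac{S - \mu_\mathrm{ref}}{\sigma_\mathrm{ref}}</math>

A Z-score of 0 therefore corresponds to an average high-resolution structure, and the QMEAN Z-score of -0.826 of SM1 places it a little less than one standard deviation below that average.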


Model SM2

<figtable id="tab:SM_scores_Model2"> SwissModel quality scores Model SM2

{| class="wikitable"
! Model !! Template !! Modelled residue range !! Sequence identity (%) !! E-value !! QMEAN Z-score !! QMEAN4 score !! Cβ interaction !! All-atom interaction !! Solvation !! Torsion !! Final total energy (kJ/mol)
|-
| SM2 || 1KTB || 32 - 421 || 52.163 || 0 || -2.566 || 0.61 || -0.81 || -1.14 || -0.47 || -2.39 || -12303.048
|}

</figtable>

<figtable id="tab:pics_sm_1R46_1KTB"> Model 2, visual comparison

SM2 (red), created with SwissModel with the template 1KTB, superimposed on the x-ray structure of α-Galactosidase A (green)
SM2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 1KTB (yellow)

</figtable>

<figure id="fig:SM2_ModelQuality">

Local Model Quality Estimation of model SM2: Depiction of ANOLEA, QMEAN and Gromos score.

</figure>

The second model we created with SwissModel is based on the template structure 1KTB and should therefore be similar to Model 2. Again, the modelling starts at position 32, although 1KTB (UniProt ID Q90744) has no annotated signal peptide. This supports our assumption that SwissModel incorporates signal peptide information.

Although the SwissModel quality scores listed in <xr id="tab:SM_scores_Model2"/> are worse than those of SM1, the model does not look that bad at first glance. Looking at the similarity of the target (green), the model SM2 (red) and the template 1KTB (yellow) in <xr id="tab:pics_sm_1R46_1KTB"/>, only a few regions are modelled poorly. Because the 1KTB structure contains many helices, helices are overpredicted in the model.
Comparing the local model quality estimation of this model (see <xr id="fig:SM2_ModelQuality"/>) to that of SM1 (see <xr id="fig:SM1_ModelQuality"/>), we observe many more regions with poor estimated quality, e.g. positions 140 to 150. Here, the three measures do not agree perfectly: the regions that QMEAN estimates as bad seem to be shifted towards higher residue numbers compared to the ANOLEA score. Another example is position 370, where both ANOLEA and Gromos assign a poor score, while the QMEAN score tends towards 0.
The error plots and comparisons to reference sets in <xr id="tab:pics_SM2_error"/> also show that the model is not as good as SM1. The QMEAN Z-score is about 1.7 units lower (-2.566 vs. -0.826), the mean predicted local error is approximately 1.3 Å larger, and the per-residue error depiction of the whole chain shows a couple of parts that are even red, i.e. potentially unreliable. Compared to the reference sets, SM2 does not seem to be very good. However, the visual comparison and the evaluation scores (see below) do not fully support this conclusion drawn from the SwissModel quality scores.

<figtable id="tab:pics_SM2_error"> SM2, error and comparison to reference set

Z-Score of the individual components of QMEAN. The average Z-Score of high-quality structures is 0.
Local Model reliability of SM2 with estimated per-residue inaccuracies along the sequence of SM2
SM2 colored by estimated per-residue error ranging from blue (more reliable regions) to red (potentially unreliable regions)
The QMEAN score of the Model SM2 is compared to scores of high-resolution reference structures
Comparing the model's QMEAN score to the density plot of the scores of the high-resolution reference set

</figtable>


Evaluation

TM-score

<figtable id="tab:TMscore_1R46_sm"> TM-score Swissmodel 1R46

{| class="wikitable"
! Model !! Residues in common !! RMSD of common residues (Å) !! TM-score !! GDT-TS score !! GDT-HA score
|-
| SM1 || 390 || 0.512 || 0.9950 || 0.9917 || 0.9218
|-
| SM2 || 390 || 1.551 || 0.9660 || 0.9032 || 0.7538
|}

</figtable> <figtable id="tab:TMscore_1R47_sm"> TM-score Swissmodel 1R47

{| class="wikitable"
! Model !! Residues in common !! RMSD of common residues (Å) !! TM-score !! GDT-TS score !! GDT-HA score
|-
| SM1 || 390 || 0.515 || 0.9950 || 0.9923 || 0.9231
|-
| SM2 || 390 || 1.532 || 0.9667 || 0.9058 || 0.7545
|}

</figtable>

In this section, the RMSD, TM-score, GDT-TS and GDT-HA were computed with TMscore. In this case, all scores seem to be correlated without exception, but since we only have two models and hence four evaluations to compare, this conclusion may not be very meaningful. According to all scores, SM1 is the better model.
For SM2, despite the very good TM-score and the still good GDT-TS, the GDT-HA is very low compared to the other scores and to Modeller's Model 1. We cannot really explain this, because only two regions appear not to be perfectly modelled (see <xr id="fig:RMSD_sm2"/>).
Again, we cannot observe significant differences between the comparisons to 1R46 and 1R47.
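One possible contributor to the low GDT-HA of SM2 is that GDT-HA uses stricter distance cutoffs (0.5, 1, 2 and 4 Å) than GDT-TS (1, 2, 4 and 8 Å). Moderate deviations of 1 to 2 Å spread over the whole chain therefore lower GDT-HA even when no region looks badly modelled. The sketch below shows this for a hypothetical distance distribution; it evaluates one fixed superposition, whereas GDT as normally defined uses, for each cutoff, the superposition that maximises the number of residues within it.

```python
def gdt(distances, cutoffs):
    """Mean fraction of residues whose deviation is within each cutoff (A).

    Uses one fixed superposition, so real GDT values can be higher.
    """
    n = len(distances)
    return sum(sum(d <= c for d in distances) / n for c in cutoffs) / len(cutoffs)


GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)

# Hypothetical per-residue deviations for a 390-residue model: no region is
# badly wrong, but most residues are off by 0.7 to 1.5 A.
distances = [0.7] * 200 + [1.5] * 170 + [3.0] * 15 + [6.0] * 5

print(f"GDT-TS = {gdt(distances, GDT_TS_CUTOFFS):.3f}")
print(f"GDT-HA = {gdt(distances, GDT_HA_CUTOFFS):.3f}")
```

With these made-up deviations, GDT-TS is clearly higher than GDT-HA, because the many residues deviating by about 1.5 Å count under the 2, 4 and 8 Å cutoffs but not under the 0.5 and 1 Å cutoffs.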

RMSD with SAP and Pymol

<figure id="fig:RMSD_sm1">

Depiction of the RMSD (green) of SM1 (magenta) from SwissModel and 1R46 (cyan)

</figure>

All calculated RMSD values are very good, both compared to the apo structure and to the complex structure (see <xr id="tab:SAP_1R46_sm"/> and <xr id="tab:SAP_1R47_sm"/>).

<figtable id="tab:SAP_1R46_sm"> RMSD of SwissModel models compared to 1R46

{| class="wikitable"
! Model !! Residues in common !! Weighted RMSD (Å) !! Un-weighted RMSD (Å) !! RMSD PyMOL (Å) !! RMSD around cat. site (Å)
|-
| SM1 || 390 || 0.367 || 0.512 || 0.498 || 0.475
|-
| SM2 || 391 || 0.723 || 1.559 || 0.748 || 0.519
|}

</figtable> <figtable id="tab:SAP_1R47_sm"> RMSD of SwissModel models compared to 1R47

{| class="wikitable"
! Model !! Residues in common !! Weighted RMSD (Å) !! Un-weighted RMSD (Å) !! RMSD PyMOL (Å) !! RMSD around cat. site (Å)
|-
| SM1 || 391 || 0.385 || 0.599 || 0.492 || 0.447
|-
| SM2 || 391 || 0.710 || 1.539 || 0.757 || 0.493
|}

</figtable>

<figure id="fig:RMSD_sm2">

Depiction of the RMSD (green) of SM2 (magenta) from SwissModel and 1R46 (cyan)

</figure>


I-Tasser

<figtable id="tab:itasser_runs"> Overview of the I-Tasser runs

{| class="wikitable"
! Model names !! Homologous templates excluded (sequence identity) !! Top template used
|-
| Model1..5 || none || 3hg3A
|-
| 80%_cutoff_model{1..5} || > 80% || 1ktcA
|-
| 30%_cutoff_model{1..5} || > 30% || 3lrkA
|-
| 20%_cutoff_model{1..5} || > 20% || 3cc1A
|}

</figtable>

We also computed models with the I-Tasser server using different template exclusion settings. In total, one run had access to the whole template database (Model1..5), and in three runs homologous templates with a sequence identity greater than 80%, 30% and 20%, respectively, were excluded (see <xr id="tab:itasser_runs"/>).


Evaluation

The RMSD, TM-score and GDT as calculated by the TMscore and SAP tools are shown in <xr id="tab:itasser_tm"/> and <xr id="tab:itasser_sap"/>. All these scores seem to be correlated and, as expected, the models get worse the more homologous templates are excluded.

I-Tasser uses its own quality measure, the C-score, which "is typically in the range of [-5,2], where a C-score of higher value signifies a model with a high confidence and vice-versa."<ref name="itasser_cscore">http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S103581/cscore.txt, last accessed 2012-06-03</ref> According to the I-Tasser authors, the C-score is highly correlated with the TM-Score (correlation coefficient of 0.91)<ref name="itasser">Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010), http://www.ncbi.nlm.nih.gov/pubmed/20360767</ref> - a circumstance we could not observe in our results.
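Whether the C-score tracks the TM-score in our runs can be checked directly from the values in the I-Tasser TM-score table of this section. The sketch below computes a plain Pearson correlation over our 19 models; with so few models the result is only indicative and not directly comparable to the benchmark value reported by the I-Tasser authors.

```python
import math

# I-Tasser C-scores and TM-scores in the row order of the TM-score table:
# Model1-5, 80%_cutoff_model1-4, 30%_cutoff_model1-5, 20%_cutoff_model1-5.
c_scores = [0.05, -0.76, -1.85, -0.36, -5.0,
            -0.41, -0.73, -0.97, -0.86,
            -0.53, -1.84, -2.16, -2.43, -3.70,
            -0.19, -2.27, -1.29, -1.63, -1.19]
tm_scores = [0.9869, 0.9698, 0.9969, 0.9923, 0.9684,
             0.9652, 0.9676, 0.9666, 0.9679,
             0.7843, 0.7839, 0.7807, 0.8183, 0.7261,
             0.7817, 0.7736, 0.7661, 0.8033, 0.7701]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson(c_scores, tm_scores)
print(f"Pearson r between C-score and TM-score: {r:.2f}")
```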

Interestingly, the models calculated without templates with a sequence identity of more than 80% are quite good compared to those calculated without any template constraints, even though the best sequence identity of the templates used is only 48% vs. 92%.

<figtable id="tab:itasser_tm"> TM-score

{| class="wikitable"
! Model !! I-Tasser C-score !! Residues in common !! RMSD of common residues (Å) !! TM-score !! GDT-TS score !! GDT-HA score
|-
| Model1 || 0.05 || 390 || 0.864 || 0.9869 || 0.9718 || 0.8506
|-
| Model2 || -0.76 || 390 || 1.439 || 0.9698 || 0.9090 || 0.7385
|-
| Model3 || -1.85 || 390 || 0.400 || 0.9969 || 0.9968 || 0.9571
|-
| Model4 || -0.36 || 390 || 0.640 || 0.9923 || 0.9840 || 0.8724
|-
| Model5 || -5 || 390 || 1.432 || 0.9684 || 0.9077 || 0.7346
|-
| 80%_cutoff_model1 || -0.41 || 390 || 1.618 || 0.9652 || 0.8994 || 0.7212
|-
| 80%_cutoff_model2 || -0.73 || 390 || 1.554 || 0.9676 || 0.9103 || 0.7321
|-
| 80%_cutoff_model3 || -0.97 || 390 || 1.573 || 0.9666 || 0.9064 || 0.7506
|-
| 80%_cutoff_model4 || -0.86 || 390 || 1.539 || 0.9679 || 0.9109 || 0.7391
|-
| 30%_cutoff_model1 || -0.53 || 390 || 7.238 || 0.7843 || 0.5910 || 0.3962
|-
| 30%_cutoff_model2 || -1.84 || 390 || 9.750 || 0.7839 || 0.6519 || 0.4859
|-
| 30%_cutoff_model3 || -2.16 || 390 || 6.946 || 0.7807 || 0.5705 || 0.3731
|-
| 30%_cutoff_model4 || -2.43 || 390 || 6.756 || 0.8183 || 0.6372 || 0.4462
|-
| 30%_cutoff_model5 || -3.70 || 390 || 10.533 || 0.7261 || 0.6077 || 0.4628
|-
| 20%_cutoff_model1 || -0.19 || 390 || 7.068 || 0.7817 || 0.5769 || 0.3788
|-
| 20%_cutoff_model2 || -2.27 || 390 || 7.136 || 0.7736 || 0.5660 || 0.3699
|-
| 20%_cutoff_model3 || -1.29 || 390 || 7.498 || 0.7661 || 0.5699 || 0.3763
|-
| 20%_cutoff_model4 || -1.63 || 390 || 6.884 || 0.8033 || 0.6045 || 0.4071
|-
| 20%_cutoff_model5 || -1.19 || 390 || 7.441 || 0.7701 || 0.5609 || 0.3654
|}

</figtable>

<figtable id="tab:itasser_sap"> SAP calculated RMSD

{| class="wikitable"
! Model !! Residues in common !! Weighted RMSD (Å) !! Un-weighted RMSD (Å) !! RMSD PyMOL (Å) !! RMSD around cat. site (Å)
|-
| Model1 || 389 || 0.505 || 0.686 || 0.815 || 0.893
|-
| Model2 || 389 || 0.672 || 1.132 || 1.100 || 1.020
|-
| Model3 || 390 || 0.313 || 0.400 || 0.599 || 0.716
|-
| Model4 || 390 || 0.476 || 0.639 || 0.794 || 0.783
|-
| Model5 || 389 || 0.679 || 1.112 || 1.123 || 0.929
|-
| 80%_cutoff_model1 || 389 || 0.694 || 1.034 || 1.159 || 1.076
|-
| 80%_cutoff_model2 || 389 || 0.684 || 1.009 || 1.131 || 0.979
|-
| 80%_cutoff_model3 || 389 || 0.632 || 1.053 || 1.050 || 0.955
|-
| 80%_cutoff_model4 || 390 || 0.664 || 1.015 || 1.101 || 0.928
|-
| 30%_cutoff_model1 || 390 || 1.260 || 3.328 || 2.775 || 1.872
|-
| 30%_cutoff_model2 || 390 || 0.969 || 3.575 || 2.086 || 1.093
|-
| 30%_cutoff_model3 || 387 || 1.473 || 2.893 || 3.058 || 2.207
|-
| 30%_cutoff_model4 || 389 || 1.160 || 2.858 || 2.393 || 1.975
|-
| 30%_cutoff_model5 || 388 || 1.140 || 6.486 || 2.461 || 1.322
|-
| 20%_cutoff_model1 || 387 || 1.273 || 3.065 || 3.063 || 2.436
|-
| 20%_cutoff_model2 || 387 || 1.436 || 3.113 || 3.230 || 2.052
|-
| 20%_cutoff_model3 || 388 || 1.263 || 3.387 || 3.233 || 1.803
|-
| 20%_cutoff_model4 || 390 || 1.212 || 3.001 || 2.774 || 1.982
|-
| 20%_cutoff_model5 || 387 || 1.303 || 3.150 || 3.159 || 2.094
|}

</figtable>

3D-Jigsaw

For 3D-Jigsaw, we created three sets of five models each, based on templates from the given homology groups. <xr id="tab:3djigsaw_input_models"/> lists these sets in detail. For the low sequence identity group, there are actually two sets (a and b) that differ in only one model: the only Modeller model based on a template fulfilling the sequence identity constraint (Model 3) is rather bad, so set b uses I-Tasser's 30%_cutoff_model3 instead.

<figtable id="tab:3djigsaw_input_models"> Input Model sets for 3D-Jigsaw

{| class="wikitable"
! Model !! Residues in common !! RMSD of common residues (Å) !! TM-score !! GDT-TS score !! GDT-HA score
|-
| colspan="6" | '''> 80% sequence identity'''
|-
| Modeller/Model 1 || 390 || 1.115 || 0.9841 || 0.9667 || 0.8558
|-
| Modeller/MULTI 1 || 390 || 0.575 || 0.9938 || 0.9910 || 0.9128
|-
| Swissmodel/SM1 || 390 || 0.512 || 0.9950 || 0.9917 || 0.9218
|-
| I-Tasser/Model3 || 390 || 0.400 || 0.9969 || 0.9968 || 0.9571
|-
| I-Tasser/Model4 || 390 || 0.640 || 0.9923 || 0.9840 || 0.8724
|-
| colspan="6" | '''40% - 80% sequence identity'''
|-
| Modeller/Model 2 || 390 || 2.098 || 0.9596 || 0.9071 || 0.7635
|-
| Swissmodel/SM2 || 390 || 1.551 || 0.9660 || 0.9032 || 0.7538
|-
| I-Tasser/80%_cutoff_model4 || 390 || 1.539 || 0.9679 || 0.9109 || 0.7391
|-
| I-Tasser/80%_cutoff_model2 || 390 || 1.554 || 0.9676 || 0.9103 || 0.7321
|-
| I-Tasser/80%_cutoff_model3 || 390 || 1.573 || 0.9666 || 0.9064 || 0.7506
|-
| colspan="6" | '''< 30% sequence identity'''
|-
| a) Modeller/Model 3 || 390 || 22.707 || 0.4087 || 0.2699 || 0.1814
|-
| b) I-Tasser/30%_cutoff_model3 || 390 || 6.946 || 0.7807 || 0.5705 || 0.3731
|-
| I-Tasser/30%_cutoff_model4 || 390 || 6.756 || 0.8183 || 0.6372 || 0.4462
|-
| I-Tasser/30%_cutoff_model1 || 390 || 7.238 || 0.7843 || 0.5910 || 0.3962
|-
| I-Tasser/20%_cutoff_model4 || 390 || 6.884 || 0.8033 || 0.6045 || 0.4071
|-
| I-Tasser/20%_cutoff_model1 || 390 || 7.068 || 0.7817 || 0.5769 || 0.3788
|}

</figtable>

Evaluation

The quality measures of the resulting models are shown in <xr id="tab:3djigsaw_tm"/> and <xr id="tab:3djigsaw_sap"/>. We could not observe any improvement over the input models. However, it should be noted that the resulting models of the two low sequence identity sets are very similar, even though one input model of set a was rather bad (see section above).

<figtable id="tab:3djigsaw_tm"> TM-score

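{| class="wikitable"
! Model !! Residues in common !! RMSD of common residues (Å) !! TM-score !! GDT-TS score !! GDT-HA score
|-
| colspan="6" | '''> 80% sequence identity'''
|-
| Model 1 || 390 || 0.717 || 0.9907 || 0.9769 || 0.8724
|-
| Model 2 || 390 || 0.717 || 0.9907 || 0.9776 || 0.8724
|-
| Model 3 || 390 || 0.710 || 0.9906 || 0.9756 || 0.8590
|-
| Model 4 || 390 || 0.982 || 0.9818 || 0.9179 || 0.7481
|-
| Model 5 || 390 || 0.723 || 0.9904 || 0.9750 || 0.8673
|-
| colspan="6" | '''40% - 80% sequence identity'''
|-
| Model 1 || 390 || 1.572 || 0.9650 || 0.9006 || 0.7442
|-
| Model 2 || 390 || 2.505 || 0.9362 || 0.8506 || 0.6936
|-
| Model 3 || 390 || 2.196 || 0.9516 || 0.8724 || 0.7199
|-
| Model 4 || 390 || 2.151 || 0.9563 || 0.8885 || 0.7231
|-
| Model 5 || 390 || 1.572 || 0.9650 || 0.9019 || 0.7455
|-
| colspan="6" | '''< 30% sequence identity'''
|-
| colspan="6" | ''a) Modeller''
|-
| Model 1 || 390 || 6.925 || 0.7988 || 0.5974 || 0.3949
|-
| Model 2 || 390 || 6.924 || 0.7989 || 0.5974 || 0.3962
|-
| Model 3 || 390 || 6.925 || 0.7988 || 0.5974 || 0.3942
|-
| Model 4 || 390 || 6.989 || 0.7965 || 0.5904 || 0.3865
|-
| Model 5 || 390 || 6.925 || 0.7987 || 0.5981 || 0.3962
|-
| colspan="6" | ''b) I-Tasser''
|-
| Model 1 || 390 || 6.937 || 0.7982 || 0.5968 || 0.3929
|-
| Model 2 || 390 || 6.925 || 0.7988 || 0.5974 || 0.3955
|-
| Model 3 || 390 || 6.933 || 0.7982 || 0.5962 || 0.3942
|-
| Model 4 || 390 || 6.936 || 0.7985 || 0.5968 || 0.3942
|-
| Model 5 || 390 || 6.937 || 0.7985 || 0.5974 || 0.3936
|}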

</figtable>

<figtable id="tab:3djigsaw_sap"> SAP calculated RMSD

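{| class="wikitable"
! Model !! Residues in common !! Weighted RMSD (Å) !! Un-weighted RMSD (Å) !! RMSD PyMOL (Å) !! RMSD around cat. site (Å)
|-
| colspan="6" | '''> 80% sequence identity'''
|-
| Model 1 || 390 || 0.461 || 0.677 || 0.620 || 0.518
|-
| Model 2 || 390 || 0.463 || 0.677 || 0.621 || 0.539
|-
| Model 3 || 390 || 0.512 || 0.699 || 0.674 || 0.465
|-
| Model 4 || 390 || 0.887 || 0.982 || 1.050 || 0.880
|-
| Model 5 || 390 || 0.482 || 0.684 || 0.630 || 0.525
|-
| colspan="6" | '''40% - 80% sequence identity'''
|-
| Model 1 || 389 || 0.629 || 1.164 || 0.830 || 0.797
|-
| Model 2 || 388 || 0.865 || 1.756 || 1.113 || 0.769
|-
| Model 3 || 390 || 0.701 || 1.331 || 0.914 || 0.701
|-
| Model 4 || 389 || 0.696 || 1.101 || 0.949 || 0.701
|-
| Model 5 || 389 || 0.630 || 1.164 || 0.845 || 0.663
|-
| colspan="6" | '''< 30% sequence identity'''
|-
| colspan="6" | ''a) Modeller''
|-
| Model 1 || 387 || 1.315 || 3.056 || 2.894 || 2.096
|-
| Model 2 || 387 || 1.310 || 3.053 || 2.883 || 2.095
|-
| Model 3 || 387 || 1.314 || 3.058 || 2.852 || 1.923
|-
| Model 4 || 387 || 1.346 || 3.203 || 2.875 || 2.099
|-
| Model 5 || 387 || 1.316 || 3.057 || 2.877 || 2.101
|-
| colspan="6" | ''b) I-Tasser''
|-
| Model 1 || 387 || 1.318 || 3.009 || 2.876 || 1.577
|-
| Model 2 || 387 || 1.315 || 3.055 || 2.868 || 2.095
|-
| Model 3 || 387 || 1.314 || 3.067 || 2.804 || 1.947
|-
| Model 4 || 387 || 1.303 || 2.990 || 2.845 || 1.944
|-
| Model 5 || 387 || 1.299 || 2.983 || 2.832 || 1.545
|}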

</figtable>

Comparison

<figtable id="tab:RMSD"> Pairwise RMSD (Å) between Model1 (Modeller), SM1 (SwissModel) and Model3 (I-Tasser), and between Model2 (Modeller), SM2 (SwissModel) and 80%_cutoff_model4 (I-Tasser)

{| class="wikitable"
! Model1 - SM1 !! Model1 - Model3 !! SM1 - Model3 !! Model2 - SM2 !! Model2 - 80%_cutoff_model4 !! SM2 - 80%_cutoff_model4
|-
| 0.393 || 0.937 || 0.667 || 0.223 || 0.812 || 0.595
|}

</figtable>

<figure id="fig:comparison_1">

Alignment of Modeller's Model1 (red), SwissModel's SM1 (orange) and I-Tasser's Model3 (magenta)

</figure>

We considered Modeller's Model1 (red), SwissModel's SM1 (orange) and I-Tasser's Model3 (magenta) comparable, since each is the best model of its method, built with a high-identity template and without any further changes. All three are aligned in <xr id="fig:comparison_1"/>. SwissModel is the only method that (correctly) does not model the signal peptide, whereas Modeller simply models it as a loop (left) and I-Tasser assigns a helix to the N-terminal residues (right side of the picture).
The comparatively poor pairwise RMSD values of these three models (<xr id="tab:RMSD"/>) seem to stem mainly from the signal peptide and only to a small extent from slightly different modelling of some secondary structure elements.
The comparison of Modeller's Model2 (yellow), SwissModel's SM2 (blue) and I-Tasser's 80%_cutoff_model4 (cyan) reveals better agreement among the three methods, as all pairwise RMSD values are lower (see <xr id="tab:RMSD"/>). The visualisation in <xr id="fig:comparison_2"/> does not reveal a clear reason for this, other than slightly better agreement in the modelling of helices and sheets.

<figure id="fig:comparison_2">

Alignment of Modeller's Model2 (yellow), SwissModel's SM2 (blue) and I-Tasser's 80%_cutoff_model4 (cyan)

</figure>


References

<references/>