Fabry:Homology based structure predictions

The following analyses were performed on the basis of the α-Galactosidase A sequence. Please consult the journal for the commands used to generate the results.

Dataset preparation and target comparison

Datasets

<figtable id="tab:datasetHHpred"> Dataset HHpred,
E-value cutoff 1e-15

pdb ID	E-value	Identity in %
> 80% sequence identity
3hg3	8.6e-90	100
40% - 80% sequence identity
1ktb	4.2e-85	53
< 30% sequence identity
3cc1	5.5e-74	25
1zy9	3.1e-48	13
3a24	7.8e-40	17
2xn2	5.3e-37	15
2d73	5.7e-36	14
3mi6	1.4e-31	15
2yfo	9.1e-30	13
2f2h	2.7e-20	17
2g3m	2.2e-20	16
3nsx	6e-20	13
3lpp	2.2e-18	15
3l4y	1.9e-18	15
3top	3.6e-18	12
2xvl	3.2e-18	16
2x2h	4.9e-16	13

</figtable>

<figtable id="tab:datasetHHpred"> Additional sequences HHpred,
E-value cutoff 0.002

pdb ID	E-value	Identity in %
3zss	0.00062	10
1j0h	0.0011	15
1ea9	0.00098	12

</figtable>

<figtable id="tab:datasetCOMA"> Dataset COMA,
E-value cutoff 0.002

pdb ID	E-value	Identity in %
> 80% sequence identity
-	-	-
40% - 80% sequence identity
1ktb	1.7e-61	52
< 30% sequence identity
3lrk	1.2e-66	23
3a21	2.7e-65	26
1szn	3.7e-59	22
3cc1	5.2e-58	19
1zy9	1.7e-39	9
3mi6	4.3e-38	11
2yfn	4.4e-35	10
2d73	1.9e-32	9
3a24	5.6e-30	10
1xsi	1.9e-12	10
2g3m	2.4e-11	10
3pha	2.9e-10	6
3lpo	4.7e-09	8
2x2h	8.2e-09	8
3mo4	1.2e-08	7
2xvg	2.4e-08	8
3ton	4.3e-08	8
2xib	1e-07	7
3eyp	1.6e-06	8
3k1d	3.5e-06	9
2zwy	8.8e-06	9
3gza	1.8e-05	8
3m07	2.3e-05	7
1eh9	0.00013	6
1gvi	0.00035	8
1aqh	0.00039	5
1mwo	0.00058	7
3vmn	0.0018	9
1bf2	0.0019	6
3aml	0.0019	8

</figtable>

We performed an HHpred as well as a COMA search to generate three distinct datasets. Since COMA did not find any homologous structures with a sequence identity above 80% (see <xr id="tab:datasetCOMA"/>), we used the dataset created with the HHpred search and the script described in the journal. With it we found one structure with a sequence identity above 80%, one with an identity between 40% and 80%, and 15 with an identity below 30%, of which 14 had an identity under 20% (see <xr id="tab:datasetHHpred" />). All HHpred matches had an E-value below 1e-15; for the COMA homologues we tried a less strict threshold of 0.002.
In most cases we used the structures 3hg3, 1ktb and 3cc1 for modelling, because they are either the only representatives of their class or, in the case of 3cc1, the sequence identity did not seem too low. For the model MULTI 3 we also used the structures 3a24 and 3zss. The latter has an E-value of 0.00062. We added this structure to examine how a template would perform whose E-value is worse than that of all our other structures but would still pass the threshold of a usual BLAST search (0.003).
It is important to mention that although the identity of 3hg3 is 100%, it is not the PDB structure annotated for the AGAL protein, but the structure of the substrate-bound state of the catalytic mechanism, hence the high identity.
1ktb is the X-ray structure of the already mentioned α-N-acetylgalactosaminidase from chicken, which might in future be used for enzyme replacement therapy in the treatment of Fabry Disease.
The last of the frequently used structures, 3cc1, is the X-ray structure of a putative α-N-acetylgalactosaminidase from Bacillus halodurans.
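The grouping of the HHpred hits into identity classes can be sketched as follows. This is a minimal illustration and not the script from the journal: the function name `bin_hits`, the tuple layout and the extra `unassigned` class (for identities that fall between the three classes used in the tables) are assumptions made for this example. The hits are copied from the HHpred table above.

```python
# Hypothetical sketch: group template hits by sequence identity class.
# Each hit is (pdb_id, e_value, identity_percent), copied from the HHpred table.
HHPRED_HITS = [
    ("3hg3", 8.6e-90, 100), ("1ktb", 4.2e-85, 53), ("3cc1", 5.5e-74, 25),
    ("1zy9", 3.1e-48, 13), ("3a24", 7.8e-40, 17), ("2xn2", 5.3e-37, 15),
    ("2d73", 5.7e-36, 14), ("3mi6", 1.4e-31, 15), ("2yfo", 9.1e-30, 13),
    ("2f2h", 2.7e-20, 17), ("2g3m", 2.2e-20, 16), ("3nsx", 6e-20, 13),
    ("3lpp", 2.2e-18, 15), ("3l4y", 1.9e-18, 15), ("3top", 3.6e-18, 12),
    ("2xvl", 3.2e-18, 16), ("2x2h", 4.9e-16, 13),
]


def bin_hits(hits, e_value_cutoff=1e-15):
    """Return a dict mapping identity class to the PDB IDs in that class."""
    bins = {"> 80%": [], "40% - 80%": [], "< 30%": [], "unassigned": []}
    for pdb_id, e_value, identity in hits:
        if e_value >= e_value_cutoff:
            continue  # hit does not pass the E-value cutoff
        if identity > 80:
            bins["> 80%"].append(pdb_id)
        elif 40 <= identity <= 80:
            bins["40% - 80%"].append(pdb_id)
        elif identity < 30:
            bins["< 30%"].append(pdb_id)
        else:
            bins["unassigned"].append(pdb_id)  # 30% to 40%, not used in the tables
    return bins


bins = bin_hits(HHPRED_HITS)
for label, ids in bins.items():
    print(label, len(ids), ids)
```

With the table data this reproduces the class sizes stated in the text (one, one and 15 structures).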

Target comparison

As an initial step of the evaluation, we compared the apo structure 1R46 with the complex structure 1R47 (with bound α-galactose). The alignment of the A chains of 1R46 and 1R47 in PyMOL (see <xr id="tab:compare"/>) gave an RMSD of 0.248 Å, and the positions and orientations of the residues involved in binding the sugar (see <xr id="fig:GAL:1R47"/>) do not differ significantly. We therefore used only the 1R46 structure for visualisation, but computed all values and statistics for both structures.
In the right figure in <xr id="tab:compare"/>, the residues Asp92A, Asp93A, Lys168A, Arg227A and Asp231A are shown in stick representation (thicker); they are responsible for binding the sugar in the complex structure, which is shown in magenta. There is hardly any difference between 1R46 and 1R47 in this region.
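For reference, the RMSD reported for the comparison is the root of the mean squared distance between matched atoms. The following sketch computes it for two coordinate lists that are assumed to be already superimposed and matched atom by atom. It does not reproduce the superposition and outlier rejection that PyMOL performs during alignment, and the function name `rmsd` is chosen for this example only.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom is shifted by exactly 1 Å along z.
example = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
print(example)
```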

<figtable id="tab:compare"> Comparison of apo and complex structure

Superimposed structures of 1R46 (blue) and 1R47 (green) in cartoon representation. The structures do not differ much.

Comparison of the residues involved in the binding of α-galactose in the apo structure (blue) and the complex structure (green)

</figtable>

<figure id="fig:GAL:1R47"> Residues involved in the binding of α-galactose in 1R47 [1]

</figure>

Modeller

With the command line tool, we created 10 models (see Journal). The first three were produced with the standard settings and workflow of Modeller. The subsequent four models were computed from multiple templates in different combinations, and for the last three models we rearranged the alignment files in order to test the quality of the alignment and the influence of the two types of alignment.

Default settings

Model 1

<figtable id="tab:pics_1R46_3HG3"> Model 1, visual comparison

Model 1 (red), created with Modeller with the template 3HG3, superimposed on the x-ray structure of α-Galactosidase A (green)

Model 1 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow)

</figtable>

For the first model we used the template with the highest sequence identity. According to HHpred the identity is 100%, whereas Modeller calculates an identity of only 96% (see <xr id="tab:Modeller_scores_3hg3_2"/>). This discrepancy might be due to the way the comparison is made: 1R46 is completely contained in 3HG3, but 3HG3 has a longer sequence (404 residues), so only 96% of it can be congruent with 1R46 (398 residues without signal peptide). The left picture in <xr id="tab:pics_1R46_3HG3"/> shows the superimposition of the computed Model 1 and the actual target structure. The right picture additionally displays the template structure. The three structures superimpose almost perfectly, which is supported by the scores derived from Modeller (see <xr id="tab:Modeller_scores_3hg3_2"/>). The GA341 score of 1.0 indicates a "native like" model (see basic tutorial), and the Compactness <ref name="Compactness"> Foldit Wiki, Compactness (October 23, 2011), http://foldit.wikia.com/wiki/Compactness; May 26, 2012</ref> and the DOPE score (see basic tutorial) are the second highest and the lowest of all calculated models, respectively.
The only parts that cannot be modelled correctly are the two ends of the sequence, which are highlighted in blue in the pictures. From our background knowledge we know that the first 31 residues form the signal peptide, which is cleaved off and therefore cannot be found in the tertiary structure of the target protein. Modeller cannot account for this, so it would be a useful extension of the modelling pipeline to add sequence-based analyses such as signal peptide prediction, similar to the predictions we made in Task 2. The failure to model the last part of the sequence can be attributed to the longer sequence of the 3HG3 structure: the last 6 residues stick out, and the template is 6 amino acids longer than the target.
Inspecting the problematic residues with a distance of more than 8 Å (see <xr id="tab:Modeller_scores_3hg3_1"/>) manually in PyMOL, we found that two of them lie in loop regions (91 and 101), which are hard to model. Two other residues, however, are located in a helix (160 and 318) and seem to fit the target perfectly.
For further evaluation of the model, please see Modeller Evaluation
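The suggested pipeline extension, removing the signal peptide before modelling, could look roughly like the sketch below. It assumes the cleavage position is already known (31 residues, as stated above) instead of being predicted. The function name `strip_signal_peptide` and the placeholder sequence are hypothetical and only serve to check the lengths given in the text.

```python
def strip_signal_peptide(sequence, signal_length=31):
    """Return the mature sequence without the N-terminal signal peptide.

    signal_length is assumed to be known; in a real pipeline it would come
    from a signal peptide prediction step.
    """
    if signal_length < 0 or signal_length > len(sequence):
        raise ValueError("signal_length must lie within the sequence")
    return sequence[signal_length:]


# Placeholder of the full-length target (429 residues), not the real sequence.
full_length = "A" * 429
mature = strip_signal_peptide(full_length)
print(len(mature))  # 429 - 31 residues remain
```

The remaining 398 residues match the length of 1R46 without signal peptide given in the text.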

<figtable id="tab:Modeller_scores_3hg3_1"> Modeller scores Model 3hg3, Distances

Model	Template	Distances > 8.0 Å in 2d alignment	Distances > 8.0 Å
Model 1	3hg3	Pos: 428 Dist 76.568	Pos: 1 Dist 28.357	Pos: 91 Dist 8.810	Pos: 101 Dist 17.314	Pos: 112 Dist 25.386	Pos: 160 Dist 32.647	Pos: 318 Dist 27.449	Pos: 333 Dist 42.457

</figtable> <figtable id="tab:Modeller_scores_3hg3_2"> Modeller scores Model 3hg3

% sequID	Sequ length	Compact- ness	Native energy (pair)	Native energy (surface)	Native energy (combined)	Z score (pair)	Z score (surface)	Z score (combined)	GA341 score	DOPE score
95.570999	429	0.215183	-213.518650	-9.487873	-5.603112	-10.125743	-6.381974	-11.484159	1.000000	-52607.89844

</figtable>

Model 2

<figtable id="tab:pics_1R46_1KTB"> Model 2, visual comparison

Model 2 (red), created with Modeller with the template 1ktb, superimposed on the x-ray structure of α-Galactosidase A (green)

Model 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 1ktb (yellow)

</figtable>

For this model we used the template 1KTB, which has a sequence identity of 53%. The superimposed structures of 1R46 and Model 2 are shown in <xr id="tab:pics_1R46_1KTB"/>, together with the structural alignment of 1KTB with the model and the target. This model shows only one of the problems of Model 1, namely the unmodelled signal peptide, for the reasons mentioned above. The end of the sequence seems to be modelled well, although the template sequence (405 residues) is even longer than that of 3HG3. Despite the slightly worse Compactness and DOPE score (see <xr id="tab:Modeller_scores_1ktb_2"/>), Model 2 appears to be a good model: no residues have a distance greater than 8 Å (see <xr id="tab:Modeller_scores_1ktb_1"/>), and the GA341 score again marks the model as "native like". A GA341 value greater than 0.7 generally indicates a reliable model, defined as ≥ 95% probability of correct fold <ref name="Melo2002"> Melo F, Sánchez R, Sali A. (2002). Statistical potentials for fold assessment. Protein Sci. 2002 Feb;11(2):430-48. PMCID: PMC2373452</ref>.
For further evaluation of the model, please see Modeller Evaluation

<figtable id="tab:Modeller_scores_1ktb_1"> Modeller scores Model 1ktb, Distances

Model	Template	Distances > 8.0 Å in 2d alignment	Distances > 8.0 Å
Model 2	1ktb	Pos: 0 Dist 0	Pos: 0 Dist 0

</figtable>

<figtable id="tab:Modeller_scores_1ktb_2"> Modeller scores Model 1ktb

% sequID	Sequ length	Compact- ness	Native energy (pair)	Native energy (surface)	Native energy (combined)	Z score (pair)	Z score (surface)	Z score (combined)	GA341 score	DOPE score
53.351002	429	0.176840	-107.285679	-7.043755	-3.262988	-8.593054	-6.151719	-10.076556	1.000000	-49267.35156

</figtable>

Model 3

<figtable id="tab:pics_1R46_3CC1"> Model 3, visual comparison

Model 3 (red), created with Modeller with the template 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)

Model 3 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3CC1 (yellow)

</figtable>

For the last of the basic models we used the structure 3CC1 as a template, which is only distantly related, with a sequence identity of roughly 25%. Here the problems are obvious. Again the signal peptide cannot be modelled, but the real problem is that Model 3 is tilted by approximately 90° compared to the target structure 1R46.
The quality scores also show that the model must be rejected. The GA341 score of 0.33 indicates that the model should be discarded, since any good model gives a score near 1.0, and according to Salilab any model with a score below 0.6 should not be considered helpful <ref name="GA341"> Salilab - Modeller Usage, modeller ga341 score (February 21, 2006), http://salilab.org/archives/modeller_usage/2006/msg00060.html; May 26, 2012</ref>. As expected, the z-scores are poor as well, because these statistical potentials contribute to the GA341 score (see Modeller help).
For further evaluation of the model, please see Modeller Evaluation
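The GA341 thresholds quoted for Models 2 and 3 can be applied mechanically, as in the sketch below. The scores are copied from the Modeller score tables above. The function name `classify_ga341` and the label "borderline" for values between the two quoted thresholds are assumptions of this example and are not part of Modeller.

```python
# GA341 scores copied from the Modeller score tables of Models 1 to 3.
GA341_SCORES = {"Model 1": 1.000000, "Model 2": 1.000000, "Model 3": 0.332343}


def classify_ga341(score, reliable_above=0.7, discard_below=0.6):
    """Label a model by its GA341 score using the thresholds quoted in the text."""
    if score > reliable_above:
        return "reliable"
    if score < discard_below:
        return "discard"
    return "borderline"  # assumed label for the range between the thresholds


labels = {name: classify_ga341(score) for name, score in GA341_SCORES.items()}
for name, label in labels.items():
    print(name, label)
```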

<figtable id="tab:Modeller_scores_3cc1_1"> Modeller scores Model 3cc1, Distances

Model	Template	Distances > 8.0 Å in 2d alignment	Distances > 8.0 Å
Model 3	3cc1	Pos: 433 Dist 63.967	Pos: 147 Dist 25.085	Pos: 290 Dist 19.238	Pos: 374 Dist 24.356	Pos: 395 Dist 15.007	Pos: 412 Dist 61.733	Pos: 452 Dist 23.680	Pos: 631 Dist 23.283	Pos: 659 Dist 8.421	Pos: 684 Dist 10.763	Pos: 703 Dist 10.204	Pos: 762 Dist 10.753

</figtable> <figtable id="tab:Modeller_scores_3cc1_2"> Modeller scores Model 3cc1

% sequID	Sequ length	Compact- ness	Native energy (pair)	Native energy (surface)	Native energy (combined)	Z score (pair)	Z score (surface)	Z score (combined)	GA341 score	DOPE score
24.242001	429	0.139850	198.134571	24.857669	7.885459	-3.800148	-1.528096	-3.572654	0.332343	-38190.22656

</figtable>

Multiple templates

MULTI 1

<figtable id="tab:pics_1R46_multi1"> Model MULTI 1, visual comparison

Model MULTI 1 (red) (templates 3HG3 and 1KTB), superimposed on the x-ray structure of α-Galactosidase A (green)

Model MULTI 1 (red) superimposed on α-Galactosidase A (green) and the structure of 3HG3 (yellow) and 1KTB (orange)

</figtable>

MULTI 1, which is based on the structures 3HG3 and 1KTB, is subjectively the second best of the models computed with Modeller. The superimposition of the model and the target structure (see <xr id="tab:pics_1R46_multi1"/>) shows a result similar to Model 1, but comparing the quality scores we observe a slightly worse DOPE score and a very low Compactness value. According to Salilab, the DOPE score is the score to use for distinguishing between two good models <ref name="GA341"> Salilab - Modeller Usage, modeller ga341 score (February 21, 2006), http://salilab.org/archives/modeller_usage/2006/msg00060.html; May 26, 2012</ref>. This model has the second lowest DOPE score of all models, as well as an almost 100% sequence identity to the target.
The Compactness, on the other hand, is very low, even lower than the value of Model 3. We therefore think that this quality score, in contrast to the DOPE score, cannot be considered reliable.
This model also shows that the more distantly related 1KTB structure (orange) seems to fit better than the very closely related 3HG3 (yellow), as can be seen in the right picture of <xr id="tab:pics_1R46_multi1"/>.
For further evaluation of the model, please see Modeller Evaluation
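Since a lower DOPE score indicates a better model, the comparison made here amounts to sorting the models by that score. The sketch below uses the DOPE values from the score tables of Models 1 to 3 and MULTI 1 only; the helper name `rank_by_dope` is chosen for this example.

```python
# DOPE scores copied from the Modeller score tables above (lower is better).
DOPE_SCORES = {
    "Model 1": -52607.89844,
    "Model 2": -49267.35156,
    "Model 3": -38190.22656,
    "MULTI 1": -51741.65625,
}


def rank_by_dope(scores):
    """Return the model names sorted from lowest (best) to highest DOPE score."""
    return sorted(scores, key=scores.get)


ranking = rank_by_dope(DOPE_SCORES)
print(ranking)
```

Among these four models, MULTI 1 comes second after Model 1, in line with the assessment above.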

<figtable id="tab:Modeller_scores_MULTI1_1"> Modeller scores Model MULTI1, Distances

Model	Templates	Distances > 6.0 Å
MULTI1	3HG3 1KTB	Pos 211 Dist 6.509	Pos 212 Dist 7.237	Pos 379 Dist 6.608	Pos 380 Dist 10.056	Pos 426 Dist 6.538	Pos 427 Dist 8.423	Pos 428 Dist 7.683	Pos 429 Dist 9.226

</figtable>

<figtable id="tab:Modeller_scores_MULTI1_2"> Modeller scores Model MULTI1

% sequID	Sequ length	Compact- ness	Native energy (pair)	Native energy (surface)	Native energy (combined)	Z score (pair)	Z score (surface)	Z score (combined)	GA341 score	DOPE score
99.747002	429	0.137662	-298.484703	-14.285927	-7.811892	-10.018935	-7.069398	-12.033570	1.000000	-51741.65625

</figtable>

MULTI 2

<figtable id="tab:pics_1R46_multi2"> Model MULTI 2, visual comparison

Model MULTI 2 (red), created with Modeller on basis of the templates 3HG3, 1KTB and 3CC1, superimposed on the x-ray structure of α-Galactosidase A (green)

Model MULTI 2 (red) superimposed on the x-ray structure of α-Galactosidase A (green) and the structure of 3HG3 (yellow), 1KTB (orange) and 3CC1 (lightorange)

</figtable>

MULTI 2 is based on MULTI 1 but uses the additional structure 3CC1, which is rather distantly related to 1R46. Hence we have one structure from each sequence identity group. With this addition we observe a decrease in model quality. In <xr id="tab:pics_1R46_multi2"/> one can see that the signal peptide is somewhat nested inside the molecule. The DOPE score increased, indicating a less reliable model, although the Compactness is the highest of all models (see <xr id="tab:Modeller_scores_MULTI2_2" />), which supports our statement above that this score is not really trustworthy.
Another sign of the poor quality of the model is the 556 residues with a distance greater than 6 Å, shared among the three template structures in the alignment (see <xr id="tab:Modeller_scores_MULTI2_1" />).
For further evaluation of the model, please see Modeller Evaluation
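Counting residues above a distance cutoff, as done for the distance tables, can be sketched as follows. The parser assumes entries of the form `Pos 216 Dist 6.509` (with or without a colon after `Pos`), as they appear in the tables on this page. The names `parse_distances` and `over_cutoff` are hypothetical, and the input string contains only the first entries of the MULTI 2 table.

```python
import re

# Matches entries such as "Pos 216 Dist 6.509" or "Pos: 91 Dist 8.810".
ENTRY = re.compile(r"Pos:?\s*(\d+)\s+Dist\s+(\d+(?:\.\d+)?)")


def parse_distances(text):
    """Extract (position, distance in Å) pairs from a distance table row."""
    return [(int(pos), float(dist)) for pos, dist in ENTRY.findall(text)]


def over_cutoff(entries, cutoff=6.0):
    """Keep only the entries whose distance exceeds the cutoff (in Å)."""
    return [entry for entry in entries if entry[1] > cutoff]


row = "Pos 216 Dist 6.509\tPos 217 Dist 7.237\tPos 386 Dist 6.608\tPos 387 Dist 10.056"
entries = parse_distances(row)
print(len(over_cutoff(entries, cutoff=6.0)))
print(over_cutoff(entries, cutoff=8.0))
```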

<figtable id="tab:Modeller_scores_MULTI2_1"> Modeller scores Model MULTI2, Distances

Model	Templates	Distances > 6.0 Å
MULTI2	3HG3 1KTB 3CC1	Pos 216 Dist 6.509	Pos 217 Dist 7.237	Pos 386 Dist 6.608	Pos 387 Dist 10.056	Pos 433 Dist 6.538	Pos 434 Dist 8.423	Pos 435 Dist 7.683	Pos 436 Dist 9.226	Pos 4 Dist 14.846	Pos 5 Dist 15.230	... 556 in total

</figtable>

<figtable id="tab:Modeller_scores_MULTI2_2"> Modeller scores Model MULTI2

% sequID	Sequ length	Compact- ness	Native energy (pair)	Native energy (surface)	Native energy (combined)	Z score (pair)	Z score (surface)	Z score (combined)	GA341 score	DOPE score
71.139000	429	0.229647	145.809979	8.120285	3.810054	-7.131847	-4.691505	-7.820324	1.000000	

</figtable>
