Homology modelling Gaucher Disease

The objective of this task was to apply homology modeling to predict the tertiary structure of glucosylceramidase from its sequence. To this end, we first selected different templates, which were then used to derive the structure of glucosylceramidase with three different homology modeling tools: Modeller, SWISS-MODEL, and the I-TASSER server. The resulting models were evaluated using both quality assessment scores and a comparison with the native crystal structure 1ogs. Technical details are reported in our protocol.

Template selection

We used HHsearch to search the PDB for homologous templates. <xr id="tab:templates"/> lists some of the top-ranking templates. 2nt0_A is identical to the target 1ogs_A and was therefore excluded. Although all remaining hits are homologous to the target (HHsearch probability > 97%), their sequence identity was below 30%. We therefore selected 2wnw_A (blue) as a close homolog, 2y24_A (green) as an intermediate homolog, and 3nco_A (yellow) as a more distant homolog. Note that the latter two templates do not cover the complete target, which makes the homology modeling process harder. <figtable id="tab:templates">

Hit no.	Template	Identity (%)	Query HMM range	Probability (%)
> 80% sequence identity
1	2nt0_A	100.0	1-497	100.0
40% - 80% sequence identity

< 30% sequence identity
2	2wnw_A	28.0	36-496	100.0
3	3clw_A	14.0	64-495	100.0
4	2y24_A	18.0	66-495	100.0
5	3kl0_A	19.0	65-495	100.0
6	3zr5_A	17.0	65-494	100.0
7	3ik2_A	14.0	65-495	99.2
22	3nco_A	11.0	113-384	97.7
28	1egz_A	12.0	113-387	97.4

Homologs found by HHsearch. Bold: selected templates used for the subsequent modeling. </figtable>

Modeller

Modeller is a popular tool for building models by satisfaction of spatial restraints, which are derived, for instance, from one or several target-template alignments. The alignments can come from any alignment or homology search tool, or they can be built by Modeller itself. We created models for our target protein by (1) using a single template and employing Modeller to compute the alignment, (2) using a single template and the alignment from HHsearch, and (3) using multiple templates. Model quality was assessed via the DOPE score and DOPE z-score reported by Modeller as well as the QMEAN6 score from the SWISS-MODEL workspace. We compared the resulting models to the crystal structure 1ogs_A via the weighted all-atom RMSD computed by SAP, as well as the TM-score, GDT_TS, and GDT_HA scores, which we computed with the program TMscore.

Single-template modeling using Modeller alignments

<xr id="tab:modeller-single-models"/> shows the resulting models. 2wnw_A produced the best-looking model: all major secondary structure elements coincided with the native structure 1ogs_A (red). Only the target range 1-31, which was not covered by the template (cf. <xr id="tab:templates"/>), resulted in some deviating loop regions (at the top right corner). Although 2y24_A is less conserved than 2wnw_A, the model came close to the native structure. 3nco_A shares the same TIM beta/alpha-barrel domain (the tube at the center) as 1ogs_A but lacks the glycosyl hydrolase domain (the sheets at the right side), so the model was less well structured in this region.

2wnw_A

2y24_A

3nco_A

DOPE score per residue.

Models built by Modeller using single templates and alignments computed by Modeller itself. Red: 1ogs_A. </figtable>

<xr id="tab:modeller-single-eval"/> shows the evaluation results. Both the quality assessment scores and the comparison to the native structure 1ogs_A were in line with the observations described above. The model derived from 2wnw_A had a lower energy (was more stable) than the models from 2y24_A and 3nco_A, and it matched 1ogs_A better. Except for the DOPE z-score, which we would have expected to be lower for 2y24_A, the assessment scores correlated well with the structure comparison scores. The per-residue DOPE score profiles (cf. <xr id="tab:modeller-single-models"/>) of all three models were correlated, and the scores were lowest for 2wnw_A.

Template	DOPE	DOPE z-score	QMEAN6	RMSD	TM-score	GDT_TS	GDT_HA

2wnw_A	-55925	-0.471	0.689	1.006	0.824	0.661	0.479
2y24_A	-47194	0.777	0.376	1.222	0.550	0.294	0.223
3nco_A	-44033	1.224	0.139	2.158	0.252	0.093	0.043

Evaluation of models built by Modeller using single templates and alignments computed by Modeller itself. </figtable>

Single-template modeling using HHsearch alignments

Modeller computes alignments by aligning the target sequence to the known structure of the template. Hence, no predicted features of the target sequence are used. In contrast, HHsearch computes an HMM-to-HMM alignment in which the target HMM comprises more features than the sequence alone: it also contains the secondary structure predicted by PSIPRED and information about the conservation of all residues derived from a sequence profile. HHsearch alignments are therefore thought to be more accurate than those produced by Modeller, which should lead to better models. We thus tried to improve the models by using HHsearch alignments.
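To hand an externally computed pairwise alignment to Modeller, it has to be written in Modeller's PIR alignment format. The following is a minimal sketch of such a conversion, not the script used in our protocol. The function names, the toy sequences, and the residue numbers are hypothetical; the template residue range must match the residues actually present in the alignment, and the header layout should be checked against the Modeller manual.

```python
def pir_entry(header_fields, code, aligned, width=75):
    """Return one PIR record: '>P1;code', a 10-field header line, wrapped sequence."""
    if len(header_fields) != 10:
        raise ValueError("a PIR header needs exactly 10 colon-separated fields")
    seq = aligned + "*"  # '*' terminates the sequence in PIR format
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">P1;{code}", ":".join(header_fields), *lines])


def pairwise_to_pir(target_code, target_aln, template_code, template_aln,
                    first_res, last_res, chain="A"):
    """Format an aligned target/template pair (gaps as '-') as PIR text."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    # Template: a structure entry with the residue range covered by the alignment.
    template = pir_entry(
        ["structureX", template_code, first_res, chain, last_res, chain,
         "", "", "", ""],
        template_code, template_aln)
    # Target: a sequence-only entry, all optional fields left empty.
    target = pir_entry(
        ["sequence", target_code, "", "", "", "", "", "", "", ""],
        target_code, target_aln)
    return template + "\n\n" + target + "\n"


# Toy example with made-up sequences and residue numbers (not the real alignment).
print(pairwise_to_pir("1ogs", "--ARPCIPKSF", "2wnw", "MKARP--PKSF",
                      first_res="1", last_res="9"))
```

The resulting text can be saved to a file and passed to Modeller as the alignment file, with the template code listed as the known structure and the target code as the sequence to be modeled.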

2wnw_A

2y24_A

3nco_A

Comparison of Modeller and HHsearch alignments. </figtable>

<xr id="tab:modeller-hhsearch-alis"/> depicts the HHsearch alignments compared to the Modeller alignments. The alignments produced by HHsearch were more compact, i.e. their gaps were concentrated at the beginning and at the end of the sequences, whereas gaps were spread more evenly across the alignments computed by Modeller. Comparing the resulting models of <xr id="tab:modeller-hhsearch-models"/> with those of <xr id="tab:modeller-single-models"/>, we found that the core regions matched the native structure better when HHsearch alignments were used. However, Modeller could not find the correct topology for the ends of the target sequence that were not covered by the template. These regions simply became unfolded threads stretching out into space. Surprisingly, the DOPE score per residue was relatively low for the unfolded regions.

2wnw_A

2y24_A

3nco_A

DOPE score per residue.

Models built by Modeller using HHsearch alignments covering the complete target. Red: 1ogs_A. </figtable>

The DOPE score of the models of <xr id="tab:modeller-hhsearch-models"/> was higher (worse) for 2wnw_A and 3nco_A and nearly unchanged for 2y24_A compared to the models of <xr id="tab:modeller-single-models"/>, but all three models scored better according to QMEAN6. The RMSD increased due to the unfolded ends of the target sequence. However, the TM-score and the GDT scores improved since the actual core of the target came closer to the native structure.

Template	DOPE	DOPE z-score	QMEAN6	RMSD	TM-score	GDT_TS	GDT_HA

2wnw_A	-54695	-0.295	0.726	1.079	0.869	0.732	0.538
2y24_A	-47256	0.765	0.566	1.640	0.722	0.553	0.386
3nco_A	-32577	2.857	0.272	9.757	0.398	0.235	0.131

Evaluation of models built by Modeller using HHsearch alignments covering the complete target. </figtable>

Single-template modeling using local HHsearch alignments

Since the ends of the sequences could not be modeled correctly by Modeller and might affect the model evaluation, we repeated the modeling using only the regions covered by the template and omitting the ends mentioned above. This resulted in more compact models without the long unfolded threads (cf. <xr id="tab:modeller-hhsearch-local-models"/>).
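Restricting an alignment to the template-covered region amounts to cutting off the leading and trailing columns in which the template row has only gaps. The sketch below illustrates this on a pair of aligned strings; the function name and the toy sequences are hypothetical, and it is not the procedure from our protocol.

```python
def trim_to_template(target_aln, template_aln, gap="-"):
    """Drop leading/trailing alignment columns where the template has only gaps.

    Returns the trimmed target row, the trimmed template row, and the
    1-based (first, last) target residue numbers that were kept.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    covered = [i for i, c in enumerate(template_aln) if c != gap]
    if not covered:
        raise ValueError("template row contains only gaps")
    lo, hi = covered[0], covered[-1] + 1
    # Count target residues that are cut off before the first covered column.
    skipped = sum(1 for c in target_aln[:lo] if c != gap)
    kept = sum(1 for c in target_aln[lo:hi] if c != gap)
    return target_aln[lo:hi], template_aln[lo:hi], (skipped + 1, skipped + kept)


# Toy example with made-up sequences (not the real alignment).
target_row, template_row, kept_range = trim_to_template("MKTAYIAKQR", "---AYLAK--")
print(target_row, template_row, kept_range)
```

Gaps inside the covered region are left untouched, so only the uncovered ends of the target are removed.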

2wnw_A

2y24_A

3nco_A

DOPE score per residue.

Models built by Modeller using local HHsearch alignments restricted to the region covered by the template. Red: 1ogs_A. </figtable>

Although the DOPE score became worse since the target was truncated, the QMEAN6 score increased further for 2wnw_A and 3nco_A and stayed nearly constant for 2y24_A. The RMSD decreased, in particular for 3nco_A (from 9.757 to 1.827), whereas the TM-score and GDT scores remained nearly the same. This demonstrates the sensitivity of the RMSD to local deviations such as the unfolded ends, whereas the TM-score and GDT scores better reflect the actual fit of the core region.

Template	DOPE	DOPE z-score	QMEAN6	RMSD	TM-score	GDT_TS	GDT_HA

2wnw_A	-53593	-0.744	0.749	1.003	0.866	0.731	0.532
2y24_A	-46290	-0.030	0.564	1.162	0.724	0.563	0.381
3nco_A	-25047	0.886	0.451	1.827	0.402	0.245	0.138

Evaluation of models built by Modeller using local HHsearch alignments restricted to the region covered by the template. </figtable>
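The different behavior of the scores can be illustrated with a small calculation. The sketch below uses the standard definitions of the TM-score and the GDT scores on a list of per-residue distances after superposition. It is a simplification: the TMscore program additionally searches for the superposition that maximizes the score, and SAP reports a weighted RMSD, whereas a plain RMSD is used here. The distances are synthetic (a well-fitting core of 450 residues and 47 residues in unfolded ends) and the target length of 497 is an assumption taken from the query range of the identical hit 2nt0_A.

```python
import math

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in Angstrom
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def rmsd(distances):
    """Plain root-mean-square deviation over the compared residues."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for a fixed superposition, normalized by the target length."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt(distances, target_length, cutoffs):
    """Mean fraction of target residues within each distance cutoff."""
    fractions = [sum(d <= c for d in distances) / target_length for c in cutoffs]
    return sum(fractions) / len(fractions)


TARGET_LENGTH = 497        # assumption, see text above
core = [1.0] * 450         # synthetic: core residues 1 Angstrom from native
tails = [40.0] * 47        # synthetic: unfolded ends far from native

for label, dist in (("with unfolded ends", core + tails), ("ends removed", core)):
    print(f"{label}: RMSD={rmsd(dist):.2f} "
          f"TM-score={tm_score(dist, TARGET_LENGTH):.3f} "
          f"GDT_TS={gdt(dist, TARGET_LENGTH, GDT_TS_CUTOFFS):.3f}")
```

In this toy setting, removing the ends lowers the RMSD by an order of magnitude, while the TM-score changes only marginally and GDT_TS does not change at all, because far-away residues contribute almost nothing to either score.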

Multiple-template modeling using Modeller alignments

To test whether multiple templates could improve the model quality, we prepared three groups of templates: (1) 2wnw_A, 2y24_A, and 3kl0_A as close homologs, (2) 2wnw_A and 3nco_A as a close and a more distant homolog, and (3) 3ik2_A, 3nco_A, and 1egz_A as remote homologs. For each group, a multiple alignment was created with Modeller, and the target was added to it afterwards. The resulting alignment was used as input for Modeller.

2wnw_A-2y24_A-3kl0_A

2wnw_A-3nco_A

3ik2_A-3nco_A-1egz_A

DOPE score per residue.

Models built by Modeller using multiple templates. Red: 1ogs_A. </figtable>

<xr id="tab:modeller-mult-models"/> shows the resulting models. None of the template combinations led to a better model than using 2wnw_A alone (cf. <xr id="tab:modeller-single-models"/>). We assume that 2wnw_A is more similar to 1ogs_A than 2y24_A and 3kl0_A are. The latter two templates drew the model away from the true structure 1ogs_A and did not contain additional information that helped to build a better model. Surprisingly, using 2wnw_A in combination with 3nco_A resulted in a better model than combining it with the two templates of higher sequence identity (2y24_A and 3kl0_A). Taking advantage of more than one template improves the model quality in particular if the templates cover different regions, e.g. different domains, of the target sequence. However, this was not the case for our target since 2wnw_A already covered the largest fraction of it (cf. <xr id="tab:templates"/>).
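The coverage argument can be checked directly from the query ranges listed in <xr id="tab:templates"/>. The sketch below computes, for each template group, how many target residues are spanned by the group and how many of those are not already spanned by 2wnw_A. It treats each query range as fully covered and ignores gaps inside the alignments, which is a simplification; the helper name is hypothetical.

```python
# Query HMM ranges (first, last target residue) from the template table.
QUERY_RANGES = {
    "2wnw_A": (36, 496),
    "2y24_A": (66, 495),
    "3kl0_A": (65, 495),
    "3ik2_A": (65, 495),
    "3nco_A": (113, 384),
    "1egz_A": (113, 387),
}

GROUPS = [
    ("2wnw_A", "2y24_A", "3kl0_A"),
    ("2wnw_A", "3nco_A"),
    ("3ik2_A", "3nco_A", "1egz_A"),
]


def spanned_residues(templates):
    """Set of target residue numbers spanned by at least one template."""
    residues = set()
    for name in templates:
        first, last = QUERY_RANGES[name]
        residues.update(range(first, last + 1))
    return residues


reference = spanned_residues(["2wnw_A"])
for group in GROUPS:
    spanned = spanned_residues(group)
    extra = spanned - reference
    print(f"{'-'.join(group)}: {len(spanned)} residues spanned, "
          f"{len(extra)} not already spanned by 2wnw_A")
```

Since every other range lies within 36-496, none of the groups adds target residues beyond those spanned by 2wnw_A.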

Template	DOPE	DOPE z-score	QMEAN6	RMSD	TM-score	GDT_TS	GDT_HA

2wnw_A-2y24_A-3kl0_A	-39084	1.930	0.314	1.987	0.465	0.277	0.179
2wnw_A-3nco_A	-46962	0.807	0.556	1.256	0.778	0.593	0.409
3ik2_A-3nco_A-1egz_A	-25391	0.3881	0.101	6.607	0.241	0.062	0.029