Difference between revisions of "Glucocerebrosidase homology modelling"
Revision as of 20:41, 11 June 2011
General
In this section, different homology modelling approaches are presented to predict the structure of glucocerebrosidase.
Homology Modelling
Homology modelling refers to predicting the structure of a protein based on the known three-dimensional structures of homologous proteins (referred to as 'templates'). It is based on two observations: the structure of a protein is uniquely determined by its amino acid sequence, and structure is better conserved during evolution than the corresponding sequence, so it changes more slowly.
The procedure can be divided into seven steps: <ref>Bourne P., Weissig H. (2003) Structural Bioinformatics. Wiley-Liss, Inc., Hoboken, New Jersey.</ref>
- Template recognition and initial alignment
- Alignment correction
- Backbone generation
- Loop modelling
- Side-chain modelling
- Model optimization
- Model validation
Template Structures
The different homology modelling approaches are carried out with the protein sequence of 1OGS (PDB) instead of the sequence of P04062 (UniProt), as the latter contains the signal peptide, which is not present in the mature protein and therefore does not need to be modelled. To retrieve homologous structures, HHsearch<ref>http://toolkit.tuebingen.mpg.de/hhpred</ref> was used to search the pdb70 database as of 26 May 2011. The 10 best results of this search are listed in the table below. Interestingly, apart from the human structure 2NT0 (99% identity), only homologous structures from bacteria were found.
The structures used as templates in the different modelling approaches are marked with an x. As no structures with a sequence identity between 40% and 99% were found, only templates with an identity below 40% could be used.
> 60% sequence identity | ||||
PDB-ID | name | organism | identity | template |
2nt0 | Glucosylceramidase | Homo sapiens | 99% | |
> 40% sequence identity | ||||
PDB-ID | name | organism | identity | template |
> 0% sequence identity | ||||
PDB-ID | name | organism | identity | template |
2wnw | SrfJ | Salmonella enterica subsp. enterica | 28% | x |
3clw | conserved exported protein | Bacteroides fragilis | 13% | |
3kl0 | Glucuronoxylan Xylanohydrolase | Bacillus subtilis | 19% | x |
1nof | xylanase | Erwinia chrysanthemi | 18% | |
1qw9 | Arabinosidase | Geobacillus stearothermophilus | 14% | |
3ii1 | Cellulase | Uncultured bacterium | 13% | |
1uhv | Beta-Xylosidase | Thermoanaerobacterium saccharolyticum | 13% | |
2e4t | Endoglucanase | Clostridium thermocellum | 11% | x |
2c7f | alpha-L-Arabinofuranosidase | Clostridium thermocellum | 16% | |
2WNW belongs to the same family as 1OGS, O-glycosyl hydrolase family 30, and also has glucosylceramidase activity. It should therefore be a good template despite the low sequence identity. 3KL0 is a member of O-glycosyl hydrolase family 30 as well, but does not have glucosylceramidase activity. 2E4T belongs to the glycoside hydrolase family.
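The identity values and bins used in the table can be illustrated with a small sketch. It assumes that identity is counted over alignment columns in which both sequences have a residue; HHsearch may use a different definition, and the function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are ignored (assumption, see lead-in)
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def identity_bin(identity: float) -> str:
    """Assign an identity value to the bins used in the template table."""
    if identity > 60:
        return "> 60%"
    if identity > 40:
        return "> 40%"
    return "> 0%"


if __name__ == "__main__":
    # Toy alignment, not taken from 1OGS or any template.
    value = percent_identity("ACD-EF", "ACE-EF")
    print(value, identity_bin(value))
```

With this binning, all templates marked with an x (28%, 19% and 11%) fall into the lowest bin.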
Evaluation Measurements
To evaluate the predicted models, two measurements are used in addition to the numeric evaluations given by the modelling programs themselves:
- Cα RMSD:
- The Cα root mean square deviation describes the average distance between the Cα backbone atoms of two superimposed structures and is therefore a good measure of how close the predicted structure is to the reference structure. DaliLite <ref>http://www.ebi.ac.uk/Tools/dalilite/index.html</ref>, a tool that performs a rigid-body superposition and calculates the Cα RMSD for two given PDB files, is used in this analysis.
- TM score:
- The Template Modeling Score is a measure of similarity between two protein structures that is more sensitive to the global fold than the RMSD, because it is normalised by protein length and weights small distances more strongly than large ones. The similarity of two structures is indicated by a score between zero and one, where one describes a perfect match. In this analysis, the TM score is calculated with TM-score<ref>http://zhanglab.ccmb.med.umich.edu/TM-score/</ref>, an online tool from the Zhang Lab of the University of Michigan.
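The two measures can be written down compactly. The following is a minimal sketch, not the implementation used by DaliLite or TM-score: it assumes that the Cα coordinates are already paired residue by residue and already superimposed, whereas the real tools determine the alignment and the superposition themselves (TM-score searches for the superposition that maximises the score). The function names are hypothetical, and the lower bound of 0.5 Å on d0 is a guard for very short chains.

```python
import math


def ca_rmsd(model, reference):
    """RMSD over paired Cα coordinates, each given as a list of (x, y, z) tuples."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(squared / len(model))


def tm_score(model, reference, target_length=None):
    """TM-score of one fixed superposition, normalised by the target length."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    length = target_length or len(reference)
    # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
                for p, q in zip(model, reference))
    return total / length


if __name__ == "__main__":
    # Toy data: 100 residues on a line, model shifted by 3 Å along x.
    reference = [(3.8 * i, 0.0, 0.0) for i in range(100)]
    model = [(x + 3.0, y, z) for x, y, z in reference]
    print(ca_rmsd(model, reference), tm_score(model, reference))
```

Because each residue contributes at most 1/L to the sum, a few badly placed residues lower the TM-score only slightly, while they can dominate the RMSD.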
MODELLER
MODELLER is a program for comparative protein structure modelling by satisfaction of spatial restraints. In the simplest case, the most probable structure for a given sequence is derived from its alignment with related structures. In addition to model building, MODELLER can perform several other tasks, including fold assignment, pairwise and multiple alignment of protein sequences, calculation of phylogenetic trees, and de novo modelling of loops in protein structures. The method was published by Sali and Blundell in 1993. <ref>A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993.</ref>
Usage
- Website with tutorials and download information: http://salilab.org/modeller/
- Description of the steps applied in this analysis: Detailed Workflow
Results
The results presented and discussed in this section were obtained according to this workflow. First, MODELLER was used to build models based on a single template structure. To do so, pairwise alignments of 1OGS with the different template structures (2WNW, 3KL0 and 2E4T) were created and used as input for MODELLER. Figure 1 shows the resulting models aligned to the structure of 1OGS, visualized with PyMOL. The results differ greatly: the models based on the templates 3KL0 and 2E4T deviate strongly from the reference structure, whereas the model based on 2WNW matches it well in large parts of the protein. These visual interpretations are examined and validated with different measurements in the Analysis section below.
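How a pairwise alignment is handed to MODELLER can be sketched as follows. This is not the script from the detailed workflow, only an illustration under stated assumptions: the helper pir_entry and the function build_models are hypothetical names, the template is assumed to be chain A of a PDB file in the working directory, and the class names environ and automodel are those of the classic MODELLER Python interface (newer releases also offer Environ and AutoModel).

```python
def pir_entry(code, aligned_seq, is_template, chain="A", width=75):
    """Format one record of a PIR alignment file as read by MODELLER."""
    if is_template:
        # Template: use all residues of the given chain of the PDB file <code>.
        header = f"structureX:{code}:FIRST:{chain}:LAST:{chain}::::"
    else:
        # Target: sequence only, no structure.
        header = f"sequence:{code}:::::::0.00: 0.00"
    seq = aligned_seq + "*"  # PIR sequences are terminated by '*'
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">P1;{code}\n{header}\n" + "\n".join(lines) + "\n"


def build_models(alnfile, template_codes, target_code, n_models=5):
    """Run MODELLER's automodel; requires a local MODELLER installation."""
    # Imported here so that pir_entry can be used without MODELLER.
    from modeller import environ
    from modeller.automodel import automodel, assess

    env = environ()
    env.io.atom_files_directory = ["."]  # assumed location of the template PDB files
    a = automodel(env, alnfile=alnfile, knowns=tuple(template_codes),
                  sequence=target_code, assess_methods=(assess.DOPE,))
    a.starting_model = 1
    a.ending_model = n_models
    a.make()
    return a.outputs


if __name__ == "__main__":
    # Toy alignment; the real input is the pairwise alignment of 1OGS and a template.
    with open("toy_alignment.pir", "w") as handle:
        handle.write(pir_entry("2wnw", "AC-DE", is_template=True))
        handle.write(pir_entry("1ogs", "ACWDE", is_template=False))
```

For a model based on several templates, the alignment file would contain one structureX record per template and build_models would be called with all template codes, for example ("2wnw", "2e4t").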
As MODELLER can also build models based on multiple sequence alignments, it was investigated whether this improves the structure prediction. The templates 2WNW and 2E4T were chosen for this analysis, as the models based on pairwise alignments with them seemed to be better (concerning secondary structure elements) than the model obtained with the template 3KL0. The result of the modelling procedure with MODELLER and this multiple sequence alignment is shown in Figure 2. The model consists mostly of loops and contains only few defined secondary structure elements: overall, 5 small helices and 2 small sheets. In this case, the multiple sequence alignment did not help to predict the structure of glucocerebrosidase: the results obtained with pairwise sequence alignments are considerably better. This may not be true in general.
Analysis
The results of MODELLER described in the section above are validated by calculating the corresponding TM-score and Cα RMSD. The resulting values are listed in the table below. As already suggested by the visual interpretation, the model built on template 2WNW is by far the best one: its TM-score of 0.81 is close to 1 and its RMSD is below 2 Å. The RMSD of the 3KL0 model is better than that of the 2E4T model, whereas the latter has a higher TM-score. This indicates that the secondary structure elements are better predicted in the model based on 2E4T, whereas the backbone of the model based on 3KL0 is more similar to that of 1OGS. This agrees with the observation in the previous section: comparing the secondary structure elements, the 2E4T model seems to fit 1OGS better than the 3KL0 model. The model obtained via the multiple sequence alignment has an RMSD of 5.0 Å, which is quite poor. Interestingly, its TM-score is better than that of the 3KL0 model, although the structure of the 3KL0 model seems to be much better.
Comparison to 1OGS | ||||
 | Template 2WNW | Template 3KL0 | Template 2E4T | Template 2E4T & 2WNW |
TM-Score | 0.8094 | 0.1886 | 0.2171 | 0.2322 |
Cα RMSD (Å) | 1.9 | 2.6 | 3.6 | 5.0 |
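The disagreement between the two measures becomes explicit when the models are ranked by each of them. The sketch below uses the values from the table above; the thresholds in fold_class (above 0.5 and below 0.17) are commonly cited rules of thumb for interpreting TM-scores and are an assumption here, not part of this analysis.

```python
# Values copied from the table above (comparison to 1OGS).
models = {
    "2WNW": {"tm": 0.8094, "rmsd": 1.9},
    "3KL0": {"tm": 0.1886, "rmsd": 2.6},
    "2E4T": {"tm": 0.2171, "rmsd": 3.6},
    "2E4T & 2WNW": {"tm": 0.2322, "rmsd": 5.0},
}


def fold_class(tm):
    """Rule-of-thumb interpretation of a TM-score (assumed thresholds)."""
    if tm > 0.5:
        return "same fold likely"
    if tm < 0.17:
        return "random-like"
    return "uncertain"


# Higher TM-score is better, lower RMSD is better.
by_tm = sorted(models, key=lambda name: models[name]["tm"], reverse=True)
by_rmsd = sorted(models, key=lambda name: models[name]["rmsd"])

if __name__ == "__main__":
    for name in by_tm:
        print(name, models[name], fold_class(models[name]["tm"]))
    print("ranking by RMSD:", by_rmsd)
```

Only the 2WNW model is ranked first by both measures; the order of the remaining three models is reversed between the two rankings.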
I-TASSER
I-TASSER (iterative threading assembly refinement) is a unified platform for automated protein structure and function prediction. The three-dimensional atomic model is generated in two steps: first, multiple threading alignments are created (possible template proteins are identified with LOMETS in the PDB library); second, iterative structural assembly simulations are performed. In addition to creating a structural model, I-TASSER assigns EC numbers, GO terms and binding sites by structurally matching the model to known proteins. The method was established by Zhang et al. in 2007. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>
Usage
- Webserver: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
- Description of the steps applied in this analysis: Detailed Workflow
Results
Analysis
SWISS-MODEL
The SWISS-MODEL workspace is a web-based protein structure homology modelling service, which was published by Arnold et al. in 2006. It offers three modelling modes: automated mode, alignment mode and project mode. The template structure database used by SWISS-MODEL, called SMTL, is derived from Protein Data Bank entries that are split into individual chains. <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22, 195-201.</ref>
Usage
- Webserver: http://swissmodel.expasy.org/
- Description of the steps applied in this analysis: Detailed Workflow
Results
The alignment mode worked for the templates 2E4T and 3KL0. With the standard output alignment of ClustalW2, however, the SWISS-MODEL work unit was aborted when trying to create the model with the template 2WNW: 'too many unfruitful attempts to rebuild a loop were tried.' This indicates that the alignment is not good and has to be adjusted. To modify the alignment, it was loaded into Jalview. An analysis of the alignment showed that the active site residues and the residues forming hydrogen bonds with the active site (listed in the table below) were already correctly aligned, so the alignment should be optimal in these regions. <ref>Kim et al., Crystal Structure of the Salmonella enterica Serovar Typhimurium Virulence Factor SrfJ, a Glycoside Hydrolase Family Enzyme. Journal of Bacteriology, 2009, p. 6550-6554, Vol. 191, No. 21</ref> SWISS-MODEL seemed to have a problem with one gap of length one. After removing this gap, SWISS-MODEL was able to perform the modelling process, although the alignment is worse than before (Arg87 of 2WNW is no longer aligned to Arg120 of 1OGS).
2WNW | 1OGS |
Residues forming the active site | |
Glu196 Glu294 | Glu235 Glu340 |
Residues forming hydrogen bonds with the active site | |
Arg87 Asp239 His268 | Arg120 Asp282 His311 |
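Whether the residue pairs from the table share an alignment column can also be checked programmatically instead of by eye. The sketch below is an illustration under assumptions: the function names are hypothetical, residue numbering is assumed to be contiguous from a given start number (no insertion codes or missing residues), and the example uses a toy alignment rather than the real ClustalW2 output.

```python
# Residue pairs (2WNW, 1OGS) from the table above.
ACTIVE_SITE_PAIRS = [(196, 235), (294, 340)]
HBOND_PAIRS = [(87, 120), (239, 282), (268, 311)]


def residue_columns(aligned_seq, first_residue_number=1):
    """Map residue number -> alignment column for one aligned sequence."""
    columns = {}
    number = first_residue_number
    for column, symbol in enumerate(aligned_seq):
        if symbol != "-":  # gaps do not consume a residue number
            columns[number] = column
            number += 1
    return columns


def check_pairs(template_aln, target_aln, pairs, template_start=1, target_start=1):
    """Return {pair: True/False}, True if both residues share one column."""
    template_cols = residue_columns(template_aln, template_start)
    target_cols = residue_columns(target_aln, target_start)
    result = {}
    for template_res, target_res in pairs:
        column = template_cols.get(template_res)
        result[(template_res, target_res)] = (
            column is not None and column == target_cols.get(target_res)
        )
    return result


if __name__ == "__main__":
    # Toy alignment with one gap of length one in the template.
    print(check_pairs("MK-RTE", "MKARSE", [(3, 4), (5, 6), (2, 3)]))
```

Running such a check before and after an edit of the alignment shows directly which pairs are lost when a gap is removed, as happened with Arg87 and Arg120.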
The results of the SWISS-MODEL modelling process in Alignment Mode are shown in Figure X. As the image shows, the model based on the template 2WNW seems to fit best, despite the modelling problems with the initial alignment. The other modelled structures adopt folds similar to the reference structure of 1OGS.
The Automated Mode of SWISS-MODEL was tested as well, although this mode is indicated to work only for templates with a high sequence identity. The modelling process for the template 3KL0 was aborted.
Figure Z shows the structures of the Alignment and the Automated Mode in comparison.
A structural model for glucocerebrosidase was also obtained with the fully automated mode (without specifying a template structure). In this case, SWISS-MODEL used the structure 2NT0 as template. As this structure is a self hit, the resulting model is almost identical to the reference structure.
Analysis
Comparison to 1OGS Chain A | |||||||
 | Alignment Mode | | | Automated Mode with specified template | | | Automated Mode without specified template |
 | 2WNW | 3KL0 | 2E4T | 2WNW | 3KL0 | 2E4T | 2NT0 |
TM-Score | 0.7766 | 0.4839 | 0.1860 | 0.8590 | - | 0.4296 | 0.9961 |
Cα RMSD (Å) | 2.3 | 2.5 | 3.6 | 2.1 | - | 3.5 | 0.5 |
References
<references />