Difference between revisions of "Glucocerebrosidase homology modelling"

Revision as of 15:13, 11 June 2011

General

In this section, different homology modelling approaches are presented to predict the structure of glucocerebrosidase.

Homology Modelling

Homology modelling refers to predicting the structure of a protein from the known three-dimensional structures of homologous proteins (referred to as 'templates'). It rests on two observations: the structure of a protein is uniquely determined by its amino acid sequence, and during evolution structure is better conserved and changes more slowly than the corresponding sequence.
It can be divided into seven steps: <ref>Bourne P., Weissig H. (2003) Structural Bioinformatics. Wiley-Liss, Inc., Hoboken, New Jersey.</ref>

Template recognition and initial alignment
Alignment correction
Backbone generation
Loop modelling
Side-chain modelling
Model optimization
Model validation

Template Structures

The different homology modelling approaches will be carried out with the protein sequence of 1OGS (PDB) instead of the sequence of P04062 (Uniprot), as the latter contains the signal peptide, which is not present in the mature protein and therefore does not need to be modelled. To retrieve homologous structures, HHSearch<ref>http://toolkit.tuebingen.mpg.de/hhpred</ref> was used to search the pdb70 database as of 26 May 2011. The 10 best results of this search are listed in the table below. Interestingly, apart from the human glucosylceramidase structure 2nt0 (99% identity), only homologous structures from bacteria were found. The structures used as templates in the different modelling approaches are marked with an x. As no structures with a sequence identity between 40% and 99% were found, only templates with an identity below 40% could be used.

> 60% sequence identity
PDB-ID	name	organism	identity	template
2nt0	Glucosylceramidase	Homo sapiens	99%
> 40% sequence identity
PDB-ID	name	organism	identity	template
> 0% sequence identity
PDB-ID	name	organism	identity	template
2wnw	SrfJ	Salmonella enterica subsp. enterica	28%	x
3clw	conserved exported protein	Bacteroides fragilis	13%
3kl0	Glucuronoxylan Xylanohydrolase	Bacillus subtilis	19%	x
1nof	xylanase	Erwinia chrysanthemi	18%
1qw9	Arabinosidase	Geobacillus stearothermophilus	14%
3ii1	Cellulase	Uncultured bacterium	13%
1uhv	Beta-Xylosidase	Thermoanaerobacterium saccharolyticum	13%
2e4t	Endoglucanase	Clostridium thermocellum	11%	x
2c7f	alpha-L-Arabinofuranosidase	Clostridium thermocellum	16%
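
The identity values and the three identity classes used in the table can be illustrated with a small sketch. It assumes a simple definition of identity (identical residues divided by aligned, gap-free columns of a pairwise alignment); HHSearch may compute its reported identity differently, and the function names `percent_identity` and `identity_bin` are made up for this example.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


def identity_bin(identity: float) -> str:
    """Assign an identity (in percent) to one of the classes used in the table."""
    if identity > 60:
        return "> 60%"
    if identity > 40:
        return "> 40%"
    return "> 0%"
```

With this definition, a template such as 2wnw (28%) falls into the "> 0%" class, while 2nt0 (99%) falls into the "> 60%" class.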

Evaluation Measurements

To evaluate the predicted models, two different measurements will be used in addition to the numeric evaluations given by the modelling programs themselves:

Cα RMSD:

The Cα root mean square deviation (RMSD) measures the average distance between corresponding Cα backbone atoms of two superimposed structures and is therefore a good measure of how close the predicted structure is to the reference structure. DaliLite <ref>http://www.ebi.ac.uk/Tools/dalilite/index.html</ref>, a tool that performs a rigid-body superposition and calculates the Cα RMSD for two given PDB files, is used in this analysis.
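
As an illustration of the measure itself (not of DaliLite's implementation), the following minimal sketch computes the Cα RMSD for two structures that are assumed to be already superimposed and to have a one-to-one residue correspondence. The superposition step that DaliLite performs is deliberately left out, and the function name `ca_rmsd` is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in Angstrom between two equally long lists of (x, y, z) C-alpha positions.

    Assumes the structures are already superimposed and that the i-th entry
    of each list belongs to the same aligned residue pair.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))
```

Because every squared distance enters the sum with equal weight, a few badly placed residues can dominate the value.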

TM score:

The Template Modeling Score is a measure of similarity between two protein structures that is more accurate and sensitive than the RMSD, because it weights small deviations more strongly than large ones and is normalised by the length of the protein. The similarity between two structures is expressed as a score between zero and one, where one denotes a perfect match. In this analysis, the TM score is calculated with TM-score<ref>http://zhanglab.ccmb.med.umich.edu/TM-score/</ref>, an online tool provided by the Zhang Lab at the University of Michigan.
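
The scoring formula behind the TM score can be sketched as follows. This is a simplified illustration, not the TM-score program: it evaluates the formula for one given superposition, whereas the actual tool searches for the superposition that maximises the score. The function name `tm_score` and the restriction to targets of at least 19 residues (below which the distance scale d0 of this formula is not positive) are choices made for this example.

```python
def tm_score(distances, target_length):
    """TM score for one fixed superposition.

    distances      -- C-alpha distances (Angstrom) of the aligned residue pairs
    target_length  -- number of residues in the target (reference) structure
    """
    if target_length < 19:
        raise ValueError("this sketch only handles targets of at least 19 residues")
    # Length-dependent distance scale that makes scores comparable across protein sizes.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfectly matching model of the full target yields 1.0, and a model that covers only half of the target perfectly yields 0.5, which shows how the normalisation by the target length penalises incomplete models.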

MODELLER

MODELLER is a program for comparative protein structure modelling by satisfaction of spatial restraints. In the simplest case, the most probable structure for a given sequence is derived from its alignment with related structures. In addition to model building, MODELLER can perform several other tasks, including fold assignment, pairwise and multiple alignment of protein sequences, calculation of phylogenetic trees, and de novo modelling of loops in protein structures. The method was published by Sali and Blundell in 1993. <ref>A. Sali & T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993.</ref>

Usage

Website with tutorials and download information: http://salilab.org/modeller/
Description of the steps applied in this analysis: Detailed Workflow
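
For orientation, a basic MODELLER run might look like the following sketch. It is not the workflow used in this analysis: it follows the pattern of the MODELLER 9 tutorials (lowercase class names `environ` and `automodel`; check the manual of the installed version), and the wrapper function `build_models`, its parameters and the file names in the usage note are assumptions made for this example. The alignment file is expected in MODELLER's PIR format, and the template PDB files are expected in the working directory.

```python
def build_models(alnfile, knowns, target, n_models=5):
    """Build n_models comparative models and return the successful ones, best DOPE first.

    alnfile -- PIR alignment file containing the target and the template(s)
    knowns  -- one template code or a sequence of template codes, as named in alnfile
    target  -- code of the target sequence, as named in alnfile
    """
    templates = (knowns,) if isinstance(knowns, str) else tuple(knowns)
    if not templates:
        raise ValueError("at least one template code is required")
    if n_models < 1:
        raise ValueError("n_models must be at least 1")

    # Imported here so the argument checks above work without MODELLER installed.
    from modeller import environ
    from modeller.automodel import automodel, assess

    env = environ()
    env.io.atom_files_directory = ["."]  # directory searched for template PDB files

    a = automodel(env, alnfile=alnfile, knowns=templates, sequence=target,
                  assess_methods=(assess.DOPE,))
    a.starting_model = 1
    a.ending_model = n_models
    a.make()

    # Keep only models that were built successfully; a lower DOPE score is better.
    ok_models = [m for m in a.outputs if m["failure"] is None]
    return sorted(ok_models, key=lambda m: m["DOPE score"])
```

A call such as `build_models("1ogs-2wnw.ali", "2wnw", "1ogs")` (hypothetical file name) would correspond to a single-template run, while passing several template codes with a matching multiple alignment would correspond to a multi-template run.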

Results

The results presented and discussed in this section were obtained according to this workflow. First, MODELLER was used to build models based on a single template structure. To do so, pairwise alignments of 1OGS with each template structure (2WNW, 3KL0 and 2E4T) were created and used as input for MODELLER. Figure 1 shows the resulting models superimposed on the structure of 1OGS, visualized with PyMOL. The results differ greatly: the models based on the templates 3KL0 and 2E4T deviate strongly from the reference structure, whereas the model based on 2WNW appears to be quite good in large parts of the protein. These visual impressions will be examined and validated with different measurements, as described in the section below.

Figure 1: Results of MODELLER for pairwise sequence alignments.

As MODELLER can also build models based on multiple sequence alignments, it was investigated whether this improves the structure prediction. The templates 2WNW and 2E4T were chosen for this analysis, as the models based on pairwise alignments with them seemed to be better than the model obtained with the template 3KL0. The model built by MODELLER from this multiple sequence alignment is shown below in Figure 2. The structure consists mostly of loops, and defined secondary structure is rare: overall, there are 5 small helices and 2 small sheets. In this case, the multiple sequence alignment did not help at all to predict the structure of glucocerebrosidase: the results obtained with pairwise sequence alignments are significantly better. This may not hold in general, however.

Figure 2: Results of MODELLER for multiple sequence alignment with 2WNW and 2E4T.

Analysis

The results of MODELLER, described in the section above, are validated by calculating different measures.

iTasser

Results

Analysis

SWISS-MODEL

The SWISS-MODEL Workspace was published by Arnold et al. in 2006. <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22, 195-201.</ref>

Results

Using the default output alignment of ClustalW2, the SWISS-MODEL work unit was aborted because too many unsuccessful attempts were made to rebuild a loop. This indicates that the alignment is poor and has to be adjusted.

Analysis

References

