Difference between revisions of "Workflow homology modelling glucocerebrosidase"

Latest revision as of 04:03, 25 August 2011

Detailed workflow of the different homology modelling approaches for glucocerebrosidase.

MODELLER

Pairwise Sequence Alignments

1. Preparation of the Alignment File

Save the target protein sequence in PIR format: TARGET.pir
Save the PDB file of the template structure: TEMPLATE.pdb
If the PDB file contains several chains, split it with splitpdb (http://structure.usc.edu/splitpdb/). Note that the tool needs minor changes so that the coordinates in the resulting PDB file are listed as ATOM records instead of HETATM records.
Run the following Python script with the command 'mod9.9 align.py' to create a target-template alignment in PIR format:

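from modeller import *
log.verbose()
env = environ()
aln = alignment(env)
# Read the template structure (TEMPLATE.pdb) and add its sequence to the alignment
mdl = model(env, file='TEMPLATE')
aln.append_model(mdl, align_codes='TEMPLATE')
# Add the target sequence from the PIR file
aln.append(file='TARGET.pir', align_codes='TARGET')
# Align both sequences; the tuple holds the gap opening and gap extension penalties
aln.align(gap_penalties_1d=(-600, -400))
aln.write(file='TARGET_TEMPLATE.ali', alignment_format='PIR')
aln.write(file='TARGET_TEMPLATE.pap', alignment_format='PAP')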
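
The two input files for this step could be prepared with a short helper script. The following is only a sketch: the function names format_pir_sequence and extract_chain are hypothetical, and extract_chain assumes that keeping the ATOM records of one chain (and dropping all HETATM records) is acceptable, which is not the same as the modified splitpdb tool mentioned above.

```python
def format_pir_sequence(code, sequence, width=75):
    """Return a MODELLER-style PIR entry for a sequence without coordinates."""
    # Remove whitespace and line breaks, then terminate the sequence with '*'.
    body = "".join(sequence.split()).upper() + "*"
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    header = ">P1;" + code
    # Second line: ten colon-separated fields; 'sequence' marks an entry
    # that has no structure attached.
    fields = "sequence:" + code + ":::::::0.00: 0.00"
    return "\n".join([header, fields] + lines) + "\n"


def extract_chain(pdb_lines, chain_id):
    """Keep only the ATOM records of one chain (chain ID is column 22)."""
    kept = [line for line in pdb_lines
            if line.startswith("ATOM") and len(line) > 21 and line[21] == chain_id]
    return kept + ["END\n"]
```

The returned PIR text can be written to TARGET.pir, and the filtered lines to TEMPLATE.pdb, before running align.py.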

2. Modelling of the Target Structure

Run the following Python script with the command 'mod9.9 model.py' to model the structure of the target sequence.
Note that all files (alignment file and structure file) must be in the same folder.

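from modeller import *
from modeller.automodel import *
log.verbose()
env = environ()
# Directory searched for the template PDB file; left empty because all files are in the working folder
env.io.atom_files_directory = ''
a = automodel (env, alnfile = 'TARGET_TEMPLATE.ali', knowns = 'TEMPLATE', sequence = 'TARGET')
# Build a single model
a.starting_model = 1
a.ending_model = 1
a.make()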
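
Because the alignment file and the template structure must sit in the same folder, a quick check before starting MODELLER can save a failed run. The sketch below is not part of the original workflow; missing_inputs is a hypothetical helper, and it assumes that each template code corresponds to a file named CODE.pdb, as with TEMPLATE.pdb above.

```python
import os


def missing_inputs(folder, alnfile, knowns):
    """Return the expected input files that are absent from `folder`."""
    # Accept a single template code as well as a tuple of codes.
    if isinstance(knowns, str):
        knowns = (knowns,)
    expected = [alnfile] + [code + ".pdb" for code in knowns]
    return [name for name in expected
            if not os.path.isfile(os.path.join(folder, name))]
```

An empty list means that the alignment file and all template files were found in the folder.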

Multiple Sequence Alignments

1. Preparation of the Alignment File

Save the target protein sequence in PIR format: TARGET.pir
Save the PDB files of the template structures: TEMPLATE_x.pdb
Run the following Python script with the command 'mod9.9 align_templates.py' to create an alignment of the templates in PIR format:

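from modeller import *
log.verbose()
env = environ()
aln = alignment(env)
# One (PDB code, chain ID) pair per template; replace the placeholders and list all templates
for (code, chain) in (('TEMPLATE_1', 'CHAIN'), ('TEMPLATE_2', 'CHAIN'), ...):
  # Read only the selected chain of each template
  mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
  aln.append_model(mdl, atom_files=code, align_codes=code+chain)
# Align the template sequences to each other
aln.salign()
aln.write(file='msa.ali', alignment_format='PIR')
aln.write(file='msa.pap', alignment_format='PAP')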

Run the following Python script with command 'mod9.9 align_target.py' to add the TARGET sequence to the multiple sequence alignment:

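from modeller import *
log.verbose()
env = environ()
aln = alignment(env)
# Read the template alignment created by align_templates.py
aln.append(file='msa.ali', align_codes='all')
# Number of template sequences already aligned (not passed to salign() in this script)
aln_block = len(aln)
# Add the target sequence and realign
aln.append(file='TARGET.pir', align_codes='TARGET')
aln.salign()
# Overwrite msa.ali with the alignment that now includes the target
aln.write(file='msa.ali', alignment_format='PIR')
aln.write(file='msa.pap', alignment_format='PAP')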

2. Modelling of the Target Structure

Modify the following line of the Python script given for the pairwise sequence alignments:

Original: a = automodel (env, alnfile = 'TARGET_TEMPLATE.ali', knowns = 'TEMPLATE', sequence = 'TARGET')

Modification: a = automodel(env, alnfile='msa.ali', knowns=('TEMPLATE_1', 'TEMPLATE_2', ...), sequence='TARGET')
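
The entries in knowns have to match the alignment codes stored in msa.ali. Since align_templates.py writes each code as code+chain, it may be safer to read the codes back from the file than to type them by hand. The following sketch assumes that every entry in msa.ali starts with a '>P1;CODE' header line; read_align_codes and knowns_for are hypothetical helper names.

```python
def read_align_codes(pir_text):
    """Return the alignment codes from the '>P1;CODE' header lines."""
    return [line[4:].strip() for line in pir_text.splitlines()
            if line.startswith(">P1;")]


def knowns_for(pir_text, target="TARGET"):
    """Return all alignment codes except the target as a tuple of templates."""
    codes = read_align_codes(pir_text)
    if target not in codes:
        raise ValueError("target code %r not found in alignment" % target)
    return tuple(code for code in codes if code != target)
```

The resulting tuple could then be passed as knowns=knowns_for(open('msa.ali').read()) in the modified automodel line.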

I-TASSER

Webserver: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
Input: protein sequence in FASTA-format.

I-TASSER can exclude homologous template structures based on a sequence identity cut-off; this option was also used in this analysis.

SWISS-MODEL

Automated Mode

Webserver: http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1
Input: sequence in FASTA format and template PDB ID (optional)

The automated mode should only be used if target and template share more than 50% sequence identity.
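
Whether a target-template pair passes the 50% threshold can be estimated from an existing pairwise alignment. The sketch below is one possible definition only: it counts identical residues over the columns in which neither sequence has a gap, whereas other tools may divide by the alignment length or the shorter sequence length. The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)
```

A result above 50.0 would suggest that the automated mode is applicable under this definition.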

Alignment Mode

Webserver: http://swissmodel.expasy.org/workspace/index.php?func=modelling_align1
Input: target-template alignment in one of several formats (FASTA, CLUSTALW, ...)

To create the alignments needed as input, the tool ClustalW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/) was used with standard settings. Additionally, the alignment created with MODELLER was used for 2WNW.
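
MODELLER writes its alignment in PIR format, while the alignment mode lists FASTA among its input formats. The page does not say how the MODELLER alignment was submitted, so the conversion below is an assumption, and pir_to_fasta is a hypothetical helper. It expects each entry to consist of a '>P1;CODE' header, one description line and the sequence lines ending in '*'.

```python
def pir_to_fasta(pir_text):
    """Convert PIR alignment text into aligned FASTA text."""
    records = []
    code, seq, skip_description = None, [], False
    for line in pir_text.splitlines():
        line = line.strip()
        if line.startswith(">P1;"):
            # A new entry starts; store the previous one first.
            if code is not None:
                records.append((code, "".join(seq)))
            code, seq, skip_description = line[4:], [], True
        elif code is not None and skip_description:
            # The line right after the header holds the colon-separated fields.
            skip_description = False
        elif code is not None and line:
            # Sequence line; drop the terminating '*'.
            seq.append(line.rstrip("*"))
    if code is not None:
        records.append((code, "".join(seq)))
    return "".join(">%s\n%s\n" % record for record in records)
```

Gap characters are kept, so the output remains an alignment rather than a set of unaligned sequences.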

Evaluation

DOPE-Score

The DOPE score can be calculated for models obtained with MODELLER with the following script:

from modeller import *
from modeller.scripts import complete_pdb
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
env.libs.parameters.read(file='$(LIB)/par.lib')
mdl = complete_pdb(env, 'MODEL.pdb')
atmsel = selection(mdl.chains[0])
score = atmsel.assess_dope()

All-Atom RMSD in Area of 6Å around Active Site

Define the active site: residues E235 and E340 form the active site of glucocerebrosidase (http://www.nature.com/embor/journal/v4/n7/full/embor873.html).
Load the reference structure (1OGS) into PyMOL and select the active site residues.
Expand the selection by 6 Å (action -> modify -> expand -> by 6 A, residues) and save the selection (AS_REFERENCE).
Load the model structure into PyMOL, select the same residues as in AS_REFERENCE and save the selection (AS_MODEL).
Align the selection AS_MODEL to AS_REFERENCE (action -> align -> to selection -> AS_REFERENCE).
The all-atom RMSD is the RMS value reported for cycle 1 (in later cycles atoms are excluded).
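
The value read from PyMOL is a root-mean-square deviation over matched atom pairs. As a minimal sketch of the formula only, the function below computes the RMSD for two equally long lists of (x, y, z) coordinates that are assumed to be already superposed and listed in matching atom order. It does not perform the superposition or the atom matching that PyMOL carries out, and the name rmsd is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of a pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Because all atom pairs enter the sum, this corresponds to the value before any atoms are rejected in later refinement cycles.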

References

