Homology Modeling of ARS A
HHpred
We used the webserver and
Modeller
We wrote a modeling tutorial (Using Modeller for TASK 4) that covers all steps needed for the following analysis.
Proteins used as templates
We identified the following proteins (see Alignment TASK) as potential templates for homology modeling:
SeqIdentifier | Seq Identity (from TASK 2) | source | Protein function | True homolog (HSSP) | Seq Identity (pairw. ali.) | Active site | Substrate binding site | Metal binding site |
---|---|---|---|---|---|---|---|---|
1P49 | 39.0% | Homo sapiens | Steryl-Sulfatase | yes | 31.9% | 136 | 333, 459 | 35, 36, 75, 342, 343 |
1FSU | 28.0% | Homo sapiens | Arylsulfatase B | yes | 26.5% | 147 | 145, 242, 318 | 53, 54, 91, 300, 301 |
2VQR | 20.0% | Rhizobium leguminosarum | Monoester Hydrolase | no | 20.3% | not avail. | not avail. | 12, 57, 324, 325 |
3ED4 | 32.0% | Escherichia coli | Arylsulfatase | yes | 27.7% | not avail. | not avail. | not avail. |
ARSA | - | Homo sapiens | - | - | - | 125 | 123, 150, 229, 302 | 29, 30, 69, 281, 282 |
The potential templates identified by the database searches include all homologs with known structure according to HSSP.
Single template modelling
To predict the structure from a single template, Modeller needs a pairwise sequence alignment in PIR format. Modeller provides two methods for calculating pairwise alignments. alignment.malign() uses classical dynamic programming to align the two sequences. alignment.align2d() also uses dynamic programming, but takes structural information from the template into account to optimize the alignment (e.g. it tries to place gaps outside secondary structure elements). We applied both methods to each of the four templates, which gave eight pairwise sequence alignments with the target. The script used for this purpose is shown below:
from modeller import *
env = environ()
aln = alignment(env)
mdl = model(env, file='template_name', model_segment=('FIRST:@', 'END:'))
aln.append_model(mdl, align_codes='template_name', atom_files='template_name')
aln.append(file='1AUK.pir', align_codes='target_name')
aln.align2d()
aln.check()
aln.write(file='target-template-2d.ali', alignment_format='PIR')
aln.malign()
aln.check()
aln.write(file='target-template.ali', alignment_format='PIR')
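The script above reads the target sequence from 1AUK.pir. As a minimal sketch of what such a file contains, the following Python function builds a PIR record for a sequence without known structure. The function name pir_entry and the short sequence in the usage line are illustrative assumptions, not part of our pipeline; the real ARS A sequence would be used instead.

```python
import textwrap

def pir_entry(code, sequence, width=75):
    """Return a PIR record for a sequence without known structure."""
    # The second line holds ten colon-separated fields. For a plain
    # sequence only the entry type and the code are filled in here.
    header = "sequence:%s:::::::0.00: 0.00" % code
    # The sequence is terminated by '*' and wrapped to a fixed width.
    body = textwrap.fill(sequence + "*", width)
    return ">P1;%s\n%s\n%s\n" % (code, header, body)

# Placeholder residues for illustration only (not the ARS A sequence).
print(pir_entry("1AUK", "ACDEFGHIKLMN"))
```

The code after ">P1;" has to match the align_codes value passed to aln.append().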
From these alignments we constructed eight models, using the following script:
from modeller import *
from modeller.automodel import *
log.verbose()
env = environ()
a = automodel(env,
alnfile = '1AUK-1FSU-2d.ali',
knowns = '1FSU',
sequence = '1AUK',
assess_methods=(assess.DOPE, assess.GA341))
a.starting_model= 1
a.ending_model = 1
a.make()
We modified the paths and filenames in the scripts so that they matched our proteins of interest.
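Instead of editing the script by hand for each run, the eight combinations of template and alignment method could be enumerated programmatically. The sketch below only generates the automodel settings. It assumes that alignment files follow the naming pattern of the example above (1AUK-1FSU-2d.ali for align2d) and that the malign files use the same name without the "-2d" suffix; the latter is an assumption, as is the helper name modelling_runs.

```python
from itertools import product

TARGET = "1AUK"
TEMPLATES = ["1P49", "1FSU", "2VQR", "3ED4"]
# Assumed file-name suffix for each alignment method.
SUFFIXES = {"align2d": "-2d", "malign": ""}

def modelling_runs(target=TARGET, templates=TEMPLATES):
    """Return the automodel settings for every template/method pair."""
    runs = []
    for template, (method, suffix) in product(templates, SUFFIXES.items()):
        runs.append({
            "method": method,
            "alnfile": "%s-%s%s.ali" % (target, template, suffix),
            "knowns": template,
            "sequence": target,
        })
    return runs

for run in modelling_runs():
    print(run["method"], run["alnfile"], run["knowns"], run["sequence"])
```

Each dictionary carries the values that would be passed as alnfile, knowns and sequence to automodel.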
Next, we calculated RMSD and TM-scores of the models to get a first impression of how much the models deviate from the original structure. The results are listed in the section "TM-scores and RMSD of the single template models" below.
We then visualised the models using PyMOL. We loaded both structures into the program and performed a structural alignment to superimpose and compare them visually. The PyMOL commands and the images are shown below:
align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
cmd.png("MODEL.png")
[Image table: superpositions of 1AUK with the models built on 1P49, 2VQR, 1FSU and 3ED4, one row per alignment method (classical dynamic programming; dynamic programming with structural information from the template).]
3ED4
[Image table: superpositions of 1AUK with the models built on chains 3ED4A, 3ED4B, 3ED4C and 3ED4D, aligned by dynamic programming with structural information from the template.]
Modification of Alignments
Using 1P49 as the template yielded the best results, so we manually modified this alignment to see whether the model could be improved. We made sure that there were no gaps in secondary structure elements and adjusted the alignment so that the active site, substrate binding sites and metal binding sites were aligned. Altogether, we made the following changes:
- The gap between residues 74 and 75 in 1P49 was removed to align metal binding site 75 of 1P49 with metal binding site 69 of ARS A. This also aligned the two active sites, which were not aligned in the initial alignment.
- The gap between residues 154 and 155 was moved out of the beta strand between residues 152 and 153.
- The gap between residues 190-191 was moved out of the beta strand between residues 191-192.
- All gaps within the helix from residue 197-214 were moved out of the helix (at the right border).
- The gap between 290-291 was moved to the right end of the helix.
We used JalView to do this. Figures of the alignment before and after the changes are depicted on the right.
Surprisingly, the TM-score decreased to 0.5561.
TM-scores and RMSD of the single template models
We downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using
gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f
TMscores were calculated as follows:
./TMscore MODEL.pdb REAL_STRUCTURE.pdb
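Alignment method | PDB Identifier | TM-score | RMSD |
---|---|---|---|
Dynamic programming with structural information | 1P49 | 0.7960 | - |
Dynamic programming with structural information | 2VQR | 0.4825 | - |
Dynamic programming with structural information | 1FSU | 0.7146 | - |
Dynamic programming with structural information | 3ED4 | 0.3881 | - |
Dynamic programming with structural information | 3ED4A | 0.7268 | - |
Dynamic programming with structural information | 3ED4B | 0.7251 | - |
Dynamic programming with structural information | 3ED4C | 0.6518 | - |
Dynamic programming with structural information | 3ED4D | 0.7303 | - |
Dynamic programming without structural information | 1P49 | 0.7731 | - |
Dynamic programming without structural information | 2VQR | 0.3183 | - |
Dynamic programming without structural information | 1FSU | 0.7223 | - |
Dynamic programming without structural information | 3ED4 | 0.3122 | - |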
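For orientation, the two measures can be written down in a few lines of Python. This is only a sketch of the scoring formulas for one fixed superposition, given the distances between aligned residue pairs. The TMscore program additionally searches for the superposition that maximises the score, so this is not a replacement for it. The function names are our own, and the d0 formula is only meaningful for target lengths well above 15 residues.

```python
import math

def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score (in Angstrom)."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (in Angstrom) between aligned residue pairs.
    target_length: number of residues in the reference structure.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length

def rmsd(distances):
    """Root mean square deviation over the aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

Because the sum is divided by the target length rather than by the number of aligned pairs, unaligned residues lower the TM-score, whereas the RMSD only reflects the aligned pairs.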
Multiple Template Modeling
We calculated three models:
- Model 1 was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
- Model 2 was calculated from an MSA of ARS A and the three proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
- Model 3 was calculated from an MSA of ARS A and the two proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
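The template sets above can be reproduced from the single-template TM-scores by keeping the best-scoring chain per PDB entry and ranking the entries. The sketch below assumes this selection rule and uses the scores of the models based on dynamic programming with structural information; the function names are illustrative.

```python
# TM-scores of the single-template models (dynamic programming with
# structural information), taken from the table above.
SCORES = {
    "1P49": 0.7960, "2VQR": 0.4825, "1FSU": 0.7146, "3ED4": 0.3881,
    "3ED4A": 0.7268, "3ED4B": 0.7251, "3ED4C": 0.6518, "3ED4D": 0.7303,
}

def best_per_entry(scores):
    """Keep the best-scoring template for each four-character PDB ID."""
    best = {}
    for name, score in scores.items():
        entry = name[:4]
        if entry not in best or score > scores[best[entry]]:
            best[entry] = name
    return best

def top_templates(scores, n):
    """Return the n best templates, one per PDB entry, ranked by TM-score."""
    candidates = best_per_entry(scores).values()
    return sorted(candidates, key=scores.get, reverse=True)[:n]

for n in (4, 3, 2):
    print(n, top_templates(SCORES, n))
```

With these scores, n = 4, 3 and 2 give the same template sets as Model 1, Model 2 and Model 3.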
[Images: Model 1, Model 2, Model 3.]
PDB Identifier | TM-score | RMSD |
---|---|---|
model1 | 0.5409 | - |
model2 | 0.6701 | - |
model3 | 0.6819 | - |
Initial multiple structural alignment:
from modeller import *
log.verbose()
env = environ()
env.io.atom_files_directory = './:./'
aln = alignment(env)
# Read every template chain and add it to the alignment
for (code, chain) in (('PROTEIN', 'CHAIN'), ('ANOTHER_PROTEIN', 'ANOTHER_CHAIN'), ...):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)
aln.salign()
aln.write(file='mymsa.pap', alignment_format='PAP')
aln.write(file='mymsa.ali', alignment_format='PIR')
Add target sequence to MSA:
from modeller import *
log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
# Read aligned structure(s):
aln = alignment(env)
aln.append(file='mymsa.ali', align_codes='all')
aln_block = len(aln)
# Read aligned sequence(s):
aln.append(file='1AUK.pir', align_codes='1AUK')
# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign()
aln.write(file='mymsa-1AUK.ali', alignment_format='PIR')
aln.write(file='mymsa-1AUK.pap', alignment_format='PAP')
Calculate the model:
from modeller import *
from modeller.automodel import *
env = environ()
# alnfile is the alignment written by the previous script
a = automodel(env, alnfile='mymsa-1AUK.ali',
knowns=('PROTEIN', 'ANOTHER_PROTEIN', ...), sequence='1AUK')
a.starting_model = 1
a.ending_model = 1
a.make()
Modification of Alignments
iTasser
iTasser is a server that models 3D structures by homology and also provides function predictions. Under the name Zhang-Server, iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9. iTasser uses a threading approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>
Modelling without template
Modelling with single template
Discussion
SWISS-MODEL
SWISS-MODEL is an online tool for modelling 3D structures <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. Three modes are available: automated mode, alignment mode and project mode. We only used the automated mode.
Modelling without template
In automated mode without a template, suitable templates are selected by a BLAST run using an e-value threshold. The mode is used by pasting a protein sequence or a UniProt accession code into a text field.
Results
Modelling with single template
A template can be specified in automated mode by giving a PDB ID or by uploading a PDB file.
1P49
1P49 has 39% sequence identity with human arylsulfatase A, the highest identity among our templates.
2VQR
2VQR has 20% sequence identity with human arylsulfatase A, the lowest identity among our templates.
References
<references />