Homology Modeling of ARS A
HHpred
We used the webserver and
Modeller
We wrote a modeling tutorial (Using Modeller for TASK 4) that covers all steps of the following analysis.
Proteins used as templates
We identified the following proteins (see Alignment TASK) as potential templates for homology modeling:
Seq identifier | Seq. identity (from TASK 2) | Source organism | Protein function | True homolog (HSSP) | Seq. identity (pairwise alignment) | Active site | Substrate binding site | Metal binding site |
---|---|---|---|---|---|---|---|---|
1P49 | 39.0% | Homo sapiens | Steryl-sulfatase | yes | 31.9% | 136 | 333, 459 | 35, 36, 75, 342, 343 |
1FSU | 28.0% | Homo sapiens | Arylsulfatase B | yes | 26.5% | 147 | 145, 242, 318 | 53, 54, 91, 300, 301 |
2VQR | 20.0% | Rhizobium leguminosarum | Monoester hydrolase | no | 20.3% | not avail. | not avail. | 12, 57, 324, 325 |
3ED4 | 32.0% | Escherichia coli | Arylsulfatase | yes | 27.7% | not avail. | not avail. | not avail. |
ARSA | - | Homo sapiens | Arylsulfatase A (target) | - | - | 125 | 123, 150, 229, 302 | 29, 30, 69, 281, 282 |
According to HSSP, the potential templates identified by our database searches include all homologs of ARSA with known structure.
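The "pairwise alignment" identity column depends on how identity is counted. A minimal sketch of one common definition, identical residues over columns where neither sequence has a gap; we do not know which definition the tools above used, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two rows of a pairwise alignment.

    Counts identical residues over columns where neither row has a gap.
    Other definitions (e.g. dividing by the length of the shorter
    sequence) give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical toy rows, not taken from our alignments
    print(round(percent_identity("SLCTPSRAAL", "SL-TPSKAAV"), 1))  # 77.8
```

Dividing by the number of gap-free columns rather than by the full sequence length usually gives higher values for gappy alignments, which is one reason identities reported by different tools can disagree.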
Single template modelling
To predict the structure from a single template, Modeller needs a pairwise target-template alignment in PIR format. Modeller provides two methods for computing pairwise alignments. alignment.malign()
uses classical dynamic programming to align the two sequences. alignment.align2d()
also uses dynamic programming, but includes structural information from the template to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both methods and created eight pairwise sequence alignments of the four templates above with the target. The script used for this purpose is shown below:
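from modeller import *
env = environ()
aln = alignment(env)
# Read the template structure (all chains) and add it to the alignment
mdl = model(env, file='template_name', model_segment=('FIRST:@', 'END:'))
aln.append_model(mdl, align_codes='template_name', atom_files='template_name')
# Add the target sequence (ARSA, 1AUK) from its PIR file
aln.append(file='1AUK.pir', align_codes='target_name')
# Method 1: dynamic programming with structure-dependent gap penalties
aln.align2d()
aln.check()
aln.write(file='target-template-2d.ali', alignment_format='PIR')
# Method 2: classical dynamic programming without structural information
aln.malign()
aln.check()
aln.write(file='target-template.ali', alignment_format='PIR')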
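The difference between the two alignment methods can be illustrated with a toy Needleman-Wunsch implementation. This is a minimal sketch, not Modeller's code: it uses a simple match/mismatch score and linear gap penalties, and as a stand-in for the structure-dependent penalties of align2d() it multiplies the gap penalty inside template helices and strands by an arbitrary factor (`ss_gap_factor`, our assumption). Modeller derives its penalties from the template structure differently. Without `template_ss` the function is plain dynamic programming.

```python
from typing import Optional, Tuple

SSE_CODES = {"H", "E"}  # assumed DSSP-style codes: H = helix, E = strand


def align_global(target: str, template: str,
                 template_ss: Optional[str] = None,
                 match: int = 2, mismatch: int = -1,
                 gap: float = -2.0, ss_gap_factor: float = 3.0
                 ) -> Tuple[str, str, float]:
    """Needleman-Wunsch global alignment with linear gap penalties.

    If template_ss is given (one character per template residue), gaps
    inside template helices/strands cost gap * ss_gap_factor.
    """
    n, m = len(target), len(template)
    ss = template_ss if template_ss is not None else "-" * m
    if len(ss) != m:
        raise ValueError("template_ss needs one character per template residue")

    def in_sse(k: int) -> bool:
        # k is a 0-based template residue index
        return 0 <= k < m and ss[k] in SSE_CODES

    def gap_in_template(j: int) -> float:
        # Target residue opposite a gap between template residues j-1 and j
        return gap * ss_gap_factor if in_sse(j - 1) and in_sse(j) else gap

    def gap_in_target(j: int) -> float:
        # Template residue j-1 opposite a gap in the target
        return gap * ss_gap_factor if in_sse(j - 1) else gap

    def pair(i: int, j: int) -> int:
        return match if target[i - 1] == template[j - 1] else mismatch

    # score[i][j] = best score for target[:i] aligned with template[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_in_template(0)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_in_target(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap_in_template(j),
                              score[i][j - 1] + gap_in_target(j))

    # Traceback, preferring diagonal moves, then gaps in the template
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(target[i - 1])
            bottom.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_in_template(j):
            top.append(target[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(template[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


if __name__ == "__main__":
    print(align_global("AKKA", "AKA"))
    print(align_global("AKKA", "AKA", template_ss="HHC"))
```

For example, aligning AKKA to AKA places the template gap after the first residue (A-KA), while with template_ss="HHC" the equally scoring but helix-breaking placement becomes expensive and the gap moves to the helix end (AK-A).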
For each of these alignments we built one model (eight models in total) using the following script:
from modeller import *
from modeller.automodel import *
log.verbose()
env = environ()
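# Build one model of the target (1AUK) from one template (here 1FSU),
# scored with the DOPE and GA341 assessment methods
a = automodel(env,
              alnfile='1AUK-1FSU-2d.ali',
              knowns='1FSU',
              sequence='1AUK',
              assess_methods=(assess.DOPE, assess.GA341))
# Generate a single model (model number 1)
a.starting_model = 1
a.ending_model = 1
a.make()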
We adapted the paths and file names in both scripts to each template-target pair.
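Both scripts exchange sequences and alignments as PIR files (the target sequence is read from 1AUK.pir). A minimal sketch for writing such a target entry and for reading the rows of a resulting .ali file back into Python, e.g. to inspect the alignments before modeling. The header layout follows the PIR examples in the Modeller tutorial; the helper names are hypothetical.

```python
from typing import Dict, List, Optional


def format_pir(code: str, sequence: str, width: int = 75) -> str:
    """Format a target sequence as a PIR entry in the layout used by Modeller.

    Header fields: type:code:start residue:start chain:end residue:
    end chain:name:source:resolution:R-factor (positional fields left empty).
    """
    body = sequence + "*"  # '*' terminates the sequence
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([f">P1;{code}", f"sequence:{code}:::::::0.00: 0.00", *lines]) + "\n"


def read_pir(text: str) -> Dict[str, str]:
    """Return {align code: (gapped) sequence} for every entry in a PIR/.ali text."""
    entries: Dict[str, str] = {}
    code: Optional[str] = None
    header_pending = False
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">P1;"):
            code, header_pending, chunks = line[4:], True, []
        elif code is None:
            continue  # blank lines or comments between entries
        elif header_pending:
            header_pending = False  # skip the structureX:/sequence: header line
        else:
            chunks.append(line)
            if line.endswith("*"):
                entries[code] = "".join(chunks)[:-1]  # drop the terminating '*'
                code = None
    return entries
```

Reading an .ali file this way yields one gapped row per align code, which can then be compared column by column.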
Next, we calculated the RMSD and TM-score of each model against the ARSA crystal structure (1AUK) to get a first impression of how much the models deviate from the experimental structure. The results are shown in the table below.
We then visualised the models in PyMOL. We loaded each model together with the crystal structure and performed a structural alignment to superimpose them and compare them visually. The PyMOL commands and the images are shown below:
# superimpose 1AUK (mobile) onto the model (target)
align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
png MODEL.png
[Figure: PyMOL superpositions of the single-template models with 1AUK. Panels: templates 1P49, 2VQR, 1FSU and 3ED4, each with classical dynamic programming and with dynamic programming using structural information from the template; chains 3ED4A, 3ED4B, 3ED4C and 3ED4D with dynamic programming using structural information from the template.]
Modification of Alignments
Using 1P49 as the template yielded the best model, so we manually modified this alignment to see whether the model could be improved further. We moved gaps out of secondary structure elements and adjusted the alignment so that the active sites, substrate binding sites and metal binding sites of template and target were aligned. Altogether, we made the following changes:
- The gap between residues 74 and 75 in 1P49 was removed to align metal-binding residue 75 of 1P49 with metal-binding residue 69 of ARSA. This also aligned the two active sites, which were not aligned in the initial alignment.
- The gap between residues 154 and 155 was moved out of the beta strand between residues 152 and 153.
- The gap between residues 190 and 191 was moved out of the beta strand between residues 191 and 192.
- All gaps within the helix from residue 197 to 214 were moved out of the helix (to its right border).
- The gap between residues 290 and 291 was moved to the right end of the helix.
We edited the alignment in Jalview. Figures of the alignment before and after the changes are shown on the right.
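Surprisingly, the TM-score of the model built from the modified alignment decreased to 0.5561, well below the values obtained with the unmodified 1P49 alignments (0.7960 and 0.7731).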
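The main rule we applied by hand in Jalview, keeping gaps out of helices and strands, can also be checked automatically. A minimal sketch, assuming a per-residue secondary-structure string for the template (e.g. derived from DSSP, with H/E for helix/strand); the function name is hypothetical and this check was not part of our workflow.

```python
from typing import List, Tuple

SSE_CODES = {"H", "E"}  # assumed DSSP-style codes: H = helix, E = strand


def gaps_in_sse(target_row: str, template_row: str, template_ss: str,
                first_resnum: int = 1, gap: str = "-"
                ) -> List[Tuple[int, str, int]]:
    """List alignment columns where a gap falls inside a template SSE.

    Returns (column, kind, template residue number) tuples:
    - "template gap": target insertion right after that template residue,
      with the template residues on both sides in a helix/strand;
    - "target gap": that template residue (in a helix/strand) faces a gap.
    """
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have the same length")
    if len(template_row.replace(gap, "")) != len(template_ss):
        raise ValueError("template_ss needs one character per template residue")

    def in_sse(k: int) -> bool:
        # k is a 0-based template residue index
        return 0 <= k < len(template_ss) and template_ss[k] in SSE_CODES

    hits = []
    k = 0  # 0-based index of the next template residue
    for col, (t, s) in enumerate(zip(target_row, template_row)):
        if s == gap:
            if t != gap and in_sse(k - 1) and in_sse(k):
                hits.append((col, "template gap", first_resnum + k - 1))
        else:
            if t == gap and in_sse(k):
                hits.append((col, "target gap", first_resnum + k))
            k += 1
    return hits
```

An empty result means every gap lies in a loop or at an element border, which is the state we aimed for with the manual edits.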
TM-scores and RMSD of the single template models
We downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using
gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f
TM-scores were calculated as follows:
./TMscore MODEL.pdb REAL_STRUCTURE.pdb
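Rather than calling TMscore by hand for each of the twelve models, the calls can be scripted. A minimal sketch, assuming the compiled binary sits at ./TMscore and that its output contains lines in the layout given in the code comment (check this against your TMscore version); the function names are hypothetical.

```python
import re
import subprocess
from typing import Dict, Optional

# Assumed layout of the relevant TMscore output lines (check your version):
#   TM-score    = <value>  (d0= <value>)
#   RMSD of  the common residues=    <value>
TM_RE = re.compile(r"TM-score\s*=\s*([0-9.]+)")
RMSD_RE = re.compile(r"RMSD of\s+the common residues\s*=\s*([0-9.]+)")


def parse_tmscore_output(text: str) -> Dict[str, Optional[float]]:
    """Extract TM-score and RMSD from TMscore's standard output."""
    tm = TM_RE.search(text)
    rmsd = RMSD_RE.search(text)
    return {"tm_score": float(tm.group(1)) if tm else None,
            "rmsd": float(rmsd.group(1)) if rmsd else None}


def run_tmscore(model_pdb: str, native_pdb: str,
                exe: str = "./TMscore") -> Dict[str, Optional[float]]:
    """Run the compiled TMscore binary on one model/native pair."""
    result = subprocess.run([exe, model_pdb, native_pdb],
                            capture_output=True, text=True, check=True)
    return parse_tmscore_output(result.stdout)
```

A loop such as `{name: run_tmscore(f"{name}.pdb", "1AUK.pdb") for name in ("1P49", "1FSU")}` (file names hypothetical) would then collect one row of the table per model.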
Template | TM-score | RMSD |
---|---|---|
Dynamic programming with structural information (align2d) | | |
1P49 | 0.7960 | - |
2VQR | 0.4825 | - |
1FSU | 0.7146 | - |
3ED4 | 0.3881 | - |
3ED4A | 0.7268 | - |
3ED4B | 0.7251 | - |
3ED4C | 0.6518 | - |
3ED4D | 0.7303 | - |
Dynamic programming without structural information (malign) | | |
1P49 | 0.7731 | - |
2VQR | 0.3183 | - |
1FSU | 0.7223 | - |
3ED4 | 0.3122 | - |
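Unlike RMSD, the TM-score weights each residue pair with a bounded term, so a few badly placed loops cannot dominate it: TM-score = max over superpositions of (1/L_N) · Σ_i 1 / (1 + (d_i/d0)²), with d0 = 1.24 · ∛(L_N − 15) − 1.8 Å, where L_N is the length of the native structure and d_i the distance between the i-th pair of equivalent residues. A minimal sketch that evaluates both measures for one given superposition; the TMscore program additionally searches for the superposition that maximizes the score, and the 0.5 Å floor on d0 is our reading of the reference implementation. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def d0(native_length: int) -> float:
    """TM-score distance scale in Angstrom, floored at 0.5 for short chains."""
    if native_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (native_length - 15) ** (1.0 / 3.0) - 1.8)


def pair_distances(model: Sequence[Coord], native: Sequence[Coord]) -> List[float]:
    """Distances of equivalent residues (e.g. CA atoms) in superimposed structures."""
    return [math.dist(a, b) for a, b in zip(model, native)]


def tm_score_fixed(dists: Sequence[float], native_length: int) -> float:
    """TM-score for one fixed superposition (no search over superpositions)."""
    scale = d0(native_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in dists) / native_length


def rmsd(dists: Sequence[float]) -> float:
    """Root-mean-square deviation over the same residue pairs."""
    return math.sqrt(sum(d * d for d in dists) / len(dists))
```

Because the sum is divided by the native length, unaligned residues lower the score even if the aligned part is perfect. As a rule of thumb, TM-scores above about 0.5 indicate that model and structure share the same fold.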
Multiple Template Modeling
We calculated three models:
- Model 1 was calculated from a multiple sequence alignment (MSA) of ARSA and all four template proteins, using the best-scoring single-template entry of each: 1FSU, 1P49, 2VQR, 3ED4D.
- Model 2 was calculated from an MSA of ARSA and the three templates with the best single-template TM-scores: 1FSU, 1P49, 3ED4D.
- Model 3 was calculated from an MSA of ARSA and the two templates with the best single-template TM-scores: 1P49, 3ED4D.
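The template sets of the three models follow from the single-template TM-scores: keep the best-scoring entry of each template protein and take the top k. A sketch reproducing that selection from the values in the table above; grouping entries by the first four characters of the identifier is our assumption about the naming, and the function names are hypothetical.

```python
from typing import Dict, List

# Single-template TM-scores from the table above; for 1P49, 2VQR, 1FSU and
# 3ED4 the better of the two alignment methods is used
TM_SCORES: Dict[str, float] = {
    "1P49": 0.7960, "2VQR": 0.4825, "1FSU": 0.7223, "3ED4": 0.3881,
    "3ED4A": 0.7268, "3ED4B": 0.7251, "3ED4C": 0.6518, "3ED4D": 0.7303,
}


def best_entry_per_protein(scores: Dict[str, float]) -> Dict[str, str]:
    """Map each PDB ID (first four characters) to its best-scoring entry."""
    best: Dict[str, str] = {}
    for name, tm in scores.items():
        pdb_id = name[:4]
        if pdb_id not in best or tm > scores[best[pdb_id]]:
            best[pdb_id] = name
    return best


def top_templates(scores: Dict[str, float], k: int) -> List[str]:
    """The k template proteins with the highest TM-score, one entry each."""
    best = best_entry_per_protein(scores)
    ranked = sorted(best.values(), key=lambda name: scores[name], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for k in (4, 3, 2):
        print(k, top_templates(TM_SCORES, k))
```

With k = 4, 3 and 2 this yields the template sets of Model 1, Model 2 and Model 3.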
[Figure: Model 1, Model 2 and Model 3]
Model | TM-score | RMSD |
---|---|---|
model1 | 0.5409 | - |
model2 | 0.6701 | - |
model3 | 0.6819 | - |
Initial multiple structural alignment:
from modeller import *
log.verbose()
env = environ()
env.io.atom_files_directory = './:./'
aln = alignment(env)
for (code, chain) in (('PROTEIN', 'CHAIN'), ('ANOTHER_PROTEIN', 'ANOTHER_CHAIN'), ...):
    # read one chain of each template structure and add it to the alignment
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)
# structure-based multiple alignment of all templates
aln.salign()
aln.write(file='mymsa.pap', alignment_format='PAP')
aln.write(file='mymsa.ali', alignment_format='PIR')
Add target sequence to MSA:
from modeller import *
log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
# Read aligned structure(s):
aln = alignment(env)
aln.append(file='mymsa.ali', align_codes='all')
aln_block = len(aln)
# Read aligned sequence(s):
aln.append(file='1AUK.pir', align_codes='1AUK')
# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign()
aln.write(file='mymsa-1AUK.ali', alignment_format='PIR')
aln.write(file='mymsa-1AUK.pap', alignment_format='PAP')
Calculate the model:
from modeller import *
from modeller.automodel import *
env = environ()
a = automodel(env, alnfile='msa2-1AUK.ali',
knowns=('PROTEIN', 'ANOTHER_PROTEIN', ...), sequence='1AUK')
a.starting_model = 1
a.ending_model = 1
a.make()
Modification of Alignments
iTasser
iTasser is a web server for modelling protein 3D structures by homology; it also provides function predictions. As "Zhang-Server", iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9. iTasser builds its models with a threading approach; unaligned regions (mainly loops) are modelled ab initio. <ref>Roy A, Kucukural A, Zhang Y (2010). I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols 5, 725-738.</ref><ref>Zhang Y (2007). Template-based modeling and free modeling by I-TASSER in CASP7. Proteins 69 (Suppl 8), 108-117.</ref>
Modelling without template
Modelling with single template
Discussion
SWISS-MODEL
SWISS-MODEL is an online tool for modelling protein 3D structures<ref>Arnold K, Bordoli L, Kopp J, Schwede T (2006). The SWISS-MODEL Workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research 37, D387-D392.</ref><ref>Peitsch MC (1995). Protein modeling by E-mail. Bio/Technology 13, 658-660.</ref>. Three modes are available: automated mode, alignment mode and project mode. We only used the automated mode.
Modelling without template
In automated mode without a user-specified template, SWISS-MODEL selects suitable templates by a BLAST search with an E-value threshold. To start a run, one pastes a protein sequence or a UniProt accession code into a text field.
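To illustrate the selection step, a minimal sketch of E-value filtering over BLAST hits. The hit record, the cutoff of 1e-5 and the tie-break on identity are hypothetical choices for illustration; SWISS-MODEL's actual template ranking is more involved and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BlastHit:
    template: str      # PDB ID + chain of a hit (hypothetical record)
    evalue: float
    identity: float    # percent sequence identity to the target


def pick_template(hits: List[BlastHit], max_evalue: float = 1e-5) -> Optional[BlastHit]:
    """Keep hits at or below an E-value cutoff and return the most significant one.

    Ties in E-value are broken by higher sequence identity.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    if not significant:
        return None
    return min(significant, key=lambda h: (h.evalue, -h.identity))
```

Returning None when no hit passes the cutoff mirrors the situation where no suitable template is found.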
Template
SWISS-MODEL identified chain A of PDB entry 1N2L (labelled 1n2lA in the alignment below) as the best template. The alignment quality is very high: apart from a few gaps, template and target residues are identical, as one can see:
TARGET 19 RPPNIVLI FADDLGYGDL GCYGHPSSTT PNLDQLAAGG LRFTDFYVPV
1n2lA 19 rppnivli faddlgygdl gcyghpsstt pnldqlaagg lrftdfyvpv
TARGET sssss ss hhhhhhhh ssss sss
1n2lA sssss ss hhhhhhhh ssssssss

TARGET 67 SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY
1n2lA 67 sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy
TARGET hhhhh hh hhh ss s hhhhhhhh
1n2lA hhhhhh hh hh ss s hhhhhhhh

TARGET 117 LTGMAGKWHL GVGPEGAFLP PHQGFHRFLG IPYSHDQGPC QNLTCFPPAT
1n2lA 117 ltgmagkwhl gvgpegaflp phqgfhrflg ipyshdqgpc qnltcfppat
TARGET ssssssss sss sss hhh ssss s sss s
1n2lA ssssssss sss sss hhh ssss s sss s

TARGET 167 PCDGGCDQGL VPIPLLANLS VEAQPPWLPG LEARYMAFAH DLMADAQRQD
1n2lA 167 pcdggcdqgl vpipllanls veaqppwlpg learymafah dlmadaqrqd
TARGET ss ssss s ss hhhhhhhhh hhhhhhhh
1n2lA ss ssss s ss hhhhhhhhh hhhhhhhh

TARGET 217 RPFFLYYASH HTHYPQFSGQ SFAERSGRGP FGDSLMELDA AVGTLMTAIG
1n2lA 217 rpfflyyash hthypqfsgq sfaersgrgp fgdslmelda avgtlmtaig
TARGET ssssssss h hhhhhhhhhh hhhhhhhhhh
1n2lA ssssssss h hhhhhhhhhh hhhhhhhhhh

TARGET 267 DLGLLEETLV IFTADNGPET MRMSRGGCSG LLRCGKGTTY EGGVREPALA
1n2lA 267 dlglleetlv iftadngpet mrmsrggcsg llrcgkgtty eggvrepala
TARGET hh ssss sss sss sss
1n2lA h ssss sss hhhsss sss

TARGET 317 FWPGHIAPGV THELASSLDL LPTLAALAGA PLPNVTLDGF DLSPLLLGTG
1n2lA 317 fwpghiapgv thelassldl lptlaalaga plpnvtldgf dlsplllgtg
TARGET s ss s hhh hhhhhhhh hhhhh
1n2lA s ss s ssshhh hhhhhhhh hhhhh

TARGET 367 KSPRQSLFFY PSYPDEVRGV FAVRTGKYKA HFFTQGSAHS DTTADPACHA
1n2lA 367 ksprqslffy psypdevrgv favrtgkyka hfftqgsahs dttadpacha
TARGET sssss ssssssssss ssss
1n2lA sssss ssssssssss ssss

TARGET 417 SSSLTAHEPP LLYDLSKDPG ENYNLLGGVA GATPEVLQAL KQLQLLKAQL
1n2lA 417 sssltahepp llydlskdpg enynllg--- gatpevlqal kqlqllkaql
TARGET sss sssss hhhhhhh hhhhhhhhhh
1n2lA sss sssss hhhhhhh hhhhhhhhhh

TARGET 467 DAAVTFGPSQ VARGEDPALQ ICCHPGCTPR PACCHCPD
1n2lA 467 daavtfgpsq vargedpalq icchpgctpr pacchcpd-
TARGET hhh hh sss
1n2lA hhh hh sss
Results
Modelling with single template
In automated mode, a template can also be specified by entering its PDB ID or by uploading a PDB file.
1P49
1P49 shares 39% sequence identity with human arylsulfatase A, the highest identity among our templates.
2VQR
2VQR shares 20% sequence identity with human arylsulfatase A, the lowest identity among our templates.
References
<references />