Homology Modeling of ARS A
HHpred
We used the webserver and
Modeller
We wrote a modeling tutorial (Using Modeller for TASK 4) covering all steps required for the following analysis.
Proteins used as templates
We identified the following proteins (see Alignment TASK) as potential templates for homology modeling:
Identifier | Seq. identity (from TASK 2) | Source | Protein function | True homolog (HSSP) | Seq. identity (pairwise alignment) | Active site | Substrate binding site | Metal binding site |
---|---|---|---|---|---|---|---|---|
1P49 | 39.0% | Homo sapiens | Steryl-Sulfatase | yes | 31.9% | 136 | 333, 459 | 35, 36, 75, 342, 343 |
1FSU | 28.0% | Homo sapiens | Arylsulfatase B | yes | 26.5% | 147 | 145, 242, 318 | 53, 54, 91, 300, 301 |
2VQR | 20.0% | Rhizobium leguminosarum | Monoester Hydrolase | no | 20.3% | not avail. | not avail. | 12, 57, 324, 325 |
3ED4 | 32.0% | Escherichia coli | Arylsulfatase | yes | 27.7% | not avail. | not avail. | not avail. |
ARSA | - | Homo sapiens | - | - | - | 125 | 123, 150, 229, 302 | 29, 30, 69, 281, 282 |
According to HSSP, the potential templates identified by our database searches include all homologs of known structure.
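The pairwise identities in the table can be recomputed from any gapped alignment. The following is a minimal sketch, assuming identity is counted only over columns where both sequences have a residue; the source does not state which denominator was used, so the numbers it produces may differ slightly from the table. The function name `percent_identity` and the toy sequences are illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns in which neither sequence has a gap.

    Assumption: gaps are written as '-' or '.', and the denominator is the
    number of residue-residue columns (other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    gap_chars = "-."
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment: 6 residue-residue columns, 5 of them identical
print(round(percent_identity("ACD-EFG", "ACDKEYG"), 1))
```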
Single template modelling
To predict the structure from a single template, Modeller needs a pairwise sequence alignment in PIR format. Modeller provides two methods for calculating such alignments. alignment.malign() uses classical dynamic programming to align the sequences. alignment.align2d() also uses dynamic programming, but includes structural information from the template to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both methods and created eight pairwise sequence alignments of the above templates with the target. The script used for this purpose is shown below:
from modeller import *

env = environ()
aln = alignment(env)
# Read the template structure and add its sequence to the alignment
mdl = model(env, file='template_name', model_segment=('FIRST:@', 'END:'))
aln.append_model(mdl, align_codes='template_name', atom_files='template_name')
# Add the target sequence from its PIR file
aln.append(file='1AUK.pir', align_codes='target_name')
# Alignment using structural information from the template
aln.align2d()
aln.check()
aln.write(file='target-template-2d.ali', alignment_format='PIR')
# Alignment by classical dynamic programming
aln.malign()
aln.check()
aln.write(file='target-template.ali', alignment_format='PIR')
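The script above reads the target from a PIR file (1AUK.pir). As a rough sketch, such a file could be written from a plain sequence string as follows. The helper name `to_modeller_pir` is hypothetical, the header fields follow the layout used in the Modeller tutorial for a sequence-only entry, and the example sequence is only the fragment of the target that appears in the SWISS-MODEL alignment on this page, not the full protein.

```python
def to_modeller_pir(code: str, sequence: str, width: int = 75) -> str:
    """Format one sequence as a PIR entry for a target without structure.

    Assumption: a sequence-only entry uses the ten-field description line
    'sequence:<code>:::::::0.00: 0.00' and ends the sequence with '*'.
    """
    seq = "".join(sequence.split()).upper() + "*"
    header = ">P1;" + code
    fields = "sequence:" + code + ":::::::0.00: 0.00"
    # Break the sequence into fixed-width lines
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, fields] + body) + "\n"


# Fragment of the target sequence only (illustrative)
fragment = "RPPNIVLIFADDLGYGDLGCYGHPSSTTPNLDQLAAGGLRFTDFYVPV"
print(to_modeller_pir("1AUK", fragment), end="")
```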
From these alignments we built eight models, using the following script:
from modeller import *
from modeller.automodel import *

log.verbose()
env = environ()
a = automodel(env,
              alnfile='1AUK-1FSU-2d.ali',   # target-template alignment
              knowns='1FSU',                # template structure
              sequence='1AUK',              # target sequence
              assess_methods=(assess.DOPE, assess.GA341))
# Build exactly one model
a.starting_model = 1
a.ending_model = 1
a.make()
We modified the paths and filenames in the scripts so that they matched our proteins of interest.
Next, we calculated RMSD and TM-scores of the models to get a first impression of how much the models deviate from the original structure. The results are depicted in the table below.
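For orientation, the RMSD between a model and the reference can be sketched as below. This is a simplified illustration, assuming the two structures are already superimposed and that the coordinate lists contain the same residues in the same order (for example, C-alpha atoms); it does not perform the superposition itself, and the function name `rmsd` is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists.

    Assumption: structures are already superimposed and paired one-to-one.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two atoms, one displaced by 2 Angstrom along y
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 2, 0)]))
```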
We then visualised the models using PyMOL. We loaded both structures into the program and performed a structural alignment to superimpose and compare them visually. The PyMOL commands and the images are shown below:
align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
cmd.png("MODEL.png")
[Figure: images of the models built from 1P49, 2VQR, 1FSU and 3ED4, one row per alignment method: classical dynamic programming, and dynamic programming with structural information from the template]
3ED4
[Figure: images of the models built from the chains 3ED4A, 3ED4B, 3ED4C and 3ED4D, aligned by dynamic programming with structural information from the template]
Modification of Alignments
Using 1P49 as the template structure yielded the best results, so we manually modified this alignment to see whether the model could be improved. We made sure that there were no gaps in secondary structure elements and modified the alignment so that the active site, substrate binding sites and metal binding sites were aligned. Altogether, we made the following changes:
- The gap between residue 74 and 75 in 1P49 was removed to align metal-binding site 75 with metal-binding site 69. This also induced the alignment of both active sites, which were not aligned in the initial alignment.
- The gap between residue 154 and 155 was moved out of the beta strand between residues 152 and 153.
- The gap between residues 190-191 was moved out of beta strand between residues 191-192.
- All gaps within the helix from residue 197-214 were moved out of the helix (at the right border).
- The gap between 290-291 was moved to the right end of the helix.
We used Jalview for these changes. Figures of the alignment before and after the changes are depicted on the right.
Surprisingly, the TM-score decreased to 0.5561, well below the scores of both unmodified 1P49 models (0.7960 and 0.7731).
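The two manual checks described above (functional residues sharing an alignment column, and no gaps inside secondary structure elements) can also be scripted. The sketch below is an illustration under stated assumptions: residues are numbered consecutively from a given start without insertion codes or numbering breaks, which is not guaranteed for PDB files, and all function names and the toy alignment are hypothetical.

```python
def residue_columns(aligned_seq, first_resnum=1, gap_chars="-."):
    """Map residue number -> 0-based alignment column.

    Assumption: consecutive numbering starting at first_resnum.
    """
    columns = {}
    resnum = first_resnum
    for col, ch in enumerate(aligned_seq):
        if ch in gap_chars:
            continue
        columns[resnum] = col
        resnum += 1
    return columns


def sites_aligned(target_aln, template_aln, site_pairs,
                  target_start=1, template_start=1):
    """For each (target_residue, template_residue) pair, report whether
    both residues sit in the same alignment column."""
    t_cols = residue_columns(target_aln, target_start)
    m_cols = residue_columns(template_aln, template_start)
    result = {}
    for t_res, m_res in site_pairs:
        col = t_cols.get(t_res)
        result[(t_res, m_res)] = col is not None and col == m_cols.get(m_res)
    return result


def gaps_inside(aligned_seq, first_col, last_col, gap_chars="-."):
    """Return the columns in [first_col, last_col] that hold a gap."""
    return [c for c in range(first_col, last_col + 1)
            if aligned_seq[c] in gap_chars]


# Toy alignment, not the real ARSA/1P49 alignment
target = "AB-CDE"
template = "ABXCD-"
print(sites_aligned(target, template, [(3, 4), (3, 3)]))
print(gaps_inside(target, 1, 3))
```

With the real alignment, a call such as `sites_aligned(arsa_aln, p49_aln, [(69, 75)])` would test the metal-binding pair mentioned in the first list item, provided the start numbers match the sequences in the file.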
TM-scores and RMSD of the single template models
We downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using
gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f
TMscores were calculated as follows:
./TMscore MODEL.pdb REAL_STRUCTURE.pdb
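To make the reported numbers easier to interpret, the TM-score formula can be written out as a short sketch. It evaluates the score for one given set of per-residue distances; the TMscore program additionally searches for the superposition that maximizes the score, which this sketch does not attempt. The function names are illustrative, and rejecting very short targets (where the length-dependent scale d0 would not be positive) is a simplification made here.

```python
def tm_d0(l_target: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if l_target <= 15:
        raise ValueError("target too short for this formula")
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short: d0 is not positive")
    return d0


def tm_score(distances, l_target: int) -> float:
    """TM-score for one fixed superposition.

    distances: distance in Angstrom for each aligned residue pair.
    l_target:  length of the reference (target) structure; unaligned
               residues contribute nothing but still count in L.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# A perfect match over all 100 residues gives 1.0
print(tm_score([0.0] * 100, 100))
# Every pair at distance d0 contributes 0.5
print(tm_score([tm_d0(100)] * 100, 100))
```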
PDB Identifier | TM-score | RMSD |
---|---|---|
Dynamic programming with structural information | | |
1P49 | 0.7960 | - |
2VQR | 0.4825 | - |
1FSU | 0.7146 | - |
3ED4 | 0.3881 | - |
3ED4A | 0.7268 | - |
3ED4B | 0.7251 | - |
3ED4C | 0.6518 | - |
3ED4D | 0.7303 | - |
Dynamic programming without structural information | | |
1P49 | 0.7731 | - |
2VQR | 0.3183 | - |
1FSU | 0.7223 | - |
3ED4 | 0.3122 | - |
Multiple Template Modeling
We calculated three models:
- Model 1 was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
- Model 2 was calculated from an MSA of ARS A and the three proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
- Model 3 was calculated from an MSA of ARS A and the two proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
[Figure: images of Model 1, Model 2 and Model 3]
PDB Identifier | TM-score | RMSD |
---|---|---|
model1 | 0.5409 | - |
model2 | 0.6701 | - |
model3 | 0.6819 | - |
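The template sets used for the three models are consistent with ranking the single-template TM-scores (alignments with structural information) and keeping the best chain per PDB entry. The sketch below reproduces that ranking; whether the selection was actually made this way is an assumption, and the function names are illustrative. The scores are copied from the single-template table.

```python
# TM-scores of the single-template models (dynamic programming with
# structural information), taken from the table above
tm_scores = {
    "1P49": 0.7960, "2VQR": 0.4825, "1FSU": 0.7146, "3ED4": 0.3881,
    "3ED4A": 0.7268, "3ED4B": 0.7251, "3ED4C": 0.6518, "3ED4D": 0.7303,
}


def best_per_entry(scores):
    """Keep the best-scoring identifier for each four-character PDB entry."""
    best = {}
    for ident, score in scores.items():
        entry = ident[:4]
        if entry not in best or score > best[entry][1]:
            best[entry] = (ident, score)
    return best


def top_templates(scores, n):
    """Return the n best templates, one per PDB entry, by TM-score."""
    ranked = sorted(best_per_entry(scores).values(),
                    key=lambda pair: pair[1], reverse=True)
    return [ident for ident, _ in ranked[:n]]


for n in (4, 3, 2):
    print(n, top_templates(tm_scores, n))
```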
Initial multiple structural alignment:
from modeller import *
log.verbose()
env = environ()
env.io.atom_files_directory = './:./'
aln = alignment(env)
for (code, chain) in (('PROTEIN', 'CHAIN'), ('ANOTHER_PROTEIN', 'ANOTHER_CHAIN'), ...):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)
aln.salign()
aln.write(file='mymsa.pap', alignment_format='PAP')
aln.write(file='mymsa.ali', alignment_format='PIR')
Add target sequence to MSA:
from modeller import *
log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
# Read aligned structure(s):
aln = alignment(env)
aln.append(file='mymsa.ali', align_codes='all')
aln_block = len(aln)
# Read aligned sequence(s):
aln.append(file='1AUK.pir', align_codes='1AUK')
# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign()
aln.write(file='mymsa-1AUK.ali', alignment_format='PIR')
aln.write(file='mymsa-1AUK.pap', alignment_format='PAP')
Calculate the model:
from modeller import *
from modeller.automodel import *
env = environ()
a = automodel(env, alnfile='mymsa-1AUK.ali',
              knowns=('PROTEIN', 'ANOTHER_PROTEIN', ...), sequence='1AUK')
a.starting_model = 1
a.ending_model = 1
a.make()
Modification of Alignments
iTasser
iTasser is a server that models 3D structures by homology and also provides function predictions. Under the name Zhang-Server, iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9. iTasser uses a threading approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>
Modelling without template
Modelling with single template
Discussion
SWISS-MODEL
SWISS-MODEL is an online tool for modelling 3D structures <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. Three modes are available: automated mode, alignment mode and project mode. We only used the automated mode.
Modelling without template
In automated mode without a template, suitable templates are selected by a BLAST run using an E-value threshold. The mode is used by pasting a protein sequence or a UniProt AC code into a text field.
Template
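SWISS-MODEL identified 1N2L chain A as the best template; the alignment labels it 1n2lA, with a lowercase letter L rather than the digit 1. 1N2L is a crystal structure of human arylsulfatase A itself and should not be confused with PDB entry 1N21, "(+)-Bornyl Diphosphate Synthase: Cocrystal with Mg and 3-aza-2,3-dihydrogeranyl diphosphate". Accordingly, the alignment quality was very high, as one can see: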
TARGET 19 RPPNIVLI FADDLGYGDL GCYGHPSSTT PNLDQLAAGG LRFTDFYVPV
1n2lA 19 rppnivli faddlgygdl gcyghpsstt pnldqlaagg lrftdfyvpv
TARGET sssss ss hhhhhhhh ssss sss
1n2lA sssss ss hhhhhhhh ssssssss

TARGET 67 SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY
1n2lA 67 sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy
TARGET hhhhh hh hhh ss s hhhhhhhh
1n2lA hhhhhh hh hh ss s hhhhhhhh

TARGET 117 LTGMAGKWHL GVGPEGAFLP PHQGFHRFLG IPYSHDQGPC QNLTCFPPAT
1n2lA 117 ltgmagkwhl gvgpegaflp phqgfhrflg ipyshdqgpc qnltcfppat
TARGET ssssssss sss sss hhh ssss s sss s
1n2lA ssssssss sss sss hhh ssss s sss s

TARGET 167 PCDGGCDQGL VPIPLLANLS VEAQPPWLPG LEARYMAFAH DLMADAQRQD
1n2lA 167 pcdggcdqgl vpipllanls veaqppwlpg learymafah dlmadaqrqd
TARGET ss ssss s ss hhhhhhhhh hhhhhhhh
1n2lA ss ssss s ss hhhhhhhhh hhhhhhhh

TARGET 217 RPFFLYYASH HTHYPQFSGQ SFAERSGRGP FGDSLMELDA AVGTLMTAIG
1n2lA 217 rpfflyyash hthypqfsgq sfaersgrgp fgdslmelda avgtlmtaig
TARGET ssssssss h hhhhhhhhhh hhhhhhhhhh
1n2lA ssssssss h hhhhhhhhhh hhhhhhhhhh

TARGET 267 DLGLLEETLV IFTADNGPET MRMSRGGCSG LLRCGKGTTY EGGVREPALA
1n2lA 267 dlglleetlv iftadngpet mrmsrggcsg llrcgkgtty eggvrepala
TARGET hh ssss sss sss sss
1n2lA h ssss sss hhhsss sss

TARGET 317 FWPGHIAPGV THELASSLDL LPTLAALAGA PLPNVTLDGF DLSPLLLGTG
1n2lA 317 fwpghiapgv thelassldl lptlaalaga plpnvtldgf dlsplllgtg
TARGET s ss s hhh hhhhhhhh hhhhh
1n2lA s ss s ssshhh hhhhhhhh hhhhh

TARGET 367 KSPRQSLFFY PSYPDEVRGV FAVRTGKYKA HFFTQGSAHS DTTADPACHA
1n2lA 367 ksprqslffy psypdevrgv favrtgkyka hfftqgsahs dttadpacha
TARGET sssss ssssssssss ssss
1n2lA sssss ssssssssss ssss

TARGET 417 SSSLTAHEPP LLYDLSKDPG ENYNLLGGVA GATPEVLQAL KQLQLLKAQL
1n2lA 417 sssltahepp llydlskdpg enynllg--- gatpevlqal kqlqllkaql
TARGET sss sssss hhhhhhh hhhhhhhhhh
1n2lA sss sssss hhhhhhh hhhhhhhhhh

TARGET 467 DAAVTFGPSQ VARGEDPALQ ICCHPGCTPR PACCHCPD
1n2lA 467 daavtfgpsq vargedpalq icchpgctpr pacchcpd-
TARGET hhh hh sss
1n2lA hhh hh sss
Results
As the images above show, the model quality is quite good, with uncertainties mainly in the loop regions. We were surprised that this template was not found in our own search.
Modelling with single template
A template can be specified in automated mode by giving a PDB ID or by uploading a PDB file.
1P49
1P49 has 39% sequence identity with human arylsulfatase A, the highest identity among our templates. Even so, this identity is low, and the alignment quality was rather poor:
TARGET 1 RPPNIV LIFADDLGYG DLGCYGHPSS TTPNLDQLAA GGLRFTDFYV
userX 23 aa--srpnii lvmaddlgig dpgcygnkti rtpnidrlas ggvkltqhla
TARGET sss ssss hhhh ssssssss
userX sss ssss hhhh sss ssss

TARGET 47 PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SS-----RGG LPLEEVTVAE
userX 71 a-spltpsra afmtgrypvr sgmaswsrtg vflftassgg lptdeitfak
TARGET hhh hhh hh h hhh
userX hhhh hhhh hh h hhh

TARGET 92 VLAARGYLTG MAGKWHLGVG PEG----AFL PPHQGFHRFL GIPYSHDQGP
userX 121 llkdqgysta ligkwhlgms chsktdfchh plhhgfnyfy gisltnlrdc
TARGET hhhh sss sssss sss ss
userX hhhh sss sssss sss ss

TARGET 138 CQNLT-CFPP ATPCDG---- ---------- ---------- ---------G
userX 171 kpgegsvftt gfkrlvflpl qivgvtlltl aalnclgllh vplgvffsll
TARGET hh hhhhh
userX hhh hhhh hhh hhhhhhhhhh hhhhhh hhhhhhh

TARGET 154 CD--QGLVPI PLLANLSVEA QP-------- ----PWLPGL EARYMAFAHD
userX 221 flaaliltlf lgflhyfrpl ncfmmrnyei iqqpmsydnl tqrltveaaq
TARGET hhhh hhhhhhh hhhhhhhhh
userX hhhhhhhhhh hhhhhhhhhh ssss ss sss h hhhhhhhhhh

TARGET 190 LMADAQRQDR PFFLYYASHH THYPQFSGQS FAERSGRGPF GDSLMELDAA
userX 271 fiq--rntet pfllvlsylh vhtalfsskd fagksqhgvy gdaveemdws
TARGET hh ssssssss hh hhhhhhhhhh
userX hhh ssssssss hh hhhhhhhhhh

TARGET 240 VGTLMTAIGD LGLLEETLVI FTADNGPETM RM-----SRG GCSGLLRCGK
userX 319 vgqilnllde lrlandtliy ftsdqgahve evsskgeihg gsngiykggk
TARGET hhhhhhhhhh h sssss ssss
userX hhhhhhhhhh h sssss ssss sss sss

TARGET 285 GTTYEGGVRE PALAFWPGHI -APGVTHELA SSLDLLPTLA ALAGAPLPN-
userX 369 annweggirv pgilrwprvi qagqkidept snmdifptva klagaplped
TARGET hhsss ssss sss s sshhhhhhhh hhh
userX sss ssss s sshhhhhhhh hhh

TARGET 333 VTLDGFDLSP LLLGTGKSPR QSLFFYPS-- YPDEVRGVFA VRTGKYKAHF
userX 419 riidgrdlmp llegksqrsd heflfhycna ylnavrwhpq nstsiwkaff
TARGET hh hhh hhhhhh h ssssss
userX hh hhh sssssss ssssssss ssssss

TARGET 381 FTQGSAHSDT TADPACHASS SLTAHEPPLL YDLSKDPGEN YNLLGGVAGA
userX 469 ftpnfnpvcf athvcfcfgs yvthhdppll fdiskdprer nplt----pa
TARGET ss sss ss sss sss sss
userX ss sss ss sss

TARGET 431 TPEVLQAL-K QLQLLKAQLD AAVTFGPSQV A---RGEDPA LQICCHPGCT
userX 519 seprfyeilk vmqeaadrht qtlpevpdqf swnnflwkpw lqlccp---s
TARGET hh hh hhhhhhhhh
userX hhh hhhhhhhhhh

TARGET 477 PRPACC ---
userX 566 tglscqcdre
TARGET
userX sss
Results
As the images above show, the model quality is poor, because the template seems to be too distantly related.
2VQR
2VQR has 20% sequence identity with human arylsulfatase A, the lowest identity among our templates. With this even lower sequence identity, the alignment quality was very poor:
TARGET 2 PPNIVLIFAD DLGYGDLG-- --CYGHPS-S TTPNLDQLAA GGLRFTDFYV
userX 3 kknvllivvd qwradfvphv lradgkidfl ktpnldrlcr egvtfrnhvt
TARGET sssssss hhhhhhhh h ssssssss
userX sssssss hhh hhhh hhhhhhhh h ssssssss

TARGET 47 PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SSRGGLPLEE VTVAEVLAAR
userX 53 tcvpxgpara slltglylmn hravqntv-- ----pldqrh lnlgkalrgv
TARGET hhhhhh hhh hhh h hhhhhhhh
userX hhhh hhh hhh h hhhhhh

TARGET 97 GYLTGMAGKW HLGVGPEGAF LPPHQGFHRF LGIPYSHDQG PCQ-----NL
userX 97 gydpaligyt ttvpdprtt- spndprfrvl gdlmdgfhpv gafepnmegy
TARGET ssss hh sss h
userX ssss hh sss hhh

TARGET 142 TCFPPATPCD GGCD-----Q GLVPIPLLAN LSVEAQPPWL PGLEARYMAF
userX 146 fgwvaqngfd lpehrpdiwl pegedavaga tdrpsripke fsdstffter
TARGET hhhhhh hhhhhhhh
userX hhhhhh hhhhhhhh

TARGET 187 AHDLMADAQR QDRPFFLYYA SHHTHYPQFS GQSFAERSGR ----------
userX 196 altylkg--r dgkpfflhlg yyrphppfva sapyhamyrp edmpapiraa
TARGET hhhhhh ssssss s
userX hhhhhhh h ssssss s

TARGET 227 ---------- ---------- ---------- ---------- ----GPFGDS
userX 244 npdieaaqhp lmkfyvdsir rgsffqgaeg sgatldeael rqmratycgl
TARGET hhhh
userX hhhhhh hhhhhhhhss s s ss hhhh hhhhhhhhhh

TARGET 233 LMELDAAVGT LMTAIGDLGL LEETLVIFTA DNGPETMRMS RGGCSGLLRC
userX 294 itevddclgr vfsyldetgq wddtliifts dhgeqlgdhh ll--------
TARGET hhhhhhhhhh hhhhhh h ssssss s
userX hhhhhhhhhh hhhhhh h ssssss s

TARGET 283 GKGTTYEGGV REPALAFWPG HI--APGVTH ELASSLDLLP TLAALAGAPL
userX 336 gkigyndpsf riplvikdag enaragaies gftesidvmp tildwlggki
TARGET hhs ss ssss sss ssshhhhh hhhhhh
userX hhs ss ssss sss ssshhhhh hhhhhh

TARGET 331 PNVTLDGFDL SPLLLGTGKS PRQ-SLFFYP SYP------- -------DEV
userX 386 ph-acdglsl lpflsegrpq dwrtelhyey dfrdvyysep qsflglgmnd
TARGET hhh sssss
userX hhh sssss ss hhhh

TARGET 366 RGVFAVRTGK YKAHFFTQGS AHSDTTADPA CHASSSLTAH EPPLLYDLSK
userX 435 cslcviqder ykyvhfaa-- ---------- ---------- lpplffdlrh
TARGET sssssss s sssssss s ss ss s sssss
userX ssssssss s sssssss sssss

TARGET 416 DPGENYNLLG GVAGATPEVL QAL-KQLQLL KAQLDAA -- ----------
userX 463 dpneftnlad d--payaalv rdyaqkalsw rlkhadrtlt hyrsgpegls
TARGET hhhh hh hhhh hhh
userX hhh hhhhhhhhhh hhh sssss sss

TARGET ----
userX 511 ersh
TARGET
userX ss
Because of the low alignment quality, the model quality was so low that the only result was a plot of the local quality.
References
<references />