Homology Modeling of ARS A
HHpred
We used the webserver and
Modeller
Modeller is a program for comparative modeling of the 3D structure of a protein with unknown structure. It provides different methods for calculating the initial target-template alignment. Given the alignments, Modeller generates the backbone and optimizes a probability function reflecting spatial restraints. The input alignments can be either pairwise sequence alignments (for single-template modeling) or multiple sequence alignments (for multiple-template modeling). <ref>A. Sali, T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993</ref>
In this section, we apply Modeller to model the 3D structure of ARSA and compare the results to the known structure from PDB. We wrote a tutorial ( Using Modeller for TASK 4 ) comprising all steps needed for the following analysis. It provides generic scripts and example code and executes all methods using default parameters.
Proteins used as templates
From the previous alignment TASK (see Alignment TASK), we selected four proteins which might serve as suitable templates for the modeling. The proteins are listed in the table below. The information about active and binding sites was obtained from UniProt and serves as additional information for the manual modification of the alignments, with the aim of improving the accuracy of the models. Interestingly, our potential templates, which were identified by the database searches, include all homologs with known structure according to HSSP.
SeqIdentifier | Seq Identity (from TASK 2) | source | Protein function | True homolog (HSSP) | Seq Identity (pairw. ali.) | Active site | Substrate binding site | Metal binding site |
---|---|---|---|---|---|---|---|---|
1P49 | 39.0% | Homo Sapiens | Steryl-Sulfatase | yes | 31.9% | 136 | 333, 459 | 35, 36, 75, 342, 343 |
1FSU | 28.0% | Homo Sapiens | Arylsulfatase B | yes | 26.5% | 147 | 145, 242, 318 | 53, 54, 91, 300, 301 |
2VQR | 20.0% | Rhizobium leguminosarum | Monoester Hydrolase | no | 20.3% | not avail. | not avail. | 12, 57, 324, 325 |
3ED4 | 32.0% | Escherichia coli | Arylsulfatase | yes | 27.7% | not avail. | not avail. | not avail. |
ARSA | - | Homo Sapiens | - | - | - | 125 | 123, 150, 229, 302 | 29, 30, 69, 281, 282 |
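Whether two annotated sites from the table end up in the same alignment column can be checked by mapping residue numbers to columns. The following is a minimal sketch, assuming that the site numbers refer to 1-based positions in the ungapped sequence of each alignment row; the function names and the toy rows are hypothetical and not part of our scripts.

```python
def residue_to_column(aligned_seq, residue_number, first_residue=1):
    """Return the 0-based alignment column of a residue, or None if absent.

    aligned_seq: one row of an alignment, with '-' marking gaps.
    residue_number: residue number in the ungapped sequence.
    first_residue: number of the first residue in the row (assumed to be 1).
    """
    current = first_residue - 1
    for column, symbol in enumerate(aligned_seq):
        if symbol != "-":
            current += 1
            if current == residue_number:
                return column
    return None


def sites_aligned(target_row, template_row, target_site, template_site):
    """True if both annotated residues fall into the same alignment column."""
    target_col = residue_to_column(target_row, target_site)
    template_col = residue_to_column(template_row, template_site)
    return target_col is not None and target_col == template_col


# Toy rows for illustration only, not taken from the ARSA alignments.
target_row = "MK-CDE"
template_row = "MKACDE"
# Residue 3 of the target and residue 4 of the template share a column.
print(sites_aligned(target_row, template_row, 3, 4))
```

If the residue numbers follow the PDB numbering instead of the sequence position, first_residue (or an explicit offset) would have to be adapted.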
Single template modelling
In order to predict the structure using a single template structure, Modeller needs pairwise sequence alignments in PIR format. Modeller provides two different methods to calculate pairwise sequence alignments. alignment.malign() uses classical dynamic programming to align two sequences. alignment.align2d() also uses a dynamic programming approach, but includes structural information to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both alignment methods and created eight pairwise sequence alignments of the above templates with the target. Then we modeled the structure with default parameters using the automodel() class. The scripts used for this purpose can be seen in our protocol: Using Modeller for TASK 4 .
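As an illustration of the PIR input mentioned above, the following sketch writes a target sequence as a PIR entry in the layout Modeller reads: a ">P1;code" line, a header line with ten colon-separated fields, and the sequence terminated by "*". The helper name is hypothetical, and the sequence is only the first residues of the target as they appear in the alignments on this page.

```python
def pir_sequence_entry(code, sequence, width=75):
    """Format an ungapped target sequence as a PIR entry for Modeller."""
    header = ">P1;" + code
    # Ten colon-separated fields; for a sequence without structure only
    # the entry type and the code are filled in.
    fields = ["sequence", code, "", "", "", "", "", "", "0.00", " 0.00"]
    terminated = sequence + "*"  # '*' marks the end of the sequence
    chunks = [terminated[i:i + width] for i in range(0, len(terminated), width)]
    return "\n".join([header, ":".join(fields)] + chunks) + "\n"


# Fragment of the target sequence, for illustration only.
entry = pir_sequence_entry("1AUK", "RPPNIVLIFADDLGYGDLGCYGHPSSTT")
print(entry)
```

Template entries use "structureX" instead of "sequence" and additionally name the PDB file and the residue range; they are written by Modeller itself when an alignment is saved in PIR format.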
Next, we calculated RMSD and TM scores of the models to get a first impression on how much the models deviate from the original structure. The results are depicted in the table below.
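For orientation, both measures can be written down in a few lines. The sketch below assumes that the two structures are already superimposed and that the distances between equivalent C-alpha atoms (in Å) are given; the TMscore program additionally searches for the superposition that maximizes the score, which is not done here. The function names are hypothetical.

```python
import math


def rmsd(distances):
    """Root mean square deviation over paired atom distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score."""
    if target_length <= 18:
        # The formula below is not positive for very short chains.
        raise ValueError("d0 formula needs a target length above 18")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for one fixed superposition, normalized by target length."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy distances, not measured on our models.
toy = [0.5, 1.0, 2.0, 4.0] * 25
print(rmsd(toy), tm_score(toy, target_length=100))
```

In contrast to the RMSD, each residue pair contributes at most 1/L to the TM-score, so a few badly placed loops cannot dominate the result.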
Furthermore, we visualised the models using PyMOL. We loaded both structures into the program and performed a structural alignment to superimpose and compare them visually. The PyMOL commands and the images are shown below:
align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
cmd.png("MODEL.png")
[Image table: models built from the templates 1P49, 2VQR, 1FSU and 3ED4, one row per alignment method (classical dynamic programming; dynamic programming with structural information from the template).]
[Image table for 3ED4: models built from the chains 3ED4A, 3ED4B, 3ED4C and 3ED4D, aligned by dynamic programming with structural information from the template.]
Modification of Alignments
Using 1P49 as template structure for the modeling process yielded the best results, so we decided to modify this alignment manually to see whether the model could be improved. We made sure that there are no gaps in secondary structure elements and modified the alignment such that the active site, substrate binding sites and metal binding sites were aligned. For the modification of the initial alignments, we used Jalview. Altogether, we made the following changes:
- The gap between residues 74 and 75 in 1P49 was removed to align metal-binding site 75 of 1P49 with metal-binding site 69 of ARSA. This also brought both active sites into alignment, which were not aligned in the initial alignment. The region around the active site is well conserved between both enzymes. However, the conserved stretch is shifted relative to the active sites, so the amino acids at the active sites differ, and aligning both sites reduces the number of aligned conserved residues within this region.
After this change we calculated one model. The TM-score dropped to 0.6940. This might be because the amino acids in the region around the active sites are conserved, but aligning the active sites themselves decreases the alignment quality (as described above). Normally, one does not have information about the secondary structure of the target sequence, but in our case this information was available, so we modified the alignments such that gaps within secondary structure elements were avoided.
- The gap between residue 154 and 155 was moved out of the beta strand between residues 152 and 153.
- The gap between residues 190-191 was moved out of beta strand between residues 191-192.
- All gaps within the helix from residue 197-214 were moved out of the helix (at the right border).
- The gap between 290-291 was moved to the right end of the helix.
Surprisingly, the TM-score decreased even further, to 0.5561.
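The gap positions listed above were found by inspecting the alignment in the viewer. A check of this kind could also be scripted; the following is a minimal sketch, assuming the secondary structure elements are given as 1-based residue ranges in the numbering of the same sequence whose alignment row is checked. The function name and the toy data are hypothetical.

```python
def gaps_inside_elements(aligned_seq, elements):
    """List gap columns that lie strictly inside a secondary structure element.

    aligned_seq: alignment row with '-' as gap symbol.
    elements: (first, last) residue ranges, 1-based, ungapped numbering.
    Returns one (residue_before_gap, (first, last)) tuple per gap column.
    """
    hits = []
    residue = 0  # number of the last residue seen so far
    for symbol in aligned_seq:
        if symbol != "-":
            residue += 1
            continue
        # This gap column sits between residue and residue + 1.
        for first, last in elements:
            if first <= residue and residue + 1 <= last:
                hits.append((residue, (first, last)))
    return hits


# Toy row with a two-column gap inside a strand spanning residues 2-5.
print(gaps_inside_elements("ABC--DEFG", [(2, 5)]))
# A gap directly before the element is not reported.
print(gaps_inside_elements("A--BCDEFG", [(2, 5)]))
```

Gaps at the border of an element are deliberately not reported, since moving a gap to the border is exactly the kind of modification described above.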
TM-scores and RMSD of the single template models
We downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using
gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f
TMscores were calculated as follows:
./TMscore MODEL.pdb REAL_STRUCTURE.pdb
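When many models are scored, the two values of interest can be collected from the program output with a small wrapper. The sketch below is not part of our protocol; it assumes that the output contains lines of the form "TM-score    = ..." and "RMSD of  the common residues= ...", and the sample text uses made-up numbers. The function names are hypothetical.

```python
import re
import subprocess

TM_PATTERN = re.compile(r"^\s*TM-score\s*=\s*([0-9.]+)", re.MULTILINE)
RMSD_PATTERN = re.compile(
    r"^\s*RMSD of\s+the common residues\s*=\s*([0-9.]+)", re.MULTILINE
)


def parse_tmscore_output(text):
    """Return (tm_score, rmsd) as floats; None for a value that is missing."""
    tm = TM_PATTERN.search(text)
    rmsd = RMSD_PATTERN.search(text)
    return (
        float(tm.group(1)) if tm else None,
        float(rmsd.group(1)) if rmsd else None,
    )


def run_tmscore(model_pdb, native_pdb, binary="./TMscore"):
    """Run the compiled TMscore binary and parse its standard output."""
    result = subprocess.run(
        [binary, model_pdb, native_pdb], capture_output=True, text=True
    )
    return parse_tmscore_output(result.stdout)


# Made-up sample in the assumed output layout, for illustration only.
sample = """\
Number of residues in common=  100
RMSD of  the common residues=    3.000

TM-score    = 0.5000  (d0= 3.65)
"""
print(parse_tmscore_output(sample))
```

If the output layout of the installed TMscore version differs, only the two regular expressions need to be adapted.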
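Alignment method | PDB Identifier | TM-score | RMSD |
---|---|---|---|
Dynamic programming with structural information | 1P49 | 0.7960 | - |
Dynamic programming with structural information | 2VQR | 0.4825 | - |
Dynamic programming with structural information | 1FSU | 0.7146 | - |
Dynamic programming with structural information | 3ED4 | 0.3881 | - |
Dynamic programming with structural information | 3ED4A | 0.7268 | - |
Dynamic programming with structural information | 3ED4B | 0.7251 | - |
Dynamic programming with structural information | 3ED4C | 0.6518 | - |
Dynamic programming with structural information | 3ED4D | 0.7303 | - |
Dynamic programming without structural information | 1P49 | 0.7731 | - |
Dynamic programming without structural information | 2VQR | 0.3183 | - |
Dynamic programming without structural information | 1FSU | 0.7223 | - |
Dynamic programming without structural information | 3ED4 | 0.3122 | - |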
Multiple Template Modeling
We calculated three models:
- Model 1 was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
- Model 2 was calculated from an MSA of ARS A and the three proteins/chains which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
- Model 3 was calculated from an MSA of ARS A and the two proteins/chains which yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
[Images: Model 1, Model 2, Model 3]
PDB Identifier | TM-score | RMSD |
---|---|---|
model1 | 0.5409 | - |
model2 | 0.6701 | - |
model3 | 0.6819 | - |
Initial multiple structural alignment:
from modeller import *
log.verbose()
env = environ()
env.io.atom_files_directory = './:./'
aln = alignment(env)
# Replace the placeholder (code, chain) pairs with the templates to be aligned
for (code, chain) in (('PROTEIN', 'CHAIN'), ('ANOTHER_PROTEIN', 'ANOTHER_CHAIN'), ...):
    mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
    aln.append_model(mdl, atom_files=code, align_codes=code+chain)
aln.salign()
aln.write(file='mymsa.pap', alignment_format='PAP')
aln.write(file='mymsa.ali', alignment_format='PIR')
Add target sequence to MSA:
from modeller import *
log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
# Read aligned structure(s):
aln = alignment(env)
aln.append(file='mymsa.ali', align_codes='all')
aln_block = len(aln)
# Read aligned sequence(s):
aln.append(file='1AUK.pir', align_codes='1AUK')
# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign()
aln.write(file='mymsa-1AUK.ali', alignment_format='PIR')
aln.write(file='mymsa-1AUK.pap', alignment_format='PAP')
Calculate the model:
from modeller import *
from modeller.automodel import *
env = environ()
a = automodel(env, alnfile='msa2-1AUK.ali',
              knowns=('PROTEIN', 'ANOTHER_PROTEIN', ...), sequence='1AUK')
a.starting_model = 1
a.ending_model = 1
a.make()
Modification of Alignments
We modified the alignments such that all active sites were aligned. The TM-score dropped to 0.5685.
iTasser
iTasser is a server for modelling 3D structures by homology. Function predictions are provided as well. Under the name Zhang-Server, iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9.
iTasser uses a threading approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>
The confidence of a model is measured with the C-score, which is based on the significance of the template alignments and the convergence parameters of the structure assembly simulations. The typical range for the C-score is [-5,2], where a higher C-score means higher confidence in the model. <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S72828/cscore.txt</ref>
Modelling without template
As one can see, model 1 is the model with the highest confidence. Model 1 has a TM-score of 0.84 ± 0.08 and an RMSD of 5.3 ± 3.4 Å.
Modelling with single template
Discussion
SWISS-MODEL
SWISS-MODEL is an online tool for modelling 3D structures <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. There are three different modes available: automated mode, alignment mode and project mode. We only used the automated mode.
Modelling without template
In automated mode without template, suitable templates are selected by a BLAST run using an e-value threshold. The mode is used by pasting a protein sequence or a UniProt AC code into a text field.
Template
SWISS-MODEL identified 1N2LA as the best template.
The name of the PDB entry 1N2L is "Crystal structure of a covalent intermediate of endogenous human arylsulfatase A". The result should therefore be very good, as we are using a human arylsulfatase A as template.
As expected, the alignment quality was very high:
TARGET 19 RPPNIVLI FADDLGYGDL GCYGHPSSTT PNLDQLAAGG LRFTDFYVPV
1n2lA 19 rppnivli faddlgygdl gcyghpsstt pnldqlaagg lrftdfyvpv
TARGET sssss ss hhhhhhhh ssss sss
1n2lA sssss ss hhhhhhhh ssssssss
TARGET 67 SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY
1n2lA 67 sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy
TARGET hhhhh hh hhh ss s hhhhhhhh
1n2lA hhhhhh hh hh ss s hhhhhhhh
TARGET 117 LTGMAGKWHL GVGPEGAFLP PHQGFHRFLG IPYSHDQGPC QNLTCFPPAT
1n2lA 117 ltgmagkwhl gvgpegaflp phqgfhrflg ipyshdqgpc qnltcfppat
TARGET ssssssss sss sss hhh ssss s sss s
1n2lA ssssssss sss sss hhh ssss s sss s
TARGET 167 PCDGGCDQGL VPIPLLANLS VEAQPPWLPG LEARYMAFAH DLMADAQRQD
1n2lA 167 pcdggcdqgl vpipllanls veaqppwlpg learymafah dlmadaqrqd
TARGET ss ssss s ss hhhhhhhhh hhhhhhhh
1n2lA ss ssss s ss hhhhhhhhh hhhhhhhh
TARGET 217 RPFFLYYASH HTHYPQFSGQ SFAERSGRGP FGDSLMELDA AVGTLMTAIG
1n2lA 217 rpfflyyash hthypqfsgq sfaersgrgp fgdslmelda avgtlmtaig
TARGET ssssssss h hhhhhhhhhh hhhhhhhhhh
1n2lA ssssssss h hhhhhhhhhh hhhhhhhhhh
TARGET 267 DLGLLEETLV IFTADNGPET MRMSRGGCSG LLRCGKGTTY EGGVREPALA
1n2lA 267 dlglleetlv iftadngpet mrmsrggcsg llrcgkgtty eggvrepala
TARGET hh ssss sss sss sss
1n2lA h ssss sss hhhsss sss
TARGET 317 FWPGHIAPGV THELASSLDL LPTLAALAGA PLPNVTLDGF DLSPLLLGTG
1n2lA 317 fwpghiapgv thelassldl lptlaalaga plpnvtldgf dlsplllgtg
TARGET s ss s hhh hhhhhhhh hhhhh
1n2lA s ss s ssshhh hhhhhhhh hhhhh
TARGET 367 KSPRQSLFFY PSYPDEVRGV FAVRTGKYKA HFFTQGSAHS DTTADPACHA
1n2lA 367 ksprqslffy psypdevrgv favrtgkyka hfftqgsahs dttadpacha
TARGET sssss ssssssssss ssss
1n2lA sssss ssssssssss ssss
TARGET 417 SSSLTAHEPP LLYDLSKDPG ENYNLLGGVA GATPEVLQAL KQLQLLKAQL
1n2lA 417 sssltahepp llydlskdpg enynllg--- gatpevlqal kqlqllkaql
TARGET sss sssss hhhhhhh hhhhhhhhhh
1n2lA sss sssss hhhhhhh hhhhhhhhhh
TARGET 467 DAAVTFGPSQ VARGEDPALQ ICCHPGCTPR PACCHCPD
1n2lA 467 daavtfgpsq vargedpalq icchpgctpr pacchcpd-
TARGET hhh hh sss
1n2lA hhh hh sss
Results
As one can see in the images above, the model quality is quite good, with uncertainties mainly in the loop regions. The result is not really surprising, as 1N2L is the structure of a human arylsulfatase A.
Modelling with single template
It is possible to specify a template in automated mode by specifying a PDB ID or by uploading a PDB file.
1P49
1P49 has 39% sequence identity with human arylsulfatase, which is the highest identity among all our templates. Even so, this sequence identity is low, and the alignment quality was rather poor:
TARGET 1 RPPNIV LIFADDLGYG DLGCYGHPSS TTPNLDQLAA GGLRFTDFYV
userX 23 aa--srpnii lvmaddlgig dpgcygnkti rtpnidrlas ggvkltqhla
TARGET sss ssss hhhh ssssssss
userX sss ssss hhhh sss ssss
TARGET 47 PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SS-----RGG LPLEEVTVAE
userX 71 a-spltpsra afmtgrypvr sgmaswsrtg vflftassgg lptdeitfak
TARGET hhh hhh hh h hhh
userX hhhh hhhh hh h hhh
TARGET 92 VLAARGYLTG MAGKWHLGVG PEG----AFL PPHQGFHRFL GIPYSHDQGP
userX 121 llkdqgysta ligkwhlgms chsktdfchh plhhgfnyfy gisltnlrdc
TARGET hhhh sss sssss sss ss
userX hhhh sss sssss sss ss
TARGET 138 CQNLT-CFPP ATPCDG---- ---------- ---------- ---------G
userX 171 kpgegsvftt gfkrlvflpl qivgvtlltl aalnclgllh vplgvffsll
TARGET hh hhhhh
userX hhh hhhh hhh hhhhhhhhhh hhhhhh hhhhhhh
TARGET 154 CD--QGLVPI PLLANLSVEA QP-------- ----PWLPGL EARYMAFAHD
userX 221 flaaliltlf lgflhyfrpl ncfmmrnyei iqqpmsydnl tqrltveaaq
TARGET hhhh hhhhhhh hhhhhhhhh
userX hhhhhhhhhh hhhhhhhhhh ssss ss sss h hhhhhhhhhh
TARGET 190 LMADAQRQDR PFFLYYASHH THYPQFSGQS FAERSGRGPF GDSLMELDAA
userX 271 fiq--rntet pfllvlsylh vhtalfsskd fagksqhgvy gdaveemdws
TARGET hh ssssssss hh hhhhhhhhhh
userX hhh ssssssss hh hhhhhhhhhh
TARGET 240 VGTLMTAIGD LGLLEETLVI FTADNGPETM RM-----SRG GCSGLLRCGK
userX 319 vgqilnllde lrlandtliy ftsdqgahve evsskgeihg gsngiykggk
TARGET hhhhhhhhhh h sssss ssss
userX hhhhhhhhhh h sssss ssss sss sss
TARGET 285 GTTYEGGVRE PALAFWPGHI -APGVTHELA SSLDLLPTLA ALAGAPLPN-
userX 369 annweggirv pgilrwprvi qagqkidept snmdifptva klagaplped
TARGET hhsss ssss sss s sshhhhhhhh hhh
userX sss ssss s sshhhhhhhh hhh
TARGET 333 VTLDGFDLSP LLLGTGKSPR QSLFFYPS-- YPDEVRGVFA VRTGKYKAHF
userX 419 riidgrdlmp llegksqrsd heflfhycna ylnavrwhpq nstsiwkaff
TARGET hh hhh hhhhhh h ssssss
userX hh hhh sssssss ssssssss ssssss
TARGET 381 FTQGSAHSDT TADPACHASS SLTAHEPPLL YDLSKDPGEN YNLLGGVAGA
userX 469 ftpnfnpvcf athvcfcfgs yvthhdppll fdiskdprer nplt----pa
TARGET ss sss ss sss sss sss
userX ss sss ss sss
TARGET 431 TPEVLQAL-K QLQLLKAQLD AAVTFGPSQV A---RGEDPA LQICCHPGCT
userX 519 seprfyeilk vmqeaadrht qtlpevpdqf swnnflwkpw lqlccp---s
TARGET hh hh hhhhhhhhh
userX hhh hhhhhhhhhh
TARGET 477 PRPACC ---
userX 566 tglscqcdre
TARGET
userX sss
Results
As one can see in the images above, the model quality is not really good, because the template seems to be too distantly related to the target.
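The sequence identity quoted for a template can be recomputed from any pairwise alignment. The following is a minimal sketch that counts identical residues over the columns in which both rows hold a residue; other definitions divide by the alignment length or by the length of the shorter sequence, so the resulting values are not necessarily comparable with those reported by the database searches. The function name and the toy rows are hypothetical.

```python
def percent_identity(row_a, row_b):
    """Percent identity over columns where both rows hold a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap are not counted
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


# Toy rows: five aligned columns, four of them identical.
print(percent_identity("ACD-EF", "acdgey"))
```

The rows are compared case-insensitively because the alignments on this page print the target in upper case and the template in lower case.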
2VQR
2VQR has 20% sequence identity with human arylsulfatase, which is the lowest identity among all our templates. With this even lower sequence identity, the alignment quality was very poor:
TARGET 2 PPNIVLIFAD DLGYGDLG-- --CYGHPS-S TTPNLDQLAA GGLRFTDFYV
userX 3 kknvllivvd qwradfvphv lradgkidfl ktpnldrlcr egvtfrnhvt
TARGET sssssss hhhhhhhh h ssssssss
userX sssssss hhh hhhh hhhhhhhh h ssssssss
TARGET 47 PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SSRGGLPLEE VTVAEVLAAR
userX 53 tcvpxgpara slltglylmn hravqntv-- ----pldqrh lnlgkalrgv
TARGET hhhhhh hhh hhh h hhhhhhhh
userX hhhh hhh hhh h hhhhhh
TARGET 97 GYLTGMAGKW HLGVGPEGAF LPPHQGFHRF LGIPYSHDQG PCQ-----NL
userX 97 gydpaligyt ttvpdprtt- spndprfrvl gdlmdgfhpv gafepnmegy
TARGET ssss hh sss h
userX ssss hh sss hhh
TARGET 142 TCFPPATPCD GGCD-----Q GLVPIPLLAN LSVEAQPPWL PGLEARYMAF
userX 146 fgwvaqngfd lpehrpdiwl pegedavaga tdrpsripke fsdstffter
TARGET hhhhhh hhhhhhhh
userX hhhhhh hhhhhhhh
TARGET 187 AHDLMADAQR QDRPFFLYYA SHHTHYPQFS GQSFAERSGR ----------
userX 196 altylkg--r dgkpfflhlg yyrphppfva sapyhamyrp edmpapiraa
TARGET hhhhhh ssssss s
userX hhhhhhh h ssssss s
TARGET 227 ---------- ---------- ---------- ---------- ----GPFGDS
userX 244 npdieaaqhp lmkfyvdsir rgsffqgaeg sgatldeael rqmratycgl
TARGET hhhh
userX hhhhhh hhhhhhhhss s s ss hhhh hhhhhhhhhh
TARGET 233 LMELDAAVGT LMTAIGDLGL LEETLVIFTA DNGPETMRMS RGGCSGLLRC
userX 294 itevddclgr vfsyldetgq wddtliifts dhgeqlgdhh ll--------
TARGET hhhhhhhhhh hhhhhh h ssssss s
userX hhhhhhhhhh hhhhhh h ssssss s
TARGET 283 GKGTTYEGGV REPALAFWPG HI--APGVTH ELASSLDLLP TLAALAGAPL
userX 336 gkigyndpsf riplvikdag enaragaies gftesidvmp tildwlggki
TARGET hhs ss ssss sss ssshhhhh hhhhhh
userX hhs ss ssss sss ssshhhhh hhhhhh
TARGET 331 PNVTLDGFDL SPLLLGTGKS PRQ-SLFFYP SYP------- -------DEV
userX 386 ph-acdglsl lpflsegrpq dwrtelhyey dfrdvyysep qsflglgmnd
TARGET hhh sssss
userX hhh sssss ss hhhh
TARGET 366 RGVFAVRTGK YKAHFFTQGS AHSDTTADPA CHASSSLTAH EPPLLYDLSK
userX 435 cslcviqder ykyvhfaa-- ---------- ---------- lpplffdlrh
TARGET sssssss s sssssss s ss ss s sssss
userX ssssssss s sssssss sssss
TARGET 416 DPGENYNLLG GVAGATPEVL QAL-KQLQLL KAQLDAA -- ----------
userX 463 dpneftnlad d--payaalv rdyaqkalsw rlkhadrtlt hyrsgpegls
TARGET hhhh hh hhhh hhh
userX hhh hhhhhhhhhh hhh sssss sss
TARGET ----
userX 511 ersh
TARGET
userX ss
Due to the low alignment quality, the model quality was so low that the only result was a plot of the local quality.
References
<references />