Homology Modeling of ARS A

HHpred

We used the webserver and

Modeller

We wrote a modeling tutorial (Using Modeller for TASK 4) that covers all steps needed for the following analysis.

Proteins used as templates

We identified the following proteins (see Alignment TASK) as potential templates for homology modeling:

Sequence identifier	Seq. identity (from TASK 2)	Source	Protein function	True homolog (HSSP)	Seq. identity (pairwise alignment)	Active site	Substrate binding site	Metal binding site
1P49	39.0%	Homo sapiens	Steryl-Sulfatase	yes	31.9%	136	333, 459	35, 36, 75, 342, 343
1FSU	28.0%	Homo sapiens	Arylsulfatase B	yes	26.5%	147	145, 242, 318	53, 54, 91, 300, 301
2VQR	20.0%	Rhizobium leguminosarum	Monoester Hydrolase	no	20.3%	not avail.	not avail.	12, 57, 324, 325
3ED4	32.0%	Escherichia coli	Arylsulfatase	yes	27.7%	not avail.	not avail.	not avail.
ARSA	-	Homo sapiens	-	-	-	125	123, 150, 229, 302	29, 30, 69, 281, 282

According to HSSP, the potential templates identified by our database searches include all homologs with known structure.

Single template modelling

To predict the structure from a single template, Modeller needs a pairwise sequence alignment in PIR format. Modeller provides two different methods to calculate pairwise sequence alignments. alignment.malign() uses classical dynamic programming to align two sequences. alignment.align2d() also uses dynamic programming, but includes structural information from the template to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both methods to each of the four templates above and thus created eight pairwise sequence alignments with the target. The script used for this purpose is shown below:


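from modeller import *

env = environ()
aln = alignment(env)
# Read the template structure; 'template_name' is a placeholder for the PDB code
mdl = model(env, file='template_name', model_segment=('FIRST:@', 'END:'))
aln.append_model(mdl, align_codes='template_name', atom_files='template_name')
# Add the target sequence (ARS A, PDB code 1AUK) from its PIR file
aln.append(file='1AUK.pir', align_codes='target_name')
# Alignment 1: dynamic programming with structural information from the template
aln.align2d()
aln.check()
aln.write(file='target-template-2d.ali', alignment_format='PIR')
# Alignment 2: classical dynamic programming on the sequences only
aln.malign()
aln.check()
aln.write(file='target-template.ali', alignment_format='PIR')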
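
The script above reads the target from 1AUK.pir. As a rough illustration of what such a file contains, the following sketch writes a sequence-only entry in the PIR layout used in the Modeller tutorials. The helper name to_modeller_pir and the line width are our own choices, and the sequence is only the first residues of the target taken from the SWISS-MODEL alignment further down, not the full ARS A sequence.

```python
def to_modeller_pir(code, sequence, width=75):
    """Return a sequence-only PIR entry (hypothetical helper, not part of Modeller)."""
    seq = "".join(sequence.split()).upper()
    # Line 1: '>P1;' followed by the align code.
    # Line 2: ten colon-separated fields; 'sequence' marks an entry without structure.
    lines = [">P1;" + code, "sequence:" + code + ":::::::0.00: 0.00"]
    # The residues follow, wrapped to 'width' characters and terminated by '*'.
    body = seq + "*"
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines) + "\n"


# Illustrative fragment only (first 18 aligned residues of the target).
entry = to_modeller_pir("1AUK", "RPPNIVLI FADDLGYGDL")
print(entry)
```

The align code passed to aln.append() has to match the code after '>P1;' in the file.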

From these alignments we built eight models using the following script:


from modeller import *
from modeller.automodel import *

log.verbose()
env = environ()
a = automodel(env,
              alnfile  = '1AUK-1FSU-2d.ali',   # target-template alignment (PIR)
              knowns   = '1FSU',               # template structure
              sequence = '1AUK',               # target sequence
              assess_methods=(assess.DOPE, assess.GA341))
# Build a single model
a.starting_model = 1
a.ending_model   = 1
a.make()

We modified the paths and filenames in the scripts so that they matched our proteins of interest.

Next, we calculated RMSD values and TM-scores of the models against the experimental structure of 1AUK to get a first impression of how much the models deviate from it. The results are listed in the section "TM-scores and RMSD of the single template models" below.
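
For orientation, the TM-score sums 1 / (1 + (d_i / d0)^2) over all aligned residue pairs and divides by the length of the target, where d_i is the distance between the i-th pair after superposition and d0 = 1.24 * (L - 15)^(1/3) - 1.8 depends on the target length L. The sketch below only evaluates this formula for a given list of distances. It does not search for the optimal superposition as the TMscore program does, and the function names are our own.

```python
def tm_d0(l_target):
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score."""
    if l_target < 19:
        # For very short chains the formula gives a non-positive d0.
        raise ValueError("formula is only meaningful for longer targets")
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score_from_distances(distances, l_target):
    """Evaluate the TM-score formula for one fixed superposition.

    distances: C-alpha distances (Angstrom) of the aligned residue pairs.
    l_target:  number of residues in the target (reference) structure.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# A perfect model scores 1; residues without a partner contribute nothing.
print(tm_score_from_distances([0.0] * 100, 100))
print(tm_score_from_distances([0.0] * 50, 100))
```

Because the sum is divided by the target length, unmodelled or unaligned residues lower the score, which is one reason why TM-scores are easier to compare between models than RMSD values.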

We then visualised the models using PyMOL. We loaded the model and the experimental structure into the program and performed a structural alignment to superimpose and compare them visually. The PyMOL commands and the resulting images are shown below:


align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
cmd.png("MODEL.png")

Alignment method	1P49	2VQR	1FSU	3ED4
Classical Dynamic Programming	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 1P49, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 2VQR, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 1FSU, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 3ED4, visualized in PyMOL
Dynamic Programming with structural information from the template	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 1P49, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 2VQR, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 1FSU, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 3ED4, visualized in PyMOL

3ED4

real structure of 3ED4 visualized in Pymol

Alignment method	3ED4A	3ED4B	3ED4C	3ED4D
Dynamic Programming with structural information from the template	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 3ED4A, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 3ED4B, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 3ED4C, visualized in PyMOL	real structure of 1AUK and structure of 1AUK modelled by Modeller with template 3ED4D, visualized in PyMOL

Modification of Alignments

Using 1P49 as the template yielded the best results, so we manually modified this alignment to see whether the model could be improved. We made sure that there are no gaps within secondary structure elements and adjusted the alignment so that the active site, the substrate binding sites and the metal binding sites were aligned. Altogether, we made the following changes:

The gap between residues 74 and 75 in 1P49 was removed to align metal-binding site 75 (1P49) with metal-binding site 69 (ARS A). This also aligned the two active sites, which were not aligned in the initial alignment.
The gap between residues 154 and 155 was moved out of the beta strand between residues 152 and 153.
The gap between residues 190-191 was moved out of the beta strand between residues 191-192.
All gaps within the helix spanning residues 197-214 were moved out of the helix (to its right border).
The gap between residues 290-291 was moved to the right end of the helix.

We made these changes in JalView. Figures of the alignment before and after the changes are shown on the right.
Surprisingly, the TM-score decreased to 0.5561, which is below the scores of both unmodified 1P49 alignments (0.7960 and 0.7731).
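
Whether two functional residues ended up in the same alignment column after manual editing can also be checked programmatically. The following sketch assumes that residue numbers count the residues of each row from 1, which need not agree with PDB numbering. The function names and the two toy rows are hypothetical and are not taken from the real ARS A / 1P49 alignment.

```python
def column_of_residue(aligned_seq, residue_index, gap="-"):
    """Return the 0-based alignment column of the residue_index-th residue (1-based)."""
    seen = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            seen += 1
            if seen == residue_index:
                return column
    raise ValueError("residue index beyond sequence length")


def sites_aligned(target_row, template_row, target_site, template_site):
    """True if the two residues occupy the same alignment column."""
    return (column_of_residue(target_row, target_site)
            == column_of_residue(template_row, template_site))


# Toy alignment rows for illustration only.
target_row   = "MK-DLGA"
template_row = "MKQD-GA"

print(sites_aligned(target_row, template_row, 3, 4))   # D (target) vs D (template)
print(sites_aligned(target_row, template_row, 4, 5))   # L (target) vs G (template)
```

Applied to the real alignment, a call such as sites_aligned(arsa_row, p49_row, 69, 75) would test the metal-binding pair mentioned in the list above, provided the numbering assumption holds for both rows.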

real structure of 1AUK and structure of 1AUK modelled by Modeller with the modified alignment to template 1P49, visualized in PyMOL

Models from the modified alignment and the 2d alignment with template 1P49, visualized in PyMOL

Initial alignment.

Modified alignment.

TM-scores and RMSD of the single template models

We downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using


gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f

TMscores were calculated as follows:


./TMscore MODEL.pdb REAL_STRUCTURE.pdb
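
When many models have to be scored, the numbers can be pulled out of the program output automatically. The sketch below assumes that the output contains a line starting with "TM-score" followed by "=" and a line of the form "RMSD of the common residues= ...". Both patterns should be checked against the output of the compiled version. The sample text uses placeholder values, not results from this analysis.

```python
import re


def parse_tmscore_output(text):
    """Extract (TM-score, RMSD) from TMscore output; None where a value is missing."""
    tm = re.search(r"^TM-score\s*=\s*([0-9.]+)", text, flags=re.MULTILINE)
    rmsd = re.search(r"RMSD of\s+the common residues\s*=\s*([0-9.]+)", text)
    return (float(tm.group(1)) if tm else None,
            float(rmsd.group(1)) if rmsd else None)


# Placeholder output in the assumed layout (values are made up).
sample_output = """\
Number of residues in common=  100
RMSD of  the common residues=    1.234

TM-score    = 0.5000  (d0= 3.65)
"""

print(parse_tmscore_output(sample_output))
```

The output of a real run can be captured with subprocess.run(["./TMscore", "MODEL.pdb", "REAL_STRUCTURE.pdb"], capture_output=True, text=True).stdout and passed to the function.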

PDB Identifier	TM-score	RMSD
Dynamic Programming with structural information
1P49	0.7960	-
2VQR	0.4825	-
1FSU	0.7146	-
3ED4	0.3881	-
3ED4A	0.7268	-
3ED4B	0.7251	-
3ED4C	0.6518	-
3ED4D	0.7303	-
Dynamic Programming without structural information
1P49	0.7731	-
2VQR	0.3183	-
1FSU	0.7223	-
3ED4	0.3122	-

Multiple Template Modeling

We calculated three models:

Model 1 was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
Model 2 was calculated from an MSA of ARS A and the three proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
Model 3 was calculated from an MSA of ARS A and the two proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
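
The three template sets can be reproduced from the single template TM-scores (alignments with structural information) by keeping the best-scoring entry per protein and then taking the top n. This is a sketch of that selection rule as we read it from the list above. The function names are our own, and grouping entries by the first four characters of the identifier is an assumption about the naming scheme.

```python
# TM-scores of the single template models (dynamic programming with
# structural information), copied from the table above.
tm_scores = {
    "1P49": 0.7960, "2VQR": 0.4825, "1FSU": 0.7146, "3ED4": 0.3881,
    "3ED4A": 0.7268, "3ED4B": 0.7251, "3ED4C": 0.6518, "3ED4D": 0.7303,
}


def best_per_protein(scores):
    """Keep the best-scoring entry for each protein (first four characters)."""
    best = {}
    for name, score in scores.items():
        protein = name[:4]
        if protein not in best or score > scores[best[protein]]:
            best[protein] = name
    return best


def top_templates(scores, n):
    """Return the n best templates, using one entry per protein."""
    candidates = best_per_protein(scores).values()
    return sorted(candidates, key=scores.get, reverse=True)[:n]


for n in (4, 3, 2):
    print(n, top_templates(tm_scores, n))
```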

Model 1	Model 2	Model 3
real structure of 1AUK and structure of 1AUK modelled (Model 1), visualized in Pymol	real structure of 1AUK and structure of 1AUK modelled (Model 2), visualized in Pymol	real structure of 1AUK and structure of 1AUK modelled (Model 3), visualized in Pymol

PDB Identifier	TM-score	RMSD
model1	0.5409	-
model2	0.6701	-
model3	0.6819	-

Initial multiple structural alignment:


from modeller import *

log.verbose()
env = environ()
env.io.atom_files_directory = './:./'
aln = alignment(env)
# Replace the placeholder (code, chain) pairs with the template PDB codes and chains
for (code, chain) in (('PROTEIN', 'CHAIN'), ('ANOTHER_PROTEIN', 'ANOTHER_CHAIN'), ...):
   mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
   aln.append_model(mdl, atom_files=code, align_codes=code+chain)
# Multiple structural alignment of the templates
aln.salign()
aln.write(file='mymsa.pap', alignment_format='PAP')
aln.write(file='mymsa.ali', alignment_format='PIR')

Add target sequence to MSA:


from modeller import *
log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
# Read aligned structure(s):
aln = alignment(env)
aln.append(file='mymsa.ali', align_codes='all')
aln_block = len(aln)
# Read aligned sequence(s):
aln.append(file='1AUK.pir', align_codes='1AUK')
# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign()
aln.write(file='mymsa-1AUK.ali', alignment_format='PIR')
aln.write(file='mymsa-1AUK.pap', alignment_format='PAP')

Calculate the model:


from modeller import *
from modeller.automodel import *
env = environ()
a = automodel(env, alnfile='msa2-1AUK.ali',
              knowns=('PROTEIN', 'ANOTHER_PROTEIN', ...), sequence='1AUK')
a.starting_model = 1
a.ending_model = 1
a.make()

Modification of Alignments

iTasser

iTasser is a server that models 3D structures by homology and also provides function predictions. As "Zhang-Server", iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9. iTasser uses a threading approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>

workflow of iTasser <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html</ref>

Modelling without template

Modelling with single template

Discussion

SWISS-MODEL

SWISS-MODEL is an online tool for modelling 3D structures <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. Three different modes are available: automated mode, alignment mode and project mode. We only used the automated mode.

Modelling without template

In automated mode without a template, suitable templates are selected by a BLAST run using an E-value threshold. The mode is used by pasting a protein sequence or a UniProt accession code into a text field.

Template

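SWISS-MODEL identified chain A of PDB entry 1N2L (1n2lA in the alignment below) as the best template. 1N2L is a crystal structure of human arylsulfatase A itself, which explains the almost perfect alignment. The title "(+)-Bornyl Diphosphate Synthase: Cocrystal with Mg and 3-aza-2,3-dihydrogeranyl diphosphate" belongs to the similarly named entry 1N21 and does not describe this template. The alignment quality is very high, as one can see: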

TARGET    19      RPPNIVLI FADDLGYGDL GCYGHPSSTT PNLDQLAAGG LRFTDFYVPV
1n2lA     19      rppnivli faddlgygdl gcyghpsstt pnldqlaagg lrftdfyvpv
                                                                      
TARGET               sssss ss                    hhhhhhhh   ssss sss  
1n2lA                sssss ss                    hhhhhhhh   ssssssss  


TARGET    67    SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY
1n2lA     67    sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy
                                                                      
TARGET               hhhhh hh    hhh          ss s          hhhhhhhh  
1n2lA               hhhhhh hh    hh           ss s          hhhhhhhh  


TARGET    117   LTGMAGKWHL GVGPEGAFLP PHQGFHRFLG IPYSHDQGPC QNLTCFPPAT
1n2lA     117   ltgmagkwhl gvgpegaflp phqgfhrflg ipyshdqgpc qnltcfppat
                                                                      
TARGET          ssssssss    sss sss   hhh   ssss s            sss    s
1n2lA           ssssssss    sss sss   hhh   ssss s            sss    s


TARGET    167   PCDGGCDQGL VPIPLLANLS VEAQPPWLPG LEARYMAFAH DLMADAQRQD
1n2lA     167   pcdggcdqgl vpipllanls veaqppwlpg learymafah dlmadaqrqd
                                                                      
TARGET          ss             ssss s ss          hhhhhhhhh hhhhhhhh  
1n2lA           ss             ssss s ss          hhhhhhhhh hhhhhhhh  


TARGET    217   RPFFLYYASH HTHYPQFSGQ SFAERSGRGP FGDSLMELDA AVGTLMTAIG
1n2lA     217   rpfflyyash hthypqfsgq sfaersgrgp fgdslmelda avgtlmtaig
                                                                      
TARGET           ssssssss                      h hhhhhhhhhh hhhhhhhhhh
1n2lA            ssssssss                      h hhhhhhhhhh hhhhhhhhhh


TARGET    267   DLGLLEETLV IFTADNGPET MRMSRGGCSG LLRCGKGTTY EGGVREPALA
1n2lA     267   dlglleetlv iftadngpet mrmsrggcsg llrcgkgtty eggvrepala
                                                                      
TARGET          hh    ssss sss                                 sss sss
1n2lA           h     ssss sss                              hhhsss sss


TARGET    317   FWPGHIAPGV THELASSLDL LPTLAALAGA PLPNVTLDGF DLSPLLLGTG
1n2lA     317   fwpghiapgv thelassldl lptlaalaga plpnvtldgf dlsplllgtg
                                                                      
TARGET          s       ss s      hhh hhhhhhhh                hhhhh   
1n2lA           s       ss s   ssshhh hhhhhhhh                hhhhh   


TARGET    367   KSPRQSLFFY PSYPDEVRGV FAVRTGKYKA HFFTQGSAHS DTTADPACHA
1n2lA     367   ksprqslffy psypdevrgv favrtgkyka hfftqgsahs dttadpacha
                                                                      
TARGET              sssss             ssssssssss ssss                 
1n2lA               sssss             ssssssssss ssss                 


TARGET    417   SSSLTAHEPP LLYDLSKDPG ENYNLLGGVA GATPEVLQAL KQLQLLKAQL
1n2lA     417   sssltahepp llydlskdpg enynllg--- gatpevlqal kqlqllkaql
                                                                      
TARGET              sss    sssss                    hhhhhhh hhhhhhhhhh
1n2lA               sss    sssss                    hhhhhhh hhhhhhhhhh


TARGET    467   DAAVTFGPSQ VARGEDPALQ ICCHPGCTPR PACCHCPD             
1n2lA     467   daavtfgpsq vargedpalq icchpgctpr pacchcpd-            
                                                                      
TARGET          hhh        hh sss                                     
1n2lA           hhh        hh sss
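
To put a number on the alignment quality, the percentage of identical residues can be computed directly from two aligned rows. The sketch below counts identities over the columns in which both rows carry a residue; other definitions divide by the full alignment length or by the length of the shorter sequence, so the choice of denominator is an assumption. The function name is our own, and the two rows are the second block of the alignment above.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identical residues over columns without a gap in either row."""
    a = row_a.replace(" ", "").upper()
    b = row_b.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("rows must have the same aligned length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Second block of the TARGET / 1n2lA alignment shown above.
target_row   = "SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY"
template_row = "sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy"

print(percent_identity(target_row, template_row))
```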

Results

comparison of QMEAN-score against nonredundant PDB

Modelling with single template

In automated mode, a template can be specified by entering a PDB ID or by uploading a PDB file.

1P49

1P49 has 39% sequence identity with human arylsulfatase A, which is the highest identity among our templates.

2VQR

2VQR has 20% sequence identity with human arylsulfatase A, which is the lowest identity among our templates.

References
