Difference between revisions of "Homology Modeling of ARS A"

From Bioinformatikpedia
Revision as of 09:31, 13 June 2011

HHpred

We used the webserver and

Modeller

We wrote a modeling tutorial (Using Modeller for TASK 4) that covers all steps needed for the following analysis.

Proteins used as templates

We identified the following proteins (see Alignment TASK) as potential templates for homology modeling:

Identifier | Seq. identity (from TASK 2) | Source | Protein function | True homolog (HSSP) | Seq. identity (pairwise alignment) | Active site | Substrate binding site | Metal binding site
1P49 | 39.0% | Homo sapiens | Steryl-Sulfatase | yes | 31.9% | 136 | 333, 459 | 35, 36, 75, 342, 343
1FSU | 28.0% | Homo sapiens | Arylsulfatase B | yes | 26.5% | 147 | 145, 242, 318 | 53, 54, 91, 300, 301
2VQR | 20.0% | Rhizobium leguminosarum | Monoester Hydrolase | no | 20.3% | not avail. | not avail. | 12, 57, 324, 325
3ED4 | 32.0% | Escherichia coli | Arylsulfatase | yes | 27.7% | not avail. | not avail. | not avail.
ARSA | - | Homo sapiens | - | - | - | 125 | 123, 150, 229, 302 | 29, 30, 69, 281, 282


Our potential templates, identified by the database searches, include all homologs with known structure according to HSSP.

Single template modelling

To predict the structure from a single template structure, Modeller needs pairwise sequence alignments in PIR format. Modeller provides two different methods to calculate pairwise sequence alignments. alignment.malign() uses classical dynamic programming to align two sequences. alignment.align2d() also uses a dynamic programming approach, but includes structural information to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both alignment methods and created eight pairwise sequence alignments of the above templates with the target. The script used for this purpose is shown below:


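from modeller import *

env = environ()
aln = alignment(env)
# Read the template structure ('template_name' is a placeholder for the PDB code).
mdl = model(env, file='template_name', model_segment=('FIRST:@', 'END:'))
aln.append_model(mdl, align_codes='template_name', atom_files='template_name')
# Add the target sequence from its PIR file.
aln.append(file='1AUK.pir', align_codes='target_name')
# Alignment that uses structural information from the template.
aln.align2d()
aln.check()
aln.write(file='target-template-2d.ali', alignment_format='PIR')
# Classical dynamic programming alignment of the same two entries.
aln.malign()
aln.check()
aln.write(file='target-template.ali', alignment_format='PIR')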
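
The script above reads the target sequence from a PIR file (1AUK.pir). As a minimal sketch of what such a file contains, the helper below formats a sequence as a Modeller-style PIR entry: a ">P1;code" line, a line with ten colon-separated fields, and the sequence terminated by "*". The function name to_pir and the short sequence fragment are illustrative assumptions, not part of the original workflow; a real file would hold the full target sequence.

```python
def to_pir(code, sequence, width=75):
    """Return a Modeller-style PIR entry for a sequence without a structure."""
    # Second line: ten colon-separated fields. For a plain sequence only the
    # entry type ("sequence") and the alignment code need to be filled in.
    fields = ["sequence", code, "", "", "", "", "", "", "0.00", " 0.00"]
    body = sequence.upper() + "*"  # '*' terminates the sequence
    # Wrap the sequence into lines of at most `width` characters.
    wrapped = [body[i:i + width] for i in range(0, len(body), width)]
    return ">P1;%s\n%s\n%s\n" % (code, ":".join(fields), "\n".join(wrapped))


if __name__ == "__main__":
    # Hypothetical fragment used only for illustration.
    print(to_pir("1AUK", "RPPNIVLIFADDLGYGDLGCYGHPSSTT"))
```

The alignment code passed as `code` has to match the value given to align_codes and to the sequence argument of automodel.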


From these alignments we constructed eight models, using the following script:


from modeller import *
from modeller.automodel import *

log.verbose()
env = environ()
a = automodel(env,
              alnfile='1AUK-1FSU-2d.ali',   # target-template alignment (PIR)
              knowns='1FSU',                # template structure
              sequence='1AUK',              # target sequence
              assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1   # build a single model
a.ending_model = 1
a.make()

We modified the paths and filenames in the scripts so that they matched our proteins of interest.

Next, we calculated RMSD and TM-scores of the models to get a first impression of how much the models deviate from the original structure. The results are given in the table below.
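
As a reminder of what the RMSD measures, the following sketch computes it for two coordinate sets. It assumes that both structures are already superimposed and that the atoms (for example the C-alpha atoms) are paired by index; the function name rmsd is an illustrative choice, and no superposition step is performed here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples, paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Square root of the mean squared distance.
    return math.sqrt(total / len(coords_a))
```

Because every atom pair contributes its squared distance, a few badly modelled loops can dominate the RMSD, which is one reason to report the TM-score as well.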


We then visualised the models using PyMOL. We loaded both structures into the program and performed a structural alignment to superimpose and compare them visually. The PyMOL commands and the images are shown below:


align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
cmd.png("MODEL.png")
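
Since the same commands are repeated for every model, they can be generated rather than typed. The sketch below builds the command list for one reference/model pair; the function name pymol_commands and the assumption that each structure is stored as NAME.pdb in the working directory are ours, and the colouring step done via the graphical interface is left out.

```python
def pymol_commands(reference, model):
    """Return PyMOL commands that superimpose and render one model."""
    return [
        "load %s.pdb, %s" % (reference, reference),  # assumed file naming
        "load %s.pdb, %s" % (model, model),
        "align %s, %s" % (reference, model),
        "hide all",
        "show cartoon",
        "ray",
        "png %s.png" % model,
    ]


if __name__ == "__main__":
    # Write a script file that can be opened in PyMOL.
    with open("MODEL.pml", "w") as handle:
        handle.write("\n".join(pymol_commands("1AUK", "MODEL")) + "\n")
```

Writing one script per model keeps each PyMOL session limited to the two structures being compared.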

Figure table: real structure of 1AUK superimposed with the structure of 1AUK modelled by Modeller, visualized in PyMOL. Columns (templates): 1P49, 2VQR, 1FSU, 3ED4. Rows (alignment methods): classical dynamic programming; dynamic programming with structural information from the template.

3ED4

Figure: real structure of 3ED4, visualized in PyMOL.
Figure table: real structure of 1AUK superimposed with the structure of 1AUK modelled by Modeller, visualized in PyMOL. Columns (templates): 3ED4A, 3ED4B, 3ED4C, 3ED4D. Row (alignment method): dynamic programming with structural information from the template.

Modification of Alignments

Using 1P49 as the template structure yielded the best results, so we decided to modify this alignment manually to see whether we could improve the model. We made sure that there were no gaps in secondary structure elements and modified the alignment so that the active site, substrate binding sites and metal binding sites were aligned. Altogether, we made the following changes:

  • The gap between residue 74 and 75 in 1P49 was removed to align metal-binding site 75 with metal-binding site 69. This also induced the alignment of both active sites, which were not aligned in the initial alignment.
  • The gap between residue 154 and 155 was moved out of the beta strand between residues 152 and 153.
  • The gap between residues 190-191 was moved out of beta strand between residues 191-192.
  • All gaps within the helix from residue 197-214 were moved out of the helix (at the right border).
  • The gap between 290-291 was moved to the right end of the helix.

We used Jalview to make these changes. The alignment before and after the changes is shown in the accompanying figures.
Surprisingly, the TM-score decreased to 0.5561 (the unmodified 1P49 alignments gave 0.7960 and 0.7731).
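
The two criteria used for the manual edits (functional sites in the same column, no gaps inside secondary structure elements) can also be checked programmatically. The sketch below is a minimal illustration under the assumption that residues are numbered consecutively from a known start; PDB numbering with insertions or missing residues would need a real residue map. All function names and the toy alignment in the usage note are hypothetical.

```python
def residue_columns(aligned_seq, first_residue=1):
    """Map residue numbers to 0-based alignment columns for one aligned sequence."""
    columns = {}
    number = first_residue
    for col, char in enumerate(aligned_seq):
        if char != "-":  # gap characters do not consume a residue number
            columns[number] = col
            number += 1
    return columns


def sites_aligned(target_aln, template_aln, target_site, template_site):
    """True if the two residues occupy the same alignment column."""
    target_cols = residue_columns(target_aln)
    template_cols = residue_columns(template_aln)
    return target_cols[target_site] == template_cols[template_site]


def gaps_inside(aligned_seq, start, end):
    """Count gap characters between residues start and end (inclusive)."""
    cols = residue_columns(aligned_seq)
    return aligned_seq[cols[start]:cols[end] + 1].count("-")
```

For the toy alignment "AC--DEF" against "ACGHDEF", sites_aligned(..., 3, 5) is true because residue 3 of the first sequence and residue 5 of the second share a column, and gaps_inside("AC--DEF", 2, 3) reports the two gap characters lying between residues 2 and 3.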

Figure: real structure of 1AUK and structure of 1AUK modelled by Modeller with the modified alignment to template 1P49, visualized in PyMOL.
Figure: structures modelled from the modified alignment and from the 2D alignment with template 1P49, visualized in PyMOL.
Figure: initial alignment.
Figure: modified alignment.

TM-scores and RMSD of the single template models

We downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using:


gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f

TMscores were calculated as follows:


./TMscore MODEL.pdb REAL_STRUCTURE.pdb


Template | TM-score | RMSD
Dynamic programming with structural information
1P49 | 0.7960 | -
2VQR | 0.4825 | -
1FSU | 0.7146 | -
3ED4 | 0.3881 | -
3ED4A | 0.7268 | -
3ED4B | 0.7251 | -
3ED4C | 0.6518 | -
3ED4D | 0.7303 | -
Dynamic programming without structural information
1P49 | 0.7731 | -
2VQR | 0.3183 | -
1FSU | 0.7223 | -
3ED4 | 0.3122 | -
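
For orientation, the TM-score sums a distance-dependent term over the aligned residue pairs and divides by the target length L: each pair at distance d contributes 1 / (1 + (d/d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The sketch below evaluates this sum for one fixed superposition. It does not search for the superposition that maximises the score, which the TMscore program does, so it is only an illustration of the formula; the function names are ours.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score."""
    if target_length < 19:
        # Below this length the formula gives a non-positive d0.
        raise ValueError("sketch only supports target lengths of at least 19")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(distances, target_length):
    """TM-score sum for given pair distances (in Angstrom) under one superposition."""
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # Normalising by the target length penalises unaligned residues.
    return total / target_length
```

A perfectly superimposed model that covers the whole target scores 1.0, while a model covering only half of the target cannot exceed 0.5.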

Multiple Template Modeling

We calculated three models:

  • Model 1 was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
  • Model 2 was calculated from an MSA of ARS A and the three proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
  • Model 3 was calculated from an MSA of ARS A and the two proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
Figure table: real structure of 1AUK superimposed with the modelled structure of 1AUK, visualized in PyMOL. Columns: Model 1, Model 2, Model 3.
Model | TM-score | RMSD
model1 | 0.5409 | -
model2 | 0.6701 | -
model3 | 0.6819 | -
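
The template sets of the three models are consistent with one simple selection rule: keep the best-scoring chain per PDB entry and rank the entries by TM-score. The sketch below applies this rule to the single template TM-scores obtained with structural information. The page does not state that this exact procedure was used, so the rule and the function names are assumptions.

```python
# TM-scores of the single template models (alignment with structural information).
TM_SCORES = {
    "1P49": 0.7960, "2VQR": 0.4825, "1FSU": 0.7146, "3ED4": 0.3881,
    "3ED4A": 0.7268, "3ED4B": 0.7251, "3ED4C": 0.6518, "3ED4D": 0.7303,
}


def best_per_entry(scores):
    """Keep the best-scoring template for each four-character PDB entry."""
    best = {}
    for name, score in scores.items():
        entry = name[:4]
        if entry not in best or score > scores[best[entry]]:
            best[entry] = name
    return best


def top_templates(scores, n):
    """Return the n best templates, one per PDB entry, ordered by TM-score."""
    chosen = best_per_entry(scores).values()
    return sorted(chosen, key=lambda name: scores[name], reverse=True)[:n]
```

With n set to 4, 3 and 2, this selection returns the same template sets as listed for Model 1, Model 2 and Model 3.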

Initial multiple structural alignment:


from modeller import *
log.verbose()
env = environ()
env.io.atom_files_directory = './:./'
aln = alignment(env)
for (code, chain) in (('PROTEIN', 'CHAIN'), ('ANOTHER_PROTEIN', 'ANOTHER_CHAIN'), ...):
   mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
   aln.append_model(mdl, atom_files=code, align_codes=code+chain)
aln.salign()
aln.write(file='mymsa.pap', alignment_format='PAP')
aln.write(file='mymsa.ali', alignment_format='PIR')

Add target sequence to MSA:


from modeller import *
log.verbose()
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib')
# Read aligned structure(s):
aln = alignment(env)
aln.append(file='mymsa.ali', align_codes='all')
aln_block = len(aln)
# Read aligned sequence(s):
aln.append(file='1AUK.pir', align_codes='1AUK')
# Structure sensitive variable gap penalty sequence-sequence alignment:
aln.salign()
aln.write(file='mymsa-1AUK.ali', alignment_format='PIR')
aln.write(file='mymsa-1AUK.pap', alignment_format='PAP')


Calculate the model:


from modeller import *
from modeller.automodel import *
env = environ()
a = automodel(env, alnfile='mymsa-1AUK.ali',
              knowns=('PROTEIN', 'ANOTHER_PROTEIN', ...), sequence='1AUK')
a.starting_model = 1
a.ending_model = 1
a.make()


Modification of Alignments

iTasser

iTasser (I-TASSER) is a server for modelling protein 3D structure by homology; it also provides function predictions. Under the name Zhang-Server, iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9. iTasser uses a threading approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>

Modelling without template

Modelling with single template

Discussion

SWISS-MODEL

SWISS-MODEL is an online tool for modelling protein 3D structure <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. There are three different modes available: automated mode, alignment mode and project mode. We used only the automated mode.

Modelling without template

In automated mode without a template, suitable templates are selected by a BLAST run using an E-value threshold. The mode is used by pasting a protein sequence or a UniProt accession (AC) code into a text field.

Template

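SWISS-MODEL identified 1n2lA (PDB entry 1N2L, chain A) as the best template. The identifier is easily misread as 1N21, whose PDB entry is "(+)-Bornyl Diphosphate Synthase: Cocrystal with Mg and 3-aza-2,3-dihydrogeranyl diphosphate", an unrelated protein. The alignment quality was very high, with target and template nearly identical in sequence, as one can see: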

TARGET    19      RPPNIVLI FADDLGYGDL GCYGHPSSTT PNLDQLAAGG LRFTDFYVPV
1n2lA     19      rppnivli faddlgygdl gcyghpsstt pnldqlaagg lrftdfyvpv
                                                                      
TARGET               sssss ss                    hhhhhhhh   ssss sss  
1n2lA                sssss ss                    hhhhhhhh   ssssssss  


TARGET    67    SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY
1n2lA     67    sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy
                                                                      
TARGET               hhhhh hh    hhh          ss s          hhhhhhhh  
1n2lA               hhhhhh hh    hh           ss s          hhhhhhhh  


TARGET    117   LTGMAGKWHL GVGPEGAFLP PHQGFHRFLG IPYSHDQGPC QNLTCFPPAT
1n2lA     117   ltgmagkwhl gvgpegaflp phqgfhrflg ipyshdqgpc qnltcfppat
                                                                      
TARGET          ssssssss    sss sss   hhh   ssss s            sss    s
1n2lA           ssssssss    sss sss   hhh   ssss s            sss    s


TARGET    167   PCDGGCDQGL VPIPLLANLS VEAQPPWLPG LEARYMAFAH DLMADAQRQD
1n2lA     167   pcdggcdqgl vpipllanls veaqppwlpg learymafah dlmadaqrqd
                                                                      
TARGET          ss             ssss s ss          hhhhhhhhh hhhhhhhh  
1n2lA           ss             ssss s ss          hhhhhhhhh hhhhhhhh  


TARGET    217   RPFFLYYASH HTHYPQFSGQ SFAERSGRGP FGDSLMELDA AVGTLMTAIG
1n2lA     217   rpfflyyash hthypqfsgq sfaersgrgp fgdslmelda avgtlmtaig
                                                                      
TARGET           ssssssss                      h hhhhhhhhhh hhhhhhhhhh
1n2lA            ssssssss                      h hhhhhhhhhh hhhhhhhhhh


TARGET    267   DLGLLEETLV IFTADNGPET MRMSRGGCSG LLRCGKGTTY EGGVREPALA
1n2lA     267   dlglleetlv iftadngpet mrmsrggcsg llrcgkgtty eggvrepala
                                                                      
TARGET          hh    ssss sss                                 sss sss
1n2lA           h     ssss sss                              hhhsss sss


TARGET    317   FWPGHIAPGV THELASSLDL LPTLAALAGA PLPNVTLDGF DLSPLLLGTG
1n2lA     317   fwpghiapgv thelassldl lptlaalaga plpnvtldgf dlsplllgtg
                                                                      
TARGET          s       ss s      hhh hhhhhhhh                hhhhh   
1n2lA           s       ss s   ssshhh hhhhhhhh                hhhhh   


TARGET    367   KSPRQSLFFY PSYPDEVRGV FAVRTGKYKA HFFTQGSAHS DTTADPACHA
1n2lA     367   ksprqslffy psypdevrgv favrtgkyka hfftqgsahs dttadpacha
                                                                      
TARGET              sssss             ssssssssss ssss                 
1n2lA               sssss             ssssssssss ssss                 


TARGET    417   SSSLTAHEPP LLYDLSKDPG ENYNLLGGVA GATPEVLQAL KQLQLLKAQL
1n2lA     417   sssltahepp llydlskdpg enynllg--- gatpevlqal kqlqllkaql
                                                                      
TARGET              sss    sssss                    hhhhhhh hhhhhhhhhh
1n2lA               sss    sssss                    hhhhhhh hhhhhhhhhh


TARGET    467   DAAVTFGPSQ VARGEDPALQ ICCHPGCTPR PACCHCPD             
1n2lA     467   daavtfgpsq vargedpalq icchpgctpr pacchcpd-            
                                                                      
TARGET          hhh        hh sss                                     
1n2lA           hhh        hh sss   
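
How close target and template are can be quantified directly from the aligned rows. The following sketch computes the percent identity over all columns in which neither sequence has a gap and also reports the number of gap columns. Other definitions divide by the alignment length or by the shorter sequence, so the choice of denominator is an assumption, as is the function name.

```python
def percent_identity(row_a, row_b):
    """Return (percent identity over gap-free columns, number of gap columns)."""
    # Remove the spaces that split the printed alignment into blocks of ten.
    seq_a = row_a.replace(" ", "").upper()
    seq_b = row_b.replace(" ", "").upper()
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned rows must have the same length")
    compared = identical = gap_columns = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            gap_columns += 1
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared, gap_columns
```

Applied to the two rows starting at residue 67, "SLCTPSRAAL LTGRLPVRMG" and "sl-tpsraal ltgrlpvrmg", the function reports full identity over the 19 compared columns and one gap column.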

Results

Figures:
  • Estimated model quality in comparison to the non-redundant PDB
  • Estimated density of model quality
  • Z-score by category
  • Estimation of local model quality
  • Model colored by residue error

Modelling with single template

It is possible to specify a template in automated mode by specifying a PDB ID or by uploading a PDB file.

1P49

1P49 has 39% sequence identity with human arylsulfatase A, which is the highest identity among our templates.

2VQR

2VQR has 20% sequence identity with human arylsulfatase A, which is the lowest identity among our templates.

References

<references />