Homology Modeling of ARS A
- 1 Proteins used as templates
- 2 Modeller
- 3 iTasser
- 4 SWISS-MODEL
- 5 3D-Jigsaw
- 6 Comparison of the methods
- 7 References
Proteins used as templates
From the previous alignment TASK (see Alignment TASK), we took four proteins that might serve as suitable templates for the modeling. The proteins are listed in the table below. The information about active and binding sites was obtained from UniProt and serves as additional information for the manual modification of the alignments, with the aim of improving the accuracy of the models. Interestingly, our potential templates, identified by the database searches, contain all homologs with known structure according to HSSP.
|SeqIdentifier||Seq Identity (from TASK 2)||source||Protein function||True homolog (HSSP)||Seq Identity (pairw. ali.)||Active site||Substrate binding site||Metal binding site|
|1P49||39.0%||Homo Sapiens||Steryl-Sulfatase||yes||31.9%||136||333, 459||35, 36, 75, 342, 343|
|1FSU||28.0%||Homo Sapiens||Arylsulfatase B||yes||26.5%||147||145, 242, 318||53, 54, 91, 300, 301|
|2VQR||20.0%||Rhizobium leguminosarum||Monoester Hydrolase||no||20.3%||not avail.||not avail.||12, 57, 324, 325|
|3ED4||32.0%||Escherichia coli||Arylsulfatase||yes||27.7%||not avail.||not avail.||not avail.|
|ARSA (1AUK)||-||Homo Sapiens||-||-||-||125||123, 150, 229, 302||29, 30, 69, 281, 282|
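The pairwise identity column can be recomputed from any gapped alignment. The following is a minimal sketch, assuming identity is counted over columns in which both sequences carry a residue; the source does not state which denominator was used, so the function name and this convention are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are ignored (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment, not taken from the ARSA data
print(percent_identity("ACDE-F", "ACDQWF"))
```

Other conventions (dividing by the alignment length or by the shorter sequence) give different values, which may explain differences between identity columns that come from different tools.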
Furthermore, we used the HHsearch webserver to see if we could extend this list towards even more remotely related sequences. However, the search did not yield significant hits that were more distantly related than 2VQR, so we decided to stick with the above proteins for modelling with Modeller.
Modeller is a program for comparative modeling of the 3D structure of a protein with unknown structure. It provides different methods for the calculation of the initial target-template alignment. Given the alignments, Modeller generates the backbone and optimizes a probability function reflecting spatial restraints. The input alignments can be either pairwise sequence alignments (for single template modeling) or multiple sequence alignments (for multiple template modeling). <ref>A. Sali, T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993</ref>
In this section, we use Modeller to model the 3D structure of ARSA and compare the results to the known structure from PDB. We wrote a tutorial ( Using Modeller for TASK 4 ) comprising all necessary steps in the following analysis. It provides generic scripts and example code and executes all methods using default parameters.
Single template modelling
In order to predict the structure using a single template structure, modeller needs pairwise sequence alignments in PIR format. Modeller provides two different methods to calculate pairwise sequence alignments.
alignment.malign() uses classical dynamic programming to align two sequences.
alignment.align2d() also uses a dynamic programming approach, but includes structural information to optimize the alignment (e.g. tries to place gaps outside of secondary structure elements).
We applied both alignment methods and created eight pairwise sequence alignments of the above templates with the target. Then we modelled the structure with default parameters using the automodel() class. The scripts used for this purpose can be seen in our protocol: Using Modeller for TASK 4 .
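To illustrate what "classical dynamic programming" means for a pairwise alignment, here is a minimal textbook Needleman-Wunsch sketch. It is not Modeller's implementation: the function name, the simple match/mismatch scores and the linear gap penalty are assumptions chosen for brevity, whereas Modeller uses substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("ACGT", "AGT")
print(best)
print(row_a)
print(row_b)
```

A structure-aware variant in the spirit of align2d() would make the gap penalty position-dependent, for example higher inside helices and strands of the template, so that gaps are pushed into loop regions.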
Next, we calculated RMSD and TM scores of the models to get a first impression on how much the models deviate from the original structure. In order to calculate the TM-scores, we downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using
gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f
The TM-scores were calculated as follows:
./TMscore MODEL.pdb REAL_STRUCTURE.pdb
In order to calculate the RMSD scores, we used DaliLite.
The results are depicted below:
|Dynamic Programming with structural information|
|1FSU||0.7146||1.5 - 2.3|
|3ED4||0.3881||1.9 - 2.8|
|Dynamic Programming without structural information|
The RMSD score measures the root mean square deviation between the two structures. This is a straightforward measure to assess the similarity of our models. The TM-score also measures the deviation of the two structures from each other, but weights the distances: residue pairs that lie close together after superposition get a higher weight than pairs that lie far apart. <ref>Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710 </ref>
The TM-score is more sensitive in detecting the same fold. Assume two proteins for which 80 % of the residues lie in a very similar fold, but the remaining 20 % of the residues fold completely differently. One would still consider these two proteins similar, but the RMSD might become very large due to the large distances in these 20 %.
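The 80 % / 20 % thought experiment can be made concrete with a small calculation. The sketch below assumes a hypothetical protein of 100 residues, of which 80 deviate by 1 Å and 20 by 20 Å after superposition (made-up numbers). It uses the published TM-score terms, d0 = 1.24 (L - 15)^(1/3) - 1.8 and 1 / (1 + (d/d0)^2), but evaluates them for one fixed superposition only, whereas the real TMscore program searches for the superposition that maximizes the score.

```python
import math


def rmsd(distances):
    """Root mean square deviation of per-residue distances (in Angstrom)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score_fixed_superposition(distances, target_length):
    """TM-score terms for one given superposition (no optimization over rotations).

    Only meaningful for target_length > 15, where d0 is positive.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical example: 80 well-modelled residues, 20 badly misplaced ones
distances = [1.0] * 80 + [20.0] * 20
print(round(rmsd(distances), 2))
print(round(tm_score_fixed_superposition(distances, 100), 2))
```

In this example the RMSD is about 9 Å, which would suggest a poor model, while the TM-score stays around 0.75 because the 20 misplaced residues contribute almost nothing to the sum instead of dominating it.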
Further on, we visualised the models using pymol. We loaded all pairs of model and real structure into the program and performed a structural alignment to superimpose and compare them visually. The pymol commands and the images are shown below:
align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
cmd.png("MODEL.png")
|Dynamic Programming with structural information from the template|
Despite its evolutionary relationship, 3ED4 is a very poor template structure for modeling. Thus, we considered the structure of 3ED4 in more detail to figure out the reason for this behaviour.
First of all, we looked at the PDB entry of the protein and found out that 3ED4 consists of 4 different chains. Next, we plotted 3ED4 coloring each of the four chains (Figure is depicted below).
This visual inspection led us to speculate that each of the individual chains structurally resembles our target protein ARSA. Thus, we decided to use each individual chain for modeling. We again computed TM- and RMSD scores and used pymol to visualise the model together with the real structure of ARSA.
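Using the chains individually requires one coordinate set per chain. A minimal sketch of such a split is shown below; it relies only on the fixed-column PDB format, in which the chain identifier is the 22nd character of ATOM and HETATM records. The function name and the helper that fabricates toy records are hypothetical and are not part of our protocol.

```python
from collections import defaultdict


def split_pdb_by_chain(pdb_text):
    """Group ATOM/HETATM records by chain identifier (column 22 of the record)."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line)
    return dict(chains)


def toy_atom_line(serial, chain):
    """Hypothetical helper: a truncated ATOM record with correct column layout."""
    return f"ATOM  {serial:5d}  CA  ALA {chain}{serial:4d}"


# Toy input with four chains, two atoms each (not real 3ED4 coordinates)
toy_pdb = "\n".join(toy_atom_line(i + 1, chain)
                    for i, chain in enumerate("AABBCCDD"))
for chain_id, records in sorted(split_pdb_by_chain(toy_pdb).items()):
    print(chain_id, len(records))
```

Each list of records could then be written to its own file and used as a separate template, e.g. the chain referred to as 3ED4D in the multiple template section.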
|Dynamic Programming with structural information from the template|
Modification of Alignments
Using 1P49 as the template structure for the modeling process yielded the best results, so we decided to manually modify this alignment to see if we could improve the model. We created two different modified alignments. For the first alignment, we only made sure that the active site, substrate binding sites and metal binding sites were aligned. For the second modified alignment, we additionally removed gaps in secondary structure elements. For the modification of the initial alignments, we used JalView. Images of the initial and the two modified alignments are shown on the right. Altogether, we made the following changes:
- The gap between residue 74 and 75 in 1P49 was removed to align metal-binding site 75 with metal-binding site 69. This also induced the alignment of both active sites, which were not aligned in the initial alignment. The region around the active site is well conserved between both enzymes. However, this conservation seems to be shifted, so the amino acids at the active sites differ and aligning both sites reduces the number of aligned conserved residues within this region. An image of the structural model together with the real structure is depicted below.
After this change, we calculated one model. The TM-score of this model drops to 0.6940. This might be due to the fact that the amino acids are conserved in the region around the active sites, but the alignment of the active sites themselves decreases the alignment quality because the conservation is shifted by one amino acid (i.e. the active site residue in 1AUK/ARSA is identical to the amino acid before the active site in 1P49).
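Whether two annotated sites end up in the same alignment column can be checked mechanically instead of by eye. The sketch below is one possible way to do this; the function names are hypothetical, and it assumes that residues are numbered consecutively from a known start, which does not hold for PDB files with numbering gaps or insertion codes.

```python
def residue_to_column(aligned_seq, first_residue_number=1):
    """Map residue numbers to alignment columns, skipping gap characters."""
    mapping = {}
    number = first_residue_number
    for column, char in enumerate(aligned_seq):
        if char != "-":
            mapping[number] = column
            number += 1
    return mapping


def sites_aligned(aligned_target, aligned_template, site_pairs,
                  target_start=1, template_start=1):
    """For each (target_residue, template_residue) pair, report if they share a column."""
    target_cols = residue_to_column(aligned_target, target_start)
    template_cols = residue_to_column(aligned_template, template_start)
    result = {}
    for target_res, template_res in site_pairs:
        col_t = target_cols.get(target_res)
        col_s = template_cols.get(template_res)
        result[(target_res, template_res)] = col_t is not None and col_t == col_s
    return result


# Toy alignment, not the real ARSA/1P49 alignment
print(sites_aligned("AC-DEF", "ACGD-F", [(3, 4), (4, 5), (5, 5)]))
```

For the alignment discussed here, the pairs would come from the template table, for example (69, 75) for the metal binding site and (125, 136) for the active site of ARSA and 1P49.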
Normally, one does not have information about the secondary structure of the target sequence, but in our case this information was available, so we modified the alignments such that gaps within secondary structure elements were avoided. There were no gaps in secondary structure elements of 1P49. This is due to the fact that we modified the output of align2d(), which already uses secondary structure information to place gaps outside of these regions.
- The gap between residue 154 and 155 was moved out of the beta strand between residues 152 and 153.
- The gap between residues 190-191 was moved out of the beta strand between residues 191-192.
- All gaps within the helix from residue 197-214 were moved out of the helix (at the right border).
- The gap between 290-291 was moved to the right end of the helix.
Surprisingly, the TM-score decreased even further to 0.5561. An image of the structural model together with the real structure is depicted below.
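Gaps that fall inside secondary structure elements, such as the ones listed above, can also be located automatically. The following is a minimal sketch with a hypothetical function name; it assumes consecutive residue numbering and that the elements are given as inclusive (start, end) residue ranges, such as (197, 214) for the helix mentioned above.

```python
def gaps_inside_elements(aligned_seq, elements, first_residue_number=1):
    """Return the (start, end) elements that contain a gap between two of their residues."""
    broken = []
    next_number = first_residue_number  # number of the next residue to be read
    for char in aligned_seq:
        if char == "-":
            previous_number = next_number - 1
            for start, end in elements:
                # the gap sits between previous_number and next_number
                inside = start <= previous_number and next_number <= end
                if inside and (start, end) not in broken:
                    broken.append((start, end))
        else:
            next_number += 1
    return broken


# Toy example: residues 2-5 form one element, residues 8-9 another
print(gaps_inside_elements("ACD--EFGH-IK", [(2, 5), (8, 9)]))
```

A gap directly before the first or after the last residue of an element is not reported, which matches the idea of moving gaps to the border of a helix or strand.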
Multiple Template Modeling
For the multiple template model, we first needed to create a multiple sequence alignment of the templates and the target. We used the salign() function, which (like align2d()) uses structural information from the template to place gaps in coil regions. Then we calculated the model using this MSA. The detailed workflow is again described in ( Using Modeller for TASK 4 ). We calculated three models:
- Model 1 was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
- Model 2 was calculated from a MSA of ARS A and the three proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
- Model 3 was calculated from a MSA of ARS A and the two proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
The TM- and RMSD scores, as well as images of the structural model and the real structure of ARS A, are shown below.
|Model 1||Model 2||Model 3|
|model2||0.6701||2.4 - 3.1|
Modification of the MSA
Analogously to the previous section, we modified the alignment of Model 2 such that all active and binding sites were aligned. Images of the alignments are shown on the right; the structural model together with the real structure is shown below.
The TM-score of the new model drops to 0.5685.
Modeller yields good results. The initial alignments are very important for the quality of the resulting model. When using the 2d-alignment method for single template modeling, we could improve the prediction accuracy with respect to both measures (RMSD, TM-score).
Surprisingly, the multiple template models all perform worse than the three best single template models. The manual modification of the alignments also results in a decreased modelling accuracy. This could be due to the fact that the automatic alignments are optimized over the whole alignment, while the manually modified alignments are optimized in one region, which seems to decrease the quality in other areas of the alignment.
iTasser is a server for modelling 3D structure by homology. It also provides function predictions. Under the name Zhang-Server, iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9. iTasser uses a threading approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>
The confidence of a model is measured with the C-score which is based on the significance of the template alignments and the convergence parameters of the structure assembly simulations. The typical range for the C-score is [-5,2], where a higher C-score means higher confidence in the model. <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S72828/cscore.txt</ref>
Modelling without template
As one can see, model 1 is the model with the highest confidence. Model 1 has an estimated TM-score of 0.84 ± 0.08 and an estimated RMSD of 5.3 ± 3.4Å.
To compare the models, TM-score and RMSD to 1AUK were calculated:
|prediction by iTasser without template|
As one can see, three of the models are very good: model 1, model 2 and model 5. They all have a rather high TM-score and a very low RMSD. Relying on the confidences assigned by iTasser therefore seems reasonable, as these three models were also the ones with the highest C-scores.
Modelling with single template
To specify a single template, one can specify a pdb-id or upload a pdb-file. Due to the long runtime of iTasser we only built a model based on 1P49. We chose 1P49 as it was the best template in the analyses before.
As one can see, model 1 is the model with the highest confidence. Model 1 has an estimated TM-score of 0.93 ± 0.06 and an estimated RMSD of 4.0 ± 2.7Å.
To compare the models, TM-score and RMSD to 1AUK were calculated:
|prediction by iTasser with 1P49 as template|
Here, the C-score does not identify the best model, which is model 4. But even model 4 has a TM-score lower than desired and an RMSD that is merely acceptable.
In this case, modelling without a template seems to be the way to go. However, this could be due to the fact that the option to exclude hits above a certain sequence similarity does not seem to work, so the very good result could be due to a self-hit (or a very closely related sequence).
SWISS-MODEL is an online tool for modelling 3D structure <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. There are three different modes available: automated mode, alignment mode and project mode. We only used the automated and the alignment mode.
Modelling without template
In automated mode without a template, suitable templates are selected by a BLAST run using an e-value threshold. The mode is used by pasting a protein sequence or a UniProt AC code into a text field.
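The idea of selecting templates by an e-value threshold can be sketched as follows. This is not SWISS-MODEL's internal pipeline, which is not exposed to the user; the sketch merely assumes BLAST hits in the standard 12-column tabular format (subject ID in column 2, e-value in column 11), and the cutoff value, function name and example rows are made up for illustration.

```python
def select_templates(blast_tabular, max_evalue=1e-5):
    """Keep (subject_id, evalue) pairs at or below the cutoff, best hit first."""
    hits = []
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.append((subject_id, evalue))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up rows with hypothetical subject names; only columns 2 and 11 matter here
toy_hits = "\n".join([
    "query\ttmplA\t99.0\t480\t2\t0\t1\t480\t1\t480\t1e-150\t900",
    "query\ttmplB\t31.0\t450\t250\t20\t5\t460\t10\t470\t2e-40\t160",
    "query\ttmplC\t22.0\t120\t80\t6\t30\t150\t40\t160\t0.3\t30",
])
print(select_templates(toy_hits))
```

A purely e-value based selection always prefers the closest available structure, so a solved structure of the target itself wins whenever it is present in the template database.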
SWISS-MODEL identified 1N2LA as the best template. The title of the PDB entry of 1N2LA is "Crystal structure of a covalent intermediate of endogenous human arylsulfatase A". The result should therefore be very good, but it is not really meaningful, as we are using a human arylsulfatase A as a template. As expected, the alignment quality was very high:
TARGET 19 RPPNIVLI FADDLGYGDL GCYGHPSSTT PNLDQLAAGG LRFTDFYVPV
1n2lA 19 rppnivli faddlgygdl gcyghpsstt pnldqlaagg lrftdfyvpv
TARGET sssss ss hhhhhhhh ssss sss
1n2lA sssss ss hhhhhhhh ssssssss

TARGET 67 SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY
1n2lA 67 sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy
TARGET hhhhh hh hhh ss s hhhhhhhh
1n2lA hhhhhh hh hh ss s hhhhhhhh

TARGET 117 LTGMAGKWHL GVGPEGAFLP PHQGFHRFLG IPYSHDQGPC QNLTCFPPAT
1n2lA 117 ltgmagkwhl gvgpegaflp phqgfhrflg ipyshdqgpc qnltcfppat
TARGET ssssssss sss sss hhh ssss s sss s
1n2lA ssssssss sss sss hhh ssss s sss s

TARGET 167 PCDGGCDQGL VPIPLLANLS VEAQPPWLPG LEARYMAFAH DLMADAQRQD
1n2lA 167 pcdggcdqgl vpipllanls veaqppwlpg learymafah dlmadaqrqd
TARGET ss ssss s ss hhhhhhhhh hhhhhhhh
1n2lA ss ssss s ss hhhhhhhhh hhhhhhhh

TARGET 217 RPFFLYYASH HTHYPQFSGQ SFAERSGRGP FGDSLMELDA AVGTLMTAIG
1n2lA 217 rpfflyyash hthypqfsgq sfaersgrgp fgdslmelda avgtlmtaig
TARGET ssssssss h hhhhhhhhhh hhhhhhhhhh
1n2lA ssssssss h hhhhhhhhhh hhhhhhhhhh

TARGET 267 DLGLLEETLV IFTADNGPET MRMSRGGCSG LLRCGKGTTY EGGVREPALA
1n2lA 267 dlglleetlv iftadngpet mrmsrggcsg llrcgkgtty eggvrepala
TARGET hh ssss sss sss sss
1n2lA h ssss sss hhhsss sss

TARGET 317 FWPGHIAPGV THELASSLDL LPTLAALAGA PLPNVTLDGF DLSPLLLGTG
1n2lA 317 fwpghiapgv thelassldl lptlaalaga plpnvtldgf dlsplllgtg
TARGET s ss s hhh hhhhhhhh hhhhh
1n2lA s ss s ssshhh hhhhhhhh hhhhh

TARGET 367 KSPRQSLFFY PSYPDEVRGV FAVRTGKYKA HFFTQGSAHS DTTADPACHA
1n2lA 367 ksprqslffy psypdevrgv favrtgkyka hfftqgsahs dttadpacha
TARGET sssss ssssssssss ssss
1n2lA sssss ssssssssss ssss

TARGET 417 SSSLTAHEPP LLYDLSKDPG ENYNLLGGVA GATPEVLQAL KQLQLLKAQL
1n2lA 417 sssltahepp llydlskdpg enynllg--- gatpevlqal kqlqllkaql
TARGET sss sssss hhhhhhh hhhhhhhhhh
1n2lA sss sssss hhhhhhh hhhhhhhhhh

TARGET 467 DAAVTFGPSQ VARGEDPALQ ICCHPGCTPR PACCHCPD
1n2lA 467 daavtfgpsq vargedpalq icchpgctpr pacchcpd-
TARGET hhh hh sss
1n2lA hhh hh sss
As one can see in the images above, the model quality is quite good, with uncertainties especially in the loop-regions. The result is not really surprising, as 1N2L is the structure of a human Arylsulfatase A.
Modelling with single template without alignment
It is possible to specify a template in automated mode by specifying a PDB-ID or by uploading a pdb-file.
1P49 has 39% sequence identity with human arylsulfatase A, which is the highest identity among our templates. With this low sequence identity, the alignment quality was rather poor:
TARGET 1 RPPNIV LIFADDLGYG DLGCYGHPSS TTPNLDQLAA GGLRFTDFYV
userX 23 aa--srpnii lvmaddlgig dpgcygnkti rtpnidrlas ggvkltqhla
TARGET sss ssss hhhh ssssssss
userX sss ssss hhhh sss ssss

TARGET 47 PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SS-----RGG LPLEEVTVAE
userX 71 a-spltpsra afmtgrypvr sgmaswsrtg vflftassgg lptdeitfak
TARGET hhh hhh hh h hhh
userX hhhh hhhh hh h hhh

TARGET 92 VLAARGYLTG MAGKWHLGVG PEG----AFL PPHQGFHRFL GIPYSHDQGP
userX 121 llkdqgysta ligkwhlgms chsktdfchh plhhgfnyfy gisltnlrdc
TARGET hhhh sss sssss sss ss
userX hhhh sss sssss sss ss

TARGET 138 CQNLT-CFPP ATPCDG---- ---------- ---------- ---------G
userX 171 kpgegsvftt gfkrlvflpl qivgvtlltl aalnclgllh vplgvffsll
TARGET hh hhhhh
userX hhh hhhh hhh hhhhhhhhhh hhhhhh hhhhhhh

TARGET 154 CD--QGLVPI PLLANLSVEA QP-------- ----PWLPGL EARYMAFAHD
userX 221 flaaliltlf lgflhyfrpl ncfmmrnyei iqqpmsydnl tqrltveaaq
TARGET hhhh hhhhhhh hhhhhhhhh
userX hhhhhhhhhh hhhhhhhhhh ssss ss sss h hhhhhhhhhh

TARGET 190 LMADAQRQDR PFFLYYASHH THYPQFSGQS FAERSGRGPF GDSLMELDAA
userX 271 fiq--rntet pfllvlsylh vhtalfsskd fagksqhgvy gdaveemdws
TARGET hh ssssssss hh hhhhhhhhhh
userX hhh ssssssss hh hhhhhhhhhh

TARGET 240 VGTLMTAIGD LGLLEETLVI FTADNGPETM RM-----SRG GCSGLLRCGK
userX 319 vgqilnllde lrlandtliy ftsdqgahve evsskgeihg gsngiykggk
TARGET hhhhhhhhhh h sssss ssss
userX hhhhhhhhhh h sssss ssss sss sss

TARGET 285 GTTYEGGVRE PALAFWPGHI -APGVTHELA SSLDLLPTLA ALAGAPLPN-
userX 369 annweggirv pgilrwprvi qagqkidept snmdifptva klagaplped
TARGET hhsss ssss sss s sshhhhhhhh hhh
userX sss ssss s sshhhhhhhh hhh

TARGET 333 VTLDGFDLSP LLLGTGKSPR QSLFFYPS-- YPDEVRGVFA VRTGKYKAHF
userX 419 riidgrdlmp llegksqrsd heflfhycna ylnavrwhpq nstsiwkaff
TARGET hh hhh hhhhhh h ssssss
userX hh hhh sssssss ssssssss ssssss

TARGET 381 FTQGSAHSDT TADPACHASS SLTAHEPPLL YDLSKDPGEN YNLLGGVAGA
userX 469 ftpnfnpvcf athvcfcfgs yvthhdppll fdiskdprer nplt----pa
TARGET ss sss ss sss sss sss
userX ss sss ss sss

TARGET 431 TPEVLQAL-K QLQLLKAQLD AAVTFGPSQV A---RGEDPA LQICCHPGCT
userX 519 seprfyeilk vmqeaadrht qtlpevpdqf swnnflwkpw lqlccp---s
TARGET hh hh hhhhhhhhh
userX hhh hhhhhhhhhh

TARGET 477 PRPACC ---
userX 566 tglscqcdre
TARGET
userX sss
As one can see in the images above, the model quality is not really good, because the template seems to be too distantly related.
2VQR has 20% sequence identity with human arylsulfatase A, which is the lowest identity among our templates. With this even lower sequence identity, the alignment quality was really poor:
TARGET 2 PPNIVLIFAD DLGYGDLG-- --CYGHPS-S TTPNLDQLAA GGLRFTDFYV
userX 3 kknvllivvd qwradfvphv lradgkidfl ktpnldrlcr egvtfrnhvt
TARGET sssssss hhhhhhhh h ssssssss
userX sssssss hhh hhhh hhhhhhhh h ssssssss

TARGET 47 PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SSRGGLPLEE VTVAEVLAAR
userX 53 tcvpxgpara slltglylmn hravqntv-- ----pldqrh lnlgkalrgv
TARGET hhhhhh hhh hhh h hhhhhhhh
userX hhhh hhh hhh h hhhhhh

TARGET 97 GYLTGMAGKW HLGVGPEGAF LPPHQGFHRF LGIPYSHDQG PCQ-----NL
userX 97 gydpaligyt ttvpdprtt- spndprfrvl gdlmdgfhpv gafepnmegy
TARGET ssss hh sss h
userX ssss hh sss hhh

TARGET 142 TCFPPATPCD GGCD-----Q GLVPIPLLAN LSVEAQPPWL PGLEARYMAF
userX 146 fgwvaqngfd lpehrpdiwl pegedavaga tdrpsripke fsdstffter
TARGET hhhhhh hhhhhhhh
userX hhhhhh hhhhhhhh

TARGET 187 AHDLMADAQR QDRPFFLYYA SHHTHYPQFS GQSFAERSGR ----------
userX 196 altylkg--r dgkpfflhlg yyrphppfva sapyhamyrp edmpapiraa
TARGET hhhhhh ssssss s
userX hhhhhhh h ssssss s

TARGET 227 ---------- ---------- ---------- ---------- ----GPFGDS
userX 244 npdieaaqhp lmkfyvdsir rgsffqgaeg sgatldeael rqmratycgl
TARGET hhhh
userX hhhhhh hhhhhhhhss s s ss hhhh hhhhhhhhhh

TARGET 233 LMELDAAVGT LMTAIGDLGL LEETLVIFTA DNGPETMRMS RGGCSGLLRC
userX 294 itevddclgr vfsyldetgq wddtliifts dhgeqlgdhh ll--------
TARGET hhhhhhhhhh hhhhhh h ssssss s
userX hhhhhhhhhh hhhhhh h ssssss s

TARGET 283 GKGTTYEGGV REPALAFWPG HI--APGVTH ELASSLDLLP TLAALAGAPL
userX 336 gkigyndpsf riplvikdag enaragaies gftesidvmp tildwlggki
TARGET hhs ss ssss sss ssshhhhh hhhhhh
userX hhs ss ssss sss ssshhhhh hhhhhh

TARGET 331 PNVTLDGFDL SPLLLGTGKS PRQ-SLFFYP SYP------- -------DEV
userX 386 ph-acdglsl lpflsegrpq dwrtelhyey dfrdvyysep qsflglgmnd
TARGET hhh sssss
userX hhh sssss ss hhhh

TARGET 366 RGVFAVRTGK YKAHFFTQGS AHSDTTADPA CHASSSLTAH EPPLLYDLSK
userX 435 cslcviqder ykyvhfaa-- ---------- ---------- lpplffdlrh
TARGET sssssss s sssssss s ss ss s sssss
userX ssssssss s sssssss sssss

TARGET 416 DPGENYNLLG GVAGATPEVL QAL-KQLQLL KAQLDAA -- ----------
userX 463 dpneftnlad d--payaalv rdyaqkalsw rlkhadrtlt hyrsgpegls
TARGET hhhh hh hhhh hhh
userX hhh hhhhhhhhhh hhh sssss sss

TARGET ----
userX 511 ersh
TARGET
userX ss
Due to the low alignment quality, the model quality was so low that the only result was a plot of the local quality.
Modelling with single template with alignment
For 1P49, SWISS-MODEL was not able to build a model based on our alignment.
TARGET 4 PRSLLLA LAAGLAVARP PNIVLIFADD LGYGDLGCYG HPSSTTPNLD
2vqrA 3 kknvlli vvdqwradfv phvlr--adg -kidfl---- ----ktpnld
TARGET sssss ss sss hhhh sss sss hhhh
2vqrA sssss ss hhhhh hh hhhh

TARGET 51 QLAAGGLRFT DFYVPVSLCT PSRAALLTGR LPVRMGMYPG VLVPSSRGGL
2vqrA 39 rlcregvtfr nhvttcvpcg paraslltgl ylmnhravqn t-vpldqrhl
TARGET hhhhh ssss ssss hh hhhhhhh hhhh
2vqrA hhhhh ssss ssss hh hhhhhhh hhhh

TARGET 101 PLEEV--TVA EVLAARGYLT GM-------- -------AGK WHLGVGPEGA
2vqrA 88 nlgkalrgvg ydpaligytt tvpdprttsp ndprfrvlgd lmdgfhpvga
TARGET ssss sss
2vqrA hhhhhh ssss hh sss

TARGET 134 FLPPHQGFHR FL------GI PYSHDQGPC- ---------- ---QNLTCFP
2vqrA 138 fepnmegyfg wvaqngfdlp ehrpdiwlpe gedavagatd rpsripkefs
TARGET hhhhh hh hhh
2vqrA hhhhh hhhh

TARGET 164 PATPCDGGCD ---QGLVPIP LLANLSV-EA QPPWLPGLEA R-------YM
2vqrA 188 dstffteral tylkgrdgkp fflhlgyyrp hppfvasapy hamyrpedmp
TARGET hhhhhhhhh ssssss
2vqrA hhhhhhhhhh hhhhhh sssssss

TARGET 203 AFAHDLMADA QRQDRPFFLY YASHHTHY-- ---PQFSGQS FAE---RSGR
2vqrA 238 apiraanpdi eaaqhplmkf yvdsirrgsf fqgaegsgat ldeaelrqmr
TARGET hhh hhh hhhh hhhh hh
2vqrA hhh hhh hhhh hhhhsss sss hhhhhhhh

TARGET 245 GPFGDSLMEL DAAVGTLMTA IGDLGLLEET LVIFTADNGP ETMRMSRGGC
2vqrA 288 atycglitev ddclgrvfsy ldetgqwddt liiftsdhg- eql-----gd
TARGET hhhhhhhhhh hhhhhhhhhh hh h ssssssssss
2vqrA hhhhhhhhhh hhhhhhhhhh hh h sssssss

TARGET 295 SGLLRCGKGT TYEGGVREPA LAFWPGHI-A PGVTH-ELAS SLDLLPTLAA
2vqrA 332 hhll--gkig yndpsfripl vikdagenar agaiesgfte sidvmptild
TARGET hhsss s sss sss ss shhhhhhhhh
2vqrA hhsss s sss sss ss shhhhhhhhh

TARGET 343 LAGAPLPNVT LDGFDLSPLL LGTGKSPRQS LFFYPSYPDE VRGVFAVRTG
2vqrA 380 wlggkiph-a cdglsllpfl s-egr-p-qd wrtelhyeyd frdvy---ys
TARGET hh hhhh ssssss s
2vqrA hh hhh ssssss s

TARGET 393 KYKAHFFTQG S-AHSDTTAD PACHASSSLT AHEPPLLYDL SKDPGENYNL
2vqrA 423 epqs-flglg mndcslcviq derykyvhfa al-pplffdl rhdpneftnl
TARGET sssssss s ssssssss sssss
2vqrA hh hh sssssss s ssssssss sssss

TARGET 442 LGGVAGATPE VLQALKQLQL LKAQLDAAVT FGPSQVARGE DPALQICCHP
2vqrA 471 addpayaalv rdyaqkalsw rlkhadrtlt ----hyrsg- -pe-------
TARGET hhhh hhhhhhhhhh hhh sssss sss ssss
2vqrA hhh hhhhhhhhhh hhh sssss

TARGET 492 GCTPRPACCH CPDPH
2vqrA 508 glser----- ---sh-
TARGET ssss
2vqrA sssss
As one can see in the images above, the model was even worse than the one built with 2VQR without alignment.
|prediction by SWISS-MODEL|
|2VQR||no common residue||no calculation possible|
|1P49 ali||no model||no model|
As expected, the model built from a human arylsulfatase A structure is by far the best. If we imagine that no resolved structure were available, however, none of the models is really good. With 1P49, the RMSD is close to the acceptable range, but the TM-score is so low that it does not seem sensible to use the predicted structure for anything. The structure predicted from 2VQR with a given alignment has the highest TM-score among our non-self-hits, but an RMSD above the acceptable range. It therefore seems that the available templates are either too close or too distant to model the structure reasonably.
Without knowing the real TM-score and RMSD, we decided to build a combined structure out of the following predictions:
- modeller prediction without structure information based on 1P49
- modeller prediction with structure information based on 1P49
- iTasser prediction with single template based on 1P49
- SWISS-MODEL prediction with single template based on 1P49
In the following the models are evaluated using RMSD and TM score and visually superimposed with the real structure using pymol:
|Model 1||Model 2||Model 3||Model 4||Model 5|
|prediction by 3D-jigsaw|
|Model 5||0.9121||1.1 – 1.3|
Comparison of the methods
The RMSDs of all methods range from 2 to 3.5 Angstrom. This suggests that all methods produced acceptable models, but none of them was outstanding (except for the two models from iTasser and SWISS-MODEL, which are better due to the self-hit during template recognition).
The TM-scores for all models vary more than the RMSDs. A TM-score of < 0.5 suggests that the model is not good, and a TM-score of >= 0.5 suggests that the model exhibits a very similar fold to the original structure. The TM-scores for Modeller range from 0.4 to 0.8. Using multiple templates or manual modifications of the alignments decreased the accuracy. Surprisingly, all iTasser models (excluding the self-hit) yield a very low TM-score (around 0.4), although their RMSDs are very similar to those of the Modeller predictions. The SWISS-MODEL TM-scores are low as well, comparable to those of iTasser (all below 0.5).
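The 0.5 cutoff can be applied to the TM-scores that are stated explicitly on this page. The sketch below is only a convenience for reading the numbers; the dictionary labels are our own shorthand, and scores that appeared only in tables not reproduced here are left out.

```python
SAME_FOLD_CUTOFF = 0.5  # TM-score at or above which the fold is considered the same

# TM-scores quoted in the text and tables above
tm_scores = {
    "Modeller, 1FSU (align2d)": 0.7146,
    "Modeller, 3ED4 (align2d)": 0.3881,
    "Modeller, 1P49, functional sites aligned": 0.6940,
    "Modeller, 1P49, gaps moved out of secondary structure": 0.5561,
    "Modeller, multiple templates (Model 2)": 0.6701,
    "Modeller, multiple templates, modified MSA": 0.5685,
    "3D-Jigsaw, Model 5": 0.9121,
}


def classify(scores, cutoff=SAME_FOLD_CUTOFF):
    """Map each model label to True if its TM-score reaches the cutoff."""
    return {label: score >= cutoff for label, score in scores.items()}


classified = classify(tm_scores)
for label, same_fold in classified.items():
    verdict = "same fold" if same_fold else "below cutoff"
    print(f"{label}: {tm_scores[label]:.4f} ({verdict})")
```

Of the scores listed, only the model built on the complete 3ED4 entry falls below the cutoff, which is consistent with the observation that 3ED4 had to be split into its chains before it became a useful template.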
With regard to the RMSD scores, no method performs substantially better than the others. With regard to the TM-scores, Modeller performs best, and taking both measures together Modeller also outperforms iTasser and SWISS-MODEL. For all methods, 1P49 was the best template, so we submitted the models generated with 1P49 to 3D-Jigsaw. 3D-Jigsaw generated 5 models, all with very high TM-scores (around 0.9) and low RMSDs (1-2 Angstrom). Thus, 3D-Jigsaw was able to substantially improve the models generated by our previously applied methods. The best model is model 5 with an RMSD of 1.1-1.3 and a TM-score of 0.91.