Difference between revisions of "Homology Modeling of ARS A"

From Bioinformatikpedia
(Modification of Alignments)
m (References)
 
(90 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
== Proteins used as templates ==
== HHpred ==
 
  +
From the previous alignment TASK (see [[metachromatic_leukodystrophy_reference_aminoacids|Alignment TASK]]), we took four proteins which might serve as suitable templates for the modeling. The proteins are listed in the table below. The information about active and binding sites was obtained from UniProt and serves as additional information for the manual modification of the alignments, with the aim of improving the accuracy of the models. Interestingly, our potential templates, identified by the database searches, include all homologs with known structure according to HSSP. <br>
 
We used the [http://toolkit.lmb.uni-muenchen.de/hhpred webserver] and
 
 
== Modeller ==
 
We wrote a modeling tutorial ([[Using Modeller for TASK 4 | Using Modeller for TASK 4 ]]) comprising all necessary steps in the following analysis.
 
 
=== Proteins used as templates ===
 
We identified the following proteins (see [[metachromatic_leukodystrophy_reference_aminoacids|Alignment TASK]]) as potential templates for homology modeling:
 
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 28: Line 21:
 
|3ED4|| 32.0% || Escherichia coli || Arylsulfatase || yes || 27.7% || not avail. || not avail. || not avail.
 
|3ED4|| 32.0% || Escherichia coli || Arylsulfatase || yes || 27.7% || not avail. || not avail. || not avail.
 
|-
 
|-
|ARSA|| - || Homo Sapiens || -|| - || - || 125 || 123, 150, 229, 302 || 29, 30, 69, 281, 282
+
|ARSA (1AUK) || - || Homo sapiens || -|| - || - || 125 || 123, 150, 229, 302 || 29, 30, 69, 281, 282
 
|-
 
|-
 
|}
 
|}
   
  +
<br>
 
  +
Furthermore, we used the [http://toolkit.lmb.uni-muenchen.de/hhpred HHsearch webserver] to see if we could extend this list towards even more remotely related sequences. But the results did not yield significant hits that were more distantly related than 2VQR, so we decided to stick with the above proteins for modelling with Modeller.
Our potential templates, identified by the database searches, contain all homologs with known structure according to HSSP.
 
  +
  +
== Modeller ==
  +
Modeller is a program for comparative modeling of the 3D structure of a protein with unknown structure. It provides different methods for the calculation of the initial target-template alignment. Given the alignments, Modeller generates the backbone and optimizes a probability function reflecting spatial restraints. The input alignments can be either pairwise sequence alignments (for single template modeling) or multiple sequence alignments (for multiple template modeling). <ref>A. Sali, T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993</ref> <br>
  +
In this section, we use Modeller to model the 3D structure of ARSA and compare the results to the known structure from PDB. We wrote a tutorial ([[Using Modeller for TASK 4 | Using Modeller for TASK 4 ]]) comprising all necessary steps in the following analysis. It provides generic scripts and example code and executes all methods using default parameters.
  +
   
 
=== Single template modelling ===
 
=== Single template modelling ===
In order to predict the structure using a single template structure, Modeller needs pairwise sequence alignments in PIR format. Modeller provides two different methods to calculate pairwise sequence alignments. <code> alignment.malign() </code> uses classical dynamic programming to align two sequences. <code> alignment.align2d() </code> also uses a dynamic programming approach, but includes structural information to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both alignment methods and created eight pairwise sequence alignments of the above templates with the target. The script used for this purpose is shown below:
+
In order to predict the structure using a single template structure, Modeller needs pairwise sequence alignments in PIR format. Modeller provides two different methods to calculate pairwise sequence alignments. <code> alignment.malign() </code> uses classical dynamic programming to align two sequences. <code> alignment.align2d() </code> also uses a dynamic programming approach, but includes structural information to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both alignment methods and created eight pairwise sequence alignments of the above templates with the target. Then we modelled the structure with default parameters using the <code>automodel()</code> class. The scripts used for this purpose can be seen in our protocol: [[Using Modeller for TASK 4 | Using Modeller for TASK 4 ]]. <br>
  +
  +
Next, we calculated RMSD and TM-scores of the models to get a first impression of how much the models deviate from the original structure. In order to calculate the TM-scores, we downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using
   
 
<code>
 
<code>
  +
gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f
from modeller import *
 
env = environ()
 
aln = alignment(env)
 
mdl = model(env, file='template_name', model_segment=('FIRST:@', 'END:'))
 
aln.append_model(mdl, align_codes='template_name', atom_files='template_name')
 
aln.append(file='1AUK.pir', align_codes='target_name')
 
aln.align2d()
 
aln.check()
 
aln.write(file='target-template-2d.ali', alignment_format='PIR')
 
aln.malign()
 
aln.check()
 
aln.write(file='target-template.ali', alignment_format='PIR')
 
 
</code>
 
</code>
   
  +
The TM-scores were calculated as follows:
 
  +
For these alignments we constructed eight models, using the following script:
 
 
<code>
 
<code>
  +
./TMscore MODEL.pdb REAL_STRUCTURE.pdb
from modeller import *
 
from modeller.automodel import *
 
log.verbose()
 
env = environ()
 
a = automodel(env,
 
alnfile = '1AUK-1FSU-2d.ali',
 
knowns = '1FSU',
 
sequence = '1AUK',
 
assess_methods=(assess.DOPE, assess.GA341))
 
a.starting_model= 1
 
a.ending_model = 1
 
a.make()
 
 
</code>
 
</code>
   
  +
In order to calculate the RMSD scores, we used [http://www.ebi.ac.uk/Tools/dalilite/ DaliLite] <br>
We modified the paths and filenames in the scripts such that it matched our proteins of interest.
 
   
  +
The results are depicted below: <br>
Next, we calculated RMSD and TM scores of the models to get a first impression on how much the models deviate from the original structure. The results are depicted in the table below.
 
   
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
! PDB Identifier
  +
! TM-score
  +
! RMSD
  +
|-
  +
!colspan="4"| Dynamic Programming with structural information
  +
|-
  +
| 1P49 || 0.7960 || 2.0
  +
|-
  +
| 2VQR || 0.4825 || 3.5
  +
|-
  +
| 1FSU || 0.7146 || 1.5 - 2.3
  +
|-
  +
| 3ED4 || 0.3881 || 1.9 - 2.8
  +
|-
  +
!colspan="4"| Dynamic Programming without structural information
  +
|-
  +
| 1P49 || 0.7731 || 2.2
  +
|-
  +
| 2VQR || 0.3183 || 3.5
  +
|-
  +
| 1FSU || 0.7223 || 2.7
  +
|-
  +
| 3ED4 || 0.3122 || 4.1
  +
|-
  +
|}
   
  +
The RMSD score measures the '''r'''oot '''m'''ean '''s'''quare '''d'''eviation of the two structures. This is a straightforward measure to assess the similarity of our models. The TM-score also measures the deviation of the two structures from each other, but weights the distances: residue pairs that lie close together after superposition contribute more to the score than pairs that lie far apart. <ref>Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710 </ref> <br>
Further on, we visualised the models using pymol. We load both structures into the program and performed a structural alignment to superimpose and compare them visually. The pymol commands and the images are shown below:
 
  +
The TM-score is more sensitive in detecting the same fold. Assume two proteins for which 80 % of the residues adopt a very similar fold, while the remaining 20 % fold completely differently. One would still consider these two proteins similar, but the RMSD can become very large owing to the large distances in these 20 %. <br>
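The 80 %/20 % thought experiment can be made concrete with a small sketch. It assumes that the two structures are already superimposed and that the per-residue distances are known; the toy distances (1 Å for 80 residues, 20 Å for 20 residues) and the function names are illustrative assumptions, not values from our models. The formula and the length-dependent scale d0 follow Zhang and Skolnick (2004); the real TMscore program additionally searches for the superposition that maximises the score, which this sketch omits.

```python
import math


def rmsd(distances):
    """Root mean square deviation over per-residue distances (in Angstrom)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_d0(target_length):
    """Length-dependent distance scale d0 used by the TM-score."""
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition (no search over superpositions)."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy example (assumed numbers): 100 aligned residues,
# 80 of them deviate by 1 A, the remaining 20 by 20 A.
toy_distances = [1.0] * 80 + [20.0] * 20

toy_rmsd = rmsd(toy_distances)
toy_tm = tm_score(toy_distances, target_length=100)

print("RMSD     = %.2f A" % toy_rmsd)
print("TM-score = %.3f" % toy_tm)
```

In this toy case the RMSD is dominated by the 20 badly placed residues (about 9 Å), while the TM-score stays at roughly 0.75, because each badly placed residue contributes almost nothing to the sum instead of a large squared term.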
  +
Furthermore, we visualised the models using PyMOL. We loaded each pair of model and real structure into the program and performed a structural alignment to superimpose and compare them visually. The PyMOL commands and the images are shown below:
   
 
<code>
 
<code>
Line 94: Line 100:
 
|-
 
|-
 
| '' Classical <br> Dynamic Programming ''
 
| '' Classical <br> Dynamic Programming ''
| [[File:1AUK.1P49.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 1P49, visualized in Pymol]]
+
| [[File:1AUK.1P49.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 1P49, visualized in Pymol]]
| [[File:1AUK.2VQR.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 2VQR, visualized in Pymol]]
+
| [[File:1AUK.2VQR.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 2VQR, visualized in Pymol]]
| [[File:1AUK.1FSU.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 1FSU, visualized in Pymol]]
+
| [[File:1AUK.1FSU.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 1FSU, visualized in Pymol]]
| [[File:1AUK.3ED4.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 3ED4, visualized in Pymol]]
+
| [[File:1AUK.3ED4.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 3ED4, visualized in Pymol]]
 
|-
 
|-
 
| '' Dynamic Programming <br> with structural information <br> from the template''
 
| '' Dynamic Programming <br> with structural information <br> from the template''
| [[File:1AUK.1P492d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 1P49, visualized in Pymol]]
+
| [[File:1AUK.1P492d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 1P49, visualized in Pymol]]
| [[File:1AUK.2VQR2d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 2VQR, visualized in Pymol]]
+
| [[File:1AUK.2VQR2d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 2VQR, visualized in Pymol]]
| [[File:1AUK.1FSU2d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 1FSU, visualized in Pymol]]
+
| [[File:1AUK.1FSU2d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 1FSU, visualized in Pymol]]
| [[File:1AUK.3ED42d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with tamplate 3ED4, visualized in Pymol]]
+
| [[File:1AUK.3ED42d.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with template 3ED4, visualized in Pymol]]
 
|-
 
|-
 
|}
 
|}
   
 
==== 3ED4 ====
 
==== 3ED4 ====
  +
Despite its evolutionary relationship, 3ED4 is a very poor template structure for modeling. Thus, we considered the structure of 3ED4 in more detail to figure out the reason for this behaviour. <br>
  +
First of all, we looked at the PDB entry of the protein and found that 3ED4 consists of four different chains. Next, we plotted 3ED4, coloring each of the four chains (the figure is shown below).
   
 
[[File:3E4D.png | 200px | center | thumb | real structure of 3ED4 visualized in Pymol]]
 
[[File:3E4D.png | 200px | center | thumb | real structure of 3ED4 visualized in Pymol]]
  +
  +
This visual inspection led us to speculate that each of the individual chains structurally resembles our target protein ARSA. Thus we decided to use each individual chain for modeling. We again computed TM-scores and RMSD values and used PyMOL to visualise the model together with the real structure of ARSA.
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
| chain || TM-score || RMSD
  +
|-
  +
| 3ED4A || 0.7268 || 2.9
  +
|-
  +
| 3ED4B || 0.7251 || 2.8
  +
|-
  +
| 3ED4C || 0.6518 || 2.8
  +
|-
  +
| 3ED4D || 0.7303 || 2.8
  +
|-
  +
|}
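The per-chain templates used above (3ED4A to 3ED4D) require one coordinate set per chain. A minimal sketch of how such a split could be done is given below; it relies only on the fixed-column PDB format, where the chain identifier is stored in column 22 of ATOM, HETATM and TER records. The function names and the toy records are assumptions for illustration; this is not the script we used.

```python
def extract_chain(pdb_lines, chain_id):
    """Keep only coordinate records that belong to the given chain."""
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        # Column 22 (index 21) holds the chain identifier.
        if record in ("ATOM", "HETATM", "TER") and len(line) > 21:
            if line[21] == chain_id:
                kept.append(line)
    kept.append("END")
    return kept


def toy_atom_line(serial, chain_id):
    """Build a shortened ATOM record (no coordinates) for demonstration only."""
    return "ATOM  {:>5d}  N   MET {}{:>4d}".format(serial, chain_id, serial)


# Toy input with two chains (hypothetical records, not taken from 3ED4).
toy_pdb = [
    "HEADER    TOY STRUCTURE",
    toy_atom_line(1, "A"),
    toy_atom_line(2, "A"),
    toy_atom_line(3, "B"),
]

chain_a = extract_chain(toy_pdb, "A")
for line in chain_a:
    print(line)
```

For a real entry one would read the lines from the downloaded PDB file and write each extracted chain to its own file before passing it to the modeling step.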
  +
  +
<br>
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 127: Line 152:
   
 
==== Modification of Alignments ====
 
==== Modification of Alignments ====
Using 1P49 as template structure for the modeling process yielded the best results, thus we decided to manually modify this alignment to see, if we can improve the model. We made sure, that there are no gaps in secondary structure elements and modified the alignment such that active site, substrate binding sites and metal binding sites were aligned. For modification of the initial alignments, we used JAlView. Altogether, we performed the following changes:
+
Using 1P49 as template structure for the modeling process yielded the best results, so we decided to manually modify this alignment to see whether we could improve the model. We created two different modified alignments. For the first alignment we only made sure that the active site, substrate binding sites and metal binding sites were aligned. For the second modified alignment we additionally removed gaps in secondary structure elements. For modification of the initial alignments, we used Jalview. Images of the initial and the two modified alignments are shown on the right. Altogether, we performed the following changes:
   
  +
[[File:1p49.ali.png | 200px | right | thumb | Initial alignment of 1AUK and 1P49. ]]
* The gap between residue 74 and 75 in 1P49 was removed to align metal-binding site 75 with metal-binding site 69. This also induced the alignment of both active sites, which were not aligned in the initial alignment. The region around the active site is well conserved between both enzymes. However, this conservation seems to be shifted, thus the amino acids at the active sites differ and an alignment of both sites decreases the alignment of conserved residues within this region.
 
   
  +
* The gap between residue 74 and 75 in 1P49 was removed to align metal-binding site 75 with metal-binding site 69. This also induced the alignment of both active sites, which were not aligned in the initial alignment. The region around the active site is well conserved between both enzymes. However, this conservation seems to be shifted, so the amino acids at the active sites differ, and aligning both sites reduces the number of aligned conserved residues within this region. An image of the structural model together with the real structure is depicted below.
After this change we caluclated one model. The TM-score drops to 0.6940. This might be due to the fact, that the amino acids are conserved in the region around the active sites, but the alignment of the active sites thmeselves decrease the alignment wuality (as described above). Normally, one does not have information about the secondary structure of the target sequence, but in our case, this information was available and thus we modified the alignments such that gaps within the secondary structure were avoided.
 
  +
  +
[[File:1AUK.1P49.mod.act.cite.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with the modified template alignment (above change) to 1P49, visualized in Pymol]]
  +
  +
[[File:Only_act_site.png | 200px | right | thumb | Modified alignment 1 (active and binding sites are aligned) of 1AUK and 1P49. ]]
  +
  +
After this change we calculated one model. The TM-score of this model drops to 0.6940. This might be due to the fact that the amino acids are conserved in the region around the active sites, but the alignment of the active sites themselves decreases the alignment quality because the conservation is shifted by one amino acid (i.e. the active site residue in 1AUK/ARSA is identical to the amino acid before the active site in 1P49). <br>
  +
Normally, one does not have information about the secondary structure of the target sequence, but in our case, this information was available and thus we modified the alignments such that gaps within the secondary structure were avoided. There were no gaps in secondary structure elements of 1P49. This is due to the fact that we modified the output of <code> align2d() </code>, which already uses secondary structure information to place gaps outside of these regions.
   
 
* The gap between residue 154 and 155 was moved out of the beta strand between residues 152 and 153.
 
* The gap between residue 154 and 155 was moved out of the beta strand between residues 152 and 153.
Line 138: Line 170:
 
* The gap between 290-291 was moved to the right end of the helix.
 
* The gap between 290-291 was moved to the right end of the helix.
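Whether two annotated sites end up in the same alignment column after such manual edits can be checked programmatically. The following is a minimal sketch under stated assumptions: both sequences are given as gapped strings of equal length, gaps are written as <code>-</code>, and residues are counted from 1 along the sequence (PDB residue numbers may be offset from this count). The function names and the toy sequences are hypothetical.

```python
def column_of(gapped_seq, residue_number):
    """Return the 0-based alignment column of the n-th residue (1-based)."""
    count = 0
    for column, symbol in enumerate(gapped_seq):
        if symbol != "-":
            count += 1
            if count == residue_number:
                return column
    raise ValueError("sequence has fewer than %d residues" % residue_number)


def sites_aligned(target, template, site_pairs):
    """Map each (target_residue, template_residue) pair to True/False."""
    result = {}
    for target_res, template_res in site_pairs:
        same_column = column_of(target, target_res) == column_of(template, template_res)
        result[(target_res, template_res)] = same_column
    return result


# Toy alignment (assumed sequences, not the ARSA/1P49 alignment).
toy_target = "ACD-EFG"
toy_template = "AC-DEFG"

# Pairs of (residue in target, residue in template) that should share a column.
toy_pairs = [(3, 3), (4, 4)]

for pair, ok in sites_aligned(toy_target, toy_template, toy_pairs).items():
    print(pair, "aligned" if ok else "NOT aligned")
```

Applied to a real alignment, the pairs would be the annotated active, substrate-binding and metal-binding positions of target and template, for example the two metal-binding sites mentioned in the first change above.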
   
  +
[[File:1P49.mod.al.png | 200px | right | thumb | Modified alignment 2 (active and binding sites are aligned and gaps are removed from secondary structure) of 1AUK and 1P49. ]]
Surprisingly, the TM-score was decreased even more to 0.5561.
 
   
[[File:1P49.mod.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with the modified templat alignment to 1P49, visualized in Pymol]]
 
[[File:1p49.ali.png | 200px | center | thumb | Initial alignment. ]]
 
[[File:1P49.mod.al.png | 200px | center | thumb | Modified alignment. ]]
 
   
  +
Surprisingly, the TM-score decreased even further to 0.5561. An image of the structural model together with the real structure is depicted below.
==== TM-scores and RMSD of the single template models ====
 
We downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using
 
   
<code>
 
gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f
 
</code>
 
   
  +
[[File:1P49.mod.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by modeller with the modified template alignment to 1P49, visualized in Pymol]]
TMscores were calculated as follows:
 
 
<code>
 
./TMscore MODEL.pdb REAL_STRUCTURE.pdb
 
</code>
 
 
 
{| border="1" style="text-align:center; border-spacing:0;"
 
! PDB Identifier
 
! TM-score
 
! RMSD
 
|-
 
!colspan="4"| Dynamic Programming with structural information
 
|-
 
| 1P49 || 0.7960 || -
 
|-
 
| 2VQR || 0.4825 || -
 
|-
 
| 1FSU || 0.7146 || -
 
|-
 
| 3ED4 || 0.3881 || -
 
|-
 
| 3ED4A || 0.7268 || -
 
|-
 
| 3ED4B || 0.7251 || -
 
|-
 
| 3ED4C || 0.6518 || -
 
|-
 
| 3ED4D || 0.7303 || -
 
|-
 
!colspan="4"| Dynamic Programming without structural information
 
|-
 
| 1P49 || 0.7731 || -
 
|-
 
| 2VQR || 0.3183 || -
 
|-
 
| 1FSU || 0.7223 || -
 
|-
 
| 3ED4 || 0.3122 || -
 
|-
 
|}
 
   
 
=== Multiple Template Modeling ===
 
=== Multiple Template Modeling ===
   
  +
For the multiple template model, we first needed to create a multiple sequence alignment of the templates and the target. We used the <code> salign() </code> function, which (like <code> align2d() </code>) uses structural information from the template to place gaps in coil regions. Then we calculated the model using this MSA. The detailed workflow is again described in [[Using Modeller for TASK 4 | Using Modeller for TASK 4 ]]. We calculated three models:
We calculated three models:
 
   
 
* ''Model 1'' was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
 
* ''Model 1'' was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
 
* ''Model 2'' was calculated from a MSA of ARS A and the three proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
 
* ''Model 2'' was calculated from a MSA of ARS A and the three proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
 
* ''Model 3'' was calculated from a MSA of ARS A and the two proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
 
* ''Model 3'' was calculated from a MSA of ARS A and the two proteins/chains, which yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.
  +
  +
The TM-scores and RMSD values, as well as images of the structural model and the real structure of ARS A, are shown below.
  +
   
 
{| border="1" style="text-align:center; border-spacing:0;"
 
{| border="1" style="text-align:center; border-spacing:0;"
Line 217: Line 205:
 
! RMSD
 
! RMSD
 
|-
 
|-
| ''model1'' || 0.5409 || -
+
| ''model1'' || 0.5409 || 2.4
 
|-
 
|-
| ''model2'' || 0.6701 || -
+
| ''model2'' || 0.6701 || 2.4 - 3.1
 
|-
 
|-
| ''model3'' || 0.6819 || -
+
| ''model3'' || 0.6819 || 3.2
 
|-
 
|-
 
|}
 
|}
   
Initial multiple structural alignment:
 
   
  +
==== Modification of the MSA ====
<code>
 
  +
Analogously to the previous section, we modified the alignment of ''Model 2'' such that all active and binding sites were aligned. Images of the alignments are shown on the right; the structural model together with the real structure is shown below.
from modeller import *
 
log.verbose()
 
env = environ()
 
env.io.atom_files_directory = './:./'
 
aln = alignment(env)
 
for (code, chain) in (('PROTEIN', 'CHAIN'), ('ANOTHER_PROTEIN', 'ANOTHER_CHAIN'), ...):
 
mdl = model(env, file=code, model_segment=('FIRST:'+chain, 'LAST:'+chain))
 
aln.append_model(mdl, atom_files=code, align_codes=code+chain)
 
aln.salign()
 
aln.write(file='mymsa.pap', alignment_format='PAP')
 
aln.write(file='mymsa.ali', alignment_format='PIR')
 
</code>
 
   
  +
[[File:Initial.msa3.png | 200px | right | thumb | Initial MSA of ''Model 2''. ]]
Add target sequence to MSA:
 
  +
[[File:Modified.mas3.png | 200px | right | thumb | Modified MSA (active and binding sites are aligned) of ''Model 2''. ]]
  +
[[File:1AUK.msa3.mod.png | 200px | center | thumb | Modified ''Model 2''. ]]
   
  +
The TM-score of the new model drops to 0.5685.
<code>
 
from modeller import *
 
log.verbose()
 
env = environ()
 
env.libs.topology.read(file='$(LIB)/top_heav.lib')
 
# Read aligned structure(s):
 
aln = alignment(env)
 
aln.append(file='mymsa.ali', align_codes='all')
 
aln_block = len(aln)
 
# Read aligned sequence(s):
 
aln.append(file='1AUK.pir', align_codes='1AUK')
 
# Structure sensitive variable gap penalty sequence-sequence alignment:
 
aln.salign()
 
aln.write(file='mymsa-1AUK.ali', alignment_format='PIR')
 
aln.write(file='mymsa-1AUK.pap', alignment_format='PAP')
 
</code>
 
   
  +
=== Discussion ===
   
  +
Modeller yields good results. The initial alignments are very important for the quality of the resulting model. When using the 2d-alignment method for single template modeling, we could improve the prediction accuracy for most templates with respect to both measures (RMSD, TM-score). <br>
Calculate the model:
 
  +
Surprisingly, the multiple template models all perform worse than the three best single template models. Also the manual modification of the alignments results in a decreased modelling accuracy.
 
  +
This could be due to the fact that the automatic alignments are optimized over the whole alignment, while the manually modified alignments are optimized in one region only, which seems to lead to a decrease of quality in other areas of the alignment.
<code>
 
from modeller import *
 
from modeller.automodel import *
 
env = environ()
 
a = automodel(env, alnfile='msa2-1AUK.ali',
 
knowns=('PROTEIN', 'ANOTHER_PROTEIN', ...), sequence='1AUK')
 
a.starting_model = 1
 
a.ending_model = 1
 
a.make()
 
</code>
 
 
 
==== Modification of Alignments ====
 
   
 
== iTasser ==
 
== iTasser ==
   
iTasser is a server to model 3D-structure by homology. Also function-predctions are provided. As Zhang-Server iTasser participated in CASP7, CASP8 and CASP9 and was the ranked best in CASP7 and CASP8 and ranked second in CASP9.
+
iTasser is a server for modelling 3D structures by homology. Function predictions are also provided. Under the name Zhang-Server, iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9.
 
iTasser uses a threading-approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>
 
iTasser uses a threading-approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>
 
[[Image:I-TASSER_workflow.jpg|thumb|workflow of iTasser <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html</ref>]]
 
[[Image:I-TASSER_workflow.jpg|thumb|workflow of iTasser <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/about.html</ref>]]
   
The confidence of a model is measured with the C-score which is based on the significance of the template alignments and the convergence parameters of the structure
+
The confidence of a model is measured with the C-score which is based on the significance of the template alignments and the convergence parameters of the structure assembly simulations. The typical range for the C-score is [-5,2], where a higher C-score means higher confidence in the model. <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S72828/cscore.txt</ref>
assembly simulations. The typical range for the C-score is [-5,2], where a higher C-score means higher confidence in the model. <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S72828/cscore.txt</ref>
 
   
 
=== Modelling without template ===
 
=== Modelling without template ===
Line 296: Line 247:
 
|}
 
|}
   
As one can see, model 1 is the model with the highest confidence. Model 1 has a TM-score of 0.84 ± 0.08 and a RMSD of 5.3 ± 3.4Å.
+
As one can see, model 1 is the model with the highest confidence. Model 1 has an estimated TM-score of 0.84 ± 0.08 and an estimated RMSD of 5.3 ± 3.4Å.
  +
  +
To compare the models, TM-score and RMSD to 1AUK were calculated:
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!colspan="4"| prediction by iTasser without template
  +
|-
  +
! model
  +
! TM-score
  +
! RMSD
  +
|-
  +
| model1 || 0.9971 || 0.5
  +
|-
  +
| model2 || 0.9972 || 0.5
  +
|-
  +
| model3 || 0.9891 || 0.8
  +
|-
  +
| model4 || 0.9220 || 1.8
  +
|-
  +
| model5 || 0.9972 || 0.5
  +
|-
  +
|}
  +
  +
As one can see, three of the models are very good: model1, model2 and model5. They all have a very high TM-score and a very low RMSD. Relying on the confidences assigned by iTasser therefore seems reasonable, since these three models were also the models with the highest C-scores.
   
 
=== Modelling with single template ===
 
=== Modelling with single template ===
  +
  +
To use a single template, one can specify a PDB ID or upload a PDB file.
  +
Due to the long runtime of iTasser we only built a model based on 1P49. We chose 1P49 as it was the best template in the analyses before.
  +
  +
{| class="centered"
  +
|[[Image:ARSA_iTasser_1P49_model1.gif|thumb| model 1 for ARSA by iTasser, C-score: 1.563]]
  +
|[[Image:ARSA_iTasser_1P49_model2.gif|thumb| model 2 for ARSA by iTasser, C-score: -1.121]]
  +
|[[Image:ARSA_iTasser_1P49_model3.gif|thumb| model 3 for ARSA by iTasser, C-score: -1.507]]
  +
|[[Image:ARSA_iTasser_1P49_model4.gif|thumb| model 4 for ARSA by iTasser, C-score: -1.478]]
  +
|[[Image:ARSA_iTasser_1P49_model5.gif‎|thumb| model 5 for ARSA by iTasser, C-score: -0.795]]
  +
|}
  +
  +
As one can see, model 1 is the model with the highest confidence. Model 1 has an estimated TM-score of 0.93 ± 0.06 and an estimated RMSD of 4.0 ± 2.7Å.
  +
  +
  +
To compare the models, TM-score and RMSD to 1AUK were calculated:
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!colspan="4"| prediction by iTasser with 1P49 as template
  +
|-
  +
! model
  +
! TM-score
  +
! RMSD
  +
|-
  +
| model1 || 0.2059 || 2.1
  +
|-
  +
| model2 || 0.2057 || 2.1
  +
|-
  +
| model3 || 0.2063 || 2.0
  +
|-
  +
| model4 || 0.2065 || 1.9
  +
|-
  +
| model5 || 0.2052 || 2.2
  +
|-
  +
|}
  +
  +
Here the C-score does not identify the best model: model 1 has the highest C-score, but by TM-score and RMSD the best model is model 4. Even model 4, however, has a low TM-score (0.2065), while its RMSD (1.9) is acceptable.
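How well the C-score ranking agrees with the measured TM-scores can be quantified with a rank correlation. The sketch below uses the C-scores from the image captions and the TM-scores from the table of the 1P49-based prediction; Spearman's rank correlation is our choice for this illustration and is not part of the iTasser output. The implementation assumes that there are no tied values.

```python
def ranks(values):
    """Rank values in descending order (1 = largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equally long lists without ties."""
    n = len(x)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


# model1 .. model5 of the prediction with 1P49 as template
c_scores = [1.563, -1.121, -1.507, -1.478, -0.795]
tm_scores = [0.2059, 0.2057, 0.2063, 0.2065, 0.2052]

rho = spearman(c_scores, tm_scores)
print("Spearman rank correlation: %.2f" % rho)
```

A value close to 1 would mean that the C-score orders the models in the same way as the TM-score; for these five models the correlation is negative, which matches the observation that the C-score does not point to the best model here. With only five models and TM-scores that differ in the fourth decimal place, this number should not be over-interpreted.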
   
 
=== Discussion ===
 
=== Discussion ===
  +
In this case, modelling without template seems to be the way to go. But that could be due to the fact that the option to exclude hits above a certain sequence similarity doesn't seem to work. So the great result could be due to a self-hit (or a very closely related sequence).
   
 
== SWISS-MODEL ==
 
== SWISS-MODEL ==
SWISS-MODEL is a online tool to model 3D-structure <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. There are three different modes available: automated mode, alignment mode and project mode. We only used the automated mode.
+
SWISS-MODEL is an online tool for modelling 3D structures <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. There are three different modes available: automated mode, alignment mode and project mode. We only used the automated and the alignment mode.
   
 
=== Modelling without template ===
 
=== Modelling without template ===
Line 309: Line 321:
   
 
==== Template ====
 
==== Template ====
SWISS-MODEL identified 1N21A as best template.
+
SWISS-MODEL identified 1N2LA as best template.
The name of the PDB-Entry of 1N2LA is "Crystal structure of a covalent intermediate of endogenous human arylsulfatase A". So the result should be very good, as we are using a human Arylsulfatase A as a template.
+
The name of the PDB-Entry of 1N2LA is "Crystal structure of a covalent intermediate of endogenous human arylsulfatase A". The result should therefore be very good, but it is not very meaningful, since the template is itself a human arylsulfatase A.
 
As expected, the alignment quality was very high:

As expected, the alignment quality was very high:
   
Line 396: Line 408:
 
As one can see in the images above, the model quality is quite good, with uncertainties especially in the loop-regions. The result is not really surprising, as 1N2L is the structure of a human Arylsulfatase A.
 
As one can see in the images above, the model quality is quite good, with uncertainties especially in the loop-regions. The result is not really surprising, as 1N2L is the structure of a human Arylsulfatase A.
   
=== Modelling with single template ===
+
=== Modelling with single template without alignment ===
 
It is possible to specify a template in automated mode by specifying a PDB-ID or by uploading a pdb-file.
 
It is possible to specify a template in automated mode by specifying a PDB-ID or by uploading a pdb-file.
   
Line 588: Line 600:
 
userX ss
 
userX ss
 
</nowiki>
 
</nowiki>
  +
  +
  +
===== Results =====
   
 
Because the alignment quality was low, the model quality was so poor that the only result returned was a plot of the local quality.
 
Because the alignment quality was low, the model quality was so poor that the only result returned was a plot of the local quality.
Line 594: Line 609:
 
|[[Image:ARSA_SWISSMODEL_local_quality_2VQR.png|thumb | estimation of local model quality ]]
 
|[[Image:ARSA_SWISSMODEL_local_quality_2VQR.png|thumb | estimation of local model quality ]]
 
|}
 
|}
  +
  +
  +
=== Modelling with single template with alignment ===
  +
  +
==== 1P49 ====
  +
SWISS-MODEL was not able to build a model based on our alignment.
  +
  +
==== 2VQR ====
  +
  +
<nowiki>
  +
TARGET 4 PRSLLLA LAAGLAVARP PNIVLIFADD LGYGDLGCYG HPSSTTPNLD
  +
2vqrA 3 kknvlli vvdqwradfv phvlr--adg -kidfl---- ----ktpnld
  +
  +
TARGET sssss ss sss hhhh sss sss hhhh
  +
2vqrA sssss ss hhhhh hh hhhh
  +
  +
  +
TARGET 51 QLAAGGLRFT DFYVPVSLCT PSRAALLTGR LPVRMGMYPG VLVPSSRGGL
  +
2vqrA 39 rlcregvtfr nhvttcvpcg paraslltgl ylmnhravqn t-vpldqrhl
  +
  +
TARGET hhhhh ssss ssss hh hhhhhhh hhhh
  +
2vqrA hhhhh ssss ssss hh hhhhhhh hhhh
  +
  +
  +
TARGET 101 PLEEV--TVA EVLAARGYLT GM-------- -------AGK WHLGVGPEGA
  +
2vqrA 88 nlgkalrgvg ydpaligytt tvpdprttsp ndprfrvlgd lmdgfhpvga
  +
  +
TARGET ssss sss
  +
2vqrA hhhhhh ssss hh sss
  +
  +
  +
TARGET 134 FLPPHQGFHR FL------GI PYSHDQGPC- ---------- ---QNLTCFP
  +
2vqrA 138 fepnmegyfg wvaqngfdlp ehrpdiwlpe gedavagatd rpsripkefs
  +
  +
TARGET hhhhh hh hhh
  +
2vqrA hhhhh hhhh
  +
  +
  +
TARGET 164 PATPCDGGCD ---QGLVPIP LLANLSV-EA QPPWLPGLEA R-------YM
  +
2vqrA 188 dstffteral tylkgrdgkp fflhlgyyrp hppfvasapy hamyrpedmp
  +
  +
TARGET hhhhhhhhh ssssss
  +
2vqrA hhhhhhhhhh hhhhhh sssssss
  +
  +
  +
TARGET 203 AFAHDLMADA QRQDRPFFLY YASHHTHY-- ---PQFSGQS FAE---RSGR
  +
2vqrA 238 apiraanpdi eaaqhplmkf yvdsirrgsf fqgaegsgat ldeaelrqmr
  +
  +
TARGET hhh hhh hhhh hhhh hh
  +
2vqrA hhh hhh hhhh hhhhsss sss hhhhhhhh
  +
  +
  +
TARGET 245 GPFGDSLMEL DAAVGTLMTA IGDLGLLEET LVIFTADNGP ETMRMSRGGC
  +
2vqrA 288 atycglitev ddclgrvfsy ldetgqwddt liiftsdhg- eql-----gd
  +
  +
TARGET hhhhhhhhhh hhhhhhhhhh hh h ssssssssss
  +
2vqrA hhhhhhhhhh hhhhhhhhhh hh h sssssss
  +
  +
  +
TARGET 295 SGLLRCGKGT TYEGGVREPA LAFWPGHI-A PGVTH-ELAS SLDLLPTLAA
  +
2vqrA 332 hhll--gkig yndpsfripl vikdagenar agaiesgfte sidvmptild
  +
  +
TARGET hhsss s sss sss ss shhhhhhhhh
  +
2vqrA hhsss s sss sss ss shhhhhhhhh
  +
  +
  +
TARGET 343 LAGAPLPNVT LDGFDLSPLL LGTGKSPRQS LFFYPSYPDE VRGVFAVRTG
  +
2vqrA 380 wlggkiph-a cdglsllpfl s-egr-p-qd wrtelhyeyd frdvy---ys
  +
  +
TARGET hh hhhh ssssss s
  +
2vqrA hh hhh ssssss s
  +
  +
  +
TARGET 393 KYKAHFFTQG S-AHSDTTAD PACHASSSLT AHEPPLLYDL SKDPGENYNL
  +
2vqrA 423 epqs-flglg mndcslcviq derykyvhfa al-pplffdl rhdpneftnl
  +
  +
TARGET sssssss s ssssssss sssss
  +
2vqrA hh hh sssssss s ssssssss sssss
  +
  +
  +
TARGET 442 LGGVAGATPE VLQALKQLQL LKAQLDAAVT FGPSQVARGE DPALQICCHP
  +
2vqrA 471 addpayaalv rdyaqkalsw rlkhadrtlt ----hyrsg- -pe-------
  +
  +
TARGET hhhh hhhhhhhhhh hhh sssss sss ssss
  +
2vqrA hhh hhhhhhhhhh hhh sssss
  +
  +
  +
TARGET 492 GCTPRPACCH CPDPH
  +
2vqrA 508 glser----- ---sh-
  +
  +
TARGET ssss
  +
2vqrA sssss
  +
</nowiki>
  +
  +
===== Results =====
  +
  +
{| class="centered"
  +
|[[Image:ARSA_SWISSMODEL_2VQRalignment_QMEAN_plot.png|thumb| Estimated model quality in comparison to nonredundant PDB]]
  +
|[[Image:ARSA_SWISSMODEL_2VQRalignment_QMEAN_density_plot.png|thumb|Estimated density of model quality]]
  +
|[[Image:ARSA_SWISSMODEL_2VQRalignment_QMEAN_slider.png|thumb| Z-Score by category]]
  +
|[[Image:ARSA_SWISSMODEL_local_quality_2VQRalignment.png|thumb | estimation of local model quality ]]
  +
|[[Image:ARSA_SWISSMODEL_2VQRalignment.jpg‎|thumb | model colored by residue error ]]
  +
|}
  +
  +
As one can see in the images above, the model was even worse than the one built with 2VQR without alignment.
  +
  +
=== Discussion ===
  +
  +
To compare the models, we calculated TM-scores with [http://zhanglab.ccmb.med.umich.edu/TM-score/ TM-Score] and RMSDs with [http://www.ebi.ac.uk/Tools/dalilite/index.html DaliLite].
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!colspan="3"| prediction by SWISS-MODEL
  +
|-
  +
! template
  +
! TM-score
  +
! RMSD
  +
|-
  +
| none || 0.9977 || 0.4
  +
|-
  +
| 1P49 || 0.2037 || 2.2
  +
|-
  +
| 2VQR || no common residue || no calculation possible
  +
|-
  +
| 1P49 ali || no model || no model
  +
|-
  +
| 2VQR ali || 0.4732 || 3.4
  +
|-
  +
|}
  +
  +
As expected, the model built from a human Arylsulfatase A template is by far the best. If we imagine that no resolved structure of the target were available, however, none of the models is really good. With 1P49, the RMSD is close to the acceptable range, but the TM-score is so low that it does not seem sensible to use this predicted structure for anything. The structure predicted with 2VQR and a given alignment has the highest TM-score among our non-self-hits, but its RMSD is above the acceptable range. It therefore seems that the available templates are either too close or too distant to model the structure reasonably.
  +
  +
== 3D-Jigsaw ==
  +
  +
Without knowing the real TM-score and RMSD, we decided to build a combined structure out of the following predictions:
  +
  +
* modeller prediction without structure information based on 1P49
  +
* modeller prediction with structure information based on 1P49
  +
* iTasser prediction with single template based on 1P49
  +
* SWISS-MODEL prediction with single template based on 1P49
  +
  +
In the following the models are evaluated using RMSD and TM score and visually superimposed with the real structure using pymol:
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
| ''' Model 1 '''
  +
| ''' Model 2 '''
  +
| ''' Model 3 '''
  +
| ''' Model 4 '''
  +
| ''' Model 5 '''
  +
|-
  +
| [[File:jigsaw_1auk_model1.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by 3D-Jigsaw, visualized in Pymol]]
  +
| [[File:jigsaw_1auk_model2.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by 3D-Jigsaw, visualized in Pymol]]
  +
| [[File:jigsaw_1auk_model3.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by 3D-Jigsaw, visualized in Pymol]]
  +
| [[File:jigsaw_1auk_model4.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by 3D-Jigsaw, visualized in Pymol]]
  +
| [[File:jigsaw_1auk_model5.png | 200px | center | thumb | real structure of 1AUK and structure of 1AUK modelled by 3D-Jigsaw, visualized in Pymol]]
  +
|-
  +
|}
  +
  +
To compare the models, we calculated TM-scores with [http://zhanglab.ccmb.med.umich.edu/TM-score/ TM-Score] and RMSDs with [http://www.ebi.ac.uk/Tools/dalilite/index.html DaliLite].
  +
  +
{| border="1" style="text-align:center; border-spacing:0;"
  +
!colspan="3"| prediction by 3D-jigsaw
  +
|-
  +
! model
  +
! TM-score
  +
! RMSD
  +
|-
  +
| Model 1 || 0.9120 || 2.3
  +
|-
  +
| Model 2 || 0.9106 || 2.3
  +
|-
  +
| Model 3 || 0.9121 || 2.3
  +
|-
  +
| Model 4 || 0.9121 || 2.1
  +
|-
  +
| Model 5 || 0.9121 || 1.1 – 1.3
  +
|-
  +
|}
  +
  +
== Comparison of the methods ==
  +
  +
The RMSDs of all methods range from 2 to 3.5 Angstrom. This suggests that all methods produced acceptable models, but none of them was outstanding (except for the two models from iTasser and SWISS-MODEL, which are better because of the self-hit during template recognition). <br>
  +
The TM-scores of the models vary more than the RMSDs. A TM-score of < 0.5 suggests that the model is not good, and a TM-score of >= 0.5 suggests that the model has a fold very similar to that of the original structure. The TM-scores for Modeller range from 0.4 to 0.8. Using multiple templates or manually modifying the alignments decreased the accuracy. Surprisingly, all iTasser models (excluding the self-hit) yield a very low TM-score (around 0.4), although their RMSDs are very similar to those of the Modeller predictions. The SWISS-MODEL TM-scores are also low, as for iTasser (all below 0.5). <br>
  +
In terms of RMSD, no method performs substantially better than the others. In terms of TM-score, Modeller performs best, and taking both measures together Modeller also outperforms iTasser and SWISS-MODEL. For all methods, 1P49 was the best template, so we passed the models generated with 1P49 to 3D-Jigsaw. 3D-Jigsaw generated 5 models, all with very high TM-scores (around 0.9) and low RMSDs (1-2 Angstrom). Thus 3D-Jigsaw was able to substantially improve the models generated by our previously applied methods. The best model is model 5 with an RMSD of 1.1-1.3 and a TM-score of 0.91.
   
 
== References ==
 
== References ==
 
<references />
 
<references />
  +
  +
[[Category : Metachromatic_Leukodystrophy 2011]]

Latest revision as of 13:58, 29 March 2012

Proteins used as templates

From the previous alignment TASK (see Alignment TASK), we took four proteins that might serve as suitable templates for the modeling. The proteins are listed in the table below. The information about active and binding sites was obtained from UniProt and serves as additional input for the manual modification of the alignments, with the aim of improving the accuracy of the models. Interestingly, our potential templates, identified by the database searches, include all homologs with known structure according to HSSP.

{| border="1" style="text-align:center; border-spacing:0;"
! SeqIdentifier !! Seq Identity (from TASK 2) !! source !! Protein function !! True homolog (HSSP) !! Seq Identity (pairw. ali.) !! Active site !! Substrate binding site !! Metal binding site
|-
| 1P49 || 39.0% || Homo Sapiens || Steryl-Sulfatase || yes || 31.9% || 136 || 333, 459 || 35, 36, 75, 342, 343
|-
| 1FSU || 28.0% || Homo Sapiens || Arylsulfatase B || yes || 26.5% || 147 || 145, 242, 318 || 53, 54, 91, 300, 301
|-
| 2VQR || 20.0% || Rhizobium leguminosarum || Monoester Hydrolase || no || 20.3% || not avail. || not avail. || 12, 57, 324, 325
|-
| 3ED4 || 32.0% || Escherichia coli || Arylsulfatase || yes || 27.7% || not avail. || not avail. || not avail.
|-
| ARSA (1AUK) || - || Homo Sapiens || - || - || - || 125 || 123, 150, 229, 302 || 29, 30, 69, 281, 282
|-
|}
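
The pairwise sequence identities in the table above depend on how identity is defined. As a minimal sketch (not the procedure used for the table), the following Python function computes the percentage of identical residues over all alignment columns in which both sequences have a residue. The function name and the choice of denominator are assumptions made for illustration; other tools divide by the alignment length or by the length of the shorter sequence and will give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both sequences carry a residue.

    Both arguments are rows of the same pairwise alignment, so they must
    have equal length. Case is ignored, because alignment viewers often
    print the template in lower case.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns without a gap in either sequence
    identical = 0  # compared columns with the same residue

    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment: four comparable columns, three of them identical.
    print(percent_identity("ACD-F", "ace-F"))
```

Because gap columns are skipped, a short, well-conserved core can yield a high identity even when large parts of the proteins cannot be aligned, which is one reason why identity alone is a weak indicator of template quality.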


Furthermore, we used the HHsearch webserver to see if we could extend this list towards even more remotely related sequences. But the results did not yield significant hits that were more distantly related than 2VQR, so we decided to stick with the above proteins for modelling with Modeller.

Modeller

Modeller is a program for comparative modeling of the 3D structure of a protein with unknown structure. It provides different methods for the calculation of the initial target-template alignment. Given the alignments, Modeller generates the backbone and optimizes a probability function reflecting spatial restraints. The input can be either a pairwise sequence alignment, for single template modeling, or a multiple sequence alignment, for multiple template modeling. <ref>A. Sali, T.L. Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815, 1993</ref>
In this section, we use Modeller to model the 3D structure of ARSA and compare the results to the known structure from PDB. We wrote a tutorial ( Using Modeller for TASK 4 ) covering all steps of the following analysis. It provides generic scripts and example code and runs all methods with default parameters.


Single template modelling

In order to predict the structure using a single template structure, Modeller needs pairwise sequence alignments in PIR format. Modeller provides two different methods to calculate pairwise sequence alignments. alignment.malign() uses classical dynamic programming to align two sequences. alignment.align2d() also uses a dynamic programming approach, but includes structural information to optimize the alignment (e.g. it tries to place gaps outside of secondary structure elements). We applied both alignment methods and created eight pairwise sequence alignments of the above templates with the target. Then we modelled the structure with default parameters using the automodel() class. The scripts used for this purpose can be found in our protocol: Using Modeller for TASK 4 .
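
To illustrate what "classical dynamic programming" means for a pairwise alignment, here is a minimal Needleman-Wunsch sketch in Python. It is not Modeller's implementation: the simple match/mismatch scores and the linear gap penalty are assumptions chosen to keep the example short, whereas real alignment methods typically use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). The scoring parameters are
    illustrative defaults, not values taken from any particular tool.
    """
    n, m = len(seq_a), len(seq_b)

    def sub(i, j):
        # Score for aligning residue i of seq_a with residue j of seq_b (1-based).
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align both residues
                score[i - 1][j] + gap,            # gap in seq_b
                score[i][j - 1] + gap,            # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    for row in needleman_wunsch("ACGT", "AGT"):
        print(row)
```

A structure-aware method differs mainly in the gap term: instead of a constant penalty, the cost of opening a gap depends on the position, so that gaps inside helices and strands of the template become more expensive than gaps in loops.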

Next, we calculated RMSDs and TM-scores of the models to get a first impression of how much the models deviate from the original structure. In order to calculate the TM-scores, we downloaded the TMscore FORTRAN source code from http://zhanglab.ccmb.med.umich.edu/TM-score/ and compiled it using


gfortran -static -O3 -ffast-math -lm -o TMscore TMscore.f

The TM-scores were calculated as follows:


./TMscore MODEL.pdb REAL_STRUCTURE.pdb

In order to calculate the RMSD scores, we used DaliLite.

The results are depicted below:

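{| border="1" style="text-align:center; border-spacing:0;"
! PDB Identifier !! TM-score !! RMSD
|-
!colspan="3"| Dynamic Programming with structural information
|-
| 1P49 || 0.7960 || 2.0
|-
| 2VQR || 0.4825 || 3.5
|-
| 1FSU || 0.7146 || 1.5 - 2.3
|-
| 3ED4 || 0.3881 || 1.9 - 2.8
|-
!colspan="3"| Dynamic Programming without structural information
|-
| 1P49 || 0.7731 || 2.2
|-
| 2VQR || 0.3183 || 3.5
|-
| 1FSU || 0.7223 || 2.7
|-
| 3ED4 || 0.3122 || 4.1
|-
|}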

The RMSD measures the root mean square deviation between the two structures. This is a straightforward measure to assess the similarity of our models. The TM-score also measures the deviation of the two structures from each other, but weights the residue pairs: pairs that lie close together after superposition contribute more to the score than pairs that are far apart. <ref>Y. Zhang, J. Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710 </ref>
The TM-score is therefore more sensitive in detecting the same fold. Assume two proteins for which 80 % of the residues lie in a very similar fold, while the remaining 20 % of the residues fold completely differently. One would still consider these two proteins similar, but the RMSD might become very large because of the large distances in these 20 %.
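
The 80 % / 20 % scenario can be reproduced numerically. The sketch below assumes that the two structures are already superimposed and that residues are paired one-to-one; the TMscore program additionally searches for the superposition that maximizes the score, which is not done here. The TM-score formula follows Zhang and Skolnick (2004), with d0 = 1.24 (L - 15)^(1/3) - 1.8; the lower bound of 0.5 for d0 and the handling of very short chains are assumptions of this sketch, and the coordinates are synthetic.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation of paired, already superimposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score(coords_model, coords_native, target_length=None):
    """TM-score of paired, already superimposed coordinates (no superposition search)."""
    if len(coords_model) != len(coords_native) or not coords_model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    length = target_length if target_length is not None else len(coords_native)
    # Length-dependent distance scale; clamped for short chains (assumption).
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)
    total = sum(
        1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
        for p, q in zip(coords_model, coords_native)
    )
    return total / length


if __name__ == "__main__":
    # Synthetic chain of 100 residues: 80 deviate by 0.5 A, 20 deviate by 20 A.
    native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
    model = [
        (x + (0.5 if i < 80 else 20.0), y, z)
        for i, (x, y, z) in enumerate(native)
    ]
    print("RMSD:    ", round(rmsd(model, native), 2))
    print("TM-score:", round(tm_score(model, native), 2))
```

In this synthetic case the RMSD is about 9 Angstrom, dominated by the 20 badly placed residues, while the TM-score stays close to 0.8 because each badly placed residue contributes almost nothing instead of a large penalty.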
Furthermore, we visualised the models using PyMOL. We loaded each pair of model and real structure into the program and performed a structural alignment to superimpose and compare them visually. The PyMOL commands and the images are shown below:


align 1AUK, MODEL
hide all
show cartoon
# select color of modelled structure via graphical interface
ray
cmd.png("MODEL.png")

[Figure grid: real structure of 1AUK superimposed on the structure of 1AUK modelled by Modeller, visualized in PyMOL. Columns: templates 1P49, 2VQR, 1FSU, 3ED4. Row 1: classical dynamic programming. Row 2: dynamic programming with structural information from the template.]

3ED4

Despite its evolutionary relationship to ARSA, 3ED4 is a very poor template structure for modeling. We therefore examined the structure of 3ED4 in more detail to find the reason for this behaviour.
First, we looked at the PDB entry of the protein and found that 3ED4 consists of four chains. Next, we plotted 3ED4 with each of the four chains in a different color (the figure is shown below).

real structure of 3ED4 visualized in Pymol

This visual inspection led us to suspect that each individual chain structurally resembles our target protein ARSA. We therefore used each chain separately as a template. We again computed TM-scores and RMSDs and used PyMOL to visualise each model together with the real structure of ARSA.

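{| border="1" style="text-align:center; border-spacing:0;"
! chain !! TM-score !! RMSD
|-
| 3ED4A || 0.7268 || 2.9
|-
| 3ED4B || 0.7251 || 2.8
|-
| 3ED4C || 0.6518 || 2.8
|-
| 3ED4D || 0.7303 || 2.8
|-
|}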


[Figure row: real structure of 1AUK superimposed on the structure of 1AUK modelled by Modeller, visualized in PyMOL. Columns: templates 3ED4A, 3ED4B, 3ED4C, 3ED4D. Alignment method: dynamic programming with structural information from the template.]

Modification of Alignments

Using 1P49 as template structure for the modeling process yielded the best results, so we decided to modify this alignment manually to see whether we could improve the model. We created two different modified alignments. For the first, we only made sure that the active site, substrate binding sites and metal binding sites were aligned. For the second, we additionally removed gaps in secondary structure elements. We used JalView to modify the initial alignments. Images of the initial and the two modified alignments are shown alongside the text. Altogether, we performed the following changes:

Initial alignment of 1AUK and 1P49.
  • The gap between residues 74 and 75 in 1P49 was removed to align metal-binding site 75 with metal-binding site 69. This also aligned the two active sites, which were not aligned in the initial alignment. The region around the active site is well conserved between both enzymes. However, this conservation seems to be shifted, so the amino acids at the active sites differ, and aligning the two sites worsens the alignment of the conserved residues within this region. An image of the structural model together with the real structure is depicted below.
real structure of 1AUK and structure of 1AUK modelled by modeller with the modified template alignment (above change) to 1P49, visualized in Pymol
Modified alignment 1 (active and binding sites are aligned) of 1AUK and 1P49.

After this change we calculated one model. The TM-score of this model drops to 0.6940. This might be because the amino acids around the active sites are conserved, but aligning the active sites themselves decreases the alignment quality, since the conservation is shifted by one amino acid (i.e. the active site residue in 1AUK/ARSA is identical to the amino acid before the active site in 1P49).
Normally, one does not have information about the secondary structure of the target sequence, but in our case this information was available, so we modified the alignments such that gaps within secondary structure elements were avoided. There were no gaps in secondary structure elements of 1P49. This is because we modified the output of align2d(), which already uses secondary structure information to place gaps outside of these regions.

  • The gap between residue 154 and 155 was moved out of the beta strand between residues 152 and 153.
  • The gap between residues 190-191 was moved out of beta strand between residues 191-192.
  • All gaps within the helix from residue 197-214 were moved out of the helix (at the right border).
  • The gap between 290-291 was moved to the right end of the helix.
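
Finding such gaps by eye in JalView is error-prone, so a small check can help. The following Python sketch reports gap runs in an aligned sequence whose flanking residues belong to the same secondary structure state, i.e. gaps that split a helix or strand. The function name, the one-character state codes ('h' for helix, 's' for strand, as in the SWISS-MODEL alignment output) and the return format are assumptions for illustration; this is not the procedure we used, which was manual.

```python
def gaps_splitting_sse(aligned_seq, residue_ss, gap="-", sse_states="hs"):
    """Find gap runs that interrupt a secondary structure element.

    aligned_seq -- one row of an alignment, including gap characters
    residue_ss  -- secondary structure state per residue of that sequence
                   (same length as the sequence without gaps)

    Returns a list of tuples (first_column, last_column, residues_before, state)
    with 0-based alignment columns.
    """
    n_residues = sum(1 for c in aligned_seq if c != gap)
    if len(residue_ss) != n_residues:
        raise ValueError("residue_ss must have one state per residue")

    flagged = []
    residues_seen = 0
    col = 0
    while col < len(aligned_seq):
        if aligned_seq[col] != gap:
            residues_seen += 1
            col += 1
            continue
        # Walk to the end of this gap run.
        start = col
        while col < len(aligned_seq) and aligned_seq[col] == gap:
            col += 1
        # A gap at either end of the sequence cannot split an element.
        if 0 < residues_seen < n_residues:
            left = residue_ss[residues_seen - 1]
            right = residue_ss[residues_seen]
            if left == right and left in sse_states:
                flagged.append((start, col - 1, residues_seen, left))
    return flagged


if __name__ == "__main__":
    # The gap after the third residue lies inside a five-residue helix.
    print(gaps_splitting_sse("ACD--EFGH", "hhhhh  "))
```

One limitation of this simple test: two different elements of the same type that directly follow each other without a coil residue in between would be reported as one split element.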
Modified alignment 2 (active and binding sites are aligned and gaps are removed from secondary structure) of 1AUK and 1P49.


Surprisingly, the TM-score decreased even further, to 0.5561. An image of the structural model together with the real structure is depicted below.


real structure of 1AUK and structure of 1AUK modelled by modeller with the modified template alignment to 1P49, visualized in Pymol

Multiple Template Modeling

For the multiple template model, we first needed to create a multiple sequence alignment of the templates and the target. We used the salign() function, which (like align2d()) uses structural information from the templates to place gaps in coil regions. Then we calculated the model using this MSA. The detailed workflow is again described in ( Using Modeller for TASK 4 ). We calculated three models:

  • Model 1 was calculated from a multiple sequence alignment (MSA) of ARS A and the four proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 2VQR, 3ED4D.
  • Model 2 was calculated from an MSA of ARS A and the three proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1FSU, 1P49, 3ED4D.
  • Model 3 was calculated from an MSA of ARS A and the two proteins/chains that yielded the best TM-score/RMSD in the single template modeling: 1P49, 3ED4D.

Below are depicted the TM- and RMSD scores, as well as images of the structural model and the real structure of ARS A.


Model 1 Model 2 Model 3
real structure of 1AUK and structure of 1AUK modelled (Model 1), visualized in Pymol
real structure of 1AUK and structure of 1AUK modelled (Model 2), visualized in Pymol
real structure of 1AUK and structure of 1AUK modelled (Model 3), visualized in Pymol
{| border="1" style="text-align:center; border-spacing:0;"
! Model !! TM-score !! RMSD
|-
| model1 || 0.5409 || 2.4
|-
| model2 || 0.6701 || 2.4 - 3.1
|-
| model3 || 0.6819 || 3.2
|-
|}


Modification of the MSA

Analogously to the previous section, we modified the alignment of Model 2 such that all active and binding sites were aligned. Images of the alignments and of the structural model superimposed on the real structure are shown below.

Initial MSA of Model 2.
Modified MSA (active and binding sites are aligned) of Model 2.
Modified Model 2.

The TM-score of the new model drops to 0.5685.

Discussion

Modeller yields good results. The initial alignments are very important for the quality of the resulting model. Using the 2D alignment method for single template modeling improved the prediction accuracy for most templates according to both measures (RMSD, TM-score).
Surprisingly, the multiple template models all perform worse than the three best single template models. The manual modification of the alignments also decreased the modelling accuracy. This could be because the automatic alignments are optimized over the whole alignment, while the manually modified alignments are optimized in one region only, which seems to reduce the quality in other areas of the alignment.

iTasser

iTasser is a server that models 3D structure by homology. It also provides function predictions. Under the name Zhang-Server, iTasser participated in CASP7, CASP8 and CASP9; it was ranked best in CASP7 and CASP8 and second in CASP9. iTasser uses a threading approach to build the models. Unaligned regions (mainly loops) are modelled ab initio. <ref>Ambrish Roy, Alper Kucukural, Yang Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, vol 5, 725-738 (2010)</ref><ref>Yang Zhang. Template-based modeling and free modeling by I-TASSER in CASP7. Proteins, vol 69 (Suppl 8), 108-117 (2007)</ref>

The confidence of a model is measured with the C-score which is based on the significance of the template alignments and the convergence parameters of the structure assembly simulations. The typical range for the C-score is [-5,2], where a higher C-score means higher confidence in the model. <ref>http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S72828/cscore.txt</ref>

Modelling without template

model 1 for ARSA by iTasser, C-score: 0.958
model 2 for ARSA by iTasser, C-score: 0.359
model 3 for ARSA by iTasser, C-score: -1.322
model 4 for ARSA by iTasser, C-score: -2.267
model 5 for ARSA by iTasser, C-score: -0.428

As one can see, model 1 is the model with the highest confidence. Model 1 has an estimated TM-score of 0.84 ± 0.08 and an estimated RMSD of 5.3 ± 3.4Å.

To compare the models, TM-score and RMSD to 1AUK were calculated:

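{| border="1" style="text-align:center; border-spacing:0;"
!colspan="3"| prediction by iTasser without template
|-
! model !! TM-score !! RMSD
|-
| model1 || 0.9971 || 0.5
|-
| model2 || 0.9972 || 0.5
|-
| model3 || 0.9891 || 0.8
|-
| model4 || 0.9220 || 1.8
|-
| model5 || 0.9972 || 0.5
|-
|}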

As one can see, three of the models are very good: model1, model2 and model5. They all have a very high TM-score and a very low RMSD. These three models are also the ones with the highest C-scores, so relying on the confidences assigned by iTasser seems to be a reasonable strategy here.
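
The agreement between C-score and measured quality can be checked directly from the numbers reported above. The short Python sketch below uses the C-scores and TM-scores of the five template-free iTasser models; the helper name `top_k` is only illustrative.

```python
# C-scores reported by iTasser for the five models built without template.
c_scores = {
    "model1": 0.958,
    "model2": 0.359,
    "model3": -1.322,
    "model4": -2.267,
    "model5": -0.428,
}

# TM-scores of the same models against the real structure 1AUK.
tm_scores = {
    "model1": 0.9971,
    "model2": 0.9972,
    "model3": 0.9891,
    "model4": 0.9220,
    "model5": 0.9972,
}


def top_k(scores, k):
    """Names of the k models with the highest score (higher is better)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


if __name__ == "__main__":
    print("Top 3 by C-score: ", sorted(top_k(c_scores, 3)))
    print("Top 3 by TM-score:", sorted(top_k(tm_scores, 3)))
    print("Same set:", top_k(c_scores, 3) == top_k(tm_scores, 3))
```

Both rankings select model1, model2 and model5 as the top three and place model4 last, although the order within the top three differs slightly because their TM-scores are almost identical.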

Modelling with single template

To specify a single template, one can specify a pdb-id or upload a pdb-file. Due to the long runtime of iTasser we only built a model based on 1P49. We chose 1P49 as it was the best template in the analyses before.

model 1 for ARSA by iTasser, C-score: 1.563
model 2 for ARSA by iTasser, C-score: -1.121
model 3 for ARSA by iTasser, C-score: -1.507
model 4 for ARSA by iTasser, C-score: -1.478
model 5 for ARSA by iTasser, C-score: -0.795

As one can see, model 1 is the model with the highest confidence. Model 1 has an estimated TM-score of 0.93 ± 0.06 and an estimated RMSD of 4.0 ± 2.7Å.


To compare the models, TM-score and RMSD to 1AUK were calculated:

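{| border="1" style="text-align:center; border-spacing:0;"
!colspan="3"| prediction by iTasser with 1P49 as template
|-
! model !! TM-score !! RMSD
|-
| model1 || 0.2059 || 2.1
|-
| model2 || 0.2057 || 2.1
|-
| model3 || 0.2063 || 2.0
|-
| model4 || 0.2065 || 1.9
|-
| model5 || 0.2052 || 2.2
|-
|}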

Here the C-score does not identify the best model, which is model 4. But even model 4 has a TM-score far below what we would want, although its RMSD is acceptable.

Discussion

In this case, modelling without a template seems to be the way to go. However, this could be because the option to exclude hits above a certain sequence similarity does not seem to work, so the very good result could be due to a self-hit (or a very closely related sequence).

SWISS-MODEL

SWISS-MODEL is an online tool for modelling 3D structures <ref>Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201.</ref><ref>Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T (2009). The SWISS-MODEL Repository and associated resources. Nucleic Acids Research. 37, D387-D392. </ref><ref>Peitsch, M. C. (1995) Protein modeling by E-mail Bio/Technology 13: 658-660.</ref>. There are three different modes available: automated mode, alignment mode and project mode. We only used the automated and the alignment mode.

Modelling without template

In automated mode without template, suitable templates are selected by a BLAST run using an e-value threshold. The mode is used by pasting a protein sequence or a UniProt AC code into a text field.

Template

SWISS-MODEL identified 1N2LA as the best template. The title of the PDB entry 1N2LA is "Crystal structure of a covalent intermediate of endogenous human arylsulfatase A". The result should therefore be very good but not really meaningful, because the template is itself a structure of human arylsulfatase A. As expected, the alignment quality was very high:

TARGET    19      RPPNIVLI FADDLGYGDL GCYGHPSSTT PNLDQLAAGG LRFTDFYVPV
1n2lA     19      rppnivli faddlgygdl gcyghpsstt pnldqlaagg lrftdfyvpv
                                                                      
TARGET               sssss ss                    hhhhhhhh   ssss sss  
1n2lA                sssss ss                    hhhhhhhh   ssssssss  


TARGET    67    SLCTPSRAAL LTGRLPVRMG MYPGVLVPSS RGGLPLEEVT VAEVLAARGY
1n2lA     67    sl-tpsraal ltgrlpvrmg mypgvlvpss rgglpleevt vaevlaargy
                                                                      
TARGET               hhhhh hh    hhh          ss s          hhhhhhhh  
1n2lA               hhhhhh hh    hh           ss s          hhhhhhhh  


TARGET    117   LTGMAGKWHL GVGPEGAFLP PHQGFHRFLG IPYSHDQGPC QNLTCFPPAT
1n2lA     117   ltgmagkwhl gvgpegaflp phqgfhrflg ipyshdqgpc qnltcfppat
                                                                      
TARGET          ssssssss    sss sss   hhh   ssss s            sss    s
1n2lA           ssssssss    sss sss   hhh   ssss s            sss    s


TARGET    167   PCDGGCDQGL VPIPLLANLS VEAQPPWLPG LEARYMAFAH DLMADAQRQD
1n2lA     167   pcdggcdqgl vpipllanls veaqppwlpg learymafah dlmadaqrqd
                                                                      
TARGET          ss             ssss s ss          hhhhhhhhh hhhhhhhh  
1n2lA           ss             ssss s ss          hhhhhhhhh hhhhhhhh  


TARGET    217   RPFFLYYASH HTHYPQFSGQ SFAERSGRGP FGDSLMELDA AVGTLMTAIG
1n2lA     217   rpfflyyash hthypqfsgq sfaersgrgp fgdslmelda avgtlmtaig
                                                                      
TARGET           ssssssss                      h hhhhhhhhhh hhhhhhhhhh
1n2lA            ssssssss                      h hhhhhhhhhh hhhhhhhhhh


TARGET    267   DLGLLEETLV IFTADNGPET MRMSRGGCSG LLRCGKGTTY EGGVREPALA
1n2lA     267   dlglleetlv iftadngpet mrmsrggcsg llrcgkgtty eggvrepala
                                                                      
TARGET          hh    ssss sss                                 sss sss
1n2lA           h     ssss sss                              hhhsss sss


TARGET    317   FWPGHIAPGV THELASSLDL LPTLAALAGA PLPNVTLDGF DLSPLLLGTG
1n2lA     317   fwpghiapgv thelassldl lptlaalaga plpnvtldgf dlsplllgtg
                                                                      
TARGET          s       ss s      hhh hhhhhhhh                hhhhh   
1n2lA           s       ss s   ssshhh hhhhhhhh                hhhhh   


TARGET    367   KSPRQSLFFY PSYPDEVRGV FAVRTGKYKA HFFTQGSAHS DTTADPACHA
1n2lA     367   ksprqslffy psypdevrgv favrtgkyka hfftqgsahs dttadpacha
                                                                      
TARGET              sssss             ssssssssss ssss                 
1n2lA               sssss             ssssssssss ssss                 


TARGET    417   SSSLTAHEPP LLYDLSKDPG ENYNLLGGVA GATPEVLQAL KQLQLLKAQL
1n2lA     417   sssltahepp llydlskdpg enynllg--- gatpevlqal kqlqllkaql
                                                                      
TARGET              sss    sssss                    hhhhhhh hhhhhhhhhh
1n2lA               sss    sssss                    hhhhhhh hhhhhhhhhh


TARGET    467   DAAVTFGPSQ VARGEDPALQ ICCHPGCTPR PACCHCPD             
1n2lA     467   daavtfgpsq vargedpalq icchpgctpr pacchcpd-            
                                                                      
TARGET          hhh        hh sss                                     
1n2lA           hhh        hh sss   

Results

Estimated model quality in comparison to nonredundant PDB
Estimated density of model quality
Z-Score by category
estimation of local model quality
model colored by residue error

As one can see in the images above, the model quality is quite good, with uncertainties especially in the loop-regions. The result is not really surprising, as 1N2L is the structure of a human Arylsulfatase A.

Modelling with single template without alignment

It is possible to specify a template in automated mode by specifying a PDB-ID or by uploading a pdb-file.

1P49

1P49 has 39% sequence identity with human arylsulfatase A, which is the highest identity among all our templates. Even so, the identity is low, and the alignment quality was rather poor:

TARGET    1         RPPNIV LIFADDLGYG DLGCYGHPSS TTPNLDQLAA GGLRFTDFYV
userX     23    aa--srpnii lvmaddlgig dpgcygnkti rtpnidrlas ggvkltqhla
                                                                      
TARGET                 sss ssss                    hhhh       ssssssss
userX                  sss ssss                    hhhh       sss ssss


TARGET    47    PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SS-----RGG LPLEEVTVAE
userX     71    a-spltpsra afmtgrypvr sgmaswsrtg vflftassgg lptdeitfak
                                                                      
TARGET                 hhh hhh     hh h                            hhh
userX                 hhhh hhhh    hh h                            hhh


TARGET    92    VLAARGYLTG MAGKWHLGVG PEG----AFL PPHQGFHRFL GIPYSHDQGP
userX     121   llkdqgysta ligkwhlgms chsktdfchh plhhgfnyfy gisltnlrdc
                                                                      
TARGET          hhhh   sss sssss                        sss ss        
userX           hhhh   sss sssss                        sss ss        


TARGET    138   CQNLT-CFPP ATPCDG---- ---------- ---------- ---------G
userX     171   kpgegsvftt gfkrlvflpl qivgvtlltl aalnclgllh vplgvffsll
                                                                      
TARGET                  hh hhhhh                                      
userX                  hhh hhhh   hhh hhhhhhhhhh hhhhhh        hhhhhhh


TARGET    154   CD--QGLVPI PLLANLSVEA QP-------- ----PWLPGL EARYMAFAHD
userX     221   flaaliltlf lgflhyfrpl ncfmmrnyei iqqpmsydnl tqrltveaaq
                                                                      
TARGET                hhhh hhhhhhh                           hhhhhhhhh
userX           hhhhhhhhhh hhhhhhhhhh    ssss ss sss      h hhhhhhhhhh


TARGET    190   LMADAQRQDR PFFLYYASHH THYPQFSGQS FAERSGRGPF GDSLMELDAA
userX     271   fiq--rntet pfllvlsylh vhtalfsskd fagksqhgvy gdaveemdws
                                                                      
TARGET          hh         ssssssss                      hh hhhhhhhhhh
userX           hhh        ssssssss                      hh hhhhhhhhhh


TARGET    240   VGTLMTAIGD LGLLEETLVI FTADNGPETM RM-----SRG GCSGLLRCGK
userX     319   vgqilnllde lrlandtliy ftsdqgahve evsskgeihg gsngiykggk
                                                                      
TARGET          hhhhhhhhhh h    sssss ssss                            
userX           hhhhhhhhhh h    sssss ssss       sss   sss            


TARGET    285   GTTYEGGVRE PALAFWPGHI -APGVTHELA SSLDLLPTLA ALAGAPLPN-
userX     369   annweggirv pgilrwprvi qagqkidept snmdifptva klagaplped
                                                                      
TARGET               hhsss  ssss         sss   s sshhhhhhhh hhh       
userX                  sss  ssss               s sshhhhhhhh hhh       


TARGET    333   VTLDGFDLSP LLLGTGKSPR QSLFFYPS-- YPDEVRGVFA VRTGKYKAHF
userX     419   riidgrdlmp llegksqrsd heflfhycna ylnavrwhpq nstsiwkaff
                                                                      
TARGET                  hh hhh          hhhhhh   h              ssssss
userX                   hh hhh         sssssss   ssssssss       ssssss


TARGET    381   FTQGSAHSDT TADPACHASS SLTAHEPPLL YDLSKDPGEN YNLLGGVAGA
userX     469   ftpnfnpvcf athvcfcfgs yvthhdppll fdiskdprer nplt----pa
                                                                      
TARGET          ss                      sss   ss sss          sss sss 
userX           ss                      sss   ss sss                  


TARGET    431   TPEVLQAL-K QLQLLKAQLD AAVTFGPSQV A---RGEDPA LQICCHPGCT
userX     519   seprfyeilk vmqeaadrht qtlpevpdqf swnnflwkpw lqlccp---s
                                                                      
TARGET          hh   hh     hhhhhhhhh                                 
userX               hhh    hhhhhhhhhh   

                              
TARGET    477   PRPACC ---                                            
userX     566   tglscqcdre                                            
                                                                      
TARGET                                                                
userX                 sss
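
The sequence identity quoted for a template can be recomputed from an alignment such as the one above. The following is a minimal Python sketch, not the procedure used to obtain the figures on this page: the function name `percent_identity` is illustrative, and it assumes identity is counted over the columns in which both sequences have a residue. Other tools divide by the full alignment length or by the length of the shorter sequence, so the resulting numbers can differ.

```python
def percent_identity(target: str, template: str) -> float:
    """Percent identity over columns where both sequences have a residue.

    Both strings must be rows of the same alignment, with gaps written
    as '-'. Spaces between the 10-residue blocks are ignored and the
    comparison is case-insensitive.
    """
    a = target.replace(" ", "").upper()
    b = template.replace(" ", "").upper()
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap in either sequence.
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


# One 50-column row of the 1P49 alignment shown above.
target_row = "PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SS-----RGG LPLEEVTVAE"
template_row = "a-spltpsra afmtgrypvr sgmaswsrtg vflftassgg lptdeitfak"
print(f"{percent_identity(target_row, template_row):.1f}% identity in this row")
```

Rows whose unaligned ends are padded with spaces instead of '-' (such as the first row of the alignment) would need those spaces converted to gaps before they can be passed to this function.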


Results
Estimated model quality in comparison to nonredundant PDB
Estimated density of model quality
Z-Score by category
estimation of local model quality
model colored by residue error

As one can see in the images above, the model quality is not really good, because the template seems to be too distantly related.

2VQR

2VQR has 20% sequence identity with human arylsulfatase, which is the lowest identity among all our templates. With this even lower sequence identity, the alignment quality was really poor:

TARGET    2     PPNIVLIFAD DLGYGDLG-- --CYGHPS-S TTPNLDQLAA GGLRFTDFYV
userX     3     kknvllivvd qwradfvphv lradgkidfl ktpnldrlcr egvtfrnhvt
                                                                      
TARGET            sssssss                          hhhhhhhh h ssssssss
userX             sssssss         hhh hhhh         hhhhhhhh h ssssssss


TARGET    47    PVSLGTPSRA ALLTGRLPVR MGMYPGVLVP SSRGGLPLEE VTVAEVLAAR
userX     53    tcvpxgpara slltglylmn hravqntv-- ----pldqrh lnlgkalrgv
                                                                      
TARGET              hhhhhh hhh    hhh h                       hhhhhhhh
userX                 hhhh hhh    hhh h                       hhhhhh  


TARGET    97    GYLTGMAGKW HLGVGPEGAF LPPHQGFHRF LGIPYSHDQG PCQ-----NL
userX     97    gydpaligyt ttvpdprtt- spndprfrvl gdlmdgfhpv gafepnmegy
                                                                      
TARGET             ssss                    hh           sss          h
userX              ssss                    hh           sss        hhh


TARGET    142   TCFPPATPCD GGCD-----Q GLVPIPLLAN LSVEAQPPWL PGLEARYMAF
userX     146   fgwvaqngfd lpehrpdiwl pegedavaga tdrpsripke fsdstffter
                                                                      
TARGET          hhhhhh                                        hhhhhhhh
userX           hhhhhh                                        hhhhhhhh


TARGET    187   AHDLMADAQR QDRPFFLYYA SHHTHYPQFS GQSFAERSGR ----------
userX     196   altylkg--r dgkpfflhlg yyrphppfva sapyhamyrp edmpapiraa
                                                                      
TARGET          hhhhhh         ssssss s                               
userX           hhhhhhh  h     ssssss s                               


TARGET    227   ---------- ---------- ---------- ---------- ----GPFGDS
userX     244   npdieaaqhp lmkfyvdsir rgsffqgaeg sgatldeael rqmratycgl
                                                                      
TARGET                                                            hhhh
userX            hhhhhh    hhhhhhhhss s        s ss    hhhh hhhhhhhhhh


TARGET    233   LMELDAAVGT LMTAIGDLGL LEETLVIFTA DNGPETMRMS RGGCSGLLRC
userX     294   itevddclgr vfsyldetgq wddtliifts dhgeqlgdhh ll--------
                                                                      
TARGET          hhhhhhhhhh hhhhhh   h     ssssss s                    
userX           hhhhhhhhhh hhhhhh   h     ssssss s                    


TARGET    283   GKGTTYEGGV REPALAFWPG HI--APGVTH ELASSLDLLP TLAALAGAPL
userX     336   gkigyndpsf riplvikdag enaragaies gftesidvmp tildwlggki
                                                                      
TARGET                 hhs ss ssss          sss    ssshhhhh hhhhhh    
userX                  hhs ss ssss          sss    ssshhhhh hhhhhh    


TARGET    331   PNVTLDGFDL SPLLLGTGKS PRQ-SLFFYP SYP------- -------DEV
userX     386   ph-acdglsl lpflsegrpq dwrtelhyey dfrdvyysep qsflglgmnd
                                                                      
TARGET                     hhh             sssss                      
userX                      hhh             sssss ss         hhhh      


TARGET    366   RGVFAVRTGK YKAHFFTQGS AHSDTTADPA CHASSSLTAH EPPLLYDLSK
userX     435   cslcviqder ykyvhfaa-- ---------- ---------- lpplffdlrh
                                                                      
TARGET           sssssss s sssssss  s ss                 ss s  sssss  
userX           ssssssss s sssssss                             sssss  


TARGET    416   DPGENYNLLG GVAGATPEVL QAL-KQLQLL KAQLDAA -- ----------
userX     463   dpneftnlad d--payaalv rdyaqkalsw rlkhadrtlt hyrsgpegls
                                                                      
TARGET                           hhhh hh    hhhh hhh                  
userX                             hhh hhhhhhhhhh hhh        sssss  sss


TARGET          ----                                                  
userX     511   ersh                                                  
                                                                      
TARGET                                                                
userX           ss


Results

Due to the low alignment quality, the model quality was so low that the only result was a plot of the local quality.

estimation of local model quality


Modelling with single template with alignment

1P49

SWISS-MODEL was not able to build a model based on our alignment.

2VQR

TARGET    4        PRSLLLA LAAGLAVARP PNIVLIFADD LGYGDLGCYG HPSSTTPNLD
2vqrA     3        kknvlli vvdqwradfv phvlr--adg -kidfl---- ----ktpnld
                                                                      
TARGET               sssss ss sss     hhhh            sss   sss   hhhh
2vqrA                sssss ss         hhhhh  hh                   hhhh


TARGET    51    QLAAGGLRFT DFYVPVSLCT PSRAALLTGR LPVRMGMYPG VLVPSSRGGL
2vqrA     39    rlcregvtfr nhvttcvpcg paraslltgl ylmnhravqn t-vpldqrhl
                                                                      
TARGET          hhhhh ssss ssss    hh hhhhhhh     hhhh                
2vqrA           hhhhh ssss ssss    hh hhhhhhh     hhhh                


TARGET    101   PLEEV--TVA EVLAARGYLT GM-------- -------AGK WHLGVGPEGA
2vqrA     88    nlgkalrgvg ydpaligytt tvpdprttsp ndprfrvlgd lmdgfhpvga
                                                                      
TARGET                       ssss                                sss  
2vqrA            hhhhhh      ssss                   hh           sss  


TARGET    134   FLPPHQGFHR FL------GI PYSHDQGPC- ---------- ---QNLTCFP
2vqrA     138   fepnmegyfg wvaqngfdlp ehrpdiwlpe gedavagatd rpsripkefs
                                                                      
TARGET               hhhhh hh             hhh                         
2vqrA                hhhhh hhhh                                       


TARGET    164   PATPCDGGCD ---QGLVPIP LLANLSV-EA QPPWLPGLEA R-------YM
2vqrA     188   dstffteral tylkgrdgkp fflhlgyyrp hppfvasapy hamyrpedmp
                                                                      
TARGET          hhhhhhhhh             ssssss                          
2vqrA           hhhhhhhhhh hhhhhh     sssssss                         


TARGET    203   AFAHDLMADA QRQDRPFFLY YASHHTHY-- ---PQFSGQS FAE---RSGR
2vqrA     238   apiraanpdi eaaqhplmkf yvdsirrgsf fqgaegsgat ldeaelrqmr
                                                                      
TARGET                 hhh hhh   hhhh hhhh                          hh
2vqrA                  hhh hhh   hhhh hhhhsss         sss     hhhhhhhh


TARGET    245   GPFGDSLMEL DAAVGTLMTA IGDLGLLEET LVIFTADNGP ETMRMSRGGC
2vqrA     288   atycglitev ddclgrvfsy ldetgqwddt liiftsdhg- eql-----gd
                                                                      
TARGET          hhhhhhhhhh hhhhhhhhhh hh   h     ssssssssss           
2vqrA           hhhhhhhhhh hhhhhhhhhh hh   h     sssssss              


TARGET    295   SGLLRCGKGT TYEGGVREPA LAFWPGHI-A PGVTH-ELAS SLDLLPTLAA
2vqrA     332   hhll--gkig yndpsfripl vikdagenar agaiesgfte sidvmptild
                                                                      
TARGET                        hhsss s sss          sss   ss shhhhhhhhh
2vqrA                         hhsss s sss          sss   ss shhhhhhhhh


TARGET    343   LAGAPLPNVT LDGFDLSPLL LGTGKSPRQS LFFYPSYPDE VRGVFAVRTG
2vqrA     380   wlggkiph-a cdglsllpfl s-egr-p-qd wrtelhyeyd frdvy---ys
                                                                      
TARGET          hh               hhhh                ssssss s         
2vqrA           hh               hhh                 ssssss s         


TARGET    393   KYKAHFFTQG S-AHSDTTAD PACHASSSLT AHEPPLLYDL SKDPGENYNL
2vqrA     423   epqs-flglg mndcslcviq derykyvhfa al-pplffdl rhdpneftnl
                                                                      
TARGET                        sssssss s ssssssss      sssss           
2vqrA             hh hh       sssssss s ssssssss      sssss           


TARGET    442   LGGVAGATPE VLQALKQLQL LKAQLDAAVT FGPSQVARGE DPALQICCHP
2vqrA     471   addpayaalv rdyaqkalsw rlkhadrtlt ----hyrsg- -pe-------
                                                                      
TARGET                hhhh hhhhhhhhhh hhh             sssss sss   ssss
2vqrA                  hhh hhhhhhhhhh hhh            sssss            


TARGET    492   GCTPRPACCH CPDPH                                      
2vqrA     508   glser----- ---sh-                                     
                                                                      
TARGET          ssss                                                  
2vqrA           sssss

Results
Estimated model quality in comparison to nonredundant PDB
Estimated density of model quality
Z-Score by category
estimation of local model quality
model colored by residue error

As one can see in the images above, the model was even worse than the one built with 2VQR without alignment.

Discussion

To compare the models, TM-score and RMSD were calculated. The TM-score was calculated with the TM-score program and the RMSD with DaliLite.

Prediction by SWISS-MODEL

Template                TM-score            RMSD (Angstrom)
none                    0.9977              0.4
1P49                    0.2037              2.2
2VQR                    no common residue   no calculation possible
1P49 (with alignment)   no model            no model
2VQR (with alignment)   0.4732              3.4
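
The two measures reported in the table above can be made concrete with a minimal Python sketch. It assumes that the model has already been superimposed on the reference structure and that `distances` holds the C-alpha distances (in Angstrom) of the aligned residue pairs. The TM-score program additionally searches for the superposition that maximises the score, and DaliLite derives its own structural alignment, so neither tool is reproduced here. The function names are illustrative.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over the aligned residue pairs."""
    if not distances:
        raise ValueError("need at least one aligned residue pair")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score of a fixed superposition, normalised by the target length.

    Uses the length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    Residues of the target without an aligned partner contribute zero,
    because the sum is divided by the full target length.
    """
    if target_length <= 15:
        raise ValueError("d0 is not defined for target_length <= 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for a meaningful d0")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Half of a 100-residue target aligned perfectly, the other half not at all.
print(rmsd([0.0] * 50))           # prints 0.0
print(tm_score([0.0] * 50, 100))  # prints 0.5
```

Because the TM-score is divided by the full target length while the RMSD is averaged only over the aligned pairs, a model with few aligned residues can combine a moderate RMSD with a low TM-score. This is one possible reading of the 1P49 row, not a verified explanation.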

As expected, the model built with a human Arylsulfatase A as template is by far the best model. But if we imagine that no resolved structure were available, none of the remaining models is really good. With 1P49, the RMSD is close to the acceptable range, but the TM-score is so low that it does not seem to make sense to use this predicted structure for anything. The structure predicted with 2VQR and a given alignment has the highest TM-score among our non-self-hits, but an RMSD that is above the acceptable range. So it seems that the available templates are either too close or too distant to model the structure reasonably.

3D-Jigsaw

Without knowing the real TM-score and RMSD, we decided to build a combined structure out of the following predictions:

  • Modeller prediction without structure information based on 1P49
  • Modeller prediction with structure information based on 1P49
  • iTasser prediction with single template based on 1P49
  • SWISS-MODEL prediction with single template based on 1P49

In the following, the models are evaluated using RMSD and TM-score and visually superimposed with the real structure using PyMOL:

Figures, Model 1 to Model 5 (one image per model): real structure of 1AUK and structure of 1AUK modelled by 3D-Jigsaw, visualized in PyMOL.

To compare the models, TM-score and RMSD were calculated. The TM-score was calculated with the TM-score program and the RMSD with DaliLite.

Prediction by 3D-Jigsaw

Model     TM-score   RMSD (Angstrom)
Model 1   0.9120     2.3
Model 2   0.9106     2.3
Model 3   0.9121     2.3
Model 4   0.9121     2.1
Model 5   0.9121     1.1-1.3

Comparison of the methods

The RMSDs of all methods range from 2 to 3.5 Angstrom. This suggests that all methods produced acceptable models, but none of them was outstanding (except for the two models from iTasser and SWISS-MODEL, which are better due to the self-hit during template recognition).
The TM-scores vary more between models than the RMSDs. A TM-score below 0.5 suggests that the model is not good, and a TM-score of 0.5 or above suggests that the model has a fold very similar to the original structure. The TM-scores for Modeller range from 0.4 to 0.8. Using multiple templates or manual modifications of the alignments decreased the accuracy. Surprisingly, all iTasser models (excluding the self-hit) yield a very low TM-score (around 0.4), although their RMSDs are very similar to those of the Modeller predictions. The SWISS-MODEL TM-scores are low as well (excluding the self-hit, all below 0.5).
In terms of RMSD, no method performs substantially better than the others. In terms of TM-score, Modeller performs best, and taking both measures together Modeller also outperforms iTasser and SWISS-MODEL. For all methods, 1P49 was the best template, so we passed the models generated with 1P49 to 3D-Jigsaw. 3D-Jigsaw generated 5 models, all with very high TM-scores (around 0.9) and low RMSDs (roughly 1 to 2.3 Angstrom). Thus 3D-Jigsaw was able to substantially improve the models generated by our previously applied methods. The best model is model 5 with an RMSD of 1.1-1.3 Angstrom and a TM-score of 0.91.
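
The TM-score cut-off of 0.5 used in this comparison can be applied mechanically to the scores reported on this page. The following Python sketch is only an illustration: the dictionary contains a selection of TM-scores copied from the SWISS-MODEL and 3D-Jigsaw tables above, and the names `similar_fold` and `rank_models` are hypothetical helpers, not part of any of the tools used.

```python
# TM-scores copied from the SWISS-MODEL and 3D-Jigsaw tables on this page.
TM_SCORES = {
    "SWISS-MODEL, automated (self-hit)": 0.9977,
    "SWISS-MODEL, 1P49": 0.2037,
    "SWISS-MODEL, 2VQR with alignment": 0.4732,
    "3D-Jigsaw, Model 5": 0.9121,
}


def similar_fold(tm_score, threshold=0.5):
    """True if the TM-score reaches the cut-off for a very similar fold."""
    return tm_score >= threshold


def rank_models(scores):
    """Return (name, TM-score) pairs sorted from best to worst."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for name, score in rank_models(TM_SCORES):
    label = "similar fold" if similar_fold(score) else "below cut-off"
    print(f"{name}: {score:.4f} ({label})")
```

A single cut-off hides how close a model is to the boundary (0.4732 versus 0.2037, for example), so the ranking is printed alongside the label.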

References

<references />