ASPA Homology Modelling

From Bioinformatikpedia

Homology Modelling

Selected Structures

We used HHSearch and the NCBI related-structure search to look for usable template structures. We required alignments of at least 150 residues in order to weed out the myriad of hits with good sequence similarity but extremely short alignments.

In the end, we got the following list of structures:

PDB ID Sequence Identity Alignment Length Description
> 60% sequence identity
2GU2_A 87% 307 Aspartoacylase, Rattus norvegicus
> 40% sequence identity
3NH4_A 43% 309 Aspartoacylase, Mus musculus
> 0% sequence identity
2QJ8_A 26% 178 MLR6093 Hydrolase, Mesorhizobium loti
1YW4_A 22% 180 Succinylglutamate desuccinylase, E. coli
3CDX_A 18% 180 Succinylglutamate desuccinylase, Rhodobacter sphaeroides

The list is rather thin in the upper identity ranges, but no other structures could be found. In the lowest group, another succinylglutamate desuccinylase was omitted: it had an even shorter alignment and lower sequence identity, and we expected a cluster of closely related proteins of very low sequence identity to bias the prediction in undesirable directions.
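The selection described above can be sketched in a few lines of Python. This is only an illustration of the filtering and grouping logic, not the procedure we actually ran: the hit list is copied from the table, while the names `HITS`, `identity_bin` and `group_hits` are made up for this sketch.

```python
# Hits copied from the table above: (PDB chain, % identity, alignment length).
HITS = [
    ("2GU2_A", 87, 307),
    ("3NH4_A", 43, 309),
    ("2QJ8_A", 26, 178),
    ("1YW4_A", 22, 180),
    ("3CDX_A", 18, 180),
]

MIN_ALIGNMENT_LENGTH = 150  # threshold stated in the text


def identity_bin(identity):
    """Return the label of the identity group used in the table."""
    if identity > 60:
        return "> 60%"
    if identity > 40:
        return "> 40%"
    return "> 0%"


def group_hits(hits, min_length=MIN_ALIGNMENT_LENGTH):
    """Drop alignments shorter than min_length, then group by identity bin."""
    groups = {}
    for pdb_id, identity, length in hits:
        if length < min_length:
            continue  # good similarity but too short to be a useful template
        groups.setdefault(identity_bin(identity), []).append(pdb_id)
    return groups


if __name__ == "__main__":
    for label, members in group_hits(HITS).items():
        print(label, members)
```

Keeping the groups as lists in table order makes it easy to pick the first (best) entry of each group as the single-template candidate.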


Single Template Modelling

Modeller

We chose the structures 2GU2, 3NH4 and 2QJ8 as our templates, since each was either the only candidate or the best in its group. With standard parameters, we got the following models; in one case, Modeller tried to align our target onto the whole homodimer sequence of the template, which resulted in nonsense. In accordance with the instructions, we did not do any optimization in this step; we did enforce the use of one specified chain later.

Experimental structure (green) vs. 2GU2 standard parameter model
Experimental structure (blue) vs. 3NH4 standard parameter model
Experimental structure (violet) vs. 2QJ8 standard parameter model

We found that in the case of 2GU2, no editing was necessary. The alignment, which does not take secondary structure elements (SSEs) into account, matched the target nicely with the B chain (which is pretty much identical to the A chain and therefore OK); there were no gaps in functional regions or any other issues, so we left it untouched.

The same was true for 3NH4; the gaps we found were either very sensible or inconsequential. Consequently, we did not modify this alignment either.

In the case of 2QJ8, problems arose that were not entirely unexpected: Modeller's alignment algorithm had chopped the target sequence into many small segments separated by long stretches of gaps in a futile attempt to produce a global alignment over the whole homodimer sequence. We solved this by building a new PDB file with one chain removed and reconstructing the model. The result was a marked improvement on the previous attempt:

Experimental structure (violet) vs. 2QJ8 edited PDB with standard parameter model
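Removing one chain from a PDB file, as done for 2QJ8 above, can be sketched as follows. This is a minimal illustration and not necessarily how the edited file was produced: the function name `keep_chain` and the command-line usage are hypothetical, and which chain to keep is left to the caller. It relies only on the fixed-column PDB format, in which the chain identifier sits in column 22 of coordinate records.

```python
import sys

# Record types that carry a chain identifier in column 22 (index 21).
COORDINATE_RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def keep_chain(pdb_lines, chain_id):
    """Yield PDB lines, dropping coordinate records of all other chains.

    Non-coordinate records (HEADER, REMARK, END, ...) are passed through
    unchanged. A bare "TER" line without a chain column is dropped.
    """
    for line in pdb_lines:
        if line.startswith(COORDINATE_RECORDS):
            if len(line) > 21 and line[21] == chain_id:
                yield line
        else:
            yield line


# Hypothetical usage: python keep_chain.py input.pdb output.pdb A
if __name__ == "__main__" and len(sys.argv) == 4:
    source_path, target_path, chain = sys.argv[1:4]
    with open(source_path) as source, open(target_path, "w") as target:
        target.writelines(keep_chain(source, chain))
```

Header records that still mention the removed chain are left in place in this sketch; only the coordinates change.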


SwissModel

We also submitted our target and the three selected templates to SwissModel. Images of the resulting structures superimposed on the experimental structure are shown on the right of this page.

Experimental structure (violet) vs. 2GU2 SwissModel model
Experimental structure (blue) vs. 3NH4 SwissModel model
Experimental structure (violet) vs. 2QJ8 SwissModel model


I-TASSER

We could not complete this step: I-TASSER reported a very long queue and stated that it would process our jobs once all pending jobs were done, and that was all we ever received from it. We will append the structures to this page if I-TASSER ever finishes our jobs, however unlikely that seems after all the waiting we have already done.


Multiple Template Modelling

We also tried using all three templates from the third identity group to construct one model. We let Modeller construct an MSA by itself and tried to edit it, but quickly gave up: our edits brought no visible improvement and even produced models that looked worse on visual inspection in PyMOL.

This step returned the following structure:

Experimental structure (violet) vs. multiple template model


Evaluation

For the superposition images, we used 2O53, the structure documented on our disease wiki page. The pictures are shown on the right.

Model RMSD (Å) TM Score
Modeller edited 2QJ8 2.304 0.5024
Modeller Multi-template 0.427 0.9801
Modeller standard 2GU2 0.412 0.9858
Modeller standard 2QJ8 2.467 0.2706
Modeller standard 3NH4 0.776 0.9662
SwissModel 2GU2 0.413 0.9787
SwissModel 2QJ8 1.746 0.5494
SwissModel 3NH4 0.769 0.9651
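For readers unfamiliar with the two measures in the table, the following sketch shows how they are defined for a given superposition. It is not the program that produced the numbers above. It assumes the two coordinate lists are already paired residue by residue and superimposed, whereas a real TM-score calculation also searches for the superposition that maximises the score. The function names and the lower bound of 0.5 on the distance scale are choices made for this sketch.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def tm_d0(target_length, floor=0.5):
    """Length-dependent distance scale d0 of the TM-score.

    The floor guards against tiny or negative values for very short
    chains; it is an assumption of this sketch.
    """
    if target_length <= 15:
        return floor
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, floor)


def tm_score(coords_model, coords_ref, target_length):
    """TM-score of one fixed superposition, normalised by target_length."""
    d0 = tm_d0(target_length)
    terms = (
        1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
        for a, b in zip(coords_model, coords_ref)
    )
    return sum(terms) / target_length
```

Because every residue pair contributes at most 1 and distant pairs contribute almost nothing, the TM-score is far less sensitive to a few badly placed loops than the RMSD, which is why the two columns do not always rank the models identically.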


Discussion

The best result comes from Modeller with the 2GU2 template. Since the similarity between 2GU2 and the target is extremely high, it is hardly surprising that one of the 2GU2 models is the best one generated; what does come as a surprise is that Modeller outperforms SwissModel for this template, albeit only by a very small margin. Using multiple templates with Modeller also performed very well; the result is almost as good as the one based on the 2GU2 template. This is interesting, as the templates used were of very poor quality, with the one most similar to the target, 2QJ8, leading to extremely bad results when used as a single template. For 2QJ8, Modeller outdoes SwissModel again, although this time in how badly it fails.

Modeller appears to lean more towards extremes; its best case is better and its worst case is worse than those of SwissModel. Sequence similarity played a big role: 2GU2 was always the best template by a comfortable margin, 3NH4 second best and 2QJ8 pretty much catastrophic. That this can be compensated for by combining information from three poor templates is quite fascinating.
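The observation about extremes can be checked directly against the evaluation table. The sketch below uses only the standard-parameter single-template models, with the values copied from the table; the names `RESULTS` and `tm_range` are made up for this illustration.

```python
# (tool, template): (RMSD, TM-score), copied from the evaluation table.
RESULTS = {
    ("Modeller", "2GU2"): (0.412, 0.9858),
    ("Modeller", "3NH4"): (0.776, 0.9662),
    ("Modeller", "2QJ8"): (2.467, 0.2706),
    ("SwissModel", "2GU2"): (0.413, 0.9787),
    ("SwissModel", "3NH4"): (0.769, 0.9651),
    ("SwissModel", "2QJ8"): (1.746, 0.5494),
}


def tm_range(tool):
    """Return the (worst, best) TM-score over all templates for one tool."""
    scores = [tm for (name, _), (_, tm) in RESULTS.items() if name == tool]
    return min(scores), max(scores)


if __name__ == "__main__":
    for tool in ("Modeller", "SwissModel"):
        worst, best = tm_range(tool)
        print(f"{tool}: worst TM-score {worst}, best TM-score {best}")
```

Modeller's range of TM-scores encloses that of SwissModel at both ends, which is what "leaning towards extremes" means here.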