Homology modelling TSD

From Bioinformatikpedia
Revision as of 11:14, 2 June 2012 by Meiera (talk | contribs) (SWISS-MODEL)

There will be no curiosity, no enjoyment of the process of life. All competing pleasures will be destroyed. But always — do not forget this, Winston — always there will be the intoxication of power, constantly increasing and constantly growing subtler. Always, at every moment, there will be the thrill of victory, the sensation of trampling on an enemy who is helpless. If you want a picture of the future, imagine a boot stamping on a human face — forever.

1984


protocol

Templates

Since similar sets were already collected for Task 2, that information was reused. In addition, searches with HHpred against pdb70 and with COMA against pdb40 were performed. If two structures mapped to the same UniProt entry, only one, the 'most native' one, was used. The chosen templates are listed in <xr id="tab:templates" />.
None of the searches revealed any structure with >80% sequence identity other than the two already known structures 2gjx and 2gk1, which share 100% sequence identity. To still perform the task, 2gjx, which is the native structure <ref name="2gjxref">Lemieux, M. et al. (2006) Crystallographic Structure of Human beta-Hexosaminidase A: Interpretation of Tay-Sachs Mutations and Loss of GM2 Ganglioside Hydrolysis. Journal of molecular biology, 359, 913-29.</ref>, was chosen as reference, and 2gk1, which has the inhibitor NGT bound, was used as template. In the 40-80% sequence identity range only one entry could be added from COMA. Most of the hits found by either COMA or HHpred turned out to have a sequence identity below 25%.

<figtable id="tab:templates">

Identity range      PDB id        Sequence identity  Method
> 80% identity      2gk1 chain A  100%               Task 2/HHpred/COMA
40% - 80% identity  1o7a chain D  56.6%              Task 2
                    3lmy chain A  54%                COMA
< 30% identity      3nsm chain A  27.5%              Task 2
                    3gh5 chain A  20.7%              Task 2

Table TODO: </figtable>
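The grouping of templates by sequence identity can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the choice to compute identity over aligned (non-gap) columns is an assumption, since HHpred, COMA and other tools may normalise differently. The identity values are the ones listed in the template table.

```python
def percent_identity(target_aln: str, template_aln: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: identity is normalised by the number of aligned columns;
    other tools may divide by the target or alignment length instead.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(target_aln, template_aln)
               if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return 100.0 * matches / len(aligned)


def identity_bin(identity: float) -> str:
    """Assign an identity value to one of the ranges used in the table."""
    if identity > 80.0:
        return "> 80%"
    if 40.0 <= identity <= 80.0:
        return "40% - 80%"
    if identity < 30.0:
        return "< 30%"
    return "unassigned"  # 30-40% is not covered by the table's ranges


# Identity values taken from the template table.
templates = {
    "2gk1 chain A": 100.0,
    "1o7a chain D": 56.6,
    "3lmy chain A": 54.0,
    "3nsm chain A": 27.5,
    "3gh5 chain A": 20.7,
}
bins = {name: identity_bin(value) for name, value in templates.items()}
```

Note that the ranges leave 30-40% uncovered; the sketch reports such values as "unassigned" rather than guessing a category.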


SWISS-MODEL

Default Modelling

2gjx chain E is automatically assigned as template. The resulting alignment between target and template is almost entirely identical; only the first region contains a gap.

High sequence identity

Medium Sequence identity

Low sequence identity

With 3gh5 as template, the automated SWISS-MODEL pipeline was not able to calculate a model structure for the Hex A subunit. Two alignments were produced, one with Blast and one with HHsearch. The quality of the Blast alignment between target and template was too low to proceed. The HHsearch alignment reached the next stage and was sent to modelling, but no model could be built.
The same occurred for 3nsm, whose sequence identity is about 7 percentage points higher than that of 3gh5. A sequence identity below 30% therefore seems to be too low for modelling.

Evaluation

QMEAN is a scoring function that describes model quality. It is a linear combination of four statistical potential terms: C_beta interaction energy, all-atom pairwise energy, solvation energy and torsion angle energy. The QMEAN raw score ranges from 0 to 1 and indicates the reliability of the model. The QMEAN Z-score represents the absolute quality of the model by describing the likelihood that a given model is of comparable quality to experimental structures. It is calculated by comparison to reference structures and has a range of -4 to 4; the smaller the value, the worse the model quality. For a more detailed explanation, see [1].
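The two ideas in this description, a weighted sum of terms and a Z-score relative to a reference set, can be illustrated generically. The sketch below is not the QMEAN implementation: the weights, the term values and the reference scores are made-up placeholders, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def linear_combination(terms: dict, weights: dict) -> float:
    """Weighted sum of named score terms."""
    return sum(weights[name] * terms[name] for name in weights)


def z_score(raw_score: float, reference_scores: list) -> float:
    """Number of standard deviations between a score and the reference mean."""
    sigma = stdev(reference_scores)  # assumption: sample standard deviation
    if sigma == 0:
        raise ValueError("reference scores have zero spread")
    return (raw_score - mean(reference_scores)) / sigma


# Placeholder weights and term values, NOT the real QMEAN parameters.
weights = {"cbeta": 0.25, "all_atom": 0.25, "solvation": 0.25, "torsion": 0.25}
terms = {"cbeta": 0.6, "all_atom": 0.7, "solvation": 0.5, "torsion": 0.6}
combined = linear_combination(terms, weights)

# Placeholder reference set standing in for scores of experimental structures.
reference = [0.70, 0.75, 0.80]
z = z_score(combined, reference)
```

A negative Z-score, as for all three models in the score table, means the model scores below the average of the reference structures.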


<figtable id="tab:swissOwneval">

Template  QMEAN raw score  QMEAN Z-score
2gjx      0.658            -1.63
2gk1      0.698            -0.96
1o7a      0.594            -2.73

Table TODO: Scores provided by SWISS-MODEL. </figtable>

<figtable id="tab:swisseval">

Template  Residues in common  Common residue RMSD  TM-score  GDT-TS  GDT-HA
2gjx      492                 0.573                0.995     0.983   0.924
2gk1      492                 0.213                0.999     1.000   0.999
1o7a      486                 2.411                0.952     0.913   0.802

Table TODO: Calculated TM scores. </figtable>
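For orientation, the scores in this table can be written down from their standard definitions. The sketch below evaluates them for one fixed superposition, given the per-residue C-alpha distances between model and reference; the actual programs additionally search for the superposition that maximises each score, which is not done here. The function names and the handling of very short chains are assumptions.

```python
def gdt(distances, target_length, cutoffs):
    """Average fraction of target residues within each distance cutoff (in Angstrom)."""
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in cutoffs
    ]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances, target_length):
    """GDT total score: cutoffs of 1, 2, 4 and 8 Angstrom."""
    return gdt(distances, target_length, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances, target_length):
    """GDT high accuracy: cutoffs of 0.5, 1, 2 and 4 Angstrom."""
    return gdt(distances, target_length, (0.5, 1.0, 2.0, 4.0))


def tm_score(distances, target_length):
    """TM-score for a fixed superposition, normalised by the target length."""
    if target_length > 21:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumption: lower bound used for very short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy distances (made up) for a four-residue target.
toy = [0.4, 1.5, 3.0, 9.0]
toy_ts = gdt_ts(toy, 4)   # 0.5625
toy_ha = gdt_ha(toy, 4)   # 0.4375
```

Because GDT-HA uses stricter cutoffs than GDT-TS, it drops faster as the model deviates, which matches the larger gap between the two scores for the 1o7a-based model.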

<figtable id="tab:swissevalsap">

Template  Residues in common  Weighted RMSD  Unweighted RMSD  RMSD 6 Å around active site
2gjx      492                 0.414          0.573
2gk1      491                 0.189          0.212
1o7a      471                 0.486          1.455

Table TODO: Calculated RMSD scores with SAP and Pymol. </figtable>
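An unweighted RMSD, and an RMSD restricted to atoms near the active site, could be computed along the following lines. This is a sketch and not what SAP or PyMOL do internally: it assumes the two coordinate lists are already superposed and matched atom by atom, and the function names and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Unweighted RMSD between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def within_radius(coords, center, radius=6.0):
    """Indices of atoms lying within `radius` Angstrom of `center`."""
    return [i for i, c in enumerate(coords) if math.dist(c, center) <= radius]


def local_rmsd(model, reference, center, radius=6.0):
    """RMSD over atoms whose reference position is within `radius` of `center`."""
    selected = within_radius(reference, center, radius)
    return rmsd([model[i] for i in selected], [reference[i] for i in selected])


# Toy coordinates (made up): the third atom is far away from the chosen centre.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (10.0, 0.0, 5.0)]
overall = rmsd(model, reference)                       # 3.0
local = local_rmsd(model, reference, (0.0, 0.0, 0.0))  # 1.0
```

The toy case shows why a local RMSD is informative: a single badly placed distant atom dominates the overall value, while the region around the chosen centre agrees well.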

iTasser


Modeller


Automatic alignments

2gk1

The alignments produced by the normal and the 2D alignment methods within Modeller are very similar. One differing position is located at the end of the first alpha helix: since the preceding loop region is not part of the structure, a large gap is needed and is found by both alignment methods. However, the first residue after the helix, which therefore belongs to the loop region, is positioned before the gap in the normal alignment and after it in the structure-based one. Clearly this is a borderline case and the choice should not make a difference. Placing the threonine after the gap, however, aligns it with another threonine and is therefore the slightly more convincing choice.
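Differences like the one described above can be located automatically by comparing which residue pairs each alignment matches. The sketch below is a generic helper, not part of Modeller; the function names are hypothetical and the toy sequences are made up to mimic a threonine placed before or after a gap.

```python
def aligned_pairs(target_aln: str, template_aln: str) -> set:
    """Set of (target_index, template_index) pairs matched by an alignment."""
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(target_aln, template_aln):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_differences(first, second):
    """Residue pairs found only in the first and only in the second alignment."""
    p1 = aligned_pairs(*first)
    p2 = aligned_pairs(*second)
    return sorted(p1 - p2), sorted(p2 - p1)


# Toy example (made-up sequences): the template T sits before or after the gap.
normal = ("AKSLLLTGS", "AKT----GS")
structural = ("AKSLLLTGS", "AK----TGS")
only_normal, only_structural = alignment_differences(normal, structural)
# only_normal     -> [(2, 2)]  template T paired with target S
# only_structural -> [(6, 2)]  template T paired with target T
```

Working on residue pairs rather than on alignment columns makes the comparison independent of how the gap characters happen to be laid out.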

1o7a

The two alignments of this structure with intermediate sequence identity exhibit the same difference with the placement of threonine before or after the gap. Two gaps in the middle of the sequence, both of length three, do not differ between the two alignments. A second difference is found at the end of the alignment where the 2D-Aligner inserts a five residue gap before the last two residues, while the normal alignment simply ends in the gap. Since this is the loop region at the end of the structure, there is nothing to support the shift of the last two residues and the simple alignment is the more convincing one. It also aligns asparagine with glutamine instead of the more distant glutamic acid. Whatever the choice, one would not expect a difference in the produced models.

3gh5

The sequence similarity is very low in this case, which is reflected in the differing alignments. It is easy to see that the simple alignment method tries to group gaps together instead of creating short breaks. The structure-based alignment introduces several short gaps in loop regions for no apparent reason. More importantly, the 2D-Aligner introduces a gap in the middle of the first helix, while the helix is preserved in the simple alignment. This is very unexpected, since the whole point of the 2D-Aligner is to smooth out such cases rather than create them in the first place. Another helix is broken by both methods; however, the 2D-Aligner performs worse again, since it introduces two gaps in the helix instead of only one. The last residue of 3gh5 is separated by a gap in both alignment methods for no apparent reason; here, too, the structural alignment method introduces two gaps instead of one, with no clear explanation.

Evaluation of alignment methods

While the templates with higher sequence similarity result in almost identical alignments with both methods, the low-similarity template shows that the 2D-Aligner exhibits exactly the opposite of the expected behaviour: it makes the alignment worse by introducing more gaps, which do not conserve but destroy secondary structure elements. Therefore the simple alignment method is much more convincing.
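The observation that one method places more gaps inside secondary structure elements can be made quantitative with a simple count. The sketch below is a heuristic under stated assumptions: the secondary structure is given as a per-column annotation string (H for helix, E for strand, anything else for loop), a gap run counts as breaking an element when the columns directly before and after it carry the same H or E label, and all names and toy strings are hypothetical.

```python
def gap_runs(aligned_seq: str) -> list:
    """Column ranges (start, end), end exclusive, of consecutive gap characters."""
    runs = []
    start = None
    for col, ch in enumerate(aligned_seq):
        if ch == "-":
            if start is None:
                start = col
        elif start is not None:
            runs.append((start, col))
            start = None
    if start is not None:
        runs.append((start, len(aligned_seq)))
    return runs


def breaking_gaps(aligned_seq: str, ss_columns: str, elements: str = "HE") -> list:
    """Gap runs flanked on both sides by the same secondary structure label."""
    broken = []
    for start, end in gap_runs(aligned_seq):
        if start > 0 and end < len(aligned_seq):
            left, right = ss_columns[start - 1], ss_columns[end]
            if left == right and left in elements:
                broken.append((start, end))
    return broken


# Toy example (made up): one helix spanning columns 1-8.
ss = "CHHHHHHHHC"
grouped = "A--EELLKKG"    # one longer gap at the helix start
scattered = "AKL-EL-KKG"  # two short gaps inside the helix
grouped_breaks = breaking_gaps(grouped, ss)      # []
scattered_breaks = breaking_gaps(scattered, ss)  # [(3, 4), (6, 7)]
```

The heuristic cannot tell one interrupted helix from two separate helices that flank a gap, so the counts are only a rough guide and the alignments still need to be inspected by eye.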

Manually edited alignments

All the alignments created in the previous step were analysed with respect to the correct alignment of important residues. For the 2gk1 and 1o7a alignments there is nothing to improve upon. The only editing operations that come to mind would turn the alignment from the simple method into the 2D-based one and vice versa. Since this has already been discussed above, no further analysis is performed for these two templates.

The alignment of 3gh5 allowed for some editing operations. The loop region that was broken by both methods was smoothed out, so the alignment ends with gaps?

try to save the helix in the middle (326, SADDY)

load into jalview and before that find out which residues are the important ones, mapped to uniprot (we need this for the next task anyway)

3D-Jigsaw

References

<references/>