Homology modelling TSD

From Bioinformatikpedia
Revision as of 17:13, 2 June 2012 by Meiera (Medium Sequence identity)

There will be no curiosity, no enjoyment of the process of life. All competing pleasures will be destroyed. But always — do not forget this, Winston — always there will be the intoxication of power, constantly increasing and constantly growing subtler. Always, at every moment, there will be the thrill of victory, the sensation of trampling on an enemy who is helpless. If you want a picture of the future, imagine a boot stamping on a human face — forever.

1984


protocol

Templates

Since similar template sets had already been collected for Task 2, that information was reused. In addition, searches were performed with HHpred against pdb70 and with COMA against pdb40. If two structures mapped to the same UniProt entry, only one of them, the 'most native' one, was used. The chosen templates are listed in <xr id="tab:templates" />.
None of the searches revealed any structure with >80% sequence identity other than the two already known structures 2gjx and 2gk1, which both share 100% sequence identity with the target. To still be able to perform the task, 2gjx, which is the native structure <ref name="2gjxref">Lemieux,M. et al. (2006) Crystallographic Structure of Human beta-Hexosaminidase A: Interpretation of Tay-Sachs Mutations and Loss of GM2 Ganglioside Hydrolysis. Journal of molecular biology, 359, 913-29.</ref>, was chosen as reference, and 2gk1, which has the inhibitor NGT bound, was used as template. In the range of 40% to 80% sequence identity only one entry could be added from COMA. Most of the hits found by either COMA or HHpred turned out to have a sequence identity below 25%.

<figtable id="tab:templates">

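{| class="wikitable"
! Identity range !! PDB id !! Sequence identity !! Method
|-
| > 80% identity || 2gk1 chain A || 100% || Task2/HHpred/COMA
|-
| 40% - 80% identity || 1o7a chain D || 56.6% || Task 2
|-
| || 3lmy chain A || 54% || COMA
|-
| < 30% identity || 3nsm chain A || 27.5% || Task 2
|-
| || 3gh5 chain A || 20.7% || Task 2
|}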

Table TODO: </figtable>
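The grouping of templates into identity ranges can be sketched in a few lines of Python. The thresholds and identities are taken from the table above. The function name and the "unassigned" label for the uncovered 30% to 40% range are illustrative assumptions, not part of the original workflow.

```python
def identity_bin(percent_identity):
    """Assign a template to one of the identity ranges used in the table."""
    if percent_identity > 80.0:
        return "> 80%"
    if 40.0 <= percent_identity <= 80.0:
        return "40% - 80%"
    if percent_identity < 30.0:
        return "< 30%"
    # 30% to 40% is not covered by the ranges of the task (assumed label)
    return "unassigned"


# Sequence identities as listed in the template table
templates = {
    "2gk1_A": 100.0,
    "1o7a_D": 56.6,
    "3lmy_A": 54.0,
    "3nsm_A": 27.5,
    "3gh5_A": 20.7,
}

bins = {}
for pdb_id, identity in templates.items():
    bins.setdefault(identity_bin(identity), []).append(pdb_id)

for label, members in bins.items():
    print(label, members)
```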

Important residues

TODO, jonas?




Modeller


Automatic alignments

2gk1

The alignments produced by the simple and the 2D alignment methods within Modeller are very similar. One differing position is located at the end of the first alpha helix: since the preceding loop region is not part of the structure, a large gap is needed, and it is found by both alignment methods. However, the first residue after the helix, which therefore belongs to the loop region, is positioned before the gap in the simple alignment and after it in the structure-based one. Clearly this is a borderline case and the choice should not make a difference. Placing the threonine after the gap, however, aligns it with another threonine and is therefore the slightly more convincing choice.

1o7a

The two alignments (simple, 2D) of this structure with intermediate sequence identity exhibit the same difference in the placement of the threonine before or after the gap. Two gaps in the middle of the sequence, both of length three, do not differ between the two alignments. A second difference is found at the end of the alignment, where the 2D-Aligner inserts a five-residue gap before the last two residues, while the simple alignment ends in the gap. Since this is the loop region at the end of the structure, there is nothing to support the shift of the last two residues, and the simple alignment is the more convincing one. It also aligns asparagine with glutamine instead of the more distant glutamic acid. Whatever the choice, one would not expect a difference in the produced models.

3gh5

The sequence similarity is very low in this case, which is reflected in the differing alignments. It is easy to see that the simple alignment method tries to group gaps together instead of creating short breaks, whereas the structure-based alignment introduces several short gaps in loop regions for no apparent reason. More importantly, the 2D-Aligner introduces a gap in the middle of the first helix, while the helix is preserved in the simple alignment. This is very unexpected, since the whole point of the 2D-Aligner is to avoid such cases rather than to create them. Another helix is broken by both methods; however, the 2D-Aligner again performs worse, since it introduces two gaps in the helix instead of only one. The last residue of 3gh5 is separated by a gap in both alignments for no apparent reason, and the structural alignment method again introduces two gaps instead of one, with no clear explanation.

Evaluation of alignment methods

While the templates with higher sequence similarity result in almost identical alignments with both methods, the low-similarity template shows that the 2D-Aligner exhibits exactly the opposite of the expected behaviour: it makes the alignment worse by introducing more gaps, which destroy rather than conserve secondary structure elements. Therefore the simple alignment method is much more convincing here.
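The two criteria used in this comparison, how gaps are grouped and whether they fall into helices, can be quantified with a short Python sketch. The function names, the per-column secondary structure string (with `H` marking helix columns of the partner sequence) and the toy strings are assumptions for illustration; they are not Modeller output.

```python
import re


def gap_runs(aligned_seq, gap_char="-"):
    """Return the lengths of all contiguous gap runs in an aligned sequence."""
    pattern = re.escape(gap_char) + "+"
    return [len(m.group()) for m in re.finditer(pattern, aligned_seq)]


def gaps_in_helix(aligned_seq, partner_ss, gap_char="-", helix_char="H"):
    """Count alignment columns where this sequence has a gap while the
    partner sequence is annotated as helix in the same column."""
    if len(aligned_seq) != len(partner_ss):
        raise ValueError("sequence and annotation must have the same length")
    return sum(
        1
        for residue, ss in zip(aligned_seq, partner_ss)
        if residue == gap_char and ss == helix_char
    )


# Toy example: one long gap versus several short ones
print(gap_runs("AC-----GTA"))    # one run of length 5
print(gap_runs("AC--G-T--A"))    # three shorter runs
print(gaps_in_helix("AC--GT", "HHHHCC"))
```

An alignment with fewer runs and no gap columns inside helices would be preferred under the reasoning above.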

Manually edited alignments

All the alignments created in the previous step were analysed with respect to the correct alignment of important residues. For the 2gk1 and 1o7a alignments there is nothing to improve upon. The only editing operations that come to mind would turn the alignment from the simple method into the 2D-based one and vice versa. Since this has already been discussed above, no further analysis is performed for these two templates.

The alignment of 3gh5 allowed for some editing operations. Taking the alignment produced by the simple method as a basis, all important residues except N423 are already correctly aligned. Before that residue, however, lies the alpha helix that is broken by both alignment methods. In the edited alignment this helix has been completely aligned and a gap placed before and after it, so that only loop regions are affected. N423 is left as is, since there is no asparagine nearby that would result in a better alignment, and since it is aligned to an aspartic acid, which is a likely substitution that should not affect the formation of the hydrogen bond. Finally, the last residue is no longer placed separately; the alignment instead ends with gaps. The resulting alignment is shown in <xr id="fig:modeller3gh5edited"/>.

<figure id="fig:modeller3gh5edited">

>P1;3GH5
structureX:../../input/3GH5.pdb: -16 :A:+507 :A:MOL_ID 1; MOLECULE BETA-HEXOSAMINIDASE; CHAIN A; [...]
HHHHSSGLVPRGSHMASMSQPSILPKPVSYTVGSGQFVLTKNASIFVAGNNVGETDELFNIGQALAKKLNAS
TG-------YTISVVKSNQPTAGSIYLTTVGGN---AALGNEGYDLITTSNQVTLTANKPEGVFRGNQTLLQ
LLPAGIEKNTVVSGVQWVIPHSNISDKPEYEYRGLMLDVARHFFTVDEVKRQIDLASQYKINKFHMHLSDDQ
GWRIEIKSWPDLIEIGSKGQVGGGPGGYYTQEQFKDIVSYAAERYIEVIPEIDMPGHTNAALASYGELNPDG
KRKAMRTDTAVGYSTLMPRAEITYQFVEDVISELAAISPSPYIHLGGDESNAT----------------SAA
DYDYFFGRVTAIANSY----GKKVVGWDPS-DTSSGATSDSVLQNWTCSASTGTAAKAKGMKVIVSPANAYL
DMKYYSDS-PIGLQWRGF-VNTNRAYNWDPTDCIKGANIYGVESTLWTETFVTQDHLDYMLYPKLLSNAEVG
WTARGDRNWDDFKERLIEHTPRLQNKGIKFFADPIV---------*
>P1;P06865
sequence::  : :  : :::-1.00:-1.00
MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLDEAFQRYRDLL
FGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDDQCLLLSETVWGALRGLETFSQ
LVWKSAEGT-------FFINKTEIEDFPRFPHRGLLLDTSRHYLPLSSILDTLDVMAYNKLNVFHWHLVDDP
SFPYESFTFPELMRKGSYNPVTH----IYTAQDVKEVIEYARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPC
YSGSEPSGT---FGPVNPSLNNTYEFMSTFFLEVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGE
DFKQLESFYIQTLLDIVSSYGKGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELE--LVTKAGFRALL
SAPWYLNRISYGPDWKDFYIVEPLAFEGTPE---QKALVIGGEACMWGE-YVDNTNLVPRLWPRAGAVAERL
WSNKLTSDLTFAYERLSHFRCELLRRGVQAQPLNVGFCEQEFEQT*

Edited Alignment of 3gh5_A with P06865 based on the simple (non-structural) alignment produced by Modeller. The helix region and single residue at the end that were changed are highlighted in red. Bold, blue residues are those deemed important (c.f. TODO) </figure>
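To check an edited alignment such as the one above, a PIR file can be read and the sequence identity over aligned positions recomputed. The following is a minimal sketch, not part of Modeller. It assumes that every entry consists of a `>P1;code` line, a non-empty description line and sequence lines ending in `*`, and it defines identity over columns where both sequences carry a residue, which is only one of several common definitions. The toy alignment is invented for the example.

```python
def parse_pir(text):
    """Parse PIR alignment text into a dict {code: aligned_sequence}.

    Assumes each entry has a '>P1;code' line, one non-empty description
    line, and sequence lines terminated by '*'.
    """
    entries = {}
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        if lines[i].startswith(">P1;"):
            code = lines[i][4:]
            i += 2  # skip the code line and the description line
            chunks = []
            while i < len(lines) and not lines[i].startswith(">"):
                chunks.append(lines[i])
                i += 1
            entries[code] = "".join(chunks).rstrip("*")
        else:
            i += 1
    return entries


def percent_identity(seq_a, seq_b, gap_char="-"):
    """Percent identical residues over columns without a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b)
             if a != gap_char and b != gap_char]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Invented toy alignment in PIR layout
toy = """
>P1;tmpl
structureX:tmpl:1:A:8:A::::
ACD-EF
GH*
>P1;targ
sequence:targ::::::::
ACDKE-
GY*
"""

alignment = parse_pir(toy)
print(round(percent_identity(alignment["tmpl"], alignment["targ"]), 1))
```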

3gh5

SWISS-MODEL

General

The SWISS-MODEL server provides as output a model in PDB format and the corresponding target-template alignment. In addition, it offers several scores and functions for model evaluation.
ANOLEA is an atomic empirical mean force potential that is used to perform energy calculations on a protein chain. Negative energy values (shown in green) represent a favourable energy environment, whereas positive values (shown in red) represent an unfavourable one.
There is also a local model reliability score, the residue error, which is computed along the sequence.
QMEAN is a composite scoring function for both the estimation of the global quality of the entire model and the local per-residue analysis of different regions within a model.
QMEAN4 is a scoring function that describes the model quality and can be used to compare and rank alternative models of the same target. It is a linear combination of four statistical potential terms: C_beta interaction energy, all-atom pairwise energy, solvation energy and torsion angle energy. The QMEAN raw score ranges from 0 to 1 and indicates the reliability of the model. The QMEAN Z-score represents the absolute quality of the model by describing the likelihood that a given model is of comparable quality to experimental structures. It is calculated by comparison to reference structures and has a range of -4 to 4; the smaller the value, the worse the model quality. For a more detailed explanation, see [1].
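As a rough illustration of how a Z-score of this kind is obtained, the sketch below standardises a raw score against a set of reference scores. The reference values are made up for the example, and the use of the sample standard deviation is an assumption; the actual QMEAN reference set and normalisation are not reproduced here.

```python
from statistics import mean, stdev


def z_score(value, reference_values):
    """Standardise a raw score against a reference distribution.

    A negative result means the score lies below the reference mean.
    """
    return (value - mean(reference_values)) / stdev(reference_values)


# Hypothetical raw scores of experimental reference structures
reference_scores = [0.7, 0.8, 0.9]

# A model scoring 0.6 lies two standard deviations below the reference mean
print(round(z_score(0.6, reference_scores), 2))
```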

Default Modelling

2gjx chain E is automatically assigned as template. This is not surprising, as 2gjx_A is the reference and all the chains are virtually identical. The resulting alignment between target and template consists of identical matches. Only in the first region is a loop missing in the template, and therefore a gap is inserted (see <xr id="fig:2gjxali"/>).

<figure id="fig:2gjxali">

TARGET 1 LWPWPQNF QTSDQRYVLY PNNFQFQYDV SSAAQPGCSV LDEAFQRYRD
2gjxE 23 lwpwpqnf qtsdqryvly pnnfqfqydv ssaaqpgcsv ldeafqryrd
TARGET ss sss sssss hh hhhhhhhhhh
2gjxE ss sss sssss hh hhhhhhhhhh

TARGET 49 LLFGSGSWPR PYLTGKRHTL EKNVLVVSVV TPGCNQLPTL ESVENYTLTI
2gjxE 71 llfg------ --------tl eknvlvvsvv tpgcnqlptl esvenytlti
TARGET hh ! sssss ssssss
2gjxE hh sssss ssssss

Alignment of 2gjx_E with P06865. The gap is highlighted with an exclamation mark. </figure>

For the whole alignment see Swissmodel 2gjx alignment.
<figure id="fig:2gjxanolea">

ANOLEA score for the first region of the 2gjx model.

</figure> As 2gjx is our chosen reference structure, this template is expected to provide a near-perfect model. ANOLEA displays only one weakness of the model, which is found at the beginning between positions 70 and 90, see <xr id="fig:2gjxanolea"/>. This is also supported by QMEAN, which signals a potential error at this site. This is the only gap region in the alignment and thus it receives little support from the template structure.
The further scores (see <xr id="tab:swissscores2gjx"/>) indicate a good model quality: the energy values are comparably low, which signals a favourable structure, and the QMEAN4 score of 0.68 on the scale from 0 to 1 ranks the model well. With the "perfect match template" it is surprising that the Z-scores are negative, as this indicates a lower quality than that of the experimental reference structures.

<figtable id="tab:swissscores2gjx">

{| class="wikitable"
! Scoring function term !! Raw score !! Z-score
|-
| C_beta interaction energy || -106 || -0.99
|-
| All-atom pairwise energy || -13340 || -0.33
|-
| Solvation energy || -37 || -0.62
|-
| Torsion angle energy || -93 || -1.62
|-
| QMEAN4 score || 0.68 || -1.63
|}

Table TODO: Scores for default modelling of P06865 by SWISS-MODEL. </figtable>

<figure id="fig:2gjxswissmodel">

SWISS-MODEL model of the Hex A structure computed with the 2gjx template in comparison to the reference. The reference is displayed in green. For the model, the error color-coding from SWISS-MODEL is adopted: confident residues are colored blue, and the higher the estimated error, the more red the residues are displayed. The active site is shown in yellow and the important H-bond residue in orange.

</figure> In <xr id="fig:2gjxswissmodel"/> the computed model is displayed (blue to red, depending on error) together with the reference (green). It is clear that the model fits the reference almost perfectly, which was expected, as the template is the reference structure with just another chain (template 2gjx_E, reference 2gjx_A). The only region where the model and the reference do not agree is the loop region colored in red, which belongs to the gap in the template structure, see <xr id="fig:2gjxali"/>, and which was correctly detected by the SWISS-MODEL error scores. The active site and the important residues are matched perfectly by the model. As the model is very accurate, the negative Z-scores remain difficult to justify.



High sequence identity

The model with 2gk1 as template is very similar to the previous model, which is not surprising as the templates have 100% sequence identity. The alignment of the Hex A subunit and the template shows the same gap as above (for the whole alignment see Swissmodel 2gk1 alignment). This alignment also corresponds to the structural alignment from Modeller. <figure id="fig:2gk1anolea">

ANOLEA score for the first region of the 2gk1 model.

</figure> The ANOLEA and QMEAN scores are very good for the model, again except for the first region, where there are some signs of lower model quality, see <xr id="fig:2gk1anolea"/>. The scores are again very similar to those of the 2gjx model, see <xr id="tab:swissscores2gk1"/>, which supports the quality of the model. The Z-scores are also negative but show a better overall tendency. It could be that individual geometrical features are slightly improved in this model.

<figtable id="tab:swissscores2gk1">

{| class="wikitable"
! Scoring function term !! Raw score !! Z-score
|-
| C_beta interaction energy || -122 || -0.81
|-
| All-atom pairwise energy || -13468 || -0.34
|-
| Solvation energy || -44 || -0.12
|-
| Torsion angle energy || -109 || -1.06
|-
| QMEAN4 score || 0.69 || -0.96
|}

Table TODO: Scores for the model of P06865 with 2gk1 as template by SWISS-MODEL. </figtable>

<figure id="fig:2gkiswissmodel">

SWISS-MODEL model derived from the 2gk1 template in comparison to the reference. The reference is displayed in green and the model is shown in the error color-coding from SWISS-MODEL: confident residues are colored blue, and the higher the estimated error, the more red the residues are displayed. The active site is shown in yellow and the important H-bond residue in orange.

</figure>

The model and the reference are displayed in <xr id="fig:2gkiswissmodel"/>. There is very little red coloring, which indicates a low estimated error. The only outlier is the wrongly assigned helix. Apart from this, the model is very accurate and conserves the active site as well as the important residues.

Medium Sequence identity

The alignment used for modelling with 1o7a as template naturally contains more mismatches than the previous ones, and again most gaps appear at the beginning of the alignment, see <xr id="fig:1o7aali"/>.

<figure id="fig:1o7aali">

TARGET 1 TALWPWPQ NFQTSDQRYV LYPNNFQFQY DVSSAAQPGC SVLDEAFQRY
1o7aD 54 palwplpl svkmtpnllh lapenfyish spnstagpsc tlleeafr--
TARGET sssss ssss s ssss s hhhhhh
1o7aD sssss ssss s ssss s hhhhhhhh

TARGET 49 RDLLFGSGSW PRPYLTGKRH TLEKNVLVVS VVTPGCNQLP TLESVENYTL
1o7aD 100 -----ryhgy ifgtqvq--- q---llvsi- tlqsecdafp nissdesytl
TARGET sss hhhh sss ssssss sss
1o7aD hhhhh h s sssss s sss

Alignment of 1o7a with P06865. </figure>

For the whole alignment see Swissmodel 1o7a alignment.

<figure id="fig:1o7aanolea">

ANOLEA score for the first region of the 1o7a model.

</figure> ANOLEA signals a high error for the gapped alignment region (<xr id="fig:1o7aanolea"/>). A look at the Modeller alignments (LIINK) suggests that this region is difficult, as it is not aligned in the same way there.
The scores are a little worse than for the models with very high sequence identity, see <xr id="tab:swissscore1o7a"/>: the energies are slightly higher, the QMEAN4 score is lower and the Z-scores are more negative.

<figtable id="tab:swissscore1o7a">

{| class="wikitable"
! Scoring function term !! Raw score !! Z-score
|-
| C_beta interaction energy || -97 || -1.15
|-
| All-atom pairwise energy || -11181 || -0.99
|-
| Solvation energy || -29 || -1.24
|-
| Torsion angle energy || -72 || -2.33
|-
| QMEAN4 score || 0.594 || -2.73
|}

Table TODO: Scores for 1o7a modelling of P06865 by SWISS-MODEL. </figtable>

<figure id="fig:1o7aswissmodel">

SWISS-MODEL model derived from the 1o7a template in comparison to the reference. The reference is displayed in green and the model is shown in the error color-coding from SWISS-MODEL: confident residues are colored blue, and the higher the estimated error, the more red the residues are displayed. The active site is shown in yellow and the important H-bond residue in orange.

</figure>

The model in comparison to the reference is shown in <xr id="fig:1o7aswissmodel"/>. It stands out that the estimated error is higher than in the previous models, as there is more red coloring in the model structure. In particular, the loop regions point in wrong directions. The erroneous helix on the right of the figure, colored red, stands out as the greatest mistake of the model. It corresponds to the gap region of the alignment, see <xr id="fig:1o7aali"/>, which is also highlighted in red. It seems that this region, and especially the placement of the gaps, is erroneous, and thus the model is not accurate there. Apart from this, the important residues are all in the right place and overall the model fits the reference quite well.


Alignment mode

The model from the 1o7a template is the first one based on an alignment that differs from the one proposed by Modeller. It is therefore interesting to find out which alignment leads to a better structure, so SWISS-MODEL was also run with the alignment from Modeller. This alignment differs in that there is only one big gap instead of several small ones in the first region (see <xr id="fig:1o7a_ali"/> and <xr id="fig:1o7aali"/>).

<figure id="fig:1o7a_ali">

TARGET 21 TALWPWPQ NFQTSDQRYV LYPNNFQFQY
1o7aD 54 palwplpl svkmtpnllh lapenfyish
TARGET sssss ssss s ssss
1o7aD sssss ssss s ssss

TARGET 49 DVSSAAQPGC SVLDEAFQRY RDLLFGSGSW PRPYLTGKRH TLEKNVLVVS
1o7aD 82 spnstagpsc tlleeafrry hgyifg---- ---------- tqvqqllvsi
TARGET s hhhhhhhhhh hhhh ssssss
1o7aD s hhhhhhhhhh hhhh ssssss

Alignment of 1o7a with P06865. </figure>

Indeed, the scores shown in <xr id="tab:swissscore1o7a_a"/> are better than those obtained with the SWISS-MODEL alignment. The energy values are lower and the Z-scores higher. The values are more comparable to those of the models above that were built from templates with high sequence identity.

<figtable id="tab:swissscore1o7a_a">

{| class="wikitable"
! Scoring function term !! Raw score !! Z-score
|-
| C_beta interaction energy || -126 || -0.78
|-
| All-atom pairwise energy || -13033 || -0.48
|-
| Solvation energy || -35 || -0.78
|-
| Torsion angle energy || -90 || -1.67
|-
| QMEAN4 score || 0.648 || -1.87
|}

Table TODO: Scores for modelling of P06865 by SWISS-MODEL using the Modeller alignment with 1o7a. </figtable>

<figure id="fig:1o7a_alswissmodel">

SWISS-MODEL model derived from the target-1o7a alignment in comparison to the reference. The reference is displayed in green and the model is shown in the error color-coding from SWISS-MODEL: confident residues are colored blue, and the higher the estimated error, the more red the residues are displayed. The active site is shown in yellow and the important H-bond residue in orange.

</figure>

The superposition with the reference in <xr id="fig:1o7a_alswissmodel"/> confirms it: the alignment provided by Modeller suits the structure of the Hex A subunit better than the alignment computed by SWISS-MODEL. There is very little error and no helices or sheets are erroneously modelled. This model is comparable to the first one, which was computed with the reference structure itself as template.


Low sequence identity

With 3gh5 as template, the automated SWISS-MODEL pipeline was not able to calculate a model structure for the Hex A subunit. Two alignments were produced, one with Blast and one with HHsearch. While the quality of the Blast alignment between target and template was too low to start with, the HHsearch alignment passed this step and was sent to modelling, but the building of a model was not successful.
The same occurred for 3nsm, whose sequence identity is about 7 percentage points higher than that of 3gh5. A sequence identity lower than 30% seems to be too low for automated modelling with SWISS-MODEL.

Evaluation

<figtable id="tab:swissOwneval">

{| class="wikitable"
! Template !! QMEAN raw score !! QMEAN Z-score
|-
| 2gjx || 0.658 || -1.63
|-
| 2gk1 || 0.698 || -0.96
|-
| 1o7a || 0.594 || -2.73
|}

Table TODO: Scores provided by SWISS-MODEL. </figtable>

<figtable id="tab:swisseval">

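{| class="wikitable"
! Template !! Residues in common !! Common residue RMSD !! TM !! GDT-TS !! GDT-HA
|-
| 2gjx || 492 || 0.573 || 0.995 || 0.983 || 0.924
|-
| 2gk1 || 492 || 0.213 || 0.999 || 1.000 || 0.999
|-
| 1o7a || 486 || 2.411 || 0.952 || 0.913 || 0.802
|}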

Table TODO: Calculated TM scores. </figtable>
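For orientation, the measures in this table can be written down for a single, fixed superposition, given the distance in Angstrom between each pair of corresponding residues. This is only a sketch: the published TM-score and GDT programs additionally search for the superposition that maximises the score, the function names are illustrative, and the input distances below are invented. The TM-score length normalisation assumes a reference clearly longer than about 20 residues, which holds for the roughly 490 residues here.

```python
def gdt(distances, cutoffs, n_reference):
    """Mean fraction of reference residues within each distance cutoff."""
    fractions = [
        sum(d <= cutoff for d in distances) / n_reference
        for cutoff in cutoffs
    ]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances, n_reference):
    """GDT total score: cutoffs of 1, 2, 4 and 8 Angstrom."""
    return gdt(distances, (1.0, 2.0, 4.0, 8.0), n_reference)


def gdt_ha(distances, n_reference):
    """GDT high accuracy: cutoffs of 0.5, 1, 2 and 4 Angstrom."""
    return gdt(distances, (0.5, 1.0, 2.0, 4.0), n_reference)


def tm_score(distances, n_reference):
    """TM-score for one fixed superposition, normalised by reference length."""
    d0 = 1.24 * (n_reference - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / n_reference


# Invented per-residue distances for a five-residue toy case
toy_distances = [0.4, 0.9, 1.5, 3.0, 9.0]
print(gdt_ts(toy_distances, 5), gdt_ha(toy_distances, 5))
print(tm_score([0.0] * 100, 100))
```

Because GDT-HA uses stricter cutoffs than GDT-TS, it drops faster for less accurate models, which matches the larger gap between the two values for the 1o7a model.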

<figtable id="tab:swissevalsap">

{| class="wikitable"
! Template !! Residues in common !! Weighted RMSD !! Unweighted RMSD !! RMSD 6A around active site
|-
| 2gjx || 492 || 0.414 || 0.573 ||
|-
| 2gk1 || 491 || 0.189 || 0.212 ||
|-
| 1o7a || 471 || 0.486 || 1.455 ||
|}

Table TODO: Calculated RMSD scores with SAP and Pymol. </figtable>
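The unweighted RMSD and an RMSD restricted to the surroundings of the active site can be sketched as follows for coordinates that have already been superimposed. This is not the SAP or Pymol procedure used for the table; the function names, the selection rule (reference atoms within the radius of any active-site atom) and the toy coordinates are assumptions. `math.dist` requires Python 3.8 or later.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, superimposed coordinates."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def rmsd_near_site(ref_coords, model_coords, site_coords, radius=6.0):
    """RMSD over atom pairs whose reference atom lies within `radius`
    Angstrom of at least one active-site atom."""
    selected = [
        (p, q)
        for p, q in zip(ref_coords, model_coords)
        if any(math.dist(p, s) <= radius for s in site_coords)
    ]
    if not selected:
        raise ValueError("no atoms within the given radius")
    ref_sel, model_sel = zip(*selected)
    return rmsd(ref_sel, model_sel)


# Invented toy coordinates: two atoms near the site, one far away
reference = [(0, 0, 0), (3, 0, 0), (20, 0, 0)]
model = [(0, 0, 1), (3, 0, 1), (20, 0, 5)]
site = [(1, 0, 0)]

print(rmsd(reference, model))                  # all three atoms
print(rmsd_near_site(reference, model, site))  # only the two atoms near the site
```

In the toy case the distant atom dominates the global RMSD, while the value around the site stays low; the same effect would explain a model with badly placed loops but a well conserved active site.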


iTasser

3D-Jigsaw

References

<references/>