Task 4: Homology-based structure prediction

From Bioinformatikpedia

More detailed information about the generation of the results can be found in the Task 4 Protocol.


To generate a model for our protein, we first need to find homologous structures on which to base the model. For this task we use HHpred and COMA.


Unfortunately, the HHpred search did not produce any hits with a sequence identity between 40% and 80%. For testing purposes we treat 1qs0_A (39% identity) as belonging to that category. The 2BFD hit with 99% sequence identity is a 3D structure of the actual human branched-chain alpha-ketoacid dehydrogenase, only slightly modified. The hit is therefore not a homologous protein but the protein itself, and it is not a viable template to derive a model from, since we assume for this exercise that no 3D structure of our protein exists. The 1qs0_A protein is the BCAKD complex of Pseudomonas putida. Although the relationship between human and bacteria is quite distant, this structure is the only one available near the 40%-80% sequence identity range and might still produce a viable model. For hits with a sequence identity below 30%, I currently have no idea which one to choose as a template.

PDB ID e-Value Identity
>80% Sequence Identity
2BFD_A 1.7e-91 99%
40-80% Sequence Identity
1qs0_A 1.2e-77 39%
<30% Sequence Identity
2ozl_A 3.5e-70 26%
2yic_A 1.2e-57 14%
2xt6_A 5.9e-57 14%
2jgd_A 4.5e-57 14%
3kom_A 4.2e-27 20%
3m49_A 2.4e-27 16%
3l84_A 2.7e-27 16%
2o1s_A 5.1e-27 20%
2e6k_a 5.1e-27 18%
1qpu_A 3.7e-27 18%
3rim_A 3.6e-26 17%
1r9j_A 1.9e-25 15%
3uk1_A 3.8e-26 19%
2r8o_A 2e-25 21%
2o1x_A 3.9e-26 15%
1itz_A 8.2e-24 20%
3mos_A 2.5e-23 20%
2qtc_A 2.2e-22 14%
3ahc_A 2.7e-16 15%
2pan_A 4.1e-16 21%
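The grouping used in the table above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for this task: the function names `identity_bin` and `group_hits` are hypothetical, and only the first four rows of the HHpred table are copied in. Note that the three categories leave a gap between 30% and 40%, which is where 1qs0_A (39%) falls; the sketch reports that gap explicitly instead of silently assigning the hit to a category.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, float, float]  # (PDB chain, E-value, percent identity)

# First rows of the HHpred table above; the remaining rows follow the same pattern.
HITS: List[Hit] = [
    ("2BFD_A", 1.7e-91, 99),
    ("1qs0_A", 1.2e-77, 39),
    ("2ozl_A", 3.5e-70, 26),
    ("2yic_A", 1.2e-57, 14),
]


def identity_bin(identity: float) -> str:
    """Map a percent sequence identity to one of the categories of this task."""
    if identity > 80:
        return ">80%"
    if identity >= 40:
        return "40-80%"
    if identity < 30:
        return "<30%"
    # The task defines no category for 30-40%; report the gap explicitly.
    return "30-40% (unassigned)"


def group_hits(hits: List[Hit]) -> Dict[str, List[Hit]]:
    """Group hits by identity category, most significant E-value first."""
    groups: Dict[str, List[Hit]] = {}
    for hit in hits:
        groups.setdefault(identity_bin(hit[2]), []).append(hit)
    for members in groups.values():
        members.sort(key=lambda h: h[1])
    return groups


if __name__ == "__main__":
    for label, members in group_hits(HITS).items():
        print(label, [pdb_id for pdb_id, _, _ in members])
```

With these rules 1qs0_A is not placed in the 40-80% category automatically, which matches the text: it is treated as a member of that category only by assumption.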


COMA did not find any hits with a sequence identity between 40% and 80% either. This time, however, there were no hits close to that range. The hit above 80% is again the protein itself and therefore cannot be used for structure modeling.

PDB ID e-Value Identity
>80% Sequence Identity
1dtw_A 8e-59 100%
40-80% Sequence Identity
- - -
<30% Sequence Identity
2xt6_A 8.1e-56 12%
3duf_A 2.1e-53 28%
3exe_A 3.6e-50 25%
3mos_A 1e-45 15%
1l8a_A 7.5e-40 9%
3m34_A 1.2e-38 14%
2o1x_A 2.6e-37 12%
3uk1_A 2.4e-33 15%
3ahc_A 1.1e-25 8%

Model Creation and Evaluation

To evaluate our models we had to choose one PDB structure of ODBA_HUMAN as the "gold standard". We decided on 1DTW, which had already been chosen as the representative structure in earlier tasks.



The Modeller prediction with the 1qs0_A template achieved a TM-score of 0.8316. As TM-scores above 0.5 generally indicate the same fold, this is a good result. The GDT-TS score was 0.7421 and the GDT-HA score was 0.5694. By running SAP we calculated a weighted RMSD of 1.035 (over 379 atoms), which is again a very good result and indicates good model quality.

weighted RMSD (105 atoms) TM_score GDT_HA GDT_TS


weighted RMSD (105 atoms) TM_score GDT_HA GDT_TS
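For readers unfamiliar with the scores reported here, the following Python sketch shows how TM-score, GDT-TS and GDT-HA are commonly defined in terms of per-residue C-alpha distances between model and reference. It is a simplified illustration and not the program used for this task: it assumes the two structures are already superimposed and that `distances` holds one distance in Ångström per aligned residue, whereas the real tools search for the superposition that maximizes each score. The function names are our own.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale of the TM-score."""
    if target_length < 19:
        # The formula below is not meaningful for very short chains;
        # this sketch simply refuses them.
        raise ValueError("target_length must be at least 19 in this sketch")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition, normalized by the target length."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def gdt(distances: Sequence[float], target_length: int,
        cutoffs: Sequence[float]) -> float:
    """Mean fraction of target residues lying within each distance cutoff."""
    fractions = [
        sum(1 for d in distances if d <= cutoff) / target_length
        for cutoff in cutoffs
    ]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances: Sequence[float], target_length: int) -> float:
    """GDT total score: cutoffs of 1, 2, 4 and 8 Angstrom."""
    return gdt(distances, target_length, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances: Sequence[float], target_length: int) -> float:
    """GDT high accuracy: cutoffs of 0.5, 1, 2 and 4 Angstrom."""
    return gdt(distances, target_length, (0.5, 1.0, 2.0, 4.0))
```

Because GDT-HA uses cutoffs half as large as GDT-TS, it can never exceed GDT-TS for the same superposition, which is consistent with the values of 0.5694 and 0.7421 reported for the Modeller model.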



The complete SWISS-MODEL result can be found here. As can be seen in Figure 1, the ANOLEA energy values are quite mixed. While some regions (220-240, 255-265, 290-300, 320-345) have energetically favorable conformations, others (80-90, 100-110, 135-145, 155-165, 365-375) are far less favorable. The QMEAN4 global score for the predicted model is only 0.62, which again indicates some problems with the structural conformation. Interestingly, the GROMOS energy calculation (Figure 2) shows problematic regions that are both fewer and narrower.

Figure 1: ANOLEA (atomic empirical mean force potential) plot for 1qs0_A. The y-axis represents the energy of each amino acid in the protein chain. Green regions are energetically favorable, red ones are not.
Figure 2: GROMOS energy diagram. The y-axis represents the energy of each amino acid in the protein chain. Negative energy values (green) indicate a favorable energy environment, positive values (red) an unfavorable energy environment for a given amino acid.
weighted RMSD (105 atoms) TM_score GDT_HA GDT_TS



The complete SWISS-MODEL result can be found here.

weighted RMSD (298 atoms) TM_score GDT_HA GDT_TS



As a single model prediction with I-TASSER takes about 30 to 60 hours, we only predicted an I-TASSER model for 1qs0_A.


The I-TASSER result for the template with 39% sequence identity can be found here. It achieved a TM-score of 0.1720, which corresponds to almost random structural similarity.

weighted RMSD (382 atoms) TM_score GDT_HA GDT_TS



Still running (Monday, 09:10).

weighted RMSD (382 atoms) TM_score GDT_HA GDT_TS

The superposition of the I-TASSER prediction is shown here. The predicted structure is colored red, chain A of 1DTW green, and chain B of 1DTW blue.


There seems to be no correlation between the RMSD and the other scores. While the TM-score and GDT scores differ considerably across the models, the RMSD is consistently good. Because its values are nearly constant, we consider the RMSD less helpful for rating the models.
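One possible explanation, offered here only as an assumption, is that the RMSD is computed over the well-aligned subset of atoms, while the TM-score is normalized by the full target length. The Python sketch below illustrates this effect with purely synthetic distances. The chain length of 400, the core size of 100 and the uniform 1 Å deviation are arbitrary choices and are not taken from our results.

```python
import math
from typing import Sequence


def rmsd(distances: Sequence[float]) -> float:
    """Root mean square deviation over the aligned positions only."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition, normalized by the target length."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


TARGET_LENGTH = 400  # hypothetical chain length

# Synthetic model A: every residue lies 1 Angstrom from its reference position.
full_model = [1.0] * TARGET_LENGTH
# Synthetic model B: only a core of 100 residues is aligned (also at 1 Angstrom);
# the remaining 300 residues are unaligned and contribute nothing.
core_only = [1.0] * 100

if __name__ == "__main__":
    for name, aligned in (("A (full)", full_model), ("B (core only)", core_only)):
        print(name,
              "RMSD = %.3f" % rmsd(aligned),
              "TM-score = %.3f" % tm_score(aligned, TARGET_LENGTH))
```

Both synthetic models have the same RMSD over their aligned positions, yet their TM-scores differ by roughly a factor of four. This would explain why the RMSD alone does not separate good models from poor ones.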