Task 4: Homology-based structure prediction
More detailed information about the generation of the results can be found in the Task 4 Protocol.
Datasets
To generate a model for our protein, we first need to find homologous structures on which to base the model. For this task we use HHpred and COMA.
HHpred
Unfortunately, the HHpred search did not produce any hits with a sequence identity between 40% and 80%. For testing purposes we assume that 1qs0_A, at 39% identity, fits in that category. The 2BFD hit with 99% sequence identity is a 3D structure of the actual human branched-chain alpha-ketoacid dehydrogenase that is only slightly modified. The hit is therefore not a homologous protein but the protein itself, and it is not viable to derive a model from, as we assume for this exercise that no 3D structure of our protein exists. The 1qs0_A protein is the BCAKD complex in Pseudomonas putida. While the relationship between human and bacteria is quite distant, this structure is the only one available close to the 40%-80% sequence identity range and might still produce a viable model. For hits with a sequence identity below 30%, I currently have no idea which one to choose as a template.
PDB ID | e-Value | Identity |
---|---|---|
>80% Sequence Identity | ||
2BFD_A | 1.7e-91 | 99% |
80-40% Sequence Identity | ||
1qs0_A | 1.2e-77 | 39% |
<30% Sequence Identity | ||
2ozl_A | 3.5e-70 | 26% |
2yic_A | 1.2e-57 | 14% |
2xt6_A | 5.9e-57 | 14% |
2jgd_A | 4.5e-57 | 14% |
3kom_A | 4.2e-27 | 20% |
3m49_A | 2.4e-27 | 16% |
3l84_A | 2.7e-27 | 16% |
2o1s_A | 5.1e-27 | 20% |
2e6k_a | 5.1e-27 | 18% |
1qpu_A | 3.7e-27 | 18% |
3rim_A | 3.6e-26 | 17% |
1r9j_A | 1.9e-25 | 15% |
3uk1_A | 3.8e-26 | 19% |
2r8o_A | 2e-25 | 21% |
2o1x_A | 3.9e-26 | 15% |
1itz_A | 8.2e-24 | 20% |
3mos_A | 2.5e-23 | 20% |
2qtc_A | 2.2e-22 | 14% |
3ahc_A | 2.7e-16 | 15% |
2pan_A | 4.1e-16 | 21% |
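The grouping of hits into identity classes used in the table above can be sketched in a few lines of Python. This is only an illustration: the function name `identity_bin` and the "unassigned" label are our own, and only a handful of hits from the table are included. It also makes visible that 1qs0_A (39%) falls just outside the 40%-80% class and is placed there by assumption only.

```python
def identity_bin(identity_percent: float) -> str:
    """Return the identity class used in the hit tables (hypothetical helper)."""
    if identity_percent > 80:
        return ">80%"
    if 40 <= identity_percent <= 80:
        return "40-80%"
    if identity_percent < 30:
        return "<30%"
    # The tables define no class for 30-40%, so such hits stay unassigned here.
    return "unassigned (30-40%)"


# A few HHpred hits copied from the table: (PDB ID, e-value, identity in %).
hits = [
    ("2BFD_A", 1.7e-91, 99),
    ("1qs0_A", 1.2e-77, 39),
    ("2ozl_A", 3.5e-70, 26),
    ("2yic_A", 1.2e-57, 14),
]

# Collect the PDB IDs per identity class, keeping the table order.
binned = {}
for pdb_id, e_value, identity in hits:
    binned.setdefault(identity_bin(identity), []).append(pdb_id)

for label, ids in binned.items():
    print(label, ids)
```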
COMA
COMA did not find any hits with a sequence identity between 40% and 80% either. This time, however, there were no hits close to that range. The hit above 80% is again the protein itself and therefore cannot be used for structure modeling.
PDB ID | e-Value | Identity |
---|---|---|
>80% Sequence Identity | ||
1dtw_A | 8e-59 | 100% |
80-40% Sequence Identity | ||
- | - | - |
<30% Sequence Identity | ||
2xt6_A | 8.1e-56 | 12% |
3duf_A | 2.1e-53 | 28% |
3exe_A | 3.6e-50 | 25% |
3mos_A | 1e-45 | 15% |
1l8a_A | 7.5e-40 | 9% |
3m34_A | 1.2e-38 | 14% |
2o1x_A | 2.6e-37 | 12% |
3uk1_A | 2.4e-33 | 15% |
3ahc_A | 1.1e-25 | 8% |
Model Creation and Evaluation
To evaluate our models we had to choose one PDB structure of ODBA_HUMAN as the "gold standard". We decided to take 1DTW, which had already been chosen as the representative structure in earlier tasks.
Modeller
1qs0_A
The Modeller prediction with the 1qs0_A template reached a TM-score of 0.8316. As TM-scores above 0.5 are generally taken to indicate the same fold, this is a good result. The GDT_TS score was 0.7421 and the GDT_HA score was 0.5694. By running SAP we calculated a weighted RMSD of 1.035 (over 379 atoms), which is again a very good result and indicates good model quality.
Weighted RMSD (105 atoms) | TM_score | GDT_HA | GDT_TS |
---|---|---|---|
1.035 | 0.8316 | 0.5694 | 0.7421 |
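To make the two GDT columns easier to interpret, the following minimal sketch shows how GDT_TS and GDT_HA are defined: both average the fraction of residues that lie within a set of distance cutoffs of the reference, with GDT_TS using 1, 2, 4 and 8 Å and GDT_HA using the stricter 0.5, 1, 2 and 4 Å. The sketch assumes a single, fixed superposition and a list of per-residue distances that already covers every target residue; real GDT implementations additionally search over many superpositions. The distances in `example` are invented for illustration and are not taken from our models.

```python
def gdt(distances, cutoffs):
    """Average, over all cutoffs, of the fraction of residues within the cutoff."""
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def gdt_ts(distances):
    """GDT total score: cutoffs of 1, 2, 4 and 8 Angstrom."""
    return gdt(distances, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances):
    """GDT high accuracy: cutoffs of 0.5, 1, 2 and 4 Angstrom."""
    return gdt(distances, (0.5, 1.0, 2.0, 4.0))


# Hypothetical C-alpha distances (Angstrom) between model and reference.
example = [0.3, 0.8, 1.5, 3.0, 6.0, 12.0]

print("GDT_TS:", round(gdt_ts(example), 4))
print("GDT_HA:", round(gdt_ha(example), 4))
```

Because GDT_HA uses tighter cutoffs, it can never exceed GDT_TS for the same set of distances, which matches the ordering seen in all result tables.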
2XT6_A
Weighted RMSD (105 atoms) | TM_score | GDT_HA | GDT_TS |
---|---|---|---|
SWISS-MODEL
1qs0_A
The complete SWISS-MODEL result can be found here. As can be seen in Figure 1, the ANOLEA energy values are quite mixed. While some regions (220-240, 255-265, 290-300, 320-345) have energetically favorable conformations, others (80-90, 100-110, 135-145, 155-165, 365-375) are considerably worse. The QMEAN4 global score of the predicted model is only 0.62, which again indicates problems with the structural conformation. Interestingly, the GROMOS energy calculation (Figure 2) shows fewer problematic regions, and the affected regions are narrower.
Weighted RMSD (105 atoms) | TM_score | GDT_HA | GDT_TS |
---|---|---|---|
0.723 | 0.0896 | 0.0353 | 0.0524 |
2XT6_A
The complete SWISS-MODEL result can be found here
Weighted RMSD (298 atoms) | TM_score | GDT_HA | GDT_TS |
---|---|---|---|
1.855 | 0.1559 | 0.0445 | 0.0726 |
I-TASSER
As a single model prediction with I-TASSER takes about 30 to 60 hours, we have so far only predicted an I-TASSER model for 1qs0_A.
1qs0_A
The I-TASSER result for the template with 39% sequence identity can be found here. It reached a TM-score of 0.1720, which corresponds to almost random structural similarity.
Weighted RMSD (382 atoms) | TM_score | GDT_HA | GDT_TS |
---|---|---|---|
0.979 | 0.1720 | 0.0412 | 0.0713 |
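A low TM-score next to a low RMSD is less contradictory than it may look. The sketch below implements the standard TM-score formula for a given set of aligned residue distances: each aligned pair contributes 1 / (1 + (d / d0)^2), with d0 = 1.24 * (L - 15)^(1/3) - 1.8, and the sum is divided by the target length L rather than by the number of aligned residues. It assumes the superposition is already fixed (the actual TM-score program also optimizes it), and the target length of 400 and the aligned counts are hypothetical values chosen for illustration.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (Angstrom) of the aligned residue pairs only.
    target_length: number of residues in the target (reference) structure.
    """
    if target_length <= 15:
        raise ValueError("d0 is undefined for targets of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for a meaningful d0")
    # Unaligned target residues contribute nothing but still count in the divisor.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical case: 100 perfectly superposed residues in a 400-residue target.
print(tm_score([0.0] * 100, 400))

# Same target, but all 400 residues aligned at 1 Angstrom each.
print(tm_score([1.0] * 400, 400))
```

In this picture, a model in which only a small part superposes well receives a small RMSD over that part but a low TM-score overall, because the divisor is always the full target length.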
2XT6_A
Still running (Monday, 09:10).
Weighted RMSD (382 atoms) | TM_score | GDT_HA | GDT_TS |
---|---|---|---|
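Conclusion
There seems to be no correlation between the weighted RMSD and the other scores. While the TM-score and the GDT scores differ considerably between the models, the RMSD is good throughout (0.723 to 1.855). Because it varies so little, we consider the RMSD less helpful for rating the models. A possible explanation is that the weighted RMSD is reported only over a subset of atoms (105 to 382 here), whereas the TM-score and the GDT scores are normalized over the whole target.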
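The impression of "no correlation" can be checked roughly with a Pearson correlation coefficient over the four models for which scores are available. The sketch below uses only the values from the result tables above. The helper `pearson` is written out by hand to keep the example self-contained. With just four data points the result is an indication at best, not a statistically meaningful statement.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (weighted RMSD, TM-score, GDT_TS) per model, copied from the result tables.
scores = {
    "Modeller 1qs0_A": (1.035, 0.8316, 0.7421),
    "SWISS-MODEL 1qs0_A": (0.723, 0.0896, 0.0524),
    "SWISS-MODEL 2XT6_A": (1.855, 0.1559, 0.0726),
    "I-TASSER 1qs0_A": (0.979, 0.1720, 0.0713),
}

rmsd = [s[0] for s in scores.values()]
tm = [s[1] for s in scores.values()]
gdt_ts = [s[2] for s in scores.values()]

r_rmsd_tm = pearson(rmsd, tm)
r_tm_gdt = pearson(tm, gdt_ts)

print("RMSD vs. TM-score:", round(r_rmsd_tm, 3))
print("TM-score vs. GDT_TS:", round(r_tm_gdt, 3))
```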