Difference between revisions of "Homology based structure predictions BCKDHA"
Revision as of 15:40, 9 June 2011
1. Calculation of models
To find similar structures to BCKDHA we ran HHsearch:
hhsearch -i query -d database -o output
It found the following 10 hits in the pdb70 database.
No | Hit | Prob | E-value | P-value | Score | SS | Cols | Query HMM | Template HMM | Identity |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2bfd_A 2-oxoisovalerate dehydr | 1.0 | 1 | 1 | 791.3 | 0.0 | 400 | 1-400 | 1-400 (400) | 99% |
2 | 1qs0_A 2-oxoisovalerate dehydr | 1.0 | 1 | 1 | 571.5 | 0.0 | 349 | 32-382 | 52-407 (407) | 39% |
3 | 1w85_A Pyruvate dehydrogenase | 1.0 | 1 | 1 | 530.8 | 0.0 | 356 | 8-382 | 6-362 (368) | 34% |
4 | 1umd_A E1-alpha, 2-OXO acid de | 1.0 | 1 | 1 | 521.8 | 0.0 | 351 | 34-386 | 16-367 (367) | 37% |
5 | 2ozl_A PDHE1-A type I, pyruvat | 1.0 | 1 | 1 | 482.7 | 0.0 | 331 | 46-380 | 25-356 (365) | 27% |
6 | 3l84_A Transketolase; TKT, str | 1.0 | 1 | 1 | 85.4 | 0.0 | 133 | 161-297 | 113-252 (632) | 21% |
7 | 2r8o_A Transketolase 1, TK 1; | 1.0 | 1 | 1 | 74.5 | 0.0 | 121 | 161-285 | 113-245 (669) | 33% |
8 | 2o1x_A 1-deoxy-D-xylulose-5-ph | 1.0 | 1 | 1 | 74.2 | 0.0 | 127 | 161-287 | 122-254 (629) | 18% |
9 | 1gpu_A Transketolase; transfer | 1.0 | 1 | 1 | 74.2 | 0.0 | 140 | 161-302 | 115-265 (680) | 22% |
10 | 3m49_A Transketolase; alpha-be | 1.0 | 1 | 1 | 68.8 | 0.0 | 121 | 161-285 | 139-271 (690) | 31% |
> 60% sequence identity:
2bfd_A
> 40% sequence identity:
(no hits)
< 40% sequence identity (ideally go towards 20%) :
1qs0_A, 1umd_A, 1w85_A, 2ozl_A, 2r8o_A, 3m49_A, 1gpu_A, 3l84_A, 2o1x_A
HHsearch returned only hits with a sequence identity above 60% or below 40%, so the class between 40% and 60% stays empty.
These are the templates we will work with:
> 60% sequence identity:
2bfd_A
< 40% sequence identity (ideally go towards 20%) :
2r8o_A
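The grouping of hits into identity classes can be sketched in a few lines of Python. This is only an illustration: the function name `bin_by_identity` and the dictionary layout are assumptions made for this sketch, and the identity values are copied from the HHsearch table above.

```python
# Sequence identities (percent) copied from the HHsearch hit table above.
HITS = {
    "2bfd_A": 99, "1qs0_A": 39, "1w85_A": 34, "1umd_A": 37, "2ozl_A": 27,
    "3l84_A": 21, "2r8o_A": 33, "2o1x_A": 18, "1gpu_A": 22, "3m49_A": 31,
}


def bin_by_identity(hits):
    """Group hit IDs into the three identity classes used in this section.

    Hypothetical helper: a hit with exactly 40% identity is put into the
    middle class, which is an assumption of this sketch.
    """
    bins = {"> 60%": [], "> 40%": [], "< 40%": []}
    for hit_id, identity in hits.items():
        if identity > 60:
            bins["> 60%"].append(hit_id)
        elif identity >= 40:
            bins["> 40%"].append(hit_id)
        else:
            bins["< 40%"].append(hit_id)
    return bins


bins = bin_by_identity(HITS)
for label, ids in bins.items():
    print(label, ", ".join(ids) if ids else "(no hits)")
```

From each non-empty class one template can then be picked by hand, as was done here with 2bfd_A and 2r8o_A.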
Modeller
MODELLER is used for homology or comparative modeling of protein three-dimensional structures. It calculates a model containing all non-hydrogen atoms. MODELLER also provides many other tasks, such as de novo modeling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, and comparison of protein structures.[1]
Tutorials are provided at [2] and [3].
To run MODELLER with more than one template, we use the following templates (listed with their sequence identity to the target):
- 1dtw:A 95%
- 2bfe:A 94%
- 2bfb:A 99%
- 2bfd:A 99%
- 1gpu:A 22%
- 2o1x:A 18%
- 2r8o:A 33%
SWISS-MODEL
SWISS-MODEL is a fully automated protein structure homology-modeling server. It is accessible via the ExPASy web server or from the program DeepView (Swiss Pdb-Viewer). As input it needs a protein sequence or a UniProt accession code. Optionally, a template PDB ID with its chain, or a template file, can be supplied.
SWISS-MODEL server:
ID | link |
---|---|
2bfd_A | 2bfd_A |
2r8o_A | 2r8o_A |
Prediction for 2bfd_A
TARGET 51 KPQFPGAS AEFIDKLEFI QPNVISGIPI YRVMDRQGQI INPSEDPHLP
2bfdA 6 kpqfpgas aefidklefi qpnvisgipi yrvmdrqgqi inpsedphlp
TARGET sss ss s
2bfdA sss ss s

TARGET 99 KEKVLKLYKS MTLLNTMDRI LYESQRQGRI SFYMTNYGEE GTHVGSAAAL
2bfdA 54 kekvlklyks mtllntmdri lyesqrqgri sfymtnygee gthvgsaaal
TARGET hhhhhhhhhh hhhhhhhhhh hhhhhhh h hhhhhhhh
2bfdA hhhhhhhhhh hhhhhhhhhh hhhhhhh h hhhhhhhh

TARGET 149 DNTDLVFGQY REAGVLMYRD YPLELFMAQC YGNISDLGKG RQMPVHYGCK
2bfdA 104 dntdlvfgqa reagvlmyrd yplelfmaqc ygnisdlgkg rqmpvhygck
TARGET sss hhhhh hhhhhhhh h
2bfdA sss hhhhhh hhhhhhhh h

TARGET 199 ERHFVTISSP LATQIPQAVG AAYAAKRANA NRVVICYFGE GAASEGDAHA
2bfdA 154 erhfvtissp latqipqavg aayaakrana nrvvicyfge gaasegdaha
TARGET hhhhhhh hhhhhhhh ssssssss hhh hhhh
2bfdA hhhhhhh hhhhhhhh ssssssss hhh hhhh

TARGET 249 GFNFAATLEC PIIFFCRNNG YAISTPTSEQ YRGDGIAARG PGYGIMSIRV
2bfdA 204 gfnfaatlec piiffcrnng yaistptseq yrgdgiaarg pgygimsirv
TARGET hhhhhhhh ssssssss hhhh hhh sssss
2bfdA hhhhhhhh ssssssss hhhh hhh sssss

TARGET 299 DGNDVFAVYN ATKEARRRAV AENQPFLIEA MTYRIGHHST SDDSSAYRSV
2bfdA 254 dgndvfavyn atkearrrav aenqpfliea mtyrig---- ----------
TARGET ss hhhhhh hhhhhhhhhh hh sssss ss
2bfdA ss hhhhhh hhhhhhhhhh hh sssss ss

TARGET 349 DEVNYWDKQD HPISRLRHYL LSQGWWDEEQ EKAWRKQSRR KVMEAFEQAE
2bfdA 292 -------std hpisrlrhyl lsqgwwdeeq ekawrkqsrr kvmeafeqae
TARGET hhhhhhhh h hhh hhhhhhhhhh hhhhhhhhhh
2bfdA hhhhhhhh h hhh hhhhhhhhhh hhhhhhhhhh

TARGET 399 RKPKPNPNLL FSDVYQEMPA QLRKQQESLA RHLQTYGEHY PLDHFDK
2bfdA 354 rkpkpnpnll fsdvyqempa qlrkqqesla rhlqtygehy pldhfdk-
TARGET h h hhhhhhhhhh hhhhh
2bfdA h h hhhhhhhhhh hhhhh
Prediction for 2r8o_A
TARGET 152 DL -VFG-QYREA ---GVLMYRD --YPLELFMA QCYGNISDLG
2r8oA 52 pswadr--dr fvlsnghgsm liysllhltg ydlpmeelkn -f-rql----
TARGET s ssssss sssssssss
2r8oA s ssss h hhhhhhhh hhhh

TARGET 187 KGRQMPVHYG CK-ERHFVTI SSPLATQIPQ AVGAAYAAKR AN--------
2r8oA 94 -hsktpghpe vgytagvett tgplgqgian avgmaiaekt laaqfnrpgh
TARGET s sssss hhhhh h hhhh hhhhhhhhhh h
2r8oA hhhh hhhhhhhhhh hhhhh

TARGET 228 --ANRVVICY FGEGAASEGD AHAGFNFAAT LEC-PIIFFC RNNGYAISTP
2r8oA 143 divdhytyaf mgdgcmmegi shevcslagt lklgkliafy ddngisidgh
TARGET ssss s hhhh h hhhhhhhhhh h sssss ss sss ss
2r8oA sssss s hhhh h hhhhhhhhhh h ssssss ss sss ss

TARGET 275 TSEQYRGDGI AARGPGYGIM SIR-VDGNDV FAVYNATKEA RRRAVAENQP
2r8oA 193 vegwft-ddt amrfeaygwh virdidghda asikraveea ra---vtdkp
TARGET s h hhhhhhh s sss sss h hhhhhhhhhh h s
2r8oA s h hhhhhh s ss sss h hhhhhhhhhh hh s

TARGET 324 FLIEAMTYRI GHHSTSDDSS ----AYRSVD EVNYWDKQ - ----------
2r8oA 239 sllmcktiig fgspnkagth dshgaplgda eialtreqlg wkyapfeips
TARGET sssssss hh hhhhhhhh
2r8oA sssssss hh hh hhhhhhhhh h
iTasser
2bfd_A |
2bfd_A |
Prediction for 2bfd
Seq  SSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKLYKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSA
Pred ccccccccccccccccccccccccccccccccSSSSSccccccccccccccccHHHHHHHHHHHHHHHHHHHHHHHHHHcccccccccccccHHHHHHHH
Conf 9867789999988665555664786666789768888999988884236898999999999999999999999999996798467658877389999999

Seq  AALDNTDLVFGQYREAGVLMYRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKRANANRVVICYFGEGAASEGD
Pred HHcccccSSScccHHHHHHHHccccHHHHHHHHHccccccccccccccccccccccccccccHHHccHcHHHHHHHHHHHcccccSSSSSSccccccccc
Conf 9769989775570357899837998999999983777788989998673426212872246336336308999999999709998899994577444210

Seq  AHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIAARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAY
Pred HHHHHHHHHHHcccSSSSSScccSSccccHHHHHccccHHHHcHcccccSSSSccccHHHHHHHHHHHHHHHHcccccSSSSSSSSSccccccccccccc
Conf 9999999999679979999559821467788772698789843106988689769479999999999999998189988999998750686788998667

Seq  RSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNPNLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred ccHHHHHHHHHcccHHHHHHHHHHHcccccHHHHHHHHHHHHHHHHHHHHHHHHcccccHHHHHHHHHccccHHHHHHHHHHHHHHHHccccccHHHHcc
Conf 8999999998639869999999998799999999999999999999999999858998999999675318998899999999999996733188555249
Secondary structure elements are shown as H for alpha helix, S for beta strand and c for coil.
Additionally, iTasser predicts several different models and presents the top five. To predict these models it uses many templates. iTasser searches for the templates itself and also evaluates which one is the best.
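To get a quick overview of such a prediction string, the three states can be counted and the consecutive runs listed. The following is a minimal sketch, not part of iTasser itself: the helper names `ss_composition` and `ss_segments` are made up for this example, and the string is the first block of the Pred line shown above.

```python
from collections import Counter
from itertools import groupby

# First block of the "Pred" line from the iTasser output above.
PRED = (
    "ccccccccccccccccccccccccccccccccSSSSSccccccccccccccccHHHHHHHHHHHHHHHH"
    "HHHHHHHHHHcccccccccccccHHHHHHHH"
)


def ss_composition(pred):
    """Return the fraction of helix (H), strand (S) and coil (c) residues."""
    counts = Counter(pred)
    total = len(pred)
    return {state: counts.get(state, 0) / total for state in "HSc"}


def ss_segments(pred):
    """Return (state, start, length) for each run; start is 1-based
    within the given string, not within the full protein sequence."""
    segments = []
    position = 1
    for state, run in groupby(pred):
        length = len(list(run))
        segments.append((state, position, length))
        position += length
    return segments


composition = ss_composition(PRED)
segments = ss_segments(PRED)
for state, fraction in composition.items():
    print(f"{state}: {fraction:.2f}")
for state, start, length in segments:
    print(state, start, length)
```

Applied to the full Pred line, the same functions give the overall helix, strand and coil content of the predicted structure.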
2. Evaluation of models
A detailed description of how the created models were evaluated can be found in the Evaluation Protocol. The following section presents only the modelling and evaluation results.
Modeller
Numeric evaluation
template | molpdf | DOPE score | GA341 score |
---|---|---|---|
2R8O | 11049.43 | -7610.51 | 0.00000 |
2BFD | 2247.36 | -41979.05 | 1.00000 |
1DTW, 1GPU, 2BFB, 2BFD, 2BFE, 2O1X, 2R8O | 13873.63 | -43399.59 | 1.00000 |
The DOPE (Discrete Optimized Protein Energy) score is calculated to assess homology models. The lower the DOPE score, the better the model. This can also be seen in our three models. The first one (2r8o), which has the lowest sequence identity, has a much higher DOPE score (-7610.51) than the other two. The model built with 2bfd as template has a very low score (-41979.05), which is reasonable since 2bfd has a very high sequence identity. It is interesting that the model built with 7 templates has only a slightly lower DOPE score (-43399.59) than the one built with 2bfd alone, while its molpdf value is much higher (13873.63 versus 2247.36). The high molpdf can be explained by the influence of the templates which have a low sequence identity with 1u5b.
GA341 is calculated to decide whether the result is a good model or not. A good model has a score near one, whereas a score lower than 0.6 indicates a bad model. This is also reflected by our results. The model with 2r8o as template has a low sequence identity and a high DOPE score, and its GA341 score of 0 confirms that it is a really bad model. The other two models have a GA341 score of one, which shows that they are good models.
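The two criteria can be applied to the table programmatically. The sketch below only re-uses the numbers from the table above; the names `rank_by_dope` and `passes_ga341` are hypothetical helpers for this illustration and are not MODELLER functions. The cutoff of 0.6 is the one stated in the text.

```python
# Scores copied from the MODELLER table above: (molpdf, DOPE, GA341).
MODELS = {
    "2R8O": (11049.43, -7610.51, 0.0),
    "2BFD": (2247.36, -41979.05, 1.0),
    "7 templates": (13873.63, -43399.59, 1.0),
}

GA341_CUTOFF = 0.6  # threshold used in the text above


def rank_by_dope(models):
    """Return model names sorted from lowest (best) to highest DOPE score."""
    return sorted(models, key=lambda name: models[name][1])


def passes_ga341(models, cutoff=GA341_CUTOFF):
    """Map each model name to True if its GA341 score reaches the cutoff."""
    return {name: scores[2] >= cutoff for name, scores in models.items()}


ranking = rank_by_dope(MODELS)
verdict = passes_ga341(MODELS)
for name in ranking:
    molpdf, dope, ga341 = MODELS[name]
    print(f"{name}: DOPE={dope:.2f} GA341={ga341:.1f} ok={verdict[name]}")
```

Note that DOPE values are only comparable between models of the same target sequence, which is the case here.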
Comparison to experimental structure
experimental structure | model with template | C-alpha RMSD |
---|---|---|
1U5B_A | 2R8O_A | no value [4] |
1U5B_A | 2BFD_A | 1.1 [5] |
1U5B_A | 1DTW_A, 2BFE_A, 2BFB_A, 2BFD_A, 1GPU_A, 2O1X_A, 2R8O_A | 1.4 [6] |
C-alpha RMSD is a measure of the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. The first model, with 2r8o as template, has no C-alpha RMSD because the program we used could not find enough significant similarities: the structures are too dissimilar. The model built with 2bfd has a C-alpha RMSD of 1.1, which is a very good score. It is interesting that the model built from the 7 templates does not achieve a better score (1.4). By this measure, the model with 2bfd is the best one.
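More precisely, the RMSD is the square root of the mean squared distance between paired atoms. The following minimal sketch shows only this formula. It assumes that the two structures are already superposed and that the C-alpha atoms are already paired by an alignment; the function name `ca_rmsd` and the toy coordinates are made up for the example and are not taken from the models above.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) C-alpha coordinates.

    Assumes the structures are already superposed and the atoms are
    paired by index; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example (invented coordinates): every atom is shifted by 1 along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
rmsd = ca_rmsd(model, reference)
print(f"{rmsd:.2f}")
```

Structure comparison programs additionally search for the superposition and the residue pairing that minimise this value, which is why no RMSD is reported when too few residues can be paired.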
Swissmodel
Numeric evaluation
QMEAN4 global scores
QMEANscore4
2bfd_A | 2r8o_A |
---|---|
0.67 | 0.203 |
QMEANscore4 is calculated to compare whole models. The score ranges between 0 and 1; the higher the value, the better the quality of the model. Comparing the model built with 2bfd as template (0.67) with the one built with 2r8o as template (0.203), it is obvious that the first one is better, since it has a much higher QMEANscore4.
QMEAN Z-Score
2bfd_A | 2r8o_A |
---|---|
-1.604 | -9.522 |
The QMEAN Z-score represents the absolute quality of a model. Models of low quality have strongly negative QMEAN Z-scores. The 2bfd-model has a less negative score (-1.604) than the 2r8o-model (-9.522), which shows again that it has the better quality.
Score components
2bfd_A | 2r8o_A |
---|---|
Local scores
2bfd_A | 2r8o_A |
---|---|
The coloring by residue error estimates the inaccuracy of each residue. The results are visualised using a color gradient in which blue marks a reliable region and red marks an unreliable region.
In the model of 2bfd there are many blue alpha helices, which means that they are modelled reliably, and only a few red coils. Since blue is the dominant color, the model is mainly reliable. In contrast, the other model has many red and orange alpha helices and coils and nearly no blue regions. This reflects the bad quality of this model.
The residue error plot shows the predicted error (y-axis) per residue (x-axis). The highest error score of the 2bfd-model is 12 and the average is about 3 whereas the highest peak score of the 2r8o-model is 15 and the average is about 5. Again it can be seen that the 2bfd-model is the better one.
Global scores: QMEAN4:
Scoring function term | 2bfd_A raw score | 2bfd_A Z-score | 2r8o_A raw score | 2r8o_A Z-score |
---|---|---|---|---|
C_beta interaction energy | -162.66 | 0.54 | 74.97 | -4.18 |
All-atom pairwise energy | -10811.93 | 0.35 | 2113.21 | -5.03 |
Solvation energy | -27.04 | -1.02 | 26.87 | -5.92 |
Torsion angle energy | -75.78 | -1.45 | 36.84 | -6.47 |
QMEAN4 score | 0.670 | -1.60 | 0.203 | -9.52 |
Local Model Quality Estimation
2bfd_A | 2r8o_A |
---|---|
For the local model quality estimation we chose the ANOLEA potential. This program performs energy calculations on a protein chain. The y-axis shows the energy of each amino acid. Negative energy values (in green) represent a favourable energy environment, whereas positive values (in red) represent an unfavourable energy environment for a given amino acid. The comparison between the 2bfd-model and the 2r8o-model is quite clear, since nearly the whole left plot is green and nearly the whole right plot is red. These two plots show that the 2bfd-model is much better than the other one.
Comparison to experimental structure
experimental structure | model with template | C-alpha RMSD | TMscore |
---|---|---|---|
1U5B_A | 2BFD_A | 1.1 [7] | |
1U5B_A | 2R8O_A | 3.1 [8] | 0.1639 |
C-alpha RMSD is a measure of the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. The 2bfd-model has a score of 1.1 and the 2r8o-model has a score of 3.1. This comparison shows clearly that the first model is much better than the second one.
iTasser
Numeric evaluation
C-score
2bffA | 2bfdA | 1dtwA | 1dtwA | 2bffA |
---|---|---|---|---|
1.999 | -3.781 | -4.970 | -4.970 | -3.781 |
The C-score is the confidence measure that I-TASSER uses for the quality of its predicted models. It ranges from -5 to 2, where a higher C-score signifies a model with higher confidence.
References
<references />