Difference between revisions of "Homology based structure predictions BCKDHA"

(Comparison to experimental structure)
(Comparison to experimental structure)
(268 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== 1.Calculation of models ==
== 1.Calculation and evaluation of models ==
=== Template selection ===
=== Template selection ===
Line 9: Line 10:
It found the following 10 hits in the pdb70 database.
It found the following 10 hits in the pdb70 database.
{| border="1"
{| border="1" align="center"
Line 42: Line 43:
| 10 || 3m49_A Transketolase; alpha-be || 1.0 || 1 || 1 || 68.8 || 0.0 || 121 || 161-285 || 139-271 (690)||31%
| 10 || 3m49_A Transketolase; alpha-be || 1.0 || 1 || 1 || 68.8 || 0.0 || 121 || 161-285 || 139-271 (690)||31%
Before we can start working with these hits we have to check whether one of them is a PDB structure for BCKDHA. This is the case for 2bfd_A.<br>
* > 60% sequence identity: 2bfd_A<br>
Since this hit cannot be used, only structures with a sequence identity lower than 40% remain.
* > 40% sequence identity: <br>
Since structures are only available in this range, we chose two of them: one with 39% identity and one with 18% identity, so that the templates still cover a range of identities.<br>
* < 40% sequence identity (ideally go towards 20%) : 1qs0_A, 1umd_A, 1w85_A, 2r8o_A, 3m49_A, 2ozl_A, 1gpu_A, 3l84_A, 2o1x_A, 1w85_A
In the following we worked with '''1qs0_A''' (39%) and with ''' 2o1x_A''' (18%).
=== General information for the evaluation ===
A detailed description of how the created models were evaluated can be found in the [[Homology_based_Structure_prediction_protocol_BCKDHA#Evaluation | Evaluation Protocol]]. The following section presents only the modelling and evaluation results.
Three useful scores for comparing the structural similarity of two structures are the '''C&alpha; RMSD''', the '''all-atom RMSD''' and the '''TM Score'''. These three measures are commonly used to assess the accuracy of a model when the native structure is known. In the following we refer to the C&alpha; RMSD simply as RMSD.
HHSearch has only hits with an identity higher than 60% or lower than 40%.
The <b>RMSD</b> (root-mean-square deviation) measures the average distance between corresponding atoms of two superimposed structures. The C-alpha RMSD is this distance computed over the aligned alpha-carbons.
The smaller the RMSD value, the better the predicted structure. A local error (e.g. misorientation of the tail) will result in a high RMSD value, although the global structure is correct.
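The definition above can be illustrated with a minimal Python sketch. It assumes that the two structures are already superimposed and that the C-alpha atoms are already paired by the alignment; the function name <code>ca_rmsd</code> and the toy coordinates are hypothetical and are not taken from DaliLite or sap.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) tuples.

    Assumption: both lists hold the aligned C-alpha atoms in the same order
    and the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: ten atoms on a line, identical in both structures ...
native = [(float(i), 0.0, 0.0) for i in range(10)]
model = list(native)
# ... except for one misplaced "tail" atom, displaced by 10 units.
model[-1] = (9.0, 10.0, 0.0)
print(ca_rmsd(native, model))
```

In this toy example a single displaced atom dominates the result, which is the sensitivity to local errors described above.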
These are the templates we will work with:
* > 60% sequence identity: '''2bfd_A'''
The <b>all-atom RMSD</b> is calculated over the residues that lie within a radius of 6 around the active sites. As with the C&alpha; RMSD, models with a lower value are the better ones.
* < 40% sequence identity (ideally go towards 20%): '''2r8o_A'''
As the RMSD is sensitive to local errors, the TM Score was proposed.
The <b>TM Score</b> weights close matches more strongly than distant matches and thereby overcomes the local error problem. A TM Score < 0.5 indicates that the two structures are unlikely to share a fold (values below about 0.17 correspond to random structural similarity), whereas 0.5 < TM Score < 1.00 means that the two compared structures have about the same fold and the predicted model therefore has a correct topology.
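The weighting described above can be sketched in Python using the standard TM-score formula with the length-dependent scale d0. This is a simplified sketch for one fixed superposition: the real TM-score program additionally searches for the superposition that maximises the score. The function name <code>tm_score</code> and the lower bound of 0.5 on d0 are assumptions of this sketch.

```python
def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: distance of each aligned residue pair (same unit as d0).
    target_length: number of residues of the target (native) structure.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula needs a target longer than 15 residues")
    # Length-dependent distance scale of the TM-score.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # assumption of this sketch: guard for very short chains
    # Close pairs contribute almost 1, distant pairs contribute almost 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy example: 100-residue target, 90 well-modelled residues, 10 far off.
print(tm_score([1.0] * 90 + [30.0] * 10, 100))
```

Because each residue pair contributes at most 1, a few badly placed residues lower the score only slightly, in contrast to the RMSD.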
=== Modeller ===
=== Modeller ===
MODELLER is used for homology or comparative modelling of protein three-dimensional structures. It calculates a model containing all non-hydrogen atoms. There are also many other tasks provided by MODELLER like de novo modelling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, comparison of protein structures, etc.[http://salilab.org/modeller/]
MODELLER is used for homology or comparative modelling of protein three-dimensional structures. It calculates a model containing all non-hydrogen atoms. There are also many other features provided by MODELLER like de novo modelling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, comparison of protein structures, and so on.[http://salilab.org/modeller/]
A tutorial is provided on [http://salilab.org/modeller/tutorial/] and on [http://salilab.org/modeller/9.9/manual/node15.html]
A tutorial is provided on [http://salilab.org/modeller/tutorial/] and on [http://salilab.org/modeller/9.9/manual/node15.html]
To run modeller with more than one template we use the following templates (the percentage values indicate the sequence similarity to the target):<br>
To run modeller with more than one template we use the following templates (the percentage values indicate the sequence similarity to the target):<br>
*1dtw:A 95%
*3m49:A (31%)
*2bfe:A 94%
*2r8o:A (33%)
*2bfb:A 99%
*2o1x:A (18%)
*2bfd:A 99%
*1w85:A (34%)
*1gpu:A 22%
*1qs0:A (39%)
*2o1x:A 18%
*2r8o:A 33%
[[Modeller_protocol_BCKDHA#Modeller | Protocol Modeller]]
[[Homology_based_Structure_prediction_protocol_BCKDHA#Modeller | Protocol Modeller]]
==== Results ====
==== Numeric evaluation ====
[[File:WorkplaceSWISSMODEL.jpg | thumb |right | SWISS-MODEL server page]]<br>
SWISS-MODEL can be used to build protein structure homology models. As input it needs a protein sequence or a UniProt AC code. Optionally, the template PDB ID and chain, or a template file, can be specified.
SWISS-MODEL is a fully automated protein structure homology-modeling server. It is accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).
[http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1 SWISS-MODEL]
{|border="1" align="center"
[[Modeller_protocol_BCKDHA#Swissmodel | Protocol Swissmodel]]
!width="80" |molpdf
!width="80" |DOPE score
!width="80" |GA341 score
| 1QS0_A || 2650.7|| -40503.1|| 1.000
| 2O1X_A || 2958.0|| -30294.5 || 0.419
|2bfd_A || [http://swissmodel.expasy.org/workspace/index.php?userid=e.reisinger@gmx.net&key=8dffddca5aa40d1854b5adf8fa6d14c7&func=workspace_modelling&prjid=P000006 2bfd_A]
| 3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A || 123913.8 || -19573.7|| 0.001
|2r8o_A || [http://swissmodel.expasy.org/workspace/index.php?userid=carina.demel@googlemail.com&key=3ab69fac4253e37144f9fe6f98ca7c92&func=workspace_modelling&prjid=P000005 2r8o_A]
The DOPE ('''D'''iscrete '''O'''ptimized '''P'''rotein '''E'''nergy) score is calculated to assess homology models. The lower the DOPE score, the better the model. This can also be seen in our three models. The 1qs0 model, whose template has the highest sequence identity, clearly has the lowest DOPE score, which is expected given the higher identity. The model built with 2o1x as template has a higher score, which is reasonable since 2o1x has a sequence identity of 18% whereas 1qs0 has a sequence identity of 39%. This shows that, although both templates belong to the group with low sequence identity, there are noticeable differences between them.
The last model was built from five different structures. Normally this helps the program, because a model can be predicted more precisely from several template structures than from a single one. In this case, however, none of the combined structures has a high sequence identity, so the information available to Modeller was not helpful. This is reflected in the scores: this model has the highest DOPE score.
So, as expected, the model with 1qs0 as template structure is the best model.
GA341 is calculated to decide whether the result is a good model or not. A good model has a score near one; a model with a score lower than 0.6 is a bad model. This is also reflected by our results. The model with 1qs0 as template is rated as a very good model since its GA341 score is 1.0. This is somewhat surprising, since the sequence identity to our protein is not very high, even though the DOPE score was good, too: we would not expect a model whose template has only 39% sequence identity to reach the best possible GA341 score. The other two models have a score lower than 0.6, which indicates that both of them are bad. It is interesting that the model with the five templates only has a score of 0.001, which seems too low because the average sequence identity of the used structures is higher than that of 2o1x, whose model has a GA341 score of 0.419. All in all we can conclude that the 1qs0 model is the most accurate one.
'''Prediction for 2bfd_A'''
In summary, both scores indicate that, although all templates have a low sequence identity, the model with 1qs0 as template is the best one.
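The comparison of the two scores can be retraced with a short Python sketch. The numbers are copied from the MODELLER table above; the function name <code>rank_models</code> is hypothetical, and the cutoff of 0.6 is the GA341 threshold used in the text.

```python
# (templates, molpdf, DOPE score, GA341 score), copied from the table above.
models = [
    ("1QS0_A", 2650.7, -40503.1, 1.000),
    ("2O1X_A", 2958.0, -30294.5, 0.419),
    ("3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A", 123913.8, -19573.7, 0.001),
]

GA341_CUTOFF = 0.6  # models below this value are treated as bad in the text


def rank_models(rows, cutoff=GA341_CUTOFF):
    """Sort models by DOPE score (lowest first) and flag the GA341 verdict."""
    ranked = sorted(rows, key=lambda row: row[2])
    return [(name, dope, ga341, ga341 >= cutoff)
            for name, _molpdf, dope, ga341 in ranked]


for name, dope, ga341, passes in rank_models(models):
    verdict = "good" if passes else "bad"
    print(f"{name}: DOPE {dope}, GA341 {ga341} ({verdict})")
```

Sorting by DOPE and thresholding GA341 are kept separate here because the two scores answer different questions: DOPE ranks the models against each other, whereas GA341 gives an absolute verdict per model.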
2bfdA 6 kpqfpgas aefidklefi qpnvisgipi yrvmdrqgqi inpsedphlp
TARGET sss ss s
2bfdA sss ss s
2bfdA 54 kekvlklyks mtllntmdri lyesqrqgri sfymtnygee gthvgsaaal
TARGET hhhhhhhhhh hhhhhhhhhh hhhhhhh h hhhhhhhh
2bfdA hhhhhhhhhh hhhhhhhhhh hhhhhhh h hhhhhhhh
2bfdA 104 dntdlvfgqa reagvlmyrd yplelfmaqc ygnisdlgkg rqmpvhygck
TARGET sss hhhhh hhhhhhhh h
2bfdA sss hhhhhh hhhhhhhh h
2bfdA 154 erhfvtissp latqipqavg aayaakrana nrvvicyfge gaasegdaha
TARGET hhhhhhh hhhhhhhh ssssssss hhh hhhh
2bfdA hhhhhhh hhhhhhhh ssssssss hhh hhhh
2bfdA 204 gfnfaatlec piiffcrnng yaistptseq yrgdgiaarg pgygimsirv
TARGET hhhhhhhh ssssssss hhhh hhh sssss
2bfdA hhhhhhhh ssssssss hhhh hhh sssss
2bfdA 254 dgndvfavyn atkearrrav aenqpfliea mtyrig---- ----------
TARGET ss hhhhhh hhhhhhhhhh hh sssss ss
2bfdA ss hhhhhh hhhhhhhhhh hh sssss ss
2bfdA 292 -------std hpisrlrhyl lsqgwwdeeq ekawrkqsrr kvmeafeqae
TARGET hhhhhhhh h hhh hhhhhhhhhh hhhhhhhhhh
2bfdA hhhhhhhh h hhh hhhhhhhhhh hhhhhhhhhh
2bfdA 354 rkpkpnpnll fsdvyqempa qlrkqqesla rhlqtygehy pldhfdk-
TARGET h h hhhhhhhhhh hhhhh
2bfdA h h hhhhhhhhhh hhhhh
==== Comparison to experimental structure ====
'''Prediction for 2r8o'''
{|border="1" align="center"
!experimental structure
2r8oA 2 ssrkelanai ralsmd--av qkaksghpga pmgmadiaev lwrdflkhnp
!model with template
!width="100" |RMSD (DaliLite)
TARGET h hhh hh hh hhhhh hhhh
!width="100" |RMSD (sap)
2r8oA hhhhhhhh hhhhhh hh hhh hh hh hhhhh hhhh
!width="100" |TM-score
|1U5B_A || 1QS0_A || 2.3 ||0.829||0.8504
2r8oA 50 qnpswadrdr fvlsnghgsm liysllhltg ydlpmeelkn frqlhsktpg
|1U5B_A || 2O1X_A || 3.5||2.727||0.1592
TARGET s ssss h hhhhhhhhh
2r8oA s ssss h hhhhhhhh hhhh
|1U5B_A ||3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A || no score ||11.398||0.1719
2r8oA 100 hpevgytagv etttgplgqg ianavgmaia ektlaaqfnr pghdivdhyt
TARGET h hhhhhhhhhh hhhh ss
2r8oA h hhhhhhhhhh hhhhhhhh ss
2r8oA 150 yafmgdgcmm egishevcsl agtlklgkli afyddngisi dghvegwftd
TARGET ssss hhhhh sss sssss
2r8oA ssss hhhh hhhhhhhh hhhh sss sssss sss sss
2r8oA 200 dtamrfeayg whvirdidgh daasikrave earavtdkps llmcktiigf
TARGET hhhhhhhhh hhhh s ssssss
2r8oA hhhhhhh sss sss hhhhhhhhh hhhh ss ssssss
2r8oA 250 gspnkagthd shgaplgdae ialtreqlgw kyapfeipse iyaqwdakea
TARGET h hhhh hhh
2r8oA h h hhh hhhhhhhh hh hhhh hhh
TARGET 223 GYAISTPTSE QYRGDG---- ---------- ---------- ----------
2r8oA 300 gqakesawne kfaayakayp qeaaeftrrm kgempsdfda kakefiaklq
TARGET hhhhhhhhhh
2r8oA hhhhhhhhhh hhhhhhh hhhhhhhhhh h hhhh hhhhhhhhhh
2r8oA 350 anpakiasrk asqnaieafg pllpeflggs adlapsnltl wsgskained
2r8oA h hhh hhhhhhhhhh hh ssssss s
TARGET 277 QPFLIEAM-- --------TY RIGHHS---- ---------- ----------
2r8oA 400 aagnyihygv refgmtaian gislhggflp ytstflmfve yarnavrmaa
TARGET h hhhh
2r8oA sss hhhhhhhhh hhhh ss ssss h hhhhhhh
2r8oA 450 lmkqrqvmvy thdsiglged gpthqpveqv aslrvtpnms twrpcdqves
2r8oA hh sssss ss hh hhhh s ss hhhh
2r8oA 500 avawkygver qdgptalils rqnlaqqert eeqlaniarg gyvlkdcagq
TARGET hhhhhhhhh
2r8oA hhhhhhhhh ssssss hhhh h s ssss
2r8oA 550 pelifiatgs evelavaaye kltaegvkar vvsmpstdaf dkqdaayres
TARGET sssss hhhhhhhhhh hhhhh sss ss hhhh h
2r8oA sssss hhhhhhhhhh hhhhh sss sssss hhhh h hhhhhh
TARGET ---------- ---------- ---------- ---------- ----------
2r8oA 600 vlpkavtarv aveagiadyw ykyvglngai vgmttfgesa paellfeefg
2r8oA h sss sss h hhh ss s hhhhhhhh
TARGET ---------- ----
2r8oA 650 ftvdnvvaka kell
2r8oA hhhhhhhh hh
C-alpha RMSD is a measure of the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. The first model, using 1qs0 as template, has an RMSD of 2.3 (DaliLite) or 0.829 (sap). In both cases the value is lower than those of the other two models. Since a low value indicates a good model, this model is the best of the three according to the RMSD.
The RMSD for the 2o1x model is only somewhat higher, so this model does not seem to be much worse than the first one. For the third model DaliLite was not able to calculate an RMSD at all, because the structures are too dissimilar for it to find enough significant similarities. This dissimilarity is reflected by the RMSD calculated with the sap command, which is very high compared with the sap values of the other two models. A possible explanation for this bad result is that too much misleading information was used while building the model.
Comparing the TM scores of the three models, we can see that only one model has a value higher than 0.5, which means that only one model is good. The 1qs0 model has a TM score of 0.8504, so it is considered a good model, whereas the other two models have TM scores of about 0.16 and 0.17, far lower than 0.5, which indicates that both models are useless.
'''all-atom RMSD'''
'''Prediction for 2r8o_A using the improved alignment'''
{|border="1" align="center"
!width="60" |position
!width="60" |1qs0
2r8oA 2 ssrkelan-- -----airal smdavqkaks ghpgapmgma diaevlwrdf
!width="60" |2o1x
!width="60" |multi
TARGET hhhh hhhhhhhh hhhhhhh
2r8oA hhhhhh hhhhh hhhhhhhh hhhh hhhhhhhhh
2r8oA 45 lkhnpqnpsw adrdrfvlsn ghgsmliysl lh-l--t--- ------gydl
TARGET hhhhhh sssssss s ssssss
2r8oA sssss hhhhhh hh h
2r8oA 83 pmeelknfrq lhsktpghpe vgytagvett tgplgqgian avgmaiaekt
TARGET hhhh hhhh hhhhhhhhhh
2r8oA hhhh hhhh hhhhhhhhhh
TARGET 137 KGRQMPVHYG CKERHFV--- ---------- -------TIS SPLATQ----
2r8oA 133 laaqfnrpgh divdhytyaf mgdgcmmegi shevcslagt lklgkliafy
2r8oA hhhhh sssss s hhhh h hhhhhhhhhh h ssssss
2r8oA 183 ddngisidgh vegwftddta mrfe--ayg- ----whvird idghdaasik
TARGET sss sss sss sss hhhhh
2r8oA ss sss ss s hh hhhh h sss sss hhhhh
2r8oA 226 raveearavt dkpsllmckt iigfgspnka gthdshgapl gdaeialtre
TARGET hhhhhhhh sssssss hhhhhhhhh
2r8oA hhhhhhhh ssssssss hh hhhhhhhhh
2r8oA 276 qlgwkyapfe ipseiyaqwd akeagqakes awnekfaaya kaypqeaaef
TARGET hh hhhhhh hhhhhhhhh h hhh
2r8oA hh hhhhhh hhhhhhhhh hhhhhhhhhh h hhhhhh
2r8oA 326 trrmkgemps dfdakakefi aklqanpaki asrkasqnai eafgpllpef
TARGET hhhhh hhhhhhhhhh hhhhh
2r8oA hhhhh hhhhhhhhhh hhhhh hhhhhhhhh hhhhhh ss
TARGET 320 WDEEQEK--- AWRKQSRR-- ---------- --KVMEAFEQ ----------
2r8oA 376 lggsadlaps nltlwsgska inedaagnyi hygvrefgmt aiangislhg
2r8oA sssss s ss hhhhh hhhhhhhh
TARGET 343 ---------- --------AE RKPK------ ---------- ---------P
2r8oA 426 gflpytstfl mfveyarnav rmaalmkqrq vmvythdsig lgedgpthqp
2r8oA ssssss h hhh hhhhhh s ssssss
2r8oA 476 veqvaslrvt pnmstwrpcd qvesavawky gverqdgpta lilsrqnlaq
TARGET hhhhhh hhhhhhhhhh hhh
2r8oA hhhhhh sss hhhhhhhhhh hhh sss sss
TARGET 390 ---------- ---------D K -------- ---------- ----------
2r8oA 526 qerteeqlan iarggyvlkd cagqpelifi atgsevelav aayekltaeg
2r8oA hhhh h sssss ssss s hhhhhh hhhhhhhhh
TARGET ---------- ---------- ---------- ---------- ----------
2r8oA 576 vkarvvsmps tdafdkqdaa yresvlpkav tarvaveagi adywykyvgl
2r8oA ssssssss hhhhh hh hhhhh ssssss h hhh
TARGET ---------- ---------- ---------- --------
2r8oA 626 ngaivgmttf gesapaellf eefgftvdnv vakakell
2r8oA sss hhhhh hhh hhhh hhhhhh
=== iTasser ===
{| border="1"
|[http://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S72726/ 2bfd_A]
|166||0.668 ||3.697||3.208
|167||0.656||6.759 ||7.962
Additionally we calculated the all-atom RMSD scores for the three catalytic centers of the three models. As with all the other scores above, the model with 1qs0 as template is the best one: at all three catalytic centers its all-atom RMSD values are the lowest. Comparing the other two models yields an interesting observation: at the first catalytic center the 2o1x model has a much worse score than the model built from the five template structures. At the second center the score of the 2o1x model is just slightly lower and at the third center it is even higher. So the all-atom RMSD values do not allow us to decide whether the second or the third model is the better one.
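A local all-atom RMSD of this kind could be computed as in the following Python sketch. It is a simplification under several assumptions: the structures are already superimposed, atoms are selected individually rather than as whole residues, and the names <code>atoms_near_site</code> and <code>local_all_atom_rmsd</code>, the atom keys and the toy coordinates are hypothetical. The cutoff is a parameter; the text only gives the value 6.

```python
import math


def atoms_near_site(atoms, site_coords, cutoff):
    """Return the keys of all atoms within `cutoff` of any site coordinate.

    atoms: dict mapping an atom key, e.g. (residue number, atom name),
    to an (x, y, z) tuple.
    """
    return {key for key, xyz in atoms.items()
            if any(math.dist(xyz, site) <= cutoff for site in site_coords)}


def local_all_atom_rmsd(native, model, site_coords, cutoff=6.0):
    """RMSD over all atoms of the native structure close to the site.

    Only atoms that are also present in the model are compared.
    """
    selected = [key for key in atoms_near_site(native, site_coords, cutoff)
                if key in model]
    if not selected:
        raise ValueError("no common atoms within the cutoff")
    squared_sum = sum(math.dist(native[key], model[key]) ** 2 for key in selected)
    return math.sqrt(squared_sum / len(selected))


# Toy example: two atoms close to the site, one atom far away.
native = {(166, "CA"): (0.0, 0.0, 0.0), (166, "CB"): (1.0, 0.0, 0.0),
          (300, "CA"): (50.0, 0.0, 0.0)}
model = {(166, "CA"): (0.0, 0.0, 1.0), (166, "CB"): (1.0, 0.0, 1.0),
         (300, "CA"): (90.0, 0.0, 0.0)}
print(local_all_atom_rmsd(native, model, site_coords=[(0.0, 0.0, 0.0)]))
```

The distant atom is badly modelled in the toy example but does not affect the result, because only atoms around the site are evaluated.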
'''Prediction for 2bfd'''
Pred ccccccccccccccccccccccccccccccccSSSSSccccccccccccccccHHHHHHHHHHHHHHHHHHHHHHHHHHcccccccccccccHHHHHHHH
Conf 9867789999988665555664786666789768888999988884236898999999999999999999999999996798467658877389999999
Pred HHcccccSSScccHHHHHHHHccccHHHHHHHHHccccccccccccccccccccccccccccHHHccHcHHHHHHHHHHHcccccSSSSSSccccccccc
Conf 9769989775570357899837998999999983777788989998673426212872246336336308999999999709998899994577444210
Conf 9999999999679979999559821467788772698789843106988689769479999999999999998189988999998750686788998667
Conf 8999999998639869999999998799999999999999999999999999858998999999675318998899999999999996733188555249
'''Prediction for 2r8o'''
{| align="center"
|[[File:1ub5_1qs0Superposed.png|thumb|260px|Figure1: Superimposed structures of 1U5B and the modeller model with template 1QS0]]
||[[File:1u5b_2o1xSuperposed.png|thumb|260px|Figure2: Superimposed structures of 1U5B and the modeller model with template 2O1X]]
||[[File:1u5b_multi_superposed.png|thumb|260px|Figure3: Superimposed structures of 1U5B and the modeller model with more than one template]]
Conf 9867789999988665555664786666789768888999988884236898999999999999999999999999996798467658877389999999
Conf 9769989775570357899837998999999983777788989998673426212872246336336308999999999709998899994577444210
Conf 9999999999679979999559821467788772698789843106988689769479999999999999998189988999998750686788998667
Conf 8999999998639869999999998799999999999999999999999999858998999999675318998899999999999996733188555249
All the scores calculated above identify the model with 1qs0 as template as the best model. The visualization ([[:File:1ub5_1qs0Superposed.png| Figure 1]]) confirms this. The alpha helices in particular are mostly well aligned, although some are not. The two structures are not perfectly aligned either: on the left and right side of the superposition there are two substructures that are not aligned at all. All in all, however, the model seems compatible with our protein, especially in comparison with the two other models. The 2o1x model, visualized in [[:File:1u5b_2o1xSuperposed.png| Figure 2]], has no aligned structure, so it looks as if two completely different structures were superposed. This impression is supported by the scores calculated above, which show that the model using 2o1x as template does not fit very well. The same applies to the third model. As we can see in [[:File:1u5b_multi_superposed.png| Figure 3]], there is no match between the two structures and therefore no aligned structure either. Again, this result was to be expected given the bad evaluation scores.
[http://swissmodel.expasy.org/workspace/index.php?func=modelling_simple1 SWISS-MODEL]
Secondary structure elements are shown as H for alpha helix, S for beta sheet and c for coil.
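A prediction string in this notation can be summarised with a few lines of Python. The sketch below is only an illustration; the function name <code>ss_composition</code> is hypothetical, and the example string is the first "Pred" line shown above.

```python
from collections import Counter

# Symbols of the prediction lines: H = alpha helix, S = beta sheet, c = coil.
SYMBOLS = {"H": "helix", "S": "strand", "c": "coil"}


def ss_composition(pred):
    """Return the fraction of helix, strand and coil in a prediction string."""
    if not pred:
        raise ValueError("empty prediction string")
    counts = Counter(pred)
    return {label: counts[symbol] / len(pred) for symbol, label in SYMBOLS.items()}


first_pred_line = ("ccccccccccccccccccccccccccccccccSSSSSccccccccccccccccHHHHHHHH"
                   "HHHHHHHHHHHHHHHHHHcccccccccccccHHHHHHHH")
print(ss_composition(first_pred_line))
```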
[[File:WorkplaceSWISSMODEL.jpg | thumb |right |Figure4: SWISS-MODEL server page]]<br>
To find protein structure homology models SWISS-MODEL can be used. SWISS-MODEL is a fully automated protein structure homology-modeling server and is accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer). <br>
It provides three different modelling modes: <br>
* Automated Mode
* Alignment Mode
* Project Mode <br>
The Automated Mode uses fully automated modelling and can therefore only be used when the template is very similar to the target.<ref>http://swissmodel.expasy.org/?pid=smd03&uid=&token=</ref>
As input for the automated mode, only an amino acid sequence (raw or FASTA format) or the UniProt AC of the target is required, as shown in [[:File:WorkplaceSWISSMODEL.jpg| Figure 4]]. Optionally, a template PDB ID can be given. If no template is given, Swissmodel automatically selects templates from a Blast run that are suitable according to their E-values.
The Alignment Mode has to be used for structures with a low identity. Since we only have hits in the region < 40%, we used this mode. <br>
[[Homology_based_Structure_prediction_protocol_BCKDHA#Swissmodel | Protocol Swissmodel]]
Additionally iTasser predicts several different models and presents the top five. To predict these models it uses a lot of templates. iTasser searches the templates itself and also evaluates which one is the best.
== 2.Evaluation of models ==
=== General ===
==== Results ====
A detailed description of how the created models were evaluated can be found in the [[Modeller_protocol_BCKDHA#Evaluation | Evaluation Protocol]]. The following section presents only the modelling and evaluation results.
==== Prediction ====
Two useful scores for comparing the structural similarity of two structures are the '''RMSD''' and the '''TM-Score'''. These two measures are commonly used to assess the accuracy of a model when the native structure is known.
{| align="center"
The RMSD is the average distance of all residue pairs in two structures. The C-alpha RMSD is the average distance between aligned alpha-carbons.
The smaller the RMSD value, the better the predicted structure. A local error (e.g. misorientation of the tail) will result in a high RMSD value, although the global structure is correct.
|[[File:1qs0_secondaryStructurePrediction_BCKDHA_swissmodel.png|thumb|250px|Figure5: Secondary structure prediction of swissmodel with 1qs0 as template]]
||[[File:2o1x_SecondaryStructurePrediction_BCKDHA_Swissmodel.png|thumb|170px|Figure6: Secondary structure prediction of swissmodel with 2o1x as template]]
Swissmodel predicts the secondary structure of our protein with 1qs0 and 2o1x as templates. By looking at the visualization of the secondary structure in [[:File:1qs0_secondaryStructurePrediction_BCKDHA_swissmodel.png| Figure 5]] and [[:File:2o1x_SecondaryStructurePrediction_BCKDHA_Swissmodel.png| Figure 6]] we can see that in both cases the main structural elements are alpha helices.
As the RMSD is sensitive to local errors, the TM-Score was proposed.
The TM-Score weights close matches more strongly than distant matches and thereby overcomes the local error problem. A TM-Score < 0.5 indicates that the two structures are unlikely to share a fold (values below about 0.17 correspond to random structural similarity), whereas 0.5 < TM-Score < 1.00 means that the two compared structures have about the same fold and the predicted model therefore has a correct topology.
=== Modeller ===
==== Numeric evaluation ====
==== Numeric evaluation ====
===== Global Model Quality Estimation =====
{|border="1" align="center"
!DOPE score
!GA341 score
|'''QMEANscore4'''||align="center"|0.57 ||align="center" | 0.18
| 2R8O || 11049.43 || -7610.51 || 0.00000
|rowspan="2"|'''QMEAN Z-Score'''||align="center"| -3.28||align="center"| -9.89
| 2BFD || 2247.36 || -41979.05 || 1.00000
|[[File:BCKDHA_1qs0_Plot1_Z-score.png | thumb|200px|Figure7: Comparison of the model with non-redundant set of PDB structures; the red x stands for the Z-score of this model ]]
| 1DTW, 1GPU, 2BFB, 2BFD, 2BFE, 2O1X, 2R8O ||13873.63 || -43399.59 || 1.00000
||[[File:BCKDHA_2o1x_Plot1_Z-Score.png|thumb|200px|Figure8: Comparison of the model with non-redundant set of PDB structures; the red x stands for the Z-score of this model]]
[[QMEAN_score_information_BCKDHA|Additional information about the QMEAN score]]
The DOPE ('''D'''iscrete '''O'''ptimized '''P'''rotein '''E'''nergy) score is calculated to assess homology models. The lower the DOPE score, the better the model. This can also be seen in our three models. The first one (2r8o), whose template has the lowest sequence identity, has a quite high DOPE score. The model built with 2bfd as template has a very low score, which is reasonable since 2bfd has a very high sequence identity. It is interesting that the model built with 7 templates has a much higher molpdf value (13873.63) than the one built only with 2bfd (2247.36), even though its DOPE score is slightly lower (-43399.59 versus -41979.05). The higher molpdf value may be explained by the influence of the templates which have a low sequence identity with 1u5b.
The QMEANscore4 is calculated to compare whole models. The score ranges between 0 and 1; the higher the value, the better the quality of the model. Comparing the score of the 1qs0 model with that of the 2o1x model, it is obvious that the first one is better since it has a much higher QMEANscore4. But although it is better than the model with 2o1x as template, it is not very good. From its score of only 0.18 it can be inferred that the 2o1x model is useless.<br>
The QMEAN Z-Score represents the absolute quality of a model. Models of low quality have strongly negative QMEAN Z-scores. By looking at
[[:File:BCKDHA_1qs0_Plot1_Z-score.png| Figure 7]] and [[:File:BCKDHA_2o1x_Plot1_Z-Score.png| Figure 8]] we can see that the QMEAN Z-score of both models is negative and that both lie below the black and grey curves shown in the figures. The fact that both scores are negative indicates that neither model is of top quality. Comparing the scores directly, the model with 1qs0 as template has a score of -3.28 and the model with 2o1x as template has a score of -9.89, so the first model is better than the other one. Both values can be found in the plots mentioned above.
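The idea of a Z-score can be sketched in Python as the distance of a model's score from the mean of a reference set, measured in standard deviations. This is only an illustration of the principle: the function name <code>z_score</code> is hypothetical, the reference values below are invented toy numbers, and the actual QMEAN server uses its own reference set of experimental structures.

```python
import statistics


def z_score(model_score, reference_scores):
    """Number of standard deviations by which a score differs from the
    mean of a reference set (sample standard deviation)."""
    mean = statistics.mean(reference_scores)
    sd = statistics.stdev(reference_scores)
    return (model_score - mean) / sd


# Hypothetical reference scores, NOT taken from QMEAN.
toy_reference = [0.6, 0.8, 1.0]
print(z_score(0.5, toy_reference))
```

A strongly negative value means that the model scores far below what is typical for the reference structures.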
GA341 is calculated to decide whether the result is a good model or not. A good model has a score near one; a model with a score lower than 0.6 is a bad model. This is also reflected by our results. The model with 2r8o as template is not a good model: the sequence identity was low, the DOPE score is quite high, and the GA341 score is 0. This shows that it is a really bad model. The other two models have a GA341 score of one, which shows that they are good models.
{|border="1" align="center"
==== Comparison to experimental structure ====
!experimental structure
!model with template
!RMSD (DaliLite)
!RMSD (sap)
|1U5B_A || 2BFD_A || 1.1 || 0.442|| 0.3526 ||[[File:Sup_modeller_2bfd.png|thumb|100px|Superimposed structures of 1U5B and the modeller model with template 2bfd]]
|1U5B_A || 2R8O_A || no value || 95.095|| 0.1749 ||[[File:Sup_modeller_2r8o.png|thumb|100px|Superimposed structures of 1U5B and the modeller model with template 2r8o]]
|[[File:BCKDHA_1qs0_errorPlot.png|thumb|200px |Figure9: Estimated per-residue inaccuracies along the sequence for the 1qs0 model]]
|1U5B_A ||1DTW_A, 2BFE_A, 2BFB_A, 2BFD_A, 1GPU_A, 2O1X_A, 2R8O_A || 1.4 ||0.396 ||0.3596 ||[[File:Sup_modeller_all.png|thumb|100px|Superimposed structures of 1U5B and the modeller model with more than one template]]
||[[File:BCKDHA_1qs0_structure.jpg|thumb|180px|Figure10: Coloring of the 1qs0 model by residue error ]]
||[[File:BCKDHA_2o1x_errorPlot.png|thumb|200px |Figure11: Estimated per-residue inaccuracies along the sequence for the 2o1x model]]
||[[File:BCKDHA_2o1x_structure.jpg|thumb|180px |Figure12: Coloring of the 2o1x model by residue error]]
The plots in [[:File:BCKDHA_1qs0_errorPlot.png| Figure 9]] and [[:File:BCKDHA_2o1x_errorPlot.png| Figure 11]] show the confidence of SWISSMODEL for each residue of the built models. In the plot of the model with 1qs0 as template ([[:File:BCKDHA_1qs0_errorPlot.png| Figure 9]]) we can see that the program is mainly unsure about the beginning and the end of the model. In the middle of the model there is also a peak, indicating that those residues are not modelled with complete certainty. For the rest of the model the program predicts a low probability of inaccuracy. The plot of the 2o1x model ([[:File:BCKDHA_2o1x_errorPlot.png| Figure 11]]) is the complete opposite. Here there are many very high peaks in the middle of the protein, which means that the program predicts a very high inaccuracy for the middle part of the model, which is more important than the ends. A model that is wrong in the middle part is useless, and since the peaks in the middle part are so high, this model may well be useless.
C-alpha RMSD is a measure of the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. The first model, with 2r8o as template, has no C-alpha RMSD since the program we used could not find enough significant similarities because the structures are too dissimilar. The model built with 2bfd has a C-alpha RMSD of 1.1, which is a very good score. It is interesting that the model built from the 7 proteins again has no better score (1.4). This shows that the model with 2bfd is the best one.
The same conclusion can be drawn from [[:File:BCKDHA_1qs0_structure.jpg| Figure 10]], which is the visualization of the 1qs0 model, and from [[:File:BCKDHA_2o1x_structure.jpg| Figure 12]], which is the visualization of the 2o1x model. Both figures show the coloured model. A blue region stands for high confidence, whereas a red region is in all probability wrong. In the model with 1qs0 as template there are only a few red parts, mainly at the end of the protein, whereas the parts in the center are coloured blue or green, which shows that these parts of the model are probably correct. In contrast, the 2o1x model is almost completely red, which supports the assertion that this model is useless because nearly the complete model is predicted to be wrong.
===== Local Model Quality Estimation: Anolea / QMEAN =====
'''improved alignment'''
The model which was built with 2r8o was so bad that it was not possible for DaliLite to calculate a C-alpha RMSD. So we had to improve it.
{|border="1" align="center"
For this improvement we loaded the alignment of 1u5b and 2r8o into Jalview <ref>http://www.jalview.org/download.html</ref> to compare the two sequences. To find more identical residues in both sequences we deleted some gaps and checked the Consensus line to find the amino acids which occur in both sequences. With this handmade alignment we repeated the MODELLER run. To evaluate the resulting model we calculated the C-alpha RMSD and the TM-score.
!C-alpha score
|2r8o || 3.1 || 0.1740 ||[[File:Sup_modeller_2r8o_verbessert.png|thumb|100px|Superimposed structures of 1U5B and the modeller model with the improved alignment for template 2r8o]]
| [[File:BCKDHA_1qs0_Annolea.png|thumb | 250px|Figure13: Local Model Quality Estimation with Anolea and QMEAN for the 1qs0 model]]
||[[File:BCKDHA_2o1x_Annolea.png|thumb|250px|Figure14: Local Model Quality Estimation with Anolea and QMEAN for the 2o1x model]]
For the local model quality estimation we chose the ANOLEA potential. This program performs energy calculations on a protein chain. On the y-axis the energy of each amino acid is represented. Negative energy values (in green) represent favourable energy environment whereas positive values (in red) unfavourable energy environment for a given amino acid.
As we can see the improvement of the alignment was successful since the model has a much better C-alpha score. In comparison to the C-alpha scores of the other modeller results, this model with the smallest sequence identity still performs worst. The TM-score also gets a little bit smaller compared to the unimproved alignment, indicating that the overall model did not improve.
Both plots contain many red parts, so neither model is likely to be completely correct. Analysed separately, however, the energy calculation for the 1qs0 model ([[:File:BCKDHA_1qs0_Annolea.png| Figure 13]]) contains a few green parts, which shows that there are some favourable energy environments in the center of the protein, and the part with a bad energy environment is only at the beginning of the protein. So we can deduce that this model is perhaps correct in the important middle part. The plot for the 2o1x model ([[:File:BCKDHA_2o1x_Annolea.png| Figure 14]]) is completely red, not only at the beginning but also in the important middle part of the model. This suggests that this model is probably not useful.
==== Comparison to experimental structure ====
=== Swissmodel ===
==== Numeric evaluation ====
{|border="1" align="center"
''' QMEAN4 global scores '''
!experimental structure
!model with template
!width="100" |RMSD (DaliLite)
!width="100" |RMSD (sap)
!width="100" |TM score
|1U5B_A || 1QS0_A || 3.4 ||0.766 || 0.8771
|1U5B_A || 2O1X_A || 3.3 || 14.305 || 0.1686
|0.67 ||0.203
The RMSD measures the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. We calculated the RMSD with two different programs. Usually their results are not identical but show the same trend. In this case they disagree. According to the RMSD calculated by DaliLite (see the table above), the score of the 1qs0 model is 0.1 higher than that of the 2o1x model, so it appears that the model with 2o1x as template is slightly better. The scores calculated with sap give a completely different result: the 1qs0 model has a value of 0.766 and the 2o1x model a value of 14.305. According to these results the 1qs0 model is much better, which is the opposite of the DaliLite conclusion. Since the difference between the two models is far larger for sap than for DaliLite, we reason that the model with 1qs0 as template is the better one. To confirm this assumption we analysed the TM score. A TM score higher than 0.5 indicates a good model. This is not the case for the 2o1x model, which has a very low score of 0.1686, so we conclude that this model is bad. In contrast, the 1qs0 model has a score of 0.8771 and is therefore considered a good model. The conclusion from the TM score supports the one from the sap RMSD, so all in all the 1qs0 model is the better one.
QMEANscore4 is calculated to compare whole models. The score ranges between 0 and 1; the higher the value, the better the quality of the model. Comparing the models built with 2bfd and with 2r8o as template, it is obvious that the first one is the better one since it has a much higher QMEANscore4.
'''all atom RMSD'''
{|border="1" align="center"
QMEAN Z-Score<br>
!width="60" |position
!width="60" |1QS0_A
!width="60" |2O1X_A
| -1.604 || -9.522
|[[File:Zscore_plot1_2bfd_A.png‎ | thumb | Z-Score plot1 2bfd_A]] || [[File:Zscore_plot1_2r8o_A.png | thumb | Z-Score plot1 2r8o_A]]
|[[File:Zscore_plot2_2bfd_A.png | thumb| Z-Score plot2 2bfd_A]] ||[[File:Zscore_plot2_2r8o_A.png | thumb | Z-Score plot2 2r8o_A]]
In addition to the scores above we calculated the all-atom RMSD for the three catalytic centers of the two models. These values are unambiguous. At all three catalytic centers the values for the model with 1qs0 as template are much better, since low values stand for good models. The very high values for the 2o1x model indicate that this model is quite useless.
The QMEAN Z-score represents the absolute quality of a model. Models of low quality have strongly negative QMEAN Z-scores. The 2bfd model has a less negative score than the 2r8o model, which shows again that this model has a better quality.
{| align="center"
|[[File:1qs0_1u5b_swissmodel.png|thumb|260px|Figure15: Superposition of the Swissmodel model using template 1qs0 and target 1U5B]]
|| [[File:2o1x_1u5b_swissmodel.png|thumb|260px|Figure16: Superposition of the Swissmodel model using template 2o1x and target 1U5B]]
The calculated RMSD, TM score and all-atom RMSD indicate that the model with 1qs0 as template is the better one and that the other model is quite useless. To check these conclusions we superposed the two models with the structure of our protein. In the superposition of the 1qs0 model in [[:File:1qs0_1u5b_swissmodel.png| Figure 15]] only a few regions of the two structures superpose completely. The main part of the model is shifted slightly, so that the secondary structure elements lie next to each other. This observation shows that the model is only an approximation of the structure and not perfect. The superposition of the 2o1x model in [[:File:2o1x_1u5b_swissmodel.png| Figure 16]] fully reflects the conclusion we drew from the scores above. The model is useless, as no region superposes perfectly and it looks like a superposition of two completely different structures. <br>
Score components<br>
To summarize the results of the numeric evaluation and of the comparison to the experimental structure, the model built with 2o1x cannot be used for further analysis since there is no similarity between our protein and this model. The 1qs0 model has quite good scores, which indicate a good model, but the visualization shows that, although it has the same structure, it is shifted slightly. So we can work with this model, but results based on it will not be completely correct.
=== iTasser ===
==== Results ====
==== Prediction ====
|[[File:1qs0_SecondaryStructurePrediction_iTasser_BCKDHA.png|thumb|250px|Figure17: Secondary structure prediction of iTasser with 1qs0 as template]]
|[[File:Score_components_2bfd_A.png| thumb |score components 2bfd_A]] || [[File:Score_components_2r80_A.png | thumb |score components 2r8o_A]]
||[[File:2o1x_SecondaryStructurePrediction_iTasser_BCKDHA.png|thumb|250px|Figure18: Secondary structure prediction of iTasser with 2o1x as template]]
The secondary structure predictions of iTasser for our protein with 1qs0 ([[:File:1qs0_SecondaryStructurePrediction_iTasser_BCKDHA.png| Figure 17]]) and 2o1x ([[:File:2o1x_SecondaryStructurePrediction_iTasser_BCKDHA.png| Figure 18]]) as templates contain mainly alpha helices. This predominance of alpha helices agrees with the prediction of Swissmodel.
==== Numeric evaluation ====
''' Local scores '''
{|border="1" align="center"
| model1 || model2 || model3 || model4 || model5||model1 || model2 || model3 || model4 || model5
|[[File:Coloring_by_residues_error_2bfd_A.png | thumb |Coloring by residue error 2bfd_A]] || [[File:Coloring_by_residue_error_2r80_A.png |thumb |Coloring by residue error 2r8o_A]]
|1.174 || -0.190 || -0.718 || 0.200 || -5 || -0.150|| -1.276|| -1.863|| -2.155 || -3.208
|[[File:Residue_error_plot_2bfd_A.png | thumb | Residue error plot 2bfd_A]] || [[File:Residue_error_plot_2r80_A.png | thumb | Residue error plot 2r8o_A]]
The C-score is a measure of the quality of models predicted by I-TASSER. It ranges between [-5,2], where a higher C-score signifies a model with higher confidence. First the five models with 1qs0 were analysed. Model1 has a score of 1.174, which is high on this scale, so the quality of this model seems to be good. The only other model with a positive score is model4 with 0.200. This is not as high as the score of model1 but still indicates a good model. Model2 has a negative score of -0.190, but this value is still far above -5, so it can also be regarded as a good model. Model3 has a C-score of -0.718, which lies closer to the middle of the scale and indicates that this model is possibly wrong. The last model is notable: while all other models have reasonable scores, model5 has the worst possible score of -5, so this model is clearly useless. Now the models with 2o1x as template are analysed. None of the C-scores is positive, which shows that these five models are not very good. The best of the five is model1 with a score of -0.150, which is only slightly negative. The other four models cannot be considered good since their C-scores range between -1.276 and -3.208. To summarize the C-scores of the ten models, only model1 and model4 with 1qs0 as template have positive scores. This indicates that only these two models are useful to work with.
The coloring by residue error estimates the inaccuracy of each residue. The results are visualised using a color gradient where blue marks a reliable region and red marks an unreliable region.
In the 2bfd model there are many blue alpha helices, which means that they are predicted reliably, and only a few red coils. Since blue is the dominant color, the model is mainly correct. In contrast, the other model has many red and orange alpha helices and coils and nearly no blue region. This reflects the bad quality of this model.
==== Comparison to experimental structure ====
The residue error plot shows the predicted error (y-axis) per residue (x-axis). The highest error score of the 2bfd-model is 12 and the average is about 3 whereas the highest peak score of the 2r8o-model is 15 and the average is about 5. Again it can be seen that the 2bfd-model is the better one.
{|border="1" align="center"
!width="100" |RMSD (DaliLite)
!width="100" |RMSD (sap)
!width="100" |TM score
!width="100" |RMSD (DaliLite)
!width="100" |RMSD (sap)
!width="100" |TM score
|1 || 2.2 || 0.869||0.8539 || 3.3 || 2.671|| 0.5377
|2 || 1.9 || 0.834|| 0.8627 || 1.6 || 1.056|| 0.8598
|3 || 2.1 || 0.940 ||0.8437 || 3.0 || 2.354|| 0.4688
|4 || 2.2 || 0.880 || 0.8523 || 4.0 || 2.840|| 0.4904
|5 || 2.4 || 0.984 || 0.8363 || 3.3 || 3.123|| 0.4938
The RMSD measures the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. We calculated the RMSD with two different programs so that we can detect an anomalous result in either of them and compare the two RMSDs. The other calculated score is the TM score. When it is between 0.5 and 1.0, the predicted model has the correct topology. In the first analysis we look only at the models with 1qs0 as template. Comparing the scores of the five models with each other, it is conspicuous that all of them have nearly the same value, no matter which RMSD is considered. In both cases the scores differ only minimally. Looking at the sap RMSD in more detail, model3 and model5 have slightly higher scores, but the difference is not significant. So we can say that all five models have a well predicted structure. To make a better-founded statement about the models we also analysed the TM score, which gives the same result as the RMSD: all five TM scores are nearly identical and all are higher than 0.5. Considering the RMSD and the TM score, we conclude that these five models are all very well predicted and that there is nearly no difference between them. <br>
''' Global scores: QMEAN4: '''
The next analysis covers the models which have 2o1x as template. Comparing the scores of the different models shows more divergence here. Only model2 seems to be a good model, because it has low RMSD values and a TM score far above 0.5. The only other model with a TM score above 0.5 is model1, but it has quite high RMSD values compared to model2. Model3, 4 and 5 all have high RMSD values, which shows that their prediction is unreliable. Additionally, all of them have a TM score lower than 0.5, so it is possible that their topology is not correct. <br>
From all these results we conclude that the five models built with 1qs0 are all very good and useful, whereas of the other five models only the second one seems to be well predicted and useful.
'''all atom RMSD'''
{|border="1" align="center"
!colspan="2" | 2bfd_A
!colspan="2" | 2r8o_A
!Scoring function term
!width="50" |161
!Raw score
!width="50" |166
!width="50" |167
!Raw score
!width="50" |161
!width="50" |166
!width="50" |167
|C_beta interaction energy || -162.66 || 0.54 || 74.97|| -4.18
|1||0.739 || 0.826||0.786||1.009||1.542 ||1.807
|All-atom pairwise energy || -10811.93 || 0.35 || 2113.21 || -5.03
|Solvation energy || -27.04 || -1.02 || 26.87 || -5.92
|2||0.700 ||0.759||0.590||0.592 ||0.771 ||0.581
|Torsion angle energy || -75.78 || -1.45 || 36.84 || -6.47
|3||1.177 ||0.786||0.844 ||2.363||4.685||5.078
|QMEAN4 score || 0.670 || -1.60 || 0.203 || -9.52
|5||0.739||0.926||0.830||1.609 ||1.174|| 3.539
To calculate the RMSD within a 6 Å radius of the catalytic center we first had to find the catalytic center. There are three catalytic centers at positions 161, 166 and 167, and we calculated the RMSD for all of them. We start with the analysis of the 1qs0 models. Here there are differences between the five models, although all of them have good values. In more detail, the second model has the lowest values at each position, so it is the most accurate one. Model1 also has good values, but they are not as good as those of model2. The values of the other three models are still good but slightly higher than those of model1 and model2. Among the models built with 2o1x as template, model2 not only has lower values than the other 2o1x models but is also among the lowest of all models. So according to the all-atom RMSD, model2 with 2o1x as template is one of the best models. It is the only model built with 2o1x that is usable. All the other 2o1x models have quite high values of up to 5.078, so it is not possible to work with them.
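The local all-atom RMSD described above can be sketched as follows. This is only a minimal illustration, not the procedure documented in the evaluation protocol: it assumes that model and experimental structure are already superposed, that both use the same residue numbering, and that atoms are stored in a plain dictionary keyed by residue number and atom name. The function names <code>atoms_near_residue</code> and <code>local_all_atom_rmsd</code> and the toy coordinates are hypothetical.

```python
import math

def atoms_near_residue(atoms, center_resnum, radius=6.0):
    """Keys (resnum, atom_name) of all atoms within `radius` Angstrom
    of any atom of the residue `center_resnum`."""
    center = [xyz for (res, _name), xyz in atoms.items() if res == center_resnum]
    return {key for key, xyz in atoms.items()
            if any(math.dist(xyz, c) <= radius for c in center)}

def local_all_atom_rmsd(model, native, center_resnum, radius=6.0):
    """RMSD over all atoms near the catalytic residue.
    Assumes both structures are already superposed and numbered identically."""
    # The neighbourhood is defined on the experimental structure;
    # atoms missing from the model are skipped.
    keys = [k for k in atoms_near_residue(native, center_resnum, radius) if k in model]
    if not keys:
        raise ValueError("no shared atoms around the given residue")
    squared = sum(math.dist(model[k], native[k]) ** 2 for k in keys)
    return math.sqrt(squared / len(keys))

# Toy data (hypothetical): one CA atom per residue, 3.8 Angstrom apart.
native = {(i, "CA"): (3.8 * i, 0.0, 0.0) for i in range(158, 171)}
# The "model" is the same chain shifted by 1 Angstrom along y.
model = {key: (x, y + 1.0, z) for key, (x, y, z) in native.items()}

for position in (161, 166, 167):
    print(position, local_all_atom_rmsd(model, native, position))
```

In practice the coordinates would be read from the PDB files of the model and of 1U5B, and the neighbourhood would contain side-chain atoms as well as backbone atoms.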
''' Local Model Quality Estimation '''
|[[File:Model1_1qs0_BCKDHA.png|thumb|200px|Figure19: iTasser model 1 for template 1qs0 superimposed on target 1U5B]]
|[[File:Local_Model_Quality_Estimation_2bfd_A.png | thumb | Local Model Quality Estimation 2bfd_A]] || [[File:Local_Model_Quality_Estimation_2r80_A.png | thumb | Local Model Quality Estimation 2r8o_A]]
||[[File:Model2_1qs0_BCKDHA.png|thumb|200px|Figure20: iTasser model 2 for template 1qs0 superimposed on target 1U5B]]
||[[File:Model3_1qs0_BCKDHA.png|thumb|200px|Figure21: iTasser model 3 for template 1qs0 superimposed on target 1U5B]]
||[[File:Model4_1qs0_BCKDHA.png|thumb|200px|Figure22: iTasser model 4 for template 1qs0 superimposed on target 1U5B]]
||[[File:Model5_1qs0_BCKDHA.png|thumb|200px|Figure23: iTasser model 5 for template 1qs0 superimposed on target 1U5B]]
|[[File:Model1_BCKDHA_itasser_2o1x.png|thumb|200px|Figure24: iTasser model 1 for template 2o1x superimposed on target 1U5B]]
||[[File:Model2_2o1x_itasser_BCKDHA.png|thumb|200px|Figure25: iTasser model 2 for template 2o1x superimposed on target 1U5B]]
||[[File:Model3_2o1x_itasser_BCKDHA.png|thumb|200px|Figure26: iTasser model 3 for template 2o1x superimposed on target 1U5B]]
||[[File:Model4_2o1x_itasser_BCKDHA.png|thumb|200px|Figure27: iTasser model 4 for template 2o1x superimposed on target 1U5B]]
||[[File:Model5_2o1x_itasser_BCKDHA.png|thumb|200px|Figure28: iTasser model 5 for template 2o1x superimposed on target 1U5B]]
Since the above discussed results are not definite we have to look at the superpositions of the model with the structure of our protein.
For the local model quality estimation we chose the ANOLEA potential. This program performs energy calculations on a protein chain. The y-axis shows the energy of each amino acid. Negative energy values (in green) represent a favourable energy environment, whereas positive values (in red) represent an unfavourable energy environment for a given amino acid. The comparison of this estimation between the 2bfd model and the 2r8o model is quite clear, since nearly the whole left plot is green and nearly the whole right plot is red. These two plots show that the 2bfd model is much better than the other one.
As in the previous analysis we start with the models of 1qs0. [[:File:Model1_1qs0_BCKDHA.png| Figure 19]] shows the superposition with model1: the model has the same structure as our protein but is shifted slightly. This observation agrees with the assumption that the model is quite good but not perfect. [[:File:Model2_1qs0_BCKDHA.png| Figure 20]] shows model2, which according to the scores is a really good model. There are indeed structural elements which superpose perfectly, but there are also parts which are shifted or cannot be superposed at all. So we have to conclude that in this case the model is not as good as the scores suggested.
According to the scores, model3 ([[:File:Model3_1qs0_BCKDHA.png| Figure 21]]) is not as good as the two models mentioned above. The visualization supports this, since there are many regions which are shifted or cannot be superposed at all. Model4, shown in [[:File:Model4_1qs0_BCKDHA.png| Figure 22]], has slightly worse scores than model3, and this difference is also clearly visible in the picture. Model5 is predicted to be the worst model of all because it has bad scores compared to the other models. The superposition of the structures in [[:File:Model5_1qs0_BCKDHA.png| Figure 23]] confirms this result, since no perfectly superposed structural element can be seen.<br>
The quality assessment of the models with 2o1x as template is quite clear. The calculated scores show that model2 is a very good one. To check this it is helpful to look at the superposition of model2 and the structure of BCKDHA in [[:File:Model2_2o1x_itasser_BCKDHA.png| Figure 25]]. The overlay is not perfect and is shifted in most parts of the model, although the same structural elements are present and point in the same direction. The other four models ([[:File:Model1_BCKDHA_itasser_2o1x.png| Figure 24]],[[:File:Model3_2o1x_itasser_BCKDHA.png| Figure 26]],[[:File:Model4_2o1x_itasser_BCKDHA.png| Figure 27]],[[:File:Model5_2o1x_itasser_BCKDHA.png| Figure 28]]) cannot be superposed with the structure of our protein. This observation supports the earlier assumption that these four models cannot be used since they are too dissimilar to the structure of BCKDHA.
=== 3DJigsaw ===
==== Comparison to experimental structure ====
3DJigsaw is a server which builds protein models based on already predicted models for a specific target. It recombines the models and optimizes them.
!experimental structure
!model with template
!RMSD (DaliLite)
!RMSD (sap)
|1U5B_A || 2BFD_A || 1.1 ||0.288||
|1U5B_A || 2R8O_A || 3.1 || 2.110||0.1639
Since we have only models for the low sequence-identity category we started it only once with the best models of this category.
C-alpha RMSD is a measure of the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. The 2bfd model has a score of 1.1 and the 2r8o model has a score of 3.1. This comparison shows clearly that the first model is much better than the second one.
The following models were chosen to build a recombined model with 3DJigsaw because of their high TM score:
* modeller model for template 1qs0
* swissmodel model for template 1qs0
* iTasser model 1 for template 1qs0
* iTasser model 2 for template 1qs0
* iTasser model 4 for template 1qs0
'''improved alignment'''
==== Results ====
!experimental structure
==== Prediction ====
!model with template
[[File:3dJigsaw_BCKDHA_SecondaryStructurePrediction.png|thumb|250px|center|Figure29: Secondary structure prediction of 3D-Jigsaw]]
!C-alpha RMSD
3D-Jigsaw also predicts mainly alpha helices for our protein with the help of the five previously built models. But as we can see in [[:File:3dJigsaw_BCKDHA_SecondaryStructurePrediction.png|Figure 29]], this tool predicts more beta sheets than Swissmodel or iTasser.
|1U5B_A || 2R8O_A || || 0.1592
=== iTasser ===
==== Numeric evaluation ====
{|border="1" align="center"
|width="70" |<b>Energy</b>||align="center" width="55"| -506.64||align="center" width="55"| -506.52||align="center" width="55"| -505.12||align="center" width="55"| -500.75||align="center" width="55"| -496.28
| model1 || model2 || model3 || model4 || model5
|<b>Coverage</b>||align="center" | 1.00||align="center" | 1.00||align="center" | 1.00||align="center" | 1.00||align="center" | 1.00
|1.999 || -3.781 || -4.970 || -4.970 || -3.781
The C-score is a measure for the quality of predicted models by I-TASSER. C-score ranges between [-5,2], where a C-score of higher value signifies a model with a high confidence.
3D-Jigsaw calculates the energy and the coverage for each predicted model. The coverage of every model is 1.0, i.e. 100%. The energies differ more between the five predicted models. The lower the energy the better, because a low energy indicates a stable model. Although the energies differ, the differences are not significant. The first model has the lowest energy, but it is only minimally lower than that of the second or third model. The energy of model 4 is about 5 points higher, so the first three models are better than models 4 and 5. All in all, both the energy calculation and the coverage suggest that all five models are good.
==== Comparison to experimental structure ====
{|border="1" align="center"
!RMSD (DaliLite)
!RMSD (sap)
!RMSD (DaliLite)
!RMSD (DaliLite)
!RMSD (sap)
!RMSD (sap)
!TM score
|1 || 0.9709 || 0.49 || 0.312 || [[File:Sup_iTasser_2bfd_model1.png|thumb|150px|iTasser model 1 for template 2bfd superimposed on target 1U5B]] || 0.5190 || 3.4 || 3.377 || [[File:Sup_iTasser_2r8o_model1.png|thumb|150px|iTasser model 1 for template 2r8o superimposed on target 1U5B]]
|2 || 0.8609 || 1.44 || 0.354 || [[File:Sup_iTasser_2bfd_model2.png|thumb|150px|iTasser model 2 for template 2bfd superimposed on target 1U5B]] || 0.4979 || 3.2 || 3.935 || [[File:Sup_iTasser_2r8o_model2.png|thumb|150px|iTasser model 2 for template 2r8o superimposed on target 1U5B]]
|3|| 0.8597 || 1.43 || 0.478 || [[File:Sup_iTasser_2bfd_model3.png|thumb|150px|iTasser model 3 for template 2bfd superimposed on target 1U5B]] || 0.4871 || 3.0 || 3.476 || [[File:Sup_iTasser_2r8o_model3.png|thumb|150px|iTasser model 3 for template 2r8o superimposed on target 1U5B]]
|4 || 0.8549 || 1.71 || 0.493 || [[File:Sup_iTasser_2bfd_model4.png|thumb|150px|iTasser model 4 for template 2bfd superimposed on target 1U5B]] || 0.5354 || 4.8 || 2.449 || [[File:Sup_iTasser_2r8o_model4.png|thumb|150px|iTasser model 4 for template 2r8o superimposed on target 1U5B]]
|5 || 0.8251 || 1.73 || 0.348 || [[File:Sup_iTasser_2bfd_model5.png|thumb|150px|iTasser model 5 for template 2bfd superimposed on target 1U5B]] || 0.5107 || 6.0 || 2.540 || [[File:Sup_iTasser_2r8o_model5.png|thumb|150px|iTasser model 5 for template 2r8o superimposed on target 1U5B]]
To find out how good the models are we calculated the RMSD twice with different tools. Since a low RMSD value indicates a good model, all five models seem to be very good. The RMSD calculated by DaliLite varies only between 1.8 and 2.1, which is a very small range. The same holds for the RMSD calculated with sap. Looking at both values in more detail, the first three models are, as in the energy calculation, slightly better than the last two models, but not by much. Additionally we calculated the TM score for each model. All five models have a TM score higher than 0.5, which shows that all of them have a correctly predicted topology. Again there is a cut between model 3 and model 4, indicating that, as the other scores suggested, the first three models are slightly better.
All of these models are very good, as the table shows, since they all have a high TM-score and a low C-alpha RMSD. This is to be expected because they are the top 5 hits of iTasser. The first model is perhaps slightly better than the other four, since its scores are slightly better than theirs.
==Comparison of the methods==
<b>all atom RMSD</b>
{|border="1" align="center"
|<b>161</b>||0.750||0.750 ||0.750||0.826||1.071
|<b>166</b>||0.709||0.709||0.709 ||0.784||0.738
|<b>167</b>||0.593||0.593 ||0.593||0.671 ||0.580
The calculated all-atom RMSD values confirm the results discussed above. According to the all-atom RMSD, the first three models are equally good. Again there is a cut after model 3: models 4 and 5 are both slightly worse than the first three. However, all values are quite low, so all five models are good.
==== Superposition ====
|[[File:3dJigsaw_model1_BCKDHA.png|thumb|200px|Figure30: 3D-Jigsaw model 1 superimposed with the target 1U5B]]
||[[File:3dJigsaw_model2_BCKDHA.png|thumb|200px|Figure31: 3D-Jigsaw model 2 superimposed with the target 1U5B]]
||[[File:3djigsaw_model3_BCKDHA.png|thumb|200px|Figure32: 3D-Jigsaw model 3 superimposed with the target 1U5B]]
||[[File:3djigsaw_model4_BCKDHA.png|thumb|200px|Figure33: 3D-Jigsaw model 4 superimposed with the target 1U5B]]
||[[File:3djigsaw_model5_BCKDHA.png|thumb|200px|Figure34: 3D-Jigsaw model 5 superimposed with the target 1U5B]]
To check our conclusion that the first three models are slightly better than the other two, but that all five models are very good, we look at the superposition of each model with the correct structure of BCKDHA. [[:File:3dJigsaw_model1_BCKDHA.png| Figure 30]] shows model 1, and it is apparent that the structure prediction for this model was very good, as most of the two structures can be aligned perfectly. There are also some parts which are not aligned, which is consistent with scores that are likewise not perfect. The scores indicate that model 2 is also very similar to the real structure. The superposition ([[:File:3dJigsaw_model2_BCKDHA.png|Figure 31]]) supports this, since most of the protein is covered by the model. Again the superposition is not perfect, but this was expected. [[:File:3djigsaw_model3_BCKDHA.png| Figure 32]] shows the third model, which the scores predict to be very good. This prediction is confirmed, since the superposition of model 3 and the structure of BCKDHA is perfect for most parts. As in the two other models there are some regions which could not be aligned, which again was expected. The superpositions of models 4 and 5 with the structure of BCKDHA ([[:File:3djigsaw_model4_BCKDHA.png| Figure 33]] and [[:File:3djigsaw_model5_BCKDHA.png| Figure 34]]) reflect the cut between them and the first three models. The models still cover the real structure very well, but there are more regions which cannot be aligned or which are shifted between the two structures. The superpositions thus support the results discussed above.
== Comparison of the methods ==
=== Numerical Evaluation ===
The following tables list the RMSD and TM score values, which were computed before, to provide an overview of the performance of the different methods.
|<b>RMSD (sap)</b>||0.829|| 2.727|| 11.398
!C-alpha RMSD
!C-alpha RMSD
!C-alpha RMSD
|1.1|| 0.3526 || 3.1|| 0.1749 ||1.4||0.3596
|<b>TM score</b>|| 0.8504|| 0.1592 || 0.1719
|<b>RMSD (sap)</b>||0.766 ||14.305
!C-alpha RMSD
!C-alpha RMSD
|1.1|| 0.1640 ||3.1||0.1639
|<b>TM score</b>|| 0.8771 ||0.1686
'''iTasser '''
|<b>RMSD (sap)</b>||0.869||0.834||0.940 ||0.880 ||0.984 ||2.671||1.056 ||2.354 ||2.840 ||3.123
|0.49||0.9709||1.44||0.8609 ||1.43||0.8597||1.71||0.8549||1.73||0.8251||3.4 ||0.5190 ||3.2 ||0.4979 ||3.0 ||0.4871 || 4.8 || 0.5354 ||6.0 ||0.5107
|<b>TM score</b>||0.8539 ||0.8627 ||0.8437 ||0.8523 ||0.8363 ||0.5377 ||0.8598 ||0.4688 ||0.4904 ||0.4938
'''3DJigsaw '''
|<b>TM score</b> ||0.8627 ||0.8626 ||0.8631 ||0.8539 ||0.8545
=== Discussion ===
To compare the predicted models with the experimentally determined structure of our target, different scores (RMSD, TM score) were calculated. Based on these scores it is not easy to decide which tool is the best one. For modelling with 2o1x as template, which has the lowest sequence identity, iTasser clearly did the best job, since its TM score is much higher than that of the other two programs for models with this template. For the models with 1qs0 as template, however, all values are very close together. Looking closely at the values, the TM score of Swissmodel is the best of all TM scores and its RMSD is the lowest one, so we can say that Swissmodel is the best tool. It is interesting that Swissmodel is even more precise than 3D-Jigsaw, although this tool worked with the best predictions of Modeller, Swissmodel and iTasser. An explanation could be that none of the models is very good, because both of our templates have a low sequence identity, so there was perhaps too much false information in the models for 3D-Jigsaw to build a very good model out of the five input models. But it is important to note that the difference between Modeller, Swissmodel, iTasser and 3D-Jigsaw is only minimal.
We can conclude that the similarity of the template is the limiting factor for the model prediction and determines which tool is the most useful one.
== References ==
back to [[Secondary_Structure_Prediction_BCKDHA]]
go to Task 5: [[Mapping_SNPs_BCKDHA|Mapping SNPs]]

Latest revision as of 22:02, 25 August 2011

1.Calculation and evaluation of models

Template selection

Homology modelling is a technique to predict the three-dimensional structure of a target protein. It is based on an alignment of the target sequence and one or more template sequences with known structures. The target is assigned a structure based on the template structure. The better the alignment, the better the predicted structure for our target. Therefore the template selection is a crucial step in homology modelling.

To find similar structures to BCKDHA we ran HHsearch using the following command:
hhsearch -i query -d database -o output

It found the following 10 hits in the pdb70 database.

No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM Identity
1 2bfd_A 2-oxoisovalerate dehydr 1.0 1 1 791.3 0.0 400 1-400 1-400 (400) 99%
2 1qs0_A 2-oxoisovalerate dehydr 1.0 1 1 571.5 0.0 349 32-382 52-407 (407) 39%
3 1w85_A Pyruvate dehydrogenase 1.0 1 1 530.8 0.0 356 8-382 6-362 (368) 34%
4 1umd_A E1-alpha, 2-OXO acid de 1.0 1 1 521.8 0.0 351 34-386 16-367 (367) 37%
5 2ozl_A PDHE1-A type I, pyruvat 1.0 1 1 482.7 0.0 331 46-380 25-356 (365) 27%
6 3l84_A Transketolase; TKT, str 1.0 1 1 85.4 0.0 133 161-297 113-252 (632) 21%
7 2r8o_A Transketolase 1, TK 1; 1.0 1 1 74.5 0.0 121 161-285 113-245 (669) 33%
8 2o1x_A 1-deoxy-D-xylulose-5-ph 1.0 1 1 74.2 0.0 127 161-287 122-254 (629) 18%
9 1gpu_A Transketolase; transfer 1.0 1 1 74.2 0.0 140 161-302 115-265 (680) 22%
10 3m49_A Transketolase; alpha-be 1.0 1 1 68.8 0.0 121 161-285 139-271 (690) 31%

Before we can start working with these hits we have to check whether one of them is a PDB structure of BCKDHA itself. This is the case for 2bfd_A.
Because this hit cannot be used, only structures with a sequence identity below 40% remain. Since templates are available only in this range, we chose two structures from it, one with 39% and one with 18% identity, so that the identities still vary.
In the following we worked with 1qs0_A (39%) and 2o1x_A (18%).
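The selection step can be sketched in Python as follows. The identities are copied from the HHsearch hit table above. The class boundaries of 60% and 40% and the names `identity_bin` and `bin_hits` are assumptions made for this illustration; the snippet does not parse real HHsearch output.

```python
# Sequence identities (in %) copied from the HHsearch hit table.
hits = {
    "2bfd_A": 99, "1qs0_A": 39, "1w85_A": 34, "1umd_A": 37, "2ozl_A": 27,
    "3l84_A": 21, "2r8o_A": 33, "2o1x_A": 18, "1gpu_A": 22, "3m49_A": 31,
}


def identity_bin(identity):
    """Assign an identity value to one of three classes (assumed boundaries)."""
    if identity > 60:
        return "> 60%"
    if identity > 40:
        return "40-60%"
    return "< 40%"


def bin_hits(hits, exclude=()):
    """Group hits by identity class, highest identity first, skipping excluded IDs."""
    bins = {"> 60%": [], "40-60%": [], "< 40%": []}
    for name, identity in sorted(hits.items(), key=lambda item: -item[1]):
        if name not in exclude:
            bins[identity_bin(identity)].append(name)
    return bins


# 2bfd_A is a structure of BCKDHA itself and is therefore excluded.
bins = bin_hits(hits, exclude={"2bfd_A"})
print(bins)
```

With 2bfd_A excluded, the two upper classes stay empty, and the first and last entries of the remaining class are the two templates chosen above.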

General information for the evaluation

A detailed description of how the created models were evaluated can be found in the Evaluation Protocol. The following section presents only the modelling and evaluation results.

Three interesting scores when comparing two structures for their structural similarity are the Cα RMSD, the all-atom RMSD and the TM Score. These are three measures which are usually used to calculate the accuracy of modelling a structure when the native structure is known. In the following we will call the Cα RMSD only RMSD.

The RMSD (root-mean-square deviation) is the square root of the mean squared distance between corresponding atoms in two superposed structures. The Cα RMSD is computed over the aligned alpha-carbons. The smaller the RMSD value, the better the predicted structure. A local error (e.g. misorientation of the tail) will result in a high RMSD value, although the global structure is correct.
The all-atom RMSD is calculated over the residues within 6 Å of the active sites. As with the Cα RMSD, models with a low value are the better ones.
As the RMSD is sensitive to local errors, the TM Score was proposed. The TM Score weights close matches more strongly than distant matches and thereby overcomes the local error problem. It ranges between 0 and 1. A TM Score < 0.5 indicates that the model probably does not share the fold of the native structure (values below about 0.17 correspond to random structural similarity), whereas 0.5 < TM Score < 1.00 means that the two compared structures have about the same fold and the predicted model therefore has a correct topology.
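The difference between the two measures can be illustrated with a small Python sketch. It assumes that the two structures are already superposed and that residues are paired one to one. The real TM-score program additionally searches for the superposition that maximises the score, which is not done here. The function names and the toy coordinates are hypothetical, and the lower bound of 0.5 on d0 is a safeguard of this sketch.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score(distances, target_length):
    """TM score of one fixed superposition, normalised by the target length."""
    if target_length <= 15:
        raise ValueError("the d0 formula needs a target longer than 15 residues")
    d0 = max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy example: 95 residues fit within 0.5 A, a 5-residue tail is off by 20 A.
native = [(float(i), 0.0, 0.0) for i in range(100)]
model = [(float(i), 0.5, 0.0) for i in range(95)]
model += [(float(i), 20.0, 0.0) for i in range(95, 100)]

distances = [math.dist(a, b) for a, b in zip(native, model)]
rmsd_value = ca_rmsd(native, model)
tm_value = tm_score(distances, target_length=len(native))
print(f"RMSD = {rmsd_value:.2f}, TM score = {tm_value:.3f}")
```

In this toy case the misplaced tail dominates the RMSD, while the TM score stays high because the five distant residues contribute almost nothing to the sum.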


MODELLER is used for homology or comparative modelling of protein three-dimensional structures. It calculates a model containing all non-hydrogen atoms. There are also many other features provided by MODELLER like de novo modelling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, comparison of protein structures, and so on.[1]

A tutorial is provided on [2] and on [3]

To run MODELLER with more than one template we used the following templates (the percentage values indicate the sequence identity to the target):

  • 3m49:A (31%)
  • 2r8o:A (33%)
  • 2o1x:A (18%)
  • 1w85:A (34%)
  • 1qs0:A (39%)

Protocol Modeller


Numeric evaluation

template molpdf DOPE score GA341 score
1QS0_A 2650.7 -40503.1 1.000
2O1X_A 2958.0 -30294.5 0.419
3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A 123913.8 -19573.7 0.001

The DOPE (Discrete Optimized Protein Energy) score is calculated to assess homology models. The lower the DOPE score, the better the model. This can also be seen in our three models. The 1qs0 model, whose template has the highest sequence identity, clearly has the lowest DOPE score. The model built with 2o1x as template has a higher score, which is reasonable since 2o1x has a sequence identity of 18% whereas 1qs0 has 39%. This shows that although both templates belong to the group with low sequence identity, there are noticeable differences between them. The last model was built from five different structures. Normally this helps the program, because a model based on several templates can be predicted more precisely than one based on a single template. In this case the problem was that none of the combined structures had a high sequence identity, so the information Modeller received was not helpful. This is reflected in the scores, because this model has the highest DOPE score. As expected, the model with 1qs0 as template is therefore the best of the three.

GA341 is calculated to decide whether the result is a good model or not. A good model has a score near one; a model with a score lower than 0.6 is considered a bad model. This is also reflected by our results. According to this score, the model with 1qs0 as template is very good, since its GA341 score is 1.0. This is somewhat surprising, since the sequence identity to our protein is not very high, although the DOPE score was good, too. The best possible GA341 score for a model whose template has only 39% sequence identity should therefore be interpreted with caution. The other two models have a score lower than 0.6, which shows that both of them are bad. It is interesting that the model with the five templates only has a score of 0.001, which seems a bit too low, because the average sequence identity of the used structures is higher than that of 2o1x, whose model has a GA341 score of 0.419. All in all we can conclude that the 1qs0 model is the most accurate one.

To sum up the two scores: although all templates have a low sequence identity, the model with 1qs0 as template is the best one.
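The reasoning of the two paragraphs above can be written down as a short Python sketch. The scores are copied from the table, the GA341 cut-off of 0.6 is the one quoted in the text, and the function name `rank_models` is our own. The snippet only sorts and filters the reported numbers; it does not call MODELLER.

```python
# Scores copied from the numeric evaluation table of the Modeller models.
models = [
    {"template": "1QS0_A", "molpdf": 2650.7, "dope": -40503.1, "ga341": 1.000},
    {"template": "2O1X_A", "molpdf": 2958.0, "dope": -30294.5, "ga341": 0.419},
    {"template": "five templates", "molpdf": 123913.8, "dope": -19573.7, "ga341": 0.001},
]

GA341_CUTOFF = 0.6  # cut-off used in the text above


def rank_models(models, cutoff=GA341_CUTOFF):
    """Sort by DOPE (lower is better) and flag models that pass the GA341 cut-off."""
    ranked = sorted(models, key=lambda m: m["dope"])
    return [(m["template"], m["dope"], m["ga341"] >= cutoff) for m in ranked]


ranking = rank_models(models)
for template, dope, passes in ranking:
    print(f"{template:15s} DOPE {dope:10.1f}  GA341 ok: {passes}")
```

Both criteria agree here: the model with the lowest DOPE score is also the only one that passes the GA341 cut-off.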

Comparison to experimental structure

experimental structure model with template RMSD (DaliLite) RMSD (sap) TM-score
1U5B_A 1QS0_A 2.3 0.829 0.8504
1U5B_A 2O1X_A 3.5 2.727 0.1592
1U5B_A 3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A no score 11.398 0.1719

The Cα RMSD measures the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. The first model, built with 1qs0 as template, has an RMSD of 2.3 (DaliLite) or 0.829 (sap). In both cases the value is lower than those of the other two models. Since a low value indicates a good model, this model is the best of the three according to the RMSD. The RMSD of the 2o1x model is higher (3.5 and 2.727), so this model is worse than the first one, although the difference is moderate. For the third model DaliLite was not able to calculate an RMSD at all, because the structures are too dissimilar for it to find enough significant similarities. This dissimilarity is reflected by the RMSD calculated with the sap command, which is very high (11.398) compared with the other two sap values. An explanation for this bad result could be that too much incorrect information was used during the building process. Comparing the TM scores of the three models, we can see that only one model has a value higher than 0.5, which means that only one model is significantly good. The 1qs0 model has a TM score of 0.8504, so it is considered a good model, whereas the other two models have TM scores of about 0.16 to 0.17, which is far lower than 0.5 and indicates that both models are useless.

all-atom RMSD

position 1qs0 2o1x multi
161 0.332 6.172 2.607
166 0.668 3.697 3.208
167 0.656 6.759 7.962

Additionally we calculated the all-atom RMSD for the three catalytic centers of the three models. As with all the other scores above, the model with 1qs0 as template is the best one: at all three catalytic centers its all-atom RMSD values are the lowest. Comparing the other two models yields an interesting observation. At the first catalytic center (161) the 2o1x model has a much worse score than the model built from five templates. At the second center (166) the score of the 2o1x model is only slightly higher, and at the third center (167) it is lower. So from the all-atom RMSD values it cannot be decided whether the second or the third model is the better one.


Figure1: Superimposed structures of 1U5B and the modeller model with template 1QS0
Figure2: Superimposed structures of 1U5B and the modeller model with template 2O1X
Figure3: Superimposed structures of 1U5B and the modeller model with more than one template

All the scores calculated above identify the model built with 1qs0 as template as the best model. The visualization (Figure 1) confirms this. The alpha helices in particular are aligned quite well, although some of them are not aligned. The two structures are also not perfectly aligned on the left and right side of the superposition, where two elements do not overlap at all. All in all, however, the model seems to be compatible with our protein, especially in comparison with the two other models. The 2o1x model, visualized in Figure 2, has no aligned structure, so that it looks like a superposition of two completely different structures. This impression is supported by the scores above, which show that the model using 2o1x as template does not fit well. The same applies to the third model. As we can see in Figure 3, there is no match between the two structures and therefore no aligned structure. This result was to be expected given the bad evaluation scores.



Figure4: SWISS-MODEL server page

To find protein structure homology models SWISS-MODEL can be used. SWISS-MODEL is a fully automated protein structure homology-modeling server and is accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).
It provides three different modelling modes:

  • Automated Mode
  • Alignment Mode
  • Project Mode

The Automated Mode uses fully automated modelling and can therefore be only used when the template is very similar to the target.<ref>http://swissmodel.expasy.org/?pid=smd03&uid=&token=</ref>
As input for the Automated Mode, only an amino acid sequence (raw or FASTA format) or the UniProt AC of the target is required, as shown in Figure 4. Optionally a template PDB ID can be given. If no template is given, Swissmodel automatically selects templates from a Blast run that are suitable according to their E-values. The Alignment Mode has to be used for templates with a low identity. Since we only have hits in the region < 40%, we used this mode.

Protocol Swissmodel



1qs0 2o1x
Figure5: Secondary structure prediction of swissmodel with 1qs0 as template
Figure6: Secondary structure prediction of swissmodel with 2o1x as template

Swissmodel predicts the secondary structure of our protein with 1qs0 and 2o1x as templates. The visualizations of the secondary structure in Figure 5 and Figure 6 show that in both cases the main structural elements are alpha helices.

Numeric evaluation

Global Model Quality Estimation
1qs0 2o1x
QMEANscore4 0.57 0.18
QMEAN Z-Score -3.28 -9.89
Figure7: Comparison of the model with non-redundant set of PDB structures; the red x stands for the Z-score of this model
Figure8: Comparison of the model with non-redundant set of PDB structures; the red x stands for the Z-score of this model

Additional information about the QMEAN score

The QMEANscore4 is calculated to compare whole models. The score ranges between 0 and 1; the higher the value, the better the quality of the model. Comparing the score of the 1qs0 model with that of the 2o1x model, the first one is clearly better, since it has a much higher QMEANscore4. But although it is better than the model with 2o1x as template, a score of 0.57 is still not very good. From its score of only 0.18 it can be inferred that the 2o1x model is useless.
The QMEAN Z-Score represents the absolute quality of a model. Models of low quality have strongly negative QMEAN Z-scores. In Figure 7 and Figure 8 we can see that the QMEAN Z-scores of both models are negative and that both lie below the black and grey curves shown in the figures. The fact that both scores are negative indicates that neither model is of top quality. Comparing the scores directly, the model with 1qs0 as template has a score of -3.28 and the model with 2o1x as template a score of -9.89, so the first model is clearly better than the other one. Both values can be found in the plots mentioned above.

1qs0 2o1x
Figure9: Estimated per-residue inaccuracies along the sequence for the 1qs0 model
Figure10: Coloring of the 1qs0 model by residue error
Figure11: Estimated per-residue inaccuracies along the sequence for the 2o1x model
Figure12: Coloring of the 2o1x model by residue error

The plots in Figure 9 and Figure 11 show the confidence of SWISS-MODEL for each residue of the built models. In the plot of the model with 1qs0 as template (Figure 9) the program is mainly unsure about the beginning and the end of the model. In the middle of the model there is also one peak, indicating that those residues are not modelled with complete certainty. For the rest of the model the program predicts a low inaccuracy. The plot of the 2o1x model (Figure 11) is the complete opposite. There are many very high peaks in the middle of the protein, which means that the program predicts a very high inaccuracy for the middle part of the model, which is more important than the ends. A model that is wrong in the middle part is useless, and since the peaks there are so high, this model is probably useless. The same conclusion can be drawn from Figure 10, the visualization of the 1qs0 model, and Figure 12, the visualization of the 2o1x model. Both figures show the model coloured by estimated error: blue regions are predicted with high confidence, whereas red regions are in all probability wrong. The model with 1qs0 as template has only a few red parts, mainly at the end of the protein, whereas the center is coloured blue or green, which shows that these parts of the model are probably correct. In contrast, the 2o1x model is almost completely red, which supports the assertion that this model is useless.

Local Model Quality Estimation: Anolea / QMEAN

1qs0 2o1x
Figure13: Local Model Quality Estimation with Anolea and QMEAN for the 1qs0 model
Figure14: Local Model Quality Estimation with Anolea and QMEAN for the 2o1x model

For the local model quality estimation we chose the ANOLEA potential. This program performs energy calculations on a protein chain. The y-axis shows the energy of each amino acid. Negative energy values (in green) represent a favourable energy environment, whereas positive values (in red) represent an unfavourable energy environment for a given amino acid. Both plots contain many red parts, so neither model is likely to be completely correct. Analysing the two figures separately, however, we can see that the energy calculation for the 1qs0 model (Figure 13) contains some green parts, which shows that there are favourable energy environments in the center of the protein, while the part with a bad energy environment is mainly at the beginning of the protein. We can deduce that this model may be correct in the important middle part. The plot for the 2o1x model (Figure 14) is completely red, not only at the beginning but also in the important middle part of the model. This suggests that this model is probably not useful.

Comparison to experimental structure

experimental structure model with template RMSD (DaliLite) RMSD (sap) TM score
1U5B_A 1QS0_A 3.4 0.766 0.8771
1U5B_A 2O1X_A 3.3 14.305 0.1686

The RMSD measures the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. To calculate the RMSD we used two different programs. Usually their results are not identical but show the same trend. In this case it is different. According to the RMSD calculated by DaliLite (see the table above), the value of the 1qs0 model is 0.1 higher than that of the 2o1x model, so it appears that the model with 2o1x as template is slightly better. The values calculated by the sap command give a completely different picture: the 1qs0 model has a value of 0.766 and the 2o1x model a value of 14.305. According to these results the 1qs0 model is much better, which contradicts the DaliLite result. Since the difference between the two models is far larger for sap than for DaliLite, we reason that the model with 1qs0 as template is the better one. To confirm this assumption we analyse the TM score. A TM score higher than 0.5 indicates a good model. This is not the case for the 2o1x model, which has a really low score of 0.1686, so we conclude that this model is bad. In contrast, the 1qs0 model has a score of 0.8771 and is therefore considered a good model. The conclusion from the TM score supports the one from the sap RMSD, so all in all the 1qs0 model is the better one.

all atom RMSD

position 1QS0_A 2O1X_A
161 0.337 3.258
166 0.585 1.028
167 0.594 1.309

In addition to the scores above we calculated the all-atom RMSD for the three catalytic centers of the two models. The values of this score are unambiguous. At all three catalytic centers the values for the model with 1qs0 as template are better, since low values indicate good models. The higher values for the 2o1x model, especially 3.258 at position 161, indicate that this model is of little use.


Figure15: Superposition of the Swissmodel model using template 1qs0 and target 1U5B
Figure16: Superposition of the Swissmodel model using template 2o1x and target 1U5B

The calculated RMSD, TM score and all-atom RMSD indicate that the model with 1qs0 as template is the better one and that the other model is quite useless. To check these conclusions we superposed the two models with the structure of our protein. The visualization of the 1qs0 superposition in Figure 15 shows that only a few regions of the two structures could be superposed completely. The main part of the model is shifted a bit, so that the secondary structure elements lie next to each other. This observation shows that the model is an approximation of the structure but not a perfect one. The visualization of the 2o1x superposition in Figure 16 fully reflects the conclusion we drew from the scores above. The model is useless, as there is no region which could be superposed perfectly, and it looks like a superposition of two completely different structures.
To summarize the results of the numeric evaluation and of the comparison to the experimental structure, the model with 2o1x cannot be used for further analysis, since there is no similarity between our protein and this model. The 1qs0 model has quite good scores, but the visualization shows that, although it has the same structure, it is shifted a bit. We can therefore work with this model, but results based on it will not be completely correct.




1qs0 2o1x
Figure17: Secondary structure prediction of iTasser with 1qs0 as template
Figure18: Secondary structure prediction of iTasser with 2o1x as template

The secondary structure predictions of iTasser for our protein with 1qs0 (Figure 17) and 2o1x (Figure 18) as templates contain mainly alpha helices. This agrees with the prediction of Swissmodel.

Numeric evaluation


1qs0 2o1x
model1 model2 model3 model4 model5 model1 model2 model3 model4 model5
1.174 -0.190 -0.718 0.200 -5 -0.150 -1.276 -1.863 -2.155 -3.208

The C-score is a measure of the quality of the models predicted by I-TASSER. It ranges between -5 and 2, where a higher C-score signifies a model with higher confidence. First the five models built with 1qs0 were analysed. Model1 has a score of 1.174, which is high on this scale, so the quality of this model seems to be good. The only other model with a positive score is model4 with 0.200. This is not as high as the score of model1, but it is positive and we therefore also regard it as a good model. Model2 has a negative score of -0.190, but this value is still much higher than -5, so it can still be regarded as a good model. Model3 has a C-score of -0.718, which is lower still and indicates that this model is possibly wrong. The last model is quite interesting: while all the other models have reasonable scores, this model has the worst possible score of -5, so it is clear that this model is useless. Next, the models with 2o1x as template were analysed. None of their C-scores is positive, which shows that these five models are not very good. The best of the five is model1 with a score of -0.150, which is only slightly negative. The other four models cannot be good models, because their C-scores range between -1.276 and -3.208. To summarize the C-scores of the ten models, only model1 and model4 built with 1qs0 as template have positive scores, which marks these two as the most reliable models to work with.

Comparison to experimental structure

1qs0 2o1x
No RMSD (DaliLite) RMSD (sap) TM score RMSD (DaliLite) RMSD (sap) TM score
1 2.2 0.869 0.8539 3.3 2.671 0.5377
2 1.9 0.834 0.8627 1.6 1.056 0.8598
3 2.1 0.940 0.8437 3.0 2.354 0.4688
4 2.2 0.880 0.8523 4.0 2.840 0.4904
5 2.4 0.984 0.8363 3.3 3.123 0.4938

The RMSD measures the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. We calculated the RMSD with two different programs, so that we can detect an implausible result in one of them and compare the two RMSDs. The other calculated score is the TM score. When it is between 0.5 and 1.0, the predicted model has the correct topology. In the first analysis we only look at the models with 1qs0 as template. Comparing the scores of the five models, it is conspicuous that all of them have nearly the same value, no matter which RMSD is considered. In both cases the scores differ only minimally. Looking at the RMSDs in more detail, model3 and model5 have slightly higher sap values, but the difference is not significant. So all five models have a well predicted structure. To obtain more information we also analysed the TM score, with the same result: all five TM scores are nearly identical and all are higher than 0.5. Considering both the RMSDs and the TM score, we conclude that these five models are all very well predicted and that there is almost no difference between them.
The next analysis covers the models with 2o1x as template. Here the scores of the different models diverge more. Only model2 seems to be a good model, because it has low RMSD values and a TM score far above 0.5. The only other model with a TM score above 0.5 is model1, but its RMSD values are considerably higher than those of model2. Model3, 4 and 5 all have high RMSD values, which shows that their prediction is unreliable. Additionally all of them have a TM score lower than 0.5, so their topology may not be correct.
From all these results we conclude that the five models built with the help of 1qs0 are all very good and useful, whereas of the other five models only the second one seems to be well predicted and useful.

all atom RMSD

1qs0 2o1x
model 161 166 167 161 166 167
1 0.739 0.826 0.786 1.009 1.542 1.807
2 0.700 0.759 0.590 0.592 0.771 0.581
3 1.177 0.786 0.844 2.363 4.685 5.078
4 0.906 0.852 0.989 0.798 1.211 2.984
5 0.739 0.926 0.830 1.609 1.174 3.539

To calculate the RMSD within a 6 Å radius of the catalytic center we first had to find the catalytic centers. There are three catalytic centers, at positions 161, 166 and 167, and we calculated the RMSD for all of them. We start with the analysis of the 1qs0 models. There are differences between the five models, although all of them have good values. In more detail, the second model has the lowest value at each position, so it is the most accurate one. Model1 also has good values, but they are not as good as those of model2. The values of the other three models are still good but mostly a bit higher than those of model1 and model2. Among the models built with 2o1x as template, model2 not only has lower values than the other 2o1x models but also the lowest values of all models at positions 161 and 167. According to the all-atom RMSD, model2 with 2o1x as template is therefore one of the best models. It is the only model built with 2o1x that is usable. All the other 2o1x models have quite high values of up to 5.078, so it is not possible to work with them.
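The restriction to a 6 Å radius can be sketched in Python as follows. The exact procedure we used is described in the Evaluation Protocol; the code below is only an illustration under the assumptions that both structures are already superposed, that atoms are stored in a dictionary keyed by (residue number, atom name), and that the centre is a single coordinate. The function names and the toy coordinates are hypothetical.

```python
import math


def atoms_near(atoms, center, radius=6.0):
    """Return the keys of all atoms within `radius` Angstrom of `center`."""
    return sorted(key for key, xyz in atoms.items() if math.dist(xyz, center) <= radius)


def all_atom_rmsd(native, model, center, radius=6.0):
    """RMSD over the atoms of the native structure that lie near `center`.

    Atoms that are missing in the model are skipped.
    """
    selected = [key for key in atoms_near(native, center, radius) if key in model]
    if not selected:
        raise ValueError("no common atoms within the given radius")
    squared = sum(math.dist(native[key], model[key]) ** 2 for key in selected)
    return math.sqrt(squared / len(selected))


# Toy coordinates: two atoms of residue 161 and one distant atom of residue 300.
native = {(161, "CA"): (0.0, 0.0, 0.0), (161, "CB"): (1.5, 0.0, 0.0), (300, "CA"): (30.0, 0.0, 0.0)}
model = {(161, "CA"): (0.0, 1.0, 0.0), (161, "CB"): (1.5, 1.0, 0.0), (300, "CA"): (30.0, 10.0, 0.0)}

local_rmsd = all_atom_rmsd(native, model, center=native[(161, "CA")])
print(f"all-atom RMSD around residue 161: {local_rmsd:.3f}")
```

The distant atom of residue 300 is badly placed in the toy model but does not influence the result, which is the purpose of restricting the calculation to the surroundings of the catalytic center.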



Figure19: iTasser model 1 for template 1qs0 superimposed on target 1U5B
Figure20: iTasser model 2 for template 1qs0 superimposed on target 1U5B
Figure21: iTasser model 3 for template 1qs0 superimposed on target 1U5B
Figure22: iTasser model 4 for template 1qs0 superimposed on target 1U5B
Figure23: iTasser model 5 for template 1qs0 superimposed on target 1U5B


Figure24: iTasser model 1 for template 2o1x superimposed on target 1U5B
Figure25: iTasser model 2 for template 2o1x superimposed on target 1U5B
Figure26: iTasser model 3 for template 2o1x superimposed on target 1U5B
Figure27: iTasser model 4 for template 2o1x superimposed on target 1U5B
Figure28: iTasser model 5 for template 2o1x superimposed on target 1U5B

Since the results discussed above are not conclusive, we have to look at the superpositions of the models with the structure of our protein. As in the previous analysis we start with the 1qs0 models. Figure 19 shows the superposition with model1: the model has the same structure as our protein but is shifted a bit. This observation agrees with the assumption that the model is quite good but not perfect. Figure 20 shows model2, which according to the scores is a really good model. There are indeed structural elements which can be superposed perfectly, but there are also parts which are shifted or cannot be superposed at all. So in this case the model seems to be not quite as good as the scores predicted. According to the scores, model3 (Figure 21) is not as good as the two models mentioned above. The visualization supports this, since there are many regions which are shifted or cannot be superposed at all. Model4, shown in Figure 22, has a slightly higher DaliLite RMSD than model3, and this difference can also be seen clearly in the picture. Model5 is predicted to be the worst of these models, because it has the worst scores. The superposition in Figure 23 confirms this, since no perfectly superposed structural element can be seen.
For the models with 2o1x as template the picture is much clearer. The calculated scores show that model2 is a very good one. To check this, we look at the superposition of model2 and the structure of BCKDHA in Figure 25. The overlay is not perfect and is shifted in most parts of the model, but the same structural elements are present and point in the same direction. The other four models (Figure 24, Figure 26, Figure 27, Figure 28) cannot be superposed with the structure of our protein. This observation supports the conclusion drawn above that these four models cannot be used, since they are too dissimilar to the structure of BCKDHA.


3DJigsaw is a server which builds protein models based on already predicted models for a specific target. It recombines the models and optimizes them.

Since we have only models for the low sequence-identity category we started it only once with the best models of this category. The following models were chosen to build a recombined model with 3DJigsaw because of their high TM score:

  • modeller model for template 1qs0
  • swissmodel model for template 1qs0
  • iTasser model 1 for template 1qs0
  • iTasser model 2 for template 1qs0
  • iTasser model 4 for template 1qs0



Figure29: Secondary structure prediction of 3D-Jigsaw

3D-Jigsaw also predicts mainly alpha helices for our protein with the help of the five previously built models. But as we can see in Figure 29, this tool predicts more beta sheets than Swissmodel or iTasser.

Numeric evaluation

Model 1 2 3 4 5
Energy -506.64 -506.52 -505.12 -500.75 -496.28
Coverage 1.00 1.00 1.00 1.00 1.00

3D-Jigsaw calculates the energy and the coverage for each predicted model. The coverage of every model is 1.00, which means that each model covers 100% of the target. The energies differ more between the five predicted models. The lower the energy the better, because a low energy indicates a stable model. The differences between the five energies are small, however. The first model has the lowest energy, but it is only minimally lower than that of the second or third model. The energy of model 4 is about 6 units higher than that of model 1 and the energy of model 5 about 10 units higher, so the first three models are slightly better than model 4 and 5. All in all, both the energy calculation and the coverage predict all five models to be good.

Comparison to experimental structure

Model RMSD (DaliLite) RMSD (sap) TM score
1 1.9 0.834 0.8627
2 1.9 0.834 0.8626
3 1.8 0.833 0.8631
4 2.1 0.869 0.8539
5 2.1 0.972 0.8545

To find out how good the models are, we calculated the RMSD with two different tools. Since a low RMSD indicates a good model, all five models seem to be very good. The RMSD calculated by DaliLite varies only between 1.8 and 2.1, which is a very small range. The same holds for the RMSD calculated by the sap command. Looking at both values in more detail, the first three models are, as in the energy calculation, slightly better than the last two models. Additionally we calculated the TM score for each model. All five models have a TM score higher than 0.5, which shows that all of them have a correctly predicted topology. Again there is a cut between model 3 and model 4, indicating that, as the other scores suggested, the first three models are a bit better.

all atom RMSD

model1 model2 model3 model4 model5
161 0.750 0.750 0.750 0.826 1.071
166 0.709 0.709 0.709 0.784 0.738
167 0.593 0.593 0.593 0.671 0.580

The calculated all-atom RMSD values confirm the results discussed above. The first three models are all equally good. Again there is a cut after model 3: model 4 and model 5 are mostly slightly worse than the first three models. All values are quite low, however, so all five models are good.


Figure 30: 3D-Jigsaw model 1 superimposed with the target 1U5B
Figure 31: 3D-Jigsaw model 2 superimposed with the target 1U5B
Figure 32: 3D-Jigsaw model 3 superimposed with the target 1U5B
Figure 33: 3D-Jigsaw model 4 superimposed with the target 1U5B
Figure 34: 3D-Jigsaw model 5 superimposed with the target 1U5B

To check our conclusion that the first three models are slightly better than the other two, but that all five models are very good, we looked at the superposition of each model with the experimental structure of BCKDHA. Figure 30 shows model 1. The prediction is very good, as most of the two structures superimpose almost perfectly. Some parts do not align, which is consistent with the scores, since these are not perfect either. The scores indicate that model 2 is also very similar to the real structure. This is supported by the superposition (Figure 31), since most of the protein is covered by the model. Again the superposition is not perfect, as expected. Figure 32 shows model 3, which the scores also predict to be very good. This is confirmed by the superposition, which is nearly perfect for most parts of the structure. As in the other two models there are some regions which could not be aligned, which was again expected. The superpositions of models 4 and 5 with the structure of BCKDHA (Figure 33 and Figure 34) reflect the cut between them and the first three models. The models still cover the real structure very well, but there seem to be more regions which cannot be aligned, or which are shifted between the two structures. The superpositions therefore support the results discussed above.

Comparison of the methods

Numerical Evaluation

The following tables list the RMSD and TM score values, which were computed before, to provide an overview of the performance of the different methods.


Modeller
Template 1qs0 2o1x Multi
RMSD (sap) 0.829 2.727 11.398
TM score 0.8504 0.1592 0.1719


Swissmodel
Template 1qs0 2o1x
RMSD (sap) 0.766 14.305
TM score 0.8771 0.1686


iTasser (template 1qs0)
model1 model2 model3 model4 model5
RMSD (sap) 0.869 0.834 0.940 0.880 0.984
TM score 0.8539 0.8627 0.8437 0.8523 0.8363

iTasser (template 2o1x)
model1 model2 model3 model4 model5
RMSD (sap) 2.671 1.056 2.354 2.840 3.123
TM score 0.5377 0.8598 0.4688 0.4904 0.4938


3D-Jigsaw
model1 model2 model3 model4 model5
RMSD (sap) 0.834 0.834 0.833 0.869 0.972
TM score 0.8627 0.8626 0.8631 0.8539 0.8545


To compare the predicted models with the experimental structure of our target, different scores (RMSD, TM score) were calculated. Based on these scores it is not easy to decide which tool is the best one. For the models built with 2o1x as template, which has the lowest sequence identity, iTasser clearly performed best, since its TM score is much higher than that of the other two programs for this template. For the models built with 1qs0 as template, however, all values are very close to each other. On closer inspection, Swissmodel has the highest TM score (0.8771) and also the lowest RMSD (0.766) of all methods, so Swissmodel can be considered the best tool for this template. Interestingly, Swissmodel is even more precise than 3D-Jigsaw, although 3D-Jigsaw worked with the best predictions of Modeller, Swissmodel and iTasser. A possible explanation is that both of our initial templates have a low sequence identity, so the input models may have contained too much incorrect information for 3D-Jigsaw to build a clearly better model out of them. It is important to note, however, that the differences between Modeller, Swissmodel, iTasser and 3D-Jigsaw are only minimal. We conclude that the similarity of the template is the limiting factor for the model prediction and determines which tool is the most useful one.
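The comparison for the 1qs0 template can be reproduced with a short Python sketch that ranks the methods by TM score. The values are taken from the tables above, using the best model of each method. The assignment of the unlabelled tables to Modeller and iTasser is an assumption inferred from the discussion, and the names `RESULTS_1QS0`, `rank_by_tm` and `tm_spread` are hypothetical.

```python
# (method, RMSD (sap), TM score) of the best 1qs0-based model per method.
# Assumption: the table with the "Multi" column belongs to Modeller and
# the table with five models per template belongs to iTasser.
RESULTS_1QS0 = [
    ("Modeller", 0.829, 0.8504),
    ("Swissmodel", 0.766, 0.8771),
    ("iTasser", 0.834, 0.8627),    # model 2
    ("3D-Jigsaw", 0.833, 0.8631),  # model 3
]


def rank_by_tm(results):
    """Sort by TM score (highest first); ties are broken by lower RMSD."""
    return sorted(results, key=lambda entry: (-entry[2], entry[1]))


def tm_spread(results):
    """Difference between the best and the worst TM score."""
    scores = [entry[2] for entry in results]
    return round(max(scores) - min(scores), 4)


if __name__ == "__main__":
    for method, rmsd, tm in rank_by_tm(RESULTS_1QS0):
        print(f"{method:<11} RMSD {rmsd:.3f}  TM {tm:.4f}")
    print("TM spread:", tm_spread(RESULTS_1QS0))
```

The small spread between the best and the worst TM score illustrates why the differences between the tools are described as minimal for this template.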


<references />
