Homology based structure predictions BCKDHA
Revision as of 13:32, 12 August 2011
!!! This task has to be re-done. The template used for the "good" category is a structure of BCKDHA itself, and when running iTasser the self hit was NOT excluded!!!
1. Calculation and evaluation of models
Template selection
Homology modelling is a technique to predict the three-dimensional structure of a target protein. It is based on an alignment of the target sequence with one or more template sequences whose structures are known. The target is modelled on the structure of the template(s), so the better the alignment, the better the predicted structure of the target. Template selection is therefore a crucial step in homology modelling.
To find structures similar to BCKDHA we ran HHsearch with the following command:
hhsearch -i query -d database -o output
It found the following 10 hits in the pdb70 database.
No | Hit | Prob | E-value | P-value | Score | SS | Cols | Query HMM | Template HMM | Identity |
---|---|---|---|---|---|---|---|---|---|---|
1 | 2bfd_A 2-oxoisovalerate dehydr | 1.0 | 1 | 1 | 791.3 | 0.0 | 400 | 1-400 | 1-400 (400) | 99% |
2 | 1qs0_A 2-oxoisovalerate dehydr | 1.0 | 1 | 1 | 571.5 | 0.0 | 349 | 32-382 | 52-407 (407) | 39% |
3 | 1w85_A Pyruvate dehydrogenase | 1.0 | 1 | 1 | 530.8 | 0.0 | 356 | 8-382 | 6-362 (368) | 34% |
4 | 1umd_A E1-alpha, 2-OXO acid de | 1.0 | 1 | 1 | 521.8 | 0.0 | 351 | 34-386 | 16-367 (367) | 37% |
5 | 2ozl_A PDHE1-A type I, pyruvat | 1.0 | 1 | 1 | 482.7 | 0.0 | 331 | 46-380 | 25-356 (365) | 27% |
6 | 3l84_A Transketolase; TKT, str | 1.0 | 1 | 1 | 85.4 | 0.0 | 133 | 161-297 | 113-252 (632) | 21% |
7 | 2r8o_A Transketolase 1, TK 1; | 1.0 | 1 | 1 | 74.5 | 0.0 | 121 | 161-285 | 113-245 (669) | 33% |
8 | 2o1x_A 1-deoxy-D-xylulose-5-ph | 1.0 | 1 | 1 | 74.2 | 0.0 | 127 | 161-287 | 122-254 (629) | 18% |
9 | 1gpu_A Transketolase; transfer | 1.0 | 1 | 1 | 74.2 | 0.0 | 140 | 161-302 | 115-265 (680) | 22% |
10 | 3m49_A Transketolase; alpha-be | 1.0 | 1 | 1 | 68.8 | 0.0 | 121 | 161-285 | 139-271 (690) | 31% |
Before working with these hits we have to check whether one of them is an experimental structure of BCKDHA itself. This is the case for 2bfd_A (99% identity), so this hit cannot be used as a template.
Without it, all remaining structures have a sequence identity below 40%.
Since only structures from this range are available, we chose two of them that differ in identity: one with 39% identity and one with 18% identity.
In the following we worked with 1qs0_A (39%) and with 2o1x_A (18%).
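The selection logic above (exclude the self hit, then choose among the remaining low-identity hits) can be sketched in Python. This is a minimal sketch, not the procedure we actually ran: the identity values are copied from the HHsearch table, while the 95% self-hit cutoff and the function name `usable_templates` are assumptions for illustration.

```python
# Hits from the HHsearch table above: (hit, sequence identity in %).
HITS = [
    ("2bfd_A", 99), ("1qs0_A", 39), ("1w85_A", 34), ("1umd_A", 37),
    ("2ozl_A", 27), ("3l84_A", 21), ("2r8o_A", 33), ("2o1x_A", 18),
    ("1gpu_A", 22), ("3m49_A", 31),
]

# Hypothetical cutoff: hits at or above this identity are treated as the
# target's own structure (self hit) and excluded as templates.
SELF_HIT_CUTOFF = 95


def usable_templates(hits, cutoff=SELF_HIT_CUTOFF):
    """Drop likely self hits and sort the remaining hits by decreasing identity."""
    kept = [(name, identity) for name, identity in hits if identity < cutoff]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


for name, identity in usable_templates(HITS):
    print(f"{name}: {identity}%")
```

Run on the table values, it lists 1qs0_A (39%) first and 2o1x_A (18%) last, i.e. the two ends of the range from which the two templates were picked.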
General information for the evaluation
A detailed description of how the created models were evaluated can be found in the Evaluation Protocol. The following section presents only the modelling and evaluation results.
Three useful scores for comparing the structural similarity of two structures are the Cα RMSD, the all-atom RMSD and the TM-score. These measures are commonly used to assess the accuracy of a model when the native structure is known. In the following, "RMSD" alone refers to the Cα RMSD.
The RMSD (root-mean-square deviation) is the square root of the mean squared distance between corresponding atoms of two superposed structures. The Cα RMSD uses only the aligned alpha-carbons.
The smaller the RMSD value, the better the predicted structure. A local error (e.g. misorientation of the tail) will result in a high RMSD value, although the global structure is correct.
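The definition above can be written as a minimal sketch, assuming the two structures have already been superposed and their atoms paired up; the function name `rmsd` is ours and not taken from any particular tool.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that coords_a[i]
    and coords_b[i] are the same atom (e.g. aligned C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


# Nine atoms match exactly, one is displaced by 10 Å (a local error).
native = [(float(i), 0.0, 0.0) for i in range(10)]
model = native[:9] + [(9.0, 10.0, 0.0)]
print(round(rmsd(native, model), 2))
```

With nine of ten atoms matching exactly and one displaced by 10 Å, the RMSD is already √10 ≈ 3.16 Å, which illustrates how a single local error inflates the value.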
The all-atom RMSD is calculated over all atoms of the residues within 6 Å of the active sites. As with the Cα RMSD, lower values indicate better models.
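A sketch of the local all-atom RMSD, under stated assumptions: structures are plain dictionaries `{residue_number: {atom_name: (x, y, z)}}` (a hypothetical layout, not a PDB parser), the region is selected in the native structure as all residues with any atom within 6 Å of any atom of the catalytic residue, and model and native are already superposed with identical numbering and atom names.

```python
import math

# Hypothetical structure layout: {residue_number: {atom_name: (x, y, z)}}.


def residues_near(structure, centre_residue, radius=6.0):
    """Residues with at least one atom within `radius` Å of any atom of centre_residue."""
    centre_atoms = structure[centre_residue].values()
    return sorted(
        res for res, atoms in structure.items()
        if any(math.dist(a, c) <= radius for a in atoms.values() for c in centre_atoms)
    )


def local_all_atom_rmsd(native, model, centre_residue, radius=6.0):
    """All-atom RMSD over the region around centre_residue, selected in the native.

    Atoms that are missing from the model are skipped.
    """
    pairs = []
    for res in residues_near(native, centre_residue, radius):
        for name, xyz in native[res].items():
            if res in model and name in model[res]:
                pairs.append((xyz, model[res][name]))
    if not pairs:
        raise ValueError("no matching atoms in the selected region")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in pairs) / len(pairs))
```

Because only the neighbourhood of the catalytic residue enters the sum, a badly modelled region far away from the active site does not affect this score.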
Because the RMSD is sensitive to local errors, the TM-score was proposed.
The TM-score weights close matches more strongly than distant ones, which reduces the influence of local errors. A TM-score < 0.5 indicates a model with essentially random structural similarity, whereas 0.5 < TM-score ≤ 1.0 means that the two structures share roughly the same fold, i.e. the predicted model has a correct topology.
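The weighting can be made concrete with the usual TM-score formula, TM = (1/L) Σ 1/(1 + (dᵢ/d₀)²) with d₀ = 1.24·(L − 15)^(1/3) − 1.8, where L is the length of the target and dᵢ the distance of the i-th aligned residue pair. This sketch scores one given superposition only; the TM-score program additionally searches for the superposition that maximises the value. Clamping d₀ to 0.5 Å for very short chains is an assumption modelled on common implementations.

```python
def tm_d0(target_length):
    """Distance scale d0 (Å) of the TM-score for a target of the given length."""
    if target_length <= 15:
        return 0.5  # assumption: clamp for very short chains
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (Å) between aligned residue pairs after superposition.
    Dividing by the target length means unaligned residues contribute zero.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Each residue contributes at most 1, so a residue displaced by 20 Å adds almost nothing instead of dominating the average as it does in the RMSD.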
Modeller
MODELLER is used for homology or comparative modelling of protein three-dimensional structures. It calculates a model containing all non-hydrogen atoms. There are also many other features provided by MODELLER like de novo modelling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases, comparison of protein structures, and so on.[1]
Tutorials are available at [2] and [3].
To run MODELLER with more than one template we used the following templates (the percentages give the sequence identity to the target):
- 3m49:A (31%)
- 2r8o:A (33%)
- 2o1x:A (18%)
- 1w85:A (34%)
- 1qs0:A (39%)
Results
Numeric evaluation
template | molpdf | DOPE score | GA341 score |
---|---|---|---|
1QS0_A | 2650.7 | -40503.1 | 1.000 |
2O1X_A | 2958.0 | -30294.5 | 0.419 |
3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A | 123913.8 | -19573.7 | 0.001 |
The DOPE (Discrete Optimized Protein Energy) score is used to assess homology models; the lower the DOPE score, the better the model. This is reflected in our three models. The 1qs0 model, whose template has the highest sequence identity of the three (39%), has by far the lowest DOPE score. The model based on 2o1x has a higher score, which is reasonable since 2o1x shares only 18% sequence identity with the target. This shows that, although both templates come from the low-identity range, there are noticeable differences between them.
The last model was built from five different templates. Combining several templates usually helps MODELLER, because information from more structures can yield a more precise model than a single template. Here, however, none of the combined templates has a high sequence identity, so the additional information was not helpful. This is reflected in the scores: this model has the highest DOPE score.
So, as expected, the model with 1qs0 as template is the best of the three.
GA341 is used to decide whether a model is reliable. Good models have a score close to 1, and a score below 0.6 indicates a bad model. This is also reflected in our results. The model with 1qs0 as template appears to be very good, with a GA341 score of 1.0. This is somewhat surprising, since its sequence identity to our protein is only 39%; although its DOPE score is also good, the maximal GA341 score seems too optimistic for a template with this identity. The other two models score below 0.6, which marks both of them as bad. Interestingly, the five-template model scores only 0.001, which seems too low: the average sequence identity of its templates (about 31%) is higher than that of 2o1x, whose model scores 0.419. Overall, the 1qs0 model is the most accurate one.
In summary, although all templates have a low sequence identity, both scores rank the model with 1qs0 as template as the best one.
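The two criteria used above (rank by DOPE, reject models with GA341 below 0.6) can be sketched with the values from the MODELLER table. This is an illustration only; the data layout and function names are ours, not part of MODELLER.

```python
# Scores copied from the MODELLER table above: (template(s), DOPE, GA341).
MODELS = [
    ("1QS0_A", -40503.1, 1.000),
    ("2O1X_A", -30294.5, 0.419),
    ("3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A", -19573.7, 0.001),
]

GA341_CUTOFF = 0.6  # threshold used in the text: below 0.6 counts as a bad model


def rank_by_dope(models):
    """Sort models from best (lowest DOPE score) to worst."""
    return sorted(models, key=lambda m: m[1])


def passes_ga341(model, cutoff=GA341_CUTOFF):
    """True if the model's GA341 score reaches the cutoff."""
    return model[2] >= cutoff


for name, dope, ga341 in rank_by_dope(MODELS):
    print(name, dope, "ok" if passes_ga341((name, dope, ga341)) else "bad")
```

Sorting by DOPE puts the 1QS0_A model first, and it is also the only model that passes the 0.6 GA341 cutoff.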
Comparison to experimental structure
experimental structure | model with template | RMSD (DaliLite) | RMSD (sap) | TM-score |
---|---|---|---|---|
1U5B_A | 1QS0_A | 2.3 | 0.829 | 0.8504 |
1U5B_A | 2O1X_A | 3.5 | 2.727 | 0.1592 |
1U5B_A | 3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A | no score | 11.398 | 0.1719 |
The Cα RMSD measures the average deviation in distance between aligned alpha-carbons; the higher the value, the worse the model. The first model, with 1qs0 as template, has an RMSD of 2.3 Å (DaliLite) or 0.829 Å (sap). Both values are lower than those of the other two models. Since a low value indicates a good model, this model is the best of the three according to the RMSD.
The RMSD of the 2o1x model is higher (3.5 Å with DaliLite, 2.727 Å with sap), but not dramatically so, which suggests that by this measure it is not much worse than the first model. For the third model, DaliLite could not calculate an RMSD at all, because it found no significant similarity between the two structures. This dissimilarity is also reflected in the sap RMSD of 11.398 Å, which is very high compared with the other two sap values. One explanation for this poor result could be that too much misleading information from the low-identity templates entered the modelling process.
Comparing the TM-scores, only one model scores above 0.5 and thus has the correct fold. The 1qs0 model has a TM-score of 0.8504 and is therefore a good model, whereas the other two models have TM-scores of only 0.1592 and 0.1719, far below 0.5, indicating that both models are not useful.
all-atom RMSD
position | 1qs0 | 2o1x | multi |
---|---|---|---|
161 | 0.332 | 6.172 | 2.607 |
166 | 0.668 | 3.697 | 3.208 |
167 | 0.656 | 6.759 | 7.962 |
Additionally, we calculated the all-atom RMSD at the three catalytic centers for the three models. As with all the scores above, the model with 1qs0 as template is the best one: it has the lowest all-atom RMSD at all three catalytic centers. Comparing the other two models gives a mixed picture. At the first catalytic center (161), the 2o1x model is much worse than the five-template model (6.172 vs. 2.607). At the second center (166) it is only slightly worse (3.697 vs. 3.208), and at the third center (167) it is better (6.759 vs. 7.962). Therefore, the all-atom RMSD values alone cannot decide whether the second or the third model is the better one.
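To make the pairwise comparison of the two weaker models explicit, the values from the table above can be compared position by position. This is only an illustration; the dictionary layout and the function name `better_of` are ours.

```python
# All-atom RMSD values from the table above, per catalytic position.
ALL_ATOM_RMSD = {
    161: {"1qs0": 0.332, "2o1x": 6.172, "multi": 2.607},
    166: {"1qs0": 0.668, "2o1x": 3.697, "multi": 3.208},
    167: {"1qs0": 0.656, "2o1x": 6.759, "multi": 7.962},
}


def better_of(position, model_a, model_b, table=ALL_ATOM_RMSD):
    """Return whichever of the two models has the lower all-atom RMSD at this position."""
    values = table[position]
    return model_a if values[model_a] <= values[model_b] else model_b


for position in sorted(ALL_ATOM_RMSD):
    print(position, better_of(position, "2o1x", "multi"))
```

It prints "multi" for positions 161 and 166 and "2o1x" for position 167, which is why this score alone does not settle which of the two models is better.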
Superposition
All the scores above identify the model with 1qs0 as template as the best model, and the visualization (Figure 1) confirms this. In particular, most of the alpha helices are well aligned, although some are not. The two structures are not perfectly aligned, however: on the left and right side of the superposition there are two regions that do not align at all. Overall, the model appears compatible with our protein, especially in comparison with the two other models. The 2o1x model (Figure 2) has no aligned regions, so the superposition looks like two completely different structures. This impression is supported by the scores above, which all show that the model with 2o1x as template does not fit well. The same applies to the third model: in Figure 3 there is again no match between the two structures and therefore no aligned region. This result was to be expected from the bad evaluation scores.
SWISS-MODEL
To find protein structure homology models SWISS-MODEL can be used. SWISS-MODEL is a fully automated protein structure homology-modeling server and is accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).
It provides three different modelling modes:
- Automated Mode
- Alignment Mode
- Project Mode
The Automated Mode is fully automated and can therefore only be used when the template is very similar to the target.<ref>http://swissmodel.expasy.org/?pid=smd03&uid=&token=</ref>
As input for the automated mode, only the amino acid sequence (raw or FASTA format) or the UniProt AC of the target is required, as shown in Figure 4. Optionally, a template PDB ID can be given. If no template is given, SWISS-MODEL automatically selects suitable templates from a BLAST search based on their E-values.
The Alignment Mode has to be used for templates with a low sequence identity. Since all our hits have an identity below 40%, we used this mode.
Results
Numeric evaluation
Global Model Quality Estimation
score | 1qs0 | 2o1x |
---|---|---|
QMEANscore4 | 0.57 | 0.18 |
QMEAN Z-Score | -3.28 | -9.89 |
Additional information about the QMEAN score
The QMEANscore4 is used to compare whole models. It ranges between 0 and 1, and higher values indicate better model quality. The 1qs0 model has a much higher QMEANscore4 than the 2o1x model and is therefore clearly the better of the two. Even so, it is not very good: its QMEANscore4 of 0.57 is far from the maximum of 1. The 2o1x model, with a score of only 0.18, appears to be unusable.
The QMEAN Z-score represents the absolute quality of a model: low-quality models have strongly negative Z-scores. Figure 5 and Figure 6 show that the QMEAN Z-scores of both models are negative and lie below the black or grey graph shown in the figures. The negative scores indicate that neither model is of top quality. Comparing the values directly, the 1qs0 model scores -3.28 and the 2o1x model -9.89, so the first model is clearly better than the second. Both values can be read from the plots mentioned above.
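The idea behind a Z-score can be illustrated with a standard score: the raw value is expressed in standard deviations relative to a reference distribution (for QMEAN, a reference set of experimental structures of similar size). The reference numbers below are hypothetical, and SWISS-MODEL's exact reference set and normalisation are not reproduced here.

```python
import statistics


def z_score(raw_score, reference_scores):
    """Express raw_score in standard deviations relative to reference_scores."""
    mean = statistics.mean(reference_scores)
    sd = statistics.stdev(reference_scores)
    return (raw_score - mean) / sd


# Hypothetical reference scores of experimental structures of similar size.
reference = [0.6, 0.7, 0.8]
print(round(z_score(0.5, reference), 2))
```

A raw score 0.2 below a reference mean with a spread of 0.1 ends up at about -2; a value such as -9.89 therefore places a model far outside the range covered by the reference structures.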
[Figure panels: 1qs0 | 2o1x]
The plots in Figure 7 and Figure 9 show SWISS-MODEL's per-residue confidence for the two models. For the model with 1qs0 as template (Figure 7), the program is uncertain only about the beginning and the end of the model, plus one peak in the middle indicating residues that are not modeled with full certainty. For the rest of the model it predicts a low probability of inaccuracy. The plot of the 2o1x model (Figure 9) is the opposite: there are many very high peaks in the middle of the protein, so the program predicts a high inaccuracy for the middle part, which is more important than the ends. A model that is wrong in its middle part is of little use, so these peaks suggest that this model is useless.
The same conclusion follows from Figure 8 and Figure 10, which show the 1qs0 and the 2o1x model coloured by this confidence. Blue regions indicate high confidence, and red regions are most likely wrong. The model with 1qs0 as template has only a few red parts, mainly at the end of the protein, while the center is coloured blue or green, indicating that these parts are probably correct. In contrast, the 2o1x model is almost completely red, which supports the assertion that this model is useless because nearly all of it is predicted to be possibly wrong.
Local Model Quality Estimation: Anolea / QMEAN
[Figure panels: 1qs0 | 2o1x]
For the local model quality estimation we chose the ANOLEA potential. ANOLEA performs energy calculations on a protein chain; the y-axis of the plot shows the energy of each amino acid. Negative energy values (green) represent a favourable energy environment for a given amino acid, whereas positive values (red) represent an unfavourable one.
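To read such an energy profile programmatically, one could look for runs of consecutive residues with positive (unfavourable) energy. This sketch assumes the per-residue energies are already available as a list of numbers; the function name, the run-length cutoff and the example values are hypothetical and not part of ANOLEA's output.

```python
def unfavourable_stretches(energies, min_length=5):
    """Return (start, end) positions (1-based, inclusive) of runs of at least
    min_length consecutive residues with positive energy."""
    stretches = []
    start = None
    for i, energy in enumerate(energies, start=1):
        if energy > 0:
            if start is None:
                start = i  # a new unfavourable run begins here
        else:
            if start is not None and i - start >= min_length:
                stretches.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(energies) - start + 1 >= min_length:
        stretches.append((start, len(energies)))
    return stretches


# Hypothetical per-residue energies.
print(unfavourable_stretches([1, 1, 1, -1, 2, 2, 2, 2, 2, -3, 1], min_length=3))
```

Reporting only longer runs mirrors the reading of the plots above: isolated red bars matter less than a continuous red block in the middle of the chain.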
Both plots contain many red parts, so neither model is likely to be completely correct. Analysed separately, however, the energy profile of the 1qs0 model (Figure 11) contains some green parts, i.e. favourable energy environments in the center of the protein, and the region with a really bad energy environment lies only at the beginning of the protein. So this model may be correct in the important middle part. The plot for the 2o1x model (Figure 12) is completely red, not only at the beginning but also in the important middle part, which suggests that this model is probably not useful.
Comparison to experimental structure
experimental structure | model with template | RMSD (DaliLite) | RMSD (sap) | TM score |
---|---|---|---|---|
1U5B_A | 1QS0_A | 3.4 | 0.766 | 0.8771 |
1U5B_A | 2O1X_A | 3.3 | 14.305 | 0.1686 |
The RMSD measures the average deviation in distance between aligned alpha-carbons; the higher the value, the worse the model. We calculated the RMSD with two different programs. Their results usually differ in value but show the same trend; here they do not. According to DaliLite (table above), the RMSD of the 1qs0 model is 0.1 Å higher than that of the 2o1x model, so the 2o1x model appears slightly better. The sap RMSDs give a completely different result: 0.766 Å for the 1qs0 model and 14.305 Å for the 2o1x model, which makes the 1qs0 model much better. Since the difference between the sap values is much larger than the 0.1 Å difference from DaliLite, we conclude that the model with 1qs0 as template is the better one. To check this, we analysed the TM-score; a TM-score above 0.5 indicates a good model. The 2o1x model scores only 0.1686, so it is a bad model, whereas the 1qs0 model scores 0.8771 and is a good model. The TM-score therefore supports the RMSD-based conclusion that the 1qs0 model is the better one.
all atom RMSD
position | 1QS0_A | 2O1X_A |
---|---|---|
161 | 0.337 | 3.258 |
166 | 0.585 | 1.028 |
167 | 0.594 | 1.309 |
In addition to the scores above, we calculated the all-atom RMSD at the three catalytic centers for both models. The result is unambiguous: at all three catalytic centers the model with 1qs0 as template has much lower values, and low values indicate good models. The considerably higher values of the 2o1x model, especially at position 161 (3.258 vs. 0.337), indicate that this model is of little use.
Superposition
The RMSD, TM-score and all-atom RMSD all indicate that the model with 1qs0 as template is the better one and that the other model is of little use. To check these conclusions, we superposed both models with the structure of our protein. The superposition of the 1qs0 model (Figure 13) shows only a few regions where the two structures overlap completely; most of the model is shifted slightly, so that the secondary structure elements lie next to each other. This shows that the model is an approximation of the structure, but not a perfect one. The superposition of the 2o1x model (Figure 14) fully reflects the conclusion drawn from the scores: there is no region that superposes well, and it looks like a superposition of two completely different structures.
To summarize the numeric evaluation and the comparison to the experimental structure: the model based on 2o1x cannot be used for further analysis, since it shows no similarity to our protein. The 1qs0 model is not bad; its scores indicate a good model, and the visualization shows that it has the same overall structure, although slightly shifted. We can therefore work with this model, keeping in mind that results based on it will not be completely accurate.
iTasser
Numeric evaluation
C-score
model | 1qs0 | 2o1x |
---|---|---|
model1 | 1.174 | -0.150 |
model2 | -0.190 | -1.276 |
model3 | -0.718 | -1.863 |
model4 | 0.200 | -2.155 |
model5 | -5 | -3.208 |
The C-score is I-TASSER's measure of confidence in a predicted model. It ranges from -5 to 2, and a higher C-score signifies a model with higher confidence. We first analysed the five models with 1qs0 as template. Model1 has a score of 1.174, which is high on this scale, so its quality seems to be good. The only other model with a positive score is model4 (0.200); this is not as high as the score of model1, but positive enough to consider it a good model as well. Model2 has a slightly negative score of -0.190, which is still far above the minimum of -5, so it can also be considered a good model. Model3 has a C-score of -0.718, lower than the other three, which indicates that this model may be less reliable. The last model is remarkable: while the other models have reasonable scores, model5 has the worst possible score of -5, so it is clearly useless. None of the C-scores of the models with 2o1x as template is positive, which suggests that these five models are not very good. The best of them is model1 with a score of -0.150. The other four models range from -1.276 to -3.208 and therefore cannot be considered good. In summary, only model1 and model4 with 1qs0 as template have positive C-scores, so these are the most reliable of the ten models.
Comparison to experimental structure
No | 1qs0 RMSD (DaliLite) | 1qs0 RMSD (sap) | 1qs0 TM-score | 2o1x RMSD (DaliLite) | 2o1x RMSD (sap) | 2o1x TM-score |
---|---|---|---|---|---|---|
1 | 2.2 | 0.869 | 0.8539 | 3.3 | 2.671 | 0.5377 |
2 | 1.9 | 0.834 | 0.8627 | 1.6 | 1.056 | 0.8598 |
3 | 2.1 | 0.940 | 0.8437 | 3.0 | 2.354 | 0.4688 |
4 | 2.2 | 0.880 | 0.8523 | 4.0 | 2.840 | 0.4904 |
5 | 2.4 | 0.984 | 0.8363 | 3.3 | 3.123 | 0.4938 |
The RMSD measures the average deviation in distance between aligned alpha-carbons; the higher the value, the worse the model. We calculated the RMSD with two different programs so that an anomalous result from one of them can be detected and the two values can be compared. The other score is the TM-score: a value between 0.5 and 1.0 indicates that the predicted model has the correct topology. First we look only at the models with 1qs0 as template. Strikingly, all five models have nearly the same values, whichever RMSD is considered; the scores differ only minimally. Looking at the sap RMSD in more detail, model3 and model5 have slightly higher values, but the difference is not significant. So all five models have a well-predicted structure. The TM-scores give the same result: all five are very similar and all are higher than 0.5. Considering both the RMSD and the TM-score, we conclude that these five models are all well predicted and that there is hardly any difference between them.
Next we analysed the models with 2o1x as template. Here the scores vary more. Only model2 seems to be a good model: it has low RMSD values and a TM-score well above 0.5. The only other model with a TM-score above 0.5 is model1, but its RMSD values are rather high compared with model2. Model3, 4 and 5 all have high RMSD values, which indicates less accurate predictions; in addition, their TM-scores are below 0.5, so their topology is possibly not correct.
Overall, the five models built with 1qs0 are all good and useful, while of the five models built with 2o1x only model2 seems to be well predicted and useful.
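The 0.5 threshold applied above can be checked mechanically against the TM-scores in the table. This is a small illustration; the dictionary layout and the function name `same_fold_models` are ours.

```python
# TM-scores of the I-TASSER models from the table above (model number -> TM-score).
TM_SCORES = {
    "1qs0": {1: 0.8539, 2: 0.8627, 3: 0.8437, 4: 0.8523, 5: 0.8363},
    "2o1x": {1: 0.5377, 2: 0.8598, 3: 0.4688, 4: 0.4904, 5: 0.4938},
}


def same_fold_models(scores, threshold=0.5):
    """Model numbers whose TM-score exceeds the 'same fold' threshold."""
    return sorted(number for number, tm in scores.items() if tm > threshold)


for template, scores in TM_SCORES.items():
    print(template, same_fold_models(scores))
```

For the 1qs0 set all five models pass; for the 2o1x set only model1 and model2 do, with model1 (0.5377) only just above the threshold.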
all atom RMSD
model | 1qs0 pos. 161 | 1qs0 pos. 166 | 1qs0 pos. 167 | 2o1x pos. 161 | 2o1x pos. 166 | 2o1x pos. 167 |
---|---|---|---|---|---|---|
1 | 0.739 | 0.826 | 0.786 | 1.009 | 1.542 | 1.807 |
2 | 0.700 | 0.759 | 0.590 | 0.592 | 0.771 | 0.581 |
3 | 1.177 | 0.786 | 0.844 | 2.363 | 4.685 | 5.078 |
4 | 0.906 | 0.852 | 0.989 | 0.798 | 1.211 | 2.984 |
5 | 0.739 | 0.926 | 0.830 | 1.609 | 1.174 | 3.539 |
To calculate the all-atom RMSD within a 6 Å radius of the catalytic center, we first had to locate the catalytic center. There are three catalytic positions, 161, 166 and 167, and we calculated the RMSD for each of them. We start with the 1qs0 models. The five models differ, although all of them have good values. In more detail, model2 has the lowest value at every position and is therefore the most accurate one. Model1 also has good values, but not as good as those of model2. The other three models still have good values, but mostly somewhat higher than those of model1 and model2.
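The statement that one model has the lowest value at every position can be checked directly against the table. This is a sketch; the data layout and the function name `best_at_every_position` are ours.

```python
# All-atom RMSD of the I-TASSER models at positions 161, 166 and 167 (table above).
RMSD_1QS0 = {
    1: (0.739, 0.826, 0.786),
    2: (0.700, 0.759, 0.590),
    3: (1.177, 0.786, 0.844),
    4: (0.906, 0.852, 0.989),
    5: (0.739, 0.926, 0.830),
}
RMSD_2O1X = {
    1: (1.009, 1.542, 1.807),
    2: (0.592, 0.771, 0.581),
    3: (2.363, 4.685, 5.078),
    4: (0.798, 1.211, 2.984),
    5: (1.609, 1.174, 3.539),
}


def best_at_every_position(table):
    """Return the model with the lowest value at every position, or None if no single model has."""
    n_positions = len(next(iter(table.values())))
    winners = {min(table, key=lambda m: table[m][i]) for i in range(n_positions)}
    return winners.pop() if len(winners) == 1 else None


print(best_at_every_position(RMSD_1QS0), best_at_every_position(RMSD_2O1X))
```

For the 1qs0 models it returns model 2, in line with the text; applied to the 2o1x values it also returns model 2.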
Superposition
1qs0
2o1x
3DJigsaw
3DJigsaw is a server which builds protein models based on already predicted models for a specific target. It recombines the models and optimizes them.
We started 3DJigsaw for different categories of sequence identity. The first category used models created by modeller, Swissmodel and iTasser for the 2bfd template. The second Jigsaw run recombined models for a template with low sequence identity (2r8o).
high sequence-identity category:
The following models were chosen to build a recombined model with 3DJigsaw:
- modeller model for template 2bfd
- modeller model for multiple templates
- swissmodel model for template 2bfd
- iTasser model 1 for template 2bfd
- iTasser model 3 for template 2bfd
As the predicted models have quite bad TM-scores (around 0.3), another 3DJigsaw run was started using the five iTasser models for 2bfd as input. The first run was not evaluated further, as the new results were expected to be better. The following models were chosen to build a better recombined model with 3DJigsaw:
- iTasser model 1 for template 2bfd
- iTasser model 2 for template 2bfd
- iTasser model 3 for template 2bfd
- iTasser model 4 for template 2bfd
- iTasser model 5 for template 2bfd
low sequence-identity category: The following models were chosen to build recombined models with 3DJigsaw (inferred from models created with templates with low sequence identity):
- iTasser model 1 for template 2r8o
- iTasser model 2 for template 2r8o
- iTasser model 3 for template 2r8o
- iTasser model 4 for template 2r8o
- iTasser model 5 for template 2r8o
These models were chosen because of their high TM-score.
Prediction for the high-sequence identity-category
AA: SSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKL
Pred: CCCCCCCCCCCCCHHHHHHHCCCCHHHCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHH
Conf: 987768999799968982200028344168867999989985867653479988999999
AA: YKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLM
Pred: HHHHHHHHHHHHHHHHHHHCCCEEEEECCCCHHHHHHHHHHHCCCCCEEEEECCCHHHHH
Conf: 999999999999999999879869982125848999999973586879997303389999
AA: YRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKR
Pred: HCCCCHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCEECCCCHHHHHHHHHHHHHHHHHH
Conf: 879998999999707788888999872123446778104620467559999999999996
AA: ANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIA
Pred: CCCCCEEEEEECCHHHHCCHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCHHHHH
Conf: 599818999986427738849999999986399889999888811246765456806899
AA: ARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAY
Pred: HHHHHCCCCEEEECCCCHHHHHHHHHHHHHHHHHCCCCEEEEEEECCCCCCCCCCCCCCC
Conf: 999863995999977189999999999999998539989999953378787899996628
AA: RSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNP
Pred: CCHHHHHHHHHCCCHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCH
Conf: 998999998853997999999999878999899999999999999999999971688898
AA: NLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred: HHHHHHHHHHCCHHHHHHHHHHHHHHHHCCCCCCHHHHCC
Conf: 9999989774898999999999999987775578658709
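The Pred and Conf rows follow the PSIPRED-style convention: H = helix, E = strand, C = coil, and one confidence digit from 0 (low) to 9 (high) per residue. A small sketch for summarising such a block, assuming it is available as plain text; the function names are hypothetical.

```python
from collections import Counter


def parse_prediction(text):
    """Join the AA, Pred and Conf rows of a block like the one above into three strings."""
    rows = {"AA": [], "Pred": [], "Conf": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in rows:
            rows[key.strip()].append(value.strip())
    joined = {key: "".join(parts) for key, parts in rows.items()}
    if not (len(joined["AA"]) == len(joined["Pred"]) == len(joined["Conf"])):
        raise ValueError("AA, Pred and Conf rows have different lengths")
    return joined


def composition(pred):
    """Fraction of positions predicted as helix (H), strand (E) and coil (C)."""
    counts = Counter(pred)
    return {state: counts[state] / len(pred) for state in "HEC"}


def mean_confidence(conf):
    """Average of the 0-9 confidence digits."""
    return sum(int(digit) for digit in conf) / len(conf)
```

Pasting the full block into `parse_prediction` and passing the result to `composition` gives the helix, strand and coil fractions of the whole chain; the length check catches rows that were truncated when copying.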
Prediction for the low-sequence identity-category
AA: SSLDDKPQFPGASAEFIDKLEFIQPNVISGIPIYRVMDRQGQIINPSEDPHLPKEKVLKL
Pred: CCCCCCCCCCCCCHHHHHHHCCCCHHHCCCCCEEEEECCCCCCCCCCCCCCCCHHHHHHH
Conf: 987768999799968982200028344168867999989985867653479988999999
AA: YKSMTLLNTMDRILYESQRQGRISFYMTNYGEEGTHVGSAAALDNTDLVFGQYREAGVLM
Pred: HHHHHHHHHHHHHHHHHHHCCCEEEEECCCCHHHHHHHHHHHCCCCCEEEEECCCHHHHH
Conf: 999999999999999999879869982125848999999973586879997303389999
AA: YRDYPLELFMAQCYGNISDLGKGRQMPVHYGCKERHFVTISSPLATQIPQAVGAAYAAKR
Pred: HCCCCHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCEECCCCHHHHHHHHHHHHHHHHHH
Conf: 879998999999707788888999872123446778104620467559999999999996
AA: ANANRVVICYFGEGAASEGDAHAGFNFAATLECPIIFFCRNNGYAISTPTSEQYRGDGIA
Pred: CCCCCEEEEEECCHHHHCCHHHHHHHHHHHCCCCEEEEEECCCCCCCCCCCCCCCHHHHH
Conf: 599818999986427738849999999986399889999888811246765456806899
AA: ARGPGYGIMSIRVDGNDVFAVYNATKEARRRAVAENQPFLIEAMTYRIGHHSTSDDSSAY
Pred: HHHHHCCCCEEEECCCCHHHHHHHHHHHHHHHHHCCCCEEEEEEECCCCCCCCCCCCCCC
Conf: 999863995999977189999999999999998539989999953378787899996628
AA: RSVDEVNYWDKQDHPISRLRHYLLSQGWWDEEQEKAWRKQSRRKVMEAFEQAERKPKPNP
Pred: CCHHHHHHHHHCCCHHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHHHCCCCCH
Conf: 998999998853997999999999878999899999999999999999999971688898
AA: NLLFSDVYQEMPAQLRKQQESLARHLQTYGEHYPLDHFDK
Pred: HHHHHHHHHHCCHHHHHHHHHHHHHHHHCCCCCCHHHHCC
Conf: 9999989774898999999999999987775578658709
2. Evaluation of models
General
A detailed description of how the created models were evaluated can be found in the Evaluation Protocol. The following section presents only the modelling and evaluation results.
Two interesting score when comparing two structures for their structural similarity are the RMSD and the TM-Score. These are two measures which are usually used to measure the accuracy of modelling a structure when the native structure is known.
The RMSD (root-mean-square deviation) is the square root of the mean squared distance between corresponding atoms of two superimposed structures. The C-alpha RMSD considers only the aligned alpha-carbons. The smaller the RMSD value, the better the predicted structure. Because every deviation is squared, a local error (e.g. a misoriented tail) can result in a high RMSD value even though the global structure is correct.
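A minimal Python sketch of the RMSD definition, assuming the two structures have already been superimposed (e.g. by DaliLite or PyMOL) and that the atoms are already paired one-to-one. The coordinates are hypothetical; they show how a single displaced residue raises the RMSD of an otherwise identical structure.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in Å


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long lists of
    corresponding atom coordinates that are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical C-alpha traces: identical except that the last residue
# of the native structure is shifted by 3 Å.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 3.0, 0.0)]
print(round(rmsd(model, native), 2))  # 1.5
```

Real tools additionally compute the optimal superposition (e.g. with the Kabsch algorithm) before measuring the deviations; that step is left out here.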
Because the RMSD is sensitive to local errors, the TM-score was proposed. The TM-score weights close matches more strongly than distant matches, which overcomes the local error problem. A TM-score below 0.5 indicates that the two structures most likely do not share the same fold (scores below about 0.17 correspond to random structural similarity), whereas 0.5 < TM-score ≤ 1 means that the two compared structures have about the same fold and the predicted model therefore has a correct topology.
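The weighting can be sketched in Python with the standard TM-score formula, TM = (1/L) · Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, where L is the target length and d_i the distance of the i-th aligned residue pair. This is only a sketch for one fixed superposition: the real TM-score program searches for the superposition that maximises the score, and clamping d0 to 0.5 Å for short chains is an assumption following common practice. The function names and distances are hypothetical.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (Å) of the TM-score.

    For very short chains the formula is negative or undefined,
    so d0 is clamped to 0.5 Å here (an assumption)."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE given superposition.

    `distances` holds the distances (Å) between aligned residue pairs;
    target residues without an aligned partner contribute 0. The result
    is a lower bound for the optimised TM-score."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical 100-residue target: 90 residues deviate by 1 Å,
# a misplaced 10-residue tail deviates by 15 Å.
distances = [1.0] * 90 + [15.0] * 10
print(round(tm_score_fixed(distances, 100), 2))  # 0.84
```

For the same distances the RMSD would be about 4.8 Å, because the squared 15 Å deviations of the tail dominate the mean, while the TM-score stays high.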
Modeller
Numeric evaluation
template | molpdf | DOPE score | GA341 score |
---|---|---|---|
2R8O | 11049.43 | -7610.51 | 0.00000 |
2BFD | 2247.36 | -41979.05 | 1.00000 |
1DTW, 1GPU, 2BFB, 2BFD, 2BFE, 2O1X, 2R8O | 13873.63 | -43399.59 | 1.00000 |
The DOPE (Discrete Optimized Protein Energy) score is used to assess homology models. The lower the DOPE score, the better the model. This is also reflected by our three models. The first one (2r8o), which is based on the template with the worst sequence identity, has a comparatively high DOPE score. The model built with 2bfd as template has a very low score, which is reasonable since 2bfd has a very high sequence identity to our target. Interestingly, the model built with 7 templates has a slightly lower DOPE score (-43399.59) than the one built only with 2bfd (-41979.05), but a much higher molpdf value (13873.63 vs. 2247.36). The higher molpdf may be explained by the influence of the templates that have a low sequence identity with 1u5b.
GA341 is calculated to decide whether a model is reliable. A good model has a score near one, whereas a model with a score lower than 0.6 is considered bad. This is also reflected by our results. The model with 2r8o as template is not a good model: the sequence identity was low, the DOPE score is quite high and the GA341 score is 0, which shows that it is a really bad model. The other two models have a GA341 score of one, which shows that they are good models.
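A small Python sketch that applies the two rules from the text to the values of the MODELLER table: rank the models by DOPE (lower is better) and flag models with a GA341 score below 0.6. The dictionary layout and function names are illustrative only and not part of MODELLER. Since DOPE is not normalised for model size, such a ranking is most meaningful for models of the same target sequence.

```python
# DOPE and GA341 values copied from the MODELLER table above.
MODELS = {
    "2R8O": {"dope": -7610.51, "ga341": 0.0},
    "2BFD": {"dope": -41979.05, "ga341": 1.0},
    "multi (7 templates)": {"dope": -43399.59, "ga341": 1.0},
}

# Cut-off stated in the text: GA341 below 0.6 indicates a bad model.
GA341_CUTOFF = 0.6


def rank_by_dope(models: dict) -> list:
    """Return model names sorted from lowest (best) to highest DOPE score."""
    return sorted(models, key=lambda name: models[name]["dope"])


def is_reliable(model: dict) -> bool:
    """Apply the GA341 cut-off to a single model."""
    return model["ga341"] >= GA341_CUTOFF


for name in rank_by_dope(MODELS):
    m = MODELS[name]
    verdict = "good" if is_reliable(m) else "bad"
    print(f"{name:22s} DOPE={m['dope']:10.2f} GA341={m['ga341']:.2f} -> {verdict}")
```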
Comparison to experimental structure
experimental structure | model with template | RMSD (DaliLite) | RMSD (sap) | TM-score | Superposition |
---|---|---|---|---|---|
1U5B_A | 2BFD_A | 1.1 | 0.442 | 0.3526 | |
1U5B_A | 2R8O_A | no value | 95.095 | 0.1749 | |
1U5B_A | 1DTW_A, 2BFE_A, 2BFB_A, 2BFD_A, 1GPU_A, 2O1X_A, 2R8O_A | 1.4 | 0.396 | 0.3596 |
C-alpha RMSD measures the deviation in distance between aligned alpha-carbons; the higher this value, the worse the model. The first model, which uses 2r8o as template, has no C-alpha RMSD because the program we used (DaliLite) could not find enough significant similarities: the structures are too dissimilar. The model built with 2bfd has a C-alpha RMSD of 1.1 Å, which is a very good score. Interestingly, the model built from 7 template proteins again does not have a better score (1.4), although some templates with very high sequence similarity were included. This indicates that the templates with low sequence similarity have too much influence on the final model. The model built with 2bfd is therefore the best MODELLER prediction for our target.
all atom RMSD
position | 2bfd | 2r8o | multi |
---|---|---|---|
161 | 0.478 | - | 0.238 |
166 | 0.186 | - | 0.146 |
167 | 0.184 | - | 0.149 |
PyMOL could not calculate the RMSD values for the second model (2r8o) because it was not possible to create a matching alignment. "multi" denotes the model built with 1DTW_A, 2BFE_A, 2BFB_A, 2BFD_A, 1GPU_A, 2O1X_A and 2R8O_A as templates.
improved alignment
The model built with 2r8o was so bad that DaliLite could not compute a C-alpha RMSD, so we tried to improve it. For this purpose we loaded the alignment of 1u5b and 2r8o into Jalview <ref>http://www.jalview.org/download.html</ref> to compare the two sequences. To align more identical residues in both sequences, we deleted some gaps and used the consensus line to find the amino acids that occur in both sequences. With this handmade alignment we repeated the MODELLER run. To evaluate the resulting model we calculated the C-alpha RMSD and the TM-score.
template | C-alpha RMSD | TM-score |
---|---|---|
2r8o | 3.1 | 0.1740 |
The improvement of the alignment was successful insofar as DaliLite can now compute a C-alpha RMSD (3.1) for the model. Compared to the C-alpha RMSD values of the other MODELLER results, however, this model, which is based on the template with the smallest sequence identity, still performs worst. The TM-score even becomes slightly smaller than with the unimproved alignment (0.1740 vs. 0.1749), indicating that the overall model did not improve.
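The manual gap editing in Jalview aims at aligning more identical residues. Its effect on sequence identity can be quantified with a short Python sketch. The function name and the toy sequences are hypothetical, and dividing by the number of gap-free columns is only one of several conventions for percent identity.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues among the alignment columns in
    which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment: one misplaced gap shifts most residues
# out of register; moving the gap restores the identical positions.
before = ("MKTAYIAK", "MK-TAYIA")
after = ("MKTAYIAK", "MKTAYIA-")
print(round(percent_identity(*before), 1))  # 28.6
print(round(percent_identity(*after), 1))   # 100.0
```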
Swissmodel
Numeric evaluation
QMEAN4 global scores
QMEANscore4
2bfd_A | 2r8o_A |
---|---|
0.67 | 0.203 |
QMEANscore4 is calculated to compare whole models. The score ranges between 0 and 1; the higher the value, the better the quality of the model. Comparing the model with 2bfd as template and the model with 2r8o as template, it is obvious that the first one is better, since it has a much higher QMEANscore4 (0.67 vs. 0.203).
QMEAN Z-Score
2bfd_A | 2r8o_A |
---|---|
-1.604 | -9.522 |
The QMEAN Z-score estimates the absolute quality of a model by comparing its score to those of high-resolution experimental structures of similar size. Models of low quality have strongly negative QMEAN Z-scores. The 2bfd-model has a much less negative score (-1.604) than the 2r8o-model (-9.522), which again shows that it has a better quality.
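The idea behind a Z-score can be sketched in Python as the distance of a raw score from the mean of a reference distribution, measured in standard deviations. The reference values below are invented for illustration; QMEAN uses its own reference set of experimental structures and its own normalisation, so the numbers printed here will not reproduce the Z-scores in the table.

```python
import statistics
from typing import Sequence


def z_score(raw_score: float, reference_scores: Sequence[float]) -> float:
    """Standard score of `raw_score` relative to a reference distribution."""
    mean = statistics.mean(reference_scores)
    sd = statistics.stdev(reference_scores)  # sample standard deviation
    return (raw_score - mean) / sd


# Hypothetical QMEAN scores of experimental reference structures of similar size.
reference = [0.70, 0.75, 0.80, 0.85, 0.90]
for label, score in [("2bfd_A", 0.670), ("2r8o_A", 0.203)]:
    print(label, round(z_score(score, reference), 2))
```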
Score components
2bfd_A | 2r8o_A |
---|---|
Local scores
2bfd_A | 2r8o_A |
---|---|
The coloring by residue error estimates the inaccuracy of each residue. The results are visualised using a color gradient, where blue marks reliable regions and red marks unreliable regions.
The 2bfd-model contains many blue alpha helices, which are therefore predicted to be reliable, and only a few red coils. Since blue is the dominant color, the model is mostly reliable. In contrast, the 2r8o-model has a lot of red and orange alpha helices and coils and nearly no blue regions, which reflects the poor quality of this model.
The residue error plot shows the predicted error (y-axis) per residue (x-axis). The highest error score of the 2bfd-model is 12 and the average is about 3 whereas the highest peak score of the 2r8o-model is 15 and the average is about 5. Again it can be seen that the 2bfd-model is the better one.
Global scores: QMEAN4:
Scoring function term | 2bfd_A raw score | 2bfd_A Z-score | 2r8o_A raw score | 2r8o_A Z-score |
---|---|---|---|---|
C_beta interaction energy | -162.66 | 0.54 | 74.97 | -4.18 |
All-atom pairwise energy | -10811.93 | 0.35 | 2113.21 | -5.03 |
Solvation energy | -27.04 | -1.02 | 26.87 | -5.92 |
Torsion angle energy | -75.78 | -1.45 | 36.84 | -6.47 |
QMEAN4 score | 0.670 | -1.60 | 0.203 | -9.52 |
Local Model Quality Estimation
2bfd_A | 2r8o_A |
---|---|
For the local model quality estimation we chose the ANOLEA potential, which performs energy calculations on a protein chain. The y-axis shows the energy of each amino acid. Negative energy values (in green) represent a favourable energy environment for a given amino acid, whereas positive values (in red) represent an unfavourable one. The comparison between the 2bfd-model and the 2r8o-model is quite clear: nearly the whole left plot is green, whereas nearly the whole right plot is red. These two plots show that the 2bfd-model is much better than the other one.
Comparison to experimental structure
experimental structure | model with template | RMSD (DaliLite) | RMSD (sap) | TMscore | Superposition |
---|---|---|---|---|---|
1U5B_A | 2BFD_A | 1.1 | 0.288 | 0.1640 | |
1U5B_A | 2R8O_A | 3.1 | 2.110 | 0.1639 |
C-alpha RMSD measures the deviation in distance between aligned alpha-carbons; the higher this value, the worse the model. The 2bfd-model has a C-alpha RMSD of 1.1 and the 2r8o-model one of 3.1. This comparison clearly shows that the first model is much better than the second one.
all atom RMSD
position | 2bfd | 2r8o |
---|---|---|
161 | 0.165 | 1.075 |
166 | 0.126 | 2.585 |
167 | 0.127 | 2.043 |
improved alignment
experimental structure | model with template | C-alpha RMSD | TMscore | Superposition |
---|---|---|---|---|
1U5B_A | 2R8O_A | 0.1592 |
iTasser
Numeric evaluation
C-score
template | model1 | model2 | model3 | model4 | model5 |
---|---|---|---|---|---|
2bfd | 1.999 | -3.781 | -4.970 | -4.970 | -3.781 |
The C-score is a confidence score that I-TASSER assigns to its predicted models. It typically ranges from -5 to 2, and a higher C-score signifies a model with higher confidence.
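For early I-TASSER versions the C-score was described as C = ln((M/M_tot) · (1/<RMSD>)), where M is the number of structure decoys in the cluster of the model, M_tot the total number of decoys, and <RMSD> the average RMSD of the cluster decoys to the cluster centroid. The Python sketch below implements only this cluster-based form; newer I-TASSER versions additionally include the Z-scores of the threading templates, which are omitted here. The cluster values are hypothetical.

```python
import math


def c_score(cluster_size: int, total_decoys: int, mean_rmsd: float) -> float:
    """Cluster-based confidence score, C = ln((M / M_tot) * (1 / <RMSD>)).

    cluster_size (M):     number of decoys in the cluster of the model
    total_decoys (M_tot): total number of decoys used for clustering
    mean_rmsd (<RMSD>):   average RMSD (Å) of the cluster decoys to the centroid
    """
    if cluster_size <= 0 or total_decoys <= 0 or mean_rmsd <= 0:
        raise ValueError("all inputs must be positive")
    return math.log((cluster_size / total_decoys) / mean_rmsd)


# Hypothetical clusters: a large, tight cluster versus a small, diffuse one.
print(round(c_score(800, 1000, 0.5), 2))  # 0.47
print(round(c_score(50, 1000, 8.0), 2))   # -5.08
```

Large, tight clusters give high C-scores, which matches the interpretation that a higher C-score means higher confidence.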
Comparison to experimental structure
No | TM-score (2bfd) | RMSD DaliLite (2bfd) | RMSD sap (2bfd) | TM-score (2r8o) | RMSD DaliLite (2r8o) | RMSD sap (2r8o) |
---|---|---|---|---|---|---|
1 | 0.9709 | 0.49 | 0.312 | 0.5190 | 3.4 | 3.377 |
2 | 0.8609 | 1.44 | 0.354 | 0.4979 | 3.2 | 3.935 |
3 | 0.8597 | 1.43 | 0.478 | 0.4871 | 3.0 | 3.476 |
4 | 0.8549 | 1.71 | 0.493 | 0.5354 | 4.8 | 2.449 |
5 | 0.8251 | 1.73 | 0.348 | 0.5107 | 6.0 | 2.540 |
To calculate the RMSD within a 6 Å radius of the catalytic center, we first had to locate the catalytic center. It involves three residues, at positions 161, 166 and 167, and we calculated the RMSD around each of them.
model | 2bfd: 161 | 2bfd: 166 | 2bfd: 167 | 2r8o: 161 | 2r8o: 166 | 2r8o: 167 |
---|---|---|---|---|---|---|
1 | 0.269 | 0.251 | 0.191 | 5.940 | 1.070 | 0.952 |
2 | 0.349 | 0.348 | 0.271 | 0.676 | 1.141 | 1.142 |
3 | 0.480 | 0.467 | 0.330 | 1.527 | 1.252 | 1.310 |
4 | 0.440 | 0.507 | 0.430 | 1.748 | 1.224 | 1.074 |
5 | 0.299 | 0.291 | 0.269 | 1.315 | 1.053 | 1.180 |
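The local RMSD values in the table above can be approximated with a Python sketch like the following. It assumes that model and experimental structure are already superimposed and that atoms are matched by residue number and atom name. Selecting every reference atom within 6 Å of any atom of the catalytic residue is our assumption, since the exact PyMOL selection used is not documented here; all names and coordinates are hypothetical.

```python
import math
from typing import Dict, Tuple

AtomKey = Tuple[int, str]           # (residue number, atom name)
Coord = Tuple[float, float, float]  # x, y, z in Å


def local_rmsd(reference: Dict[AtomKey, Coord], model: Dict[AtomKey, Coord],
               center_residue: int, radius: float = 6.0) -> float:
    """All-atom RMSD over the reference atoms within `radius` Å of any atom
    of `center_residue`, using only atoms present in both structures.
    Both structures are assumed to be superimposed already."""
    center = [xyz for (resi, _), xyz in reference.items() if resi == center_residue]
    if not center:
        raise ValueError(f"residue {center_residue} not found in reference")
    selected = [
        key for key, xyz in reference.items()
        if key in model and any(math.dist(xyz, c) <= radius for c in center)
    ]
    if not selected:
        raise ValueError("no matching atoms inside the radius")
    squared = sum(math.dist(reference[k], model[k]) ** 2 for k in selected)
    return math.sqrt(squared / len(selected))


# Hypothetical example: residue 200 lies outside the 6 Å sphere,
# so its large 5 Å deviation does not enter the local RMSD.
ref = {(161, "CA"): (0.0, 0.0, 0.0), (162, "CA"): (3.8, 0.0, 0.0),
       (200, "CA"): (20.0, 0.0, 0.0)}
mod = {(161, "CA"): (0.0, 0.0, 0.0), (162, "CA"): (3.8, 1.0, 0.0),
       (200, "CA"): (20.0, 5.0, 0.0)}
print(round(local_rmsd(ref, mod, 161), 3))  # 0.707
```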
All models based on 2bfd are very good, as the table shows: they all have a high TM-score and a low C-alpha RMSD. This is not surprising, since they are the top 5 models of iTasser. The first model is clearly better than the other four (TM-score 0.9709, C-alpha RMSD 0.49), which is consistent with its much higher C-score (1.999 compared with -3.781 to -4.970). The models based on 2r8o are considerably worse, with TM-scores around 0.5 and C-alpha RMSD values between 3.0 and 6.0.
3DJigsaw
Numeric evaluation
No | Energy (high identity) | Coverage (high identity) | Energy (low identity) | Coverage (low identity) |
---|---|---|---|---|
1 | -504.14 | 0.98 | -442.89 | 1.0 |
2 | -503.04 | 0.98 | -442.26 | 1.0 |
3 | -502.83 | 0.98 | -441.89 | 1.0 |
4 | -502.16 | 0.98 | -441.76 | 1.0 |
5 | -501.32 | 0.98 | -441.62 | 1.0 |
Comparison
Model | RMSD DaliLite (high identity) | RMSD sap (high identity) | TM-score (high identity) | RMSD DaliLite (low identity) | RMSD sap (low identity) | TM-score (low identity) |
---|---|---|---|---|---|---|
1 | 0.6 | 0.347 | 0.9887 | 3.3 | 3.862 | 0.5031 |
2 | 0.6 | 0.347 | 0.9887 | 3.9 | 3.910 | 0.5028 |
3 | 1.3 | 0.439 | 0.9712 | 3.8 | 3.932 | 0.5029 |
4 | 1.5 | 0.998 | 0.9629 | 3.3 | 3.902 | 0.4982 |
5 | 1.6 | 0.993 | 0.9617 | 3.8 | 3.968 | 0.5031 |
As expected the 3DJigsaw prediction based on models for a template with high sequence identity to our target is very good. The RMSD values are very low and the TM-scores are all close to 1.0. The predicted models for a template with high sequence identity are therefore good models which could be used to assign a structure to our target.
The models created by 3DJigsaw from the iTasser models for the template with low sequence identity are also slightly better overall than the iTasser models used as input: their C-alpha RMSD values lie between 3.3 and 3.9 (iTasser: 3.0 to 6.0), while the TM-scores remain around 0.50.
So in our case 3DJigsaw used the given information and improved the previously predicted models.
Comparison of the methods
Numerical Evaluation
The following tables again list the RMSD and TM-score values, which were computed before, to provide an overview of the performance of the different methods.
modeller
template | C-alpha RMSD | TM-score |
---|---|---|
2BFD_A | 1.1 | 0.3526 |
2R8O_A | 3.1 | 0.1749 |
Multi | 1.4 | 0.3596 |
Swissmodel
template | C-alpha RMSD | TM-score |
---|---|---|
2BFD_A | 1.1 | 0.1640 |
2R8O_A | 3.1 | 0.1639 |
iTasser
model | RMSD DaliLite (2bfd) | TM-score (2bfd) | RMSD DaliLite (2r8o) | TM-score (2r8o) |
---|---|---|---|---|
model1 | 0.49 | 0.9709 | 3.4 | 0.5190 |
model2 | 1.44 | 0.8609 | 3.2 | 0.4979 |
model3 | 1.43 | 0.8597 | 3.0 | 0.4871 |
model4 | 1.71 | 0.8549 | 4.8 | 0.5354 |
model5 | 1.73 | 0.8251 | 6.0 | 0.5107 |
3DJigsaw
model | RMSD sap (2bfd) | TM-score (2bfd) | RMSD sap (2r8o) | TM-score (2r8o) |
---|---|---|---|---|
model1 | 0.347 | 0.9887 | 3.862 | 0.5031 |
model2 | 0.347 | 0.9887 | 3.910 | 0.5028 |
model3 | 0.439 | 0.9712 | 3.932 | 0.5029 |
model4 | 0.998 | 0.9629 | 3.902 | 0.4982 |
model5 | 0.993 | 0.9617 | 3.968 | 0.5031 |
Discussion
To compare the predicted models with the experimentally determined structure of our target (1u5b), different scores (RMSD, TM-score) were calculated. Based on these scores, iTasser produced the best models for our target; in particular, the TM-score is much higher for all iTasser models than for the MODELLER and Swissmodel predictions.
The similarity of the template is a limiting factor for model prediction. In our case, the top-ranked iTasser model (model1) for the template with only 33% sequence similarity to the target had a C-alpha RMSD of 3.4 and a TM-score of 0.5190.
References
<references />