Homology based structure predictions BCKDHA

1. Calculation and evaluation of models

Template selection

Homology modelling is a technique for predicting the three-dimensional (tertiary) structure of a target protein. It is based on an alignment of the target sequence with one or more template sequences whose structures are known. The target sequence is assigned a structure based on the template structure. The better the alignment, the better the predicted structure for our target. Template selection is therefore a crucial step in homology modelling.

To find similar structures to BCKDHA we ran HHsearch using the following command:
hhsearch -i query -d database -o output

It found the following 10 hits in the pdb70 database.

No	Hit	Prob	E-value	P-value	Score	Cols	Query HMM	Template HMM	Identity
1	2bfd_A 2-oxoisovalerate dehydr	1.0	1	1	791.3	400	1-400	1-400 (400)	99%
2	1qs0_A 2-oxoisovalerate dehydr	1.0	1	1	571.5	349	32-382	52-407 (407)	39%
3	1w85_A Pyruvate dehydrogenase	1.0	1	1	530.8	356	8-382	6-362 (368)	34%
4	1umd_A E1-alpha, 2-OXO acid de	1.0	1	1	521.8	351	34-386	16-367 (367)	37%
5	2ozl_A PDHE1-A type I, pyruvat	1.0	1	1	482.7	331	46-380	25-356 (365)	27%
6	3l84_A Transketolase; TKT, str	1.0	1	1	85.4	133	161-297	113-252 (632)	21%
7	2r8o_A Transketolase 1, TK 1;	1.0	1	1	74.5	121	161-285	113-245 (669)	33%
8	2o1x_A 1-deoxy-D-xylulose-5-ph	1.0	1	1	74.2	127	161-287	122-254 (629)	18%
9	1gpu_A Transketolase; transfer	1.0	1	1	74.2	140	161-302	115-265 (680)	22%
10	3m49_A Transketolase; alpha-be	1.0	1	1	68.8	121	161-285	139-271 (690)	31%

Before working with these hits we have to check whether any of them is a PDB structure of BCKDHA itself. This is the case for 2bfd_A (99% identity), so it cannot be used as a template.
Excluding this hit leaves only structures with a sequence identity below 40%. Since all remaining templates fall into this range, we chose two of them that span it: one with 39% identity and one with 18% identity.
In the following we work with 1qs0_A (39%) and 2o1x_A (18%).
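The selection step can be written down as a small filter. This is only an illustrative sketch, assuming the hit IDs and identities are copied by hand from the HHsearch table above; the function name `pick_templates` is hypothetical and not part of HHsearch.

```python
# Hits copied from the HHsearch table above: (PDB chain, sequence identity in %).
HITS = [
    ("2bfd_A", 99), ("1qs0_A", 39), ("1w85_A", 34), ("1umd_A", 37),
    ("2ozl_A", 27), ("3l84_A", 21), ("2r8o_A", 33), ("2o1x_A", 18),
    ("1gpu_A", 22), ("3m49_A", 31),
]


def pick_templates(hits, exclude):
    """Return the highest- and lowest-identity hits after dropping excluded IDs."""
    usable = [hit for hit in hits if hit[0] not in exclude]
    best = max(usable, key=lambda hit: hit[1])
    worst = min(usable, key=lambda hit: hit[1])
    return best, worst


# 2bfd_A is a structure of BCKDHA itself, so it must not serve as a template.
best, worst = pick_templates(HITS, exclude={"2bfd_A"})
print(best, worst)
```

Picking the two extremes of the remaining identity range reproduces the choice made above (1qs0_A and 2o1x_A).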

General information for the evaluation

A detailed description of how the created models were evaluated can be found in the Evaluation Protocol. The following section presents only the modelling and evaluation results.

Three useful scores for comparing the structural similarity of two structures are the Cα RMSD, the all-atom RMSD and the TM-score. These measures are commonly used to assess the accuracy of a model when the native structure is known. In the following, "RMSD" without further qualification refers to the Cα RMSD.

The RMSD (root-mean-square deviation) measures the average distance between corresponding atom pairs in two superimposed structures. The Cα RMSD is computed over the aligned alpha-carbons. The smaller the RMSD, the better the predicted structure. A local error (e.g. misorientation of a tail) can produce a high RMSD even though the global structure is correct.
The all-atom RMSD is calculated over the residues within 6 Å of the active sites. As with the Cα RMSD, a lower value indicates a better model.
Because the RMSD is sensitive to local errors, the TM-score was proposed. It weights close matches more strongly than distant ones, which overcomes the local-error problem. A TM-score between 0.5 and 1.0 means that the two compared structures have roughly the same fold, so the predicted model has the correct topology. Scores below 0.5 indicate that this is not the case, and scores below about 0.17 correspond to random structural similarity.
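To make the difference between the two measures concrete, here is a minimal sketch in plain Python. It assumes the two structures are already superimposed and the residues already paired; real tools also optimise the superposition, which is not done here. The function names are hypothetical. The TM-score term uses the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8, and the lower bound of 0.5 on d0 is an assumption made to keep the sketch well-behaved for short chains.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superimposed atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def tm_score(distances, target_length):
    """TM-score of one fixed superposition, from per-residue distances in angstrom."""
    if target_length <= 15:
        raise ValueError("the d0 formula is only defined for chains longer than 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # assumption: clamp d0 so very short chains stay usable
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy case: 400 residues, 380 fit within 0.5 angstrom, a 20-residue tail is off by 30.
distances = [0.5] * 380 + [30.0] * 20
model = [(0.0, 0.0, 0.0)] * 400
reference = [(d, 0.0, 0.0) for d in distances]
rmsd_demo = ca_rmsd(model, reference)
tm_demo = tm_score(distances, 400)
print(f"RMSD = {rmsd_demo:.2f}, TM-score = {tm_demo:.2f}")
```

In this toy case the misplaced tail pushes the RMSD above 6 Å, while the TM-score stays above 0.9, because each badly placed residue contributes almost nothing to the sum instead of dominating it.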

Modeller

MODELLER is used for homology or comparative modelling of protein three-dimensional structures. It calculates a model containing all non-hydrogen atoms. MODELLER also provides many other features, such as de novo modelling of loops in protein structures, optimization of various models of protein structure with respect to a flexibly defined objective function, multiple alignment of protein sequences and/or structures, clustering, searching of sequence databases and comparison of protein structures.[1]

Tutorials are available at [2] and [3].

To run MODELLER with more than one template we used the following templates (the percentages give the sequence identity to the target):

3m49:A (31%)
2r8o:A (33%)
2o1x:A (18%)
1w85:A (34%)
1qs0:A (39%)

Protocol Modeller
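The exact steps are in the linked protocol. As a rough orientation only, a MODELLER run with one or several templates typically looks like the sketch below. It follows the pattern of the official MODELLER tutorial and uses the classic lowercase class names (`environ`, `automodel`); newer releases also offer CamelCase equivalents. The wrapper `build_model`, the helper `best_by_dope` and all file names and alignment codes are hypothetical and must match your own alignment file.

```python
def build_model(alnfile, knowns, sequence, n_models=1):
    """Build models with MODELLER and return its per-model output records."""
    # Imported lazily so the helper below also works without MODELLER installed.
    from modeller import environ
    from modeller.automodel import automodel, assess

    env = environ()
    env.io.atom_files_directory = ["."]  # directory holding the template PDB files
    a = automodel(
        env,
        alnfile=alnfile,    # PIR alignment of the target and the template(s)
        knowns=knowns,      # alignment code(s) of the template(s)
        sequence=sequence,  # alignment code of the target
        assess_methods=(assess.DOPE, assess.GA341),
    )
    a.starting_model = 1
    a.ending_model = n_models
    a.make()
    return a.outputs


def best_by_dope(outputs):
    """Pick the successfully built model with the lowest (best) DOPE score."""
    ok = [out for out in outputs if out["failure"] is None]
    return min(ok, key=lambda out: out["DOPE score"])
```

A multi-template run would pass a tuple of template codes as `knowns`, for example `build_model("target-multi.ali", ("3m49A", "2r8oA", "2o1xA", "1w85A", "1qs0A"), "BCKDHA")`, where the file name and codes are placeholders.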

Results

Numeric evaluation

template	molpdf	DOPE score	GA341 score
1QS0_A	2650.7	-40503.1	1.000
2O1X_A	2958.0	-30294.5	0.419
3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A	123913.8	-19573.7	0.001

The DOPE (Discrete Optimized Protein Energy) score is used to assess homology models. The lower the DOPE score, the better the model. This trend is visible in our three models. The model built from 1qs0, the template with the highest sequence identity (39%), has the lowest DOPE score (-40503.1). The model built from 2o1x has a higher score (-30294.5), which is plausible given its sequence identity of only 18%. So although both templates belong to the low-identity group, there are noticeable differences between them. The last model was built from five different templates. Normally, including more templates helps, because the model can be predicted more precisely than with a single template. In this case, however, none of the combined templates had a high sequence identity, so the additional information was not helpful to MODELLER. This is reflected in the scores: this model has the highest DOPE score (-19573.7). As expected, the model with 1qs0 as template is the best of the three according to DOPE.

GA341 is calculated to decide whether a model is good or not. A good model has a score near 1, while a score below 0.6 indicates a bad model. This is also reflected in our results. According to GA341, the model with 1qs0 as template is very good, since its score is 1.0. This is somewhat surprising, since the sequence identity to our protein is not very high. The DOPE score was good as well, but it is still unexpected that a model based on a template with only 39% sequence identity receives the best possible GA341 score. The other two models score below 0.6, which marks both of them as bad. Interestingly, the five-template model scores only 0.001, which seems rather low, because the average sequence identity of its templates is higher than that of 2o1x, whose model has a GA341 score of 0.419. All in all, we conclude that the 1qs0 model is the most accurate one.

In summary, both scores indicate that, although all templates have a low sequence identity, the model with 1qs0 as template is the best one.
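The two rules used above (lower DOPE is better, GA341 below 0.6 is bad) can be applied mechanically to the table. The following is a small sketch with the values copied from the numeric evaluation; the label "multi" for the five-template model and the variable names are assumptions made for illustration.

```python
# Scores copied from the numeric evaluation table above.
MODELS = {
    "1QS0_A": {"molpdf": 2650.7, "dope": -40503.1, "ga341": 1.000},
    "2O1X_A": {"molpdf": 2958.0, "dope": -30294.5, "ga341": 0.419},
    "multi": {"molpdf": 123913.8, "dope": -19573.7, "ga341": 0.001},
}
GA341_CUTOFF = 0.6  # threshold used in the text above

# Lower DOPE is better, so an ascending sort puts the best model first.
ranked = sorted(MODELS, key=lambda name: MODELS[name]["dope"])
acceptable = [name for name in ranked if MODELS[name]["ga341"] >= GA341_CUTOFF]

print("ranked by DOPE:", ranked)
print("passing GA341 cutoff:", acceptable)
```

Both criteria agree here: 1QS0_A ranks first by DOPE and is the only model that passes the GA341 cutoff.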

Comparison to experimental structure

experimental structure	model with template	RMSD (DaliLite)	RMSD (sap)	TM-score
1U5B_A	1QS0_A	2.3	0.829	0.8504
1U5B_A	2O1X_A	3.5	2.727	0.1592
1U5B_A	3M49_A, 2R8O_A, 2O1X_A, 1W85_A, 1QS0_A	no score	11.398	0.1719

The Cα RMSD measures the average deviation in distance between aligned alpha-carbons. The higher this value, the worse the model. The first model, using 1qs0 as template, has an RMSD of 2.3 (DaliLite) or 0.829 (sap). In both cases the value is lower than those of the other two models. Since a low value indicates a good model, this model is the best of the three according to the RMSD. The RMSD of the 2o1x model is somewhat higher (3.5 with DaliLite, 2.727 with sap), so by this measure alone it does not seem much worse than the first one. For the third model DaliLite could not calculate an RMSD at all, because the structures are too dissimilar for it to find enough significant similarities. This dissimilarity is reflected in the sap RMSD of 11.398, which is very high compared with the other two sap values. A possible explanation for the bad result of the last model is that too much misleading information was used during model building. Comparing the TM-scores of the three models shows that only one model has a value above 0.5, which means that only one model is significantly good. The 1qs0 model has a TM-score of 0.8504 and is therefore a good model, whereas the other two models have TM-scores of 0.1592 and 0.1719, which is far below 0.5 and indicates that both models are useless.

All-atom RMSD

position	1qs0	2o1x	multi
161	0.332	6.172	2.607
166	0.668	3.697	3.208
167	0.656	6.759	7.962

Additionally, we calculated the all-atom RMSD for the three catalytic centers in each of the three models. As with all the scores above, the model with 1qs0 as template is the best one: its all-atom RMSD is the lowest at all three catalytic centers. Comparing the other two models gives a mixed picture. At the first catalytic center (position 161) the 2o1x model is much worse than the five-template model (6.172 vs. 2.607). At the second center (position 166) it is only slightly worse (3.697 vs. 3.208), and at the third center (position 167) it is even better (6.759 vs. 7.962). So the all-atom RMSD values alone do not decide whether the second or the third model is the better one.
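The per-position comparison can be checked with a few lines of Python. This is a sketch over the values from the table above; the helper name `ranking_per_position` is hypothetical.

```python
# All-atom RMSD per catalytic position, copied from the table above.
ALL_ATOM_RMSD = {
    161: {"1qs0": 0.332, "2o1x": 6.172, "multi": 2.607},
    166: {"1qs0": 0.668, "2o1x": 3.697, "multi": 3.208},
    167: {"1qs0": 0.656, "2o1x": 6.759, "multi": 7.962},
}


def ranking_per_position(table):
    """Order the models from best (lowest RMSD) to worst at every position."""
    return {pos: sorted(values, key=values.get) for pos, values in table.items()}


for position, order in ranking_per_position(ALL_ATOM_RMSD).items():
    print(position, " < ".join(order))
```

The 1qs0 model comes first at every position, while the order of the other two models flips at position 167, which is why no clear winner between them emerges.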

Superposition

Figure 1: Superimposed structures of 1U5B and the MODELLER model with template 1QS0

Figure 2: Superimposed structures of 1U5B and the MODELLER model with template 2O1X

Figure 3: Superimposed structures of 1U5B and the MODELLER model with more than one template

All the scores calculated above identify the model with 1qs0 as template as the best one. The visualization (Figure 1) confirms this. The alpha helices in particular are aligned quite well, although some of them are not. The two structures are also not perfectly superimposed: on the left and right side of the superposition there are two regions that are not aligned at all. All in all, however, the model appears compatible with our protein, especially in comparison with the two other models. The 2o1x model, visualized in Figure 2, has no aligned regions, so it looks as if two completely different structures were superposed. This impression is supported by the scores above, which show that the model using 2o1x as template does not fit well. The same applies to the third model. As Figure 3 shows, there is no match between the two structures and hence no aligned region. Again, this result was to be expected given the bad evaluation scores.

SWISS-MODEL

Figure 4: SWISS-MODEL server page

SWISS-MODEL can be used to build protein structure homology models. It is a fully automated protein structure homology-modeling server, accessible via the ExPASy web server or from the program DeepView (Swiss Pdb-Viewer).
It provides three different modelling modes:

Automated Mode
Alignment Mode
Project Mode

The Automated Mode uses fully automated modelling and can therefore only be used when the template is very similar to the target.<ref>http://swissmodel.expasy.org/?pid=smd03&uid=&token=</ref>
As input, the Automated Mode requires only an amino acid sequence (raw or FASTA format) or the UniProt AC of the target, as shown in Figure 4. Optionally, a template PDB ID can be given. If no template is given, SWISS-MODEL automatically selects suitable templates from a BLAST run based on their E-values. The Alignment Mode has to be used for templates with a low sequence identity. Since we only have hits with less than 40% identity, we used this mode.

Protocol Swissmodel