Homology-modelling HEXA

From Bioinformatikpedia
Revision as of 20:51, 13 June 2011 by Uskat (talk | contribs) (iTasser)

Homology structure groups

We chose template proteins from each sequence identity group.

The complete HHsearch output can be found [here].

We used the following proteins:

Identity group  PDB id  Name                     Sequence identity
> 60%           3bc9_A  AMYB, alpha amylase      80.8%
> 40%           3cui_A  EXO-beta-1,4-glucanase;  49.5%
< 40%           3hn3_A  Beta-G1, beta-glucuroni  25.1%
< 40%           3lut_A  Voltage-gated potassium  20.1%
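The grouping used in the table can be sketched in a few lines of Python. This is only an illustration of the thresholds named above; the function name `identity_group` is our own, and the handling of values that fall exactly on a boundary is an assumption.

```python
def identity_group(percent_identity: float) -> str:
    """Assign a template to one of the three sequence identity groups."""
    if percent_identity > 60.0:
        return "> 60%"
    if percent_identity > 40.0:
        return "> 40%"
    # Assumption: a value of exactly 40% is placed in the lowest group.
    return "< 40%"


# Identity values as listed in the table above.
templates = {"3bc9_A": 80.8, "3cui_A": 49.5, "3hn3_A": 25.1, "3lut_A": 20.1}
groups = {pdb_id: identity_group(value) for pdb_id, value in templates.items()}

for pdb_id, group in groups.items():
    print(f"{pdb_id}: {templates[pdb_id]:.1f}% -> {group}")
```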

Calculation of the models

Swissmodel

To calculate the models with Swissmodel we used the [Webserver]. For the template with high sequence identity we used both the automated mode and the alignment mode; for the other two templates we used only the alignment mode.

The alignments we used can be found [here].

Modeller

We used Modeller from the command line, following the instructions described [here].

First of all, we created an alignment for each of our three selected sequences. In the next step we used Modeller to model the 3D structure of the protein.

For Modeller we used the PIR alignment format. The alignment files can be found here: [3BC9], [3CUI], [3LUT]
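To illustrate what such a PIR alignment file looks like, the following Python sketch writes a two-entry alignment (template and target). It is not the script we used: the helper `pir_entry` is hypothetical, the sequences are toy fragments rather than the real HEXA or 3BC9 sequences, and the use of `FIRST` and `LAST` as residue range keywords is an assumption that should be checked against the Modeller manual.

```python
def pir_entry(code, sequence, kind="sequence", first="", first_chain="",
              last="", last_chain="", width=75):
    """Format one alignment entry in the PIR flavour read by Modeller.

    Only the first six of the ten colon-separated header fields are filled;
    protein name, source, resolution and R-factor are left empty.
    """
    fields = [kind, code, first, first_chain, last, last_chain, "", "", "", ""]
    body = sequence + "*"  # the sequence block is terminated by an asterisk
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([f">P1;{code}", ":".join(fields)] + lines) + "\n"


# Toy aligned fragments (hypothetical, NOT the real HEXA / 3BC9 sequences).
template_aligned = "MKV--LLAGT"
target_aligned = "MKVTTLLA-T"
if len(template_aligned) != len(target_aligned):
    raise ValueError("aligned sequences must have the same length")

alignment = (
    pir_entry("3bc9", template_aligned, kind="structureX",
              first="FIRST", first_chain="A", last="LAST", last_chain="A")
    + "\n"
    + pir_entry("HEXA_HUMAN", target_aligned)
)
print(alignment)
```

The template entry is marked as `structureX` (a structure from X-ray crystallography) and the target entry as `sequence`; gaps in the alignment are written as `-`.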

iTasser

To calculate our models with iTasser we used the [Webserver]. We defined the target and template sequence, but this time without an alignment. We used the same template sequences as before.

Results

Swissmodel

3BC9:

todo

3CUI:

The detailed prediction can be found [here]

Swissmodel also reports several scores that let the user estimate the quality of the predicted model. They are shown in the following paragraphs.

The most important score in the following table is the QMEAN4 score, because it combines the four scores above it and makes different results comparable.

Global Score
Scoring function term      Raw score  Z score
C_beta interaction energy  202.24     -4.65
All-atom pairwise energy   9942.28    -6.16
Solvation energy           67.79      -8.08
Torsion angle energy       76.36      -7.72
QMEAN4 score               0.057      -11.76

Swissmodel also returns some pictures that show the quality of the model.

Predicted Structure:

Predicted structure of HEXA_HUMAN with 3cui as template structure
Incorrectly predicted residues of the model

Model quality:

Visualisation of the QMEAN Z-Score for this model
Visualisation of the QMEAN score in comparison with a Gaussian distribution
Quality of the model in comparison to an X-ray structure
Plot showing the incorrectly predicted residues of this model


3LUT:

We also wanted to model the 3D structure with a template of very low sequence identity and therefore chose 3LUT. Unfortunately, Swissmodel could not build a model with this template, because it was not able to create an alignment that could serve as the basis for the model. We therefore calculated a model with 3HN3, the template with the next higher sequence identity. Here we present the Swissmodel results for 3HN3, whereas for the other methods we used 3LUT.

3HN3:

The detailed prediction can be found [here]

Swissmodel also reports several scores that let the user estimate the quality of the predicted model. They are shown in the following paragraphs.

The most important score in the following table is the QMEAN4 score, because it combines the four scores above it and makes different results comparable.

Global Score
Scoring function term      Raw score  Z score
C_beta interaction energy  120.70     -5.31
All-atom pairwise energy   2585.98    -5.22
Solvation energy           71.87      -9.92
Torsion angle energy       80.43      -8.44
QMEAN4 score               0.010      -12.80

Swissmodel also returns some pictures that show the quality of the model.

Predicted Structure:

Predicted structure of HEXA_HUMAN with 3hn3 as template structure
Incorrectly predicted residues of the model

Model quality:

Visualisation of the QMEAN Z-Score for this model
Visualisation of the QMEAN score in comparison with a Gaussian distribution
Quality of the model in comparison to an X-ray structure
Plot showing the incorrectly predicted residues of this model

Modeller

3BC9:

Modeller calculated one model for 3BC9, which can be seen in the next picture:

3D structure of HEXA_HUMAN with 3BC9 as template predicted by Modeller.

3CUI:

Modeller calculated one model for 3CUI, which can be seen in the next picture:

3D structure of HEXA_HUMAN with 3CUI as template predicted by Modeller.

3LUT:

Modeller calculated one model for 3LUT, which can be seen in the next picture:

3D structure of HEXA_HUMAN with 3LUT as template predicted by Modeller.

iTasser

iTasser delivers a wide range of results with a lot of predicted information. The first items are the predicted secondary structure and the predicted solvent accessibility. Furthermore, it provides the top 5 predicted models, the predicted function, predicted GO terms and the predicted binding site. The predicted secondary structure elements are shown as H for alpha helix (red), S for beta sheet (blue) and C for coil (yellow). The predicted solvent accessibility ranges from 0 (buried residue) to 9 (highly exposed residue). The predicted function is given as predicted EC numbers, which are listed together with scores such as the TM-score and the RMSD. The predicted GO terms describe the molecular function, the biological process or the cellular location. There are many different predicted GO terms for each protein.
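The two per-residue annotations described above are plain strings, so they are easy to summarise. The following sketch assumes that the secondary structure is given as a string over H, S and C and the solvent accessibility as a string of digits from 0 to 9; the example strings are invented, and the cutoff of 2 for a "buried" residue is our own assumption, not an iTasser definition.

```python
from collections import Counter

SS_NAMES = {"H": "alpha helix", "S": "beta sheet", "C": "coil"}


def secondary_structure_fractions(ss: str) -> dict:
    """Return the fraction of residues in each secondary structure class."""
    counts = Counter(ss)
    unknown = set(counts) - set(SS_NAMES)
    if unknown:
        raise ValueError(f"unexpected secondary structure symbols: {unknown}")
    return {SS_NAMES[code]: counts.get(code, 0) / len(ss) for code in SS_NAMES}


def buried_positions(accessibility: str, threshold: int = 2) -> list:
    """Return 1-based positions whose accessibility is <= threshold."""
    return [pos for pos, digit in enumerate(accessibility, start=1)
            if int(digit) <= threshold]


# Invented example strings for a 16-residue fragment (not iTasser output).
ss = "CCHHHHHHCCSSSSCC"
acc = "9741002358620137"

print(secondary_structure_fractions(ss))
print(buried_positions(acc))
```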

3BC9:

The following picture shows the predicted secondary structure of 3BC9.

Predicted secondary structure of 3BC9 (chain A)

The following picture shows the predicted solvent accessibility of 3BC9.

Predicted solvent accessibility of 3BC9 (chain A)

The following pictures show the top 5 models predicted by iTasser. They have different C-scores.

First predicted model for 3BC9 (chain A)
Second predicted model for 3BC9 (chain A)
Third predicted model for 3BC9 (chain A)
Fourth predicted model for 3BC9 (chain A)
Fifth predicted model for 3BC9 (chain A)

The following picture shows the predicted binding site of 3BC9.

Predicted binding site of 3BC9 (chain A)

3CUI:

The following picture shows the predicted secondary structure of 3CUI.

Predicted secondary structure of 3CUI (chain A)

The following picture shows the predicted solvent accessibility of 3CUI.

Predicted solvent accessibility of 3CUI (chain A)

The following pictures show the top 5 models predicted by iTasser. They have different C-scores.

First predicted model for 3CUI (chain A)
Second predicted model for 3CUI (chain A)
Third predicted model for 3CUI (chain A)
Fourth predicted model for 3CUI (chain A)
Fifth predicted model for 3CUI (chain A)

The following picture shows the predicted binding site of 3CUI.

Predicted binding site of 3CUI (chain A)


3LUT:

The following picture shows the predicted secondary structure of 3LUT.

Predicted secondary structure of 3LUT (chain A)

The following picture shows the predicted solvent accessibility of 3LUT.

Predicted solvent accessibility of 3LUT (chain A)

The following pictures show the top 5 models predicted by iTasser. They have different C-scores.

First predicted model for 3LUT (chain A)
Second predicted model for 3LUT (chain A)
Third predicted model for 3LUT (chain A)
Fourth predicted model for 3LUT (chain A)
Fifth predicted model for 3LUT (chain A)

The following picture shows the predicted binding site of 3LUT.

Predicted binding site of 3LUT (chain A)

Analysis

RMSD and TM-Score

To estimate the quality of the predicted models, we calculated the RMSD between the observed and the predicted protein structure. For this purpose we used PyMol, which can superpose two 3D protein structures and calculate the RMSD. Furthermore, we used TM-align to calculate the RMSD and also the TM-Score. In PyMol, we used the following command:

align model1, model2

As an output, we got the superposed structures, the number of aligned residues and the RMSD.
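For reference, the RMSD reported at the end of such a superposition is the square root of the mean squared distance between paired atoms. The sketch below shows only this final formula on coordinates that are already superposed and paired; it does not reproduce how PyMol or TM-align find the superposition or the residue pairing, and the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Invented example: three atoms, each shifted by 1 Angstrom along z.
structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

print(rmsd(structure_1, structure_2))
```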

We decided not to use the RMSD calculated by Modeller, because we wanted to calculate the RMSD for every structure in the same way to get comparable results. In the pictures that show the superposed structures, the green structure is always the target and the red one is always the template.

For 3D-Jigsaw, we chose for the template 3CUI the models of Swissmodel, Modeller, iTasser Model 1, iTasser Model 2 and iTasser Model 3. For 3BC9 and 3LUT, we chose Modeller, iTasser Model 1, iTasser Model 2, iTasser Model 3 and iTasser Model 5. Here we could not use the Swissmodel model, because for 3BC9 the Swissmodel run was not successful, and for 3LUT we had to use 3HN3, because the sequence identity between our protein and 3LUT was too low. These models were not the best models we got, but we wanted to check whether a bad model has a bad influence on the result of 3D-Jigsaw, or whether 3D-Jigsaw can compensate for one bad model.

Results for 3BC9:

Measure          Swissmodel  Modeller  iTasser Model 1  iTasser Model 2
RMSD (Pymol)     todo        26.271    1.075            16.450
RMSD (TM-align)  todo        5.94      0.63             5.76
TM-Score         todo        0.43072   0.92545          0.41556
Structural alignment (Pymol) todo
Superposition by Pymol with the structure predicted by Modeller
Superposition by Pymol with the structure predicted by iTasser (Model 1)
Superposition by Pymol with the structure predicted by iTasser (Model 2)
Structural alignment (TM-align)
Superposition by TM-align with the structure predicted by Modeller
Superposition by TM-align with the structure predicted by iTasser (Model 1)
Superposition by TM-align with the structure predicted by iTasser (Model 2)
Measure          iTasser Model 3  iTasser Model 4  iTasser Model 5  Jigsaw
RMSD (Pymol)     19.716           10.824           20.450           0.648
RMSD (TM-align)  5.54             6.27             5.55             0.35
TM-Score         0.39710          0.50099          0.39822          0.99607
Structural alignment (Pymol)
Superposition by Pymol with the structure predicted by iTasser (Model 3)
Superposition by Pymol with the structure predicted by iTasser (Model 4)
Superposition by Pymol with the structure predicted by iTasser (Model 5)
Superposition by Pymol with the structure predicted by 3D-Jigsaw
Structural alignment (TM-align)
Superposition by TM-align with the structure predicted by iTasser (Model 3)
Superposition by TM-align with the structure predicted by iTasser (Model 4)
Superposition by TM-align with the structure predicted by iTasser (Model 5)
Superposition by TM-align with the structure predicted by 3D-Jigsaw



Results for 3CUI:

Measure          Swissmodel  Modeller  iTasser Model 1  iTasser Model 2
RMSD (Pymol)     24.447      23.856    1.13             4.168
RMSD (TM-align)  5.49        5.46      0.56             3.73
TM-Score         0.45333     0.44048   0.92583          0.75323
Structural alignment (Pymol)
Superposition by Pymol with the structure predicted by Swissmodel
Superposition by Pymol with the structure predicted by Modeller
Superposition by Pymol with the structure predicted by iTasser (Model 1)
Superposition by Pymol with the structure predicted by iTasser (Model 2)
Structural alignment (TM-align)
Superposition by TM-align with the structure predicted by Swissmodel
Superposition by TM-align with the structure predicted by Modeller
Superposition by TM-align with the structure predicted by iTasser (Model 1)
Superposition by TM-align with the structure predicted by iTasser (Model 2)
Measure          iTasser Model 3  iTasser Model 4  iTasser Model 5  Jigsaw
RMSD (Pymol)     8.979            1.443            16.502           21.059
RMSD (TM-align)  4.72             1.26             5.86             5.18
TM-Score         0.60810          0.91279          0.40638          0.41901
Structural alignment (Pymol)
Superposition by Pymol with the structure predicted by iTasser (Model 3)
Superposition by Pymol with the structure predicted by iTasser (Model 4)
Superposition by Pymol with the structure predicted by iTasser (Model 5)
Superposition by Pymol with the structure predicted by 3D-Jigsaw
Structural alignment (TM-align)
Superposition by TM-align with the structure predicted by iTasser (Model 3)
Superposition by TM-align with the structure predicted by iTasser (Model 4)
Superposition by TM-align with the structure predicted by iTasser (Model 5)
Superposition by TM-align with the structure predicted by 3D-Jigsaw



Results for 3LUT:

As described above, we had to use 3HN3 for the Swissmodel prediction, because the prediction did not work for 3LUT. Therefore, the Swissmodel column below lists the results for 3HN3, while all other columns list the results for 3LUT.

Measure          Swissmodel  Modeller  iTasser Model 1  iTasser Model 2
RMSD (Pymol)     27.968      24.153    1.796            1.491
RMSD (TM-align)  4.30        5.29      1.58             1.40
TM-Score         0.40661     0.38126   0.89839          0.90660
Structural alignment (Pymol)
Superposition by Pymol with the structure predicted by Swissmodel (template 3HN3)
Superposition by Pymol with the structure predicted by Modeller
Superposition by Pymol with the structure predicted by iTasser (Model 1)
Superposition by Pymol with the structure predicted by iTasser (Model 2)
Structural alignment (TM-align)
Superposition by TM-align with the structure predicted by Swissmodel (template 3HN3)
Superposition by TM-align with the structure predicted by Modeller
Superposition by TM-align with the structure predicted by iTasser (Model 1)
Superposition by TM-align with the structure predicted by iTasser (Model 2)
Measure          iTasser Model 3  iTasser Model 4  iTasser Model 5  Jigsaw
RMSD (Pymol)     3.489            9.949            20.406           already running
RMSD (TM-align)  3.33             5.64             5.77             already running
TM-Score         0.78407          0.54518          0.38126          already running
Structural alignment (Pymol)
Superposition by Pymol with the structure predicted by iTasser (Model 3)
Superposition by Pymol with the structure predicted by iTasser (Model 4)
Superposition by Pymol with the structure predicted by iTasser (Model 5)
already running
Structural alignment (TM-align)
Superposition by TM-align with the structure predicted by iTasser (Model 3)
Superposition by TM-align with the structure predicted by iTasser (Model 4)
Superposition by TM-align with the structure predicted by iTasser (Model 5)
already running

Conclusion

The RMSD (root-mean-square deviation) measures the average distance between the aligned residue pairs of two superposed structures. An RMSD near 0 is a very good result, because then there is only little deviation between template and target.
However, the RMSD weights the distances of all residue pairs equally. This means that a few very distant residues can raise the RMSD dramatically, although the overall topology of the two proteins is quite similar. Another problem is that the length of the two proteins is not taken into account in the calculation. Therefore, long proteins tend to have a worse RMSD than short ones, even if the topology of both protein pairs is equally similar.
We used the RMSD calculated by Pymol and also the one calculated by TM-align. As the tables above show, there is a big difference between these two RMSD values. This can be explained by the different methods used to calculate the RMSD, so it is important to clarify how the two programs work.
Pymol first does a sequence alignment and then tries to superpose the structures so that the RMSD between all aligned residues is minimal. TM-align, in contrast, first rotates one structure onto the other in an optimal way and in the next step calculates the RMSD between the corresponding residues. These two approaches are very different and therefore lead to different results.

As described before, the RMSD has some problems. Therefore, we also calculated the TM-Score, which takes the length of the protein structures into account. A TM-Score of 1 means that template and target have the same structure, a TM-Score > 0.5 means that both structures have the same fold, whereas a TM-Score < 0.2 means that both structures are totally different.
The TM-Score also has some problems. The most important one is that if it is not possible to align any residues between the two structures, the score will be 1. So keep in mind: if there is a score of 1, look at the picture to see whether the structures are really identical or whether the TM-Score calculation failed.
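The different behaviour of the two scores described above can be illustrated with a small calculation. The sketch below assumes the standard TM-score formula, in which each aligned residue pair contributes 1 / (1 + (d / d0)^2) with d0 = 1.24 * (L - 15)^(1/3) - 1.8 and L the target length. It works on a fixed list of invented distances and does not search for the optimal superposition as the real programs do; the lower bound of 0.5 for d0 is a guard we added for very short proteins.

```python
import math


def rmsd_from_distances(distances):
    """RMSD from per-residue distances (in Angstrom) of a fixed superposition."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score_from_distances(distances, target_length):
    """TM-score of a fixed superposition, normalised by the target length."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8 if target_length > 15 else 0.5
    d0 = max(d0, 0.5)  # assumed guard against tiny or negative d0
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Invented example: 100 residues, all 1 Angstrom apart ...
clean = [1.0] * 100
# ... and the same model with a single residue 50 Angstrom off.
with_outlier = [1.0] * 99 + [50.0]

for name, dist in (("clean", clean), ("one outlier", with_outlier)):
    print(name,
          round(rmsd_from_distances(dist), 2),
          round(tm_score_from_distances(dist, 100), 3))
```

In this toy case a single misplaced residue raises the RMSD from 1 to about 5 Angstrom, while the TM-score stays above 0.9, because distant pairs contribute almost nothing to the sum instead of dominating it.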

Comparison of the different methods

The first thing the tables above show is that the RMSD calculated by Pymol is always much higher than the RMSD calculated by TM-align. Therefore, it is more effective to rotate the structures onto each other than to rely on a sequence alignment followed by a structure alignment. This can be seen in the RMSD values, but also in the pictures that show the superposed structures. Furthermore, Modeller and Swissmodel both predict the structure poorly: both methods always have a very high RMSD and a very low TM-Score. To learn more about the prediction results, we analysed the scores for each template.

  • 3BC9:

3BC9 is the template with the highest sequence identity. Therefore, the predicted models should be very similar to our structure. The prediction of Modeller is really bad, and iTasser also predicted mostly wrong structures. Only Model 1 of iTasser is very similar to the real structure, which can also be seen in the RMSD (near 0) and the TM-Score (near 1).
The best result with 3BC9 as template was the iTasser Model 1 prediction.

  • 3CUI:

3CUI has a sequence identity of 49.5%, which is not very high, but it should still be possible to predict a structure that is quite similar to the real structure. As before, Swissmodel and Modeller predicted structures that do not fit our real structure very well. iTasser, however, predicted two models that are very similar to our structure: Model 1 and Model 4 have very low RMSD values and high TM-Scores, and the pictures make clear that target and template structure are really similar.
So again, in this case we got the best result from iTasser.

  • 3LUT /3HN3:

Swissmodel was not able to predict the structure of our target with 3LUT as template. Therefore, we used 3HN3, which has a slightly higher sequence identity (25%) than 3LUT (20%). We expected this prediction to be the worst one because of the low sequence identity. Interestingly, the results of Modeller and Swissmodel are not much worse than their results with 3CUI as template. Furthermore, iTasser predicted two models that fit our real structure very well and have very low RMSD values and high TM-Scores.
We want to highlight that this result is not the norm. We aligned the structures of 2GJX:A and 3LUT:A; the TM-Score between these two structures is 0.50014 and the RMSD is 5.04, which is a very good result considering how low the sequence identity is. So in this case we were lucky to get such a good result, but in general the results for two sequences that differ this much are much worse.
In agreement with the two results above, iTasser again gave the best results.
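The per-template comparison above can be reproduced directly from the TM-Scores in the result tables. The following sketch is only a convenience for reading the tables: it uses the TM-Score > 0.5 "same fold" criterion from the conclusion, leaves out the 3D-Jigsaw and "todo" entries because the comparison covers only Swissmodel, Modeller and iTasser, and lists the Swissmodel result for 3HN3 under 3LUT as in the table.

```python
# TM-Scores copied from the result tables above.
TM_SCORES = {
    "3BC9": {
        "Modeller": 0.43072, "iTasser Model 1": 0.92545,
        "iTasser Model 2": 0.41556, "iTasser Model 3": 0.39710,
        "iTasser Model 4": 0.50099, "iTasser Model 5": 0.39822,
    },
    "3CUI": {
        "Swissmodel": 0.45333, "Modeller": 0.44048,
        "iTasser Model 1": 0.92583, "iTasser Model 2": 0.75323,
        "iTasser Model 3": 0.60810, "iTasser Model 4": 0.91279,
        "iTasser Model 5": 0.40638,
    },
    "3LUT": {
        "Swissmodel (3HN3)": 0.40661, "Modeller": 0.38126,
        "iTasser Model 1": 0.89839, "iTasser Model 2": 0.90660,
        "iTasser Model 3": 0.78407, "iTasser Model 4": 0.54518,
        "iTasser Model 5": 0.38126,
    },
}
SAME_FOLD_THRESHOLD = 0.5

best_model = {}
same_fold = {}
for template, scores in TM_SCORES.items():
    best_model[template] = max(scores, key=scores.get)
    same_fold[template] = sorted(
        name for name, score in scores.items() if score > SAME_FOLD_THRESHOLD
    )
    print(f"{template}: best = {best_model[template]}, "
          f"same fold = {', '.join(same_fold[template])}")
```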

In sum, iTasser is the best of the three prediction methods we used. However, iTasser needs a lot of time to predict the structures and allows only one job per user at a time. Therefore, if there is enough time, iTasser is the best choice. If there is not that much time, Modeller and Swissmodel can be used; both methods give approximately the same prediction results. Modeller runs only on the command line, which means it has to be installed on the system. A fresh installation takes a while, because the licence is sent by e-mail, which can take up to one day. Swissmodel is available on the Internet and can be used without any delay. So if the user only wants an approximate estimate of the protein structure and does not have much time, Swissmodel is the right choice.