Difference between revisions of "Task 5 (MSUD)"
(Created page with "Lab journal == Results == == Discussion ==") |
(→Discussion) |
||
(34 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | == Results == |
||
+ | |||
[[Lab Journal of Task 5 (MSUD)|Lab journal]] |
[[Lab Journal of Task 5 (MSUD)|Lab journal]] |
||
− | == |
+ | === Modeller === |
+ | |||
+ | ==== Single template modelling ==== |
||
+ | The following structures were used as templates: |
||
+ | <table style=" text-align: left;" border="1" cellpadding="2" |
||
+ | cellspacing="2"> |
||
+ | <tr> |
||
+ | <td>'''PDB ID'''</td> |
||
+ | <td>'''Sequence Identity (to 1U5B)'''<br> |
||
+ | </td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td> 2BFE<br> |
||
+ | </td> |
||
+ | <td> 99%<br> |
||
+ | </td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td style="vertical-align: top;">3EXG<br> |
||
+ | </td> |
||
+ | <td style="vertical-align: top;">24.9%<br> |
||
+ | </td> |
||
+ | </tr> |
||
+ | </table> |
||
+ | |||
+ | <gallery widths=480px heights=300px caption="Computed models aligned to the structures of BCKDHA (1U5B)"> |
||
+ | File:Pymol-alignment-2bfe.png|Comparison of the original structure (1U5B, in green) and the modeled structure (in red) of BCKDHA. The RMSD computed by PyMOL is 0.784 Å. The template structure is '''2BFE'''. |
||
+ | File:Pymol-alignment-3exg.png|Comparison of the original structure (1U5B, in green) and the modeled structure (in red) of BCKDHA. The RMSD computed by PyMOL is 1.590 Å. The template structure is '''3EXG'''. |
||
+ | </gallery> |
||
+ | |||
+ | ==== Multiple template modelling ==== |
||
+ | The following groups of structures were used to model the structure of BCKDHA from its sequence: |
||
+ | <table style="width: 400px;" border="1"> |
||
+ | <tr> |
||
+ | <td>ID</td> |
||
+ | <td>PDB ID</td> |
||
+ | <td>Chain</td> |
||
+ | <td>Sequence Identity (to 1U5B)</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td colspan="4" style="font-weight: bold; text-align: center;">High sequence identity</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>1</td> |
||
+ | <td>2BEW</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>2</td> |
||
+ | <td>2BEV</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>3</td> |
||
+ | <td>2BEU</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>4</td> |
||
+ | <td>1X80</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>5</td> |
||
+ | <td>1WCI</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>6</td> |
||
+ | <td>1U5B</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>7</td> |
||
+ | <td>1OLX</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>8</td> |
||
+ | <td>1OLS</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>9</td> |
||
+ | <td>1DTW</td> |
||
+ | <td>A</td> |
||
+ | <td>94.8%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td colspan="4" style="font-weight: bold; text-align: center;">Low sequence identity</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>1</td> |
||
+ | <td>3EXH</td> |
||
+ | <td>G</td> |
||
+ | <td>26.4%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>2</td> |
||
+ | <td>2R8P</td> |
||
+ | <td>G</td> |
||
+ | <td>21.4%</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>3</td> |
||
+ | <td>3MOS</td> |
||
+ | <td>A</td> |
||
+ | <td>18.1%</td> |
||
+ | </tr> |
||
+ | </table> |
||
+ | |||
+ | <gallery widths=480px heights=300px caption="Computed models aligned to the structures of BCKDHA (1U5B)"> |
||
+ | File:Pymol-alignment-multitemplate-high.png|Comparison of the original structure (1U5B, in green) and the modeled structure (in red) of BCKDHA. The RMSD computed by PyMOL is 0.174 Å. The template structures have '''high sequence identity''' (>90%) to BCKDHA. |
||
+ | File:Pymol-alignment-multitemplate-low.png|Comparison of the original structure (1U5B, in green) and the modeled structure (in red) of BCKDHA. The RMSD computed by PyMOL is 24.12 Å. The template structures have '''low sequence identity''' (<30%) to BCKDHA. |
||
+ | </gallery> |
||
+ | |||
+ | === Swissmodel === |
||
+ | |||
+ | In the Swissmodel output, almost all scores are reported as 0 or not shown at all, probably because of an error in the program. The only score reported is Anolea (an atomic empirical mean force potential), which indicates whether the amino acids are in a favorable energy environment. |
||
+ | |||
+ | ==== High sequence identity template ==== |
||
+ | |||
+ | The following diagram shows the Anolea profile for the model built with 2BFE as template. |
||
+ | |||
+ | [[File:MSUD_swissmodel_2bfe_anolea.jpg]] |
||
+ | |||
+ | Almost all residues are in a favorable energy environment (negative Anolea values) and only some small parts of the protein structure are in an unfavorable one (positive values). |
||
+ | |||
+ | ==== Low sequence identity template ==== |
||
+ | |||
+ | Anolea profile for the model built with 3EXG as template: |
||
+ | |||
+ | [[File:MSUD_swissmodel_3exg_anolea.jpg]] |
||
+ | |||
+ | Only a small percentage of the residues are in a favorable energy environment; most parts of the protein are in an unfavorable one. |
||
+ | |||
+ | === iTasser === |
||
+ | |||
+ | For the five alternative models that are predicted by iTasser, the C-scores are shown in the following table. The C-score is a confidence score that ranges from -5 to 2. A higher (less negative) score indicates a better quality of the model. |
||
+ | |||
+ | |||
+ | {| class="wikitable" border="1" style="text-align:center;width:400px" |
||
+ | |+ C-scores of structure models calculated with iTasser |
||
+ | |- |
||
+ | ! template !! model1 !! model2 !! model3 !! model4 !! model5 |
||
+ | |- |
||
+ | | 2BFE || -0.83 || -1.25 || -2.12 || -2.36 || -1.98 |
||
+ | |- |
||
+ | | 3EXG || -1.86 || -2.02 || -2.43 || -2.49 || -2.97 |
||
+ | |} |
||
+ | |||
+ | |||
+ | The models built with the high sequence identity template have a better average C-score than those built with the low sequence identity template. However, the scores of models built from the same template also differ considerably from each other. |
||
+ | |||
+ | === Evaluation of models === |
||
+ | |||
+ | To compare the calculated models to the reference structure 1U5B, we report RMSD and GDT_TS. The score GDT_TS combines GDT (global distance test) values calculated with several distance cutoffs. |
||
+ | |||
+ | ==== Single template modelling ==== |
||
+ | |||
+ | ===== High sequence identity template ===== |
||
+ | |||
+ | {| class="wikitable" border="1" style="text-align:center;width:800px" |
||
+ | |+ Comparison of structure models (built with template 2BFE) to the reference structure 1U5B |
||
+ | |- |
||
+ | ! !! Modeller !! Swissmodel !! iTasser, model1 !! iTasser, model2 !! iTasser, model3 !! iTasser, model4 !! iTasser, model5 |
||
+ | |- |
||
+ | |'''GDT_TS''' || 81.7 || 96.0 || 89.1 || 95.5 || 88.5 || 94.9 || 96.9 |
||
+ | |- |
||
+ | |'''RMSD [Å]''' || 1.0 || 1.7 || 2.9 || 1.0 || 3.1 || 1.5 || 0.8 |
||
+ | |} |
||
+ | |||
+ | ===== Low sequence identity template ===== |
||
+ | |||
+ | {| class="wikitable" border="1" style="text-align:center;width:800px" |
||
+ | |+ Comparison of structure models (built with template 3EXG) to the reference structure 1U5B |
||
+ | |- |
||
+ | ! !! Modeller !! Swissmodel !! iTasser, model1 !! iTasser, model2 !! iTasser, model3 !! iTasser, model4 !! iTasser, model5 |
||
+ | |- |
||
+ | |'''GDT_TS''' || 58.3 || 61.7 || 68.4 || 66.0 || 66.6 || 69.0 || 69.1 |
||
+ | |- |
||
+ | |'''RMSD [Å]''' || 1.3 || 9.7 || 14.5 || 13.9 || 14.0 || 15.4 || 15.4 |
||
+ | |} |
||
+ | |||
+ | ==== Multiple template modelling ==== |
||
+ | <table border="1"> |
||
+ | <tr> |
||
+ | <td></td> |
||
+ | <td>'''High sequence identity'''</td> |
||
+ | <td>'''Low sequence identity'''</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>'''GDT_TS'''</td> |
||
+ | <td>95.5</td> |
||
+ | <td>6.3</td> |
||
+ | </tr> |
||
+ | <tr> |
||
+ | <td>'''RMSD [Å]'''</td> |
||
+ | <td>0.6</td> |
||
+ | <td>12.5</td> |
||
+ | </tr> |
||
+ | </table> |
||
== Discussion == |
== Discussion == |
||
+ | |||
+ | * RMSD is higher for models built from the low sequence identity template, indicating a larger mean distance to the reference structure. GDT_TS is higher for models built from the high sequence identity template, meaning that a larger percentage of residues lie within the distance thresholds. |
||
+ | * In general, GDT_TS is the better measure of model quality because it rewards parts of the structure that fit the reference very well. RMSD measures a global distance and becomes very high when only part of the structure matches the reference and the rest does not fit. |
||
+ | * The Anolea score of Swissmodel, which measures the packing quality of residues in the model, agrees with RMSD and GDT_TS: the model built from 2BFE scores better on all three than the model built from 3EXG. It therefore gives a good indication of model quality. |
||
+ | * The average iTasser C-score is higher for the higher-quality models (those built with the high sequence identity template) than for the low-quality models. Among alternative models built from the same template, however, we could not identify any correlation between the C-score and RMSD or GDT_TS. It is therefore unclear whether this score gives a good estimate of model confidence, especially when choosing between alternative models. |
||
+ | * If GDT_TS is taken as the quality measure, iTasser gives better models than Swissmodel for a template with low sequence identity. Note, however, that iTasser has a far longer running time than Swissmodel. |
||
+ | * In general, there is a strong relationship between protein structure and function. Although 3EXG has only about 25% sequence identity to BCKDHA, the two share a large part of their structural patterns. |
||
+ | * Very high sequence similarity implies structural similarity. The results show that multiple template modelling with templates of very high sequence similarity produces a model very similar to the reference structure. |
||
+ | * If the template structures are less homologous to the target sequence, the resulting model can diverge strongly from the real structure of the target protein. |
||
+ | * With a template of high sequence similarity, a good structure model can be created. When only structures with low sequence identity are available, the programs fail to create reliable structure models in most cases. |
Latest revision as of 13:25, 2 August 2013
Results
Modeller
Single template modelling
The following structures were used as templates:
PDB ID | Sequence Identity (to 1U5B) |
2BFE | 99% |
3EXG | 24.9% |
Multiple template modelling
The following groups of structures were used to model the structure of BCKDHA from its sequence:
ID | PDB ID | Chain | Sequence Identity (to 1U5B) |
High sequence identity | |||
1 | 2BEW | A | 94.8% |
2 | 2BEV | A | 94.8% |
3 | 2BEU | A | 94.8% |
4 | 1X80 | A | 94.8% |
5 | 1WCI | A | 94.8% |
6 | 1U5B | A | 94.8% |
7 | 1OLX | A | 94.8% |
8 | 1OLS | A | 94.8% |
9 | 1DTW | A | 94.8% |
Low sequence identity | |||
1 | 3EXH | G | 26.4% |
2 | 2R8P | G | 21.4% |
3 | 3MOS | A | 18.1% |
Swissmodel
In the Swissmodel output, almost all scores are reported as 0 or not shown at all, probably because of an error in the program. The only score reported is Anolea (an atomic empirical mean force potential), which indicates whether the amino acids are in a favorable energy environment.
High sequence identity template
The following diagram shows the Anolea profile for the model built with 2BFE as template.
Almost all residues are in a favorable energy environment (negative Anolea values) and only some small parts of the protein structure are in an unfavorable one (positive values).
Low sequence identity template
Anolea profile for the model built with 3EXG as template:
Only a small percentage of the residues are in a favorable energy environment; most parts of the protein are in an unfavorable one.
iTasser
For the five alternative models that are predicted by iTasser, the C-scores are shown in the following table. The C-score is a confidence score that ranges from -5 to 2. A higher (less negative) score indicates a better quality of the model.
template | model1 | model2 | model3 | model4 | model5 |
---|---|---|---|---|---|
2BFE | -0.83 | -1.25 | -2.12 | -2.36 | -1.98 |
3EXG | -1.86 | -2.02 | -2.43 | -2.49 | -2.97 |
The models built with the high sequence identity template have a better average C-score than those built with the low sequence identity template. However, the scores of models built from the same template also differ considerably from each other.
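The comparison of average C-scores can be reproduced with a few lines of Python. This is only a minimal sketch: the values are copied from the C-score table, while the dictionary `c_scores` and the helper `summarize` are names invented for this example and are not part of iTasser.

```python
from statistics import mean, stdev

# C-scores of the five iTasser models per template, copied from the table.
c_scores = {
    "2BFE": [-0.83, -1.25, -2.12, -2.36, -1.98],
    "3EXG": [-1.86, -2.02, -2.43, -2.49, -2.97],
}


def summarize(scores):
    """Return mean, sample standard deviation and best (highest) C-score."""
    return {"mean": mean(scores), "stdev": stdev(scores), "best": max(scores)}


# Hypothetical summary structure: template name -> statistics of its models.
summary = {template: summarize(values) for template, values in c_scores.items()}

for template, stats in summary.items():
    print(
        f"{template}: mean={stats['mean']:.2f} "
        f"stdev={stats['stdev']:.2f} best={stats['best']:.2f}"
    )
```

The mean captures the difference between the two templates, and the standard deviation gives a rough idea of how much the five models of one template differ from each other.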
Evaluation of models
To compare the calculated models to the reference structure 1U5B, we report RMSD and GDT_TS. The score GDT_TS combines GDT (global distance test) values calculated with several distance cutoffs.
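As a rough illustration of how the two measures behave, the following Python sketch computes RMSD and a simplified GDT_TS from paired Cα coordinates. It rests on several assumptions that are not taken from the analysis on this page: the coordinates are already superposed, the cutoffs are the conventional 1, 2, 4 and 8 Å, and the function names `rmsd` and `gdt_ts` as well as the toy coordinates are made up for the example. Real GDT_TS implementations search for the best superposition for each cutoff, so this sketch is not how the values reported here were obtained.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation over paired, already superposed coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = [math.dist(a, b) ** 2 for a, b in zip(model, reference)]
    return math.sqrt(sum(squared) / len(squared))


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS: mean percentage of residues within each cutoff.

    Assumption: a single fixed superposition is used instead of the
    per-cutoff superposition search of a full GDT implementation.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    distances = [math.dist(a, b) for a, b in zip(model, reference)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy data (hypothetical): three residues fit exactly, one is 10 Å away.
toy_reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 10.0, 0.0)]

print(f"RMSD   = {rmsd(toy_model, toy_reference):.1f} Å")
print(f"GDT_TS = {gdt_ts(toy_model, toy_reference):.1f}")
```

In the toy data a single badly placed residue raises the RMSD to 5.0 Å, while GDT_TS still reports 75.0 because three of the four residues lie within every cutoff.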
Single template modelling
High sequence identity template
Measure | Modeller | Swissmodel | iTasser, model1 | iTasser, model2 | iTasser, model3 | iTasser, model4 | iTasser, model5 |
---|---|---|---|---|---|---|---|
GDT_TS | 81.7 | 96.0 | 89.1 | 95.5 | 88.5 | 94.9 | 96.9 |
RMSD [Å] | 1.0 | 1.7 | 2.9 | 1.0 | 3.1 | 1.5 | 0.8 |
Low sequence identity template
Measure | Modeller | Swissmodel | iTasser, model1 | iTasser, model2 | iTasser, model3 | iTasser, model4 | iTasser, model5 |
---|---|---|---|---|---|---|---|
GDT_TS | 58.3 | 61.7 | 68.4 | 66.0 | 66.6 | 69.0 | 69.1 |
RMSD [Å] | 1.3 | 9.7 | 14.5 | 13.9 | 14.0 | 15.4 | 15.4 |
Multiple template modelling
Measure | High sequence identity | Low sequence identity |
GDT_TS | 95.5 | 6.3 |
RMSD [Å] | 0.6 | 12.5 |
Discussion
- RMSD is higher for models built from the low sequence identity template, indicating a larger mean distance to the reference structure. GDT_TS is higher for models built from the high sequence identity template, meaning that a larger percentage of residues lie within the distance thresholds.
- In general, GDT_TS is the better measure of model quality because it rewards parts of the structure that fit the reference very well. RMSD measures a global distance and becomes very high when only part of the structure matches the reference and the rest does not fit.
- The Anolea score of Swissmodel, which measures the packing quality of residues in the model, agrees with RMSD and GDT_TS: the model built from 2BFE scores better on all three than the model built from 3EXG. It therefore gives a good indication of model quality.
- The average iTasser C-score is higher for the higher-quality models (those built with the high sequence identity template) than for the low-quality models. Among alternative models built from the same template, however, we could not identify any correlation between the C-score and RMSD or GDT_TS. It is therefore unclear whether this score gives a good estimate of model confidence, especially when choosing between alternative models.
- If GDT_TS is taken as the quality measure, iTasser gives better models than Swissmodel for a template with low sequence identity. Note, however, that iTasser has a far longer running time than Swissmodel.
- In general, there is a strong relationship between protein structure and function. Although 3EXG has only about 25% sequence identity to BCKDHA, the two share a large part of their structural patterns.
- Very high sequence similarity implies structural similarity. The results show that multiple template modelling with templates of very high sequence similarity produces a model very similar to the reference structure.
- If the template structures are less homologous to the target sequence, the resulting model can diverge strongly from the real structure of the target protein.
- With a template of high sequence similarity, a good structure model can be created. When only structures with low sequence identity are available, the programs fail to create reliable structure models in most cases.
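The point about the iTasser C-score can be checked numerically with a rank correlation between the C-scores and the GDT_TS values of the five models per template. The sketch below is one possible way to do this, not the procedure used for the discussion above: Spearman's rank correlation is our choice of statistic, the helper names `ranks` and `spearman` are invented for the example, and the numbers are copied from the result tables.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for tie-free data of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# iTasser model1..model5, values copied from the tables in the Results section.
c_score = {
    "2BFE": [-0.83, -1.25, -2.12, -2.36, -1.98],
    "3EXG": [-1.86, -2.02, -2.43, -2.49, -2.97],
}
gdt = {
    "2BFE": [89.1, 95.5, 88.5, 94.9, 96.9],
    "3EXG": [68.4, 66.0, 66.6, 69.0, 69.1],
}

rho = {template: spearman(c_score[template], gdt[template]) for template in c_score}
for template, value in rho.items():
    print(f"{template}: Spearman rho(C-score, GDT_TS) = {value:.2f}")
```

For these values the coefficient is about 0.1 for 2BFE and about -0.7 for 3EXG. With only five models per template neither value is statistically meaningful, which fits the observation that no consistent relationship between C-score and GDT_TS is visible within one template.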