Difference between revisions of "Task 4 (MSUD)"

Latest revision as of 21:14, 30 August 2013

Structural alignment

Results

The following structures were chosen to evaluate the structural alignment methods.

PDB ID	Identity to BCKDHA	Origin	Comment
1u5b	100%	Mitochondrial branched-chain α-keto acid dehydrogenase (Human)	BCKDHA
2bfe	99%	Mitochondrial branched-chain α-keto acid dehydrogenase (Human)	BCKDHA alternative structure
3exg	24.9%	Pyruvate dehydrogenase (Human)	structure with low identity
13pk	4.1%	Phosphoglycerate kinase (Trypanosoma brucei)	same CAT classification
17gs	8.3%	Glutathione S-transferase (Human)	same CA classification
2z37	5.4%	Chitinase (Brassica juncea)	same C classification
1f0y	8.1%	L-3-Hydroxyacyl-CoA Dehydrogenase (Human)	different CATH classification

We did not find any PDB structure with a sequence identity between 40% and 90% to BCKDHA.
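Identity values such as those in the table depend on how the percentage is normalised. The following is a minimal sketch. It assumes that identity is counted over alignment columns in which neither sequence has a gap. Other tools divide by the alignment length or by the length of the shorter sequence, which gives different values. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Assumption: only columns where both sequences carry a residue are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a.upper() == b.upper())
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment (not BCKDHA): 4 identical residues in 6 ungapped columns
    print(f"{percent_identity('MKV-LTA', 'MRVGLSA'):.1f}%")
```

Dividing by the full alignment length (7 columns) instead would give 57.1% for the same pair. This is why identity values from different tools are not directly comparable.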

Structural alignments

Structures superimposed onto the X-ray structure of BCKDHA using PyMOL
BCKDHA aligned to 2BFE. The structure of BCKDHA is shown in green.
BCKDHA aligned to 3EXG. The structure of BCKDHA is shown in green.
BCKDHA aligned to 13PK. The structure of BCKDHA is shown in green.
BCKDHA aligned to 17GS. The structure of BCKDHA is shown in green.
BCKDHA aligned to 2Z37. The structure of BCKDHA is shown in green.
BCKDHA aligned to 1F0Y. The structure of BCKDHA is shown in green.

Quality measures

The following table lists the RMSD (Å) of the structural alignments obtained with different methods (all structures were aligned to the BCKDHA structure 1u5b):

Structure type	PDB	PyMOL	LGA	SSAP	TopMatch (E_r)	CE
>90% identity	2bfe	0.21	0.29	0.30	0.34	0.29
<30% identity	3exg	1.30	1.73	9.22	2.05	1.97
CATH: same CAT	13pk	12.05	3.11	19.22	2.45	4.01
CATH: same CA	17gs	10.75	3.41	15.88	2.72	6.05
CATH: same C	2z37	9.07	3.27	25.26	2.17	6.17
different CATH	1f0y	18.73	2.96	14.07	2.52	5.29
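The row labels in the table above name the deepest CATH level (Class, Architecture, Topology, Homologous superfamily) that a structure shares with BCKDHA. The following is a minimal sketch of that comparison. It assumes that each structure is represented by a single dotted CATH domain code. The codes used below are hypothetical placeholders, not the actual classifications of these PDB entries. Multi-domain chains would need one comparison per domain.

```python
# The four CATH levels, in order of increasing specificity
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def shared_cath_levels(code_a: str, code_b: str) -> int:
    """Count the leading CATH levels two dotted codes (e.g. '3.40.50.720') share."""
    shared = 0
    for part_a, part_b in zip(code_a.split("."), code_b.split(".")):
        if part_a != part_b:
            break  # deeper levels are nested inside this one, so stop here
        shared += 1
    return min(shared, len(CATH_LEVELS))


def cath_label(code_a: str, code_b: str) -> str:
    """Label in the style of the table: 'same CAT', 'same CA', 'same C', ..."""
    n = shared_cath_levels(code_a, code_b)
    return "different CATH" if n == 0 else "same " + "CATH"[:n]


if __name__ == "__main__":
    reference = "3.40.50.720"  # hypothetical code for the reference domain
    for other in ("3.40.50.300", "3.40.47.10", "3.20.20.80", "1.10.8.10"):
        print(other, cath_label(reference, other))
```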

Discussion

Sequence dissimilarity does not imply structural dissimilarity.
- Although chain A of 3EXG shares only about 25% sequence identity with BCKDHA, the two structures are similar (RMSD between 1.30 and 2.05 Å with all methods except SSAP).

Different alignment methods yield widely varying RMSD values.
- RMSD values from SSAP and TopMatch differ almost 8-fold for 13pk (19.22 vs. 2.45 Å) and more than 11-fold for 2z37 (25.26 vs. 2.17 Å).
- These deviations can be explained by the different objectives of the methods.
  - For example, TopMatch optimizes structural alignments locally. Instead of aligning the whole structure, it aligns only regions with high structural similarity. The resulting RMSD therefore describes only the structural deviation between these locally aligned regions.
  - In contrast, SSAP aligns structures globally. Highly similar domains in otherwise different proteins may therefore go undetected when the overall structures have diverged.

Users should select the tool that fits their question.
- Although these methods return very different results, none of them is clearly the universal solution for structural alignment.
- For proteins with similar structures, a global alignment method such as SSAP is appropriate, because it aligns conserved domains and secondary structure elements together.
- For studying the conservation of specific domains, a local alignment method can identify common structural patterns in otherwise dissimilar proteins.
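The effect described above can be illustrated with a toy calculation: restricting the RMSD to well-fitting regions lowers it. The following is a minimal sketch under strong simplifying assumptions. The superposition is held fixed, per-residue Cα distances are given directly, and a "local" score is approximated by dropping pairs above a distance cutoff. Real tools such as TopMatch and SSAP use their own, different algorithms. The function names, the cutoff and the distances are hypothetical.

```python
import math


def rmsd_from_distances(distances):
    """RMSD (Å) of per-residue Cα-Cα distances measured after superposition."""
    if not distances:
        raise ValueError("need at least one aligned residue pair")
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def core_rmsd(distances, cutoff=3.0):
    """RMSD over pairs closer than `cutoff` Å, plus the fraction of pairs kept.

    Crude stand-in for a local alignment: the superposition stays fixed and
    poorly fitting pairs are simply discarded.
    """
    core = [d for d in distances if d < cutoff]
    return rmsd_from_distances(core), len(core) / len(distances)


if __name__ == "__main__":
    # Hypothetical pair: one well-superimposed region, one diverged region
    distances = [1.0] * 10 + [10.0] * 10
    print(f"global RMSD: {rmsd_from_distances(distances):.2f} Å")
    local, coverage = core_rmsd(distances)
    print(f"core RMSD: {local:.2f} Å over {coverage:.0%} of the pairs")
```

The same pair yields about 7.1 Å over all residues but 1.0 Å over half of them. Reporting the coverage alongside the RMSD helps avoid reading a low local RMSD as global similarity.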

Evaluation of alignments using structures

Lab journal

Results

The following table gives an overview of the structures used to build models. It also lists the scores of the structural alignment (RMSD and the LGA structure similarity score LGA_S, both computed on Cα atoms) and the scores of the sequence alignment (E-value, probability and sequence identity).

model	RMSD [Å]	LGA_S	E-value	probability [%]	sequence identity [%]
1qs0	1.24	84.09	5.8E-94	100.0	38
1w85	1.77	78.36	8.3E-87	100.0	33
2ozl	1.63	74.03	3.2E-69	100.0	27
2yic	2.45	41.73	5.7E-47	100.0	16
3l84	2.01	32.41	6.5E-18	99.5	21
2q28	1.86	25.40	1.6E-08	97.9	13
1r9j	1.73	30.10	1.1E-06	97.2	25
2vk8	2.12	21.99	3.7E-05	96.4	22
1t9b	1.83	23.72	1.1E-03	94.9	18
2c31	2.00	21.85	1.1E-02	92.7	21

Correlations of structural to sequence alignment scores
	E-value	log10(E-value)	probability [%]	sequence identity
RMSD	0.15	0.49	-0.19	-0.74
LGA_S	-0.33	-0.98	0.71	0.82

As the table above shows, RMSD correlates weakly with the logarithm of the E-value (0.49) and more strongly with sequence identity (-0.74). The RMSD tends to be lower when the E-value is lower or the sequence identity is higher.

LGA_S shows the same tendency, but with stronger correlations (-0.98 with log10(E-value), 0.82 with sequence identity). Unlike RMSD, LGA_S also correlates with the probability (0.71, compared with -0.19 for RMSD).

The signs are opposite for RMSD and LGA_S because higher similarity means a lower RMSD but a higher LGA_S.
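The coefficients in the correlation table can be recomputed from the model table. The following is a minimal sketch that assumes Pearson's correlation coefficient, since the text does not name the coefficient used. Under this assumption it gives about -0.98 for LGA_S vs. log10(E-value), -0.33 for LGA_S vs. the raw E-value and -0.74 for RMSD vs. sequence identity, which matches the table.

```python
import math

# Values copied from the model table (one entry per template structure)
rmsd = [1.24, 1.77, 1.63, 2.45, 2.01, 1.86, 1.73, 2.12, 1.83, 2.00]
lga_s = [84.09, 78.36, 74.03, 41.73, 32.41, 25.40, 30.10, 21.99, 23.72, 21.85]
e_value = [5.8e-94, 8.3e-87, 3.2e-69, 5.7e-47, 6.5e-18, 1.6e-08, 1.1e-06, 3.7e-05, 1.1e-03, 1.1e-02]
probability = [100.0, 100.0, 100.0, 100.0, 99.5, 97.9, 97.2, 96.4, 94.9, 92.7]
identity = [38, 33, 27, 16, 21, 13, 25, 22, 18, 21]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Compress the E-values, which span more than 90 orders of magnitude
log_e = [math.log10(e) for e in e_value]

sequence_scores = {
    "E-value": e_value,
    "log10(E-value)": log_e,
    "probability": probability,
    "sequence identity": identity,
}

if __name__ == "__main__":
    print("", *sequence_scores, sep="\t")
    for name, structural in (("RMSD", rmsd), ("LGA_S", lga_s)):
        row = [f"{pearson(structural, s):.2f}" for s in sequence_scores.values()]
        print(name, *row, sep="\t")
```

The log transform matters here. On the raw scale, the two largest E-values (1t9b and 2c31) dominate the coefficient, which is why the correlation with the raw E-value is much weaker.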

The following plot shows the relationship between LGA_S and the E-value for the 10 models. This is the pair of scores with the highest correlation (-0.98, using log10(E-value)).

Discussion

The correlations between structural and sequence alignment scores are as expected. A low E-value indicates a hit that is unlikely to occur by chance alone, i.e. a significant hit. Such a hit is likely to be related to the query and therefore to have a similar structure. The RMSD, which measures the deviation (root mean square deviation) between the aligned structures, will therefore be low for closely related proteins. Likewise, two sequences with high sequence identity are more likely to share the same structure, which explains the correlation between RMSD and sequence identity. The correlations of RMSD with the sequence alignment scores may be weaker than those of LGA_S because RMSD is computed only over the structurally aligned residues. As a result, it tends to underestimate the overall difference. A protein pair with one very similar region and another dissimilar region that cannot be aligned still obtains a low RMSD. The sampled probabilities range only from 92.7% to 100% and do not cover the whole range of possible values, so almost no correlation with RMSD was observed. The range is narrow because we did not use very distant relatives with a high E-value to build structure models.
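The RMSD discussed above can be written as follows, assuming the two structures have already been optimally superimposed. Here x_i and y_i are the Cα positions of the i-th structurally aligned residue pair. The formula makes explicit that N counts only the aligned pairs, not the full protein length. This is why a partially aligned pair can still obtain a low value.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{x}_i - \mathbf{y}_i \right\rVert^2}
```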

The LGA_S score, which combines local and global distances to the reference structure (here: the structure of our BCKDHA protein), gives a better indication of the overall structural similarity than the local RMSD. It correlates with the sequence alignment scores, in particular with the logarithm of the E-value. A significant hit in the sequence alignment is therefore likely to have a structure similar to that of the query, and can thus be used to model the structure of the query protein.

All observed local RMSD values were relatively low (1.24 to 2.45 Å), and the LGA_S values were high, at least for closely related structures. Simply copying the Cα coordinates of aligned residues from the structure of a related sequence therefore helps to build a model of the unknown structure of a protein.
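The model-building approach described above can be sketched as follows. This is a minimal illustration, not the procedure actually used here. It assumes a gapped pairwise alignment between query and template and one Cα coordinate triple per template residue. The function name and the toy data are hypothetical. Query residues aligned to gaps receive no coordinates and would have to be modelled separately.

```python
def copy_ca_coordinates(query_aln, template_aln, template_ca, gap="-"):
    """Copy template Cα coordinates onto aligned query residues.

    query_aln, template_aln: gapped alignment strings of equal length.
    template_ca: one (x, y, z) tuple per residue of the ungapped template.
    Returns {0-based query residue index: (x, y, z)}; query residues aligned
    to a template gap are left out.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("alignment strings must have equal length")
    if len(template_ca) != len(template_aln.replace(gap, "")):
        raise ValueError("need one coordinate per template residue")
    model = {}
    q_idx = t_idx = 0  # residue counters in the ungapped sequences
    for q, t in zip(query_aln, template_aln):
        if q != gap and t != gap:
            model[q_idx] = template_ca[t_idx]  # copy the aligned template Cα
        if q != gap:
            q_idx += 1
        if t != gap:
            t_idx += 1
    return model


if __name__ == "__main__":
    # Toy example: query residue 1 (K) has no template counterpart
    template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    print(copy_ca_coordinates("MK-LV", "M-GLV", template_ca))
```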

