Task 4: Structural Alignments

From Bioinformatikpedia
Revision as of 13:25, 13 August 2013 by Betza

Structural alignments for evaluating sequence alignments

In this task, we wanted to evaluate LGA structural alignments with the help of sequence alignments. HHsearch was run against the PDB database to generate sequence alignments to the reference.

<figtable id="models">

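{| class="wikitable"
! rowspan="2" | PDB ID
! colspan="5" | LGA
! colspan="3" | HHsearch
|-
! superimposed residues !! RMSD (Å) !! Seq_id (%) !! LGA_S !! LGA_Q !! probability (%) !! E-value !! seq. identity (%)
|-
| 3p73_A || 257 || 1.99 || 96.11 || 75.9 || 12.3 || 100.0 || 4E-68 || 34.9
|-
| 1uvq_B || 147 || 2.71 || 74.15 || 34.9 || 5.2 || 100.0 || 3.5E-40 || 22.2
|-
| 1bii_B || 91 || 1.89 || 97.8 || 30.6 || 4.6 || 99.6 || 4.2E-19 || 18.0
|-
| 1i1c_A || 70 || 1.73 || 97.14 || 23.7 || 3.8 || 98.4 || 2.1E-10 || 25.0
|-
| 2vol_A || 68 || 1.85 || 89.7 || 22.4 || 3.5 || 97.6 || 9.4E-08 || 24.4
|-
| 1iga_A || 58 || 1.99 || 74.14 || 16.0 || 2.8 || 96.1 || 9.4E-05 || 21.3
|-
| 2wng_A || 62 || 1.91 || 95.2 || 19.1 || 3.1 || 94.5 || 0.0022 || 14.6
|-
| 1wwc_A || 55 || 2.52 || 63.64 || 14.0 || 2.1 || 92.9 || 0.01 || 6.9
|-
| 1rhf_A || 51 || 2.22 || 74.51 || 13.6 || 2.2 || 79.4 || 0.4 || 8.0
|}
Table 5: The nine sequences selected from the HHsearch search against PDB structures. The HHsearch alignment scores are listed together with the LGA scores of the corresponding model.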

</figtable>

In total, we found 449 sequences, from which we selected the 9 listed in <xr id="models"/>. These PDB structures were then used to construct simple 3D models, which were aligned to the reference 1A6Z_A using LGA. The resulting structural alignment scores are also given in <xr id="models"/>.


<figtable id="correlations">

{| class="wikitable"
! !! superimposed residues !! RMSD !! Seq_id !! LGA_S !! LGA_Q
|-
! probability
| 0.4689 || -0.1825 || 0.4517 || 0.4977 || 0.4794
|-
! log10( E-value )
| -0.9919 || -0.1500 || -0.2915 || -0.9663 || -0.9544
|-
! seq. identity
| 0.733 || -0.3858 || 0.5799 || 0.7784 || 0.7866
|}
Table 6: Pearson's correlation coefficients between each HHsearch score (rows) and each LGA score (columns). The log10 of the E-value yields the strongest (negative) correlations, namely with the number of superimposed residues, the LGA_S score and the LGA_Q score.

</figtable>

Pearson's correlation coefficient was computed for every pair of an HHsearch alignment score and an LGA score, see <xr id="correlations"/>. The base-10 logarithm of the E-value correlates very strongly (and negatively) with the number of superimposed residues, the LGA_S score and the LGA_Q score. Of the three HHsearch scores, the sequence identity has the strongest correlation with the RMSD and with the LGA sequence identity (Seq_id). Unexpectedly, all correlations with the RMSD are weak (between -0.15 and -0.39) and therefore not significant.
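
The coefficients can be recomputed directly from the values listed in Table 5. The following is a minimal sketch in plain Python, not the script that was used for this page; the names `ROWS`, `LGA_COLUMNS`, `pearson`, `correlation_table` and `t_statistic` are illustrative assumptions. It also computes the usual t statistic for a correlation coefficient, which gives a rough check on whether a coefficient obtained from only nine data points can be called significant.

```python
import math

# Values copied from Table 5. Column order per row:
# residues, RMSD, Seq_id, LGA_S, LGA_Q, probability, E-value, seq. identity
ROWS = {
    "3p73_A": (257, 1.99, 96.11, 75.9, 12.3, 100.0, 4e-68, 34.9),
    "1uvq_B": (147, 2.71, 74.15, 34.9, 5.2, 100.0, 3.5e-40, 22.2),
    "1bii_B": (91, 1.89, 97.8, 30.6, 4.6, 99.6, 4.2e-19, 18.0),
    "1i1c_A": (70, 1.73, 97.14, 23.7, 3.8, 98.4, 2.1e-10, 25.0),
    "2vol_A": (68, 1.85, 89.7, 22.4, 3.5, 97.6, 9.4e-08, 24.4),
    "1iga_A": (58, 1.99, 74.14, 16.0, 2.8, 96.1, 9.4e-05, 21.3),
    "2wng_A": (62, 1.91, 95.2, 19.1, 3.1, 94.5, 0.0022, 14.6),
    "1wwc_A": (55, 2.52, 63.64, 14.0, 2.1, 92.9, 0.01, 6.9),
    "1rhf_A": (51, 2.22, 74.51, 13.6, 2.2, 79.4, 0.4, 8.0),
}

# The first five columns of each row are the LGA scores.
LGA_COLUMNS = ["superimposed residues", "RMSD", "Seq_id", "LGA_S", "LGA_Q"]


def column(index):
    """Return one column of ROWS as a list, in table order."""
    return [row[index] for row in ROWS.values()]


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlation_table():
    """Correlate each HHsearch score with each LGA score."""
    hhsearch = {
        "probability": column(5),
        # The E-value spans many orders of magnitude, so its log10 is used.
        "log10(E-value)": [math.log10(e) for e in column(6)],
        "seq. identity": column(7),
    }
    return {
        name: {lga: pearson(values, column(i))
               for i, lga in enumerate(LGA_COLUMNS)}
        for name, values in hhsearch.items()
    }


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    for name, row in correlation_table().items():
        cells = " ".join(f"{row[lga]:8.4f}" for lga in LGA_COLUMNS)
        print(f"{name:16s}{cells}")
```

With nine templates there are 7 degrees of freedom, so at the two-sided 5% level the absolute value of `t_statistic(r, 9)` would have to exceed roughly 2.365, which corresponds to |r| of about 0.67. Under that assumption, none of the correlations with the RMSD passes the threshold.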