Task 4: Structural Alignments
<css> table.colBasic2 { margin-left: auto; margin-right: auto; border: 2px solid black; border-collapse:collapse; width: 70%; }
.colBasic2 th,td { padding: 3px; border: 2px solid black; }
.colBasic2 td { text-align:left; }
.colBasic2 tr th { background-color:#efefef; color: black;} .colBasic2 tr:first-child th { background-color:#adceff; color:black;}
</css>
PDB structure selection
We first selected a set of structures that span different ranges of sequence identity to the reference structure (1A6Z). Domain A of the reference structure has the CATH annotation 3.30.500.10.9 (Murine Class I Major Histocompatibility Complex H2-DB subunit A domain 1), and domain B has the annotation 2.60.40.10 (immunoglobulins). We took domain A as the template and only searched for structures with an annotation similar to 3.30.500.10.9. We had two reasons for this. First, the immunoglobulin domain is only bound to the protein and not directly connected to it. Second, all disease-causing mutations are located in the MHC domain. <xr id="selected structures"/> lists the structures, their CATH numbers and their percent sequence identity to the reference. Apart from 1DE4, whose sequence is identical to the reference, we could not find a structure with a sequence identity above 60%. The most similar structure we found was 1QVO, with 39% identity.
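Percent sequence identity depends on how the count of identical positions is normalised. The following is a minimal Python sketch. It assumes that a pairwise alignment is already available as two gapped strings of equal length, and it counts identity over positions where neither sequence has a gap. The function name `percent_identity` and the toy sequences are hypothetical. The values listed for the selected structures may have been computed with a different convention, such as dividing by the alignment length or by the length of the shorter sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned (gapped) sequences.

    Assumption: identity = identical positions / positions where neither
    sequence has a gap. Other normalisations give different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # keep only columns without a gap in either sequence
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Hypothetical toy alignment, not taken from the structures in this task
print(round(percent_identity("GSHSLRY-FY", "GSHSMRYFFT"), 1))  # → 77.8
```

In this example, 7 of the 9 ungapped columns are identical. If you divided by the full alignment length of 10 instead, you would get 70%, so the convention should be stated whenever identity thresholds such as 30% or 60% are used.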
<figtable id="selected structures">
category | ID | chain | domain | CATH number | Sequence identity (%) | protein (organism) |
---|---|---|---|---|---|---|
reference | 1A6Z | A | 1 | 3.30.500.10 | - | HFE (Homo sapiens) |
identical sequence | 1DE4 | A | 1 | 3.30.500.10 | 100 | HFE (Homo sapiens) |
> 30% SeqID | 1QVO | A | 01 | 3.30.500.10 | 39 | HLA class I histocompatibility antigen, A-11 alpha chain (Homo sapiens) |
< 30% SeqID | 1S7X | A | 00 | 3.30.500.10 | 29 | H-2 class I histocompatibility antigen, D-B alpha chain (Mus musculus) |
CAT | 2IA1 | A | 01 | 3.30.500.20 | 11.1 | BH3703 protein (Bacillus halodurans) |
CA | 3NCI | A | 01 | 3.30.342.10 | 5.8 | DNA polymerase (Enterobacteria phage RB69) |
C | 1VZY | A | 01 | 3.55.30.10 | 2.8 | 33 KDA CHAPERONIN (Bacillus subtilis) |
different CATH | 1MUS | A | 01 | 1.10.246.40 | 12.6 | Tn5 transposase (Escherichia coli) |
</figtable>
Results
In Pymol, each structure from <xr id="selected structures"/> was aligned to the reference 1A6Z_A twice: once using only the C_alpha atoms and once using all atoms. The resulting RMSD values are listed in <xr id="score results"/>. The numbers in brackets after the RMSD values give the number of aligned residues used to compute the corresponding value. For the all-atom Pymol alignment, they give the number of aligned atoms instead.
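The RMSD is the square root of the mean squared distance between equivalent atoms, RMSD = sqrt((1/N) Σ d_i²). The following is a minimal Python sketch of that formula. It assumes that the two structures have already been superimposed and that the atom correspondence is given, so no fitting is done here. The coordinates are hypothetical toy values. They only illustrate why adding side-chain atoms, which deviate more than the backbone, can raise the RMSD compared with using C_alpha atoms alone.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD between paired atoms of two already superimposed structures.

    coords_a[i] and coords_b[i] must be equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy data: 3 C_alpha pairs plus 2 side-chain atom pairs
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.0, 0.5, 0.0), (3.8, 0.0, 0.5), (7.6, 0.0, 0.0)]
side_a = [(1.0, 1.5, 0.0), (4.8, 1.5, 0.0)]
side_b = [(1.0, 3.5, 0.0), (4.8, 0.0, 1.5)]

print(round(rmsd(ca_a, ca_b), 3))                    # → 0.408 (C_alpha only)
print(round(rmsd(ca_a + side_a, ca_b + side_b), 3))  # → 1.342 (plus side chains)
```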
<figtable id="score results">
PDB ID | Seq. identity (%) | Pymol RMSD (C_alpha only) | Pymol RMSD (all atoms) | LGA RMSD | LGA_S | SSAP RMSD | SSAP_Score | TopMatch RMSD | TopMatch S | TopMatch S_r | CE RMSD | CE Score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1DE4_A | 100 | 0.675 (237) | 0.767 (1836) | 1.14 (267) | 95.77 | 1.60 (272) | 93.07 | 1.08 | 260 | 1.03 | 1.19 (267) | 543 |
1QVO_A | 39 | 2.165 (233) | 2.279 (1565) | 2.29 (259) | 67.86 | 2.58 (268) | 86.39 | 2.62 | 228 | 2.50 | 2.44 (266) | 432 |
1S7X_A | 29 | 1.889 (233) | 2.049 (1557) | 2.12 (256) | 71.90 | 2.36 (267) | 86.25 | 2.66 | 227 | 2.56 | 2.29 (265) | 342 |
2IA1_A | 11.1 | 18.132 (74) | 18.283 (501) | 2.83 (86) | 19.44 | 15.85 (140) | 56.19 | 2.91 | 76 | 2.82 | 3.93 (93) | 300 |
3NCI_A | 5.8 | 16.561 (26) | 17.329 (178) | 3.11 (84) | 17.19 | 14.54 (168) | 30.18 | 3.05 | 53 | 2.94 | 4.47 (75) | 333 |
1VZY_A | 2.8 | 6.260 (29) | 6.951 (168) | 3.25 (63) | 13.44 | 26.34 (208) | 58.01 | 2.61 | 68 | 2.53 | 5.80 (91) | 245 |
1MUS_A | 12.6 | 23.521 (180) | 23.891 (1143) | 2.82 (69) | 16.02 | 18.53 (215) | 46.30 | 3.58 | 69 | 3.43 | 6.61 (78) | 379 |
</figtable>
Images of the superimposed structures, using the C_alpha atoms, are shown in <xr id="pymol str. al."/>. The images show that, for this set, a good superposition was only achieved for structures that share a substantial level of sequence identity with the reference. 1QVO_A (39% sequence identity) could be aligned to the reference with a low RMSD, but 1S7X_A has an even lower value, although its sequence identity is smaller (29%). This could be explained by the fact that 1S7X_A is the mouse H-2Db heavy chain, the protein after which the CATH topology of the reference domain (Murine Class I Major Histocompatibility Complex H2-DB) is named, and therefore has a very similar structure. Apart from the three structures 1DE4_A, 1QVO_A and 1S7X_A, the other proteins could not be meaningfully superimposed onto the reference. This is shown by the high RMSD values in column 3 and the low numbers of equivalent residues in <xr id="score results"/>. Using all atoms for the RMSD computation did not improve the alignments. Instead, it led to overall higher RMSD values (column 4 of <xr id="score results"/>).
<figtable id="pymol str. al.">
</figtable>
In addition to Pymol, several other structural alignment methods were used to superimpose all structures onto the reference. The resulting alignment scores are listed in <xr id="score results"/>. The RMSD values vary between the methods. This can be explained by the varying number of equivalent residues each method found: the more residues are aligned, the higher the RMSD tends to be. LGA is the best method for finding good local superpositions, as its very low RMSD values for the structures with low sequence identity show. At the same time, the LGA_S score gives a good impression of how similar the two structures are globally. In our case, very similar structures get a value near 100 and divergent structures only a score of 13-20. We therefore find that LGA gives the best impression of structural relatedness. SSAP aligned the most residues of all methods, but its SSAP_Score, which ranges from 0 to 100, is relatively high for the structures with low sequence similarity. For example, 1MUS_A has a score of 46.30, although the two proteins do not share a common fold, which gives a false impression of structural similarity. TopMatch also has overall low RMSD values, but the S score
- Pymol only uses a subset of atoms for the computation of the RMSD.
- LGA computes the RMSD from all atoms under a distance cutoff. It therefore uses more atoms than Pymol if the proteins are similar, but fewer if the structures are more divergent.
- SSAP
- TopMatch
- CE
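The trade-off described above can be illustrated with a minimal Python sketch. A method that keeps only closely superimposed residues reports a lower RMSD, but over fewer pairs. The sketch assumes a fixed superposition and given per-residue deviations, and simply discards pairs above a distance cutoff. This is a simplification in the spirit of the distance cutoff mentioned for LGA, not LGA's actual algorithm, which searches over many superpositions. The function name `rmsd_under_cutoff` and the deviation values are hypothetical.

```python
import math
from typing import List, Tuple


def rmsd_under_cutoff(distances: List[float], cutoff: float) -> Tuple[int, float]:
    """Number of residue pairs below `cutoff` and their RMSD.

    `distances` are per-residue deviations (in Angstrom) between equivalent
    atoms of two already superimposed structures. Simplified illustration:
    the superposition itself is not re-optimised for each cutoff.
    """
    kept = [d for d in distances if d < cutoff]
    if not kept:
        return 0, 0.0
    return len(kept), math.sqrt(sum(d * d for d in kept) / len(kept))


# Hypothetical per-residue deviations for one superposition
deviations = [0.5, 0.8, 1.0, 1.2, 2.5, 3.0, 4.5, 8.0]
for cutoff in (1.5, 3.5, 10.0):
    n, value = rmsd_under_cutoff(deviations, cutoff)
    print(f"cutoff {cutoff:4.1f} A: {n} pairs, RMSD {value:.2f} A")
```

With a stricter cutoff, fewer pairs are kept and the RMSD drops. This is why RMSD values from different methods are only comparable together with the number of equivalent residues they were computed over.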