Canavan Disease: Task 04 - Structural Alignments
LabJournal
Dataset
To assemble the dataset, a reference structure (2I3C) was chosen first. The remaining entries were then selected relative to this reference so that each of the required criteria is covered. The full composition and additional information can be found in <xr id="dataset"> Table </xr>. <figtable id="dataset">
Dataset composition
PDB-id | Description | Criterion |
---|---|---|
2I3C | ASPA from human | reference structure |
2O4H | ASPA from human with bound N-phosphonomethyl-L-aspartate | sequence identity 100% & bound active centre |
2Q51 | ASPA from human (ensemble refinement) | sequence identity 100% & unbound active centre |
2GU2 | ASPA from rat | sequence identity >60% |
2QJ8 | ASPA family protein from Mesorhizobium loti | sequence identity <30% |
1AYE | Procarboxypeptidase from human | same CATH classification at the C, A and T levels |
1BKJ | FMN oxidoreductase from Vibrio harveyi | same CATH classification at the C and A levels |
1BD0 | Alanine racemase | same CATH classification at the C level |
1B3U | Regulatory domain of human PP2A | completely different CATH classification |
</figtable>
Structural Alignment Exploration
PyMOL
2O4H vs 2I3C
2O4H was found via the sequence search tab for the reference structure 2I3C. It was chosen because it belongs to the 100% sequence identity cluster. In addition, it has a compound bound at the active site; this is not the substrate N-acetyl-L-aspartate but N-phosphonomethyl-L-aspartate, which binds to the same active centre. The compound is not degraded by the enzyme but "blocks" the active centre, so a potential conformational change of the protein upon binding can be captured by X-ray crystallography.
Because 2O4H and 2I3C have 100% sequence identity, the structural alignment in PyMOL is very accurate. Within the accuracy of X-ray crystallography, both structures are the same. The RMSD between 2O4H and 2I3C calculated by the PyMOL alignment is 0.445 Å. Because this deviation is smaller than the resolution attainable for the structures, they can safely be considered identical. The structural alignment is shown in <xr id="2O4H_pymol">Figure </xr>. Consequently, a conformational change of the protein caused by the compound bound in the active site cannot be observed. <figure id="2O4H_pymol">
</figure>
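The superposition described above can be reproduced with PyMOL's Python API. The following is a minimal sketch, not the exact commands used for this page: it assumes that the structures are fetched by PDB id and aligned on C-alpha atoms only, and the helper name `align_to_reference` is hypothetical. `cmd.fetch` and `cmd.align` are PyMOL calls; `cmd.align` returns a list whose first element is the RMSD after refinement.

```python
def align_to_reference(cmd, reference, targets, selection="name CA"):
    """Superimpose each target onto the reference and collect the RMSD values.

    `cmd` is expected to be PyMOL's command module (`from pymol import cmd`).
    It is passed in explicitly so the function can also be run with a stub.
    The default selection (C-alpha atoms only) is an assumption.
    """
    cmd.fetch(reference)
    rmsd_by_target = {}
    for target in targets:
        cmd.fetch(target)
        # cmd.align(mobile, target) returns a list; element 0 is the RMSD in
        # angstrom after the refinement cycles have rejected outlier pairs.
        result = cmd.align(f"{target} and {selection}",
                           f"{reference} and {selection}")
        rmsd_by_target[target] = result[0]
    return rmsd_by_target
```

Inside a PyMOL session this could be called as `align_to_reference(cmd, "2I3C", ["2O4H", "2Q51", "2GU2", "2QJ8"])` after `from pymol import cmd`. The resulting values depend on the atom selection and the refinement settings, so they need not match the numbers reported on this page.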
2Q51 vs 2I3C
2Q51 was chosen to complement 2O4H: it is annotated with the same sequence (100% sequence identity to 2I3C) but has no compound bound in the active centre. Since 2Q51 and 2I3C share the same sequence and both were crystallized without a compound at the active site, their 3D structures are expected to be identical within the limits of resolution. However, the comparison in PyMOL (see <xr id="2Q51_pymol">Figure </xr>) shows that they differ at least slightly. Their RMSD of 0.223 Å is smaller than the RMSD between 2O4H and 2I3C (0.314 Å); nevertheless, visual comparison reveals beta-strands of different lengths and small variations in conformation. Checking the experimental origin of the PDB structure 2Q51 revealed that its atom coordinates and the conformation of its secondary structure elements were derived as a mean of multiple 3D structure assignments obtained by X-ray crystallography. This is most likely the reason why the RMSD between the C-alpha atoms is so small although the visual difference between 2Q51 and 2I3C is larger than that between 2O4H and 2I3C (see above). <figure id="2Q51_pymol">
</figure>
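That a small RMSD can coexist with visible local differences follows from the definition of the RMSD as an average over all atom pairs. The sketch below computes the RMSD for coordinates that are assumed to be already superimposed and paired one-to-one; the coordinates are toy values and are not taken from 2Q51 or 2I3C.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    points that are already superimposed and paired one-to-one."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example: 100 atoms on a line, 5 of which are displaced by 1 angstrom in y.
reference = [(float(i), 0.0, 0.0) for i in range(100)]
shifted = [(x, 1.0 if i < 5 else y, z) for i, (x, y, z) in enumerate(reference)]
print(round(rmsd(reference, shifted), 3))  # 0.224
```

In this toy case five atoms move by a full ångström, yet the overall value stays near 0.22 Å because the remaining 95 pairs contribute nothing to the sum.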
2GU2 vs 2I3C
2GU2 is the ASPA ortholog in rat. With a sequence identity of 84% to the human ASPA protein (2I3C), it was chosen to represent the group of protein structures with a sequence identity between 60% and 100%. The structural alignment in PyMOL reveals that the sequence difference between the two proteins results from an extension of the N- and C-terminal ends, which form a beta sheet in 2GU2 that is not present in 2I3C (see <xr id="2GU2_pymol">Figure </xr>). Otherwise, the two structures are identical within the limits of resolution. This is also reflected in the RMSD of 0.493 Å between the two aligned structures. This example illustrates very well an important principle, namely that structure is better conserved than sequence: despite a sequence identity of only 84%, the three-dimensional structures of the two proteins are nearly identical. <figure id="2GU2_pymol">
</figure>
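The percentage quoted for 2GU2 is a count of matching residues over aligned positions. As an illustration only, the following sketch computes such a value from a gapped pairwise alignment. The function name, the choice of denominator (columns without gaps) and the toy sequences are assumptions; they do not reproduce the rat/human alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over all columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Toy alignment: 7 gap-free columns, 6 of them identical.
print(round(percent_identity("MKT-AYIAK", "MKTQAYLA-"), 1))  # 85.7
```

Other denominators, such as the length of the shorter sequence or the full alignment length, give different percentages for the same pair, which may be one reason why tools report slightly different identities.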
2QJ8 vs 2I3C
2QJ8 is a member of the ASPA protein family. The protein originates from Mesorhizobium loti, a Gram-negative bacterium, and has a sequence identity below 30% when aligned to 2I3C. The superimposition of the two proteins in PyMOL demonstrates that structure is far better conserved than sequence even more clearly than the previous example (see <xr id="2GU2_pymol">Figure </xr>). Despite the sequence identity of less than 30%, the overall shape of 2QJ8 is very well conserved and closely resembles 2I3C, with a calculated RMSD of 3.474 Å. Focusing on the active site, the structural conservation is even more apparent, as can be seen in <xr id="2QJ8_pymol">Figure </xr>. However, it has to be kept in mind that the two proteins belong to the same protein family, which explains the high structural resemblance despite the low sequence identity. When comparing two proteins from distinct protein families, this effect is unlikely to be observed. <figure id="2QJ8_pymol">
</figure>
Remaining Proteins
1AYE, 1BKJ, 1BD0 and 1B3U are, in this order, increasingly distantly related to 2I3C in terms of CATH classification. Superimposing the structures onto 2I3C in PyMOL shows that 1AYE, despite sharing the same class, architecture and topology, is already so distant in its spatial arrangement that no well-overlapping regions can be found (see <xr id="Remaining_pymol"> Figure</xr> A). The overlap gets worse as the two superimposed proteins have fewer CATH levels in common (see <xr id="Remaining_pymol"> Figure</xr> B-D).
<figure id="Remaining_pymol">
</figure>
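The dataset orders these four proteins by how many leading levels of the CATH hierarchy (Class, Architecture, Topology, Homologous superfamily) they share with the reference. A minimal sketch of that comparison follows. The CATH codes in the example are placeholders chosen for illustration, not the actual classifications of the dataset entries, and the function name is hypothetical.

```python
def shared_cath_levels(code_a, code_b):
    """Number of leading CATH levels (C, A, T, H) that two codes share."""
    shared = 0
    for a, b in zip(code_a.split("."), code_b.split(".")):
        if a != b:
            break  # deeper levels only count if all higher levels agree
        shared += 1
    return shared

LEVEL_NAMES = ["none", "C", "CA", "CAT", "CATH"]

reference_code = "3.40.630.10"  # placeholder code, not looked up in CATH
for other in ["3.40.630.20", "3.40.50.10", "3.20.20.10", "1.25.10.10"]:
    print(other, LEVEL_NAMES[shared_cath_levels(reference_code, other)])
```

With these placeholder codes the loop prints CAT, CA, C and none, which mirrors the four criteria used for 1AYE, 1BKJ, 1BD0 and 1B3U in the dataset table.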
Comparison of SSAP, Topmatch, CE & LGA
<figtable id="comp">
Comparison of LGA, SSAP, Topmatch & CE (RMSD in Å, SeqId in %)
protein | LGA RMSD | LGA SeqId | SSAP (CATH) RMSD | SSAP (CATH) SeqId | Topmatch RMSD | Topmatch SeqId | CE RMSD | CE SeqId |
---|---|---|---|---|---|---|---|---|
2O4H | 3.33 | 9.62 | 1.04 | 99 | 0.65 | 100 | 1.02 | 100 |
2Q51 | 1.04 | 100 | 1.04 | 100 | 1.04 | 100 | 1.00 | 100 |
2GU2 | 0.97 | 86.29 | 1.23 | 86 | 0.91 | 87 | 0.97 | 84.59 |
2QJ8 | 2.57 | 21.53 | 8.39 | 9 | 3.51 | 17 | 3.14 | 13.86 |
1AYE | 2.58 | 15.24 | 4.19 | 8 | 2.85 | 13 | 3.83 | 7.14 |
1BKJ | 3.26 | 1.64 | 18.59 | 3 | 2.87 | 6 | 4.27 | 4.04 |
1BD0 | 3.36 | 5.48 | 20.38 | 3 | 2.91 | 11 | 5.74 | 1.71 |
1B3U | 3.57 | 11.11 | 28.05 | 6 | 3.34 | 9 | 6.05 | 6.30 |
</figtable>
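To compare the four methods at a glance, the RMSD columns of the table can be loaded into a small data structure. The sketch below copies the RMSD values from the table above and reports, for each protein, which method gives the lowest RMSD; the variable and function names are illustrative. Different methods generally compute the RMSD over different sets of aligned residues, so this ranking is only a rough orientation.

```python
METHODS = ("LGA", "SSAP", "Topmatch", "CE")

# RMSD values in angstrom, copied from the comparison table (order as in METHODS).
RMSD = {
    "2O4H": (3.33, 1.04, 0.65, 1.02),
    "2Q51": (1.04, 1.04, 1.04, 1.00),
    "2GU2": (0.97, 1.23, 0.91, 0.97),
    "2QJ8": (2.57, 8.39, 3.51, 3.14),
    "1AYE": (2.58, 4.19, 2.85, 3.83),
    "1BKJ": (3.26, 18.59, 2.87, 4.27),
    "1BD0": (3.36, 20.38, 2.91, 5.74),
    "1B3U": (3.57, 28.05, 3.34, 6.05),
}

def lowest_rmsd_methods(values):
    """Return every method whose RMSD equals the minimum of the row."""
    best = min(values)
    return [method for method, value in zip(METHODS, values) if value == best]

for protein, values in RMSD.items():
    print(protein, lowest_rmsd_methods(values), min(values))
```

For the rows above, Topmatch gives the lowest RMSD for five of the eight proteins, LGA for 2QJ8 and 1AYE, and CE for 2Q51.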
Structural Alignment Evaluation
Tasks
- Link to Task 01: Canavan Disease
- Link to Task 02: Alignments
- Link to Task 03: Sequence-based Predictions
- Link to Task 04: Structural Alignments
- Link to Task 05: Homology Modelling
- Link to Task 06: Protein Structure Prediction from Evolutionary Sequence Variation
- Link to Task 07: Researching SNPs
- Link to Task 08: Sequence-based Mutation Analysis
- Link to Task 09: Structure-based Mutation Analysis
- Link to Task 10: Normal Mode Analysis
- Link to Task 11: Molecular Dynamics Simulation