Canavan Disease: Task 04 - Structural Alignments

From Bioinformatikpedia
Revision as of 11:24, 6 August 2013 by Mahlich (talk | contribs) (Pymol)

LabJournal

Dataset

To build the dataset, a reference structure (2I3C) was chosen first. The remaining structures were then selected relative to this reference so that each of them fulfils one of the required criteria. The full composition and additional information can be found in <xr id="dataset"> Table </xr>. <figtable id="dataset">

Dataset composition
PDB ID | Description | Criterion
2I3C | ASPA from human | reference structure
2O4H | ASPA from human with bound N-phosphonomethyl-L-aspartate | sequence identity 100%, bound active centre
2Q51 | ASPA from human (ensemble refinement) | sequence identity 100%, unbound active centre
2GU2 | ASPA from rat | sequence identity >60%
2QJ8 | ASPA family protein from Mesorhizobium loti | sequence identity <30%
1AYE | Procarboxypeptidase from human | same CATH classification for C, A and T
1BKJ | FMN oxidoreductase from Vibrio harveyi | same CATH classification for C and A
1BD0 | Alanine racemase | same CATH classification for C
1B3U | Regulatory domain of human PP2A | completely different CATH classification
Overview of the dataset composition for Task 04, with a brief description of the chosen structures. Sequence identity and CATH classification similarity are given with respect to the reference structure 2I3C.

</figtable>


Structural Alignment Exploration

Pymol

2O4H vs. 2I3C

2O4H was found via the sequence search tab for the reference structure 2I3C. It was chosen because it belongs to the 100% sequence identity cluster. In addition, it has a compound bound at the active site. This compound is not N-acetyl-L-aspartate, however, but N-Hydroxy(methyl)phosphoric-L-aspartate, which binds to the same active centre. It is not degraded by the enzymatic activity of the protein but "blocks" the active centre, so that a potential conformational change of the protein upon binding can be captured by X-ray crystallography.

Because 2O4H and 2I3C have 100% sequence identity, the structural alignment with Pymol is very accurate. Within the accuracy of X-ray crystallography, both structures are the same. The RMSD between 2O4H and 2I3C, calculated by Pymol's alignment procedure, is 0.445Å. As this divergence is smaller than the resolution attainable for the structures, they can safely be considered identical. The structural alignment is shown in <xr id="2O4H_pymol">Figure </xr>. Consequently, no conformational change of the protein due to the compound bound at the active site can be observed. <figure id="2O4H_pymol">

Representation of 2O4H aligned to 2I3C. Both structures are displayed as cartoon, 2O4H in black, 2I3C in orange. The zinc atom at the active site is represented as a gray sphere, and the N-Hydroxy(methyl)phosphoric-L-aspartate at the active site is represented as balls and sticks. With a calculated RMSD of 0.445Å, both structures can be considered the same, as the divergence is even lower than the attainable resolution of the crystal structure.

</figure>
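To make the RMSD value quoted above more tangible, here is a minimal sketch of how an RMSD is computed from two sets of atom coordinates. It assumes that the two structures are already superimposed and that the atoms are paired one to one. Pymol's alignment additionally performs the pairing, the superposition and a refinement that may discard poorly fitting atom pairs, so this sketch is not a reimplementation of it. The coordinates are hypothetical toy values, not taken from 2O4H or 2I3C.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the points are already superimposed and paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates in Å (three atoms per "structure").
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mob = [(0.0, 0.3, 0.0), (3.8, -0.3, 0.0), (7.6, 0.3, 0.0)]

print(f"RMSD = {rmsd(ref, mob):.3f} Å")
```

Every atom in the toy example is displaced by 0.3Å, so the RMSD equals that displacement. In real structures the value summarises displacements of very different sizes.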

2Q51 vs 2I3C

2Q51 was chosen to complement 2O4H: it is annotated with the same sequence (100% sequence identity to 2I3C) but has no compound bound in the active centre. Since 2Q51 and 2I3C share the same sequence and both were crystallized without a compound bound at the active site, their 3D structures would be expected to be identical as well (within resolution boundaries). However, comparing both structures in Pymol (see <xr id="2Q51_pymol">Figure </xr>) shows that they do differ at least slightly. Their RMSD of 0.223Å is smaller than the RMSD between 2O4H and 2I3C (0.445Å). Nevertheless, a visual comparison reveals beta-strands of different lengths and small variations in conformation. Checking the experimental origin of the PDB structure 2Q51 revealed that the atom coordinates and the conformation of the secondary structure elements were derived as a mean of multiple experimental 3D structure assignments from X-ray crystallography. This is most likely the reason why the RMSD between the C-alpha atoms is so small, while the visual difference between 2Q51 and 2I3C is larger than that between 2O4H and 2I3C (see above). <figure id="2Q51_pymol">

Representation of 2Q51 aligned to 2I3C. Both structures are displayed as cartoon, 2Q51 in blue, 2I3C in orange. The zinc atom at the active site is represented as a gray sphere. The calculated RMSD between the C-alpha atoms of the two structures is as small as 0.223Å, yet the displayed secondary structure elements vary in length and spatial conformation (see the beta-strands in the loop region on the left of the protein). The reason may be that the atom coordinates of 2Q51 represent a mean of multiple X-ray crystallography experiments to determine the structure of ASPA.

</figure>
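The observation that a small global RMSD can coexist with visible local differences can be illustrated with a short sketch. It computes the deviation for every atom pair separately and compares the largest one with the global RMSD. The sketch assumes superimposed, index-paired C-alpha coordinates. The toy chain below is hypothetical and only mimics a structure in which two residues deviate far more than the rest.

```python
import math


def per_residue_deviation(coords_a, coords_b):
    """Distance between each pair of corresponding atoms (paired by index)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(deviations):
    """Global RMSD computed from the per-atom deviations."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Hypothetical chain of 20 C-alpha atoms: 18 are shifted by 0.1 Å,
# the two at index 9 and 10 by 0.8 Å.
ref = [(3.8 * i, 0.0, 0.0) for i in range(20)]
mob = [
    (x, y + (0.8 if i in (9, 10) else 0.1), z)
    for i, (x, y, z) in enumerate(ref)
]

devs = per_residue_deviation(ref, mob)
worst = max(range(len(devs)), key=devs.__getitem__)

print(f"global RMSD:             {rmsd(devs):.3f} Å")
print(f"largest local deviation: {devs[worst]:.3f} Å at index {worst}")
```

The global value stays well below the largest local deviation, which is why a per-residue view can be a useful complement to a single RMSD figure.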

2GU2 vs 2I3C

2GU2 is the ASPA ortholog in rat. With a sequence similarity of 84% to the human ASPA protein (2I3C), it was chosen to represent the group of protein structures with a sequence similarity between 60% and 100%. The structural alignment with Pymol reveals that the sequence difference between the two proteins results from an extension of the N- and C-terminal ends, which form a beta-sheet in 2GU2 that is not present in 2I3C (see <xr id="2GU2_pymol">Figure </xr>). Otherwise the structures are identical within the limits of resolution. This is also reflected in the RMSD of 0.493Å between the two aligned structures. This example illustrates one important dogma very well, namely that structure is better conserved than sequence: despite only about 80% sequence similarity, the three-dimensional structure of the two proteins is nearly identical. <figure id="2GU2_pymol">

Representation of 2GU2 aligned to 2I3C. Both structures are displayed as cartoon, 2GU2 in turquoise, 2I3C in orange. The zinc atom at the active site is represented as a gray sphere. Apart from the N- and C-terminal ends of the 2GU2 peptide chain, which form a beta-sheet, both structures are identical. The calculated RMSD between the two peptides is 0.493Å.

</figure>
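Sequence identity values like those used to select the dataset are derived from a pairwise alignment. The following minimal sketch counts identical residues over all alignment columns in which neither sequence has a gap. This choice of denominator is an assumption: tools differ in whether they divide by the number of aligned columns, the alignment length or the length of the shorter sequence, so reported percentages can vary. The two aligned fragments are hypothetical and not taken from 2GU2 or 2I3C.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments (one substitution, one gap in each sequence).
seq_a = "ACDEFGHIK-LMNPQ"
seq_b = "ACDEYGHIKRLMN-Q"

print(f"sequence identity: {percent_identity(seq_a, seq_b):.1f}%")
```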

2QJ8 vs 2I3C

2QJ8 is a member of the ASPA protein family. The protein originates from Mesorhizobium loti, a Gram-negative bacterium, and has a sequence similarity of below 30% when aligned to 2I3C. The superimposition of the two proteins using Pymol demonstrates the previously mentioned dogma, that structure is far better conserved than sequence, even more clearly than the previous example (see <xr id="2GU2_pymol">Figure </xr>). Despite the sequence similarity of less than 30%, the overall shape of 2QJ8 is very well conserved and closely resembles 2I3C, with a calculated RMSD of 3.474Å. At the active site the conservation of the structure is even more apparent, as can be seen in <xr id="2QJ8_pymol">Figure </xr>. However, it has to be kept in mind that the two proteins belong to the same protein family, which explains the high structural resemblance despite the low sequence similarity. When comparing two proteins from distinct protein families, this effect is not likely to be observed. <figure id="2QJ8_pymol">

Representation of 2QJ8 aligned to 2I3C. Both structures are displayed as cartoon, 2QJ8 in green, 2I3C in orange. The zinc atom at the active site is represented as a gray sphere. Both proteins belong to the same protein family while having a sequence similarity of less than 30%. Membership in the same family results in a high structural resemblance of the two proteins despite the low sequence similarity. The RMSD calculated by the superimposition is 3.474Å.

</figure>

Remaining Proteins

1AYE, 1BKJ, 1BD0 and 1B3U are, in this order, increasingly distantly related to 2I3C in terms of CATH classification. Superimposing the structures onto 2I3C using Pymol shows that 1AYE, despite sharing class, architecture and topology with 2I3C, is already so different in its spatial arrangement that no well-overlapping substructures can be found (see <xr id="Remaining_pymol"> Figure</xr> A). This trend becomes more pronounced the fewer CATH levels the two superimposed proteins have in common (see <xr id="Remaining_pymol"> Figure</xr> B-D).

<figure id="Remaining_pymol">

A: 1AYE vs 2I3C
B: 1BKJ vs 2I3C
C: 1BD0 vs 2I3C
D: 1B3U vs 2I3C

</figure>
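The notion of two proteins sharing a certain number of CATH levels can be made explicit with a small helper. CATH codes consist of four dot-separated numbers for class, architecture, topology and homologous superfamily, and the sketch counts how many leading levels two codes have in common. The codes used in the example are illustrative placeholders that were not looked up for the structures in this dataset.

```python
def shared_cath_levels(code_a, code_b):
    """Number of leading CATH levels (C, A, T, H) that two codes share."""
    shared = 0
    for level_a, level_b in zip(code_a.split("."), code_b.split(".")):
        if level_a != level_b:
            break  # deeper levels are only meaningful if all higher ones match
        shared += 1
    return shared


# Illustrative placeholder codes, NOT verified for the dataset proteins.
reference = "3.40.630.10"
candidates = {
    "same C, A and T": "3.40.630.20",
    "same C and A": "3.40.109.10",
    "same C only": "3.20.20.10",
    "nothing shared": "1.25.10.10",
}

for label, code in candidates.items():
    print(f"{label}: {shared_cath_levels(reference, code)} shared level(s)")
```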

Comparison of SSAP, Topmatch, CE & LGA

<figtable id="comp">

Comparison of LGA, SSAP, Topmatch & CE
Protein | LGA RMSD | LGA SeqId | SSAP (CATH) RMSD | SSAP (CATH) SeqId | Topmatch RMSD | Topmatch SeqId | CE RMSD | CE SeqId
2O4H | 3.33 | 9.62 | 1.04 | 99 | 0.65 | 100 | 1.02 | 100
2Q51 | 1.04 | 100 | 1.04 | 100 | 1.04 | 100 | 1.00 | 100
2GU2 | 0.97 | 86.29 | 1.23 | 86 | 0.91 | 87 | 0.97 | 84.59
2QJ8 | 2.57 | 21.53 | 8.39 | 9 | 3.51 | 17 | 3.14 | 13.86
1AYE | 2.58 | 15.24 | 4.19 | 8 | 2.85 | 13 | 3.83 | 7.14
1BKJ | 3.26 | 1.64 | 18.59 | 3 | 2.87 | 6 | 4.27 | 4.04
1BD0 | 3.36 | 5.48 | 20.38 | 3 | 2.91 | 11 | 5.74 | 1.71
1B3U | 3.57 | 11.11 | 28.05 | 6 | 3.34 | 9 | 6.05 | 6.30
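RMSD (in Å) and sequence identity (SeqId, in %) of each structure aligned to the reference structure 2I3C, as reported by LGA, SSAP, Topmatch and CE.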

</figtable>
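To get a quick overview of how much the four tools disagree, the RMSD values from the table can be summarised per protein. The sketch below is only a convenience calculation on the numbers listed above. It reports which tool gives the lowest RMSD for each protein and the spread between the highest and lowest value. Note that RMSD values from different tools are generally computed over different sets of aligned residues, so they are not strictly comparable.

```python
# RMSD values (Å) copied from the comparison table above.
rmsd_by_tool = {
    "2O4H": {"LGA": 3.33, "SSAP": 1.04, "Topmatch": 0.65, "CE": 1.02},
    "2Q51": {"LGA": 1.04, "SSAP": 1.04, "Topmatch": 1.04, "CE": 1.00},
    "2GU2": {"LGA": 0.97, "SSAP": 1.23, "Topmatch": 0.91, "CE": 0.97},
    "2QJ8": {"LGA": 2.57, "SSAP": 8.39, "Topmatch": 3.51, "CE": 3.14},
    "1AYE": {"LGA": 2.58, "SSAP": 4.19, "Topmatch": 2.85, "CE": 3.83},
    "1BKJ": {"LGA": 3.26, "SSAP": 18.59, "Topmatch": 2.87, "CE": 4.27},
    "1BD0": {"LGA": 3.36, "SSAP": 20.38, "Topmatch": 2.91, "CE": 5.74},
    "1B3U": {"LGA": 3.57, "SSAP": 28.05, "Topmatch": 3.34, "CE": 6.05},
}


def rmsd_spread(values):
    """Difference between the highest and lowest RMSD reported for one protein."""
    return max(values.values()) - min(values.values())


for protein, values in rmsd_by_tool.items():
    lowest_tool = min(values, key=values.get)
    print(
        f"{protein}: lowest RMSD from {lowest_tool} "
        f"({values[lowest_tool]:.2f} Å), spread {rmsd_spread(values):.2f} Å"
    )
```

The spread grows markedly for the structurally distant proteins, mainly because the SSAP values increase much faster than those of the other tools.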

Structural Alignment Evaluation

Tasks