Structural Alignments (Phenylketonuria)

Summary

Structural alignments are used to determine the functional and evolutionary relationships between protein structures. <ref name="struc_align"> Walter Pirovano, K Anton Feenstra and Jaap Heringa (2008): "The meaning of alignment: lessons from structural diversity". BMC Bioinformatics Vol.9:556. doi:10.1186/1471-2105-9-556 </ref> In this task, we first assembled a dataset of structures that are related or unrelated to our protein (PAH). We then applied several methods and measures to quantify the structural similarity between these structures and the reference. Finally, we used structural alignments to evaluate some of the sequence-based alignments from Task 2. The results and the accompanying discussion are shown below.

Explore structural alignments

Lab journal

Dataset generation

Our protein (PAH) has the CATH code 1.10.800.10 (Phenylalanine Hydroxylase). For the dataset we selected structures that are similar or dissimilar to this protein to varying degrees:

reference structure of PAH: 2PAH (96.41% identity)
identical sequence with filled binding site: 1LRM (100% identity; the 3D structure of the PDB entry shows two filled binding sites, with the ligands FE and HBI)
identical sequence with unfilled binding site: none found
low sequence identity: 3LUY (32.2%; no PDB entry below 30% was found)
high sequence identity: 2PHM (89.7%)
same CAT: 1J8U (CATH code 1.10.800.10; this topology contains no other homologous superfamily)
same CA, different topology: 2B5U (CATH code 1.10.287.620)
same C, different architecture: 3BQO (CATH code 1.25.40.210)
different CATH class: 1V8H (CATH code 2.60.40.10)
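The CAT / CA / C labels in the list follow directly from comparing the four-part CATH codes from left to right. A minimal sketch of that comparison (the function name `shared_cath_levels` is ours, not part of any CATH tool):

```python
LEVELS = "CATH"  # Class, Architecture, Topology, Homologous superfamily


def shared_cath_levels(code_a: str, code_b: str) -> str:
    """Return the leading CATH levels on which two codes agree ('' if none)."""
    shared = ""
    for level, part_a, part_b in zip(LEVELS, code_a.split("."), code_b.split(".")):
        if part_a != part_b:
            break  # levels are hierarchical, so stop at the first mismatch
        shared += level
    return shared


reference = "1.10.800.10"  # CATH code of PAH as given above
dataset = {
    "1J8U": "1.10.800.10",
    "2B5U": "1.10.287.620",
    "3BQO": "1.25.40.210",
    "1V8H": "2.60.40.10",
}
for pdb_id, code in dataset.items():
    print(pdb_id, shared_cath_levels(reference, code) or "none")
```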

We now apply different structural alignment methods to this dataset. Each structure only has to be superimposed on the reference structure, not on the other structures.

PyMOL

PyMOL is a Python-enhanced, open-source molecular visualization tool. It is particularly suited to the 3D visualization of proteins and small molecules as well as their densities, surfaces and trajectories. It also offers molecular editing functions such as aligning or superimposing two molecules. <ref> http://sourceforge.net/projects/pymol/ short PyMOL summary, retrieved June 02, 2013 </ref>

// TODO: pictures of one or two structures with defined binding site:

with all atoms
only C-alpha
only binding site

What changes and why?
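The three superpositions listed above could be scripted roughly as follows. This is only a sketch under several assumptions: both structures are already loaded as PyMOL objects, chain A is used, and the binding site is defined as all residues within 5 Å of the ligands FE and HBI (the radius and this definition are our choice; if a structure lacks these ligands the selection is empty and the alignment will fail). The helper names `selection` and `superpose` are hypothetical.

```python
def selection(obj, mode, ligands=("FE", "HBI"), radius=5.0):
    """Build a PyMOL selection string for chain A of the object `obj`."""
    base = f"{obj} and chain A and polymer"
    if mode == "all":    # all protein atoms
        return base
    if mode == "ca":     # C-alpha atoms only
        return f"{base} and name CA"
    if mode == "site":   # residues near the ligands (assumed site definition)
        lig = "+".join(ligands)
        return f"byres (({base}) within {radius} of ({obj} and resn {lig}))"
    raise ValueError(f"unknown mode: {mode}")


def superpose(mobile, target="2pah", mode="all"):
    """Superpose `mobile` on `target`; return the RMSD reported by PyMOL."""
    # Imported lazily so that `selection` can be used without PyMOL installed.
    from pymol import cmd
    result = cmd.align(selection(mobile, mode), selection(target, mode))
    return result[0]  # first element: RMSD after outlier rejection
```

Inside a PyMOL session, calling `superpose("1lrm", mode=m)` for each of the modes "all", "ca" and "site" would give the three RMSDs to compare.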

LGA

The LGA (Local-Global Alignment) method compares fragments or whole protein structures in sequence-dependent and sequence-independent modes <ref name="lga"> Adam Zemla (2003): "LGA: a method for finding 3D similarities in protein structures". Nucleic Acids Research Vol.31(13):3370-3374. doi:10.1093/nar/gkg571 </ref>. It combines two measures, LCS (longest continuous segments) and GDT (global distance test), to detect regions of local and global structural similarity <ref name="slides"> File:Presentation structuralAlignments.pdf: Slides of Katharina's presentation. </ref>. The resulting data can be used in a scoring function that ranks pairs of structures by their level of similarity. This allows structure classification when many proteins are analyzed, as well as clustering of similar protein structure fragments <ref name="lga"/>.
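To make the two measures more concrete, here is a simplified sketch that works on per-residue Cα deviations from one fixed superposition. It is not the LGA implementation: LGA searches over many superpositions to maximize each value, and its LCS is defined through segment RMSD rather than per-residue deviation. The cutoffs of 1, 2, 4 and 8 Å are the usual GDT_TS convention and are an assumption here, as are the function names.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues whose deviation (in Å) is within each cutoff."""
    n = len(distances)
    if n == 0:
        raise ValueError("no residue pairs given")
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def longest_run(distances, cutoff):
    """Length of the longest run of consecutive residues within `cutoff` Å."""
    best = run = 0
    for d in distances:
        run = run + 1 if d <= cutoff else 0
        best = max(best, run)
    return best


# Hypothetical deviations for six aligned residue pairs
deviations = [0.5, 0.9, 1.5, 3.0, 7.0, 12.0]
print(gdt_ts(deviations), longest_run(deviations, 2.0))
```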

SSAP / CATHEDRAL (used by CATH)

For the alignment method used by CATH, we used the SSAP server. The Sequential Structure Alignment Program (SSAP) compares protein structures based on distance plots. For each residue it computes a "view": the set of vectors from the Cβ atom of that residue to the Cβ atoms of all other residues. Residues of the two structures are then matched by comparing their views. <ref name="ssap"> Christine A. Orengo and William R. Taylor (1996): "SSAP: Sequential Structure Alignment Program for Protein Structure Comparison". Methods in Enzymology Vol.266:617–635. PMID:8743709 </ref>
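The residue view can be illustrated with a few lines of Python. This is a strongly simplified sketch, not the SSAP algorithm: it assumes both structures are already in comparable coordinate frames and that vectors are paired by position, whereas SSAP itself determines the pairing by dynamic programming. The scoring form a / (d² + b) and the constants `a` and `b` are illustrative assumptions.

```python
def residue_view(cb_coords, i):
    """Vectors from the Cβ atom of residue i to the Cβ atoms of all other residues."""
    xi, yi, zi = cb_coords[i]
    return [(x - xi, y - yi, z - zi)
            for j, (x, y, z) in enumerate(cb_coords) if j != i]


def view_similarity(view_a, view_b, a=500.0, b=10.0):
    """Sum of a / (|va - vb|^2 + b) over vector pairs at the same position."""
    total = 0.0
    for va, vb in zip(view_a, view_b):
        d2 = sum((p - q) ** 2 for p, q in zip(va, vb))
        total += a / (d2 + b)  # identical vectors contribute the maximum a / b
    return total


# Hypothetical Cβ coordinates of a three-residue toy structure
cb = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
view = residue_view(cb, 0)
print(view, view_similarity(view, view))
```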

TopMatch

TopMatch is the successor of ProSup, a structure comparison tool. It is useful for protein structure alignment, for visualizing structural similarities and for highlighting relationships between proteins. <ref name="topmatch"> Manfred J. Sippl and Markus Wiederstein (2008): "A note on difficult structure alignment problems". Bioinformatics Vol.24(3): 426-427 doi:10.1093/bioinformatics/btm622 </ref> The method represents structures by their Cα atoms and joins multiple chains into a single chain. <ref name="slides"/>
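The Cα representation with joined chains can be sketched as follows. This is not TopMatch code, only an illustration of the representation; it reads the fixed-width ATOM records of a PDB file and ignores alternate locations and multiple models. The function name `ca_trace` is hypothetical.

```python
def ca_trace(pdb_text):
    """Return the Cα coordinates of all chains as one list, in file order."""
    coords = []
    for line in pdb_text.splitlines():
        # PDB fixed columns (1-based): record name 1-6, atom name 13-16,
        # x 31-38, y 39-46, z 47-54
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords
```

Because the chain identifier is not consulted, the Cα atoms of chains A, B, ... simply end up one after another in a single trace.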

SAP or CE

We first tried to compute the structural alignment with the SAP web server, but the program returned an error. We therefore used the CE server instead. CE builds an alignment between two protein structures by combinatorial extension (CE) of an alignment path defined by aligned fragment pairs (AFPs). AFPs are pairs of fragments, one from each protein, that are structurally similar according to their local geometry. According to its authors, the algorithm finds an optimal alignment quickly and accurately. <ref> Ilya N. Shindyalov and Philip E. Bourne (1998): "Protein Structure Alignment by Incremental Combinatorial Extension (CE) of the Optimal Path". Protein Engineering Vol.11(9): 739-747. </ref> CE is also directly available through the RCSB PDB Protein Comparison Tool, where only the jCE algorithm has to be selected. This tool additionally offers a variety of other methods for generating sequence and structural alignments.
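How local geometry can decide whether two fragments form an AFP is sketched below: the intra-fragment Cα distance matrices of both fragments are compared, which needs no superposition. This is a simplified illustration, not the CE implementation; the fragment length `m=8` and the mean-absolute-difference score are assumptions, and the function names are ours.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between the given points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def afp_score(ca_a, i, ca_b, j, m=8):
    """Mean absolute difference of intra-fragment Cα distances (lower = more similar)."""
    frag_a, frag_b = ca_a[i:i + m], ca_b[j:j + m]
    if len(frag_a) < m or len(frag_b) < m:
        raise ValueError("fragment runs past the end of the chain")
    da, db = distance_matrix(frag_a), distance_matrix(frag_b)
    diffs = [abs(da[k][l] - db[k][l]) for k in range(m) for l in range(k + 1, m)]
    return sum(diffs) / len(diffs)


# Toy example: a straight fragment, a translated copy and a stretched copy
straight = [(float(k), 0.0, 0.0) for k in range(8)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in straight]
stretched = [(2.0 * x, y, z) for x, y, z in straight]
print(afp_score(straight, 0, shifted, 0), afp_score(straight, 0, stretched, 0))
```

A translated (or rotated) copy scores 0 because its internal distances are unchanged, while the stretched copy does not.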

Modelling scores

To compare the different methods, we compared their RMSDs (root-mean-square deviations). TopMatch uses the same formula but calls it Er (root-mean-square error). The RMSD is the square root of the mean squared distance between corresponding positions of two superimposed proteins, given in Ångström. The results are shown in <xr id="rmsd"/>. <figtable id="rmsd">

RMSD results
Method	1lrm	3luy	2phm	1j8u	2b5u	3bqo	1vh8
LGA-RMSD	0.81	3.30	0.88	0.73	3.07	3.59	3.42
SSAP-RMSD	0.99	18.77	1.24	1.02	39.16	22.39	7.27
TopMatch-Er	0.60	1.98	0.81	0.63	1.21	1.12	3.25
CE-RMSD	0.65	5.13	0.95	0.68	4.06	4.68	5.92

Root-mean-square deviation/error in Ångström for the four structural alignment methods LGA, SSAP, TopMatch and CE.

</figtable>
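For reference, the RMSD reported in the table is the square root of the mean squared distance over all N aligned position pairs. A minimal sketch, assuming the two coordinate lists are already superimposed and paired by the alignment (the servers above perform both steps themselves):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((p - q) ** 2
                  for a, b in zip(coords_a, coords_b)
                  for p, q in zip(a, b))
    return math.sqrt(squared / len(coords_a))


# Two toy point pairs that are 3 Å and 4 Å apart
print(rmsd([(0, 0, 0), (0, 0, 0)], [(3, 0, 0), (0, 4, 0)]))
```

Because the distances are squared before averaging, a few badly fitting positions dominate the value, which is one reason why methods that align fewer residues can report lower RMSDs.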

The lowest RMSDs were found with TopMatch for every structure. Even for unrelated structures TopMatch always finds something to align because it uses local alignment. For example, 3luy, which has a low sequence identity, still reaches an RMSD of 1.98 Å. However, the structure and the alignment itself show only small regions of agreement, which may also have arisen by chance.

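LGA and CE lie between the two extremes, and neither is consistently better: CE gives the lower RMSD for 1lrm and 1j8u, LGA for 2phm and for all dissimilar structures (3luy, 2b5u, 3bqo, 1vh8). CE may therefore do slightly better on very similar structures and LGA otherwise, but the dataset is too small to be sure.
SSAP gives the highest RMSDs throughout (up to 39.16 Å for 2b5u). A possible reason, which we did not verify, is that the Cβ atoms it uses lie further apart than the Cα atoms.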

Note on the chains: in all comparisons chain A of 2PAH (2pah.A) is the query and chain A of the other structure (xx.A) is the target.

...

Evaluate sequence alignments

Lab journal
<figtable id="model_rmsd">

LGA and hhsearch results
pdb	LGA RMSD (Å)	LGA_S	LGA_Q	LGA seq_id (%)	hhsearch probability (%)	hhsearch e-value	hhsearch identities (%)
1phz	0.83	90.65	32.44	99.34	100.00	6.9e-165	92
1j8u	0.73	90.29	35.83	99.67	100.00	3.1e-135	100
2v27	1.70	62.77	12.55	96.02	100.00	3.6e-74	32
2qmx	3.18	7.46	1.25	4.88	98.20	1.1e-09	36
3luy	2.82	7.17	1.24	13.89	98.07	3.3e-09	22
1qey	0.64	3.65	1.63	0.00	54.00	3.4	67
1wyp	2.67	8.43	1.37	0.00	29.42	15	19
1a6s	3.15	6.93	1.08	11.43	20.59	29	36

Results of the LGA comparison of our protein against other proteins. The proteins were found with hhsearch, and the Cα positions were placed with hhmakemodel.pl. Only the results for eight example proteins are shown.

</figtable>

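The last two hits (1wyp and 1a6s) have very low hhsearch probabilities (29.42 and 20.59).
Lower RMSDs tend to go with higher sequence identities and, less strongly, with lower e-values. 1qey is an exception: it has the lowest RMSD (0.64), but its LGA_S of only 3.65 suggests that just a small part of the structure is superimposed.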

Pearson's correlation between the LGA RMSD and the hhsearch e-value: 0.4843469
Pearson's correlation between the LGA RMSD and the hhsearch sequence identity: -0.8340592

Correlation results between e-value and RMSD and between sequence identity and RMSD for the eight proteins (3 different correlation methods):

method           Evalue          Identity 
pearson:         0.4843469       -0.8340592  
spearman:        0.3809524       -0.5389318 
kendall:         0.2857143       -0.3273268
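The Pearson and Spearman rows for the eight proteins can be recomputed from the RMSD, e-value and identities columns of the table. The sketch below uses only the Python standard library; the helper names are ours, Spearman is computed as Pearson on average ranks (ties share the mean rank), and Kendall is omitted for brevity. Recomputed values may differ from the reported ones in the last digits, since the table entries are rounded.

```python
import math

# Columns copied from the eight-protein table (1phz ... 1a6s)
rmsd_values = [0.83, 0.73, 1.70, 3.18, 2.82, 0.64, 2.67, 3.15]
identity = [92, 100, 32, 36, 22, 67, 19, 36]
evalue = [6.9e-165, 3.1e-135, 3.6e-74, 1.1e-09, 3.3e-09, 3.4, 15, 29]


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in order[pos:end + 1]:
            result[k] = (pos + end) / 2 + 1
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


for name, column in (("Evalue", evalue), ("Identity", identity)):
    print(name, pearson(rmsd_values, column), spearman(rmsd_values, column))
```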

Pearson correlations of RMSD with each column for the eight proteins:

Prob         Eval         Identity     RMSD   Seq_Id       LGA_S        LGA_Q
-0.3195465   0.4843469    -0.8340592   1      -0.6117928   -0.6677496   -0.7023956

Correlation results between e-value and RMSD and between sequence identity and RMSD for the 26 proteins (3 different correlation methods):

method           Evalue          Identity 
pearson:         0.4183518       -0.8315063 
spearman:        0.4724504       -0.6272464 
kendall:         0.3571446       -0.4976726

Pearson correlations of RMSD with each column for the 26 proteins:

Prob         Eval         Identity     RMSD   Seq_Id       LGA_S        LGA_Q
-0.4124872   0.4183518    -0.8315063   1      -0.6885208   -0.7186945   -0.7479712

References
