Difference between revisions of "Structural Alignments (Phenylketonuria)"
Revision as of 18:10, 2 June 2013
Summary
Structural alignments are used to determine the functional and evolutionary relationships between protein structures. <ref name="struc_align"> Walter Pirovano, K Anton Feenstra and Jaap Heringa (2008): "The meaning of alignment: lessons from structural diversity". BMC Bioinformatics Vol.9:556. doi:10.1186/1471-2105-9-556 </ref> In this task, we first generated a dataset of structures that are related or unrelated to our protein (PAH). We then used different methods and measures to quantify the structural similarity between these structures. Finally, we generated structural alignments to evaluate some of the sequence-based alignments from Task 2. The results and the accompanying discussion are shown below.
Explore structural alignments
Dataset generation
Our protein (PAH) has the CATH code 1.10.800.10 (Phenylalanine Hydroxylase). To generate the dataset, we selected structures that are similar and dissimilar to this protein:
- reference structure of PAH: 2PAH (96.41% identity)
- identical sequence with filled binding site: 1LRM (100% identity; the 3D structure of the PDB entry shows two filled binding sites with the ligands FE and HBI)
- identical sequence with unfilled binding site: none found
- low sequence identity: 3LUY (32.2%; there is no PDB ID below 30%)
- high sequence identity: 2PHM (89.7%)
- CAT: 1J8U (CATH Code: 1.10.800.10); there is no other category than this one for CAT
- CA: 2B5U (CATH Code: 1.10.287.620)
- C: 3BQO (CATH Code: 1.25.40.210)
- other CATH category: 1V8H (CATH Code: 2.60.40.10)
We now apply different structural alignment methods to this dataset. Each structure only has to be superimposed on the reference structure, not on the other structures.
Pymol
Pymol is a Python-enhanced, open-source molecular visualization tool. It is particularly suitable for 3D visualization of proteins and small molecules as well as their density, surfaces and trajectories. It also includes molecular editing functions such as aligning or superimposing two molecules. <ref> http://sourceforge.net/projects/pymol/ short Pymol summary, retrieved June 02, 2013 </ref>
// TODO: pictures of one or two structures with defined binding site:
- with all atoms
- only C-alpha
- only binding site
What changes and why?
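The three superpositions listed in the TODO above could be scripted instead of being done by hand. The following is only a minimal sketch under stated assumptions, not the procedure used for this task: it writes out Pymol command-language lines (`fetch`, `create`, `super`) that superimpose each structure on the reference 2PAH, once per atom selection. The helper name `pymol_script`, the copied object names, and the binding-site definition (all residues within 8 Å of the iron, `resn FE`) are illustrative assumptions.

```python
# Sketch: emit Pymol command-language lines that superimpose each
# structure onto the reference, once per atom selection.
REFERENCE = "2pah"

# Atom selections for the three views in the TODO list. The binding-site
# definition (8 Å around the iron) is an assumption, not taken from the text.
SELECTIONS = {
    "all_atoms": "all",
    "c_alpha": "name CA",
    "binding_site": "byres (all within 8 of resn FE)",
}


def pymol_script(mobiles, reference=REFERENCE, selections=SELECTIONS):
    """Return the lines of a .pml script (hypothetical helper)."""
    # async=0 asks Pymol to wait for the download to finish; the exact
    # keyword handling may differ between Pymol versions.
    lines = [f"fetch {reference}, async=0"]
    for pdb_id in mobiles:
        lines.append(f"fetch {pdb_id}, async=0")
        for label, sel in selections.items():
            # Work on a copy so every selection starts from the same pose
            copy = f"{pdb_id}_{label}"
            lines.append(f"create {copy}, {pdb_id}")
            lines.append(f"super {copy} and ({sel}), {reference} and ({sel})")
    return lines


if __name__ == "__main__":
    print("\n".join(pymol_script(["1lrm", "1j8u"])))
```

The printed lines can be saved to a `.pml` file and run in Pymol. The binding-site selection only makes sense for structures that actually contain the ligand, so the list of structures passed in would have to be restricted accordingly.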
LGA
The LGA (Local-Global Alignment) method can compare fragments or whole protein structures in sequence-dependent and sequence-independent modes. The generated data can be used in a scoring function to rank two structures by their level of similarity. It allows structure classification when many proteins are analyzed, as well as clustering of similar protein structure fragments. <ref name="lga"> Adam Zemla (2003): "LGA: a method for finding 3D similarities in protein structures". Nucleic Acids Research Vol.31(13):3370-3374. doi:10.1093/nar/gkg571 </ref>
SSAP / CATHEDRAL (used by CATH)
SSAP
...
uses Cβ
TopMatch
TopMatch
...
uses Cα
SAP or CE
SAP? ->Error!
->
CE
CE-PDB
...
Modelling scores
To compare the different models, we compare their RMSDs (root-mean-square deviations). TopMatch uses the same formula but calls it Er (root-mean-square error). The RMSD is the square root of the mean squared distance between corresponding positions of two superimposed proteins, given in Ångström. The results are shown in <xr id="rmsd"/>. <figtable id="rmsd">
RMSD results | |||||||
---|---|---|---|---|---|---|---|
Method | 1lrm | 3luy | 2phm | 1j8u | 2b5u | 3bqo | 1vh8 |
LGA-RMSD | 0.81 | 3.30 | 0.88 | 0.73 | 3.07 | 3.59 | 3.42 |
SSAP-RMSD | 0.99 | 18.77 | 1.24 | 1.02 | 39.16 | 22.39 | 7.27 |
TopMatch-Er | 0.60 | 1.98 | 0.81 | 0.63 | 1.21 | 1.12 | 3.25 |
CE-RMSD | 0.65 | 5.13 | 0.95 | 0.68 | 4.06 | 4.68 | 5.92 |
</figtable>
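As a concrete illustration of the RMSD, that is, the square root of the mean squared distance between corresponding positions, here is a minimal sketch for two lists of paired coordinates. It assumes the structures are already superimposed and does not perform the superposition itself; the function name `rmsd` and the toy coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired 3D points.

    Both lists must have the same length and be in the same frame,
    i.e. the structures are assumed to be superimposed already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared Euclidean distances over all aligned positions
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Toy example: two aligned positions, displaced by 1 Å and 2 Å
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 2.0, 0.0)]
print(round(rmsd(a, b), 3))  # sqrt((1 + 4) / 2) ≈ 1.581
```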
- lowest RMSDs: TopMatch
- LGA and CE: sometimes one is better, sometimes the other. Possibly CE is better for very similar structures (1lrm, 1j8u) and LGA otherwise?
- worst (highest) RMSDs: SSAP; maybe because the Cβ atoms are more distant?
- careful about the sidechains: 2pah.A is always taken as query and xx.A as target
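The observations in the list above can be checked directly against the table. The sketch below copies the values from the table into a dictionary and reports, per structure, which method gives the lowest value and for which structures CE is lower than LGA. Variable and function names are illustrative. Note that each method may align a different set of residues, so the comparison is only as meaningful as the raw numbers.

```python
# RMSD / Er values copied from the table above (in Å), columns in table order
STRUCTURES = ["1lrm", "3luy", "2phm", "1j8u", "2b5u", "3bqo", "1vh8"]
RESULTS = {
    "LGA": [0.81, 3.30, 0.88, 0.73, 3.07, 3.59, 3.42],
    "SSAP": [0.99, 18.77, 1.24, 1.02, 39.16, 22.39, 7.27],
    "TopMatch": [0.60, 1.98, 0.81, 0.63, 1.21, 1.12, 3.25],
    "CE": [0.65, 5.13, 0.95, 0.68, 4.06, 4.68, 5.92],
}


def best_method(index):
    """Name of the method with the lowest value in one table column."""
    return min(RESULTS, key=lambda method: RESULTS[method][index])


def ce_beats_lga():
    """Structures for which CE reports a lower RMSD than LGA."""
    return [
        pdb
        for i, pdb in enumerate(STRUCTURES)
        if RESULTS["CE"][i] < RESULTS["LGA"][i]
    ]


for i, pdb in enumerate(STRUCTURES):
    print(pdb, best_method(i))
print("CE lower than LGA for:", ce_beats_lga())
```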
...
Evaluate sequence alignments
Lab journal
<figtable id="model_rmsd">
LGA and hhsearch results | |||||||
---|---|---|---|---|---|---|---|
pdb | LGA RMSD | LGA_S | LGA_Q | LGA seq_id | hhsearch probability | hhsearch e-value | hhsearch identities (%) |
1phz | 0.83 | 90.65 | 32.44 | 99.34 | 100.00 | 6.9e-165 | 92 |
1j8u | 0.73 | 90.29 | 35.83 | 99.67 | 100.00 | 3.1e-135 | 100 |
2v27 | 1.70 | 62.77 | 12.55 | 96.02 | 100.00 | 3.6e-74 | 32 |
2qmx | 3.18 | 7.46 | 1.25 | 4.88 | 98.20 | 1.1e-09 | 36 |
3luy | 2.82 | 7.17 | 1.24 | 13.89 | 98.07 | 3.3e-09 | 22 |
1qey | 0.64 | 3.65 | 1.63 | 0.00 | 54.00 | 3.4 | 67 |
1wyp | 2.67 | 8.43 | 1.37 | 0.00 | 29.42 | 15 | 19 |
1a6s | 3.15 | 6.93 | 1.08 | 11.43 | 20.59 | 29 | 36 |
</figtable>
- the last two hits (1wyp and 1a6s) have a very low hhsearch probability (29.42 and 20.59)
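To make the comparison between sequence-based and structure-based scores more explicit, the rows of the table can be filtered for hits that hhsearch rates as confident but that LGA scores as structurally dissimilar. This is only a sketch: the cutoffs (probability of at least 90, LGA_S below 20) and the function name are arbitrary assumptions chosen for illustration, not thresholds used in this task.

```python
# (pdb, LGA_S, hhsearch probability) copied from the table above
HITS = [
    ("1phz", 90.65, 100.00),
    ("1j8u", 90.29, 100.00),
    ("2v27", 62.77, 100.00),
    ("2qmx", 7.46, 98.20),
    ("3luy", 7.17, 98.07),
    ("1qey", 3.65, 54.00),
    ("1wyp", 8.43, 29.42),
    ("1a6s", 6.93, 20.59),
]


def confident_but_dissimilar(hits, min_prob=90.0, max_lga_s=20.0):
    """Hits with high hhsearch probability but low LGA_S.

    Both cutoffs are hypothetical defaults, not values from the text.
    """
    return [
        pdb
        for pdb, lga_s, prob in hits
        if prob >= min_prob and lga_s < max_lga_s
    ]


print(confident_but_dissimilar(HITS))  # ['2qmx', '3luy']
```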
References
<references/>