HSSP curve

From Protein Prediction 2 Winter Semester 2014
Revision as of 11:54, 20 November 2014 by Ppwikiuser (talk | contribs)

Introduction to HSSP curve

HSSP (homology-derived secondary structure of proteins) is a derived database that merges structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure in the Protein Data Bank (PDB), the database holds a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues results from a search of SwissProt using MaxHom, a position-weighted dynamic programming method for sequence profile alignment. The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, HSSP is not only a database of aligned sequence families but also a database of implied secondary and tertiary structures, covering 29% of all sequences stored in SwissProt. The HSSP curve is the threshold, expressed as percentage sequence identity as a function of alignment length, above which two aligned proteins are inferred to share the same structure.


Figure: Different thresholds for the HSSP curve
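The curve itself is not written out on this page. Assuming the definition from Rost (1999), which UniqueProt also uses, the identity threshold p for an alignment of length L can be written as below, where n is the offset that produces the different threshold curves. The symbols p, L, n and PID are naming choices made here, not taken from this page.

```latex
p(L, n) = n +
\begin{cases}
100 & L \le 11 \\
480 \cdot L^{-0.32\,\left(1 + e^{-L/1000}\right)} & 11 < L \le 450 \\
19.5 & L > 450
\end{cases}
\qquad
\text{HSSP-value} = \mathrm{PID} - p(L, 0)
```

Under this definition, an alignment with percentage identity PID lies above the curve for threshold n exactly when its HSSP-value is at least n.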


Existing visualisations

There is an existing implementation that can be found on this page.

The program accepts either a set of sequences in FASTA format or a list of identifiers from any of the following protein databases: SWISS-PROT (13), PDB (14) or TrEMBL (13). Alternatively, it accepts one of the following alignment-file formats, which bypasses the first step of the algorithm (see below): BLAST, PSIBLAST, pair, markx0, markx1, markx2, markx3, markx10 or srspair.

It uses a greedy algorithm that operates on the HSSP-values calculated for the aligned sequence pairs.
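The page does not spell out the greedy procedure. As a rough illustration only, and not the tool's actual algorithm, one simple greedy strategy keeps a sequence only if its HSSP-value to every sequence kept so far stays at or below a cutoff. The function name, the dictionary layout and the cutoff of 0 are assumptions made for this sketch.

```python
from typing import Dict, FrozenSet, List


def greedy_representatives(
    ids: List[str],
    hssp_values: Dict[FrozenSet[str], float],
    cutoff: float = 0.0,
) -> List[str]:
    """Greedily pick sequences that are not similar to any already picked one.

    hssp_values maps an unordered pair of ids to its precomputed HSSP-value.
    Pairs that are missing are treated as unrelated (hypothetical convention).
    """
    kept: List[str] = []
    for seq_id in ids:
        similar_to_kept = any(
            hssp_values.get(frozenset((seq_id, other)), float("-inf")) > cutoff
            for other in kept
        )
        if not similar_to_kept:
            kept.append(seq_id)
    return kept


# Toy input: A and B are clearly similar, C is only weakly related to A.
pairs = {
    frozenset(("A", "B")): 12.0,
    frozenset(("A", "C")): -5.0,
    frozenset(("B", "C")): 3.0,
}
representatives = greedy_representatives(["A", "B", "C"], pairs)
print(representatives)
```

The result depends on the order of the input ids, which is typical for greedy selection; a real tool may order or score candidates differently.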

Tool's Objective

Visualize the HSSP curve and allow the user to dynamically filter or categorize the data shown on the graph for better insights.

Core Functionalities

Task                                     Implemented
Import BLAST results                     No
Parse BLAST results                      No
Visualize HSSP curve                     No
Ability to filter based on a threshold   No

Roadmap

  • Understand the HSSP curve and the calculations needed to visualize it
  • Gather input (BLAST results) with which we can work on visualizing
  • Parse BLAST results input
  • Calculate and visualize the HSSP curve
  • Implement dynamic filtering of the curve
  • Get feedback from biologists about possible improvements for better insights
  • Work on changes/new features based on the feedback
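The parsing and filtering steps of the roadmap could be sketched as follows. This is a minimal sketch under stated assumptions, not the planned implementation: it assumes BLAST+ tabular output (`-outfmt 6`) with the default column order, in which column 3 is the percentage identity and column 4 the alignment length, and it assumes the curve definition from Rost (1999). BLAST's percentage identity is used as a stand-in for PID, which is an approximation. All function and class names are hypothetical.

```python
import math
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percentage of identical positions reported by BLAST
    length: int    # alignment length reported by BLAST


def parse_blast_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per non-empty, non-comment line of tabular BLAST output."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        yield Hit(cols[0], cols[1], float(cols[2]), int(cols[3]))


def hssp_curve(length: int) -> float:
    """Identity threshold in percent at n = 0 (assumed from Rost, 1999)."""
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5


def hssp_distance(hit: Hit) -> float:
    """Distance of a hit from the curve; positive means above the curve."""
    return hit.pident - hssp_curve(hit.length)


def filter_hits(hits: Iterable[Hit], n: float = 0.0) -> List[Hit]:
    """Keep only hits that lie at least n percentage points above the curve."""
    return [h for h in hits if hssp_distance(h) >= n]


# Made-up example lines in the default 12-column layout.
sample = [
    "q1\ts1\t45.00\t200\t110\t0\t1\t200\t1\t200\t1e-50\t180",
    "q1\ts2\t22.00\t120\t90\t2\t5\t124\t3\t120\t0.002\t38.5",
    "q1\ts3\t80.00\t10\t2\t0\t1\t10\t1\t10\t5.1\t20.0",
]
hits = list(parse_blast_tabular(sample))
above_curve = [h.subject for h in filter_hits(hits, n=0.0)]
print(above_curve)
```

The (length, pident) pairs of the parsed hits are the points to plot against the curve, and changing `n` in `filter_hits` corresponds to moving the threshold in the planned dynamic filter.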

People

References

http://en.wikipedia.org/wiki/Homology-derived_Secondary_Structure_of_Proteins

http://www.ncbi.nlm.nih.gov/pubmed/?term=UniqueProt%3A+creating+representative+protein+sequence+sets

http://en.wikipedia.org/wiki/Protein_superfamily

http://nar.oxfordjournals.org/content/24/1/201.full.pdf