Introduction to the HSSP curve
HSSP is a derived database merging structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in SwissProt using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 29% of all SwissProt-stored sequences.
There is an existing implementation that can be found on this page.
The program accepts either a set of sequences in FASTA format or a list of identifiers from either of the following protein databases: SWISS-PROT (13), PDB (14) or TrEMBL (13). Alternatively, one of the following alignment-file formats is accepted to bypass the first step of the algorithm (see below): BLAST, PSIBLAST, pair, markx0, markx1, markx2, markx3, markx10 or srspair.
The program uses a greedy algorithm that calculates the HSSP values.
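The text above does not spell out how an HSSP value is computed, so the following is only a minimal sketch. It assumes the commonly cited Rost (1999) form of the curve, in which the threshold percent identity depends on the alignment length L, and the HSSP value of an alignment is its percent identity minus that threshold. The function names `hssp_threshold` and `hssp_value` and the offset parameter `n` are illustrative, not taken from the existing implementation.

```python
import math


def hssp_threshold(length: int, n: float = 0.0) -> float:
    """Percent identity on the curve for an alignment of `length` residues.

    Assumed form (Rost 1999): 100 for very short alignments, a power-law
    decay for medium lengths, and a flat 19.5 for long alignments.
    `n` shifts the whole curve up or down.
    """
    if length <= 11:
        base = 100.0
    elif length <= 450:
        base = 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    else:
        base = 19.5
    return n + base


def hssp_value(percent_identity: float, length: int) -> float:
    """Distance of an alignment from the curve (positive means above it)."""
    return percent_identity - hssp_threshold(length)
```

Under this assumption, an alignment with a positive HSSP value lies above the curve, and plotting `hssp_threshold` over a range of lengths gives the curve itself.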
Visualize the HSSP curve and allow the user to dynamically filter or categorize the data shown on the graph for better insights.
1. Understand the HSSP curve and the calculations needed to visualize it
2. Gather input (BLAST results) to use for the visualization
3. Parse the BLAST results
4. Calculate and visualize the HSSP curve
5. Implement dynamic filtering of the curve
6. Get feedback from a biologist about possible improvements for better insights
7. Work on changes and new features based on the feedback
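The parsing, categorizing and filtering steps in the plan above could look roughly like the sketch below. It assumes the input is BLAST tabular output (`-outfmt 6`) with the default twelve columns, and it reuses the assumed Rost (1999) curve to label each hit as above or below the curve. The names `Hit`, `parse_blast_tabular`, `curve`, `categorize` and `filter_hits` are hypothetical, and the reported alignment length is used as is, including gaps, which may differ from what the existing implementation does.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    query: str
    subject: str
    percent_identity: float
    length: int
    evalue: float


def parse_blast_tabular(text: str) -> List[Hit]:
    """Parse default BLAST tabular output; skips blank and comment lines."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Default columns: qseqid sseqid pident length ... evalue bitscore
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), float(f[10])))
    return hits


def curve(length: int) -> float:
    """Assumed HSSP threshold (Rost 1999 form), in percent identity."""
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5


def categorize(hit: Hit) -> str:
    """Label a hit by its position relative to the curve."""
    return "above" if hit.percent_identity >= curve(hit.length) else "below"


def filter_hits(hits: List[Hit],
                max_evalue: Optional[float] = None,
                min_length: int = 0,
                category: Optional[str] = None) -> List[Hit]:
    """Keep only hits that pass every filter that is set."""
    kept = []
    for h in hits:
        if max_evalue is not None and h.evalue > max_evalue:
            continue
        if h.length < min_length:
            continue
        if category is not None and categorize(h) != category:
            continue
        kept.append(h)
    return kept
```

A plotting front end could then draw `curve` over the observed length range, scatter each hit at (`length`, `percent_identity`), and call `filter_hits` again whenever the user changes a filter.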