Introduction to the HSSP curve
HSSP is a derived database merging structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in SwissProt using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 29% of all SwissProt-stored sequences.
There is an existing implementation that can be found on this page.
The program accepts either a set of sequences in FASTA format or a list of identifiers from any of the following protein databases: SWISS-PROT (13), PDB (14) or TrEMBL (13). Alternatively, it accepts one of the following alignment-file formats, which bypasses the first step of the algorithm (see below): BLAST, PSIBLAST, pair, markx0, markx1, markx2, markx3, markx10 or srspair.
It is built on a greedy algorithm that calculates the HSSP-value of each aligned pair of sequences.
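For orientation, the HSSP-value is usually defined as the distance of an alignment from the HSSP curve. The block below writes out one commonly cited form of that curve, assuming the revised threshold published by Rost (1999) is the one meant here. The original Sander and Schneider (1991) curve uses a different functional form, so treat the constants as an assumption until they are checked against the existing implementation. L is the number of aligned residues and PID is the percentage of identical residues in the alignment.

```latex
p^{I}(L) =
\begin{cases}
100 & \text{if } L \le 11 \\
480 \cdot L^{-0.32\,\left(1 + e^{-L/1000}\right)} & \text{if } 11 < L \le 450 \\
19.5 & \text{if } L > 450
\end{cases}
\qquad
\text{HSSP-value} = \mathrm{PID} - p^{I}(L)
```

A positive HSSP-value places the pair above the curve, where similar 3-D structure is likely. A negative value places it below.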
Our goal is to visualize the HSSP curve and let the user dynamically filter or categorize the data shown on the graph for better insight.
- Understand the HSSP curve and the calculations needed to visualize it
- Gather input (BLAST results) with which we can work on visualizing
- Parse BLAST results input
- Calculate and visualize the HSSP curve
- Implement dynamic filtering of the curve
- Get feedback from biologists about possible improvements for better insights
- Work on changes/new features based on the feedback
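The parsing, calculation and filtering steps in the list above could be sketched roughly as follows. This is a minimal illustration, not the existing implementation. It assumes BLAST tabular output (`-outfmt 6`, where columns 3 and 4 are percent identity and alignment length) and the Rost (1999) form of the curve. The names `Hit`, `hssp_threshold`, `parse_blast_tabular` and `filter_hits` are hypothetical.

```python
import math
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float   # percent identity reported by BLAST
    length: int     # alignment length reported by BLAST
    hssp: float     # distance from the HSSP curve (positive = above the curve)


def hssp_threshold(length: int) -> float:
    """Curve value for a given alignment length (assumed Rost 1999 form)."""
    if length <= 11:
        return 100.0
    if length > 450:
        return 19.5
    return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))


def parse_blast_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated BLAST hits and attach an HSSP distance to each."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident = float(fields[2])
        # Simplification: BLAST's length column counts gap positions as well,
        # whereas the curve is defined on aligned residues.
        length = int(fields[3])
        hits.append(Hit(fields[0], fields[1], pident, length,
                        pident - hssp_threshold(length)))
    return hits


def filter_hits(hits: Iterable[Hit], min_hssp: float = 0.0) -> List[Hit]:
    """Keep only the hits whose HSSP distance is at least min_hssp."""
    return [h for h in hits if h.hssp >= min_hssp]


if __name__ == "__main__":
    sample = [
        "# made-up example rows, not real BLAST results",
        "q1\ts1\t90.0\t500\t50\t0\t1\t500\t1\t500\t0.0\t900",
        "q1\ts2\t20.0\t100\t80\t0\t1\t100\t1\t100\t1.0\t30",
    ]
    for hit in filter_hits(parse_blast_tabular(sample)):
        print(hit.subject, round(hit.hssp, 1))
```

Each point on the graph would then be one `(length, pident)` pair drawn against `hssp_threshold`, and dynamic filtering amounts to calling `filter_hits` again with a new cut-off.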
Libraries we plan to use
For the first releases we plan to use:
And later on react to changes