Parsing And Visualization Of GenBank

From Protein Prediction 2 Winter Semester 2014
Revision as of 17:38, 30 November 2014 by Ppwikiuser

About the project

GenBank contains many annotated sequences. These sequences can be visualized, and the features they contain can be displayed, selected and exported. GenBank is very popular in academia, but people in industry rarely publish annotated sequences. Instead, companies keep them in their own databases as proprietary sequences. To visualize these sequences, the bioinformaticians working on them depend on proprietary software, and that software is built as desktop applications rather than web applications. The major drawback is that lab technicians lose a lot of time because they cannot view a sequence immediately.

The main task of our project has two parts. The first is to take a given GenBank file, parse it and visualize it. The second is to build the tool so that it can easily be included in other projects. Beyond these primary goals, there are a few more functional requirements (listed under Functionality below) and some features that need to be built into the project (described under Features below).

Requirements

This section describes the requirements we identified after meeting with our mentors.

GUI mockups

User experience:

  • Quoting the mentor (Dr. El Mazouari):

“Practical use case: a team is developing a web app that implements in-house algorithms for annotated in-house proprietary sequences. The web app screens the company sequence database for a specific set of features. Sequence hits are then annotated, and a GenBank output is generated for the annotated sequences. At this point, wet-lab users download the annotated sequence in GenBank format and then open it in VectorNTi or MacVector to view the annotation map and features. These extra steps are time-consuming… If they could view the sequence directly in their browser, they would be more productive and ‘happy scientists ;)’. Something that helps them view the annotated sequence, select the features they want and export them will be very welcome.”

Functionality

  • Select features from annotated input sequences
  • Parse and visualize the input sequence in GenBank format
  • Export the selected features in a form that can be worked with later
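
The select-and-export requirement could be implemented as a small pure function. This is a minimal sketch under two assumptions. First, it uses a hypothetical `Feature` shape with 1-based inclusive coordinates, a strand flag and a label. Second, it uses FASTA as the export format. The requirements fix neither, so treat both as placeholders. The names `exportSelectedAsFasta`, `featureSequence` and `reverseComplement` are illustrative only.

```typescript
// Hypothetical feature shape; coordinates are 1-based and inclusive, as in GenBank.
interface Feature {
  label: string; // assumed unique here, used as the selection key
  start: number;
  end: number;
  strand: 1 | -1;
}

// Complement table for plain DNA bases; other IUPAC codes are passed through unchanged.
const COMPLEMENT: Record<string, string> = { a: "t", t: "a", g: "c", c: "g" };

function reverseComplement(seq: string): string {
  return seq
    .split("")
    .reverse()
    .map((base) => COMPLEMENT[base] ?? base)
    .join("");
}

// Cut one feature out of the full sequence; minus-strand features are reverse-complemented.
function featureSequence(sequence: string, f: Feature): string {
  const sub = sequence.slice(f.start - 1, f.end);
  return f.strand === 1 ? sub : reverseComplement(sub);
}

// Export only the selected features as FASTA records, wrapping sequence lines at `width`.
function exportSelectedAsFasta(
  sequence: string,
  features: Feature[],
  selected: Set<string>,
  width = 60
): string {
  const lower = sequence.toLowerCase();
  const records: string[] = [];
  for (const f of features) {
    if (!selected.has(f.label)) continue;
    const seq = featureSequence(lower, f);
    const lines: string[] = [];
    for (let i = 0; i < seq.length; i += width) lines.push(seq.slice(i, i + width));
    const strandNote = f.strand === -1 ? " (complement)" : "";
    records.push(`>${f.label} ${f.start}..${f.end}${strandNote}\n${lines.join("\n")}`);
  }
  return records.join("\n");
}
```

Selection is keyed by label only to keep the sketch short. GenBank labels are not guaranteed to be unique, so a real viewer would more likely key on a feature index. Ambiguity codes such as `r` or `y` would also need a full IUPAC complement table.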

Features

  • Easy to use for end users
  • Highlight and export features in a user-friendly way
  • Easy to integrate into other web applications
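
One way to meet the integration requirement is to make rendering a pure function that turns parsed features into SVG markup. A host page would then only need to insert a string. The sketch below assumes a hypothetical `MapFeature` shape and a plain linear map. `renderLinearMap` is an illustrative name, not the project's design. An interactive viewer would probably build on D3 instead.

```typescript
// Hypothetical input shape; coordinates are 1-based and inclusive.
interface MapFeature {
  label: string;
  start: number;
  end: number;
  color?: string;
}

// Escape characters that are special in XML text and attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a linear feature map: one backbone line plus one rectangle per feature.
// Each rectangle carries a <title> so browsers show the feature label as a tooltip.
function renderLinearMap(seqLength: number, features: MapFeature[], width = 600): string {
  const scale = width / seqLength; // pixels per base
  const rects = features.map((f) => {
    const x = (f.start - 1) * scale;
    const w = (f.end - f.start + 1) * scale;
    const fill = escapeXml(f.color ?? "steelblue");
    return (
      `<rect x="${x.toFixed(1)}" y="20" width="${w.toFixed(1)}" height="10" fill="${fill}">` +
      `<title>${escapeXml(f.label)}</title></rect>`
    );
  });
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="40">` +
    `<line x1="0" y1="25" x2="${width}" y2="25" stroke="black"/>` +
    rects.join("") +
    `</svg>`
  );
}
```

In a host page, the returned string could be assigned to a container element's `innerHTML`. The function touches neither the DOM nor any global state, so the same code could also run on a server.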

MockUps

Mockups made before contacting the client:

  • GenBank input, prior to the mentor's directive
  • GenBank output, prior to the mentor's directive

Refactored mockups resulting from the mentor's updates:

  • GenBank input, after the mentor's directive
  • GenBank output, after the mentor's directive

Application design

Expected technical difficulties

  • Implementing the parser
  • Selecting and exporting features dynamically
  • Highlighting multiple features
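
Highlighting several features at once gets harder when they overlap, because one highlight hides another. A common remedy is to stack overlapping features in separate lanes (rows) using greedy interval partitioning. This is a sketch of that general technique, not a decided part of the design. It assumes a hypothetical `Span` type with 1-based inclusive coordinates, and `assignLanes` is an illustrative name.

```typescript
// Hypothetical span shape; coordinates are 1-based and inclusive.
interface Span {
  id: string;
  start: number;
  end: number;
}

// Assign each span to a lane so that spans sharing a lane never overlap.
// Greedy interval partitioning: visit spans by start position and reuse the first lane
// whose last span ended before this one starts. This opens as many lanes as the
// maximum number of spans covering any single position.
function assignLanes(spans: Span[]): Map<string, number> {
  const sorted = [...spans].sort((a, b) => a.start - b.start || a.end - b.end);
  const laneEnds: number[] = []; // laneEnds[i] = end of the last span placed in lane i
  const lanes = new Map<string, number>();
  for (const s of sorted) {
    let lane = laneEnds.findIndex((end) => end < s.start);
    if (lane === -1) {
      lane = laneEnds.length; // every lane is still occupied: open a new one
      laneEnds.push(s.end);
    } else {
      laneEnds[lane] = s.end;
    }
    lanes.set(s.id, lane);
  }
  return lanes;
}
```

The renderer could then draw each feature at a vertical offset proportional to its lane number. The map stays compact because no more rows are used than the deepest overlap requires.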

Fancy libraries you plan to use

  • jQuery
  • D3 (if necessary)
  • BioJava
  • BioJS (undecided)

Your data

Remarks about your input format

  • The input will be an annotated sequence in GenBank format.
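
The team may end up writing its own parser rather than bridging to BioJava or BioPerl. A minimal sketch of parsing this input follows. It assumes the standard GenBank flat-file layout:

  • feature keys start at column 6
  • locations and qualifiers start at column 22
  • the sequence follows the ORIGIN line
  • `//` ends the record

It handles only the most common location forms. The names `parseGenBank`, `parseLocation`, `GbFeature` and `GbRecord` are hypothetical and do not come from BioJava or BioJS.

```typescript
interface Interval {
  start: number; // 1-based, inclusive
  end: number; // 1-based, inclusive
}

interface GbFeature {
  key: string; // feature key, e.g. "gene" or "CDS"
  location: string; // raw location string as written in the file
  strand: 1 | -1;
  intervals: Interval[];
  qualifiers: Record<string, string>; // "/name=value" pairs with surrounding quotes stripped
}

interface GbRecord {
  features: GbFeature[];
  sequence: string; // lower-case bases from the ORIGIN section
}

// Handles a..b, single bases, partial markers (< and >), an outer complement(...) and join(...).
// Other forms (order, a^b, remote entries, complement inside join) throw an error.
function parseLocation(loc: string): { strand: 1 | -1; intervals: Interval[] } {
  let s = loc.replace(/\s+/g, "");
  let strand: 1 | -1 = 1;
  const comp = /^complement\((.*)\)$/.exec(s);
  if (comp) {
    strand = -1;
    s = comp[1];
  }
  const join = /^join\((.*)\)$/.exec(s);
  const parts = join ? join[1].split(",") : [s];
  const intervals = parts.map((p) => {
    const m = /^[<>]?(\d+)(?:\.\.[<>]?(\d+))?$/.exec(p);
    if (!m) throw new Error(`Unsupported location: ${loc}`);
    const start = Number(m[1]);
    return { start, end: m[2] !== undefined ? Number(m[2]) : start };
  });
  return { strand, intervals };
}

function parseGenBank(text: string): GbRecord {
  type Raw = { key: string; location: string; quals: string[] };
  const raws: Raw[] = [];
  let sequence = "";
  let section: "header" | "features" | "origin" = "header";
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith("//")) break; // end of record
    if (line.startsWith("FEATURES")) { section = "features"; continue; }
    if (line.startsWith("ORIGIN")) { section = "origin"; continue; }
    if (section === "features") {
      if (/^\S/.test(line)) { section = "header"; continue; } // new top-level keyword ends the table
      const key = line.slice(5, 21).trim(); // feature key area (columns 6-21)
      const body = line.slice(21).trim(); // location or qualifier text (column 22 onward)
      const last = raws[raws.length - 1];
      if (key) raws.push({ key, location: body, quals: [] });
      else if (!last) continue;
      else if (body.startsWith("/")) last.quals.push(body); // new qualifier
      else if (last.quals.length === 0) last.location += body; // wrapped location
      else {
        // Wrapped qualifier value; /translation wraps without spaces, free text with one space.
        const i = last.quals.length - 1;
        last.quals[i] += (last.quals[i].startsWith("/translation=") ? "" : " ") + body;
      }
    } else if (section === "origin") {
      sequence += line.replace(/[^A-Za-z]/g, "").toLowerCase(); // drop position numbers and spaces
    }
  }
  const features = raws.map((r) => {
    const qualifiers: Record<string, string> = {};
    for (const q of r.quals) {
      const eq = q.indexOf("=");
      const name = eq === -1 ? q.slice(1) : q.slice(1, eq);
      qualifiers[name] = eq === -1 ? "" : q.slice(eq + 1).replace(/^"|"$/g, "");
    }
    return { key: r.key, location: r.location, ...parseLocation(r.location), qualifiers };
  });
  return { features, sequence };
}
```

This sketch leaves several cases unhandled, so a production parser would need to deal with them:

  • Repeated qualifiers such as `/db_xref` overwrite each other here.
  • Escaped quotes inside values are not unescaped.
  • Files containing several records are not supported.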

After Meeting the Mentor

  • First meeting (email):
    • The discussion with the client focused on the problem statement. The team decided to prepare a list of questions for the mentor, Dr. El Mazouari, covering all of their open doubts. The first clarification concerned the nature of the data to be handled and the application's end user. The end user is a bioinformatician who is interested in sequence annotation, whatever sequence they are working on. The main focus for this data is its visualization and presentation, and the data must be user-friendly and easy to understand.
    • The second point of discussion was why a new sequence parser is needed when GenBank already exists. The problem is that GenBank is public, and most companies will not use it. By default, companies do not upload their sequences to a public database. A large share of industry sequences are in-house sequences that must be processed in-house.
    • The third point was the need for a web application. Many desktop applications can already read annotated sequences (mostly in GenBank format), so the team has to develop a web application instead.
    • Fourth, Dr. El Mazouari clarified some doubts about bio-libraries. He introduced two of them: BioJava and BioPerl. As their names suggest, these libraries are not written in JavaScript. The team therefore has to choose between bridging the Java/Perl code to the JavaScript application and writing their own parser.
    • The fifth point of discussion was the mockups presented (see GUI mockups).
  • Second meeting (FaceTime video chat):