Parsing And Visualization Of GenBank

From Protein Prediction 2 Winter Semester 2014
Revision as of 17:34, 30 November 2014 by Ppwikiuser (Roadmap)

About the project


This section describes all the requirements that we have identified after our meeting with our mentors.

GUI mockups

User experience:

  • Quoting the Mentor (Dr. El Mazouari)*

“Practical use case: a team is developing a web app that implements in-house algorithms for annotated in-house proprietary sequences. The web app screens the company sequence database for specific set of features. Sequence hits are then annotated and a Genbank output is generated for annotated sequences. At this time, wet-lab users download the annotated sequence in Genbank format and then open it in VectorNTi or MacVector in order to view the annotation map and features. These extra steps are time consuming… If they can view the sequence directly in their browser from the web, they will be more productive and “happy scientists;)” Something that will help them to view the annotated sequence, select the features they want and export them will be very welcome”


  • Select features from annotated input sequences
  • Parse and visualize the input sequence in GenBank format
  • Export the selected features in a form that users can work with later*


  • Easy to use for the end users
  • Highlight and export features in a user-friendly way
  • Should be easy to integrate into other web applications


Mockups done before contacting the client:

GenBank input prior to the mentor's directive:

  • GenBank input prior to the mentor's directive

GenBank output prior to the mentor's directive:

  • GenBank output prior to the mentor's directive

Refactored mockups resulting from the mentor's updates:

GenBank input after the mentor's directive:

  • GenBank input after the mentor's directive

GenBank output after the mentor's directive:

  • GenBank output after the mentor's directive

Application design

Expected technical difficulties

  • Implementing the parser
  • Selecting and exporting features dynamically
  • Highlighting multiple features
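
One possible way to approach the highlighting of multiple, possibly overlapping features is to split the sequence at every feature boundary, so that each resulting segment is covered by a fixed set of features. The following is a minimal JavaScript sketch under stated assumptions, not the project's implementation: the feature objects (`id`, `start`, `end`), the function names `segmentByFeatures` and `renderSegments`, and the `feature-<id>` CSS class names are all hypothetical, and coordinates are assumed to be 1-based and inclusive as in GenBank.

```javascript
// Keep a cut position inside the valid range [1, sequenceLength + 1].
function clamp(value, low, high) {
  return Math.min(Math.max(value, low), high);
}

// Split a sequence of the given length into contiguous segments.
// Each segment lists the ids of all features that cover it completely.
// Features are hypothetical objects: { id, start, end }, 1-based inclusive.
function segmentByFeatures(sequenceLength, features) {
  // Every position where the set of covering features can change.
  const cuts = new Set([1, sequenceLength + 1]);
  for (const f of features) {
    cuts.add(clamp(f.start, 1, sequenceLength + 1));
    cuts.add(clamp(f.end + 1, 1, sequenceLength + 1));
  }
  const points = [...cuts].sort((a, b) => a - b);

  const segments = [];
  for (let i = 0; i < points.length - 1; i++) {
    const start = points[i];
    const end = points[i + 1] - 1;
    const featureIds = features
      .filter((f) => f.start <= start && f.end >= end)
      .map((f) => f.id);
    segments.push({ start, end, featureIds });
  }
  return segments;
}

// Turn the segments into an HTML string. Segments without features stay
// plain text; the others are wrapped in a span with one class per feature.
// Assumes ids are safe to use inside a class attribute.
function renderSegments(sequence, segments) {
  return segments
    .map((seg) => {
      const text = sequence.slice(seg.start - 1, seg.end);
      if (seg.featureIds.length === 0) return text;
      const classes = seg.featureIds.map((id) => "feature-" + id).join(" ");
      return `<span class="${classes}">${text}</span>`;
    })
    .join("");
}
```

With this layout, a segment covered by two features carries both class names, so each feature can be styled or toggled independently when the user selects it.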

Fancy libraries you plan to use

  • jQuery
  • D3 (if necessary)
  • BioJava
  • BioJS (?)

Your data

Remarks about your input format

  • The input will be an annotated sequence in GenBank format
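
To illustrate what reading this input involves, here is a minimal JavaScript sketch of a parser for the GenBank flat-file layout: a `LOCUS` line, a `FEATURES` table whose feature keys start in column 6 and whose qualifiers start in column 22, an `ORIGIN` block with the sequence, and a closing `//`. It is a sketch under stated assumptions rather than a complete parser: the names `parseGenBank` and `parseLocation` are hypothetical, only one record per input is read, and compound locations such as `join(...)` as well as qualifier values that wrap over several lines are not handled.

```javascript
// Parse a simple location such as "1..206", "<1..206", "42" or
// "complement(10..20)". Compound locations (join, order) are NOT handled:
// only the first range found is returned.
function parseLocation(location) {
  const range = /(\d+)\.\.[<>]?(\d+)/.exec(location);
  const single = /(\d+)/.exec(location);
  const start = range ? Number(range[1]) : single ? Number(single[1]) : null;
  const end = range ? Number(range[2]) : start;
  return { start, end, complement: location.startsWith("complement(") };
}

// Parse one GenBank record into { locus, features, sequence }.
function parseGenBank(text) {
  const record = { locus: null, features: [], sequence: "" };
  let section = "header";
  let current = null; // feature that following qualifier lines belong to

  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith("LOCUS")) {
      record.locus = line.split(/\s+/)[1] || null;
    } else if (line.startsWith("FEATURES")) {
      section = "features";
    } else if (line.startsWith("ORIGIN")) {
      section = "origin";
    } else if (line.startsWith("//")) {
      break; // end of record
    } else if (section === "features") {
      // Feature line: 5 spaces, key, whitespace, location.
      const featureMatch = /^ {5}(\S+)\s+(\S+)\s*$/.exec(line);
      // Qualifier line: 21 spaces, then /name or /name=value.
      const qualifierMatch = /^ {21}\/([^=\s]+)(?:=(.*))?$/.exec(line);
      if (featureMatch) {
        const [, key, location] = featureMatch;
        current = { key, location, ...parseLocation(location), qualifiers: {} };
        record.features.push(current);
      } else if (qualifierMatch && current) {
        const value = (qualifierMatch[2] || "").replace(/^"|"$/g, "");
        current.qualifiers[qualifierMatch[1]] = value;
      }
    } else if (section === "origin") {
      // Drop position numbers and spaces, keep only sequence letters.
      record.sequence += line.replace(/[^a-zA-Z]/g, "");
    }
  }
  return record;
}
```

The returned `features` array keeps both the raw location string and the parsed `start`/`end` values, so a viewer can use the numbers for drawing while an export step can write the original location back unchanged.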

After Meeting the Mentor

  • 1st meeting (email):
    • The discussion with the client centered on the problem statement. The team prepared a list of questions for the mentor, Dr. El Mazouari, covering all of its open doubts. The first clarification concerned the nature of the data to be handled and the application's end user: a bioinformatician who is interested in sequence annotation, regardless of which sequence is being worked on. The main focus for this data must be its visualization and presentation. The data must be presented in a user-friendly, easy-to-understand way.
    • The second point concerned the reason for creating a new sequence parser when GenBank already exists. The problem is that GenBank is public and most of industry will not use it, so companies will not upload their sequences to a public database by default. A huge number of industry sequences are in-house sequences that must be processed in-house.
    • The third point was the need for a web application. Many desktop applications already read annotated sequences (mostly in GenBank format), so the team has to develop a web application rather than another desktop tool.
    • Fourth, Dr. El Mazouari clarified some doubts about the bio-libraries. He introduced two of them: BioJava and BioPerl. As their names suggest, they are not written in JavaScript, so the team will have to choose between bridging the Java/Perl code to the JavaScript application and writing its own parser.
    • The fifth point of the discussion was the mockups presented (see GUI mockups).
  • 2nd meeting (FaceTime video chat):