Difference between revisions of "Parsing And Visualization Of GenBank"

From Protein Prediction 2 Winter Semester 2014

Revision as of 19:29, 30 November 2014

About the project

GenBank contains many annotated sequences. These can be visualized, and the features that occur in a sequence can be displayed, selected, and exported. Although GenBank is very popular in academia, people in industry tend not to publish annotated sequences; instead they maintain these proprietary sequences in their own databases. To visualize such a sequence, bioinformaticians again depend on proprietary software that is developed as desktop applications rather than web applications. The major problem is that lab technicians lose a lot of time because they cannot visualize the sequence immediately.

The main task of our project is not only to parse and visualize a given GenBank file, but also to build the tool in such a way that it can be easily included in other projects. Beyond these primary goals, there are further functional requirements (which can be seen here) and some features that need to be built into the project (explained here).

Requirements

This section describes all the requirements that we have identified after our meeting with our mentors.

GUI mockups

User experience:

  • Quoting the Mentor (Dr. El Mazouari):

“Practical use case: a team is developing a web app that implements in-house algorithms for annotated in-house proprietary sequences. The web app screens the company sequence database for specific set of features. Sequence hits are then annotated and a Genbank output is generated for annotated sequences. At this time, wet-lab users download the annotated sequence in Genbank format and then open it in VectorNTi or MacVector in order to view the annotation map and features. These extra steps are time consuming… If they can view the sequence directly in their browser from the web, they will be more productive and “happy scientists;)” Something that will help them to view the annotated sequence, select the features they want and export them will be very welcome”

Functionality

  • Select features from annotated input sequences
  • Parse and visualize the input sequence in the GenBank format
  • Export selected features so that users can work with them later

Features

  • Easy to use for the end users
  • Highlight and export features in a user-friendly way
  • Should be easy to integrate into other web applications

MockUps

MockUps done before contacting the client:

  • GenBank input prior to Mentor's directive
  • GenBank output prior to Mentor's directive

Refactored MockUps resulting from the Mentor's updates:

  • GenBank input after Mentor's directive
  • GenBank output after Mentor's directive


Application design

Expected technical difficulties

  • Implementing the parser
  • Selecting and exporting features dynamically
  • Highlighting multiple features
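
To make the second difficulty more concrete, here is a minimal sketch of how selecting and exporting features might look in JavaScript. It is only an illustration under assumptions, not the team's implementation: the feature objects and their field names (`key`, `start`, `end`, `strand`, `label`), the choice of FASTA as export format, and all function names are hypothetical. Coordinates are assumed to be 1-based and inclusive, as in GenBank locations.

```javascript
// Hypothetical input: a sequence plus already-parsed feature objects.
// Coordinates are 1-based and inclusive (assumption, mirroring GenBank).
const sequence = "atggctagctagctaacgtacgatcgatta";
const features = [
  { key: "gene", start: 1, end: 12, strand: "+", label: "demoA" },
  { key: "CDS", start: 13, end: 30, strand: "-", label: "demoB" },
];

// Reverse-complement a DNA string; unknown symbols become "n".
function reverseComplement(dna) {
  const pairs = { a: "t", t: "a", c: "g", g: "c", n: "n" };
  return dna
    .toLowerCase()
    .split("")
    .reverse()
    .map((base) => pairs[base] || "n")
    .join("");
}

// Extract the subsequence covered by one feature, honoring its strand.
function featureSequence(seq, feature) {
  const sub = seq.slice(feature.start - 1, feature.end);
  return feature.strand === "-" ? reverseComplement(sub) : sub.toLowerCase();
}

// Serialize the selected features as FASTA text, wrapped at `width` columns.
function exportFasta(seq, selected, width = 60) {
  const wrap = new RegExp(`.{1,${width}}`, "g");
  return selected
    .map((f) => {
      const lines = featureSequence(seq, f).match(wrap) || [];
      const header = `>${f.label} ${f.key} ${f.start}..${f.end} (${f.strand})`;
      return [header, ...lines].join("\n") + "\n";
    })
    .join("");
}

// "Selecting" is modeled here as a simple filter; in a GUI it would
// come from the user's clicks on the annotation map.
const selected = features.filter((f) => f.key === "CDS");
const fasta = exportFasta(sequence, selected);
console.log(fasta);
```

Because the export works on plain objects and returns a string, the same function could be wired to a download button or reused by another web application without depending on how the features were drawn.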

Fancy libraries you plan to use

  • jQuery
  • D3 (if necessary)
  • BioJava
  • BioJS(?)

Your data

Remarks about your input format

  • The input will be an annotated sequence in the GenBank format
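
As a rough illustration of what handling this input format could involve, the sketch below parses a small GenBank-style record in JavaScript. It is a simplified sketch under stated assumptions, not the project's parser: it handles a single record, only simple `start..end` and `complement(start..end)` locations, and single-line qualifiers (continuation lines are ignored). The sample record and the names `parseGenBank` and `parseLocation` are made up for the example.

```javascript
// Parse a simple location such as "1..12" or "complement(13..30)".
// Anything more complex (join(...), etc.) is kept only as raw text.
function parseLocation(loc) {
  const m = /^(complement\()?<?(\d+)\.\.>?(\d+)\)?$/.exec(loc);
  if (!m) return { start: null, end: null, strand: null, rawLocation: loc };
  return {
    start: Number(m[2]),
    end: Number(m[3]),
    strand: m[1] ? "-" : "+",
    rawLocation: loc,
  };
}

// Minimal GenBank flat-file reader: LOCUS name, FEATURES table, ORIGIN.
function parseGenBank(text) {
  const record = { locus: "", features: [], sequence: "" };
  let section = "header";
  let current = null;

  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith("LOCUS")) {
      record.locus = line.split(/\s+/)[1] || "";
    } else if (line.startsWith("FEATURES")) {
      section = "features";
    } else if (line.startsWith("ORIGIN")) {
      section = "origin";
    } else if (line.startsWith("//")) {
      break; // end of record
    } else if (section === "features") {
      // Feature keys are indented by 5 spaces; qualifiers start with "/".
      const keyLine = /^ {5}(\S+)\s+(\S+)\s*$/.exec(line);
      const qualLine = /^\s+\/([^=\s]+)(?:=(.*))?$/.exec(line);
      if (keyLine) {
        current = {
          key: keyLine[1],
          ...parseLocation(keyLine[2]),
          qualifiers: {},
        };
        record.features.push(current);
      } else if (qualLine && current) {
        const value = qualLine[2] === undefined ? "true" : qualLine[2];
        current.qualifiers[qualLine[1]] = value.replace(/^"|"$/g, "");
      }
    } else if (section === "origin") {
      // Drop position numbers and spaces, keep only sequence letters.
      record.sequence += line.replace(/[^a-zA-Z]/g, "");
    }
  }
  return record;
}

// Made-up sample record, for illustration only.
const sample = [
  "LOCUS       DEMO1                     30 bp    DNA     linear   SYN 01-JAN-2014",
  "FEATURES             Location/Qualifiers",
  "     gene            1..12",
  '                     /gene="demoA"',
  "     CDS             complement(13..30)",
  '                     /product="hypothetical protein"',
  "ORIGIN",
  "        1 atggctagct agctaacgta cgatcgatta",
  "//",
].join("\n");

const record = parseGenBank(sample);
console.log(record.locus, record.features.length, record.sequence.length);
```

A real parser would also need to cover multi-line qualifiers, `join(...)` locations and files containing several records; those cases are deliberately left out here.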

After Meeting the Mentor

  • 1 Meeting (email):

The developer team wrote a first email to the Mentor, Dr. El Mazouari, so that they could get to know both the Project and the Client.

    • The discussion with the client centered on the problem statement. The team prepared a list of questions for the Mentor, Dr. El Mazouari, covering all the doubts they had. The first clarification concerned the nature of the data to be handled and the application's end user, who, as a bioinformatician, is interested in Sequence Annotation, no matter what sequence they are working on. The main focus for this data is its Visualization and Presentation. Data must be presented in a user-friendly, easy-to-understand way.
    • The second point of the discussion was the reason for creating a new sequence parser when GenBank already exists. The problem is that GenBank is public and most industries will not use it: companies do not upload their sequences to a public database by default. A huge amount of industry sequences are in-house sequences that must be processed in-house.
    • The third point was the need for a Web-Application. Many Desktop Applications already read annotated sequences (mostly in GenBank format), so the team is to develop a Web-Application instead.
    • Fourth, Dr. El Mazouari clarified some doubts about the Bio-Libraries. He introduced two of them: BioJava and BioPerl. As their names suggest, these are not written in JavaScript, so the team will have to choose between bridging the Java/Perl code to the JavaScript application and writing their own parser.
    • The fifth point in the discussion was 'The Mockups presented': the Mentor told the team that the only Input type they have to accept is the sequence, so they do not need to implement the Search for id feature. Since the final users will be Bioinformaticians and the main focus is a user-friendly application for them, Dr. El Mazouari asked to remove the Customization Feature so as not to confuse them.
    • Finally, the team was asked to implement an additional feature that makes selected features extracted from an annotated sequence Exportable.
  • 2 Meeting (Facetime video-chat):

After the first contact with the Mentor, the team started brainstorming around the problem statement, which resulted in a refinement of the functional requirements that Dr. El Mazouari requested. They then asked for another exchange with the Mentor. Since one of the discussion topics concerned biological questions and the team does not have enough experience in this area, Tutor Tatyana Goldberg offered to join the Facetime talk.

    • First
    • Second