Project ideas

From Protein Prediction 2 Winter Semester 2014
Revision as of 14:25, 14 November 2014 by Goldberg (talk | contribs) (Microarrays)

Venn Diagram Viewer

Venn diagrams are a very popular way to display list comparisons. Jvenn is an interactive Venn diagram viewer written in JavaScript. The objective of this project is to adapt the Jvenn code base to make it compatible with BioJS 2.0.
Literature: jvenn: an interactive Venn diagram viewer
Mentors: PP2_CS_2014 mentors
Students: 2 Habtom Kahsay Gidey

Jvenn example

Gene Cluster Viewer

The viewer should show the conserved gene order in prokaryotic genomes. The data will be derived from GenBank.

Source: Example for visualization
Mentors: Björn Grüning (Galaxy) gruening. (at)
Students: 2

Gene cluster

BLAST visualization

BLAST finds regions of local similarity between sequences. It allows searching for genes, proteins and genome segments in databases such as UniProt or GenBank without requiring an exact overlap or match in the database (in fact, PSI-BLAST can find orthologs with even less than 30% sequence similarity). It is the best-known algorithm in bioinformatics, with more than 105k citations. The aim of this project is to develop an interactive visualization of BLAST results: a component that could eventually be used by UniProt.

Mentors: Xavier Watkins xwatkins (at) ebi (dot) ac (dot) uk
Students: 2 (Sebastian Wilzbach seb (at) wilzba (dot) ch, Homa Rasouli)

BLAST result overview.png
BLAST outputbox.png

There is already a BioJS parser for the BLAST XML output.

Dot-Bracket Notation 1

RNA secondary structure is often defined using Dot-Bracket Notation (DBN). Valid structures in DBN format are well-parenthesized words consisting of dots '.', opening parentheses '(' and closing parentheses ')'. Dotted positions are unpaired, whereas matching parenthesized positions represent base-pairing nucleotides. As the number of interacting nucleotides is always even (every paired nucleotide must have a partner), the brackets must be balanced. Source: Wikipedia
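The balancing rule described above can be checked with a stack. The following is a minimal sketch, not part of any existing BioJS component; the function name `parseDotBracket` is an assumption made for illustration, and only the three symbols mentioned above are accepted.

```javascript
// Parse a dot-bracket string into a list of base pairs (0-based indices).
// Hypothetical helper for illustration; throws if the structure is not
// well-parenthesized or contains a symbol other than '.', '(' or ')'.
function parseDotBracket(structure) {
  const stack = []; // positions of '(' that still wait for a partner
  const pairs = [];
  for (let i = 0; i < structure.length; i++) {
    const symbol = structure[i];
    if (symbol === "(") {
      stack.push(i);
    } else if (symbol === ")") {
      if (stack.length === 0) {
        throw new Error("Unmatched ')' at position " + i);
      }
      pairs.push([stack.pop(), i]);
    } else if (symbol !== ".") {
      throw new Error("Unexpected symbol '" + symbol + "' at position " + i);
    }
  }
  if (stack.length > 0) {
    throw new Error("Unmatched '(' at position " + stack[stack.length - 1]);
  }
  return pairs;
}

// A small hairpin: positions 1-4 and 0-5 pair, the rest are unpaired.
console.log(parseDotBracket("((..))."));
```

A viewer could draw one arc or bond per returned pair and treat every index not listed as an unpaired position.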


Mentors: Björn Grüning (Galaxy) gruening. (at)
Students: 2 Sven Punga, Benedikt Rauscher


Dot-Bracket Notation 2

This project deals with a slightly different representation of the Dot-Bracket Notation.


Mentors: Björn Grüning (Galaxy) gruening. (at)
Students: 2


Pedigree Chart Visualization

A pedigree chart is a simple, easy-to-read diagram showing the occurrence and appearance (phenotypes) of a particular gene in an organism and its ancestors. Pedigrees use a standardized set of symbols:

  • squares: males
  • circles: females
  • diamonds: the sex of the person is unknown
  • filled-in (darker) symbol: someone with the phenotype in question
  • shaded or half-filled symbol: heterozygotes
  • horizontal and vertical lines: connect parents to their offspring
  • ....


Mentors: PP2_CS_2014 mentors
Students: 2 (David Dao, Iris Shih)

Pedigree chart

Sub-cellular localization in a cell

Archaea, Bacteria and Eukaryota form the three domains of life. Eukaryotic cells contain a nucleus and other membrane-bound organelles. The cells of archaea and bacteria, in contrast, are formed by a single compartment that is surrounded by the plasma membrane (Gram-negative bacteria have an additional outer membrane). The objective of this project is to visualize biological cells and highlight user-selected sub-cellular compartments so that they stand out from the unselected ones. Similar idea: The Compartments database
Mentors: PP2_CS_2014 mentors
Students: 2 (Maribel, Madhura, Pui)


Force directed network (spring algorithm), Graph Viewer


The objective of this project is to visualize a network (including large networks of more than 2000 nodes) so that the distance of a node from the rest of the network is determined by the number of nodes it is connected to: the more neighbors a node has, the larger its distance from the network. The component must allow zooming in and out, selection by number of neighbors, coloring by various thresholds, and other graph-related features.
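Both the layout rule and the selection feature depend on the number of neighbors per node. The sketch below shows one way to derive that from an edge list. All function names are hypothetical, the graph is assumed to be undirected, and the linear scaling in `targetDistance` is only an illustrative assumption, since the description does not fix how distance should grow with the neighbor count.

```javascript
// Count distinct neighbors per node from an undirected edge list.
// edges: array of [nodeA, nodeB] pairs (assumed input shape).
function computeDegrees(edges) {
  const neighbors = new Map(); // node id -> Set of neighbor ids
  for (const [a, b] of edges) {
    if (!neighbors.has(a)) neighbors.set(a, new Set());
    if (!neighbors.has(b)) neighbors.set(b, new Set());
    if (a !== b) { // self-loops do not add a neighbor
      neighbors.get(a).add(b);
      neighbors.get(b).add(a);
    }
  }
  const degrees = new Map();
  for (const [node, set] of neighbors) degrees.set(node, set.size);
  return degrees;
}

// Selection by number of neighbors: keep nodes with at least minNeighbors.
function selectByDegree(degrees, minNeighbors) {
  return [...degrees]
    .filter(([, degree]) => degree >= minNeighbors)
    .map(([node]) => node);
}

// Assumed scaling: distance grows linearly with the neighbor count.
function targetDistance(degree, unit = 10) {
  return unit * degree;
}

const degrees = computeDegrees([["a", "b"], ["a", "c"], ["a", "d"], ["b", "c"]]);
console.log(selectByDegree(degrees, 2), targetDistance(degrees.get("a")));
```

A spring-based layout could then use `targetDistance` as the rest length that pulls or pushes a node relative to the rest of the network.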

Relevant sources:

Mentors: PP2_CS_2014 mentors, Yana Bromberg (Rutgers University), Björn Grüning (Galaxy) gruening. (at)

Students: 3-4 (Ahsan Ziaullah, Vasantha Kumari Kommanapalli, Anuradha Ganapathi Rathnachalam)


HSSP curve

The HSSP curve at a threshold of interest (HSSP value=0 is default) must be visualized in a 2D graph. Additionally, alignments of protein sequences, provided by the user, must be plotted on the graph.
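As a starting point, the curve itself can be computed from the alignment length. The sketch below assumes the commonly cited length-dependent sequence-identity formulation of the HSSP curve (Rost, 1999); the constants should be verified against the original publication before use, and the function names are hypothetical.

```javascript
// HSSP curve (assumed formulation): percentage sequence identity required
// at a given alignment length, shifted by the threshold n (default 0).
function hsspThreshold(length, n = 0) {
  let base;
  if (length <= 11) {
    base = 100;
  } else if (length <= 450) {
    base = 480 * Math.pow(length, -0.32 * (1 + Math.exp(-length / 1000)));
  } else {
    base = 19.5;
  }
  return n + base;
}

// Distance of one alignment from the n = 0 curve; positive means above it.
function hsspDistance(length, percentIdentity) {
  return percentIdentity - hsspThreshold(length, 0);
}

// Sample the curve for plotting as [length, threshold] points.
function curvePoints(maxLength, n = 0) {
  const points = [];
  for (let length = 1; length <= maxLength; length++) {
    points.push([length, hsspThreshold(length, n)]);
  }
  return points;
}

console.log(hsspThreshold(100).toFixed(1), hsspDistance(500, 30));
```

User-provided alignments would then be plotted as points (alignment length, percentage identity) on top of the sampled curve.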


Mentors: PP2_CS_2014 mentors
Students: 2 - Agon Lohaj, Bardh Lohaj

HSSP curve

2D Chemical Components Visualizer

The goal is to automatically create 2D diagrams of chemical complexes with known 3D structure according to chemical drawing conventions.

Similar projects:

Mentors: Julian Heinrich (CSIRO) julian.heinrich. (at), Björn Grüning (Galaxy) gruening. (at)
Students: 2-3


Genome Browser


We would like to integrate several views: a genome view that includes all chromosomes, a chromosome view (just one chromosome) and a zoom view.


In this view, different features are displayed for several people. Each person is a track. Clicking on a feature opens a pop-up window with more information.

Existing projects (in JS):

Relevant sources:

Mentors: Miguel Pignatelli (EMBL-EBI) <emepyc (at)>, Manuel Corpas (TGAC) mc (at)
Students: 3-4

Formats: GFF, BED, bigBED, (GTrack, MAF, WIG)
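Of the formats listed, BED is the simplest to read: each line holds a chromosome, a 0-based start, an exclusive end, and optional columns such as name, score and strand. The following is a minimal sketch of a reader for the first six columns; the function names are hypothetical, and header lines starting with `track`, `browser` or `#` are simply skipped.

```javascript
// Parse one BED line into a feature object (first six columns only).
function parseBedLine(line) {
  const fields = line.trim().split(/\s+/);
  if (fields.length < 3) {
    throw new Error("BED line needs at least chrom, start and end: " + line);
  }
  const feature = {
    chrom: fields[0],
    start: Number(fields[1]), // 0-based, inclusive
    end: Number(fields[2])    // exclusive
  };
  if (!Number.isInteger(feature.start) || !Number.isInteger(feature.end)) {
    throw new Error("BED start and end must be integers: " + line);
  }
  if (fields.length > 3) feature.name = fields[3];
  if (fields.length > 4) feature.score = Number(fields[4]);
  if (fields.length > 5) feature.strand = fields[5];
  return feature;
}

// Parse a whole BED text, skipping blank, comment and header lines.
function parseBed(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "" && !/^(#|track|browser)/.test(line))
    .map(parseBedLine);
}

console.log(parseBed("chr1\t100\t200\tgeneA\t0\t+\nchr2\t5\t10"));
```

Each parsed feature could become one glyph on a track, with the remaining columns shown in the pop-up window.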

Visualization of iAnn events

The iAnn calendar is one of the most used tools to annotate and curate scientific announcements. The idea of this project is to visualize iAnn announcements in the following ways:

  • as an interactive map
  • as a table
  • as a chart, e.g. a pie chart or histogram, showing statistics by various keywords (dates, country, field, etc.)
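The statistics behind a pie chart or histogram reduce to counting events per keyword value. The sketch below assumes events are plain objects; the field names `country` and `field` are hypothetical and do not reflect the actual iAnn data model.

```javascript
// Count events per value of one keyword (hypothetical field names).
function countBy(events, key) {
  const counts = {};
  for (const event of events) {
    const value = event[key] === undefined ? "unknown" : event[key];
    counts[value] = (counts[value] || 0) + 1;
  }
  return counts;
}

// Made-up example records, for illustration only.
const exampleEvents = [
  { title: "Workshop A", country: "Germany", field: "Proteomics" },
  { title: "Course B", country: "UK", field: "Genomics" },
  { title: "Meeting C", country: "Germany" }
];

console.log(countBy(exampleEvents, "country"));
console.log(countBy(exampleEvents, "field"));
```

The resulting object maps directly to pie slices or histogram bars, one per key.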

Relevant sources:

Mentor: Manuel Corpas (TGAC) mc. (at)
Students: 2-3 (Akshit Malhotra, Vinod Rajendran, Irman Abdic)



Microarrays

A microarray experiment hybridizes a sample (target) to a very large set of probes, which are attached to a solid support. It is a high-throughput method used to track the interactions and activities of the sample (e.g. proteins) and to determine their function. The chip consists of a support surface such as a glass slide, nitrocellulose membrane, bead, or microtitre plate, to which an array of capture probes is bound. Any reaction between the probe and the immobilised sample emits a fluorescent signal that is read by a laser scanner. Common examples are protein microarrays (to detect the expression of proteins) and DNA microarrays (to determine sequence, to detect variations in a gene sequence or expression, or for gene mapping). Typical formats are MAGE-ML and MAGE-TAB.

The goal of this project is to develop an interactive visualization of a microarray.

Mentor: PP2_Mentors
Students: 2-3 (Matheus Raszl, Diana Papyan)



Parser for GenBank format and visualization of annotations

GenBank is a standard format for exchanging annotated sequences. Any bioinformatics library should be able to parse an annotated sequence in GenBank format or generate a GenBank file from an annotated sequence. The GenBank format is well documented:

It would be possible to use the GenBank parsers from Bio-projects like BioJava and BioPerl as a starting point. A parser that can highlight or extract annotated features would be very useful to people developing web apps for sequence visualization.

To get an idea what is expected from a project like this, take a look at this sequence: and see what happens when you click on annotated features, like CDS, TATA_signal.
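To make the task concrete, here is a minimal sketch of extracting features such as CDS or TATA_signal from the FEATURES table of a GenBank record. It assumes the standard fixed-column layout (feature key in columns 6-20, location and qualifiers from column 22), ignores continuation lines, and uses a made-up record; `parseGenBankFeatures` is a hypothetical name, not an existing BioJS function.

```javascript
// Extract feature keys, locations and single-line qualifiers from the
// FEATURES table of a GenBank flat file. Continuation lines are ignored.
function parseGenBankFeatures(text) {
  const features = [];
  let inFeatures = false;
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith("FEATURES")) { inFeatures = true; continue; }
    if (!inFeatures) continue;
    if (/^\S/.test(line)) break; // next top-level section, e.g. ORIGIN
    const key = line.slice(5, 21).trim();
    const rest = line.slice(21).trim();
    if (key) {
      current = { key: key, location: rest, qualifiers: {} };
      features.push(current);
    } else if (current && rest.startsWith("/")) {
      const eq = rest.indexOf("=");
      const name = eq === -1 ? rest.slice(1) : rest.slice(1, eq);
      const value = eq === -1 ? true : rest.slice(eq + 1).replace(/^"|"$/g, "");
      current.qualifiers[name] = value;
    }
  }
  return features;
}

// Made-up record for illustration; padEnd keeps the columns aligned.
const pad = " ".repeat(21);
const record = [
  "LOCUS       EXAMPLE",
  "FEATURES             Location/Qualifiers",
  "     " + "TATA_signal".padEnd(16) + "10..16",
  "     " + "CDS".padEnd(16) + "50..250",
  pad + '/product="example protein"',
  "ORIGIN"
].join("\n");

console.log(parseGenBankFeatures(record));
```

A viewer could map each returned location onto the sequence display and highlight it when the corresponding feature is clicked.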

Mentors: Khalil El Mazouari khalil.elmazouari (at)
Students: Carlo Di Domenico, Enrico Gigantiello, Sai Kiran Krishna Murthy


Graphical Model Editor

An editor for probabilistic/statistical graphical models. Examples: hidden Markov models (HMMs), conditional random fields (CRFs), or weighted finite state transducers.

The goal is to construct a general editor to be able to draw:

  1. Nodes
  2. Node properties, like start node and end node
  3. Dependencies/edges between nodes (directed and undirected)
  4. Allow cyclical connections
  5. Set values and probabilities for nodes
  6. Set values and probabilities for edges
  7. Automatically/algorithmically draw the graph so that it's best understood (good use of canvas space, no overlapping of edges, ...)
  8. Export the graphs to images
  9. ...
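The requirements above suggest a small data model behind the editor. The sketch below is one possible shape, not a prescribed design: all names are hypothetical, and the check that outgoing probabilities sum to 1 applies to HMM-style models only.

```javascript
// Minimal in-memory model: nodes with properties, edges with probabilities.
function createModel() {
  return { nodes: new Map(), edges: [] };
}

function addNode(model, id, props = {}) {
  model.nodes.set(id, Object.assign({ isStart: false, isEnd: false }, props));
}

// Edges may be directed or undirected; cycles and self-loops are allowed.
function addEdge(model, from, to, probability, directed = true) {
  if (!model.nodes.has(from) || !model.nodes.has(to)) {
    throw new Error("Both nodes must exist before adding an edge");
  }
  model.edges.push({ from, to, probability, directed });
}

// HMM-style check: outgoing probabilities of each non-end node sum to 1.
function nodesWithInvalidOutgoing(model, tolerance = 1e-9) {
  const sums = new Map();
  for (const edge of model.edges) {
    if (edge.directed) {
      sums.set(edge.from, (sums.get(edge.from) || 0) + edge.probability);
    }
  }
  const invalid = [];
  for (const [id, node] of model.nodes) {
    if (node.isEnd) continue;
    if (Math.abs((sums.get(id) || 0) - 1) > tolerance) invalid.push(id);
  }
  return invalid;
}

const model = createModel();
addNode(model, "start", { isStart: true });
addNode(model, "state");
addNode(model, "end", { isEnd: true });
addEdge(model, "start", "state", 1.0);
addEdge(model, "state", "state", 0.7); // cyclical connection
addEdge(model, "state", "end", 0.3);
console.log(nodesWithInvalidOutgoing(model)); // empty when the model is consistent
```

Keeping the model separate from the drawing code would also make the later export to a framework's internal representation a plain data conversion.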

Getting this far would be an excellent achievement. After that, the project could continue by trying to integrate the component into actual machine learning frameworks. For example, a first step would be to export the graph's data to some framework's internal representation.

  • Mentors: Juan Miguel Cejuela (juanmi (at)) and PP2_CS_2014 mentors
  • Students: 2
Graphical models.jpg

A Splice Junction Viewer

Screen Shot 2014-11-11 at 20.54.14.png

BAM files store next-generation sequencing read alignments in a compressed format. As part of the BioJS Google Summer of Code we developed a BAMviewer whose objective is to visualise these files in raw format, as seen above.


We would now like to take the information contained in the BAM files and develop a transcriptome assembly viewer. BAM files may contain only those bits of DNA that are transcribed to RNA. This is what we call the transcriptome, as opposed to the genome. When reads in a BAM file come from transcribed bits of DNA, one can assemble them like a puzzle. This transcriptome assembly is a crucial tool for understanding how genes are organised internally and for revealing biologically meaningful features that have been related to disease.
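One way to find splice junctions in such alignments is to inspect each read's CIGAR string, where the SAM/BAM specification uses the operation N for a region skipped on the reference (an intron for RNA reads). The sketch below is an illustration under that assumption; the function name is hypothetical, and the start position is taken as 1-based, as in SAM text output.

```javascript
// Derive splice junctions (skipped reference regions) from one alignment.
// pos: 1-based leftmost reference position; cigar: e.g. "50M200N50M".
// Returns 1-based inclusive intron coordinates on the reference.
function spliceJunctions(pos, cigar) {
  const junctions = [];
  const opPattern = /(\d+)([MIDNSHP=X])/g;
  let refPos = pos;
  let match;
  while ((match = opPattern.exec(cigar)) !== null) {
    const length = Number(match[1]);
    const op = match[2];
    if (op === "N") {
      junctions.push({ start: refPos, end: refPos + length - 1 });
    }
    // Only these operations consume reference positions.
    if ("MDN=X".includes(op)) refPos += length;
  }
  return junctions;
}

console.log(spliceJunctions(100, "50M200N50M"));
```

Counting how many reads support each junction would give the viewer the arcs to draw between exons.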

Screen Shot 2014-11-11 at 21.05.35.png

Visualization of events on the GOBLET platform

The idea is similar to the one for iAnn events: visualization of events based on keywords.


Mentor: Manuel Corpas (TGAC) mc. (at)
Students: Yue Xie, Ying Li