Project ideas

From Protein Prediction 2 Winter Semester 2014

Latest revision as of 23:03, 28 January 2015

Venn Diagram Viewer

Venn diagrams are a very popular way to display comparisons between lists. Jvenn is an interactive Venn diagram viewer written in JavaScript. The objective of this project is to use the Jvenn code base and make it compatible with BioJS 2.0.
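At its core, a Venn view needs to know, for every element, which of the input lists contain it. A minimal Python sketch of that computation, assuming the lists are given as a dict from list names to items; `venn_regions` is a hypothetical helper, not part of Jvenn or BioJS:

```python
from collections import defaultdict


def venn_regions(lists):
    """Group items into the exclusive regions of a Venn diagram.

    `lists` maps a list name to an iterable of items (duplicates are ignored).
    Returns {frozenset(list names): set(items found in exactly those lists)}.
    """
    membership = defaultdict(set)  # item -> names of the lists containing it
    for name, items in lists.items():
        for item in items:
            membership[item].add(name)

    regions = defaultdict(set)
    for item, names in membership.items():
        regions[frozenset(names)].add(item)
    return dict(regions)


regions = venn_regions({
    "A": ["geneX", "geneY", "geneZ"],
    "B": ["geneY", "geneZ", "geneW"],
    "C": ["geneZ"],
})
for names, items in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(names), sorted(items))
```

The size of each region is what the diagram displays; clicking a region in an interactive viewer would list its items.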
Literature: jvenn: an interactive Venn diagram viewer
Mentors: PP2_CS_2014 mentors
Students: Habtom Kahsay Gidey, Mohamed Ahmed

Jvenn example

Gene Cluster Viewer

The viewer is supposed to show the conserved gene order in prokaryotic genomes. The data will be derived from GenBank.
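One simple way to decide which genes a cluster viewer should draw as conserved is to look for neighbouring genes that stay adjacent in every genome. A minimal Python sketch, assuming each genome has already been reduced (e.g. from its GenBank annotation) to an ordered list of gene family labels; gene orientation is ignored, and the function names and gene labels are hypothetical:

```python
def adjacent_pairs(gene_order):
    """Unordered pairs of neighbouring genes in one genome (orientation ignored)."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:]) if pair[0] != pair[1]}


def conserved_adjacencies(genomes):
    """Neighbouring gene pairs that occur in every genome of the dict."""
    pair_sets = [adjacent_pairs(order) for order in genomes.values()]
    return set.intersection(*pair_sets) if pair_sets else set()


genomes = {
    "genome1": ["g1", "g2", "g3", "g4", "g5"],
    "genome2": ["g1", "g2", "g3", "gX", "g4", "g5"],  # gX inserted between g3 and g4
}
print(sorted(sorted(pair) for pair in conserved_adjacencies(genomes)))
```

Chaining the conserved pairs (here g1-g2-g3 and g4-g5) gives the blocks a viewer would highlight across genomes.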

Source: Example for visualization
Mentors: Björn Grüning (Galaxy) gruening. (at) .informatik.uni-freiburg.de
Students: 2

Gene cluster

BLAST visualization

BLAST finds regions of local similarity between sequences. It can be used to search databases such as UniProt or GenBank for genes, proteins and genome segments similar to a query, without requiring an exact match (PSI-BLAST can even find homologs with less than 30% sequence similarity). It is the best-known algorithm in bioinformatics, with more than 105k citations. The aim of this project is to develop an interactive visualization of BLAST results: a component that could eventually be used by UniProt.

Mentors: Xavier Watkins xwatkins (at) ebi (dot) ac (dot) uk
Students: 2 (Sebastian Wilzbach seb (at) wilzba (dot) ch, Homa Rasouli)

Kablammo example
BLAST result overview
BLAST output box

There is already a BioJS parser for the BLAST XML output.
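To show what data such a viewer consumes, here is a minimal Python sketch (not the BioJS parser) that reads hits and HSP coordinates from NCBI BLAST XML output with the standard library. It assumes the classic BLAST XML element names (`Hit_def`, `Hsp_evalue`, `Hsp_query-from`, ...); the embedded document is a hand-made, heavily trimmed example, not real BLAST output, and `parse_hsps` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

# Hand-made, heavily trimmed BLAST XML; real output contains many more elements.
BLAST_XML = """
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>example protein 1</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-50</Hsp_evalue>
              <Hsp_query-from>5</Hsp_query-from>
              <Hsp_query-to>300</Hsp_query-to>
              <Hsp_hit-from>10</Hsp_hit-from>
              <Hsp_hit-to>305</Hsp_hit-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>example protein 2</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>2e-10</Hsp_evalue>
              <Hsp_query-from>120</Hsp_query-from>
              <Hsp_query-to>250</Hsp_query-to>
              <Hsp_hit-from>1</Hsp_hit-from>
              <Hsp_hit-to>131</Hsp_hit-to>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def parse_hsps(xml_text):
    """Flatten BLAST XML into one row per HSP: hit name, e-value, query and subject ranges."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for hit in root.iter("Hit"):
        name = hit.findtext("Hit_def")
        for hsp in hit.iter("Hsp"):
            rows.append({
                "hit": name,
                "evalue": float(hsp.findtext("Hsp_evalue")),
                "query": (int(hsp.findtext("Hsp_query-from")), int(hsp.findtext("Hsp_query-to"))),
                "subject": (int(hsp.findtext("Hsp_hit-from")), int(hsp.findtext("Hsp_hit-to"))),
            })
    return rows


for row in parse_hsps(BLAST_XML):
    print(row["hit"], row["evalue"], row["query"])
```

The query ranges are what an overview in the style of Kablammo would draw as bars along the query sequence, coloured or sorted by e-value.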

Dot-Bracket Notation 1

RNA secondary structure is often described using Dot-Bracket Notation (DBN). Valid structures in DBN are well-parenthesized words consisting of dots '.', opening parentheses '(' and closing parentheses ')'. Dotted positions are unpaired, whereas matching parenthesized positions represent base-pairing nucleotides. Because every paired nucleotide must have a partner, the number of interacting nucleotides is always even and the brackets must be balanced. Source: http://ultrastudio.org/en/Dot-Bracket_Notation
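Because the brackets must be balanced, a DBN string can be validated and converted into a list of base pairs in a single pass with a stack, exactly as matching parentheses are checked. A minimal Python sketch that handles only the plain '.', '(' and ')' alphabet described above; `dot_bracket_pairs` is a hypothetical name:

```python
def dot_bracket_pairs(structure):
    """Return the base pairs (i, j), 0-based, encoded by a dot-bracket string.

    Raises ValueError for characters other than '.', '(' and ')' or for unbalanced brackets.
    """
    stack = []  # positions of '(' still waiting for a partner
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"invalid character {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(dot_bracket_pairs("((..((...))..))"))
```

A viewer can then draw the backbone as a line of nucleotides and each returned pair as an arc or a base-pair bond.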

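Sources:

  • Antczak M, Szachniuk M, et al. (2014) RNApdbee––a webserver to derive secondary structures from pdb files of knotted and unknotted RNAs. NAR.
  • Dot-Bracket Notation (ultrastudio.org)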

Mentors: Björn Grüning (Galaxy) gruening. (at) .informatik.uni-freiburg.de
Students: 2 Sven Punga, Benedikt Rauscher

RNA

Dot-Bracket Notation 2

This project deals with a slightly different representation of the Dot-Bracket Notation.

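Sources:

  • Antczak M, Szachniuk M, et al. (2014) RNApdbee––a webserver to derive secondary structures from pdb files of knotted and unknotted RNAs. NAR.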

Mentors: Björn Grüning (Galaxy) gruening. (at) .informatik.uni-freiburg.de
Students: 2

RNA

Pedigree Chart Visualization

A pedigree chart is a simple, easy-to-read diagram showing the occurrence and appearance (phenotypes) of a particular gene in an organism and its ancestors. Pedigrees use a standardized set of symbols:

  • squares: males
  • circles: females
  • diamonds: the sex of the person is unknown
  • filled-in (darker) symbol: someone with the phenotype in question
  • shaded or half-filled symbol: heterozygotes
  • horizontal and vertical lines: connect parents to their offspring
  • ...
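The symbol conventions above amount to a small lookup from a person's attributes to a shape and a fill. A minimal Python sketch with a hypothetical `Individual` record; drawing the lines between parents and offspring would additionally need the family relations, which are not modelled here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Individual:
    name: str
    sex: Optional[str] = None  # "male", "female" or None if unknown
    affected: bool = False     # shows the phenotype in question
    carrier: bool = False      # heterozygote


def symbol_for(person):
    """Return (shape, fill) following the pedigree conventions listed above."""
    shape = {"male": "square", "female": "circle"}.get(person.sex, "diamond")
    if person.affected:
        fill = "filled"
    elif person.carrier:
        fill = "half-filled"
    else:
        fill = "empty"
    return shape, fill


family = [
    Individual("I-1", "male", affected=True),
    Individual("I-2", "female", carrier=True),
    Individual("II-1"),
]
for person in family:
    print(person.name, symbol_for(person))
```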

Literature:

  • Wikipedia: Pedigree chart

Mentors: PP2_CS_2014 mentors
Students: 2 (Still free!)

Pedigree chart

Sub-cellular localization in a cell

Archaea, Bacteria and Eukaryota form the three domains of life. Eukaryotic cells contain a nucleus and other membrane-bound organelles. In contrast, the cells of archaea and bacteria consist of a single compartment surrounded by the plasma membrane (Gram-negative bacteria have an additional outer membrane). The objective of this project is to visualize biological cells and highlight user-selected sub-cellular compartments so that they stand out from the unselected ones.
Similar idea: The Compartments database
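The highlighting logic can be kept independent of the drawing: given the compartments of a cell type and the user's selection, give selected compartments a strong colour and the rest a muted one. A minimal Python sketch; the compartment lists are a simplified, hypothetical data model based on the paragraph above, and the colours are arbitrary:

```python
# Simplified compartment lists per cell type (hypothetical data model).
COMPARTMENTS = {
    "eukaryote": ["plasma membrane", "cytoplasm", "nucleus", "mitochondrion"],
    "archaeon": ["plasma membrane", "cytoplasm"],
    "gram-positive bacterium": ["plasma membrane", "cytoplasm"],
    "gram-negative bacterium": ["outer membrane", "plasma membrane", "cytoplasm"],
}


def compartment_styles(cell_type, selected, highlight="#d62728", muted="#dddddd"):
    """Fill colour per compartment: selected ones stand out, the others are greyed out."""
    available = COMPARTMENTS[cell_type]
    unknown = set(selected) - set(available)
    if unknown:
        raise ValueError(f"{cell_type} has no compartment(s) {sorted(unknown)}")
    return {c: highlight if c in selected else muted for c in available}


print(compartment_styles("gram-negative bacterium", {"outer membrane"}))
```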
Mentors: PP2_CS_2014 mentors
Students: 2 (Maribel, Madhura, Prapaporn)

Compartments

Force directed network (spring algorithm), Graph Viewer

(Challenging!)

The objective of this project is to visualize large networks (more than 2000 nodes) so that the distance of a node from the rest of the network is determined by the number of nodes it is connected to: the more neighbors a node has, the larger its distance from the rest of the network. The component must support zooming in and out, selection by number of neighbors, coloring by various thresholds and other graph-related features.
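The "spring" idea can be sketched in a few lines: all nodes repel each other, edges pull their endpoints together like springs, and a cooling "temperature" caps how far a node moves per step, following the Fruchterman-Reingold scheme. This minimal Python sketch is O(n²) per iteration, so networks with more than 2000 nodes would need an approximation such as Barnes-Hut; it does not implement the project's degree-dependent distance rule, which would require an additional force term. All names are hypothetical:

```python
import math
import random


def spring_layout(nodes, edges, iterations=200, seed=1):
    """Fruchterman-Reingold-style layout: returns {node: (x, y)}."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    k = math.sqrt(4.0 / max(len(nodes), 1))  # ideal edge length for a 2 x 2 area
    temperature = 0.1                         # maximum step per iteration

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Every pair of nodes repels (O(n^2)).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Edges act as springs pulling their endpoints together.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature, then cool down.
        for v in nodes:
            length = math.hypot(*disp[v]) or 1e-9
            step = min(length, temperature)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.98
    return {v: (x, y) for v, (x, y) in pos.items()}


def nodes_with_min_degree(nodes, edges, min_degree):
    """Selection by number of neighbours."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [v for v in nodes if degree[v] >= min_degree]


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
layout = spring_layout(nodes, edges)
print(nodes_with_min_degree(nodes, edges, 2))
```

The same degree table could drive colouring by thresholds; the fixed seed keeps the layout reproducible between redraws.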

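Relevant sources:

  • http://sydney.edu.au/engineering/it/~aquigley/3dfade/
  • Gephi
  • Björn's graph converters
  • Björn's other graph visualizations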

Mentors: PP2_CS_2014 mentors, Yana Bromberg (Rutgers University), Björn Grüning (Galaxy) gruening. (at) .informatik.uni-freiburg.de

Students: 3-4 (ahsan ziaullah/vasantha kumari kommanapalli/Anuradha Ganapathi Rathnachalam)

Graph

HSSP curve

The HSSP curve at a threshold of interest (HSSP value=0 is default) must be visualized in a 2D graph. Additionally, alignments of protein sequences, provided by the user, must be plotted on the graph.
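The curve is a function of the alignment length L. Assuming the HSSP curve meant here is the one published by Rost (1999), the threshold in percentage pairwise sequence identity is n + 100 for L < 11, n + 480 · L^(-0.32 (1 + e^(-L/1000))) for 11 ≤ L ≤ 450, and n + 19.5 for L > 450, where n is the HSSP value (0 by default); the exact boundary conventions should be checked against the paper. A minimal Python sketch that samples the curve for drawing and computes where a user's alignment falls relative to it (function names are hypothetical):

```python
import math


def hssp_curve(length, n=0.0):
    """Percentage identity threshold for an alignment of `length` residues.

    Assumes the Rost (1999) form of the HSSP curve; `n` shifts the curve (HSSP value).
    """
    if length < 11:
        return n + 100.0
    if length > 450:
        return n + 19.5
    return n + 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))


def hssp_value(length, percent_identity):
    """Signed distance of an alignment from the n = 0 curve (positive means above it)."""
    return percent_identity - hssp_curve(length)


# Points for drawing the curve, and one user-provided alignment to plot on top of it.
curve = [(L, hssp_curve(L)) for L in range(11, 501, 10)]
print(round(hssp_value(length=200, percent_identity=40.0), 1))
```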

Literature:

Mentors: PP2_CS_2014 mentors
Students: 2 - Agon Lohaj, Bardh Lohaj

HSSP curve

2D Chemical Components Visualizer

The goal is to automatically create 2D diagrams of chemical complexes with known 3D structure according to chemical drawing conventions.

Similar projects:

  • Poseview
  • iMolecule
  • JSDraw

Mentors: Julian Heinrich (CSIRO) julian.heinrich. (at) .csiro.au, Björn Grüning (Galaxy) gruening. (at) .informatik.uni-freiburg.de
Students: 2-3

Poseview

Genome Browser

(Challenging!)

We would like to integrate several views: a genome view that includes all chromosomes, a chromosome view (a single chromosome) and a zoom view.

Journal.pone.0026345.g001.png

In this view, different features are displayed for several people, one track per person. Clicking on a feature opens a pop-up window with more information.

Existing projects (in JS):

  • Biodalliance
  • JBrowse

Relevant sources:

  • UCSC genome browser

Mentor: Miguel Pignatelli (EMBL-EBI) <emepyc (at) gmail.com>, Manuel Corpas (TGAC) mc (at) manuelcorpas.com
Students: 3-4

Formats: GFF, BED, bigBed (GTrack, MAF, WIG)
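The zoom view essentially comes down to an interval query per track. A minimal Python sketch for the BED case, assuming tab-separated BED lines with at least three columns (chromosome, 0-based start, exclusive end); bigBed, GFF and the other formats would need their own readers, and the per-person track dictionary is a hypothetical input layout:

```python
from collections import defaultdict


def parse_bed(lines):
    """Yield (chrom, start, end, name) from BED lines; start is 0-based, end is exclusive."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        yield fields[0], int(fields[1]), int(fields[2]), name


def features_in_window(tracks, chrom, start, end):
    """Features of each track (e.g. one per person) overlapping [start, end) on `chrom`."""
    visible = defaultdict(list)
    for track_name, bed_lines in tracks.items():
        for f_chrom, f_start, f_end, name in parse_bed(bed_lines):
            if f_chrom == chrom and f_start < end and f_end > start:
                visible[track_name].append((f_start, f_end, name))
    return dict(visible)


tracks = {  # hypothetical per-person tracks
    "person1": ["chr1\t100\t200\tvarA", "chr1\t500\t600\tvarB", "chr2\t100\t200\tvarC"],
    "person2": ["track name=person2", "chr1\t150\t160\tvarD"],
}
print(features_in_window(tracks, "chr1", 0, 300))
```

For genome-scale data an index (such as the one bigBed files carry) would replace the linear scan.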

Visualization of iAnn events

The iAnn calendar is one of the most widely used tools to annotate and curate scientific announcements. The idea of this project is to visualize iAnn announcements in the following ways:

  • as an interactive map
  • as a table
  • as charts, e.g. a pie chart or histograms, showing statistics by various keywords (dates, country, field, etc.)
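The pie charts and histograms above are driven by simple counts per keyword. A minimal Python sketch, assuming events are available as dicts with fields such as `country` and an ISO `date`; the records and field names are hypothetical and may differ from the actual iAnn data:

```python
from collections import Counter


def counts_by(events, keyword):
    """Number of events per value of `keyword`, e.g. for a pie chart."""
    return Counter(event.get(keyword, "unknown") for event in events)


def counts_by_month(events):
    """Histogram bins by month, assuming ISO dates such as '2015-01-28'."""
    return Counter(event["date"][:7] for event in events if "date" in event)


events = [  # hypothetical records
    {"title": "Workshop A", "country": "Germany", "field": "genomics", "date": "2015-02-10"},
    {"title": "Course B", "country": "UK", "field": "proteomics", "date": "2015-02-21"},
    {"title": "Meeting C", "country": "Germany", "field": "genomics", "date": "2015-03-05"},
]
print(counts_by(events, "country").most_common())
print(sorted(counts_by_month(events).items()))
```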

Relevant sources:

Mentor: Manuel Corpas (TGAC) mc. (at) .manuelcorpas.com
Students: 2-3 (Akshit Malhotra, Vinod Rajendran, Irman Abdic)

iAnn

Microarrays

A microarray experiment hybridizes a sample (target) to a very large set of probes attached to a solid support. It is a high-throughput method used to track the interactions and activities of the sample's components (e.g. proteins) and to determine their function. The chip consists of a support surface such as a glass slide, nitrocellulose membrane, bead or microtitre plate, to which an array of capture probes is bound. Any binding between an immobilised probe and the labelled sample emits a fluorescent signal that is read by a laser scanner.

Common examples are protein microarrays (to detect the expression of proteins) and DNA microarrays (to determine sequence, to detect variations in gene sequence or expression, or for gene mapping).

Typical formats are MAGE-ML and MAGE-TAB.

The goal of this project is to develop an interactive visualization of a microarray.
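A first building block of such a view is turning the grid of scanned spot intensities into colours. A minimal Python sketch; the log2 scaling and the grey palette are illustrative assumptions, and a real viewer would read the values from MAGE-ML/MAGE-TAB files and probably use a diverging palette:

```python
import math


def intensity_grid_to_colours(grid, floor=1.0):
    """Map raw spot intensities (rows x columns of the chip) to greyscale hex colours.

    Intensities are log2-transformed (values below `floor` are clipped) and scaled
    to 0..255 between the weakest and the strongest spot on the array.
    """
    logs = [[math.log2(max(v, floor)) for v in row] for row in grid]
    lo = min(min(row) for row in logs)
    hi = max(max(row) for row in logs)
    span = (hi - lo) or 1.0  # avoid division by zero on a uniform array
    colours = []
    for row in logs:
        out_row = []
        for v in row:
            level = round(255 * (v - lo) / span)
            out_row.append(f"#{level:02x}{level:02x}{level:02x}")
        colours.append(out_row)
    return colours


print(intensity_grid_to_colours([[1, 2], [4, 1024]]))
```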

Mentor: PP2_Mentors
Students: 2-3 (Matheus Raszl, Diana Papyan)

Microarray

Sources:

  • NHGRI
  • NCBI Resources

Parser for GenBank format and visualization of annotations

GenBank is a standard format for exchanging annotated sequences. Any bioinformatics library should be able to parse annotated sequences in GenBank format or generate a GenBank file from an annotated sequence. The GenBank format is well documented: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html

The GenBank parsers from Bio* projects such as BioJava and BioPerl could serve as a starting point. A parser that can highlight or extract annotated features will be very useful to people developing web apps for sequence visualization.

To get an idea of what is expected from a project like this, look at this sequence: http://www.ncbi.nlm.nih.gov/nuccore/Z26331.1 and see what happens when you click on annotated features such as CDS or TATA_signal.
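Much of the work lies in the feature table. A minimal Python sketch (not derived from the BioJava or BioPerl parsers) that relies on the fixed column layout of the FEATURES section: feature keys start in column 6, locations and /qualifiers in column 22. It does not cover every corner of the format (e.g. locations referring to other entries), the record below is a hand-made toy rather than a real GenBank entry, and all names are hypothetical:

```python
import re


def parse_features(genbank_text):
    """Return [{"key", "location", "qualifiers"}] from the FEATURES table of one record."""
    features, in_table, current = [], False, None
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if line[:1].strip():  # a new top-level keyword (e.g. ORIGIN) ends the table
            break
        key, rest = line[5:21].strip(), line[21:].strip()
        if key:  # column 6: a new feature starts
            features.append({"key": key, "location": rest, "qualifiers": {}})
            current = None
        elif rest.startswith("/"):  # column 22 starting with '/': a new qualifier
            name, _, value = rest[1:].partition("=")
            features[-1]["qualifiers"][name] = value
            current = name
        elif current is None:  # location continued on the next line
            features[-1]["location"] += rest
        else:  # qualifier value continued; sequences are joined without spaces
            sep = "" if current == "translation" else " "
            features[-1]["qualifiers"][current] += sep + rest
    for feature in features:
        feature["qualifiers"] = {k: v.strip('"') for k, v in feature["qualifiers"].items()}
    return features


def location_span(location):
    """(first, last, on_complement_strand) for simple locations such as 'complement(10..99)'."""
    coords = [int(n) for n in re.findall(r"\d+", location)]
    return min(coords), max(coords), location.startswith("complement")


# Hand-made toy record built with the fixed column widths (not a real GenBank entry).
record = "\n".join([
    "LOCUS       TOY0001     120 bp    DNA     linear",
    "FEATURES             Location/Qualifiers",
    f'{"":5}{"source":<16}1..120',
    f'{"":21}/organism="synthetic construct"',
    f'{"":5}{"CDS":<16}complement(10..99)',
    f'{"":21}/gene="abcX"',
    f'{"":21}/note="hypothetical example',
    f'{"":21}feature"',
    "ORIGIN",
])
for feature in parse_features(record):
    print(feature["key"], location_span(feature["location"]), feature["qualifiers"])
```

The (start, end, strand) spans are what a viewer would draw as clickable boxes or arrows along the sequence.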

Mentors: Khalil El Mazouari khalil.elmazouari (at) gmail.com
Students: Carlo Di Domenico, Enrico Gigantiello, Sai Kiran Krishna Murthy

Genbank.png

Graphical Model Editor

An editor for probabilistic/statistical graphical models, such as hidden Markov models (HMMs), conditional random fields (CRFs) or weighted finite-state transducers.

The goal is to build a general editor that supports:

  1. Nodes
  2. Node properties, such as start node and end node
  3. Dependencies/edges between nodes (directed and undirected)
  4. Cyclical connections
  5. Values and probabilities for nodes
  6. Values and probabilities for edges
  7. Automatic/algorithmic layout so that the graph is easy to understand (good use of canvas space, no overlapping edges, ...)
  8. Export of graphs to images
  9. ...

Achieving this would already be excellent. After that, the project could continue by integrating the component into actual machine learning frameworks; a first step would be to export the graph's data to a framework's internal representation.
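Behind the editor sits a small data model: nodes with start/end flags, directed or undirected edges carrying probabilities, cycles allowed. A minimal Python sketch with a hypothetical `GraphModel` class; it checks that outgoing transition probabilities sum to 1 (as in an HMM) and exports Graphviz DOT text, which is one possible route to image export (e.g. rendering with `dot -Tpng`), not a requirement of the project. The weather HMM is toy data:

```python
from dataclasses import dataclass, field


@dataclass
class GraphModel:
    nodes: dict = field(default_factory=dict)  # name -> {"start", "end", "value"}
    edges: list = field(default_factory=list)  # (source, target, probability, directed)

    def add_node(self, name, start=False, end=False, value=None):
        self.nodes[name] = {"start": start, "end": end, "value": value}

    def add_edge(self, source, target, probability, directed=True):
        # Cycles and self-loops are allowed, as HMMs need them.
        for node in (source, target):
            if node not in self.nodes:
                raise KeyError(f"unknown node {node!r}")
        self.edges.append((source, target, probability, directed))

    def check_transitions(self, tol=1e-9):
        """Non-end nodes whose outgoing directed probabilities do not sum to 1."""
        totals = {n: 0.0 for n in self.nodes}
        for source, _, p, directed in self.edges:
            if directed:
                totals[source] += p
        return [n for n, t in totals.items()
                if not self.nodes[n]["end"] and abs(t - 1.0) > tol]

    def to_dot(self):
        """Graphviz DOT text for the model."""
        lines = ["digraph model {"]
        for name, props in self.nodes.items():
            shape = "doublecircle" if props["end"] else "circle"
            style = ", style=bold" if props["start"] else ""
            lines.append(f'  "{name}" [shape={shape}{style}];')
        for source, target, p, directed in self.edges:
            extra = "" if directed else ", dir=none"
            lines.append(f'  "{source}" -> "{target}" [label="{p:g}"{extra}];')
        lines.append("}")
        return "\n".join(lines)


m = GraphModel()
m.add_node("Begin", start=True)
m.add_node("Rainy")
m.add_node("Sunny")
m.add_node("End", end=True)
for source, target, p in [("Begin", "Rainy", 0.6), ("Begin", "Sunny", 0.4),
                          ("Rainy", "Rainy", 0.7), ("Rainy", "Sunny", 0.2), ("Rainy", "End", 0.1),
                          ("Sunny", "Rainy", 0.4), ("Sunny", "Sunny", 0.5), ("Sunny", "End", 0.1)]:
    m.add_edge(source, target, p)
print(m.check_transitions())
print(m.to_dot())
```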

  • Mentors: Juan Miguel Cejuela (juanmi (at) tagtog.net) and PP2_CS_2014 mentors
  • Students: 2
Graphical models.jpg

A Splice Junction Viewer

BAMviewer screenshot

BAM files store next-generation sequencing read alignments in compressed format. As part of the BioJS Google Summer of Code, we developed a BAMviewer whose objective is to visualise these files in raw format, as seen above.

See http://biojs-samviewer.blogspot.co.uk/

We would now like to use the information contained in BAM files to develop a transcriptome assembly viewer. BAM files may contain reads only from those bits of DNA that are transcribed to RNA; this is what we call the transcriptome, as opposed to the genome. When the reads in a BAM file come from transcribed DNA, they can be assembled like a puzzle. Transcriptome assembly is a crucial tool for understanding how genes are organised internally and for revealing biologically meaningful features that have been related to disease.
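Reads that span an exon-exon boundary appear in BAM alignments with 'N' (skipped reference region) operations in their CIGAR strings, so junction positions can be read directly off the alignments. A minimal Python sketch that works on already-decoded (chromosome, 1-based position, CIGAR) tuples; reading the BAM file itself would need a library such as pysam, which is not shown, and the function names and reads are hypothetical:

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REFERENCE_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions_from_cigar(pos, cigar):
    """Introns implied by one alignment, as 1-based inclusive (start, end) on the reference."""
    junctions = []
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped region: a candidate intron
            junctions.append((ref, ref + length - 1))
        if op in REFERENCE_CONSUMING:
            ref += length
    return junctions


def count_junctions(alignments):
    """Supporting reads per (chrom, start, end) junction, e.g. to scale arc widths."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in junctions_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


alignments = [
    ("chr1", 100, "50M200N50M"),
    ("chr1", 120, "30M200N70M"),
    ("chr1", 10, "100M"),
    ("chr1", 95, "5S45M100N20M10I20M"),
]
print(count_junctions(alignments))
```

Junctions supported by many reads are the natural candidates for arcs in a splice junction view.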

Mentor: Manuel Corpas (TGAC) mc. (at) .manuelcorpas.com
Students: 2-3

Screen Shot 2014-11-11 at 21.05.35.png

Visualization of events on the GOBLET platform

Similar idea as for the iAnn events: visualization of events based on keywords.

Sources:

  • http://mygoblet.org/training-portal
  • http://mygoblet.org/

Mentor: Manuel Corpas (TGAC) mc. (at) .manuelcorpas.com
Students:

F1.large.jpg

Web Components for Interactive Visualization of 1Kite Data

(Challenging!)

In this project, we are developing several interactive statistical plots and reusing many BioJS components to visualize alignment data.
The idea is to implement and connect several interactive components that make it easy to dive into and interact with the data.

  • Line Chart
  • Bar Charts
  • Phylogenetic Tree Viewer
  • MSA Viewer
  • Plots for exploratory data analysis (box plots, heat maps, ...)
  • Easy-to-use web interface for the user

Our goal is to create a front-end for a web service that visualizes alignment data in a user-friendly way.
A successful outcome of this project will be used to visualize the 1Kite data (cover of Science, 7 November 2014).
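Several of the plots listed above (line and bar charts along an alignment) are driven by simple per-column statistics. A minimal Python sketch of one such profile; the choice of gap fraction and majority-residue frequency is an illustrative assumption, not the project's specification, and `column_profile` is a hypothetical name:

```python
from collections import Counter


def column_profile(alignment, gap="-"):
    """Per-column gap fraction and frequency of the most common residue.

    `alignment` is a list of equal-length aligned sequences (rows of an MSA).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        gap_fraction = 1 - len(residues) / len(column)
        top = Counter(residues).most_common(1)
        conservation = top[0][1] / len(residues) if residues else 0.0
        profile.append({"gaps": gap_fraction, "conservation": conservation})
    return profile


alignment = ["MKV-L", "MKI-L", "MRV-L", "MKVAL"]
for i, column in enumerate(column_profile(alignment), start=1):
    print(i, column)
```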

Science.gif

Sources:

  • BioJS
  • DC.js

  • Mentors: PP2_CS_2014 mentors
  • Students (>= 2): David Dao, Iris Shih, Ying Li

(Interested students, please write an email to contact.daviddao at gmail.com)

Site: 1 Kite Dashboard