Difference between revisions of "AllBio Forum"

From Bioinformatikpedia
(September 29, 2013)
 
(6 intermediate revisions by the same user not shown)
Line 3: Line 3:
   
 
== What's new ==
 
== What's new ==
  +
* Metastudent results are ready
  +
* OrthoMCL results are ready
 
* '''new potato ITAG available for download in the [[#section Download|Download section]]'''
 
* '''new potato ITAG available for download in the [[#section Download|Download section]]'''
 
* New TODO list [Sept 29, 2013]
 
* New TODO list [Sept 29, 2013]
Line 11: Line 13:
   
 
== Annotation Pipelines ==
 
== Annotation Pipelines ==
  +
* Metastudent (Tanya) output: done | parsing : done.
 
 
* Trinotate (Agnieszka + Didi+ Oren) output: done | parsing: done.
 
* Trinotate (Agnieszka + Didi+ Oren) output: done | parsing: done.
 
* Blast2GO (Estelle) output: done | parsing: done.
 
* Blast2GO (Estelle) output: done | parsing: done.
Line 31: Line 33:
 
Result files are placed on VM in ''/virdir/Scratch/Oren/''
 
Result files are placed on VM in ''/virdir/Scratch/Oren/''
   
* NEW_ORTHOMCL/ORTHOMCL_GO_PGSC.txt - GO terms for OrthoMCL results for PGSC
+
* NEW_ORTHOMCL/ORTHOMCL_GO_PGSC.txt - OrthoMCL's GO terms for PGSC
* NEW_ORTHOMCL/ORTHOMCL_GO_ITAG.txt - GO terms for OrthoMCL results for ITAG
+
* NEW_ORTHOMCL/ORTHOMCL_GO_ITAG.txt - OrthoMCL's GO terms for ITAG
  +
* FLOWS_OUTPUTS/METASTUDENT_GO_PGSC.txt - Metastudent's GO terms for PGSC
  +
* FLOWS_OUTPUTS/METASTUDENT_GO_ITAG.txt - Metastudent's GO terms for ITAG
  +
* KO2GO - 8 files (2 ITAG "goldpairs", 2 PGSC "goldpairs", 2 ITAG "all", 2 PGSC "all" - see README file in same folder)
   
   
Line 38: Line 43:
   
 
===September 29, 2013===
 
===September 29, 2013===
* Get GO for another KAAS output of PGSC and for 2 KAAS outputs for ITAG ['''Estelle''']
+
* Get GO terms from the Metastudent pipeline [Tanya: '''DONE''']
  +
* Get GO for another KAAS output of PGSC and for 2 KAAS outputs for ITAG [Estelle, Tanya:'''DONE''']
 
* Finish running OrthoMCL for 52 species -> For 52 species OrthoMCL crashed, so we took 18 representative species instead [Estelle, Tanya: '''DONE''']
 
* Finish running OrthoMCL for 52 species -> For 52 species OrthoMCL crashed, so we took 18 representative species instead [Estelle, Tanya: '''DONE''']
 
* OrthoMCL 18 species flow output – calculate all measurement as above. ['''Estelle''', '''Agnieszka''', '''Didi''', '''Tanya''' and '''Oren'''].
 
* OrthoMCL 18 species flow output – calculate all measurement as above. ['''Estelle''', '''Agnieszka''', '''Didi''', '''Tanya''' and '''Oren'''].
 
* Calculate precision and recall to SG and the PotatoCyc output gene lists (after mapping from KEGG to GO IDs ['''Oren''', '''Agnieszka''', '''Didi'''].
 
* Calculate precision and recall to SG and the PotatoCyc output gene lists (after mapping from KEGG to GO IDs ['''Oren''', '''Agnieszka''', '''Didi'''].
* Assigning weights done by Erik and Sanjeev on PGSC to combined results from ITAG (based on corresponding pairs of GO and PGSC_id), so they don't have to do it again manually. ['''Agnieszka'''].
+
* Assigning weights done by Erik and Sanjeev on PGSC to combined results from ITAG (based on corresponding pairs of GO and PGSC_id), so they don't have to do it again manually. [Agnieszka: '''DONE'''].
* Searching for the 'SanjeevGold' gene IDs in all outputs separately and collecting some statistics and comparisons. ['''Agnieszka'''].
+
* Searching for the 'SanjeevGold' gene IDs in all outputs separately and collecting some statistics and comparisons. [Agnieszka: '''DONE'''].
 
* Run all expression based analyses on 3 GE data sets: a) all tisues; b) tubers; c) leaves.['''Oren''', '''Didi''' and '''Itziar'''].
 
* Run all expression based analyses on 3 GE data sets: a) all tisues; b) tubers; c) leaves.['''Oren''', '''Didi''' and '''Itziar'''].
 
* Run co-expression simulations
 
* Run co-expression simulations
Line 49: Line 55:
 
* Plot GO similarity analysis [semsim package – '''Didi'''].
 
* Plot GO similarity analysis [semsim package – '''Didi'''].
 
* Get input from Kate and write discussion.
 
* Get input from Kate and write discussion.
* Inspect manually the GO prediction made by flows for the carotenoid pathway genes. ['''Oren'''].
+
* Inspect manually the GO prediction made by flows for the carotenoid pathway genes. [Oren: '''DONE'''].
 
* Make annotations available for biologists to use. Communicate with Lukas Mueller involved with Solgenomics site to do this ['''Oren''' and '''Erik''']
 
* Make annotations available for biologists to use. Communicate with Lukas Mueller involved with Solgenomics site to do this ['''Oren''' and '''Erik''']
 
* reults and scirpts QA ['''All''']
 
* reults and scirpts QA ['''All''']
 
* Documentation of scripts and writing MS ['''All''']
 
* Documentation of scripts and writing MS ['''All''']
 
   
 
==Download==
 
==Download==

Latest revision as of 17:08, 29 November 2013

This wiki was created after the first Hack-a-thon Session for Test Cases #9 and #12, held in Amsterdam, September 19-21, 2013. For general information about this Session, please refer to the AllBio website. The discussion board will be used by all members of the Hack-a-thon team for regular updates on the project's status.


What's new

  • Metastudent results are ready
  • OrthoMCL results are ready
  • New potato ITAG available for download in the Download section
  • New TODO list [Sept 29, 2013]
  • First Hack-a-thon protocol and result slides added to Dropbox
  • Photos added to Dropbox
  • Wiki is up! [Sept 23, 2013]


Annotation Pipelines

  • Metastudent (Tanya) output: done | parsing: done.
  • Trinotate (Agnieszka + Didi + Oren) output: done | parsing: done.
  • Blast2GO (Estelle) output: done | parsing: done.
  • PotatoCyc (Kate) output: done | parsing: MISSING??.
  • OrthoMCL (Tanya + Estelle) output: OrthoMCL crashes for 52 proteomes, so we took 18 instead: done | parsing: done.
  • Phytozome (Itziar) output: done | parsing: done.
  • KEGG (Itziar contacted the KEGG people?) output: done?? | parsing: done??.


Protocol & Result Files

All the PowerPoint presentations, protocols, etc. can be found in ../Dropbox/Hack-a-thon/protocols_results/.
Short descriptions follow.

  • HackathonReport_09292013.docx - Hack-a-thon final report, Amsterdam Sept. 19-21, 2013
  • workflow_09292013.pptx - annotation workflow and preliminary results on the performance of every flow and on gene expression data
  • 4Greg_Hack-a-thon_Amsterdam_0913.pptx

Result files are placed on VM in /virdir/Scratch/Oren/

  • NEW_ORTHOMCL/ORTHOMCL_GO_PGSC.txt - OrthoMCL's GO terms for PGSC
  • NEW_ORTHOMCL/ORTHOMCL_GO_ITAG.txt - OrthoMCL's GO terms for ITAG
  • FLOWS_OUTPUTS/METASTUDENT_GO_PGSC.txt - Metastudent's GO terms for PGSC
  • FLOWS_OUTPUTS/METASTUDENT_GO_ITAG.txt - Metastudent's GO terms for ITAG
  • KO2GO - 8 files (2 ITAG "goldpairs", 2 PGSC "goldpairs", 2 ITAG "all", 2 PGSC "all" - see README file in same folder)


TO DO Lists

September 29, 2013

  • Get GO terms from the Metastudent pipeline [Tanya: DONE]
  • Get GO for another KAAS output of PGSC and for 2 KAAS outputs for ITAG [Estelle, Tanya: DONE]
  • Finish running OrthoMCL for 52 species -> For 52 species OrthoMCL crashed, so we took 18 representative species instead [Estelle, Tanya: DONE]
  • OrthoMCL 18 species flow output – calculate all measurements as above. [Estelle, Agnieszka, Didi, Tanya and Oren].
  • Calculate precision and recall against SG and the PotatoCyc output gene lists (after mapping from KEGG to GO IDs) [Oren, Agnieszka, Didi].
  • Transfer the weights assigned by Erik and Sanjeev on PGSC to the combined results from ITAG (based on corresponding pairs of GO and PGSC_id), so they don't have to do it again manually. [Agnieszka: DONE].
  • Search for the 'SanjeevGold' gene IDs in all outputs separately and collect some statistics and comparisons. [Agnieszka: DONE].
  • Run all expression-based analyses on 3 GE data sets: a) all tissues; b) tubers; c) leaves. [Oren, Didi and Itziar].
  • Run co-expression simulations
  • Manually inspect modules with averaged r-value scores <0.9. These should give us insight into GO terms that are highly validated in the expression data sets.
  • Plot GO similarity analysis [semsim package – Didi].
  • Get input from Kate and write discussion.
  • Inspect manually the GO prediction made by flows for the carotenoid pathway genes. [Oren: DONE].
  • Make annotations available for biologists to use. Communicate with Lukas Mueller involved with Solgenomics site to do this [Oren and Erik]
  • Results and scripts QA [All]
  • Documentation of scripts and writing MS [All]
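
For the precision and recall item above, a minimal sketch in Python may help when comparing a flow's output with a gold-standard list. It assumes both are available as sets of (gene ID, GO ID) pairs. The function name, the gene IDs and the toy pairs below are hypothetical and are not taken from the result files.

```python
from typing import Set, Tuple

# One annotation is a (gene_id, go_id) pair; this representation is an assumption.
Pair = Tuple[str, str]


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted annotation pairs against gold pairs."""
    true_positives = len(predicted & gold)
    # Return 0.0 instead of dividing by zero when a set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
gold = {
    ("geneA", "GO:0016117"),
    ("geneA", "GO:0016491"),
    ("geneB", "GO:0016117"),
}
predicted = {
    ("geneA", "GO:0016117"),
    ("geneB", "GO:0016117"),
    ("geneB", "GO:0005515"),
    ("geneC", "GO:0008150"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact pair matching is the simplest choice. It gives no credit for predicting a parent or child of a gold GO term, so a semantic similarity measure would be needed to account for that.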

Download

Potato itag potato dm v403 File:Potato itag potato dm v403.gff3.zip
(click on the file name to get to the download page, there click on the .zip file to initiate the download)


Links

The files related to the latest PGSC version (v4.03) of the pseudomolecules are now available at: http://solanaceae.plantbiology.msu.edu/pgsc_download.shtml

The article describing the construction of these reference pseudomolecules is available at: http://www.g3journal.org/content/early/2013/09/18/g3.113.007153.full.pdf+html


Photos

New photos from the hack-a-thon in AMS are in ..\Dropbox\Hack-a-thon\photos-Amsterdam\


Contacts

For any questions regarding this wiki page, please contact Tatyana Goldberg.