WP4: Data services and reporting standards


1 WP4: Data services and reporting standards
WP leader: Guy Cochrane (EMBL-EBI)
WP co-leader: Nils-Peder Willassen (UiT)
Task 4.1 leader: Jeena Rajan (EMBL-EBI)
Task 4.2 leader: Jeena Rajan (EMBL-EBI)
Task 4.3 leader: Petra ten Hoopen/Jeena Rajan (EMBL-EBI)
Various contributions: various (CABI, HCMR, CNRS/IFB, CNR, SAMS, USTAN)
EMBRIC General Assembly, Faro, 1 September 2017

2 WP4 goals
To implement sustainable data management services for the marine science community, with a focus on facilitating applied biodiscovery research:
- Construction of a service unit ("configurator") to provide structured bioinformatics consultancy within EMBRIC
- Establishment of a data warehouse for the EMBRIC cluster in order to accelerate and enhance the utilization of resources for natural product discovery
- Further development of existing marine data standards


4 WP4 tasks, deliverables and workshops
Tasks
- 4.1 EMBRIC Configurator: ongoing
- 4.2 Natural product central reporting system: ongoing
- 4.3 Marine data standards: ongoing
Deliverables
- D4.1 Data standards available for the EMBRIC community: M18
- D4.2 Configurator service available for the EMBRIC community: M18
- D4.3 Release of pilot version of data warehouse: M24
- D4.4 Report on cloud technology: M45
Workshops
- EMBRIC case studies requirements workshop, EMBL-EBI: M9
- Liaison workshop with Aquaexcel, EMBL-EBI: M21
- EMBRIC chemical biology workshop, EMBL-EBI: M21
Deviation of resources: none

5 Configurator (Task 4.1)
From:
- scale and nature of data generation
- analysis needs
- long-term archiving needs
- metadata requirements
- publication requirements
To:
- processing/analytical workflows
- data storage
- data transfer strategy
- network requirements
- data coordination solutions
- curation
- analysis expertise
- sustainability plans and costs
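The "From" side of the slide above can be read as the information a case needs to supply before a consultant can recommend anything on the "To" side. The following is only an illustrative sketch of that idea, not the Configurator's actual intake form: the class name `ConfiguratorCase`, its field names and the helper `missing_inputs` are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ConfiguratorCase:
    """Hypothetical record of the 'From' inputs listed on the slide."""
    data_generation: Optional[str] = None          # scale and nature of data generation
    analysis_needs: Optional[str] = None
    archiving_needs: Optional[str] = None          # long-term archiving needs
    metadata_requirements: Optional[str] = None
    publication_requirements: Optional[str] = None


def missing_inputs(case: ConfiguratorCase) -> List[str]:
    """Return the names of the inputs that are still empty, in declaration order."""
    return [f.name for f in fields(case) if not getattr(case, f.name)]
```

For example, a case that only describes its data generation and analysis needs would report the three remaining inputs as missing, which a consultant could then follow up on.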

6 Configurator (Task 4.1)
- 17 registered consultants
- 5 ongoing cases
Call for:
- More consultants
- More cases
Contact: embric-configurator@ebi.ac.uk
goo.gl/u4hejo

7 Configurator: example I
Data produced:
- genome assembly
- transcriptome
- gene expression data


9 Configurator: example II
Feed efficiency in Sea Bass (Aquaexcel)
Study outline: A pilot study on Sea Bass divergent isogenic lines for fasting tolerance. The pilot focuses on 600 fish from two divergent isogenic lines (high and low feeding efficiency). For each of these 600 fish, the following data are being generated: phenotypic data (for example sex, growth indexes, fasting tolerance and risk-taking behaviour), raw genomic data (Illumina 3K SNP chip and potentially RNA-seq data for 54 samples), and analysed genomic data (genetic map, QTL mapping, RNA differential expression data). The scientific questions behind the study are: can we evaluate the individual feed efficiency of fish, and can we find genomic indicators of this phenotype to guide genomic selection for feed efficiency in breeding programmes?
Scale: Between 600 and 1000 samples will be evaluated for individual feed efficiency in 200 individual aquaria. The effect of individual feed efficiency is then cross-validated against the group feed efficiency of contrasted groups of 50 fish, in 4 replicates, over 4 evaluation periods of 3 weeks each. The data will be submitted by 2 to 4 different people, and at least 8 people will use and need access to the data produced by this study.

10 Chemicals warehouse: molecules, strains, genes, collections (Task 4.2)
Data areas:
- Biological resources
- Literature
- Natural products
- Genes, gene clusters, pathways, regulation
- Other data?
Source resources:
- Chemicals: ChEBI
- Profiles: MetaboLights
- Assays: ChEMBL
- Genes: ENA
- Proteins: UniProt
- Links to collections: ENA
Data warehouse: ftp://ftp.ebi.ac.uk/pub/databases/ena/collaboration/embricdb_v1.tar.gz
Technical interface: multiple relational database clients (SQLite format)
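Because the warehouse is distributed in SQLite format, any SQLite client can open it once the archive has been downloaded and unpacked. As a minimal sketch using Python's standard `sqlite3` module, the helpers below list the tables in a database file and count the rows in one of them. The slides do not describe the schema or the name of the file inside the archive, so the code makes no assumption about either; the helper names `list_tables` and `count_rows` are illustrative.

```python
import sqlite3
from contextlib import closing
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in an SQLite database file, sorted."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def count_rows(db_path: str, table: str) -> int:
    """Return the number of rows in `table`, which must exist in the database."""
    # Check the name against the schema before interpolating it into SQL,
    # since table names cannot be passed as bound parameters.
    if table not in list_tables(db_path):
        raise ValueError(f"no such table: {table}")
    quoted = '"' + table.replace('"', '""') + '"'
    with closing(sqlite3.connect(db_path)) as conn:
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    return n
```

A first exploration could call `list_tables` on the unpacked file (its path is whatever the archive contains, not something stated in the slides) and then `count_rows` on each returned name to see how the content is distributed across tables.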


12 Chemicals warehouse: ongoing and future work
Interface
- Roadmap for interface development:
  - Use cases and examples from WP2, WP7 and elsewhere, Oct (Ian Probert)
  - Hackathon to explore use cases and define interface requirements, Jan
  - Deeper exploration of use cases through to impact
- Interface options
Underlying data resources
- Marine Metagenomics Portal (MMP) at University of Tromsø, offering an online service
- BacDive at Leibniz-DSMZ
Content
- Increase mappings within the current databases
- Expand the number of databases that can be mapped
- Further explore the use of strain data to link to culture collections and to the literature, to facilitate data integration across small molecules, culture collections and genomics databases

13 Marine data standards (Task 4.3)
Partners involved:
- Ian Johnston (USTAN, UK)
- Alicia Bertolotti (USTAN, UK)
- David Smith (CABI, UK)
- Mariella Ferrante (SZN, IT)
- Torsten Meiners (FMP, GER)
- Martin Neuenschwander (FMP, GER)
- Mark Hoebeke (SB-ROSCOFF, FR)
- Reza Salek (EMBL-EBI, UK)
- Nils Peder Willassen (UiT, NO)
- Guy Cochrane (EMBL-EBI, UK)
- Petra ten Hoopen (EMBL-EBI, UK)
Objectives and results:
- Contextual data checklist for a molecular sample from strains in collections: mapping between EMbaRC, GSC MIxS and Micro-B3 M2B3
- Contextual data checklist for a molecular sample from shellfish: informed by GSC MIxS, Micro-B3 M2B3 and in-project experience
- Finfish contextual data: mature trait ontologies exist (e.g. ATOL, EOL), but there are culture-related challenges
- CORBEL: Amphioxus, sea urchin and Clytia anatomy and developmental stage ontologies
- EXCELERATE: marine data standards relating to metagenomics and barcoding
- Future standards development in response to needs

14 Training (with WP9)
- Application within EMBRIC, with WP9, to run a genomics-centric training course
- Future application to run a metabolomics data submission and access course with the MetaboLights data resource
- Other needs?

15 People
Case consultants: 17 people
Standards development: Ian Johnston (USTAN, UK), Alicia Bertolotti (USTAN, UK), David Smith (CABI, UK), Mariella Ferrante (SZN, IT), Torsten Meiners (FMP, GER), Martin Neuenschwander (FMP, GER), Mark Hoebeke (SB-ROSCOFF, FR), Reza Salek (EMBL-EBI, UK), Nils Peder Willassen (UiT, NO), Guy Cochrane (EMBL-EBI, UK), Petra ten Hoopen (EMBL-EBI, UK)
Training: Nicole Silvester (EMBL-EBI), Jeena Rajan (EMBL-EBI), Nils Peder Willassen (UiT), Thibaud Mascart (UGent)
Core team: Jeena Rajan (EMBL-EBI), Isabel Santos Magalhaes (EMBL-EBI), Petra ten Hoopen (EMBL-EBI), Nicole Silvester (EMBL-EBI), Guy Cochrane (EMBL-EBI), Nils Peder Willassen (UiT)
Recruitment at EMBL-EBI:
Other WP4 participants: Georgios Kotoulas (HCMR), John Day (SAMS), Claire Gachon (SAMS), Bruno Fosso (UB), Graziano Pesole (UB), Christophe Blanchet (CNRS), Jean-François Gibrat (CNRS), Ian Probert (UPMC)

16 What we need
- Configurator cases and promotion
- Configurator case consultants
- Use cases for chemicals warehouse (via Ian Probert/WP2)
- Training participation and promotion