ELIXIR: The Federated data infrastructure for Europe's life science research

1 ELIXIR: The Federated data infrastructure for Europe's life science research

2 A network of data Nodes
ELIXIR Nodes are funded nationally
ELIXIR Nodes build on national strengths and priorities
ELIXIR Nodes provide a national framework for long-term resource management
de.NBI - The German Network for Bioinformatics Infrastructure (de.NBI consortium): 39 project partners, 30 institutions, 8 service centers; designated national German Node in ELIXIR

3 ELIXIR Common Services: our federated infrastructure platforms
Compute: Secure data transfer, cloud computing, AAI
Data deposition: ENA, EGA, PDBe, EuropePMC
Added value data resources: UniProt, Ensembl, OrphaNet
Bioinformatics tools: Bio.tools, Containers, Galaxy
Data management: Genome annotation, Data management plans
Training: TeSS, Data Carpentry, eLearning
Data Interoperability: Standards, Identifiers, Ontologies

4 ELIXIR Structure
Five Platforms for Compute, Data, Tools and Interoperability
Complemented by Use Cases for Marine metagenomics, Rare diseases, Human data, Plant sciences, Proteomics, Metabolomics and Galaxy
Use cases under review: Microbial biotechnology, Food and nutrition, Human Cell Atlas, Human copy number variation
Publications shown on the slide:
Merlijn van Rijswijk et al. The future of metabolomics in ELIXIR [version 2; referees: 2 approved, 1 approved with reservations]. Opinion Article. F1000Research 2017, 6(ELIXIR):1649. Last updated: 08 NOV 2017
Juan Antonio Vizcaíno et al. A community proposal to integrate proteomics activities in ELIXIR [version 1; referees: 2 approved]. Opinion Article. F1000Research 2017, 6:875. Last updated: 08 NOV 2017

5 Open data requires infrastructure

6 Open access life science data is intensively reused
Biosimulation market worth $1bn/yr (2015)

7 What are ELIXIR Core Data Resources?
A set of data resources that are of fundamental importance to the broad life science community and the long-term preservation of biological data.
They provide complete collections of generic value to life science, and show high levels of usage, scientific quality and service.

8 ELIXIR Core Data Resources: fundamentally important to life science research
16 Core Data Resources nominated
ELIXIR is committed to Open Access as a core principle for publicly funded research. ELIXIR Core Data Resources should reflect this commitment and have terms of use or a license that enables the reuse and remixing of data.
Discussions are ongoing with Nodes, Resources and funders on high-quality, non-Open Access resources.
See: Identifying ELIXIR Core Data Resources
Agreed collectively by 21 Node directors

9 Large impact on science
Citations of key papers (EuropePMC 2015) for ELIXIR Core Data Resources
ELIXIR Core Data Resources: over citations of key papers in
Plus direct citations of data records and identifiers in scientific literature:
> articles with data citations (2014)
> direct citations of accessions in full-text open access articles (2014)
The ELIXIR Data Platform metrics group is working on a standard methodology

10 An infrastructure for bioeconomy innovation
: patents* referred to bioinformatics repositories
*Patterns of database citations in articles and patents indicate long-term scientific and industry value:

11 Towards a global coalition to sustain Core Data Resources
Call for Action published in Nature in March 2017
Full text of article available as a pre-print on bioRxiv
June workshop in London with international funders
Great interest in Core Data Resources (outcome and method)
Outcomes taken into HIRO meeting the following day
Working Group established to take forward next steps

12 Changing landscape with many actors
Highly distributed data-generating & monitoring
Distributed analysis requires reference datasets (organized centrally, locally or in distributed networks)
Manage legal requirements in transnational settings
National data centres, International Resources, Institutional data centres

13 ELIXIR Position Paper on FAIR data management in the life sciences
1. Open sharing of research data is a core principle
2. Data Management is crucial to science
3. Data should be submitted to deposition databases
4. All data submitted to Open Data archives should align with community-defined standards
5. ELIXIR Nodes implement FAIR for their respective nations
6. Professional skills, adequate resources and appropriate funding are needed for Data Management and infrastructure
Blomberg N and ELIXIR Consortium. ELIXIR position paper on FAIR data management in the life sciences. F1000Research 2017, 6(ELIXIR):1857 (document) (doi: /f1000research )

14 "Whenever possible, biological research data should be submitted to the recommended community deposition databases"
The ELIXIR Deposition Databases meet the technical quality and governance criteria expected of ELIXIR Core Data Resources
See: Identifying ELIXIR Core Data Resources
Agreed collectively by 21 Node directors
International collaborative effort

15 All data submitted to Open Data archives must be annotated in accordance with community-defined standards

16 FAIR data management requires professional skills and adequate resources
Bring your own data workshops
Problem-centered workshops
Integration experts - Data resources Users
With national Nodes or pan-European projects

17 ELIXIR Nodes are the national implementation of a harmonised FAIR Data Management programme for the life sciences

18 Findability How do you find a needle in a federated haystack?

19 Bioschemas
schema.org markup for life sciences
Minimum properties needed for finding data
Carole Goble, Alasdair Gray (ELIXIR-UK); Rafael Jimenez (ELIXIR Hub)
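
To make the idea of schema.org markup concrete, here is a minimal Python sketch that builds a schema.org Dataset description as JSON-LD and wraps it in the script tag a landing page would embed. The dataset name, URL and identifier are hypothetical, and the handful of properties chosen here is an illustrative assumption, not the minimum set defined by any Bioschemas profile.

```python
import json


def dataset_markup(name, description, url, identifier, keywords):
    """Return a schema.org Dataset description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
        "keywords": keywords,
    }
    return json.dumps(record, indent=2)


def as_script_tag(jsonld):
    """Wrap a JSON-LD string in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Hypothetical dataset, for illustration only.
example = dataset_markup(
    name="Example proteomics dataset",
    description="Mass spectrometry runs from a hypothetical study.",
    url="https://data.example.org/datasets/EX0001",
    identifier="EX0001",
    keywords=["proteomics", "mass spectrometry"],
)

if __name__ == "__main__":
    print(as_script_tag(example))
```

Because the markup sits inside the page itself, a resource can publish it without running a separate API or feed.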

20 Bioschemas.org
A community initiative built on top of schema.org to improve Findability and Accessibility in Life Sciences
Rapid markup, exposed to harvesting
Find: standardised metadata
Metadata publish and harvest without APIs or special feeds
Major data resources and smaller datasets (Bioschemas markup) feed bio registries and aggregators: Registries, Data Aggregators, Search engines
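
Harvesting without APIs or special feeds can be sketched as follows: a registry or aggregator fetches an ordinary landing page and extracts any embedded JSON-LD blocks. This is a minimal illustration using only the Python standard library; the sample page and its dataset are hypothetical, and a real harvester would also need crawling, validation and error reporting.

```python
import json
from html.parser import HTMLParser


class JsonLdHarvester(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup rather than abort the harvest


def harvest(html):
    """Return every JSON-LD record embedded in an HTML document."""
    parser = JsonLdHarvester()
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical landing page, for illustration only.
PAGE = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Dataset", "name": "Example dataset"}'
    "</script></head><body><p>Landing page</p></body></html>"
)

if __name__ == "__main__":
    for record in harvest(PAGE):
        print(record["@type"], record["name"])
```

The same function works on any page that embeds such markup, which is what lets registries, aggregators and search engines share one harvesting approach.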

21 Bioschemas progress
Use case, Gap analysis, Spec, Test, Adoption, Applications
Data repositories, Datasets, Beacons, Samples, Protein annotations, Biological Entity, Event, Training material, Tools

22 Early adopters
Google research blog: Facilitating the discovery of public datasets
OmicsDI

23 Research schemas as emerging federation architecture in EOSC
Life, Earth, ...
Data: Dataset index, PID, Scientific File, Common Access
EOSC Catalogue
Services: Compute, Storage, Transfer

24 ELIXIR Compute Platform Targeting a seamless workflow: a researcher may use their electronic identity to securely create a scientific software analysis environment, and use the environment to access large biological data resources stored on a cloud.

25 ELIXIR Authentication and Authorization Infrastructure (AAI)
Reliable electronic identification of users (ELIXIR ID) is needed to access the key services and capacities of ELIXIR. You can link existing user accounts to create your ELIXIR ID today at
ELIXIR AAI allows users to continue using their federated academic, corporate or social media identity by linking it to a personal ELIXIR ID.
The ELIXIR service providers connected to ELIXIR AAI benefit from centralised user identity and access management services.
Protocols: SAML2, OpenID Connect.
359 Home Organisation IdPs enabled for login (via eduGAIN)
987 ELIXIR users
155 groups created in ELIXIR AAI
61 registered Resource Providers
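
As an illustration of how a service provider might start a login through an OpenID Connect identity provider, the sketch below builds a standard authorization code flow request URL. The endpoint, client ID and redirect URI are hypothetical placeholders, not actual ELIXIR AAI values; a real deployment would read them from the provider's published configuration and from the client's registration.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; not an actual ELIXIR AAI URL.
AUTHORIZATION_ENDPOINT = "https://aai.example.org/oidc/authorize"


def build_authorization_request(client_id, redirect_uri, scope=("openid",)):
    """Build an OpenID Connect authorization code flow request.

    Returns the URL to redirect the user to, plus the state and nonce
    values the service must remember to validate the response.
    """
    state = secrets.token_urlsafe(16)  # ties the callback to this request
    nonce = secrets.token_urlsafe(16)  # ties the ID token to this request
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scope),
        "state": state,
        "nonce": nonce,
    }
    return AUTHORIZATION_ENDPOINT + "?" + urlencode(params), state, nonce


if __name__ == "__main__":
    url, state, nonce = build_authorization_request(
        client_id="example-service",  # hypothetical registered client
        redirect_uri="https://service.example.org/callback",
    )
    print(url)
```

After the user authenticates with their linked identity, the provider redirects back to the redirect URI with a code and the same state value, which the service then exchanges for tokens at the provider's token endpoint.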

26 ELIXIR Cloud & Compute
ELIXIR Cloud capacities surveyed here
DK, DE, EBI, FI, FR, SUI confirmed capacity
> compute cores
> TB of storage
> compute users

27 ELIXIR Cloud WG: towards interoperable clouds

28 Data storage and transfer, coupled to security Insert link to ELIXIR Webinar

29 ELIXIR Industry Strategy

30 ELIXIR Innovation and SME Forum
Node-hosted events that present to companies the free tools and services made available through ELIXIR
8 events since
companies have attended
50% of forum attendees, on average, are from the industry sector
95% attendee satisfaction rate
Lots of networking opportunities
Previous Events (Themes):
Human Data: FI, ES, CH
Rare Disease: FR
Marine: NO
Plant Sciences: NL
Multi-domain: BE, DK
Upcoming Events:
Cambridge UK January 2018: Enabling Discoverability in Bio-Data Innovation
Munich Germany (Dates TBA): Biotechnology

31 DATA RESOURCE SHOWCASE, TRAINING, FLASH-TALK
SPEAKERS:
Wim Haentjens (European Commission, DG Research & Innovation Agri-food unit)
Peer Bork (European Molecular Biology Laboratory)
Silvia Miret Catalan (Director Nutrition & Health Discover at Unilever)

32 ELIXIR Innovation and SME Forums attendees
Quantitative Indicator: Total, Private, Academics
Events: Copenhagen 2014, Wageningen 2015, Basel 2015, Oslo 2016, Helsinki 2017, Barcelona 2017, Brussels 2017, Paris 2017

33 Outcome from innovation events: Qualitative Indicator Node - collaboration Node - collaboration Service - exchange

34 ELIXIR in numbers
21 Members and 1 Observer
~180 institutes involved
600+ staff
16 Core Data Resources
23 Implementation Studies ongoing or soon to start
17 papers in ELIXIR F1000 channel
264 live events in TeSS
350 companies attended Innovation and SME programme