Elixir: Overview, Progress and Futures

20th Meeting of the EC-US Biotechnology Task Force, Barcelona, 3 June 2010
Andrew Lyall, ELIXIR Project Manager
ELIXIR: a sustainable infrastructure for biological information in Europe.

What is ELIXIR?
An EU Framework 7 (FP7) Preparatory Phase project, coordinated by Prof. Janet Thornton, Director of EMBL-EBI.
Aim: to construct a plan for the operation of a sustainable infrastructure for biological information in Europe.
A 4.5 million grant was awarded in May 2007 for a three-year term.
A 32-member consortium engages many of Europe's main bioinformatics funding agencies and research institutes.
The deliverables are memoranda of understanding to fund the implementation phase, which could cost 500 million.
Interested parties should register as stakeholders via the ELIXIR website.

European Bioinformatics Institute
An outstation of the European Molecular Biology Laboratory (EMBL), an international organisation created by treaty (cf. CERN, ESA).
EMBL-EBI has 400 staff, a 30 million budget and several million users.
A 15-year history of service provision and scientific excellence.
Sited at the Wellcome Trust Genome Campus, Hinxton, Cambridge, UK, after a European competition.
(Chart: 2008 funding sources.)

EMBL-EBI mission
To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress.
To contribute to the advancement of biology through basic investigator-driven research in bioinformatics.
To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators.
To help disseminate cutting-edge technologies to industry.

EMBL-EBI: most important data collections
Genomes & genes
1. Ensembl: joint project with the Sanger Institute; high-quality annotation of vertebrate genomes
2. Ensembl Genomes: environment for genome data from other taxa
3. Genomes: catalogue of human variation from major world populations
4. EGA*: European Genotype Archive; genotype, phenotype and sequences from individual subjects and controls
5. ENA: European Nucleotide Archive; all DNA & RNA, next-generation reads and traces
Transcription
6. ArrayExpress: archive of transcriptomics and other functional genomics data
7. Expression Atlas: differentially expressed genes in tissues, cells, disease states & treatments
Protein
8. UniProt: archive of protein sequences and functional annotation
9. InterPro: integrated resource for protein families, motifs and domains
10. PRIDE: public data repository for proteomics data
11. PDBe: protein and other macromolecular structure and function
Small molecules
12. ChEBI: chemical entities of biological interest
13. ChEMBL: bioactive compounds, drugs and drug-like molecules, their properties and activities
Processes
14. IntAct: public repository for molecular interaction data
15. Reactome: biochemical pathways and reactions in human biology
16. BioModels: mathematical models of cellular processes
Ontologies
17. GO: Gene Ontology; consistent descriptions of gene products
Scientific literature
18. CiteXplore: bibliographic query system
* Requires authentication

Contents
1. Why is ELIXIR necessary?
2. How is ELIXIR organised?
3. Where did ELIXIR come from?
4. What has happened so far?
5. What next?

1. Why is ELIXIR necessary?

Why are we here?
Europe is facing unprecedented (Grand) Challenges:
healthcare for an ageing population;
a sustainable food supply;
an internationally competitive life-sciences industrial sector;
protection of the environment;
a sustainable energy supply.

Modern biology requires integration.
(Figure: integration across scales, from genome, protein and cell to embryo, fruit fly and mouse, in development, ageing and disease.)

Comprehensive, universal, integrated
Application areas: life sciences, medicine, agriculture, pharmaceuticals, biotechnology, environment, biofuels, cosmeceuticals, nutraceuticals, consumer products, personal genomes, etc.
Data resources:
Genomes: Ensembl, Ensembl Genomes, EGA
Gene expression: ArrayExpress
Protein families, motifs and domains: InterPro
Protein interactions: IntAct
Literature and ontologies: CiteXplore, GO
Nucleotide sequence: EMBL-Bank
Proteomes: UniProt, PRIDE
Protein structure: PDBe
Chemical entities: ChEBI, ChEMBL
Pathways: Reactome
Systems: BioModels

Very large user community
One million unique users in 2009.

Database growth (2007/2006, %)
(Chart: per-database values ranging from 100% to 211%, including E-PDB (Structures).)

Data growth exceeds growth in IT capability
CPU power doubles approximately every 24 months (Moore's Law).
Disk capacity doubles approximately every 18 months.
Network bandwidth doubles approximately every 20 months.
Racks in the EBI machine room double approximately every 12 months.
(Chart: number of racks over time.)
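To make the mismatch concrete, the sketch below turns the doubling times quoted above into growth factors over a common period, using factor = 2^(t/T) for elapsed time t and doubling time T, both in months. It assumes clean exponential growth and an illustrative six-year horizon; neither assumption comes from the slide, and the function names are hypothetical.

```python
# Sketch: cumulative growth implied by the doubling times on the slide.
# Assumption (not from the slide): each quantity grows as a clean exponential,
# so after t months a quantity with doubling time T has grown by 2 ** (t / T).

DOUBLING_MONTHS = {
    "CPU power": 24,
    "Disk capacity": 18,
    "Network bandwidth": 20,
    "Racks in EBI machine room": 12,
}


def growth_factor(months: float, doubling_months: float) -> float:
    """How many times larger a quantity is after `months` of exponential growth."""
    return 2 ** (months / doubling_months)


def growth_table(months: float) -> dict[str, float]:
    """Growth factor of every tracked quantity over the same period."""
    return {name: growth_factor(months, t) for name, t in DOUBLING_MONTHS.items()}


if __name__ == "__main__":
    horizon = 72  # illustrative six-year horizon (hypothetical)
    for name, factor in growth_table(horizon).items():
        print(f"{name:<28} x{factor:6.1f}")
```

Under these assumptions the rack count grows 64-fold over six years while CPU power grows only 8-fold, which is the gap the slide points to.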

Growth of disk storage at EMBL-EBI
(Chart: disk space at EMBL-EBI in terabytes, reaching ten petabytes in May.)

Good value for money
(Chart, in millions, scale 0 to 3,500: total cost of data generation for the human genome, other organisms, structures and expression data, compared with the annual cost of information resources at NCBI, Japanese bioinformatics centres and EBI.)

Disruptive technologies
A technology becomes disruptive when the rate at which it improves exceeds the rate at which users can adapt to the new performance.
Clayton M. Christensen, The Innovator's Dilemma, Harvard Business School Press.

Disruptive technologies in biology
Next-generation DNA sequencing: data will be 1,000 to 1,000,000 times cheaper to produce, and data production rates will be 1,000 to 1,000,000 times higher by the end of the ESFRI period.
Imaging (including video) at all scales, from molecules to whole organisms.
Protein sequencing by mass spectrometry may also be disruptive.
There will probably be others, e.g. macromolecular structure determination by electron microscopy.

Exponential growth in data
Exponential growth in data cannot be matched by exponential growth in funding, so:
1. Link budgets for data generation and data processing: only produce as much data as you can deal with.
2. Take steps to control staff growth: automate annotation and curation; implement distributed annotation (DAS); use web services and distributed resources; support metadata deposition.
3. Take steps to control IT resource requirements: develop policies on which data are to be kept (and which not); develop data compression techniques.

ELIXIR rationale
Optimal data management.
Coordinated data resources with improved access and economy of scale.
Integration and interoperability of diverse, heterogeneous data.
Forge links to data in other related domains.
A single European voice to influence global decisions and maintain open access.
Enhance European competitiveness in bioscience industries.
Address the need for increased funding and its coordination.

2. How is ELIXIR organised?

Members of the ELIXIR consortium
There are 32 partners from 13 member states and associated countries.
16 of the partners are funding agencies or government bodies.
16 of the partners are scientific organisations or institutes.
There are expressions of interest from many others.

ELIXIR preparatory phase
1. Mobilisation and planning, November
2. Committee and recommendations phase (lasting 18 months), January 2008: hold stakeholder meetings; establish working committees to write reports; present the reports at an open stakeholders meeting for wider discussion; define ELIXIR.
3. Documentation and negotiation phase (lasting 18 months), July 2009: consolidate the reports into a proposal, to be sent to all the member states and funding agencies with a draft Memorandum of Understanding by month 26; decide how, and by whom, each part will be funded; reach agreement after 38 months, so that construction can start.

ELIXIR work packages
ELIXIR is organised into 14 work packages, each with an associated committee of (mainly) European experts. It is organising two surveys, one of users and one of data providers, and five technical feasibility studies. The ELIXIR Steering Committee is associated with WP1 and has oversight of the whole project. WP3 has four committees: for Bioinformatics Communities, for Data Providers, for Industry and for Interactions with the rest of the World (International). There will be regular stakeholder meetings intended to encourage the widest possible participation.
1. Project management
2. Data providers
3. User communities
4. Organisation and Legal
5. Funding
6. Physical infrastructure
7. Data interoperability
8. Literature
9. Healthcare
10. Agriculture, Chemistry & Environment
11. Training
12. Tools integration
13. Feasibility studies
14. Reporting and negotiation

Is ELIXIR technically feasible?
ELIXIR does not depend for its success on any technology that has yet to be developed. However, it will have to provide solutions to very demanding data management problems, such as those presented by the 1000 Genomes Project, the great increase in imaging of biological systems and the impending scale-up of structural and systems biology. We are therefore conducting five technical feasibility studies that address the more challenging aspects of ELIXIR. More information on these studies is available from the ELIXIR website.
1. Strategic Review of Cell Phenotype Image Data Resources.
2. Pilot of the use of European Supercomputing facilities for distributed processing of Bioinformatics data.
3. Assessment of European Resources for Systems Biology.
4. Search across heterogeneous distributed data resources (EB-eye).
5. Safe and ethical use of personal genetic information (European Genotype Archive).

3. Where did ELIXIR come from?

2000: The Lisbon Strategy of the EU
An action and development plan to make the EU the most dynamic and competitive knowledge-based economy in the world, capable of sustainable economic growth with more and better jobs, greater social cohesion and respect for the environment, by 2010. This is to be achieved through the formulation of various policy initiatives to be taken by all EU member states.

The European Research Area (ERA)
In 2000 the EU created a unified research area across Europe, the purpose of which is to:
enable researchers to move and interact seamlessly;
benefit from world-class infrastructures;
work with excellent networks of research institutions;
share, teach, value and use knowledge effectively for social, business and policy purposes;
optimise, open and coordinate national and regional research programmes to address major challenges;
develop strong links with partners around the world.
Construction of the ERA is the purpose of FP6 and FP7.

ESFRI: the European Strategy Forum on Research Infrastructures
Created by the Commission in February 2002 and adopted by the Competitiveness Council in April 2002.
Members: representatives of EU Member States and Associated States, and one representative of the European Commission.
Chairman: Prof. Carlo Rizzuto (Sincrotrone Trieste S.c.p.A. - ELETTRA, IT).
Aims: to support a coherent approach to policy-making on research infrastructures in Europe, and to act as an incubator for international negotiations about concrete initiatives.

European Roadmap for Research Infrastructures
35 mature projects for new large-scale research infrastructures, which are:
based on an international peer review process;
spread across all scientific areas, regardless of possible location;
likely to be realised in the next 10 to 20 years;
supported by a relevant European partnership or intergovernmental research organisation;
expected to have an impact on science and technology development at international level;
supportive of new ways of doing science in Europe;
contributors to the enhancement of the European Research Area.

Roadmap projects summary
6 Social Science & Humanities
8 Environmental Sciences
3 Energy
6 Biomedical and Life Sciences
7 Material Sciences
5 Astronomy, Astro-, Nuclear and Particle Physics
1 Computer and Data Treatment (transverse)

Cost of the 35 mature ESFRI RI projects (capital cost, millions)
Computing: 300
Social Science
Environment: 1,300
Physics: 3,600
Biomedical: 1,600
Energy: 2,200
Materials: 4,500
Total capital cost: 13,696 million

But...
The Commission needed 14 billion to fund the RIs, which the member states refused to provide. So the Commission created the preparatory phase projects, whose purpose is to create the consortia that will fund the construction of the RIs.

Where did ELIXIR come from?
(Diagram linking the European Parliament, the Council of the EU, the European Commission and the European Council, which identified the need for new arrangements to support policies related to research infrastructures.)
Timeline:
2000 Strasbourg Conference on RI
2000 Lisbon Strategy
2000 European Research Area
2002 Competitiveness Council
2002 ESFRI
2006 Innovation Strategy
2006 Roadmap for RIs
2007 Framework Seven: ELIXIR Preparatory Phase
2011 ELIXIR Implementation

(Diagram: in the Preparatory Phase, Europe-wide consultation through stakeholder meetings, a users survey, a data providers survey, member state visits and consultation with industry, together with the work packages and feasibility studies, addresses content, enabling technology and infrastructure for tools & services integration; this leads to the Construction Phase of ELIXIR: a sustainable infrastructure, by the Member States.)

4. What has happened so far?

Consultation phase
A two-year Europe-wide consultation.
Consultation with non-European partners (US, Japan, China, etc.).
Three international stakeholder meetings.
Many work package meetings.
Consultation with industry.
Numerous visits to member states.
Member state meetings.

Visits during the consultation phase.

Sites of ELIXIR survey data providers.

What might ELIXIR be?
A reliable distributed infrastructure to provide equality of access to biological information across all of Europe.
Sustainable funding for the core European biological data collections (genomes, sequences, structures, etc.).
Sustainable funding for the global biological data collaborations (UniProt, wwPDB, INSDC, etc.).
Processes for developing new core data collections, supporting interoperability of bioinformatics tools, and developing bioinformatics standards and ontologies.
Enhanced use of biological information in academic research, the pharmaceutical industry, biotechnology, agriculture and the protection of the environment.

A reliable distributed infrastructure
ELIXIR will be constructed by enhancing and linking existing infrastructures in the member states, integrating them into a single infrastructure or grid. Each member state will:
identify and catalogue its requirements;
identify funding agencies that are prepared to fund bioinformatics;
identify projects and organisations that could become part of ELIXIR;
include funding for ELIXIR in its national research infrastructure plan;
where appropriate, identify structural funds that can be used for ELIXIR;
once it is constituted, join the ELIXIR organisation.

Summary of ELIXIR components
Biomolecular and related data collections
Computational resources
Standards and ontology development
Training infrastructure
Tools and services integration infrastructure

Core data collections at EMBL-EBI
1. European-PDB: the European partner in the wwPDB macromolecular structures database.
2. UniProt: the world's definitive collection of protein sequence data.
3. EMBL-Bank: the European instance of the global archive of nucleotide sequence data.
4. Ensembl: a world leader in the provision of annotated eukaryotic genomes.
5. ArrayExpress: a major public repository for microarray data.
6. InterPro: a database of protein families, domains and functional sites, which aggregates such information from a large number of collaborators.

Attributes of core data collections
Universally relevant to biology and medicine.
Journals insist on data deposition as a condition of publication.
Very, very large user communities.
Aim to be complete collections of global significance.
Exchange with other data centres ensures completeness.
The science is stable enough to allow standardisation of data structures.
The host institute needs to be involved in standards development.
Support requires substantial institutional commitment.

Global context of data collections
(Diagram: the European Bioinformatics Institute, the National Center for Biotechnology Information and the DNA Data Bank of Japan. Data are freely deposited at each centre, freely exchanged daily between them, and made freely available to all.)

Sweden is first country to commit
Sweden to be the first country to pledge long-term funding for ELIXIR (posted Tue Jun 23)
The Swedish funding agencies (the Swedish Energy Agency, Fas, Formas, the Swedish Research Council and VINNOVA) have proposed that the Government allocate a total of 19 million SEK (1.7 million euro) over three years to ELIXIR and the Swedish Bioinformatics Infrastructure for Life Sciences (BILS), which would make Sweden the first country to secure long-term funding for ELIXIR. A final decision along these lines is expected during the autumn.
Contact: Prof. Bengt Persson, Linköping University & Karolinska Institutet

UK commits to ELIXIR
UK leads European research programme with 10M investment in bioscience data handling capacity (posted Tue Aug 25)
The UK has made its first substantial commitment to ELIXIR with a 10M investment by the Biotechnology and Biological Sciences Research Council (BBSRC). BBSRC has awarded funding to the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) to permit a dramatic increase in the institute's data storage and handling capacity. The funding is the first step in developing the existing data resources and IT infrastructure of EMBL-EBI towards its planned role as the central hub of the emerging European Life-Science Infrastructure for Biological Information.
Contact: Matt Goode, BBSRC External Relations

Denmark commits to ELIXIR
Denmark invests 5 million in pan-European infrastructure for biology (posted Jan 10)
Denmark has made its first substantial commitment to the European Life-Science Infrastructure for Biological Information (ELIXIR), a major emerging pan-European science project, with a 3.5M investment by the Danish Agency for Science, Technology and Innovation. Substantial co-financing from the University of Copenhagen, Aarhus University, the University of Southern Denmark and the Technical University of Denmark will increase the total amount invested in ELIXIR to 5 million.
Contact: Katrina Pavelin, EMBL-EBI Scientific Outreach Officer, Hinxton, UK

Finland commits to ELIXIR
Finland invests 1.85 million in pan-European infrastructure for biomedical research (posted Feb 3)
Finland has made its first specific commitment to the development of European biomedical research infrastructures (BMS ESFRIs) by supporting a joint pilot infrastructure project in bioinformatics (ELIXIR), biobanking (BBMRI) and translational research (EATRIS). The initial commitment of 1.85M is to support preparation and pilot studies. The purpose of the funding is to ensure Finland's commitment to building these European infrastructures. The level of funding in future years was left open, as it depends on the outcome and on the structures developed in the pilot phase.
Contact: Katrina Pavelin, EMBL-EBI Scientific Outreach Officer, Hinxton, UK

Data centre issues
Insufficient capacity in the current data centre.
Data are so large that tape is no longer feasible for backup and disaster recovery.
The topology must provide resilience against local and regional disaster through geographical separation.
Resilience against national disaster will be provided through international collaboration.
Tier 3 is sufficient (confirmed by consultation with industrial and academic users); in fact we chose Tier 3+, keeping redundant equipment live.

Telecity Group solution
Headquartered in London, with a market capitalisation of 750M, Telecity Group operates 23 Tier 3+ European data centres in prime city-centre positions.
1. Oliver's Yard, a PCI DSS accredited data centre near Old Street, Central London.
2. Powergate, their newest and most advanced data centre, in Acton, West London.

Tier 3+ service provision
Geographically separated sites.
Both systems active.
Hot failover.
Less than 2 h/year downtime.
No planned downtime.
Protection from local and regional disaster.
(Diagram: two data centres, DC1 and DC2, behind a global load balancer connected to the Internet.)
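The availability implied by a downtime budget, and the benefit of running two active sites with hot failover, can be estimated with simple arithmetic. This is a minimal sketch assuming a 365-day year and statistically independent outages at the two sites; real failures (shared power grids, network carriers, software faults) are often correlated, so the combined figure is an optimistic upper bound. The function names and the 99.9% per-site availability in the example are hypothetical.

```python
# Sketch: availability arithmetic for an active-active, two-site service.
# Assumptions (not from the slide): a 365-day year and independent site outages.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def availability(downtime_hours_per_year: float) -> float:
    """Fraction of the year the service is up, given its annual downtime."""
    return 1 - downtime_hours_per_year / HOURS_PER_YEAR


def downtime_hours(avail: float) -> float:
    """Annual downtime in hours implied by an availability fraction."""
    return (1 - avail) * HOURS_PER_YEAR


def active_active(site_availability: float, sites: int = 2) -> float:
    """Availability when the service stays up while any one site is up
    (hot failover behind a global load balancer), assuming independent failures."""
    return 1 - (1 - site_availability) ** sites


if __name__ == "__main__":
    # The slide's budget: less than 2 hours of downtime per year.
    print(f"2 h/year downtime -> {availability(2.0):.4%} availability")

    per_site = 0.999  # hypothetical single-site availability
    combined = active_active(per_site)
    print(f"two sites at {per_site:.1%} -> {combined:.6%} availability, "
          f"about {downtime_hours(combined) * 60:.1f} min/year downtime")
```

Under the independence assumption, adding a second active site squares the unavailability, which is why geographical separation, rather than redundant equipment in a single building, is what protects against local and regional disaster.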

Data centre topology provides resilience
(Map: Powergate, Oliver's Yard and Harbour Exchange.)

5. What happens next?

ELIXIR legal personality
Initially ELIXIR is most likely to be an EMBL Special Project, although the decision has not yet been taken. In due course this will probably be transferred to an ERIC (European Research Infrastructure Consortium). This approach will allow a quick start to the construction phase; early adoption of ERIC was considered high-risk. The decision to change will be taken by ELIXIR management. We have taken legal advice on this. Bearing in mind the critical importance of ELIXIR for Europe, this is considered the safest way to proceed.

ELIXIR management structure
(Diagram: an ELIXIR International Consortium Agreement between ELIXIR Member States, EMBL and others; the ELIXIR Council, with a Scientific Advisory/Grant Committee structure; the ELIXIR Executive/Secretariat (at EMBL-EBI); a Heads of ELIXIR Nodes Committee; and ELIXIR Node 1 (EMBL-EBI), Node 2, Node 3 and Node 4.)

ELIXIR scientific & technical structure

Construction phase
Request for suggestions for nodes: application form and guidelines on the website; a one-page advertisement in Nature.
Funders meeting planned for the fall.
Applied for a one-year extension to the Preparatory Phase.

Tasks for the construction phase
Construction of ELIXIR.
Provision of infrastructure to the other ESFRI BMS RIs.
Mediating interactions between the ESFRI BMS RIs and e-infrastructures.
Providing services and tools for the agriculture, animal and plant biology communities.
Many others too.

BMS support of the European Grand Challenges
ELIXIR will provide infrastructure for the other ESFRI BMS RIs.