Planning a national bioinformatics infrastructure investment


1 Planning a national bioinformatics infrastructure investment

2 Infrastructure?

3 Research infrastructure?

4 Research infrastructure?

5 What is Bioinformatics Infrastructure?

6 What is Bioinformatics Infrastructure?
  The research infrastructure we need to remain competitive in the global, rapidly changing research environment of the biosciences.

7 Biosciences: understanding the research community
  30,000 health/biosciences researchers
  18,000 health/biosciences RHD students
  48,000 health/biosciences PG coursework students
  (163, ,000 =) 200,000 health/biosciences UG students
  1,000 to 1,500 bioinformaticians/computational biologists

8 Biosciences: understanding the research community
Estimated # Australian biology researchers in 2018: 30,000
  20,000 (15,000) biology-focussed bioscience researchers
    Occasional users of bioinformatics web services.
    E.g. BLAST, Ensembl.
  7,000 (12,000) data-intensive bioscience researchers
    Omics data analysis is a critical contributor to the research outcomes.
    E.g. RNAseq analysis to identify upregulated genes in a broader research program.
  2,000 (3,000) bioinf-intensive bioscience researchers
    Research is fully dependent on advanced use of bioinformatics.
    E.g. genomic cancer research, population genomics/agricultural genomics programs.
Estimated # bioinformaticians: 1,000 (in 5 years: 1,500)
    Research into/application of techniques & tool development.
    E.g. research generating a new tool or statistical method; core facilities applying complex analyses.

9 Biosciences: understanding the research community
  Important Transitions (same researcher estimates as slide 8)

10 The rapidly evolving international context

11 Europe: National infrastructures + ELIXIR + EBI
MISSION:
  Provide, expand and improve a repertoire of specialized bioinformatics tools
  Provide access to computing and storage capacities
  Provide regular training events
  Maintain and develop specific high-quality data resources
MISSION:
  Providing the national and international life science community with a state-of-the-art bioinformatics infrastructure, including resources, expertise and services
  Federating world-class researchers and delivering training in bioinformatics

12 Europe: National infrastructures + ELIXIR + EBI
The ELIXIR Platforms comprise:
  Data: Sustaining Europe's life-science data infrastructure
  Tools: Services and connectors to drive access and exploitation
  Interoperability: Supporting the discovery, integration and analysis of biological data
  Compute: Storage, compute and authentication/access services
  Training: Professional skills for managing and exploiting data
Four Use Cases service domain-specific research communities:
  Human data: Developing long-term strategies for managing and accessing sensitive human data
  Rare diseases: Supporting the development of new therapies for rare diseases
  Marine metagenomics: Developing a sustainable metagenomics infrastructure to nurture research and innovation in marine science
  Plant science: Developing an infrastructure to facilitate genotype-phenotype analyses for crop and tree species

13 Europe: National infrastructures + ELIXIR + EBI

14 US National Institutes of Health: Data Commons
Four main components:
  A computing environment, such as the cloud or HPC (High Performance Computing) resources, which supports access, utilization and storage of digital objects.
  Publicly available datasets that adhere to a Commons digital object compliance model.
  Software services and tools to facilitate access to and use of data, whether in the Commons or elsewhere.
  A digital object compliance model that describes the properties of digital objects that enable them to be findable, accessible, interoperable and reusable (FAIR).

15 US National Science Foundation: CyVerse
  Vision: Transforming science through data-driven discovery
  Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists in its use
  Platforms, tools, datasets
  Storage and compute
  Training and support

16 World: NCBI + EBI

17 World: NCBI + EBI

18 Planning an Australian Bioinformatics Infrastructure investment

19 National research infrastructure roadmap

20 National Reference Group
  Prof Tony Bacic (Director, La Trobe Institute of Agriculture and Food)
  Prof Jacquie Batley (Plant Genetics & Breeding, UWA)
  Prof Dave Burt (Director Genomics, UQ)
  Prof Peter Cameron (Acad Director, Alfred Emerg & Trauma Centre)
  Prof Joanne Daly (CSIRO Honorary Fellow)
  Prof Frank Gannon (Director, QIMR Berghofer)
  Prof Rob Henry (Director, QAAFI, UQ)
  Prof Ary Hoffmann (Biosciences, Melbourne U)
  Prof Dean Jerry (Dep Director, JCU Ctr Tropical Fisheries & Aquaculture)
  Prof Ryan Lister (Head, Epigntcs & Genomics, Harry Perkins Inst, UWA)
  Prof John Mattick (Director, Garvan Institute/Director, Genomics England)
  Prof Kathryn North (Director, MCRI)
  Prof Nicki Packer (Macquarie U & Inst for Glycomics, Griffith U)
  Prof Tony Papenfuss (President ABACBS, Comp Biol WEHI/Petermac)
  Dr Maurizio Rossetto (NSW Royal Bot Gardens)
  Prof Eric Stone (Director, ANU-CSIRO Ctr Genmcs, Metablmcs & Bioinf)
  Dr Jen Taylor (Group leader Bioinformatics, CSIRO)
  Prof Steve Wesselingh (Director, SAHMRI)
  Prof James Whisstock (Monash, EMBL-Australia)
  Prof Marc Wilkins (Director, Ramaciotti Centre, UNSW)

21 International Scientific Advisory Group
  Tony Papenfuss (Head, Computational Biology, WEHI, VIC)
  Mark Walker (Director, Aust Infectious Disease Res Centre, UQ, QLD)
  Delphine Fleury (Aus Centre for Plant Functional Genomics, SA)
  Sean Grimmond (Director, Centre for UoM Cancer Research, VIC)
  Rebecca Johnson (Director, Australian Museum Research Institute, NSW)
  Jaap Heringa (Head: ELIXIR-NL)
  Jason Williams (Lead: Education, Outreach and Training)
  Paul Flicek (Lead: Vertebrate Genomics & ENSEMBL)
  Rochelle Tractenberg (Founder: Collaborative for Research on Outcomes and Metrics)
  Vivien Bonazzi (Program Leader for NIH Data Commons)

22 Timeline
  Phase  Period          Activity
  1      Jun - Aug 2017  Concept development and project planning
  2      Sep - Dec 2017  Elaboration of requirements, options and consensus building
  3      Jan - Sep 2018  Engagement with expected NCRIS planning of its investments
  4      Oct ? 2019      Steps to implement results or engage further as needed
Community consultations:
  Brisbane 8/Aug
  Perth 10-11/Oct
  Canberra 30/Oct
  Sydney 3/Nov
  Melbourne 8/Nov
  ABACBS 14/Nov
  Melbourne 17/Nov
  Adelaide 20/Nov

23 Three Capabilities
Capability I: A Biologist to Bioinformatics Bridge
  A national omics analysis service providing:
    A means to use standardised bioinformatics techniques through high-level interfaces;
    Integrated with a regionally accessible support and training network; and
    Providing direct access to underlying infrastructure for new technique developers
  Options (which are not mutually exclusive):
    1. Operate the service on a transaction-based model: bring your data, take your data.
    2. Include a long-term data retention and publication function.
    3. Include a user workflow retention and publication function.
Capability II: Data Integration Facilities
  Providing:
    A facility for data-intensive computing on bioscience data and tools;
    Coupled with a critical mass of data science expertise versed in omics; and
    Assigned by merit to large research teams for extended periods
  Options:
    1. A single facility supporting integration of all data types (e.g. EBI).
    2. Multiple facilities, each providing support for a domain of specialization in data types (e.g. de.NBI).
Capability III: An Australian Biomolecular Data Consortium
  A joining together of the leadership in bioscience to address long-term systemic challenges:
    Policy development around rapidly emerging data asset issues;
    The changing requirements on undergraduate and postgraduate training; and
    Engagement with large-scale omic resources onshore and offshore.

24 Understanding the research community
  Important Transitions (same researcher estimates as slide 8)

25 Beyond tools, data & compute: workforce training
Workforce Development is a high-priority, ready-to-go opportunity
Analysis of global efforts shows:
  Significant resource is being committed
  Substantial training material exists
  As expected, lots of commonality