Science as a Service Accelerating Scientific Discovery using Cloud


Science as a Service: Accelerating Scientific Discovery using Cloud
Ravi Madduri, madduri@anl.gov
Internet2 Global Summit 2016, Chicago

Outline
- Scientific Discovery Process
- The Case for Cloud
- Science as a Service
- Globus Research Data Management and Analysis
- Perspectives from NIH
- Success stories

Our vision for a 21st century discovery infrastructure
Provide more capability for more people at lower cost by delivering Science as a Service
www.globus.org

Scientific Discovery Process (a repeating cycle)
- Pose question
- Design experiment
- Collect data
- Analyze data
- Identify patterns
- Hypothesize explanation
- Test hypothesis
- Publish results

Eliminating data friction is essential to modern science
- "Civilization advances by extending the number of important operations which we can perform without thinking about them" (Whitehead, 1912)
- Obstacles to data access, movement, discovery, sharing, and analysis slow research, distort research directions, and waste time (DOE reports)

Imagine if a researcher, when tackling a problem, could easily:
- Assemble, integrate, and interpret all relevant data within a knowledge network
- Be informed of anomalies, patterns, gaps
- Formulate & apply computational models
- Outsource tasks if local expertise is lacking
- Launch automated processes to test hypotheses, expand knowledge network
- Pay for all this by taking on other tasks

We will cover
Accelerating the Scientific Discovery Process by providing Science as a Service
- Research Data Management
- Analyzing Research Data
  - Interactive Analysis
  - Large-scale Analysis
- Publishing Results so others can
  - Discover
  - Validate
  - Reproduce/Use

Cloud has transformed platforms and how software is delivered
- Software as a service: SaaS (web & mobile apps)
- Platform as a service: PaaS
- Infrastructure as a service: IaaS
PaaS enables more rapid, cheap, and scalable delivery of powerful apps as SaaS

Our Science Stack
- SaaS: Globus Galaxies
  - Galaxy: creation, execution, sharing and discovering workflows
  - Interactive execution, IPython, R
- PaaS: Globus
  - Data management
  - Identity management
- IaaS: AWS
  - HTCondor, Chef, EC2, EBS, S3, SNS, Spot, Route 53, CloudFormation

Managing big data with Globus
Locations shown: Compute Facility, Light Source, Personal Computer, Publication Repository
1. PI initiates transfer request, or it is requested automatically by a script or science gateway
2. Globus transfers files reliably, securely
3. PI selects files to share, selects user or group, and sets access permissions
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Researcher logs in to Globus and accesses shared files; no local account required; download via Globus
6. Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)
7. Curator reviews and approves; data set published on campus or other system
8. Peers, collaborators search and discover datasets; transfer and share using Globus
SaaS → Only a web browser required
- Access using your campus credentials
- Globus monitors and informs throughout
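Steps 3 to 5 above (share a folder in place, then let Globus decide who may read it) can be illustrated with a small in-memory model. This is only a sketch of the idea, not the Globus API: `AccessRule`, `SharedStorage`, the group name, and the paths are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AccessRule:
    """One sharing rule (hypothetical model, not a Globus data structure)."""
    principal: str    # a user name or a group name
    path: str         # folder on the existing storage
    permissions: str  # "r" for read-only, "rw" for read and write


@dataclass
class SharedStorage:
    """Toy stand-in for storage whose access is governed by sharing rules."""
    groups: dict = field(default_factory=dict)  # group name -> set of user names
    rules: list = field(default_factory=list)

    def share(self, principal, path, permissions="r"):
        """Step 3: the PI picks a folder, a user or group, and permissions."""
        self.rules.append(AccessRule(principal, path, permissions))

    def can(self, user, path, mode="r"):
        """Step 4: access is checked against the rules; files never move."""
        principals = {user}
        principals.update(g for g, members in self.groups.items() if user in members)
        target = PurePosixPath(path)
        for rule in self.rules:
            if rule.principal not in principals:
                continue
            root = PurePosixPath(rule.path)
            inside = target == root or root in target.parents
            if inside and mode in rule.permissions:
                return True
        return False


storage = SharedStorage(groups={"beamline-team": {"alice", "bob"}})
storage.share("beamline-team", "/project/run42/", "r")
storage.share("carol", "/project/run42/results/", "rw")

print(storage.can("alice", "/project/run42/scan.h5"))           # group member, read
print(storage.can("alice", "/project/run42/scan.h5", "w"))      # read-only rule
print(storage.can("carol", "/project/run42/results/out.csv", "w"))
```

The point of the model is that sharing is a matter of adding a rule, so the data stays on the storage system where it already lives.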

Globus Platform-as-a-Service
- Globus APIs
- Sharing Service
- Transfer Service
- Identity, Group, Profile Management Services
- Globus Connect
- Globus Toolkit

Globus Adoption and Usage
- 166,449 active Globus endpoints
- 27,961 users registered
- Biggest transfer: 500.42 TB
- Longest running transfer: 182 days
- Fastest transfer: 58.5 Gbps (average)
- 55 TB moved per day, on average, since the service was launched in November 2010
- Average throughput: 637.7 Mbps (since service launch)
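To put these figures in proportion, a little unit arithmetic helps. The sketch below assumes decimal units (1 TB = 10^12 bytes) and treats the quoted numbers as independent records; nothing on the slide says the biggest transfer ran at the fastest rate, so the results are hypothetical what-if estimates only.

```python
TB = 1e12  # assumption: decimal terabytes


def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Time needed to move size_bytes at a sustained rate in bits per second."""
    return size_bytes * 8 / rate_bits_per_s


biggest = 500.42 * TB   # biggest transfer quoted on the slide
fastest_rate = 58.5e9   # 58.5 Gbps
average_rate = 637.7e6  # 637.7 Mbps

# What-if: the biggest transfer at the fastest and at the average rate.
hours_at_fastest = transfer_seconds(biggest, fastest_rate) / 3600
days_at_average = transfer_seconds(biggest, average_rate) / 86400

# 55 TB per day expressed as a sustained aggregate rate in Gbps.
aggregate_gbps = 55 * TB * 8 / 86400 / 1e9

print(f"{hours_at_fastest:.1f} h at 58.5 Gbps")
print(f"{days_at_average:.1f} days at 637.7 Mbps")
print(f"{aggregate_gbps:.2f} Gbps sustained for 55 TB/day")
```

Under these assumptions the biggest transfer would take about 19 hours at the fastest quoted rate and roughly 73 days at the average rate, and 55 TB per day corresponds to about 5 Gbps sustained across the whole service.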

Analyzing Big Data using Globus Galaxies
Data management: Globus provides high-performance, fault-tolerant, secure file transfer between all data endpoints
- Endpoints shown: Sequencing Centers (Seq Center), Public Data, Research Lab, Storage, Local Cluster/Cloud
Data analysis: Globus Galaxies on Amazon EC2
- Galaxy-based workflow management, with Globus integrated within Galaxy
- Galaxy Data Libraries
- Pipeline elements shown: Fastq, Ref Genome, Picard, Alignment, GATK, Variant Calling
- Web-based UI
- Drag-and-drop workflow creation
- Easily modify workflows with new tools
- Analytical tools are automatically run on the scalable compute resources when possible
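A workflow of this kind is a dependency graph: each tool runs once its inputs are ready. The sketch below uses Python's standard `graphlib` (Python 3.9+) to order the steps. The wiring between steps is an assumption made for illustration (the slide names the elements but the arrows did not survive), and `run_workflow` is a hypothetical helper, not part of Galaxy.

```python
from graphlib import TopologicalSorter

# step -> set of steps it depends on (illustrative wiring, assumed here)
workflow = {
    "fastq_input": set(),
    "reference_genome": set(),
    "alignment": {"fastq_input", "reference_genome"},
    "picard_processing": {"alignment"},
    "gatk_variant_calling": {"picard_processing", "reference_genome"},
}


def run_workflow(steps, run_step):
    """Run every step after all of its dependencies; return the order used."""
    order = []
    for step in TopologicalSorter(steps).static_order():
        run_step(step)  # a real system would submit a job to the cluster here
        order.append(step)
    return order


executed = run_workflow(workflow, lambda step: print("running", step))
```

Adding a new tool then amounts to adding one entry to the mapping, which mirrors the "easily modify workflows with new tools" point above.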

Examples of Science as a Service
- Globus Genomics: Large-scale NGS analysis
- PDACS: Portal for data analysis services for cosmological simulations
- CVRG Galaxy: Large-scale ECG Data Analysis
- Globus Proteomics
- ematter: Material Science Simulations
- FACE-IT: Framework to Advance Climate, Economic, and Impact Investigations with Information Technology (usefaceit.org)

Examples of what researchers have done using Globus Genomics

Examples in Genomics
- A profile of inherited predisposition to breast cancer among Nigerian women. Y. Zheng, T. Walsh, F. Yoshimatsu, M. Lee, S. Gulsuner, S. Casadei, A. Rodriguez, T. Ogundiran, C. Babalola, O. Ojengbede, D. Sighoko, R. Madduri, M.-C. King, O. Olopade
- A case study for high throughput analysis of NGS data for translational research using Globus Genomics. D. Sulakhe, A. Rodriguez, K. Bhuvaneshwar, Y. Gusev, R. Madduri, L. Lacinski, U. Dave, I. Foster, S. Madhavan

Globus Genomics at a glance
- 30 institutions, groups
- 2 PBs raw sequences analyzed
- 1000s genomes processed
- 5 days longest running workflow
- 10s of millions of core hours
- labs
- >1000 analysis tools
- >50 workflows
- 99% uptime over the past two years
- >20 publications
- 1 PB largest single transfer to date
- <1 day turnaround time
- 100s different species

Diversity of Collaborations
- Cox Lab
- Volchenboum Lab
- Olopade Lab

Costs are remarkably low
Pricing includes
- Estimated compute
- Storage (one month)
- Globus Genomics platform usage
- Support

Some of the cloud-activities in NIH that we are involved in

NIH Commons Pilots
- minids: unique identifiers and minimal metadata for digital objects
- Data objects, containers: BagIt
- Registries/Indexes
- APIs
- Common Workflow Language for reproducible workflows
- BDDS Data Publication Services
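BagIt packages data files together with checksums so a recipient can verify what arrived. The sketch below builds a minimal bag by hand (a `data/` payload directory, a `bagit.txt` declaration, and a `manifest-sha256.txt`) and then re-checks it. It is a simplified illustration of the layout, not the tooling used in the pilots; `make_bag`, `verify_bag`, and the sample file are hypothetical, and payload names are assumed to be flat file names.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict) -> Path:
    """Write payload files under data/ plus the two minimal BagIt tag files."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


bag = make_bag(
    Path(tempfile.mkdtemp()) / "example_bag",
    {"reads.fastq": b"@r1\nACGT\n+\nFFFF\n"},
)
print(verify_bag(bag))
```

An identifier service such as minids could then point at the bag together with its checksum, so that a citation resolves to exactly the bytes that were analyzed.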

Our work is supported by: U.S. DEPARTMENT OF ENERGY

Thank you! @madduri