Taking Advantage of Cloud Elasticity and Flexibility

Fred Koopmans, Sr. Director of Product Management

Public cloud adoption is surging

Cloudera customers are leading the way

Hadoop was born for the cloud
  - Speed
  - Convenience
  - Scale
  - Self-Service
  - TCO

But cloud comes with its own set of challenges
  - Performance
  - Bill Shock
  - Application Portability
  - Security
  - Data Governance
  - Data Sovereignty
  - Hybrid Cloud
  - Lock-in

A stepwise approach
  1. Lift and shift the platform
  2. Optimize each application individually
  3. Reconstruct an Enterprise Data Hub

Lift and shift the platform

Openness is even more important in the cloud
  - Open Environment: Run the same platform in different clouds or on bare metal, so customers can move as needed without migration or retraining.
  - Open Ecosystem: 450+ certified ISVs and assured backward compatibility across releases, so customers can leverage their pre-existing investments.
  - Open Source: Avoid vendor lock-in, and leverage components supported by the committers who drive the community roadmap.

In on-prem environments, many applications typically share a single, multi-tenant cluster
  Storage: HDFS

The cloud creates more and smaller clusters, each specialized for one application
  Storage: S3, Azure Data Lake, Google Storage*

Where to store the data?
  Object Storage generally best choice
    - Performance often good enough
    - Generally cheaper per TB than DAS
    - Scales independently from compute
  Not a drop-in replacement for HDFS
    - Different data consistency models
    - Different directory structure support
  Not all Object Stores created equal
    - Different access control models
    - Different maturity levels
  Not yet universally supported by CDH
    - Mostly finished for S3
    - Just getting started for ADLS
    - Not yet started for GCS
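One concrete reason object storage is not a drop-in replacement for HDFS is directory handling. The sketch below is a toy in-memory model, not S3, ADLS, or HDFS client code. It assumes an object store is a flat key-to-bytes map in which a "directory" is only a key prefix, so renaming a directory means copying and then deleting every object under that prefix. The class and method names (`ToyObjectStore`, `rename_prefix`) are hypothetical and chosen only for illustration.

```python
class ToyObjectStore:
    """Flat key -> bytes map; 'directories' exist only as key prefixes."""

    def __init__(self):
        self.objects = {}
        self.operations = 0  # counts per-object copy and delete calls

    def put(self, key, data):
        self.objects[key] = data

    def list_prefix(self, prefix):
        # Returns a new sorted list, so callers may mutate the store while iterating it.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def rename_prefix(self, src, dst):
        """Emulate a directory rename: one copy plus one delete per object."""
        for key in self.list_prefix(src):
            new_key = dst + key[len(src):]
            self.objects[new_key] = self.objects[key]  # copy
            self.operations += 1
            del self.objects[key]                      # delete
            self.operations += 1


# A job writes three part files to a temporary prefix, then "renames" it.
store = ToyObjectStore()
for i in range(3):
    store.put(f"tmp/job1/part-{i}", b"rows")
store.rename_prefix("tmp/job1/", "final/job1/")
```

In this model the cost of a rename grows with the number of objects, and a reader can observe the prefix half moved. A filesystem with real directories can typically rename a directory as a single metadata operation, which is why jobs that commit output by renaming a temporary directory tend to need extra care on object stores.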

Object Storage support is rapidly reaching maturity
  Separation from HDFS
    - S3A connector
    - ADLS connector
  Filling the gaps
    - Performance
    - Consistency
    - Renames
  Cloudera Functional Equivalence
    - Security
    - Governance
    - Backup & Recovery
  Cross-Cluster Sharing
    - Permissions
    - Catalogue
    - Lineage

  Support as of C5.11

    Component       S3    ADLS
    MapReduce       Y     Y
    Hive            Y     Y
    Hive on Spark   Y     -
    Spark           Y     Y
    HBase           -
    Impala          Y     -
    Hue             Y     -

How to provision and manage cloud infrastructure cost effectively?
  Provisioning requirements
    - Spin clusters up & down quickly
    - Grow & shrink clusters dynamically
    - Select right instance types for each service
    - Leverage demand-based pricing whenever possible
  Management requirements
    - Fully automated and parallelized installation and configuration
    - Manage all aspects of cluster security automatically
    - Retain diagnostic and log information after cluster is gone
    - Support transient and long-lived clusters
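Several of these requirements interact: a transient cluster has to be resizable while it runs, and its diagnostics have to outlive it. The following is a minimal sketch of that lifecycle in plain Python, with no cloud or Cloudera API involved. The names `transient_cluster`, `resize`, and `archive` are hypothetical; the dictionary `archive` merely stands in for durable storage that survives cluster termination.

```python
from contextlib import contextmanager

archive = {}  # assumption: stands in for durable storage that outlives clusters


@contextmanager
def transient_cluster(name, nodes):
    """Create a simulated cluster and always archive its logs on teardown."""
    cluster = {"name": name, "nodes": nodes, "logs": [], "running": True}
    cluster["logs"].append(f"started {nodes} nodes")
    try:
        yield cluster
    finally:
        # Retain diagnostics before the cluster goes away, even if the job failed.
        cluster["logs"].append("terminating")
        archive[name] = list(cluster["logs"])
        cluster["running"] = False


def resize(cluster, nodes):
    """Grow or shrink the simulated cluster and record the change."""
    cluster["logs"].append(f"resized {cluster['nodes']} -> {nodes}")
    cluster["nodes"] = nodes


with transient_cluster("etl-nightly", nodes=4) as c:
    resize(c, 10)                     # grow for the heavy stage
    c["logs"].append("job finished")
    resize(c, 2)                      # shrink before teardown
```

Putting the archive step in the `finally` clause is the design point: log retention is tied to teardown itself rather than to the job completing successfully.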

Cloudera Director automates cluster lifecycle management
  Easy
    - Single pane of glass for all cloud infrastructure
    - Create templates to run applications in a pre-optimized manner
  Flexible
    - Multi-cloud: AWS, Azure, GCP
    - Hourly pricing with auto billing & metering
    - Spot instance/block support
  Enterprise-grade
    - Integration across Cloudera Enterprise
    - Management of CDH deployments at scale
    - Deeply integrated with Cloudera Manager

Cloudera Manager automates cluster operations
    - Object Store
    - Easy administration
    - Spot instance resiliency
    - Automated security credential handling
  Transient cluster operations
    - Optimized cluster provisioning
    - Automatic collection of diagnostics and logs
  Long-lived cluster operations
    - Downtime-less upgrade, patch, restart, and reconfiguration
    - Monitoring, alerting, health checking, reporting, etc.

Optimize each application independently

Really, four discrete applications on one unified platform
  - Data Engineering: Modern data processing (ETL) at scale
  - Data Science: Exploratory data science and machine learning for the enterprise
  - Analytic Database: Explore, analyze, and understand all your data
  - Operational Database: Data-driven applications to deliver real-time insights
  Multi-Storage, Multi-Environment

Needs of each application can vary greatly
  Data Science & Engineering
    - Access Patterns: Batch. Can be transient or persistent.
    - Performance Needs: Relatively insensitive to latency and data locality.
    - Security: Often not required for many use cases.
  Analytic Database
    - Access Patterns: Batch or interactive. Can be transient or persistent.
    - Performance Needs: Relatively insensitive to latency and data locality.
    - Security: Fine-grained security often required.
  Operational Database
    - Access Patterns: Real-time. Typically persistent.
    - Performance Needs: Typically quite sensitive to latency and data locality.
    - Security: Fine-grained security often required.

Data Science & Engineering in the cloud
Three architectural patterns to optimize price, convenience, performance
  Transient Batch (most flexible; default choice)
    - Spin up clusters as needed
    - On-demand/spot instances
    - Usage-based pricing
    - Sized for workload
    - Cluster per tenant/user
  Persistent Batch (most control)
    - Persistent cluster(s) for frequent ETL
    - Reserved instances
    - Node-based pricing
    - Grow/shrink
    - Cluster per tenant group
  Persistent Batch on HDFS (fastest)
    - Top performance for frequent ETL
    - Reserved instances
    - Node-based pricing
    - Grow/shrink
    - Shared across tenant groups
  Diagram labels: Batch Cluster, Persistent Cluster, HDFS, Object Storage
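The choice between the transient pattern (usage-based pricing) and the persistent patterns (node-based pricing) largely comes down to how many hours per month the cluster is actually busy. The sketch below works through that arithmetic. The two rates are made-up placeholder values, not prices from the slides or from any cloud provider, and the function names are hypothetical.

```python
HOURLY_RATE = 0.50     # assumed usage-based price per node-hour (placeholder)
MONTHLY_RATE = 200.00  # assumed node-based price per node-month (placeholder)


def usage_based_cost(nodes, hours_used, hourly_rate=HOURLY_RATE):
    """Transient pattern: pay only for the hours the cluster exists."""
    return nodes * hours_used * hourly_rate


def node_based_cost(nodes, monthly_rate=MONTHLY_RATE):
    """Persistent pattern: pay a flat amount per node regardless of usage."""
    return nodes * monthly_rate


def break_even_hours(hourly_rate=HOURLY_RATE, monthly_rate=MONTHLY_RATE):
    """Monthly hours of use at which both pricing models cost the same."""
    return monthly_rate / hourly_rate


def cheaper_pattern(hours_used):
    """Pick the lower-cost pattern for a given number of busy hours per month."""
    if hours_used < break_even_hours():
        return "transient"
    return "persistent"


light = usage_based_cost(nodes=10, hours_used=100)  # occasional ETL
flat = node_based_cost(nodes=10)                    # always-on cluster
```

With these placeholder rates the break-even point is 400 hours per month, so a cluster used 100 hours is cheaper as a transient cluster and one used 600 hours is cheaper as a persistent one. The comparison deliberately ignores spot pricing, storage, and data transfer, all of which would shift the break-even point.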

Analytic DB in the cloud
  ETL: Reduce Operating Costs
    - Refer to Data Science & Engineering guidelines
    - Only pay for what you need, when you need it
    - Transient clusters
    - Object storage centric
    - Cloud-native deployment
  BI/Analytics: New Insights, New Revenue
    - Presents new set of choices
    - Explore and analyze all data, wherever it lives
    - Long-running clusters
    - Object storage or local storage
    - Lift-and-shift deployment

BI/Analytics in the cloud
Three architectural patterns to optimize price, convenience, performance
  Transient BI (infrequent usage; default choice)
    - Spin up clusters when needed
    - On-demand instances
    - Usage-based pricing
    - Grow/shrink
    - Cluster per tenant or user
  Persistent BI (regular usage)
    - Persistent clusters for BI any time
    - Reserved instances
    - Node-based pricing
    - Grow/shrink
    - Cluster per tenant group
  Persistent BI with Local Storage (fastest)
    - Max speed for more regular workloads
    - Reserved instances
    - Node-based pricing
    - Less frequent grow/shrink
    - Shared cluster for shared local data
  Diagram labels: Transient Cluster, Persistent Cluster, HDFS and/or Kudu, Object Storage

Operational DB in the cloud
Not as well suited for cloud, but targeted benefits are possible
  Cost Goals
    - Low-cost backup and disaster recovery
    - Elastic growth for tightly provisioned workloads makes expansion easy, and enables a lower-cost steady state
  Convenience Goals
    - Development and testing environments easy to deploy and decommission
    - Fast and easy provisioning of additional clusters helps projects move quickly

Reconstruct an Enterprise Data Hub

Many problems are a combination of SQL & predictive, batch & online
  Traditional Architecture (diagram)
    - Stages: Data Sources, Operational Data Stores, Enterprise Data Warehouse, Applications
    - Data sources: Portfolio Contracts; Portfolio Risks; Market, Counterparty, Ratings; Payments; Collections; Charges; Financial Ledger; P&L; Unstructured
    - Systems: Archive, Storage #1, Storage #2, HPC Grid, Enterprise Data Warehouse, BI System, Modeling, Reporting
    - Flows: Ingest, ELT, ETL, Process, Load, Serve

Reimagining the Enterprise Data Hub in the cloud
  - Workloads: Developer Workbench, Partner Ecosystem, SQL Workbench
  - Common: Operations, Governance, Security, Schema, Catalog
  - Storage: Object Store

Thank you
Fred Koopmans