Apache Hadoop in the Datacenter and Cloud


The Shift to the Connected Data Architecture
Digital transformation fueled by Big Data analytics and IoT
- System centric: IDMS, relational database; mainframe, client/server, web and SaaS
- User centric: modern applications on a Connected Data Architecture
- Actionable intelligence: data in motion and data at rest, across cloud and data center, powered by open source
- Transformational use cases: predictive retail, factory automation, connected cars, predictive analytics, artificial intelligence
Hortonworks Inc. 2011-2016. All Rights Reserved

Hadoop in the Data Center
- Create and Manage Central Data Lakes
- Support All Types of Data
- Provide Flexible Processing and Access Methods
- Reduce Architecture Costs by 80% or More
- Drive Transformational New Use Cases

Hadoop in the Cloud
- Fast On-Ramp for New Users
- Elastic Compute and Storage Capabilities
- Zero-Configuration Access to Engine Capabilities (HDInsight)
- Eliminate Hardware Purchases
- Facilitate Certain Modern Data Applications through Cloud Connectivity

Transformational Applications Require Connected Data
[Diagram labels: Cloud (edge data, data in motion, data at rest); Data Center (edge data, data in motion, data at rest); edge analytics, machine learning, stream analytics, deep historical analysis]

Our Focus: Enable Modern Applications on Connected Data Platforms
- Continuous Insights: deliver insights from ALL data, origin to rest
- Enterprise Ready: management, security, governance
- Any Delivery Model: data center, cloud, hybrid
- Open Innovation: architecture, community, ecosystem

A Look at Hadoop in the Data Center

Actionable Intelligence from Connected Data Platforms
Modern data applications need both:
- Data in motion (Hortonworks DataFlow): capturing perishable insights
- Data at rest (Hortonworks Data Platform): ensuring rich, historical insights

Hortonworks Data Platform for Data at Rest
Powered by Open Enterprise Hadoop: Open, Central, Interoperable, Ready

Hortonworks Data Platform 2.5 Highlights
- Dynamic Security: Apache Atlas + Ranger integration
- Enterprise Spark at Scale: Apache Zeppelin notebook for Spark
- Real-Time Applications: Storm and HBase/Phoenix
- Streamlined Operations: Apache Ambari
- Interactive Query in Seconds: Hive with LLAP (Technical Preview)

Apache Atlas + Ranger: More Powerful Together

Introducing Tag-Based Security: Apache Atlas and Ranger Integration
- Basic tag policy: Access and entitlements can be based on attributes. As an example, Personally Identifiable Information (PII) is a tag that can be leveraged to protect sensitive personal data.
- Geo-based policy: Access policy based on location. As an example, a user might be able to access data in North America, but may be restricted from access in EMEA due to privacy compliance.
- Time-based policy: Access policy based on time windows. As an example, a user might be able to access data only between 8 AM and 5 PM (common in SOX regulations).
- Prohibitions: Restrictions on combining two data sets which might each be in compliance originally, but not when combined together (for example, SSNs and names).
Key Benefits:
- New scalable, metadata-based security paradigm
- Dynamic, real-time policy
- Automatic updates to changes in metadata
- Centralized and simple-to-manage policy
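The four policy types above can be sketched as a small in-memory check. This is only an illustration of the idea, not Apache Ranger's policy schema or API: the `AccessRequest` and `TagPolicy` classes, their field names, and the `is_allowed` function are all hypothetical, and the sketch assumes tags have already been attached to the requested data (the role Atlas plays in the integration).

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Tuple


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request: who asks, from where, when, for data with which tags."""
    user: str
    region: str          # location the request comes from, e.g. "NA" or "EMEA"
    at: time             # local time of the request
    tags: frozenset      # tags on the requested data, e.g. {"PII"}


@dataclass
class TagPolicy:
    """Hypothetical policy attached to one tag (not Ranger's real policy model)."""
    tag: str
    allowed_users: set = field(default_factory=set)
    allowed_regions: set = field(default_factory=set)   # empty means any region
    window: Optional[Tuple[time, time]] = None          # None means any time

    def permits(self, req: AccessRequest) -> bool:
        if req.user not in self.allowed_users:           # basic tag policy
            return False
        if self.allowed_regions and req.region not in self.allowed_regions:
            return False                                 # geo-based policy
        if self.window is not None:                      # time-based policy
            start, end = self.window
            if not (start <= req.at <= end):
                return False
        return True


def is_allowed(req, policies, prohibited_combinations=()):
    """Deny prohibited tag combinations, then require a permitting policy per tag."""
    for combo in prohibited_combinations:
        if combo <= req.tags:                            # all tags of the combo requested
            return False
    # Every tag on the data needs at least one policy that permits the request.
    # Untagged data (empty tag set) passes, which is a simplification of this sketch.
    return all(
        any(p.permits(req) for p in policies if p.tag == tag)
        for tag in req.tags
    )
```

Because the decision depends only on tags, adding the PII tag to a new column would bring it under the same policy without writing a new rule, which is the "automatic updates to changes in metadata" benefit listed above.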

Apache Atlas Powers Cross-Component Data Lineage
As a part of HDP 2.5, users can track lineage across the following components using Atlas:
- Apache Sqoop: Import from and export to relational databases, plus additional packages that leverage Sqoop
- Hive: Dataset lineage with entity versioning (including schema changes)
- Apache Kafka/Storm: IoT event-level processing, such as syslogs or sensor data
- Falcon: Data lifecycle at Feed and Process entity level for replication and repeating workflows. Tracks periodicity, throttling, eviction. (ATLAS-69, FALCON-1570)
Key Benefits:
- Enterprises need open solutions, not a single app vendor
- More native connectors than any other vendor
- Hardened metadata infrastructure
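Cross-component lineage is, at its core, a directed graph of datasets connected by the processes that move data between them. The sketch below shows that idea with a hypothetical `LineageGraph` class; it is not the Atlas data model or REST API, and the dataset names in the usage note are invented for illustration.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical in-memory lineage store: edges point from source to target."""

    def __init__(self):
        # target dataset -> set of (source dataset, component that produced the edge)
        self._upstream = defaultdict(set)

    def add_edge(self, source, target, via):
        """Record that `target` was derived from `source` by component `via`."""
        self._upstream[target].add((source, via))

    def upstream_of(self, entity):
        """Return every dataset that `entity` transitively depends on (breadth-first)."""
        seen = set()
        queue = deque([entity])
        while queue:
            current = queue.popleft()
            for source, _via in self._upstream.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    queue.append(source)
        return seen
```

For example, after recording a Sqoop import from `rdbms.orders` into `hive.orders_raw`, a Hive job from `hive.orders_raw` into `hive.orders_clean`, and a Storm topology from `kafka.events` into `hive.orders_clean`, calling `upstream_of("hive.orders_clean")` returns all three upstream datasets regardless of which component produced each hop.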

Expanded Native Connector: Dataset Lineage
[Diagram labels: Teradata connector, Apache Kafka, RDBMS, Sqoop, custom activity reporter, metadata repository]

Apache Atlas Enables Business Catalog for Ease of Use
Organize data assets along business terms
- Authoritative: hierarchical business taxonomy creation
- Agile modeling: model conceptual, logical, and physical assets
- Definition and assignment of tags like PII (Personally Identifiable Information)
Comprehensive features for compliance
- Multiple user profiles, including Data Steward and Business Analysts
- Object auditing to track "who did it"
- Metadata versioning to track "what did they do"
Faster insight
- Data Quality tab for profiling and sampling
- User comments
Key Benefits:
- Easy way to create a business taxonomy
- Useful for multiple user types, including Data Steward and Business Analysts
- Comprehensive features for compliance

Business Catalog
Model and explore metadata via the new Business Catalog in Apache Atlas.
[Screenshot label: Data Steward]

Streamlining Operations: Three-Phase Plan
Focused strategic investments in our core products to give customers more unique tooling to quickly understand the cluster's health, how business users are using it, and where to focus efforts when issues arise.
Capabilities
- Phase 1: Advanced Performance & Health Metrics Dashboards, with Ambari 2.2.2
- Phase 2: Consolidated Cluster Activity Reporting, with SmartSense 1.3.0 (new)
- Phase 3: Centralized & Contextual Log Search, with Ambari 2.4.0 (Tech Preview)
Core Technologies: Apache Ambari, Ambari Metrics System, Apache Solr, Hortonworks SmartSense, Grafana
[Diagram labels: Ambari, Ambari Metrics System, Grafana, Solr, Log Search, SmartSense, dedicated UIs]

Streamlined Operations, Phase 1: Advanced Metrics Visualization & Dashboarding
Goal: Quickly understand cluster health metrics and key performance indicators
Capabilities
- Centralized dashboarding focusing on component health and performance
- Ad hoc graph creation
- Pre-built dashboards: HDFS, YARN, HBase
Core Technologies: Ambari Metrics System, Grafana

[Screenshot: Ambari now includes pre-built dashboards for visualizing cluster health]

Streamlined Operations, Phase 2: Consolidated Cluster Activity Reporting
Goal: Quickly visualize and report on how business users and tenants are using the cluster: top 10 queues, users, and most time-consuming jobs
Capabilities
- Top-K activity reporting
- Chargeback
Services Covered: YARN, MapReduce, Hive/Tez, Spark, HDFS
Core Technologies: Hortonworks SmartSense, Apache Zeppelin
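Top-K reporting and chargeback both reduce to aggregating per-job resource usage by user or queue. The sketch below assumes each job is a plain dictionary with hypothetical `user`, `queue`, and `vcore_seconds` keys and a made-up flat rate per vcore-hour; it does not reflect how SmartSense actually stores or prices activity data.

```python
import heapq
from collections import defaultdict


def top_k_by_usage(jobs, key, k=10):
    """Sum vcore-seconds per `key` ("user" or "queue") and return the k largest."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job[key]] += job["vcore_seconds"]
    # nlargest returns (name, total) pairs sorted from highest to lowest usage
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


def chargeback(jobs, rate_per_vcore_hour):
    """Convert per-queue usage into a cost, using an assumed flat hourly rate."""
    cost = defaultdict(float)
    for job in jobs:
        cost[job["queue"]] += job["vcore_seconds"] / 3600 * rate_per_vcore_hour
    return dict(cost)
```

A real report would likely track memory and storage alongside CPU; a single metric keeps the aggregation pattern visible.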

Activity Explorer: Cluster Utilization Reporting

Preview: Streamlined Operations Investments, Phase 3: Centralized & Contextual Log Search
Goal: When issues arise, be able to quickly find issues across all HDP components
Capabilities
- Rapid search of all HDP component logs
- Search across time ranges, log levels, and for keywords
Core Technologies: Apache Ambari, Apache Solr, Apache Ambari Log Search
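The three search dimensions named above (time range, log level, keyword) can be illustrated with a simple in-memory filter. This is a sketch only: Ambari Log Search indexes logs in Solr, whereas the `LogEntry` class and `search_logs` function here are hypothetical and scan a plain Python list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical parsed log line from one HDP component on one host."""
    timestamp: datetime
    host: str
    component: str
    level: str
    message: str


def search_logs(entries, start=None, end=None, levels=None, keyword=None):
    """Yield entries matching every filter that was supplied; None means 'no filter'."""
    keyword_lc = keyword.lower() if keyword else None
    for entry in entries:
        if start is not None and entry.timestamp < start:
            continue
        if end is not None and entry.timestamp > end:
            continue
        if levels and entry.level not in levels:
            continue
        if keyword_lc and keyword_lc not in entry.message.lower():
            continue
        yield entry
```

Calling `search_logs(entries, levels={"ERROR"}, keyword="timeout")` would return only error-level lines mentioning a timeout, across all hosts and components in `entries`.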

[Screenshot: Tune the log collection system with Guided Smart Configurations]

[Screenshot: View a comprehensive inventory of operational logs for each host]

Hive 2 with LLAP: Enable Interactive Query in Seconds
- Developer Productivity: interactive query in seconds
- Ease of Use and Adoption: 100% compatible with Hive SQL
- Enterprise Readiness: linear scaling at terabyte volumes of data
- Streamlined Operations: LLAP integration with Ambari, with automated dashboards

Hive 2 with LLAP: Preliminary Numbers
[Chart: Hive 2.0 and LLAP, TPC-DS at 10 TB scale, 18 nodes. Legend: Hive2.0, Tez, LLAP. Queries shown: q3, q7, q12, q13, q19, q21, q26, q27, q42, q43, q45, q52, q55, q60, q73, q84, q89, q91, q98. Minimum query time: query 55 at 2.38 s.]

A Look at Hadoop in the Cloud

Traditional Hadoop Clusters

Why Cloud?
- IT & Business Agility
- No Upfront HW Costs
- Ephemeral & Long-Running
- Unlimited Elastic Scale

How Do We Approach the Cloud Market?
HYBRID SEGMENT: today's enterprise customers
- Seamless Connected Data Architecture across cloud and data center. Always-on enterprise use cases are common.
- Azure HDInsight, HDP, and HDF are our Premier offerings.
- Customer journey to future-state architecture, cloud operation and consumption model.
CLOUD ONRAMP: new users via digital engagement, or existing customers exploring cloud options
- Elasticity, automation, pay-as-you-go, one-click start. Ephemeral use cases are a common starting point.
- Azure HDInsight is our Premier offering.
- Focused offerings for AWS that enable us to engage and position our Premier offerings.
Cloud-first approach to product design, development, testing, and delivery

Outlook: Cloud and the Big Data Market
- Public cloud adoption (AWS, Azure, Google) will continue to accelerate
- Many customers will go "Cloud First" to simplify and speed adoption
- Customers deploying in public cloud expect a pay-as-you-go (PAYG) pricing model
- Hourly pricing is the default; reserved pricing optimizes annual spend; spot pricing optimizes hourly spend
- Customers are interested in running workloads in the cloud in addition to on-premises clusters, and are familiar with native cloud tooling
- This heightens the importance of product packaging and a user experience tuned to the cloud
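The trade-off between hourly and reserved pricing is a break-even calculation: reserving pays off once a node runs for more hours per year than the reservation fee divided by the hourly rate. The sketch below works through that arithmetic. The function names are hypothetical and the rates in the usage note are invented round numbers, not any provider's actual prices; spot pricing is left out because spot rates fluctuate.

```python
def on_demand_cost(node_hours, hourly_rate):
    """Pay-as-you-go: cost scales with the node-hours actually used."""
    return node_hours * hourly_rate


def reserved_cost(nodes, annual_fee_per_node):
    """Reserved: a fixed annual fee per node, regardless of hours used."""
    return nodes * annual_fee_per_node


def break_even_hours(annual_fee_per_node, hourly_rate):
    """Hours per node per year above which reserving is cheaper than hourly."""
    return annual_fee_per_node / hourly_rate


def cheapest_option(hours_per_node, nodes, hourly_rate, annual_fee_per_node):
    """Compare one year of on-demand usage against a one-year reservation."""
    hourly = on_demand_cost(hours_per_node * nodes, hourly_rate)
    reserved = reserved_cost(nodes, annual_fee_per_node)
    return ("reserved", reserved) if reserved < hourly else ("on-demand", hourly)
```

With an assumed rate of 0.50 per node-hour and an assumed reservation fee of 2,500 per node per year, the break-even point is 5,000 hours per node. An always-on cluster (8,760 hours a year) falls on the reserved side, while an ephemeral cluster used 1,000 hours a year is cheaper on hourly pricing.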

Cloud IaaS and Hadoop as a Service
- Running Hadoop on cloud IaaS
- Using Hadoop as a cloud service
[Diagram label: public cloud service providers]

Microsoft Azure HDInsight: Powered by Hortonworks Data Platform
- Seamless access to the public cloud for Spark, Hive, HBase, and other mission-critical workloads
- Unmatched economics, combining HDInsight's elasticity in the cloud with HDP's cost efficiencies at scale
- Enterprise readiness with robust security, governance, and operations in the cloud, powered by Hortonworks Data Platform

Connected Data Architecture with Azure HDInsight
- Cloud: Azure HDInsight (cloud data processing), HDF (data flow management)
- Data center: HDP (enterprise data lake)
HDInsight cluster types and ideal use cases:
- Data prep, query, and analysis (Hadoop, Hive, Pig)
- Iterative in-memory analysis (Spark)
- Advanced statistics, modeling, machine learning (R Server on Spark)
- NoSQL data storage (HBase)
- Real-time event processing (Storm)

Runs in more datacenters than anyone else
- Central US: Iowa
- North Central US: Illinois
- South Central US: Texas
- East US: Virginia
- East US 2: Virginia
- West US: California
- Brazil South: Sao Paulo State
- North Europe: Ireland
- West Europe: Netherlands
- India Central: Pune
- China North *: Beijing
- China South *: Shanghai
- Japan East: Tokyo, Saitama
- Japan West: Osaka
- East Asia: Hong Kong
- SE Asia: Singapore
- Australia East: New South Wales
- Australia South East: Victoria
Azure is doubling compute and storage every 6 months

Microsoft Azure HDInsight and Apache Projects in the Cloud
- Standard Hadoop projects: Hive, YARN, HDFS, MapReduce, Pig, Tez, Sqoop, Oozie, ZooKeeper, Mahout, Phoenix
- Comprehensive list of emerging projects: Spark, Storm, HBase, and R
- Ability to add projects: add various projects to the cloud
[Diagram labels: YARN data operating system; storage; governance; operations; security; batch, interactive, streaming, machine learning, search]

Forrester Wave: Big Data Hadoop Cloud Solutions, Q2 2016
Elasticity, Automation, and Pay-As-You-Go Compel Enterprise Adoption of Hadoop in the Cloud

Connected Data Architecture with HDC for AWS (Tech Preview)
- Cloud: HDC for AWS (cloud data processing), HDF (data flow management)
- Data center: HDP (enterprise data lake)
Ideal use cases:
- Data science and exploration (Spark, Zeppelin)
- ETL and data preparation (Hive, Spark)
- Analytics and reporting (Hive 2 with LLAP, Zeppelin)

Hortonworks Data Cloud for AWS: Cluster Types (Tech Preview)

Prescriptive On-Demand Ephemeral Workloads (Tech Preview)
** Planned list of available cluster types

Why Hortonworks Cloud Solutions?
- Choice of cloud
- Rich set of capabilities and security
- Zero-configuration access to engine capabilities (HDInsight)
- S3 integrations on AWS (Tech Preview)
- Award-winning Hadoop expertise

Connected Data Platforms Integrate Cloud and Data Center Deployments
[Diagram labels: Cloud (edge data, data in motion, data at rest); Data Center (edge data, data in motion, data at rest); edge analytics, machine learning, stream analytics, deep historical analysis]

Thank You