Apache Hadoop in the Datacenter and Cloud
The Shift to the Connected Data Architecture
  Digital transformation fueled by Big Data analytics and IoT
  Actionable intelligence: cloud and data center, data in motion and data at rest, powered by open source
  System-centric: IDMS, relational database; mainframe, client/server, web and SaaS
  User-centric: modern applications on a Connected Data Architecture
  Transformational use cases: predictive retail, factory automation, connected cars, predictive analytics, artificial intelligence
  Hortonworks Inc. 2011-2016. All Rights Reserved
Hadoop in the Data Center
  Create and manage central data lakes
  Support all types of data
  Provide flexible processing and access methods
  Reduce architecture costs by 80% or more
  Drive transformational new use cases
Hadoop in the Cloud
  Fast on-ramp for new users
  Elastic compute and storage capabilities
  Zero-configuration access engine capabilities (HDInsight)
  Eliminate hardware purchases
  Facilitate certain modern data applications through cloud connectivity
Transformational Applications Require Connected Data
  Analytics: edge analytics, machine learning, stream analytics, deep historical analysis
  Cloud: edge data, data in motion, data at rest
  Data center: data in motion, data at rest, edge data
Our Focus: Enable Modern Applications on Connected Data Platforms
  Continuous Insights: deliver insights from all data, origin to rest
  Enterprise Ready: management, security, governance
  Any Delivery Model: data center, cloud, hybrid
  Open Innovation: architecture, community, ecosystem
A Look at Hadoop in the Data Center
Actionable Intelligence from Connected Data Platforms
  Modern data applications need both:
  Data in motion: capturing perishable insights (Hortonworks DataFlow)
  Data at rest: ensuring rich, historical insights (Hortonworks Data Platform)
Hortonworks Data Platform for Data at Rest
  Powered by Open Enterprise Hadoop
  Open, Central, Interoperable, Ready
Hortonworks Data Platform 2.5 Highlights
  Dynamic security: Apache Atlas + Ranger integration
  Enterprise Spark at scale: Apache Zeppelin notebook for Spark
  Real-time applications: Storm and HBase/Phoenix
  Streamlined operations: Apache Ambari
  Interactive query in seconds: Hive with LLAP (Technical Preview)
Apache Atlas + Ranger: More Powerful Together
Introducing Tag-Based Security: Apache Atlas and Ranger Integration
  Basic tag policy: access and entitlements can be based on attributes. For example, Personally Identifiable Information (PII) is a tag that can be used to protect sensitive personal data.
  Geo-based policy: access policy based on location. For example, a user might be able to access data in North America but be restricted from access in EMEA due to privacy compliance.
  Time-based policy: access policy based on time windows. For example, a user might be able to access data only between 8 AM and 5 PM (common under SOX regulations).
  Prohibitions: restrictions on combining two data sets that are each compliant on their own but not when combined (for example, SSNs and names).
  Key benefits:
    New, scalable, metadata-based security paradigm
    Dynamic, real-time policy
    Automatic updates when metadata changes
    Centralized and simple-to-manage policy
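The policy types above can be illustrated with a minimal Python sketch. It is not the Ranger or Atlas API: the class names, fields, and the deny-by-default rule are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Tuple


@dataclass
class AccessRequest:
    """Hypothetical request shape; not a Ranger data structure."""
    user_groups: set      # groups the requesting user belongs to
    resource_tags: set    # tags attached to the resource, e.g. {"PII"}
    region: str           # where the data resides, e.g. "NA" or "EMEA"
    at: time              # local time of the request


@dataclass
class TagPolicy:
    """Hypothetical policy attached to one tag."""
    tag: str
    allowed_groups: set
    allowed_regions: set = field(default_factory=set)   # empty means any region
    window: Optional[Tuple[time, time]] = None           # None means any time

    def permits(self, req: AccessRequest) -> bool:
        # Basic tag policy: the user must be in an allowed group.
        if not (req.user_groups & self.allowed_groups):
            return False
        # Geo-based policy: restrict by data location when regions are listed.
        if self.allowed_regions and req.region not in self.allowed_regions:
            return False
        # Time-based policy: restrict to a time window when one is set.
        if self.window is not None:
            start, end = self.window
            if not (start <= req.at <= end):
                return False
        return True


def is_allowed(req: AccessRequest, policies: list) -> bool:
    """Assumed rule: every tag on the resource needs a permitting policy."""
    for tag in req.resource_tags:
        if not any(p.permits(req) for p in policies if p.tag == tag):
            return False
    return True


def violates_prohibition(combined_tags: set, prohibited_pairs: list) -> bool:
    """Prohibition: flag a join whose combined tags contain a forbidden pair."""
    return any(a in combined_tags and b in combined_tags
               for a, b in prohibited_pairs)
```

Because the policy is keyed on the tag rather than on a table or column, tagging a new dataset as PII brings it under the same rule without a policy change, which is the "automatic updates" benefit listed above. In this sketch an untagged resource is allowed by default; a real deployment would combine tag policies with resource policies.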
Apache Atlas Powers Cross-Component Data Lineage
  As part of HDP 2.5, users can track lineage across the following components using Atlas:
  Apache Sqoop: import from and export to relational databases, and an additional package that leverages Sqoop
  Hive: dataset lineage with entity versioning (including schema changes)
  Apache Kafka/Storm: IoT event-level processing, such as syslogs or sensor data
  Falcon: data lifecycle at the Feed and Process entity level for replication and repeating workflows. Tracks periodicity, throttling, and eviction. (ATLAS-69, FALCON-1570)
  Key benefits:
    Enterprises need open solutions, not a single-application vendor
    More native connectors than any other vendor
    Hardened metadata infrastructure
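Cross-component lineage can be pictured as a directed graph in which each process (a Sqoop import, a Hive query, a Storm topology, a Falcon replication) links a source dataset to a target dataset. The sketch below is a plain-Python illustration of that idea, not the Atlas API; the dataset and process names are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: (source dataset, process, target dataset).
EDGES = [
    ("mysql.orders", "sqoop_import", "hive.orders_raw"),
    ("kafka.sensor_events", "storm_topology", "hive.sensor_raw"),
    ("hive.orders_raw", "hive_ctas", "hive.orders_clean"),
    ("hive.orders_clean", "falcon_replication", "hive_dr.orders_clean"),
]


def upstream(dataset, edges):
    """Return every dataset that feeds `dataset`, nearest ancestors first."""
    # Index the edges by target so each lookup finds the direct parents.
    parents = {}
    for src, _process, dst in edges:
        parents.setdefault(dst, []).append(src)

    # Breadth-first walk backwards through the graph.
    seen, queue = [], deque([dataset])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


print(upstream("hive_dr.orders_clean", EDGES))
```

Walking the graph backwards from the replicated table reaches the original relational source, which is the kind of question a lineage view is meant to answer.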
Expanded Native Connector: Dataset Lineage
  Diagram components: Teradata Connector, Apache Kafka, RDBMS, Sqoop, Custom Activity Reporter, Metadata Repository
Apache Atlas Enables Business Catalog for Ease of Use
  Organize data assets along business terms
    Authoritative: hierarchical business taxonomy creation
    Agile modeling: model conceptual, logical, and physical assets
    Definition and assignment of tags like PII (Personally Identifiable Information)
  Comprehensive features for compliance
    Multiple user profiles, including Data Steward and Business Analyst
    Object auditing to track who did it
    Metadata versioning to track what they did
  Faster insight
    Data Quality tab for profiling and sampling
    User comments
  Key benefits:
    Easy way to create a business taxonomy
    Useful for multiple user types, including Data Stewards and Business Analysts
    Comprehensive features for compliance
Business Catalog
  Model and explore metadata via the new Business Catalog in Apache Atlas
  Data Steward
Streamlining Operations: Three-Phase Plan
  Focused strategic investments in our core products give customers more unique tooling to quickly understand the cluster's health, how business users are using it, and where to focus efforts when issues arise.
  Capabilities
    Phase 1: Advanced performance and health metrics dashboards, with Ambari 2.2.2
    Phase 2: Consolidated cluster activity reporting (new), with SmartSense 1.3.0
    Phase 3: Centralized and contextual log search (Tech Preview), with Ambari 2.4.0
  Core technologies: Apache Ambari, Ambari Metrics System, Apache Solr, Hortonworks SmartSense, Grafana
  Diagram: Ambari with Ambari Metrics System, Grafana, Solr, Log Search, dedicated UIs, and SmartSense
Streamlined Operations Phase 1: Advanced Metrics Visualization & Dashboarding
  Goal: quickly understand cluster health metrics and key performance indicators
  Capabilities
    Centralized dashboarding focused on component health and performance
    Ad hoc graph creation
    Pre-built dashboards: HDFS, YARN, HBase
  Core technologies: Ambari Metrics System, Grafana
Ambari now includes pre-built dashboards for visualizing cluster health
Streamlined Operations Phase 2: Consolidated Cluster Activity Reporting
  Goal: quickly visualize and report on how business users and tenants are using the cluster, including the top 10 queues, users, and most time-consuming jobs
  Capabilities
    Top-K activity reporting
    Chargeback
  Services covered: YARN, MapReduce, Hive/Tez, Spark, HDFS
  Core technologies: Hortonworks SmartSense, Apache Zeppelin
  Diagram: Ambari, Ambari Metrics System, SmartSense, Apache Zeppelin
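Top-K activity reporting amounts to aggregating job records by a dimension (user, queue) and ranking the totals. The sketch below shows that aggregation in plain Python. It does not use SmartSense, and the record fields (`user`, `queue`, `runtime_s`) are hypothetical.

```python
from collections import Counter


def top_k(jobs, key, k=10):
    """Rank values of `key` (e.g. "user" or "queue") by total job runtime."""
    totals = Counter()
    for job in jobs:
        totals[job[key]] += job["runtime_s"]
    # most_common returns (value, total) pairs, largest total first.
    return totals.most_common(k)


# Hypothetical job history records.
JOBS = [
    {"user": "alice", "queue": "etl", "runtime_s": 120},
    {"user": "bob", "queue": "adhoc", "runtime_s": 30},
    {"user": "alice", "queue": "adhoc", "runtime_s": 60},
    {"user": "carol", "queue": "etl", "runtime_s": 90},
]

print(top_k(JOBS, "user", k=2))
print(top_k(JOBS, "queue", k=1))
```

The same totals could feed a chargeback report by multiplying each tenant's share of runtime by an agreed rate.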
Activity Explorer: Cluster Utilization Reporting
Preview: Streamlined Operations Investments Phase 3: Centralized & Contextual Log Search
  Goal: when issues arise, quickly find them across all HDP components
  Capabilities
    Rapid search of all HDP component logs
    Search across time ranges, log levels, and keywords
  Core technologies: Apache Ambari, Apache Solr, Apache Ambari Log Search
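The three search dimensions named above (time range, log level, keyword) can be sketched as a simple in-memory filter. This is an illustration only: Ambari Log Search indexes logs in Solr, and the record fields and function name here are hypothetical.

```python
from datetime import datetime


def search_logs(records, start, end, levels=None, keyword=None):
    """Return records inside [start, end] that match the level and keyword filters."""
    hits = []
    for rec in records:
        if not (start <= rec["ts"] <= end):
            continue  # outside the requested time range
        if levels and rec["level"] not in levels:
            continue  # level filter was given and this record does not match
        if keyword and keyword.lower() not in rec["message"].lower():
            continue  # case-insensitive keyword match failed
        hits.append(rec)
    return hits


# Hypothetical log records from two components.
LOGS = [
    {"ts": datetime(2016, 9, 1, 10, 0), "component": "hdfs",
     "level": "ERROR", "message": "DataNode connection timeout"},
    {"ts": datetime(2016, 9, 1, 10, 5), "component": "yarn",
     "level": "INFO", "message": "Container launched"},
    {"ts": datetime(2016, 9, 1, 12, 0), "component": "yarn",
     "level": "ERROR", "message": "Container killed: timeout"},
]

errors = search_logs(LOGS, datetime(2016, 9, 1, 9, 0), datetime(2016, 9, 1, 11, 0),
                     levels={"ERROR"}, keyword="timeout")
print(errors)
```

A search index serves the same query without scanning every record, which is what makes the search rapid across all component logs.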
Tune the log collection system with Guided Smart Configurations
View a comprehensive inventory of operational logs for each host
Hive 2 with LLAP: Enable Interactive Query in Seconds
  Developer productivity: interactive query in seconds
  Ease of use and adoption: 100% compatible with Hive SQL
  Enterprise readiness: linear scaling at terabyte data volumes
  Streamlined operations: LLAP integration with Ambari, with automated dashboards
Hive 2 with LLAP: Preliminary Numbers
  [Chart: Hive 2.0 and LLAP, TPC-DS at 10 TB scale, 18 nodes. Legend as extracted: Hive2.0 Tez LLAP. Queries shown: q3, q7, q12, q13, q19, q21, q26, q27, q42, q43, q45, q52, q55, q60, q73, q84, q89, q91, q98. Vertical axis from 0 to 80.]
  Minimum query time: query 55 at 2.38 s
A Look at Hadoop in the Cloud
Traditional Hadoop Clusters
Why Cloud?
  IT and business agility
  No upfront hardware costs
  Ephemeral and long-running workloads
  Unlimited elastic scale
How Do We Approach the Cloud Market?
  Hybrid segment: today's enterprise customers
    Seamless Connected Data Architecture across cloud and data center. Always-on enterprise use cases are common.
    Azure HDInsight, HDP, and HDF are our premier offerings.
    Customer journey to a future-state architecture, cloud operation, and consumption model.
  Cloud on-ramp: new users via digital engagement, or existing customers exploring cloud options
    Elasticity, automation, pay-as-you-go, one-click start. Ephemeral use cases are a common starting point.
    Azure HDInsight is our premier offering.
    Focused offerings for AWS that enable us to engage and position our premier offerings.
  Cloud-first approach to product design, development, testing, and delivery
Outlook: Cloud and the Big Data Market
  Public cloud adoption (AWS, Azure, Google) will continue to accelerate
  Many customers will go cloud-first to simplify and speed adoption
  Customers deploying in the public cloud expect a pay-as-you-go (PAYG) pricing model
    Hourly pricing is the default; reserved pricing optimizes annual spend; spot pricing optimizes hourly spend
  Customers are interested in running workloads in the cloud in addition to on-premises clusters, and are familiar with native cloud tooling
  This heightens the importance of product packaging and a user experience tuned to the cloud
Cloud IaaS and Hadoop as a Service
  Running Hadoop on cloud IaaS
  Using Hadoop as a cloud service
  Public cloud service providers
Microsoft Azure HDInsight: Powered by Hortonworks Data Platform
  Seamless access to the public cloud for Spark, Hive, HBase, and other mission-critical workloads
  Unmatched economics, combining HDInsight's elasticity in the cloud with HDP's cost efficiencies at scale
  Enterprise readiness, with robust security, governance, and operations in the cloud, powered by Hortonworks Data Platform
Connected Data Architecture with Azure HDInsight
  Cloud: Azure HDInsight (cloud data processing)
  HDF: data flow management
  Data center: HDP (enterprise data lake)
  HDInsight cluster types and ideal use cases
    Data prep, query, and analysis (Hadoop, Hive, Pig)
    Iterative in-memory analysis (Spark)
    Advanced statistics, modeling, machine learning (R Server on Spark)
    NoSQL data storage (HBase)
    Real-time event processing (Storm)
Runs in more datacenters than anyone else
  Central US (Iowa)
  North Central US (Illinois)
  South Central US (Texas)
  West US (California)
  East US (Virginia)
  East US 2 (Virginia)
  Brazil South (Sao Paulo State)
  North Europe (Ireland)
  West Europe (Netherlands)
  India Central (Pune)
  China North * (Beijing)
  China South * (Shanghai)
  Japan East (Tokyo, Saitama)
  Japan West (Osaka)
  East Asia (Hong Kong)
  SE Asia (Singapore)
  Australia East (New South Wales)
  Australia South East (Victoria)
  Azure is doubling compute and storage every 6 months
Microsoft Azure HDInsight and Apache Projects in the Cloud
  Standard Hadoop projects: Hive, YARN, HDFS, MapReduce, Pig, Tez, Sqoop, Oozie, ZooKeeper, Mahout, Phoenix
  Comprehensive list of emerging projects: Spark, Storm, HBase, and R
  Ability to add projects: add various projects to the cloud
  Diagram labels: YARN data operating system; batch, interactive, streaming, machine learning, search; storage, governance, operations, security
Forrester Wave: Big Data Hadoop Cloud Solutions, Q2 2016
  Elasticity, Automation, And Pay-As-You-Go Compel Enterprise Adoption Of Hadoop In The Cloud
Connected Data Architecture with HDC for AWS (Tech Preview)
  Cloud: HDC for AWS (cloud data processing)
  HDF: data flow management
  Data center: HDP (enterprise data lake)
  Ideal use cases
    Data science and exploration (Spark, Zeppelin)
    ETL and data preparation (Hive, Spark)
    Analytics and reporting (Hive 2 with LLAP, Zeppelin)
Hortonworks Data Cloud for AWS Cluster Types (Tech Preview)
Prescriptive On-Demand Ephemeral Workloads (Tech Preview)
  ** Planned list of available cluster types
Why Hortonworks Cloud Solutions?
  Choice of cloud
  Rich set of capabilities and security
  Zero-configuration access engine capabilities (HDInsight)
  S3 integrations on AWS (Tech Preview)
  Award-winning Hadoop expertise
Connected Data Platforms Integrate Cloud and Data Center Deployments
  Analytics: edge analytics, machine learning, stream analytics, deep historical analysis
  Cloud: edge data, data in motion, data at rest
  Data center: data in motion, data at rest, edge data
Thank You