Operational Hadoop and the Lambda Architecture for Streaming Data

2015 MapR Technologies

Topics
- From Batch to Operational Workloads on Hadoop
- Streaming Data Environments
- The Lambda Architecture: the What and Why
- Example Deployments

Review: Apache Hadoop Does This Big Compute Task

Scaling on Hadoop
- Data volume, velocity (low to high): scale out with commodity hardware
- Data variety (low to high): use the right tool for unstructured, multi-structured, semi-structured, non-relational data

Scaling on Traditional Technologies
- Data volume, velocity (low to high): scale up to bigger, faster machines
- Data variety (low to high): extensive data modeling and ETL

Batch versus Operational
- Batch is a built-in delay. You might have to wait for the next window.
- Operational is an overloaded term: Real-time? Business operations? Interactive? Industrial operations?
- In this context: more this than this [the slide's pointers to the terms above are not preserved]

Examples of Operational Workloads
- Immediate reads/writes/updates to live data
- Immediate access to data
- Real-time analysis (operational reporting)
Let's separate batch from operational in common Hadoop use cases.

Common Uses for Hadoop
- Enterprise Data Hub (or "Data Lake")
- Anomaly Detection
- Internet of Things
- Customer 360 View
- Fraud Detection
- Recommendation Engine

Enterprise Data Hub Architecture
[Diagram. Steps: load more data sources, enrich data in Hadoop, analyze. Sources: relational, SaaS, mainframe; documents, emails; blogs, tweets, link data; log files, clickstreams. Ingest paths: load; replicate, CDC; streaming. Processing: parse, profile, ETL; cleanse, match. Hadoop components: distributed file system; batch processing (MR, YARN, Hive, Pig, etc.); HBase and other data stores; interactive querying (Drill, Impala, Presto, etc.). Consumers: graphical user interfaces, data marts, data warehouse, BI reports and applications. Also labeled: offload / enrich / reload.]

Example Data Flow within Hadoop
[Diagram: data sets within Hadoop: canonical, immutable data; transformed data; snapshot data; analyst data]

Lambda Architecture
[Diagram: a new data stream feeds two layers. Batch layer: all data, batch recompute, precompute views, batch views. Speed layer: real-time data, process stream, real-time increment, increment views, real-time views. Serving layer: merge batch views and real-time views into a merged view.]

Why the Lambda Architecture?
- Explained in the "How to Beat the CAP Theorem" blog post by Nathan Marz
- What is the CAP Theorem? You have to compromise with distributed systems
  - Consistency: data in is the same as data out (every read reflects the latest write)
  - Availability: the system can always respond with an answer
  - Partition tolerance: the system still runs if nodes lose a connection
  - You get 2 out of 3 (actually, either C or A, because partitions cannot be ruled out in a distributed system)
- The Lambda Architecture can be about more than beating CAP
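
The "either C or A" choice can be illustrated with a toy Python sketch. It is not from the slides: the `Replica` class, the `Unavailable` exception, and the "CP"/"AP" mode labels are hypothetical names, and a single boolean flag stands in for a real network partition. While partitioned, a replica either refuses to answer (favoring consistency) or answers with a value that may be stale (favoring availability).

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale read."""


class Replica:
    def __init__(self, mode: str, value: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode          # hypothetical label for the trade-off chosen
        self.value = value        # last value this replica has seen
        self.partitioned = False  # True when cut off from its peers

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Consistency first: no answer is better than a possibly stale one.
            raise Unavailable("cannot confirm the latest write during a partition")
        # Availability first (or no partition): answer with the local value.
        return self.value


def try_read(replica: Replica) -> str:
    try:
        return replica.read()
    except Unavailable:
        return "unavailable"


cp, ap = Replica("CP", "v1"), Replica("AP", "v1")
cp.partitioned = ap.partitioned = True  # peers may already hold a newer write
cp_result, ap_result = try_read(cp), try_read(ap)
print(cp_result, ap_result)
```

The CP replica gives up availability during the partition, while the AP replica keeps answering at the risk of returning an outdated value.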

Key Principles of the Lambda Architecture
- Store an immutable, constantly growing data set
  - Only add new data points
- Recompute queries from raw data to avoid complexity
- Use incremental algorithms to reduce query latency
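
These principles can be sketched in a few lines of Python. The names (`MasterDataset`, `recompute_mean`, `RunningMean`) and the sample readings are hypothetical and not taken from the slides. The sketch only shows that an append-only data set lets a query be recomputed from raw data at any time, while an incremental algorithm keeps the same answer current without rescanning everything.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MasterDataset:
    """Append-only store: records are added, never updated or deleted."""
    _records: List[Tuple[int, float]] = field(default_factory=list)

    def append(self, timestamp: int, value: float) -> None:
        self._records.append((timestamp, value))

    def scan(self) -> Tuple[Tuple[int, float], ...]:
        # Return an immutable copy so callers cannot rewrite history.
        return tuple(self._records)


def recompute_mean(dataset: MasterDataset) -> float:
    """Batch style: derive the answer from all raw data, from scratch."""
    values = [value for _, value in dataset.scan()]
    return sum(values) / len(values)


@dataclass
class RunningMean:
    """Incremental style: update the answer once per new data point."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.total += value
        return self.total / self.count


dataset = MasterDataset()
running = RunningMean()
for ts, reading in enumerate([10.0, 12.0, 14.0]):
    dataset.append(ts, reading)       # only ever add new data points
    latest = running.update(reading)  # low-latency incremental answer

print(recompute_mean(dataset), latest)
```

If the incremental logic ever turns out to be wrong, the raw data is still intact, so the answer can be rebuilt with `recompute_mean`.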

Lambda Architecture Batch Layer
[Diagram: the new data stream is added to all data; a batch recompute over all data precomputes the batch views]

Lambda Architecture Speed Layer
[Diagram: the new data stream is processed as real-time data; each real-time increment updates the increment views, producing the real-time views]
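
A minimal sketch of a speed layer in Python follows. The class name `SpeedLayer`, the `(timestamp, key)` event shape, and the `expire_through` method are assumptions made for illustration. The expiry step, in which the speed layer drops events once a batch run has covered them, follows the usual description of the Lambda Architecture rather than anything stated on the slide.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, key); hypothetical event shape


class SpeedLayer:
    """Keeps an incremental view of events the batch layer has not yet absorbed."""

    def __init__(self) -> None:
        self._recent: List[Event] = []
        self.view: Counter = Counter()

    def process(self, event: Event) -> None:
        # Real-time increment: update the view for one event, no full recompute.
        self._recent.append(event)
        self.view[event[1]] += 1

    def expire_through(self, batch_horizon: int) -> None:
        # Once a batch run covers events up to batch_horizon, drop them here
        # and rebuild the small real-time view from what remains.
        self._recent = [e for e in self._recent if e[0] > batch_horizon]
        self.view = Counter(key for _, key in self._recent)


speed = SpeedLayer()
for event in [(1, "pump-a"), (2, "pump-b"), (3, "pump-a")]:
    speed.process(event)

speed.expire_through(batch_horizon=1)  # the batch views now include timestamp 1
print(dict(speed.view))
```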

Lambda Architecture Serving Layer
[Diagram: the serving layer merges the batch views with the real-time views built from real-time data, producing the merged view]
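
The merge step can be sketched as follows, assuming both views hold additive event counts per key and cover non-overlapping time ranges. The function name `merged_view`, the keys, and the numbers are hypothetical. Views holding other kinds of aggregates would need a different merge rule.

```python
from collections import Counter
from typing import Mapping


def merged_view(batch_view: Mapping[str, int],
                realtime_view: Mapping[str, int]) -> Counter:
    """Combine precomputed batch counts with the speed layer's recent counts."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts key by key
    return merged


batch_view = {"pump-a": 120, "pump-b": 95}  # from the last batch recompute
realtime_view = {"pump-a": 3, "pump-c": 1}  # events since that batch run
result = merged_view(batch_view, realtime_view)
print(dict(result))
```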

Lambda Architecture
[Diagram: the same architecture with example technologies labeled. New data stream: Kafka. Hadoop batch layer: all data (HDFS), batch recompute, precompute views (Hive), batch views. Speed layer: Storm processes the stream, real-time increment, increment views, real-time views. Serving layer: merge into the merged view (HBase).]

Example Use Case: Predictive Maintenance
- Data and use case based on one of our customers
  - Time series sensor data from oil well pumps
  - Transactions and predictive analytics to predict imminent pump failure
- Data characteristics
  - Location, time stamp, electrical current, displacement, flow, sediment PPM, etc.
  - High volume, high velocity
  - Higher resolution (measurements at a higher frequency) enables more accurate analytics

Example Use Case: Predictive Maintenance (cont.)
- Sample queries and analysis
  - Dashboard of state of oil field
  - Failed pump measurement pattern
  - Imminently failing measurement pattern
  - Drill-down navigation
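
The slides do not say how an "imminently failing" pattern is detected. As a purely illustrative assumption, the Python sketch below flags a pump when the rolling mean of its recent electrical-current readings exceeds a fixed limit. The function name, window size, limit, and readings are all hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Deque, Iterable, List


def flag_imminent_failure(currents: Iterable[float],
                          window: int = 3,
                          limit: float = 15.0) -> List[int]:
    """Return the indices at which the rolling mean of the last `window`
    readings exceeds `limit` (a stand-in for a real failure model)."""
    recent: Deque[float] = deque(maxlen=window)
    flagged: List[int] = []
    for i, amps in enumerate(currents):
        recent.append(amps)  # the oldest reading falls out automatically
        if len(recent) == window and mean(recent) > limit:
            flagged.append(i)
    return flagged


readings = [10.0, 11.0, 10.5, 14.0, 18.0, 21.0]  # made-up current readings
alerts = flag_imminent_failure(readings)
print(alerts)
```

Because the check needs only the last few readings, it fits the incremental style of the speed layer, while a model trained on the full history would belong in the batch layer.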

What about Historical Analytics on Streaming Data?
- Need static view of data
  - Data can't be moving as you're building your model
- Need repeatable view of data
  - What if model is wrong? How can you fix it if data has changed?
- Need comparisons for better models
  - Alternate models should be run on same data
- Validate legitimacy for future data states
  - Compare to previous state
- Check out the different snapshot implementations on Hadoop
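
The need for a static, repeatable view can be sketched in Python with an in-memory stand-in for a snapshot. The `AppendOnlyLog` class, the two toy models, and the data are hypothetical. Real snapshot implementations on Hadoop work at the file-system level and are not modeled here.

```python
from typing import Callable, List, Sequence, Tuple


class AppendOnlyLog:
    """Toy stand-in for a constantly growing data set."""

    def __init__(self) -> None:
        self._rows: List[float] = []

    def append(self, value: float) -> None:
        self._rows.append(value)

    def snapshot(self) -> Tuple[float, ...]:
        # A frozen copy as of now; later appends do not change it.
        return tuple(self._rows)


def mean_abs_error(model: Callable[[float], float],
                   data: Sequence[float]) -> float:
    """Score a model that predicts each value from the previous one."""
    errors = [abs(model(prev) - actual) for prev, actual in zip(data, data[1:])]
    return sum(errors) / len(errors)


log = AppendOnlyLog()
for value in [1.0, 2.0, 3.0, 4.0]:
    log.append(value)

snap = log.snapshot()
log.append(100.0)  # the stream keeps moving; the snapshot does not

# Two alternate models compared on exactly the same data.
scores = {
    "persistence": mean_abs_error(lambda prev: prev, snap),
    "trend": mean_abs_error(lambda prev: prev + 1.0, snap),
}
print(scores)
```

Rerunning either model against `snap` gives the same score every time, which is what makes the comparison between models fair.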

Managed Security Service Provider
- Customer data, network security event data
- Anomaly detection on large volumes of security event data, analytics on customer data to enable incremental sales

Telecommunications Company
- Customer profile data, customer behavior data
- Analytics on customer behavior for better recommendations

Q & A
Engage with us! @mapr, maprtech, mapr-technologies, MapR, dalekim@mapr.com, maprtech