MapR Streams A global pub-sub event streaming system for big data and IoT

Similar documents
Real World Use Cases: Hadoop & NoSQL in Production. Big Data Everywhere London 4 June 2015

Microsoft Azure Essentials

Cloud Based Analytics for SAP

Apache Kafka. A distributed publish-subscribe messaging system. Neha Narkhede, 11/11/11

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

Operational Hadoop and the Lambda Architecture for Streaming Data

Transforming IIoT Data into Opportunity with Data Torrent using Apache Apex

Hortonworks Powering the Future of Data

Hadoop in the Cloud. Ryan Lippert, Cloudera Product Cloudera, Inc. All rights reserved.

EVERYTHING YOU NEED TO KNOW ABOUT THE WSO2 ANALYTICS PLATFORM SENIOR SOFTWARE ENGINEER

Hybrid Data Management

This presentation is for informational purposes only and may not be incorporated into a contract or agreement.

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Real-time Streaming Insight & Time Series Data Analytic For Smart Retail

FLINK IN ZALANDO S WORLD OF MICROSERVICES JAVIER LOPEZ MIHAIL VIERU

Bringing Big Data to Life: Overcoming The Challenges of Legacy Data in Hadoop

Architected Blended Big Data With Pentaho. A Solution Brief

Architecture Overview for Data Analytics Deployments

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

INFOSYS REALTIME STREAMS

ORACLE COMMUNICATIONS UNIFIED COMMUNICATIONS SUITE

The Future of IoT. Don DeLoach Co-Chair, Midwest IoT Council

Edge Analytics for IoT Device Intelligence

More information for FREE VS ENTERPRISE LICENCE :

Data: Foundation Of Digital Transformation

Introducing Infor Xi/Ming.le for M3

Oracle Big Data Cloud Service

Data Center Operating System (DCOS) IBM Platform Solutions

Event Driven Architecture for Real-Time Analytics

Bringing the Power of SAS to Hadoop Title

SAP Leonardo (Internet of Things) Fixed Assets

SAP Big Data. Markus Tempel SAP Big Data and Cloud Analytics Services

Secure information access is critical & more complex than ever

ETL challenges on IOT projects. Pedro Martins Head of Implementation

THE MAGIC OF DATA INTEGRATION IN THE ENTERPRISE WITH TIPS AND TRICKS

Jason Virtue Business Intelligence Technical Professional

Turn Data into Business Value

Extending Enterprise to the Edge

The IBM Reference Architecture for Healthcare and Life Sciences

Financial Data Enterprise Platform

2014 SAP SE or an SAP affiliate company. All rights reserved. 1

The IoT Solutions Space: Edge-Computing IoT architecture, the FAR EDGE Project John Professor Athens Information

Streaming Analytics, Data Lakes and PI Integrators

Leveraging Oracle Big Data Discovery to Master CERN s Data. Manuel Martín Márquez Oracle Business Analytics Innovation 12 October- Stockholm, Sweden

Copyright - Diyotta, Inc. - All Rights Reserved. Page 2

Transforming Big Data to Business Benefits

Ticketing: How ACME s Cloud-Based Enterprise Platform Benefits Your Business

Simplifying Your Modern Data Architecture Footprint

Real-Time Streaming: IMS to Apache Kafka and Hadoop

Innovate with Oracle Public Cloud Platform & Infrastructure Services

An Enterprise-Grade Architecture for Salesforce Native Applications

20332B: Advanced Solutions of Microsoft SharePoint Server 2013

Sandeep Alur Architect Advisor Microsoft India Aditee Rele Architect Advisor Microsoft India

Boston Azure Cloud User Group. a journey of a thousand miles begins with a single step

Using Mesos Schedulers with Amazon EC2 Container Service

Cloud Customer Architecture for Hybrid Integration

EBOOK: Cloudwick Powering the Digital Enterprise

How to Build Your Data Ecosystem with Tableau on AWS

Intel IT Cloud 2012 and Beyond. Cathy Spence, Enterprise Architect Intel Information Technology April 2012

Cask Data Application Platform (CDAP) The Integrated Platform for Developers and Organizations to Build, Deploy, and Manage Data Applications

Deploying Microservices and Containers with Azure Container Service and DC/OS

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

SOLUTION BRIEF CA MANAGEMENT CLOUD FOR MOBILITY. Overview of CA Management Cloud for Mobility

INTRODUCING BIRST INFOR S GO-FORWARD BUSINESS INTELLIGENCE SOLUTION

Berkeley Data Analytics Stack (BDAS) Overview

Konica Minolta Business Innovation Center

Processing over a trillion events a day CASE STUDIES IN SCALING STREAM PROCESSING AT LINKEDIN

Erik Swanson Group Program Manager BI COE Microsoft Corporation Microsoft Corporation. All rights reserved

Machina Research White Paper for ABO DATA. Data aware platforms deliver a differentiated service in M2M, IoT and Big Data

IBM WebSphere Information Integrator Content Edition Version 8.2

IoT Enabled Technologies Delivering Actionable Insights for the Telecom Industry

TIBCO Live Datamart providing an operational command and control center in a virtual train application.

Beyond Ceilometer Metering and Billing

"Charting the Course... MOC A: Architecting Microsoft Azure Solutions. Course Summary

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN

An Enterprise Architect s Guide to API Integration for ESB and SOA

IBM Tivoli Workload Automation View, Control and Automate Composite Workloads

Transition to SOA. Oracle SOA Suite. Martin Jäkle Solution Architect TSBU Fusion Middleware Oracle Deutschland

OWB EE 11.2: From ETL to DI 16 November 2009 Antonio Romero, Senior Product Manager, Data Integration

RICHARD BEESON. OSIsoft

Middleware Modernization: lay the foundation to your digital success

Security intelligence for service providers

ARCHITECTURES ADVANCED ANALYTICS & IOT. Presented by: Orion Gebremedhin. Marc Lobree. Director of Technology, Data & Analytics

CloudSuite Corporate ebook

The ABCs of. CA Workload Automation

MiCloud Engage Contact Center

Microsoft FastTrack For Azure Service Level Description

Put Mobile First: With the IBM Mobile App Development Lifecycle. Ian Robinson Program Director IBM MobileFirst Platform & Analytics 10/07/2013

Technology Challenges for the Global Real-Time Enterprise.

WHITE PAPER SPLUNK SOFTWARE AS A SIEM

A Crucial Challenge for the Internet of Everything Era

AUTOMATING HEALTHCARE CLAIM PROCESSING

SCRIBE WHITE PAPER HOW SCRIBE ONLINE WORKS

Azure IoT Suite. Secure device connectivity and management. Data ingestion and command + control. Rich dashboards and visualizations

SCRIBE WHITE PAPER HOW SCRIBE ONLINE WORKS

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

Building a Data Lake with Spark and Cassandra Brendon Smith & Mayur Ladwa

POLOPOLY V9 TECHNICAL OVERVIEW. System Architecture Templates and Presentation Modules

Key Benefits. Overview. Field Service empowers companies to improve customer satisfaction, first time fix rates, and resource productivity.

Top 5 Challenges for Hadoop MapReduce in the Enterprise. Whitepaper - May /9/11

Transcription:

MapR Streams A global pub-sub event streaming system for big data and IoT Ben Sadeghi Data Scientist, APAC IDA Forum on IoT Jan 18, 2016 2015 MapR Technologies 2015 MapR Technologies

MapR Streams: Vision To enable continuous, globally scalable streaming of event data, allowing developers to create real-time applications that their business can depend on. Converged Continuous Global

Data Processing MapR Converged Data Platform Open Source Engines & Tools Commercial Engines & Applications Utility-Grade Platform Services Enterprise Storage Search & Others Database Cloud & Managed Services Event Streaming MapR-FS MapR-DB MapR Streams Custom Apps Unified Management and Monitoring Global Namespace High Availability Data Protection Self-healing Unified Security Real-time Multi-tenancy

Big Data is Generated One Event at a Time time : 6:01.103, event : RETWEET, location : lat : 40.712784, lon : -74.005941 time: 5:04.120, severity : CRITICAL, msg : Service down card_num : 1234, merchant : Apple, amount : 50

Batch Processing Has Many Use Cases Customer 360 Sentiment analysis Clickstream analysis Predictive maintenance Fraud detection Coupon offers Risk models

Real-time Processing is Complementary Trending now News feed Ops dashboards Failure alerts Breach detection Real-time fraud detection Real-time offers Push notifications

Streams Simplify Data Movement Streams Reliable publish/subscribe transport between sources and destinations. Filtering & Aggregation Alerting Processing

Legacy Systems: Message Queues IBM MQ, TIBCO, RabbitMQ Order Processing Front End Orders Order Processing Usage/Requirements Tight, transactional conversations between systems 1:1 or Few:Few Low data rates Mission-critical delivery Approach Queue-oriented design Each message replicated to N output queues Messages popped when read Scale-up, master/slave Doesn t Do High message rates (>100K/s) Slow consumers Queue replay/rewind

Evolving big data Event Streams: Distributed Logs Kafka, Hydra, DistributedLog Stream Processing DB DB_Changes Search/ EDW Usage/Requirements High throughput data transferred from decoupled systems Many -> 1 1 -> Many Different speeds Approach Log-oriented design Write messages to log files Consumers pull messages at their own pace Scale-out Doesn t Do Global applications Message persistence Integrated analytics (data movement required)

MapR: Rethinking a Platform for Event Streams Ad Impressions App Logs Sensor Data Big data scale and performance Global applications and data collection Multi-tenant and multi-application Secure Analytics-ready (no movement) Converged: no cluster sprawl Stream Processing Analytics

MapR Streams Converged, Continuous, Global 2015 2014 MapR MapR Technologies Technologies

MapR Streams: Global Publish-Subscribe Event Streaming System for Big Data Producers publish billions of messages/sec to a topic in a stream. Guaranteed, immediate delivery to all consumers. Tie together geo-dispersed clusters. Worldwide. Producers Stream To Topic c Consumers Standard real-time API (Kafka). Integrates with Spark Streaming, Storm, Apex, and Flink Batch analytics Remote sites and consumers Direct data access (OJAI API) from analytics frameworks.

MapR Streams - Converged, Continuous, Global Converged Converged platform with file storage and database OJAI API - Direct access from analytics tools Unified security framework with files and database tables Multi-tenant - topic isolation, quotas, data placement control Continuous Integrated with Spark Streaming, Flink, Apex, others Message persistence for up to infinite time span Guaranteed delivery (at least once) Consistent, synchronous replication & no single point of failure Global Native, global data and metadata replication with arbitrary topology Millions of streams, 100K topics/stream Billions of events per second Millions of producers & consumers

Global Provides Arbitrary topology of thousands of clusters Automatic loop prevention Producers DNS-based discovery Globally synchronized message offsets and consumer cursors Enables Global applications & data collection Producer & consumer failover Analysis/filtering/aggregation at the edge Consumers Occasional connections

Top Differentiators Single cluster for files, tables, and streams. Global, IoT-scale fabrics with failover. Data persistence and direct data access to batch processing frameworks. Converged Global MapR Streams Secure & Multi-Tenant Authentication, authorization, encryption. Unified policy with all other platform services. Tenant-owned streams, logical grouping of topics and messages.

Life Without a Converged Platform Sources Open Source Analytics (Hadoop, Spark) Apps Streaming Cluster (TIBCO, IBM, Kafka) Batch Loads Operational Cluster (HBase, Cassandra) Streaming Enterprise Storage (system of record) Real-time

Data Life With a Converged Platform Sources/Apps Bulk Processing Stream Processing Utility-Grade Platform Services Enterprise Storage Database Event Streaming MapR-FS MapR-DB MapR Streams Global Namespace High Availability Data Protection Self-healing Unified Security Real-time Multi-tenancy Only full-stack big data platform.

Part of a Converged Reference Architecture Source Capture Store Process Serve NFS MapR-FS Spark Flume ElasticSearch MapR-DB Drill Kibana Dashboard Streams Spark Streaming Ops Dashboard 2015 MapR Technologies 18

IoT Data Transport & Processing Business Results New revenue streams from collecting and processing data from things. Low response times by placing collection and processing near users. Local Collection, Filtering, Aggregation Why Streams IoT is event-based, and needs an event streaming architecture. Why MapR Converged platform gives single cluster, single security model for data in motion and at rest. Reliable global replication for distributed collection, analysis, and DR. Global Dashboards, Alerts, Processing 2015 MapR Technologies 19

Q & A Engage with us! @bensadeghi, @mapr maprtech mapr-technologies MapR bsadeghi@mapr.com maprtech