REDEFINE BIG DATA. Zvi Brunner CTO. Copyright 2015 EMC Corporation. All rights reserved.

2020: A NEW DIGITAL WORLD. 30B devices, 7B people, millions of new businesses. Source: Gartner Group, 2014.

DIGITIZATION IS ALREADY BEGINNING: precision farming; a dress that displays how we feel; a thermostat that knows you're away; glasses that direct us where to go; a contact lens that controls blood sugar; a fitness band that measures activity level; drones that deliver our groceries.

THE DATA DIVIDE: BIG DATA CHASM. 70% of data generated by customers; 80% of data stored; 3% prepared for analysis; 0.5% being analyzed; <0.5% being operationalized.

FIVE TRENDS ENABLING BIG DATA: Data Growth; Limitless Compute; DevOps; Cheap Storage; Real-time Technologies.

Big Data Is Not Just a Lot of Data. Before: summary, limited data; backward looking; pre-planned; reports & meetings. With big data: expansive, full data sets; predicting the future; iterative, agile; applications & products.

STEPS TO HARNESS BIG DATA: Build New Applications, Products, & Business Models; Leverage New Analytics To Predict The Future; Gather as Much Data As Possible.

BRINGING IT ALL TOGETHER IS HARD: MAP ANALYTICS USE CASE, ANALYTICS PLATFORMS, & STORAGE. Use cases: Customer Sentiment Analysis; Product Performance & Reliability; Supply Chain Optimization; Product Recommendation Engine; Competitive War Games.

HOW CAN WE SIMPLIFY THIS?

EMC BUSINESS DATA LAKE: THE INDUSTRY'S FIRST FULLY-ENGINEERED, ENTERPRISE-GRADE DATA LAKE SOLUTION

WHAT IS A DATA LAKE? JUST COLLECTING AND ANALYZING DATA IS NOT A DATA LAKE. A Data Lake is a complex ecosystem of tools. Security isn't secondary, it's core functionality. Data is discoverable and its usage is traceable. It should provide access to all business users. It should follow business rules and policies.

EVOLVE TO THE DATA LAKE: CHOICE, EXTENSIBLE ECOSYSTEM. Data Lake Foundations. EMC ANALYTICS STARTER PLATFORM.

RELIABLE INFRASTRUCTURE: SCALABILITY. Must be able to handle more traffic demand at any time. Easily process big data workloads.

EMC BUSINESS DATA LAKE PLATFORM (architecture diagram; labels listed in extraction order): Data Services; Data & Analytics Catalog; Management; Data Manager; Analytics Toolbox; Data Governor (third-party applications); Advanced Analytics; Pivotal Big Data Suite; Greenplum Database; HAWQ; Apps at Scale; Redis; GemFire; Ingest; Data Processing; Spring XD; Spark; Pivotal HD; RabbitMQ; BDS on Pivotal; Policy Mgmt; Hadoop; Index & Search; Open Data Platform; Security & Access Control; Virtualization; Pivotal Cloud Foundry; EMC II Storage. DATA LAKE FOUNDATION: Isilon; ECS; VCE Vblock; XtremIO.

FOCUS ON BUSINESS OUTCOMES, NOT ON LOW LEVEL TECHNOLOGY DECISIONS. INGEST: Capture data from a wide range of sources, traditional and new. STORE: Store everything in one environment for cross data analysis. ANALYZE: Use advanced algorithms to discover new, predictive patterns. SURFACE: Share insights with business domain experts. ACT: Build data-driven applications to meet business needs.

WHAT IS BIG DATA ANALYTICS? (Chart: Value of Analytics ($) versus Complexity.) Descriptive Analytics: WHAT HAPPENED? Diagnostic Analytics: WHY DID IT HAPPEN? Predictive Analytics: WHAT WILL HAPPEN? Prescriptive Analytics: HOW CAN WE MAKE IT HAPPEN?
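
The four analytics levels named on this slide (descriptive, diagnostic, predictive, prescriptive) can be illustrated with a toy example. The sketch below is not from the presentation: the data (weekly ad spend versus units sold), the function names, and the choice of a simple least-squares line are all assumptions made for illustration, and the fitted slope only suggests an association, not a cause.

```python
from statistics import mean

# Hypothetical data, invented for this sketch: weekly ad spend and units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.0, 16.0, 18.0, 20.0]


def describe(ys):
    """Descriptive: what happened? Summarize the observed values."""
    return {"mean": mean(ys), "min": min(ys), "max": max(ys)}


def fit_line(xs, ys):
    """Diagnostic (crude): how does y move with x? Least-squares slope and intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(xs, ys, new_x):
    """Predictive: what will happen at an input we have not seen yet?"""
    slope, intercept = fit_line(xs, ys)
    return slope * new_x + intercept


def prescribe(xs, ys, target_y):
    """Prescriptive: which input would the fitted line need to reach a target?"""
    slope, intercept = fit_line(xs, ys)
    return (target_y - intercept) / slope


if __name__ == "__main__":
    print(describe(units))
    print(fit_line(spend, units))
    print(predict(spend, units, 6.0))
    print(prescribe(spend, units, 30.0))
```

Each step reuses the one before it, which mirrors the slide's point that value and complexity rise together as analytics moves from describing the past to recommending an action.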

ADDRESS GAPS IN A TYPICAL DATA LAKE: KEY CAPABILITIES. FAST & EASY DEPLOYMENT in as little as one week versus months; SEMANTIC CONSISTENCY and governed metadata; SECURITY: access control and governance; AUTOMATIC instantiation of data, analytics and applications; SELF-SERVICE for all of the various users across the organization.

PIVOTAL BIG DATA SUITE

WORLD'S FIRST OPEN SOURCED BIG DATA PORTFOLIO: BUILDING ON SUCCESS OF CLOUD FOUNDRY FOUNDATION. Open sourcing all Pivotal Big Data Suite components, including: Pivotal GemFire (Apache Geode); Pivotal Greenplum Database; Pivotal HDB (Apache HAWQ). BUILT FOR ENTERPRISES.

OPEN: Common core for Hadoop ecosystem. Rapidly accelerated certifications, ecosystem development and enterprise-grade quality. OpenDataPlatform.org

DATA-DRIVEN ENTERPRISE JOURNEY. STORE: Structured; Unstructured; High Volume; High Velocity. ANALYZE: Predictive Analytics; Machine Learning; Advanced Data Science; Realtime Analytics. DEVELOP: Advanced Analytic Pipelines; Realtime Analytical Applications; Global Scale Data-Driven Applications; Enterprise, Consumer, and Mobile. INNOVATE: Agile Dev Expertise; DevOps; Microservices; Continuous Delivery; Closed Loop Applications. BIG DATA; PREDICTIVE ANALYTICS; CLOUD NATIVE PLATFORM; AGILE DEVELOPMENT.

DATA-DRIVEN ENTERPRISE JOURNEY WITH PIVOTAL BIG DATA SUITE. STORE (Data Engineering): Structured; Unstructured; High Volume; High Velocity. ANALYZE (Data Science): Predictive Analytics; Machine Learning; Advanced Data Science; Realtime Analytics. DEVELOP: Advanced Analytic Pipelines; Realtime Analytical Applications; Global Scale Data-Driven Applications; Enterprise, Consumer, IoT, and Mobile. INNOVATE: Agile Dev Expertise; DevOps; Pivotal Labs; Microservices; Continuous Delivery; Closed Loop Applications. Products shown: Spring XD; Spring Cloud; Spark; Pivotal HD & Open Data Platform; Pivotal Greenplum Database; Pivotal HDB; Pivotal GemFire; RabbitMQ; Pivotal BDS on PCF; Pivotal Cloud Foundry. BIG DATA; PREDICTIVE ANALYTICS; CLOUD NATIVE PLATFORM; AGILE DEVELOPMENT.

INTERNET OF THINGS IN MANUFACTURING: A pipeline of sensors and opportunities for optimizing output. Process stages: Input materials; Mix; Incubate; Filter; Centrifuge; Final Product. (Figure: sensor readings for temperature, absorbance, and velocity plotted over time, with elution volume; callouts for automated raw materials mixing and high-content screens.)

ADVANCED ANALYTICS: REQUIREMENTS AND BENEFITS. Massive stream processing: Internet of Things use cases; rapid time to insights. SQL-compliant batch and interactive queries: leverage existing skills and tools; rapid time to insights. Machine learning and advanced analytics: solve business problems; predictive insights; proactive execution.

TO PRO-ACTIVE, SELF-IMPROVING, MACHINE LEARNING SYSTEMS. Data Stream Pipeline: Multiple Data Sources. In-Memory Real-Time Data: Real-Time Processing. Data Lake (HDFS): Store Everything. Expert System / Machine Learning: Continuous Learning; Continuous Improvement; Continuous Adapting.

DATA STREAM NEEDS AN AGILE, SCALABLE AND FAST SOLUTION. Spring XD: Ingest; Transform; Sink. Targets shown: GemFire; Data Lake; HAWQ; GPDB.
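
The ingest, transform, and sink stages named on this slide can be sketched as a chain of Python generators. This is only a conceptual illustration and does not use Spring XD or its DSL. The event fields (`sensor`, `temp`), the Celsius-to-Fahrenheit conversion, the `min_temp` threshold, and the in-memory list standing in for a store such as GemFire or HAWQ are all assumptions made for the example.

```python
import json
from typing import Dict, Iterable, Iterator, List


def ingest(lines: Iterable[str]) -> Iterator[Dict]:
    """Ingest: parse raw JSON lines, skipping malformed or incomplete records."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and "sensor" in event and "temp" in event:
            yield event


def transform(events: Iterable[Dict], min_temp: float) -> Iterator[Dict]:
    """Transform: keep readings at or above min_temp and convert them to Fahrenheit."""
    for event in events:
        if event["temp"] >= min_temp:
            yield {"sensor": event["sensor"], "temp_f": event["temp"] * 9 / 5 + 32}


def sink(events: Iterable[Dict], store: List[Dict]) -> int:
    """Sink: append each event to the target store and return how many were written."""
    count = 0
    for event in events:
        store.append(event)
        count += 1
    return count


def run_pipeline(lines: Iterable[str], store: List[Dict], min_temp: float = 0.0) -> int:
    """Wire the three stages together; events flow through one at a time."""
    return sink(transform(ingest(lines), min_temp), store)


if __name__ == "__main__":
    raw = [
        '{"sensor": "a", "temp": 30}',
        "not json",
        '{"sensor": "b", "temp": 10}',
    ]
    target: List[Dict] = []
    written = run_pipeline(raw, target, min_temp=20)
    print(written, target)
```

Because each stage is a generator, records stream through without being buffered in full, which is the property a real stream pipeline relies on at much larger scale.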

SPRING XD: State of the Art Data Pipeline Automation. No coding required. INGEST / SINK: Dozens of built-in connectors; seamless integration with Kafka, Sqoop; create new connectors easily using Spring. PROCESS: Call Spark, Reactor or RxJava; built-in configurable filtering, splitting and transformation; out-of-box configurable jobs for batch processing. ANALYZE: Import and invoke PMML jobs easily; call Python, R, MADlib and other tools; built-in configurable counters and gauges.

DELIVER A SEAMLESS EXPERIENCE FOR EVERYONE: BIG DATA IS A TEAM SPORT. Business Users; Business Analysts; Application Developers; Data Programmers; Infrastructure Developers; Data Scientists.

PIVOTAL HDB: Hadoop Native SQL. Exceptional Hadoop native SQL performance. No compatibility risks to SQL developers or SQL BI tools and applications. Support for query roll-ups, dynamic partitions and joins. Massive MPP scalability to petabytes. On premise or on the cloud. Scale your cluster out, not up. World class parallel loading and unloading. Fast performance for complex and advanced data analytics. Integrated with MADlib for advanced machine learning. Powerful cost-based query optimizer.

WHERE ARE YOU? Let EMC help you discover YOUR KILLER USE-CASE with a Big Data Vision Workshop. Work with EMC and your team to IMPROVE YOUR BUSINESS with an 8-12 week Proof-of-Value project. Order and START TODAY by deploying the EMC Business Data Lake.

IN SUMMARY BIG DATA WILL REDEFINE EVERY BUSINESS
