Next-Generation Data Warehouse and Analytics Infrastructure
November 2011
André Münger, andre.muenger@emc.com, +41 79 708 85 99
Agenda: Big Data Challenges, Greenplum Overview, Use Cases, Summary, Q&A
Big Data Challenges
BIG DATA
Data Sources Are Expanding
The growth ratio of structured to unstructured data will be approximately 1:8 (Source: 2011 IDC Digital Universe Study).
Perspectives: Business, IT, Governance, Resources, Data Scientist
Databases Need to Adapt to Big Data
Traditional RDBMSs are not optimized for big data analytics. 50% of TDWI survey respondents plan to replace their data warehouse platform within the next three years, citing these reasons:
- Cannot do advanced analysis: poor query response; can't support advanced analytics (40% and 45% of respondents)
- Cannot handle big data volumes: inadequate data load speed; can't scale up to large data volumes; cost of scaling up is too expensive (33%, 37% and 39%)
- Poorly suited to real-time or on-demand workloads (29%)
Source: TDWI Next Gen Database Study, 2010
The Big Data Challenge
- Increasing: data volume, number of formats and sources, business demand
- Decreasing: time windows, budgets, resources
It took us roughly 100 years from … to space tourism.
20+ Years of Evolution
From the data warehouse, data models, BI tools and consulting, through ROLAP, MOLAP and HOLAP, to OLAP/BI, data mining, applications and verticals. Next comes Big Data: the transition from traditional relational databases to MPP (massively parallel processing), turning unstructured data into actionable information.
Greenplum Overview
Big Data UAP: Unified Analytics Platform
- Third-party/partner BI and analytics tools
- Greenplum Chorus: enterprise collaboration platform for data
- Greenplum Data Computing Appliances: purpose-built for big data analytics
- Greenplum Database (Enterprise & Community Editions): world's most scalable MPP database platform
- Greenplum HD, Hadoop (Enterprise & Community Editions): enterprise analytics platform for unstructured data
Greenplum DB: MPP Shared-Nothing Architecture
Greenplum's MPP database delivers extreme performance on commodity infrastructure.
- Optimized for BI and analytics: automatic parallelization; just load and query as with any database.
- Tables are automatically distributed across nodes, with no need for manual partitioning or tuning.
- Extremely scalable and I/O-optimized: all nodes can scan and process in parallel, and performance scales linearly as nodes are added.
[Diagram labels: Interconnect, Loading]
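Automatic distribution can be pictured with a toy model: each row's distribution-key value is hashed to pick exactly one segment, so every segment owns a disjoint slice of the table and can scan it independently. In Greenplum the key is chosen with the `DISTRIBUTED BY (column)` clause of `CREATE TABLE`. The sketch below is a minimal illustration under stated assumptions: SHA-256 stands in for Greenplum's internal hash function, and `segment_for`, `distribute` and the `orders` table are hypothetical names, not Greenplum APIs.

```python
import hashlib

def segment_for(key, num_segments):
    """Map a distribution-key value to one segment id with a stable hash.

    SHA-256 is a stand-in (assumption); Greenplum uses its own internal hash.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments

def distribute(rows, key_column, num_segments):
    """Place every row on exactly one segment (shared-nothing: no row is stored twice)."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_column], num_segments)].append(row)
    return segments

# Hypothetical 'orders' table: 10,000 rows distributed on customer_id over 4 segments.
orders = [{"customer_id": i, "amount": i % 100} for i in range(10_000)]
segments = distribute(orders, "customer_id", num_segments=4)

# Each segment scans only its own slice, so the four scans can run in parallel.
print([len(s) for s in segments])
```

A high-cardinality key such as `customer_id` keeps the slices roughly even; a key with only a few distinct values would concentrate rows on a few segments and leave the others idle.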
Massively Parallel Processing and Linear Performance Scalability
Greenplum 4.0 database architecture:
- Access: SQL and MapReduce
- Master servers: query planning and dispatch
- Network interconnect
- Segment servers: query processing and data storage
- External sources: loading, streaming, etc.
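The master/segment split can be illustrated as a two-phase (scatter-gather) aggregation. The master dispatches the same plan fragment to every segment, each segment aggregates only its local rows, and the master merges the partial results. This is a minimal sketch assuming a hypothetical `SELECT region, SUM(amount), COUNT(*), AVG(amount) ... GROUP BY region` query. Threads stand in for segment servers (in CPython they show the dispatch pattern, not a real speed-up), and all function and data names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def segment_partial_agg(rows):
    """Runs on one (simulated) segment: local GROUP BY region, producing partial SUM and COUNT."""
    partial = defaultdict(lambda: [0, 0])
    for region, amount in rows:
        partial[region][0] += amount
        partial[region][1] += 1
    return dict(partial)

def master_query(segments):
    """Simulated master: dispatch the fragment to all segments in parallel, then merge partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(segment_partial_agg, segments))
    totals = defaultdict(lambda: [0, 0])
    for partial in partials:
        for region, (s, c) in partial.items():
            totals[region][0] += s
            totals[region][1] += c
    # AVG is derived from the merged SUM and COUNT, never by averaging per-segment averages.
    return {region: {"sum": s, "count": c, "avg": s / c} for region, (s, c) in totals.items()}

# Hypothetical (region, amount) rows already spread over three segments.
segments = [
    [("EU", 10), ("US", 5)],
    [("EU", 30)],
    [("US", 15), ("US", 10), ("APAC", 8)],
]
result = master_query(segments)
print(result["EU"])  # {'sum': 40, 'count': 2, 'avg': 20.0}
print(result["US"])  # {'sum': 30, 'count': 3, 'avg': 10.0}
```

Segments return SUM and COUNT rather than local averages because averages do not merge: for "US" the per-segment averages are 5 and 12.5, whose mean is 8.75, while the true average is 30 / 3 = 10.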
Greenplum HD Enterprise Edition: Enterprise-Ready Hadoop Platform for Unstructured Data
- Reliable: high availability, mirroring
- Easier to use: NFS-mountable, system management
- Faster: 2 to 5 times faster than Apache Hadoop
Greenplum: Not Just About Technology
- Data science teams will become the driving force for success with big data analytics.
- University data science program: collaboration with Stanford and UC Berkeley.
- Greenplum's Data Science practice, with leading PhDs and expertise in analytic tools.
- Community investment, including the Greenplum Analytic Workbench, Community Edition software, and Data Science Summits.
Powerful Customer and Partner Ecosystem
Use Cases
Easynet / Retail
- Real-time scoring at POC
- Cross-selling
- Up-selling
- Customer reward program
USE CASE: Optimize Marketing Campaigns With Big Data
Big data analytics enables better customer interactions.
[Chart: likelihood of conversion, from low to high, across three stages]
- Legacy system
- Greenplum in-database analytics: clicks become users targeted to predicted outcomes
- Greenplum big data analytics: social media, blogs and press, and competitor website behavior are leveraged to refine predictions
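Scoring a "likelihood of conversion" is commonly done with a classifier that outputs a probability, and logistic regression is one standard choice. The deck does not say which model was used, so this is a minimal sketch assuming a logistic model. The features (`visits_last_30d`, `opened_last_email`, `days_since_purchase`), the coefficients and the helper names are all hypothetical.

```python
import math

# Hypothetical coefficients; a real model would be fitted on historical campaign responses.
WEIGHTS = {"visits_last_30d": 0.08, "opened_last_email": 1.2, "days_since_purchase": -0.01}
INTERCEPT = -2.5

def conversion_probability(customer):
    """Logistic model: P(convert) = 1 / (1 + exp(-(b0 + sum_i w_i * x_i)))."""
    z = INTERCEPT + sum(w * customer.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_for_campaign(customers, top_n):
    """Return the top_n customers by predicted likelihood of conversion."""
    return sorted(customers, key=conversion_probability, reverse=True)[:top_n]

# Hypothetical customer records.
customers = [
    {"id": "a", "visits_last_30d": 20, "opened_last_email": 1, "days_since_purchase": 10},
    {"id": "b", "visits_last_30d": 2, "opened_last_email": 0, "days_since_purchase": 200},
    {"id": "c", "visits_last_30d": 10, "opened_last_email": 1, "days_since_purchase": 30},
]
for c in customers:
    print(c["id"], round(conversion_probability(c), 3))
print([c["id"] for c in rank_for_campaign(customers, top_n=2)])  # ['a', 'c']
```

Because the score is a single arithmetic expression per row, this kind of model is easy to evaluate where the data lives, which is the idea behind the "in-database analytics" stage.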
USE CASE: Increase Revenue With Big Data Analytics
Big data analytics enables increased per-customer profit for a retail banking firm.
[Chart: customer profit, from low to high, across four stages, moving from "traditional data leveraged" to "big data leveraged"]
- Legacy system: agent's best guess
- Greenplum Database BI reporting: branch-level reporting enables profit-based recommendations
- Greenplum in-database analytics: market basket analysis and customer lifetime value computations enable user-based recommendations
- Greenplum big data analytics: data enriched with unstructured activity logs identifies at-risk customers
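Customer lifetime value (CLV) has many formulations, and the deck does not say which one the bank used. A common textbook version assumes a constant per-period margin m, a retention probability r and a discount rate d, giving the sum of m·r^t/(1+d)^t over future periods, whose infinite-horizon limit is m·r/(1+d−r). This minimal sketch assumes that model; the customer data and function names are hypothetical.

```python
def clv_finite(margin, retention, discount, periods):
    """Expected discounted margin over `periods` periods: sum_{t=1..T} m * r**t / (1 + d)**t."""
    return sum(margin * retention**t / (1 + discount)**t for t in range(1, periods + 1))

def clv_closed_form(margin, retention, discount):
    """Infinite-horizon limit of the same sum, m * r / (1 + d - r); requires r < 1 + d."""
    if retention >= 1 + discount:
        raise ValueError("series diverges unless retention < 1 + discount")
    return margin * retention / (1 + discount - retention)

# Hypothetical customers: (id, annual margin, retention probability), 10% discount rate.
customers = [("c1", 100.0, 0.8), ("c2", 250.0, 0.5), ("c3", 60.0, 0.95)]
ranked = sorted(customers, key=lambda c: clv_closed_form(c[1], c[2], 0.10), reverse=True)
for cid, m, r in ranked:
    print(cid, round(clv_closed_form(m, r, 0.10), 1))
```

Customer c2 has the largest annual margin but the lowest retention, so it ranks last by lifetime value; this is why profit-based recommendations use CLV rather than current margin alone.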
USE CASE: Reduce Risk With Big Data Analytics
Big data analytics enables accurate decisions for a national mortgage underwriter.
[Chart: four stages plotted against an underwriting-risk axis (low to high), moving from "traditional data leveraged" to "big data leveraged"]
- Legacy system: monthly risk model updates
- Greenplum Database BI reporting: daily risk model updates, delivering in minutes what used to take days
- Greenplum in-database analytics: k-means clustering and decision-tree scoring improve accuracy
- Greenplum big data analytics: unstructured data sources enrich the data
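K-means clustering groups records so that each one is closer to its own cluster's centroid than to any other. This is a minimal sketch of Lloyd's algorithm with a deterministic farthest-point seeding (chosen here so the result is reproducible; k-means++ is a common randomized alternative). The applicant features (a loan-to-value ratio and a debt-to-income ratio, scaled to [0, 1]) and all names are hypothetical. The decision-tree scoring also named on the slide is not sketched.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the point
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid recomputation."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean (an empty cluster keeps its old centroid).
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable: converged
            break
        centroids = updated
    return centroids, clusters

# Hypothetical applicants as (loan_to_value, debt_to_income), scaled to [0, 1].
low = [(0.10, 0.20), (0.15, 0.22), (0.12, 0.18)]
high = [(0.80, 0.90), (0.85, 0.88), (0.82, 0.93)]
centroids, clusters = kmeans(low + high, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(x, 3) for x in centroid], len(members))
```

In an underwriting workflow, the resulting cluster label would typically become one more input to a downstream scoring model rather than a decision on its own.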
USE CASE: Innovate With Big Data Analytics
Big data analytics accelerates Health Care 2.0 for an evidence-based care provider.
[Chart: quality of care, from low to high, across four stages, moving from "traditional data leveraged" to "big data leveraged"]
- Legacy system: treatment pathways on summary data
- Greenplum Database BI reporting: treatment pathways on all the data, delivering 10 years of data in seconds
- Greenplum in-database analytics: association rule mining and user clustering improve pathways
- Greenplum big data analytics: external data sources enable personalized medicine
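Association rule mining looks for rules X → Y that hold often enough (support), reliably enough (confidence), and more often than chance (lift). This minimal sketch is a brute-force version limited to single-item rules, not a full Apriori or FP-Growth implementation; the deck does not say which algorithm the provider used. The care-pathway records and step names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Mine one-item rules X -> Y and report (X, Y, support, confidence, lift)."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(set(t)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                lift = confidence / (item_counts[rhs] / n)  # P(rhs | lhs) / P(rhs)
                rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Hypothetical care-pathway records: the set of steps each patient received.
records = [
    {"hba1c_test", "eye_exam", "dietitian"},
    {"hba1c_test", "eye_exam"},
    {"hba1c_test", "dietitian"},
    {"eye_exam"},
    {"hba1c_test", "eye_exam", "foot_exam"},
]
for lhs, rhs, support, confidence, lift in pair_rules(records, min_support=0.4, min_confidence=0.7):
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

The rule eye_exam → hba1c_test has 75% confidence but a lift of about 0.94, because 80% of all records already contain hba1c_test; lift is what separates a genuinely associated pathway step from one that is simply common.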
Summary
Possible Client Issues
- Performance
- Cost / TCO
- Value (load and/or end user)
If you face any of these issues:
- Performance: an external POC on defined challenges
- Value: Analytic Labs with world-leading data scientists (SAS / Greenplum)
- Cost: cost assessment and value proposition
- Combined: an internal POC including infrastructure and support
Summary
In five years' time, MPP architectures will simply be the standard for analytics and data warehouse infrastructure.
In five years' time, the ability to compete on analytics, across structured AND unstructured data, will be the most important differentiator between companies.
Q&A