Big Data Analytics with Hadoop

1 Big Data Analytics with Hadoop. Jos van Dongen, Arno Klijnman

3 What is Hadoop? Distributed storage and processing of (big) data on large clusters of commodity hardware. Components: HDFS and Map/Reduce

4 HDFS - Distributed storage for big files
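
To make "distributed storage for big files" concrete, here is a minimal sketch of the idea: a large file is cut into fixed-size blocks and each block is stored on several nodes. The function name `plan_blocks`, the node names, and the round-robin placement are assumptions made for illustration; real HDFS placement is rack-aware and considerably more involved.

```python
def plan_blocks(file_size, block_size, nodes, replication=3):
    """Return a list of (block_index, replica_nodes) for one file.

    Hypothetical helper: placement is simple round-robin, which is an
    assumption for illustration and not the HDFS placement policy.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")

    n_blocks = -(-file_size // block_size)  # ceiling division
    plan = []
    for i in range(n_blocks):
        # Each block gets `replication` distinct nodes, starting at a
        # different node per block so the load is spread over the cluster.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(replication)]
        plan.append((i, replicas))
    return plan


if __name__ == "__main__":
    # A 300 MB file with 128 MB blocks on a four-node cluster.
    for block, replicas in plan_blocks(300, 128, ["n1", "n2", "n3", "n4"]):
        print(block, replicas)
```

Because every block lives on more than one node, losing a single commodity machine does not lose data, which is the point of the slide.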

5 Map/Reduce - Distributed processing for big data
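
The Map/Reduce model can be sketched in a few lines as a single-process word count. This is only an illustration of the three phases (map, shuffle, reduce); the function names are hypothetical and nothing here uses the Hadoop API, where the phases run in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big files"]))
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is what allows a cluster to run many of them at the same time on different nodes.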

6 The Hadoop Jungle

7 SAS & Hadoop Capabilities
- WITH Hadoop
- ON Hadoop
- IN Hadoop
HDFS
Accelerators: SAS Data Quality Accelerator, SAS Scoring Accelerator, SAS Code Accelerator

8 SAS & Hadoop Integration
Users: SAS User, Next-Gen SAS User
User Interface: SAS Data Management, SAS Enterprise Miner, SAS Studio, SAS Visual Analytics, SAS Visual Statistics, SAS In-memory Statistics for Hadoop
Metadata: SAS Metadata
Data Access: Base SAS & SAS/ACCESS to Hadoop, In-Memory Data Access
Data Processing: Pig, Hive, Map Reduce, SAS Embedded Process, SAS LASR Analytic Server
File System: HDFS

9 Two Paradigms
- Hadoop as a Data Platform
- Hadoop as a core component of next generation analytical platform
Stages: MANAGE DATA, EXPLORE DATA, DEVELOP MODELS, DEPLOY & MONITOR

10 Paradigm two: Hadoop as a core component of next generation analytical platform
MANAGE DATA: SAS/ACCESS, SAS Data Management, SAS Federation Server, SAS Event Stream Processing, SAS Data Loader for Hadoop, SAS Data Quality Accelerator for Hadoop, SAS Code Accelerator for Hadoop
EXPLORE DATA: SAS Data Loader for Hadoop, SAS Visual Analytics, SAS In-memory Statistics for Hadoop
DEVELOP MODELS: SAS High Performance Analytics Products, SAS Visual Statistics, SAS In-memory Statistics for Hadoop
DEPLOY & MONITOR: SAS Scoring Accelerator for Hadoop, SAS Decision Manager, SAS Visual Analytics

11 SAS runs the Entire Analytical Lifecycle in/on/with Hadoop
IDENTIFY / FORMULATE PROBLEM
DATA PREPARATION: Base SAS, SAS/ACCESS, SAS Data Loader for Hadoop, SAS DI Studio
DATA EXPLORATION: SAS Visual Analytics, SAS Visual Statistics, SAS High Performance Analytics Offerings, SAS In-Memory Statistics for Hadoop
TRANSFORM & SELECT: Done using either the Data Preparation, Data Exploration or Build Model Tools
BUILD MODEL: SAS High Performance Analytics Offerings, SAS In-Memory Statistics for Hadoop, SAS Visual Statistics
VALIDATE MODEL: Done using the Build Model Tools and other checks
DEPLOY MODEL: SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop
EVALUATE / MONITOR RESULTS: SAS Visual Analytics

12 USER ROLES & THE ANALYTICS LIFECYCLE
BUSINESS MANAGER: Domain Expert, Makes Decisions, Evaluates Processes and ROI
BUSINESS ANALYST: Data Exploration, Data Visualization, Report Creation
IT SYSTEMS / MANAGEMENT: Model Validation, Model Deployment, Data Preparation
ANALYST, DATA SCIENTIST: Exploratory Analysis, Descriptive Segmentation, Predictive Modeling
Lifecycle: IDENTIFY / FORMULATE PROBLEM, DATA PREPARATION, DATA EXPLORATION, TRANSFORM & SELECT, BUILD MODEL, VALIDATE MODEL, DEPLOY MODEL, EVALUATE / MONITOR RESULTS

16 SAS Data Loader for Hadoop: a new SAS web-based business user interface
- Point & click user menus
- Little or no Hadoop experience needed
- Self-service UI
- HTML5 interface
Enables a self-service approach to managing data in a Hadoop environment

17 SAS Data Loader for Hadoop: Transform Data in Hadoop
- Filtering rules
- Column selections
- Aggregation
No coding, scripting or specialized skills required
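
For readers who want to see what the three transformations on this slide amount to, the following plain-Python sketch applies a filtering rule, a column selection and an aggregation to a handful of rows. It is not how SAS Data Loader is implemented; the function name `transform`, its parameters and the sample data are all hypothetical.

```python
from collections import defaultdict


def transform(rows, keep, columns, group_by, sum_column):
    """Filter rows, keep selected columns, then sum one column per group.

    Hypothetical helper for illustration only.
    """
    # Filtering rule: keep only the rows for which `keep` returns True.
    filtered = [row for row in rows if keep(row)]
    # Column selection: drop every column not listed in `columns`.
    selected = [{c: row[c] for c in columns} for row in filtered]
    # Aggregation: total `sum_column` for each value of `group_by`.
    totals = defaultdict(float)
    for row in selected:
        totals[row[group_by]] += row[sum_column]
    return dict(totals)


if __name__ == "__main__":
    sales = [
        {"region": "north", "year": 2014, "amount": 10, "rep": "a"},
        {"region": "north", "year": 2014, "amount": 20, "rep": "b"},
        {"region": "south", "year": 2014, "amount": 5, "rep": "c"},
        {"region": "south", "year": 2013, "amount": 7, "rep": "c"},
    ]
    print(transform(sales, lambda r: r["year"] == 2014,
                    ["region", "amount"], "region", "amount"))
```

The slide's point is that a business user gets these operations through menus rather than by writing code like this.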

18 SAS Data Loader for Hadoop: Query Hadoop Data
- Select source tables
- Apply query criteria
- See subset of data in Table Viewer
Simple drag & drop approach to query data inside Hadoop

19 SAS Data Loader for Hadoop: Profile Hadoop Data
- Select source table
- View reports in column display
- View reports in table display
Run standard metrics on data inside Hadoop and generate reports

22 View Data

23 SAS Data Loader for Hadoop: Copy Data to Distributed SAS LASR Server
- Select source table
- Copy data to distributed SAS LASR Servers
- Visualize data: SAS Visual Analytics (optional)
Explore Hadoop data quickly and easily for faster insights

28 SAS Scoring Accelerator for Hadoop
- Export score code (EM, SAS/STAT, VS)
- Scoring file(s)
- Hadoop
- Publish macro
- SAS Model Manager

29 SAS Scoring Accelerator for Hadoop

31 Demo flow
SAS DI: Access to Hadoop, Transform, Write back to Hadoop, Write to LASR
Data Loader: Show table, Profile, Build query, Write result to LASR
VA Explorer: Discover relations, Understand the data
VS: Discover a model, Determine significance
IMSTAT: Cluster variables, Recommendation, Data step to enrich original dataset with recommendation results, Write to LASR
Scoring Accelerator: Deploy model, Run model, Back to Data Loader
Product areas: SAS Data Management, SAS Interactive Analytics on Hadoop, SAS Analytics

32 SAS & Hadoop: 3 Things to Remember
- WITH Hadoop
- ON Hadoop
- IN Hadoop
HDFS

33 Demo Environment Infrastructure
Internet - Elastic IP Address - AWS-Cloud
Setup:
- CentOS operating system
- Local users on all Amazon servers
- Internal network for all Amazon servers
- Open firewall for all ports between workstation & server
- No mail server integration
- No SSL
Copyright 2014, SAS Institute Inc. All rights reserved.

35 9 October 2014, Huizen
