SAS & HADOOP ANALYTICS ON BIG DATA
WHY HADOOP?
- Open source
- Massive scale
- Fast processing
- Commodity computing
- Data redundancy
- Distributed
WHY HADOOP?
Hadoop will soon become a complement to (not a replacement for):
- Business Intelligence
- Data Warehousing
- Data Integration
- Analytics
#1 reason to go for Hadoop: Analytics (71%)
Challenges to Hadoop adoption:
- Hadoop has no analytic functions built in
- Cost: hefty payroll due to intensive hand coding
[Chart: Hadoop in production. Categories: Yes, < 12 months, < 24 months, < 36 months, 3+ years, Never. Labeled value: 10%.]
SOURCE: 10 Myths About Hadoop - TDWI Best Practices Report
WHY SAS?
- Analytics
- In-memory
- High-performance
- Data management
- Business intelligence
- Data visualization
WHY SAS? ANALYTICAL DECISION MAKING
[Chart: Competitive Advantage plotted against Degree of Intelligence, from lowest to highest level]
- Raw data
- Clean data
- Standard reports: What happened?
- Ad hoc reports: How many, how often, where?
- Query drill down: Where exactly is the problem?
- Alerts: What actions are needed?
- Statistical analysis: Why is this happening?
- Forecast: What if these trends continue?
- Predict: What will happen next?
- Optimize: What is the best that can happen?
Differentiators: Predict, Prescribe, Optimize
AN ERA OF ABUNDANCE: WHERE WE ARE NOW
[Timeline: 2005, 2007, 2009, 2011, 2013]
- BIG DATA: lots of data
- HADOOP: processing power
- ANALYTICS: intelligence
SAS & HADOOP: THE BUSINESS REASONING
What organizations are looking for:
- Accuracy: bring superior analytics to Hadoop for more precise insights.
- Scalability: provide comprehensive support from data to decision to maximize the value of Hadoop across the enterprise.
- Governance: integrate and manage data in order to promote broad reuse and to comply with IT policies and procedures.
- Economics: drive bottom-line benefits by boosting the value of analytics infrastructure while reducing TCO.
SAS & HADOOP: WHY THE MARRIAGE?
- High-performance advanced analytics
- Business intelligence and data visualization
- At massive scale, on distributed, commodity hardware
SAS & HADOOP: HOW?
SAS and Hadoop intersect in many ways:
- FROM: SAS can treat Hadoop like any other data source, pulling data from Hadoop when that is most convenient.
- WITH: SAS can work with Hadoop, lifting data into a purpose-built, in-memory environment for advanced analytics.
- IN: SAS can work directly in Hadoop, leveraging Hadoop's distributed processing capabilities.
SAS & HADOOP: SAS FROM HADOOP (DATA MOVEMENT)
SAS accesses and extracts data from Hadoop to a SAS server for processing, and writes results back.
- Bridge to traditional SAS environments
- Hadoop treated as just another data source
- Performance limited to the bandwidth of a single pipe
- Ideal when not all data resides in Hadoop, or when established processes cannot run in Hadoop
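The single-pipe limit can be made concrete with a rough back-of-the-envelope estimate. The sketch below is illustrative only: the function name `transfer_hours`, the data sizes, and the 10 Gb/s link speed are assumptions that do not come from the slides, and the calculation assumes the link runs at full, sustained line rate with no protocol overhead.

```python
def transfer_hours(data_terabytes: float, link_gigabits_per_s: float) -> float:
    """Hours needed to move data over one link at full, sustained line rate."""
    bits = data_terabytes * 1e12 * 8              # decimal terabytes -> bits
    seconds = bits / (link_gigabits_per_s * 1e9)  # bits / (bits per second)
    return seconds / 3600


# Hypothetical data sizes and link speed, chosen only for illustration.
for tb in (1, 10, 100):
    print(f"{tb:>4} TB over one 10 Gb/s link: {transfer_hours(tb, 10):.2f} h")
```

Transfer time grows linearly with data volume while the link speed stays fixed, which is why this approach suits extracts rather than full-cluster scans.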
SAS & HADOOP: SAS WITH HADOOP (DATA LIFT INTO MEMORY)
SAS accesses and processes Hadoop data on SAS servers while keeping the data and computations massively parallel.
- Provides capabilities that Hadoop does not handle well
- Supports advanced analytics via shared computing
- Allows data storage and analytics to scale separately
- Ideal when analytical rigor, sophistication and governance are required
SAS & HADOOP: SAS IN HADOOP (SAS LOGIC)
SAS processes data directly in the Hadoop cluster.
- SAS Embedded Process enables scalable SAS compute in Hadoop
- SAS compute is orchestrated via Hadoop technology
- Data manipulation, data quality, and scoring support
- Ideal when all data is landing in Hadoop, and Hadoop is the proper place for processing
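The difference between moving data out of Hadoop and running logic inside it can be sketched with a toy example. This is a conceptual illustration in plain Python, not SAS code and not how the SAS Embedded Process is implemented: the partitions, the function names `mean_from` and `mean_in`, and the "items moved" counter are all hypothetical. It assumes a statistic (here, a mean) that can be assembled from small per-partition summaries.

```python
from typing import List, Tuple

Partition = List[float]


def mean_from(partitions: List[Partition]) -> Tuple[float, int]:
    """'FROM' style: copy every value to one place, then compute there.

    Returns the mean and the number of items that had to be moved.
    """
    pulled = [value for part in partitions for value in part]
    return sum(pulled) / len(pulled), len(pulled)


def mean_in(partitions: List[Partition]) -> Tuple[float, int]:
    """'IN' style: each partition computes (sum, count) where the data lives.

    Only the small per-partition summaries are moved and combined.
    """
    partials = [(sum(part), len(part)) for part in partitions]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count, len(partials)


# Hypothetical data spread over three partitions (stand-ins for cluster nodes).
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
print(mean_from(partitions))  # same mean, every value moved
print(mean_in(partitions))    # same mean, one summary per partition moved
```

Both functions return the same mean, but the second moves one summary per partition instead of every row, which is the motivation for pushing data preparation and scoring into the cluster.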
SAS & HADOOP: THE PRAGMATIC APPROACH
Use the right approach for what needs to be done!
- Prepare data IN Hadoop for analytics
- Move data FROM Hadoop into a SAS environment
- Lift data into memory for analytics at scale
- Explore data at scale, in-memory, WITH data visualization
- Model data at scale, in-memory, WITH advanced modeling tools
- Deploy and manage model score code IN Hadoop
SAS & HADOOP: KEY POINTS
- SAS is the only vendor to work from + with + in Hadoop throughout the analytics life cycle.
- All three approaches can be combined and coordinated, complementing each other for each situation.
- Each approach can evolve, mature and/or morph into the others.
- Metadata management across the whole analytics life cycle, crossing all Hadoop interactions, is key to success.
- SAS can help realize the value of Hadoop and bring production analytics to the platform.
ROGERS MEDIA
- Data visualization and high-performance analytics
- Processing data on 12 million customers
- 40 million records per month in Hortonworks
- More than 600 relevant web characteristics
"Several of us from Rogers in the room looked at each other and said, 'That is really wicked; that's cool.'"
Chris Dingle, Senior Director of Audience Solutions, Rogers Communications
MACY'S
- 20% reduction in churn
- $500,000 annual savings
- Customer lifetime value analysis
- More accurate response prediction
- Optimized promotions
"... they can look at data and spend more time analyzing it and become internal consultants who provide more of the insight behind the data."
Kerem Tomak, Vice President of Analytics
SAS/ACCESS TO HADOOP
Uses existing SAS interfaces:
- Standard LIBNAME syntax
- PROC HADOOP
- DATA step and PROC SQL translated to Hive
- FILENAME support
- Execute Pig scripts and MapReduce
- Push-down of certain procedures
- Custom SerDe support
- SPDE formats
SAS/ACCESS TO IMPALA
- Massively Parallel Processing (MPP) query engine
- SQL queries against the Hadoop file system (HDFS)
- Optimized for interactive queries
- Similar to Hive in function, but different in implementation
- Extraordinary performance
SAS/ACCESS TO HAWQ
- Direct, transparent access to the Pivotal HAWQ SQL engine
- SQL pass-through
- Enables you to interact with HBase
- Bulk loading, which is faster than inserting
SAS VISUAL ANALYTICS - EXPLORER
- Data exploration at massive scale
- Intuitive visual analytics
SAS VISUAL STATISTICS
- Descriptive and predictive modeling
- Model comparison
- Dynamic group-by processing
SAS IN-MEMORY STATISTICS FOR HADOOP
- Interactive programming interface for SAS model development
SAS VISUAL ANALYTICS REPORT DESIGNER
- Visual Analytics Designer and Viewer: reporting and analysis for broad audiences
SAS VISUAL ANALYTICS VIEWER FOR MOBILE
- Mobile BI for reporting
SAS HIGH-PERFORMANCE DATA MINING
- High-performance procedure nodes in SAS Enterprise Miner
SAS DATA LOADER FOR HADOOP
- SAS Code Accelerator (DS2)
- Embedded Process and Hive
- Parallel data loading
- No data movement
- Data profiling
- Data Quality Accelerator
SAS DATA MANAGEMENT: CAN USE ALL THREE
[Diagram: three instances labeled EP (SAS Embedded Process)]
SAS 9.4 SUPPORTED HADOOP DISTRIBUTIONS
- Cloudera CDH (Kerberos supported)
- Hortonworks HDP (Kerberos supported)
- MapR (Kerberos not verified)
- Pivotal HD (Kerberos supported)
- IBM BigInsights (Kerberos not verified)