In-Memory Analytics: Get Faster, Better Insights from Big Data


Discussion Summary
In-Memory Analytics: Get Faster, Better Insights from Big Data
January 2015
Interview Featuring: Tapan Patel, SAS Institute, Inc.

Introduction

A successful analytics program should translate quickly into monetizing the data, where the data (and the learnings from it) helps the organization increase revenue, manage risk and pursue new product or service innovation. What is needed from a technology perspective to accomplish this? The key is to remove the barriers and latencies associated with the steps of the analytics lifecycle, and to remove the processing constraints caused by complex big data requirements. Today, the adoption of in-memory analytics is growing in the hope that it can deliver speed and deeper insights, and allow companies to do more with the data they have to solve a variety of business problems. As sophisticated data discovery and analytical approaches (descriptive analytics, predictive analytics, machine learning, text analytics, etc.) become commonplace, the efficiency of co-locating the data and the analytical workloads is essential to handle the processing needs. To get a view of this fast-moving technology, IIA spoke with Tapan Patel of SAS Institute.

Q: Let's start with a simple definition of in-memory analytics and some of the benefits of adopting in-memory analytics.

In-memory analytics is a computing style in which all the data used by an application is stored in the main memory of the computing environment. Rather than being accessed on disk, the data remains suspended in the memory of a powerful set of computers, and multiple users can share it across multiple applications in a rapid, secure and concurrent manner. In-memory analytics also takes advantage of multi-threading and distributed computing: you can distribute the data (and the complex workloads that process it) across multiple machines in a cluster or within a single server environment.

In-memory analytics is not only associated with queries and data exploration; it is also used with more complex processes like predictive analytics, machine learning and text analytics. Box plots, correlations, decision trees and neural networks, for example, are all associated with in-memory processing.

There are four key factors driving the adoption of in-memory analytics today:

1. A demand for greater speed in getting analytical insights from multiple data sources. In-memory processing can support analytical workloads with sufficient scaling and speed compared to conventional architectures.

2. A demand for more granular and deeper analytical insights. How can you take advantage of the insights to uncover meaningful new opportunities, detect unknown risks and drive fast growth? And how can you make business processes more intelligent?

3. A reduction in main memory hardware cost. Memory prices continue to fall year over year, which has made in-memory processing on commodity hardware more achievable for analytical purposes.

4. The digital era is forcing organizations to reevaluate their interactions with external constituents and be proactive. They need the ability to discover, analyze and respond to different and fast-moving events.

Q: For the layman, how different is in-memory processing from the traditional approach to analytics taken by an organization?

The first difference is where the data is stored. Traditionally, the data is stored on disk. With in-memory analytics, the persistent storage of the data is still on disk, but the data is read into memory. With commodity hardware that is more powerful than before, you can take advantage of in-memory processing power instead of constantly shuffling data to and from disk.

That leads to the second difference: speed. Compared to traditional batch processing, where a lot of back and forth happens between the disk and job/step boundaries (i.e., data shuffling), keeping data in memory allows multiple users to conduct interactive processing without going back to disk. This lets end users get answers rapidly without worrying about infrastructure constraints on analytical experiments. Data scientists are not restricted to a sample; they can apply as many analytical techniques and iterations as needed to find the best model.

Of course, in-memory computing technology needs to be evaluated by IT and analytics teams to identify opportunities where faster performance, granular insights and greater scalability can

yield better results.

Q: How does in-memory computing complement the presence of a data warehouse?

A data warehouse is an essential component of any analytics environment, especially since it contains a set of data that is relevant, cleansed and refined for the many use cases that require structured data. As new types of data come onboard (e.g., sensor data, text, etc.) and performance expectations change, IT organizations can set up a Hadoop-based sandbox environment and use in-memory processing to quickly explore unknown data relationships and experiment with candidate analytical models. If the data is not yet qualified, it is better to use an in-memory analytics sandbox environment (coupled with Hadoop for persistent storage) rather than the data warehouse. If needed, you can combine data from the data warehouse and the sandbox environment for certain types of data and analytics use cases.

A proper assessment of new data sources, data preparation needs, data architecture and data governance policies is critical to determining how the sandbox environment can complement an existing data warehouse. The need for data preparation does not go away, and data preparation can happen outside of the data warehouse. Depending on the use case, organizations can augment data from the sandbox environment with data from the warehouse. The new class of applications powered by in-memory analytics meets IT demands for expediency and responsiveness while addressing emerging business problems.
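As a toy illustration of the load-once, query-many pattern described above (this is not a SAS product example; the table, column names and figures are invented), the sketch below bulk-loads a data set from disk into an in-memory store a single time, after which every subsequent analytical question is answered from memory rather than by going back to disk:

```python
import os
import random
import sqlite3
import tempfile

random.seed(0)

# Hypothetical on-disk source table (names invented for illustration).
path = os.path.join(tempfile.gettempdir(), "sales_demo.db")
disk = sqlite3.connect(path)
disk.execute("DROP TABLE IF EXISTS sales")
disk.execute("CREATE TABLE sales (region TEXT, amount REAL)")
rows = [(random.choice(["east", "west"]), random.uniform(10, 500))
        for _ in range(10_000)]
disk.executemany("INSERT INTO sales VALUES (?, ?)", rows)
disk.commit()

# One bulk read lifts the persistent data from disk into main memory.
mem = sqlite3.connect(":memory:")
disk.backup(mem)
disk.close()

# Iterative, interactive exploration now touches only memory:
# each new question is just another pass over the same in-memory data.
total = mem.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
by_region = dict(mem.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region"))
```

In a real deployment the in-memory store would be distributed across a cluster and shared by many concurrent users; this single-process sketch only shows why interactive iteration gets cheap once the data stops round-tripping to disk.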

Q: Specifically, what are the key steps for customers to embark on an in-memory analytics path?

A key step is to identify areas where in-memory analytics can deliver significant business value, whether that is revenue growth, product innovation or process efficiency. From a technology standpoint, organizations need to think about how they can modernize on two fronts: analytics and infrastructure.

On the analytics front, it is important to shift from a traditional analytics mindset to a high-performance analytics mindset. This will allow you to quickly add new variables and iterate models more frequently. If you are using the latest machine learning and text analytics techniques, you can take a fresh look at problems once deemed too complex to solve.

On the infrastructure front, it is important to examine how an in-memory computing architecture can handle data scalability, user scalability and complex workloads. Ultimately, organizations are interested in removing latencies across the analytics lifecycle, whether in data preparation, model development or deployment. From a data infrastructure perspective, you can evaluate how Hadoop and in-memory analytics will play a bigger role in meeting your analytics needs, especially for new or complex use cases. By combining a low-cost storage option with an in-memory, distributed computing environment, you can change the cost model for analytics processing.
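To make the high-performance analytics mindset concrete, here is a minimal sketch (pure Python on invented synthetic data, not SAS code) of what "quickly add new variables and iterate models" looks like when the data already sits in memory: trying another candidate variable is just one more scan, with no fresh extract or batch job per attempt.

```python
import random

random.seed(42)

# Synthetic in-memory data set; x1 truly drives y, x2 barely does.
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + 0.1 * b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]

def r_squared(xs, ys):
    """R^2 of a one-variable least-squares fit, computed in a single
    pass over in-memory lists (no disk round trip per candidate)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return (cov * cov) / (vx * vy)

# Iterating over candidate variables is just repeated passes
# over the same in-memory data.
scores = {name: r_squared(xs, y) for name, xs in [("x1", x1), ("x2", x2)]}
best = max(scores, key=scores.get)
```

Under traditional batch processing, each candidate fit could mean a new extract and job submission; in memory, the cost of one more iteration collapses to one more scan, which is what makes frequent model refinement practical.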

Q: What are some key challenges or speed bumps related to adopting in-memory analytics solutions?

No matter how much you speed up your data preparation and analytics lifecycle steps, you have to make sure that your downstream business processes and decision makers can capitalize on the rapid insights being generated. This is especially challenging in asset-intensive industries like manufacturing, transportation, telecommunications and utilities, making collaboration between IT and the business even more critical. Organizations will not realize value from generating rapid insights if the supporting business processes are not taking advantage of them. It is critical to move in an incremental fashion, focusing on the highest-value business processes first and learning from the experience.

Another potential challenge is underestimating the skill sets required to build and maintain these advanced analytics applications (using the latest machine learning techniques) along with a Hadoop-based data infrastructure. A lot of focus has been on the role of the data scientist, but the IT skills required to manage and configure a big data infrastructure are equally important for meeting service level agreements.

Finally, it is important to know how in-memory computing fits into (or complements) your existing analytics infrastructure. For example, should IT consider a separate in-memory environment alongside the distributed data store (e.g., Hadoop, Teradata)? Or should it use in-memory capacity in a shared environment (e.g., inside a Hadoop cluster) for discovery and analytics workloads? It is also important to know whether you should combine data from the data

warehouse in the sandbox (in-memory based) with new types of data for specific use cases (e.g., product recommendations).

Q: Talk about some of the considerations IT has to take into account as it evaluates an in-memory processing architecture for analytics.

Including IT early in the evaluation and planning process is important for determining how in-memory analytics fits into the larger picture of creating a flexible and scalable analytics platform. In-memory analytics allows more self-service for end users because there is less dependence on IT to create, maintain and administer aggregates and indexes. It also helps meet diverse and unplanned workloads (e.g., discovering relationships or building models involving observations at a granular level). However, IT has to be careful that it is not creating yet another silo. In-memory analytics should be part of your comprehensive information architecture, not a separate strategy. Using in-memory analytics as your centralized processing platform for data discovery and analytics workloads also helps IT reduce data redundancy by eliminating data silos.

As the footprint for data and modeling grows, the scale of the in-memory analytics deployment will likely grow to meet the new demand. Hardware sizing, memory allocation and performance tuning are critical topics for IT in meeting service level agreements. We constantly get these types of questions from customers, and our solutions, coupled with the capabilities of partners like Intel,

Teradata, and HP, are critical to solving these issues.

Q: Does the data integration effort change under an in-memory analytics environment?

Typically, we have seen that 60% to 70% of the effort in any analytics exercise goes into data integration, including preparing data before building models and deploying model score code into operational systems. As you integrate new, more diverse data types and volumes (e.g., event streams, sensor data, log data, free-form text, social media data) to support use cases enabled by in-memory analytics, data integration and data discovery will become even more critical for building analytical models downstream. A range of data preparation techniques (e.g., profiling, cleansing, transforming, imputing, filtering) integrated with analytical workflows is essential to quickly extract value from complex data.

To cope with the data deluge and to enhance end-user productivity, the adoption of self-service, interactive data integration tools will increase. Also gaining importance will be capabilities that quickly assist in evaluating the usefulness of data and generate reusable data transformations for integration into analytic workflows.

Q: Is this a SAS-specific message, or do others in the marketplace share the same thoughts on in-memory analytics?

We have seen other vendors approach in-memory processing architecture from a traditional

BI, query or data discovery perspective. What SAS provides is a way to use in-memory analytics as a processing method for more advanced concepts like predictive analytics, machine learning, prescriptive analytics and text analytics. We have built our in-memory engine from the ground up with data preparation and analytical workloads in mind; it is not an in-memory database whose focus is on selecting rows of data and performing basic queries, aggregations and the like.

Another key differentiator for SAS is the ability to exploit in-memory processing across key components of the analytics lifecycle, including data discovery, model development and model deployment, in an interactive manner. For example, data exploration is fundamental to identifying strong relationships and finding out why certain events happen. We take this a step further and help exploit these relationships to build, refine and deploy predictive models. In-memory analytics then offers a distributed platform that delivers interactivity, fast response times and multi-user concurrency. Once the required data is loaded in memory, users can make multiple passes through the data for analytical computations and build numerous models by group or segment (e.g., location, store, owner, device, age, income) on the fly.

About the Interviewee

Tapan Patel is Global Product Marketing Manager at SAS. With more than 15 years in the enterprise software market, Patel leads marketing efforts for the Predictive Analytics, Data Mining and Hadoop market segments, as well as for infrastructure topics like In-Memory Analytics and In-Database Analytics. He works closely with customers, partners, industry analysts, press and media, and thought leaders to ensure that SAS continues to deliver high-value solutions in the marketplace.

Additional Information

To learn more about this topic, please visit In-Memory Analytics on sas.com.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. 107526_S135698.0115