Discussion Summary

In-Memory Analytics: Get Faster, Better Insights from Big Data

January 2015

Interview Featuring: Tapan Patel, SAS Institute, Inc.
Introduction

A successful analytics program should translate quickly into monetizing the data, where the data (and learnings from it) helps the organization increase revenue, manage risks and pursue new product or service innovation. To accomplish this, what is needed from a technology perspective? The key is to remove the barriers and latencies associated with the steps of the analytics lifecycle and to remove the processing constraints caused by complex big data requirements. Today, the adoption of in-memory analytics is growing in hopes that it can deliver speed and deeper insights, and allow companies to do more with the data they have to solve a variety of business problems. As sophisticated data discovery and analytical approaches (descriptive analytics, predictive analytics, machine learning, text analytics, etc.) become commonplace, the efficiencies of co-locating both the data and the analytical workloads are essential to handle the processing needs. To get a view of the fast-moving in-memory analytics technology, IIA spoke with Tapan Patel of the SAS Institute.
Q: Let's start with a simple definition of in-memory analytics and some of the benefits of adopting in-memory analytics.

In-memory analytics is a computing style in which all the data used by an application is stored within the main memory of the computing environment. Rather than being accessed on disk, the data remains suspended in the memory of a powerful set of computers, and multiple users can share this data across multiple applications in a rapid, secure and concurrent manner. In-memory analytics also takes advantage of multi-threading and distributed computing, where you can distribute the data (and the complex workloads that process it) across multiple machines in a cluster or within a single server environment. In-memory analytics is not only associated with queries and data exploration; it is also used with more complex processes like predictive analytics, machine learning and text analytics. For example, box plots, correlations, decision trees, neural networks, etc. are all associated with in-memory analytics processing.

There are four key factors driving the adoption of in-memory analytics today:

1. A demand for greater speed in getting analytical insights from multiple data sources. In-memory processing can support analytical workloads with sufficient scaling and speed as compared to conventional architecture.

2. A demand for more granular and deeper analytical insights. How can you take advantage of the insights to uncover meaningful new opportunities, detect unknown risks and drive fast growth? And how can we make business processes more intelligent?
3. A reduction in main memory hardware cost. Memory prices continue to fall year over year, which has made in-memory processing for analytical purposes more achievable on commodity hardware.

4. The digital era is forcing organizations to reevaluate their interactions with external constituents and be proactive. They need the ability to discover, analyze and respond to different and fast-moving events.

Q: For the layman, how different is in-memory processing from the traditional approach to analytics taken by an organization?

The first difference is where the data is stored. Traditionally, the data is stored on disk. In the case of in-memory analytics, the persistent storage of the data is still on disk, but the data is read into memory. Now, with commodity hardware that's more powerful than before, you can take advantage of in-memory processing power instead of constantly shuffling data residing on disk. That leads to the second difference: speed. Compared to traditional batch processing, where a lot of back and forth happens between the disk and job/step boundaries (i.e., data shuffling), keeping data in memory allows multiple users to conduct interactive processing without going back to disk. This lets end users rapidly get answers without worrying about infrastructure constraints on analytical experiments. Data scientists are not restricted to a sample; they can apply as many analytical techniques and iterations as needed to find the best model. Of course, in-memory computing technology needs to be evaluated by IT and analytics teams to identify opportunities where faster performance, granular insights and greater scalability can
yield better results.

Q: How does in-memory computing complement the presence of a data warehouse?

A data warehouse is an essential component of any analytics environment, especially since it contains a set of data that is relevant, cleansed and refined for several use cases that require structured data. As new types of data come onboard (e.g., sensor, text, etc.) and performance expectations change, IT organizations can set up a Hadoop-based sandbox environment and utilize in-memory processing to quickly explore unknown data relationships and experiment with candidate analytical models. If the data is not yet qualified, it's better to utilize an in-memory analytics sandbox environment (coupled with Hadoop for persistent storage) rather than a data warehouse. If needed, you can combine data from the data warehouse and the sandbox environment for certain types of data and analytics use cases. A proper assessment of new data sources, data preparation needs, data architecture and data governance policies is critical to help you determine how the sandbox environment can complement an existing data warehouse. The need for data preparation does not go away, and data preparation can happen outside of the data warehouse. Depending on the use case, organizations can augment data from the sandbox environment and the data warehouse. The new class of applications powered by in-memory analytics meets IT demands for expediency and responsiveness, and helps address emerging business problems.
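The "read from disk once, then analyze interactively in memory" pattern described above can be illustrated with a minimal, hypothetical Python sketch. The CSV contents, column names and statistics are invented for illustration; real in-memory analytics engines operate at far larger scale and distribute the data across a cluster.

```python
# A minimal sketch of the "load once, analyze many times" pattern that
# in-memory analytics relies on. The data and field names are invented.
import csv
import io
import statistics

# Simulated "disk": a CSV file that would traditionally be re-read for
# every query or batch job step.
RAW = "store,revenue\nA,100\nB,250\nA,180\nB,90\nA,40\n"

# 1. Read from "disk" exactly once and keep the rows in main memory.
rows = list(csv.DictReader(io.StringIO(RAW)))
revenues = [float(r["revenue"]) for r in rows]

# 2. Every subsequent analytical pass runs against memory, not disk,
#    so iterative, interactive exploration carries no re-read cost.
summary = {
    "count": len(revenues),
    "mean": statistics.mean(revenues),
    "stdev": statistics.pstdev(revenues),
    "median": statistics.median(revenues),
}
```

Each additional computation over `revenues` reuses the in-memory copy, which is the essence of the speed difference the interview describes.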
Q: Specifically, what are the key steps for customers to embark on an in-memory analytics path?

A key step is to identify areas where in-memory analytics can deliver significant business value, whether that's revenue growth, product innovation or process efficiency. From a technology standpoint, organizations need to think about how they can modernize on two fronts: analytics and infrastructure. On the analytics front, it's important to move from a traditional analytics mindset to a high-performance analytics mindset. This will allow you to quickly add new variables and iterate models more frequently, and if you're using the latest machine learning and text analytics techniques, you can take a fresh look at problems once deemed too complex to solve. On the infrastructure front, it's important to examine how an in-memory computing architecture can handle data scalability, user scalability and complex workloads. Ultimately, organizations are interested in removing latencies in the analytics lifecycle, whether related to data preparation, model development or deployment. From a data infrastructure perspective, you can evaluate how Hadoop and in-memory analytics will play a bigger role in meeting your analytics needs, especially around new or complex use cases. By providing a low-cost storage option and an in-memory, distributed computing environment, you can change the cost model for analytics processing environments.
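The "iterate models more frequently" idea above can be sketched in Python: once the training data sits in memory, many candidate models can be scored concurrently without touching disk again. The data, the candidate parameters and the linear model form are all invented for illustration, and a thread pool merely illustrates the concurrency pattern (a production system would distribute this work across a cluster).

```python
# Hypothetical sketch: score several candidate models concurrently
# against data held in memory. All values below are invented.
from concurrent.futures import ThreadPoolExecutor

# In-memory training data: (x, y) pairs roughly following y = 2x + 1.
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]

def sse(params):
    """Sum of squared errors for a candidate linear model y = a*x + b."""
    a, b = params
    return sum((y - (a * x + b)) ** 2 for x, y in data)

# Candidate (slope, intercept) pairs to evaluate in one sweep.
candidates = [(1.5, 1.0), (2.0, 1.0), (2.0, 0.5), (2.5, 0.0)]

# Every worker reads the same in-memory data; no disk round trips.
with ThreadPoolExecutor() as pool:
    scores = list(pool.map(sse, candidates))

# Keep the candidate with the lowest error.
best = candidates[min(range(len(candidates)), key=scores.__getitem__)]
```

Because the data never leaves memory, adding more candidates (new variables, new parameter settings) costs only compute time, not another data-preparation cycle.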
Q: What are some key challenges or speed bumps related to adopting in-memory analytics solutions?

No matter how much you speed up your data preparation and analytics lifecycle steps, you have to make sure that your downstream business processes and decision makers can capitalize on the rapidly generated insights. This is especially challenging in asset-intensive industries like manufacturing, transportation, telecommunications and utilities, making collaboration between IT and the business even more critical. Organizations will not be able to realize value from generating rapid insights if the supporting business processes are not taking advantage of them. It's critical to move in an incremental fashion, focusing on the highest-value business processes first and learning from the experience.

Another potential challenge is underestimating the skill sets required to build and maintain these advanced analytics applications (using the latest machine learning techniques) along with a Hadoop-based data infrastructure. A lot of focus has been on the role of the data scientist, but the IT skills required to manage and configure a big data infrastructure are equally important to meet service level agreements.

Finally, it's important to know how in-memory computing fits into (or complements) your existing analytics infrastructure. For example, should IT consider a separate in-memory environment alongside the distributed data store (e.g., Hadoop, Teradata)? Or should they utilize in-memory capacity in a shared environment (e.g., inside a Hadoop cluster) for discovery and analytics workloads? It's also important to know if you should combine data from the data
warehouse in the sandbox (in-memory based) with new types of data for specific use cases (e.g., product recommendations).

Q: Talk about some of the considerations IT has to take into account as they evaluate an in-memory processing architecture for analytics.

Including IT early in the evaluation and planning process is important to determine how in-memory analytics fits into the larger picture of creating a flexible and scalable analytics platform. In-memory analytics allows for more self-service for end users because there will be less dependence on IT to create, maintain and administer aggregates and indexes. In-memory analytics also helps meet diverse and unplanned workloads (e.g., discovering relationships or building models involving observations at a granular level). However, IT has to be careful that it's not creating yet another silo. Instead, in-memory analytics should be part of your comprehensive information architecture, not a separate strategy. Using in-memory analytics as your centralized processing platform for data discovery and analytics workloads also helps IT reduce data redundancy by eliminating data silos.

As the footprint for data and modeling grows, the scale of the in-memory analytics deployment will likely grow to meet the new demand. Hardware sizing, memory allocation and performance tuning are critical topics for IT to meet service level agreements. We constantly get these types of questions from customers, and our solutions, coupled with the capabilities of partners like Intel,
Teradata, and HP, are critical to solving these issues.

Q: Does the data integration effort change under an in-memory analytics environment?

Typically, we have seen that 60% to 70% of the effort in any analytics exercise goes into data integration, including preparing data before building models and deploying model score code into operational systems. As you integrate new, more diverse data types and volumes (e.g., event streams, sensor data, log data, free-form text, social media data, etc.) to support use cases enabled by in-memory analytics, data integration and data discovery will be even more critical for building analytical models downstream. A range of data preparation techniques (e.g., profiling, cleansing, transforming, imputing, filtering, etc.) integrated with analytical workflows is essential to quickly yield value from complex data. To cope with the data deluge and to enhance end-user productivity, the adoption of self-service, interactive data integration tools will increase. Also gaining importance will be capabilities that quickly assist in evaluating the usefulness of data and generate reusable data transformations for integration into analytic workflows.

Q: Is this a SAS-specific message, or do others in the marketplace share the same thoughts on in-memory analytics?

We have seen other vendors approach in-memory processing architecture from a traditional
BI, query or data discovery perspective. What SAS provides is a way to use in-memory analytics as a processing method for more advanced concepts like predictive analytics, machine learning, prescriptive analytics and text analytics. We have built our in-memory engine from the ground up with data preparation and analytical workloads in mind; it is not an in-memory database where the focus is on selecting rows of data and performing basic queries, aggregations, etc.

Another key differentiator for SAS is the ability to exploit in-memory processing across key components of the analytics lifecycle. This includes data discovery, model development and model deployment in an interactive manner. For example, data exploration is fundamental to identifying strong relationships and finding out why certain events happen. But we take this a step further and help exploit these relationships to build, refine and deploy predictive models. In-memory analytics then provides a distributed platform that offers interactivity, fast response times and multi-user concurrency. Once the required data is loaded in memory, users can make multiple passes through the data for analytical computations and build numerous models by group or segment (e.g., location, store, owner, device, age, income) on the fly.

About the Interviewee

Tapan Patel is Global Product Marketing Manager at SAS. With more than 15 years in the enterprise software market, Patel leads marketing efforts for the Predictive Analytics, Data Mining and Hadoop market segments. He also leads marketing efforts for infrastructure topics like In-Memory Analytics and In-Database Analytics. He works closely with customers, partners, industry analysts, press and media, and thought leaders to ensure that SAS continues to deliver high-value solutions in the marketplace.
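The "build numerous models by group or segment on the fly" pattern described in the interview can be sketched with a short, hypothetical Python example. The records, segment names and the trivial mean/stdev "model" per segment are invented for illustration; a real deployment would fit a genuine statistical or machine learning model per group, in parallel, over in-memory data.

```python
# Hypothetical sketch: group in-memory records by segment, then fit one
# tiny baseline "model" (mean and stdev) per group. Data is invented.
from collections import defaultdict
import statistics

records = [
    {"store": "north", "sales": 120.0},
    {"store": "north", "sales": 80.0},
    {"store": "south", "sales": 200.0},
    {"store": "south", "sales": 240.0},
    {"store": "south", "sales": 160.0},
]

# Partition the in-memory records by the segmenting column.
by_store = defaultdict(list)
for rec in records:
    by_store[rec["store"]].append(rec["sales"])

# One "model" per segment, built in a single pass over each group.
models = {
    store: {"mean": statistics.mean(vals), "stdev": statistics.pstdev(vals)}
    for store, vals in by_store.items()
}
```

Swapping the segmenting key (location, device, age band, etc.) reuses the same in-memory data, which is what makes on-the-fly by-group modeling cheap in this style of architecture.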
Additional Information

To learn more about this topic, please visit In-Memory Analytics on sas.com.