Six Critical Capabilities for a Big Data Analytics Platform

White Paper
Analytics & Big Data
Six Critical Capabilities for a Big Data Analytics Platform

Table of Contents
Executive Summary
Key Requirements for a Big Data Analytics Platform
Vertica: A Singularly Effective Big Data Analytics Platform
Take a Closer Look at Vertica

Executive Summary

In the Digital Age, Speed Is of the Essence. Can Your Data Warehouse Keep Pace?

Legacy analytics solutions are choking on data. Recent research at Vertica shows that 66 percent of respondents felt their current solutions were unable to analyze large volumes of data; 65 percent said queries ran very slowly or did not complete at all; and 43 percent said their current system had reached its performance limits.*

* TechValidate Survey, December 2015

At the same time, demand keeps growing for faster and better analytics that can extract the golden nugget of insight from a mountain of data.

You're well aware of the implications. You look at the skyrocketing cost and complexity of managing your current data warehouse solution, you see that it's running out of gas, and you know that disruption is coming.

On the other hand, modernizing your Big Data analytics platform presents a huge opportunity to deliver new value, from monetizing data to increasing customer loyalty and retention, optimizing traffic, and even meeting compliance requirements. The right Big Data analytics platform, implemented the right way, opens the door to enhancing your competitive position and achieving superior business outcomes.

The question is, how can you transition to an analytics architecture that meets both current and future requirements, that somehow fits within your flat-lined budget, and that minimizes the impact on legacy processes and staff?

This paper summarizes the top six capabilities a Big Data analytics platform must deliver, and it takes a perspective that you may find both unexpected and insightful: the key considerations for a Big Data analytics platform do not revolve only around the concepts of big, data, and analytics. In the digital era, the guiding principle is speed. A Big Data analytics platform must expedite IT's ability to provide insights that drive process improvements and accelerate business results. Simply put, the future belongs to the fast, and Big Data analytics must help the business move faster and smarter.

Key Requirements for a Big Data Analytics Platform

Clearly, the Big Data analytics platform you select must deliver on a very broad range of requirements. Here are the top six criteria to consider.

#1: It Has to Be Extremely Fast

Having alluded to the urgent need for speed in the digital era, let's consider what that translates to in a Big Data analytics platform. In the simplest terms, users don't want to wait for results when they run a query. They expect instant gratification: immediate results, with no impact on other workloads. That means the Big Data analytics platform must supercharge the performance of existing applications, let you develop challenging new analytics, and provide a logical, predictable, and affordable scale-out strategy.

From a technical perspective, meeting these expectations requires a combination of a columnar database architecture (as opposed to traditional row-based, non-parallel-processing databases) and massively parallel processing technology, or MPP. The reason: a columnar design reads only the columns a query references, which helps minimize I/O contention, the leading cause of latency in analytic processing. Columnar design also offers extremely high compressibility, typically exceeding that of a row-based database by a factor of four or five. And MPP data warehouses typically scale linearly, meaning that if you double the footprint of a two-node MPP warehouse, you effectively double its performance.

The combination of columnar design and MPP not only allows for a massive performance gain (often on the order of 50 times or more), but also opens the door to lower and more transparent pricing mechanisms, such as a per-terabyte model rather than the traditional per-processor, per-node, or per-user pricing schemes. The net result: far higher performance coupled with the potential to radically reduce the total cost of Big Data analytics processing.

Figure 1. Vertica query performance vs. legacy systems
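To make the columnar argument concrete, the following toy Python sketch compares how many bytes a row layout and a column layout read for a single-column aggregate such as SUM(amount), and shows run-length encoding, one common reason sorted columns compress well. The table, the fixed 8-byte field width, and all function names are hypothetical; real storage engines use far more elaborate encodings, so treat the numbers as illustrative only.

```python
"""Toy comparison of row-oriented vs column-oriented layouts.

Assumption: every value is a fixed-width 8-byte field, so "bytes scanned"
is simply (number of fields read) * 8.
"""
from itertools import groupby

FIELD_BYTES = 8  # hypothetical fixed field width

# Hypothetical sales table, sorted by region: (region, product_id, amount)
rows = [
    ("east", 1, 10), ("east", 2, 20), ("east", 3, 30),
    ("west", 1, 15), ("west", 2, 25), ("west", 3, 35),
]

# The same data stored column by column.
columns = {
    "region": [r[0] for r in rows],
    "product_id": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def row_scan_bytes(rows):
    # A row store reads whole rows, even for a one-column aggregate.
    return sum(len(r) for r in rows) * FIELD_BYTES


def column_scan_bytes(columns, needed):
    # A column store reads only the columns the query references.
    return sum(len(columns[c]) for c in needed) * FIELD_BYTES


def run_length_encode(values):
    # Runs of repeated values collapse to (value, run_length) pairs.
    return [(v, len(list(g))) for v, g in groupby(values)]


# Bytes read for: SELECT SUM(amount) FROM sales
print(row_scan_bytes(rows))                    # 144
print(column_scan_bytes(columns, ["amount"]))  # 48
print(run_length_encode(columns["region"]))    # [('east', 3), ('west', 3)]
```

The gap widens with table width: the more columns a table has, the larger the share a row store reads but a single-column query never uses.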

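The linear-scaling claim can be sketched the same way. Below, a hypothetical table is split round-robin across simulated nodes, each node aggregates only its own shard, and a coordinator merges the partial results; a simple cost model then shows why doubling the node count halves the ideal scan time. Threads stand in for servers here, so the code illustrates the data flow rather than real speedup, and the throughput figure is an assumption, not a Vertica benchmark.

```python
"""Sketch of MPP-style scatter/gather aggregation (all names hypothetical)."""
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    # Distribute rows round-robin so every node holds one shard.
    shards = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        shards[i % n_nodes].append(row)
    return shards


def local_aggregate(shard):
    # Each node computes SUM(value) GROUP BY key over its own shard only.
    totals = defaultdict(int)
    for key, value in shard:
        totals[key] += value
    return totals


def mpp_sum_by_key(rows, n_nodes):
    shards = partition(rows, n_nodes)
    # Scatter: all nodes aggregate their shards concurrently.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    # Gather: the coordinator merges the partial results.
    result = defaultdict(int)
    for partial in partials:
        for key, total in partial.items():
            result[key] += total
    return dict(result)


def ideal_scan_seconds(data_gb, n_nodes, gb_per_node_per_sec):
    # Linear-scaling model: the work divides evenly across nodes.
    return data_gb / (n_nodes * gb_per_node_per_sec)


sales = [("east", 10), ("west", 15), ("east", 20), ("west", 25), ("east", 30)]
print(mpp_sum_by_key(sales, n_nodes=2))  # {'east': 60, 'west': 40}
print(ideal_scan_seconds(1000, 2, 5))    # 100.0
print(ideal_scan_seconds(1000, 4, 5))    # 50.0
```

Round-robin distribution is used so that the merge step is genuinely needed; if rows were instead hash-distributed on the grouping key, each key would live on a single node and the merge would reduce to a union of disjoint results.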
#2: It Has to Accommodate Huge Volumes of Data

Of course, blistering speed alone is of limited use if the Big Data analytics platform can't store and manage massive volumes. Today, the scale may be gigabytes or terabytes. Tomorrow, you may be thinking about petabytes.

Massively parallel processing is the ideal technology for scaling analytical processing because it leverages both the storage and the compute capability of a cluster of computers. It scales not only in performance but also in its ability to handle huge streams of incoming data. Moreover, a Big Data platform that applies MPP to structured data accelerates processing even more, because structured data is optimized for analytics and reduces the amount of searching required to answer a query: the structured database knows where the data sits in the sea of data and can access it with precision.

Generally speaking, unstructured databases have difficulty scaling to the levels attainable by structured databases using columnar design. However, Big Data analytics platforms can incorporate features that also improve scalability and performance for unstructured data.

#3: It Must Embrace Legacy Tools

If your Big Data analytics relies on extract, transform, load (ETL) tools such as Attunity, Informatica, Syncsort, Talend, or Pentaho, or on SQL-based visualization tools such as Logi Analytics, Looker, MicroStrategy, Qlik, Tableau, and Talena, make sure your platform is certified to work with all of them, not just those from your primary vendor. Also, make sure all the tools and add-on technologies you're using comply with the latest version of the ANSI SQL standard (SQL:2011).
#4: It Should Both Harness and Add Value to Hadoop

Hadoop, the open source software platform managed by the Apache Software Foundation, has become a major force in Big Data analytics. Many database professionals have evaluated Hadoop as a potential solution to the analytical limitations of their legacy data warehouse systems. Unfortunately, they typically find Hadoop's performance for ad hoc query and SQL analytics to be seriously deficient compared with a columnar, MPP-based Big Data analytics platform. Moreover, supporting data warehousing workloads on Hadoop requires developing new skills, obtaining new software, and in many cases hiring new people.

On the other hand, Hadoop offers a few distinct advantages in data analytics processing. Used as a data lake, it provides a low-cost place to store data. It provides warm and cold storage, a low-cost way to keep data that may be used but isn't "hot" (used in daily analytics). It offers data discovery capabilities that help you understand whether data has business value. Through ETL tools, it can aggregate or munge data as it comes into an organization. And it's possible to cost-effectively land, store, and process structured, semi-structured, and multi-structured data in Hadoop. This simply isn't the case with a relational database.

What's needed is the best of both worlds: a way to harness the advantages of Hadoop without incurring its performance penalties and potential disruptions. Therefore, look for a Big Data analytics platform that can leverage Hadoop as a cost-effective platform for persistence and light data management and that can accelerate both traditional data warehouse workloads and advanced analytics.

#5: It Must Support Data Scientists

The data scientist is acquiring greater influence and importance within enterprise IT, and the Big Data analytics platform should support data scientists in two key areas.

First, the new generation of data scientists is using tools like Java, Python, and R to perform predictive analytics. The underlying analytical database should support and accelerate the creation of innovative predictive analytics and allow for user-defined extensions written in these common data science languages.

Second, the platform should help connect the work of data scientists to business goals. Today, the role of data scientist often evolves from that of statistician, a relatively academic discipline whose practitioners have traditionally not been well versed in big-picture business objectives. As a result, the conclusions drawn by data scientists are sometimes incomplete, inaccurate, or unrelated to business outcomes. At the same time, business people are often content to leave the statisticians in a back room and call on them only when they need a magic bullet. A Big Data analytics platform that is fast, efficient, easy to use, and widely deployed can help close that gap between business and technical professionals.

#6: It Should Provide Advanced Analytics Capabilities

Depending upon your particular use case, it may be important to look at the depth of built-in SQL analytical functions offered by the Big Data analytics engine. You have to look under the hood to see exactly which SQL analytics are offered at these data volumes. For example, if you want to perform analytics on data from devices (as in the Internet of Things), analytical functions such as time series analysis and gap analytics are essential. Without them, you might spend your time munging data or writing custom code.

In addition, the ability to perform predictive analytics is becoming increasingly important to many organizations. Make sure the Big Data analytics platform not only enables you to prepare and load data in seconds but also lets you build predictive models with advanced in-database machine learning algorithms and easily deploy those models for scoring at scale on large datasets. These and other capabilities will let you accelerate large-scale machine learning, statistical analysis, and graph processing, while also enabling data scientists to use their existing statistical packages and preferred languages.
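For a sense of what that custom code looks like, here is a minimal Python sketch of gap filling: irregular device readings are resampled onto a regular time grid, with linear interpolation between neighboring readings and an explicit gap (None) outside the observed range. Integer timestamps and the function name gap_fill_linear are assumptions for illustration; this is not the implementation of any particular database's time series functions.

```python
"""Sketch: gap filling of irregular sensor readings by linear interpolation."""
from bisect import bisect_left


def gap_fill_linear(readings, start, end, step):
    """Resample readings onto the grid start, start+step, ..., end.

    readings: list of (timestamp, value) pairs sorted by timestamp.
    Returns a list of (timestamp, value); value is None outside the data.
    """
    times = [t for t, _ in readings]
    values = [v for _, v in readings]
    filled = []
    for t in range(start, end + 1, step):
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            # An actual reading exists at this grid point.
            filled.append((t, values[i]))
        elif 0 < i < len(times):
            # Interpolate between the readings just before and after t.
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            filled.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        else:
            # Before the first or after the last reading: leave a gap.
            filled.append((t, None))
    return filled


readings = [(0, 10.0), (3, 16.0), (4, 20.0)]  # hypothetical sensor data
print(gap_fill_linear(readings, start=0, end=5, step=1))
# [(0, 10.0), (1, 12.0), (2, 14.0), (3, 16.0), (4, 20.0), (5, None)]
```

Even this small version has to make decisions (interpolation method, how to treat the edges) that a built-in function would expose as options, which is part of the effort the text warns about.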

Vertica: A Singularly Effective Big Data Analytics Platform

The Vertica analytics platform is singularly capable of meeting the six key criteria summarized in this paper. It delivers the speed, scalability, simplicity, and openness to address the vast majority of analytical requirements of today's enterprise. It is architected to provide extremely high performance (queries run 50 times faster or more), petabyte scale (storing 10 to 30 times more data per server), and the ability to work with business intelligence (BI) and ETL tools, as well as Hadoop, at a much lower cost than traditional data warehouse solutions.

Equally important, Vertica is a true data analytics platform as opposed to a point solution. It includes a broad range of features, such as a management console that monitors the performance of Vertica clusters, providing a graphical view of your Vertica database cluster, nodes, and network status, along with detailed monitoring charts and graphs. For backup, you can use a full backup for disaster recovery to restore a damaged or incomplete database, and you can also restore individual objects from a full backup. These and other features are included as part of the platform, but must be cobbled together when using a less mature solution.

Vertica's in-database machine learning supports the entire predictive analytics process with massively parallel processing and a familiar SQL interface, allowing data scientists and analysts to embrace the power of Big Data and accelerate business outcomes with no limits and no compromises.
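The train-then-score workflow described here can be outlined in a few lines. This Python sketch fits a one-variable least-squares model and then scores a dataset in chunks processed concurrently, loosely mirroring how an MPP engine scores each node's data segment locally. The model type, chunk size, worker count, and function names are all hypothetical, and this is not Vertica's in-database machine learning API, which is invoked through SQL.

```python
"""Sketch: train a simple model, then score data in parallel chunks."""
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def fit_linear(xs, ys):
    # Ordinary least squares for y = a + b * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def score_chunk(model, chunk):
    # Apply the trained model to one chunk of rows.
    a, b = model
    return [a + b * x for x in chunk]


def score_parallel(model, xs, n_workers=4, chunk_size=1000):
    # Split the input into chunks, as data would be split across nodes.
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scored = list(pool.map(partial(score_chunk, model), chunks))
    # Flatten the per-chunk results back into one list.
    return [y for part in scored for y in part]


model = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(model)  # (1.0, 2.0)
print(score_parallel(model, list(range(5)), n_workers=2, chunk_size=2))
# [1.0, 3.0, 5.0, 7.0, 9.0]
```

Because Executor.map returns results in input order, the scored values line up with the input rows even if chunks finish out of order.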

Vertica also complements and adds value to Hadoop. It gives you a cost-effective and scalable analytics engine for both traditional and advanced analytics, as well as an extensible data management platform that can help you get the most out of what you're doing or planning with Hadoop and other innovations. This includes Vertica for SQL on Hadoop as well as External Tables, which allow users to directly query data in Parquet or ORC formats stored in HDFS or S3.

Take a Closer Look at Vertica

See for yourself how Vertica delivers blazing fast analytics on-premises, in the cloud, or on Hadoop. Manage and analyze up to 1 TB of data across three nodes for an unlimited time, free of charge, or take a test drive of Vertica using sample applications with pre-loaded data, running on the AWS and Azure clouds.

Try it free at: /try

Learn More At

Additional contact information and office locations:
AA V DS 03/
Micro Focus. All rights reserved. Micro Focus and the Micro Focus logo, among others, are trademarks or registered trademarks of Micro Focus or its subsidiaries or affiliated companies in the United Kingdom, United States and other countries. All other marks are the property of their respective owners.