Oracle's Big Data analytics portfolio gains critical mass

Publication Date: 04 Dec 2015
Product code: IT
Tony Baer

Ovum view

Summary

Over the past year, the third-party ecosystem for big data analytics has grown more densely populated with solutions designed specifically for Hadoop and mixed data platforms, with the common thread being enhancement through machine-learning capabilities. Oracle has been very much part of that wave, with emerging offerings in exploratory analytics, fast data streaming analytics, and data wrangling. While it has many rivals among start-ups and BI/analytics tool vendors, in breadth, its big data analytics and data integration portfolio is most comparable to IBM's, but the comparisons are not necessarily apples to apples.

One more thing to note: Oracle's Big Data Appliance finally took off last year after a three-year gestation. Besides reflecting the readiness of the enterprise market for Hadoop, it validates our hunch that the next wave of Hadoop adoption to the broader enterprise market will be via cloud or appliances, because "do-it-yourself" is beyond the skills and capabilities of most organizations.

Oracle Big Data analytics and integration stack is filling out

Oracle has segmented its offering through four use cases, most of which use the Big Data Appliance in conjunction with Oracle databases. They include:

Data factory: Where organizations use Hadoop as an inexpensive compute platform for offloading workloads such as ETL or batch analytics.
Data lab: This is the exploratory analytics use case, where data scientists run experiments and business users take advantage of self-service analytics tools to discover patterns and derive insights.
Data lake: Where Hadoop becomes the primary ingest point and repository for analytics data.
Fast data: Where big data is analyzed in real time and at scale. Oracle's definition pinpoints fast data as data in motion.

For the data factory and data lake

Recognizing that there are different paths for getting data into and out of Hadoop, Oracle includes multiple offerings in its catalog.
Big Data SQL is Oracle's high-performance SQL-on-Hadoop solution, which uses the Oracle database and treats data in Hadoop as virtual tables (it supersedes Oracle's SQL Connector to Hadoop, which is used with other Hadoop distributions); in the Big Data Appliance, Oracle also supports SQL access via Cloudera's open source Impala project or Apache Hive. For loading data to or from Hadoop, Oracle offers a real-time data replication solution (Oracle GoldenGate); for batch export, Oracle Loader for Hadoop (one of Oracle's big data connectors) bulk-feeds data to the Oracle database. Other connectors can bulk-copy objects such as partitions from Oracle to Hadoop, and allow HiveQL and SparkSQL to directly access data inside Oracle.

Oracle, like major rivals such as IBM and Informatica, has been extending its core data integration tools to execute natively on Hadoop. While that originally meant execution in MapReduce (and other Hadoop Apache projects), today the trend is toward supporting higher-performance alternatives.

Ovum. All rights reserved. Unauthorized reproduction prohibited.

Oracle has extended GoldenGate and Oracle Data Integrator to support Spark in-memory execution (others such as IBM are hedging when it comes to bulk loads, given questions over Spark's scalability, which we believe will ultimately be solved). But the takeaway is that the core capabilities are stable; at this point, the enhancements are incremental. For instance, in the latest dot release, there is more granular control for running Oracle Data Integrator in Oozie.

For preparing data, Oracle is promoting a cloud service, Big Data Preparation, which we profiled at length in our earlier report, Oracle's Cloud-based Big Data Analytics and Preparation Solution. There is a key missing piece: Oracle should subsequently release an on-premise version because some organizations may have data or analytics that are deemed too sensitive for running outside the four walls (as Paxata, the pioneer in data preparation, has proven).

For the data lab: Exploratory analytics

As we noted in our previous research, Oracle has adopted a "cloud-first" strategy for business end-user tools for visualization and data preparation, but a hybrid approach for self-service data discovery. The core offering, Oracle Big Data Discovery, has its roots in Oracle's Endeca search-based analytics tool, but only as a starting point; Oracle Big Data Discovery is not simply Endeca oversized for big data. It employs natural language processing and an underlying graph engine that helps end users explore the multifaceted relationships of data to parse out the signals and build narratives on those relationships. Its closest competitor, IBM Watson Analytics, is a useful comparison mainly as an illustration of how young this market is: the solutions tend to be unique, so comparing them is like comparing apples and oranges.

For the data scientist, Oracle offers an implementation of R that expands this single-node, client analytic tool to multi-threaded processing on Hadoop.
Oracle is hardly alone in offering such capabilities; IBM, Teradata, and Microsoft are also offering their own scale-out implementations. Oracle has recently added spatial and graph analytics, a move that has similarly been matched by the same usual suspects. Additionally, Oracle offers XQuery for Hadoop, which extends its existing XML query facility for the Hadoop platform. Many of these tools are bundled with Oracle's Big Data Connectors. What's missing is an offering for Python developers, but we expect that Oracle will address that gap.

For fast data: Oracle Event Processing gets a new face

Since the BEA days, Oracle has had a foothold in what used to be known as "complex event processing" (CEP): Oracle Event Processing. As we've noted in our fast data research, CEP was a solution that was too far ahead of its time: the technology wasn't ready to provide an economical solution, and aside from a few niches in capital markets, national security, and telco, there was little perceived demand. And so it's not surprising that it took Oracle a while to position its event-processing offering.

As we noted in our research, several factors have dragged the data-in-motion flavor of fast data to the front burner: the emergence of scale-out commodity infrastructure, open source software, and compelling use cases with IoT and mobile data. And not surprisingly, CEP has been recast by most vendors as streaming analytics.

Oracle has responded by adding an end-user front end to Oracle Event Processing with Oracle Stream Explorer, the long-needed visual tool for creating real-time, event-driven applications. This is a useful first step that keeps Oracle ahead of open source, where the technology is still raw and lacks usability. Oracle needs to follow this up by developing templates for different industry scenarios that could spark adoption, and integrating new flexibility so end users can choose their streaming engine; the momentum behind the Spark computing engine makes this an imperative. With its new IoT cloud service, Oracle has done just that; it should add such capability for its on-premise event processing as well.

Oracle Big Data Appliance business hits the inflection point

Oracle's Big Data Appliance business has become an overnight success after a three-year gestation as the company worked through packaging to lower the entry cost for new Hadoop adopters. Over the past year, growth for Big Data Appliance has finally taken off. For Oracle, it was a case of history repeating itself; just as its data warehouse platforms initially drew traction in Europe, the same has happened with Big Data Appliance. In large part this has been attributable to people; with staff turnover lower on both the vendor (Oracle) and customer sides in Europe compared to other world geographies, the relationships tend to be longer-standing.

The appliance bundles Cloudera's Hadoop distribution for analytics with the Oracle NoSQL database for operational use cases. Growth of the business and use cases provide good indicators of what the next wave of Hadoop adoption will look like. Use cases have been the familiar ones that we have seen with enterprises (e.g., customer engagement, log analysis for operational efficiency), with one key exception: real-time processing of IoT (sensor device) data has emerged. That is attributable not only to Oracle NoSQL Database's performance characteristics, but also to Hadoop's growing support of real-time processing through the introduction of YARN and the Spark compute engine. And it points to one other trend we expect to see more of in 2016: the next wave of Hadoop adoption in the enterprise will in large part occur through simpler paths, such as appliance or cloud.
Appendix

Further reading

Oracle's Cloud-Based Big Data Analytics and Preparation Solution, IT (July 2015)
Fast Data: Understanding Streaming Analytics, IT (October 2015)
Fast Data: The Rebirth of Streaming Analytics, IT (October 2015)

Author

Tony Baer, Principal Analyst, Information Management
tony.baer@ovum.com

Ovum Consulting

We hope that this analysis will help you make informed and imaginative business decisions. If you have further requirements, Ovum's consulting team may be able to help you. For more information about Ovum's consulting capabilities, please contact us directly at consulting@ovum.com.

Copyright notice and disclaimer

The contents of this product are protected by international copyright laws, database rights and other intellectual property rights. The owner of these rights is Informa Telecoms and Media Limited, our affiliates or other third party licensors. All product and company names and logos contained within or appearing on this product are the trademarks, service marks or trading names of their respective owners, including Informa Telecoms and Media Limited. This product may not be copied, reproduced, distributed or transmitted in any form or by any means without the prior permission of Informa Telecoms and Media Limited.

Whilst reasonable efforts have been made to ensure that the information and content of this product was correct as at the date of first publication, neither Informa Telecoms and Media Limited nor any person engaged or employed by Informa Telecoms and Media Limited accepts any liability for any errors, omissions or other inaccuracies. Readers should independently verify any facts and figures as no liability can be accepted in this regard; readers assume full responsibility and risk accordingly for their use of such information and content.

Any views and/or opinions expressed in this product by individual authors or contributors are their personal views and/or opinions and do not necessarily reflect the views and/or opinions of Informa Telecoms and Media Limited.
