Breakout Vendors: Big Data Integration


FOR ENTERPRISE ARCHITECTURE PROFESSIONALS

by Brian Hopkins

Why Read This Report

How are you going to get more value from your data lake? Most big data integration vendors focus on making classic processes faster with tools for moving data into a lake and working with it there. Three innovative vendors (Looker Data Sciences, SnapLogic, and Snowflake Computing) offer alternatives. Read this report to understand how they are pushing the boundaries of big data integration with new ideas about how to get value faster.

Key Takeaways

Looker Data Sciences Accelerates Analytics On Data Lakes
This vendor challenges the need for separate technologies in order to manage, govern, and store data in your data lake. It provides a distributed platform that does it all by combining human collaboration and machine intelligence.

SnapLogic Takes On Hybrid Integration
This vendor is pushing beyond its cloud roots to take on big data and hybrid integration requirements. It separates integration logic and execution, and is now positioning for more real-time and open source capabilities.

Snowflake Computing Provides Data Warehousing On Demand
Our last vendor challenges the need to create and load physical data warehouses. It provides a public cloud service that can ingest data, manage the process of integrating it, and produce virtual data warehouses and data marts on demand.

by Brian Hopkins with Srividya Sridharan, Gene Leganza, Elizabeth Cullen, and Tyler Thurston

Table Of Contents

Big Data Technology Is Changing How Integration Happens
Looker Combines Integration, Governance, And BI For Data Lakes
SnapLogic Separates Integration Logic And Execution
Snowflake Creates Data Warehouses On Demand
What It Means
Big Data Innovators Strain Enterprise Technology Vendors

Notes & Resources

Forrester interviewed three vendors for this research (Looker Data Sciences, SnapLogic, and Snowflake Computing) and attended industry events, including Strata-Hadoop World 2016 and Phorum.

Related Research Documents

Big Data Fabric Drives Innovation And Growth
Insight Platforms Accelerate Digital Transformation
The Platform Explosion: Harness It Or Lose Agility
TechRadar: Big Data, Q

Forrester Research, Inc., 60 Acorn Park Drive, Cambridge, MA USA
forrester.com

2016 Forrester Research, Inc. Opinions reflect judgment at the time and are subject to change. Forrester, Technographics, Forrester Wave, RoleView, TechRadar, and Total Economic Impact are trademarks of Forrester Research, Inc. All other trademarks are the property of their respective companies. Unauthorized copying or distributing is a violation of copyright law.

Big Data Technology Is Changing How Integration Happens

Big data technology can bring new data sources and faster results into integration processes. For example, Monsanto uses Apache Kafka and Pivotal Cloud Foundry to do near-real-time extract, transform, load (ETL). This is, however, classic data integration made faster. In Forrester's TechRadar: Big Data, Q1 2016, big data integration was placed in the Growth phase, which means that there is a thriving market of established and new vendors.[1]

To help enterprise architects get ahead of the latest innovations, Forrester has identified three groundbreaking vendors that are not sticking to the classic big data integration rules. Looker Data Sciences, SnapLogic, and Snowflake Computing are changing when and where data integration happens to further compress time-to-value (see Figure 1). In the vendor profiles below, we assess each company's:

Offering. What are the capabilities of the products and the technology?
Scenarios. What are the user requirements, environments, and use cases?
Maturity. What is the company's go-to-market approach, channel strategy, and viability?
Challenges. What are the barriers to success?
Road map. What's next for the business and the products?

FIGURE 1 Big Data Innovators Extend Classic Big Data Integration

Big data focuses on data lakes. The new paradigm for big data integration is E-L-T: Extract, Load, Transform, Consumer.

But three innovators help you do integration where it makes the most sense:

Pain: Making data in lakes analytics-ready is still complicated.
Solution: Create an application layer above the data lake for integration, governance, and business intelligence.

Pain: Integration often needs to happen in many places, but most vendors focus on data lakes.
Solution: Separate integration definition from execution.

Pain: Integrating before the data warehouse limits your options.
Solution: Use cloud to combine integration and on-demand warehouse creation.

Looker Combines Integration, Governance, And BI For Data Lakes

Firms are building data lakes to capture raw data, store it cheaply, and quickly use it for analysis.[2] However, making the data ready for analytics can still be a complex process. Much of the integration work involves discovering relationships between elements and creating metrics from the data that everyone agrees to. Today, firms are largely accomplishing this with separate integration and governance tools like Informatica and business intelligence (BI) tools like SAP BusinessObjects. Even though these vendors support Hadoop as a data store, data flow implementation and change can still be complex. Looker Data Sciences seeks to be cheaper and faster.

Offering: Looker lets users accelerate BI on data lakes. The Looker data platform helps teams model data in a lake, then transforms it at query execution (see Figure 2). It also offers BI tools and a run time for embedding Looker analytics into applications. The platform uses machine learning to inspect and discover data relationships, then allows data workers to develop purpose-built models and metrics using LookML, a modeling language that provides structured query language (SQL) abstraction, reusability, and team development, and that integrates with more modern constructs like Git for version control.

Scenarios: Looker is an insight platform.[3] Business groups and governance teams collaboratively develop data models in the Looker data platform. Since the data lake stores data in its raw form, different groups create various models to suit their needs. Business users employ Looker's embedded BI tools to perform basic reporting and analysis on core business data integrated with high-velocity transitory data from web, mobile, or devices. Finally, Looker Data Sciences supports insight execution with Looker big data apps and Powered by Looker embedded analytics.
Maturity: Looker is a mid-stage startup looking to disrupt enterprise BI. Looker Data Sciences was founded in 2012 by early employees of Netscape and Greenplum, and now has over 500 paying customers. It just closed a Series C round of venture funding in January. It is winning with firms that have completed big data lakes or big databases like Amazon Redshift, Google BigQuery, HPE Vertica, or Pivotal Greenplum. It is now looking to accelerate business value without adding cost and complexity.

Challenges: Looker needs to balance scaling its business and adding features. Like all of the innovators featured in this report, Looker Data Sciences has to spend a lot of time educating potential buyers, many of whom have been skeptical of a single tool that claims to combine integration, governance, and BI at big data scale. Looker told us that its biggest challenge now, however, is keeping up with demand. We also anticipate that as enterprise demand builds, customers will push Looker to incorporate more advanced features, which will strain its ability to keep costs down.

Road map: Looker is improving API accessibility and enterprise-grade features. Looker Data Sciences' next release will enable API access to every facet of its platform and improve features for its most demanding enterprise customers. For example, it plans to improve the lineage and security of both content and the data model by providing an enhanced staging-to-production workflow. After that, Looker is targeting advanced analytics, better job scheduling, and write-back.

FIGURE 2 Looker Deploys On Your Data To Accelerate BI
Source: Looker Data Sciences

SnapLogic Separates Integration Logic And Execution

Firms are discovering that they need more than batch transformation operations against static data as they move to real time. For example, many firms are developing Spark code to run microbatch data integration and aggregation for high-velocity use cases.[4] SnapLogic seeks to coordinate both application and data integration flows across cloud and on-premises, batch and real-time.

Offering: SnapLogic offers cloud design and hybrid execution. SnapLogic is an integration tool that combines application and data integration. Its product is packaged in two pieces. First, it offers an integration design and management tool delivered via the cloud as a service, with no installation required. Second, it offers different Snaplexes that let users execute integration logic in multiple places, such as the cloud, the data center, or Hadoop (see Figure 3).[5] It also sells Snaps, which are connection components that allow customers to access data from other technologies.
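The idea of separating integration logic from execution can be illustrated with a minimal Python sketch. This is not SnapLogic's API: the `Step`, `Pipeline`, `LocalExecutor`, and `BatchExecutor` names, and the sample order records, are hypothetical and exist only for illustration. The pipeline is defined once as data, and two interchangeable executors stand in for different run-time environments.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass(frozen=True)
class Step:
    """One named transformation in the integration logic (hypothetical type)."""
    name: str
    transform: Callable[[Record], Record]


@dataclass(frozen=True)
class Pipeline:
    """The integration definition: an ordered list of steps, with no execution code."""
    name: str
    steps: List[Step]


class LocalExecutor:
    """Runs the pipeline record by record, as a small in-process runtime might."""

    def run(self, pipeline: Pipeline, records: Iterable[Record]) -> List[Record]:
        results = []
        for record in records:
            for step in pipeline.steps:
                record = step.transform(record)
            results.append(record)
        return results


class BatchExecutor:
    """Runs each step over the whole batch before moving on to the next step.

    This is only a stand-in for a cluster-style runtime; it still runs in one process.
    """

    def run(self, pipeline: Pipeline, records: Iterable[Record]) -> List[Record]:
        batch = list(records)
        for step in pipeline.steps:
            batch = [step.transform(record) for record in batch]
        return batch


# The definition is written once and knows nothing about where it will run.
pipeline = Pipeline(
    name="normalize_orders",
    steps=[
        Step("uppercase_country", lambda r: {**r, "country": str(r["country"]).upper()}),
        Step("add_total", lambda r: {**r, "total": r["qty"] * r["unit_price"]}),
    ],
)

# Made-up sample data for illustration.
orders = [
    {"id": 1, "country": "us", "qty": 2, "unit_price": 5},
    {"id": 2, "country": "de", "qty": 1, "unit_price": 7},
]

if __name__ == "__main__":
    for executor in (LocalExecutor(), BatchExecutor()):
        print(type(executor).__name__, executor.run(pipeline, orders))
```

Because the definition carries no execution detail, the same pipeline yields the same records under either executor. That property is what makes it possible, in principle, to choose the execution location separately from the design.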

Scenarios: SnapLogic is pushing beyond cloud integration. SnapLogic has historically gone to market as a cloud integration platform that includes both applications and data. It has moved away from cloud-only, hence the recent Groundplex and Hadooplex components for executing integration work in different places and in hybrid scenarios. In addition to powering cloud analytics and hybrid cloud application integration, it is increasingly used by customers to manage data lakes and even replace legacy ETL processes.

Maturity: SnapLogic is a mature vendor that has adapted to big data and the cloud. Unlike other vendors in this report, SnapLogic is not a new kid on the block. Founded in 2006, it began taking investment in 2010 and secured a Series E in 2015 in an effort to capitalize on cloud, big data, and internet of things (IoT). It has more than 400 paying customers and 200 employees. It also has a mature go-to-market strategy that includes enterprise direct sales, services partners, and original equipment manufacturers (OEMs). Its target buyer is primarily in tech management, but it is seeing more interest from the business side, especially for IoT solutions, where tech management is often not involved.

Challenges: SnapLogic must deal with organizational inertia and open source. Data and app teams are often separate, so customers treat SnapLogic as either a data or an application integration tool, not both. As a result, customers miss the full potential. Talent and skills are another issue, as customers know they can hire a traditional ETL developer more easily than they can find someone trained in SnapLogic's tool. The pace of open source innovation is a challenge as well. Customers often bring in competitive open source integration tools first and expect SnapLogic to connect.

Road map: SnapLogic is embracing open source streaming and data lakes.
Its May 2016 release will add support for Apache Kafka, and its summer release will bring continued streaming and self-service usability updates. Going forward, it plans to focus more on data lake ingestion, streaming workflow, metadata, security, and governance. At a business level, its plans include expanding internationally and developing deeper relationships with channel partners and independent software vendors (ISVs).

FIGURE 3 SnapLogic Defined In The Cloud But Executed In Many Places
Source: SnapLogic

Snowflake Creates Data Warehouses On Demand

Many big data integration tools apply the power of distributed computing to data processing tasks, letting them accomplish tasks more cheaply and more quickly. Snowflake Computing simplifies the complexity of integrating and processing diverse data, without being a traditional data integration tool.

Offering: Snowflake offers data warehouses on demand. Its Elastic Data Warehouse lets customers create virtual data warehouse constructs in Amazon Web Services (AWS) on demand. Customers source raw or nearly raw data into Snowflake Computing's cloud object store, and the software handles the metadata, linking, aggregating, and caching. All of this is accomplished in a shared, elastic environment that allows firms to cheaply source in data and create SQL-accessible analytic databases on demand (see Figure 4).

Scenarios: Snowflake helps firms simplify integration and eliminate data marts. Snowflake Computing's secret sauce is the database engine it has built in AWS and the automation tools that create virtualized views of data, thus eliminating many traditional integration steps. Customers use the technology to eliminate physical data marts when modernization needs prove expensive. Other common uses include integrating machine data, such as data from connected devices, with business data for analytics, and running analytics applications, such as those that typically power online dashboards.

Maturity: Snowflake is a mid-stage startup with its foot on the gas. Founded in 2012, Snowflake Computing has taken over $70 million in funding. It has over 100 paying customers and 115 employees. It did not disclose its profitability outlook to Forrester, but with so much late-stage funding, many paying customers, and over 100 employees, Forrester assumes investors will seek an exit in the next two to three years through acquisition or initial public offering (IPO).

Challenges: Snowflake's biggest challenge is organizational inertia. Snowflake Computing's biggest challenge is breaking potential customers out of the mindset that data warehousing has to be complex, expensive, and on-premises. Moving sensitive data into a public cloud solution where a lot of the processing is "automagic" is also a risk for many buyers. Lastly, Snowflake needs to do more work so it can fit into existing BI and data integration flows more easily.

Road map: Snowflake wants to fully automate data warehousing. Snowflake's solution goes a long way toward overcoming many obstacles with current data warehousing practices, but it stops short of being a fully automated platform for data warehousing with analytics applications built on it.
Today, Snowflake sells direct as a software-as-a-service (SaaS) offering, but it is pushing to have more ISVs use its software to power their applications under a "Snowflake-inside" moniker.

FIGURE 4 Snowflake's Elastic Data Warehouse Eliminates The Need For Complex Integration
Source: Snowflake Computing

What It Means

Big Data Innovators Strain Enterprise Technology Vendors

The best-of-breed technology selection strategy for big data integration tools and hardware is under stress. There are too many big challenges and hybrid requirements. Do you need real-time streaming or batch? Hadoop on-premises or in the cloud? It's more than likely you need all these things, and more. However, you increasingly cannot afford to configure and manage separate, best-of-breed, and often expensive data management, data integration, data governance, and analytics technologies. You need solutions that meet scale and hybrid needs more easily. This idea was a key theme for all of our innovators and a big reason why they told us customers are buying them and not complex enterprise tools. Enterprise software vendors are responding, however, which is why IBM, Oracle, and SAS are embracing a cloud-first strategy with their latest product releases. For example:

Looker challenges the need for separate data, BI, and execution tools. By combining all these capabilities into one platform, Looker Data Sciences is targeting buyers who currently spend a lot of money on data integration tools like IBM InfoSphere DataStage and enterprise BI tools like SAP BusinessObjects.

SnapLogic challenges the need for separate application and data integration tools. SnapLogic sees how application and data integration are coming together as firms push more into the cloud. It is targeting buyers who want to replace legacy or disparate API management, enterprise service buses, and data integration tools.

Snowflake challenges the need for enterprise-grade hardware. Enterprise data integration and warehousing have always required big iron for performance. Snowflake Computing says, "No more... do it in the cloud." While uses are limited today, it is rapidly progressing and expects to be able to meet the demands of even large enterprises in the next year.

Endnotes

1 Forrester defines big data integration as tools that enable firms to store, process, access, report, and manipulate data using big data repositories such as Hadoop, distributed data stores, distributed databases, and distributed table stores. Traditional vendors in the space focus on accelerating classic integration processes or bringing new data sources into them. To learn more about the maturity and business value of integration tools, as well as other big data technologies, see the TechRadar: Big Data, Q Forrester report.

2 A data lake is a file system, typically running on a distributed infrastructure like Hadoop, where firms source and store large volumes of raw or nearly raw data cheaply for analysis later. Note that while most firms choose Hadoop, cloud object stores like Amazon Simple Storage Service (S3) or other distributed file stores like IBM General Parallel File System (GPFS) or MapR-FS can also serve this role.

3 Forrester defines an insight platform as a platform that unifies the technologies to manage and analyze data, test and integrate the derived insights into business action, and capture feedback for continuous improvement.
To learn more about this emerging group of platforms, see the Insight Platforms Accelerate Digital Transformation Forrester report.

4 Forrester has defined this as the lambda architecture. To learn how Hadoop and streaming analytics platforms interact, see the Internet Of Things Applications Hunger For Hadoop And Real-Time Analytics In The Cloud Forrester report.

5 While Hadoop may be installed in the data center, running an integration job in Hadoop is different because the integration execution logic, when pushed to Hadoop, runs as a YARN application. Therefore, it can use the cluster's resources to handle big data jobs more effectively.
