The Next-Generation EDW Is The Big Data Warehouse


by Noel Yuhanna

Why Read This Report

EDW is not dead; it's evolving! Enterprise data warehouses have come a long way in delivering value by predicting trends, minimizing churn, and identifying new business opportunities. However, in the era of big data, traditional EDW is failing to meet new business requirements, such as support for real-time and ad hoc customer analytics, new sources of data, and self-service capabilities. Enterprise architects should read this report to learn how the new big data warehouse addresses these gaps by delivering timely and actionable insights to gain competitive edge and enable innovation and growth.

Key Takeaways

Without Modernizing Your Current EDW Platform, You Will Likely Fail
Business users are demanding faster, more real-time, and integrated customer analytics from multiple sources, so they can make better decisions and increase their company's competitiveness. Current EDW platforms have gaps and limitations that fail to meet these new requirements.

Forrester's Big Data Warehouse Strategy Extends The Existing EDW Framework
Based on interviews of customers and vendors, Forrester has laid out an architecture to guide enterprise architects in creating a big data warehouse framework tailored to their firm's requirements to support both existing and new actionable business insights.

You Need A Big Data Warehouse Strategy To Succeed
Big data warehouse is a modern data warehouse architecture that leverages traditional and new data repositories, in-memory, cloud, and other technologies.

forrester.com

by Noel Yuhanna with Gene Leganza and Shreyas Warrier

Table Of Contents

EDW Has Been The Analytics Platform King For Decades
But New Business Requirements Are Changing EDW Requirements
EDW Technology Gaps Are Making Enterprises Look Elsewhere
The Big Data Warehouse Extends The EDW Platform
Big Data Fabric Connects The Superset Of Your Data Sources Including Your BDWs
The BDW Provides A Comprehensive View And Integrated Analytics
The Major EDW Vendors Provide BDW Components
BDW Use Cases Go Beyond Traditional Analytics
Recommendations
Extend Your Current EDW Platforms Toward A BDW Strategy
Supplemental Material

Notes & Resources

Forrester interviewed various customers in the financial, oil and gas, retail, and healthcare sectors.

Related Research Documents

Big Data Fabric Drives Innovation And Growth
The Forrester Wave: Enterprise Data Warehouse, Q
TechRadar: Big Data, Q

Forrester Research, Inc., 60 Acorn Park Drive, Cambridge, MA USA Fax: forrester.com
© 2016 Forrester Research, Inc. Opinions reflect judgment at the time and are subject to change. Forrester, Technographics, Forrester Wave, RoleView, TechRadar, and Total Economic Impact are trademarks of Forrester Research, Inc. All other trademarks are the property of their respective companies. Unauthorized copying or distributing is a violation of copyright law.

EDW Has Been The Analytics Platform King For Decades

The enterprise data warehouse is an architecture, not a technology. The traditional EDW platform has served and continues to serve a broad range of business users, including enterprise architecture (EA) pros, feeding both analytical and operational systems. EDWs:

Organize and aggregate historical analytical data from functional domains. EDWs house information from data subject areas such as customer, manufacturing, finance, and human resources that align with key processes, applications, and roles. Most traditional EDW platforms have been built on relational database management system (DBMS) and columnar database platforms, fed by extract-transform-load (ETL), change data capture (CDC), and replication technology (see Figure 1).

Offer a strong decision support framework. EDWs provide in-database analytics, predictive models, and embedded business algorithms to drive business decisions.

Are central to a firm's data ecosystem. The EDW is a proven ecosystem that supports integration with data models and security frameworks, automation, and a broad range of business intelligence (BI) and visualization tools. [1]

Provide the foundation for BI. EDWs support timely reports, ad hoc queries, and dashboards and supply other analytics applications with trusted and integrated data. Many use the EDW to deliver operational intelligence in the form of query responses, reports, dashboards, charts, and other analytic views in support of various decision scenarios.

FIGURE 1 The Traditional Enterprise Data Warehouse Platform
[Figure: column headings Source, Storage/persistence, Compute/processing. Labels: OLTP, CRM, ERP, Social, SaaS; Relational, Columnar; Modeling, Data quality, Security, Governance, Transformation, Integration; Business intelligence, Operational reporting, Analytics, Predictive analytics; ETL/CDC/replication; On-premises, Hybrid, Cloud]

But New Business Requirements Are Changing EDW Requirements

Today, business users are demanding real-time analytics that's integrated from legacy, social, and cloud sources, while business execs want self-service and autonomous access to fit-for-purpose customer data insights. In our 2016 global survey, 59% of respondents stated that leveraging big data and analytics was a critical or high priority (see Figure 2). But increasing data volume and dealing with multimodel customer data are slowing down timely analytics and putting constraints on traditional warehouse platforms, causing firms to revisit their EDW architectures. Businesses are reporting that current EDW platforms:

Can't share current data quickly enough for timely business decisions. With increasing big data comes a major challenge for any enterprise: knowing what to look for and where, and then making sense of it. In our survey, 30% of businesses reported growth of data volume and variety affecting their BI strategy (see Figure 3). Firms are realizing that traditional data warehouses fall short when it comes to real-time analytics. [2]

"With data explosion and increasing demand for real-time analytics by the business, we are finding it challenging to support our LOB users. While we already use Hadoop, our traditional data warehouses still are important for analytics, but we are now looking at modernizing that architecture." (Enterprise architect, oil and gas, North America)

Don't support ad hoc and dynamic analytics for new customer trends. EDWs were built for a limited set of uses, providing answers to known questions. But 27% of enterprises report that fast-changing analytics and reporting requirements are one of the biggest challenges when orchestrating their BI strategy, while 30% cite the growth and variety of their data. Processes using traditional EDWs don't scale well when you introduce ambiguity or add new and dynamic questions. EDWs need to ingest, process, and curate data continuously and support dynamic insights.

"We are now looking to [build] a modern data warehouse that can provide insights to all kinds of tough questions critical for our business to succeed. Including identifying business risks and opportunity." (Business analyst, financial services, Europe)

Don't provide a self-service platform for strategic and operational decision-making. When executives need to determine why something is happening or what the best course of action is, they can't wait for a data processing cycle to make data available. Analysts need to be able to aggregate and prepare data sets without technology management's involvement. Twenty-seven percent of companies reported lack of end user self-service capabilities as one of the biggest challenges in executing their BI strategy. Self-service customer analytics has become critical for organizations to succeed.

"Self-service for all data is our long-term strategic direction, and we know it'll take us some time to get there, but we have to start somewhere. We have started to integrate our current EDW appliances to Hadoop and in-memory to create [a] unified and integrated analytical platform." (Enterprise architect, financial services, North America)

FIGURE 2 Big Data And Analytics Have Become A Priority
"Which of the following initiatives are likely to be your organization's top business priorities?" (Better leverage big data and analytics in business decision-making)
Don't know: 1%
Critical priority: 19%
High priority: 40%
Moderate priority: 28%
Low priority: 9%
Not on our agenda: 0%
Base: 3,343 data and analytics decision-makers
Source: Forrester's Global Business Technographics Data and Analytics Survey,

FIGURE 3 Data Growth And Variety Are Affecting Business Intelligence And Analytics Strategy
"What are the biggest challenges your firm faces when orchestrating its business intelligence strategy?"
Data security and privacy: 35%
Growth of data volume/variety: 30%
Fast-changing analytic and reporting requirements: 27%
Lack of alignment between IT and business: 25%
Lack of adequate user training: 25%
Poor data quality: 25%
Inadequate or missing relevant internal skills: 22%
Legal and regulatory compliance: 22%
Lack of data standards: 21%
Lack of end user self-service capabilities: 20%
Lack of access to data and insights: 19%
Inadequate change management programs (communications, incentives, etc.): 17%
Widespread utilization of insights for decision-making and planning: 17%
Lack of business C-level executive support: 16%
Don't know/does not apply: 4%
Base: 3,343 data and analytics decision-makers
Source: Forrester's Global Business Technographics Data and Analytics Survey, 2016

EDW Technology Gaps Are Making Enterprises Look Elsewhere

While traditional data warehouses often took years to build, deploy, and reap benefits from, today's organizations want more simplified, agile, integrated, cost-effective, and automated solutions. Firms are revisiting their EDW strategies, as they spend too much time loading, unloading, transforming, securing, integrating, and curating customer data. Enterprises face:

A data volume explosion that's affecting customer analytics. Traditional structured data continues to grow rapidly, slowing down legacy data warehouse systems and affecting analytics and timely insights. Regulatory requirements now mandate storing compliant data for several years, and business growth is generating more data at a faster pace than ever before.

"We are experiencing tremendous data explosion for traditional data sets that's impacting our data warehouses. While we are still looking at improving the performance of existing data warehouses for the short term, we are now starting to look at alternatives, both supplementary and replacement as longer-term strategy." (Enterprise architect, oil and gas, North America)

Data variety that's making it harder to support using traditional warehouses. Business users can't easily spot patterns and trends in content such as documents, email, images, audio, and social media. In addition, storing, processing, and accessing unstructured data in data warehouses pushes the limits of traditional technologies and architectures, which were not designed to handle such data types. [3]

Data speed that's making it harder to keep up. New sources of data are coming in a lot faster, such as sensor and machine data, log and clickstream data, cloud and software-as-a-service (SaaS) data, and other streaming data. Storing, transforming, and processing such data requires new technologies and systems to support new customer analytics, real-time analytics, and operational intelligence reporting. [4]

"For us, real-time data sharing is critical internally among business users but also with various partners that we engage with. Currently, not all of our data is available to everyone, but we are looking at ways of expanding to support a more self-service real-time big data platform." (Data scientist, biotechnology company, North America)

The Big Data Warehouse Extends The EDW Platform

Firms are already using a variety of technologies in their big data strategy to support new, next-generation analytics (see Figure 4). The big data warehouse (BDW) is a modern data warehouse architecture that leverages traditional data warehouse architectures as well as modern big data technologies (see Figure 5). Forrester defines the big data warehouse as:

A specialized, cohesive set of data repositories and platforms used to support a broad variety of analytics running on-premises, in the cloud, or in a hybrid environment. BDW leverages both traditional and new technologies such as Hadoop, columnar and row-based data warehouses, ETL and streaming, and elastic in-memory and storage frameworks.

FIGURE 4 Cloud, Streaming, And Distributed In-Memory Are Already Part Of Firms' Big Data Strategy
"Which of the following are included in your plans for big data?"
Public cloud big data services: 40%
Large-scale predictive modelling, data mining, or other advanced analytics: 36%
Streaming analytics/computing: 33%
Distributed in-memory databases, grids, analytics tools: 30%
Unstructured data mining/analytics: 28%
Packaged analytics technologies that brand themselves as big data: 27%
Marketing or digital data management platforms and service providers that brand their offerings as...: 26%
Creating or building out a data lake: 26%
Data anonymization or de-identification: 23%
Hadoop (including HBase or Accumulo): 23%
Semantic technologies (ontology building, search, autocuration, graph, etc.): 22%
A massively parallel processing (MPP) data warehouse: 18%
NoSQL other than Hadoop: 16%
Don't know: 8%
Base: 2,094 data and analytics decision-makers
Source: Forrester's Global Business Technographics Data and Analytics Survey,

FIGURE 5 Big Data Warehouse Architecture
[Figure: column headings Sources, Storage/compute processing, Management, Interaction, Use cases. Labels: OLTP, CRM, ERP, Social, SaaS, Devices, Sensors; Relational, Columnar, Apache Hadoop; Integration, Data quality, Security, Transformation, Governance, Machine learning; In-memory/Apache Spark; Self-service, Ad hoc interactions, Modeling; Business intelligence, Operational reporting, Analytics, Predictive analytics, Real-time analytics; ETL/CDC/replication, Streaming; On-premises, Hybrid, Cloud]

Big Data Fabric Connects The Superset Of Your Data Sources Including Your BDWs

The big data warehouse is part of a larger big data fabric architecture, which embodies data from multiple potentially distributed data sources, including BDWs and data lakes. The big data fabric architecture enables integration, data quality, security, governance, data curation, data preparation, and data management to support an end-to-end, real-time big data platform (see Figure 6). [5] The two architectures:

Can exist separately but work best as complements. Multiple traditional EDWs, BDWs, and data lakes have become the new norm to support the variety of analytical workloads. While both BDWs and big data fabric architectures can exist independent of each other, typically firms leverage both to deliver a blend of real-time and batch across various distributed enterprise data sets to support broader use cases. For example, some financial services organizations use the BDW to support mostly financial data analytics leveraging columnar data warehouses, Hadoop, and ETL technologies. The BDW also acts as a source within the big data fabric architecture that delivers real-time customer analytics across BDW, Twitter, Salesforce, and clickstream data.

Vary significantly in the amount of data transformation required. We often see big data fabric used for real-time analytical use cases that integrate data across many disparate sources, including the BDW. The BDW itself is used mostly for batch and near-real-time analytics on data stored in a data warehouse and Hadoop clusters that requires aggregation, transformation, and further processing before becoming available to BI users or analytical processes. Exploration occurs within the fabric, with transformations captured within the BDW.

FIGURE 6 Big Data Fabric Architecture Integrated With Big Data Warehouse
[Figure labels: Big data fabric; Hadoop; Spark; BDW; Processing and persistence; New York; Hadoop; Spark; EDW; Singapore; On-premises sources; Cloud sources; Data ingestion (streaming/replication/batch)]

The BDW Provides A Comprehensive View And Integrated Analytics

A key component of the BDW architecture is the ability to leverage various specialized data repositories such as traditional relational data warehouses, columnar data warehouses, and Hadoop. Unlike traditional data warehouses, the BDW minimizes complexity and hides heterogeneity by embodying a trusted model, supports all kinds of data types including unstructured data, and adapts to changing business requirements more rapidly through a self-service platform. The BDW centralizes administration of distributed data repositories, in-memory compute resources, metadata, storage, access, and processing functions. It leverages new technologies such as:

Hadoop to support diverse data sets and distributed computing. By leveraging Hadoop, the BDW enables organizations to deal with a wider variety of data structures than traditional EDWs. Hadoop can also deal with extremely large data sets that are inappropriate for traditional EDW platforms. Enterprise architects can choose to store data in relational, columnar, wide-column, or Hadoop stores based on business needs. For example, a retailer leverages legacy structured data stored in a traditional data warehouse, and Hadoop for clickstream data, and integrates them to deliver a 360-degree view of the customer for recommendations and churn analysis.

In-memory to enable faster customer analytical capabilities. A key component of the BDW is the ability to use in-memory to deliver performance and faster access to business data. We are heading toward having large memory platforms that will store petabytes in DRAM and Flash/SSD in the coming years. For example, several retailers are using BDW to leverage customer-related data to determine product discounting strategy, optimize product distribution across stores, and enable personalized customer experiences.

Streaming engines to support new data channels for ingestion and processing. Market data, clickstream, mobile devices, and sensors are new sources for analytical information that are not in your existing data warehouse. Streaming technology helps integrate, transform, and curate data across diverse data streams in real time. [6] Integrating streaming technology with data platforms such as Hadoop and Spark as well as traditional data warehouses has become critical. For example, we see oil and gas industry firms leveraging streaming technology for insights into new business opportunities, such as predicting staffing and resource requirements for various drilling sites and performing machine failure analysis.

The Major EDW Vendors Provide BDW Components

From an implementation viewpoint, most enterprises are currently building BDW platforms themselves by integrating their traditional data warehouses with Apache Spark, Hadoop, Storm, and in-memory technologies. Forrester sees many enterprises already using an extract-Hadoop-load (EHL) approach to:

1. Extract data from various source systems such as traditional databases and flat files.
2. Load data into Hadoop to perform aggregation and transformation using Apache Hadoop ecosystem tools.
3. Finally load the result into the EDW platform. [7]

BDW Use Cases Go Beyond Traditional Analytics

Adoption of BDW architectures will accelerate as enterprises run into existing EDW challenges. But building a BDW platform internally will require more time and effort, which will likely put pressure on the overall business technology (BT) agenda. The good news is that solutions are starting to emerge from vendors such as IBM, Microsoft, Oracle, SAP, Snowflake, and Teradata that provide some or all of the components to build and deploy a BDW strategy. [8] Enterprises are already using BDWs to support social analytics, risk analysis, campaign analysis, fraud assessment, and pricing trends. The top BDW use cases include:

Integrated analytics. A key challenge in the traditional EDW approach was that if data didn't exist in the warehouse, you couldn't do any analytics, full stop. With BDW architecture, you can perform integrated analytics across data warehouse and Hadoop clusters. Hadoop can store and process large sets of semistructured and unstructured data, log files, and streaming data with ease. For example, health research often requires looking at complex patient data and determining how effective a treatment is likely to be based on factors like age, sex, and health status. The BDW enables gathering and storing millions of data points in Hadoop and performing complex navigation and modeling using traditional data warehouse and in-memory technology.

Internet-of-things (IoT) analytics. Traditional data warehouses don't deal with IoT data. However, the BDW offers the ability to store, process, and access large volumes of IoT data from sensors and devices in Hadoop repositories efficiently through automation and machine learning technologies. Manufacturers deal with highly sophisticated machinery to support their plants, whether they're building a car, airplane, or tire or bottling wine or soda. Every minute of machine downtime can cost a manufacturer dearly. IoT analytics on BDW platforms enables manufacturers to predict machine failures based on sensor data, minimizing or eliminating production slowdown.

Right-time business analytics. Traditional EDW architectures were based mostly on batch processing, with ETL doing the heavy lifting of data from traditional systems to operational systems to data warehouses. As a result, by the time data arrived in data warehouses, it was already 12 to 48 hours old. BDWs enable right-time analytics by leveraging streaming and replication with direct access to data sources, whether on-premises or cloud, bypassing traditional ETL approaches. The financial services industry has been an early adopter of BDW to support right-time analytics for portfolio management, fraud detection, and asset management.

Adaptive, self-service analytics. Most EDWs use predefined data sources to deliver predictive analytics, trends, and insights. The BDW enables organizations to dynamically leverage new data sources quickly to deliver new insights. It enables self-service capabilities for business users to ask complex and new questions so they can make more accurate decisions. The BDW adapts to the new sources and can help correlate data using machine learning and adaptive intelligence. For example, a major European bank recently built a BDW framework that business units now use to support self-service for making better decisions on investments and risks. The platform represents a major shift from the static reports the bank used previously.
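The three-step extract-Hadoop-load (EHL) flow described earlier in this report can be illustrated with a minimal sketch. This is not Forrester's or any vendor's implementation: plain Python collections stand in for the source systems, the Hadoop staging tier, and the EDW table, and every name and value here (ORDERS_DB, ORDERS_FLAT_FILE, the function names, the sample rows) is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a traditional database
# table and a flat file; not taken from the report.
ORDERS_DB = [
    {"customer": "c1", "amount": 40.0},
    {"customer": "c2", "amount": 15.5},
]
ORDERS_FLAT_FILE = "c1,10.0\nc3,7.25\n"


def extract():
    """Step 1: pull raw records from each source into one common shape."""
    rows = [dict(row) for row in ORDERS_DB]
    for line in ORDERS_FLAT_FILE.strip().splitlines():
        customer, amount = line.split(",")
        rows.append({"customer": customer, "amount": float(amount)})
    return rows


def aggregate_in_staging(rows):
    """Step 2: aggregate in the staging tier (Hadoop in the report, a dict here)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def load_into_edw(totals, edw_table):
    """Step 3: load only the aggregated result into the warehouse table."""
    for customer, total in sorted(totals.items()):
        edw_table.append((customer, total))
    return edw_table


edw_customer_totals = load_into_edw(aggregate_in_staging(extract()), [])
print(edw_customer_totals)
# prints [('c1', 50.0), ('c2', 15.5), ('c3', 7.25)]
```

The point of the pattern is that the heavy aggregation happens in the staging tier, so the warehouse receives only the compact result rather than every raw record.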

Recommendations

Extend Your Current EDW Platforms Toward A BDW Strategy

Don't throw away your existing EDW platform! The investments you have already made in EDWs will form the foundation of the next-generation BDW strategy. However, attaining this demands that you rearchitect your existing EDW platform and invest in new technologies to deliver on a new vision of right-time analytics, self-service, and intelligent and contextualized customer analytics. Forrester recommends that enterprise architects extend existing EDW platforms toward a BDW strategy by leveraging:

Hadoop for low-cost storage and processing of big data. Let Hadoop be the first stop for your big data that has no other home in your data warehouse. Hadoop offers the ability to store very large volumes of data (including unstructured data) more efficiently than traditional warehouses and at a fraction of the cost. In addition, Hadoop helps you offload data from traditional warehouses and leverage a distributed computing framework to perform transformation, aggregation, and curation quickly.

In-memory technology to support right-time analytics. Without in-memory technology, customer analytics, personalization, and right-time analytics will run slowly. This could cause you to miss key trends like customer churn or miss the opportunity to offer new products and services or identify weak markets. You can also use data from the BDW as part of the bigger big data fabric framework that leverages distributed in-memory computing to deliver a broader enterprise information fabric.

Hybrid platforms to support on-demand and scalable BDWs. Storing all of your data on-premises need no longer be the default. Cloud platforms like those from Amazon Web Services, Google, IBM, Microsoft (Azure), Oracle, and Rackspace offer pay-as-you-go facilities to store, process, and access any amount of data. [9] Hybrid is the new norm: look at utilizing both on-premises and cloud data warehouse platforms as part of your BDW architecture, with a common administration facility.

Vendor solutions that help achieve faster time-to-value. Data warehouse, Hadoop, and other big data solutions from vendors such as Cloudera, Hortonworks, IBM, MapR Technologies, Microsoft, Oracle, SAP, and Teradata can reduce time-to-value by automating and simplifying various BDW functions and implementation steps. Look at vendors that support broader solutions and can support your business data. Ask your vendor how it plans to provide the BDW vision. Review the various components that the vendor has integrated and ask how it plans to fill any gaps.
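To make the in-memory, right-time recommendation above more concrete, here is a minimal sketch of the underlying idea: keep a running aggregate in memory and update it as each event arrives, so a query reflects the latest data instead of waiting for a batch cycle. It is an illustration only, not a product API; the RunningAverages class, the machine names, and the readings are all hypothetical.

```python
class RunningAverages:
    """Keeps per-key sums and counts in memory, updated one event at a time."""

    def __init__(self):
        self._sums = {}
        self._counts = {}

    def ingest(self, key, value):
        """Fold a single incoming event into the in-memory state."""
        self._sums[key] = self._sums.get(key, 0.0) + value
        self._counts[key] = self._counts.get(key, 0) + 1

    def average(self, key):
        """Return the current average for a key, or None if no events were seen."""
        count = self._counts.get(key, 0)
        if count == 0:
            return None
        return self._sums[key] / count


# Hypothetical sensor events: (machine id, temperature reading).
events = [("pump-1", 70.0), ("pump-2", 64.0), ("pump-1", 74.0)]

totals = RunningAverages()
for machine, reading in events:
    totals.ingest(machine, reading)

print(totals.average("pump-1"))  # prints 72.0
```

Because the state is updated per event, the answer is available immediately after the last reading arrives; a production system would distribute this state across nodes and persist it, which this sketch does not attempt.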

Supplemental Material

Forrester's Global Business Technographics Data And Analytics Survey, 2016 was fielded in March. This online survey included 3,343 respondents in Australia, Brazil, Canada, China, France, Germany, India, New Zealand, the UK, and the US from companies with 100 or more employees. Forrester's Business Technographics ensures that the final survey population contains only those with significant involvement in the planning, funding, and purchasing of business and technology products and services. Research Now fielded this survey on behalf of Forrester. Survey respondent incentives include points redeemable for gift certificates.

Please note that the brand questions included in this survey should not be used to measure market share. The purpose of Forrester's Business Technographics brand questions is to show usage of a brand by a specific target audience at one point in time.

Endnotes

1. Today, organizations still rely on EDW platforms to deliver actionable, timely, and trustworthy intelligence. EDW technology organizes and aggregates analytical data from various functional domains and serves as a critical repository for organizations' operations. See the "The Forrester Wave: Enterprise Data Warehouse, Q" Forrester report.

2. It takes a long time to measure a business process. Enterprise data hubs need to accommodate more data and an infinite set of queries. See the "Create A Road Map For A Real-Time, Agile, Self-Service Data Platform" Forrester report.

3. Data consumers, from casual data analysts to data scientists to your customers, are looking across a broad variety of data today to find answers to their questions. See the "Compose Digital Data To Create A Symphony Of Insight" Forrester report.

4. Data bottlenecks create business bottlenecks. The days of provisioning data to simply meet the requirements of systems of record are over. Business stakeholders at the executive and line-of-business levels need data faster to keep up with customers, competitors, and partners. See the "Create A Road Map For A Real-Time, Agile, Self-Service Data Platform" Forrester report.

5. Forrester defines big data fabric as bringing together disparate big data sources automatically, intelligently, and securely, and processing them in a big data platform technology, such as Hadoop and Apache Spark, to deliver a unified, trusted, and comprehensive view of customer and business data. See the "Big Data Fabric Drives Innovation And Growth" Forrester report.

6. Streaming technology helps integrate, transform, and curate data on diverse data streams in real time. See the "The Forrester Wave: Big Data Streaming Analytics, Q" Forrester report.

7. Forrester sees many enterprises already using an extract-Hadoop-load approach to extract data from various source systems, such as IoT devices and cloud and traditional platforms, then load it into Hadoop, perform aggregation and transformation, and finally load it into the EDW to support business analytics. See the "The Forrester Wave: Enterprise Data Warehouse, Q" Forrester report.

8. Most big data integration vendors focus on making classic processes faster with tools for moving data into a lake and working with it there. Three innovative vendors (Looker Data Sciences, SnapLogic, and Snowflake Computing) offer alternative approaches. See the "Breakout Vendors: Big Data Integration" Forrester report.

9. According to Forrester customer feedback, such cloud-based storage is typically over 20% less expensive than on-premises deployment.
