Lake or Cesspool? The Challenges of Big Data Infrastructure

Big data environments are making it quick and easy for companies to store any and all forms of audience and customer data. Marketers are increasingly seeking to activate this data to power targeted marketing and personalization. The concept of a data lake has emerged as a form of big data repository to support a traditional data warehouse. A data lake stores large volumes of data in structured, semistructured, or unstructured form without the need to define a data schema up front. This not only allows for an easy and nimble way to capture data, but also provides agile and granular access for analytics.

A well-conceived data lake should empower data scientists and marketing analysts to mine insights and identify new attributes for targeted marketing or predictive modeling. This enables organizations to focus on extracting and processing only those data elements that will drive the highest business value. As a result, the right data gets incorporated into the more structured data aggregates and attributes, which, in turn, power marketing execution.

However, the ability to collect data quickly from a multitude of sources can create an opening for data lakes, or storage systems, to become a quagmire of information that is not easy to understand or to act upon.
Companies that are considering competitive market advantage, and that are planning or actively engaged with big data infrastructure, should ask themselves:

How good is your big data in terms of quality?

In big data environments, the ingestion of data is often unqualified. This leaves much of the due diligence to users, who may spend 90 percent of their time on data cleansing and other preparation instead of data analysis. Data lakes that provide nimble access for the data scientist to explore should not, in turn, compromise on data quality for business applications. High-quality data is required for reliable reporting, accurate modeling, campaign execution, and other production marketing capabilities. In data lakes, data may be raw or unstructured, but it should still be as accurate and complete as possible. Businesses need to consider how they are managing their data and what gate-keeping methods need to be applied to keep their data lakes clean and easy to utilize in a production or end-user environment.

DATA QUALITY EXAMPLE
Overview: Web analytics hit-level data is ingested into the data lake through an automated process. Events and custom variable meanings have been changed by the web analytics team without notifying other areas of the business, including the central analytics group that uses web analytics data within the data lake for modeling.
Problem: Web behavioral attributes used in the modeling are no longer accurate.
Solution: Institute a process for communicating changes that affect data structure or meaning downstream to all groups that consume data.
Solution: Create automated data quality processes that alert data administrators when the distribution of data values changes significantly.

Do you have the necessary taxonomy and cross-reference data to analyze and aggregate data from the data lake?

Querying and analyzing a wide range of data requires the effort to define and classify the data elements within.
There is little value in maintaining an environment to house and query big data if many of the supporting descriptive elements needed to analyze, classify, and aggregate that data cannot be accessed within the lake itself. It is important to have a disciplined taxonomy for descriptive data elements that is consistent across the company (e.g., individual and household identifiers, customer segmentation, web page categories, campaign metadata, etc.). This discipline will allow the analyst to more easily and effectively prepare, analyze, and aggregate data within the lake, and then effectively integrate the data across systems.

CROSS-REFERENCE EXAMPLE
Overview: Mobile app usage event data is available in the data lake. The data can be analyzed and aggregated by customer ID, but customer (CRM) data, including customer subscription status, customer segment, and customer value, is not available in the data lake. The business wants insight into which customers are the heaviest app users. The business also wants to be able to leverage app usage to target marketing messages.
Problem: Mobile app usage data cannot be joined to customer attributes within the data lake to analyze usage. In addition, the mobile app data is too large to feasibly import into the data warehouse for analysis.
Problem: Customer subscription status and segment data are not available to extract the relevant data needed to target marketing messages.
Solution: Integrate customer attributes into the data lake so they can be used in analysis or data selection.
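
As a rough illustration of the cross-reference solution above, the following Python sketch joins app usage events to customer attributes by customer ID once both are available in the same environment. All of the data, field names, and function names here (`APP_EVENTS`, `CUSTOMERS`, `usage_with_attributes`, `target_list`) are hypothetical and chosen only for illustration; a real data lake would typically do this with its own query engine rather than in-memory Python.

```python
from collections import Counter

# Hypothetical mobile app usage events already stored in the data lake.
APP_EVENTS = [
    {"customer_id": "C1", "event": "open"},
    {"customer_id": "C1", "event": "search"},
    {"customer_id": "C1", "event": "purchase"},
    {"customer_id": "C2", "event": "open"},
    {"customer_id": "C3", "event": "open"},
    {"customer_id": "C3", "event": "search"},
    {"customer_id": "C9", "event": "open"},  # no CRM record for this ID
]

# Hypothetical CRM attributes, as they might look once integrated into the lake.
CUSTOMERS = {
    "C1": {"subscription_status": "active", "segment": "loyal", "value": "high"},
    "C2": {"subscription_status": "active", "segment": "new", "value": "low"},
    "C3": {"subscription_status": "lapsed", "segment": "at_risk", "value": "medium"},
}


def usage_with_attributes(events, customers):
    """Count events per customer ID and attach CRM attributes where they exist."""
    counts = Counter(event["customer_id"] for event in events)
    rows = []
    # most_common() yields customers from heaviest to lightest usage.
    for customer_id, event_count in counts.most_common():
        attributes = customers.get(customer_id)
        if attributes is None:
            continue  # usage that cannot be tied to a known customer
        rows.append({"customer_id": customer_id, "events": event_count, **attributes})
    return rows


def target_list(rows, status, min_events):
    """Select customer IDs for a message by subscription status and app usage."""
    return [
        row["customer_id"]
        for row in rows
        if row["subscription_status"] == status and row["events"] >= min_events
    ]


rows = usage_with_attributes(APP_EVENTS, CUSTOMERS)
heaviest = [row["customer_id"] for row in rows[:2]]
active_heavy_users = target_list(rows, status="active", min_events=2)
```

Here `heaviest` answers the question of which customers use the app most, and `active_heavy_users` is the kind of selection a marketer might use to target a message. Neither is possible while the customer attributes live only outside the lake.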
DATA TAXONOMY EXAMPLE
Overview: Display view and click data is kept in the data lake. A campaign identifier is available in the display data to identify the particular display marketing activity. The business wants to analyze the performance of display marketing at the campaign group level. Campaign metadata is not maintained within the data lake.
Problem: Because the display view and click data are isolated from the campaign metadata, the campaign performance cannot be aggregated to the campaign group level.
Solution: Integrate campaign metadata into the data lake.

Overview: Web analytics data is maintained in the data lake. Business users want to report on the number of form submissions. There are different ways that form submissions can be aggregated from the event-level data (e.g., counting total events, counting unique events by visit or visitor, counting events within certain time- or campaign-based attribution rules, etc.).
Problem: Since a standard has not been established by the business analytics group on how to define aggregated form submissions, each analyst aggregates the data in a slightly different way, creating inconsistency and a lack of trust from the business.
Solution: Establish data aggregation and reporting standards. Define aggregate metrics across the organization.
Solution: Create aggregated summary tables or views of the data.

How easy is it to integrate your big data with other important (traditional) data environments?

Big data is often tied to specific customer touchpoints and actions (e.g., transactions and web log activity). In order for it to be meaningful and actionable, this data needs to integrate with other customer and campaign data housed in the data warehouse and other marketing or analytics platforms (e.g., campaign and content management, transactional systems, and modeling platforms). It is important to build a big data environment with serious consideration toward how it will integrate across other datasets.
For example, common keys/identifiers must be accessible across datasets, and consistent methods must be applied for identification management. This will drive better associations back to customers and ensure a common enterprise customer definition.

ABILITY TO INTEGRATE DATA WITH OTHER DATA ENVIRONMENTS EXAMPLE
Overview: Web activity data, associated with a visitor ID, is maintained in the data lake. Sometimes, when visitors log into their account, a web profile ID is captured in the web activity data. The association between the profile ID and customer ID is not available within the data lake.
Problem: Web activity data cannot be tied back to customers; therefore, it provides little value to the CRM program.
Solution: Bring the profile-ID-to-customer-ID cross-reference file into the data lake. Use an identity management solution to associate visitor IDs with customer IDs and retroactively match data for anonymous visitors who log into their account at a future date.

How does big data fit into your overall data strategy?

All of the above points add up to the fact that big data and data lakes do not operate in isolation. How will marketing and analytics teams leverage the data? How will the data drive performance or customer insight? How will that data eventually be activated to power targeted marketing, predictive modeling, or personalization? A comprehensive data strategy must be developed in order to understand the objectives and requirements of the big data infrastructure before it is built and activated. Technology, marketing, and analytics user groups need to be included in the discussions, which may require education about what big data is and how it could apply to their work.
HOW BIG DATA FITS INTO DATA STRATEGY EXAMPLE
Overview: Analysts discover a strong correlation between certain online web activities and future purchase activity within the data lake. Marketers want to use these online web activities to trigger marketing messages. IT returns with a plan to aggregate and import the web activity data into the CRM data store; they estimate several weeks to do the data development necessary to incorporate the data into the CRM system.
Problem: An effective data strategy is not in place to quickly transform insights from big data into marketing execution.
Solution: Establish a process and infrastructure that allows agile data communication from your data lake into your marketing execution platforms and vice versa.
Solution: A complete data strategy will encompass the full lifecycle of data, from collection to storage to consumption by analysts and marketing platforms.

Do we have a governance process in place to maintain the integrity of the data lake over time?

Setting up the data lake is only part of the story. Over time, people can become less sensitive to the quality and format of the data they choose to pump into the lake, or business needs and usage can change. You must have a governance process in place to periodically (once or twice per year) review the data quality in the lake to make sure it is clean and consistent and that the data being captured is still valuable to those using it for reporting or analysis. If a data source being fed into the lake is no longer used by the business, it is time to turn off that source. Minimizing extraneous or imperfect data flowing into the data lake will reduce maintenance costs and make it easier for analysts to find and use the right data.

DATA GOVERNANCE EXAMPLE
Overview: Data columns, tables, or sources within a data warehouse or lake may become deprecated as new data is introduced.
Old data sources may persist to maintain processes during transition periods, and then they are never cleaned up, or the effort is never invested to fully transition from legacy data sources to new data sources. Over time, your data warehouse or data lake will become cluttered and difficult to use, often requiring extensive internal knowledge to be able to identify the best data sources to use for a particular task.
Problem: Ineffective data governance to maintain integrity of data sources and to fully transition systems from old data sources to new data sources.
Solution: Data governance needs to become a priority within the organization. Businesses should realize that investment in data governance is reflected in lower costs on future data projects, lower analyst training costs, and more consistency/less likelihood for error in reporting and data-driven marketing.

In the end, a data lake should not be a data dumping ground or simply an access point for data scientists to mine insights from log data. A data lake should be seen as an environment for developing data-powered marketing execution. To effectively support marketing execution, the data must be of high quality, be descriptive and supported by a consistent taxonomy, be able to integrate across other data environments and marketing systems, and be guided by a comprehensive data strategy. Maintaining a clean and effective data lake requires planning and governance; however, the ease and agility of storing data in the lake can lead to laziness and lack of effort in getting the right data or enforcing data quality. This is how data lakes can become polluted over time. If you are not driving value from your big data; if your data scientists and analysts are spending all of their time trying to clean, prepare, or connect data; or if they don't even want to touch the stuff, then your data lake may have transformed into a murky cesspool.
ABOUT THE AUTHORS

Robert Schroko
Principal, Marketing Solutions
Robert is a seasoned consultant/executive with a track record for shaping customer development and engagement strategies for major brand companies (Condé Nast, Saks Fifth Avenue, SONY Music) across digital and traditional media platforms. He has expertise in driving deeper and more profitable customer relationships across all phases of the customer lifecycle by leveraging transaction, behavioral, and big data analytics and applying them to web, tablet, mobile, store, and traditional media. Robert has utilized his combination of analytical and business acumen to implement customer acquisition, development, and retention strategies through predictive modeling; implement and optimize loyalty programs; and create in-store customer-centric clienteling solutions. His talents also include developing customer 360 databases, consolidating individual consumer behavior and transaction data collected from desktop, tablet, mobile (all via Omniture), and offline media. Robert holds an MS and a BS degree in industrial and management engineering from Rensselaer Polytechnic Institute. He is a frequent guest speaker at industry and academic events, which have included the University of Pennsylvania Wharton MBA Executive Program, Teradata Partners Conference, and The Conference Board Marketing Conference.

Peter Kemp
Principal, Customer Strategy
Peter has more than twenty years of strategy, marketing, and CRM consulting experience in retail, consumer products, and healthcare. At Merkle, he is a leader in developing the strategies, processes, organizations, and change management plans needed for clients to become more customer centric. Peter began his career in brand management at Kraft and Black & Decker. He then worked for Accenture in the consulting firm's marketing strategy practice.
After Accenture, he joined DDB Advertising's CRM group, where he was the client lead for Lowe's Home Improvement, and then served as the SVP Global CRM lead for ExxonMobil, overseeing all CRM and loyalty programs for its retail operations worldwide. Peter received a BS in marketing from the University of Virginia and an MBA in finance from the Wharton School at the University of Pennsylvania, where he was also a top-ranked instructor in the undergraduate marketing program.

Allen Dickson
Director, Analytics
Allen has over ten years of experience in digital marketing and customer analytics, with deep expertise in digital data and personalization solutions. Since joining Merkle, Allen has worked across many client accounts, including Walmart, Dell, Samsung, Abercrombie & Fitch, Under Armour, Lowe's, Office Depot, Bose, and others, on a variety of digital analytics projects, from digital data capture and integration to personalization strategy and execution. Allen also led the development of Merkle's product recommendation engine. Prior to Merkle, Allen worked for Overstock.com, where he led its email marketing, web analytics, website testing and optimization, and personalization efforts. Allen received an MS in mathematics from Brigham Young University (with a concentration in algebraic topology) and also studied math at the University of Utah under an NSF research fellowship.
ABOUT MERKLE

Merkle is a leading data-driven, technology-enabled, global performance marketing agency that specializes in the delivery of unique, personalized customer experiences across platforms and devices. For more than 25 years, Fortune 1000 companies and leading nonprofit organizations have partnered with Merkle to maximize the value of their customer portfolios. The agency's heritage in data, technology, and analytics forms the foundation for its unmatched skills in understanding consumer insights that drive people-based marketing strategies. When combined with its strength in performance media, Merkle creates customer experiences that drive improved marketing results and shareholder value. With more than 3,700 employees, Merkle is headquartered in Columbia, Maryland, with 16 additional offices in the US and offices in Barcelona, London, Shanghai, and Nanjing. In 2016, the agency joined the Dentsu Aegis Network. For more information, contact Merkle at 1-877-9-Merkle or visit www.merkleinc.com.