Adobe and Hadoop Integration

Predictive Behavioral Analytics
Adobe and Hadoop Integration
DECEMBER 2016
SYNTASA Copyright

1.0 Introduction

For many years, large enterprises have relied on the Adobe Marketing Cloud to capture and analyze digital activity from mobile and web visitors. The insights gained from clickstream analysis are well known and have created a larger industry sector focused on digital analytics. However, the rich detail captured from millions of visitors each day continues to warrant a scalable platform for extracting deeper analytics from both clickstream and enterprise data if enterprises are to remain competitive. The adoption of Apache Hadoop as a scalable, open source platform has been slower than most in the Hadoop market had hoped, but it has nonetheless been the popular direction for large retailers and other digital enterprises integrating digital activity data with other enterprise data. This white paper discusses how and why simply ingesting the Adobe clickstream data into Hadoop for integrated analytics is just the first step towards building a behavioral analytics platform.

2.0 Analytics and Data Preparation Challenges

Business units have stated no shortage of objectives for improving customer 360 analysis, or of benefits expected from it. Customer acquisition and retention, cross-marketing, and product development inputs are only a few of those benefits. The use of data available from retailer digital platforms (known as first party data), together with data marketed by third parties (relating to the browsing activities and traits of potential customers), is becoming better understood as a way of reacting to customer preferences. However, integrating this digital data with more traditional off-line enterprise data is more challenging than it appears. Let's look at the specific challenges associated with first party data and migrating it to Hadoop.

Data collection
By tagging the web/mobile sites, Adobe captures the activity of customers and prospective customers in order to collect clickstream data. This activity is then stored in a dataset that is specifically organized for digital analytics.

Data structure
Adobe datasets used to capture digital activity are complex structures with over 500 customizable columns. Each clickstream record has information related to activity by visitor and visit, as well as transactions and content. Some fields in the datasets even contain packed data representing multiple events that took place during a page view interaction.

Data handling and transformation
Many service organizations ingest the clickstream data into Hadoop and perform transformations and enrichments of the activity data on an ad hoc basis. Automated processes focused on managing changes and providing production support are, for the most part, absent in the marketplace. These data analysis and validation processes are critical if accurate intelligence is to be delivered to the business units and marketers. The industry-accepted discrepancy of a few percent between different clickstream data sources is not suitable for rigorous enterprise reporting and analytics. At a minimum, maintaining a 99.9% accuracy rate between clickstream data in Adobe Analytics and Hadoop data stores is critical to proper enterprise data management and reporting.

SYNTASA Whitepaper: Behavioral Analytics Adobe and Hadoop Integration
Use or disclosure of data contained on this sheet is subject to the restriction on the title page of this document.
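
The packed fields described under Data structure can be illustrated with a short sketch. The whitepaper does not specify the packing format, so the code below assumes a comma-separated field in which each entry is an event ID optionally followed by "=value". The field name event_list, the event IDs and the sample records are all hypothetical.

```python
from decimal import Decimal


def unpack_events(packed):
    """Split one packed field into (event_id, value) pairs.

    Assumed format (not taken from the whitepaper): comma-separated
    entries, each either "ID" or "ID=value". An entry without an
    explicit value counts as 1.
    """
    events = []
    for entry in packed.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty fields and trailing commas
        event_id, _, value = entry.partition("=")
        events.append((event_id, Decimal(value) if value else Decimal(1)))
    return events


def total_by_event(records):
    """Sum event values across many page-view records."""
    totals = {}
    for record in records:
        for event_id, value in unpack_events(record["event_list"]):
            totals[event_id] = totals.get(event_id, Decimal(0)) + value
    return totals


# Hypothetical page-view records, each carrying several packed events.
hits = [
    {"page": "/cart", "event_list": "12,200=2"},
    {"page": "/checkout", "event_list": "1,200=3,"},
    {"page": "/home", "event_list": ""},
]
totals = total_by_event(hits)
```

Unpacking the field into one row per event is what makes the activity queryable by event type; a single packed string per page view cannot be aggregated directly.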

3.0 Clickstream Data Ingestion and Configuration

Handling the data with care and attention to detail once the dataset(s) have been loaded into the Hadoop cluster requires deep subject matter expertise. Understanding the data structure can be easy enough, but transforming the data into usable, ready-to-consume schemas is quite another undertaking. Digital marketing expertise combined with data architecture and Hadoop operating skills is a very difficult combination to staff. SYNTASA has developed technology to assist in the following areas, leading to accelerated deployment and production use of the clickstream data combined with enterprise data.

ETL
While there are generic ETL solutions on the market, they require significant configuration, scripting, and domain expertise to handle the rich and complex clickstream data structures generated by Adobe Marketing Cloud. SYNTASA's advanced data ingestion technology is purpose-built for digital marketing data and, more specifically, the Adobe Marketing Cloud clickstream data. It uses the Adobe APIs and user-controlled configuration capabilities to fully integrate Adobe Marketing Cloud with a Hadoop cluster in a matter of hours instead of months.

Configuration
Once the raw dataset(s) have been delivered to the Hadoop cluster via batch or streaming, a properly configured schema for analytics must be established. The advanced configuration capabilities of the SYNTASA solution provide a productive UI where new custom schemas and views of the data can be developed and maintained in a production environment.

User Interface
The UI gives users who may not have the skillset for developing data management scripts and other code assets an opportunity to manipulate the data and create custom, personalized environments and data structures for specific analytics and reporting.
4.0 Enterprise data integration

Many enterprises today believe a full customer 360 perspective can be achieved through full integration of the customer experience data with other enterprise data that represents touch points between the customer and the retailer. This should include call center data (IVR), CRM and account data, social media messages, and any other data adding to the profile of the customer experience. Additionally, enterprise data specific to operations, such as logistics, inventory, customer feedback, and other process-oriented measurements, should be considered when integrating with clickstream and other customer data to gain a full understanding of operations.

SYNTASA software enables users to consolidate their clickstream and enterprise data using a configurable behavioral analytics schema. This way, clickstream and enterprise data can contain similar elements that enable meaningful results from analytics. Structuring the data schemas to share certain data elements and values better equips analytic staff to monitor, for example, digital sales compared to delivery dates. Tracking online sales as compared to in-store sales is also a very important measurement for retailers that existed prior to online ecommerce.
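
The idea of clickstream and enterprise data sharing certain data elements can be shown with a minimal sketch. It assumes both sources carry the same order_id key, which the whitepaper does not state; the field names and sample records are hypothetical and do not represent the SYNTASA schema.

```python
from datetime import date


def days_to_delivery(clickstream_orders, deliveries):
    """Join the two sources on a shared order_id and compute the number
    of days from online purchase to delivery. Orders with no delivery
    record yet are reported as None so that gaps stay visible."""
    delivered_on = {d["order_id"]: d["delivered"] for d in deliveries}
    result = {}
    for order in clickstream_orders:
        delivered = delivered_on.get(order["order_id"])
        result[order["order_id"]] = (
            (delivered - order["purchased"]).days if delivered else None
        )
    return result


# Hypothetical digital sales (from clickstream) and logistics records.
clickstream_orders = [
    {"order_id": "A100", "purchased": date(2016, 12, 1)},
    {"order_id": "A101", "purchased": date(2016, 12, 2)},
]
deliveries = [{"order_id": "A100", "delivered": date(2016, 12, 5)}]
lead_times = days_to_delivery(clickstream_orders, deliveries)
```

Without a shared element such as the order identifier, the comparison of digital sales to delivery dates would require fuzzy matching on timestamps or customer attributes, which is far less reliable.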

5.0 Advanced Analytics

By using the SYNTASA toolset for creating custom schemas, custom analytics are more easily built with the models available within the suite or those developed externally. Without preparing the data by creating a comprehensive schema for clickstream and enterprise data, such algorithms are typically used in smaller, non-scalable applications by individual analysts. Establishing the proper schema and integrating enterprise data with clickstream data accelerates the benefits of using SYNTASA's readily available models or user-supplied models within the framework of the SYNTASA advanced analytics platform.

6.0 Advanced Visualizations

The ability to connect directly with data stored in Apache Hadoop can dramatically improve the intelligent outputs from SYNTASA advanced analytics. By utilizing facilities such as Impala, Hive and Drill, SYNTASA can accurately display analytic results in self-service dashboards. Some examples of these ready-to-use visualizations are:

Marketing campaign performance
Ecommerce pipeline propensities
Digital journey path analysis
Multi-channel attribution analysis
Behavioral segmentation

7.0 Production and Operational Support

Once clickstream data has been initially ingested into the Hadoop cluster, the automated processes configured and established upon ingestion are available via the SYNTASA configurable adaptor for Adobe. As dataset variables change by virtue of changes on the digital platform, these changes must be propagated and synchronized with the data migrated to Hadoop. Alerts and other notifications are available to maintain a 99.99% accuracy rate between the data captured and the data migrated. Data integrity has long been a hallmark of scalable enterprise data management. Online ecommerce sales data captured from the digital platform(s) must match that of the enterprise accounting system.
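
A reconciliation check of the kind implied by the 99.99% accuracy target might look like the sketch below. The whitepaper does not define how accuracy is measured, so this assumes it is the share of source records accounted for in the migrated copy, computed per day from record counts. The function names, threshold default and sample counts are illustrative only.

```python
def accuracy_rate(source_count, migrated_count):
    """Percentage agreement between source and migrated record counts.

    Assumed definition: 100 * (1 - |source - migrated| / source).
    Over-counts (e.g. duplicates) are penalized as well as under-counts.
    """
    if source_count == 0:
        return 100.0 if migrated_count == 0 else 0.0
    diff = abs(source_count - migrated_count)
    return max(0.0, 100.0 * (1 - diff / source_count))


def find_alerts(daily_counts, threshold=99.99):
    """Return the days whose accuracy rate falls below the threshold."""
    return [
        day
        for day, (source, migrated) in sorted(daily_counts.items())
        if accuracy_rate(source, migrated) < threshold
    ]


# Hypothetical daily record counts: (captured at source, migrated to Hadoop).
daily_counts = {
    "2016-12-01": (1_000_000, 999_950),  # 99.995%, within tolerance
    "2016-12-02": (1_000_000, 999_800),  # 99.98%, should raise an alert
}
alerts = find_alerts(daily_counts)
```

Count comparison is only the coarsest check; matching sales totals against the accounting system, as the text requires, would need the same comparison applied to summed revenue per period.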
Use Cases

Attribution
A comprehensive, holistic understanding of where visits and visitors to a digital platform originate can only be achieved by analyzing the complete activity on the digital platform. Attribution results based on first touch, last touch or other methods do not present the entirety of paths and sources of visits to the digital platform(s). By ingesting and configuring the entire clickstream dataset and executing attribution models against the data, SYNTASA can display comprehensive results of visitors' origins related to specific outcomes such as purchases.

Propensity modeling
Once the complete Adobe clickstream dataset(s) is ingested into Apache Hadoop, SYNTASA can use propensity models and machine learning technology to understand the probability and propensity of certain behaviors leading to a conversion event (e.g. purchase). SYNTASA can also use these models to understand the propensity of specific campaigns and products and the correlation of such campaigns to purchases.

Path Analysis
By interrogating the clickstream dataset(s) with advanced marketing analytics, SYNTASA is able to display, in real time, the paths leading from specific origins (paid search, natural search, direct entry, campaigns) through specific site pages during the visitor experience that lead to specific outcomes.

[Figure: SYNTASA Advanced Path Analysis]

Re-Targeting
Ingesting and configuring the clickstream data allows for advanced analytics of specific behavior that leads to purchases. It is also now possible to understand behavior that does not lead to a purchase but is most similar to purchase behavior. An example of this may be an abandoned shopping cart or a selection of certain product options. The events and combined events defining this behavior can now be associated with visitors for re-targeting campaigns. The algorithmic determination of these higher propensity visitors has a higher return than simple rules. Campaigns aligned with the re-targeted visitors show significant lift in conversion events. SYNTASA has built outbound connectors back into Adobe Marketing Cloud to provide a full circle integration, sending result set data back into solutions like Target, Campaign, Audience Manager, Customer Attributes and the Adobe Marketing Cloud Profile service.
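
The limitation of first touch and last touch attribution noted in the Attribution use case can be made concrete with a small sketch. It compares those two methods with an equal-split ("linear") model over complete journeys. The linear model is chosen here purely for illustration and is not described in the whitepaper as SYNTASA's method; the journeys and source names are hypothetical.

```python
from collections import defaultdict


def attribute(journeys, model):
    """Distribute one unit of credit per converting journey.

    journeys: ordered lists of traffic sources, each ending in a purchase.
    model: "first", "last", or "linear" (equal split across all touches).
    """
    credit = defaultdict(float)
    for touches in journeys:
        if not touches:
            continue  # nothing to credit for an empty journey
        if model == "first":
            credit[touches[0]] += 1.0
        elif model == "last":
            credit[touches[-1]] += 1.0
        elif model == "linear":
            for source in touches:
                credit[source] += 1.0 / len(touches)
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


# Hypothetical converting journeys reconstructed from clickstream data.
journeys = [
    ["paid search", "campaign", "direct entry"],
    ["natural search", "direct entry"],
]
first = attribute(journeys, "first")
last = attribute(journeys, "last")
linear = attribute(journeys, "linear")
```

In this sample, first touch gives no credit to the campaign and last touch credits only direct entry, while the model that sees every touch credits all four sources. Any multi-touch model of this kind requires the complete clickstream, which is the point the use case makes.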

Data Integrity and Adobe/Hadoop Accuracy through Synchronization
As the constant flow of clickstream data moves from the captured events dataset of Adobe Marketing Cloud to the enterprise Hadoop cluster, the accuracy of the data is critical. The value of the data migrated and transformed from Adobe to Hadoop is highly dependent on the accuracy maintained through that transformation. SYNTASA has achieved an accuracy rate of 99.99% as data is migrated and enriched on the Hadoop platform for analytics.

8.0 Case Study

Lenovo has deployed a Cloudera Hadoop cluster to consolidate its enterprise and marketing data. The Adobe Analytics clickstream logs are complex datasets that require extensive transformation and processing to make them analysis-ready and suitable for integration with other enterprise datasets. Lenovo engaged SYNTASA to provide an integrated, scalable solution for data ingestion, configuration and integration of Adobe clickstream data with other enterprise data, in order to perform advanced analytics for deeper Customer 360 intelligence. Lenovo started by ingesting its global ecommerce data from Adobe datasets collected on its ecommerce pages. It also layered in the collection and migration of activity data from its mobile apps, which is also collected by Adobe. The integration of SAP data for integrated reporting and dashboards was the initial intent. With SYNTASA, Lenovo has an on-premise, integrated, scalable and configurable platform for advanced analytics that did not require a large team of Big Data experts and months or even years to deploy.

"After ingesting the Adobe clickstream data into our Hadoop cluster, we found it to be very tedious and time consuming for our analysts and data scientists to take full advantage of the rich Adobe clickstream data in its native form," said Ashish Braganza, Director, Global Business Intelligence at Lenovo. "SYNTASA, through their Adobe Marketing Cloud Adaptor and Behavioral Schema, helped us unlock the full power of Adobe clickstream data in our Hadoop environment to build better clustering models and personalize customer experiences based on website behavior."