Cloud Integration and the Big Data Journey - Common Use-Case Patterns
A White Paper
August, 2014
Corporate Technologies Business Intelligence Group

OVERVIEW

The advent of cloud and hybrid architectures has enabled clients to rapidly stand up technology stacks that traditionally required specialized expertise and long lead times. Big Data, an umbrella term encompassing ingestion, processing, and analytics around structured and semi-structured data sets, has been revolutionary for the data warehousing and analytics market. These data sets, including data from cloud-based solutions, sensors, and Internet-enabled devices, are often large and difficult to process using standard relational data warehousing methodologies. Big Data solutions take an alternative approach to processing these data sets, leveraging both cloud-based and non-relational technologies to derive analytical value.

One significant customer problem with Big Data involves the rapidly changing technology stacks and specialized code required to work effectively in the space. Companies are reluctant to invest too deeply in one technology for fear of this rapid change. However, many innovative applications leverage Big Data to improve customer satisfaction, reduce operational risk, and increase sell-through. Customers want the benefits of Big Data, but often do not know how much of an investment is required to begin.

To that end, we review three specific customer use-case patterns in detail within this white paper. These use cases discuss both cloud architectures and Big Data solutions, and show how to remove complexity, reduce operational risk, and improve customer satisfaction. This technical brief is intended to be a companion to The Big Data Journey webinar. The author would like to extend his thanks to John Haddad of Informatica, who provided some of the architecture slides within this white paper.

WWW.CPTECH.COM 781.273.4100 IT SERVICES
USE CASE 1: REMOVE COMPLEXITY

A pharmaceutical client uses several cloud-based applications for sales force and operations enablement. These applications allow business analysts to rapidly provision new functionality on the fly. However, IT must also possess the agility to rapidly and continuously provision and integrate cloud-based application data while maintaining existing data warehouse integrity and data lineage. We leverage cloud services solutions like Informatica Cloud Services (ICS) to help address this challenge.

At our pharmaceutical client, the sales team manages multiple new products and adds new SFDC columns at the rate of one or two a week. The existing ETL process had to replicate data from SFDC down to the main enterprise data warehouse (EDW). Each new column required a corresponding ETL change and update to jobs, causing significant IT development churn. We leveraged ICS's SFDC replication solution to mirror each SFDC table into a staging environment within the DW. The ICS workflow is managed through a web-based interface, which is available to the same business analyst who adds fields to SFDC. When a new column is added to SFDC, the analyst logs into ICS and, in less than five minutes, configures the new column to be replicated to the DW.

In the diagram above, the green databases represent existing SQL Server databases that were not impacted by the switch in replication architecture. We simply removed the existing ETL code feeding the SQL 2008 Replication Stage target and replaced it with an ICS endpoint.

Once replicated to a DW staging environment, the SFDC tables are wrapped with views to create a dimensional analytical layer. This layer is immediately available to trained business analysts using BI and visualization tools to perform data analysis. Insights from these analyses are vetted and implemented by the DW team and then turned into operational reporting in the enterprise BI environment on a weekly basis.
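The core of this pattern is schema-drift handling: the replication layer notices columns that exist in the SFDC source but not in the staging table and adds them, so no ETL code changes are needed. A minimal Python sketch of that comparison follows; the table and column names are hypothetical, and ICS performs this work through its own configuration interface rather than hand-written code.

```python
# Sketch of the schema-drift handling that SFDC staging replication
# performs: compare source columns to the staging table and emit DDL
# for any additions. Names below are illustrative, not from the client.

def diff_columns(source_cols, staging_cols):
    """Return (name, type) pairs present in the source but missing from staging."""
    staged_names = {name for name, _ in staging_cols}
    return [col for col in source_cols if col[0] not in staged_names]

def alter_statements(table, missing):
    """Generate one ALTER TABLE statement per newly added source column."""
    return [f"ALTER TABLE {table} ADD {name} {sqltype};" for name, sqltype in missing]

# A business analyst has just added a custom Region__c field in SFDC.
source = [("Id", "VARCHAR(18)"), ("Name", "VARCHAR(255)"), ("Region__c", "VARCHAR(80)")]
staging = [("Id", "VARCHAR(18)"), ("Name", "VARCHAR(255)")]

missing = diff_columns(source, staging)
ddl = alter_statements("stage.sfdc_account", missing)
print(ddl)  # → ['ALTER TABLE stage.sfdc_account ADD Region__c VARCHAR(80);']
```

Because the staging tables mirror SFDC one-for-one, the dimensional views layered on top only need to change when an analyst actually wants the new column exposed.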
Leveraging ICS and the staging replication architecture has allowed us to significantly accelerate time to market within the DW for simple SFDC column additions. The DW team is freed from regularly working on lights-on management tasks, and business analysts can immediately perform analysis without having to wait for new ETL development.
USE CASE 2: REDUCE OPERATIONAL RISK

An oil and gas client was looking to understand performance and maintenance activity around their wells. The oil and gas industry uses a large number of sensors to monitor well activity. These sensors measure pressure, level, and flow rates, and are prevalent within the industry. They come with operational monitoring solutions that allow technicians to spot up-to-the-second deviations and apply corrective action. Maintenance is critical for both production and safety, and the earlier a problem is caught, the cheaper it is to fix. Our client was very interested in knowing about maintenance issues as soon as possible and, ideally, applying preventative maintenance to head off larger issues.

In order to apply more intelligence towards preventative maintenance, the customer wanted to load sensor data into an existing data warehousing solution. However, when existing ETL infrastructure was leveraged to stream sensor data directly into the warehouse, we quickly encountered performance issues around the sheer volume of data being sourced. Sensors report readings in real time, and with multiple sensors per well, the volume of data easily eclipsed hundreds of gigabytes a day for the client's production wells. This created a serious problem with both performance and the expense of storing the data.

Upon further analysis, we realized that we needed to do two things with the full array of sensor data. First, we were looking to apply algorithms to spot deviations in time series data, specifically deviations that went above a certain threshold for a period of time. These thresholds may change based on measurements across multiple sensor arrays; in short, we were applying matrix algebra to the existing series of sensor data. Second, once the deviations were spotted, we wanted to provide time-bounded series of this data to the BI environment for reporting and simple analysis.
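The first requirement — flag a reading stream only when it stays beyond a threshold for some minimum duration, so momentary blips are ignored — can be sketched in a few lines of Python. The baseline, threshold, and window values below are illustrative, not the client's actual parameters.

```python
def sustained_deviations(readings, baseline, threshold, min_run):
    """Return (start, end) index ranges where |reading - baseline| exceeds
    threshold for at least min_run consecutive samples."""
    runs, start = [], None
    for i, value in enumerate(readings):
        if abs(value - baseline) > threshold:
            if start is None:
                start = i          # deviation run begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))   # run was long enough to flag
            start = None
    if start is not None and len(readings) - start >= min_run:
        runs.append((start, len(readings)))  # run extends to end of series
    return runs

# Hypothetical pressure samples around a 100 psi baseline: a one-sample
# blip at index 3 is ignored; the sustained excursion at 6-9 is flagged.
pressure = [100, 101, 100, 130, 99, 100, 140, 142, 141, 138, 100]
print(sustained_deviations(pressure, baseline=100, threshold=20, min_run=3))
# → [(6, 10)]
```

The flagged ranges are exactly the time-bounded slices that need to flow downstream to the BI environment, which is what makes pre-filtering in Hadoop pay off: only deviations, not the full sensor firehose, reach the warehouse.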
The combination of these two requirements allowed us to introduce a Big Data approach into our overall solution pattern, performing ELT pre-processing of this data by applying matrix algebra to the large volumes of sensor data. The sensor data was also JSON-like in structure, making it more suitable for a Big Data solution, specifically Hadoop. We leveraged Hadoop to filter the data, apply matrix algebra to look for anomalies, and model the filtered records for data warehouse ingestion by transforming the sensor records from JSON to a relational structure.

You can leverage Informatica's PowerCenter Big Data Edition ("BDE") to ease both the processing of JSON records (PowerCenter BDE comes with JSON support) and connectivity to Big Data solutions such as Hadoop and other NoSQL databases. In addition, PowerCenter BDE allows you to run workflows in Hadoop without having to program and interact with MapReduce through languages such as Pig and Hive. Although these languages are powerful, they require a specialized skillset that is typically different from the relational and ETL skillsets already present within your DW / IT organization.
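The JSON-to-relational step amounts to flattening each nested sensor record into one flat row for warehouse ingestion. A sketch of that transformation is shown below; the field names and record shape are hypothetical, since the client's actual sensor schema is not reproduced here (and in practice this mapping was configured in the tooling rather than hand-coded).

```python
import json

def flatten_record(raw):
    """Convert one nested JSON sensor record into a flat row suitable for
    relational (warehouse) ingestion. Field names are illustrative."""
    rec = json.loads(raw)
    return {
        "well_id": rec["well"]["id"],
        "sensor_id": rec["sensor"]["id"],
        "measured_at": rec["sensor"]["ts"],
        "pressure_psi": rec["sensor"]["readings"]["pressure"],
        "flow_rate": rec["sensor"]["readings"]["flow"],
    }

# One hypothetical sensor reading as it might arrive from the field.
raw = '''{"well": {"id": "W-17"},
          "sensor": {"id": "P-03", "ts": "2014-08-01T12:00:00Z",
                     "readings": {"pressure": 131.5, "flow": 9.8}}}'''
row = flatten_record(raw)
print(row["well_id"], row["pressure_psi"])  # → W-17 131.5
```

Each flat row maps directly onto a fact-table insert, which is why the relational and ETL skillsets already in the DW team transfer cleanly once the flattening is done.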
PowerCenter BDE allows you to leverage Big Data solutions within your EDW environment while leveraging existing skillsets. This solution allowed us to significantly reduce load time and space consumed for the EDW. More importantly, the customer reduced the time required to find maintenance issues by almost 90%. The majority of this savings came from time spent ingesting and processing the data, along with reduced operational expense and load on the data warehouse.

USE CASE 3: IMPROVE CUSTOMER SATISFACTION

An online retail company has been selling to customers over the internet for many years and has accumulated a large data warehouse of customer activity during that time. The retailer is now interested in linking social media elements and real-time customer website navigation into their selling strategy, due to significant user adoption of shopping via mobile. This likely comes as no surprise to many readers of this white paper: during the last five years, mobile shopping has become mainstream and dominant in some sectors, such as books and electronics. However, the retailer also discovered that mobile customers have a higher rate of shopping cart abandonment compared to traditional laptop browser customers. For various reasons and distractions, mobile customers are abandoning more shopping carts; even a small conversion on these abandoned carts would result in a significant revenue increase for our retailer.

In order to get more mobile conversions, our retailer wanted to provide a more personalized shopping experience to mobile customers by dynamically modifying content as the customer interacts with the site. The content would present both products of higher interest and, potentially, aggressive pricing on selected items for certain shopping cart mixes.
We leveraged a Big Data / DW solution pattern in two ways: via a NoSQL database to 1) crunch weblogs in real time, and 2) analyze a customer's Twitter stream, in order to provide items of interest and potential discounting. All of this was linked to a historical customer score made available by the traditional data warehouse via a web service.

Again, you can leverage Informatica's PowerCenter Big Data Edition ("BDE") to ease the processing of weblogs and connectivity to Big Data solutions. In addition, you can leverage the Social Media Connector to connect directly to a customer's Twitter stream and source that data into the NoSQL database for further analysis.

The retailer chose to deploy this solution in phases, initially for a select tier of customers, as an A/B test. After a month of trial, the select customers demonstrated a material difference in shopping cart losses compared to a baseline customer group -- around 10%. Customers saw items they were more likely to buy, both products aligned with their interests and more aggressively priced offers to close the sale.
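The A/B measurement itself reduces to comparing cart-abandonment rates between the personalized tier and the baseline group. A minimal sketch with made-up counts (the retailer's actual figures are not disclosed in this paper) illustrates the arithmetic behind the roughly 10% difference:

```python
def abandonment_rate(carts_started, carts_completed):
    """Share of shopping carts that were started but never checked out."""
    return (carts_started - carts_completed) / carts_started

# Hypothetical monthly counts for the baseline (control) group and the
# personalized (test) tier; the counts are illustrative, not client data.
control = abandonment_rate(carts_started=10_000, carts_completed=3_000)  # 0.70
test = abandonment_rate(carts_started=10_000, carts_completed=4_000)     # 0.60

print(f"abandonment: control {control:.0%}, test {test:.0%}, "
      f"difference {control - test:.0%}")
```

Phasing the rollout this way keeps the baseline group untouched, so the difference in abandonment can be attributed to the personalization rather than to seasonal shopping behavior.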
ABOUT CORPORATE TECHNOLOGIES

Corporate Technologies provides high-value services to clients. Through the effective application of technologies like Business Intelligence, Data Integration and Management, and Enterprise and Cloud Computing, we help clients implement the right IT solutions to empower business innovation and dynamic scalability. From leveraging business intelligence to rethinking the efficiency of the data center, we are your strategic partner for everything from data management to information delivery.

Today's IT solutions have to be highly integrated to solve the complex business challenges that organizations face. Your business cannot afford to work with multiple consulting organizations specializing in silos of experience. Corporate Technologies' engineering team understands how the implementation of any new technology must support both the business and infrastructure requirements. Our ability to successfully integrate Business Intelligence, Data Management, and Systems Technologies by merging complex system and application structures is a rarity in the industry.

We focus on solving complex business challenges. We create long-term relationships with our clients and partners to deliver recommendations and innovative, high-quality, high-value IT solutions. Please visit our website at www.cptech.com, contact us by email, or call us at 781-273-4100.