IBM Software Group
Designing your BI Architecture: Exploiting your Data Warehouse
David Cope, EDW Architect, Asia Pacific
2007 IBM Corporation
The Analytical Evolution
Easy Mining and Alphablox enable insights to be delivered throughout the enterprise (IBM differentiator).
Stages, in order of increasing business value and decision empowerment:
- Reports: static, repetitive queries about past results.
- Ad hoc analysis (query and OLAP): empowering analysts to test hypotheses for better decision making.
- Insight: discovering previously unknown and unsuspected information.
- Action
IBM DB2 Warehouse Software
- Embedded analytics
- Modeling and design
- Data mining and visualization
- In-line analytics
- Data movement and transformation
- Data partitioning
- Performance optimization
- Workload control
- Deep compression
- Administration and control
- Database management
DWE OLAP Model
[Diagram: a cube model (facts, measures, dimensions, hierarchies, levels, attributes, joins) and the cubes defined on it (cube facts, cube dimensions, cube hierarchies, cube levels), mapped onto the fact table and dimension tables held as relational tables in DB2.]
Model-Based Optimization
[Diagram: the Performance Advisor combines model information (OLAP metadata in the catalog tables), statistics and data samples from the base tables, and administrator input (time and space constraints, query types) to recommend MQTs.]
Benefits:
- Smart aggregate selection
- Smart index selection
- SQL generation
- DB2 exploitation
OLAP Metadata Interchange
[Diagram: OLAP metadata is exchanged through metadata bridges between the DB2 data warehouse (data, RDBMS metadata, DDL, DML), modeling and ETL tool metadata, and BI tool metadata. Names shown: DB2 Alphablox, MITI, Hyperion, Business Objects, QMF for Windows, QlikTech, ArcPlan.]
Alphablox
Platform for customized analytic applications and inline analytics
- Pre-built components (Blox) for analytic functionality
- Allows you to create customized analytic components that are embedded into existing business processes and web applications
Alphablox
For end users:
- A web application, portal, or dashboard with embedded analytics in an easy-to-use interactive interface
For application developers:
- A J2EE application for analysis-oriented interaction
- A set of analytic-focused extensions to the application server
Alphablox with DWE:
- SQL generated by DWE Design Studio can be pasted into Alphablox pages for warehouse-based embedded analytics
Alphablox Architecture
[Diagram, three layers:
- Client: web browser with a DHTML-based client, similar to AJAX (XMLHttpRequest)
- Application server (WebLogic, WebSphere, Tomcat): Alphablox UI model (GridBlox, ChartBlox, PresentBlox), calculations, bookmarks, alerts, comments, DataBlox
- Data sources: OLAP (Essbase / MSAS / SAP BW), Alphablox Cubing Engine (ROLAP), relational databases, MQ]
Relational Cubing Engine & OLAP Optimization
[Diagram, three tiers:
- Customer tier: HTTP server
- Application server tier: a DB2 Alphablox application (PresentBlox, GridBlox, ChartBlox, DataBlox) sends MDX to the relational cubing engine in the DB2 Alphablox server, which holds the cube definitions and relational cubes (cubelets) and handles metadata import, dimension data retrieval, and fact data retrieval
- Database server tier: DB2 with OLAP metadata (DB2 Cube Views), MQTs, and the star schema]
Versatile Architecture Support
DB2 Warehouse supports versatile analytics architectures.
[Diagram: BI applications and tools with analytics directed against the EDW through external marts, internal marts, or virtual marts.]
DWE Easy Mining
Mining without a statistician
Realize the benefits of mining by enabling analysts, rather than relying on statisticians, for your data mining needs.
[Diagram: reporting tool on top of DB2 Data Warehouse Edition]
Two Types of Data Mining: Discovery and Predictive
Discovery
- Automatically find trends and patterns
- Answer unasked questions
- Relatively undirected analysis
- Tool reports on findings
- In a word: easier
- Useful for non-statisticians
Predictive
- Specific question
- Probability associated with outcomes
- Directed analysis
- Iterative process: train, test, apply
- Apply model in database at customer touch points
DWE Easy Mining Algorithms
[Diagram: enterprise data warehouse -> selected data -> extracted information -> assimilated information, through the steps Select, Transform, Mine, Assimilate. Roles shown: business analyst, DWE partner, statistician with data mining workbench.]
Discovery methods: finding useful patterns and relationships
- Associations: Which item affinities (rules) are in my data? Example: [Beer => Diapers], within a single transaction.
- Sequences: Which sequential patterns are in my data? Example: [Love] => [Marriage] => [Baby Products], across sequential transactions.
- Clustering: Which interesting groups are in my data? Examples: customer profiles, store profiles.
Predictive methods: predicting values
- Classification: How to predict categorical values in my data? Example: will the patient be cured, harmed, or unaffected by treatment?
- Regression: How to predict numerical values in my data? Examples: how likely is a customer to respond to the promotion? How much will each customer spend this year?
Score data directly in DB2, scalable and in real time.
How to Recognize a Data Mining Need
- What do my customers look like?
- Which customers should I target in a promotion?
- Which products should I use for the promotion?
- How should I lay out my new stores?
- Which products should I replenish in anticipation of a promotion?
- Which of my customers are most likely to churn?
- How can I improve customer loyalty?
- What is the most likely item that a customer will purchase next?
- Who is most likely to have another heart attack?
- What is the likelihood of a part failure?
- When one part fails, what other part(s) are most likely to fail soon?
- How can I identify high-potential prospects (lead generation)?
- How can I detect potential fraud?
High-Level View of the Data Mining Process
[Diagram: from business problem to insight. Steps shown: data warehouse, extract and transform data, build model, validate and refine, deploy. The middle of the process is labeled "A minor miracle occurs".]
The Data Mining Process
This is an iterative process!
[Diagram: business problem -> data warehouse -> select data -> model, Y = f(x, z) -> apply results.
- Data preparation (ETL): select, transform
- Data mining (MINING): mine, report (visualize, analyze, understand); discover and interpret information; revise data and refine model
- DEPLOY: score data, embed in application]
Associations
Discovery technique to find associations or affinities among items (or conditions, outcomes, etc.) in a single transaction. It constructs statements (rules) that quantify the relationships among items that tend to occur together in transactions.
Example: In a supermarket,
- Cola is bought in 20% of all purchases.
- Cola is bought in 60% of the purchases involving orange juice.
- 3.7% of all purchases involve both cola and orange juice.
The rule [Orange juice] => [Cola] has the following properties:
- Support = 3.7%: cola and OJ are present together in 3.7% of all baskets.
- Confidence = 60%: cola is present in 60% of the baskets containing OJ.
- Lift = 60% / 20% = 3: cola is 3 times as likely to be in the basket when OJ is also.
Scoring: Given the item(s) purchased (rule body), what item (rule head) is most likely to be purchased as well?
Common uses: promotional or cross-sell offers, disease management, part failure
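The support, confidence, and lift figures in the cola and orange juice example can be reproduced with a few lines of Python. This is only an illustrative sketch, not how DWE computes rules: the function name `rule_metrics` and the basket counts are assumptions, chosen so that the totals match the percentages on the slide.

```python
def rule_metrics(baskets, body, head):
    """Return (support, confidence, lift) for the rule [body] => [head]."""
    n = len(baskets)
    body_count = sum(1 for b in baskets if body in b)
    head_count = sum(1 for b in baskets if head in b)
    both_count = sum(1 for b in baskets if body in b and head in b)

    support = both_count / n              # share of all baskets with both items
    confidence = both_count / body_count  # share of body baskets that also hold the head
    # lift = confidence / (head_count / n), rearranged to stay exact with integers
    lift = (both_count * n) / (body_count * head_count)
    return support, confidence, lift


# Hypothetical data: 3000 baskets, counts picked to match the slide's percentages.
baskets = (
    [{"orange juice", "cola"}] * 111   # both items
    + [{"orange juice"}] * 74          # orange juice without cola
    + [{"cola"}] * 489                 # cola without orange juice
    + [{"bread"}] * 2326               # neither item
)

support, confidence, lift = rule_metrics(baskets, "orange juice", "cola")
print(f"support={support:.1%} confidence={confidence:.0%} lift={lift:.1f}")
# support=3.7% confidence=60% lift=3.0
```

A lift above 1 means the head item appears more often with the body item than it does in baskets overall, which is what makes the rule interesting for cross-sell offers.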
Sequences
Discovery technique to find affinities among items (or conditions, outcomes, etc.) across multiple transactions over time. It quantifies relationships (sequences) to identify the most likely item in the next transaction.
Example (purchase histories as they appear on the slide):
  C G, B ---- C ---- X
  B ---- A ---- Y
  X Y ---- D ---- C ---- B ---- X
- 100% of the customers who get C will get X at a later time
- 67% of the customers who get B will get X at a later time
Scoring: Given the item(s) purchased previously (rule body), what item (rule head) is most likely to be purchased in a subsequent transaction within a certain time frame?
Common uses: fraud detection, promotional offers, disease management, part failure
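The two percentages in the sequence example can be illustrated with a small Python sketch. The three purchase histories below are a hypothetical reading of the slide's diagram, chosen only because they reproduce the 100% and 67% figures. The function name `sequence_confidence` is also an assumption, and the sketch ignores the time-frame limit that real sequence scoring would apply.

```python
def sequence_confidence(histories, body, head):
    """Share of customers who bought `body` and bought `head` in a later transaction."""
    with_body = 0
    followed = 0
    for transactions in histories:
        # index of the first transaction that contains the body item, if any
        first = next((i for i, t in enumerate(transactions) if body in t), None)
        if first is None:
            continue
        with_body += 1
        # only strictly later transactions count as a subsequent purchase
        if any(head in t for t in transactions[first + 1:]):
            followed += 1
    return followed / with_body if with_body else 0.0


# Hypothetical histories: one list of transactions (item sets) per customer.
histories = [
    [{"G", "B"}, {"C"}, {"X"}],
    [{"B"}, {"A"}, {"Y"}],
    [{"X", "Y"}, {"D"}, {"C"}, {"B"}, {"X"}],
]

print(sequence_confidence(histories, "C", "X"))            # 1.0
print(round(sequence_confidence(histories, "B", "X"), 2))  # 0.67
```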
Clustering
Discovery technique to find clusters having distinct behaviors and characteristics
- Gain insights into customers, stores, insurance claims, etc.
- Generate distinct behavioral/demographic profiles
- Understand the most important attributes of each cluster
- Create a model to assign individuals to best-fit clusters
- Apply model to assign new individuals or re-assign existing individuals
- Design business actions tailored to different characteristic profiles
Scoring
- Apply model to assign each record to its best-fit cluster
- Apply appropriate business action for each record based on its assigned cluster
Common uses: customer segmentation, store profiling, deviation detection
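The scoring step for clustering (assign each record to its best-fit cluster, then pick an action) can be sketched as follows. Nearest-centroid assignment is used here purely as a stand-in; the slides do not say which clustering algorithm DWE applies. The cluster names, centroids, feature scaling, and actions are all hypothetical.

```python
import math


def assign_cluster(record, centroids):
    """Return the name of the centroid closest to the record (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(record, centroids[name]))


# Hypothetical centroids over two features scaled to 0..1:
# (shopping frequency, annual spend).
centroids = {
    "frequent_big_spenders": (0.9, 0.9),
    "frequent_good_shoppers": (0.7, 0.6),
    "occasional_shoppers": (0.1, 0.1),
}

# Hypothetical business action per cluster.
actions = {
    "frequent_big_spenders": "invite to VIP program",
    "frequent_good_shoppers": "send discount offer",
    "occasional_shoppers": "no action",
}

for customer in [(0.75, 0.55), (0.05, 0.2)]:
    cluster = assign_cluster(customer, centroids)
    print(customer, cluster, actions[cluster])
```

Re-running the assignment on refreshed customer records corresponds to the "re-assign existing individuals" bullet: the model stays fixed while the cluster membership of each record is updated.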
Classification
Prediction technique to classify individuals by outcome
- Classify by a categorical class variable (e.g., YES-NO-MAYBE response)
- Understand the most important factors (predictors) leading to each outcome
Modeling
- Create a model to classify individuals according to expected outcome
- Design business action based on most important predictors
Scoring
- Apply model to predict the outcome for each individual: new prospects (expected behavior) and existing individuals (changes in behavior)
- Identify target individuals for business action
Common uses: customer attrition (churn), part failure
Regression
Set of predictive techniques to predict a dependent variable
- Predict a continuous value or a binary numeric value
- Continuous: e.g., revenue (prediction represents amount of revenue)
- Binary: e.g., 0=No, 1=Yes (prediction represents probability of Yes)
- Understand the most important predictors of the dependent variable
- Techniques: transform regression, linear regression, polynomial regression
Modeling
- Create a model to predict the dependent variable
- Design business action (e.g., predict likelihood of default for a loan application, in real time)
Scoring
- Apply model to generate a prediction for each individual (e.g., probability of part failure)
- Identify target individuals for business action
Common uses: predict revenue/cost/profitability, predict risk of loan default
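As a minimal illustration of the modeling and scoring steps for a continuous target, the sketch below fits a one-variable linear regression by ordinary least squares and then scores a new individual. It only shows the idea behind linear regression, not DWE's implementation, and the training data (store visits versus spend) are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Score one individual with a fitted (slope, intercept) model."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: visits last year -> spend this year (in hundreds).
visits = [1, 2, 3, 4]
spend = [3, 5, 7, 9]

model = fit_line(visits, spend)  # modeling step
print(model)                     # (2.0, 1.0)
print(predict(model, 5))         # scoring step: 11.0
```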
Data exploration
DWE enables you to explore the data: check data quality (prior to performing ETL for data preparation) and gain a general understanding of the data.
Design Studio provides four tools to inspect data:
- Table sampling
- Univariate distributions
- Bivariate distributions
- Multivariate distributions
All these tools are accessible by right-clicking a table, view, alias, or nickname in the database explorer:
- Data: table sampling/editing
- Value Distributions: multivariate, univariate, and bivariate distributions
Leveraging Mining and Alphablox: DWE Miningblox
- Create web applications that provide access to DWE data mining
- Extends the DB2 Alphablox API with mining-specific functionality
With Miningblox, you can perform the following tasks:
- Selecting input data
- Processing input data
- Displaying mining results graphically in a web browser, for example, the characteristics of a customer segment
- Administering or managing mining runs
Typically, a web application using Miningblox tags might be integrated into a business application or an intranet portal.
Why use Miningblox?
- Provide access to data mining for a group of business analysts.
- Create a Miningblox web application that provides access to mining functionality through the web browser, with no need to install software on the clients' machines.
- Analysts can execute mining runs and view results in a customized web application without extensive knowledge of mining software.
- With the Miningblox Application wizard in the DWE Design Studio, you can create web applications by selecting sample templates, or you can extend Alphablox applications with mining functionality.
Deployment through Alphablox: application example
[Screenshots: MBA application console; MBA execution; MBA completion; MBA results report]
Case Study: Retail Department Store Analytics with Data Mining and Alphablox
Retail Department Store Chain
Business requirements
- Perform a data mining POC (really a pilot project) to support the original DWE decision, ensure success, and highlight DWE capabilities for further uptake
Define business problem
- Boost storewide sales (across other departments) based on women's shoes
Define analytical approach and ETL procedure
- Extract all transactions of customers who have purchased women's shoes
- Transform transactional data into one record per customer, for customer segmentation
- Perform market basket analysis (MBA) for high-potential customers who have purchased women's shoes
Challenges
- Engagement sponsored by IT with limited access to business users (LOB)
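The "one record per customer" transformation is the step that turns transaction rows into input for customer segmentation. A minimal Python sketch of that aggregation follows. The field names (`customer_id`, `date`, `amount`), the convention that a negative amount is a return, and the chosen summary attributes are assumptions for illustration; in the engagement this work would have been done in the warehouse ETL.

```python
from collections import defaultdict


def summarize_customers(transactions):
    """Collapse transaction rows into one summary record per customer."""
    acc = defaultdict(lambda: {"dates": set(), "spend": 0.0, "purchases": 0, "returns": 0})
    for t in transactions:
        rec = acc[t["customer_id"]]
        rec["dates"].add(t["date"])   # distinct shopping days
        rec["spend"] += t["amount"]   # net of returns
        if t["amount"] < 0:           # assumption: negative amount marks a return
            rec["returns"] += 1
        else:
            rec["purchases"] += 1
    return {
        cid: {
            "visit_days": len(r["dates"]),
            "net_spend": round(r["spend"], 2),
            "purchases": r["purchases"],
            "returns": r["returns"],
        }
        for cid, r in acc.items()
    }


# Hypothetical transaction rows.
transactions = [
    {"customer_id": "C1", "date": "2007-01-05", "amount": 120.0},
    {"customer_id": "C1", "date": "2007-01-05", "amount": 40.0},
    {"customer_id": "C1", "date": "2007-02-10", "amount": -40.0},
    {"customer_id": "C2", "date": "2007-03-01", "amount": 75.5},
]

for customer_id, record in summarize_customers(transactions).items():
    print(customer_id, record)
```

Attributes of this kind (shopping days per year, spend, returns) are the sort of behavior the cluster profiles in the case study describe.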
Solution Overview
Prepare data for mining by:
- Pulling transactions for women's shoe customers
- Creating data for customer segmentation
Use DB2 Mining to perform:
- Clustering: identify high-potential customer segments
- Market basket analysis for high-potential segments: identify associated items and next-most-likely purchases
Deploy mining results in Alphablox:
- Integrate data mining information into the dashboard and as part of the guided analysis
Build a dashboard in Alphablox:
- Provide critical information and metrics in an Alphablox dashboard to merchandising and marketing
- Integrate powerful visualization to make it easier to identify problem areas
[Diagram: DB2 data warehouse with mining models and services (clustering, associations and sequences, scoring services); Alphablox cubing engine; data mining visualizer / Alphablox data mining API; analytical dashboard with heat maps and other visualization.]
Business Scenario for Mining
Business requirements for POC
- Focus on customers who have purchased women's shoes in the past 12 months
- Boost storewide sales (across other departments) based on women's shoes
- Increase wallet share from high-potential customers
Business questions to be answered
- What do my women's shoes customers look like?
- Which of these customers should I target in a promotion?
- Which products should I use for the promotion?
- Which products should I replenish in anticipation of a promotion?
- How can I improve customer loyalty?
- What is the most likely item that a women's shoes customer will purchase next?
Step 1: Identify High-Potential Shoe Customers
Result: 16 Distinct Clusters Created
Cluster 1: Those Who Act Like VIPs
- Frequent shoppers
- Big spenders
- VIPs
- Active shoppers
- Respond to discounts
- High returns
- High-potential customers!
Cluster 6: Frequent Good Shoppers
- Shop here 30 days/yr
- Above-average purchases
- Above-average spending
- Respond to discounts
- Average returns
- High-potential customers!
Step 2: Identify Associated Items for Clusters 1 & 6
- Extracted transactions for those clusters of customers
- Performed market basket analysis and interpreted results
- Associations (items purchased together in one visit)
Identify Items Purchased Together for Clusters 1 & 6
Results: Associations for Clusters 1 & 6
Step 3: Identify Next Likely Purchase for Clusters 1 & 6
- Extracted transactions for those clusters of customers
- Performed market basket analysis and interpreted results
- Sequences (next most likely purchase in a future visit)
Identify Next Likely Purchases for Clusters 1 & 6
Results: Sequences for Customers in Clusters 1 & 6
Results and Future Ideas
Deployment of customer segmentation and MBA
- End-user application with Alphablox
- Create and refresh mining models
- Identify high-potential customer segments
- Refresh assignment of each customer to best-fit cluster
- Target selected customer segments for promotions
- Batch scoring to identify best offer(s) for each customer/segment
- Merchandising now has a view of their customers, not just products
Future ideas
- Score a customer at the checkout register in real time
- MBA scoring (associations, sequences)
- Focused MBA scoring for known customers, based on best-fit cluster
- Make an offer to induce customers to visit other departments before leaving the store
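Batch scoring to identify the best offer for each customer can be pictured as a lookup against the mined association rules: among the rules whose body the customer has already bought, offer the head with the highest confidence. The sketch below is one simple way to do that, not the scoring logic used in the engagement. The rules, confidences, item names, and the function name `best_offer` are all hypothetical.

```python
def best_offer(purchased, rules):
    """Return the head of the highest-confidence rule whose body is contained
    in `purchased` and whose head has not been bought yet, or None."""
    candidates = [
        r for r in rules
        if r["body"] <= purchased and r["head"] not in purchased
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["confidence"])["head"]


# Hypothetical rules, as association mining might produce them.
rules = [
    {"body": frozenset({"women's shoes"}), "head": "handbag", "confidence": 0.35},
    {"body": frozenset({"women's shoes"}), "head": "hosiery", "confidence": 0.20},
    {"body": frozenset({"women's shoes", "handbag"}), "head": "scarf", "confidence": 0.50},
]

# Hypothetical customers and their purchases so far.
customers = {
    "C1": {"women's shoes"},
    "C2": {"women's shoes", "handbag"},
    "C3": {"socks"},
}

for customer_id, purchased in customers.items():
    print(customer_id, best_offer(purchased, rules))
```

The same function could in principle be called for a single basket at the checkout register, which is the real-time scoring idea listed under future ideas.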