Challenges for Data and Text Mining and How SAS Addresses Them
Sascha Schubert, Product Manager Data Mining, SAS EMEA
Predictive Analytics Process
  1. Prepare Data
  2. Develop Model (Analytical Training Set)
  3. Deploy Model
  4. Monitor Model
  Data sources: Transactional, Demographic, Operational, Financial, Unstructured, Marketing or Risk Data Warehouse, Other Domain Sources
  Deployment modes: Interactive, Batch or Real Time
  Target: Decision Support System
Successful Data Mining through Integration
  Roles:
    Data Manager: Data Preparation, Deployment Services, Report Administration
    Data Miner: Exploratory Analysis, Descriptive Segmentation, Predictive Modeling
    Other roles: Business Analyst, Application Developer, Domain Expert
    Activities: Manages Campaigns; Evaluates Processes & ROI
  Workflow through the Metadata Server:
    1. Register Training Set
    2. Retrieve Training Set
    3. Register Results Package
    4. Batch Scoring Plug In
    5a. Deploy Model
    5b. Distribute Reports
  Stages: Data Aggregation, Model Development, Model Deployment, Model Management
Data Sources: Specific Needs for Data Mining
  Data Volumes
    Long history
    More columns (observed and derived)
    Different sources
  Data Format
    Transactional data vs. customer history data
  Data Type
    Database
    Weblogs
    Free Text
Answers to Data Challenges
  Provide data models for business-specific problems: SAS Industry Intelligence Solutions
  Create the required analytical data format from many different data formats: SAS ETL Solutions
  Create and store business-specific metadata for enterprise-wide use: SAS Metadata Server
  Provide flexible tools for interactive data preparation and selection: SAS Enterprise Miner
Integration: SAS Enterprise Miner - SAS ETL Studio
Data Preparation
  ETL Studio
    Define a Process Job to Create a Table
    Register Table to Metadata Server
    Create Data Mining Metadata as Part of Job
    Register DM Metadata to Metadata Server
  Available Now
Analytical Data Preparation
  Interactive Tools
    Transformations Builder
    Filter outliers interactively
    Principal Components node with results browser
  Available in Autumn 2005
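To make the "filter outliers" step concrete, here is a minimal sketch using the interquartile-range rule. The rule, the function name filter_outliers, the multiplier k, and the sample values are all assumptions for illustration; the slide does not say which method the Enterprise Miner tool applies.

```python
from statistics import quantiles

def filter_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, not SAS's)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical column with one extreme value.
incomes = [32, 35, 36, 38, 40, 41, 43, 45, 47, 400]
cleaned = filter_outliers(incomes)
print(cleaned)
```

An interactive tool would let the analyst adjust the cut-offs visually; here the multiplier k plays that role.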
Text Mining Challenges
  Handle Bad Text Quality
    Text Cleaning
    Fixing misspellings
    Detecting all multi-word terms: sliding door, front seat
    Deal with abbreviations/user-defined terms: Adj d doors, call cust., i/m arm broken
  Visually Discover Concepts
    Link terms to display concepts
  Available in Autumn 2005
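The text-cleaning items above can be sketched as a small dictionary-driven pass that expands abbreviations and joins multi-word terms into single tokens. The abbreviation mapping and the function name clean_text are hypothetical; only the terms "sliding door", "front seat", and "cust." come from the slide. Misspelling correction is not covered here.

```python
import re

# Hypothetical user-defined abbreviation dictionary (an assumption).
ABBREVIATIONS = {"cust": "customer", "adj": "adjust"}
# Multi-word terms taken from the slide's examples.
MULTIWORD_TERMS = ["sliding door", "front seat"]

def clean_text(text):
    """Lowercase, expand abbreviations, and join multi-word terms."""
    tokens = re.findall(r"[a-z/]+", text.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    cleaned = " ".join(tokens)
    for term in MULTIWORD_TERMS:
        # Treat each known multi-word term as one token.
        cleaned = cleaned.replace(term, term.replace(" ", "_"))
    return cleaned.split()

print(clean_text("Call cust., sliding door stuck near front seat"))
```

Joining terms with an underscore is one simple way to keep "sliding door" from being counted as two unrelated words in later term-frequency analysis.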
Integrate Analytical Modeling Algorithms
  Data Miners always want new algorithms
    SAS will support new algorithms such as SVM
  More important to combine existing techniques
    Hybrid models
    Ensemble Models (bagging and boosting)
    Combine different modeling techniques
  Integrate for predictive analytics
    Web Path Analysis
    Time Series Analysis
    Market Basket Analysis
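As an illustration of the ensemble idea, the following is a minimal bagging sketch: several simple models are trained on bootstrap samples and combined by majority vote. The one-feature threshold "stump", the toy data, and all function names are assumptions chosen to keep the example self-contained; they are not Enterprise Miner's implementation.

```python
import random
from collections import Counter

def train_stump(rows):
    """Fit a one-feature classifier: predict 1 when x > threshold."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        errors = sum((x > threshold) != bool(y) for x, y in rows)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def bagging(rows, n_models=25, seed=0):
    """Train one stump per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(rows) for _ in rows])
            for _ in range(n_models)]

def predict(thresholds, x):
    """Combine the stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (feature value, class label).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
model = bagging(data)
print(predict(model, 0), predict(model, 10))
```

Boosting differs in that each new model is trained with more weight on the cases the previous models got wrong, rather than on independent bootstrap samples.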
Combine Different Modeling Techniques
Modeling Algorithms
  Integrate your own modeling techniques in SAS Enterprise Miner
    Can integrate ANY SAS model very easily
    Use the Extension facilities
    Create new nodes easily based on SAS and XML
  SAS will provide a sharing platform for user-written SAS Enterprise Miner Extension Nodes
  Available Now
Develop Customized Tools
Performance: Grid Computing
  A means to apply the resources of a collection of networked computers and to harness all their compute power for a single project
  Available for Model Training in EM with EM 5.2 in Autumn 2005
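The idea of spreading model training over several workers can be sketched on a single machine. In this sketch a thread pool stands in for grid nodes, and train_candidate is a hypothetical stand-in for a real training job; none of this reflects how EM 5.2 actually distributes work.

```python
from concurrent.futures import ThreadPoolExecutor

def train_candidate(job):
    """Hypothetical training job: evaluate one threshold on toy data."""
    threshold, rows = job
    correct = sum((x > threshold) == bool(y) for x, y in rows)
    return threshold, correct / len(rows)

# Hypothetical training data: (feature value, class label).
rows = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
jobs = [(t, rows) for t in (2, 4, 7)]

# Each worker plays the role of one grid node training one candidate.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(train_candidate, jobs))

best_threshold, best_accuracy = max(results, key=lambda r: r[1])
print(best_threshold, best_accuracy)
```

The pattern is the same at grid scale: independent training jobs are dispatched in parallel and the results are gathered for comparison in one project.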
Enterprise Miner on SMP SMP server
Enterprise Miner on a Grid
Model Deployment
  Most important step in the process
  Often the most time-consuming task, with many manual steps involved
  Options:
    Batch
    On-Demand
    Real-time
Ways to Deploy Data Mining Models in SAS
  Batch
    Deploy EM SAS score code directly
    Integrate SAS EM Score code using Mining Results plugin in ETL Studio
  Interactive
    Score within Enterprise Miner
    Use Stored Processes to Score Model on Demand
    Use Scoring Task in Enterprise Guide 4
  Real-Time
    Integrate Score Code with operational systems using SAS Integration Technologies
    C or Java Score Code
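The difference between batch and on-demand deployment can be shown with one scoring function used two ways. This is a sketch only: the logistic form, the coefficient values, and the field names debt_ratio and late_payments are hypothetical, and real Enterprise Miner score code is generated as SAS, C, or Java rather than written by hand like this.

```python
import math

# Hypothetical model coefficients (assumptions, not from any real model).
COEFFICIENTS = {"debt_ratio": 3.0, "late_payments": 0.5}
INTERCEPT = -2.0

def score(record):
    """On-demand scoring: return the predicted probability for one record."""
    z = INTERCEPT + sum(w * record[name] for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def score_batch(records):
    """Batch scoring: append the score to every record in a table."""
    return [dict(r, p_default=score(r)) for r in records]

customers = [
    {"id": 1, "debt_ratio": 0.0, "late_payments": 4},
    {"id": 2, "debt_ratio": 0.9, "late_payments": 6},
]
scored = score_batch(customers)
print(scored)
```

Keeping a single score function behind both entry points is what lets the same model serve batch, interactive, and real-time channels without being re-implemented.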
Integration: SAS Enterprise Miner - SAS ETL Studio
  Roles:
    Data Manager: Data Preparation, Deployment Services, Report Administration
    Data Miner: Exploratory Analysis, Descriptive Segmentation, Predictive Modeling
  Workflow through the Metadata Server:
    1. Register Training Set
    2. Retrieve Training Set
    3. Register Results Package
    4. Mining Results Transform
  Stages: Data Aggregation, Model Development, Model Deployment
Integration: SAS Enterprise Miner - SAS ETL Studio
Batch Scoring
  ETL Studio
    Use Mining Results Plug-in to register EM models for Scoring
    Define a Process for Batch Scoring
  Available Now
Stored Processes for Scoring on Demand
Scoring SAS Enterprise Miner Models interactively in Enterprise Guide 4.1
  Currently Early Adopter
  Production in October 2005
  Screenshot labels: HMEQ Scoring Model Scoring Model Output Data
SAS Model Manager
  SAS Enterprise Miner: Model Development & Model Scoring
  SAS Model Manager: New solution to address the gap between the model development and model scoring environments
  Addresses:
    Increased number of models and amount of data
    Model Selection (Challenger, Champion, Retired)
    Different computing environments for training and scoring
    Multi-channel delivery: batch, interactive, on-demand
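The champion/challenger/retired selection can be sketched as a simple rule over a list of registered models. The dictionary layout, the validation_auc metric, the model names, and the function select_champion are all hypothetical; this is not the SAS Model Manager interface.

```python
def select_champion(models, metric="validation_auc"):
    """Mark the best non-retired model as champion, the rest as challengers."""
    active = [m for m in models if m["status"] != "retired"]
    champion = max(active, key=lambda m: m[metric])
    for m in active:
        m["status"] = "champion" if m is champion else "challenger"
    return champion

# Hypothetical model registry entries.
registry = [
    {"name": "tree_v1", "validation_auc": 0.71, "status": "retired"},
    {"name": "logit_v2", "validation_auc": 0.78, "status": "champion"},
    {"name": "neural_v1", "validation_auc": 0.81, "status": "challenger"},
]
winner = select_champion(registry)
print(winner["name"], [m["status"] for m in registry])
```

In practice the comparison would usually be rerun as new data arrives, so a champion can be demoted and eventually retired.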
Model Lifecycle Management
  Model Development Environment: SAS Enterprise Miner, SAS Credit Scoring, SAS/STAT, Base SAS
  Production Environment: Interactive, Batch, Real Time
  Lifecycle: Model Registration, Score Code, Champion Model Selection, Model Testing, Model Deployment, Model Tracking, Model Retirement
SAS Model Management Studio Client Interface Customizable Project Hierarchy Champion and Challenger Models Model Scoring Code and Metadata
Timeline - SAS Model Deployment Studio
  Summer 2005: MDS 1.1 for Development Partners
  Winter 2005: MDS 2.1 for Early Adopters
  Spring 2006: MDS 2.1 Production