New Features in Enterprise Miner
Dr. John Brocklebank, SAS Institute Inc.
Gerhard Held, SAS Institute EMEA
Agenda
- The Big Picture: importance of analytics in today's marketplace
- Enterprise Miner 4.0: integration
- Enterprise Miner 4.0: new analytics/graphics
- Beyond Enterprise Miner 4.0
- Summary
Analysts: Analytical Applications are Key
- Three key stages of CRM implementation:
  - Operational CRM: sales, marketing, and service automation
  - Analytical CRM: 74% of Global 2000 companies plan to invest more
  - Collaborative CRM: customer channels
- Measuring Web success: firms need an intelligent infrastructure to track visitors and their activities:
  - Web-based reporting
  - OLAP and query tools
  - Data mining tools that uncover hidden opportunities
Analytical Applications are Key, and SAS Institute Leads the Pack
- SAS Institute is by far the worldwide market leader in statistical/data mining revenues (29.3%, 1999)
Enterprise Miner Release 4.0: to Ship with SAS Release 8e
- Clients: Windows 2000, NT, 98, and 95
- Servers:
  - Windows 2000 and NT
  - Solaris 2.6 and 7 or higher
  - MVS/ESA 5 or later releases, including all releases of OS/390
  - HP-UX 10.20 and 11.0
  - Compaq Tru64 UNIX 4.0E
  - Intel ABI: all compliant ABI+ systems (new)
Integration: Enterprise Miner 4.0 is V8e-Enabled
- Long variable and table names, up to 32 bytes
- Handles data with mixed-case variable names
- Documentation is integrated with the rest of the SAS Help system in HTML format
Sampling Tools for Metadata Creation in Warehouse Administrator Add-ins
Integration: Converting EM Score Code to C Functions for Deployment
- Translates DATA step score code into C functions
- Beta for EM 4.0
New Analytics/Graphics: New Tree Viewer
- Written in Microsoft Foundation Classes (MFC)
- Creates a thin-client viewer
- Interactive tree display with presentation-quality output and printing
- To enable it, add this line to autoexec.sas:
    %let emv4tree=1;
- Select the New view pop-up menu item from the Tree node icon to launch the browser
MFC-Based Tree Browser
Results Browser for Associations
DMNeural
- Fast new neural network methodology
- Supports stand-alone principal components analysis
- Experimental for EM 4.0
PROC DMVQ for SOM/Kohonen
- Dedicated procedure that provides enhanced speed
- Builds dummy variables itself and incorporates them into the score code, instead of computing them outside the procedure
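To illustrate what "dummy variables inside the score code" means, here is a minimal Python sketch, assuming plain Euclidean vector quantization (a Kohonen network without the neighborhood function). It is not PROC DMVQ's implementation; the function names, the example variables `income` and `region`, and the prototype values are all hypothetical. Scoring expands each categorical input into 0/1 indicators and then picks the nearest prototype, so the caller never has to build the dummies separately.

```python
import math

def dummy_encode(value, levels):
    """Return one 0/1 indicator per level of a categorical input."""
    return [1.0 if value == level else 0.0 for level in levels]

def build_inputs(record, numeric_vars, cat_vars, levels):
    """Assemble the input vector: numeric inputs followed by dummy variables."""
    x = [float(record[name]) for name in numeric_vars]
    for name in cat_vars:
        x.extend(dummy_encode(record[name], levels[name]))
    return x

def score(record, numeric_vars, cat_vars, levels, prototypes):
    """Return the index of the nearest prototype (the winning unit)."""
    x = build_inputs(record, numeric_vars, cat_vars, levels)
    return min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))

def vq_update(prototypes, x, rate=0.1):
    """One online vector-quantization step: move the winning prototype toward x.
    A full SOM would also move the winner's map neighbors."""
    win = min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))
    prototypes[win] = [p + rate * (xi - p) for p, xi in zip(prototypes[win], x)]
    return win

# Hypothetical example: one numeric input and one two-level categorical input
levels = {"region": ["east", "west"]}
prototypes = [[0.0, 1.0, 0.0], [5.0, 0.0, 1.0]]
cluster = score({"income": 4.5, "region": "west"}, ["income"], ["region"], levels, prototypes)
```

Keeping the encoding inside `score` mirrors the point of the slide: the deployed score code is self-contained and applies exactly the same dummy coding used during training.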
SAS Code Node: New Macro Variables and Improved Interface
Integrated Installation; Quick Conversion
- Version 3.0x project: simply open it
- Version 2.0x project: use the Import Wizard
Major R&D Efforts on the Way (Some in SAS Release 8.2/EM 4.1)
- Text mining: preprocessing and variable reduction
- Memory-based reasoning: fast models for e-intelligence
- Analytic recommendation engine using the new Model Repository
- Java-based score code
- Forecasting for cross-sectional time series
- Genomic data mining add-ons for SNP linkage analysis and microarray expression profiling...
What Is Text Mining?
It is the process of converting free-form textual data into an intelligent infrastructure so that we can extract implicit meaning and discover previously unknown information with data analysis tools.
Applications & Customers
- Customer relationship learning
- E-mail routing
- Newsgroup filtering
- Newswire/news report analysis
- Document analysis, etc.
Remove Noise Words
- Stop words, e.g., are, hence, maybe, of, the
- Punctuation
- Non-discriminating words
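A minimal Python sketch of the three noise-removal steps, under stated assumptions: the stop list contains only the example words above, tokenization is a simple regular expression that drops punctuation but keeps internal hyphens, and "non-discriminating" is approximated (hypothetically) as "appears in more than 90% of documents". Production text mining would use a fuller stop list and a tuned threshold.

```python
import re
from collections import Counter

# Stop list taken from the examples above; a production list would be longer.
STOP_WORDS = {"are", "hence", "maybe", "of", "the"}

def tokenize(text):
    """Lower-case the text and keep alphanumeric tokens. Punctuation is
    dropped, but internal hyphens (as in e-commerce) are kept."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in stop_words]

def drop_non_discriminating(docs, max_doc_fraction=0.9):
    """Drop terms that occur in more than max_doc_fraction of the documents
    (hypothetical threshold); such terms cannot separate documents."""
    n = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    keep = {t for t, c in doc_freq.items() if c / n <= max_doc_fraction}
    return [[t for t in doc if t in keep] for doc in docs]

print(remove_stop_words(tokenize("The benefits of e-commerce are, hence, clear.")))
# ['benefits', 'e-commerce', 'clear']
```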
Analyze Word Morphology
- Irregular words: understand & understood; swim & swam
- Stemming: walk & walking; dance & danced
- Hyphenated words: e-commerce
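The three morphology cases can be sketched as a single normalization function in Python. This is a deliberately crude illustration, not a real stemmer such as Porter's: irregular forms come from a small hypothetical lookup table, hyphenated words are kept whole, and regular forms lose one suffix from a hypothetical list. The resulting stems (for example "danc") need not be dictionary words; they only need to be shared by the variants.

```python
# Hypothetical irregular-form table; a real system would use a full lexicon.
IRREGULAR = {"understood": "understand", "swam": "swim"}

# Crude suffix list, checked in order (longer suffixes first).
SUFFIXES = ("ing", "ed", "es", "e", "s")
MIN_STEM = 3  # never strip a word down to fewer than 3 characters

def normalize(word):
    """Map a word to a canonical form so that its variants share one term."""
    w = word.lower()
    if w in IRREGULAR:          # irregular forms: look up the base form
        return IRREGULAR[w]
    if "-" in w:                # hyphenated words such as e-commerce stay whole
        return w
    for suffix in SUFFIXES:     # regular forms: strip one suffix
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w

for a, b in [("walk", "walking"), ("dance", "danced"),
             ("swim", "swam"), ("understand", "understood")]:
    print(a, b, normalize(a) == normalize(b))
```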
Create Frequency Table
- Count the occurrences of each term in each document

Term              Frequency count   Doc   Key
e-commerce        1                 2     1869
World wide web    2                 2     2001
World wide web    1                 3     2001
software          1                 1     2005
software          1                 5     2005
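The frequency table can be rebuilt with a short Python sketch. Assumptions: each document has already been reduced to the terms shown in the table (other terms in those documents are not shown on the slide), and the term keys are taken from the table as given rather than generated. Each output row is (term, frequency count, doc, key), sorted by key and then by document.

```python
from collections import Counter

def frequency_table(docs, term_keys):
    """Return one (term, count, doc_id, key) row per term per document."""
    rows = []
    for doc_id, terms in docs.items():
        for term, count in Counter(terms).items():
            rows.append((term, count, doc_id, term_keys[term]))
    # Order rows by term key, then by document
    return sorted(rows, key=lambda r: (r[3], r[2]))

term_keys = {"e-commerce": 1869, "World wide web": 2001, "software": 2005}
docs = {
    1: ["software"],
    2: ["e-commerce", "World wide web", "World wide web"],
    3: ["World wide web"],
    5: ["software"],
}
for row in frequency_table(docs, term_keys):
    print(row)
```

Storing only nonzero counts, as the table does, keeps the representation sparse: most terms never occur in most documents.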
Reduce Dimension via SVD
- Project document vectors into a k-dimensional best-fit subspace
- Choosing k properly should reduce noise in the data while preserving the relevant information
- Add an additional target to the projected image
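A minimal NumPy sketch of the projection step, assuming a term-by-document count matrix and a truncated SVD: "best fit" here means the least-squares best rank-k approximation, whose subspace is spanned by the first k left singular vectors. The matrix below is simply the frequency table laid out as terms by documents; the choice k = 2 and the new document are hypothetical.

```python
import numpy as np

def svd_project(A, k):
    """Project the columns (documents) of a term-by-document matrix A onto the
    best-fit k-dimensional subspace spanned by the first k left singular vectors.

    Returns (coords, U_k): coords is k x n_docs and equals U_k.T @ A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    coords = s[:k, None] * Vt[:k, :]   # diag(s_k) @ Vt_k, identical to U_k.T @ A
    return coords, U_k

# Term-by-document counts from the frequency table
# (rows: e-commerce, World wide web, software; columns: documents 1, 2, 3, 5)
A = np.array([[0, 1, 0, 0],
              [0, 2, 1, 0],
              [1, 0, 0, 1]], dtype=float)
coords, U_k = svd_project(A, k=2)
features = coords.T             # one row of k SVD features per document
new_doc = np.array([0.0, 1.0, 1.0])
new_coords = U_k.T @ new_doc    # fold a new document into the same subspace
```

Each row of `features` can then be joined with that document's target value, giving an ordinary k-input modeling table for the downstream mining nodes.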
Fuzzy Pattern Matching
- When fuzzy pattern matching is used for categorization, it is called memory-based reasoning (MBR), or lazy learning.
- Categorization can be done by having each of the neighbors vote on which category value to predict for the scored instance.
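The voting scheme can be sketched as k-nearest-neighbor classification in Python. Assumptions, all hypothetical: Euclidean distance, an unweighted majority vote, k = 3, and a toy training set. This is a generic k-NN sketch, not the SAS MBR implementation.

```python
import math
from collections import Counter

def mbr_predict(train, x, k=3):
    """Memory-based reasoning: find the k stored cases nearest to x and let
    them vote on the category. train is a list of (vector, category) pairs."""
    neighbors = sorted(train, key=lambda case: math.dist(case[0], x))[:k]
    votes = Counter(category for _, category in neighbors)
    return votes.most_common(1)[0][0]

train = [([0, 0], "a"), ([0, 1], "a"), ([5, 5], "b"), ([6, 5], "b"), ([5, 6], "b")]
print(mbr_predict(train, [0.5, 0.5]))  # a
```

There is no training step: the "model" is the stored cases themselves. Adding a case is an append to `train`, and forgetting is simply deleting old cases, which is why MBR suits models that must change incrementally over time.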
When MBR Is Useful
- The target must be determined on the fly: e-intelligence and other Web applications
- Different profiles predict the same target value
- The target has many values, perhaps not mutually exclusive
- The model should change incrementally over time, with forgetting possible
Analytic Recommendation Engine (ARE)
- ARE plugs analytical results into an API
- The API is implemented as Java classes
- Configurable to do score lookup or real-time scoring
- Currently deployed on the SAS Publishing Web site (www.sas.com)
- Accesses the Model Repository (MR), where all the information is stored
Genome Miner Solution
Multiple Regression with Time Series Errors: An Example of Extended Inputs

Year   Y         X1        X2        Company
1975   317.60    3078.50   2.80      AA
1976   391.80    4661.70   52.60     AA
1977   410.60    5387.10   156.90    AA
...
1998   1304.40   6241.70   1777.30   AA
1999   1486.70   5593.60   2226.30   AA
2000   .         4989.18   2675.10   AA
2001   .         5045.91   3123.89   AA
1975   26.63     290.60    162.00    BB
1976   23.39     291.10    174.00    BB
...

(. denotes a missing value)
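A rough Python sketch of the idea behind this layout, under loudly stated assumptions: each company is a separate series, Y is regressed on X1 and X2, the errors follow an AR(1) process, and the years with missing Y are forecast from their known inputs. The estimation is a simple two-step approximation (OLS, then a Yule-Walker AR(1) estimate from the residuals), not the method SAS uses; all function names are hypothetical, and the test uses made-up numbers because the slide's series is elided.

```python
import numpy as np

def fit_ar1_regression(X, y):
    """Two-step fit: OLS of y on an intercept and X, then a Yule-Walker
    AR(1) estimate from the OLS residuals."""
    D = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    denom = resid @ resid
    rho = float(resid[1:] @ resid[:-1] / denom) if denom > 0 else 0.0
    return beta, rho, resid[-1]

def forecast(beta, rho, last_resid, X_future):
    """Regression part plus the last error carried forward: rho**h * e_T."""
    D = np.column_stack([np.ones(len(X_future)), X_future])
    h = np.arange(1, len(X_future) + 1)
    return D @ beta + rho ** h * last_resid

def fit_and_forecast(X, y):
    """Fit on rows with known y and forecast rows where y is missing
    (NaN here, '.' in the SAS table); assumes the missing rows come last."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    known = ~np.isnan(y)
    beta, rho, last_resid = fit_ar1_regression(X[known], y[known])
    return forecast(beta, rho, last_resid, X[~known])

def forecast_by_company(rows):
    """rows: (year, y, x1, x2, company) tuples; fits each company separately."""
    out = {}
    for company in sorted({r[4] for r in rows}):
        sub = sorted((r for r in rows if r[4] == company), key=lambda r: r[0])
        out[company] = fit_and_forecast([[r[2], r[3]] for r in sub],
                                        [r[1] for r in sub])
    return out
```

Fitting each company separately matches the stacked layout of the table, where the Company column plays the role of a BY variable.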
Data Mining Knowledge Solutions: Applications/Industries
- Available:
  - Cross-selling in finance
  - Fraud detection in finance
  - Rate making in insurance
  - Churn management in telco (CRM Knowledge Solution)
  - Intrusion detection (joint usage: Systems, e-intelligence)
- Coming this year:
  - Mining of quality data in manufacturing
  - Database marketing in retail
  - Credit scoring in banking
  - Web mining
  - Customer attrition in finance
Data Mining and Other Initiatives
- Data Mining and CRM: customer acquisition, retention, cross-sell, up-sell, profitability, fraud, closed-loop
- Data Mining and Web-based computing: online scoring of customers over the Web, WAP phones, and Palm devices
- Data Mining and e-Intelligence: customer profiling, personalization, tailoring the Web site to user behaviour, identifying potential, increasing the stickiness of the site
- Data Mining and Pharma: finding active chemical structures, new drug discovery, outcomes research, sales and marketing
- Data Mining and Systems: capacity planning, intrusion detection
Conclusion
- Analytical applications are key, and SAS software is leading the pack!
- Enterprise Miner Version 4.0:
  - V8e-enabled/integrated
  - Windows 2000, MVS, Intel ABI
  - Deployment using C scoring
  - Some new GUIs and algorithms: MFC Tree Browser, DMNeural
- Major R&D on the way: text mining, MBR, MR, Java-based score code, forecasting, genomics
- DM-based Knowledge Solutions, co-operation with other initiatives
- And Web mining is a key part from now on!