DASI: Analytics in Practice and Academic Analytics Preparation

Size: px
Start display at page:

Download "DASI: Analytics in Practice and Academic Analytics Preparation"

Transcription

1 DASI: Analytics in Practice and Academic Analytics Preparation Mia Stephens Copyright 2010 SAS Institute Inc. All rights reserved.

2 Background TQM Coordinator/Six Sigma MBB Founding Partner, Statistical Trainer and Consultant, North Haven Group Senior Consultant and Trainer, George Group (now Accenture) Adjunct Professor Statistics, University of New Hampshire Academic Ambassador, JMP Academic Team Author 2

3 Background 3

4 What is Analytics? an encompassing and multidimensional field that uses mathematics, statistics, predictive modeling and machinelearning techniques to find meaningful patterns and knowledge in recorded data. Descriptive/explanatory statistics. Understand what happened in the past and why. Predictive analytics. Using past data and predictive algorithms to determine what will happen next. Prescriptive analytics. Answering the question of what to do, by providing information on optimal decisions based on the predicted future scenarios. From SAS Analytics: 4

5 Analytics Frameworks SEMMA (SAS) Sample Explore Modify Model Assess CRISP-DM (Cross Industry Standard Process for Data Mining) Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment 5

6 The Business Analytics Process Define the Problem May loop back at any step Business Problem Prepare for Modeling Business Analytics Process Monitor Performance Modeling Deploy Model From Building Better Models with JMP Pro, Grayson, Gardner and Stephens,

7 The Business Analytics Process The focus of most courses? Business Problem Define the Problem May loop back at any step Prepare for Modeling What do we like to teach? Business Analytics Process Monitor Performance Modeling Deploy Model From Building Better Models with JMP Pro, Grayson, Gardner and Stephens,

8 Modeling: Activities and Tools Key Activities: Choose the appropriate modeling method or methods Fit one or more models Evaluate the performance of each model using validation statistics (misclassification, RMSE, Rsquare) Choose the best model or set of models to address the analytics problem (and ultimately the business problem) **Create ensemble models Key Tools: Multiple Regression Logistic Regression Naïve Bayes knn Classification and Regression Trees Bootstrap Forests and Boosted Trees Neural Networks Generalized Linear Models Survival Models Forecasting/Time Series Model Comparison Text Mining 8

9 The Business Analytics Process What is the most timeconsuming step? Business Problem Define the Problem May loop back at any step Prepare for Modeling Business Analytics Process Monitor Performance Modeling Deploy Model From Building Better Models with JMP Pro, Grayson, Gardner and Stephens,

10 Data Preparation: Activities and Tools Key Activities: Determine which data are needed Compile (or collect new) data Explore, examine and understand data Assess data quality Clean and transform data Define features Reduce dimensionality Create training, validation and test sets Key Tools: SQL, data import Data table structuring - join, concatenate, update, stack, summarize, Summary statistics and graphical displays, interactive tools and filtering Multivariate procedures (clustering, PCA, ) Transformations, creating derived variables, recoding, binning Addressing missing data and outliers Creating holdout set(s) 10

11 The Business Analytics Process Most neglected topics both academically and in practice? Business Problem Define the Problem May loop back at any step Prepare for Modeling Business Analytics Process Monitor Performance Modeling Deploy Model From Building Better Models with JMP Pro, Grayson, Gardner and Stephens,

12 Define the Problem Understand the business problem(s) and objectives ROI Frame the analytics problem and objectives Define project goals Develop a project plan Obtain resources, buy-in, approvals 12

13 Deploy Models and Monitor Performance Deliver the model and model results to the business or internal customers Communicate results (graphs and profilers, summaries, explore what if scenarios) Assist in applying model insights and implementing ongoing use of the model Document the project Close out the project Evaluate and quantify improvement Revise model Identify additional opportunities/problems 13

14 Best Practices from Top Programs How to provide context and real-world experience? Focus on application to real business problems Method-specific case studies Capstone projects Work in cross-functional teams Industry partnerships and projects Internships 14

15 Best Practices from Top Programs Analytics is different from statistics Not statistics programs/courses simply rebranded or renamed Heavy in application versus theory, equations and computation Start from the ground up in designing programs/courses 15

16 Best Practices from Top Programs Predictive analytics is different from descriptive or explanatory modeling De-emphasize:» hand-computation» p-values» rigorous adherence to meeting assumptions» measures of goodness-of-fit Emphasize: model accuracy and predictive ability on holdout sample 16

17 Feedback from Industry Most important skill - data understanding and intuition More important than understanding of theories and equations Students need to be able to: think with data know which methods to apply and when to understand and communicate the story in the data 17

18 Feedback from Industry Learning software/programming language is secondary to learning the concepts and methods The end goal is not to teaching statistical software or programming We can teach the software we use, we can t teach data intuition Software should facilitate learning Toolkit use the tools that best support the learning objectives (exploration of concepts, development of models, deployment, communication, ) 18

19 Thank you jmp.com/academic Copyright 2010 SAS Institute Inc. All rights reserved.