Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Agenda: Introductions; Why Ensemble Models?; Simple & Complex Ensembles; Thoughts: Post-Real-Life Experimentation; Downsides of Ensembles; A Model-Agnostic Methodology for Interpretability

Hello! About Me: data scientist in the telecommunications industry; mathematics background; coffee enthusiast; women in STEM. www.linkedin.com/in/helen-ngo/ helen.ngo14@gmail.com

What's more accurate than one great model? Sometimes, a lot of not-too-shabby models. Ensemble modeling: learning algorithms that combine models by weighting their predictions. Lots of models, diverse algorithms, uncorrelated predictions: you're more likely to be right most of the time. Ensembles are only more accurate than individual models if the models disagree with each other.

The simple ones: simple average and voting (the Ensemble node in Enterprise Miner). Bagging: sample with replacement. Random forests: a random subset of features for many trees trained in parallel. Boosting: iteratively reweight misclassified observations during training (XGBoost, AdaBoost).
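A minimal sketch of the simple averaging and voting combiners, in Python rather than the Enterprise Miner Ensemble node the slide refers to; the function names and the list of scored outputs are illustrative, not the presenter's code.

```python
import numpy as np

def average_ensemble(probabilities):
    """Average the predicted probabilities from several already-trained models.

    probabilities: list of 1-D arrays, one per model, each holding P(target = 1).
    """
    return np.mean(np.vstack(probabilities), axis=0)

def voting_ensemble(probabilities, threshold=0.5):
    """Majority vote over the class decisions implied by each model's scores."""
    votes = np.vstack([(p >= threshold).astype(int) for p in probabilities])
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Hypothetical usage with three scored outputs p1, p2, p3:
#   blended = average_ensemble([p1, p2, p3])
#   classes = voting_ensemble([p1, p2, p3])
```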

Slightly more sophisticated. Top-T: for N models and T <= N, take the best T models according to your accuracy measure of choice, and use validation data to select the optimal value of T. More is not always better! Unsupervised cluster ensembling: use PCA to assign the probabilities from the N models to clusters based on the original features, then apply the top-T method to the models within each cluster.
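A rough illustration of Top-T selection using validation AUC as the accuracy measure; the dictionary of model scores and the helper name are assumptions, not the presenter's code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def top_t_ensemble(val_probs, y_val, max_t=None):
    """val_probs: dict of {model_name: validation-set P(target = 1)}.

    Rank models by individual AUC, then use the validation data to pick the T
    whose simple average scores best ("more is not always better")."""
    ranked = sorted(val_probs, key=lambda m: roc_auc_score(y_val, val_probs[m]), reverse=True)
    max_t = max_t or len(ranked)
    best_t, best_auc = 1, -np.inf
    for t in range(1, max_t + 1):
        blend = np.mean([val_probs[m] for m in ranked[:t]], axis=0)
        auc = roc_auc_score(y_val, blend)
        if auc > best_auc:
            best_t, best_auc = t, auc
    return best_t, ranked[:best_t]
```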

Popularized by our favourite data science competition: stacking & blending. Use the posterior probabilities from trained models as numerical inputs to a model of the original target variable. Can have several stacked layers. The stacking function can be any supervised learning method.
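A sketch of stacking with scikit-learn, assuming the base models are classifiers with predict_proba; out-of-fold predictions keep the second-level model from seeing probabilities produced on the base models' own training rows.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def fit_stacker(base_models, X_train, y_train):
    """Train the base models plus a logistic-regression stacking function."""
    oof = np.column_stack([
        cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
        for m in base_models
    ])
    stacker = LogisticRegression().fit(oof, y_train)          # second-level model
    fitted = [m.fit(X_train, y_train) for m in base_models]   # refit on all rows
    return fitted, stacker

def stacked_predict(fitted, stacker, X):
    meta = np.column_stack([m.predict_proba(X)[:, 1] for m in fitted])
    return stacker.predict_proba(meta)[:, 1]
```

Any supervised learner could replace the logistic regression here, and further layers can be stacked the same way.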

On the shoulders of giants: hill climbing. Rank models by some accuracy measure and calculate the incremental ensemble performance from adding one model at a time. A greedy algorithm chooses the next model to add, and the final ensemble is chosen based on the overall accuracy metric. Models can be added multiple times and weighted differently; powerful models can be added many times. [Diagram: model space and the ensemble built up over successive iterations.]
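A greedy hill-climbing sketch in the same spirit (assumed helper names, not the presenter's code); because the loop may pick the same model repeatedly, strong models end up with larger effective weights.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def hill_climb(val_probs, y_val, n_steps=20):
    """val_probs: dict of {model_name: validation-set P(target = 1)}.

    At each step, add whichever model most improves the AUC of the running
    average; models may be chosen more than once."""
    chosen = []
    running_sum = np.zeros(len(y_val))
    for _ in range(n_steps):
        best_name, best_auc = None, -np.inf
        for name, p in val_probs.items():
            candidate = (running_sum + p) / (len(chosen) + 1)
            auc = roc_auc_score(y_val, candidate)
            if auc > best_auc:
                best_name, best_auc = name, auc
        chosen.append(best_name)
        running_sum += val_probs[best_name]
    return chosen  # keep the prefix with the best overall metric as the final ensemble
```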

In Enterprise Miner Source: Wendy Czika and the SAS team https://github.com/sassoftware/dm-flow/tree/master/ensemblemodeling

The reason why ensembles work: bias-variance error decomposition. Bias (underfitting) and variance (overfitting) are traded off; more complexity means more overfitting. What if we have a bunch of low-bias, high-variance models? Ensemble them all: same bias, lower variance, so the total error is lower. Less correlation between the models is better (higher reduction in variance). Source: http://scott.fortmann-roe.com/docs/biasvariance.html
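A short sketch of the standard variance-reduction argument behind this slide (not from the presentation): for N models with equal variance σ² and average pairwise correlation ρ, the variance of their average is

```latex
\operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N}\hat{f}_i(x)\right)
  = \rho\,\sigma^{2} \;+\; \frac{1-\rho}{N}\,\sigma^{2}
```

The bias of the average equals the common bias of the individual models, so total error falls, and it falls furthest when ρ is small, which is why uncorrelated predictions matter.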

Thoughts: post-real-life experimentation. Success! We overcame noisy marketing data with ensembles. The best ensembles were created out of weak learners without many transformations. Ensembles don't necessarily win over simpler models; it depends on your use case. The best ensemble models outperformed the best non-ensemble models significantly: 3.7x vs. 3.4x lift.

Downsides of ensembles: the search space for the best ensemble is large; they are time-consuming to train; the impact of each feature on the final score is unclear; there is a lack of explanation for customer-facing decisions; and they are hard to sell to a business decision maker. Source: https://www.hindawi.com/journals/mpe/2016/3845131/fig1/

Model Contributions: Peeking Under the Hood Toronto Data Mining Forum November 2017 Adrian Muresan

Agenda (Adrian Muresan). About the speaker: Senior Analyst, Business Intelligence; started at Bell in January 2015; MSc Mathematics, MSc Computing; BI team focus: Modeling and Segmentation; main responsibility: leading a predictive modeling team. Topics: Why Look Into Models; How To Look Into Models; Looking Into Models (Performance Stability, Variable Impact Analysis, Lift Drop Analysis, Correlation Analysis, Variable Level Historical Trends).

Why Look Under The Hood. Legislation: under the new EU General Data Protection Regulation (GDPR), set to come into effect in 2018, all EU citizens will have a right to explanation from vendors using their information, even if those vendors are based outside the EU. Promote use: cool ML products are redundant if they're not adopted, and fear of the unknown outside the ML/AI community keeps marketers from trusting and using your results. Ethics: we must be careful in the way we allow data to influence our actions, to ensure we are making decisions that are ethical; the book Weapons of Math Destruction (O'Neil, 2016) expounds upon this topic very well. Accuracy: sometimes historical performance may look stable, but this does not always imply strong, robust results going forward.

How To Look Into Models. What is it? The metrics discussed here are generated by a SAS EG program. The program takes environmental variables in its first step and then runs automatically for any given model. Prerequisites: a method of scoring the model (preferably in-database for speed and I/O optimization); metadata tables describing the input and output variables of the model; and, for models with coefficients, metadata about which variables are assigned which coefficients. Pros: greatly reduces model iteration time; provides quick and much more in-depth validation than is available within EM; does not rely on the model having been built in EM, only assumes that you can score it and that the output has a certain format; outputs an HTML documentation file which can be shared with others or used to compare against older versions; largely model agnostic. Cons: some of the processes take a great deal of processing power, and as the statistical rigor of your tests increases, the time required to run also increases; this could be alleviated by transitioning to a distributed network for scoring.

Performance Stability. Lift stability by decile: shows how stable (or not) our performance is; ideally each line in the graph would be flat. Cumulative response stability: a different way of looking at the same stability problem; ideally the various response curves would be practically indistinguishable from each other. Target rate over time: independent of your model results, but it may explain deviations in your lift, and it may also highlight points in time when there are data issues to consider.
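A pandas sketch of the lift-by-decile stability check; the column names ("score", "target", "period") are assumptions about the scored table, not part of the presenter's SAS EG program.

```python
import pandas as pd

def lift_by_decile(scored: pd.DataFrame) -> pd.DataFrame:
    """Lift per score decile per period; roughly flat rows indicate stable performance."""
    out = scored.copy()
    out["decile"] = (
        out.groupby("period")["score"]
           .transform(lambda s: pd.qcut(s, 10, labels=False, duplicates="drop"))
    )
    rates = out.groupby(["period", "decile"])["target"].mean().rename("response")
    base = out.groupby("period")["target"].mean().rename("base_rate")
    lift = rates.reset_index().merge(base.reset_index(), on="period")
    lift["lift"] = lift["response"] / lift["base_rate"]
    return lift.pivot(index="decile", columns="period", values="lift")
```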

Variable Impact. Goal: measure how much variation in a variable impacts the score of an observation. What to look for: make sure that no single variable is responsible for too much of the variation in scores; earmark variables with higher impact for closer scrutiny, making sure they are not overfit. Methodology: look at the average scores at different levels of each variable and see how much variation there is between levels; note that some levels will have vastly different numbers of observations, and this needs to be accounted for when computing the metric.
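One possible way to compute such a variable-impact metric (an illustrative Python translation, not the original SAS program): the count-weighted spread of the average score across each variable's levels.

```python
import numpy as np
import pandas as pd

def variable_impact(scored: pd.DataFrame, score_col: str, variables: list) -> pd.Series:
    """Count-weighted standard deviation of level-average scores, per variable."""
    impacts = {}
    overall = scored[score_col].mean()
    for var in variables:
        grp = scored.groupby(var)[score_col].agg(["mean", "count"])
        weighted_var = np.average((grp["mean"] - overall) ** 2, weights=grp["count"])
        impacts[var] = np.sqrt(weighted_var)
    return pd.Series(impacts).sort_values(ascending=False)
```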

Lift Drop Percentage. Goal: measure how much of the lift the model sees is being provided by a single variable. What to look for: ensure that the entire lift is not coming from a single variable (or even a small cluster of variables); ensure that the top contributors are not overly correlated or in some way proxies for each other; earmark variables with higher contributions to make sure they are not leading indicators for the target. Methodology: rescore the model while fixing the variable you are testing and compare results; note that for statistical significance (particularly with non-linear models) you should do this across multiple periods and with multiple values.
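An illustrative version of the lift-drop check: neutralise one variable by fixing it to a reference value, rescore, and measure how much top-decile lift is lost. The score_fn callable and the 10% cutoff are assumptions, not the presenter's setup.

```python
import pandas as pd

def top_decile_lift(scores: pd.Series, target: pd.Series) -> float:
    cutoff = scores.quantile(0.9)
    return target[scores >= cutoff].mean() / target.mean()

def lift_drop(score_fn, X: pd.DataFrame, y: pd.Series, variable: str, fill_value) -> float:
    """Share of baseline lift lost when `variable` is held at `fill_value`."""
    baseline = top_decile_lift(pd.Series(score_fn(X), index=X.index), y)
    fixed = top_decile_lift(pd.Series(score_fn(X.assign(**{variable: fill_value})), index=X.index), y)
    return (baseline - fixed) / baseline
```

As the slide notes, repeat this across several periods and several fill values before trusting the number, especially for non-linear models.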

Correlation to Score. Goal: ensure that the average score produced at different levels of our variables correlates with the target rate at those levels. What to look for: both that the correlation between the score and the target rate is high, and that it is stable; for models that have coefficients, make sure the coefficients are not more correlated than the scores, since that implies overfitting. Methodology: PROC CORR works well for this; make sure to take into account the number of observations at each level; optionally weight these results by the variable impact or lift drop metrics discussed previously.
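The slide uses PROC CORR in SAS; a rough Python equivalent of the same idea correlates the level-average score with the level target rate, weighting by level size (column names are assumed).

```python
import numpy as np
import pandas as pd

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of two level-summary series."""
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    return cov / np.sqrt(np.average((x - mx) ** 2, weights=w) * np.average((y - my) ** 2, weights=w))

def score_target_correlation(scored: pd.DataFrame, variable: str,
                             score_col: str = "score", target_col: str = "target") -> float:
    levels = scored.groupby(variable).agg(
        avg_score=(score_col, "mean"),
        target_rate=(target_col, "mean"),
        n=(score_col, "size"),
    )
    return weighted_corr(levels["avg_score"], levels["target_rate"], levels["n"])
```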

Variable Level Historical Trends. Historical correlation: very similar to the correlation analysis at the model level, but this can also help identify which variable is most to blame for drops in correlation or lift. Historical lift drop: ensure that the lift drop metric is stable over time and does not spike at certain points; if there is a particular time when it spiked, look deeper into that period and see what was different.

Variable Level Historical Trends. Goal: make sure that the scores generated fit the target rate, not only on average but also period by period. What to look for: strong correlation between the target rate and the average predicted score; for models that have coefficients, make sure the coefficients are also correlated: when the score is correlated but the coefficient is not, this indicates that the variable is correcting for another misattributed weight; also check that the distribution of observations by level is consistent over time. Methodology: use the metadata tables that EM produces, as well as in-database scoring, to produce the average score metrics.
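Continuing the previous sketch, the same check can be run period by period (a "period" column is assumed) to confirm the fit holds over time and not just on average.

```python
def correlation_over_time(scored, variable, period_col="period"):
    """Per-period score-vs-target-rate correlation, reusing score_target_correlation()."""
    return {
        period: score_target_correlation(frame, variable)
        for period, frame in scored.groupby(period_col)
    }
```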

Use Case

Questions?

Appendix. Metadata Node: change the original predictors back to inputs. Attach a Metadata node to the winning model node, click Train under the Variables section, change the role of target_ind to REJECTED, and change the role of the Event Probability to TARGET. This can alternatively be done via a SAS Code node. Decision Tree: use a Tree node to model the scores. Change the Maximum Branch to be greater than 2 (the default is a binary tree). This tree should be maximally overfit, as its only purpose is to describe the behaviour of the ensemble model; the non-linear nature of the decision tree gives the best fit for the interactions, as opposed to a Regression or GLM.
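A rough scikit-learn analogue of that surrogate-tree flow (illustrative only; unlike the Enterprise Miner Tree node it cannot use multi-way splits): fit a deliberately deep tree to the ensemble's scores using the original predictors, then read off its rules.

```python
from sklearn.tree import DecisionTreeRegressor, export_text

def surrogate_tree(X, ensemble_scores, max_depth=None):
    """Overfit a tree to the ensemble's predicted probabilities so its splits
    describe the ensemble's behaviour; X must hold numeric/encoded predictors."""
    tree = DecisionTreeRegressor(max_depth=max_depth, min_samples_leaf=1)
    tree.fit(X, ensemble_scores)
    print(export_text(tree, feature_names=list(X.columns)))
    return tree
```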