Gordon S. Linoff Founder Data Miners, Inc.

Survival Data Mining
Gordon S. Linoff, Founder, Data Miners, Inc.
gordon@data-miners.com

What to Expect from This Talk
Background on survival analysis from a data miner's perspective
Introduction to key ideas in survival analysis: hazards, survival, competing risks
Lots of examples: stratification; quantifying loyalty effort; voluntary and involuntary churn; forecasting; time to reactivation and re-purchase
http://www.data-miners.com

Who Am I?
I am not a statistician
Adept with databases and advanced algorithms
Founded Data Miners with Michael Berry in 1998
We have written three books on data mining
Have become very interested in survival analysis for mining customer data: survival data mining

What Does Data Mining Really Do?
Provides ways to quantitatively measure what business users know or should know qualitatively
Connects data to business practices
Used to understand customers
Occasionally produces interesting predictive models

Data Mining Is About Customers
Customer relationship lifetime: Prospect, New Customer, Established Customer, Former Customer
Events: Initial Purchase, Sign-Up, 2nd Purchase, Usage, Failure to Pay, Voluntary Churn
Customers evolve over time

The State of the Customer Relationship Changes Over Time
[Diagram: customer timelines showing starts, changes among states labeled P1, P2, and P3, and stops]

Traditional Approach to Data Mining Uses Predictive Modeling
[Diagram: Jan to Sep timeline showing the model set and score set windows, each labeled 4, 3, 2, 1, +1]
Data to build the model comes from the past
Predictions are for some fixed period in the future
The present is when new data is scored
Models are built with decision trees, neural networks, logistic regression, regression, and so on

Survival Data Mining Adds the Element of When Things Happen
Time-to-event analysis
Terminology comes from the medical world: which patients survive a treatment, which patients do not
Can measure effects of variables (initial covariates or time-dependent covariates) on survival time
Natural for understanding customers
Can be used to quantify marketing efforts

Example Results (Made Up)
99.9% of Life bulbs will last 2000 hours
Mean-time-to-failure for hard disk is 500,000 hours
"A recent study published in the American Journal of Public Health found that life expectancy among smokers who quit at age 35 exceeded that of continuing smokers by 6.9 to 8.5 years for men and 6.1 to 7.7 years for women." [www.med.upenn.edu/tturc/pdf/benefits.pdf]
Example of initial covariate
Stopping smoking before age 50 increases lifespan by one year for every decade before 50

Original Statistics
Life table methods have been used by actuaries for a long, long time
These are the methods we will be focused on
Applied with vigor to medicine in the mid-20th century
Applied with vigor to manufacturing during the 20th century as well
Took off with Sir David Cox's Proportional Hazards Models in 1972, which provide effects of initial covariates

Survival for Marketing Has Some Differences
We are happy with discrete time (probability vs rate); traditional survival analysis uses continuous time
Marketers have hundreds of thousands or millions of examples; traditional survival analysis might be done on dozens or hundreds of participants
We have the benefit of a wealth of data; traditional survival analysis looks at factors incorporated into the study
We have to deal with window effects due to business practices and database reality; traditional survival analysis ignores left truncation

To Understand the Calculation, Let's Focus on the End of the Relationship
[Diagram: timeline from START to STOP; tenure is the duration of the relationship]
Challenge: defining the beginning of the customer relationship
Challenge: defining the end of the customer relationship
Challenge: finding either in customer databases

How Long Will A Customer Survive?
[Chart: percent survived (0% to 100%) versus tenure in months (0 to 120), for High End and Regular customers]
Survival always starts at 100% and declines over time
If everyone in the model set stopped, then survival goes to 0; otherwise it is always greater than 0
Survival is useful for comparing different groups

Key Idea: Data is Censored
[Diagram: customers shown by start time along a time axis]
Known tenure when customers stop
Unknown tenure for active customers (censored)

Use Censored Data to Calculate Hazard Probabilities
Hazard, h(t), at time t is the probability that a customer who has survived to time t will not survive to time t+1
h(t) = (# customers who stop at exactly time t) / (# customers at risk of stopping at time t)
Value of hazard depends on units of time: days, weeks, months, years
Differs from traditional definition because time is discrete: hazard probability, not hazard rate
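
The hazard ratio above can be sketched in a few lines of Python. This is only an illustration, not the speaker's implementation: the function name, the input layout (one tenure and one stopped flag per customer), and the convention that a customer with tenure t counts as at risk at time t are all assumptions.

```python
from collections import Counter


def hazard_probabilities(tenures, stopped):
    """Discrete-time hazard: stops at tenure t / customers at risk at tenure t.

    tenures: tenure of each customer in whole time units (hypothetical input)
    stopped: True if the customer stopped at that tenure, False if the
             customer is still active on the analysis date (censored)
    """
    stops = Counter(t for t, has_stopped in zip(tenures, stopped) if has_stopped)
    ends = Counter(tenures)
    at_risk = len(tenures)  # everyone is at risk at tenure 0
    hazards = []
    for t in range(max(tenures) + 1):
        hazards.append(stops[t] / at_risk if at_risk else 0.0)
        # Assumption: customers who stop or are censored at t leave the
        # at-risk population before t + 1.
        at_risk -= ends[t]
    return hazards


# Five customers; the two False entries are still active (censored).
example = hazard_probabilities([1, 2, 2, 3, 3], [True, True, False, True, False])
print(example)  # [0.0, 0.2, 0.25, 0.5]
```

Censored customers never count as stops, but they do stay in the denominator for every tenure they are known to have reached, which is how the censored data contributes to the estimate.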

Bathtub Hazard (Risk of Dying by Age)
[Chart: hazard (0.0% to 3.0%) by age group in years, from 0-1 through 70-74]
Bathtub hazard starts high, goes down, and then increases again
Example from US mortality tables shows probability of dying at a given age

Hazards Are Like An X-Ray Into Customers
[Chart: hazard (daily risk of churn, 0.00% to 0.30%) versus tenure in days (0 to 720)]
Chart annotation: the peak is for nonpayment and end of initial promotion
Bumpiness is due to within-week variation

Hazards Can Show Interesting Features of the Customer Lifecycle
[Chart: daily churn hazard (0% to 8%) versus tenure in days after start (0 to 990)]
Chart annotation: one peak is for nonpayment after one year
Chart annotation: another peak is for nonpayment after two years

How Long Will A Customer Survive?
Survival analysis answers the question by rephrasing it slightly: What proportion of customers survive to time t?
Survival at time t, S(t), is the probability that a customer will survive at least to time t
Calculation: S(t) = S(t - 1) * (1 - h(t - 1)), with S(0) = 100%
Given a set of hazards by time, survival can be easily calculated in SAS or a spreadsheet
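
The slide mentions SAS or a spreadsheet; the same recurrence can also be written as a short Python sketch. The function name and the list-based layout (index = time unit) are assumptions made for illustration.

```python
def survival_curve(hazards):
    """Apply S(t) = S(t - 1) * (1 - h(t - 1)) starting from S(0) = 1.

    hazards: hazards[t] is the hazard probability h(t) (hypothetical input)
    Returns a list with one more entry than hazards; entry t is S(t).
    """
    survival = [1.0]  # S(0) = 100%
    for h in hazards:
        survival.append(survival[-1] * (1.0 - h))
    return survival


# Made-up hazards for time units 0..3.
curve = survival_curve([0.0, 0.2, 0.25, 0.5])
print([round(s, 4) for s in curve])  # [1.0, 1.0, 0.8, 0.6, 0.3]
```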

Survival is Similar to Retention
[Chart: retention and survival (0% to 100%) versus tenure in weeks (0 to 110)]
Retention is jagged. It is calculated on customers who started exactly w weeks ago.
Survival is smooth. It is calculated using customers who started up to w weeks ago.

Why Survival is Useful
Compare hazards for different groups of customers
Calculate median time to event: how long does it take for half the customers to leave?
Calculate truncated mean tenure: what is the average customer tenure for the first year after starting? For the first two years?
Can be used to quantify effects

Median Lifetime is the Tenure Where Survival = 50%
[Chart: percent survived (0% to 100%) versus tenure in months (0 to 120), for High End and Regular customers]
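
Reading the median off a survival curve amounts to finding the first tenure at which survival has fallen to 50% or below. A minimal sketch, assuming the curve is a list indexed by tenure; the function name is hypothetical:

```python
def median_tenure(survival):
    """Return the first tenure t with S(t) <= 50%, or None if never reached.

    survival: survival[t] is S(t) as a fraction between 0 and 1
    """
    for t, s in enumerate(survival):
        if s <= 0.5:
            return t
    # Survival stays above 50% over the observed window, so the median
    # lifetime cannot be determined from this curve.
    return None


print(median_tenure([1.0, 0.9, 0.7, 0.5, 0.3]))  # 3
print(median_tenure([1.0, 0.9, 0.8]))            # None
```

Returning None rather than the last observed tenure is a design choice: when more than half the customers are still active at the end of the window, the median is simply unknown.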

Average Tenure Is The Area Under The Curves
[Chart: percent survived (0% to 100%) versus tenure in months (0 to 120), for High End and Regular customers]
Average 10-year tenure for regular customers = 44 months (3.7 years)
Average 10-year tenure for high end customers = 73 months (6.1 years)
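
In discrete time the area under the curve reduces to a sum of survival values. The sketch below assumes tenure is counted in whole periods completed, so that the truncated mean over the first n periods is S(1) + S(2) + ... + S(n); the function name and the sample curve are made up for illustration.

```python
def truncated_mean_tenure(survival, horizon):
    """Average tenure within the first `horizon` periods.

    survival: survival[t] is S(t) as a fraction, with survival[0] == 1.0
    horizon:  number of periods to truncate at (e.g. 120 months for 10 years)
    Assumption: tenure counts whole periods completed, so the expected
    tenure capped at the horizon is the sum of S(1) .. S(horizon).
    """
    if horizon > len(survival) - 1:
        raise ValueError("survival curve is shorter than the horizon")
    return sum(survival[1:horizon + 1])


# Made-up curve: after 3 periods, the average customer has lasted 2.4 of them.
print(round(truncated_mean_tenure([1.0, 0.9, 0.8, 0.7], 3), 4))  # 2.4
```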

Survival to Quantify Marketing Efforts
A company has a loyalty marketing effort designed to keep customers
This effort costs money
What is the value of the effort as measured in increased customer lifetimes?
SOLUTION: survival analysis
How much longer do customers survive after accepting the offer?
How to quantify this in dollars and cents?

Loyalty Offers Are an Example of a Time-Dependent Covariate
[Diagram: customer timelines along a time axis, with marks at Time = 0 and Time = n]
Non-responders have no loyalty date
Responders have a loyalty date
Open circle indicates that the customer has stopped before (or on) the analysis date

We Can Use Area to Quantify Results
[Chart: percent retained (65% to 100%) versus months after intervention (0 to 18), for Group 1 and Group 2]
Chart annotation: How much is this difference worth?
Chart shows survival after loyalty offer acceptance compared to a similar group not given the offer

We Can Use Area to Quantify Results
[Chart: percent retained (65% to 100%) versus months after intervention (0 to 18); at one year, Group 1 is at 93.4% and Group 2 is at 85.6%]
Increase in survival is given by the area between the curves
For the first year, area of triangle is a good enough estimate
Note: there are easy ways to calculate the exact value

Loyalty-Responsive Customers Are Doing Better Than Others
Survival for first year after loyalty intervention: Group 1: 93.4%; Group 2: 85.6%
Increase for Group 1: 7.8%
Average increase in lifetime for first year is 3.9% (assuming the 7.8% would have stayed, on average, 6 months)
Assuming revenue of $400/year, loyalty-responsive customers contribute additional revenue of $15.60 during the first year
This actually compares favorably to the cost of the loyalty program
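
The arithmetic on this slide is the triangle estimate: half the one-year survival gap, times annual revenue. A small sketch that reproduces the slide's numbers; the function and parameter names are hypothetical.

```python
def first_year_revenue_gain(survival_offer, survival_control, annual_revenue):
    """Triangle estimate of extra first-year revenue per customer.

    survival_offer:   one-year survival of the group that accepted the offer
    survival_control: one-year survival of the comparison group
    annual_revenue:   revenue per customer per year
    The area between the curves over the first year is approximated by a
    triangle with base 1 year and height equal to the survival gap.
    """
    gap = survival_offer - survival_control  # 0.934 - 0.856 = 0.078
    extra_lifetime_years = 0.5 * gap * 1.0   # triangle area = 0.039 years
    return extra_lifetime_years * annual_revenue


print(round(first_year_revenue_gain(0.934, 0.856, 400.0), 2))  # 15.6
```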

Competing Risks: Customers May Leave for Many Reasons
Customers may cancel: voluntarily, involuntarily, migration
Survival analysis can show competing risks: overall S(t) is the product of S_r(t) for all risks
We'll walk through an example: overall survival, competing risks survival, competing risks hazards, stratified competing risks survival
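
The product rule can be sketched as follows: build one survival curve per risk from that risk's hazards, then multiply the curves point by point. The names and the made-up hazards are assumptions for illustration; in discrete time this product is close to, but not always identical to, the survival computed from the combined hazards.

```python
def survival_curve(hazards):
    """S(0) = 1 and S(t) = S(t - 1) * (1 - h(t - 1))."""
    survival = [1.0]
    for h in hazards:
        survival.append(survival[-1] * (1.0 - h))
    return survival


def overall_survival(hazards_by_risk):
    """Multiply the per-risk survival curves S_r(t) to get overall S(t).

    hazards_by_risk: dict mapping a risk name to its list of hazard
                     probabilities, all lists of the same length
    """
    curves = [survival_curve(h) for h in hazards_by_risk.values()]
    overall = []
    for values_at_t in zip(*curves):
        product = 1.0
        for s in values_at_t:
            product *= s
        overall.append(product)
    return overall


# Made-up hazards for two risks over two time units.
risks = {"voluntary": [0.1, 0.1], "involuntary": [0.0, 0.2]}
print([round(s, 4) for s in overall_survival(risks)])  # [1.0, 0.9, 0.648]
```

Because every per-risk curve is at most 1, the product can never exceed any single S_r(t), so each risk's survival curve sits at or above the overall curve.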

Overall Survival for a Group of Customers
[Chart: overall survival (0% to 100%), horizontal axis 0 to 600]
Chart annotation: sharp drop during one period, otherwise smooth

Competing Risks for Voluntary and Involuntary Churn
[Chart: survival (0% to 100%) for Involuntary, Voluntary, and Overall, horizontal axis 0 to 600]
Chart annotation: steep drop is associated with involuntary churn
Survival for each risk is always greater than overall survival

What the Hazards Look Like
[Chart: hazards (0% to 9%) for Involuntary and Voluntary, horizontal axis 0 to 600]
As expected, the steep drop has a very high hazard

Most Involuntary Churn is from Non-Credit Card Payers
[Chart: survival (0% to 100%) for Involuntary-CC, Voluntary-CC, Involuntary-Other, and Voluntary-Other, horizontal axis 0 to 600]
Steep drop is associated only with non-credit card payers

Time to Reactivation
When a customer stops, often they come back: this is winback or reactivation
In this case, the initial condition is the stop
The final condition is the restart
Not all customers restart

Survival Can Be Applied to Other Events: Reactivation
[Chart: survival (remain deactivated, 0% to 100%) and hazard probability ("risk" of reactivating, 0% to 10%) versus days after deactivation (0 to 360)]

Time to Next Order
In retailing-type businesses, customers make multiple purchases
Survival analysis can be applied here, too
The question is: how long to the next purchase?
Initial state is: date of purchase
Final state is: date of next purchase
It is better to look at 1 - survival rather than survival

Time to Next Purchase, Stratified by Number of Previous Purchases
[Chart: vertical axis 0% to 50%, horizontal axis 0 to 480; series 01-->02, 02-->03, 03-->04, 04-->05, 05-->06, and ALL]

Customer-Centric Forecasting Using Survival Analysis
Forecasting customer numbers is important in many businesses
Survival analysis makes it possible to do the forecasts at the finest level of granularity: the customer level
Survival analysis makes it possible to incorporate customer-specific factors
Can be used to estimate restarts as well as stops
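
The slides do not spell out the customer-level calculation, so the following is only one plausible sketch of it. It uses the standard conditional survival identity: a customer who has already reached tenure t is still active k periods later with probability S(t + k) / S(t). Summing that over the existing base gives an expected count of customers remaining. The function name and inputs are hypothetical.

```python
def forecast_existing_base(current_tenures, survival, periods_ahead):
    """Expected number of current customers still active after periods_ahead.

    current_tenures: current tenure of each active customer, in time units
    survival:        survival[t] is S(t), estimated from historical data
    periods_ahead:   how far into the future to forecast
    Assumption: each customer follows the same survival curve; customer-
    specific factors would require a separate curve per segment.
    """
    expected_active = 0.0
    for t in current_tenures:
        future = t + periods_ahead
        if future >= len(survival):
            raise ValueError("survival curve does not extend far enough")
        if survival[t] > 0.0:
            # Conditional survival: P(active at t + k | active at t).
            expected_active += survival[future] / survival[t]
    return expected_active


curve = [1.0, 0.8, 0.6, 0.3]
# One brand-new customer and two with tenure 1, forecast 2 periods ahead.
print(round(forecast_existing_base([0, 1, 1], curve, 2), 4))  # 1.35
```

The expected number of stops over the same window is the size of the base minus this value, which is the quantity a churn forecast would compare against actuals.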

Using Survival for Customer-Centric Forecasting
[Chart: number (0 to 130) versus day of month (1 to 29), Actual and Predicted]

The Forecasting Solution is a Bit Complicated
[Flow diagram. Boxes: Existing Customer Base; New Start Forecast; Existing Customer Base Forecast; Do Existing Base Forecast (EBF); Do Existing Base Churn Forecast (EBCF); Do New Start Forecast (NSF); Do New Start Churn Forecast (NSCF); New Start Forecast; Churn Forecast; Churn Actuals; Compare]

Survival Data Mining Connects Data to Business Needs
Provides ways to quantitatively measure what business users know or should know qualitatively
Connects data to business practices
Techniques such as survival analysis provide new ways of looking at customers
Techniques such as customer-centric forecasting integrate data mining with business processes such as forecasting