Presenter Daymond Ling, Professor, Seneca College



2 Presenter Daymond Ling, Professor, Seneca College
Daymond Ling is a Professor at Seneca College of Applied Arts and Technology, where he has been teaching advanced analytics and machine learning in the School of Marketing since Prior to teaching, he was Senior Director, Advanced Analytics at Canadian Imperial Bank of Commerce (CIBC), where he focused on solving all manner of marketing analytics problems related to customer relationship management since He worked for American Express Canada in Risk Management before CIBC. Daymond received his M.Sc. degree in Operations Research and B.Sc. in Physics Honours from the University of British Columbia. He started using SAS in 1980 and has continued ever since.

3 Timing is Everything: Detecting Important Behaviour Triggers

4 Predictive Analytics
If the future looks like the past, then past patterns can be used to predict the future.
[Timeline: Past, Now, Future. The Historical Window ends at Now; the Prediction Window follows it.]
Historical Window: historical information up to the current period is used to find patterns that strongly correlate with the target definition.
Prediction Window: the target definition is the desired outcome within some timeframe.
But life is dynamic. What if the process is non-stationary and has changed?

5 Think Outside The Box
Compare different people at the same time:
1. Who is likely to buy?
2. Who has more money?
Follow the same person in time:
1. Who changed?
2. What was the change?
Change your perspective. Ask different questions to get new insights.

6 Change Point Analysis: Shift in Mean

7 Change Point Analysis
Change Point Analysis is the problem of estimating the point at which some statistical property changes. This presentation focuses on change in the mean, i.e., the average has shifted.
Economy (hundreds): Gross Domestic Product, Stock Market Index
Corporate Performance Metrics (hundreds): Number of clients, Portfolio balance
Customer Details (tens of millions): EFT payroll of a chequing account, your monthly credit card spend

8 Did the Mean Shift?
Naïve rule: flag a shift when mean(period 2) differs from mean(period 1) by 20%.
The graphs on this slide meet the 20+% change criterion, but they are false positives. The naïve rule also generates false negatives.
Statistical tests of mean difference, e.g., the two-sample t-test, use the ratio of the mean difference to the standard deviation, not the mean difference alone.
The issue with the naïve rule is that it does not take data variability into account. The decision rule must be modified to take variability into consideration.
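To make the contrast concrete, here is a minimal Python sketch (not from the presentation, which was built in SAS) comparing the naïve 20% rule with a Welch two-sample t statistic. The function names and the sample data are hypothetical, chosen only to show one false positive and one false negative of the naïve rule.

```python
from math import sqrt
from statistics import mean, stdev


def naive_shift(period1, period2, threshold=0.20):
    """Naive rule: flag a shift when the relative change in means reaches `threshold`."""
    m1, m2 = mean(period1), mean(period2)
    return abs(m2 - m1) / abs(m1) >= threshold


def welch_t(period1, period2):
    """Welch two-sample t statistic: mean difference divided by its standard error."""
    standard_error = sqrt(
        stdev(period1) ** 2 / len(period1) + stdev(period2) ** 2 / len(period2)
    )
    return (mean(period2) - mean(period1)) / standard_error


# Hypothetical data: a noisy series whose mean rises 25% (100 -> 125) ...
noisy_p1 = [100, 20, 180, 60, 140, 100]
noisy_p2 = [160, 40, 220, 80, 200, 50]
# ... and a steady series whose mean rises only 10% (100 -> 110).
steady_p1 = [100, 101, 99, 100, 102, 98]
steady_p2 = [110, 111, 109, 110, 112, 108]

for label, p1, p2 in [("noisy", noisy_p1, noisy_p2), ("steady", steady_p1, steady_p2)]:
    print(label, "| naive rule fires:", naive_shift(p1, p2), "| t =", round(welch_t(p1, p2), 2))
```

The naïve rule fires on the noisy series even though its t statistic is well below 1, and stays silent on the steady series even though its t statistic is above 12.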

9 Detect Mean Shift via CUSUM Range
CUSUM: the cumulative sum of deviations of the centered series.
If deviations are random, they tend to cancel out, resulting in a small CUSUM range.
Shifted sections have deviations of the same sign, so the CUSUM moves away from zero.
Decision rule: a large CUSUM range is indicative of a shifted mean.
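A minimal Python sketch of the CUSUM range follows. It assumes that "centered" means subtracting the series mean and that the range is the maximum minus the minimum of the CUSUM path (starting the path at zero); the slide does not spell out either detail, and the function names and data are illustrative.

```python
from itertools import accumulate
from statistics import mean


def cusum(series):
    """Cumulative sum of deviations from the series mean, starting at 0."""
    centre = mean(series)
    return [0.0] + list(accumulate(x - centre for x in series))


def cusum_range(series):
    """Spread of the CUSUM path: max minus min."""
    path = cusum(series)
    return max(path) - min(path)


# Hypothetical data: the second series repeats the first four values, then shifts up by 10.
flat = [10, 12, 9, 11, 10, 12, 9, 11]
shifted = [10, 12, 9, 11, 20, 22, 19, 21]

print(cusum_range(flat))     # 1.5
print(cusum_range(shifted))  # 20.0
```

In the flat series the deviations alternate in sign and cancel, so the path stays near zero. In the shifted series the first four deviations are all negative and the last four all positive, so the path runs down to -20 before returning to zero.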

10 How Large is Large?
Leverage the variability of the empirical data to determine what counts as large:
Calculate the empirical CUSUM range.
Calculate the CUSUM range distribution by randomly shuffling the data many times.
The p-value of the empirical CUSUM range is the proportion of the shuffled distribution >= the empirical value.
By using actual data variability to perform the significance test, false positives and false negatives can be minimized.
Natural data variation determines whether the empirical pattern is unusual.
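The shuffle test can be sketched in Python as below. This is an illustration under assumptions, not the presenter's SAS implementation: the CUSUM range is taken as max minus min of the mean-centered cumulative sum, and `n_shuffles`, `seed`, and the sample series are hypothetical choices.

```python
import random
from itertools import accumulate
from statistics import mean


def cusum_range(series):
    """Max minus min of the cumulative sum of deviations from the mean."""
    centre = mean(series)
    path = [0.0] + list(accumulate(x - centre for x in series))
    return max(path) - min(path)


def shuffle_p_value(series, n_shuffles=2000, seed=0):
    """Proportion of shuffled series whose CUSUM range is >= the empirical one."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    empirical = cusum_range(series)
    work = list(series)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(work)  # destroys time order, keeps the data's own variability
        if cusum_range(work) >= empirical:
            hits += 1
    return hits / n_shuffles


# Hypothetical data: 12 alternating values (no shift) vs 12 low then 12 high values.
flat = [10, 12] * 12
shifted = [10, 12, 9, 11] * 3 + [20, 22, 19, 21] * 3

print("flat p-value:", shuffle_p_value(flat))
print("shifted p-value:", shuffle_p_value(shifted))
```

The alternating series already has the smallest CUSUM range any ordering of its values can produce, so every shuffle matches or exceeds it and the p-value is 1. For the shifted series, a shuffle reaches the empirical range only if it keeps all the low values (or all the high values) together, which is very rare, so the p-value is close to zero.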

11 When Did It Happen?
Two estimators for the location of change:
1. Change occurred at Max Absolute CUSUM
Simple to compute
A little less precise
2. Change occurred at point of Minimum Variance
More complex calculation
More precise
Recursively split a time series into many sections.
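Both estimators can be sketched in a few lines of Python. The sketch assumes "Minimum Variance" means choosing the split that minimises the combined residual sum of squares of the two sections around their own means; function names and data are hypothetical, and the recursive splitting step is not shown.

```python
from itertools import accumulate
from statistics import mean


def change_at_max_abs_cusum(series):
    """Index of the first observation after the change, located at max |CUSUM|."""
    centre = mean(series)
    path = list(accumulate(x - centre for x in series))  # path[i] = CUSUM after item i
    # The final CUSUM value is always ~0, so it is excluded from the search.
    peak = max(range(len(path) - 1), key=lambda i: abs(path[i]))
    return peak + 1


def rss(section):
    """Residual sum of squares of a section around its own mean."""
    centre = mean(section)
    return sum((x - centre) ** 2 for x in section)


def change_at_min_rss(series):
    """Split index k that minimises rss(series[:k]) + rss(series[k:])."""
    return min(range(1, len(series)), key=lambda k: rss(series[:k]) + rss(series[k:]))


# Hypothetical data: the mean shifts up by 10 starting at index 4.
series = [10, 12, 9, 11, 20, 22, 19, 21]
print(change_at_max_abs_cusum(series))  # 4
print(change_at_min_rss(series))        # 4
```

The first estimator needs a single pass over the data, while the second evaluates every candidate split, which matches the slide's "simple to compute" versus "more complex calculation" trade-off.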

12 Reaction Speed to New Change
[Table: example series of twelve values (V1 to V12), each with its probability of mean shift (Prob)]
The first series has no change. Probability of mean shift = 0.15, insignificant.
When the series is shifted by one and a single large value is appended, the probability increases to 0.39, still insignificant. CPA is robust to a single spike.
Shifting the series by two and appending two large values, the probability increases to
CPA is signaling a heightened likelihood of mean shift; odds are 3:1.
Shifting by three and appending three large values raises the probability to
With four consecutive values, the probability is
For short time series, a section of three or more shifted values has P >=

13 Decision Logic
Statistical Significance: no false positives, no false negatives
Business Significance: magnitude of change interesting to business
Events of Interest: true events for investigation and intervention
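The decision logic combines the two tests, which can be written as a small Python predicate. The thresholds `alpha` and `min_relative_change` and the example pay amounts are hypothetical; the presentation does not state the cut-offs it used.

```python
def is_event_of_interest(p_value, mean_before, mean_after,
                         alpha=0.05, min_relative_change=0.10):
    """An event must be both statistically and business significant."""
    # Statistical significance: the shift is unlikely to be noise.
    statistically_significant = p_value <= alpha
    # Business significance: the shift is large enough to act on.
    relative_change = abs(mean_after - mean_before) / abs(mean_before)
    business_significant = relative_change >= min_relative_change
    return statistically_significant and business_significant


print(is_event_of_interest(0.01, 4000, 4600))  # True: significant and 15% larger
print(is_event_of_interest(0.01, 4000, 4040))  # False: real but only 1% larger
print(is_event_of_interest(0.30, 4000, 6000))  # False: large but not significant
```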

14 Pay Increase

15 Large Scale Computation
Numerically intensive computation:
1. Hundreds of millions of time series
2. Random shuffle significance test
3. Minimization of Residual Sum of Squares
End-to-end process built in SAS:
Handles varying-length time series
28 code modules including data prep
2,000 lines of code
In-memory processing for efficiency (in-memory array data structures)

16 EFT Payroll Increase
Process 155 Million payroll events.
Each month, detect 1,200 to 1,500 clients.
Can detect large pay spikes, e.g., lump sum payments.
Step | Records | CPU Time | Elapsed
1. Extract four years of EFT payroll | 155 Million records | 18 minutes | 25 minutes
2. Aggregation (customers with multiple acct) | 145 Million records | 2 minutes | 2 minutes
3. Eliminate low pay and closed accounts | 85 Million records | 5 minutes | 5 minutes
4. Pay frequency determination | 62 Million records | 2 minutes | 2 minutes
5. Remove irregular off cycle pay | 60 Million records | 5 minutes | 5 minutes
6. Kalman Filter smoothing of pay spikes | 60 Million records | 4 minutes | 4 minutes
7. Change Point Detection | 60 Million records | 7 hours | 7 hours
8. Event Selection | 17K Accounts | 1 minute | 1 minute

17 Payroll increase examples (clean)

18 Payroll increase examples (noisy)

19 More money in pocket means
In one year we identify 21K customers that grow funds by $300 Million and increase card spend by $50 Million.
1. Payroll increases occur more often for younger people, and they open more accounts.
2. Younger customers invest more, spend more, and borrow more (new car, bigger house).
3. Older people just save their additional income; they don't spend more or borrow more.
After Pay Increase (Asset, Lending, and Card Spend are per customer; Funds and the final Card Spend column are total increases in $Million):
Age | Customer | Account Increase | Asset | Lending | Card Spend | Funds ($Million) | Card Spend ($Million)
 | , | % | $5,900 | $10,800 | $3,200 | $130 | $
 | , | % | $5,500 | $5,400 | $2,400 | $62 | $
 | , | % | $9,600 | $3,500 | $3,100 | $66 | $
 | , | % | $16,100 | $600 | -$1,900 | $42 | -$5
All | 21, | % | $7,900 | $6,400 | $2,300 | $300 | $50

20 Credit Card Spend

21 Credit Card Spend Decrease
Scenario:
Portfolio of 4 million+ credit cards
Annual spend volume low to plan by approximately $700 million
Causes:
Reduced acquisition?
Increased attrition?
Slowdown in the economy?
Increased competition in Reward cards?

22 Change Point Analysis Process
Process 178 million records.
Each month, detect 600 clients with a significant PV decrease.
Step | Records | CPU Time | Elapsed Time
1. Extract three years of account PV | 178 Million | 6 minutes | 1 hour
2. Aggregate to customer | 126 Million | 15 minutes | 15 minutes
3. Eliminate low spend and closed accounts | 48 Million | 6 minutes | 6 minutes
4. Change Point Detection | 48 Million | 4 hours | 4 hours
5. Event selection | 8K customers | 1 minute | 1 minute

23 Losing $1 Million+ per customer in a year
Each customer lowered spend by $1 Million+ per year.

24 Losing $500K+ per year
Each customer lowered spend by $500K+ per year.

25 Spend Loss Identified
Annual spend of 4 million+ cards was low to plan by ~ $700 Million. CPA identified 8K clients with annual spend loss of ~ $545 Million.
Group | Customer | % Customer | Average Annual Spend Loss | Annual Spend Loss | Index
 | | % | $1 million+ | $109 million |
 | | % | $266K | $109 million |
 | K | 13% | $110K | $109 million |
 | K | 26% | $52K | $109 million |
 | K | 55% | $25K | $109 million | 0.4
Total | 8K | 100% | $68K | $545 million |

26 Interest Rate Sensitive Customers

27 Savings Account Bonus Rate Promotion
Scenario:
Frequent promotion of Bonus Interest Rate on new balance
Portfolio balance fluctuates with promotion on/off
Questions:
Who is interest rate sensitive?
How many are there?
How much do they swing the portfolio balance?

28 Analysis Process
1. Change Point Analysis to identify accounts with balance changes (multi-period)
2. Check correlation of balance change pattern with campaign periods
[Figure panels: $5 Million fluctuation; $500K fluctuation. Shading: grey = high interest promotion, white = regular low interest]
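Step 2 can be illustrated with a plain Pearson correlation between a monthly promotion indicator and an account's balance. This is only a sketch of the idea: the presentation does not say which correlation measure was used, and the indicator, the balances (in $K), and the function name are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# 1 = bonus-rate promotion running that month, 0 = regular rate (hypothetical).
promo_on = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
# Hypothetical monthly balances in $K for two accounts.
hot_money = [50, 55, 500, 520, 510, 60, 50, 45, 480, 505, 55, 50]
steady = [200, 205, 198, 202, 201, 199, 204, 200, 203, 197, 202, 200]

print("hot money account:", round(pearson(promo_on, hot_money), 3))
print("steady account:", round(pearson(promo_on, steady), 3))
```

An account whose balance rises and falls with the promotion shows a correlation near 1, while an account that ignores the promotion shows a weak one.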

29 Outcome
Portfolio consists of 400K+ accounts with ~ $20 Billion in balance. We found 9K customers that are responsible for ~ $4.6 Billion of Hot Money.
Hot Money | # of Customers | Total ($Million)
$1 Million+ | 815 | $2,167
$500K - $1 Million | 1,313 | $904
$250K - $500K | 2,381 | $838
$100K - $250K | 4,361 | $699
Total | 8,870 | $4,607
Instead of an interest rate issue, the conversation changed to how to look after wealthy customers that are looking for return from safe instruments.

30 Back to Predictive Analytics

31 Improve Predictive Analytics
1. Better Target Definition
Balance/spend increase/decrease model target definitions use the naïve rule, so the target set is not clean, resulting in a poor model.
Use CPA to define very clean targets: it won't mis-identify and won't miss any.
Align each individual customer's time window on the exact time of change. This cleanly delineates and improves the historical and prediction windows.
2. Better Input Features
Events are cleaner and clearer signals compared to the raw time series.
Use past event triggers as inputs to predictive models to improve predictability.
Build a customer event database.

32 Timing is Everything

33 1489 Timing is Everything Thank You
