
1 THE POWER OF MACHINE LEARNING IN SEGMENTING CRM DATABASES londonr: 27TH MARCH 2018

2 WHO AM I? Jeremy Horne, Head of Analytics, MC&C Media. My specialist subjects are:
- Reporting automation
- Data visualisation
- Customer analytics
- Machine learning

3 WHO ARE MC&C?

4 We are The Growth Agency: an independent media agency with a relentless focus on delivering continuous, profitable growth.

5 We're on our own growth trajectory:
- 60m billings
- 2x income in 3 years
- 36 clients
- 19 global partners
- 20% growth YoY
- 360 capabilities under one roof
- 2 IPA Effectiveness Awards
- 77 hand-picked talents

6 Capabilities under one roof: Media, Creativity, Business Modelling, Data & Tech

7 This is what my friends and family think I do

8 Our data strategy experience extends from property and publishing to healthcare and utilities

9 TODAY'S CHALLENGE: HOW DO WE IDENTIFY THE CUSTOMERS ON A CRM DATABASE WHO ARE MOST LIKELY TO MAKE A PURCHASE THIS MONTH?

10 START BY UNDERSTANDING THE DATABASE
Most of our customer analytics start by creating a Recency, Frequency, Value (RFV) cube. RFV analysis focuses on the customer, rather than business objectives, by defining rules and boundaries against three core parameters:
- when a customer last purchased (Recency)
- how regularly they make such a purchase (Frequency)
- how much they spend (Value)
[Figure: an RFV cube. The horizontal axis runs from least recent to most recent time since last action/purchase, the vertical axis from low to high action/purchase frequency, and each cell holds a customer count.]
The top right-hand corner (darkest green) indicates the customers that are most valuable (highly frequent, most recent); the bottom left-hand corner (darkest red) contains the lowest value customers (least frequent, least recent).
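To make the cube concrete, here is a minimal sketch of how one might be assembled in R with dplyr; the transactions data frame and its columns (customer_id, date, amount) are assumptions for illustration, not the agency's actual schema:

```r
library(dplyr)

# Assumed input: one row per transaction, with customer_id, date and amount
rfv <- transactions %>%
  group_by(customer_id) %>%
  summarise(
    recency   = as.numeric(Sys.Date() - max(date)),  # days since last purchase
    frequency = n(),                                 # number of purchases
    value     = sum(amount)                          # total spend
  ) %>%
  mutate(
    r_band = ntile(-recency, 5),   # 5 = most recent
    f_band = ntile(frequency, 5),  # 5 = most frequent
    v_band = ntile(value, 5)       # 5 = highest spend
  )

# Customer counts on the recency x frequency face of the cube
with(rfv, table(r_band, f_band))
```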

11 MOST DATABASES ARE DOMINATED BY LOWER VALUE SEGMENTS
RFV analysis by percentage of customers (columns show recency of the most recent donation):

Frequency of donation       | >24 months ago | 13-24 months ago | 7-12 months ago | 4-6 months ago | Within past 3 months | TOTAL
Four or more times per year | 0.00%          | 0.01%            | 0.10%           | 0.19%          | 0.60%                | 0.89%
Three times per year        | 0.01%          | 0.19%            | 0.88%           | 1.30%          | 2.19%                | 4.57%
Twice per year              | 0.13%          | 1.37%            | 2.84%           | 2.60%          | 3.02%                | 9.96%
Once per year               | 2.80%          | 8.01%            | 6.50%           | 4.48%          | 4.15%                | 25.94%
Less than once per year     | 20.21%         | 17.97%           | 8.96%           | 6.46%          | 5.04%                | 58.63%

The table represents a low recency, low frequency database: nearly 60% donate less than once per year, and only around 15% donate twice a year or more. The challenge is to shift customers in these lowest value blocks towards the higher value blocks in the top right-hand corner of the table.

12 CONVENTIONAL TECHNIQUES DO NOT ALWAYS DISCRIMINATE BETWEEN SEGMENTS
[The RFV table from the previous slide is repeated here.]
We compared the three highest and lowest RFV segments, but could not find any differences in any of their geodemographic characteristics (e.g. through MOSAIC or ACORN), as shown by the example below.

Breakdown of household types in high and low value segments:

Household type                    | High value segments | Low value segments
Older Families & Mature Couples   | 32.9%               | 32.6%
Elders In Retirement              | 26.9%               | 27.1%
Families With School Age Children | 24.8%               | 23.3%
Young Couples With Children       | 12.8%               | 13.6%
Pre-Family Couples & Singles      | 1.8%                | 2.1%
Unknown                           | 0.7%                | 1.3%

13 MACHINE LEARNING
Methods that give computers the ability to learn and improve without being explicitly programmed.
- Regression: independent variables with continuous output. Example: what price will customer X pay for product Y?
- Classification: outcomes with a discrete set of responses. Example: which of our products is customer X most likely to buy next?
- Novelty detection: predict a deviation from a pre-defined norm. Example: will customer X actually make a purchase next month?
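As a rough illustration of how the three problem types differ in R (the data frames, columns and formulas here are invented for the example):

```r
library(kernlab)

# Regression: continuous output, e.g. the price a customer will pay
reg_fit <- ksvm(price ~ age + spend_to_date, data = sales, type = "eps-svr")

# Classification: a discrete set of responses, e.g. the next product bought
cls_fit <- ksvm(next_product ~ age + spend_to_date, data = sales, type = "C-svc")

# Novelty detection: learn the "norm" from normal rows only, then flag deviations
nov_fit <- ksvm(as.matrix(normal_rows[, c("age", "spend_to_date")]),
                type = "one-svc", nu = 0.1)
```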

14 A SIMPLE EXAMPLE OF NOVELTY DETECTION
[Chart: a selection of candidates from the lowest value cohort of a CRM database, plotted against two of their attributes (Attribute 1 vs Attribute 2), with a handful of points highlighted in blue.]
The norm for a low value cohort is that a customer does not make a purchase. But there is something special about those in blue, something that only a machine can detect. These algorithms can be run against conventional selection methods to boost campaign effectiveness.
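A minimal sketch of how such a detection might be run, using kernlab's one-class SVM as one plausible implementation (the low_value data frame and attribute column names are assumptions):

```r
library(kernlab)

# Two attributes of the low value cohort, as in the chart above
x <- as.matrix(low_value[, c("attribute1", "attribute2")])

# A one-class SVM learns the "normal" region; nu sets the expected outlier share
fit <- ksvm(x, type = "one-svc", kernel = "rbfdot", nu = 0.05)

# predict() returns TRUE inside the normal region, so FALSE rows are
# the "blue" deviants worth targeting
low_value$novel <- !as.vector(predict(fit, x))
```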

15 WHAT DATA WOULD A TYPICAL MODEL CONTAIN?
The data is the second most crucial part of a good machine learning model; incomplete datasets will often produce bad models. Specifically:
- Put in all of the variables you think are important; the algorithm will ignore those that aren't statistically significant anyway.
- Start with as much historical data as possible (at least three years if available) and use the machine to test for the optimal amount of history.
- Gaps in the data are fine and best left as gaps; the machine can handle them and will perform better than if you create proxy values.
In terms of variables, the following are crucial to producing valuable, predictive segmentations (see the sketch below): Age, Gender, Location, Membership type, Number of previous purchases, Date of last purchase, Amount spent on previous purchases, Products purchased, Length of time on database, Geodemographic coding.
It is the geodemographic profile that is most important in these problems: a postcode alone just isn't enough to identify meaningful patterns; geodemographics are vital to further differentiate between cohorts.
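For illustration, assembling such a modelling table might look like this; the crm data frame and its column names are assumptions:

```r
library(dplyr)

model_data <- crm %>%
  select(customer_id, age, gender, location, membership_type,
         n_purchases, last_purchase_date, total_spend,
         products_purchased, tenure_days, geodemographic_code)
# Per the advice above, missing values are left as NA rather than imputed
```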

16 PREPARING THE DATASET
We receive data in a variety of formats, and with all our modelling taking place in R, we are able to take advantage of a wealth of data reading and wrangling packages, including:
- data.table
- dplyr
- lettercase
- reshape2
- stringr
- XLConnect
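A hedged example of the kind of ingest and clean-up these packages enable (file and column names are invented):

```r
library(data.table)  # fast delimited reads
library(XLConnect)   # Excel workbooks
library(stringr)     # string clean-up

csv_data  <- fread("client_export.csv")
xlsx_data <- readWorksheetFromFile("client_export.xlsx", sheet = 1)

# Example clean-up: normalise postcodes to upper case without spaces
csv_data[, postcode := str_to_upper(str_replace_all(postcode, " ", ""))]
```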

17 HOW DO WE APPLY THE MODEL IN R?
We use the kernlab package for kernel-based modelling:
1. Split the data into training and testing sets
2. Define and train a model based on a set of variables and parameters
3. Use the model to make predictions
4. Continually refine the model
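A minimal sketch of that workflow, assuming a model_data frame with a binary purchase factor as the outcome (the exact kernel and parameters would come out of the refinement loop):

```r
library(kernlab)

set.seed(42)

# 1. Split into training and testing sets (80/20)
idx   <- sample(nrow(model_data), 0.8 * nrow(model_data))
train <- model_data[idx, ]
test  <- model_data[-idx, ]

# 2. Define and train a kernel model; prob.model = TRUE enables
#    probability outputs at prediction time
fit <- ksvm(purchase ~ ., data = train,
            kernel = "rbfdot", C = 1, prob.model = TRUE)

# 3. Use the model to make predictions on the held-out set
probs <- predict(fit, test, type = "probabilities")

# 4. Refinement repeats steps 2-3 with different kernels and parameters
```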

18 BOOSTING THE MODEL
Most models get swamped by the volume of "normal" outcomes (i.e. a purchase not being made), and this makes it difficult for the algorithm to predict where the deviations might occur. We use a technique known as boosting to reduce this proportion of normal outcomes and improve predictive quality.
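The deck does not show the mechanics; one plausible reading is that the share of non-purchasers is cut down by sampling before training, sketched here under that assumption:

```r
# Assumption: "yes"/"no" purchase labels, with "no" heavily dominant
buyers     <- subset(train, purchase == "yes")
non_buyers <- subset(train, purchase == "no")

# Keep, say, five non-buyers per buyer instead of the raw imbalance,
# so deviations are easier for the model to learn
keep          <- non_buyers[sample(nrow(non_buyers), 5 * nrow(buyers)), ]
boosted_train <- rbind(buyers, keep)
```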

19 THIS MEANS WE NEED TO RUN THROUGH MANY ITERATIONS OF THE MODEL
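For example, the iterations might sweep kernel parameters and rebalancing ratios, keeping whichever combination scores best on the held-out set; a sketch, with make_boosted_train standing in as a hypothetical helper wrapping the rebalancing step above:

```r
results <- list()
for (sigma in c(0.01, 0.05, 0.1)) {
  for (ratio in c(3, 5, 10)) {
    bt  <- make_boosted_train(train, ratio)  # hypothetical helper
    fit <- ksvm(purchase ~ ., data = bt, kernel = "rbfdot",
                kpar = list(sigma = sigma), prob.model = TRUE)
    p   <- predict(fit, test, type = "probabilities")
    # Held-out accuracy at a 0.5 threshold (the "yes" column is assumed)
    results[[paste(sigma, ratio)]] <-
      mean((p[, "yes"] > 0.5) == (test$purchase == "yes"))
  }
}
```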

20 WHAT DOES OUR OUTPUT LOOK LIKE?
Using the predict function within the kernlab package, and by including the type = "probabilities" option, we get a two-column output:
- a probability of a purchase being made
- a probability of a purchase not being made
Each row corresponds to a row of our testing set (in the same order), so we can retrospectively add the customer IDs back into the data. We can then use the probabilities to segment the customers into four blocks:
- Very high (probability of 0.6 or higher)
- High (probability between 0.5 and 0.6)
- Medium (probability between 0.4 and 0.5)
- Low (probability below 0.4)
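Mapping the probabilities onto those four blocks is then a one-liner with cut(); the "yes" column name for the purchase class is an assumption:

```r
purchase_prob <- probs[, "yes"]  # assumed label of the purchase class

test$segment <- cut(purchase_prob,
                    breaks = c(-Inf, 0.4, 0.5, 0.6, Inf),
                    labels = c("Low", "Medium", "High", "Very high"))

# Re-attach customer IDs: rows come back in testing-set order
output <- data.frame(customer_id   = test$customer_id,
                     prob_purchase = purchase_prob,
                     segment       = test$segment)
```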

21 HOW ACCURATE ARE OUR RESULTS?
[Chart: example results comparing machine learning accuracy rates (% correct predictions) against the actual purchase rate for the Very high, High, Medium and Low segments; the vertical axis runs from 0% to 45%.]
In this case, around 5% of the customers tested actually made a purchase within the given period. Machine learning was deployed to identify new opportunities within the database, and four segments were developed, where:
- the very high potential segment went on to deliver a 40% purchase rate
- the high segment a 20% purchase rate
- the medium segment a 15% purchase rate
All of these were significantly in excess of the purchase rate for the entire group. The low value segment delivered a 3% purchase rate, significantly lower than that of the group.
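Checks like the ones quoted reduce to an actual purchase rate per predicted segment, compared with the overall rate, for example:

```r
# Actual purchase rate within each predicted segment
tapply(test$purchase == "yes", test$segment, mean)

# Overall purchase rate (around 5% in the example quoted)
mean(test$purchase == "yes")
```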

22 COMMERCIAL MEANING
Our approach has presented some significant commercial advantages for our clients:

                                        | Client A | Client B
Database size                           | 230,…    | …,000
Low value block size                    | 90,000   | 76,000
Average customer spend per purchase     | …        | …
Potential revenue from 10% of customers | 5m-6m    | 220K

23 AND WE'RE NOT JUST LIMITED TO MACHINE LEARNING IN R
As an agency, most of our analytics and data science projects involve R at some point in their lifecycle. Other recent use cases of R:
- Marketing Mix Modelling
- Digital attribution
- Reporting automation through API connections
- Spot matching
- Google Analytics deep dive
- Facebook spend tracking via daily web app

24 QUESTIONS?