Business Data Analytics

1 MTAT Business Data Analytics Lecture 4

2 Marketing and Sales Customer Lifecycle Management: Regression Problems

3 Customer lifecycle

4 Customer lifecycle

5 Moving companies grow not because they force people to move more often, but because they attract new customers

6 Customer lifecycle: the lifecycle of the customer relationship (image: Casey Conroy)

7 Relationships based on commitment: event-based vs. subscription-based

8 Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Third Edition

9 Customer lifecycle

10 Customer lifecycle

13 Customer lifetime value (CLV): an estimate of the value of acquiring and keeping any given customer. It describes the amount of revenue or profit a customer generates over his or her entire lifetime. We attempt to minimize the customer acquisition cost (CAC).
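
(Worked example with hypothetical numbers, not from the slides: if a customer generates EUR 20 of profit per month and the average relationship lasts 24 months, CLV ≈ 20 × 24 = EUR 480, so acquiring that customer pays off as long as CAC stays below EUR 480.)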

14 Two forms of lifetime value analysis are often referred to: Historical lifetime value simply sums up revenue or profit per customer. Predictive lifetime value projects what new customers will spend over their entire lifetime.
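
A minimal R sketch of historical lifetime value, assuming a hypothetical transactions table with columns customer_id and amount (both names invented for illustration):

library(dplyr)

# hypothetical transaction log: one row per purchase
transactions <- data.frame(
  customer_id = c(1, 1, 2, 2, 2, 3),
  amount      = c(50, 30, 20, 10, 40, 70)
)

# historical LTV: simply sum revenue per customer
historical_ltv <- transactions %>%
  group_by(customer_id) %>%
  summarise(ltv = sum(amount))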

15 [Figure: timeline from current RFM to future RFM, bridged by predictive models]

16 Predicting purchase journeys? [Figure: purchase journeys over time]

17 Algorithms predict purchase frequency, average order value, and propensity to churn to create an estimate of the value of the customer to the business. Predictive LTV is extremely useful for evaluating acquisition channel performance, using modeling to target high-value customers, and identifying and cultivating VIP customers early in their brand journey. (custora.com)
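
One common back-of-the-envelope decomposition (an illustration, not Custora's actual model): expected LTV ≈ purchase frequency × average order value × expected lifetime, where a per-period churn propensity p implies an expected lifetime of roughly 1/p periods.

pred_freq  <- 1.5    # predicted purchases per month (hypothetical)
pred_aov   <- 40     # predicted average order value, EUR (hypothetical)
pred_churn <- 0.05   # predicted monthly churn probability (hypothetical)

exp_lifetime_months <- 1 / pred_churn                    # ~20 expected months as a customer
pred_ltv <- pred_freq * pred_aov * exp_lifetime_months   # EUR 1200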

18 Unsupervised learning Supervised learning

19 Supervised vs. Unsupervised Learning. The goal of the supervised approach is to learn a function that maps input x to output y, given a labeled set of pairs $D = \{(x_i, y_i)\}_{i=1}^{N}$. The goal of the unsupervised approach is to learn interesting patterns given only inputs $D = \{x_i\}_{i=1}^{N}$.
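
The distinction in two lines of R, using the built-in iris data (my example, not from the slides): lm() needs a labeled output, while kmeans() sees only the inputs.

# supervised: learn a mapping from an input to a labeled output
sup <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# unsupervised: find structure in the inputs alone, no labels involved
unsup <- kmeans(iris[, c("Sepal.Length", "Sepal.Width")], centers = 3)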

20 Regression vs. Classification

21 Acquisition vs. Retention [Figure: customers labeled "churn" or "regular" alongside RFM features RFM_1, RFM_3, RFM_6]

22 Sleeping habits [Figure: exam performance after 4 hours of sleep vs. 8 hours of sleep]

23 Linear regression

24 Linear regression

26 Simple linear regression [scatter plot of y vs. x] Task: given a list of observations $(x_i, y_i)$, find a line that approximates the correspondence in the data

27 Simple linear regression: $y = \beta_0 + \beta_1 x + \varepsilon$, where y is the output (dependent variable, response) and x is the input (independent variable, feature, explanatory variable, etc.)

28 Simple linear regression: in $y = \beta_0 + \beta_1 x + \varepsilon$, the intercept $\beta_0$ (bias) is the mean of y when x = 0; the coefficient $\beta_1$ (slope, or weight w) shows how much the output increases if the input increases by one unit; the noise $\varepsilon$ (error term, residual) shows what we are not able to predict with x.

29 Simple linear regression

30 Simple linear regression. We search for a function $\hat{y} = \beta_0 + \beta_1 x$ that minimizes the mean squared error (MSE): $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, which means we take the derivatives with respect to $\beta_0$ and $\beta_1$ and solve the system of equations $\partial MSE / \partial \beta_0 = 0$, $\partial MSE / \partial \beta_1 = 0$.
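
Solving that system gives the standard closed-form estimates (a textbook result, stated here for completeness):

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$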

31 Simple linear regression: example
Built-in R dataset: a collection of observations of the Old Faithful geyser in Yellowstone National Park, USA
> data(faithful)
> head(faithful)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
eruptions: the duration of the geyser eruptions (in mins); waiting: the length of the waiting period until the next eruption (in mins)
> dim(faithful)
[1] 272   2
> model <- lm(data=faithful, eruptions ~ waiting)
What do we want to model here? i.e. What is the input and what is the output?

32 Simple linear regression: example
> summary(model)
Call:
lm(formula = eruptions ~ waiting, data = faithful)
Residuals:
     Min       1Q   Median       3Q      Max
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.874016   0.160143  -11.70   <2e-16 ***
waiting      0.075628   0.002219   34.09   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4965 on 270 degrees of freedom
Multiple R-squared: 0.8115, Adjusted R-squared: 0.8108
F-statistic: 1162 on 1 and 270 DF, p-value: < 2.2e-16
The fitted model is: eruptions = -1.874 + 0.0756 × waiting

33 Simple linear regression: example in R
The fitted model is: eruptions = -1.874 + 0.0756 × waiting
What is the eruption time if waiting was 70?

34 Simple linear regression: example in R
The fitted model is: eruptions = -1.874 + 0.0756 × waiting
What is the eruption time if waiting was 70?
> -1.874016 + 0.075628 * 70
[1] 3.419944
The same, using the stored coefficients:
> coef(model)[[1]] + coef(model)[[2]]*70
Calculating predictions for a new set:
> test_set = data.frame(waiting=c(70,80,100))
> predict(model, newdata=test_set)
       1        2        3
3.419944 4.176224 5.688784
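
predict() for lm objects can also return uncertainty bounds around each prediction via its interval argument, e.g.:

> predict(model, newdata=test_set, interval="prediction")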

35 Machine learning secret sauce: split the Data into Train and Test sets [diagram: Data → Train / Test]

37 Simple linear regression: example in R
library(ggplot2)
# hold out 100 of the 272 observations as a test set
train_idx <- sample(nrow(faithful), 172)
train <- faithful[train_idx,]
test <- faithful[-train_idx,]
# fit on the training set only, then predict on the unseen test set
model <- lm(data=train, eruptions ~ waiting)
test$predictions <- predict(model, newdata=test)
MSE <- (1/nrow(test))*sum((test$eruptions - test$predictions)^2)
> MSE   # the value depends on the random split
# fitted line over the training points
ggplot(train, aes(x=waiting, y=eruptions)) + geom_point() + geom_smooth(method='lm') + theme_bw()
# predictions vs. actual values on the test set; the diagonal marks perfect predictions
ggplot(test, aes(x=eruptions, y=predictions)) + geom_point(color='red') + theme_bw() + geom_abline(intercept = 0, slope = 1)
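
Since sample() makes the split random, the MSE differs between runs; fixing the seed first makes the experiment reproducible (the seed value itself is arbitrary):

set.seed(42)   # arbitrary seed; makes the random train/test split reproducible
train_idx <- sample(nrow(faithful), 172)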

38 Multiple linear regression: all the same, but instead of one feature, x is a k-dimensional vector; the model is the linear combination of all features: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$, or via the matrix representation: $y = X\beta + \varepsilon$.
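
As a sketch of the matrix form, the normal equations $X^T X \beta = X^T y$ can be solved directly in R and compared against lm() (using the faithful data from the earlier slides):

X <- cbind(1, faithful$waiting)            # design matrix: intercept column + feature
y <- faithful$eruptions
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # solve the normal equations
beta_hat                                   # same values as coef(lm(eruptions ~ waiting, faithful))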

39 Assumptions (see the diagnostics sketch after this list)
- the relationship between x and y is linear
- y is distributed normally at each value of x
- no heteroscedasticity (variance systematically changing with x)
- independence and normality of errors
- lack of multicollinearity (non-correlated features)
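
A quick way to eyeball most of these assumptions in R (a sketch; vif() assumes the third-party 'car' package is installed):

par(mfrow = c(2, 2))
plot(model)   # residuals vs fitted (linearity, heteroscedasticity), Q-Q plot (normality), etc.

# multicollinearity only arises with several features, e.g. for model_1 on the next slide:
# library(car)
# vif(model_1)   # variance inflation factors; values well above 5-10 signal collinearity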

40 Multivariate linear regression
model_1 <- lm(data=train[,-1], AmountPerCust_2 ~ AmountPerCust_1 + TransPerCustomer_1 + AmountPerTr_1 + gender + age + discount_proposed + clicks_in_eshop)

41 Multivariate linear regression
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                                     < 2e-16 ***
AmountPerCust_1
TransPerCustomer_1
AmountPerTr_1
gender                                                  *
age                                                     ***
discount_proposed                               < 2e-16 ***
clicks_in_eshop                                 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 322 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 7 and 322 DF, p-value: < 2.2e-16
Interpret

42 NOTE
Linear Regression: a single dependent variable is predicted from a single independent variable.
Multiple Regression: a single dependent variable is predicted from several independent variables.
Multivariate Linear Regression: multiple correlated dependent variables are predicted, rather than a single scalar variable.
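
For completeness, R fits a true multivariate linear regression when lm() is given a matrix of responses via cbind(); a small self-contained sketch with simulated data (all names hypothetical):

set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y1 <- 1 + 2 * df$x1 + rnorm(100)   # two responses sharing the same inputs
df$y2 <- 3 - 1 * df$x2 + rnorm(100)

mvmodel <- lm(cbind(y1, y2) ~ x1 + x2, data = df)  # class "mlm"
coef(mvmodel)   # one column of coefficients per response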

43 Demo time!