Modelling changes in buyer purchasing behaviour


Submission for the Award of Doctor of Philosophy

Modelling changes in buyer purchasing behaviour

Giang Trinh
University of South Australia
Ehrenberg-Bass Institute for Marketing Science

Supervisors: Dr. Cam Rungie, Professor Malcolm Wright, Dr. Carl Driesener, Associate Professor John Dawes

January 2013

Abstract

The negative binomial distribution (NBD) has been widely used in marketing for modelling purchase frequency counts, particularly in packaged goods contexts. A key managerial use of this model is conditional trend analysis (CTA), a method of benchmarking future sales against past performance using the NBD conditional expectation. CTA allows brand managers to identify whether increased sales in a second period are accounted for by previous non-buyers, light buyers or heavy buyers of a brand. By comparing actual sales during the growth period with the sales predicted by the NBD conditional expectation under the stationary (no change) condition, the manager can identify the sources of growth. Although it is a useful tool, the conditional prediction of the NBD suffers from a bias: it under-predicts what the period-one non-buyer class will do in period two and over-predicts the sales contribution of existing buyers. This bias can be a serious problem, as it overstates the importance of attracting new buyers and understates the importance of retaining existing buyers. Further, the NBD's assumption of a gamma-distributed mean purchase rate lacks theoretical support: it is not possible to explain why a gamma distribution should hold. This thesis therefore proposes an alternative model that uses a lognormal distribution in place of the gamma distribution, creating a Poisson lognormal distribution (PLN) model for benchmarking future purchases. The PLN has a stronger theoretical grounding than the NBD model, as it has a natural interpretation that relies on the central limit theorem. Empirical analysis of brands in multiple categories shows that the PLN model gives better predictions than the NBD model.

The second important contribution of this thesis is the development of a new approach to diagnosing the source of a brand's sales increase or decline. Thus far, conditional trend analysis only identifies which groups of buyers (non-buyers, light buyers or heavy buyers) cause sales changes. This does not imply that every buyer in the group changes their purchasing behaviour. Conditional trend analysis does not identify the mechanism of change: how many buyers make a small change (e.g. one or two purchases more or fewer in the second period than in the first), how many make a large change, and how many make no change. Knowing the mechanism of change helps to identify the source of brand growth or decline, for example whether a sales increase this year compared with last year comes from a large group of buyers who increased their purchase propensity a little or from a small group of buyers who changed a lot. This is an important topic with obvious managerial applications. For example, if sales increases come mainly from a small group of buyers, then segmentation and targeting potentially play an important role in brand growth. On the other hand, if sales increases come from a large group of buyers, then increasing reach becomes crucial, and consequently mass marketing is a better strategy. This thesis proposes a method to identify the mechanism of change. The method is based on the distribution of changes in buyer purchasing behaviour over two sequential time periods. The distribution of changes describes the stochastic nature of changes in purchase frequency: some buyers purchase the brand one, two, three, ..., x units more, some purchase less, and others purchase the same amount from one time period to the next. The method thus distinguishes consumers who change their buying behaviour from those who do not, and those who make small changes from those who make large changes.
Importantly, the method creates theoretically expected benchmarks of the extent of changes in purchasing behaviour. Comparing the observed data with the predicted distribution of changes allows brand managers to determine whether a sales increase or decline is due to a small shift in the purchase propensity of a large group of buyers or to a big change in the purchase propensity of a small group of buyers. This thesis derives the distribution of changes for the negative binomial distribution and the Poisson lognormal distribution, and presents practical examples of the method.

Acknowledgements

I owe my gratitude to many people who have made this thesis possible. Professionally, I would like to thank my supervisors, Cam Rungie, Malcolm Wright, Carl Driesener and John Dawes, for their guidance and support during my candidature. It has been a deep learning experience for me, and I have benefited from their knowledge and research experience. I consider myself very fortunate to have had four wonderful supervisors who not only created a research environment that allowed me to discover my potential, but also pushed me beyond my limits. I would also like to thank Gerald Goodhardt, Byron Sharp and Simone Mueller for their invaluable comments and suggestions on early drafts of this thesis. Personally, I would like to thank my wife, Maria, and my parents for their love, understanding and encouragement. I would also like to thank my colleagues at the Ehrenberg-Bass Institute for their friendship and help throughout this research. I would like to acknowledge the University of South Australia for awarding me an Australian Postgraduate Award.

Declaration

I declare that this thesis presents work carried out by myself and does not incorporate without acknowledgement any material previously submitted for a degree or diploma in any university; and that, to the best of my knowledge, it does not contain any material previously published or written by another person except where due reference is made in the text.

Giang Trinh

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Topic of research
    Conditional predictions
    How changes are distributed amongst buyers
  1.2 Contributions of the thesis
    Predicting future purchases with the Poisson lognormal model
    Distribution of changes in buyer purchasing behaviour
    Managerial implications
  1.3 Thesis organisation
Chapter 2 Literature review
  2.1 Negative binomial model
  2.2 Conditional trend analysis
  2.3 Competing models
  2.4 Poisson lognormal model
Chapter 3 Mathematical expressions, parameter estimations and data
  3.1 Mathematical expressions
    Negative binomial distribution
    Poisson lognormal distribution
  3.2 Parameter estimations
  3.3 Data
Chapter 4 Method of prediction and measuring the accuracy of the PLN versus the NBD
  4.1 Method of prediction
  4.2 Measuring the accuracy of predictions
Chapter 5 Results
  5.1 Model fit
  5.2 Conditional prediction
  5.3 Purchases of the zero class
  5.4 Category buying
Chapter 6 Modelling how brands grow or decline
  6.1 Background
  6.2 Growing brands
  6.3 Declining brands
  6.4 Conclusion
Chapter 7 Distribution of changes in buyer purchasing behaviour
  7.1 Introduction
  7.2 Distribution of changes in buyer purchasing behaviour
    Distribution of changes with purchase rate distributed gamma
    Distribution of changes with purchase rate distributed lognormal
  7.3 Empirical analysis
  7.4 Examples of growing brands
  7.5 Conclusion
Chapter 8 Conclusion and future research
  8.1 Thesis summary
  8.2 Managerial implications
  8.3 Future research
    Combining the PLN with the Dirichlet model
    Bivariate PLN model
    Applying the PLN in customer base analysis and media planning

List of Tables

Table 1. The fit of the NBD to actual data
Table 2. Conditions under which the NBD has been found to hold
Table 3. An example of CTA analysis: a seasonal brand in two periods
Table 4. Summary of studies on conditional expectation
Table 5. PLN and NBD fit to twenty brands in four product categories
Table 6. PLN and NBD predictions of twenty brands in four product categories
Table 7. Ratios of difference between theoretical and actual zero-class purchases in period two
Table 8. PLN and NBD results for four product categories
Table 9. Predicted and actual distributions of changes: Timotei (shampoo brand)
Table 10. Predicted and actual distributions of changes: Baileys (spirits brand)

List of Figures

Figure 1. Typical bias in 52-week analysis: Dove (toilet soap)
Figure 2. Conditional predictions of the PLN and NBD models: Dove (toilet soap)
Figure 3. Contributions to sales increase, using NBD and PLN benchmarks: Timotei, shampoo
Figure 4. Contributions to sales increase, using NBD and PLN benchmarks: Baileys, spirits
Figure 5. Contributions to sales decline, using NBD and PLN benchmarks: Dove, shampoo
Figure 6. Contributions to sales decline, using NBD and PLN benchmarks: Sunsilk, shampoo
Figure 7. Distribution of changes: k = 1, a =
Figure 8. Distribution of changes: k = 1.8, a =
Figure 9. Distribution of changes: k = 0.25, a =
Figure 10. Distribution of changes: μ = 1, σ =
Figure 11. Distribution of changes: μ = 1.4, σ =
Figure 12. Distribution of changes: μ = 1.7, σ =
Figure 13. Distributions of changes: Colgate
Figure 14. Distributions of changes: Aquafresh
Figure 15. Distributions of changes: Sensodyne

Chapter 1 Introduction

1.1 Topic of research

Conditional predictions

Goodhardt and Ehrenberg (1967) introduced to marketing science a method of benchmarking future sales against past performance, which they termed conditional trend analysis (CTA). The method is based on the conditional expectation of the negative binomial distribution (NBD), applied to consumer behaviour in two sequential time periods under stationary conditions. The conditional expectation is the mean number of purchases in period two made by buyers who made x purchases in period one, $E[X_2 \mid X_1 = x]$. CTA is regarded as a very important construct with significant managerial implications (Morrison and Schmittlein, 1988). The conditional expectation is crucial for brand managers who wish to identify whether a change in the overall level of brand sales is accounted for by previous non-buyers, light buyers or heavy buyers (Schmittlein et al., 1985).

Suppose a brand manager has been told that sales have increased this year compared with last year. She might ask herself: where have the increased sales come from? Are there more new buyers, or have existing buyers purchased more? The answer will help her plan future marketing activity and, importantly, assess past marketing activity. For example, if she finds that sales growth comes mainly from new buyers, she would focus marketing activity on customer acquisition. On the other hand, if sales growth comes mainly from existing buyers, she would concentrate on loyalty marketing to retain current customers.

Conditional trend analysis can answer these questions. By comparing actual sales during the growth period with the sales predicted by the NBD conditional expectation under the stationary (no change) condition, the manager can identify the sources of growth. Although it is a very useful tool, the NBD conditional expectation is often biased. A typical bias is that it under-predicts what the period-one non-buyer class will do in period two and, correspondingly, over-predicts the sales contributions of existing buyers (Lenk et al., 1993; Morrison and Schmittlein, 1988; Morrison and Schmittlein, 1981). This bias affects sales fluctuation analysis in two ways. For growing brands, the observed contribution of non-buyers is greater, and that of existing buyers smaller, than expected. For declining brands, the observed contribution of non-buyers is smaller, and that of existing buyers greater, than expected. This bias can be a serious problem, as it overstates the importance of attracting new buyers and understates the importance of retaining existing buyers (Lenk et al., 1993).

In explaining the poor predictions of the NBD model, its Poisson assumption has been questioned most (e.g. Chatfield and Goodhardt, 1973; Schmittlein and Morrison, 1983; Schmittlein et al., 1987). However, empirical tests of the assumption that individuals' inter-purchase times follow an exponential distribution show that the Poisson assumption is robust (Chatfield and Goodhardt, 1973; Dunn et al., 1983).

Consequently, Chatfield and Goodhardt (1973, p. 834) concluded that "in those cases where the NBD does not give a good fit (essentially for large variances), it is therefore likely to be mainly due to a failure in the gamma assumption". Ehrenberg (1988, p. 63) also noted that "the gamma distribution is not a particularly precise assumption, it is not possible to adduce any strong reasons why a gamma distribution should hold". In light of this criticism, this thesis proposes the lognormal distribution as an alternative to the gamma distribution for the mean purchase rate. This gives the Poisson lognormal distribution (PLN) for modelling purchase frequency counts and predicting future purchases from past performance. The PLN model has not previously been fitted to purchase frequency counts. However, it has been shown to fit count data in general better than the negative binomial model (e.g. Connolly et al., 2009; Miranda-Moreno et al., 2005; Sohn, 1994; Tsionas, 2010; Winkelmann, 2008). Besides fitting empirical data well, the lognormal distribution has a natural theoretical interpretation (Cassie, 1962; Winkelmann, 2008). It is natural to assume that an individual consumer's average purchase rate is determined by the interaction of multiple unobserved factors, which can be either positive or negative, such as advertising, promotion and word of mouth. If many independent unobserved factors act multiplicatively on a given consumer's average purchase rate, the logarithm of that rate is a sum of many independent terms; by the central limit theorem this sum tends towards a normal distribution, so the rate itself tends towards a lognormal distribution (Aitchison and Brown, 1969; Johnson et al., 1994; Winkelmann, 2008).
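The central limit theorem argument can be illustrated with a short simulation. The sketch below is only an illustration under hypothetical assumptions: each consumer's purchase rate starts from a baseline of 1.0 and is scaled by 50 independent factors, each drawn uniformly from [0.8, 1.25] (values below 1 stand for negative influences, values above 1 for positive ones). The function names and all parameter values are made up for the example. The resulting rates come out strongly right-skewed while their logarithms are close to symmetric, which is what a lognormal distribution implies.

```python
import math
import random


def sample_skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_purchase_rates(n_consumers, n_factors, low=0.8, high=1.25, seed=1):
    """Hypothetical model: each consumer's rate is 1.0 multiplied by n_factors
    independent shocks drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_consumers):
        rate = 1.0
        for _ in range(n_factors):
            rate *= rng.uniform(low, high)
        rates.append(rate)
    return rates


rates = simulate_purchase_rates(n_consumers=20_000, n_factors=50)
log_rates = [math.log(r) for r in rates]
print(f"skewness of rates:     {sample_skewness(rates):.2f}")      # clearly positive
print(f"skewness of log rates: {sample_skewness(log_rates):.2f}")  # close to 0
```

Because the log of each simulated rate is a sum of independent terms, the near-symmetry of `log_rates` is the central limit theorem at work. The simulation does not show that real purchase rates are lognormal; that remains an empirical question, which the thesis addresses by fitting the model to data.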
In light of the empirical results and the theoretical advantages of the PLN model, it has been suggested that "the previous neglect of the Poisson lognormal model in the literature should be reconsidered in future applied work" (Winkelmann, 2008, p. 134).

Consequently, this thesis will validate the PLN model and compare it with the well-known NBD model for analysing consumer purchasing count data and predicting future purchases.

How changes are distributed amongst buyers

Marketers spend substantial sums of money trying to change consumer behaviour towards their brands, for example by increasing consumers' propensity to buy the brand. Much research has explained how different marketing stimuli (e.g. sales promotion, advertising, distribution breadth and product innovation) lead to such changes (Ataman et al., 2010; Ataman et al., 2008; Dekimpe and Hanssens, 1995; 1999; Dekimpe et al., 1999; Pauwels et al., 2004; Srinivasan et al., 2010; Sriram et al., 2007; Steenkamp et al., 2005). However, there has been little research on the form that such changes take: how are increases in sales distributed amongst a brand's buyers? From a theoretical standpoint, because previous research has not identified the form that sales changes take, it has not fully explained the relationship between marketing stimuli and changes in purchasing behaviour.

Several authors have proposed models for benchmarking changes in consumer purchasing behaviour. The two best-known models are the negative binomial distribution (NBD) (Ehrenberg, 1959; Goodhardt and Ehrenberg, 1967; Morrison and Schmittlein, 1988; Lenk et al., 1993) and the Pareto/NBD (Schmittlein et al., 1987; Schmittlein and Peterson, 1994; Fader et al., 2005; Fader et al., 2007; Abe, 2009; Jerath et al., 2011a; Jerath et al., 2011b). Both models have been used with considerable success in benchmarking changes in buyer behaviour. The NBD model is used for brands in the fast moving consumer goods (FMCG) context to identify whether an overall sales change is accounted for by previous non-buyers, light buyers or heavy buyers of a brand, whereas the Pareto/NBD model is used in organisational contexts to identify inactive customers (those who change from active to inactive purchasing).

However, a critical question in analysing changes in consumer behaviour has not been addressed: when there is a sales trend, what is the distribution of changes? That is, are the changes due to nudging or to radical conversion? This is an important question with obvious managerial implications. Suppose a brand manager would like to increase brand sales. Essentially, the two options are to direct marketing efforts towards making a small change in the purchase propensity of a large group of buyers, or towards making a big change in the purchase propensity of a small group of buyers. Knowing which option to take, and how the resulting sales increase will be manifested in the distribution of purchases, will inform the choice of marketing strategy. For example, if a sales increase comes from a small group of buyers, then segmentation and targeting could play an important role in brand growth. On the other hand, if a sales increase comes from a large group of buyers, then increasing reach is crucial and, consequently, mass marketing is a better approach.

However, brand managers typically cannot answer this question because of the stochastic nature of actual purchasing behaviour. For example, when comparing year two with year one, some buyers will have increased their purchasing of the brand by one unit, others by five units, and some will no longer buy it, even if overall brand sales are unchanged. To find out which group is causing a given sales increase or decline, a benchmark describing stationary behaviour is needed. Unfortunately, no method has been available to benchmark changes in buyer purchasing behaviour at this disaggregate level. This thesis therefore proposes a method that answers this question.
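The point that individual buyers change even when aggregate sales do not can be made concrete with a small simulation. The sketch assumes a stationary NBD world with hypothetical parameters (gamma shape k = 1 and scale a = 1): every household keeps exactly the same underlying purchase rate in both periods, and only the Poisson randomness of purchasing differs between them. The function names are illustrative, and the Poisson sampler uses Knuth's multiplication method because Python's `random` module has no Poisson generator.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_two_periods(n_households, k, a, seed=7):
    """Stationary NBD panel: each household keeps one gamma(shape k, scale a)
    purchase rate in both periods; purchases per period are Poisson."""
    rng = random.Random(seed)
    period1, period2 = [], []
    for _ in range(n_households):
        lam = rng.gammavariate(k, a)  # long-run rate, identical in both periods
        period1.append(poisson_draw(rng, lam))
        period2.append(poisson_draw(rng, lam))
    return period1, period2


p1, p2 = simulate_two_periods(n_households=100_000, k=1.0, a=1.0)
changes = [x2 - x1 for x1, x2 in zip(p1, p2)]
print("total purchases, period one vs two:", sum(p1), sum(p2))
print("share with no change:", sum(c == 0 for c in changes) / len(changes))
print("share changing by 3 or more:", sum(abs(c) >= 3 for c in changes) / len(changes))
```

Under these assumptions total purchases are almost identical in the two periods, yet only about 45% of simulated households (the exact stationary value is 1/√5 for k = 1, a = 1, counting households that buy in neither period) make the same number of purchases in both; the rest move up or down purely by chance. This is the stationary variation a benchmark has to account for before any change is attributed to marketing activity.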
The method is based on the distribution of changes generated by a compound Poisson distribution for consumer behaviour in two sequential time periods, where the compound Poisson distribution for the second period is assumed to have the same parameters as that for the first period (the stationary condition). The distribution of changes provides a benchmark for evaluating a sales increase or decline in the second period relative to the first. The deviation of the actual changes in buyer behaviour from this benchmark indicates where a sales increase or decline comes from.

1.2 Contributions of the thesis

Predicting future purchases with the Poisson lognormal model

This thesis proposes an alternative to the NBD model for benchmarking future purchases in the fast moving consumer goods context. The PLN model has a stronger theoretical grounding than the NBD model, based on the central limit theorem. Empirical analysis across multiple brands and categories shows that the PLN model fits observed purchase frequency data as well as the NBD model at the brand level and better at the category level. In predicting future purchases, the PLN model outperforms the NBD model at both the brand and the category level.

Distribution of changes in buyer purchasing behaviour

This thesis also develops a new method to benchmark changes in consumer purchasing behaviour. It derives the distribution of changes for both the NBD and the PLN models and demonstrates that both models predict changes very well.

The distribution of changes with the mean purchase rate distributed gamma is:

$$f(z)=\sum_{x=\max(0,\,-z)}^{\infty}\frac{\Gamma(2x+z+k)}{x!\,(x+z)!\,\Gamma(k)}\,\frac{a^{2x+z}}{(1+2a)^{2x+z+k}}$$

The distribution of changes with the mean purchase rate distributed lognormal is:

$$f(z)=\sum_{x=\max(0,\,-z)}^{\infty}\int_{0}^{\infty}\frac{\lambda^{2x+z-1}e^{-2\lambda}}{x!\,(x+z)!\,\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(\log\lambda-\mu)^{2}}{2\sigma^{2}}\right)d\lambda$$

In both expressions, z is the change in a buyer's number of purchases from period one to period two and x is the number of purchases in period one; the sum starts at x = 0 if z ≥ 0 and at x = −z if z < 0. The parameters k and a are the shape and scale of the gamma distribution, and μ and σ are the mean and standard deviation of log λ.

Each model has its own advantages and disadvantages. The NBD model has the advantage of a closed form but lacks theoretical support, whereas the PLN model is theoretically well grounded but has no closed form. This thesis also finds that, under stationary conditions, period-by-period changes in buyer purchasing behaviour are symmetrically distributed with a mean of 0 and concentrated around zero: many buyers make no change or a small change, and a few buyers make a big change. The thesis also finds that the distribution of changes for bigger brands has longer tails (i.e. more buyers making a bigger change) than that for smaller brands. This suggests that the purchasing of big-brand customers varies more than that of small-brand customers.

Managerial implications

In terms of managerial implications, this thesis provides a better method for brand managers who wish to quantify the relative contributions of new, light and heavy buyers. For example, if a brand manager has put more budget into in-store promotion this year and brand sales have increased, she can use the PLN conditional expectation benchmarks to identify whether this strategy attracted more new buyers, light buyers or heavy buyers of the brand.
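The two expressions for the distribution of changes f(z) can be evaluated numerically. The sketch below sums the gamma (NBD) series in log space and, for the lognormal case, approximates the integral over λ with the trapezoidal rule on the standardised scale t = (log λ − μ)/σ, where the lognormal density turns into a standard normal weight. The function names, the truncation points (`x_max`, `t_max`), the step size and the parameter values are assumptions chosen for illustration, not settings taken from the thesis; `x_max` in particular needs to grow for large a.

```python
import math


def nbd_change_pmf(z, k, a, x_max=500):
    """f(z) = P(X2 - X1 = z) when the rate is gamma (shape k, scale a).
    Terms are computed in log space; increase x_max for large a."""
    total = 0.0
    for x in range(max(0, -z), x_max + 1):
        n = 2 * x + z  # purchases over both periods by a buyer with x in period one
        log_term = (math.lgamma(n + k) - math.lgamma(x + 1) - math.lgamma(x + z + 1)
                    - math.lgamma(k) + n * math.log(a) - (n + k) * math.log(1 + 2 * a))
        total += math.exp(log_term)
    return total


def poisson_pmf(x, lam):
    """Poisson probability of x purchases at rate lam (lam > 0)."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def pln_change_distribution(mu, sigma, x_max=60, t_max=8.0, step=0.05):
    """Approximate f(z) for |z| <= x_max when log(rate) ~ Normal(mu, sigma).
    Returns a list f with f[z + x_max] = P(X2 - X1 = z); each period's count is
    truncated at x_max and the standard normal variable t at |t| <= t_max."""
    n_steps = int(round(2 * t_max / step))
    f = [0.0] * (2 * x_max + 1)
    for i in range(n_steps + 1):
        t = -t_max + i * step
        weight = step * math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
        if i in (0, n_steps):
            weight *= 0.5  # trapezoidal end-point weights
        lam = math.exp(mu + sigma * t)
        p = [poisson_pmf(x, lam) for x in range(x_max + 1)]
        for x1 in range(x_max + 1):
            wp1 = weight * p[x1]
            for x2 in range(x_max + 1):
                f[x2 - x1 + x_max] += wp1 * p[x2]
    return f


# Hypothetical parameter values, for illustration only.
X_MAX = 60
f_pln = pln_change_distribution(mu=0.0, sigma=0.5, x_max=X_MAX)
for z in range(-3, 4):
    print(z, round(nbd_change_pmf(z, k=1.0, a=1.0), 4), round(f_pln[z + X_MAX], 4))
```

Both outputs are symmetric around z = 0 and peak there, consistent with the stationary finding described above. As a check on the gamma series, for k = 1 and a = 1 the probability of no change reduces to a central binomial sum equal to 1/√5 ≈ 0.447.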

This thesis also provides a new tool, the distribution-of-changes benchmark, for brand managers to check whether a sales increase is mainly the result of a large group of buyers who slightly increase their purchase propensity or of a small group of buyers who increase their purchase propensity a lot. The thesis gives practical examples of how to use the new method to diagnose the sources of brand growth.

1.3 Thesis organisation

Chapter 1 - Introduction: outline the topic of research; summarise the contributions of the thesis.
Chapter 2 - Literature review: review the literature on predicting future purchases using stochastic models; review the theory of the PLN model; review the application of the PLN model.
Chapter 3 - Estimation method and data: detail the estimation method used in the thesis; specify the data used in the thesis.
Chapter 4 - Method of prediction: detail the prediction method used in the thesis.
Chapter 5 - Results: detail findings of fitting the NBD and PLN models; detail findings of using the NBD and PLN models to predict future purchases.
Chapter 6 - Modelling how brands grow and decline: provide examples of analysis of brand growth or decline; discuss the similarities and differences between the two models in analysing brand growth and decline.
Chapter 7 - Distribution of changes in buyer purchasing behaviour: derive the distribution of changes for the NBD and PLN models; fit the two models to empirical data; provide examples of how to use the distribution of changes to diagnose the sources of brand growth.
Chapter 8 - Conclusion and future research: summarise the thesis's contributions; suggest future research directions.

Chapter 2 Literature review

2.1 Negative Binomial Model

The modelling of purchase frequencies and the prediction of future purchases have been of interest to marketing scholars for more than half a century. The best-known model of purchase frequency is the negative binomial distribution (NBD) model. The NBD was initially used by Greenwood and Yule (1920) to model accidents and was first applied to marketing science by Ehrenberg (1959). Ehrenberg made two assumptions:

1. The purchases of a given consumer in successive time periods follow a Poisson distribution. This implies that the variation in an individual consumer's purchases over time is as if random (i.e. a Poisson process).
2. The long-run mean purchase rates of different consumers differ, and their distribution is a gamma distribution. In other words, the variation in mean purchase rates across consumers is described by a gamma distribution.

Under these assumptions, the numbers of consumers making 0, 1, 2, 3, ..., x purchases in a given time period follow the negative binomial distribution. Table 1 shows the earliest published example of the fit of the model to purchasing data. As the table shows, the NBD model fits the actual data very well: the theoretical values closely match the actual data.

Table 1. The fit of the NBD to actual data (columns: number of purchases; actual; theoretical)
Source: Ehrenberg (1959)

Since the original article by Ehrenberg (1959), the NBD model has been shown to work well in numerous situations, including different brands in different categories, different time periods and different countries (see Table 2). More recently, the NBD has been applied to other types of behaviour, such as gambling (Mizerski et al., 2004; Lam and Mizerski, 2009) and the consumption of mobile phone services (Lee et al., 2011).

Table 2. Conditions under which the NBD has been found to hold
For a variety of product-fields: Breakfast Cereals, Butter, Canned Vegetables, Cat and Dog Foods, Cocoa, Coffee, Confectionery, Convenience Foods, Cooking Fats, Detergents, Disinfectants, Flour, Food Drinks, Household Soaps, Household Cleaners, Instant Potatoes, Jams and Marmalade, Margarine, Motor Oil, Petrol, Polishes, Processed Cheese, Refrigerated Dough, Sausages, Shampoos, Soft Drinks, Soup, Take-home Beer, Toilet Paper, Toilet Soap
The leading brands in each product-field
Large, medium and small pack-sizes and the brand as a whole
Great Britain, Continental Europe, U.S.A.
Various demographic subgroups
Analysis periods ranging from 1 week to 12 months
Source: Ehrenberg (1988)
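Ehrenberg's two assumptions can be checked numerically. The sketch below computes NBD probabilities in the mean-and-exponent form (m and k, with a = m/k) and compares them with a direct numerical Poisson-gamma mixture over the purchase rate. The function names, parameter values and integration settings are assumptions for illustration; the integration range in particular would need widening for very small k or very large a.

```python
import math


def nbd_pmf(x_max, m, k):
    """NBD probabilities p(0), ..., p(x_max) for mean m and exponent k, using
    a = m / k, p(0) = (1 + a)^(-k) and p(x) = p(x-1) * (k + x - 1) / x * a / (1 + a)."""
    a = m / k
    probs = [(1 + a) ** (-k)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * (k + x - 1) / x * a / (1 + a))
    return probs


def poisson_gamma_mixture(x, k, a, s_min=-40.0, s_max=6.0, step=0.01):
    """P(X = x) when X | rate ~ Poisson(rate) and rate ~ Gamma(shape k, scale a),
    integrated with the trapezoidal rule over s = log(rate)."""
    n_steps = int(round((s_max - s_min) / step))
    total = 0.0
    for i in range(n_steps + 1):
        s = s_min + i * step
        lam = math.exp(s)
        log_poisson = -lam + x * s - math.lgamma(x + 1)
        log_gamma_density = (k - 1) * s - lam / a - math.lgamma(k) - k * math.log(a)
        weight = 0.5 * step if i in (0, n_steps) else step
        # the extra "+ s" is the Jacobian d(rate)/ds = rate
        total += weight * math.exp(log_poisson + log_gamma_density + s)
    return total


m, k = 2.5, 1.5  # hypothetical mean purchase rate and NBD exponent
probs = nbd_pmf(10, m, k)
for x in range(6):
    print(x, round(probs[x], 5), round(poisson_gamma_mixture(x, k, m / k), 5))
```

The two columns agree, which is the sense in which a Poisson process for each consumer combined with gamma-distributed mean rates yields the NBD. The recursion form is convenient because it avoids evaluating gamma functions directly; the resulting distribution has mean m and variance m(1 + m/k).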

Although the NBD model has been shown to fit observed data quite well in a wide range of conditions, prior literature has noted that it does not fit the tail of the distribution well, especially for purchase frequencies of heavily bought products (Ehrenberg, 1959; Chatfield et al., 1966). It also fits poorly when there are outliers, such as exceptionally heavy buyers (Chatfield et al., 1966; Ehrenberg, 1988).

2.2 Conditional Trend Analysis

One of the most useful properties of the negative binomial distribution is its conditional expectation. In modelling purchase frequency, this is the mean number of purchases in period two made by buyers who made x purchases in period one, $E[X_2 \mid X_1 = x]$. Based on the conditional expectation, Goodhardt and Ehrenberg (1967) introduced a method of benchmarking future sales changes against past performance, which they called conditional trend analysis (CTA). It is regarded as a very important method with significant managerial implications (Morrison and Schmittlein, 1988). CTA is crucial for brand managers who wish to identify whether a change in the overall level of brand sales is accounted for by previous non-buyers, light buyers or heavy buyers (Schmittlein et al., 1985).

The following example of this method is taken from Goodhardt and Ehrenberg (1967). Suppose that the shoppers of a brand are divided into five purchase frequency groups according to their purchases in the first period (0, 1, 2, 3 and 4+ purchases). This subdivision enables us to examine the buying behaviour of different groups of shoppers (e.g. bought 0 = non-buyers; bought 1 or 2 = light buyers; bought 3 or 4+ = heavy buyers), which is relevant to marketing practice. In this example the brand is seasonal: period one is a normal (off-peak) period and period two is a peak season with increased brand sales.
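For the NBD the conditional expectation has a closed form. If purchase rates are gamma distributed with exponent k and scale a = m/k, the gamma posterior for a buyer who made x purchases in period one has mean (k + x)a/(1 + a), and this is E[X2 | X1 = x] for a second period of the same length under stationarity. The sketch below uses this to compute a stationary benchmark for classes 0, 1, 2, 3 and 4+, as in the Goodhardt and Ehrenberg example. The function names and the inputs are hypothetical; they are not the fitted values behind Table 3.

```python
def nbd_conditional_expectation(x, k, a):
    """E[X2 | X1 = x] under stationarity: the mean of the gamma posterior
    (shape k + x, scale a / (1 + a)) after x purchases in period one."""
    return (k + x) * a / (1 + a)


def cta_benchmark(n_households, m, k, top_class=4):
    """Expected purchases in periods one and two for the classes 0, 1, ...,
    top_class - 1 (by period-one purchases) plus an open-ended top_class+ class,
    assuming a stationary NBD with mean m and exponent k."""
    a = m / k
    probs = [(1 + a) ** (-k)]
    for x in range(1, top_class):
        probs.append(probs[-1] * (k + x - 1) / x * a / (1 + a))
    period1 = [n_households * p * x for x, p in enumerate(probs)]
    period2 = [n_households * p * nbd_conditional_expectation(x, k, a)
               for x, p in enumerate(probs)]
    # Under stationarity both periods total n_households * m, so the top class
    # receives whatever the lower classes do not account for.
    period1.append(n_households * m - sum(period1))
    period2.append(n_households * m - sum(period2))
    return period1, period2


p1, p2 = cta_benchmark(n_households=1000, m=1.0, k=1.0)  # hypothetical inputs
print("period one:", p1)
print("period two:", p2)
```

A by-product of the closed form is that p(x)(k + x)a/(1 + a) = (x + 1)p(x + 1), so the expected period-two purchases of class x equal the expected period-one purchases of class x + 1. This is why, in the Goodhardt and Ehrenberg example, the NBD predictions for the 0, 1 and 2 classes (53, 48, 41) track the period-one purchases of the 1, 2 and 3 classes (53, 48, 42).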

Table 3. An example of CTA analysis: a seasonal brand in two periods (columns: number of purchases in period 1, i.e. 0, 1, 2, 3, 4+ and total; rows: total buyers; total purchases in period 1 (off-peak); total purchases in period 2 (peak); NBD predicted total purchases in period 2; deviations)
Source: Goodhardt and Ehrenberg (1967)

In the first period, these groups of buyers contribute 0, 53, 48, 42 and 200 purchases, respectively, a total of 343 purchases (row 2 of Table 3). Using the conditional expectation, the NBD predicts that under stationary conditions these groups of buyers would contribute 53, 48, 41, 35 and 166 purchases in the second period, respectively, so that total purchases in period two are the same as in period one (343 purchases). The NBD conditional expectation therefore provides a benchmark for evaluating any sales change. By comparing actual purchases in period two with the NBD predictions, we can determine which group of buyers is causing the increased sales. In the example there were 140 more purchases in period two than in period one (483 compared with 343). The deviations of actual period-two purchases from the NBD benchmarks show that the sales increase was caused mainly by previous non-buyers of the brand (132 purchases more than the NBD expected), whereas the heavy buyers (those who bought 3 or 4+ times) actually bought slightly less than expected and hence were not the source of the increased sales.

Conditional trend analysis is an important method with obvious managerial applications. It makes it possible to determine which marketing activities have disproportionate effects on non-buyers, light buyers and heavy buyers. Suppose a brand manager has recently run a promotion and sales have increased. Was the campaign effective? Does the sales increase come from previous non-buyers of the brand, or from previous heavy buyers who bought more to stock up? The first scenario is clearly good news, whereas the second may be bad news, as the sales increase has merely been borrowed from future sales (Morrison and Schmittlein, 1988). Because it can determine the source of sales gains or losses, conditional trend analysis is regarded as one of the most managerially useful constructs in the stochastic modelling of brand choice (Schmittlein et al., 1985).

Although it is a very useful tool, the NBD conditional expectation is often biased. A typical bias is that it under-predicts what the period-one non-buyer class will do in period two and, correspondingly, over-predicts the sales contributions of existing buyers (Lenk et al., 1993; Morrison and Schmittlein, 1988; Morrison and Schmittlein, 1981). Applying the NBD model to the panel data provided by Kantar (see Chapter 3, Section 3.3, for a full description of the data), we find the same bias, as shown in Figure 1.

Figure 1. Typical bias in 52-week analysis: Dove (toilet soap)

Consequently, this bias affects long-term sales fluctuation analysis in the following ways:

For growing brands, the observed contribution of non-buyers is greater, and that of existing buyers smaller, than expected.
For declining brands, the observed contribution of non-buyers is smaller, and that of existing buyers greater, than expected.

From a marketing perspective, this bias can be a serious problem, as it overstates the importance of attracting new buyers and understates the importance of retaining existing buyers (Lenk et al., 1993).

2.3 Competing models

Since Goodhardt and Ehrenberg (1967) introduced CTA, many marketing scholars have recognized its importance and studied it further. Table 4 summarises the studies on conditional expectation.

Table 4. Summary of studies on conditional expectation (for each study: rationale; contribution)

Morrison (1969). Rationale: The simple conditional expectation-NBD model is biased if the proportion of non-buyers is large, because there are hard-core non-buyers (people who never purchase the product) who do not fit the gamma distribution. Contribution: Generalised the simple NBD model to include hard-core non-buyers.

Ehrenberg (1970). Rationale: Hard-core non-buyers are unlikely to be a factor causing the model bias. Contribution: Showed two examples demonstrating that the estimation procedure proposed by Morrison (1969) would lead to misleading explanations of the discrepancies between actual data and model predictions (e.g. the model's estimate of the proportion of hard-core non-buyers is 30%, while the actual proportion is 1%).

26 Paull (1978) The simple conditional expectation - NBD model has no explicit provision for accommodating variations in size of purchase on each purchase occasion. Generalised the simple model by integrating it with a purchase quantity variable. Schmittlein and Morrison (1983) Schmittlein et al. (1985) Morrison and Schmittlein (1988) Schmittlein et al. (1987) Lenk et al. (1993) Fader et al. (2005) The condensed NBD is more appropriate for consumer purchasing behaviour than the simple NBD as it assumes interpurchase times had Erlang 2 distributions (Chatfield and Goodhardt, 1973). It allows the interpurchase times to be more regular than the exponential distribution implied by the Poisson purchasing behaviour. The beta binomial (BB)/NBD is shown to be reasonable for modelling purchasing for a particular brand Is it worth the effort to generalise the NBD model for customer purchases? The Poisson distribution only accounts for active customers. Death or drop out customers are not Poisson. They follow the Pareto distribution (Johnson and Kotz, 1970). Seasonal effects and marketing activity such as sales promotion can contribute to non-stationarity that biases the conditional expectation- NBD model for short term predictions. It is difficult to implement the Pareto/NBD model because of computational challenges associated with parameter estimation. Derived the conditional expectation of the condensed NBD and found that the NBD predictions are slightly superior than the condensed NBD predictions. Derived conditional expectation of the BB/NBD model and found that the discrepancy between NBD and BB/NBD models is not likely to be managerially significant. Reviewed the literature on NBD and conditional expectation-nbd and concluded that it is worth the effort to generalise the conditional expectation - NBD as it is a very important construct. But there are some unresolved issues in applying the conditional expectation - NBD model such as units bought vs. 
purchase incidence; fitting the NBD to a stable baseline period. Developed a method for identifying those customers who are still active. Derived the conditional expectation of the Pareto/NBD model. Extended the conditional expectation- NBD model to include non-stationary effects. Developed a new model, the betageometric/nbd (BG/NBD), which is easier to implement in Excel. Found that the BG/NBD and the Pareto/NBD models give very similar results. Yet, the Pareto/NBD give slightly better conditional expectations 16

27 Batislam et al. (2007) Abe (2009) A customer can drop out immediately after the first purchase Marketing has seen some shift from an aggregate to a disaggregate focus. Testing the independence assumption of purchase rate and dropout rate in the Pareto/NBD model than the BG/NBD Modified the BG/NBD model to the MBG/NBD model which allow customers drop out at time zero Found that the MBG/NBD model and the Pareto/NBD model give almost identical conditional expectations, and both models slightly over predict the conditional expectations of heavypurchase customers. Extended the Pareto/NBD model using hierarchical Bayesian (HB) framework to focus on customised marketing. Confirmed that the independence assumption holds (the correlations of purchase rate and dropout rate are close to zero) Found that the HB model gives better conditional expectations than the Pareto/NBD model for the retail data set, whereas the Pareto/NBD model predicts better for the CDNOW dataset The majority of the development in this subject has concerned modifications of the NBD model. Morrison (1969) extended the NBD model to include hard core non-buyers (buyers who never buy the product). Morrison s rationale was that hard core non-buyers might cause model bias as these buyers are not appropriate for the gamma distribution. In the generalized model, Morrison assumes that 1. A proportion of consumers are hard core non-buyers. 2. The remaining proportion of consumers are active and purchase according to the Poisson process with the mean purchase rate following the gamma distribution. Using data of consumption changes of foil wraps, the author estimated that hard core non-buyers are fairly constant in different time period. They account for 47% of the total shoppers in a six month period and 45% in a one year period. 17
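Morrison's two assumptions describe a mixture in which a share of consumers can only ever record zero purchases, while everyone else follows the NBD; today this is usually called a zero-inflated NBD. The sketch below writes that mixture out, assuming a hard-core share `pi` and the same gamma shape/scale parameterisation (k, a) used for the NBD in Chapter 3. The function names and parameter values are hypothetical illustrations, not Morrison's estimation procedure.

```python
import math

def nbd_pmf(x: int, k: float, a: float) -> float:
    """NBD probability of x purchases; gamma shape k, scale a (mean ka)."""
    log_p = (math.lgamma(x + k) - math.lgamma(x + 1) - math.lgamma(k)
             - k * math.log1p(a) + x * math.log(a / (1 + a)))
    return math.exp(log_p)

def zero_inflated_nbd_pmf(x: int, pi: float, k: float, a: float) -> float:
    """Hard-core non-buyers (share pi) contribute only zeros; the rest are NBD."""
    active = (1 - pi) * nbd_pmf(x, k, a)
    return pi + active if x == 0 else active

# Hypothetical parameter values, chosen only for illustration.
pi, k, a = 0.45, 0.8, 1.5
probs = [zero_inflated_nbd_pmf(x, pi, k, a) for x in range(200)]
print("total probability:", round(sum(probs), 6))
print("P(0) with hard core:", round(probs[0], 4), "without:", round(nbd_pmf(0, k, a), 4))
```

Because the hard-core group adds mass only at zero, the mean falls to (1 - pi)ka while the share of non-buyers rises, which is exactly the lever Morrison used to try to explain a large non-buyer class.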

In reply, Ehrenberg (1970) demonstrated that the estimation procedure proposed by Morrison (1969) leads to a misleading explanation of the discrepancies between actual data and model predictions (e.g. the model estimated the proportion of hard-core non-buyers at 30%, while the actual proportion was 1%). Ehrenberg (1970) therefore concluded that hard-core non-buyers are unlikely to be a factor causing the model bias.

Chatfield and Goodhardt (1973) also questioned the Poisson assumption of the model. They argued that if the Poisson assumption held for any time period, inter-purchase times would follow the exponential distribution, so the mode of inter-purchase times would be zero. In practice, however, a buyer is unlikely to purchase again immediately, let alone for this to be typical; rather, there is a dead period (e.g. a week or more) between one purchase and the next. The authors therefore proposed the Erlang distribution as an alternative model for inter-purchase times. This distribution had been suggested by Herniter (1971) for modelling inter-purchase times; however, he only considered a special case in which the mean purchase rate is exponentially distributed, which rarely happens in practice (Chatfield and Goodhardt, 1973). Based on the Erlang-2 assumption, Chatfield and Goodhardt (1973) derived the distribution of an individual consumer's purchases in a given period, which they called the condensed Poisson distribution. Mixing the condensed Poisson over a gamma distribution of purchase rates across the whole population gives the condensed NBD model. In response to Chatfield and Goodhardt (1973), Schmittlein and Morrison (1983) derived the conditional expectation of the condensed NBD model to predict future purchases. They applied the model to empirical data and found that the conditional expectation of the standard NBD is slightly superior to that of the condensed NBD.
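Chatfield and Goodhardt's point about regularity can be illustrated by simulation. The sketch below rests on assumptions of our own (a mean purchase rate of one per week, a 52-week window, 5,000 simulated consumers, an arbitrary seed): it counts purchases when inter-purchase times are exponential (Poisson purchasing) and when they are Erlang-2 with the same mean, i.e. the sum of two exponential stages at twice the rate. For simplicity each consumer's clock starts at a purchase rather than at a random point of the renewal process, which is an approximation.

```python
import random
import statistics

def count_purchases(rng, rate, horizon, erlang2):
    """Count one consumer's purchases in [0, horizon) given a mean purchase rate."""
    t, n = 0.0, 0
    while True:
        if erlang2:
            # Erlang-2 gap: two exponential stages at rate 2*rate, so the mean gap is still 1/rate.
            gap = rng.expovariate(2 * rate) + rng.expovariate(2 * rate)
        else:
            gap = rng.expovariate(rate)  # exponential gap, i.e. Poisson purchasing
        t += gap
        if t >= horizon:
            return n
        n += 1

rng = random.Random(1973)  # arbitrary seed
poisson_counts = [count_purchases(rng, 1.0, 52.0, erlang2=False) for _ in range(5_000)]
erlang_counts = [count_purchases(rng, 1.0, 52.0, erlang2=True) for _ in range(5_000)]
for label, counts in (("exponential gaps", poisson_counts), ("Erlang-2 gaps", erlang_counts)):
    print(label, "mean", round(statistics.mean(counts), 2),
          "variance", round(statistics.variance(counts), 2))
```

With exponential gaps the variance of the counts is close to their mean, as the Poisson model requires; with Erlang-2 gaps the mean is almost unchanged but the variance is roughly halved, which is what "more regular than Poisson" means for purchase counts.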

Schmittlein et al. (1985) proposed another approach to predicting future brand purchases, conditional on category purchases. They derived the conditional expectation of the beta binomial (BB)/NBD model. The beta binomial distribution has been shown to be reasonable for modelling purchasing of a particular brand when product category purchases follow the NBD. It is a special case of the Dirichlet/NBD model proposed by Goodhardt et al. (1984). The BB/NBD makes five assumptions:
1. Each consumer purchases the product category in a Poisson manner with a latent mean purchase rate.
2. The latent rate is gamma-distributed across the population of consumers.
3. Given a purchase, each consumer has a probability of buying the brand.
4. The probability of buying the brand is beta-distributed across the population of consumers.
5. The category mean purchase rate is independent of the brand probability.
The advantage of the BB/NBD model is that it allows the conditional expectation to be non-linear. However, the results suggest that the difference between the NBD and BB/NBD models is unlikely to be significant to managers (Schmittlein et al., 1985).

Another well-known generalisation of the original NBD model is the Pareto/NBD model. Schmittlein et al. (1987) argued that the Poisson distribution only accounts for active customers; customers who have died or dropped out do not purchase in a Poisson manner, and their lifetimes follow the Pareto distribution (Johnson and Kotz, 1970). The authors make five assumptions about the model:
1. While active, each customer makes purchases according to a Poisson process with a latent purchase rate.
2. The purchase rate is gamma-distributed across the population of customers.
3. Each customer remains active for a lifetime whose duration is exponentially distributed with a given death rate.
4. The customers' death rates are distributed according to a different gamma distribution across customers. As a consequence of (3) and (4), customer lifetimes follow the Pareto distribution.
5. The purchase rate and the death rate are distributed independently of each other.
The Pareto/NBD is highly regarded for customer base analysis in the marketing literature, and many researchers have recently extended it in different directions (e.g. Abe, 2009; Batislam et al., 2007; Bemmaor and Glady, 2011; Fader et al., 2005; Fader et al., 2007; Jerath et al., 2011a; Jerath et al., 2011b; Ma and Buschken, 2010; Reinartz and Kumar, 2003; Schmittlein and Peterson, 1994). For example, Fader et al. (2005) developed the beta-geometric/NBD (BG/NBD), which is easier to implement than the Pareto/NBD model. Batislam et al. (2007) modified the BG/NBD model into the MBG/NBD model, which allows customers to drop out at time zero (immediately after the first purchase). Abe (2009) extended the Pareto/NBD model using a hierarchical Bayesian (HB) framework to focus on customised marketing. Bemmaor and Glady (2011) proposed replacing the Pareto distribution with a gamma mixture of Gompertz distributions (G/G), whose probability density function can be skewed to the right or to the left and whose mode can be at zero or away from zero. A non-zero mode might occur when the organisation offers strong differentiation and has a strong reputation, as with high-end hotels and up-scale catalogue retailers (Bemmaor and Glady, 2011).

The Pareto/NBD model is intended for organisations that have information on initial purchases and on former customers who are no longer active. Examples include catalogue mailing lists, church directories, dentists' and beauty salons' files, department store charge card records, and triers of a new grocery product (Schmittlein et al., 1987). At the brand level, the Pareto/NBD has some potential for monitoring sales of a newly introduced brand, but it is not recommended for established brands in markets such as FMCG (Morrison and Schmittlein, 1988). There are several reasons for this. First, the Pareto/NBD assumes that the market is non-stationary (it allows consumers to drop out), or at least a very special case of non-stationarity, as opposed to the mature-brand situation (Morrison and Schmittlein, 1988). The dropout assumption seems reasonable for new brands, since consumers might try a new brand and some of those who try it will never buy it again. Second, for a mature brand, it is difficult to identify when a consumer made the initial purchase of the brand; for example, it is difficult for Colgate to identify when a given consumer first bought it. It is also difficult to identify whether a given consumer is permanently inactive unless the consumer has literally died. A consumer might not have bought Colgate for months or years, but there is still a probability that the consumer will buy Colgate again in the future.

In summary, in explaining the poor predictions of the NBD model, previous studies have most often questioned its Poisson assumption (e.g. Chatfield and Goodhardt, 1973; Schmittlein and Morrison, 1983; Schmittlein et al., 1987). However, the modified models based on relaxing the Poisson distribution have not shown a better fit than the NBD model for brands in the FMCG context, as discussed above.
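To make the five Pareto/NBD assumptions concrete, the sketch below simulates them. It follows the (r, alpha, s, beta) notation commonly used for this model, with alpha and beta as gamma rate parameters; all parameter values, the 52-week window and the seed are hypothetical choices for illustration. Because each customer's purchase stream is generated once and then counted with and without the lifetime cut-off, the sketch also shows directly why allowing dropout can only lower a customer's count relative to an always-active Poisson buyer.

```python
import random

def simulate_customer(rng, r, alpha, s, beta, horizon):
    """Return (purchases while alive, purchases had the customer never dropped out)."""
    lam = rng.gammavariate(r, 1 / alpha)    # assumptions 1-2: gamma-distributed purchase rate
    mu = rng.gammavariate(s, 1 / beta)      # assumptions 4-5: independent gamma death rate
    lifetime = rng.expovariate(mu)          # assumption 3: exponential lifetime
    alive_count = full_count = 0
    t = 0.0
    while True:
        t += rng.expovariate(lam)           # Poisson purchasing: exponential gaps
        if t >= horizon:
            return alive_count, full_count
        full_count += 1
        if t < lifetime:
            alive_count += 1

rng = random.Random(1987)  # arbitrary seed
results = [simulate_customer(rng, r=0.8, alpha=4.0, s=0.5, beta=20.0, horizon=52.0)
           for _ in range(5_000)]
alive = [a for a, _ in results]
full = [f for _, f in results]
print("mean purchases with dropout:", sum(alive) / len(alive))
print("mean purchases without dropout:", sum(full) / len(full))
```

Note that Python's `random.gammavariate(shape, scale)` takes a scale parameter, hence the `1 / alpha` and `1 / beta` conversions.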
In regard to the typical bias, if we assume that individual purchases are more regular than a Poisson process and use an Erlang-2 distribution (Chatfield and Goodhardt 1973; Schmittlein and Morrison 1983), this only makes the bias worse, as existing buyers' purchases in period two are even more over-predicted. A second possibility is to assume instead that there are permanently inactive consumers, whose lifetimes follow the Pareto distribution (Schmittlein et al. 1987), or that there are hard-core non-buyers (Morrison 1969). Unfortunately, this also results in the period-one non-buyer class's purchases in period two being even more under-predicted (Morrison and Schmittlein 1988). In addition, empirical tests of the assumption that individuals' inter-purchase times follow an exponential distribution show that the Poisson assumption is robust (Chatfield and Goodhardt 1973; Dunn et al. 1983). Consequently, Chatfield and Goodhardt (1973, p. 834) concluded: "In those cases where the NBD does not give a good fit (essentially for large variances), it is therefore likely to be mainly due to a failure in the gamma assumption." Ehrenberg (1988, p. 63) also noted that "the gamma distribution is not a particularly precise assumption, it is not possible to adduce any strong reasons why a gamma distribution should hold". Similarly, Brockett et al. (1996, p. 96) suggested that "when the NBD model fails, the most lucrative avenue for deriving a model that does fit involves relaxing some assumption other than the Poisson individual purchase assumption".

In light of this criticism, this thesis proposes to replace the gamma distribution in the NBD model with the lognormal distribution. In this modified model, individual consumers are assumed to make purchases according to a Poisson process, with a mean purchase rate that follows the lognormal distribution. Other distributions could also replace the gamma distribution, such as the Pareto, inverse Gaussian, Weibull and Lomax distributions. However, this thesis focuses on the lognormal distribution, as it has an attractive theoretical interpretation as well as some appeal in fitting empirical data. The next section reviews the theory and applications of the lognormal distribution in the literature and proposes a theoretical interpretation to justify the lognormal distribution of purchase rates.
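One practical consequence of swapping the gamma for the lognormal can be checked numerically: how much weight each distribution places on consumers whose mean purchase rate is far above average. The sketch below assumes a gamma with integer shape k = 2 (so its survival function has a closed Erlang form) and mean 1, matches a lognormal to the same mean and variance, and compares the probability that a consumer's rate exceeds several multiples of the mean. The parameter values are illustrative assumptions only.

```python
import math
from statistics import NormalDist

def gamma_survival_int_shape(t, k, a):
    """P(rate > t) for a gamma with integer shape k and scale a (Erlang survival function)."""
    z = t / a
    return math.exp(-z) * sum(z ** j / math.factorial(j) for j in range(k))

def matched_lognormal(mean, var):
    """Return (mu, sigma) of the lognormal with the given mean and variance."""
    sigma2 = math.log(1 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2, math.sqrt(sigma2)

k, a = 2, 0.5                                  # illustrative gamma: mean ka = 1, variance ka^2 = 0.5
mu, sigma = matched_lognormal(k * a, k * a ** 2)
log_rate = NormalDist(mu, sigma)               # the log of a lognormal rate is normal

def lognormal_survival(t):
    """P(rate > t) for the moment-matched lognormal."""
    return 1 - log_rate.cdf(math.log(t))

for t in (2, 5, 10):
    print(f"P(rate > {t}): gamma {gamma_survival_int_shape(t, k, a):.2e}, "
          f"lognormal {lognormal_survival(t):.2e}")
```

At twice the mean the gamma actually places slightly more mass above the threshold; the lognormal's extra weight appears further into the tail, among the very heaviest buyers, where at ten times the mean it is larger by roughly three orders of magnitude.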
2.4 Poisson Lognormal Model

The lognormal distribution has been used in many disciplines, such as geology, economics, telecommunications, biochemistry, demography, health and risk analysis (e.g. Aitchison and Brown 1969; Cassie 1962; Crow and Shimizu 1988; El-Basyouny and Sayed 2009; Fahidy 2005; Johnson et al. 1994). Surprisingly, however, few applications have been demonstrated in marketing. The earliest application of the lognormal distribution in marketing science seems to be Lawrence (1980), where it was used to model purchase frequency rates. However, Lawrence's model is not a complete model of purchase frequency, as no individual-level distribution is specified (there is no mixing distribution). Consequently, it suffers from several shortcomings. First, Lawrence's model does not address within-consumer variability, as it assumes that an individual consumer always purchases at the average purchase rate (Morrison 1981). Second, the lognormal model used by Lawrence (1980) is a continuous distribution, whereas purchase frequency counts are integers and should be described by a discrete distribution. Finally, the approach makes the non-buyers difficult to estimate, as the log of zero is negative infinity. In conditional trend analysis, it is crucial to estimate what the period-one non-buyers will do in period two.

More recently, Abe (2009) proposed the multivariate lognormal distribution to model the relationship between purchase rates and dropout rates in customer base analysis. The author compared the Pareto/NBD model with a multivariate lognormal-based model and found that the latter performed as well as the Pareto/NBD model in individual-level prediction. Rungie and Laurent (2010) compared the multivariate lognormal model with the Dirichlet multinomial model of brand choice and showed that the multivariate lognormal model gives a better fit than the Dirichlet multinomial model when the resample size is large (e.g. 10,000 or more). Yet no study has applied the univariate Poisson lognormal mixture distribution to model purchase frequency counts and predict future purchases. This is an important area that has not been examined. Jerath et al.
(2011a) point out that many companies face difficulties in accessing individual-level data, and even when they can access it, the data format is often unfamiliar or some data may be lost. These problems potentially create barriers to implementing individual-level data models. Aggregate count models are therefore necessary in such situations.

The Poisson lognormal (PLN) model developed in this study overcomes the disadvantages of Lawrence's model by allowing variation within an individual consumer. Also, by combining the lognormal distribution with the Poisson distribution, the continuous lognormal distribution is converted into a discrete distribution, which is more appropriate for modelling purchase frequency counts. Finally, estimating the non-buyer class is not a problem, as the model includes the zero counts. The proposed PLN model has not previously been fitted to purchase frequency counts, although it has been shown to give a better fit than the negative binomial model for other count data (e.g. Connolly et al. 2009; Tsionas 2010; Winkelmann 2008). The tails of the lognormal distribution are heavier than those of the gamma distribution (Sohn 1994; Kaas and Hesselager 1995; Miranda-Moreno et al. 2005), and previous research has shown that for data containing outliers, the PLN model gives a better fit than the NBD model (Connolly et al. 2009; Sohn 1994; Miranda-Moreno et al. 2005). The PLN model may therefore be more suitable for the purchase frequency of heavily bought brands, or where there are outliers such as excessively heavy buyers, for which the NBD shows a lack of fit (Ehrenberg 1959; Chatfield et al. 1966; Ehrenberg 1988).

Not only does the PLN model appeal in fitting empirical data, but the lognormal distribution also has an attractive theoretical interpretation (Cassie 1962; Winkelmann 2008). It is quite reasonable to assume that an individual consumer's mean purchase rate is determined through the interaction of multiple unobserved factors, which can be positive or negative, such as advertising, promotion, word of mouth and other consumer-specific factors. If many independent unobserved factors act multiplicatively on the mean purchase rate of a given consumer, the central limit theorem implies that the rate converges to a lognormal distribution (Aitchison and Brown 1969; Johnson et al. 1994; Winkelmann 2008). The theory of the lognormal distribution of the mean purchase rate can be described as follows. Suppose $x_t$ is the mean purchase rate of an individual buyer at time $t$, and $e_t$ is a series of independent, identically distributed random variables, with $e_t$ also independent of $x_{t-1}$. Then

$$x_t - x_{t-1} = e_t\,x_{t-1}, \quad\text{or}\quad x_t = (1 + e_t)\,x_{t-1}.$$

Starting with any mean purchase rate $x_0$ at time 0, we have

$$x_t = x_0\,(1 + e_1)(1 + e_2)\cdots(1 + e_t).$$

If the effect at each step is small, then $\log(1 + e) \approx e$, and taking logs we obtain

$$\log x_t = \log x_0 + e_1 + e_2 + \cdots + e_t.$$

By the central limit theorem, $\log x_t$ is approximately normally distributed, and hence $x_t$ is lognormally distributed (Aitchison and Brown, 1969). In light of the empirical results and the theoretical advantages of the PLN model, Winkelmann (2008, p. 134) suggests that "the previous neglect of the Poisson lognormal model in the literature should be reconsidered in future applied work".
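The multiplicative argument can be checked by simulation. The sketch below builds each consumer's rate as the product $x_0(1 + e_1)\cdots(1 + e_t)$ under assumptions of our own: starting rate $x_0 = 1$, 200 small independent shocks per consumer drawn uniformly from [-0.05, 0.05], 5,000 consumers and an arbitrary seed. It then compares the skewness of the rates with the skewness of their logs.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness (population moments) of a list of numbers."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def simulate_rate(rng, x0, n_steps, max_shock):
    """x_t = x_0 (1 + e_1) ... (1 + e_t) with small i.i.d. uniform shocks e_i."""
    x = x0
    for _ in range(n_steps):
        x *= 1 + rng.uniform(-max_shock, max_shock)
    return x

rng = random.Random(1969)  # arbitrary seed
rates = [simulate_rate(rng, 1.0, 200, 0.05) for _ in range(5_000)]
log_rates = [math.log(x) for x in rates]
print("skewness of rates:", round(skewness(rates), 2))
print("skewness of log rates:", round(skewness(log_rates), 2))
```

The shocks themselves are symmetric, yet the rates come out clearly right-skewed while their logs are close to symmetric, which is the lognormal pattern the derivation predicts.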

Consequently, this thesis fits the PLN model to purchase frequency data, uses it to predict future purchases for multiple brands and product categories, and compares it with the well-known NBD model.

Chapter 3 Mathematical expressions, parameter estimations and data

3.1 Mathematical expressions

Negative binomial distribution

The NBD model of purchase counts rests on two assumptions:

(a) The purchases $x$ of a given consumer in successive time periods follow a Poisson distribution with parameter $\lambda$,

$$f(x)_{\text{Poisson}} = \frac{e^{-\lambda}\lambda^{x}}{x!}, \tag{1}$$

with mean $E[x] = \lambda$.

(b) The long-run mean purchase rates $\lambda$ differ across consumers and follow a gamma distribution,

$$f(\lambda; k, a) = \frac{\lambda^{k-1} e^{-\lambda/a}}{a^{k}\,\Gamma(k)}, \tag{2}$$

where $k$ and $a$ are the shape and scale parameters of the gamma distribution, with mean $E[\lambda] = ka$ and variance $\operatorname{var}[\lambda] = ka^{2}$.

Combining (2) and (1), the probability of $x$ purchases is

$$f(x)_{\text{NBD}} = \int_0^\infty f(x)_{\text{Poisson}}\, f(\lambda; k, a)\, d\lambda.$$

This distribution has a closed form (Anscombe, 1950):

$$f(x)_{\text{NBD}} = (1 + a)^{-k}\,\frac{\Gamma(x + k)}{x!\,\Gamma(k)} \left(\frac{a}{1 + a}\right)^{x}, \tag{3}$$

with mean $E[x] = ka$ and variance $\operatorname{var}[x] = ka(1 + a)$.

Poisson lognormal distribution

Take $\lambda$ to follow a lognormal distribution,

$$f(\lambda; \mu, \sigma) = \frac{1}{\lambda\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\log\lambda - \mu)^{2}}{2\sigma^{2}}\right), \tag{4}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the normal variable $Y = \ln(\lambda)$ (Crow and Shimizu, 1988).
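As a numerical check on equations (3) and (4), the sketch below evaluates the closed-form NBD probabilities in log space (to avoid overflow in the gamma functions) and confirms that they sum to one with mean $ka$ and variance $ka(1 + a)$. It also integrates the lognormal density (4) over $y = \log\lambda$ with the midpoint rule, confirming that it integrates to one with mean $\exp(\mu + \sigma^2/2)$. All parameter values and the step size are arbitrary illustrations, not estimates from the thesis data.

```python
import math

def nbd_pmf(x, k, a):
    """Equation (3): NBD probability of x purchases (gamma shape k, scale a)."""
    return math.exp(math.lgamma(x + k) - math.lgamma(x + 1) - math.lgamma(k)
                    - k * math.log1p(a) + x * (math.log(a) - math.log1p(a)))

def lognormal_pdf(lam, mu, sigma):
    """Equation (4): lognormal density of the mean purchase rate lam."""
    z = (math.log(lam) - mu) / sigma
    return math.exp(-0.5 * z * z) / (lam * sigma * math.sqrt(2 * math.pi))

# NBD: closed-form probabilities should sum to 1 with mean ka and variance ka(1 + a).
k, a = 0.6, 2.5                          # illustrative values only
probs = [nbd_pmf(x, k, a) for x in range(400)]
nbd_mean = sum(x * p for x, p in enumerate(probs))
nbd_var = sum((x - nbd_mean) ** 2 * p for x, p in enumerate(probs))

# Lognormal: integrate over y = log(lam) with the midpoint rule (d lam = lam dy).
mu, sigma, h = -0.5, 0.9, 0.001          # illustrative values and step size
ys = [-12 + h * (i + 0.5) for i in range(24_000)]
area = sum(lognormal_pdf(math.exp(y), mu, sigma) * math.exp(y) * h for y in ys)
ln_mean = sum(lognormal_pdf(math.exp(y), mu, sigma) * math.exp(2 * y) * h for y in ys)

print(round(sum(probs), 10), round(nbd_mean, 6), round(nbd_var, 6))
print(round(area, 10), round(ln_mean, 6), round(math.exp(mu + sigma ** 2 / 2), 6))
```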

The mean of the lognormal distribution is $E[\lambda] = \exp(\mu + \sigma^{2}/2)$ and the variance is $\operatorname{var}[\lambda] = \exp(2\mu + \sigma^{2})\{\exp(\sigma^{2}) - 1\}$.

Combining (4) and (1), the probability of $x$ purchases under the PLN model is

$$f(x)_{\text{PLN}} = \int_0^\infty f(x)_{\text{Poisson}}\, f(\lambda; \mu, \sigma)\, d\lambda = \frac{1}{x!\,\sigma\sqrt{2\pi}} \int_0^\infty \lambda^{x-1} e^{-\lambda} \exp\!\left(-\frac{(\log\lambda - \mu)^{2}}{2\sigma^{2}}\right) d\lambda \tag{5}$$

(Bulmer, 1974). The mean of the PLN distribution is $E[x] = \exp(\mu + \sigma^{2}/2)$ and the variance is $\operatorname{var}[x] = \exp(\mu + \sigma^{2}/2)\{1 + \exp(\mu + \sigma^{2}/2)(\exp(\sigma^{2}) - 1)\}$.

3.2 Parameter estimations

For the NBD model, this thesis uses maximum likelihood estimation to estimate the parameters $k$ and $a$. For the PLN model, the probabilities $P(x = 0, 1, 2, 3, \ldots)$ no longer have a closed form. Nevertheless, the parameters $\mu$ and $\sigma$ can easily be estimated by numerical approximation. Following the simulation method proposed by Train (2009), this thesis estimates the parameters of the PLN by drawing from a density, calculating the Poisson lognormal probability for each draw, and averaging the results. For the draws, this thesis uses Halton draws (Halton, 1960). Halton draws are generated from a sequence based on a prime number. Train (2009) describes the Halton sequence with the example of the prime 3. The steps to create the sequence are:
1. Divide the unit interval into three parts with breaks at 1/3 and 2/3. These breakpoints are the first terms in the sequence.
2. Divide each of the three segments into thirds and add the breakpoints for these segments to the sequence. The sequence becomes 1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9. The lower breakpoints in all three segments (1/9, 4/9, 7/9) are entered before the higher breakpoints (2/9, 5/9, 8/9).
3. Divide each of the nine segments into thirds and add the breakpoints for these segments to the sequence. The sequence becomes 1/3, 2/3, 1/9, 4/9, 7/9, 2/9, 5/9, 8/9, 1/27, 10/27, 19/27, 4/27, 13/27, 22/27, 7/27, 16/27, 25/27, 2/27, 11/27, 20/27, 5/27, 14/27, 23/27, 8/27, 17/27, 26/27.
4. Continue this process for as many points as necessary.
Halton sequences for other prime numbers can be created in a similar way. The sequence for prime $k$ (here $k$ denotes the prime, not the gamma shape parameter) at iteration $t + 1$ can be created as follows (Train, 2009):

$$s_{t+1} = \{s_t,\; s_t + 1/k^{t},\; s_t + 2/k^{t},\; \ldots,\; s_t + (k - 1)/k^{t}\} \tag{6}$$

Train (2009) points out that Halton draws have several advantages over random draws. First, they provide better coverage than random draws and, as a result, tend to be self-correcting over observations. Halton draws also induce negative correlation over observations and hence reduce error in the simulated log-likelihood function. These advantages make Halton draws more effective than random draws; indeed, Halton draws can be considered well-placed draws from a standard uniform density (Train, 2009).
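The construction in equation (6) translates directly into code. The sketch below assumes the recursion starts from $s_1 = \{0\}$ and drops that leading 0, which reproduces the base-3 sequence listed in step 3; exact fractions are used so that the terms can be compared with the list term by term. The function name is ours.

```python
from fractions import Fraction

def halton_sequence(prime, n_terms):
    """First n_terms Halton terms via s_{t+1} = {s_t, s_t + 1/k^t, ..., s_t + (k-1)/k^t}."""
    s, t = [Fraction(0)], 1
    while len(s) - 1 < n_terms:
        step = Fraction(1, prime ** t)
        # Append every element of s_t shifted by 1/k^t, then by 2/k^t, ..., then (k-1)/k^t.
        s = s + [v + j * step for j in range(1, prime) for v in s]
        t += 1
    return s[1:n_terms + 1]  # drop the initial 0

print([str(v) for v in halton_sequence(3, 8)])
```

The ordering inside the comprehension (all shifts by 1/k^t before any shift by 2/k^t) is what places the lower breakpoints 1/9, 4/9, 7/9 ahead of 2/9, 5/9, 8/9. For simulation the terms would normally be converted to floats.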

Previous research has shown that a small number of Halton draws (e.g. 100) provides more precise results than a large number of random draws (e.g. 1000) (Bhat, 2001; Hensher, 2001; Munizaga and Alvarez-Daziano, 2001; Spanier and Maize, 1991; Train, 2000; Train, 2009). Using Halton draws, the PLN model parameters are estimated in Excel with the following steps:
1. Take 1000 draws from a Halton sequence for prime [...].
2. Transform each Halton draw into a draw from a normal density with the specified mean $\mu$ and standard deviation $\sigma$.
3. Exponentiate each draw from the normal density to obtain a draw from a lognormal density.
4. For each draw from the lognormal density, calculate $P(x = 0, 1, 2, 3, \ldots)$ using the Poisson probability function.
5. Average these probabilities over all draws. This gives the initial theoretical probabilities of the PLN model for the specified $\mu$ and $\sigma$.
6. Calculate the actual frequencies from the observed data set.
7. Use a maximum likelihood procedure to fit the theoretical probabilities to the actual frequencies by adjusting $\mu$ and $\sigma$. This procedure yields the new estimated values of $\mu$ and $\sigma$.
For both models, the log-likelihood function is

$$\log L = \sum_{x} n_x \log(p_x),$$

where $n_x$ is the observed number of panel households making $x$ purchases and $p_x$ is the model probability of $x$ purchases.
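The seven steps above can be sketched in Python rather than Excel. This is a minimal illustration under our own assumptions: base 3 is used for the Halton draws because the prime in step 1 is not given here, the Halton points are generated with the standard radical-inverse formula (which gives the same base-3 sequence as Train's construction), the normal transform uses the inverse standard normal CDF, and a coarse grid search stands in for a proper numerical optimiser in step 7. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

def halton(n_draws, base=3):
    """Radical-inverse Halton points for indices 1..n_draws (strictly inside (0, 1))."""
    points = []
    for i in range(1, n_draws + 1):
        f, r, n = 1.0, 0.0, i
        while n > 0:
            f /= base
            r += f * (n % base)
            n //= base
        points.append(r)
    return points

def pln_probs(mu, sigma, max_x, z_draws):
    """Steps 3-5: simulated P(x), x = 0..max_x, averaged over lognormal draws."""
    probs = [0.0] * (max_x + 1)
    for z in z_draws:
        lam = math.exp(mu + sigma * z)        # step 3: lognormal draw
        p = math.exp(-lam)                    # Poisson P(0 | lam)
        for x in range(max_x + 1):            # step 4: Poisson probabilities
            probs[x] += p
            p *= lam / (x + 1)                # P(x + 1 | lam) from P(x | lam)
    return [p / len(z_draws) for p in probs]  # step 5: average over draws

def log_likelihood(counts, probs):
    """Sum over x of n_x * log(p_x) for observed frequency counts n_x."""
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

def grid_fit(counts, z_draws, mus, sigmas):
    """Step 7 stand-in: the (mu, sigma) pair with the highest log-likelihood."""
    best = None
    for mu in mus:
        for sigma in sigmas:
            ll = log_likelihood(counts, pln_probs(mu, sigma, len(counts) - 1, z_draws))
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best

# Steps 1-2: 1000 Halton draws mapped to standard normal draws, reused for every (mu, sigma).
z_draws = [NormalDist().inv_cdf(u) for u in halton(1000)]
```

Reusing the same draws for every candidate (mu, sigma) keeps the simulated likelihood smooth across the search. On panel data, `counts[x]` would be the number of households with x purchases in the calibration period, and a finer grid or a numerical optimiser would replace the coarse search.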

To measure the fit of the models, this thesis uses the log-likelihood: the higher the log-likelihood, the better the fit.

3.3 Data

The data used to compare the fit and accuracy of the PLN and NBD models is household consumer data for a 104-week period from the Kantar (previously TNS) Superpanel database. The panel consists of 16,998 telephone-owning households across the UK, drawn only from full-time residents. The sample is demographically and regionally balanced to represent the UK population. Data is collected from panel participants twice a week via electronic terminals in the home, with purchases recorded via home-scanning technology (TNS 2008). As such, it represents a very large, valid and reliable data source. This thesis uses the first 52 weeks for parameter estimation and the last 52 weeks as the test period. It analyses the top five brands in each of four FMCG categories: shampoo, deodorant, toilet soap and bleach. This gives a total of 20 comparisons of the PLN and NBD models.

Chapter 4 Method of prediction and measuring the accuracy of the PLN versus the NBD

4.1 Method of prediction

Goodhardt and Ehrenberg (1967) derived the conditional expectation of the NBD:

$$E[X_2 \mid X_1 = x] = \frac{a(k + x)}{1 + a}. \tag{7}$$

Lemma. The conditional expectation of the NBD can be simplified as

$$E[X_2 \mid X_1 = x] = \frac{(x + 1)\,f(x + 1)}{f(x)}. \tag{8}$$

Proof. If $x$ has a negative binomial distribution, then by (3) the probability of $x + 1$ purchases is

$$\begin{aligned}
f(x + 1) &= (1 + a)^{-k}\,\frac{\Gamma(k + x + 1)}{(x + 1)!\,\Gamma(k)} \left(\frac{a}{1 + a}\right)^{x+1} \\
&= (1 + a)^{-k}\,\frac{(k + x)\,\Gamma(k + x)}{(x + 1)\,x!\,\Gamma(k)} \left(\frac{a}{1 + a}\right)^{x} \frac{a}{1 + a} \\
&= \frac{a}{1 + a}\,\frac{k + x}{x + 1}\,(1 + a)^{-k}\,\frac{\Gamma(k + x)}{x!\,\Gamma(k)} \left(\frac{a}{1 + a}\right)^{x} \\
&= \frac{a}{1 + a}\,\frac{k + x}{x + 1}\,f(x).
\end{aligned} \tag{9}$$

Hence the following equation holds:

$$\frac{a(k + x)}{1 + a} = \frac{(x + 1)\,f(x + 1)}{f(x)}. \tag{10}$$

Substituting (10) into (7) gives

$$E[X_2 \mid X_1 = x] = \frac{(x + 1)\,f(x + 1)}{f(x)}. \qquad \text{QED}$$

It should be noted that the NBD is not the only count model with this property. In fact, all mixed (compound) Poisson models, such as the Poisson mixed with a gamma, Weibull or lognormal distribution, have this property (Robbins, 1977). This property gives a very simple estimate of the total number of purchases in period two made by the buyers who made $x$ purchases in period one. The number of buyers who made $x$ purchases in period one is $B(x) = f(x)N$, where $N$ is the sample size.
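Equation (8), and the class totals that follow from it, need only the model probabilities, which makes them easy to compute for any mixed Poisson model. The sketch below (function names hypothetical) applies them to an NBD with illustrative parameters and checks the result against the closed form in equation (7); the same functions would accept simulated PLN probabilities instead.

```python
import math

def nbd_pmf(x, k, a):
    """Equation (3): NBD probability of x purchases (gamma shape k, scale a)."""
    return math.exp(math.lgamma(x + k) - math.lgamma(x + 1) - math.lgamma(k)
                    - k * math.log1p(a) + x * (math.log(a) - math.log1p(a)))

def conditional_expectation(probs, x):
    """Equation (8): E[X2 | X1 = x] = (x + 1) f(x + 1) / f(x)."""
    return (x + 1) * probs[x + 1] / probs[x]

def period_two_sales(probs, x, n_households):
    """Period-two purchases by period-one class x: E[X2 | X1 = x] f(x) N = (x + 1) f(x + 1) N."""
    return (x + 1) * probs[x + 1] * n_households

k, a, n_households = 0.4, 3.0, 10_000   # illustrative values only
probs = [nbd_pmf(x, k, a) for x in range(50)]
for x in range(4):
    print(x, round(conditional_expectation(probs, x), 4), round(a * (k + x) / (1 + a), 4),
          round(period_two_sales(probs, x, n_households), 1))
```

The second and third columns agree, as the lemma requires, and the last column is the predicted period-two sales contribution of each period-one buyer class, the quantity compared with actual sales in Chapter 5.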

The total number of purchases in period two made by the buyers who made $x$ purchases in period one is therefore

$$S[X_2 \mid X_1 = x] = E[X_2 \mid X_1 = x]\,B(x) = E[X_2 \mid X_1 = x]\,f(x)N = (x + 1)\,f(x + 1)N = (x + 1)\,B(x + 1). \tag{11}$$

That is, the number of purchases in period two made by the buyers who made $x$ purchases in period one equals the number of purchases in period one made by the buyers who made $x + 1$ purchases in period one. This thesis therefore uses equation (8) to predict future purchases of brands and product categories, and compares the conditional predictions of the PLN model with those of the well-known NBD model.

4.2 Measuring the accuracy of predictions

To measure the accuracy of the predictions of the PLN and NBD models, this thesis uses Theil's U coefficient of inequality. Theil's U is easy to understand and interpret: it ranges from 0 to 1, and the smaller the U, the better the prediction. It is calculated as

$$U = \frac{\sqrt{\sum_{x=0}^{n} (A_x - P_x)^{2}}}{\sqrt{\sum_{x=0}^{n} A_x^{2}} + \sqrt{\sum_{x=0}^{n} P_x^{2}}},$$

where $A_x$ and $P_x$ are the actual and predicted values, respectively.

The use of Theil's U coefficient is appropriate for this analysis, as it captures the prediction of the full distribution, including the tail of the distribution. It is particularly useful in studies modelling long-tailed distributions (e.g. Wu and Chen, 2000a; 2000b; Fader and Hardie, 2002). Other measures, such as MAPE or MAE, require the tail of the distribution to be grouped, and different censoring points might give different results.
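For concreteness, the sketch below computes Theil's U as defined in Section 4.2, summing over all purchase classes x = 0, ..., n with no grouping of the tail. The actual and predicted values in the example are made-up numbers, not results from the Kantar data.

```python
import math

def theils_u(actual, predicted):
    """Theil's U: sqrt(sum (A-P)^2) / (sqrt(sum A^2) + sqrt(sum P^2)); 0 means a perfect prediction."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must cover the same purchase classes")
    num = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)))
    den = math.sqrt(sum(a * a for a in actual)) + math.sqrt(sum(p * p for p in predicted))
    return num / den

actual = [520.0, 410.0, 300.0, 180.0, 95.0]      # hypothetical period-two sales by class
predicted = [450.0, 430.0, 320.0, 190.0, 100.0]
print(round(theils_u(actual, predicted), 4))
```

By the triangle inequality the numerator can never exceed the denominator, which is why U stays between 0 (identical series) and 1.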

Chapter 5 Results

5.1 Model fit

Table 5 reports the log-likelihoods of the NBD and PLN models fitted to purchase frequency data for twenty brands in four categories. The log-likelihoods are very close between the two models, which suggests that the PLN and NBD models are very competitive: the PLN model outperforms the NBD model in ten cases, whereas the NBD gives a better fit in the other ten.

Table 5. PLN and NBD fit to twenty brands in four product categories.
[Log-likelihood under the NBD and PLN models for each brand. Toilet soap: Dove, Imperial Leather, Palmolive, Cussons, Tesco. Bleach: Domestos, Parozone, Toilet Duck, Harpic, Bloo. Shampoo: Alberto, Head & Shoulders, Pantene, Herbal Essences, L'Oreal. Deodorant: Sure, Lynx, Sanex, Soft&Gentle, Dove, Rightguard.]
Note. Bold figures indicate better fits.

These close results are not unexpected, as the NBD model has long been reported as effective in modelling purchase frequency counts even though its assumptions may not strictly hold (Ehrenberg 1959; Ehrenberg 1988; Fader and Hardie 2002; Morrison and Schmittlein 1988; Schmittlein et al. 1985). However, Morrison and Schmittlein (1988) postulated that deviations from the NBD model show up much more in the conditional expectations than in the purchase frequency distribution. These authors therefore suggested that conditional expectations should be used to test the NBD model even if the observed distribution looks very similar to the NBD. Indeed, Schmittlein and Morrison (1983, p. 453) stated that "if conditional trend analysis is of primary interest, the conditional expectations are the natural quantities for model comparison". This thesis therefore examines the conditional predictions of the NBD and PLN models in the following section.

5.2 Conditional prediction

Table 6 shows the accuracy of future purchase predictions of the NBD and PLN models. As assessed by U, the PLN model predicts future purchases better than the NBD model for all brands except Palmolive (toilet soap). On average, the U for the PLN predictions is [...], while for the NBD predictions it is [...].

Table 6. PLN and NBD predictions of twenty brands in four product categories
[Theil's U of the NBD and PLN predictions for each brand, with category averages; rows are the same brands as in Table 5.]

The graph below shows an example of the predictions of the PLN model compared with the NBD model. The PLN model predicts future purchases better than the NBD model in all buyer classes. The sales contribution of period-one non-buyers in period two predicted by the PLN model closely matches the actual data, whereas the NBD prediction for this class deviates markedly from the actual data. The PLN also predicts the period-two sales contributions of the other buyer classes very well, while the NBD predictions are biased, especially for buyers who made 1-6 purchases in period one.

Figure 2. Conditional predictions of the PLN and NBD models - Dove (Toilet soap)

5.3 Purchases of the zero class

Previous literature has noted that "the NBD tends to under predict test period purchases by the zero class, the group of customers who bought nothing in the base period. This under prediction can be a serious problem as it leads to an overstatement of one of the key goals of marketing effort - attracting previous non-buyers to the brand" (Lenk et al., 1993, p.289). This thesis therefore compares the zero class's period-two purchases, as predicted by the NBD and the PLN models, with the actual purchases. We calculate the ratio of the predicted to the observed period-two purchases of the zero class; a ratio closer to 1 indicates a better prediction. As Table 7 shows, the PLN model consistently outperforms the NBD model in predicting the purchases of the zero class in period two.
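The zero-class comparison can be sketched for the NBD, where both ingredients have closed forms: the share of households with no period-one purchases, (alpha/(alpha+1))^r, and their expected period-two purchases, r/(alpha+1). This minimal sketch assumes equal-length periods, a stationary market and NBD parameters r (gamma shape) and alpha (gamma rate) already estimated from period one; the ratio is taken as predicted over observed, so 1 is a perfect prediction. A PLN version would replace the two closed forms with numerically integrated Poisson-lognormal quantities. The function names and inputs are hypothetical.

```python
def nbd_zero_class_prediction(n_households, r, alpha):
    """Expected period-two purchases by period-one non-buyers under a stationary NBD."""
    share_zero = (alpha / (alpha + 1.0)) ** r   # P(no purchases in period one)
    expected_each = r / (alpha + 1.0)           # E[period-two purchases | none in period one]
    return n_households * share_zero * expected_each


def zero_class_ratio(predicted, observed):
    """Ratio of predicted to observed zero-class purchases (1 = perfect prediction)."""
    if observed <= 0:
        raise ValueError("observed zero-class purchases must be positive")
    return predicted / observed


# Illustrative inputs only: 1,000 panel households, r = 1, alpha = 1 and
# 300 observed period-two purchases by period-one non-buyers.
predicted = nbd_zero_class_prediction(1000, r=1.0, alpha=1.0)   # 250.0
print(zero_class_ratio(predicted, 300))  # 0.8333..., i.e. an under-prediction
```

A ratio below 1 corresponds to the under-prediction of the zero class described by Lenk et al. (1993).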

Table 7. Ratios of predicted to actual zero-class purchases in period two

Category       Brand                 NBD     PLN
Toilet soap    Dove
               Imperial Leather
               Palmolive
               Cussons
               Tesco
               Average
Shampoo        Alberto
               Head & Shoulders
               Pantene
               Herbal Essences
               L'Oréal
               Average
Deodorant      Sure
               Lynx
               Sanex Soft&Gentle
               Dove
               Rightguard
               Average
Bleach         Domestos
               Parozone
               Toilet Duck
               Harpic
               Bloo
               Average

Note. Closer to 1 indicates a better prediction.

5.4 Category buying

Prior studies on modelling brand buying behaviour have emphasized the need to understand the purchase frequency distribution of the entire product category (e.g. Ehrenberg et al. 2004; Sichel 1982). If a compound Poisson distribution gives an adequate fit to individual brand purchases, then it should do the same for the product category. We therefore fit the PLN and the NBD models to the actual purchase count data at the category level, using all category data. Table 8 shows the results of the NBD and the PLN models when fitted to period-one purchase frequency data and used to predict period-two purchases in four product categories. Across all the categories, the PLN model fits the observed data better than the NBD model, as shown by its higher log-likelihood values. One possible explanation for this superior performance is that at the category level the purchase frequency distribution usually has a heavier tail, which the PLN model captures better than the NBD model. In terms of prediction, the PLN model also outperforms the NBD model in all categories as assessed by Theil's U.

Table 8. PLN and NBD results for four product categories

Measure                            Shampoo      Toilet soap   Bleach       Deodorants
                                   NBD   PLN    NBD   PLN     NBD   PLN    NBD   PLN
Log likelihood (period 1 fit)
Theil's U (period 2 prediction)
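The log-likelihood comparison in Table 8 can be sketched as follows, under stated assumptions: purchase counts are grouped as the number of households making 0, 1, 2, ... purchases in period one, the final cell is treated as open-ended ("K or more"), and each model's parameters have already been estimated by maximum likelihood (the estimation step is not shown). Because the NBD and the PLN both have two parameters, their maximised log-likelihoods can be compared directly, and the higher value indicates the better fit. The function names, counts and parameter values are hypothetical, and the PLN probabilities are integrated with a simple trapezoidal rule.

```python
import math


def nbd_pmf(x, r, alpha):
    """NBD probability of x purchases (gamma mixing: shape r, rate alpha)."""
    return math.exp(math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
                    + r * math.log(alpha / (alpha + 1.0))
                    - x * math.log(alpha + 1.0))


def pln_pmf(x, mu, sigma, n_steps=2000, width=8.0):
    """Poisson-lognormal probability of x purchases via trapezoidal integration over log(lambda)."""
    lo = mu - width * sigma
    step = 2.0 * width * sigma / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        z = lo + i * step
        log_density = (x * z - math.exp(z) - math.lgamma(x + 1)
                       - 0.5 * ((z - mu) / sigma) ** 2
                       - math.log(sigma * math.sqrt(2.0 * math.pi)))
        total += (0.5 if i in (0, n_steps) else 1.0) * math.exp(log_density)
    return total * step


def grouped_log_likelihood(counts, pmf):
    """Log-likelihood of grouped purchase-frequency data.

    counts[k] is the number of households with k purchases; the last cell is
    treated as 'len(counts) - 1 or more purchases'.
    """
    probs = [pmf(k) for k in range(len(counts) - 1)]
    probs.append(max(1.0 - sum(probs), 1e-300))   # open-ended tail cell
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)


# Illustrative counts for 0, 1, 2, 3, 4 and 5+ purchases; in practice the parameters
# would be each model's own maximum-likelihood estimates, not these arbitrary values.
counts = [520, 180, 95, 60, 40, 105]
ll_nbd = grouped_log_likelihood(counts, lambda k: nbd_pmf(k, r=0.4, alpha=0.15))
ll_pln = grouped_log_likelihood(counts, lambda k: pln_pmf(k, mu=-0.3, sigma=2.0))
print(ll_nbd, ll_pln)
```

Treating the last cell as open-ended keeps heavy category buyers in the likelihood, which is where the tail behaviour of the two mixing distributions differs most.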

Chapter 6 Modelling how brands grow or decline

6.1 Background

This chapter demonstrates how to use the NBD and PLN models to diagnose whether new buyers, existing light buyers, or existing heavy buyers contribute most to brand growth or decline. Although a considerable amount of research has been devoted to understanding how brands grow, the primary source of brand growth has not been fully examined. The following briefly reviews the literature on this topic.

The topic of how brands grow has been discussed in the marketing literature for decades. This research can be classified into two main areas: growth strategies for new brands, and how existing brands grow in mature markets. For new brands, growth must clearly come through the acquisition of new buyers, although there may be arguments about the likely characteristics of those new buyers (e.g. whether they are light or heavy buyers of the category). However, for existing brands in mature product categories such as fast moving consumer goods, the mechanism of brand growth has long been debated in the marketing literature. The central question of the debate is whether the majority of growth comes from new buyers who have not used the brand previously, or from existing buyers who purchase more often or in greater quantities. This debate has led to two approaches in marketing management: customer acquisition and customer loyalty.

In terms of marketing metrics, customer acquisition is often measured by penetration (the proportion of consumers who buy the brand in a period). If the brand acquires more new buyers, its penetration will increase. On the other hand, customer loyalty is often measured by average purchase frequency (the average number of purchases per buyer in a given period) or share of category requirements (SCR: the percentage of their purchases in the product category that the brand's buyers devote to the brand). These are measures of behavioural loyalty. If a brand has a higher average purchase frequency or a higher SCR than its competitors, it has more customer loyalty.

A number of authors (e.g. Fader, 2011; Hallberg, 1995; Reichheld, 1996; Schultz and Walters, 1997; Reichheld, 2003) have argued that focusing on loyalty will lead to brand growth. For example, Hallberg (1995) argued that the most likely path to growth is retaining high-profit buyers and continuing to grow their loyalty and profitability, rather than accelerating an already healthy flow of new buyers, because loyal customers account for a large proportion of sales and profit. Similarly, Reichheld (1996) concluded that by focusing on loyalty, companies retain their best customers, building repeat sales and referrals, and that these loyal customers generate a number of economic effects, including growth in revenue and market share. Fader (2011) also suggested that not all customers are equal and that companies should therefore focus on the most profitable customers rather than trying to serve everybody; in his view this improves marketing effectiveness and leads to growth, as these customers hold the key to a company's long-term profitability. On the other hand, a considerable body of literature indicates that in order to grow, a brand must acquire new customers (e.g. Anschuetz, 2002; Baldinger et al., 2002; Dawes, 2009; Graham, 2009; Riebe et al., Forthcoming; Riebe, 2003; Sharp, 2010).
For instance, Anschuetz (2002) examined both the buying distributions of competing brands that differ in market share in the same period, and growing brands over a two-year period. He found that both cases yield similar results: as a brand grows, the number of buying households increases more than the frequency of buying. This is in line with the Double Jeopardy law (Ehrenberg et al., 1990; Sharp, 2010), which states that brands with less market share have far fewer buyers, and that these buyers are only slightly less loyal. Similar results are reported by Baldinger et al. (2002), who studied brands that changed their market shares over a five-year period. By correlating the change in share with the change in penetration and with the change in loyalty (SCR), the authors found that penetration is the key to share growth over time. However, they also found that loyalty strongly leverages the effects of penetration, meaning that both penetration and loyalty are related to growth in market share. Using a different method, Riebe et al. (Forthcoming) and Riebe (2003) examined the levels of customer acquisition and defection of growing and declining brands over nine years. They found that growing brands show particularly higher acquisition than expected, whereas declining brands have poor acquisition but normal defection. Again, this supports the view that acquisition is more important than loyalty when trying to grow a brand.

However, the studies mentioned above only tell us that there are high correlations between brand share growth and penetration or acquisition. Measuring penetration and loyalty does not enable researchers to identify the primary sources of additional sales. For example, if a study finds that the source of brand sales growth is loyalty (e.g. higher average purchase frequency), this does not necessarily mean that the current buyers of the brand buy more than expected, as newly acquired buyers might buy the brand with much higher frequency than expected. Alternatively, if the source of growth is penetration (more buyers), it does not necessarily mean that there are more new buyers than expected, as the brand might have a higher retention rate than expected and only a small number of new buyers. In addition, previous studies have not compared the additional sales contributed by new buyers with those contributed by existing buyers. It is possible that while penetration increases more than purchase frequency, existing buyers still contribute more to additional sales than new buyers do. Furthermore, loyalty metrics such as SCR and average purchase frequency do not allow us to differentiate between light and heavy buyers. Overall loyalty might not change while loyalty increases among light buyers and decreases among heavy buyers, or the new buyers might be heavy buyers. As such, the true source of growth remains unknown if we examine penetration or loyalty in isolation, or even if we examine both in aggregate.

In summary, the prior literature shows controversy over the sources of brand growth, and the main source of additional sales has not yet been identified. Do new buyers, existing light buyers, or existing heavy buyers contribute most to brand growth? To identify the specific source of growth, we need benchmarks against which to compare sales increases across different groups of buyers; without them, it is difficult to identify the true source of growth. This chapter demonstrates how to use the NBD and PLN benchmarks to fully identify the sources of brand growth or decline. The NBD model is well known, but it can be biased when applied to this type of analysis. The PLN model, on the other hand, has been shown to give better predictions than the NBD model and thus might be more suitable here.

The next section shows the results for both growing and declining brands. The sales importance of the different buyer classes (non, light and heavy) is estimated by comparing the actual purchases with the expected purchases (benchmarks) predicted by the PLN and the NBD models. The deviations between the actual and expected purchases show where the sales increase or decline has come from.
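The benchmarking procedure described above can be sketched for household-level panel data. This minimal sketch assumes that each household is recorded as a pair (period-one purchases, period-two purchases), including households that did not buy the brand in period one; that light and heavy buyers are split at the average purchase frequency of period-one buyers (households exactly at the average are put in the heavy class here, an arbitrary tie-break); and that expected period-two purchases come from a fitted model's conditional expectation. The NBD closed form (r + x)/(alpha + 1) is used for illustration; a PLN benchmark would pass a different function. All names and numbers are hypothetical.

```python
def nbd_conditional_expectation(x, r, alpha):
    """NBD expected period-two purchases given x period-one purchases (equal, stationary periods)."""
    return (r + x) / (alpha + 1.0)


def class_deviations(panel, expected_fn):
    """Compare actual and expected period-two purchases for non, light and heavy buyers.

    panel: list of (period_one_purchases, period_two_purchases), one pair per household,
           including households that did not buy the brand in period one.
    expected_fn: conditional-expectation function mapping x to expected period-two purchases.
    Returns {class: (actual, expected, actual - expected)}.
    """
    period_one_buyers = [x1 for x1, _ in panel if x1 > 0]
    if not period_one_buyers:
        raise ValueError("no period-one buyers in the panel")
    avg_freq = sum(period_one_buyers) / len(period_one_buyers)

    totals = {"non": [0.0, 0.0], "light": [0.0, 0.0], "heavy": [0.0, 0.0]}
    for x1, x2 in panel:
        if x1 == 0:
            buyer_class = "non"
        elif x1 < avg_freq:
            buyer_class = "light"
        else:                      # at or above the average (assumed tie-break)
            buyer_class = "heavy"
        totals[buyer_class][0] += x2
        totals[buyer_class][1] += expected_fn(x1)
    return {c: (a, e, a - e) for c, (a, e) in totals.items()}


# Hypothetical panel of five households and hypothetical NBD parameters.
panel = [(0, 1), (0, 0), (1, 1), (1, 2), (4, 3)]
result = class_deviations(panel, lambda x: nbd_conditional_expectation(x, r=1.0, alpha=1.0))
for buyer_class, (actual, expected, deviation) in result.items():
    print(f"{buyer_class:>5}: actual {actual:.1f}, expected {expected:.1f}, deviation {deviation:+.1f}")
```

A positive deviation marks a class that bought more in period two than the no-change benchmark predicts; with these toy numbers the light class shows the largest deviation (+1.0), followed by the heavy class (+0.5).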

6.2 Growing brands

Figure 3 shows an example of a growing shampoo brand. Both models identify that the sales increase comes mainly from the non-buyers (there is a large deviation between the actual purchases and the purchases predicted by the PLN and NBD models). The difference between the two models is that the PLN model indicates that the light buyers contribute more than the heavy buyers, whereas the NBD model suggests that the heavy buyers contribute slightly more than the light buyers.

Figure 3. Contributions to sales increase, using NBD and PLN benchmarks - Timotei, Shampoo
Note: Light buyers: buyers who purchased less often than the average brand purchase frequency in year one. Heavy buyers: buyers who purchased more often than the average brand purchase frequency in year one.

Figure 4 shows another example of a growing brand in a different category (spirits). As in the previous example, both models show that the non-buyer class contributes more to the sales increase than the other buyer classes. The difference between the two models is that the PLN model indicates that the light buyers contribute more than the heavy buyers, whereas the NBD model suggests the exact opposite.

Figure 4. Contributions to sales increase, using NBD and PLN benchmarks - Baileys, Spirits

In both cases, the PLN predictions appear smoother than the NBD predictions. The PLN indicates that sales importance declines from non-buyers to light buyers and then to heavy buyers, while the NBD indicates that sales importance declines from non-buyers to light buyers but rises again for heavy buyers.

6.3 Declining brands

Figure 5 shows a declining brand. Here the two models give different results: the PLN model indicates that the sales decline comes almost equally from all buyer classes, while the NBD model shows that it comes mainly from the light buyers of the brand.

Figure 5. Contributions to sales decline, using NBD and PLN benchmarks - Dove, Shampoo

Figure 6 shows another example of a declining brand. Again, the PLN model indicates that the sales decline comes almost equally from all buyer classes, while the NBD model shows that it comes mainly from the light buyers of the brand.