Computer Simulated Shopping Experiments for Analyzing Dynamic Purchasing Patterns: Validation and Guidelines

Size: px
Start display at page:

Download "Computer Simulated Shopping Experiments for Analyzing Dynamic Purchasing Patterns: Validation and Guidelines"

Transcription

1 Computer Simulated Sopping Experiments for Analyzing Dynamic Purcasing Patterns: Validation and Guidelines Katia Campo 1, Els Gijsbrects 2, and Fabienne Guerra 3 1 Senior Manager Modeling & Analytics Accuris Group Belgium, and UFSIA, Plantin & Moretuslei 220 b10, 2000 Antwerp, Belgium, Katia.campo@accuris.com 2 Professor, UFSIA, Prinsstraat 13, 200 Antwerp, and CREER, FUCAM, Caussée de Bince 171, 7000 Mons, Belgium, Els.gijsbrects@ufsia.ac.be 3 Professor, CREER, FUCAM, Caussée de Bince 171, 7000 Mons, Belgium, Guerra@message.fucam.ac.be Acknowledgments Tis researc was possible tanks to a researc grant made available by te Belgian Science Policy Department to te researc center CREER (FUCAM) (PAI n 03-53), and a special researc grant from te University of Antwerp, UFSIA. Te first autor, Katia Campo, wises to tank te Fund for Scientific Researc of Flanders (Belgium) for allowing to participate in tis researc as Postdoctoral Fellow of te Fund for Scientific Researc. Te autors tank Dr. Byron Sarp and two anonymous referees for teir useful suggestions and comments. Abstract Computer simulated sopping experiments open up new opportunities for marketing researc, allowing researcers to collect purcase data in a tigtly controlled yet realistic environment, at relatively low cost and wit a ig degree of flexibility. Wile a number of autors ave used computer simulations to generate purcase data, researc on te validity of tese outcomes is scarce. Te present paper aims to improve insigts into te validity of computer simulation experiments, by providing a replication and extension of te work by Burke et al. (1992). Experimental purcase data are generated and compared wit real life scanner data to provide additional evidence for major biases examined by Burke et al., and to assess te effect of repetitive coice tasks and incomplete product assortments on te realism of purcase quantity and coice data. Te results sow tat, overall, te experimental data correspond quite closely to actual purcase data. Consumers exibit realistic purcase variation, and provide meaningful data trougout te experimental periods. Yet, some biases are found in category purcase quantities - wic are overestimated - and in purcase sares of generic products - wic are underestimated. Wile promotion quantity effects appear to be sligtly overestimated, more researc is needed on tis issue. Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 22

2 Introduction Computer simulated experiments ave gained increasing attention from marketing researcers interested in studying dynamic buying beavior. Recent years ave witnessed te development of software programs enabling consumers to engage in 'virtual sopping', tat is, to repeatedly purcase items from a computer screen reproducing store selves (te 'virtual store') under igly controlled conditions. Besides simulating te consumers' sopping beavior, tese software packages typically allow researcers to collect additional information from consumers troug computerized questionnaires. Computer simulated sopping programs ave been developed in various degrees of sopistication, and are continuously evolving (Burke et al. 1992, Burke 1995, Burke 1996, Coen and Gadd 1996). According to Burke (1996), tey offer a viable alternative to oter marketing researc tools currently employed: "...most marketing researc tecniques are, by and large, sadly outmoded. Te tools most marketers employ are too expensive, vulnerable to observation and manipulation by competitors, contrived and unrealistic, or simply incapable of providing te information managers really need...virtual sopping simulation can elp managers make tactical decisions in areas suc as new products and promotions, packaging and mercandising. Ultimately, te tool promises to cange te way companies innovate and ow tey approac a variety of strategic issues tat range from entering new markets to responding to a competitor's attack". Computer simulated sopping experiments could offer 'te best of two worlds' for modeling consumer purcases over time. As far as oter laboratory tests are concerned, Computer Simulated Sopping Experiments (CSSE) offer more realism (compared to rudimentary paper and pencil tests) or more flexibility at a muc lower cost (compared to simulated test markets in laboratory stores). In comparison wit real life test markets or even controlled field experiments, CSSE offer more control over extraneous variables and more flexibility in manipulating marketing variables and treatments, witout exposure to competitive reactions or adverse consumer or distributor effects, and tis at iger speed and lower cost (Burke 1996, Coen and Gadd 1996, Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 23

3 Brucks 1988). Besides tese advantages, CSSE ave some inerent limitations. Being an articicial environment, CSSE cannot be expected to provide a completely realistic picture of marketplace penomena. Te artificial nature of te setting entails potential biases tat sould be carefully examined, not in order to remove tem altogeter, but to get a better grasp on teir magnitude, implications, and underlying reasons. Tese insigts serve two crucial purposes. First, by pinpointing areas were caution is needed, tey prevent researcers and managers from making unjustified generalizations. Second, tey foster evolvement in te development of CSSEs by indicating experimental features tat may diminis certain biases. Previous researc on te issue is relatively scarce. An interesting study by Burke et al. in 1992, provides a first evaluation of te validity of CSSE for analyzing dynamic purcasing beavior. In a small scale study, te autors examine potential biases in CSSE results at different stages of te buying process, and ypotesize links wit experimental features. A recent article by Coen and Gadd (1996) provides furter insigts into caracteristics tat enance te realism and te attractiveness of virtual sopping programs to respondents. Te present paper intends to sed furter ligt on te issues of CSSE validation and design. It builds upon te work by Burke et al., and examines some additional validation issues. Previous Researc and Purpose of te Paper Wile a number of autors ave used computer simulations to generate dynamic purcasing data, researc on te validity of tese outcomes is scarce. Burke et al. (1992) report on a first systematic evaluation of te quality and realism of computer simulated multi-period buying data. In teir study, data obtained from a sample of 16 respondents participating in a rudimentary and a more realistic laboratory sopping experiment are compared to te respondents actual Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 24

4 purcases under identical marketing conditions. Burke et al. (1992) find tat teir more sopisticated computer experiment provides significantly better results tan te rudimentary test. In a recent HBR article, Burke (1996) briefly refers to an additional validation study of dynamic virtual sopping data. In tis study, 300 consumers were invited to make 6 sopping trips in a simulated computer store. Tese purcase data were compared to UPC scanner information obtained from a concurrently running conventional test market. Simple comparisons of consumers' reactions to package size and price manipulations in te virtual store wit conventional test market data demonstrated consumer responses to be similar in te virtual and actual store environment. Also, simulated market sares for various brands of te tested product categories (cleaning products and ealt and beauty aids) closely matced scanner market sares. Te autor concludes tat "te performance of te virtual store was particularly impressive given te number of ways in wic te real and computer generated stores differed", yet empasizes te need for "future bencmarking of virtual sopping results against existing metodologies". Clearly, despite tese appealing overall results, one sould remain aware of te fact tat CSSE are subject to external validity treats resulting from artificial experimental conditions. Artificial conditions may refer to te experimental task, stimuli, settings, manipulations or measurements (Lync 1982). To te extent tat unrealistic features in one of tese areas interact wit te treatment variables (e.g. recorded coice, purcase quantity or promotion responses), te generalizability of te experimental results may be treatened. Table 1 provides an overview of potentially unrealistic CSSE features, and indicates ow tey may reduce external validity. First, te task of making fictitious purcases in a virtual store may be experienced as unrealistic for several reasons. Te purcases bear no real consequences and require less effort (respondents do not ave to wander troug te store to searc for and/or compare alternatives placed at distant positions on te self). Respondents may terefore be willing to engage in more risky purcases (e.g. trying an unfamiliar brand), and /or in more active evaluation processes. As indicated by Burke et al. (1992), te extent to wic te fictitious Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 25

5 nature of te task affects purcase quantity and coice decisions depends on te type of products and buying beavior. For frequently purcased consumer goods, consumers tend to develop simple coice euristics - based on cues suc as brand, price and promotions - tat can be easily reproduced in computer simulations (Burke et al. 1992, p.72). Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 26

6 Table 1: External validity: Realism of virtual sopping UNREALISTIC ASPECTS OF CSSE POTENTIAL VALIDITY THREATS Experimental task Fictitious purcases Exploratory coice beavior, iger PQ * Absence of pysical displacement More extensive alternative evaluation Pre-imposed sopping frequency Bias in PQ/sopping occasion Reduced number of product categories More extensive alternative evaluation Reduced interpurcase time Stronger purcase event feedback Absence of consumption experiences Stimuli In-store marketing stimuli - Products: presentation and assort. - Price - Promotions Oter factors of purcase decisions - Time constraint - Budget constraint - Space constraint - Inventory - Settings Pysical store environment Store atmospere People: sopping company Manipulations Promotion frequency and conditions Oter in-store stimuli Bias in coice beavior Bias in price sensitivity Increased promotion sensitivity More extensive alternative evaluation Higer PQ, reduced price sensitivity Higer PQ Bias in PQ Rational, socially acceptable beavior Purcases less influenced by oters Bias in promotion effects Bias in effect of oter stimuli Measures Coice Less private label and generics purcases Purcase quantity and timing Higer purcase frequency and/or PQ * PQ = purcase quantity Anoter task-related issue is tat in dynamic CSSE, data on subsequent sopping occasions are often collected in te course of a single experimental session. Suc concentrated data collection is typically needed for practical reasons, or to reduce internal validity treats suc as istory, maturation and mortality effects. Yet, tis time compression may furter reduce te realism of Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 27

7 te purcasing task, and following Burke et al. (1992) - strengten purcase event feedback. Anoter reason wy CSSE purcase simulation tasks may be experienced as unrealistic, is tat some sopping decisions ave to be pre-imposed (e.g. a fixed sopping frequency, fixed and severely limited number of categories to make purcase decisions on). In cases were preimposed decision aspects deviate from te respondents abitual buying beavior and affect oter (measured) purcase decisions, te external validity of te results may be treatened (e.g. coice of one of two complementary products tat are usually selected in combination). Second, as demonstrated by Burke et al. s (1992) comparison of te rudimentary and realistic laboratory results, te realism of in-store marketing stimuli and te way tey are presented to te consumer (e.g. real versus fictitious brands, verbal versus visual announcements of promotions) may be crucial to te external validity of CSSE purcase data. Similar observations were made by Burke (1996) and Coen and Gadd (1996), wo empasize te importance of store familiarity and availability of sufficient product and promotion information. Likewise, te extent to wic real-life constraints and oter factors influencing actual buying decisions can be reproduced in a realistic way, may affect biases in quantity and coice beavior. For instance, Burke et al. (1992) attribute purcase quantity biases to te absence of realistic time, budget and space constraints. Tird, te settings in wic CSSE data are collected may ave an impact on simulated purcase beavior. Laboratory environments may be unrealistic in terms of pysical features (e.g. a sterile environment may reinforce te impression of being observed), atmospere (e.g. lack of a noisy and stimulating store atmospere), or company (e.g. absence of family or friends wo usually accompany te respondent on is/er sopping trips). From teir pilot study of a 3-D modeled virtual store, Coen and Gadd (1996) conclude, for example, tat te stylized nature of teir virtual store environment affects consumers purcase beavior because te degree of sensory stimulation is muc lower tan tat in a real store. Fourt, manipulations of treatment variables sould be in line wit real life conditions in order to Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 28

8 obtain realistic estimates of teir effect. For instance, researc on sales promotions effects as sown tat te strengt of te promotion response depends on te intrinsic attractiveness of te promotion advantage, but also on te way it is announced and on te frequency wit wic promotions ave been offered in te past (see e.g. Blattberg and Neslin 1990). Hence, a lower/iger tan usual promotion frequency could lead to an over-/underestimation of promotion effects. Finally, measures and measurement procedures may influence experimental results. Knowing tat teir buying beavior is observed and recorded may encourage respondents to exibit more rational or socially acceptable beavior, suc as buying iger priced alternatives rater tan ceap low quality brands (see below). It may also create a tendency to comply wit te interviewer s expectations or objectives, e.g., te respondent may feel forced to purcase more or more frequently to accommodate te interviewer. Te researc objective of tis paper is twofold. First, we intend to gater additional evidence on te type and magnitude of biases in CSSE outcomes. Te study of Burke et al. (1992) is a pilot study, comprising an in dept analysis of a small number of ouseolds, for wic actual and simulated purcases are 'matced'. Tis paper opts for a between-subjects design (Carson et al. 1996), as suggested by Burke (1996). More specifically, we collect computer simulated sopping data from a large sample of ouseolds, and confront te results wit information from an equally large scanner panel data set. Te magnitude of our sample base allows us to sed furter ligt on te external validity of CSSE outcomes, including metod-specific ouseold selection biases. Second, our study also offers some additional insigts, by examining te impact of design variables not explicitly studied so far. To assess te impact of simulation lengt and te completeness of te on-screen assortment, we investigate te consistency in purcase beavior over subsequent simulation periods, as well as te coice beavior of consumers wo do not find teir preferred items on screen. Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 29

9 Researc Hypoteses As stated earlier, a first objective of our researc is to sed more ligt on te degree of realism of consumer purcase information obtained in CSSE. Burke et al. s (1992) validation study indicated tat purcase quantities, coice sares, and coice dynamics more closely reflect actual buying beavior in realistic tan in rudimentary computer experiments, but can still be biased. Building on Burke et al. s teoretical reasoning and experimental results, ypoteses 1 to 4 specify expected biases in tese 3 purcase beavior caracteristics. All ypoteses focus on buying beavior for frequently purcased, non-impulse products, tat are typically cosen on te basis of cues (Burke et al. 1992). Hypoteses 1 to 3 investigate te external validity of experimental sopping trips disregarding promotional effects, tat is, wen promotions are eiter absent or separately accounted for (by removing teir effect on te test variables). In turn, ypoteses are formulated on te purcase quantity, brand coice, and purcase dynamics (degree of purcase variation) witin a product category. Hypotesis 4 relates to consumers' differential reactions to in-store promotions in a CSSE setting, were te effect of promotions on purcase quantity as well as brand coice is considered. Our second objective is to examine te impact of two additional experimental features. Hypoteses 5 and 6 concentrate on ow te number of sopping trips, and te 'completeness' of te CSSE assortment, affect consumers' quantity and coice decisions. Below, we briefly present te ypoteses and associated comments. Hypotesis 1: In te presence of inventory cues, te number of units purcased by consumers in a CSSE is in line wit actual purcases. Comments: Previous researc suggests tat laboratory sopping may lead to above normal purcase quantities, especially in te presence of promotions. Following Malotra (1996) consumers tend to increase te beavior being measured, suc as food purcasing. As demonstrated by Burke et al. s (1992) researc findings, te bias in purcase quantity decisions Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 30

10 can be substantially reduced by making te coice context more realistic. Altoug promotional purcases are still somewat overestimated in teir realistic experiment, Burke et al. argue tat te bias could be furter reduced by imposing additional constraints tat restrict purcase quantities in real life, suc as time, space and budget constraints. In line wit tese guidelines, Coen and Gadd (1996) imposed a fictitious budget constraint based on te ouseold s normal spending in te product category, to reduce quantity biases. Feedback from respondents revealed, toug, tat te product-specific budget constraint was experienced as too restrictive and igly unrealistic. In oter words, buyers do not appear to restrict purcase outlay for specific product categories. In line wit te buying beavior modeling literature, we believe tat purcase quantity decisions of frequently purcased, low cost products are rater restricted by andling and olding cost considerations ( storage constraints ). Empirical results confirm tat consumers typically rely on inventories available at ome to decide upon te purcase quantity and timing of frequently purcased consumer goods (see e.g. Sivakumar and Raj 1997; and Kumar, Karande and Reinartz 1998). Inventory cues may terefore act as realistic purcase quantity constraint mecanisms. Hypotesis 2: In te context of CSSE, consumers tend to buy more national brands, and less private labels and generics, tan in reality. Comments: Tis ypotesis is in line wit previous researc on te presence of response bias in interviews, and te social desirability to purcase national brands, rater tan private labels or generics (Burke et al. 1992). In an experimental setting, consumers want to look good, or give te rigt answer (Malotra 1996). To te extent tat distributor brands are perceived to be ceaper and of lower quality, respondents may be reluctant to admit buying tese brands wen tey know tey are being observed. Over-reporting of National Brand purcases - as opposed to private label purcases - is a well known problem in survey researc, and as already been referred to by Wind and Lerner (1979) and Sudman (1964b). Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 31

11 Hypotesis 3: In te context of CSSE, consumers seek less variation in coice beavior tan in reality. Comments: Burke et al. (1992) ypotesize tat te time compression factor increases te impact of past purcases on current purcases. Tey also argue tat satiation is less likely to occur since respondents do not actually consume products (see e.g. McAlister and Pessemier 1982). Teir conclusion is tat time compression decreases switcing, and reduces te number of different brands bougt in te category, wic was confirmed by te empirical results. Hypotesis 4: Consumers react more strongly to sales promotion offers in CSSE tan in reality. Comments: Previous researc suggests tat consumers react more strongly to promotions in a laboratory setting because of iger promotion visibility (Gabor et al. 1970, Nevin 1974, Burke et al. 1992, Nagle and Holden 1995). Compared to te actual, large supermarket selves, products and promotions are conveniently arranged on a small computer screen, placed in front of te respondent. In te same way as products at eye level attract more attention tan products placed lower or iger on te self, CSSE promotions may be more clearly visible and may more easily attract respondents attention tan actual promotions (Corstjens and Corstjens 1995). CSSE promotions are terefore expected to exert a stronger influence on consumers buying decisions tan in te actual store environment. Burke et al. s study pointed to overly strong reactions to promotions in te rudimentary as well as te realistic experiment Hypotesis 5: Consumers do not systematically cange teir purcase beavior (quantity and coice decisions) over subsequent simulation periods of a CSSE. Comments: A crucial question in analyzing dynamic beavior in a laboratory setting concerns te number of successive sopping trips tat can be presented to te consumer. Using CSSE to study dynamic buying beavior is conditional upon consumers being able and willing to exibit realistic buying patterns over subsequent periods. In a typical interview setting, potential treats to validity of information collected over time are testing and maturation effects (Cook and Campbell 1979, Malotra 1996). Testing effects may occur if respondents first ave to become Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 32

12 accustomed wit te interviewing procedure (in te case of CSSE, te purcase simulation procedure). In suc cases, information on early purcases is unusable. Tis type of initialization effect can be circumvented, toug, by providing clear instructions and familiarizing respondents wit te procedure beforeand (Coen and Gad 1996). Maturation effects, leading to deterioration of information quality towards te end of te interview, can be a result of boredom, fatigue or a lack of willingness to spend more time (Malotra 1996, Carson et al. 1996). Like for oter interview types, te risk of maturation effects in CSSE can be substantially reduced by keeping interview lengt and tus te number of successive purcase occasions witin reasonable limits. Hypotesis 6: Te absence of preferred items on te computer screen does not affect te quantity purcased in te product category, but leads to a coice bias in favor of major national brands. Comments: Hypoteses 1 to 5 are expected to old for consumers wo find teir regularly purcased products on te computer screen, and are terefore able to realistically reproduce teir real life purcasing patterns (Coen and Gadd 1996). In many CSSE, it will be practically infeasible to include all te items in te category in te virtual store. Since consumers do not ave te possibility to sop elsewere in a CSSE, we do not expect assortment reductions to affect purcase quantity. In terms of coice, te consumer beavior literature indicates tat buyers wo are uncertain about teir preferences for te (less known) available alternatives can be expected to make a coice on te basis of cues suc as brand name and price (see e.g. Kapferer 1994, Kan and McAlister 1997). In line wit tis researc and wit Hypotesis 2, we expect consumers wo cannot find teir regular product on te self to switc to a 'well known and socially acceptable alternative rater tan to a (lower quality) distributor brand. Description of te CSSE experiment Te CSSE used ere is comparable, in degree of sopistication, to te 'realistic' experiment described in Burke et al (1992). Te discussion tat follows describes te major features of te Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 33

13 CSSE experiment and te actions taken to enance te realism and external validity of te results. Experimental task and stimuli Te experiment involved two product categories: jam for wic ypoteses will be tested and a second category (paper towel or margarine depending on use rates) to make purcase simulations more realistic and reduce time compression effects (successive purcases in one category only, could strongly increase purcase feedback effects). Te product categories selected contain a 'reasonable' number of items in teir assortment, are regularly promoted, and are frequently purcased by a majority of consumers. Given te ig purcase frequencies, consumers can be expected to be fairly familiar wit items in te categories. Items in te product classes are typically cosen on te basis of cues tat can be visualized, suc as brand, pack, price, promotion, and product information. Using scanned pictures of real products, actual self layouts were imitated on te screen. To make te task as realistic as possible, te actual coice context (self layout, prices) of a large supermarket cain was reproduced in te computer simulation. Of te actual store assortment, a limited assortment of 22 items was selected for te jam category, including te most popular product variants (tastes) and major national brands, in addition to distributor brands and generics. Product prices were displayed on a price bar below te self, but remained at te same level trougout te purcase simulation. Respondents could zoom in on an item, retrieve product information printed on te (actual) product package, or select one or several items for purcase from te fictitious self. Eac respondent made purcase decisions over twelve successive weeks. In te task instructions preceding te purcase simulation, respondents were asked to imitate teir real sopping beavior as accurately as possible. For instance, tey were asked to mimic teir true purcase frequency (tey did not ave to make a purcase every week) and coice pattern (tey did not always ave to buy te same item wen tey frequently switced among different items). Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 34

14 Additional cues displayed at te bottom of te screen were a time indicator (simulation week) and te ouseold inventory level. Te latter was updated weekly using information on te ouseold's reported average consumption rate, and previous purcase decisions. As indicated above, information on te ouseold inventory sould make purcase quantity decisions more realistic. In addition, te fact tat interviews were conducted in store (see below) implies tat some time constraint is active. Indeed, if customers are intercepted in te course of a normal sopping trip and are asked to perform teir experimental purcases on tat occasion, tese experimental purcases will take place under similar time pressure as teir normal sopping activities. More realistic time constraints may also elp to reduce purcase quantity biases (Burke et al. 1992). Settings In te actual data collection stage, soppers were intercepted and interviewed in five different outlets of te supermarket cain tat ad served as a model for te selves presented in te CSSE. Tis guaranteed consumer familiarity wit te virtual store selves, and ensured tat consumers found temselves in a noisy, stimulating, natural environment. As indicated earlier, previous researc revealed tat tese interview conditions are particularly important in creating a realistic sopping atmospere (Coen and Gadd, 1996). Manipulations In te course of teir sopping trips, consumers were confronted wit in-store promotions of different types (direct price cuts, coupons, premiums, and quantity discounts). Promotions were offered for two test items in eac category, and were randomly assigned to periods and consumers. No promotions occurred in te first two periods, and te promotion frequency was in line wit te normal occurrence of in-store promotions in te category. Promotions were visualized by means of realistic on-pack labels in combination wit colored self tags. Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 35

15 Measures Information on te ouseold s consumption rate and oter caracteristics suc as variety seeking tendency and socio-demograpics was collected by means of a computer questionnaire preceding te purcase simulations. At te same time, te questionnaire allowed respondents to become familiar wit te virtual store s self layout and assortment, tereby reducing te risk of initialization effects (te CSSE self was displayed several times, for example, wen consumers were asked to indicate teir average use rate in te category). During te purcase simulations, item coice(s), purcase incidence and quantity were recorded. Unless te respondent explicitly requested elp, te questionnaire was self-administered, so as to minimize interviewer bias. Yet, te continuous presence of two interviewers (see Metodology section), and te fact tat respondents knew teir purcase decisions were recorded, implies tat measurement biases may not ave been completely eliminated. Metodology Store managers of te interview stores agreed to make space available near te ceckout counters to install tables wit PC s. Tis was possible as interviews were conducted in spacious ypermarkets, in wic wide aisles were available for special events or display activities. Respondents were intercepted eiter before or after tey completed teir sopping activities. Tose willing to co-operate were lead to one of te PC s and given clear instructions on ow to complete te interviews in a self-administered fasion. Two interviewers remained permanently available to supervise respondent activities and - if necessary - provide assistance. A special video-corner was installed near te tables to entertain te cildren accompanying respondents during te interview. In total, 506 respondents were interviewed, yielding some 3700 purcase occasions (i.e., simulation weeks in wic at least one item was bougt). To validate our ypoteses, we confronted te CSSE outcomes wit results obtained wit actual scanner panel data. Note tat tese data are widely used in practice to establis purcase levels, Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 36

16 market sares and promotional reactions. Data were available for 2 outlets of te considered retail cain, over 16 subsequent weeks, for all store customers possessing a cain loyalty card. In addition, information was collected on various types of in-store promotions during te period for wic scanner panel data were available. In comparing te experimental wit te scanner panel data, we corrected as muc as possible for 'systematic' sample or 'selection' differences, and for differences in in-store conditions. Matcing Samples and Store Conditions Inabitants in te trading area of te outlets from wic scanner panel data were available, ave a socio-demograpic profile tat closely matces tat of te outlets were CSSE interviews were conducted. In te scanner panel data set, we retain consumers wo purcase at least one pack in te category per mont: tis minimum use rate is similar to te one used for screening CSSE respondents. For te jam category, tis leaves us wit information on 486 ouseolds in te experimental data set, and 283 ouseolds from te scanner panel data. Wile te scanner panel sample is based on consumers olding a cain loyalty card, te experimental sample is drawn from all consumers in te store. Te implication of tis difference in sampling frame is expected to be limited, as loyalty card penetration is considerable (cain surveys reveal tat about 95% of store visitors are card olders), and does not imply exclusive patronage of tat cain (many soppers ave loyalty cards for different cains). Yet, te risk tat card oldersip induces a bias in beavior for te scanner panel sample cannot be completely ruled out. Oter potential differences between te experiment and real life setting concern te frequency wit wic consumers are confronted wit te selves (weekly visits in te experiment), and te widt of te assortment presented (experiment comprises only subset of items). Information on scanner panel members store visit frequency, and on weter CSSE consumers find teir regularly purcased products on te computer selves, was collected in te initial questionnaire. Tis information will be elpful in detecting potential biases in beavior arising from experimental store visit and assortment conditions (for instance, by examining differences in buying beavior between (i) consumers wo visit te store on a weekly basis and tose wo ave substantially lower or Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 37

17 iger sopping frequencies, and (ii) between consumers wo did or did not find teir favorite product on te virtual store self). As outlined above, te artificial selves presented on screen comprise only a subset of te items sold in te cain's outlets (22 out of 57 items for jam). Te self layout is a reproduction of real store planograms for te category. Wile regular price levels in our experiment matc tose in te scanner panel data set, promotional conditions are somewat different between te two data sets (see Hypotesis Tests section). Purcase Quantity and Coice Models As indicated above, Hypoteses 1 to 3 investigate te validity of CSSE purcase decisions in te absence of promotions. Promotion effects on purcase quantity and coice decisions sould terefore be removed in order to test tese ypoteses. Wit tis purpose, we estimate for eac data base: (i) a purcase quantity model, linking te number of units bougt by a consumer in a given category and week, to ouseold and marketing variables, and (ii) a stocastic coice model, explaining te coice probability of any item in te assortment, given a category purcase, as a function of item caracteristics, previous coice decisions, and promotion variables. Purcase quantity probabilities per consumer are modeled as a Poisson-based process (see e.g. Bucklin and Gupta 1992, Dillon and Gupta 1996), ouseold coice probabilities as a Multinomial Logit model (see e.g. Guadagni and Little 1983, Gupta 1988, Bucklin and Gupta 1992). Model specifications can be found in Appendix 1 and Appendix 2 respectively. Using te parameter estimates, purcase quantity and coice probabilities under a no-promotion condition can be computed by setting all promotion variables equal to zero (see Hypoteses Tests section). Comparison of tese values for te scanner panel and CSSE data sets allows to test Hypoteses 1 to 3. In addition, estimated promotion parameters can provide insigt in to te realism of CSSE promotion effects on purcase quantity and coice decisions (Hypotesis 4). Tests on te stability of purcase quantity and coice beavior in te CSSE (Hypotesis 5) are Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 38

18 carried out by introducing weekly parameters into te coice and quantity models for te experimental data set (i.e., for eac simulation week, a dummy variable is incorporated into te model specification; see Appendix 1 and Appendix 2). Wen respondents systematically cange teir buying beavior over subsequent simulation periods, at least some of te week parameters sould be significantly different from zero. Positive and significant parameters for te last weeks in te purcase quantity model would, for instance, indicate a tendency to increase purcase quantity towards te end of te purcase simulation. Hypotesis 6, finally, is tested by comparing weekly purcase quantities and coice sares of national and store brands between (1) te subsample of respondents wose favorite item is available in te CSSE assortment, and (2) te subsample of respondents wo did not find teir most preferred item on te virtual store self. Since not all consumers face te same CSSE promotions, comparisons are carried out for computed quantities and coice sares under te nopromotion condition. Estimation Results Te purcase quantity model is estimated using maximum likeliood procedures. Estimation for te experimental data set is based on 11 weeks, week 1 being an initialization week for te lagged promotion effects. For te scanner panel data, weeks 1 to 4 serve as initialization weeks over wic te ouseold's use rate in te category is calculated, and model estimation is based on purcases from weeks 5 to 16. Bot models provide a satisfactory fit, wit Mean Absolute Percentage Errors (MAPE) for weekly purcase quantities of.059 in te experimental data set, and of.099 in te scanner data set. Te estimation results - wic can be obtained from te autors on request - reveal a igly significant positive effect for use rate, and negative impact for inventory level. Estimated promotion coefficients are eiter insignificant, or ave te expected sign: a more detailed discussion of promotion effects is provided wen discussing Hypotesis 4. Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 39

19 Coice models are estimated by maximum likeliood procedures, using information from weeks 5 to 12 for te experimental data set, and from weeks 5 to 16 for te scanner data set. Weeks 1 to 4 again serve as initialization weeks, to compute te ouseold s intrinsic preferences (loyalty variables) and variety seeking tendency. Goodness of fit appears to be satisfactory for bot data sets, wit MAPE in brand sares of.080 and.091 for te experimental and scanning data set, respectively. Estimation results furter reveal tat most coefficients are significant and ave te expected sign. To test weter te MNL model s IIA assumption is justified, a procedure suggested by Fader and Hardie (1996) was applied. Results demonstrate tat bot data sets comply wit te IIA assumption. Hypotesis Tests Hypotesis 1 In te presence of inventory cues, te number of units purcased by consumers in a CSSE is in line wit actual purcases. To ceck weter experimental purcases matc wat consumers buy in real life settings, we compare te weekly experimental purcase rates wit tose obtained from te scanner panel data set. Week 1 is left out of tis analysis as it is considered exceptional (starting inventory is zero for all consumers), and since te purcase quantity model is estimated for weeks 2 to 12. Next to observed experimental purcases, wic may be affected by promotional activities in te experiment, we calculate expected weekly purcase quantities per consumer in te absence of promotions. Tese expected quantities are obtained from te purcase quantity model troug dynamic simulation, taking into account tat consumers wo buy less in a given week wen te promotion is absent, will end up wit a lower inventory level at te beginning of te following week, and ence are more likely to buy larger quantities ten. For purposes of comparison, we also include expected quantities from te Poisson model wen promotions are present. Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 40

20 Figure 1. Frequency Distribution of Weekly Purcase Quantities in Experiment and Scanner Data Set frequency ,25 0,33 0, w eekly purcase quantity exper exp(nopr) scanner scan(nopr) Exper : observed weekly purcase quantities, experiment Exp(nopr) : simulated weekly purcase quantities, experiment, no promotion Scanner : observed weekly purcase quantities, scanner data Scanner(nopr): simulated weekly purcase quantities, scanner data, no promotion Figure 1 sows a istogram of weekly purcases per consumer as tey are (i) observed in te experiment, (ii) simulated in te experiment under te no-promotion condition, (ii) observed from te scanner panel, and (iv) simulated for te scanner panel in te absence of promotions. Note tat in te scanner panel data, eliminating te impact of promotions leads to a more pronounced sift towards lower quantities 1 tan for te experimental data. Tis follows from te frequency as well as from te nature of te promotional actions: te scanner panel data are caracterized by somewat more frequent promotions tat, in contrast wit tose in te experimental setting, may encompass various tastes of te promoted brand simultaneously and are featured in retailer 1 Tis sift is especially apparent at te lower end of te grap. A tentative explanation for tis penomenon is tat promotional purcase quantity sifts are largely due to purcase acceleration (see e.g. Gupta, 1988), implying sort term peaks and post-promotion dips in category purcases. For eavy users, wo consume at least one pack per week and are likely to buy te product on a weekly basis, almost all post promotion dips are likely to occur witin te sixteen-week period, and to compensate for te immediate sales peaks. For ligt users (weekly consumption below one pack), post promotion dips will to a considerable extent be observed after te observation period, and not yet intervene in te simulated quantities for te sixteen-week period. Te necessary downward correction witin te observation period is terefore larger for ligt users, at te low end of te grap. Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 41

21 leaflets. From Figure 1, we observe tat te experiment as a iger frequency for larger weekly quantities per consumer (1 or 2 units per week) compared to te scanning data set. Te proportion of buyers adopting only 1 unit every 3 weeks (use rate.33) or every 4 weeks (use rate.25) is more important in te scanner panel tan among CSSE consumers. A non-parametric test on te difference between te quantity distributions indicates tat te deviations are significant (Man-Witney z=-8.765; p=0.00). Table 2, wic summarizes means and standard deviations for te various weekly purcasing rates, provides furter support for tis finding. We terefore ave to reject our ypotesis tat experimental purcase rates coincide wit real life purcase frequencies. Despite te presence of te inventory cue (wic acts as an implicit space constraint), we find tat experimental purcase quantities systematically exceed scanner purcases. Table 2. Means (Standard Deviations) of weekly purcases in te experiment and scanner data set Experiment Scanning Observed.82 (.59).53 (.35) Simulated, wit promotions.79 (.55).56 (.40) Simulated, witout promotions.76 (.54).48 (.34) Tese differences are not likely to result from card oldersip bias: ceteris paribus, one would expect soppers possessing a loyalty card (and from wic te scanner sample was drawn) to purcase at least as muc in te cain as non-card olders. Furter analysis revealed tat purcase quantity deviations cannot be attributed to differences in sopping frequency between experimental and scanner panel consumers. An interesting finding, toug, is tat witin te experimental setting, consumers purcases coincide wit self-reported use rates. Figure 2 provides a istogram of consumers self-reported use rates, and observed as well as simulated experimental purcase quantities. A Wilcoxon Matced-Pairs Signed-Ranks test confirms tat Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 42

22 reported and observed experimental use rates ave te same distribution (te test-value is equal to -.575, p=.565). Having specified a use rate, consumers tend to engage in experimental purcases consistent wit tis estimate. Tey are encouraged to do so since teir inventory level trougout te experiment is based on te use rate estimate. Tis finding offers a tentative explanation for wy Hypotesis 1 fails. If consumers overestimate teir use rate, tis will be translated into exaggerated experimental purcase quantities. Earlier studies ave indeed sown tat in an interview setting, respondents ave a tendency to overstate purcases (Sudman 1964, Wind and Lerner 1979). Under tese assumptions, a meaningful approac for future CSSE would be to estimate respondents use rates on te basis of teir past observed purcase beavior, and to rely upon tis information for experimental inventory updates instead of upon teir selfreported use rates. Figure 2. Frequency Distribution of Reported and Observed Purcase Quantities (experiment) frequency ,25 0,33 0, w eekly purcase quantity reported exper exp(nopr) exp(pro) Reported Exper Exp(nopr) : reported purcase quantity per week : observed purcase quantity per week (experiment) : simulated purcase quantity, no promotion Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 43

23 Hypotesis 2 In te context of CSSE, consumers tend to buy more national brands, and less private labels and generics, tan in reality. In comparing market sare levels across data sets, we ave to account for differences in promotional conditions and assortments: two minor national brands of te scanner panel store s assortment are not included in te experiment, wile for oter national brands and te private label some taste variants are missing. To make te results comparable, we terefore predict market sares for bot data sets in te absence of promotions, and rescale te (predicted) scanner data market sares of items included in te experiment so tat tey sum to one. Figure 3 provides a summary of tese calculations. It reports, for te brands and items included in te experiment: (i) te observed market sares in te experimental data set, (ii) predicted sares in te experiment under te no-promotion condition, (iii) observed scanner data sares of items available in te experiment (rescaled so tat tey sum to one), and (iv) predicted scanner data sares in te absence of promotions, for te items available in te experiment, rescaled to sum to one. Note tat correcting for promotions leads to a marked decline in sare for NB1: tis brand is regularly promoted in bot te experimental and te scanning data set. NB2 is never on promotion in te CSSE, and subject to occasional promotional activities in te scanning data set. Tis explains wy it benefits from promotional corrections in te CSSE data (by capturing some of NB1 s sare), wile tose corrections do not make a difference in te scanning data set (own promotional sare lost, but compensated by absence of promotions for NB1). A similar finding olds for te private label, wic is promoted in bot data sets, but muc less eavily or effectively so tan NB1. Te Generic, finally, is never on deal, and benefits from te elimination of promotion effects by capturing part of NB1 s sare. Turning to Hypotesis 2, some interesting patterns emerge from Figure 3. For two of te national brands (NB1 and NB3), experimental sares significantly and systematically exceed true sares Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 44

24 (based on a t-test of te significance between two estimated proportions). Tis finding is in line wit our ypotesis. For te generic product, observed experimental sare is significantly below observed scanning sare, and predicted experimental sare witout promotions is substantially below predicted scanner sare. Tis observation, too, is consistent wit our expectations. Yet, te findings for te Private Label do not coincide wit te ypotesized pattern. After correcting for promotional differences, Private Label sare in te experiment is comparable to te market portion it captures in te scanner panel data set. Finally, experimental sare for one National Brand NB2 is in line wit te rescaled scanning data sare. Figure 3. Experimental versus Scanning Market Sares for Items Available in te Experiment NB1 NB2 NB3 PL Gen exp exp(nopr) scan scan(nopro) Exp Exp(nopr) Scan Scan(nopr) : observed market sare, experiment : simulated market sare, experiment, no promotion : observed market sare (rescaled), scanner data : simulated market sare (rescaled), scanner data, no promotion We conclude from tis discussion tat te results provide partial support for Hypotesis 2. Like Burke et al. (1992), we find tat coice sares tend to be overestimated for national brands, and tat te experimental data set systematically underestimates te true sare of generic products. Yet, no suc underestimation is found for te private label. A plausible explanation is, tat customers of te analyzed supermarket cain ave come to perceive private labels as valuable and Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 45

25 socially acceptable alternatives, in contrast to generics wic are still attributed a negative sign value. As suggested in a recent article by Bronnenberg and Watieu (1996), te quality gap between national brands and private labels appears to be closing. Dar and Hoc s (1997) study furter demonstrates tat te performance of store brands varies widely among retailers and product categories, depending on factors suc as te quality and breadt of private label offerings, te retailer s overall pricing strategy, and degree of promotional support for private labels. Te realism of private labels coice sares in CSSE s may terefore depend on tese factors and associated buyer perceptions, yielding biased estimates for some retail cains and/or product categories but not for oters. Hypotesis 3 In te context of CSSE, consumers seek less variation in coice beavior tan in reality. To test for differences in purcase variation between experimental and real life settings, we compute te level of entropy in te purcase istory of CSSE and scanner panel ouseolds. Entropy is a widely accepted measure of purcase variation, and indicates weter a ouseold tends to concentrate purcases in a single item (entropy equal to 0) or to spread purcases over several items (entropy>0; te measure equals 1 at maximum level of variation) (see e.g. Van Trijp and Steenkamp, 1990). Since jam items differ in brand and taste, bot attributes are accounted for Table 3. Mean (Standard Deviation) of Entropy in te experiment and scanner data set Experiment Scanning absolute entropy 1.18 (.45) 1.00 (.39) relative entropy.57 (23).54 (.21) in our entropy computations. Table 3 reports te average entropy measure, and its standard deviation, across ouseolds in Journal of Empirical Generalisations in Marketing Science, Volume Four, 1999 Page 46