Local Preference Minorities and the Internet: Why e-retailer Demand Is Greater in Areas Where Target Customers Are in the Minority


Marketing Science Institute Special Report

Local Preference Minorities and the Internet: Why e-retailer Demand Is Greater in Areas Where Target Customers Are in the Minority

Jeonghye Choi and David R. Bell

Copyright 2008 Jeonghye Choi and David R. Bell

MSI special reports are in draft form and are distributed online only for the benefit of MSI corporate and academic members. Reports are not to be reproduced or published, in any form or by any means, electronic or mechanical, without written permission.

Local Preference Minorities and the Internet: Why e-retailer Demand is Greater in Areas Where Target Customers are in the Minority

Jeonghye Choi and David R. Bell

December 5, 2008

Jeonghye Choi is a Doctoral Candidate at the Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA (jeonghye@wharton.upenn.edu, Tel: ) and David R. Bell is an Associate Professor at the Wharton School, University of Pennsylvania, 3730 Walnut Street, Philadelphia, PA (davidb@wharton.upenn.edu, Tel: ). We thank seminar participants at the 2008 INFORMS Conference (Washington, DC) and the 2008 Erin Anderson Invitational Conference for feedback. Eric Bradlow and Christophe Van den Bulte provided detailed comments. Diapers.com generously provided the data. We are very grateful to Ross Rizley, MSI Research Director, and two anonymous MSI reviewers for constructive feedback and suggestions.

Abstract

Local stores face trading area and retail space constraints, so the products they offer tend to cater to the tastes of the local majority. Consumers whose preferences are dissimilar to the majority in the trading area (preference minorities) may be under-served. Using sales data from the leading online retailer for baby diapers (Diapers.com), we examine whether the minority status of the target customers in a particular location affects online sales. We hypothesize that, holding the absolute size of the target group constant, online sales will be higher in regions where this target group is the preference minority. We further conjecture that online sales of niche products, relative to popular products, will be even more responsive to preference minority status. Finally, we show that these two hypotheses imply that niche products in the tail of the Long Tail sales distribution will draw a greater proportion of their total sales from high preference minority regions. The data support both hypotheses and the Long Tail result. Implications for retailing theory and practice are discussed.

Keywords: Internet Retailing; Long Tail; Preference Minority

Local stores face trading area and retail space constraints, so the products they offer tend to cater to the tastes of the local majority. Consumers whose preferences are dissimilar to the majority in the trading area (preference minorities) are likely to be under-served or, in a worst-case scenario, neglected by local retailers altogether. In this paper, we examine online demand from preference minorities and explain why Internet retailers draw more sales from regions that contain them, holding the absolute number of target customers per region constant. We further show how this effect is amplified for niche products (relative to popular products), and why niche products in the tail of the Long Tail sales distribution draw a greater proportion of their total online demand from high preference minority regions. Thus, we address why and how region-level demand and aggregate demand emerge at Internet retailers through a process of virtual, rather than physical, agglomeration of customers.

To describe the key idea and the contribution of this paper clearly, we focus on the following example. Imagine a local area in which the elderly are the majority of the population. Since retailer stocking rules take local preferences into account (see Farris, Olver, and de Kluyver 1989), young parents with newborns living in this area might not find a full assortment of baby products in the local market. That is, they assume the status of preference minorities. Local stores may still allocate some shelf space to baby products, but if they do, the brands and variety offered will be limited. Therefore, the local market characteristic of a prevalent elderly population puts the young parents at a relative disadvantage when it comes to shopping for items for their newborns. This relative disadvantage is exacerbated when even more narrowly defined preferences are taken into account. Further suppose the newborn is sensitive to chlorine (a harsh chemical sometimes used in the manufacture of diapers), and the parents are advised to use chlorine-free diapers. If the baby in question is among a small minority in the

local neighborhood who suffers from extreme chlorine sensitivity, the parents will again find it difficult to purchase such niche products in local stores.

Given that shelf space at local offline stores is fixed, the type of available products is determined by the mix of local customers' preferences (see Waldfogel 2007). Profit-maximizing firms allocate space according to the Pareto or 80/20 rule (Chen et al. 1999; Farris, Olver, and de Kluyver 1989; Reibstein and Farris 1995), and a product is made available locally only if it is wanted by a sufficient number of local neighbors; preference minorities with atypical needs are thereby implicitly harmed. This "push and pull" aspect of distribution decisions is described by Farris, Olver, and de Kluyver (1989), who note that "Retail buyers favor products that provide the greatest returns to the shelf space and the merchandizing resources allotted them" (p. 109).

The goal of this research is to study how an Internet retail alternative helps local preference minorities, and what this implies for the Internet retailer's sales distribution across products and markets. Two hypotheses for preference minorities, concerning category needs and brand-specific preferences, are developed and tested using online sales data. These two hypotheses imply a conjecture regarding the Long Tail sales distribution (Anderson 2006), which we also develop and test.

Our data come from the leading online baby products retailer, Diapers.com (see the Data and Measures section for details). As the largest U.S. online retailer carrying baby products, Diapers.com provides an excellent setting for measuring differences across regions in online demand for diapers overall, and for specific brands. The focal product category for our analysis (diapers) has several features that make it well-suited for our study. First, per-capita consumption of diapers is relatively constant, and total consumption in a particular location is tied to the number of babies living there. Second, Diapers.com carries leading national brands

(Pampers, Huggies, and Luvs) and a leading niche brand (Seventh Generation) that has selective distribution in offline markets, i.e., it is not available in every offline supermarket. Third, we determine exactly which offline stores in which locations carry this niche brand, in order to directly control for region-level variation in access to popular and niche products.

There is no absolute standard for defining "minor preferences"; hence, we define them by looking at the relative size of the focal sub-group in a local area. To this end, we construct a preference minority index (hereafter, the PM Index) using the following proportion defined at the local market level: [1 - (Target Population / Total Population)] (see also Forman, Ghose, and Wiesenfeld 2008; Goolsbee and Klenow 2002; Sinai and Waldfogel 2004). The PM Index reflects a key assumption of our analysis: The amount of local product variety available offline to the focal group depends on the relative size of the focal group. [1] To vary the PM Index across markets, while holding the focal population constant, we define three bins (terciles) of local markets. In each bin, local markets are homogeneous in terms of focal population size (i.e., category sales potential), but are substantially different in terms of total population and local retailing environment. [2]

Footnote 1: Of course, it also depends on the absolute size of the focal group. In larger markets, i.e., with more members of the target group, there are more retail formats to address their needs (see Christaller 1933).

Footnote 2: We also estimate the model using all the local markets across three terciles and obtain qualitatively identical results (available upon request). Further justification for the separate analysis is given subsequently.

We make three substantive contributions to the literature on Internet retailing. First, we hypothesize and demonstrate that sales substitution, from offline retailers to online retailers, increases across local markets as their PM Indices increase, i.e., as the relative size of the focal group decreases. Holding the characteristics of the local environment (and the total size of the focal group) constant, online sales are higher in markets where the focal group is more of a preference minority. On average, online sales in markets at the 90th percentile of the PM Index

are roughly 50% higher than those in markets at the 10th percentile, even though both markets contain the same number of potential customers.

Second, we hypothesize and find an interaction with the types of products sold. By definition, total niche product sales are lower than total popular product sales; however, local online sales of niche products respond more strongly to the presence of preference minorities than local online sales of popular products do. A move from the 10th to the 90th percentile PM Index markets increases local online sales of popular products by about 30%, but increases local online sales of specialty products by more than 150%, even though both markets contain the same number of potential customers. General dissatisfaction with offline options that affects preference minorities might be particularly acute for those with specialized tastes. Preference minorities sort into online alternatives to alleviate their limited local options; this online agglomeration intensifies for members of the preference minority who favor niche products.

Third, we show formally that our two hypotheses have an important implication for the Long Tail sales distribution. Popular products and niche products both have more online-offline substitution in local markets with a higher PM Index. However, the breakdown of online sales for popular and niche products is different. Niche products with a lower overall sales rank (i.e., products in the tail of the Long Tail) draw a greater proportion of their total online demand from high PM regions than popular products do.

The remainder of the paper is organized as follows. The next section summarizes the key ideas from the extant literature, describes the hypotheses, and motivates our research. The section immediately following describes the data and measures for the empirical work. We subsequently specify the empirical model and report the findings. The paper concludes with a discussion of the implications for retailing theory and practice, and for future research.

BACKGROUND AND HYPOTHESES

We propose a new driver of online-offline sales substitution and develop two hypotheses centering on this driver. In particular, location-specific under-provision of product variety for a relevant target population has an overall positive effect on local online sales (H1); this effect is stronger for niche products than for popular products (H2). The combination of these two effects should lead to a specific pattern of sales in the Long Tail: Niche products draw a greater proportion of their total online demand from high PM regions than popular products do.

Online-Offline Substitution

Studies of online-offline sales substitution address several consumer-related benefits from using online retailers, including lower prices (Brynjolfsson and Smith 2000; Chiou 2005; Goolsbee 2000), greater convenience (Balasubramanian, Konana, and Menon 2003; Cairncross 1997; Keeney 1999), and greater product variety (Brynjolfsson, Hu, and Raman 2008; Ghose, Smith, and Telang 2006). Among these factors, price has received the most attention. Brynjolfsson and Smith (2000) and Goolsbee (2000), for example, demonstrate that consumers switch from local retailers to online retailers for lower prices, and to avoid local sales tax, respectively. Anderson et al. (2008) find that when retailers open physical stores in a location, and thereby acquire a nexus for tax purposes, Internet sales at that location suffer (since the firm now has to charge sales tax on any Internet sales in that location). Forman, Ghose, and Goldfarb (2008) find that convenience, as measured by reduced travel cost, has a large influence on online-offline substitution. Choi, Bell, and Lodish (2008) find corroborating evidence for the importance of convenience. In their data, the sales effect of increased convenience through faster

shipping to a particular location is stronger than the effect of tax arbitrage (measured by the difference between the Internet tax rate when no nexus exists, zero, and the offline sales tax percentage). Collectively, these studies imply that the value proposition of Internet retailers to potential customers is determined by where those customers live, which in turn directly reflects the options and constraints they face in the local offline market.

Local Preference Minorities and Distribution Theory

The Concept of Compromised Demand. While our study focuses on the relatively new context of Internet retailing, some of the key ideas relate back to classic findings developed in the distribution channels literature prior to the introduction of the Internet. Farris, Olver, and de Kluyver (1989) and Reibstein and Farris (1995) note that not all consumers can find their first choice brands in all local stores. While market leader brands tend to be stocked in all stores in a local market (i.e., discount stores, supermarkets, and convenience stores), niche brands with less turnover tend to be stocked only in local stores with considerable shelf space. A shopper looking for a niche brand, but shopping in a convenience store, might be forced to buy the popular brand, as it is the only brand stocked in the category. In this case, the popular brand sales at the convenience store represent compromised demand (see Farris, Olver, and de Kluyver 1989, p. 114, Figure 4). By analogy, preference minority shoppers in local markets may also be subject to compromised demand effects; however, these effects can now be alleviated via the Internet.

To understand why not all brands are distributed through all local stores, consider that offline retailers face two constraints. The first is the constraint of fixed retail space, i.e., shelf space and inventory space. Due to the high fixed costs of product provision in retail stores, adding a new product means dropping another product from the shelf (Farris, Olver, and de Kluyver

1989). The second constraint, which interacts with the first, is the set of heterogeneous preferences of the local customers who make up the retailer's limited local trading area. Rational shelf space allocation decisions dictate that not all product categories, or brands within a category, make the cut (Farris, Olver, and de Kluyver 1989). The decisions are subject to the preferences of the local majority (Waldfogel 2007) or consumer pull (Anderson 1979). The combination of these two factors (fixed shelf space and heterogeneous preferences in a geographically limited trading area) leads local retailers to stock products that appeal to sufficiently large customer groups, and therefore generate sufficient product. In this sense a member of the preference minority in a local area (such as our family with a newborn in the Introduction) is like a customer who is forced to shop at a convenience store and therefore faces limited or negligible assortment in product categories of direct interest.

Figure 1, which is analogous to Figure 4 in Farris, Olver, and de Kluyver (1989, p. 114), illustrates the idea. The left half of Figure 1 shows four local markets that all have 1,000 target customers, but vary by total population. In the right half, there are four local markets that all have 10,000 target customers, but again vary by total population. Suppose that as the total population increases, so does the number of local stores, but that the physical size of local stores is largely unrelated to population size. When the target population is a large fraction of the total (e.g., in Markets D and H), there is a full assortment of products available locally (Brands 1-4 are all available locally). When the target population is a small fraction of the total, i.e., the PM Index is high as in Markets A and E, only Brand 1 is available locally. Markets A and E might also be expected to have more compromised demand such that consumers there are less satisfied

with their local offline options (e.g., Fornell 1995). [3]

[Insert Figure 1 about here]

Footnote 3: The analogy with Farris, Olver, and de Kluyver (1989) is the following. If Markets A through D were stores of different types within the same market, Brand 1 would have 100% Product Category Volume (PCV) and Brand 4 25% PCV.

General Online Demand Effect (H1). Internet retailers are less subject to the two constraints just discussed. Consumers who cannot find their first choices in local markets no longer need to compromise; they can substitute from offline retailers to online retailers. Preference minorities in different regions, but facing similar local constraints, can agglomerate virtually at the same online retailer. The idea that individuals with heterogeneous preferences for public goods sort into different physical neighborhoods underlies much of the analysis in urban economics (see especially Tiebout 1956 and Dowding, John, and Biggs 1994 for a comprehensive review). This idea of sorting also translates to private goods and online activities in the following way. Local consumers with minority preferences, many of whom will be living in geographically separate places, may similarly sort into online retailers (i.e., virtual neighborhoods) to take advantage of the essentially unlimited product assortment available online.

The interplay between local consumer demand and local retailers' stocking decisions affects the proportion of total local demand satisfied online versus offline. In Figure 2 the PM Index, defined as 1 - (Focal Population)/(Total Population), is on the x-axis. The PM Index across different local markets (e.g., D through A in Figure 1) increases from left to right. This is because while the total population in a local market increases from left to right, the size of the focal population stays the same. Imagine that a local target group (say households with babies) has relatively fixed per-capita consumption of a product category (say diapers). That is, total consumption in a local market has to be proportional to the size of the target group. In high PM markets (such as Markets A and E in Figure 1) where this focal group is small relative to other

local groups (i.e., households without babies), local retailers allocate limited space and attention to the product categories wanted by preference minorities. Hence, customers are driven online. High PM markets, compared to low PM markets, should see more online demand for the product category in question. Figure 2 therefore shows a positive slope across markets for online sales; as the PM Index in a market goes up, so do online sales.

[Insert Figure 2 about here]

H1: Local Preference Minorities and Online Demand. Substitution from offline retailers to online retailers will be greater in markets that have a higher PM Index.

Popular versus Niche Products (H2). We further conjecture that the online-offline substitution in H1 will be intensified when local consumers in the preference minority do not favor popular products. This is because if a local retailer (e.g., a store in a neighborhood in which the elderly are the majority) decides to stock a product category of potential relevance to the preference minorities (e.g., baby diapers) at all, she will most likely choose a popular product such as a leading national brand (Farris, Olver, and de Kluyver 1989; see also Figure 1, Markets A and E). Niche products in preference minority markets are therefore subject to double jeopardy. By definition, fewer consumers prefer niche products; hence, even fewer local retailers in the preference minority market will stock them. Conversely, products with high sales and large market shares further increase market share and sales through a positive-feedback process (Reibstein and Farris 1995). Forman, Ghose, and Goldfarb (2008) provide related evidence. They find that when Barnes and Noble opens a physical bookstore in a neighborhood, sales of popular books at Amazon.com decline in that neighborhood; however, sales of niche (or less popular) books do

not. Niche books are less likely to be stocked, on average, in physical stores, so consumers who want them continue to shop for them online. In general, products relevant to the PM will be hard to find in a local market, but the situation will be exacerbated for consumers in the PM who prefer niche products.

[Insert Figure 3 about here]

H2: Popular versus Niche Products. Online-offline substitution for niche products, relative to popular products, will be more sensitive to changes in the PM Index. That is, as the PM Index increases across markets, online-offline substitution will be stronger for niche products.

The Long Tail and Local Preference Minorities

The Long Tail. Anderson (2006) introduced the Long Tail, and Figure 4 (see "The 80/20 Rule Revisited" at ) summarizes the key insight. On the x-axis products are ranked from best selling to worst selling. The green area to the left of the second vertical bar in Figure 4 (a) represents product assortments and corresponding sales at offline retailers. The first green area to the left of the first bar shows 20% of popular products accounting for 80% of category sales; the second green area shows the remaining 80% of less popular and niche products. Online retailers can expand their inventory to include the yellow area to the right of the second bar, i.e., all those products that would not meet the shelf space or customer preference constraints faced by local offline retailers. Figure 4 (b) shows the sales and profit implications. While offline retailers gain 20% of their sales from the 80% of books in the less popular or niche category, they might make little or no profit from these books after taking inventory

holding costs and product turn into account. Conversely, online retailers have essentially unlimited shelf space and negligible inventory holding costs, and are not subject to a locally-defined trading area. While individual niche product sales over a large number of niche products are small, in aggregate they can contribute significant profits (25% in the example).

[Insert Figure 4 about here]

The Long Tail and Sales Differences across Local Markets. Together H1 and H2 generate a new insight into where sales of popular and niche products in the Long Tail come from. That is, we show that the rank ordering of a specific product in the Long Tail has implications for the proportional mix of its sales across geographical markets that vary according to the PM Index. To see this, assume the same market environments described above. There, the focal population size is the same across all markets, but of different relative proportion (i.e., as in Figures 2 and 3). Online sales for all products increase across markets as the PM Index increases (H1); however, the online sales response to the PM Index is stronger for niche products (H2). By definition, a niche product, relative to a popular product, is positioned further into the right tail of the Long Tail. As shown below, high PM markets are especially important for niche products. (This observation has implications for geo-targeting, which we take up in the Discussion and Conclusion.)

I1: Online Demand and the Long Tail. Popular products and niche products both have more online-offline substitution per region in regions with a higher PM Index. However, the breakdown of online sales for popular and niche products is different. Niche products with a lower overall sales rank (i.e., products in the tail of the Long Tail) draw a greater proportion of their total online demand from high PM regions than popular products do.

Proof: Please see Appendix.
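As a rough numerical illustration of I1, the sketch below assumes that online sales respond linearly to the PM Index and compares the share of total online sales that a popular and a niche product draw from high PM markets. All coefficients, the cutoff between low and high PM markets, and the function names are hypothetical values chosen for illustration; none of them comes from the Diapers.com data.

```python
# Hypothetical linear sales curves y = a + b * x over the PM Index x.
# Every number below is an invented illustration value, not an estimate.

def total_sales(a, b, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of a + b*x over [lo, hi]."""
    width = (hi - lo) / steps
    return sum((a + b * (lo + (i + 0.5) * width)) * width for i in range(steps))

def high_pm_share(a, b, cutoff, upper):
    """Share of total online sales that comes from markets with x in [cutoff, upper]."""
    low = total_sales(a, b, 0.0, cutoff)
    high = total_sales(a, b, cutoff, upper)
    return high / (low + high)

# Popular product: high baseline, weak response to the PM Index.
popular = {"a": 100.0, "b": 40.0}
# Niche product: low baseline, strong response to the PM Index.
niche = {"a": 10.0, "b": 50.0}
cutoff, upper = 0.5, 1.0  # assumed split between low and high PM markets

popular_share = high_pm_share(popular["a"], popular["b"], cutoff, upper)
niche_share = high_pm_share(niche["a"], niche["b"], cutoff, upper)
print(f"popular: {popular_share:.3f}, niche: {niche_share:.3f}")
```

With these assumed values the popular product draws roughly 54% of its online sales from high PM markets and the niche product roughly 68%, which is the pattern I1 describes. Numerical integration is used here so that the check does not rely on any closed-form expression.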
Define online sales, $y$, of a popular product at a particular location as $y = a_1 + b_1 x$, where $x$ is the PM Index of the local market. Similarly, $y = a_2 + b_2 x$ for the niche product. H1 and H2 state that the

niche product, by definition, has lower overall sales, but that the niche product's online sales respond more strongly to the PM Index. In Figure 5 this implies that $a_1 > a_2$ and $b_1 < b_2$. Without loss of generality, divide the space of all local markets into two groups: one group with a relatively low PM Index ($0$ to $x_1$) and the other with a relatively high PM Index ($x_1$ to $x_2$). [4] The aggregate sales of the popular product in the two groups of markets are determined by integrating the relevant areas under the sales curve $y = a_1 + b_1 x$ as follows, where $A$ and $B$ are total sales in low PM and high PM markets, respectively.

\[ A = \int_{0}^{x_1} (a_1 + b_1 x)\,dx = x_1 \left( a_1 + 0.5\, b_1 x_1 \right) \]

\[ B = \int_{x_1}^{x_2} (a_1 + b_1 x)\,dx = (x_2 - x_1) \left\{ a_1 + 0.5\, b_1 (x_1 + x_2) \right\} \]

Similarly, aggregate sales of the niche product, in the low PM and high PM markets, are defined by $C$ and $D$ respectively.

\[ C = \int_{0}^{x_1} (a_2 + b_2 x)\,dx = x_1 \left( a_2 + 0.5\, b_2 x_1 \right) \]

\[ D = \int_{x_1}^{x_2} (a_2 + b_2 x)\,dx = (x_2 - x_1) \left\{ a_2 + 0.5\, b_2 (x_1 + x_2) \right\} \]

Expanding these expressions gives $AD - BC = 0.5\, x_1 x_2 (x_2 - x_1)(a_1 b_2 - a_2 b_1)$, so $AD - BC \propto a_1 b_2 - a_2 b_1$. Using the fact that $a_1 b_2 - a_2 b_1 > 0$ then yields the following relationship:

\[ \frac{a_2}{a_1} < \frac{b_2}{b_1} \;\Rightarrow\; \frac{B}{A} < \frac{D}{C}. \tag{1} \]

In words, this says that for the niche product the ratio of high PM-market sales to low PM-market sales, i.e., $D$ divided by $C$, is greater than the same ratio for popular products, i.e., $B$ divided by $A$. This relationship is illustrated in Figure 5 and will be examined empirically in the Empirical Findings section using sales data from Diapers.com.

Footnote 4: It is straightforward to show that the general case of n partitions of local markets along the PM Index leads to an identical result, but requires additional integrals of the relevant sales areas. Details are available upon request.

[Insert Figure 5 about here]

DATA AND MEASURES

Data

To test our hypotheses, we need to: (1) choose an appropriate product, (2) define the unit of analysis for local markets, (3) profile and control for the local offline retail environment, and (4) control for differences in geo-demographic characteristics across markets.

Product Category. Diapers.com, the leading online retailer of diapers in the United States, provided zip code-level sales data. These data are cumulative from the firm's inception in October 2004 through March. Three major national brands (Pampers, Huggies, and Luvs) and one niche brand with selective offline distribution (Seventh Generation) are used in the analysis. By selective distribution, we mean that the brand is sold only in certain retailers throughout the United States (we subsequently identify the exact location of each supermarket that carries Seventh Generation). For each SKU sold, we determine the exact number of diapers in the packet, and compute aggregate sales by the zip code and brand combination, for each local market.

We use the diapers category to test H1 and H2 for the following two reasons. First, the name-brand diapers mentioned above are well known nationally and parents can be relatively certain of product quality before placing an order online and receiving the merchandise (Lal and Sarvary 1999; Lynch and Ariely 2000; Overby and Jap 2008). Equally important, the niche brand Seventh Generation has limited distribution offline. Second, consumption of diapers can reasonably be expected to be proportional to the total baby population, and to be constant at the per-capita level. The product category is not expandable in the way that some others might be

(e.g., books, soft drinks, etc.). Constant consumption across markets with the same number of target customers is a reasonable assumption (see Figure 1).

Unit of Local Market. Zip codes are the units that define local markets. This makes sense for two reasons. First, a zip code is a relatively self-contained unit of buyers and sellers (especially for packaged goods and products such as diapers). The most accessible offline local retail format for baby diapers is the local supermarket, and all zip codes that we examine have at least one supermarket; that is, supermarkets will be contained within the geographical territory of a zip code. [5] Second, zip codes are widely used as the unit of analysis in other studies of related phenomena, such as restaurant and bookstore variety (see Waldfogel 2007 for a review).

Footnote 5: Residential zip codes average four supermarkets. There is one discount store every five zip codes, and one warehouse club every fifteen zip codes. Also, supermarkets appear every 2.5 miles, discount stores every eight miles, and warehouse clubs every 15 miles.

We focus our attention on zip codes that lie within Metropolitan Statistical Areas (MSAs). Limiting the analysis to zip codes within MSAs is not only consistent with standard practice, but also ensures that we do not have zip codes that are extremely sparse in terms of either focal population or total population. [6]

Footnote 6: Consumers residing in zip codes within MSAs have shorter travel distances to offline alternatives, and are not induced to shop online due to complete inaccessibility of local retail formats. Our empirical models account for different unobserved baseline rates across MSAs.

In order to rule out confounding effects of outside options other than disposable diapers, such as the availability of cloth diaper cleaning services in large target markets with a larger baby population (see Christaller 1933), we hold the focal population, i.e., households with babies, constant across regions while allowing the total population to vary across regions. [7]

Footnote 7: Using population aged 0 to 4 years old does not change results in any meaningful manner (average household size does not vary across zip codes). As noted previously, the model using data in three bins together provides qualitatively identical results.

That is, we need the data to conform to the assumptions implied by Figures 1, 2 and 3: the PM Index varies on the x-axis because only the denominator (total population) is changing. Unfortunately, in

18 15 the real data, both the number of households with babies (focal group) and the total population (denominator of the PM Index) vary by location. Hence, we define three separate bins of data (terciles) for analysis, based on the empirical distribution of the population data across all local markets in the United States. Within each tercile, the number of households with babies is approximately constant (and is used as an offset variable in the Poisson model used subsequently). Zip codes in the same bin have roughly equal size in terms of the target population, but substantially different total populations, i.e., within each tercile the PM Index varies in a manner consistent with Figures 1, 2 and 3. Another advantage of this approach is that we can see how the results change (or not) when the size of the target population differs across settings. The PM Index for each bin is summarized in Table 1 (b). Local Retail Environment. Local retail variables are constructed from the 2007 US Census of Business and Industry and they serve as proxies for local offline retail activity. We obtain retail information about major local competitors, including supermarkets, discount stores (Wal-Mart and Target), and warehouse clubs, using 8-digit NAICS (North American Industry Classification System) codes. While 6-digit NAICS codes are often used in research, greater accuracy is achieved with our approach. 8 We therefore limit our attention to NAICS for retail grocery stores and NAICS for warehouse clubs and obtain their specific store locations, in order to relate store locations to specific zip codes. Wal-Mart and Target belong to the discount department stores classification and we also obtain store locations directly. 
We then calculate the distance from the focal zip code to the nearest store of each format, since physical distance is taken as a parallel to transportation costs in spatial differentiation models (see, e.g., Balasubramanian 1998; Bhatnagar and Ratchford 2004; Cheng and Nault 2007).

8 6-digit codes for supermarkets (for example) can include candy stores and other smaller retail formats that differ from what is typically thought of as a supermarket. SIC codes have been superseded by NAICS codes.
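The nearest-store distance described above can be illustrated with a minimal sketch. It assumes great-circle (haversine) distance between a zip code centroid and each store location; the text does not state which distance metric was used, and the coordinates and function names below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (assumed constant)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding error before taking asin.
    return 2 * EARTH_RADIUS_MILES * asin(min(1.0, sqrt(h)))


def nearest_store_distance(zip_centroid, stores):
    """Distance from a zip centroid to the closest store of one format."""
    return min(haversine_miles(*zip_centroid, *store) for store in stores)


# Hypothetical zip centroid and store locations (latitude, longitude).
centroid = (39.95, -75.17)
warehouse_clubs = [(40.05, -75.05), (39.90, -75.30), (40.20, -74.90)]
print(round(nearest_store_distance(centroid, warehouse_clubs), 2))
```

Running the same function once per retail format (supermarkets, discount stores, warehouse clubs) would give one distance covariate per format for each zip code.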

Preference minorities who favor niche products will face even more limited variety than those who favor popular products (H2). Seventh Generation diapers (the niche brand in our research) have selective local distribution in offline stores, meaning that only certain stores stock them. We obtained the exhaustive national list of all local retailers in the United States that sell Seventh Generation products. Each store on the list was contacted directly by telephone to confirm that Seventh Generation diapers were available for sale. After confirming availability in the listed stores, we computed the distance from each zip code in the dataset to the nearest store stocking Seventh Generation diapers. The distances to the nearest retail formats that do not sell Seventh Generation diapers were re-computed in order to test H2. Thus, we have general measures of distance to stores that stock the popular products (such as Pampers) and specific measures of distance to stores that stock the niche product.

Geo-Demographic Market Characteristics. Zip-level geo-demographic characteristics are obtained from the 2000 US Census of People and Household. To control for observed differences across local markets, we construct geo-demographic covariates that describe the local environment overall and the characteristics of the households that live there. We also account for the expected number of days it takes products to ship from Diapers.com warehouses to each zip code. This shipping time information was obtained from the UPS website and confirmed by management. Shipping times ranged from 1 to 6 days when Diapers.com had only one warehouse, and from 1 to 4 days after a second warehouse was built.

The PM Index. As noted earlier, empirically, there is no absolute determinant for minority preferences. Drawing on published research (e.g., Chen et al. 1999), we assume the following regarding offline retailers: More local stores are available as population increases, but

the physical size of stores need not be related to population size. 9 Since local retailers are unlikely to make strategically collusive decisions about which brands to stock, a greater number of supermarkets alone need not mean greater product variety in a local market. An increase in total population will necessarily limit the retail space allocated to the focal group (as the focal group becomes proportionally smaller) and, in turn, the locally available product variety for the focal group. Therefore, our proxy measure for local preference minorities is based on the market-specific proportion of households with babies, relative to the total number of households. This measure is presumed to reflect the relative variety, from one market to another, of goods and services available locally for the baby group. It mirrors the measures used by Goolsbee and Klenow (2002), Sinai and Waldfogel (2004), and Forman, Ghose, and Wiesenfeld (2008). 10

Summary. The model variables and the associated summary statistics are grouped by bin and by type and presented in Table 1. Note that within each bin, the variation across markets in the size of the focal group, i.e., the number of households with babies, is substantially smaller than the variation across markets in the total number of households (see Table 1 (b)); the coefficient of variation for the number of households with babies is about half of the coefficient of variation for total households. Interestingly, distances from each zip code to the nearest supermarket that does not sell Seventh Generation diapers are roughly equal across the three bins,

9 These two assumptions are validated with our data. First, the total size in square footage of particular stores (e.g., Target, Whole Foods) tends to be driven more by chain-level decisions than by population size per se.
We examine store space using two variables describing (1) four ranges of square footage of local retailers and (2) eleven ranges of the number of employees working there. Among the 1,415 (224) local Target (Whole Foods) stores, for example, 99% (79%) belong to the highest range of more than 40,000 square feet and 81% (69%) belong to the range of having employees. Second, we examine the relation between the number of stores of each retail format and population size, using our 8-digit NAICS codes at the MSA level. The number of households, for example, has significant correlations of .97 with the number of supermarkets, .86 with the number of discount stores, and .96 with the number of warehouse clubs.

10 Information about actual local product variety is unavailable. Instead, we assume that supermarkets allocate shelf space by population composition and that sales are proportional to shelf space. Using the fraction of supermarket sales attributable to households with babies provides qualitatively identical results.

but distances to the nearest store that does sell Seventh Generation decrease as the overall market size, reflected by the size of the total population, increases (see Table 1 (c)). Geo-demographic characteristics remain largely similar across the three bins, which is perhaps not surprising given that the analysis is confined to MSAs.

[Insert Table 1 about here]

EMPIRICAL ANALYSIS

Local Preference Minorities and Online Demand (H1)

To test H1 we examine whether both the number of buyers per location and the number of repeat orders per buyer increase as the PM Index across markets increases, and to what degree. We assume the number of buyers (repeat orders) in zip code z in MSA m is Poisson distributed with rate parameter $\lambda_{z(m)}$, and $\lambda_{z(m)}$ is modeled as a function of $PM_{z(m)}$ (the PM Index) and local market characteristics, $\vec{X}_{z(m)}$, including access to local retailers and geo-demographics. The number of households with babies varies slightly across local markets in the same bin despite our attempt to hold it constant; thus, we include the number of households with babies as an offset variable in the model for the number of buyers (Agresti 2002; Rabe-Hesketh and Skrondal 2005). The number of buyers is included as an offset variable in the model for repeat orders. In addition to the controls for observed heterogeneity, MSA-level random effects help control for unobserved heterogeneity in the baseline rates. The error term $\varepsilon_{z(m)}$ allows for over-dispersion. The error term is assumed independent and Gamma distributed with the same shape and scale

parameter, $\theta$ (the equal parameters are needed for model identification; see Cameron and Trivedi 1986).

$$Y_{z(m)} \mid \lambda_{z(m)} \sim \mathrm{Poisson}(\lambda_{z(m)}), \qquad \log(\lambda_{z(m)}) = \alpha_0 + \alpha_m + \beta\, PM_{z(m)} + \vec{\gamma}^{\,T} \vec{X}_{z(m)} + \varepsilon_{z(m)}, \tag{2}$$

where $\alpha_m \sim N(0, \tau^2)$ and $\exp(\varepsilon_{z(m)}) \sim \mathrm{Gamma}(\theta, \theta)$.

Since an increase in the PM Index, $PM_{z(m)} = 1 - (\text{Focal Population}) / (\text{Total Population})$, means that the focal group's preferences are becoming more minor, we expect $\beta > 0$ as more sales are sent online (H1). Table 2 column (1) shows the expected positive effect of the PM Index on trial in the first bin. As the PM Index gets larger, more buyers emerge from local markets. To understand the quantitative effects implied by the estimate from Bin 1, suppose that two local markets are of equal size in terms of baby population, but differ in terms of total population, and therefore on the PM Index. For example, suppose one market has a PM Index equal to 0.80 (the 10th percentile market) and the other has 0.89 (the 90th percentile). At the mean of the other covariates, this implies 3.32 (4.32) new buyers from the 10th (90th) percentile market. Note that moving from the 10th to the 90th percentile market does not change the size of the focal population, as in both markets there is an identical number of potential customers. Instead, it is the increase in the total population that serves to make the customers in the 90th percentile market more isolated. As they become a smaller fraction of the total market, local retailers allocate less space to the products they want (or the variety they want), which in turn drives more buyers into the online marketplace.

[Insert Table 2 about here]
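To make the log-link arithmetic concrete, the sketch below computes the PM Index and shows how a change in the index scales the expected count multiplicatively when the offset and all other covariates are held fixed. The slope used here is back-solved from the two expected counts quoted in the text (3.32 and 4.32 buyers at PM values of 0.80 and 0.89); it is an illustration only and is not the estimate reported in Table 2. All function names are hypothetical.

```python
from math import exp, log


def pm_index(focal_households, total_households):
    """PM Index = 1 - (focal population) / (total population)."""
    return 1.0 - focal_households / total_households


def implied_beta(count_low, count_high, pm_low, pm_high):
    """Slope on the PM Index implied by two expected counts under a log link,
    with the offset and all other covariates held fixed."""
    return log(count_high / count_low) / (pm_high - pm_low)


def expected_count(base_count, base_pm, beta, pm):
    """Expected count at `pm`, scaled from a reference market at `base_pm`."""
    return base_count * exp(beta * (pm - base_pm))


# Back-solved, illustrative slope (NOT the Table 2 coefficient).
beta = implied_beta(3.32, 4.32, 0.80, 0.89)
print(round(beta, 2))

# Two hypothetical markets with the same number of focal households.
print(pm_index(200, 1000), pm_index(200, 2000))

# Expected buyers at an intermediate PM value under the same slope.
print(round(expected_count(3.32, 0.80, beta, 0.85), 2))
```

Because the model is linear in the log of the rate, the ratio of expected counts between two markets depends only on the difference in their PM Index values, not on their levels.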

Column (2) of Table 2 shows the expected positive effect of the local PM Index on repeat orders of buyers in the first bin; local markets with larger PM values have greater numbers of repeat orders among buyers. While the estimate is highly significant, we examine whether the magnitude is practically meaningful by using the same two local markets used to illustrate the marginal effects above. The 10th percentile market shows 0.94 repeats per buyer, while the 90th shows 1.24 repeats per buyer. Combining the higher trial and repeat rates, Diapers.com sales are almost 50% higher in the 90th percentile market compared to the 10th percentile market, even though in both markets the total number of target customers is the same. Diapers.com performs better in local markets with preference minorities, since customers in those markets are more likely to try and repeat-buy at online alternatives in order to alleviate the constraint imposed by limited local options. Columns (3) and (4) for the second bin and (5) and (6) for the third bin also show corroborating evidence for the positive effect of PM on online sales. Again, in the second (third) bin, a move from the 10th to the 90th percentile market increases the number of new buyers from 8.05 (13.17) to (17.19), and repeat orders per buyer from 1.27 (1.28) to 1.52 (1.36). The average total sales percentage increase (averaged across terciles) when moving from the 10th to the 90th percentile market is about 40%.

The estimates of the control variables, while not the focus of this research, are largely plausible, and we note a few interesting observations. First, both trial rates and repeat orders per buyer are positively correlated with the distance to discount stores and warehouse clubs. The further shoppers have to go to reach these formats, the more likely they are to buy online.
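The way the trial and repeat figures for Bin 1 combine into a total sales difference can be sketched as follows. The sketch assumes that each buyer places one trial order plus the reported number of repeat orders, so that total orders equal buyers times (1 + repeats per buyer). This accounting is an assumption made for illustration; the text does not spell out how the combined figure was computed. Under this reading, the Bin 1 numbers give an increase of roughly 50%.

```python
def total_orders(buyers, repeats_per_buyer):
    """Total orders, assuming one trial order per buyer plus repeat orders.

    This accounting rule is an assumption for illustration only.
    """
    return buyers * (1.0 + repeats_per_buyer)


def pct_increase(low, high):
    """Percentage increase from `low` to `high`."""
    return 100.0 * (high - low) / low


# Bin 1 figures quoted in the text: buyers and repeats per buyer
# in the 10th and 90th percentile markets on the PM Index.
bin1_low = total_orders(3.32, 0.94)
bin1_high = total_orders(4.32, 1.24)
print(f"Bin 1 increase: {pct_increase(bin1_low, bin1_high):.1f}%")

# Bin 3 figures quoted in the text.
bin3_low = total_orders(13.17, 1.28)
bin3_high = total_orders(17.19, 1.36)
print(f"Bin 3 increase: {pct_increase(bin3_low, bin3_high):.1f}%")
```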
Interestingly, there is a negative relationship with the distance to the nearest supermarket; more buyers shop online in areas where discount stores and warehouse clubs are inaccessible, but

supermarkets are conveniently located. Sales substitution from offline to online retailers is expected to come from patrons of discount stores and warehouse clubs, while supermarkets are complementary since shoppers must typically visit them anyway for perishable products (e.g., Bell and Song 2007). In general, Diapers.com performs better in regions that have the following local population composition: a higher percentage of college-educated individuals, working females, middle-income households, white population, and urban housing units. Greater sales are expected in regions that have rapid population growth, denser population, and expeditious delivery.

In summary, H1 is strongly supported by the data. Our empirical findings extend the conventional sorting argument to the online marketplace. Alongside the abundant evidence for individuals sorting into local regions and neighborhoods (Tiebout 1956; Dowding, John, and Biggs 1994), we add the finding that local consumers living in different regions yet facing similar local market constraints agglomerate at the same online retailer. In other words, local consumers will sort into different online retailers depending on the type of market constraint.

Popular versus Niche Products (H2)

While customers in preference minority markets will have difficulty finding diapers of all types, those who prefer Seventh Generation products will experience even more difficulty in finding them in local markets. Thus, the benefits from online markets will be greater for these locally isolated consumers with atypical product preferences. 11 This product-level analysis and test of H2 requires some additional data manipulation. First, we need to take into account the fact

11 Preferences for the national brands and the niche product are bi-modal and there is relatively little switching between these brands, conditional upon shopping at Diapers.com.
Thus, we exclude the possibility that online buyers endogenously form the preference for variety.

that different SKUs of the same brand have different numbers of diapers in packages; hence, our dependent measure is based on the actual number of diapers, not the number of packages (or SKUs) purchased. Second, recall that we obtained an exhaustive list of all offline stores that stock Seventh Generation products and computed the product-specific expected travel distances from each zip code in our dataset (see Table 1 (c)). Recall also that, to account for the limited availability of the Seventh Generation brand, we control for local access to two different types of stores: those that sell Seventh Generation and those that don't.

Equation (2) is fit at the product level with the number of brand j diapers purchased in each zip code z and MSA m as the dependent variable. As for H1, the rate parameter $\lambda_{z(m)}$ is modeled as a function of the PM Index, $PM_{z(m)}$, the number of households with babies as an offset variable, and local market characteristics, $\vec{Z}_{z(m)}$, including access to local retailers and geo-demographics. We again include MSA-level random effects to control for unobserved variation in the baseline rate, and the error term $\varepsilon_{z(m)}$ to allow for over-dispersion. Table 3 provides the parameter estimates.

The estimates provide strong support for H2. As the PM Index increases across markets, online demand responds more strongly for Seventh Generation products than it does for the widely available national brands, Pampers, Luvs, and Huggies. When we compare the estimates for the PM Index across brands within the same bin, Pampers, Huggies, and Luvs have estimates that are more or less the same size, whereas the Seventh Generation estimate is more than twice as large. 12 To understand the magnitude of the effect implied by the estimate, we return to the marginal analysis conducted for H1.
At the mean of all other covariates, a move from the 10th to the 90th percentile market in the first bin increases

12 Comparison of the PM Index estimates across the four models provides the relative increases (percentage increases) of the dependent variables induced by the same increase in the PM Index. We also examine the four dependent variables simultaneously and obtain the same results.

the sales of each brand as follows. Sales of Pampers diapers increase by 30% (from 662 to 857), Huggies by 45% (201 to 291), and Luvs by 36% (114 to 155). The increase for Seventh Generation is dramatically greater at 189% (from 63 to 197). This same pattern of a more dramatic increase for Seventh Generation diapers is also observed in the second and third bins. In the second bin the largest national brand increase is 38% (for Pampers), whereas the increase for Seventh Generation is 157%. In the third bin the largest national brand increase is again 38% (this time for Huggies), yet Seventh Generation sales go up by 96%. Therefore, preference minorities are not only more likely to sort into online alternatives to alleviate their limited local options (H1), but this effect is intensified for niche products (H2).

[Insert Table 3 about here]

Estimates of the effects of the control variables are largely consistent with those observed for overall online demand. There is, however, one interesting difference in the effect of access to local retail stores, which provides further implicit support for H2. The distance to stores (supermarkets) selling Seventh Generation diapers has a positive effect on online sales of Seventh Generation diapers. That is, when the store with the sought-after product is further away, we get the expected and intuitive effect: shoppers go online, just as we saw in the general test for the effect of PM status (H1). The estimated effect of distance to a Seventh Generation supermarket is, however, insignificant for all the national brands. This difference could be rooted in how consumers perceive these same retail stores. For non-Seventh Generation buyers, supermarkets selling Seventh Generation diapers are simply supermarkets; for Seventh Generation buyers they are far more than that: they are special stores that provide an offline alternative for the product they want.

Local Preference Minorities, Online Demand, and the Long Tail (I1)

Earlier, we showed that H1 and H2 lead to I1: niche products with a lower overall sales rank (i.e., products in the tail of the Long Tail) draw a greater proportion of their total online demand from high PM regions than popular products do. We now compute the empirical analog. First, we fix the value of the PM Index at the 10th percentile, the mean, and the 90th percentile so that all brands are compared at the same values of the PM Index. Holding everything else equal, we then compute the expected number of sales for each of the four diaper brands (Pampers, Huggies, Luvs, and Seventh Generation) in all three markets. Note now that a given brand (e.g., Pampers) within a specific bin (e.g., Bin 1) draws sales from three generic types of regions. Specifically, it sells in regions at the 10th percentile of the PM Index, at the mean, and at the 90th percentile. With this information in hand, we draw the corresponding Long Tail sales distribution over our four products. Figure 6 shows the results.

[Insert Figure 6 about here]

The Long Tail plots are drawn separately for each bin. (We focus mainly on Bin 1 for clarity of exposition, as the finding repeats for Bins 2 and 3.) Figure 6 (a) shows the typical Long Tail plot, with the product sales rank on the x-axis and the corresponding sales on the y-axis. In Figure 6 (b), the sales aggregation over markets with different degrees of preference minority status is made explicit. In accordance with I1, all national brands draw proportionally more sales from regions that have a higher PM Index (recall that the ratio of sales integrals for popular products is always strictly greater than one because the slope of the online sales line is positive; see equation 1). Pampers, for example, gets 38% of its sales from the 90th percentile markets, but 29% from the 10th percentile markets. The proportion of sales a brand draws from a region is

monotonically related to the location of that region on the PM Index: The higher the PM Index value for a local market, the higher the relative proportion of total brand sales accounted for by that local market. In fact, regardless of the differences in aggregate sales of Pampers, Huggies, and Luvs, the sales share distribution across the three different types of local markets is remarkably similar across these three brands. The sales distribution for the niche product, Seventh Generation, stands in stark contrast to the distribution for the national brands, as predicted by I1. In Bin 1, the relative proportion of sales from the 90th percentile market to the 10th percentile market is 58:18, or about three to one. This pattern of results repeats for Bins 2 and 3 (not shown; available upon request). As we move along the Long Tail, online sales for niche products are more likely to come from high PM markets, raising the importance of those markets in accounting for the total sales of niche products.

DISCUSSION AND CONCLUSION

We have proposed the concept of local preference minorities as a driver of online-offline sales substitution in local markets. Detailed location-specific sales data from Diapers.com, the leading online retailer for diapers, publicly available data on local retail alternatives, and geo-demographic market characteristics were used to test two hypotheses. Specifically, substitution from offline retailers to online retailers is greater in markets that have a higher PM Index (H1). That is, holding the total number of potential customers constant, we find they are more likely to purchase online when they are in the minority in their local area. (See also Forman, Ghose, and Goldfarb 2008; Sinai and Waldfogel 2004; Waldfogel 2007.) Moreover, online-offline

substitution for niche products, relative to popular products, is more sensitive to changes in the PM Index (H2).

Implications for Retailing Practice and Theory

The empirical findings show that, holding the total size of the target population constant, overall demand for the Internet retailer's products is higher in local markets when the target customers are local preference minorities. Specifically, there are more new buyers in preference minority markets, and more repeat orders per buyer. This suggests an interesting trade-off. On the one hand, the Internet retailer might want to focus on local markets where the absolute number of potential customers is high. On the other hand, our research suggests that it also makes sense to focus on markets where local target customers constitute a smaller relative fraction of the total population and are therefore likely to be underserved in the local market. Analysis of marginal effects from the model suggests that overall sales in high PM markets can be up to 50% higher than in low PM markets. This is an interesting finding because a naïve examination of potential segment size would suggest both have about equal potential.

Internet retailers might also place special emphasis on selling niche products to preference minorities. Marginal effects for the niche product show sales gains of % in high PM markets versus low PM markets. This implies that Internet retailers can potentially gain from using geo-coded "old economy" data to find and target local markets with high potential.

Internet retailers should also decompose total product sales over spatial markets. We showed that the location of products in the Long Tail sales distribution is directly related to how they draw sales from preference minority markets. Specifically, products in the tail draw proportionally

greater sales from preference minority regions. Most prior research in Internet retailing has centered mainly on absolute benefits available to customers, such as lower prices and free shipping. Recent studies have, however, begun to stress that many benefits are contextual and relative to offline alternatives, and result from an easing of location-specific constraints (see Anderson et al. 2008; Choi, Hui, and Bell 2008; Forman, Ghose, and Goldfarb 2008). In this research, we recognize that the Internet retailer can free customers from preference externalities imposed by one's neighbors. Preference minorities arise from the inter-relationship between their relative size as a constituency and the need for local retailers to allocate fixed shelf space effectively across the products they sell. Even though Internet retailers are largely ubiquitous, i.e., consumers anywhere can access them, the net benefit to individual consumers still depends largely on where they live and who lives next to them. In other words, geography still matters a good deal for borderless retailing.

Limitations and Future Research

While our analysis covers all zip codes contained within MSAs in the U.S., it nevertheless focuses on a single product category. Additional empirical support for the hypotheses using other product categories could be pursued. The research relies on region-level sales data and does not assess the preference minority status of specific individual consumers. Rather, we characterize the preference minority status of a market segment within its local market. Further work could be done at the level of the individual (e.g., Sinai and Waldfogel 2004). Finally, we develop the preference minority arguments from a cross-sectional (cross-market) perspective. It may also be possible, given appropriate data, to examine the dynamics of preference minority status over time. This would also allow one to explore more of the dynamic

nature of substitution between online and offline markets (e.g., Overby and Jap 2008).

There are at least two promising avenues for future research. First, Central Place Theory (Christaller 1933), a cornerstone of modern retailing thought that explains the distances between cities of different sizes and how the spatial relationships between these cities affect the emergence of retail stores (see also Shonkwiler and Harris 1996), could be reconfigured to address Internet retailing. According to Central Place Theory, small towns (for example) might have a single gas station and main store, whereas larger towns have more stores and a greater variety of stores. Our empirical tests suggest that one might be able to develop a complementary theory for the distribution of customers acquired by Internet retailers, in contrast to the distribution of stores (given customers) implied by Central Place Theory. Second, the possibly endogenous nature of the preference for variety, and of Internet shopping in general, might be examined. Preference minorities might be driven online for the reasons we suggest (H1) but, having got there, expand their category-level preferences and their Internet shopping behavior in general. We intend to pursue these topics in future studies.

APPENDIX: PROOF OF IMPLICATION 1

Proof: Define online sales, y, of a popular product at a particular location as $y = a_1 + b_1 x$, where x is the PM Index location of a local market. Similarly, $y = a_2 + b_2 x$ for the niche product. H1 and H2 state that the niche product, by definition, has lower overall sales, but that the niche product's online sales respond more strongly to the PM Index. In Figure 5 this implies that $a_1 > a_2$ and $b_1 < b_2$. Without loss of generality, divide the space of all local markets into two groups: one group with a relatively low PM Index (0 to $x_1$) and the other with a relatively high PM Index ($x_1$ to $x_2$). 13 The aggregate sales of the popular product in the two markets are determined by integrating out the relevant areas under the sales curve $y = a_1 + b_1 x$ as follows. A and B are total sales in low PM and high PM markets, respectively.

$$A = \int_{0}^{x_1} (a_1 + b_1 x)\,dx = x_1 (a_1 + .5\, b_1 x_1)$$

$$B = \int_{x_1}^{x_2} (a_1 + b_1 x)\,dx = (x_2 - x_1)\{a_1 + .5\, b_1 (x_1 + x_2)\}$$

Similarly, aggregate sales of the niche product in the low PM and high PM markets are defined by C and D, respectively.

$$C = \int_{0}^{x_1} (a_2 + b_2 x)\,dx = x_1 (a_2 + .5\, b_2 x_1)$$

$$D = \int_{x_1}^{x_2} (a_2 + b_2 x)\,dx = (x_2 - x_1)\{a_2 + .5\, b_2 (x_1 + x_2)\}$$

Using the fact that $a_1 b_2 - a_2 b_1 > 0$ and $AD - BC \propto a_1 b_2 - a_2 b_1$ yields the following relationship:

$$\frac{a_2}{a_1} < \frac{b_2}{b_1} \;\Rightarrow\; \frac{B}{A} < \frac{D}{C}. \tag{1}$$

In words, this says that for the niche product the ratio of high PM-market sales to low PM-market sales, i.e., D divided by C, is greater than the same ratio for popular products, i.e., B divided by A. This relationship is illustrated in Figure 5 in the main text and is examined empirically in the Empirical Analysis section using sales data from Diapers.com.

13 It is straightforward to show that the general case of n partitions of local markets along the PM Index leads to an identical result, but requires additional integrals of the relevant sales areas. Details are available upon request.
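The proof can be checked numerically with a short sketch. It evaluates the four sales areas for one hypothetical pair of linear sales curves satisfying $a_1 > a_2$ and $b_1 < b_2$, and confirms both the inequality $D/C > B/A$ and the explicit form of the proportionality, $AD - BC = .5\, x_1 x_2 (x_2 - x_1)(a_1 b_2 - a_2 b_1)$, which follows from expanding the products. The parameter values and function names are illustrative assumptions, not estimates from the data.

```python
from fractions import Fraction as F


def segment_sales(a, b, lo, hi):
    """Exact integral of the linear sales curve a + b*x over [lo, hi]."""
    return (hi - lo) * (a + F(1, 2) * b * (lo + hi))


# Hypothetical intercepts and slopes: the popular product (a1, b1) has the
# higher intercept, the niche product (a2, b2) the steeper slope.
a1, b1 = F(10), F(2)
a2, b2 = F(1), F(5)
x1, x2 = F(1, 2), F(1)  # split point and upper end of the PM Index range

A = segment_sales(a1, b1, F(0), x1)  # popular product, low PM markets
B = segment_sales(a1, b1, x1, x2)    # popular product, high PM markets
C = segment_sales(a2, b2, F(0), x1)  # niche product, low PM markets
D = segment_sales(a2, b2, x1, x2)    # niche product, high PM markets

# The niche product draws relatively more of its sales from high PM markets.
print(float(B / A), float(D / C))

# Explicit form of the proportionality used in the proof.
lhs = A * D - B * C
rhs = F(1, 2) * x1 * x2 * (x2 - x1) * (a1 * b2 - a2 * b1)
print(lhs == rhs)
```

Exact rational arithmetic is used so that the identity can be compared with `==` rather than with a floating-point tolerance.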

REFERENCES

Anderson, Chris (2006), The Long Tail: Why the Future of Business is Selling Less of More, Hyperion: New York, NY.

Anderson, Eric T., Nathan M. Fong, Duncan I. Simester, and Catherine E. Tucker (2008), "How Sales Taxes Affect Customer and Firm Behavior: The Role of Search on the Internet," Journal of Marketing Research, forthcoming.

Anderson, Evan E. (1979), "An Analysis of Retail Display Space: Theory and Methods," Journal of Business, 52 (1).

Agresti, Alan (2002), Categorical Data Analysis, Wiley: New York, NY.

Balasubramanian, Sridhar (1998), "Mail versus Mall: A Strategic Analysis of Competition between Direct Marketers and Conventional Retailers," Marketing Science, 17 (3).

Balasubramanian, Sridhar, Prabhudev Konana, and Nirup M. Menon (2003), "Customer Satisfaction in Virtual Environments: A Study of Online Investing," Management Science, 49.

Bell, David R., and Sangyoung Song (2007), "Neighborhood Effects and Trial on the Internet: Evidence from Online Retailing," Quantitative Marketing and Economics, 5 (4).

Bhatnagar, Amit and Brian T. Ratchford (2004), "A Model of Retail Format Competition for Non-Durable Goods," International Journal of Research in Marketing, 21.

Brynjolfsson, Erik and Michael D. Smith (2000), "Frictionless Commerce? A Comparison of Internet and Conventional Retailers," Management Science, 46 (4).

Brynjolfsson, Erik, Yu (Jeffrey) Hu, and Mohammad S. Rahman (2008), "Battle of the Retail Channels: How Product Selection and Geography Drive Cross-Channel Competition," Working Paper, Sloan School of Management, MIT.

Cairncross, Frances (1997), The Death of Distance, Cambridge, MA: Harvard University Press.

Cameron, A. Colin and Pravin K. Trivedi (1986), "Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests," Journal of Applied Econometrics, 1.

Chen, Yuxin, James D. Hess, Ronald T. Wilcox, and Z. John Zhang (1999), "Accounting Profits Versus Marketing Profits: A Relevant Metric for Category Management," Marketing Science, 18 (3).

Cheng, June and Barrie R. Nault (2007), "Internet Channel Entry: Retail Coverage and Entry Cost Advantage," Information Technology and Management, 8 (2).

Chiou, Lesley (2005), "Empirical Analysis of Retail Competition: Spatial Differentiation at Wal-Mart, Amazon.com, and their Competitors," Working Paper, Occidental College.

Choi, Jeonghye, Sam K. Hui, and David R. Bell (2008), "Bayesian Spatio-Temporal Analysis of Imitation Behavior Across New Buyers at an Online Grocery Retailer," Journal of Marketing Research, forthcoming.

Choi, Jeonghye, David R. Bell, and Leonard M. Lodish (2008), "Search and Word-Of-Mouth: How Local Environments Affect New Buyer Acquisition Online," MSI Working Paper, Marketing Science Institute, Cambridge, MA.

Christaller, Walter (1933), Die zentralen Orte in Süddeutschland, Jena: Gustav Fischer. (Translated (in part) by Carlisle W. Baskin (1966), as Central Places in Southern Germany, Englewood Cliffs, NJ: Prentice Hall).

Dowding, Keith, Peter John, and Stephen Biggs (1994), "Tiebout: A Survey of the Empirical Literature," Urban Studies, 31 (4/5).

Farris, Paul, James Olver, and Cornelis de Kluyver (1989), "The Relationship Between Distribution and Market Share," Marketing Science, 8 (2).

Forman, Chris, Anindya Ghose, and Avi Goldfarb (2008), "Competition Between Local and Electronic Markets: How the Benefit of Buying Online Depends on Where You Live," Management Science, forthcoming.

Forman, Chris, Anindya Ghose, and Batia Wiesenfeld (2008), "Examining the Relationship Between Reviews and Sales: The Role of Reviewer Identity Disclosure in Electronic Markets," Information Systems Research, forthcoming.

Fornell, Claes (1995), "The Quality of Economic Output: Empirical Generalizations About Its Distribution and Relationship to Market Share," Marketing Science, 14 (3).

Ghose, Anindya, Michael D. Smith, and Rahul Telang (2006), "Internet Exchanges for Used Books: An Empirical Analysis of Product Cannibalization and Welfare Implications," Information Systems Research, 17.

Goolsbee, Austan (2000), "In a World Without Borders: The Impact of Taxes on Internet Commerce," Quarterly Journal of Economics, 115.

Goolsbee, Austan and Peter J. Klenow (2002), "Evidence of Learning and Network Externalities in the Diffusion of Home Computers," Journal of Law and Economics, 45 (2).

Keeney, Ralph L. (1999), "The Value of Internet Commerce to the Customer," Management Science, 45 (4).

Lal, Rajiv and Miklos Sarvary (1999), "When and How Is the Internet Likely to Decrease Price Competition?" Marketing Science, 18 (4).

Lynch, John G. and Dan Ariely (2000), "Wine Online: Search Costs Affect Competition on Price, Quality, and Distribution," Marketing Science, 19 (1).

Overby, Eric and Sandy Jap (2008), "Electronic and Physical Market Channels: A Multi-Year Investigation in a Market for Products of Uncertain Quality," Working Paper, Georgia Institute of Technology.

Rabe-Hesketh, Sophia and Anders Skrondal (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press: College Station, TX.

Reibstein, David J. and Paul W. Farris (1995), "Market Share and Distribution: A Generalization, a Speculation, and Some Implications," Marketing Science, 14 (3).

Shonkwiler, J. Scott and Thomas R. Harris (1996), "Rural Retail Business Thresholds and Interdependencies," Journal of Regional Science, 36 (4).

Sinai, Todd and Joel Waldfogel (2004), "Geography and the Internet: Is the Internet a Substitute or a Complement for Cities?" Journal of Urban Economics, 56, 1-24.

Tiebout, Charles M. (1956), "A Pure Theory of Local Expenditures," Journal of Political Economy, 64 (5).
Waldfogel, Joel (2007), The Tyranny of the Market: Why You Can't Always Get What You Want, Harvard University Press: Cambridge, MA.

Table 1: Summary Statistics

(a) Dependent Variables
Columns: Dependent Variables; Bin 1 (Mean, SD); Bin 2 (Mean, SD); Bin 3 (Mean, SD)
H1: Local Preference Minorities and Online Demand
    Number of Buyers
    Number of Repeat Orders by Buyers
H2: Popular Versus Niche Products
    Popular product: Number of Pampers Diapers Sold
    Popular product: Number of Huggies Diapers Sold
    Popular product: Number of Luvs Diapers Sold
    Niche product: Number of Seventh Generation Diapers Sold

(b) The PM Index over Three Terciles
Columns: Preference Minority Variables; Bin 1 (Mean, SD); Bin 2 (Mean, SD); Bin 3 (Mean, SD)
    Local Fraction of Households with Babies
    Number of Total Households
    Number of Households with Babies
    PM Index = [1 - Fraction of Households with Babies]
Local Retail Sales by Households with Babies ($ 000s)
    Average Annual Supermarket Sales
    Average Supermarket Sales by Households with Babies
    Median Annual Supermarket Sales
    Median Supermarket Sales by Households with Babies
    Maximum Annual Supermarket Sales
    Maximum Supermarket Sales by Households with Babies

(c) Independent Control Variables
Columns: Control Variable; Bin 1 (Mean, SD); Bin 2 (Mean, SD); Bin 3 (Mean, SD)
Access to Retail Services
    Distance to the Nearest Supermarket
    Distance to the Nearest Discount Store
    Distance to the Nearest Warehouse Club
    Distance to the Nearest Store Selling Seventh Generation
    Distance to the Nearest Supermarket with No Seventh Generation
Local Environment
    Percentage with Bachelors and/or Graduate Degree
    Percentage of Female Population in Labor Force
    Percentage of Households Earning $75,000 or More
    Percentage of Households Below the Poverty Line
    Percentage of Blacks
    Percentage of Apartment Buildings with 50 Units or More
    Percentage of Homes Valued at $250,000 or More
    Annual Population Growth Rate from 2000 to
    Population Density (in square miles)
Delivery Time
    One-Day Shipping (1 = Yes, 0 = No)
    Two-Day Shipping (1 = Yes, 0 = No)
    Three-Day Shipping (1 = Yes, 0 = No)
    2nd Warehouse Led to One-Day Shipping (1 = Yes, 0 = No)
    2nd Warehouse Led to Two-Day Shipping (1 = Yes, 0 = No)

Note: Each bin (tercile) includes 2,979 residential zip codes. Discount stores and warehouse clubs did not stock Seventh Generation diapers at the time of data collection. All the retail formats selling Seventh Generation diapers are considered in the computations for the distance to the nearest store selling Seventh Generation (used to test H2). Similarly, distances to the remaining stores that do not carry this brand are re-computed in the test of H2.

Table 2: Parameter Estimates
Columns: Bin 1 (Buyers, Repeats); Bin 2 (Buyers, Repeats); Bin 3 (Buyers, Repeats)

Intercept  * * * * * *
Preference Minority
    PM z(m) = [1 - Fraction of Households with Babies]  * * * * * *
Access to Retail Services
    Distance to the Nearest Supermarket  * * * * * *
    Distance to the Nearest Discount Store  * * * * * *
    Distance to the Nearest Warehouse Club  * * * * * *
Local Environment
    Percentage with Bachelors and/or Graduate Degree  * * * * * *
    Percentage of Female Population in Labor Force  * * * * * *
    Percentage of Households Earning $75,000 or More  * * * * * *
    Percentage of Households Below the Poverty Line  * * * * * *
    Percentage of Blacks  * * * * * *
    Percentage of Apartment Buildings with 50 Units or More  * * * * * *
    Percentage of Homes Valued at $250,000 or More  * * * * * *
    Annual Population Growth Rate from 2000 to  * * * * * *
    Population Density (in square miles)  * * * * * *
Delivery Time
    One-Day Shipping (1 = Yes, 0 = No)  * * * * * *
    Two-Day Shipping (1 = Yes, 0 = No)  * * * * * *
    Three-Day Shipping (1 = Yes, 0 = No)  * * * * * *
    2nd Warehouse Led to One-Day Shipping (1 = Yes, 0 = No)  * * * * * *
    2nd Warehouse Led to Two-Day Shipping (1 = Yes, 0 = No)  * * * * * *
Variance
    θ  * * * * * *
    τ  * * * * * *
-2LL

Note: The number of households with babies is used as an offset variable in the model for the number of buyers, and the number of buyers is used as an offset variable in the model for repeat orders. * indicates significance at p < .05 (the corresponding standard errors are available upon request). Each bin includes 2,979 residential zip codes. The dependent variable Buyers is the number of buyers in each local market, and the dependent variable Repeats is the number of repeat orders by buyers in each local market.
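The role of the offset variable in the Table 2 note can be illustrated with a minimal sketch. It assumes a count model with a log link, in which the log of the exposure (here, the number of households with babies) enters with a coefficient fixed at one; the source does not state the link function, and the multilevel variance components are ignored here. The function name `expected_count` and all coefficient values are hypothetical and are not estimates from the paper.

```python
import math


def expected_count(exposure: float, linear_predictor: float) -> float:
    """Expected count when log(exposure) is an offset in a log-link model.

    log E[count] = log(exposure) + linear_predictor, so the expected count
    scales one-for-one with the exposure.
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return exposure * math.exp(linear_predictor)


# Hypothetical coefficients, chosen only for illustration.
intercept = -4.0
beta_pm = 2.0

# Same number of households with babies, different PM Index values.
households_with_babies = 500
low_pm = expected_count(households_with_babies, intercept + beta_pm * 0.80)
high_pm = expected_count(households_with_babies, intercept + beta_pm * 0.95)

print(f"Expected buyers at PM Index 0.80: {low_pm:.1f}")
print(f"Expected buyers at PM Index 0.95: {high_pm:.1f}")
```

With the offset in place, the coefficient on the PM Index describes buyers per household with babies rather than the raw number of buyers, which is what holding the size of the target group constant requires.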

Table 3: Parameter Estimates
Columns: Popular Product (Pampers, Huggies, Luvs); Niche Product (Seventh Generation)

Bin 1
Preference Minority
    PM z(m) = [1 - Fraction of Households with Babies]  * * * *
Access to Retail Services
    Distance to the Nearest Seventh Generation Store  * * * *
    Distance to the Nearest Supermarket  * * * *
    Distance to the Nearest Discount Store  * * * *
    Distance to the Nearest Warehouse Club  * * * *

Bin 2
Preference Minority
    PM z(m) = [1 - Fraction of Households with Babies]  * * * *
Access to Retail Services
    Distance to the Nearest Seventh Generation Store  * * * *
    Distance to the Nearest Supermarket  * * * *
    Distance to the Nearest Discount Store  * * * *
    Distance to the Nearest Warehouse Club  * * * *

Bin 3
Preference Minority
    PM z(m) = [1 - Fraction of Households with Babies]  * * * *
Access to Retail Services
    Distance to the Nearest Seventh Generation Store  * * * *
    Distance to the Nearest Supermarket  * * * *
    Distance to the Nearest Discount Store  * * * *
    Distance to the Nearest Warehouse Club  * * * *

Note: The number of households with babies is used as an offset variable. * indicates significance at p < .05 (standard errors for the estimates shown are available upon request). Parameter estimates and standard errors for the control variables are not shown for ease and clarity of exposition, but are available upon request. Each bin includes 2,979 residential zip codes. Discount stores and warehouse clubs did not stock Seventh Generation diapers at the time of data collection. All the retail formats selling Seventh Generation diapers are considered when computing the Distance to the Nearest Seventh Generation Store, and the remaining supermarkets without Seventh Generation diapers are used to compute the Distance to the Nearest Supermarket when testing H2.

Figure 1: Shelf Space Allocation in Local Markets

Top panel: 1,000 Target Customers in Four Different Local Areas (Markets A, B, C, and D, with total populations of 4,000, 2,000, 1,333, and 1,000).
Bottom panel: 10,000 Target Customers in Four Different Local Areas (Markets E, F, G, and H; Market H has a total population of 10,000).
Columns in each panel: Market, PM Index, Total Population, Target Proportion, and Brand Availability for Brands 1 through 4.

Note: Prior to Internet retailing, consumers who favor brands which are locally unavailable have to compromise with locally available alternatives. Markets A and E are High PM markets; Markets D and H are Low PM markets. In Market A, for instance, only Brand 1 is stocked and all the local consumers there have to buy Brand 1 regardless of their first choices (see Farris, Olver, and de Kluyver 1989). Now, locally-compromised demand can move online.

Figure 2: Local Preference Minorities and Online Demand

H1: Substitution from offline retailers to online retailers will be greater in markets that have a higher PM Index.

[Plot: Sales (y-axis) against PM Index (x-axis, Low to High).]

Note: The x-axis varies across local markets with the same sized focal population, but different total populations. The y-axis is total category demand from the focal group. The PM Index is: [1-(Focal Population)/(Total Population)]. Since the size of the focal population is held constant, the total consumption by the focal group is also constant over markets. The size of the total population increases from left to right, so the focal population becomes a smaller fraction of the total population from left to right, i.e., the focal group's preferences become more minor from left to right. High PM markets have relatively limited local product availability; hence, consumers in these markets should be more likely to buy online. Conversely, in Low PM markets where the focal population is a significant portion of the total population, offline alternatives will be relatively plentiful.
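As a small worked illustration of the PM Index formula in the note above, the sketch below applies it to Markets A through D of Figure 1, each of which has 1,000 target customers. The function name `pm_index` is an assumption introduced here for illustration; only the formula and the population figures come from the text.

```python
def pm_index(focal_population: float, total_population: float) -> float:
    """PM Index = 1 - (Focal Population) / (Total Population)."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    if not 0 <= focal_population <= total_population:
        raise ValueError("focal_population must lie between 0 and total_population")
    return 1.0 - focal_population / total_population


# Markets A to D in Figure 1: focal population fixed at 1,000,
# total population varying across markets.
focal = 1000
total_population = {"A": 4000, "B": 2000, "C": 1333, "D": 1000}

indices = {name: pm_index(focal, total) for name, total in total_population.items()}

for name, value in indices.items():
    print(f"Market {name}: PM Index = {value:.2f}")
```

Because the focal population is the same in every market, the index rises only through the total population, which matches the left-to-right ordering on the x-axis described in the note.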

Figure 3: Popular versus Niche Products

H2: Online-offline substitution for niche products, relative to popular products, will be more sensitive to changes in the PM Index.

[Plot: Online Sales (y-axis) against PM Index (x-axis, Low to High).]

Note: The x-axis varies across local markets with the same sized focal population, but different total populations. The PM Index is: [1-(Focal Population)/(Total Population)]. Since the size of the focal population is held constant, the total consumption by the focal group is also constant over markets. The size of the total population increases from left to right, so the focal population becomes a smaller fraction of the total population from left to right, i.e., the focal group's preferences become more minor from left to right. Sales substitution from offline retailers to online retailers intensifies when consumers in the preference minority do not favor popular products. If local retailers stock products catering to the preference minority at all, they are most likely to stock popular products such as leading national brands; local product offerings will be even more unfavorable for consumers in preference minority markets who prefer niche products.

Figure 4: The Long Tail

(a) The Long Tail Sales Distribution Online
(b) Product Inventory, Revenues, and Profits for Offline versus Online Retailers

Figure 5: Online Demand and the Long Tail

I1: Niche products with a lower overall sales rank (products in the tail of the Long Tail) will draw proportionally more demand from preference minority regions as the PM Index increases.

(a) Online Sales of a Popular Product versus a Niche Product

[Plot: Online Sales (y-axis) against PM Index (x-axis, with $x_1$ and $x_2$ marked). Popular product line: $y = a_1 + b_1 x$, with intercept $a_1$ and slope $b_1$. Niche product line: $y = a_2 + b_2 x$, with intercept $a_2$ and slope $b_2$.]

Note: Define online sales, $y$, of a popular product at a particular location as $y = a_1 + b_1 x$, where $x$ = PM Index location of a local market. Similarly, $y = a_2 + b_2 x$ for the niche product. H1 and H2 imply that $a_1 > a_2$ and $b_1 < b_2$. Without loss of generality, divide the space of all local markets into two groups: one group with a low PM Index ($0$ to $x_1$) and the other with a high PM Index ($x_1$ to $x_2$).

(b) Local Sales of Popular versus Niche Products and the Contribution to the Long Tail

[Plots: Online Sales against PM Index (Low to High) for the popular product and the niche product; Online Sales against Product Rank.]

Note: The aggregate sales of the popular product are A and B in low PM and high PM markets, respectively. Similarly, aggregate sales of the niche product are defined by C and D in the low PM and high PM markets, respectively. For the niche product, the sales ratio of high PM market sales to low PM market sales, i.e., D divided by C, is greater than the same ratio for the popular product, i.e., B divided by A. Thus, niche products with low overall sales ranks, i.e., products in the tail of the Long Tail, necessarily draw proportionally more demand from preference minority regions (see equation 1).
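The ratio comparison in the note can be checked numerically with a minimal sketch. It assumes that local markets are spread evenly over the PM Index, so that the aggregate sales A, B, C, and D are the areas under the two linear sales functions over the low and high intervals; this is an illustrative assumption and may differ from the paper's equation 1. The function name `aggregate_sales` and all parameter values are hypothetical; they are chosen only to satisfy $a_1 > a_2$ and $b_1 < b_2$.

```python
def aggregate_sales(a: float, b: float, lower: float, upper: float) -> float:
    """Area under y = a + b*x between lower and upper.

    Represents total sales across markets assumed to be spread evenly
    over the PM Index interval [lower, upper].
    """
    return a * (upper - lower) + b * (upper ** 2 - lower ** 2) / 2.0


# Hypothetical parameters with a1 > a2 and b1 < b2.
a1, b1 = 10.0, 2.0   # popular product
a2, b2 = 1.0, 4.0    # niche product
x1, x2 = 0.5, 1.0    # boundary between low and high PM markets, and upper end

A = aggregate_sales(a1, b1, 0.0, x1)  # popular product, low PM markets
B = aggregate_sales(a1, b1, x1, x2)   # popular product, high PM markets
C = aggregate_sales(a2, b2, 0.0, x1)  # niche product, low PM markets
D = aggregate_sales(a2, b2, x1, x2)   # niche product, high PM markets

print(f"Popular product, high-to-low ratio B/A = {B / A:.3f}")
print(f"Niche product,   high-to-low ratio D/C = {D / C:.3f}")
```

Under these assumptions and with positive intercepts and slopes, D/C exceeds B/A whenever $b_2 / a_2 > b_1 / a_1$, which follows directly from $a_1 > a_2$ and $b_1 < b_2$.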

Figure 6: Local Market Contribution to the Long Tail

(a) Long Tail Sales Distribution
(b) Sales Fraction by Local Market

Note: The top panel is the Long Tail plot (based on the data from Bin 1), which shows the product rank on the x-axis and the corresponding sales on the y-axis. The bottom panel shows the relative contributions of the three markets to the aggregate sales of each brand. In accordance with I1, all national brands draw proportionally more sales from regions with higher PM Indices, but the niche product (Seventh Generation) draws an even greater proportion of its demand from regions where the Preference Minority Index is high. The relative proportion of sales from the 90th percentile market to the 10th percentile market is 58:18, or about three to one. (This finding repeats for Bins 2 and 3; results are available upon request.)