A Longer Tail?: Estimating the Shape of Amazon's Sales Distribution Curve in 2008
Erik Brynjolfsson, Yu (Jeffrey) Hu, Michael D. Smith


1. Introduction

The term "The Long Tail" was coined by Wired's Chris Anderson (Anderson 2004) to describe a phenomenon where niche products account for a much larger proportion of sales in Internet markets than they do in brick-and-mortar markets. This phenomenon has captured much attention and debate in the popular press (e.g., Gomes 2006, Orlowski 2008) and in the information systems, marketing, and operations management literatures.

In an earlier study of the Internet's Long Tail phenomenon, Brynjolfsson, Hu, and Smith (2003) found that a log-linear relationship (a power law) can be used to describe the relationship between Amazon sales and Amazon sales rank. Assuming such a log-linear relationship holds for all the books sold by Amazon, we estimated that niche books that are not typically stocked in brick-and-mortar bookstores accounted for 39% of Amazon's total sales in 2000, and that the sales of these niche titles enhanced consumer surplus by $731 million to $1.03 billion in 2000.

Following Brynjolfsson et al. (2006), we hypothesize that there are several factors that might increase the proportion of niche sales over time. First, exposure to niche products could drive consumers to develop a taste for more niche products. Second, producers could have an increased incentive to create more new niche products over time. Finally, search tools, product reviews, product popularity information, and recommendation engines could increase the sales of niche products disproportionately.

On the other hand, some have argued that the Long Tail may be a short-lived phenomenon. For instance, over time, consumers who buy from Amazon could become less dominated by early adopters of e-commerce who may have a strong taste for niche products. A larger proportion of mainstream consumers could lead to proportionately more sales of popular products, reducing the size of Amazon's Long Tail.
In addition, Amazon's search and recommendation tools could be tuned (intentionally or unintentionally) to promote popular products (Fleder and Hosanagar 2008). Finally, producers of popular products could employ online marketing strategies to promote their products and counteract the effect of Amazon's search and recommendation tools promoting niche products.

To analyze whether the Long Tail phenomenon represents a temporary or permanent shift, we collected Amazon sales and sales rank data in 2008 on a larger and broader sample of books than was available in our 2000 data. We then match this sample to our 2000 sample to compare changes in the profile of sales over time. Our analyses suggest that the long tail of Internet book sales has gotten longer from 2000 to 2008. We also develop a new methodology to fit the relationship between sales and sales rank and apply it to our 2008 data. Our analyses suggest that niche books account for 36.7% of Amazon's sales in 2008 and that the consumer surplus generated by niche books has increased at least fivefold from 2000 to 2008.

2. Literature

Economic explanations for the existence of superstars and popular products can be traced to Rosen (1981) and Frank and Cook (1995). Brynjolfsson et al. (2006) point out several demand-side and supply-side factors that could drive sales to niche products on the Internet. These demand-side and supply-side factors can even reinforce each other. For instance, Cachon et al. (2006) show that low consumer search costs can enhance a retailer's incentive to provide a large product selection. This could lead to even more sales of niche products.

There is also a growing body of literature that empirically examines sales distributions in various product markets. Brynjolfsson et al. (2007) find that the sales distribution of an Internet channel is less concentrated than that of a catalog channel, using data from a clothing retailer. Elberse and Oberholzer-Gee (2008) find evidence that Internet retailing has shifted demand toward niche video products over time, although they also find that a substantial proportion of niche products have almost zero sales. Chellappa et al. (2007) have similar findings for music sales.

However, none of these papers addresses whether the Long Tail phenomenon is a temporary or permanent shift, or how it might change over time. To answer these questions, one needs to compare the sales distribution of a similar profile of products over a sufficiently long period of time. In this paper, we address this question by collecting data on Amazon sales and sales rank in 2008. We then use the sample matching statistical technique to construct a 2008 sample that is comparable to the 2000 sample used in Brynjolfsson et al. (2003).

3. Data

The data for this paper come from a major publisher with annual sales of more than $1 billion. The publisher provided us with their Amazon sales and sales rank data on a sample of 1,598 titles over 10 weeks from June to August 2008. Overall, we have 15,980 observations of Amazon sales and sales ranks. Table 1 compares the summary statistics for our 2000 and 2008 samples. Our 2008 sample has more observations (15,980 vs. 901) and covers a much broader spectrum of books (sales ranks of 71 to 5,350,140 versus sales ranks of 238 to 961,367) than our 2000 data does.

Table 1: Summary Statistics for Our 2008 Sample and Our 2000 Sample

Variable              Obs.      Mean    S.D.    Min    Max
2008 Sample
  Weekly Sales        15,980    ...     ...     ...    ...
  Weekly Sales Rank   15,980    ...     ...     71     5,350,140
2000 Sample
  Weekly Sales        901       ...     ...     ...    ...
  Weekly Sales Rank   901       ...     ...     238    961,367

4.1 Sample Matching

We use a statistical technique called sample matching (Rassler 2002) to construct a sub-sample from our 2008 sample that matches our 2000 sample on the basis of weekly sales rank. Summary statistics for the 2008 matched sample (reported in Table 2) are then very comparable to those for our 2000 sample (shown at the bottom of Table 1).

Table 2: Summary Statistics for 2008 Matched Sample

Variable              Obs.      Mean    S.D.    Min    Max
  Weekly Sales        ...       ...     ...     ...    ...
  Weekly Sales Rank   ...       ...     ...     ...    ...

Re-estimating Amazon's Long Tail

We then repeat the estimation of the log-linear relationship between Amazon sales and sales rank, using the 2008 matched sample. The linear regression model we use is:

$$ y = \beta_0 + \beta_1 x, \tag{1} $$

where y is the natural log of Weekly Sales, and x is the natural log of Weekly Sales Rank. The estimation results using the 2008 matched sample are reported in Column (1) of Table 3, with the analogous results from our 2000 data in Column (2). Note that 41 observations (those with zero Weekly Sales) are dropped after taking the natural log of Weekly Sales in the 2008 matched sample, and 40 observations are dropped in the 2000 sample. The coefficient on Log(Weekly Sales Rank) estimated from the 2008 matched sample is significantly smaller in magnitude than the same coefficient estimated from the 2000 sample (-0.871).

Table 3: Results of the Log-linear Regression

                          2008 Matched Sample (1)    2000 Sample (2)
Constant                  ... (0.432)                ... (0.156)
Log(Weekly Sales Rank)    ... (0.042)                -0.871 (0.017)
Obs.                      ...                        ...
R²                        ...                        ...

Figure 1: Amazon's Long Tail in 2008 vs. in 2000

The above results provide empirical evidence that Amazon's Long Tail has become longer and fatter in 2008 than in 2000. As sales ranks increase, book sales decline. This decline is slower in 2008 than in 2000, as shown by the relatively smaller magnitude of the coefficient on Log(Weekly Sales Rank) in 2008. Figure 1 shows the estimated log-linear relationship between Amazon sales and sales rank, with the 2008 results in blue and the 2000 results in red. We plot these two curves on both a normal scale and a logarithmic scale. The two curves cross when sales rank is 14,949. This means popular books (with sales rank below 14,949) tend to sell fewer copies in 2008 than in 2000, while niche titles (with sales rank above 14,949) tend to generate more sales in 2008 than in 2000.

Developing a More Accurate Method of Estimating Amazon's Long Tail

The log-linear regression method assumes that the coefficient on Log(Weekly Sales Rank) does not vary as a book's sales rank increases. This assumption may not hold. In this paper, we fit the relationship between Log(Weekly Sales) and Log(Weekly Sales Rank) to a series of splines, rather than just a single line. Such a spline fitting technique allows the slope coefficient to vary as a book's sales rank increases, leading to a more accurate estimate of the size of Amazon's Long Tail.

Our 2000 sample does not contain any observation with Weekly Sales Rank above 1 million. In our 2008 sample, we have 569 observations with Weekly Sales Rank above 1 million, allowing us to more accurately estimate the shape of Amazon's Long Tail for books with sales ranks above 1 million.

Finally, books with Weekly Sales Rank above 1 million frequently have zero Weekly Sales as well. Our original method took the natural log of Weekly Sales, dropping observations with 0 sales. To utilize these observations, we now use a negative binomial regression model, rather than a linear regression:

$$ f(y \mid x) = \frac{e^{-\mu} \mu^{y}}{y!}, \qquad y = 0, 1, 2, 3, \ldots \tag{2} $$

where y is Weekly Sales, x is Log(Weekly Sales Rank), E(y | x) = μ is the conditional mean, and ε is unobserved heterogeneity following a log-gamma distribution with ε ~ Γ(θ, θ) (Cameron and Trivedi 1998). We model the natural log of the conditional mean as a series of splines of x with knots at the 25th, 50th, and 75th percentiles of x:

$$ \ln(\mu) = \beta_0 + \beta_1 x + \beta_2 (x - p_2 \mid x > p_2) + \beta_3 (x - p_3 \mid x > p_3) + \beta_4 (x - p_4 \mid x > p_4), \tag{3} $$

where p_2 is the 25th percentile of x (11.78), p_3 is the 50th percentile of x (12.46), and p_4 is the 75th percentile of x (13.02). Each term (x - p_k | x > p_k) equals x - p_k when x > p_k and zero otherwise, so each of β_2, β_3, and β_4 measures the change in slope at the corresponding knot.
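As a rough illustration of equation (3), the sketch below evaluates ln(μ) and the local slope at a given sales rank using hinge terms at the three knots reported above. The coefficient values are hypothetical placeholders chosen only to show the mechanics (they are not the estimates in Table 4), and the function names are ours.

```python
import math

# Knots from the text: 25th, 50th and 75th percentiles of x = ln(Weekly Sales Rank).
KNOTS = (11.78, 12.46, 13.02)

# HYPOTHETICAL coefficients (beta_0 .. beta_4), for illustration only.
BETAS = (10.0, -0.6, -0.2, -0.1, -0.3)


def hinge(x, knot):
    """Return x - knot when x > knot, else 0 (one spline term of equation (3))."""
    return max(x - knot, 0.0)


def log_mean_sales(rank, betas=BETAS, knots=KNOTS):
    """Evaluate ln(mu) from equation (3) at a given sales rank."""
    x = math.log(rank)
    b0, b1, *slope_changes = betas
    return b0 + b1 * x + sum(b * hinge(x, p) for b, p in zip(slope_changes, knots))


def local_slope(rank, betas=BETAS, knots=KNOTS):
    """Slope of ln(mu) with respect to x at the given rank.

    The base slope beta_1 is adjusted by every slope change whose knot
    lies below x.
    """
    x = math.log(rank)
    return betas[1] + sum(b for b, p in zip(betas[2:], knots) if x > p)


if __name__ == "__main__":
    for rank in (10_000, 200_000, 2_000_000):
        print(rank, round(local_slope(rank), 3), round(math.exp(log_mean_sales(rank)), 4))
```

With negative slope-change coefficients, as in this made-up example, the curve bends downward past each knot, which is the pattern a faster-than-power-law decline would produce.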
The results using the new methodology (negative binomial regression and splines) are reported in Column (1) of Table 4. To show the difference between the new methodology and the old methodology (linear regression and no splines), we estimate the model in equation (1) and report the results in Column (2) of Table 4. Figure 2 shows the estimated curves, with the curve using the new methodology in red and the curve using the old methodology in green.

Table 4: Results Using New Methodology vs. Old Methodology

                        2008 Sample and            2008 Sample and
                        New Methodology (1)        Old Methodology (2)
Constant                ... (0.210)                ... (0.092)
x                       ... (0.019)                ... (0.008)
(x - p2 | x > p2)       ... (0.077)
(x - p3 | x > p3)       ... (0.149)
(x - p4 | x > p4)       ... (0.186)
Obs.                    15,980                     7,668
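The old methodology in equation (1) amounts to an ordinary least-squares fit of log sales on log rank after dropping zero-sales observations, which is presumably why Column (2) of Table 4 uses fewer observations than Column (1). A minimal sketch follows, using made-up data and a helper name of our own; it is not the authors' estimation code.

```python
import math


def fit_log_linear(ranks, sales):
    """OLS fit of ln(sales) = b0 + b1 * ln(rank), dropping zero-sales rows.

    Returns (b0, b1, n_used), where n_used counts the rows that survive
    the log transform.
    """
    pairs = [(math.log(r), math.log(s)) for r, s in zip(ranks, sales) if s > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two observations with positive sales")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all ranks are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1, n


if __name__ == "__main__":
    # MADE-UP data: an exact power law sales = 5000 * rank**-0.8,
    # plus two deep-tail rows with zero sales that the log transform drops.
    ranks = [100, 1_000, 10_000, 100_000, 2_000_000, 3_000_000]
    sales = [5000 * r ** -0.8 for r in ranks[:4]] + [0, 0]
    b0, b1, n_used = fit_log_linear(ranks, sales)
    print(round(b0, 3), round(b1, 3), n_used)
```

Because the dropped rows are concentrated at very high ranks, a fit like this never sees the weakest sellers, which is one reason a count model that keeps zeros can give a different picture of the tail.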

Table 4 reports the coefficient on Log(Weekly Sales Rank) under the old methodology in Column (2). Under the new methodology, Column (1) reports the coefficient on the first spline, while the coefficients on the other splines are negative (the coefficients on the second and fourth splines are statistically significant). These results indicate that the slope coefficient becomes more negative as a book's sales rank increases. In other words, book sales decrease faster than a regular power law would imply as the book's sales rank increases.

Figure 2: Amazon's Long Tail in 2008, Using New and Old Methodologies

Re-estimating the Size of Amazon's Long Tail in 2008

Our new methodology allows us to fit the relationship between Log(Weekly Sales) and Log(Weekly Sales Rank) more accurately. To obtain an accurate estimate of the total sales and the sales generated by books ranked above 100,000, we simply integrate under the curve estimated in Column (1) of Table 4 and find that books ranked above 100,000 account for 36.7% of Amazon's total sales in 2008. The estimates in Column (2) of Table 4, using the old methodology, would have implied that books ranked above 100,000 account for 82.57% of Amazon's total sales in 2008.
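The "integrate under the curve" step can be sketched numerically. The code below integrates an estimated sales curve over rank and returns the share of sales from ranks above a cutoff. The curve is passed in as a function of x = ln(rank), so either a single line or a spline fit can be plugged in. The coefficients, the lower integration limit of rank 1, and the catalog size in the example are all hypothetical assumptions, so the printed share is not one of the paper's estimates.

```python
import math


def integrate_sales(log_mean, lo_rank, hi_rank, steps=20_000):
    """Trapezoid-rule integral of exp(log_mean(x)) over ranks lo_rank..hi_rank.

    log_mean maps x = ln(rank) to ln(expected sales). Substituting r = e**x
    turns the integral over r into the integral of exp(log_mean(x) + x) dx.
    """
    a, b = math.log(lo_rank), math.log(hi_rank)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(log_mean(x) + x)
    return total * h


def tail_share(log_mean, cutoff_rank, max_rank):
    """Fraction of total sales coming from ranks above cutoff_rank."""
    head = integrate_sales(log_mean, 1, cutoff_rank)
    tail = integrate_sales(log_mean, cutoff_rank, max_rank)
    return tail / (head + tail)


if __name__ == "__main__":
    # HYPOTHETICAL single power law and catalog size, for illustration only.
    b0, b1 = 9.0, -1.1

    def power_law(x):
        return b0 + b1 * x

    print(round(tail_share(power_law, 100_000, 3_000_000), 4))
```

For a single power law the result can be checked against the closed form (N^(β1+1) - c^(β1+1)) / (N^(β1+1) - 1), where c is the cutoff rank and N the largest rank. A steeper slope in the tail, as the spline fit suggests, lowers the tail share relative to the single-line fit.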

The methodology used by Brynjolfsson et al. (2003), a linear regression without splines, could have caused an overestimate of the size of Amazon's Long Tail. This is mainly because the assumption that the coefficient on Log(Weekly Sales Rank) does not vary as a book's sales rank increases may not hold.

We then use this and our other calculations to estimate the consumer surplus gain from long tail books in 2008. Following our prior work, we take 100,000 as the cutoff point for niche books, and recalculate the consumer surplus generated from selling these niche books on the Internet. Several changes have happened in the eight years from 2000 to 2008. First, the number of books in print has increased from 2.3 million in 2000 to 3-5 million in 2008. Second, book industry revenue has climbed from $24.6 billion to $37.3 billion. Third, the share of book purchases through the Internet channel has risen from 6% in 2000 to 21-30% in 2008. Combining these changes with the new estimates of the percentage of sales in the Long Tail, we estimate that selling niche books that are unavailable in brick-and-mortar stores leads to a consumer surplus of $3.93 billion to $5.04 billion in the year 2008. These estimates are about five times the estimates in Brynjolfsson et al. (2003), even though the estimates in Brynjolfsson et al. (2003) are likely to have been overestimates.

References

Anderson, C. 2004. The Long Tail. Wired Magazine 12(10).
Brynjolfsson, E., Y. J. Hu, M. D. Smith. 2003. Consumer surplus in the digital economy: Estimating the value of increased product variety at online booksellers. Management Science 49(11).
Brynjolfsson, E., Y. J. Hu, M. D. Smith. 2006. From niches to riches: The anatomy of the long tail. Sloan Management Review 47(4).
Brynjolfsson, E., Y. J. Hu, D. Simester. 2007. Goodbye Pareto principle, hello long tail: The effect of search costs on the concentration of product sales. Working Paper.
Cachon, G. P., C. Terwiesch, Y. Xu. On the effects of consumer search and firm entry in a multiproduct competitive market. Marketing Science 27(3).
Chellappa, R., B. Konsynski, V. Sambamurthy, S. Shivendu. 2007. An Empirical Study of the Myths and Facts of Digitization in the Music Industry. Workshop on Information Systems and Economics, Montreal, Canada.
Elberse, A., F. Oberholzer-Gee. 2008. Superstars and underdogs: An examination of the long tail phenomenon in video sales. Working Paper.
Fleder, D., K. Hosanagar. 2008. Blockbuster culture's next rise and fall: The impact of recommender systems on sales diversity. Management Science (Forthcoming).
Gomes, L. 2006. It may be a long time before the long tail is wagging the web. Wall Street Journal, July 26.
Orlowski, A. 2008. Chopping the long tail down to size. The Register, Nov 7.
Rassler, S. 2002. Statistical Matching: A Frequentist Theory, Practical Applications and Alternative Bayesian Approaches. Springer, New York.
Rosen, S. 1981. The economics of superstars. American Economic Review 71(5).
Tucker, C., J. Zhang. How does popularity information affect choices? A field experiment. MIT Sloan Working Paper.