Informational Cascades vs. Network Externalities: An Empirical Investigation of Herding on Software Downloading

Wenjing Duan, Bin Gu, Andrew B. Whinston
McCombs School of Business, The University of Texas at Austin
Extended Abstract for Consideration in WISE 2005

1. INTRODUCTION

Herd behavior, i.e., "everyone is doing what everyone else is doing" (Banerjee 1992), describes a variety of social and economic situations in which individuals are markedly influenced by the decisions of others, such as financial investment, technology adoption, firms' strategic decisions, political voting, and dining and fashion trends. When two restaurants stand next to each other, customers often pick the one with more seats occupied. Despite mediocre reviews, a New York Times bestseller can sell well enough to remain a bestseller (Bikhchandani et al. 1998). Herd behavior is particularly prominent in the IT industry. IT managers are known to follow each other in making IT investment decisions (Kauffman and Li 2003), and computer users often adopt popular software, making it even more popular (Brynjolfsson and Kemerer 1996; Gandal 1994).

There are two primary explanations for herd behavior in the IT industry. The first and best-known explanation is network externalities, i.e., the utility a user derives from a software product increases with the number of other users adopting it (Katz and Shapiro 1985). [1] A classic example is spreadsheet software (Brynjolfsson and Kemerer 1996; Gandal 1994). The second explanation, less well known in the IT industry, is informational cascades (Banerjee 1992; Bikhchandani et al. 1992; Li 2004). Informational cascades theory assumes that sequential decision-makers have imperfect knowledge of a product's true value and thus infer its utility from the actions of their predecessors.
While the intrinsic quality of the product does not change, the inferred utility varies with the actions of earlier adopters. At a certain point, everyone starts to imitate their predecessors without regard to their own information (Bikhchandani et al. 1992).

Differentiating the two causes of herd behavior has both theoretical and practical value. Under network externalities, adoption decisions are driven by the expectation of deriving higher utility from choosing the popular product, which is generally socially efficient. Under informational cascades, however, the adoption decision is driven by the inferred utility rather than the actual utility of the product. It is well known that herding caused by informational cascades is socially inefficient and leads to suboptimal social allocation (Bikhchandani et al. 1992). Differentiating the two causes is therefore the first step toward understanding how herd behavior affects social welfare in the IT industry.

In addition, separating informational cascades from network externalities improves our understanding of the impact of network externalities in the IT industry. Prior studies have shown a significant presence of network externalities in IT products. However, these studies mostly attribute adopters' convergent behavior to network effects without considering informational cascades, and thus might overstate the significance of network effects. Our analysis attempts to reveal the true level of network externalities in the IT industry. Practitioners can also benefit from this study by better understanding online consumer herd behavior, the impact of various types of product information, and the influence of virtual communities, and subsequently by constructing more efficient business strategies.

Understanding the causes of herd behavior, especially the impact of informational cascades, is particularly important in the Internet age. While herding has often been observed in local environments (Bikhchandani et al.
1992), the pervasive use of the Internet and other information technologies may significantly change its influence. Given the numerous new products and services available on the Internet, consumers often face intricate purchasing decisions without accurate information about product quality (Brynjolfsson and Smith 2000). Informational cascades can be common in such situations, which require consumers to infer product quality from other consumers' choices and incorporate that information into their own decision-making.

In a larger context, our research also offers a better understanding of how online information affects consumer decision making and social welfare. The Internet is often credited with empowering consumers and increasing consumer welfare. By enabling consumers to observe other consumers' choices, however, the Internet may well increase informational cascades in consumer decision making and decrease consumer welfare.

[1] The literature also refers to network externalities as network effects. We use the two terms interchangeably throughout the paper.

We base our analysis on data collected from CNET Download.com. We seek to identify the causes of consumers' herd behavior and its impact on consumers' software product choices. We develop a market-share model incorporating a nested logit structure to capture the dynamics of software market share over time. Our model also includes a mechanism to separate the network effect from the predictions of informational cascades theory. After fitting the model to our panel data set, we find that the impact of informational cascades is significant and present in all types of software product markets. Specifically, our results suggest that consumers favor information inferred from others' behavior but ignore other sources of information, such as professional product reviews or user reviews. In addition, our results show significant network effects for software products that require interaction among users (e.g., instant messaging software). However, network effects are not present for other types of software products once we account for the impact of informational cascades.

2. DATA

Data for this study were collected from CNET Download.com (CNETD). CNETD is a library of over 30,000 free or free-to-try software programs for Windows, Macintosh, and handheld devices. Within a specific category, the software listing can be sorted by number of downloads, software name, CNET rating, user rating, or date added. A report by CNETD shows that the total number of downloads (download count) is the most frequently used sort option (37%), suggesting that users may place significant weight on previous customers' choices. CNET editorial staff provide reviews and ratings for some of the software programs. In addition, CNETD offers a user feedback system for customers to share their opinions and experiences. The user review system asks for customers' comments as well as an overall assessment indicated by thumbs-up or thumbs-down. [2]

CNETD provides a weekly ranking of the most popular Windows titles, which includes the top 50 most-downloaded programs. We collected the most popular list each week since November. The number of software programs listed in each category varies considerably, from approximately 60 to 300. This variation reflects the idiosyncratic environment of each software category, which can be defined as a single market. We started collecting data on ten categories in November 2004 on a daily basis, and data collection is ongoing. Every day we extract the following information for every software program listed in each category: software name, description, date added, total downloads, last week's downloads, CNET rating, number of user votes, thumbs-up, thumbs-down, [3] and whether the program has been labeled "pop" (a program is designated "pop" from its debut on the most popular list) and "new" (a program is defined as "new" for its first 15 days). We also collect software characteristics, including operating system requirements, file size, publisher, license, and price if the license is free-to-try.

We construct a measure of daily market share for each individual software program, which reflects customers' choices in a particular category. Let i = 1, ..., I index the software in a specific category. DAILYDOWNLOAD_it is defined as the number of downloads of software i on day t. Hence, the daily download market share of software i on day t is

$$S_{it} = \frac{\mathit{DAILYDOWNLOAD}_{it}}{\sum_{j=1}^{I} \mathit{DAILYDOWNLOAD}_{jt}} \tag{1}$$

3. EMPIRICAL METHODOLOGY AND RESULTS

3.1 Empirical Model

Our major objective is to uncover the impact of herding on customers' choices of software programs. We do not observe each individual user's download decision, but customers' software choices can be measured collectively by the daily download market share ($S_{it}$) in each individual market (i.e., in one software category).
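The daily market share in equation (1) can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the download counts below are hypothetical and chosen only for illustration; they are not taken from the CNETD data.

```python
from collections import defaultdict


def daily_market_shares(daily_downloads):
    """Compute S_it = DAILYDOWNLOAD_it / sum_j DAILYDOWNLOAD_jt.

    daily_downloads maps (software, day) -> number of downloads on that day
    within a single category (hypothetical layout, assumed for this sketch).
    """
    # Total downloads in the category for each day (the denominator).
    totals = defaultdict(float)
    for (_, day), count in daily_downloads.items():
        totals[day] += count

    shares = {}
    for (software, day), count in daily_downloads.items():
        if totals[day] > 0:  # skip days with no downloads at all
            shares[(software, day)] = count / totals[day]
    return shares


# Made-up counts for three programs over two days.
example = {
    ("A", 1): 60, ("B", 1): 30, ("C", 1): 10,
    ("A", 2): 50, ("B", 2): 50, ("C", 2): 0,
}
shares = daily_market_shares(example)
```

Note that a program with zero downloads on a given day has a share of zero, so its log share is undefined; how such observations are treated is not stated in the text.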
We employ multinomial logit (MNL) market-share models as the basis of our empirical analysis to explain the market shares (or choices) of different products. Since consumer choices underlie the process of market-share formation, the discussion of market-share models overlaps substantially with consumer choice models. In addition, market-share models deal with market response over time as well as over competitors (Eliashberg and Lilien 1993). MNL models are consistent with the individual choice models (random utility models) pioneered by McFadden (1978) if the joint distribution of the random utilities follows a multivariate extreme-value distribution. Another advantage of the MNL model is that it allows more realistic market-share elasticities (Eliashberg and Lilien 1993).

[2] At the end of January 2005, CNETD removed the percentage and the thumbs-up/thumbs-down system in favor of a more comprehensive five-star user-rating system. The analysis shown in this paper uses data from the old system. We compared the data from the old and new systems and found no significant difference in the number or the distribution of reviews.
[3] After January 28th, 2005, we collected the five-star representation of the user ratings.

The general specification of the MNL model is of the form

$$S_{it} = \frac{A_{it}}{\sum_{j=1}^{I} A_{jt}} \tag{2}$$

where $S_{it}$ is the market share of the i-th product in a market of I products in time period t, and $A_{it}$ is the attractiveness of the product. MNL models specify attractiveness as

$$A_{it} = \exp\left(\mu_i + \sum_{k=1}^{K} b_k X_{ikt} + \varepsilon_{it}\right) \tag{3}$$

where $\mu_i$ is a parameter for the constant influence of product i, $X_{ikt}$ is the value of the k-th explanatory variable that may influence consumers' product choices, and $\varepsilon_{it}$ is the error term. Nakanishi and Cooper (1982) show that MNL models can be estimated by OLS regression using

$$\log(S_{it}) = \alpha_t + \sum_{k=1}^{K} b_k X_{ikt} + \varepsilon_{it} \tag{4}$$

where $\alpha_t$ is an intercept specific to the t-th time period.

The basic MNL model suffers from the independence of irrelevant alternatives (IIA) property, which treats all software products as equally substitutable. This feature may produce unreasonable substitution patterns. In our software downloading scenario, preliminary analysis shows that consumers favor popular software programs. In addition, consumers tend to associate quality with more recent releases. As a result, customers are more likely to substitute popular programs for one another, and likewise newly released programs for one another. We thus incorporate a two-level nested choice structure into our MNL model, with popular at the higher level and new at the lower level. The final linear model becomes [4]

$$\log(S_{it}) = \mu_i + \alpha_t + \alpha_0 \mathit{POPD}_{it} + \alpha_1 \mathit{NEWD}_{it} + \alpha_3 \log(S_{i|n,p})_t + \alpha_4 \log(S_{n|p})_t + \sum_{k=1}^{K} b_k X_{ikt} + \varepsilon_{it} \tag{5}$$

We construct a set of panel data to estimate (5).
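To see why the MNL share model becomes linear in logs with a period-specific intercept, the following minimal Python sketch computes shares from attractiveness values of the form exp(u_i) and checks that log(S_i) differs from u_i by the same constant for every product. The utility values are made up for illustration, and the function name is hypothetical.

```python
import math


def mnl_shares(utilities):
    """Shares S_i = exp(u_i) / sum_j exp(u_j) for one market and one period."""
    # Subtracting the maximum is a standard numerical-stability step;
    # it rescales every attractiveness equally and leaves the shares unchanged.
    m = max(utilities)
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up utilities (mu_i + sum_k b_k X_ikt + eps_it) for three products.
utilities = [0.5, 1.2, -0.3]
shares = mnl_shares(utilities)

# log(S_i) - u_i equals -log(sum_j exp(u_j)) for every product i, so the
# denominator can be absorbed by an intercept common to all products in the period.
offsets = [math.log(s) - u for s, u in zip(shares, utilities)]
expected_offset = -math.log(sum(math.exp(u) for u in utilities))
```

Because the offset depends only on the period and not on the product, a full set of time dummies can stand in for it in the regression.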
Our panel data consist of daily observations of software downloading. $\log(S_{i|n,p})_t$ denotes the log of the conditional market share of software i among new (or not new) releases that are popular (or unpopular) on day t; similarly, $\log(S_{n|p})_t$ denotes the log of the conditional market share of new (or not new) releases among popular (or unpopular) programs on day t. POPD_it and NEWD_it are dummy variables that indicate whether software i is labeled popular or new on day t. $\mu_i$ is the fixed effect capturing the idiosyncratic, time-constant unobserved characteristics of each piece of software. In $X_{ikt}$ we include the time-varying explanatory variables AGE_it (the number of days software i has been posted) and AGESQ_it (AGE squared) to control for the growth rate over the software life cycle. We also add NUMSOFTWARE_it, the total number of software programs listed in the category on day t, to control for the competition effect.

To separate the herding effect caused by informational cascades from network effects, we include the variable LASTWEEKDOWNLOAD_it, the download count for the most recent week, and the variable TOTALDOWNLOAD_{i,t-1}, the cumulative number of downloads of software i up to day t-1. The justification is that informational cascades can be established by observing a series of immediate predecessors, while network externalities can only be built on the total number of adopters.

Due to page limitations, our analyses are demonstrated here for only two software categories: Mp3 Search Tools (Mp3) and Internet Chat (Chat). The results from the remaining eight categories are qualitatively the same and are available from the authors upon request. We choose Chat and Mp3 because they represent the two ends of the spectrum of software products in terms of user interaction.
While Chat provides Internet chatting and messaging, which require interaction among users and are expected to exhibit significant network effects, use of Mp3 software is not expected to depend on the number of users. We also include two interaction terms, TOTAL_AGE and LASTWEEK_AGE, to test how the magnitude of the herding effect changes over time. Lastly, the variable USERRATING_{i,t-1} is included to measure the impact of average user ratings on consumers' software choices. [5]

[4] The derivation of the model is shown in the complete paper.
[5] Since CNET ratings do not change over time, their impact is captured by the fixed effect. We regressed the fixed-effect coefficients on CNET rating and other time-constant variables and did not find any significant impact.
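The conditional shares used in the nested structure can be sketched as follows. This is one reading of the definitions given in the text, assuming that the lower-level nest is the (popular, new) group a program belongs to and that the upper level is the popular or unpopular group; the field names and download counts are hypothetical.

```python
from collections import defaultdict


def nested_shares(rows):
    """Decompose each program's share for one category and one day.

    rows: list of dicts with keys 'name', 'pop' (bool), 'new' (bool) and
    'downloads' (hypothetical layout, assumed for this sketch).
    """
    total = sum(r["downloads"] for r in rows)
    by_pop = defaultdict(float)   # downloads per popular/unpopular group
    by_nest = defaultdict(float)  # downloads per (popular, new) nest
    for r in rows:
        by_pop[r["pop"]] += r["downloads"]
        by_nest[(r["pop"], r["new"])] += r["downloads"]

    result = {}
    for r in rows:
        nest = (r["pop"], r["new"])
        result[r["name"]] = {
            "s_i": r["downloads"] / total,                   # overall share
            "s_i_given_np": r["downloads"] / by_nest[nest],  # share within nest
            "s_n_given_p": by_nest[nest] / by_pop[r["pop"]], # nest share within pop group
            "s_p": by_pop[r["pop"]] / total,                 # pop group share
        }
    return result


# Made-up downloads for five programs on one day.
rows = [
    {"name": "A", "pop": True, "new": False, "downloads": 40},
    {"name": "B", "pop": True, "new": True, "downloads": 30},
    {"name": "C", "pop": False, "new": True, "downloads": 15},
    {"name": "D", "pop": False, "new": False, "downloads": 5},
    {"name": "E", "pop": True, "new": False, "downloads": 10},
]
shares = nested_shares(rows)
```

Under this reading, the overall share factors as the product of the three conditional pieces, which is what allows the log conditional shares to enter the regression as separate terms.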

Table 1. Descriptive Statistics: Mp3 Search Tools
Columns: Variable, N, Mean, S.D., Min., Max. Rows: DAILYSHARE, TOTALDOWNLOAD (M), LASTWEEKDOWNLOAD (M), DAYS, CNETRATING, USERRATING, POPD, NEWD, NUMSOFTWARE. [Numeric entries illegible.]

Table 2. Descriptive Statistics: Internet Chat
Columns: Variable, N, Mean, S.D., Min., Max. Rows: DAILYSHARE, TOTALDOWNLOAD (M), LASTWEEKDOWNLOAD (M), DAYS, CNETRATING, USERRATING, POPD, NEWD, NUMSOFTWARE. [Numeric entries illegible.]

Table 3. Estimation Results
Variable | Mp3 Search Tools: Coefficient (Std. Err.) | Internet Chat: Coefficient (Std. Err.)
log(S_{i|n,p})_t | 0.29*** (0.006) | 0.62*** (0.002)
log(S_{n|p})_t | 0.44*** (0.06) | 0.69*** (0.10)
TOTALDOWNLOAD_{i,t-1} | [illegible] (0.12) | 0.15*** (0.06)
LASTWEEKDOWNLOAD_it | 1.09*** (0.25) | 1.10* (0.58)
AGE_it | [illegible]** (0.002) | [illegible]*** (0.0007)
AGESQ_it | 6.60e-06*** (9.03e-07) | -9.07e-08 (2.98e-07)
TOTAL_AGE | [illegible] ([illegible]) | [illegible]*** ([illegible])
LASTWEEK_AGE | [illegible]*** (0.001) | [illegible] (0.004)
USERRATING_{i,t-1} | [illegible] (0.02) | [illegible] (0.02)
POPD_it | 0.51*** (0.04) | 0.93*** (0.03)
NEWD_it | 0.30*** (0.08) | 0.17 (0.17)
NUMSOFTWARE_it | 0.05 (0.04) | [illegible] (0.002)
n, R^2 | n = 2198, R^2 = 0.34 | n = 5061, R^2 = 0.40
*** p<.01, ** p<.05, * p<.10
Note: Time dummies (one for each day) and software dummies (a fixed effect for each software program) used in estimating the model are not reported.

3.2 Results

The estimation results are shown in Table 3. As expected, the coefficients of $\log(S_{i|n,p})_t$ and $\log(S_{n|p})_t$ are positive and significant at the 1% level for both categories, suggesting that consumers view popular and (or) new products as closer variants of one another. The coefficient of LASTWEEKDOWNLOAD_it is positive and significant at the 1% level for Mp3 and at the 10% level for Chat. In addition, the two coefficients have about the same value. The coefficient of TOTALDOWNLOAD_{i,t-1} is positive and significant at the 1% level only for Chat. Our interpretation of these findings is that informational cascades, captured by LASTWEEKDOWNLOAD_it, are in general a significant influence on consumers' choices across different types of software products.
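As a rough check on how the significance stars in Table 3 relate to the reported coefficients and standard errors, the sketch below computes a ratio of coefficient to standard error and a two-sided p-value. It assumes a standard normal approximation; the test the authors actually used is not stated in the text, and the reported values are rounded, so this is illustrative only. The function names are hypothetical.

```python
import math


def two_sided_p(coef, std_err):
    """Return (z, p) for H0: coefficient = 0 under a normal approximation."""
    z = coef / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def stars(p):
    """Map a p-value to the star convention in the Table 3 legend."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Coefficients and standard errors as printed in Table 3.
z_mp3, p_mp3 = two_sided_p(1.09, 0.25)    # LASTWEEKDOWNLOAD, Mp3
z_chat, p_chat = two_sided_p(1.10, 0.58)  # LASTWEEKDOWNLOAD, Chat
z_new, p_new = two_sided_p(0.17, 0.17)    # NEWD, Chat
```

With these inputs the Mp3 ratio is about 4.4 and the Chat ratio about 1.9, which is in line with the 1% and 10% levels reported for LASTWEEKDOWNLOAD, while the NEWD ratio of 1.0 for Chat carries no star.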
Their impact is also largely consistent across product categories. On the other hand, network effects are significant only for software programs that require interaction among users. For other types of software products, network effects are insignificant after taking informational cascades into consideration. Consistent with this result, we also find that the interaction terms LASTWEEK_AGE and TOTAL_AGE are negative and significant for Mp3 and Chat respectively, indicating that the herding effect dissipates over time.

Turning to the age variables, we find that the coefficient of AGE_it is negative and significant for both categories, indicating that consumers may rate new software programs more favorably. The parameter for AGESQ_it is positive and significant only for Mp3, suggesting a potentially nonlinear growth rate. Tables 1 and 2 suggest that, on average, software programs in Mp3 are younger than those in Chat, which may explain the possible curvature for Mp3 software programs during the earlier stages of their life cycle. It is worth noting that user ratings do not display any significant impact on market shares in our estimation results. The coefficient of POPD_it is positive and significant for both categories, further confirming that consumers tend to favor popular products.

4. CONCLUDING REMARKS

The research presented here analyzes herd behavior on the Internet and separates the effects of two mechanisms that cause herding: informational cascades and network effects. We show that informational cascades are a significant and consistent driver of consumers' software choices, while the network effect is a significant driver only for certain types of software that require user interaction. We are in the process of extending our panel data analysis to other online shopping environments to develop a deeper understanding of the effect of information on online consumer behavior.

A broader issue in studying the Internet is that a range of research has shown that the Internet empowers consumers by providing them with more information. If consumers are rational, such provision of information is often expected to benefit consumers and lead to higher social benefits. However, we find that the provision of sales ranking information may reduce consumers' incentive to collect and analyze information. Rather, they may choose to follow each other blindly. As such, the Internet may reduce the use of information, leading to lower social benefits. This study is a first step toward a better understanding of the informational role of the Internet. More research is needed to shed additional light on herd behavior under the influence of information technologies.

REFERENCES

Banerjee, A. V. "A Simple Model of Herd Behavior," Quarterly Journal of Economics (107:3), 1992.
Brynjolfsson, E., and Kemerer, C. F. "Network Externalities in Microcomputer Software: An Econometric Analysis of the Spreadsheet Market," Management Science (42:12), 1996.
Brynjolfsson, E., and Smith, M. D. "Frictionless Commerce? A Comparison of Internet and Conventional Retailers," Management Science (46:4), 2000.
Bikhchandani, S., Hirshleifer, D., and Welch, I. "A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades," Journal of Political Economy (100:5), 1992.
Bikhchandani, S., Hirshleifer, D., and Welch, I. "Learning from the Behavior of Others: Conformity, Fads, and Informational Cascades," Journal of Economic Perspectives (12:3), 1998.
Bikhchandani, S., and Sharma, S. "Herd Behavior in Financial Markets," IMF Staff Papers (47:3), 2001.
Economides, N. "The Economics of Networks," International Journal of Industrial Organization (16:4), 1996.
Eliashberg, J., and Lilien, G. L. Handbooks in Operations Research and Marketing Science (vol. 5), Amsterdam: North-Holland, 1993.
Katz, M. L., and Shapiro, C. "Network Externalities, Competition, and Compatibility," American Economic Review (75:3), 1985.
Kauffman, J. R., and Li, X. "Payoff Externalities, Informational Cascades and Managerial Incentives: A Theoretical Framework for IT Adoption Herding," Proceedings of the 2003 INFORMS Conference on IS and Technology, Atlanta, GA, October 2003.
Gandal, N. "Hedonic Price Indexes for Spreadsheets and an Empirical Test of the Network Externalities Hypothesis," RAND Journal of Economics (25:1), 1994.
Li, X. "Informational Cascades in IT Adoption," Communications of the ACM (47:4), 2004.
Nakanishi, M., and Cooper, L. G. "Simplified Estimation Procedures for MCI Models," Marketing Science (1:3), 1982.
Scharfstein, D. S., and Stein, J. C. "Herd Behavior and Investment," American Economic Review (80:3), 1990.