Usage-Based Pricing of the Internet

Aviv Nevo (Northwestern University), John Turner (University of Georgia), Jonathan Williams (University of Georgia)

Preliminary and Incomplete, November 2012

Abstract: We estimate demand for residential broadband to study the efficiency properties of usage-based billing. Using detailed, high-frequency internet protocol data records, we exploit variation in the intertemporal tradeoffs faced by subscribers to estimate the distribution of subscribers' preferences for different characteristics of service: access and overage fees, usage allowances, and connection speeds. We find significant heterogeneity in tastes along each dimension of service. Using these estimates, we examine the efficiency of various three-part tariff pricing schedules. We find that usage-based pricing models currently being employed in North America are successful at eliminating large volumes of low-value traffic while having a minimal impact on subscriber welfare. These findings provide strong support for the FCC's backing of the industry's move away from flat-rate pricing.

Keywords: Demand, Broadband, Dynamics, Usage-based Pricing, Welfare. JEL Codes: L13.

Acknowledgments: We thank those North American Internet Service Providers that provided the data used in this paper. We thank Terry Shaw, Jacob Malone, Scott Atkinson, and seminar participants at Georgia Tech and UGA for insightful comments that significantly improved this paper. Jim Metcalf provided expert computational and storage support for this project. All remaining errors are ours.

Author contact: Aviv Nevo, Department of Economics, Northwestern University, nevo@northwestern.edu. John Turner, Department of Economics, University of Georgia, jlturner@uga.edu. Jonathan Williams (corresponding author), Department of Economics, University of Georgia, jonwms@uga.edu.

1 Introduction

In the U.S., "last mile" connectivity to the internet is privately provided by telecom (e.g., AT&T) and cable (e.g., Comcast) companies. This leaves the problem of allocating scarce network resources (i.e., bandwidth) at the discretion of the internet service provider (ISP). The ability of ISPs to price the delivery of content from the internet to subscribers (and vice versa) has important implications for the future development of online content and communications and, more generally, for the way in which people use the internet. It has therefore led to significant debate over how the internet should be regulated.

During the past decade in the U.S., ISPs have typically sold unlimited access to the internet for a fixed monthly fee. During this time, the average residential subscriber's usage has grown 50% annually. This dramatic growth in usage has led to a shift in the industry towards usage-based pricing plans similar to those commonly associated with cellular phones. Typically, these plans take the form of a three-part tariff: a fixed access price, a usage allowance, and a marginal price for usage in excess of the allowance. Just this year, two of the largest cable providers, Comcast and Time Warner Cable, conducted trials of usage-based pricing in select markets.

ISPs argue that usage-based pricing is necessary to curtail the usage of the small number of subscribers who dramatically drive up network costs and degrade the quality of service for other subscribers. This view treats usage-based pricing as a type of Pigouvian tax that helps equate a subscriber's private benefit to the costs realized by the ISP (i.e., network investment) and by other subscribers (i.e., degraded service). ISPs also argue that usage-based pricing gives content developers the right incentives to minimize the bandwidth requirements of their applications. For example, YouTube recently added an option for users to degrade the quality of video streams.
This allows subscribers to degrade quality to a level acceptable to them, avoiding overage charges while minimizing the costs that the traffic they generate imposes on the ISP and other users. These types of arguments led the Federal Communications Commission (FCC) to recently back the practice: "Usage-based pricing would help drive efficiency in the networks," Julius Genachowski, FCC Chairman (Chicago Tribune, May 22, 2012).

The recent shift in the industry towards usage-based pricing models, along with the support of government regulators, has given rise to numerous organizations devoted to preventing it. These include web sites (e.g., www.stopthecap.com and openmedia.ca/meter) that monitor ISPs' activities for indications (e.g., reporting usage on subscribers' bills) that usage-based pricing will be introduced. The web sites then ask their followers to bombard the ISP with complaints, often providing direct contact information for the companies' executives, in the hope of preventing or delaying the change. More formal organizations lobby regulators and legislators directly. Geekdom is one such organization: "It's like locking the doors to the library," Nicholas Longo, Geekdom Director (NY Times, June 26, 2012). Generally, these organizations believe that the activities of high-volume subscribers are of high value and that any type of "caps" or usage-based pricing will result in significant welfare losses for subscribers.

The extant academic literature on the economics of internet access is very limited and almost exclusively theoretical in nature. Existing theoretical studies (e.g., Odlyzko (2012)) often reach very strong and conflicting conclusions regarding the welfare consequences of usage-based pricing. To date, the lack of detailed data on consumption of internet content has limited empirical work on the topic to only a couple of papers: Goolsbee and Klenow (2010) and Lambrecht et al. (2007). Goolsbee and Klenow (2010) use Forrester Technographics Survey data on individuals' time spent on the internet and earnings to innovatively estimate the private benefit to subscribers of residential broadband. However, their estimate relies on the potentially dubious assumption that an hour spent on the internet is an hour of forgone wages.
The lack of more empirical studies on these important issues is largely due to the proprietary nature of, and the technological constraints associated with collecting, much of the data required. However, the dramatic growth in usage over the past decade has forced ISPs to invest in technology to track usage and better manage scarce network resources. These investments in data collection have created an opportunity for academic researchers to better understand the economics of broadband internet access. In this paper, we use 5 months of detailed hourly data on internet utilization, obtained from a group

of North American ISPs, some of which employ usage-based pricing, to study the impact of usage-based pricing on subscribers. We observe the total volume of content downloaded and uploaded each hour, for approximately 5 months from late 2011 to early 2012, for over 30,000 subscribers.

To study the welfare implications of usage-based pricing, we begin by building a dynamic model of subscribers' intertemporal decision making throughout a billing cycle under usage-based pricing. Specifically, we model subscribers as utility-maximizing agents who solve a dynamic optimization problem each billing cycle. While we do not observe variation in the service plans or tiers offered to subscribers during the sample period, the high-frequency nature of our data and the variety of three-part tariffs offered to consumers allow us to accurately estimate demand. In particular, the high-frequency data allow us to exploit variation in the shadow price, i.e., the implications of current consumption for the probability of incurring overage charges later in the billing cycle, to trace out subscribers' marginal utility. In addition, selection into plans, i.e., the choice of a particular three-part tariff, reveals a great deal about preferences: it reveals an average willingness to pay for content and a preference over the speed of one's connection (i.e., Mb/s). Our provider offers plans ranging from almost linear tariffs (i.e., very low usage allowances) to plans with allowances well over 100 GBs. The connection speeds, overage prices, and usage allowances are all non-decreasing in the fixed access fee.

A potential concern with such a model is that it is wrong to have each subscriber solve his or her optimization problem in isolation: network externalities among users would make the problem a dynamic game in which subscribers choose when to use the internet based on preferences and expectations regarding congestion in the network.
However, the data we use in this paper come from an ISP that operates an over-provisioned and pristine network. This allows us to accurately model a subscriber's usage decision as an independent one. We discuss how we measure the absence of network externalities in this provider's network in Section 2.

To estimate the model, we adapt the techniques of Ackerberg (2009), Bajari et al. (2007),

and Fox et al. (2012). These techniques avoid computationally expensive fixed-point estimation algorithms: it is only necessary to solve the dynamic programming problem a single time for each type of agent. Second, one can relax the parametric and distributional assumptions typically made when estimating such dynamic models. This is critical for our purposes, given the limited information we have about each subscriber and the extreme heterogeneity in usage behavior, which is difficult to model. Finally, the techniques naturally handle difficult-to-model forms of selection, an important issue in our application: subscribers select into a service tier or plan, and we only observe usage under that optimally selected plan.

The application itself is of interest and has important policy implications. As desirable, yet bandwidth-intensive, applications continue to be developed and subscriber usage grows, it will be increasingly important to have accurate measures of the demand for content. Our results largely support the current regulatory stance of the FCC. We find that usage-based pricing, as currently implemented by North American providers, is successful at removing a great deal of low-value traffic from the networks. This is largely due to the negative correlation between the value and volume of typical online activities (e.g., 5 GBs to stream a movie but only bytes to send 100s of emails). [1] We show that subscribers derive a great deal of utility from the first bytes of traffic generated on a broadband connection, but utility diminishes rapidly thereafter. This supports the FCC's goal, stated in its National Broadband Plan, of increasing the reach of basic broadband service to more rural and under-served areas. [2] Finally, our results are important for how content will be delivered in the future by telecom and cable companies. There is an appeal to unicasting, or allowing each user to view content at their convenience.
However, the low willingness to pay we find for this convenience, and its high costs (i.e., the large amount of additional traffic to accommodate on networks), suggest that for the foreseeable future (a short time in this industry) the cost effectiveness of broadcasting will continue to dominate arguments against usage-based billing

[1] This is consistent with Netflix's failed attempt to raise the prices of their service in 2012. [2] The FCC is currently working with telecom and cable providers to offer a basic $9.99 tier to low-income households. See Greenstein and Prince (2008) for more on the historical reach of broadband services.

of the internet.

The remainder of the paper is as follows. In Section 2 we discuss our data in greater detail and provide reduced-form results that motivate assumptions made in the structural model and demonstrate a high price elasticity for online content. In Section 3 we discuss the model used to capture the intertemporal decisions of subscribers regarding consumption of content under usage-based billing. Section 4 presents our methodology for estimating the structural model and presents the results. Sections 5 and 6 present the results of our counterfactual exercise, which identifies the benefit to subscribers of removing usage allowances, and final conclusions, respectively.

2 Data

The data used in this paper are Internet Protocol Data Record (IPDR) data. IPDR is a standardized framework for collecting usage and performance data from IP-based services and is currently the most popular way for cable operators to efficiently measure subscriber usage. The IPDR framework is supported by a DOCSIS (Data Over Cable Service Interface Specification) 2.0/3.0 compliant CMTS (Cable Modem Termination System). A CMTS uses a collector to gather data at a minimum configurable reporting period of 15 minutes. Our data report the volume of a subscriber's usage every hour, linking subscribers to their usage through their cable modem's (CM) MAC (Media Access Control) address. The usage reported for a subscriber does not reflect DOCSIS framing overhead; however, it does include operator-initiated management and control traffic and Internet-originated traffic (pings, port scans, etc.), which must be considered when metering internet usage. See Clarke (2009) for more details on the structure of DOCSIS cable networks.

The unit of observation for the IPDR data is a MAC address and a record creation time (month, day, and hour). The data also report the DOCSIS mode, 2.0 or 3.0, of each subscriber's modem. DOCSIS 3.0 modems permit greater provisioned speeds, as DOCSIS 2.0
modems limit a subscriber's connection to no more than 42.88 Mb/s. In the data, we observe bytes and packets passed by a subscriber, both in the upstream (e.g., uploading a file to Dropbox) and downstream (e.g., streaming a movie from Netflix) directions. For billing purposes, and consequently our purposes, the direction of the traffic is ignored and we examine the total traffic in either direction.

In addition to the number of bytes and packets passed by a subscriber, we also observe the number of packets that are delayed or dropped by the network for each subscriber. Delayed packets correspond to those that are requested by a subscriber in excess of their connection's provisioned speed, e.g., requesting packets at a rate of 10 Mb/s on a connection that is provisioned for 8 Mb/s. Typically, delayed packets are ultimately passed. Dropped packets are those that never reach their destination. Observing the extent of dropped packets is extremely important, because in a network that is inadequately provisioned (i.e., without enough bandwidth to handle requests for content) externalities among users can result in interdependent demands. Fortunately, our data come from a market and internet service provider (ISP) that operates an over-provisioned and pristine network. Over a 5-month period, not a single subscriber had more than 0.1% of packets dropped in any one-hour period. Our discussions with industry experts suggest that dropped packets in excess of 1% correspond to a degraded quality of service that would be noticeable to the subscriber. For this reason, in modeling a subscriber's usage decision, we reasonably assume that the decision does not depend on concerns over congestion in the network. Otherwise stated, we assume independence of subscribers' demand functions. IPDR data do identify the CMTS interface that a user is linked to; using this information, one can infer which users are connected on the network and allow for interdependent demands. See Malone (2012) for an empirical study of network externalities in broadband networks.

As mentioned above, a subscriber's usage is linked to the MAC address of their cable modem.
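The congestion screen just described, checking that no subscriber-hour exceeds the 0.1% dropped-packet threshold, is straightforward to express in code. A minimal sketch, where the record layout (packets passed, packets dropped per subscriber-hour) is a hypothetical simplification of the IPDR fields:

```python
def dropped_share(passed, dropped):
    """Share of a subscriber-hour's packets that were dropped by the network."""
    total = passed + dropped
    return dropped / total if total > 0 else 0.0

def network_is_pristine(records, threshold=0.001):
    """True if no subscriber-hour exceeds the 0.1% dropped-packet threshold."""
    return all(dropped_share(p, d) <= threshold for p, d in records)

# Hypothetical subscriber-hour records: (packets_passed, packets_dropped).
records = [(120_000, 30), (90_000, 0), (250_000, 100)]
print(network_is_pristine(records))  # every hour is below 0.1% dropped: True
```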
Through the MAC, we are able to link in information about the subscriber's service tier (e.g., a usage allowance of 50 GB and a provisioned downstream (upstream) speed of 8 Mb/s (1 Mb/s)) and the day on which the billing cycle resets (e.g., usage counters reset to zero on the 7th of each month). We have monthly reports that give the service tier for each subscriber. Not surprisingly, since our provider did not change any features of any tiers, we see very

few subscribers (less than 0.1%) switch tiers during the five months for which we have data. This lack of variation in the features of tiers would seem discouraging for identification purposes. However, as we discuss below, the high-frequency nature of the data allows us to see the intertemporal decisions made by every subscriber at each point in the billing cycle, compensating for the lack of variation in the features of the service tiers.

2.1 Sample and Descriptive Statistics

Our sample includes hourly usage for approximately 30,000 subscribers from October 1st of 2011 to February 14th of 2012 in a single metropolitan market. Due to the sheer volume of data and the fact that over 85% of residential usage is during peak hours (7pm-11pm), we aggregate usage to a daily level. See Figure 1 for average subscriber utilization in Kb/s over the day, across all subscribers and tiers. In addition, we remove subscriber-day observations that are not part of a complete billing cycle for a subscriber. This results in either 3 or 4 complete billing cycles for each subscriber, depending on when the subscriber's usage counter resets each month.

The internet service provider offers multiple tiers, which differ along a few dimensions of significance. Similar to broadband service offered by a typical North American provider, our provider differentiates service tiers by provisioned speed, ranging from 2 Mb/s (1 Mb/s) to 60 Mb/s (2 Mb/s) downstream (upstream). The fairly uncommon aspect of our provider's broadband service is that each tier is priced with a three-part tariff, similar to many cell phone plans, which includes an access fee, a usage allowance, and a per-GB overage fee. The access fee is a fixed fee paid each month, irrespective of usage, while the usage allowance permits the subscriber to use a certain amount of data before incurring overage fees for each GB of data in excess of the allowance.
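As a concrete illustration, under a three-part tariff the monthly bill is the access fee plus the overage price times any usage beyond the allowance. A minimal sketch with hypothetical plan parameters (the provider's actual fees are not reported here):

```python
def monthly_bill(usage_gb, access_fee, allowance_gb, overage_per_gb):
    """Three-part tariff: fixed access fee + per-GB charge on usage above the allowance."""
    overage_gb = max(0.0, usage_gb - allowance_gb)
    return access_fee + overage_per_gb * overage_gb

# Hypothetical plan: $45 access fee, 50 GB allowance, $1.00 per GB of overage.
print(monthly_bill(40.0, 45.0, 50.0, 1.0))  # below the allowance: bill is just the access fee, 45.0
print(monthly_bill(80.0, 45.0, 50.0, 1.0))  # 30 GB of overage: 45 + 30 = 75.0
```

Note the kink at the allowance: marginal price is zero below it and the overage price above it, which is what generates the shadow-price dynamics studied below.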
From the least to the most expensive tier (lowest to highest access fee), the usage allowance and provisioned speed are non-decreasing.

Figures 2a and 2b plot monthly usage quantiles for subscribers on the least and most expensive tiers, respectively, as a percentage of the usage allowance. Figure 2a (2b) shows that on the lowest (highest) tier approximately 30% (20%) of subscribers exceed their usage

allowance. The large number of subscribers well below the usage allowance demonstrates the importance of allowing for satiation in subscribers' preferences for online content. These figures also point to the large degree of heterogeneity in usage across subscribers, even within a tier, with the heaviest users on each tier in a month using 20 times more than the median user. We discuss how we control for selection into service tiers when estimating demand in Section 4.

Table 1 breaks down usage at a daily frequency, the unit of observation for the remainder of our analysis, aggregating across service tiers. Average usage in a month is 21.7 GBs, while median usage is only 8.5 GBs. This corresponds to an interquartile range of 56 GBs, with the 75th percentile (62 GBs) over 10 times the 25th percentile (6 GBs). On average, approximately 6% of users exceed their usage allowance. Of those who exceed their usage allowance, the average (median) overage is 26.9 GBs (14.2 GBs). For all subscribers, the median price paid per GB of content is $5.73, while the 25th percentile is $1.79 and the 75th is $19.73. As we discuss in Section 4.1, these average willingness-to-pay statistics will be important for inferring preferences for the subscribers who have a negligible probability of exceeding their usage allowance in a given month (and so face a shadow price near zero for consuming content at each point in the billing cycle). [3]

2.2 Preliminary Analysis

Before moving to the structural model, we provide evidence suggesting that subscribers are aware of their state (position relative to the usage allowance and time remaining in the month) and are forward looking. Similar to many cell phone operators, our provider gives notices via e-mail and text as a subscriber nears their allowance, allows the subscriber to log in and check their usage to date, and provides an application for web browsers that monitors usage in real time. Thus, the cost of verifying one's state should be small.
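The state in question can be computed directly from a subscriber's own daily usage history. A minimal sketch (the allowance, cycle length, and usage figures are illustrative, not the provider's):

```python
def billing_state(daily_usage_gb, allowance_gb, day, cycle_length=30):
    """Return (share of allowance used to date, days left) on a given day of the cycle."""
    used = sum(daily_usage_gb[:day])
    return used / allowance_gb, cycle_length - day

# A subscriber who used 2 GB/day for the first 10 days of a 30-day cycle, 50 GB allowance:
share, days_left = billing_state([2.0] * 10, 50.0, day=10)
print(share, days_left)  # 0.4 of the allowance used, 20 days left
```

These two quantities are exactly the covariates that enter the descriptive regression below.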
We begin by running the following regression:

c_{ikmt} = \alpha + \beta_1 \frac{C_{ikm(t-1)}}{\overline{C}_k} + \beta_2 \, daysleft_{mt} + dow_{mt} + time_t + \mu_{im} + \varepsilon_{ikmt},   (1)

[3] Past studies, e.g., Lambrecht et al. (2007), have relied entirely on such variation to identify demand.

where the dependent variable, c_{ikmt}, is subscriber i's usage on plan k, t days from the end of the billing cycle in month m. The ratio C_{ikm(t-1)}/\overline{C}_k is the proportion of the usage allowance used to date: the subscriber's total usage in the previous (t-1) days of the billing cycle, C_{ikm(t-1)} = \sum_{\tau=1}^{t-1} c_{ikm\tau}, divided by the usage allowance on plan k, \overline{C}_k. We also include daysleft_{mt}, the number of days left in the billing cycle; dummies for the days of the week, dow_{mt}; and a time trend, time_t. The inclusion of subscriber-billing-month fixed effects, \mu_{im}, removes persistent forms of heterogeneity across subscribers as well as any billing-cycle-specific shocks to usage (e.g., seasonality or trends in usage).

Intuitively, \beta_1 should be negative, while the sign of \beta_2 is ambiguous. As the probability that a subscriber will exceed the usage allowance increases, i.e., as the shadow price of current consumption increases, a subscriber with a high price elasticity of demand will tend to pull back (\beta_1 < 0) on usage. This reduction in usage may occur well in advance of reaching the usage allowance if the subscriber wants to ensure that overage charges will not be incurred. Similarly, for any level of previous usage, a subscriber further from the end of the billing cycle may want to reduce current usage (\beta_2 < 0) to ensure the usage allowance is not exceeded. However, early in the billing cycle, a great deal of uncertainty regarding future demand is yet to be realized, so the user may instead want to ensure that the entire allowance is used (\beta_2 > 0).

It is important to note that any form of positive serial correlation in usage will work against finding a negative relationship between consumption (c_{ikmt}) and previous consumption (C_{ikm(t-1)}/\overline{C}_k). Such correlation in usage may arise from the dynamics of the subscriber's intertemporal decision process itself. For example, a subscriber that enters an undesirable state (i.e.,
high cumulative usage early in a billing cycle) may respond by consistently using the service less throughout the remainder of the billing cycle. This points to the importance of modeling the entire process for consumption, not just the process near any nonlinearities in the pricing schedule. We discuss these issues further in Section 4.

The estimates of Equation 1, and variations of it, are reported in Tables 1a-1d. In each of the tables, Columns 1 and 2 report the estimates of Equation 1 where the dependent variable is in levels and log-transformed, respectively. Columns 1 and 2 both report a negative sign

for the proportion of the usage allowance used to date. The pull-back in consumption of -0.255 GB, or approximately 17% of daily consumption, is statistically and economically meaningful. These results are consistent with subscribers being aware of their states and adjusting consumption in response to the probability of exceeding the usage allowance and incurring overage charges, i.e., an increase in the shadow price of consumption. Only one of the estimates of the coefficient on the days remaining in the billing cycle is statistically significant. This may be due in part to the strong correlation between the proportion of the usage allowance used in a month and the number of days remaining in the month. The other controls show that the weekend days are the heaviest-usage days (Sunday is omitted) and that there is some evidence of a weak positive trend in daily usage.

The linear relationship between a subscriber's current usage and their position relative to the usage allowance assumed in Columns 1 and 2 is clearly ad hoc. In particular, one may expect a highly nonlinear relationship, as it is not clear when and how a subscriber will begin to respond to an increasing probability of exceeding their usage allowance. This would depend on a number of things, including any uncertainty in future demand for internet content. To better capture any such dynamics in usage, we specify a set of indicators for a subscriber's position relative to their usage allowance: between 50% and 75%, 75% and 90%, 90% and 95%, 95% and 100%, and over the allowance. Columns 3 and 4 of Table 1 report these estimates with the dependent variable in levels and log-transformed, respectively. Column 3 shows a monotonically decreasing consumption profile for subscribers nearing (and exceeding) their usage allowance. Subscribers increasingly reduce consumption as it becomes clear that the usage allowance will be binding and the shadow price of current consumption approaches the per-unit overage price.
The results in Column 4 are very similar: once a user exceeds 95% of their usage allowance, they have reduced consumption by approximately 27% and have fully internalized the overage price. Yet, again, well in advance of these levels, subscribers begin to account for the possibility of exceeding the usage allowance. In the next section, we formalize this intuition by modeling the intertemporal decision-making process facing subscribers.
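The bin-indicator specification in Columns 3 and 4 can be sketched in a few lines. The code below is a minimal illustration on synthetic data, not the paper's estimation: the bin edges follow the text, while the variable names, sample size, and effect sizes are invented for the example.

```python
import numpy as np

# Bin edges for the share of the allowance used to date (as in Columns 3-4):
# 50-75%, 75-90%, 90-95%, 95-100%, and over the allowance.
BIN_EDGES = [(0.50, 0.75), (0.75, 0.90), (0.90, 0.95), (0.95, 1.00)]

def bin_indicators(share_used):
    """Build indicator columns for position relative to the usage allowance."""
    share_used = np.asarray(share_used, dtype=float)
    cols = [((share_used >= lo) & (share_used < hi)).astype(float)
            for lo, hi in BIN_EDGES]
    cols.append((share_used >= 1.0).astype(float))   # over the allowance
    return np.column_stack(cols)

def ols(y, X):
    """Ordinary least squares with an intercept, via numpy's lstsq."""
    Xc = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return beta

# Synthetic daily-usage sample: consumption pulls back as the cap nears.
rng = np.random.default_rng(0)
share = rng.uniform(0.0, 1.2, size=5000)
X = bin_indicators(share)
true_effects = np.array([-0.05, -0.10, -0.15, -0.20, -0.25])  # GB per day
y = 1.5 + X @ true_effects + rng.normal(0.0, 0.05, size=5000)
beta = ols(y, X)   # beta[0] is the intercept; beta[1:] are the bin effects
```

On this synthetic sample the estimated bin effects decline monotonically, mirroring the pattern reported in Column 3.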

3 Model

3.1 Utility

We assume consumers derive utility from content and a numeraire good. To consume content, each consumer must choose a tier or plan, indexed by $k$. Each plan is characterized by the speed $s_k$ at which content is delivered over the internet, by the usage allowance $C_k$, by the fixed fee $F_k$, and by the per-unit price of usage in excess of the allowance, $p_k$. Specifically, $F_k$ pays for all consumption up to $C_k$, while all units above $C_k$ cost $p_k$ per unit. For any plan, the number of days in the billing cycle is $T$. Utility is additively separable over all days in the billing cycle.4 Let consumption of content on day $t$ of the billing cycle be $c_t$ and let consumption of the numeraire good on day $t$ be $y_t$. We specify a simple quasi-linear form in which the flow of utility is concave in $c_t$. Specifically, a consumer of type $h$ on plan $k$ has

$$u_h(c_t, y_t; k) = \nu_t \ln(1 + c_t) - c_t(\beta_{1h} - \beta_{2h}\ln(s_k)) + y_t,$$

where the time-varying unobservable, $\nu_t$, is not known to the subscriber until period $t$ and is independently and identically distributed on $[0, \bar{\nu}]$ according to a distribution $G_h$.5 Hence, the consumer's marginal utility varies randomly across days in ways that the consumer cannot predict. The specification includes a constant marginal cost of consuming online content, $\beta_{1h} - \beta_{2h}\ln(s_k)$, that is decreasing in the speed of the connection, $s_k$. This implies that the consumer has a satiation point, which captures key features of the data. All parameters differ across types of consumers. Letting income be $I$ and letting total consumption since the beginning of the billing cycle be $C_t \equiv \sum_{j=1}^{t} c_j$ and $Y_t \equiv \sum_{j=1}^{t} y_j$, respectively, define the monthly budget constraint as

$$F_k + p_k (C_T - C_k)\,\mathbf{1}[C_T > C_k] + Y_T \leq I, \qquad (2)$$

4 In this way, we assume that content with a similar marginal utility is generated each day or constantly refreshed. This may not be the case for a subscriber that has not previously had access to the internet.
5 The right-truncation of $G_h$ is necessary to ensure that the consumer can afford any efficient level of daily consumption. Let $\bar{\nu}_h$, $\bar{s}_k$ and $\bar{p}_k$ be the highest levels of these parameters. It suffices to assume that $T\,\bar{p}_k(\bar{\nu}_h + \ln(\bar{s}_k) + \bar{\nu}) < I$.
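The plan structure above is a three-part tariff. A minimal sketch of the per-day flow utility and the monthly bill it implies follows; the parameter values are illustrative, not the paper's estimates.

```python
import math

def flow_utility(c, y, nu, beta1, beta2, speed):
    """Per-day utility: nu*ln(1+c) - c*(beta1 - beta2*ln(s_k)) + y."""
    return nu * math.log(1.0 + c) - c * (beta1 - beta2 * math.log(speed)) + y

def monthly_bill(total_usage, fee, allowance, overage_price):
    """Three-part tariff: F_k covers usage up to C_k; p_k per GB above it."""
    return fee + overage_price * max(total_usage - allowance, 0.0)

# A subscriber under the cap pays only the fixed fee; 10 GB over adds 10*p_k.
bill_under = monthly_bill(40.0, fee=50.0, allowance=60.0, overage_price=2.5)
bill_over = monthly_bill(70.0, fee=50.0, allowance=60.0, overage_price=2.5)
```

Note how a faster connection (larger `speed`) lowers the effective marginal cost of content, raising utility for any positive consumption level.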

where $\mathbf{1}[\cdot]$ is the indicator function. Denote the discount factor $\delta \in (0, 1)$. Conditional on choosing plan $k$, the consumer's problem is to choose daily consumption to maximize

$$U = \sum_{t=1}^{T} \delta^{t-1} E\left[u_h(c_t, y_t; k)\right],$$

subject to (2). Throughout this paper, we will assume that all consumers have sufficient income to pay for satiation levels of content.

3.2 Optimal Consumption

The subscriber's problem is a finite-horizon dynamic-programming problem. Consider the terminal period ($T$) of a billing cycle and denote the remaining allowance $C_{kT} \equiv \max\{C_k - C_{T-1}, 0\}$. The efficiency condition for optimal consumption depends on whether it is optimal to exceed $C_{kT}$. Intuitively, if the consumer is well below the cap (i.e., $C_{kT}$ is high) and does not have a particularly high draw of $\nu_T$, then she consumes content up to the point where the marginal utility of content is zero. If marginal utility at $c_T = C_{kT}$ is positive but below $p_k$, then it is optimal to consume the remaining allowance. If one is already above the cap (i.e., $C_{kT} = 0$) or draws an extremely high $\nu_T$, then it is optimal to consume up to the point where the marginal utility of content equals the overage price. In each of these situations in the last period, there are no intertemporal tradeoffs. Usage today has no impact on next period's state, as cumulative consumption resets to zero at the beginning of each billing cycle. Thus, the problem is reduced to solving a static utility-maximization problem, given a subscriber's cumulative usage up until period $T$, $C_{T-1}$, and the realization of the preference shock, $\nu_T$, which together determine the implications for overage charges and the marginal utility of usage, respectively. Denote this optimal level of consumption, for a given realization of $\nu_T$, by $c^*_{hkT}$. For a given realization of the preference shock in period $T$, the subscriber's utility from

entering the final period with state $C_{T-1}$ and behaving optimally in the terminal period is

$$V_{hkT}(C_{T-1}, \nu_T) = \nu_T \ln(1 + c^*_{hkT}) - c^*_{hkT}(\beta_{1h} - \beta_{2h}\ln(s_k)) + y_T - p_k\left\{c^*_{hkT}\,\mathbf{1}[C_{T-1} > C_k] + (C_{T-1} + c^*_{hkT} - C_k)\,\mathbf{1}[C_{T-1} < C_k < C_{T-1} + c^*_{hkT}]\right\}.$$

Prior to the realization of $\nu_T$, the subscriber's expected utility is then

$$E[V_{hkT}(C_{T-1})] = \int V_{hkT}(C_{T-1}, \nu_T)\,dG_h(\nu_T).$$

The expected value function, $E[V_{hkT}(C_{T-1})]$, is defined for all $C_{T-1} > 0$. Similarly, the expected usage of a subscriber prior to the realization of $\nu_T$ is given by

$$E[c^*_{hkT}(C_{T-1})] = \int c^*_{hkT}(C_{T-1}, \nu_T)\,dG_h(\nu_T).$$

Other conditional moments of optimal consumption can be calculated similarly for each state, $(C_{T-1}, \nu_T)$. Similarly, for any day in the billing period besides the last day, $t < T$, the optimal policy function for a subscriber of type $h$ on plan $k$ is

$$c^*_{hkt}(C_{t-1}, \nu_t) = \arg\max_{c_t}\left\{\nu_t \ln(1 + c_t) - c_t(\beta_{1h} - \beta_{2h}\ln(s_k)) + y_t - p_k\left\{c_t\,\mathbf{1}[C_{t-1} > C_k] + (C_t - C_k)\,\mathbf{1}[C_{t-1} < C_k < C_t]\right\} + \delta E\left[V_{hk(t+1)}(C_{t-1} + c_t)\right]\right\},$$

and the value functions are given by

$$V_{hkt}(C_{t-1}, \nu_t) = \nu_t \ln(1 + c^*_{hkt}) - c^*_{hkt}(\beta_{1h} - \beta_{2h}\ln(s_k)) + y_t - p_k\left\{c^*_{hkt}\,\mathbf{1}[C_{t-1} > C_k] + (C_{t-1} + c^*_{hkt} - C_k)\,\mathbf{1}[C_{t-1} < C_k < C_{t-1} + c^*_{hkt}]\right\} + \delta E\left[V_{hk(t+1)}(C_{t-1} + c^*_{hkt})\right]$$

for each ordered pair $(C_{t-1}, \nu_t)$.6 Similar to the terminal period, the expected value function is

$$E[V_{hkt}(C_{t-1})] = \int V_{hkt}(C_{t-1}, \nu_t)\,dG_h(\nu_t)$$

6 Notice this formulation of the optimization problem assumes that the subscriber is aware of their cumulative consumption, $C_{t-1}$, on each day in the billing cycle. This is a realistic assumption for our data, as the results in Table 2 demonstrate.

for all $t < T = 30$, and the mean of a subscriber's usage at each state is

$$E[c^*_{hkt}(C_{t-1})] = \int c^*_{hkt}(C_{t-1}, \nu_t)\,dG_h(\nu_t). \qquad (3)$$

The policy functions for each type ($h$) of subscriber imply a distribution for the time spent in particular states $(t, C_{t-1})$ over a billing cycle. We discuss solving for this distribution, generated by optimal subscriber behavior, and how it, along with the moments of usage, forms the basis of our method of moments approach discussed in Section 4.

3.3 Model Solution and Stationary Distribution

Let $G_h$ denote a normal distribution, truncated to $[0, \bar{\nu}]$, with mean $\mu_h$ and variance $\sigma^2_h$. For a plan, $k$, and subscriber type, $h$, characterized by the vector $(\beta_{1h}, \beta_{2h}, \mu_h, \sigma_h)$, the finite-horizon dynamic program described above can be solved recursively, starting at the end of each billing cycle ($t = T$). To do so, we discretize the state space for $C_t$ to a grid of 1,800 points with spacing of size $c_{s_k}$ GB, for each plan, $k$. Our data is hourly, so time is naturally discrete, but we aggregate time up to the day ($t = 1, 2, \ldots, 30$ over a billing cycle with $T = 30$ days).7 This discretization leaves $\nu_t$ as the only continuous state variable. Because the subscriber does not know $\nu_t$ prior to period $t$, we can integrate it out, and the solution to the dynamic programming problem for a subscriber of each type $h$ can be characterized by the expected value functions, $E[V_{hkt}(C_{t-1})]$, and policy functions, $c^*_{hkt}(C_{t-1}, \nu_t)$. To perform the numerical integration over the bounded support $[0, \bar{\nu}]$ of $\nu_t$, we use adaptive Simpson quadrature. Having solved the program for a subscriber of type $h$, one can then generate the transition process for the state vector implied by the solution to the dynamic program. The transition probabilities between the 54,000 possible states (1,800 x 30) are implicitly defined by threshold values for $\nu_t$. For example, consider a subscriber of type $h$ on plan $k$ that has consumed $C_{t-1}$ prior to period $t$.
The value of $\nu_t$ that makes a subscriber indifferent between setting $c_t = z c_{s_k}$ rather than $c_t = (z + 1) c_{s_k}$ (advancing cumulative consumption by $z$ or $z + 1$ steps

7 This aggregation loses very little information, as over 80% of usage is on peak (between 6pm and 11pm).

of size $c_{s_k}$) equates the marginal utility (net of any overage charges) of an additional unit of consumption to the loss in the net present value of future utility, $\delta\left(E[V_{hk(t+1)}(C_{t-1} + z c_{s_k})] - E[V_{hk(t+1)}(C_{t-1} + (z + 1) c_{s_k})]\right)$. These thresholds, along with all subscribers' initial condition ($C_0 = 0$), define the transition process between states. Subscribers will consume no less if speed ($s_k$) is higher (lower opportunity cost of time), the overage price is lower, and the gradient of the expected value function is not too steep in cumulative consumption. For each subscriber type, $h$, and plan, $k$, we characterize this transition process by the cdf of the stationary distribution that it generates, $\Phi_{hkt}(C) = P(C_{t-1} < C)$, the proportion of subscribers that have consumed less than $C$ through period $t$ of the billing cycle.8 These probabilities, for different values of $C$, are directly observable in our data and form the basis for our method of moments approach discussed in Section 4.

3.4 Optimal Plan Choice

After solving the dynamic program for a subscriber type, $h$, under every plan, $k$, selection into plans by subscribers can be dealt with naturally. A subscriber selects a plan with knowledge of their type, $(\beta_{1h}, \beta_{2h}, \mu_h, \sigma_h)$, and the features of the plan, but not the realization of their particular needs (realizations of $\nu_t$ for $t = 1, \ldots, T$) over the course of a billing cycle. In this case, the subscriber will select the plan, $k = 1, \ldots, K$, with the highest expected utility, or choose no plan at all, $k = 0$. To identify the optimal plan for each type, one can simply find the plan that gives the highest expected utility at the beginning of a billing cycle, $E[V_{hk1}(0)]$, and then ensure that this is greater than zero (the outside option's value, $E[V_{h01}(0)]$, is normalized to 0). The optimal plan for a type-$h$ subscriber is then

$$k^*_h = \arg\max_{k \in \{0, 1, \ldots, K\}} \left\{E[V_{hk1}(0)] - F_k\right\},$$

where the fixed fee for the outside option is $F_0 = 0$.

8 The discretized state space makes this cdf a step function.
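The recursion in Sections 3.2-3.3 can be sketched compactly for a single subscriber type. The sketch below substitutes a small discrete shock grid for the paper's adaptive Simpson quadrature over a truncated normal, and all primitives (cap, overage price, grids, discount factor) are illustrative assumptions rather than the paper's values; the numeraire and fixed fee are additive constants and are dropped from the recursion.

```python
import numpy as np

# Illustrative primitives for one subscriber type (not the paper's estimates).
T, P_K, C_K, DELTA = 30, 2.5, 60.0, 0.995   # days, overage price, cap (GB), discount
A = 0.8                                     # beta_1h - beta_2h * ln(s_k)
C_GRID = np.linspace(0.0, 120.0, 241)       # cumulative-usage grid, 0.5 GB steps
CHOICES = np.linspace(0.0, 8.0, 33)         # daily-usage choices, 0.25 GB steps
NU = np.array([0.5, 1.0, 1.5, 2.0])         # stand-in quadrature nodes for G_h
W = np.full(4, 0.25)                        # stand-in quadrature weights

def solve_type():
    """Backward induction over the billing cycle; EV[t, i] is the expected
    value of entering day t (0-indexed) with cumulative usage C_GRID[i]."""
    ev = np.zeros((T + 1, C_GRID.size))     # value after the last day is zero
    for t in range(T - 1, -1, -1):
        for i, cum in enumerate(C_GRID):
            # overage charge on today's increment only
            over = P_K * (np.maximum(cum + CHOICES - C_K, 0.0)
                          - max(cum - C_K, 0.0))
            cont = DELTA * np.interp(cum + CHOICES, C_GRID, ev[t + 1])
            val = 0.0
            for nu, w in zip(NU, W):        # integrate the shock out
                flow = nu * np.log1p(CHOICES) - A * CHOICES
                val += w * np.max(flow - over + cont)
            ev[t, i] = val
    return ev

EV = solve_type()
```

`np.interp` clamps states beyond the grid to the last grid point, mirroring a bounded state space. As in the text, the value of entering the cycle with allowance untouched exceeds the value of entering already over the cap.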

4 Estimation

We use a method of moments approach to recover the primitives of the model, the joint distribution of the parameter vector $(\beta_{1h}, \beta_{2h}, \mu_h, \sigma_h)$. Our model predicts moments of optimal behavior at each state, $(C_{t-1}, t)$, along with the time spent in different states, for each subscriber type. We seek to find the distribution of subscriber types that matches the distribution of $C_{t-1}$ in the population of subscribers at each point in the billing cycle, $t$. This approach has the advantage of exploiting the high-frequency nature of our data, as it allows us to use variation in the intertemporal decisions made by subscribers at different states, rather than the end product of these decisions (e.g., monthly internet usage). Our approach to estimation is most similar to the two-step algorithms advocated by Ackerberg (2009), Bajari et al. (2007), and Fox et al. (2011). The first step is to recover the moments to be matched from the data and to solve the dynamic program for a wide variety of subscriber types, $(\beta_{1h}, \beta_{2h}, \mu_h, \sigma_h)$.9 We recover both the cdf of cumulative consumption, $C_{t-1}$, for each plan, $k$, at each point in the billing cycle and the unconditional mean and variance of usage at each state, $(C_{t-1}, t)$. In the second step, we follow Fox et al. (2011) by searching for the weights, or density, of each type that best match the moments recovered from the data. The moments we match were chosen for their identifying power and computational ease. In particular, these moments are linear in the type-specific weights, which reduces the matching process to a linear regression subject to a linear constraint and non-negativity restrictions. In addition to the computational advantages, this approach has the advantage of not placing parametric restrictions on the shape of the subscriber type distribution and naturally deals with selection (i.e., we identify each type's optimal plan, $k^*_h$, in the first step).

9 Fox et al.
(2011) correctly point out that identifying the correct support for the parameter vector, $(\beta_{1h}, \beta_{2h}, \mu_h, \sigma_h)$, may in fact be viewed as an additional step of the estimation process. Yet their motivating example, a random-coefficients demand model with aggregated data (i.e., market shares for each product), is much different from our application. In particular, the authors assume that one observes only aggregate data, making it impossible to know exactly what range of types is consistent with the data. In our application, however, we know the complete distribution of usage, and this dramatically simplifies identifying the support of the type distribution that is consistent with even the most infrequent occurrences in the data.

4.1 Identification

To realize the full computational advantages of the Fox et al. (2012) approach, we consider those moments with the most identifying power and then decompose the moments into parts that are linear in the parameters. The advantage of our data is that we observe the distribution of actions for subscribers at each state, $(C_{t-1}, t)$, along with the distribution of subscribers across states. Thus, we observe how consumers respond to marginal (shadow) prices ranging from zero up to the overage price. This allows us to consider any moments of the conditional distribution of consumption at those states at which a subscriber is present. We focus on the conditional mean and variance of usage at each state. These moments are determined by the probability that different types reach a particular state and by the actions taken at that state. Thus, it is important that our econometric approach correctly identify both components of these conditional moments. To see this, consider a model with two types, low-usage ($L$) and high-usage ($H$) subscribers. Consider those states that can only be reached by the low types (i.e., low cumulative consumption well into a billing period). At these states, subscribers are essentially solving a static utility-maximization problem with a marginal price of zero, as there is a negligible probability they will exceed the usage allowance (i.e., the shadow price of consumption is nearly zero). Knowing that only low-usage subscribers are present in these states and observing a subscriber solve this problem each day, equating marginal utility to zero, identifies the parameters of the utility function for these $L$ types. Similarly, high-demand subscribers are likely to exceed the usage allowance, equating marginal utility to the overage price, from the beginning of the billing cycle. Thus, observing variation in usage at these states identifies the utility function for high-demand subscribers.
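The static first-order condition behind this argument is easy to state: with flow utility $\nu \ln(1+c) - ac$ and a shadow price $\lambda$ on usage, the subscriber sets $\nu/(1+c) = a + \lambda$, so $c^* = \nu/(a+\lambda) - 1$, floored at zero. A minimal sketch, with illustrative parameter values:

```python
def optimal_usage(nu, a, shadow_price=0.0):
    """Static FOC: nu/(1+c) = a + shadow_price  =>  c = nu/(a + shadow) - 1,
    floored at zero (corner solution when marginal utility at c=0 is too low)."""
    return max(nu / (a + shadow_price) - 1.0, 0.0)

# An L type far below the cap faces a shadow price of zero...
c_low = optimal_usage(nu=1.2, a=0.8)
# ...while a type already over the cap pays the overage price at the margin,
# which here pushes consumption to the corner.
c_high = optimal_usage(nu=1.2, a=0.8, shadow_price=2.5)
```

Observing usage at shadow-price-zero states pins down $a$ for the $L$ types; observing usage at over-the-cap states, where the margin is $a + p_k$, does the same for the $H$ types.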
One might then argue that the weights for each type (the relative mass of $H$ and $L$ types in the population) are identified by the mixture of actions taken at intermediate states that can be reached by both types. However, this is a very weak source of identification in our data due to the large degree of heterogeneity among users. Specifically, consumers sort themselves across the state space so quickly at the beginning of the month that the only real identifying

variation for the weights comes on the very first day of the billing cycle, when all types are at the same state. After that point, the types are essentially spread over disjoint portions of the state space. Thus, the conditional moments by themselves may identify the types, $h$, of subscribers that are present in the data but provide very little information on their relative weights. Along with the lack of identifying power for the weights, the conditional moments have the additional problem of being nonlinear in the weights. For any reasonable number of types (e.g., 50 or more), this results in an infeasible constrained nonlinear optimization problem. For example, the conditional mean of consumption at each state is a mixture of type-specific policy functions,

$$E[c^*_{kt}(C_{t-1})] = \sum_{h=1}^{H} E[c^*_{hkt}(C_{t-1})]\,\omega_{ht}(C_{t-1}),$$

where

$$\omega_{ht}(C_{t-1}) = \frac{\phi_{ht}(C_{t-1})\,\theta_h}{\sum_{h=1}^{H} \phi_{ht}(C_{t-1})\,\theta_h}.$$

Thus, $\omega_{ht}(C_{t-1})$ is a nonlinear function of both the probability a type reaches a particular state and the relative mass of the type in the population. The conditional variance of usage is defined similarly. To remedy both the computational and identification difficulties with these moments, we decompose the conditional moments into two parts, the numerator and the denominator. The numerator,

$$\sum_{h=1}^{H} E[c^*_{hkt}(C_{t-1})]\,\phi_{ht}(C_{t-1})\,\theta_h,$$

is just the unconditional mean of usage at each state, while the denominator,

$$\sum_{h=1}^{H} \phi_{ht}(C_{t-1})\,\theta_h,$$

is the mass of subscribers at a particular level of cumulative consumption, $C_{t-1}$, on day $t$ of the billing cycle. Both of these moments are linear in the weights, $\theta_h$, and together solve the identification problem. In particular, by matching both sets of moments, we match the

conditional usage at each state (useful for identifying the utility of each type) while also pinning down the relative weights of each type by matching the distribution of subscribers across the state space. The details of the matching procedure are discussed in Section 4.3.

4.2 Recovering Empirical Moments

The large number of observations and high frequency of our data, along with the low dimensionality of our state space, $(C_{t-1}, t)$, allow us to adopt a flexible nonparametric approach for recovering moments from the data to match to our model. We recover both the cdf of cumulative consumption for each day in the billing cycle, $t$, and the conditional mean and variance of usage at each state. The unconditional mean and variance are then the product of the pdf of cumulative consumption and the conditional moments.

4.2.1 CDF of Cumulative Consumption

To recover the cumulative distribution of $C_{t-1}$ at each point in the billing cycle, $t$, for each plan, $k$, we use a smoothed version of a simple Kaplan-Meier estimator,

$$\hat{\Phi}_{kt}(C) = \frac{1}{N_k} \sum_{i=1}^{N_k} \mathbf{1}\left[C_{i(t-1)} < C\right].$$

We estimate these moments for each $k$ and $t$, considering values of $C$ such that $\hat{\Phi}_{kt}(C) \in [0.01, 0.99]$, ensuring that we fit the tails of the usage distribution. This results in approximately 3,000 moments to match for each plan.10 Let $\widehat{cdf}_k$ denote the vector of moments for plan $k$. To compute point-wise standard errors for our estimates of these distributions, we draw on the literature on resampling methods with dependent data; see Lahiri (2003). The dependence in our data comes from the panel nature of the data, as we observe individuals making daily decisions on consumption over 3 or 4 full billing cycles. The straightforward structure of our panel significantly simplifies the resampling procedure. We repeatedly estimate the cumulative distribution functions, leaving out different groups of subscribers. We

10 We use a normal kernel and adaptive bandwidth to smooth the empirical cdf.

choose 1,000 randomly sampled groups of 5,000 subscribers and re-estimate each distribution, omitting a different group of subscribers each time. These estimates are then used to calculate a variance-covariance matrix, $\hat{V}^{cdf}_k$, for the moments for each plan, $k$. This weighting matrix is used to account for the different scales of our moments and to inversely weight more variable moments. Figures 2a, 2b, and 2c present the recovered cdf of cumulative consumption for each day of the billing cycle, for the least expensive, most popular, and most expensive plans, respectively. The least and most expensive plans are the two least popular plans offered by our provider. Yet there is still a more than adequate number of observations to get an accurate characterization of the time spent in different states by subscribers on these plans. On both the least and most expensive plans, a significant proportion of subscribers exceed their usage allowance, 20% and 30%, respectively. While the proportion of subscribers exceeding the allowance on the most popular plan is small, the absolute number of users is actually larger than the total number of users who exceed the allowance on all other plans combined.

4.2.2 Unconditional Mean and Variance of Consumption

The large number of observations in and richness of our data, along with the low dimensionality of our state space, $(C_{t-1}, t)$, allow us to adopt a very flexible estimation approach to recover the moments of usage at each state. Our problem essentially reduces to estimating a surface defined over the $(C_{t-1}, t)$ plane. To flexibly estimate the conditional moments, we adopt a nearest-neighbor approach. Consider a point in the state space, $(\tilde{C}_{t-1}, \tilde{t})$. A neighbor is an observation in the data for which $t = \tilde{t}$ and $C_{t-1}$ is within some distance of $\tilde{C}_{t-1}$ (e.g., 0.5 GB).
Denote the fixed number of nearest neighbors, those with the smallest distance from $\tilde{C}_{t-1}$, used to estimate the moments at $(\tilde{C}_{t-1}, \tilde{t})$ under plan $k$, by $N_k(\tilde{C}_{t-1}, \tilde{t})$. The estimate of the conditional mean at $(\tilde{C}_{t-1}, \tilde{t})$ is

$$\hat{E}\left[c_{kt}(\tilde{C}_{t-1})\right] = \frac{1}{N_{kt}(\tilde{C}_{t-1})} \sum_{i=1}^{N_{kt}(\tilde{C}_{t-1})} c_i,$$

where $i = 1, \ldots, N_k(\tilde{C}_{t-1}, \tilde{t})$ indexes the set of nearest neighbors. Similarly, our estimator of the conditional variance is

$$\hat{V}\left[c_{kt}(\tilde{C}_{t-1})\right] = \frac{1}{N_{kt}(\tilde{C}_{t-1}) - 1} \sum_{i=1}^{N_{kt}(\tilde{C}_{t-1})} \left(c_i - \hat{E}\left[c_{kt}(\tilde{C}_{t-1})\right]\right)^2.$$

If $N_k(\tilde{C}_{t-1}, \tilde{t}) < 10$, we do not estimate the conditional mean. If there are at least 10 but fewer than 50 neighbors, we use all neighbors to estimate the conditional mean. If there are more than 50 neighbors, we use the 50 neighbors nearest to $\tilde{C}_{t-1}$. The unconditional mean is then recovered as the product of the probability of observing a subscriber at state $(\tilde{C}_{t-1}, \tilde{t})$, estimated from the cdf of cumulative consumption we recover, and the conditional mean. Let the vectors of estimates of the unconditional means and variances for plan $k$ be denoted by $\widehat{avg}_k$ and $\widehat{var}_k$, respectively. The nearest-neighbor approach has a number of advantages over other estimators for our application. First, as with any nonparametric estimator, it imposes no parametric restrictions on the surface. Second, nearest-neighbor estimators are inherently bandwidth-adaptive; see Pagan and Ullah (1999). This is particularly useful in our application. The number of users reaching very high volumes of cumulative consumption can be small for some plans. In these low-density situations, nearest-neighbor estimators will expand the bandwidth appropriately until a given number of observations is included in the estimator of the surface. We do restrict the degree to which the estimator can expand the bandwidth in these low-density situations in order to limit any potential bias such expansion might introduce. Our results are very robust to varying both the minimum number of neighbors ($N_{kt}(\tilde{C}_{t-1}) > 10$) required for a conditional moment to be estimated and the cutoff that determines how much the bandwidth can adapt in low-density situations to identify the $N_{kt}(\tilde{C}_{t-1})$ neighbors.
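A minimal version of this adaptive nearest-neighbor estimator is sketched below. The 10- and 50-neighbor thresholds follow our reading of the text; the function and variable names, and the bandwidth cap, are our own.

```python
import numpy as np

MIN_NEIGHBORS, MAX_NEIGHBORS = 10, 50   # thresholds as described in the text

def conditional_moments(cum_usage, usage, target, max_dist=0.5):
    """Nearest-neighbor mean/variance of daily usage at cumulative state `target`.

    `cum_usage` and `usage` hold same-day observations (C_{t-1}, c_t) pooled
    across subscribers for one (plan, day) cell; `max_dist` caps how far the
    adaptive bandwidth may expand (in GB). Returns None if too few neighbors.
    """
    cum_usage, usage = np.asarray(cum_usage), np.asarray(usage)
    dist = np.abs(cum_usage - target)
    order = np.argsort(dist)                 # neighbors sorted by distance
    within = order[dist[order] <= max_dist]  # keep those inside the bandwidth cap
    if len(within) < MIN_NEIGHBORS:
        return None                          # too few neighbors: do not estimate
    nbrs = usage[within[:MAX_NEIGHBORS]]     # use at most the nearest 50
    return nbrs.mean(), nbrs.var(ddof=1)
```

With a tight `max_dist` the estimator declines to report a moment in sparse regions, mirroring the minimum-neighbor rule in the text; loosening `max_dist` lets the bandwidth adapt until enough neighbors are found.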
We estimate each surface at the same discrete set of state-space points used when numerically solving the dynamic programming problem for each subscriber type. We again use a block-resampling procedure to compute variance-covariance matrices for our estimates of the conditional mean and variance, $\hat{V}^{avg}_k$ and $\hat{V}^{var}_k$, respectively. We use these matrices to inversely weight more variable moments.
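The subscriber-level leave-out resampling can be sketched as follows. Each subscriber's full daily history is kept together as the resampling block, so within-subscriber dependence is preserved; the group size, replication count, and function names here are illustrative assumptions, not the paper's.

```python
import numpy as np

def subscriber_resample_cov(panel_by_subscriber, moment_fn, n_reps=200,
                            leave_out=50, seed=0):
    """Leave-out-a-group resampling over subscribers.

    `panel_by_subscriber` is a list with one entry (all of a subscriber's
    daily observations) per subscriber; `moment_fn` maps such a list to a
    vector of moments. Each replication drops a random group of `leave_out`
    subscribers, re-estimates the moments, and the replications are used to
    form a variance-covariance matrix.
    """
    rng = np.random.default_rng(seed)
    n = len(panel_by_subscriber)
    reps = []
    for _ in range(n_reps):
        drop = set(rng.choice(n, size=leave_out, replace=False))
        kept = [obs for i, obs in enumerate(panel_by_subscriber) if i not in drop]
        reps.append(moment_fn(kept))
    return np.cov(np.asarray(reps), rowvar=False)
```

Because whole subscribers are dropped, serial dependence within a subscriber's billing cycles never crosses a resampling boundary, which is the point of blocking on the panel structure.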

While we will match the unconditional mean and variance at each state, it is useful and intuitive to present the conditional means, which demonstrate a few properties of our data more clearly than the analogous unconditional moments. These results are summarized in Figures 3a-3c and 4a-4c for the mean and variance, respectively, for the least expensive, most popular, and most expensive plans. For each plan, the surfaces characterizing the conditional means have the same pattern. Very early in the billing period the different types of subscribers reveal themselves. The high types sort themselves into high cumulative-consumption states and continue to consume at a very high level. Interestingly, we see that consumption is relatively smooth across the billing cycle, which suggests that (high-volume) subscribers are quite adept at smoothing consumption. That is, we do not see much of a drop in average consumption for the highest-volume subscribers as they near the overage, reinforcing our decision to model subscribers as forward-looking and rational economic agents. The low-volume types tend to migrate to low cumulative-consumption states as the billing cycle progresses and continue to consume at low levels. In addition, there is a wide variety of intermediate types that consume at a fairly constant level throughout the billing cycle. The estimates of the standard deviation of usage at each state follow a similar pattern to the means. The sorting is again evident, and higher-mean types tend to have much more variable usage, while the standard deviation of usage tends to be proportional to the mean.

4.3 Matching Moments

4.3.1 Objective Function

The second step of our estimation approach follows the method of moments approach of Bajari et al. (2007) and Fox et al. (2011). Our objective is to match, as closely as possible, the empirical moments we recover from the data to those predicted by our model.
The parameters we minimize over are the relative masses of the different types, $\theta_h$, in the population of subscribers that choose a plan, $k$.
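Because both sets of moments are linear in the weights $\theta_h$, the second step is a least-squares problem with the weights constrained to be non-negative and sum to one. The sketch below solves a problem of this shape by projected gradient descent with a standard Euclidean simplex projection; it is a stand-in for, not a reproduction of, the paper's Fox et al.-style constrained linear regression.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def match_moments(model_moments, data_moments, iters=5000, lr=None):
    """Min ||M w - m||^2 s.t. w on the simplex, by projected gradient descent.

    `model_moments` M is (n_moments x n_types): column h stacks the model's
    predicted moments for type h; `data_moments` m stacks the empirical ones.
    """
    M, m = np.asarray(model_moments), np.asarray(data_moments)
    if lr is None:
        lr = 1.0 / np.linalg.norm(M, 2) ** 2   # step at 1/Lipschitz constant
    w = np.full(M.shape[1], 1.0 / M.shape[1])  # start from uniform weights
    for _ in range(iters):
        w = project_simplex(w - lr * (M.T @ (M @ w - m)))
    return w
```

On synthetic moments generated from known weights, the solver recovers them, including types that receive zero weight, which is how the estimated type distribution stays sparse without any parametric restriction.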