Market Design & Analysis for a P2P Backup System

Sven Seuken
School of Engineering & Applied Sciences
Harvard University, Cambridge, MA
seuken@eecs.harvard.edu

Denis Charles, Max Chickering, Sidd Puri
Microsoft
One Microsoft Way, Redmond, WA
{cdx,dmax,siddpuri}@microsoft.com

Part of this work was done while the author was a research intern at Microsoft.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. EC'10, June 7-11, 2010, Cambridge, Massachusetts, USA. Copyright 2010 ACM /10/06...$

ABSTRACT
In this paper we take the problem of a market-based P2P backup application and carry it through market design, to implementation, to theoretical and experimental analysis. While the long-term goal is an open market using real money, here we consider a system where monetary transfers are prohibited. We first describe the design of the P2P resource exchange market and the UI we developed. Second, we prove theorems on equilibrium existence and uniqueness. Third, we prove a surprising impossibility result regarding the limited controllability of the equilibrium and show how to address this. Fourth, we present a price update algorithm that uses daily supply and demand information to move prices towards the equilibrium and we provide a theoretical and experimental convergence analysis. The market design described in this paper is already implemented as part of a Microsoft research project on P2P backup systems and an alpha version of the software has been successfully tested.

Categories and Subject Descriptors
J.4 [Computer Applications]: Social and Behavioral Sciences - Economics

General Terms
Algorithms, Design, Economics

Keywords
Market Design, P2P, Online Backup, Exchange Market

1. INTRODUCTION: P2P BACKUP
With the increasing importance of information technology in our lives, we are also more and more dependent on being able to readily access all of our data all the time. However, users regularly lose valuable data because their hard drives crash, their laptops are stolen, etc. Already in 2003, the annual costs of data loss in the US were estimated to be $18.2 billion. With broadband connections becoming faster and cheaper, online backup systems are becoming increasingly attractive alternatives to traditional backup. There are hundreds of companies offering online backup services, e.g., SkyDrive, Idrive, Amazon S3. Most of these companies offer some storage for free and charge fees when the free quota is exceeded. All of these services, however, rely on large data centers and thus incur immense costs.

The motivation for peer-to-peer (P2P) backup systems is that idle resources on the computers of millions of users can be used to avoid these costs. While the total network traffic increases with a P2P solution, the primary cost factors that can be avoided are 1) costs for hard drives, 2) energy costs for building, running and cooling data centers [fn. 2], 3) costs for large peak bandwidth usage, and 4) personnel costs for computer maintenance. In a P2P backup system, these costs almost disappear because we can use lots of otherwise idle resources.

The main idea of P2P backup is that users provide some of their resources (storage space, upload bandwidth, download bandwidth, and online time) in exchange for using the backup service. A study performed by Microsoft in 2008 showed that about 40% of Windows users have more than half of their hard disk free and thus could be excellent candidates for using a P2P backup system. The rapidly decreasing costs for large hard drives might make P2P backup even more attractive in the future.
[fn. 2] In 2008, data centers in the US were responsible for about 3% of the country's energy consumption. Note that a P2P backup system can not only reduce costs but is also more environmentally friendly due to reduced carbon emissions.

In our own recent study [13], we showed that many users are not willing to pay the high fees for server-based backup and about half our participants said they would consider using P2P backup instead. Thus, there is definitely potential for P2P backup applications.

1.1 The Hidden Market Approach
Our P2P backup system is novel in that it uses a market to allocate resources more efficiently than a non-market-based system could. We have implemented and successfully tested the system in alpha version. During the design phase for this system we followed a new paradigm that we recently introduced called Hidden Market Design [14]. The goal of a hidden market is to hide some of the complexities of the market from the user and to make the interaction for the user as seamless as possible. A P2P backup application is a particularly good example for hidden market design because the application targets millions of technically unsophisticated users, in a domain where markets/trading/currency etc. are very unnatural. As a result of this hidden markets approach, our resulting design is very unusual in that we provide our users with an indirect way to express their preferences.

Each resource in the market has a price, and the relative prices reflect the scarcity of the resources. The market design addresses the combinatorial nature of resources, namely that all users must provide a certain amount of all resources, even if they currently only consume a subset of them. For example, a user who only contributes space is useless to the system because no files could ever be sent or received from that user if no bandwidth is provided. The design must also allow users to express idiosyncratic preferences regarding how much of each resource they want to supply. Some users might need their own disk space a lot and prefer to sacrifice their internet connection. Other users might use their bandwidth for services like VOIP or file-sharing and might have a high disutility if the quality of those services were affected. We allow different users to provide different ratios of their resources, and we update prices regularly taking into account aggregate supply and demand.

1.2 Overview of Results
In this paper we present our market design for a P2P backup system and provide a theoretical and experimental analysis of its properties. At all times, the design and analysis was done for the actual implemented system. Our approach is novel and different from related work in that we follow the hidden market design paradigm [14] and use a very indirect market model. In our system, users do not continuously specify demand and supply vectors but instead only periodically choose bounds on their future maximum supply and demand, which makes the system much more usable. After introducing an economic model, we define a safety property that shall guarantee that the system can always satisfy new incoming demand.
Then, we define a buffer equilibrium, which is an equilibrium defined on supply and demand bounds. We prove that this equilibrium exists, is unique, and satisfies the safety property. However, we also show that the equilibrium cannot be easily controlled; in particular, the size of the supply-side buffer is out of the market operator's control. We explain the origin of this counter-intuitive result and show which design changes would be necessary to give the market operator more control. A price-update algorithm is introduced that takes system-wide supply and demand information and updates prices to drive the market towards the buffer equilibrium. Analytical and experimental analysis shows reasonable convergence speed when the initial price vector is chosen close enough to the equilibrium.

1.3 Related Work
In recent years there has been much research on P2P storage systems, electronic markets, distributed accounting, resource exchange systems, etc. Almost 10 years ago, the research projects OceanStore [8] and FarSite [3] already investigated the potential of distributed file systems using P2P. Both projects, however, did not do any kind of market design. More recently, researchers have looked at the incentive problem, often with the primary goal to enforce fairness (you get as much as you give). Samsara [5] is an accounting scheme that allows for fairness enforcement. In contrast to our design, their scheme is fully distributed. While such a design has some advantages, it prevents the use of sophisticated pricing and payment mechanisms that we employ.

The idea to use electronic markets for the efficient allocation of resources is even older than ideas regarding P2P storage systems. Already in 1996, Ygge et al. [16] proposed the use of computational markets for efficient power load management. More recently, grid networks and their efficient utilization have gotten more attention [9]. Fundamental to these designs is that participants are sophisticated users able to specify bids in an auction-like framework.
While this assumption seems reasonable in energy markets or computational grid networks, we are targeting millions of users with our backup service and thus we cannot assume that users are able to directly act as traders on an exchange market.

The two papers most similar to our work are by Aperjis et al. [2] and Freedman et al. [7]. They analyze the potential of exchange economies for improving the efficiency of file-sharing networks. While the domain is similar to ours, the particular challenges they face are quite different. They use a market to balance supply and demand with respect to popular or unpopular files. However, in their domain there is only one scarce resource, namely upload bandwidth, while we must design an exchange market for multiple resources.

There exist multiple P2P backup applications that are being used in practice. The application most similar to ours is Wuala. However, none of these systems is market-based, and none allows different users to provide different resource ratios. Thus, these systems exhibit economic inefficiencies compared to our exchange market.

In our own prior work [13], we have studied the role user interface (UI) design plays for the design of hidden markets, and we have described in detail the design and evaluation of a new UI for the P2P backup application. This paper complements that work: we only briefly describe the UI, to the degree necessary to understand our model, and focus on the market design and analysis. This paper is a significantly extended version of a previous workshop contribution [12].

2. THE P2P RESOURCE MARKET
Our system uses a hybrid P2P architecture where all files are transferred directly between peers, but a dedicated server coordinates all operations and maintains meta-data about the location and health of the files. The role of the server in this system is so small that standard load-balancing techniques can be used to avoid scaling bottlenecks. Each user in the system is simultaneously a supplier and a consumer of resources.
A single peer on the consumer side demanding a service (backup, storage, or retrieval) needs multiple peers on the supplier side offering their resources (space, upload and download bandwidth, and online time). The production process of the server (bundling multiple peers and coordinating them) is essential, turning unreliable storage from individual peers into reliable storage. Note that each peer on the supplier side offers a different resource bundle while each peer on the consumer side gets the same product, i.e., a backup service with the same, high reliability.

Erasure Coding and Replication.
One natural concern about P2P backup is that individual users have a much lower availability than dedicated servers. Thus, a P2P system must maintain a higher file redundancy to guarantee the same file availability as server-based systems. Simply storing multiple file copies would be very costly. Fortunately, we can significantly reduce the replication factor by using erasure coding [10]. The erasure code splits up a file into k fragments, and produces n > k new fragments, ensuring that any k of the n fragments are enough to reconstruct the file. Using this technique, we can achieve the same high reliability as server-based systems while keeping replication low. For example, if users are online 12h/day on average, using erasure coding we can achieve a file availability of % with a replication factor as low as 3.5, compared to simple file replication which would result in a factor of 17.

Note that the process for backing up files always involves four steps. First, the user's files are compressed. Then the compressed files are automatically encrypted with a private key that only the user has access to (via Microsoft LiveID). Then, the encrypted file is erasure coded, and then the individual fragments are distributed over hundreds of peers. Using this process, the security of our P2P backup system can be made as high as that of any server-based system.

Basic Operations in the Backup System.
We consider the following five high-level operations:
1. Backup: When a user performs a backup, file fragments are sent from the consumer to the suppliers.
2. Storage: Suppliers must persistently store the fragments they receive (until they are asked to erase them).
3. Retrieval: When a user retrieves a backup, file fragments are sent from the suppliers to the consumer.
4. Repair: When the server determines a backed up file to be unhealthy, the backup is repaired.
5. Testing: If necessary, the server initiates test operations to gather new data about a peer's availability.

Table 1: Operations and their Required Resources.

Operation       Resources Required from Suppliers
1. Backup       Download Bandwidth
2. Storage      Space
3. Retrieval    Upload Bandwidth
4. Repair       Download and Upload Bandwidth
5. Testing      Download and Upload Bandwidth

Prices, Trading & Work Allocation.
For now, monetary transfers are prohibited and all trades in the market are done using a virtual currency. Each resource has a price at which it can be traded and in each transaction the suppliers are paid for their resources and the consumers are charged for consuming services. Prices are updated regularly according to current aggregate supply and demand, to bring the system into equilibrium over time. Trading is enabled via a centralized accounting system, where the server has the role of a bank. The server maintains an account balance for each user starting with a balance of zero and allows each user to take on a certain maximal deficit.

The purpose of the virtual currency is to allow users to do work at different points in time while maintaining fairness. Users have a steady inflow of money from supplying resources and outflow of money from consuming services, which varies over time. In steady state, when users have been online long enough, their income must equal their expenditure. Users cannot earn money when they are offline but must still pay for their backed up files. Thus, their balance continuously decreases during that time. As long as we do not use real money, the maximum deficit that users can take on must be bounded. Ultimately, it is a policy decision what happens when a user hits a pre-defined deficit level. Our system will first notify the user (via email and visually in the application) and present options to remedy the situation (e.g., increase supply). Failing this after a reasonable timeout period (e.g., 4 weeks), the user's backups will be deleted.

The server is involved in every operation, coordinating the work done by the suppliers and allocating work to those users with the lowest account balances to drive all accounts (back) to zero over time. This is possible because the users' steady-state income must equal their expenditure. Thus, when users have been online for a sufficient time period, their account balance is always close to zero.

3. THE USER INTERFACE
In this section we only describe the UI to the degree necessary to understand the economic model; see [13] for more details. Figure 1 displays our current implementation of the UI, a settings window where users can control their resources. This window has two distinct areas: On the right side, the users can set maximum bounds on the supply they are willing to give up, using the three sliders for space, upload and download bandwidth. Below the sliders the current average online time of the users is displayed. To change this value the users have to leave their computer online for more or fewer hours per day than they are currently doing. On the left side of the window, the users can choose how much online backup space they need. On the bar chart the users can see how much they have already backed up (their current demand) and how much total online backup space they have (the bound on their demand given the current supply).

To change anything about their settings, the users can either drag the bar chart on the left side up or down, move any of the sliders on the right side, or change how often they are online. Both sides of the window are connected to each other such that a change on either side affects and dynamically updates the values on the other side as well. The semantics of this connection are important: on average, users must be able to pay for the total consumption chosen on the left side with the supply chosen on the right side.

The users have multiple ways of seeing/experiencing current market prices of the resources. They can move the sliders while observing the bar chart on the left. If the bar chart moves a lot, then the price for that resource is relatively high. If the bar chart only moves a little, the price is currently relatively low. Alternatively, users can look at the help text to the right of the three sliders, where we show the users how much more of each resource they have to give up to get 1 more GB of online backup space.
This information is also an indirect encoding of the current resource prices.

Bundle Constraints.
We now turn our attention to the combinatorial aspect of the market, the bundle constraints. If a user keeps increasing one slider towards the maximum while the other two sliders are relatively low, at some point the online backup space on the left might stop increasing. For example, if users limit their upload bandwidth to 5 KB/s, then increasing their space supply from 50 GB to 100 GB should not increase their online backup space. We would simply never store 100 GB on these users' hard disks because 5 KB/s would not be enough to have a reasonable retrieval rate for all of these file pieces. Thus, for the system to use the whole supply of 100 GB, the users would first have to increase their supply of bandwidth. An analogous argument holds true for other combinations of resources.

Figure 1: Screenshot of the implemented user interface. On the left side, the users see their current demand and the upper bound on how much they can maximally consume given their supply choice. On the right side, the users can specify bounds on the maximum supply they are willing to give up.

In the UI, as long as the user moves the sliders within the blue region (where the resources are useful to give up), the online backup space changes more or less linearly. Once the user moves the slider out of the blue region into the gray region, the online backup space stops increasing because the user is now supplying more of that resource than is useful.

In our implementation, we use system-wide information regarding the demand for each of the three resources to determine the regions where a user's supply is useful. Obviously, if a user supplies resources in the same ratio as they are being used in the system, then all of the resources are useful. However, we give each user a certain slack, i.e., we allow the users to specify supply ratios that deviate from the system-wide ratio by a factor γ > 1 in either direction. Note that the slack factor determines the users' flexibility regarding their resource supply. This flexibility comes from the fact that the server can direct certain kinds of work to certain users. For example, repair traffic consumes a lot of bandwidth but no space; cold backups require a lot of space but only little bandwidth; hot backups require a lot of bandwidth but little space. The server chooses the slack factor depending on how much freedom the system has in allocating different kinds of work to different kinds of users.

4. THE ECONOMIC MODEL
In this section we introduce a formal economic model to allow for a theoretical analysis of the properties of the real P2P market system.
In the P2P economy, there are $I$ users who are simultaneously suppliers and consumers. The set of commodities traded on the market is $L = \{S, U, D, A, B, \Sigma, R\}$. The first four commodities are space ($S$), upload bandwidth ($U$), download bandwidth ($D$), and availability ($A$), which are the resources that users supply. The last three commodities are backup service ($B$), storage service ($\Sigma$), and retrieval service ($R$), which are the services that users consume. By slightly abusing notation, we sometimes use $S$, $U$, $D$, etc. as subscripts and sometimes they denote the resource space, e.g., for a particular amount of upload bandwidth $u$ we require that $u \in U$. Each user $i$ has a fixed endowment of the supply resources denoted $w_i = (w_{iS}, w_{iU}, w_{iD}, w_{iA}) \in S \times U \times D \times [0, 1]$. We analyze the equilibrium in a single snapshot of the market when all users adjust their bounds on supply to reach their desired demand vectors.

The next aspect of the model is driven by our UI. Via the sliders, the user selects upper bounds for the supply vector, which we denote $X_i = (X_{iS}, X_{iU}, X_{iD}, X_{iA})$. In return for the supply $X_i$, the user interface shows the user the maximum demand of services, denoted $Y_i = (Y_{iB}, Y_{i\Sigma}, Y_{iR})$. In Figure 1 you can see that this user has currently chosen $X_{iS} = 80.8$ GB, $X_{iU} = 400$ KB/s, $X_{iD} = 300$ KB/s and $X_{iA} = 0.5$ as the maximum supply vector.

At any point in time, a certain set of resources from the user are being used, always less than $X_i$, and a certain set of services is being demanded. We denote user $i$'s current supply as $x_i = (x_{iS}, x_{iU}, x_{iD}, x_{iA})$, and analogously user $i$'s current demand for services as $y_i = (y_{iB}, y_{i\Sigma}, y_{iR})$. The user does not choose $x_i$ and $y_i$ directly via the UI. Instead, the server chooses $x_i$ (obeying the bound $X_i$) such that user $i$ can afford the current demand $y_i$, which the user simply chooses by backing up files or retrieving them.
Note that the UI displays the user's consumption vector in an aggregated way; i.e., instead of listing the services backup, storage, and retrieval separately, we simply display the currently used online backup space (= 17.28 GB in Figure 1) and the maximum online backup space that user could consume (= 33.5 GB in Figure 1).

In practice, users have a certain cost for opening the settings window and adjusting the settings. Instead of modeling this cost factor directly, we assume that when users open their settings window, they are planning ahead for the whole time period until they plan to open the settings window the next time. While a user might currently consume $y_i$, he plans to consume up to $Y_i$ by the next time he opens the settings window. He then selects the supply vector $X_i$ that he is willing to give up to get this $Y_i$. The user cares about how large the bounds on his supply are, because he has negative utility for giving up his resources. To make this more formal, we let $K_i = w_i - X_i$ with $K_i \in S \times U \times D \times A$ denote the vector of resources that the user keeps, i.e., his endowment minus the supply he gives up. We can specify the user's preference relation $\succeq_i$ over all the resources he keeps and the services he consumes: $(K_{iS}, K_{iU}, K_{iD}, K_{iA}, Y_{iB}, Y_{i\Sigma}, Y_{iR})$. We make the following assumption (cf. [11], chapters 1-3):

Assumption 1. Each user's preferences are (i) complete, (ii) transitive, (iii) continuous, (iv) strictly convex, and (v) monotone.

Note that strict convexity requires strictly diminishing marginal rates of substitution between two goods, i.e., we need to compensate a user more and more with one good as we take away 1 unit of another good. This is a reasonable assumption because it represents a general preference for diversification. Monotonicity means that all commodities are goods, i.e., if we give users more of any of the commodities, they are at least as well off as before. Given complete, transitive, and continuous preferences, there exists a utility function $u_i(K_i, Y_i) = u_i(K_{iS}, K_{iU}, K_{iD}, K_{iA}, Y_{iB}, Y_{i\Sigma}, Y_{iR})$ that represents the preference relation, and this utility function is continuous (cf. [11], p.).

4.1 Prices and Flow Constraints
The system can avoid non-linear prices and support an equilibrium with linear prices as long as the user specifies a supply vector within the slack region that the system allows for each user's supply. The only resource that is not subject to these slack regions, or limits, is availability: as long as the user's availability is larger than zero, the other resources can be used. To simplify the pricing model, we introduce three new composite resources $\bar S$, $\bar U$, and $\bar D$, incorporating the user's availability into the other resources in the following way:

\[ X_{i\bar U} = X_{iU} \cdot X_{iA} \cdot 24 \cdot 60 \cdot 60 \]
\[ X_{i\bar D} = X_{iD} \cdot X_{iA} \cdot 24 \cdot 60 \cdot 60 \]
\[ X_{i\bar S} = \varphi(X_{iS}, X_{iA}) \]

where $\varphi$ depends on $X_{iS}$, $X_{iA}$, and an overhead factor. Note that this notation denotes composite and not vector quantities.
The definitions for the composite resources upload and download bandwidth are straightforward: we multiply the bound on bandwidth the user supplies (e.g., 300 KB/s) with the user's availability $\in [0, 1]$ and then multiply it with 24 hours, 60 minutes and 60 seconds, to calculate how many KBs we can actually send to this user per day. The definition of $X_{i\bar S}$ is a little more intricate because the user's availability does not enter linearly into the calculation. However, it enters monotonically, i.e., more availability is always better. Here, it suffices to know that the server can compute this function $\varphi$ and convert a user's space and availability supply into the new composite resource; further details are beyond this paper.

We can now define user $i$'s supply vector for the three composite resources: $X_i = (X_{i\bar S}, X_{i\bar U}, X_{i\bar D})$. The advantage of using these composite resources is that now, the supply from different users with different availabilities is comparable. For example, 1 unit of $\bar S$ from agent $i$ with availability 0.5 is now equivalent to 1 unit of $\bar S$ from agent $j$ with availability 0.9. Obviously, internally agent $i$ has to give much more space to make up for his lower availability, but in terms of bookkeeping, we can now operate directly with composites. We define the aggregate supply vector for the composite resources as $X = \sum_i X_i$, and analogously for $Y$, $x$ and $y$. We make the following well-known observation (cf. [11], chapter 3) that will be useful later:

Observation 1. The supply and demand functions $X_i$ and $Y_i$ are homogeneous of degree zero. This implies that the aggregate supply and demand functions, i.e., $X$ and $Y$, are also homogeneous of degree zero.

Now we get to the pricing aspect of the system. We use $p = (p_{\bar S}, p_{\bar U}, p_{\bar D})$ for the prices for supplied composite resources, and $q = (q_B, q_\Sigma, q_R)$ for the demanded services. We require that in steady state, users can pay for their consumption with their supply. We can express this flow constraint formally:

\[ X_i \cdot p = Y_i \cdot q \qquad (1) \]

At the same time, the server allocates enough work to user $i$ such that the user's current supply $x_i$ is enough to pay for the demand $y_i$, which leads to a second flow constraint:

\[ x_i \cdot p = y_i \cdot q \qquad (2) \]

4.2 Production Functions
We have already mentioned the important role of the server in our market, i.e., that of combining resources from different suppliers into a valuable bundle. Formally, the server is the only producer in our market [fn. 3]. For each service, we have a production function that defines how many input resources are needed to produce one unit of that service:

Backup: $f_B : \bar S \times \bar U \times \bar D \to B$
Storage: $f_\Sigma : \bar S \times \bar U \times \bar D \to \Sigma$
Retrieval: $f_R : \bar S \times \bar U \times \bar D \to R$

Note that these production functions are defined via the implementation of our system, i.e., the particular production technology that we implemented. For example, they are defined via the particular erasure coding algorithm that is being used, by the frequency of repair operations, etc. Thus, we can now specify a series of properties that these production functions guarantee due to our implementation:

System Property 1. Production functions are fixed and the same for all users.

System Property 2. The production functions all exhibit constant returns to scale (they are homogeneous of degree 1), i.e., $\forall l \in \{B, \Sigma, R\} : f_l(k \cdot a, k \cdot b, k \cdot c) = k \cdot f_l(a, b, c) \;\; \forall k \in \mathbb{R}$.

System Property 3. Each production function is bijective, and thus we can take the inverses:

$f_B^{-1} : B \to \bar S \times \bar U \times \bar D$
$f_\Sigma^{-1} : \Sigma \to \bar S \times \bar U \times \bar D$
$f_R^{-1} : R \to \bar S \times \bar U \times \bar D$

Given the inverse functions for the individual services backup, storage, and retrieval, we can define an inverse function for a three-dimensional service vector $(b, \sigma, r) \in B \times \Sigma \times R$:

\[ f^{-1}(b, \sigma, r) = f_B^{-1}(b) + f_\Sigma^{-1}(\sigma) + f_R^{-1}(r) \qquad (3) \]

[fn. 3] Note that this is what allows us to define an exchange economy despite the fact that production is happening in the market. For more details see [11], pp.
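As a minimal sketch of the composite resources and of the inverse production function in Equation (3), the Python fragment below assumes that constant returns to scale makes each per-service inverse linear in the service amount. The per-unit input vectors in `UNIT_INPUTS` are hypothetical numbers chosen only for illustration; the paper states that the real values follow from the implemented erasure code and repair schedule, and it does not list them. The function names are likewise illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def composite_bandwidth(kb_per_s: float, availability: float) -> float:
    """Composite bandwidth: KB that can actually be transferred per day."""
    return kb_per_s * availability * SECONDS_PER_DAY

# HYPOTHETICAL per-unit requirements of composite (space, upload, download)
# for one unit of each service. These are NOT values from the paper.
UNIT_INPUTS = {
    "backup": (0.0, 0.5, 3.5),
    "storage": (3.5, 0.25, 0.25),
    "retrieval": (0.0, 1.0, 0.0),
}

def inverse_production(b: float, sigma: float, r: float) -> tuple:
    """Resources needed for the service vector (b, sigma, r), as in Eq. (3).

    Each per-service inverse is taken to be linear in the amount produced,
    which is what homogeneity of degree 1 gives for a single-output service.
    """
    amounts = {"backup": b, "storage": sigma, "retrieval": r}
    total = [0.0, 0.0, 0.0]
    for service, amount in amounts.items():
        for idx, per_unit in enumerate(UNIT_INPUTS[service]):
            total[idx] += amount * per_unit
    return tuple(total)

def dot(u, v) -> float:
    """Dot product, used for income and expenditure such as X_i . p."""
    return sum(a * b for a, b in zip(u, v))

# A user offering 300 KB/s at availability 0.5 supplies this many KB per day.
daily_kb = composite_bandwidth(300, 0.5)
# Doubling the service vector doubles the required resources (CRTS).
needed = inverse_production(1, 2, 3)
needed_double = inverse_production(2, 4, 6)
```

Under these assumptions, the resource cost of any demanded service bundle is just `dot(inverse_production(b, sigma, r), p)` for a composite price vector `p`.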

Property 1 holds because of the way we have defined the composite resources, with any differences between the agents' availabilities already considered. Property 2 (CRTS) holds approximately for file sizes above a certain threshold (approx. 1 MB) due to the properties of the erasure coding algorithm [fn. 4]. Property 3, the bijectivity of production, holds because for each service unit, there is only one way to produce it. For example, to back up one file fragment, the erasure coding algorithm tells us exactly how many supplier fragments we need, and the server tells us how much repair and testing traffic we can expect on average per fragment.

The following system property comes from the UI design and is motivated by keeping the money flow in the system constant. This property is only possible because the server is the only producer in our system and production functions are fixed (Property 1), exhibit constant returns to scale (Property 2), and are bijective (Property 3):

System Property 4. We charge the consumers exactly the amount we pay the suppliers, i.e., for demand vector $y_i$, we charge user $i$ exactly: $y_i \cdot q = f^{-1}(y_i) \cdot p$.

Using Property 4, we can now re-write the flow constraints for agent $i$ as:

\[ X_i \cdot p = f^{-1}(Y_i) \cdot p \quad \text{and} \quad x_i \cdot p = f^{-1}(y_i) \cdot p \]

Thus, from now on, we can omit the price vector $q$ for demanded services and only need to consider price vector $p$ [fn. 5]. Effectively, we can treat the whole market as an exchange economy for the composite resources $\bar S$, $\bar U$, and $\bar D$, assuming users only engage in exchange of supplied resources because everyone has access to the same production technology (cf. [11], pp.).

Remember that the UI automatically calculates and adjusts the maximum demand vector $Y_i$ for user $i$ based on the user's supply bound $X_i$. In practice, the maximum income is divided by the current average income of the user, and the resulting factor is multiplied with the user's current demand, giving us the maximum demand the user can afford:

System Property 5.
The system uses a linear demand prediction model for the calculation of a user's maximum demand $Y_i$:

\[ Y_i = \frac{X_i \cdot p}{x_i \cdot p} \cdot y_i = \lambda_i \cdot y_i \]

To facilitate the equilibrium analysis in the next section, we make the following simplifying assumption:

Assumption 2. We assume that with a large number of users, a linear demand prediction is also correct for the aggregate demand vectors, i.e.: $\exists \lambda : Y = \lambda \cdot y$.

This assumption is justified because in practice, the system will have thousands or millions of users. Let $n$ denote the number of users in the economy, let $Y^n = \sum_{i=1}^{n} Y_i$, $y^n = \sum_{i=1}^{n} y_i$, and let $\mu(\lambda_i)$ denote the mean of the distribution of the $\lambda_i$'s. Given that the $\lambda_i$'s are independent from the $y_i$'s, it follows from the strong law of large numbers that if the number of users $n$ is large enough, then $Y^n$ is linearly predictable by $\mu(\lambda_i) \cdot y^n$ along each dimension to any additive error. More specifically, for any $\varepsilon$ and $\delta > 0$, for large enough $n$:

\[ \Pr\big[\, |Y^n - \mu(\lambda_i) \cdot y^n| \le \varepsilon \,\big] \ge 1 - \delta. \]

[fn. 4] Very small files are an exception and need special treatment in the implementation, because they are more expensive to produce (again due to the erasure coding). We take care of this in the implementation by charging users more when they are backing up small files (essentially we have two sets of prices, one for normal files and one for small files).

[fn. 5] Going forward, please remember that multiplications with $p$ are always dot products, and thus $p$ showing up on the left and the right side of an equation does not cancel out.

5. EQUILIBRIUM ANALYSIS
A real-world instance of the P2P backup application would have thousands if not millions of users. Thus, the underlying market would be large enough so that no individual agent had a significant effect on market prices. Consequently, users can be modeled as price-taking agents and a general equilibrium model is suitable to analyze this market.

5.1 The Buffer Equilibrium
A standard equilibrium concept in general equilibrium theory is the Walrasian equilibrium, which requires that demand equals supply such that the market clears.
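Looking back at System Property 5, the linear demand prediction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the deployed implementation: the function name `max_demand`, the tuple-based vectors, and all numbers in the usage lines are hypothetical, and the supply vectors are taken to be expressed in composite resources.

```python
def dot(u, v) -> float:
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def max_demand(X_i, x_i, y_i, p):
    """Linear demand prediction: Y_i = (X_i . p) / (x_i . p) * y_i.

    X_i: bound on the user's composite supply (hypothetical units)
    x_i: composite supply currently in use
    y_i: current demand for (backup, storage, retrieval)
    p:   prices of the composite resources
    Returns the scaling factor lambda_i and the maximum demand Y_i.
    """
    current_income = dot(x_i, p)
    if current_income <= 0:
        # The ratio is undefined for a user who currently earns nothing.
        raise ValueError("current income must be positive")
    lam = dot(X_i, p) / current_income
    return lam, tuple(lam * y for y in y_i)

# Hypothetical user who currently uses half of the offered supply.
prices = (1.0, 2.0, 2.0)
lam, Y_i = max_demand(
    X_i=(40.0, 20.0, 10.0),
    x_i=(20.0, 10.0, 5.0),
    y_i=(1.0, 8.0, 0.5),
    p=prices,
)
```

In this example the maximum income is twice the current income, so every component of the current demand is scaled by the same factor of two.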
Certainly we want to have enough supply to satisfy current demand, i.e., we want that $x = f^{-1}(y)$. But remember that users are not constantly adjusting $x_i$. Instead, they choose maximum bounds on their supply $X_i$ via the UI. Given that the maximum supply $X_i$ is generally larger than $x_i$, it is not a very strong requirement to have $x = f^{-1}(y)$. In particular, we do not only want to balance the market now, but we want to guarantee that the backup system can also satisfy demand $Y$ in the future, which implies that we must always have some excess supply of all resources. Ideally, we want to maximize the buffer between the current usage of resources, i.e., $f^{-1}(y)$, and the maximum supply of resources, i.e., $X$. We will use this size of the buffer repeatedly and thus define it more formally:

Definition 1. (Size of the Supply-Side Buffer) The size of the supply-side buffer is the smallest ratio, over all resources, of maximum supply to current demand:

$$\lambda = \min_{l \in \{S,U,D\}} \frac{X_l}{f_l^{-1}(y)}$$

The reason for having a supply-side buffer is that we want to be safe, i.e., we want to be sure that we can satisfy new incoming requests. More specifically, we want to make sure that as demand increases from its current state $y$ to the maximum state $Y$ that we allow the users, we will always have enough supply to satisfy this demand. More formally:

Definition 2. (Safety Property) The safety property of the system is that we always have enough supply to satisfy increasing demand, i.e.:

$$\forall y \le Y: \quad X \ge f^{-1}(y) \tag{5}$$

Using the bijectivity and the CRTS properties of the production functions, it is easy to show the following lemma:

Lemma 1. If $X = f^{-1}(Y)$, then the safety property is satisfied.

In words, when the bound on aggregate supply of all resources equals the amount of resources needed to produce the projected service vector $Y$, then we can guarantee the safety property. When we have reached this state of the system, we say we have reached the buffer equilibrium:

Definition 3. (Buffer Equilibrium) A buffer equilibrium is a price vector $p = (p_S, p_U, p_D)$, an aggregate supply vector $X(p)$, and an aggregate demand vector $Y(p)$, such that:

$$X(p) = f^{-1}(Y(p))$$

i.e., it is a Walrasian equilibrium defined on the supply and demand bounds chosen by the users. We call this equilibrium the buffer equilibrium because the extent to which $X$ is above $f^{-1}(y)$, i.e., the size of the buffer, determines the level of safety in the system.

5.2 Equilibrium Existence

In this section, we will prove that a buffer equilibrium exists under some reasonable assumptions. To do so, we first introduce some new notation and prove two lemmas before we get to the actual theorem. We let $L = \{S, U, D\}$ and we use $l$ to index a particular composite resource. We define the vector-valued function $Z(p)$ to measure the relative buffer for each individual resource in the following way:

$$Z_l(p) = \frac{1}{|L|} \sum_{l' \in L} \frac{X_{l'}(p)}{f_{l'}^{-1}(y(p))} - \frac{X_l(p)}{f_l^{-1}(y(p))}$$

In words, the first term represents the average supply-to-demand ratio, in our case averaged over the three goods: storage space, upload bandwidth, and download bandwidth. The second term represents the supply-to-demand ratio of the particular good $l$. Thus, $Z_l(p)$ represents how far the buffer between supply and demand for good $l$ is away from the average buffer. If $Z_l$ is negative, then the buffer between supply and demand for good $l$ is relatively high and should be decreased; if $Z_l$ is positive, then the buffer between supply and demand for good $l$ is relatively low and should be increased. If the buffer is the same for all goods, we have reached the equilibrium. Thus, we can prove the following lemma:

Lemma 2. If $Z(p) = 0$, then the market has reached a buffer equilibrium and $p$ is the equilibrium price vector.

Proof.
If $Z(p) = 0$ then:

$$\forall l: \ \frac{1}{|L|} \sum_{l' \in L} \frac{X_{l'}(p)}{f_{l'}^{-1}(y(p))} = \frac{X_l(p)}{f_l^{-1}(y(p))} \ \Rightarrow \ \exists \lambda' > 1 \ \text{s.t.} \ \forall l: \ X_l(p) = \lambda' f_l^{-1}(y(p)) \tag{6}$$

Now, due to Assumption 2, we know that $\exists \delta: Y = \delta \cdot y$. Thus:

$$\forall l: \ X_l(p) = \lambda' f_l^{-1}\!\left(\tfrac{1}{\delta} Y(p)\right) \tag{7}$$
$$\forall l: \ X_l(p) = \lambda' \tfrac{1}{\delta} f_l^{-1}(Y(p)) \tag{8}$$
$$\forall l: \ X_l(p) = \lambda f_l^{-1}(Y(p)) \quad \text{for } \lambda = \lambda' \tfrac{1}{\delta} \tag{9}$$
$$X(p) = \lambda f^{-1}(Y(p)) \tag{10}$$

But from the flow constraints (Eqn. 4) we also know that:

$$X(p) \cdot p = f^{-1}(Y(p)) \cdot p \tag{11}$$

Equations (10) and (11) can only both be true if $\lambda = 1$. Thus, it follows that $X(p) = f^{-1}(Y(p))$, which is the definition of the buffer equilibrium.

Next we show that $Z(\cdot)$ has a series of nice properties:

Lemma 3. Given that users' preferences are strongly monotone with respect to supply resources, the function $Z(\cdot)$ has the following properties:
(i) $Z(\cdot)$ is continuous.
(ii) $Z(\cdot)$ is homogeneous of degree zero.
(iii) $\forall p: \sum_{l \in L} Z_l(p) = 0$.
(iv) If $p^n \to p$, where $p \neq 0$, $p_l > 0$, and $p_k = 0$ for some $k$, then for $n$ sufficiently large: $Z_k(p^n) = \max\{Z_S(p^n), Z_U(p^n), Z_D(p^n)\}$.

Proof. Property (i), the continuity of $Z(\cdot)$, follows directly from the continuity of the user preferences (which is why $X(p)$ and $y(p)$ are continuous) and the continuity of the inverse production functions. Property (ii), the homogeneity of degree zero, follows because $X(p)$ and $y(p)$ are homogeneous of degree zero. Property (iii) follows directly from the definition of $Z(\cdot)$:

$$\sum_{l \in L} Z_l(p) = 3 \cdot \frac{1}{|L|} \sum_{l' \in L} \frac{X_{l'}(p)}{f_{l'}^{-1}(y(p))} - \sum_{l \in \{S,U,D\}} \frac{X_l(p)}{f_l^{-1}(y(p))} = 0$$

Finally, property (iv): as the price of resource $k \in \{S, U, D\}$ goes towards zero, due to users' strongly monotone preferences for supply resources, they will supply less and less of that resource, and supply more of the other resources instead, at least of resource $l$ whose price is bounded away from zero. However, because of the bundle constraints, the users cannot reduce their supply of resource $k$ towards zero. Let $\gamma > 1$ denote the slack factor we allow users when setting their preferences. The relevant constraints, lower-bounding the supply for resource $k$, are:

$$\forall l \in L \setminus \{k\}: \quad X_{ik} \ge \frac{1}{\gamma} \cdot \frac{f_k^{-1}(Y)}{f_l^{-1}(Y)} \cdot X_{il}$$

As $p^n \to p$ with $p_k = 0$, for $n$ large enough, $p_k^n$ will be sufficiently close to zero, such that each user $i$ chooses to supply the minimal amount of resource $k$ that is possible.
Thus, at least with respect to one of the other resources, the slack constraint will be binding, i.e.:

$$X_{ik} = \max\left\{ \frac{1}{\gamma} \cdot \frac{f_k^{-1}(Y)}{f_l^{-1}(Y)} \cdot X_{il}, \ \ \frac{1}{\gamma} \cdot \frac{f_k^{-1}(Y)}{f_m^{-1}(Y)} \cdot X_{im} \right\}$$

This does not say that the constraint will be binding for the same resource $l$ or $m$ for every user. However, for every user, one of the constraints will be binding, and thus every user will contribute least to the supply-side buffer for resource $k$. Consequently, the total supply-side buffer for good $k$ will be minimal (i.e., $Z_k(p^n)$ will be maximal), which implies that $Z_k(p^n) = \max\{Z_S(p^n), Z_U(p^n), Z_D(p^n)\}$.
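To make the relative-buffer function concrete, the following minimal Python sketch evaluates $Z$ at a single price point. It is an illustration only: the function name `relative_buffer` and all numbers are hypothetical, and the aggregate supply bounds $X_l$ and resource requirements $f_l^{-1}(y)$ are passed in directly instead of being derived from user preferences.

```python
def relative_buffer(supply, required):
    """Return Z_l = (average supply/demand ratio) - (ratio of resource l)."""
    ratios = {l: supply[l] / required[l] for l in supply}
    mean_ratio = sum(ratios.values()) / len(ratios)
    return {l: mean_ratio - ratios[l] for l in ratios}


# Hypothetical aggregates: X_l (supply bounds) and f_l^{-1}(y) (resources in use).
supply = {"S": 300.0, "U": 120.0, "D": 200.0}
required = {"S": 100.0, "U": 100.0, "D": 100.0}

z = relative_buffer(supply, required)

# Property (iii) of Lemma 3: the components of Z always sum to zero.
z_sum = sum(z.values())

# The resource with the smallest buffer (here U) has the largest Z_l.
scarcest = max(z, key=z.get)

# When supply is proportional to requirements (X = 1.5 * f^{-1}(y)), Z vanishes.
balanced = {l: 1.5 * required[l] for l in required}
z_balanced = relative_buffer(balanced, required)
```

In this example upload bandwidth has the smallest supply-to-demand ratio, so $Z_U$ is positive and largest, which is the signal that its buffer is relatively low.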

Theorem 1. A buffer equilibrium exists in the P2P exchange economy, given that users' preferences are continuous and strictly convex, monotone w.r.t. service products, and strongly monotone w.r.t. supply resources.

Proof. We have shown in Lemma 2 that once we have found a price vector $p$ such that $Z(p) = 0$, we have reached a buffer equilibrium. Furthermore, in Lemma 3 we have shown four properties of $Z(\cdot)$. Equipped with these two results, the remainder of our proof follows using techniques from the standard equilibrium existence proof for the Walrasian equilibrium (see [11], page 586). We omit the details here due to space constraints, but we want to briefly point out where changes in the proof are necessary. First, note that we are not working with the excess demand function of this economy and instead use the function $Z(\cdot)$ that measures the relative buffer size for each resource. Thus, in step 1 of the proof in [11], we cannot use Walras' law and instead use property (iii) of Lemma 3. Second, in step 4 of the proof in [11], when proving upper hemicontinuity of the fixed-point correspondence, we cannot use the result that the excess demand for a resource goes to infinity when its price goes towards zero. Instead, we use property (iv) of Lemma 3.

One might wonder what prevented the direct applicability of standard theorems regarding equilibrium existence (e.g., [11], page 585). It turns out that the standard results were not directly applicable for three reasons. Most importantly, we do not assume that users have strongly monotone preferences w.r.t. service products. The consequence of this is that as the price of one resource goes towards zero, it is not necessarily the case that the demand for service products produced from that resource goes towards infinity. Second, we do not have a pure exchange economy and have to take the production functions into account.
Those exhibit constant returns to scale, which implies that production sets are neither strictly convex nor bounded above, which complicates the analysis significantly (cf. [11], page 583). Third, each user's supply is subject to the bundle constraints, i.e., a user's supply cannot drop below or go above certain limits. Due to these three factors, we needed slightly different machinery to prove equilibrium existence in our economy.

5.3 Equilibrium Uniqueness

Without any further restrictions on the users' preferences, we cannot say anything about the uniqueness of the buffer equilibrium (cf. Sonnenschein-Mantel-Debreu Theorem, [11], pp.), because the substitution effect and the wealth effect could either go in the same direction or in opposite directions. The gross substitutes assumption resolves this problem by assuming that in gross terms, taking substitution and wealth effect into account, the resources are substitutes. We assume that this is the case for the supply resources:

Assumption 3. (Supply Resources are Gross Substitutes) We assume that the aggregate supply function $X(p)$ satisfies the gross substitutes condition [1], i.e., whenever $p'$ and $p$ are such that, for some $k$, $p'_k > p_k$ and $p'_l = p_l$ for $l \neq k$, we have $X_l(p') < X_l(p)$ for $l \neq k$.

The standard equilibrium uniqueness proof for Walrasian equilibria relies on the assumption that the aggregate excess demand function satisfies the gross substitutes condition for all commodities. However, we only want to make that assumption w.r.t. supply resources, where it seems very reasonable: as the price for one resource decreases, the relative price for another resource increases, and thus users would be happy to supply more of the more costly resources now. For the demanded services, however, the gross substitutes assumption is most certainly violated. For example, if the price for storage increased, it is not reasonable to assume that users would now store fewer files online but instead consume more backup and retrieval operations.
Thus, we cannot make the gross substitutes assumption for all commodities. Instead, we will make the following assumption w.r.t. consumed services:

Assumption 4. (Services are Perfect Complements) We assume that the aggregate demand function $Y(p)$ satisfies the perfect complements condition.

A consequence of the perfect complements condition is that price changes affect all dimensions of the aggregate demand vector equally. For an individual user, the Leontief utility function would induce the perfect complements property. However, note that we do not require individual users to have demand functions that satisfy the perfect complements condition. It is a much weaker assumption, and much more reasonable due to the law of large numbers, to assume that the aggregate demand function satisfies it.

Theorem 2. The buffer equilibrium is unique (up to normalization), given that the aggregate supply function satisfies the gross substitutes property (Assumption 3), and that the aggregate demand function satisfies the perfect complements property (Assumption 4).

Proof. The fact that we have different assumptions on the supply and demand side of our economy complicates the uniqueness proof. When prices go up for good $l$, it is not a priori clear what happens to the buffer equilibrium. To get a better handle on this, we first separate the supply and demand aspects by introducing yet another alternative description of the buffer equilibrium:

$$X = f^{-1}(Y) \tag{12}$$
$$(X_S, X_U, X_D) = \left( f_S^{-1}(Y), \ f_U^{-1}(Y), \ f_D^{-1}(Y) \right) \tag{13}$$
$$\left( 1, \ \frac{X_U}{X_S}, \ \frac{X_D}{X_S} \right) = \left( 1, \ \frac{f_U^{-1}(Y)}{f_S^{-1}(Y)}, \ \frac{f_D^{-1}(Y)}{f_S^{-1}(Y)} \right) \tag{14}$$
$$\left( \frac{X_U}{X_S} - \frac{f_U^{-1}(Y)}{f_S^{-1}(Y)}, \ \ \frac{X_D}{X_S} - \frac{f_D^{-1}(Y)}{f_S^{-1}(Y)} \right) = 0 \tag{15}$$

We define a new vector-valued function $g(p) = (g_U(p), g_D(p))$:

$$g_U(p) = \frac{X_U}{X_S} - \frac{f_U^{-1}(Y)}{f_S^{-1}(Y)} \quad \text{and} \quad g_D(p) = \frac{X_D}{X_S} - \frac{f_D^{-1}(Y)}{f_S^{-1}(Y)}$$

which naturally leads to a new equilibrium definition:

Definition 4. (Buffer Equilibrium [1. Alternative]) A buffer equilibrium is a price vector $p$ and $g(p)$ such that $g(p) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$.

Thus, we have simplified the problem of finding equilibrium prices to finding the root of the function $g(p)$.
Showing uniqueness of the buffer equilibrium is now equivalent to showing that $g(p) = 0$ has at most one (normalized) solution. Now, let's assume that $g(p) = 0$, i.e., $p$ is an equilibrium price vector. We show that for any $p'$, $g(p') \neq 0$ unless $p'$ and $p$ are collinear, i.e., unless $p' = \lambda p$ for some $\lambda > 0$. Note that because $X(p)$ and $Y(p)$ are homogeneous of degree zero, $g(\cdot)$ is also homogeneous of degree zero. Thus, we can assume that $p' \ge p$ and $p'_l = p_l$ for some $l$. We now alter the price vector $p'$ to obtain $p$ in two steps, lowering (or keeping unaltered) the price of resources $k \neq l$ one at a time.

Because of Assumption 4 (the aggregate demand function satisfies the perfect complements condition), a price change affects all dimensions of the demand function equally, i.e., $\exists \mu \in \mathbb{R}: Y(p') = \mu \cdot Y(p)$. Because the production function is bijective and exhibits constant returns to scale, this implies that $f^{-1}(Y(p')) = \mu \cdot f^{-1}(Y(p))$. Thus,

$$\frac{f_U^{-1}(Y(p'))}{f_S^{-1}(Y(p'))} = \frac{f_U^{-1}(Y(p))}{f_S^{-1}(Y(p))},$$

i.e., changes in the demand function $Y(\cdot)$ due to price changes do not affect $g(\cdot)$. Thus, we only have to pay attention to changes in the supply function $X$. Here, we need to differentiate the following 3 cases:

Case 1: $l$ = storage. By gross substitution (see Assumption 3), the supply of good S cannot decrease in any step, and, because $p' \neq p$, it will actually increase in at least one step. In turn, the supply of U and D will stay the same or decrease because of homogeneity of degree zero. Thus, the first term in the $g(\cdot)$ functions will decrease, while the second term stays constant, and thus $g(p) < g(p')$.

Case 2: $l$ = upload bandwidth. By gross substitution, the supply of good U cannot decrease in any step, and, because $p' \neq p$, it will actually increase in at least one step. The supply of S and D, on the other hand, will stay the same or decrease. Thus, the first term in $g_U(\cdot)$ will increase, while the second term stays constant. Thus, $g_U(p) > g_U(p')$ (note that we do not even need to consider $g_D(p)$ in this case).

Case 3: $l$ = download bandwidth. By gross substitution, the supply of good D cannot decrease in any step, and, because $p' \neq p$, it will actually increase in at least one step.
The supply of S and U, on the other hand, will stay the same or decrease. Thus, the first term in $g_D(\cdot)$ will increase, while the second term stays constant. Thus, $g_D(p) > g_D(p')$ (again, we do not even need to consider $g_U(p)$ in this case). In summary, in all three cases we established that $g(p) \neq g(p')$, which concludes the equilibrium uniqueness proof.

5.4 Limited Controllability of the Buffer Size

So far we have shown under what conditions the buffer equilibrium exists and when it is unique. We know from Lemma 1 that when the system is in the buffer equilibrium, then the safety property is guaranteed, i.e., we always have enough supply to satisfy demand as it increases from $y$ towards $Y$. But what happens if the system is out of equilibrium? Note that in practice, users do not permanently adjust their settings, and thus price changes will only affect supply and demand with a significant delay. Consequently, it would be desirable to have a large enough buffer between current demand and maximum supply, such that even if the system is out of equilibrium, we can satisfy new incoming demand. For example, it seems like a desirable goal to have at least twice as much supply as current demand, i.e., $X \ge 2 \cdot f^{-1}(y)$. Unfortunately, the uniqueness of the buffer equilibrium has an immediate consequence regarding the limited controllability of the buffer equilibrium:

Corollary 1. (Limited Controllability of the Market) Given Property 4, and Assumptions 3 and 4, the market operator cannot influence the size of the buffer in the buffer equilibrium.

Given this limited controllability, it is natural to ask what buffer size to expect in equilibrium. It turns out that, in equilibrium, the supply-side buffer is uniquely determined via the individual demand-side buffers of all users.

Proposition 1. In the buffer equilibrium, given Assumption 2, the size of the supply buffer equals the size of the demand buffer.

Proof.

$$X = f^{-1}(Y) \tag{16}$$
$$X = f^{-1}(\lambda \cdot y) \tag{17}$$
$$X = \lambda \cdot f^{-1}(y) \tag{18}$$

In words, the size of the buffer depends on how forward-looking the agents are.
If on average the users give themselves a 25% buffer on the demand side (e.g., a user has currently backed up 20GB and sets the sliders in such a position that his/her maximum online backup space is 25GB), then we would also have a 25% buffer on the supply side, i.e., $X = 1.25 \cdot f^{-1}(y)$.

Now we turn to the question of why the market operator cannot influence the size of the supply-side buffer, i.e., which system properties or which assumptions we made in our market economy are the limiting ones. Remember that the limited controllability of the buffer equilibrium was a corollary of the uniqueness property, which relied on two assumptions, namely gross substitutability of supplied resources, and that services are perfect complements. It turns out, however, that the limited controllability remains even without those assumptions, strengthening the result from Corollary 1:

Proposition 2. Given System Property 4, if each individual user $i$ has a limited planning horizon in that he chooses not to give himself more than a demand-side buffer of $\lambda_i$, then there exists a $\Lambda$ such that the market operator cannot achieve a buffer equilibrium with buffer size $\Lambda$.

Proof. For the proof we construct a simple counterexample. We allow price changes to affect $x$, $X$, $y$ and $Y$, and in particular we do not make the gross substitutability assumption or the perfect complements assumption. We choose a $\Lambda$ such that $\forall i: \Lambda > \lambda_i$, and we let $\lambda_{i^*} = \max_i \lambda_i$. Now:

$$\forall i: \ Y_i = \lambda_i \cdot y_i \tag{19}$$
$$Y = \sum_i \lambda_i \cdot y_i \tag{20}$$
$$Y \le \sum_i \lambda_{i^*} \cdot y_i \tag{21}$$
$$Y \le \lambda_{i^*} \sum_i y_i \tag{22}$$
$$Y \le \lambda_{i^*} \cdot y \tag{23}$$
$$f^{-1}(Y) \le \lambda_{i^*} \cdot f^{-1}(y) \tag{24}$$
$$X \le \lambda_{i^*} \cdot f^{-1}(y) \tag{25}$$

where the last step uses $X = f^{-1}(Y)$ in the buffer equilibrium. Thus, the buffer between supply and demand would be less than or equal to $\lambda_{i^*}$, which by assumption was strictly less than the buffer $\Lambda$ that the market operator desired.
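The chain of inequalities (19) to (25) can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the inverse production function is modeled as a hypothetical nonnegative linear map `A` (which has constant returns to scale), the service order (storage, backup, retrieval) and all coefficients and user numbers are invented, and the system is assumed to sit in the buffer equilibrium $X = f^{-1}(Y)$.

```python
def resources_needed(A, services):
    """Hypothetical linear f^{-1}: resource vector (S, U, D) for a service vector."""
    return [sum(a * s for a, s in zip(row, services)) for row in A]


# Rows: resources S, U, D. Columns: services (storage, backup, retrieval).
# All coefficients are made up for illustration.
A = [
    [1.50, 0.0, 0.0],  # storage space
    [0.05, 0.0, 1.5],  # upload bandwidth
    [0.05, 1.5, 0.0],  # download bandwidth
]

# Each user: (current demand y_i, demand-side buffer lambda_i), with Y_i = lambda_i * y_i.
users = [
    ([20.0, 2.0, 1.0], 1.25),
    ([50.0, 5.0, 1.0], 1.10),
    ([10.0, 1.0, 2.0], 2.00),
]

y = [sum(u[0][j] for u in users) for j in range(3)]         # aggregate current demand
Y = [sum(u[1] * u[0][j] for u in users) for j in range(3)]  # aggregate maximum demand

X = resources_needed(A, Y)       # buffer equilibrium: X = f^{-1}(Y)
in_use = resources_needed(A, y)  # f^{-1}(y)

# Size of the supply-side buffer (Definition 1) and the bound from Proposition 2.
buffer_size = min(x / r for x, r in zip(X, in_use))
lam_max = max(lam for _, lam in users)
target = 3.0  # a hypothetical operator target with target > lam_max
```

With these numbers the equilibrium buffer is 1.25, which lies below the largest individual buffer of 2.0 and therefore below any operator target that exceeds it.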

5.5 Decoupling Supply and Demand Prices

We have seen in the previous section that the limited controllability of the buffer equilibrium does not hinge on the gross substitutes or perfect complements assumptions. In this section we show that the crux of the matter is System Property 4, i.e., the coupling of supply and demand prices:

$$X_i \cdot p = Y_i \cdot q \tag{26}$$
$$X_i \cdot p = f^{-1}(Y_i) \cdot p \tag{27}$$

So far, we have charged each consumer exactly as much as we pay the suppliers for the corresponding resources. If we decouple supply and demand prices, we gain additional freedom in pricing commodities. In particular, if we wanted to increase the supply-side buffer, then we could make some services more expensive, i.e., we could increase the price of the services beyond the true costs of the resources necessary to produce them. To avoid extracting money out of the system over time, in an actual transaction we would still charge the consumers according to the true costs of the resources. The inflated prices would only be used in the UI to induce users to increase their supply so that the market operator could achieve the desired size of the supply buffer.

Proposition 3. If we decouple supply and service prices, then the market operator can adjust prices so as to achieve any desired buffer size $\Lambda > 1$.

Proof. We still let $p$ denote the price used to calculate the payments for the suppliers. But consumers now have to pay price $p' = \Lambda \cdot p$ for the resources that are necessary to produce their services. Thus, the new flow constraint is:

$$X \cdot p = f^{-1}(Y) \cdot \Lambda p \tag{28}$$

Note that showing consumers a price of $\Lambda \cdot p$ instead of $p$ has the same effect as if the amount of resources necessary to produce the corresponding services had increased by factor $\Lambda$. Now, assume the market operator updates price vector $p$, as before, until $Z(p) = 0$. In the proof for Lemma 2 we have shown that this implies $X(p) = \lambda \cdot f^{-1}(Y(p))$. If we plug this into the new flow constraint, we get:

$$\lambda \cdot f^{-1}(Y(p)) \cdot p = f^{-1}(Y) \cdot \Lambda p \ \Rightarrow \ \lambda = \Lambda \ \Rightarrow \ \frac{X(p)}{f^{-1}(Y(p))} = \Lambda \ \Rightarrow \ \frac{X(p)}{f^{-1}(y(p))} \ge \Lambda$$

which shows that supply-side buffer $\Lambda$ can be achieved.
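The last step of the proof can be traced with a small numeric sketch in Python. Everything here is a hypothetical illustration: the resource vector `required_for_Y` stands for $f^{-1}(Y)$, the price vectors are invented, the helper name `supply_scale` is not from the paper, and a single aggregate demand buffer `delta` with $Y = \delta \cdot y$ (Assumption 2) and constant returns to scale are assumed.

```python
def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def supply_scale(required, pay_price, shown_price):
    """Solve lam * (required . pay_price) = required . shown_price for lam.

    This is flow constraint (28) with X = lam * f^{-1}(Y) plugged in.
    """
    return dot(required, shown_price) / dot(required, pay_price)


required_for_Y = [150.0, 14.5, 20.0]  # hypothetical f^{-1}(Y) for (S, U, D)
p = [1.0, 3.0, 2.0]                   # hypothetical supplier prices
Lambda = 2.0                          # buffer size the operator wants
shown = [Lambda * v for v in p]       # inflated prices shown to consumers

lam = supply_scale(required_for_Y, p, shown)  # equals Lambda up to rounding

# With Y = delta * y and CRTS, f^{-1}(y) = f^{-1}(Y) / delta, so the buffer
# relative to current demand is lam * delta, which is at least Lambda.
delta = 1.25
buffer_vs_current = lam * delta
```

The sketch only checks the arithmetic of the flow constraint; it says nothing about whether users would actually respond to the displayed prices in this way.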
At first sight this result might seem like the perfect solution to the limited controllability of the buffer equilibrium. Unfortunately, it is not. There is a good reason for charging the consumers as much as we pay the suppliers, namely that in the UI we can display to the users the true costs of the services they are consuming. Effectively, the UI shows the users some kind of contract: if you want to consume these services, then you need to supply this many resources to pay for them. If we artificially inflated prices, we would display an incorrect contract, and in some sense lie to our users. Some users might actually figure this out over time and then try to exploit it. Furthermore, this technique could actually decrease the efficiency of the system. Some users might consume fewer services than before because they cannot afford to give up as much supply as the UI says they would need to. In the worst case, users might decide to leave the system completely when the perceived costs of using it seem too high. Thus, it remains to be studied how new user interfaces can be designed that give the market operator more control over the supply-side buffer while avoiding the problems we mentioned.

6. THE PRICE UPDATE ALGORITHM

In this section we devise a price update algorithm that is invoked regularly on the server (e.g., once a day), with the goal of moving prices towards the buffer equilibrium over time. Our algorithm is modeled on the tâtonnement process as defined by Walras [15]. However, Walras' algorithm only allowed trades at equilibrium prices. In our system, we must allow trades at all times, even out of equilibrium.

6.1 The Algorithm

Because users' preferences are homogeneous of degree zero, collinear price vectors are equivalent.
Thus, instead of searching for the equilibrium price vector in $\mathbb{R}^3$, we can simplify the task by looking at projective space $\mathbb{RP}^2$:

$$\mathbb{RP}^2 := \left\{ (p_S, p_U, p_D) \in \mathbb{R}^3 \setminus \{0\} \right\} / \sim, \quad \text{with } (p_S, p_U, p_D) \sim \lambda (p_S, p_U, p_D) \ \ \forall \lambda \in \mathbb{R}^+$$

Thus, we can fix the price of an arbitrary good (the numeraire) and normalize the price vector accordingly. Here, we normalize the price of storage space to 1:

$$p = (p_S, p_U, p_D) \sim \left( 1, \ \frac{p_U}{p_S}, \ \frac{p_D}{p_S} \right)$$

In Section 5.3, we have reduced the problem of finding the buffer equilibrium to finding the root of the function $g(p) = (g_U(p), g_D(p))$ where

$$g_U(p) = \frac{X_U}{X_S} - \frac{f_U^{-1}(Y)}{f_S^{-1}(Y)} \quad \text{and} \quad g_D(p) = \frac{X_D}{X_S} - \frac{f_D^{-1}(Y)}{f_S^{-1}(Y)}$$

This formulation of the buffer equilibrium is also useful for the price update algorithm, because finding the root of a function is a well-understood mathematical problem. Newton's method is probably the best-known root-finding algorithm, and it also converges very quickly in practice. However, it requires the evaluation of the function's derivative at each step. Unfortunately, we don't know the function $g(\cdot)$ and thus cannot compute its derivative. Instead, we only get to know individual points in each iteration and can use these points to estimate the derivative. This is exactly what the secant method does for a one-dimensional function. The problem is that $g(p)$ is 2-dimensional, and thus the secant method is not directly applicable. The appropriate multi-dimensional generalization is Broyden's method [4], a quasi-Newton method. Unfortunately, that method requires knowledge of the Jacobian, which we don't know and cannot even measure approximately. However, we show that one can use an approximation to the diagonal sub-matrix of the Jacobian instead of the full Jacobian matrix. The diagonal sub-matrix of the Jacobian can be approximated by studying changes in the function $g(p)$. This leads to the following quasi-Newton method for multiple dimensions: