Applied Microeconomics

Size: px
Start display at page:

Download "Applied Microeconomics"

Transcription

1 Applied Microeconomics Munich March 2018 Learning from Failures: Optimal Contract for Experimentation and Production Fahad Khalil Jacques Lawarrée and Alexander Rodivilov

2 Learning from Failures: Optimal Contract for Experimentation and Production * Fahad Khalil Jacques Lawarrée Alexander Rodivilov ** January Abstract: Before embarking on a project a principal must often rely on an agent to learn about its profitability These situations are conveniently modeled as two-armed bandit problems highlighting a trade-off between learning (experimentation) and production (exploitation) We derive the optimal contract for both experimentation and production when the agent has private information about his efficiency in experimentation Private information in the experimentation stage can generate asymmetric information between the principal and agent about the expected profitability of production The degree of asymmetric information is endogenously determined by the length of the experimentation stage An optimal contract uses the timing of payments the length of experimentation and the output to screen the agents Asymmetric learning by agents with different efficiency imply that both upward and downward incentive constraints can be binding and that agents are rewarded for early success when efficient and for late success when inefficient Rewarding failure can be optimal to screen agents if the length of the experimentation period is short This result is robust to the introduction of ex post moral hazard We also show that over-experimentation and over-production can be optimal to screen the agent Keywords: Information gathering optimal contracts strategic experimentation * We are thankful for the helpful comments of Nageeb Ali Renato Gomes Navin Kartik Marina Halac David Martimort Larry Samuelson and Dilip Mookherjee ** khalil@uwedu lawarree@uwedu Department of Economics University of Washington; arodivil@stevensedu School of Business Stevens Institute of Technology

3 1 Introduction Before embarking on a project it is important to learn about its profitability to determine how much resource to allocate to the project For example the optimal investment needed to extract oil from new oil fields depends on what is learned from seismic surveys and exploration drills Another example is a standard procurement problem where a government agency (principal) finds it important to ask a supplier (agent) to acquire planning information to figure out the optimal scale of the project 1 However such exploration not only is costly and delays production it can also create asymmetric information about the profitability of the project between the agent and the principal This can lead to information rent for the agent which also affects the optimal production decision The principal must account for the interdependence between the optimal output and the length of the exploration stage when designing an incentive scheme for the agent To see how the exploration or experimentation process itself can create asymmetric information about the profitability of the project consider the case where the exploration is performed by an agent who privately knows his efficiency in experimentation Exploration drills will demonstrate the profitability of the oil field If the agent is not efficient at exploration drills a poor result from the exploration only provides weak evidence of low profitability of the well However if the owner (principal) is misled into believing that the agent is highly efficient she becomes more pessimistic than the agent A trade-off appears for the principal More experiments may provide better information about the profitability of the project but can also increase asymmetric information about its expected profitability which leads to information rent for the agent in the production stage In this paper we derive the optimal contract for an agent who conducts both experimentation and production The essential features of the problem can be conveniently represented as a trade-off between experimentation and production (exploitation) analogous to a two-armed bandit problem 2 At the outset the principal and agent are symmetrically informed 1 See for example Krähmer and Strausz (2011) and references therein Other applications are the testing of new drugs the adoption of new technologies or products the identification of new investment opportunities the evaluation of the state of the economy consumer search etc 2 See Bolton and Harris (1999) Keller Rady and Cripps (2005) or Bergemann and Välimäki (2008) 1

4 that production cost can be high or low Success in the experimentation stage is assumed to take the form of finding ie the agent finds out that production cost is low 3 Thus if experimentation continues without success the expected cost increases and both principal and agent become pessimistic about project profitability efficiency (the probability of success in any given period when cost is actually low) is privately known to the agent After a series of failed experiments a lying inefficient agent will have a lower expected cost of production compared to the principal This difference in expected cost implies that that the principal (mistakenly believing the agent is efficient) will overcompensate him in the production stage An inefficient agent must be paid a rent to prevent him from overstating his efficiency A key contribution of our model is to study how the asymmetric information generated in the experimentation stage impacts production and how production decisions affect experimentation 4 At the end of the experimentation stage there is a non-trivial decision regarding the scale of output which depends on what is learned during experimentation Relative to the nascent literature on incentives for experimentation reviewed below the novelty of our approach is to study optimal contracts for both experimentation and production While the literature has equated project implementation with success in experimentation we highlight the impact of learning from failures in the optimal scale of production As we will see much can be learned even when experimentation fails and this information would be lost in a model without a production stage The asymmetric information created by experimentation impacts the optimal contract In particular it affects (i) the length of the experimentation period (ii) the information rent for the agent and the timing of its payment and (iii) the output We discuss them in turn (i) First we show that the optimal length of the experimentation period could be longer than the first-best length whereas under experimentation relative to the first best is the standard result in models of experimentation Under asymmetric information the difference in expected 3 We present our main insights by assuming that success is publicly observed but show that our key results hold even if the agent could hide success We also show our key insights hold in the case of success being bad news 4 Intertemporal contractual externality across agency problems also plays an important role in Arve and Martimort (2016) 2

5 costs between types the driving force for the rent determines the distortion in the length of the experimentation stage Interestingly we find that this difference in expected costs is nonmonotonic in time because the agents have different efficiency in experimenting Thus the principal may actually benefit from increasing the length of the experimentation stage since it may lower the difference in expected costs Using time for screening the types turns out to be complex as it balances two countervailing forces: a more efficient experimenter has a greater chance to learn good news but also becomes pessimistic more quickly after successive failures Another consequence is that the optimal length of the experimentation phase varies for each type (ii) Second it is possible that the efficient agent also gets a rent He faces a gamble when misreporting his type as inefficient On the one hand he can hope to collect the rent offered to the inefficient type On the other hand if he fails in experimentation he will be undercompensated at the production stage as he will be relatively more pessimistic compared to the principal These two elements are both related to the difference in the expected costs after failure When the efficient type is much more likely to succeed than the inefficient type the adverse selection problem is relatively severe and the positive part can dominate to make this gamble positive 5 The timing of payment is used to minimize the value of this gamble to the efficient type and limit the rents paid to both types To illustrate this incentive suppose the principal rewards early success for the inefficient agent Such a scheme may look attractive for the efficient agent because he is more likely to succeed early in the experimentation stage But an alternative scheme such as rewarding failure by the inefficient agent may also become attractive for the efficient agent during a long experimentation stage because successive failures convince the efficient agent that the project is high cost and experimentation is likely to fail As indicated by the above the relative likelihoods of success and failure play a crucial role in determining the optimal timing of the payments We find that rent to the efficient agent will be paid for early 5 Indeed we show that if the principal were restricted to offering an identical length of the experimentation stage to both types the difference in expected cost is identical for each type and the high- is not binding Thus it efficient type 3

6 success since he is more likely to succeed early If the principal wants to reward an inefficient agent for success it has to be late in the experimentation stage Remarkably we also show that it may be optimal for the principal to reward an agent after failure When the optimal length of the experimentation period is short the relative probability of failure for an inefficient agent is greater than his relative probability of success Thus rewarding the inefficient agent after failure becomes a useful tool to screen the types 6 Our result suggests that the principal is more likely to reward failures in industries where cost of an experiment is relatively high; for example this is the case in oil drilling In contrast if the cost of experimentation is low (like on-line advertising) the principal will rely on rewarding the agent after success One may wonder if this result is sensitive to the absence of moral hazard in our model To show that it does not we consider the case where the agent can hide success The agent must then be rewarded in every period to disclose success Thus we obtain an intuitive payment scheme where the agent collects a moral-hazard rent in every period but the screening incentive to delay reward for success or even reward failure remains intact (iii) Third unless experimentation succeeds the principal will use the choice of output to screen the types during the production stage Since the inefficient agent always gets a rent we expect and indeed find that the output of the efficient agent is distorted downward as in a standard static second best contract However when the efficient agent also commands a rent the output of the inefficient agent is distorted upward A higher output for the inefficient agent makes it more costly for the efficient agent to lie since a lying efficient agent being relatively more pessimistic after failure will be under-compensated in the production stage Our paper builds on two strands of the literature First it is related to the literature on principal-agent contracts with endogenous information gathering before production 7 It is typical in this literature to consider static models where an agent exerts effort that increases the precision of a signal relevant to production By modeling this effort as experimentation we introduce a dynamic learning aspect and especially the possibility of asymmetric learning by 6 In Manso (2011) rewarding failure solves a moral hazard problem 7 Early papers are Cremer and Khalil (1992) Lewis and Sappington (1997) and Cremer Khalil and Rochet (1998) while Krähmer and Strausz (2011) contains recent citations 4

7 different agents We contribute to this literature by characterizing the structure of incentive schemes in a dynamic learning stage Importantly in our model the principal can determine the degree of asymmetric information by choosing the length of the experimentation stage and over or under-experimentation can be optimal To model information gathering we rely on the growing literature on contracting for experimentation following Bergmann and Hege ( ) Most of that literature has a different focus and characterizes incentive schemes for addressing moral hazard during experimentation but does not consider adverse selection 8 Recent exceptions that introduce adverse selection are Gomes Gottlieb and Maestri (2016) and Halac Kartik and Liu (2016) 9 In Gomes Gottlieb and Maestri there is two-dimensional hidden information where the agent is privately informed about the quality (prior probability) of the project as well as a private cost of effort for experimentation They find conditions under which the second hidden information problem can be ignored Halac Kartik and Liu (2016) have both moral hazard and hidden information They extend the moral hazard-based literature by introducing hidden information about expertise in the experimentation stage to study how asymmetric learning by the efficient and inefficient agents affects the bonus that needs to be paid to induce the agent to work 10 We add to the literature by showing that asymmetric information created during experimentation affects production which in turn introduces novel aspects to the incentive scheme for experimentation Unlike the rest of the literature we find that over-experimentation relative to the first best and rewarding an agent after failure can be optimal to screen the agent The rest of the paper is organized as follows In section 2 we present the base goodnews model under adverse selection and public success In section 3 we consider extensions and robustness checks In particular we study ex post moral hazard where the agent can hide success and the case where success is bad news 8 See also Horner and Samuelson (2013) 9 See also Gerardi and Maestri (2012) for another model where the agent is privately informed about the quality (prior probability) of the project 10 They show that without the moral hazard constraint the first best can be reached In our model we impose a limited liability instead of a moral hazard constraint 5

8 2 The Model (Learning good news) A principal hires an agent to implement a project of a variable size Both the principal and agent are risk neutral and have a common discount factor It is common knowledge that the marginal cost can be low or high ie with The probability that is denoted by Before the actual production stage (exploitation) the agent can gather information regarding the production cost We call this the experimentation stage The experimentation stage During the experimentation stage the agent gathers information about the cost of the project The experimentation stage takes place over time where is the maximum length of the experimentation stage and is determined by the principal In each period experimentation costs and we assume that this cost is paid by the principal at the end of each period We assume that it is always optimal to experiment at least once In the main part of the paper information gathering takes the form of looking for good news (see section 32 for the case of bad news) If the cost is actually low the agent learns it with probability in each period If the agent learns that the cost is low (good news) we will say that the experimentation was successful To focus on the screening features of the optimal contract we assume for now that the agent cannot hide evidence of the cost being low In section 31 we will revisit this assumption and study a model with both adverse selection and ex post moral hazard We assume that the agent is privately informed about his experimentation efficiency represented by Therefore the principal faces an adverse selection problem As we will see next this implies that the principal and agent may update their beliefs differently during the experimentation stage tion about his efficiency determines his type and we will refer to an agent with high or low efficiency as a high or low-type agent With probability the agent is a high type With probability he is a low type Thus we define the learning parameter with the type superscript: ) 6

9 where as both and approach amount of failures Figure 1 Expected cost with Third we also note the important property that the difference in the expected cost is a non-monotonic function of time If the first failure would be a perfect signal regarding the project quality 12 There exists a unique time period such that achieves the highest value at this time period where 7

10 The production stage After the experimentation stage ends production takes place The principal value of the project is where is the size of the project The function is strictly increasing strictly concave twice differentiable on and satisfies the Inada conditions The size of the project and the payment to the agent are determined in the contract offered by the principal before the experimentation stage takes place If experimentation reveals that cost is low in a period experimentation stops and production takes place based on 13 If experimentation fails ie there is no success during the experimentation stage production occurs based on the expected cost in period 14 The contract Before the experimentation stage takes place the principal offers the agent a menu of dynamic contracts Relying on the revelation principle we use a direct truthful mechanism where the agent is asked to announce his type denoted by A contract specifies for each type of agent the length of the experimentation stage the size of the project and a transfer as a function of whether or not the agent succeeded while experimenting In terms of notation in the case of success we include as an argument in the wage and output for each In the case of failure we include the expected cost 15 A contract is defined formally by where is the maximum duration of the experimentation stage for the announced type and are the wage and the output produced if he observed and the output produced if consecutive times An agent of type announcing his type as receives expected utility at time zero from a contract : 13 In this model there is no reason for the principal to continue to experiment once she learns that cost is low 14 We assume that the agent will learn the exact cost later but it is not contractible 15 Since the principal pays for the experimentation cost the agent is not paid if he does not succeed in any 8

11 Conditional on the actual cost being low which happens with probability the probability of succeeding for the first time in period is given by Experimentation fails which happens with probability or if the agent fails times despite which happens with probability The optimal contract will have to satisfy the following incentive compatibility constraints for all and : We also assume that the agent must be paid his expected production costs whether experimentation succeeds or fails Therefore the individual rationality constraints have to be satisfied ex post (ie after experimentation): 16 for where the and are to denote success and failure type is offered to the agent of where the cost of experimentation is 16 We discuss later the case of ex ante participation constraints On the importance of ex post participation constraints in sequential screening see the recent paper by Krähmer and Strausz (2015) 9

12 To summarize the timing is as follows: (1) The agent learns his type (2) The principal offers a contract to the agent In case the agent rejects the contract the game is over and both parties get payoffs normalized to zero; if the agent accepts the contract the game proceeds to the experimentation stage with duration as specified in the contract (3) The experimentation stage begins (4) If the agent learns that the experimentation stage stops and the production stage starts with output and transfers as specified in the contract In case no success is observed during the experimentation stage the production stage starts with output and transfers as specified in the contract 21 The First Best Benchmark is common knowledge before the principal offers the contract agent covers the cost in case of success and the expected cost in case of failure the wage to the for 10

13 Note that the first-best termination date of the experimentation stage is a nonmonotonic This non-monotonicity is a result of two countervailing forces In any given period of the experimentation stage the high type is more likely to learn conditional on the actual cost being low This suggests that the principal should allow the high type to experiment longer However at the same time the high type agent becomes relatively more pessimistic with repeated failures This can be seen by looking at the probability of success conditional on reaching period given by over time In Figure 2 we see that this conditional probability of success for the high type becomes smaller than that for the low type at some point Given these two countervailing forces the first-best stopping time for the high type agent can be shorter or longer than that of the type agent depending on the parameters of the problem 17 amount of failures Figure 2 Probability of success with 17 For example if and then the firstbest termination date for the high type agent is whereas it is optimal to allow the low type agent to experiment for seven periods However if we now change to and to the low type agent is allowed to experiment less that is 11

14 22 Asymmetric information Assume now that the agent privately knows this type To understand the role of beliefs in generating rent in the production stage we start with a benchmark without an experimentation stage but with asymmetric information about expected cost of production The principal can only screen the agents with the output and payments We obtain a standard second best contract where the hidden parameter is the expected marginal cost (eg Baron-Myerson (1982) Laffont- Tirole (1986)) Suppose in this case a type belief is denoted by which implies that the expected cost at the production stage is Suppose that the high-type is more pessimistic than the low type about the cost being low: This implies that now is: It can be easily shown that the optimal contract resembles a standard second-best contract with adverse selection In particular and constraints are binding the low type gets a positive informational rent and produces the first-best output: The high type gets zero rent and his output is distorted as follows: As in a standard adverse selection model We now return to our main case where an experimentation stage precedes production Recall that asymmetric information arises in our setting because the two types learn asymmetrically in the experimentation stage and not because there is any inherent difference in their ability to implement the project Furthermore private information can exist only if experimentation fails If an agent experiences success before the termination date the true cost is revealed 12

15 We now introduce some notation for ex post rent of the agent which is the rent in the production stage Define by the wage net of cost to the type who succeeds in period and by the wage net of the expected cost to the type who failed during the entire experimentation stage: for Therefore the ex post constraints can be written as: for where the and are to denote success and failure Note that if the principal only had to satisfy an ex ante participation constraint she could use the fact that the high type is relatively more likely to succeed (conditional on ) to screen the agent without distorting the duration of the experimentation stage In other words since success during the experimentation stage is a random event that is correlated -known ideas from mechanisms à la Crémer- McLean (1985) that says the principal can still receive the first best profit 18 To simplify the notation we denote of type does not succeed during the periods of the experimentation stage: Using this notation we can rewrite the two incentive constraints as: 18 To implement the first best the principal has to counter the incentive of the low type to pretend to be a high type Relative to the first best payments the principal can change the payments to the high type only She can increase the payment in case of success and lower it otherwise while keeping the high type at zero expected rent his level of utility in the first best contract This payment scheme will lower the rent of the low type who is less likely to succeed in the experimentation stage By choosing the appropriate transfers the principal can obtain the first best level of profit While this scheme ensures a zero expected rent for each type it also implies a large positive ex post rent in case of success and a large negative ex post rent in case of failure When such schemes are allowed the first best can be reached See Theorem 1 in Halac Kartik and Liu (2016) for a formal proof in a case without production and a fixed up-front payment 13

16 In this problem it is the low type who has an incentive to claim to be a high type and is binding: since a high type must be given his expected cost following failure a low type will have to be given a rent to truthfully report his type as his expected cost is lower that is We will see that it is also possible that the becomes binding making both binding simultaneously Using our notation the principal maximizes the following objective function subject to and The optimal contract is derived formally in Appendix A and the key results are presented in Propositions 1 2 and 3 The principal has three tools to screen the agent: the timing of the payments the length of the experimentation period and the output for each type We examine each of them below 221 The timing of the payments: rewarding early/late success or failure Recall that the low type receives a strictly positive rent and is binding 19 The principal has to determine when to pay this rent to the low type and we show here that the timing of payment can lead to both types of agents getting a rent For instance if the principal pays the rent to the low type after an early success the high type may be interested in claiming to be low type to collect the rent Indeed the high type is more likely to succeed early given that the project is low cost 19 We begin Appendix A by proving this result in a Claim 14

17 However misreporting his type is risky for the high type If he fails to find good news the principal believing that he is a low type will require the agent to produce based on a lower expected cost Thus misreporting his type is a gamble for the high type: he has a chance to obtain the lowduring the experimentation stage The principal We will next characterize the optimal timing of payments which depends on the stochastic structure of the dynamic problem and its impact on the highmisreports There are two cases depending on whether only or both constraints are binding Proposition 1 There are two cases to consider Case A: type here is no restriction on when to reward the low ) the principal must reward the low type after failure and the high type for early success (in the very first period) If the cost of experimentation is small the principal must reward the low type after late success (last period) and the high-type for early success (in the very first period) Proof: See Appendix A If is not binding we show in Case A of Appendix A that the principal can use any combination of and to satisfy the binding : there is no restriction on when and how the principal pays the rent to the low type as long as failure or both Therefore the principal can reward either success 15

18 If is binding the high type has an incentive to claim to be a low type and we are in Case B This is the more interesting case and we focus on its characterization in this section We first simplify the analysis by showing in Appendix A that if the principal rewards success it will be in at most one period 20 This is due to the strictly monotonic relative likelihood of success of the two types and it greatly simplifies the analysis since for at most one period We denote by the rent to the low type ie the of the : The principal can pay by rewarding success in some period ( ) or rewarding failure ( ) If she rewards success in some period such that then and for and If she rewards failure then and for all where compare the relative incentive effects on the high type of rewarding the low type after success or after failure If the principal rewards the low type only after success in some period with of the constraint) can be written using (1) as: from misreporting (ie the of the constraint) can be written using (2) as: 20 See Lemmas 2 and B22 in Appendix A 16

19 Comparing these two expressions we can study the nature of the gamble for the high type when he misreports and derive the optimal payment schemes The choice to reward the low type after success or failure depends on which scheme yields a lower of The first term in each expression is the expected gain from misreporting and the second is the possibility of incurring a loss if experimentation fails The second term is identical in the two expressions so we focus on the first term the expected gain from misreporting If the low type is rewarded for success in period the relative probability of success of a high type in period is: If the low type is rewarded only after failure the relative probability of failure is: The relative probability of success for a high type decreases with while the relative probability of failure for a high type is constant given We show in Appendix A that there is a period such that the of under success or failure equal each other 21 We can now formally define this period as when the two relative probabilities are equal to each other: This critical value determines which type is relatively more likely to succeed or fail during the experimentation stage In any period the high type who chooses the contract designed for the low type is relatively more likely to succeed than fail compared to the low type For the opposite is true This feature plays an important role in structuring the optimal contract The critical value determines whether the principal will choose to reward success or failure in the optimal contract 21 See Lemma 1 in Appendix A for the proof 17

20 If the principal wants to reward the low type after success it will only be optimal if the experimentation stage lasts long enough If the principal can reward success in the last period as the relative probability of success is declining over time and the rent is the smallest in the last period If the experimentation stage is short the principal will pay the rent to the low type by rewarding failure since the high type is relatively more likely to succeed during the experimentation stage amount of failures Figure 3 Relative probability of success/failure with The important question is therefore what the optimal value of is and consequently whether the principal should reward success or failure The optimal value of is inversely related to the cost of experimentation In Appendix A we prove in Lemma 6 that there exists a unique value of such that for any Therefore when the cost of experimentation is high ( the length of experimentation will be short and it will be optimal for the principal to reward the low type after failure Intuitively failure is a better instrument to screen out the high type when experimentation cost is high So it is the adverse selection concern that makes it optimal to reward failure 18

21 Finally if the high type also gets positive rent we show in Appendix A that the principal will reward him for success in the first period only This is the period when success is most likely to come from a high type than a low type In Section 224 below we discuss when both s bind Intuitively this case emerges when the adverse selection problem is severe enough ( is significantly higher than ) 222 The length of the experimentation period: over- or under-experimentation While the standard result in the experimentation literature is under-experimentation in models with asymmetric information we find that over-experimentation can also occur We already know from the first best contract that the length of the experimentation period is nonmonotonic in types So in general the low type or high type may experiment longer This result extends to the second best under asymmetric information and can be larger or smaller than In particular we find that each type may over-experiment compared to the first best The reason is that the difference in the expected cost is a non-monotonic function of time (see Figure 1) and extending the length of experimentation can help reduce information rent We explain this next Proposition 2 In the optimal contract each type may over-experiment relative to the first best Proof: See Appendix A The driving factor for the rent and therefore the length of the experimentation stage is the difference in expected costs between the types after experimentation fails For instance consider the low type agent When he lies about his type his expected cost after failure is differen his informational rent is positively related to the difference in the expected cost between types Since this difference is non-monotonic in time when the first-best experimentation stage is long enough (beyond the maximum of ) 19

22 the principal may find it optimal to increase the experimentation period to lower the rent Otherwise under-experimentation is optimal 22 In Appendix E we provide sufficient conditions for under/over experimentation to occur in a simplified version of our model 23 The main condition for over-experimentation is that the high type is very efficient at experimentation (ie high) Figure 4 plots for different values of and shows that the benefit from over experimentation through a reduction in is higher for larger By increasing the experimentation period the principal can lower the rent In Figure 4 we use and with the dashed line) Suppose now we increase only the parameter the principal optimally chooses then (see amount of failures Figure 4 Difference in the expected cost with 22 Here is an example showing both under and over-experimentation can occur in the optimal contract: The first-best termination dates are and In the second-best case the principal optimally chooses which is over-experimentation Consider another example for under-experimentation: The first-best termination dates are and In the second-best case the principal optimally chooses which is underexperimentation 23 We consider simplified version of our model where the output after failure is fixed ie 20

23 223 The output: under- or over- production Proposition 3 After failure the high type under-produces relative to the first best output The low type overproduces if the high type receives a rent and produces at the first best level otherwise After success each type produces at the first best level Proof: See Appendix A The principal can use the choice of output to screen the types and limit the rent to both types We derive the formal output scheme in Appendix A but present the intuition here If experimentation was successful there is no asymmetric information and no reason to distort the output Both types produce the first best output If experimentation failed to reveal the cost asymmetric information will induce the principal to distort the output to limit the rent This is a familiar result in contract theory In a standard second best contract à la Baron-Myerson the type who receives rent produces the first best level of output while the type with no rent underproduces relative to the first best type produces the first best output while the high type under-produces relative to the first best To limit the rent of the low type the high type is asked to produce a lower output When both incentive constraints bind to limit the rent of the high type the principal will increase the output of the low type and require over-production relative to the first best To understand the intuition behind this result recall that the rent of the high type mimicking the low type has two components The first component is the rent promised to the low type after failure in the experimentation stage The second component is negative and comes from the higher expected cost of producing the output required from the low type By making this output higher the principal can strengthen the negative component and lower the rent of the high type We now present the intuition when both 224 When are both IC constraints binding? constraints can be binding simultaneously and and on the key sources of 21

24 rent the difference in expected costs and As we explained before the high expected utility from misreporting is a gamble The positive part comes from the low- and the negative part is due to under-compensation after failure which is related to While the rents depend on the probabilities of success and failure as well as the optimal quantities the impact of and on the difference in expected costs and is very informative We illustrate the impact of increasing Consider and the principal optimally chooses and grants rent only to the low type In this example is of the similar magnitude as and as a result the gamble is not attractive for the high type (see the dashed line in Figure 5 below) Suppose now we increase only the parameter We see that the curve is skewed to the left and its maximum value increases The principal optimally chooses and grants rent to both types Intuitively the gamble is positive for the high type since is significantly smaller than (see the plain line in Figure 5 below) amount of failures Figure 5 Difference in the expected cost with 22

25 To emphasize the role of and in determining and consider a case where the principal must choose an identical length of the experimentation stage for both types ( ) 24 We prove in Proposition 4 below that in this scenario the constraint is not binding Intuitively magnitudes of and will be the same and the principal will always use the screening instruments to make the gamble strictly negative for the high type 25 Based on Proposition 4 we conclude lengths of the experimentation stage that results in being binding in Proposition 1 Proposition 4 If the duration of the experimentation stage must be chosen independently of the announced type the high type gets no rent Proof: See Appendix B In Appendix E we provide sufficient conditions for both version of our model where the output after failure is fixed ie main condition is for the high type to be very efficient at experimentation (ie s to bind in a simplified high) 26 The 3 Extensions 31 Success might be hidden: ex post moral hazard A notable result from the main section was that the principal may want to reward failure or wait until later periods to reward success The implementation of schemes with such properties relies on our assumption that the outcome of experiments in each period is publicly 24 For example the FDA requires all the firms to go through the same amount of trials before they are allowed to release new drugs on the market 25 If is the same for both types the difference in the expected cost are the same and the gamble is determined by the difference in optimal quantities We show in Appendix B that we have As a result the high type receives a strictly negative utility if he mimics a low type 26 The restriction to a fixed quantity allows us to highlight the role of the difference in expected cost In general an increase in the length of the experimentation stage affects both the difference in the expected cost as well as the optimal quantity in a complex manner For example if an increase in the principal may also want to reduce the distortion in but the outcome is also affected by the increase in 23

26 observable If the agent were able to suppress a finding of success he would gain by hiding success or postponing the revelation of success In this subsection we allow the agent to engage in ex post moral hazard by hiding success when it occurs 27 We obtain a more realistic incentive scheme where the agent is paid a rent in each period but our key insights regarding the screening properties of the optimal contract remain intact The dynamic optimization problem for the principal when success is privately observed by the agent becomes more complex as the principal has to deal with both adverse selection and ex post moral hazard problems simultaneously We find that the agent will obtain a morald to the low-type to screen the high type agent remains intact In particular we show that our finding of rewarding the agent after failure does not depend on success being observed publicly Specifically we assume that success is privately observed by the agent and that an agent who finds success in some period can choose to announce or reveal it at any period Thus we assume that success generates hard information that can be presented to the principal when desired but it cannot be fabricated only by the payment and the output tied to success/failure in the particular period but also by the payment and output in all subsequent periods of the experimentation stage while the agent knows the true cost is In addition to the existing and constraints the optimal scheme must now satisfy the following new ex post moral hazard constraints: and for The constraint makes it unprofitable for the agent to hide success in the last period The constraint makes it unprofitable to postpone revealing success in prior periods The two together imply that the agent cannot gain by postponing or hiding success the ex post moral hazard constraints 27 See eg Rosenberg Salomon and Vieille (2013) 24

27 in addition to all the constraints presented before First as formally shown in the Appendix C both and may be slack and either or both may be binding 28 Since the ex post moral hazard constraints imply that both types will receive rent these rents may be sufficient to satisfy the constraints Second private observation of success increases the cost of paying a reward after failure When the principal rewards failure with the constraint forces her to also reward success in the last period ( because of ) and in all previous periods ( because of ) However we show below that it can still be optimal to reward failure Proposition 5 When success can be hidden the principal must reward success in every period for each type When both the and constraints bind and the optimal it is optimal to reward failure for the low type Proof: See Appendix C While details and the formal proof are in the Appendix C we now provide some intuition why rewarding failure or postponing rewards remain optimal even when the agent privately observes success We also provide an example below where this occurs in equilibrium The argument for postponing rewards to the low type to effectively screen the high type applies even when success is privately observed This is because the relative probability of success between types is not affected by the two ex post moral hazard constraints above An increase of $1 in causes an increase of $1 in which in turn causes an increase in all the previous according to the discount factor Therefore the increases in and are not driven by the relative probability of success between types And just as in Proposition 1 we again find that it is optimal to reward failure when the low type experiments for a relatively brief length of time and both and are binding For example when the principal optimally chooses and grants rent only to the low type by rewarding failure since 28 Unlike the case when success is public the may not always be binding 25

28 While we have focused on how the ex post moral hazard affects the benefit of rewarding failure it is clear that those constraints also affect the other optimal variables of the contract For instance the constraint ( ) can be relaxed by decreasing either (which will decrease ) or So we expect a shorter experimentation stage and a lower output when success can he hidden 32 Learning bad news In this section we show that our main results survive if the object of experimentation is to seek bad news where success in an experiment means discovery of high cost For instance stage 1 of a drug trial looks for bad news by testing the safety of the drug Following the literature on experimentation we call an event of observing although this is a bad news for the principal I principal and agent both become more optimistic if success is not achieved in a particular period and relatively more optimistic when the agent is a high type than a low type Also as time goes by without learning that the cost is high the expected cost becomes lower due to Bayesian updating and converges to In addition the difference in the expected cost is now negative since the type is relatively more optimistic after the same amount of failures Denoting by the updated belief of agent that the cost is actually high the type expected cost is then An agent of type announcing his type as receives expected utility at time zero from a contract but now is a function of incentive problem is again similar to that under learning good news However it is now the high type who has an incentive to claim to be a low type Given the same length of experimentation following failure the expected cost is higher for the low type Thus a high type now has an incentive to claim to be a low type: since a low type must be given his expected cost following failure a high type will have to be given a rent to truthfully report his type as his expected cost is lower that is The details of the optimization problem mirror the case for good 26

29 news of Propositions 1 2 and 3 and the results are similar We present results formally in Appendix D We find similar restrictions when both constraints bind as in Propositions and 4 The type of news however determines the optimal production and length of experimentation decisions Regarding the distortion in the output it is similar to Proposition 3 except that the distortions in output are switched for the two types The parallel between good news and bad news is remarkable but not difficult to explain In both cases the agent is looking for news The types determine how good the agent is at obtaining this news The contract gives incentives for each type of agent to reveal his type not the actual news 4 Conclusions In this paper we have studied the interaction between experimentation and production where the length of the experimentation stage determines the degree of asymmetric information at the production stage While there has been much recent attention on studying incentives for experimentation in two-armed bandit settings details of the optimal production decision are typically suppressed to focus on incentives for exploration In reality each stage may impact the other in interesting ways and our paper is a step towards studying this interaction There is also a significant literature on endogenous information gathering in contract theory but typically relying on static models of learning By modeling experimentation in a dynamic setting we have endogenized the degree of asymmetry of information in a principal agent model and also related it to the length of the learning stage By analyzing the stochastic structure of the dynamic problem we clarify how the principal can rely on the relative probabilities of success and failure of the two types in order to screen them The rent to a high type should come after early success and to the low type for late success If the experimentation stage is not long enough the principal has no recourse but to pay Without a production stage with a scalable project size there would be under experimentation relative to the first best With a scalable project size we find a new result that over-experimentation can be also optimal Over production can occur in the production stage 27

30 While our main section relies on publicly observed success we show that our key insights survive if the agent can hide success Then there is ex post moral hazard which implies that the agent is paid a rent in every period but the screening properties of the optimal contract remain intact Finally we prove that our key insights do hold in both good and bad-news models 28

31 Appendix A (Proof of Propositions 123) We first characterize the optimal payment structure and (Proposition 1) then the optimal length of experimentation and (Proposition 2) and finally the optimal outputs and (Proposition 3) Denote the expected surplus net of costs for then is to choose contracts and to maximize the expected net surplus minus rent of the agent subject to the respective and constraints given below: by The subject to: for for We begin to solve the problem by first proving the following claim Claim: The constraint is binding and the low type obtains a strictly positive rent Proof: If the constraint was not binding it would be possible to decrease the payment to the low type until ( and are binding but that would violate since QED 29

32 I Optimal payment structure and (Proof of Proposition 1) First we show that if the high type claims to be the low type the high type is relatively more likely to succeed if experimentation stage is smaller than a threshold level In terms of notation we define to trace difference in the likelihood ratios of failure and success for two types Lemma 1: There exists a unique such that and Proof: Note that is a ratio of the probability that the high type does not succeed to the probability that the low type does not succeed for periods At the same time is the probability that the agent of type succeeds at period of the experimentation stage and period by two types As a result we can rewrite for is a ratio of the probabilities of success at as or equivalently for where can be interpreted as a likelihood ratio We will say that when ( ) the high type is relatively more likely to fail (succeed) than the low type during the experimentation stage if he chooses a contract designed for the low type There exists a unique time period such that defined as 30