Randomized experiments


1 Majeure Economie September 2011

2 Outline: 1. Limits of OLS. 2. What is causality? (A new definition of causality; Can we measure causality?) 3. Solving the selection bias (Potential applications; Advantages & Limits). 4. Education policies (Access to education in developing countries; Quality of education in developing countries). 5. Market for credit (The demand for credit; Adverse Selection and Moral Hazard). 6. Conclusion.

3 Omitted variable bias (1/3) OLS does not measure the causal impact of X1 on Y if X1 is correlated with the residual. Assume the true DGP is: Y = α + β1 X1 + β2 X2 + ε, with Cov(X1, ε) = 0. If you regress Y on X1 alone, the OLS slope (in sample analogues) is β̂1 = Cov(Y, X1)/V(X1) = β1 + β2 Cov(X1, X2)/V(X1) + Cov(X1, ε)/V(X1). If β2 ≠ 0 and Cov(X1, X2) ≠ 0, the estimator is not consistent.
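The bias formula above can be checked numerically. Below is a minimal sketch (all parameter values are invented for illustration): simulate the true DGP with correlated regressors, regress Y on X1 alone, and compare the short-regression slope with the theoretical limit β1 + β2 Cov(X1, X2)/V(X1).

```python
import numpy as np

# Illustrative simulation of omitted variable bias (invented numbers).
rng = np.random.default_rng(0)
n = 100_000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)   # Cov(X1, X2) = 0.8 by construction
eps = rng.normal(size=n)             # Cov(X1, eps) = 0: exogenous residual
alpha, beta1, beta2 = 1.0, 2.0, 3.0
y = alpha + beta1 * x1 + beta2 * x2 + eps

# Short regression of Y on X1 only: slope = Cov(Y, X1) / V(X1)
b1_short = np.cov(y, x1)[0, 1] / np.var(x1)
# Theoretical limit: beta1 + beta2 * Cov(X1, X2) / V(X1)
b1_limit = beta1 + beta2 * np.cov(x1, x2)[0, 1] / np.var(x1)
print(b1_short, b1_limit)  # both near 3.46, far from beta1 = 2
```

With these numbers the short regression converges to about β1 + 3 × 0.8 / 1.64 ≈ 3.46 instead of 2, so the sign and magnitude of the bias follow directly from the formula.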

4 Omitted variable bias (2/3) In real life, this will happen all the time: you never have in a data set all the determinants of Y, and it is very likely that the remaining determinants of Y are correlated with those already included in the regression. Example: intelligence has an impact on wages and is correlated with education.

5 Omitted variable bias (3/3) Assume you have two groups of people: one with high levels of education, one with low levels of education. In the high-education group, you observe that people have higher wages than in the low-education group. You could attribute this difference in wages to the difference in levels of education if the only difference between the two groups were their levels of education. But the two groups probably differ on many more dimensions than education alone: the group with higher education is probably made up of people coming from wealthier families, with higher cognitive ability... Because of all those omitted variables (parents' wealth, IQ...), you cannot interpret this difference in wages as the causal impact of education on wages, but only as a mere correlation.

6 Is it so big an issue? (1/2) No, if your objective is objective 1: making the best prediction of some future outcome based on present information. In credit risk analysis, the objective is to make good predictions of who will default within one year, based on the information on customers available at the time of their application. Whether coefficients reflect causal relationships or mere correlations is not an issue as long as the predictive power of the model is good.

7 Is it so big an issue? (2/2) Yes, if your objective is objective 2: assessing the causal impact of one variable on another. Assume you are a policy maker trying to assess the efficacy of a program in order to decide whether to maintain it: a training program for the unemployed to help them find a job. I run the following regression: found a job = α + β · 1{followed training} + ε. If β̂ > 0, can I conclude that the training program is effective? This question is really crucial when evaluating the impact of a program, the efficacy of a new drug, the relevance of a new marketing campaign...

8 A new definition of causality Defining causality The causal impact of a program is the difference between what happens to recipients of the program and what would have happened to them had they not received it. The big problem of impact evaluation is that we do not observe what would have happened to beneficiaries of a program if they had not benefited from it. Running example: a training program offered to unemployed workers; among those unemployed, some chose to participate in the training program and others declined the offer. Objective: measure whether the program increases their chances of finding a job in less than 6 months.

9 A new definition of causality The Rubin framework of potential outcomes We consider a program (treatment) represented by a binary variable T. T_i equals 1 if individual i follows the program / receives the treatment (treated), and 0 otherwise (untreated). T = 1 if an unemployed person follows the training program; T = 1 if a sick person receives a given medicine. Each individual i has ex ante two potential outcomes, Y_{i,1} and Y_{i,0}. Y_{i,1} = what will happen to him if he receives the treatment; Y_{i,0} = what will happen to him without it. Each individual has two lives: his life when he follows the training program and his life when he does not (as in the movie Smoking / No Smoking). Example: Y_{i,1} = 1 and Y_{i,0} = 1 means the unemployed person finds a job in both lives; the program is useless for him, he would have found a job anyway. Definition: the causal impact of the treatment on individual i is Y_{i,1} - Y_{i,0}.

10 Can we measure causality? Do we observe causality? Can we compute Y_{i,1} - Y_{i,0} for any individual in our sample? The issue is that for each individual, we observe only one of his potential outcomes: we observe Y_{i,1} for the treated, but we do not observe their Y_{i,0}; conversely, for the untreated, we observe their Y_{i,0} but not their Y_{i,1}. => It is impossible to compute Y_{i,1} - Y_{i,0} for any individual in the sample, because one of the two figures is missing. We cannot assess the impact of the program on each individual. What we observe is Y_i = Y_{i,1} T_i + Y_{i,0} (1 - T_i).
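The bookkeeping of the Rubin framework can be sketched in a few lines of Python (synthetic data, purely illustrative): both potential outcomes exist ex ante, but a data set only ever contains Y_i = Y_{i,1} T_i + Y_{i,0}(1 - T_i).

```python
import numpy as np

# Toy potential-outcomes data (synthetic, for illustration only).
rng = np.random.default_rng(1)
n = 10
y0 = rng.integers(0, 2, size=n)  # outcome without treatment (found a job: 1/0)
y1 = rng.integers(0, 2, size=n)  # outcome with treatment
t = rng.integers(0, 2, size=n)   # treatment indicator

# The only outcome a real data set contains:
y_obs = y1 * t + y0 * (1 - t)

# The individual effect y1 - y0 is never recoverable from (y_obs, t):
# y0 is missing for the treated, y1 is missing for the untreated.
```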

11 Can we measure causality? Finding comparison groups Since there is no hope of measuring the impact of a program on any single individual, what we can hope to achieve is to measure the impact of the program on groups of individuals. Idea: we have a group of treated individuals; we are going to find a group of untreated individuals and use it as a comparison group. To measure the impact of the training program, we should find a group of unemployed people who did not benefit from the training program, and compare the share of people having found a job in less than 6 months in the treated group and in the comparison group. For this comparison to yield a credible measure of the impact of the training program, the treated group and the comparison group should be similar in every respect, except that the treated group benefited from the training program while the comparison group did not.

12 Can we measure causality? The average treatment effect Since we can never measure Y_{i,1} - Y_{i,0}, let us try something else: E(Y_{i,1} | T_i = 1) - E(Y_{i,0} | T_i = 1), the average effect of the treatment on the treated (ATT). We can easily estimate the first quantity from our sample: (1/N_1) Σ_{i: T_i = 1} Y_i, where N_1 is the number of treated. Example: the percentage of unemployed people who found a job after 6 months among those who followed the treatment. But we cannot estimate E(Y_{i,0} | T_i = 1): the percentage of the treated who would have found a job had they not received the treatment. Natural idea: replace it by E(Y_{i,0} | T_i = 0), which we can estimate from our data: the percentage of those who found a job among the unemployed who chose not to participate in the training program. Good idea?

13 Can we measure causality? The selection bias In most cases this is not a good idea. The underlying assumption is E(Y_{i,0} | T_i = 1) = E(Y_{i,0} | T_i = 0): what happened to the untreated is representative of what would have happened to the treated had they not been treated. But the two populations are very likely not to be similar: unemployed people who enroll in a training program might be more motivated to find a new job than those who do not. Enrollment into the program is selective: this is the selection bias. Since the two groups differ on more than one dimension (the treated group benefited from the program, but is also probably more motivated to find a job), it is impossible to know whether the difference in their placement rates after 6 months should be attributed to the fact that one group benefited from the program and not the other, or to the fact that one group was more motivated than the other.
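A small simulation makes the selection bias concrete (all probabilities are invented): motivation raises both the probability of enrolling and the probability of finding a job, so the naive treated-minus-untreated comparison is strongly positive even though the program has zero true effect here.

```python
import numpy as np

# Selection-bias sketch with made-up probabilities.
rng = np.random.default_rng(2)
n = 200_000
motivated = rng.random(n) < 0.5
t = rng.random(n) < np.where(motivated, 0.8, 0.2)    # motivated enroll more
job = rng.random(n) < np.where(motivated, 0.6, 0.3)  # program has NO effect

naive = job[t].mean() - job[~t].mean()
print(naive)  # around +0.18, although the true treatment effect is 0
```

The naive comparison picks up the difference in motivation between enrollees and non-enrollees, not any effect of the program.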

14 Solving the selection bias A solution to the selection bias (1/2) Assume that for this training program, more job seekers are interested in participating than there are seats. To evaluate the efficacy of the program, you can randomly choose who will follow the program and who will not. Randomization ensures that the treatment and comparison groups are comparable in every respect (age, proportion of men/women, qualifications, motivation, experience, cognitive abilities, etc.). When a population is randomly allocated into two groups, the two groups will have extremely similar characteristics, provided the population is sufficiently large. Assume that among the job seekers, some are not motivated to find a job and others are extremely motivated. When randomly selecting those who will receive the training, it is theoretically possible that we select only the extremely motivated into the treatment group and only the unmotivated into the comparison group, but this is extremely unlikely.

15 Solving the selection bias A solution to the selection bias (2/2) Indeed, when tossing a fair coin 5 000 times, the probability of getting heads on each draw is one half. Therefore, we expect to get heads in approximately half of the tosses, around 2 500 times. Getting a number of heads very far from the scenario we expect on average (2 500 heads) has a probability of almost 0. Actually, one can compute that when tossing a fair coin 5 000 times, there is a 95% probability of getting heads between 2 430 and 2 570 times. Returning to the training example, this means that when randomly selecting the job seekers who will actually receive intensive counseling, there is a 95% probability that the number of extremely motivated job seekers assigned to the intensive counseling group will be between 2 430 and 2 570, in which case the treatment group and the comparison group will be comparable.
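The coin-toss interval can be reproduced with a normal approximation to the Binomial(n, 1/2) distribution; n = 5 000 tosses is an assumption consistent with the 2 500 expected heads and the 2 430 to 2 570 interval quoted above.

```python
import math

# 95% interval for the number of heads in n fair coin tosses,
# using the normal approximation to the binomial distribution.
n, p = 5000, 0.5
mean = n * p                      # 2500 expected heads
sd = math.sqrt(n * p * (1 - p))   # about 35.36
lo = mean - 1.96 * sd
hi = mean + 1.96 * sd
print(round(lo), round(hi))  # 2431 2569, i.e. roughly 2430 to 2570
```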

16 Solving the selection bias Identification of the ATT If treated and untreated are randomly selected, then T_i ⊥ (Y_{i,1}, Y_{i,0}): treatment status is unrelated to potential outcomes. What happens to untreated individuals is representative of what would have happened to treated individuals had they not received the treatment. Randomization => the two populations were similar before receiving the treatment => if we observe a difference in the end, it is due to the treatment. For the sake of simplicity, assume Y_1 and Y_0 are discrete. Then E(Y_0 | T = 0) = Σ_i y_i P(Y_0 = y_i | T = 0) = Σ_i y_i P(Y_0 = y_i) = Σ_i y_i P(Y_0 = y_i | T = 1) = E(Y_0 | T = 1). Thus, ATT = E(Y_1 | T = 1) - E(Y_0 | T = 0). The ATT is identified because you can compute it from quantities observable in the sample.

17 Solving the selection bias Estimation The average effect of the training program can be estimated very simply, as the difference between the rate of treated unemployed who find a job and the same rate among the untreated: ÂTE = Ê(Y | T = 1) - Ê(Y | T = 0), the difference between the two sample means. If 55% of the treated had found a job after 6 months against only 50% of the untreated, then the estimated effect of the program is an increase in the placement rate of 5 percentage points. The only thing to check is that the difference between the two means is statistically significant.
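The estimation and significance check can be sketched with the slide's 55% and 50% placement rates; the group sizes of 1 000 are an assumption, not a figure from the slides. A standard two-proportion z-test then tells us whether the 5-point difference is statistically significant.

```python
import math

# Difference-in-means estimator with a two-proportion z-test.
n_t, n_c = 1000, 1000        # hypothetical group sizes (assumption)
p_t, p_c = 0.55, 0.50        # placement rates from the slide
ate_hat = p_t - p_c          # +5 percentage points

# Pooled standard error under H0 (no treatment effect)
p_pool = (n_t * p_t + n_c * p_c) / (n_t + n_c)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
z = ate_hat / se
print(round(ate_hat, 2), round(z, 2))  # 0.05 2.24: |z| > 1.96, significant at 5%
```

With smaller groups the same 5-point gap could easily be insignificant, which is why the check matters.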

18 Potential applications Medicine Medicine: every single pill you swallow has previously been tested in clinical trials: among patients who have the same disease, 500 are sorted out and receive the new pill and 500 receive a placebo. Double-blind: neither the patients nor the doctors know who receives what. After, for instance, one year, we compute the rate of patients cured in the two groups. If it is significantly higher in the test group, the pill is effective.

19 Potential applications Policy Evaluation Public policies: among 100 high schools, 50 are sorted out (test group) and 50 form the control group. In the 50 schools that belong to the test group, students are given financial incentives to attend class. In the end, we compare the absenteeism rates in the two groups of high schools. If the rate is significantly lower in the test group, the policy is effective.

20 Potential applications Marketing Marketing: can I improve my advertising or my product to increase my sales? Assume you have two types of advertising you send by mail to your customers and you want to know which one has the highest response rate. You randomly split a mailing list into two parts, send one type of advertising to the first part and the other type to the second part, and compare the response rates of the two adverts. If one rate is significantly higher than the other, that advert is better. You can do the same thing to measure by how much a change in your product increases the take-up rate, for instance giving new customers who buy your product a present. When surfing the web, you are constantly part of randomized experiments without even knowing it.

21 Potential applications Sales Assume you want to test the impact of a new pricing policy on your sales. Your sales team is made up of 200 people; you randomly select 100 of them, who offer the new prices to their customers. In the end, you compare sales in the control and treatment groups and thus measure the impact of the new prices on sales.

22 Advantages & Limits Advantages Avoid inefficient expenses: if training programs do not help the unemployed find jobs, stop training programs; if a medicine does not increase the percentage of patients cured with respect to a placebo, do not prescribe it; if giving presents to your customers does not increase the take-up rate, don't give them presents; if reducing your prices does not have a big impact on demand, keep your prices high. Test and learn: instead of generalizing policies whose effectiveness you do not know, test them first on a small sample.

23 Advantages & Limits Feasibility issues Not everything is testable with a randomized experiment: you cannot measure the impact of monetary policy on companies' investment through a randomized experiment, because it is impossible to set up a control group: a central bank can have only one interest rate. These experiments are costly: you must follow everyone in both the control and test groups over a long period of time. When the percentage of people lost to follow-up is too high, it can threaten the validity of your results.

24 Advantages & Limits The issue of attrition (1/2) Internats d'excellence project. Two groups of students. Outcome measure: comparison of the results of the two groups on standardized tests in maths and in French. One group is very easy to follow, the other much harder. => One key issue: in the end you might end up following 95% of the test group but only 80% of the control group. => You then compute average scores in the test and control groups only on the 95% and 80% of students you managed to follow. But are those two populations still comparable?

25 Advantages & Limits The issue of attrition (2/2) Probably not. Thanks to randomization, the two groups were initially similar. In the control group, 20% of students were lost to follow-up: they were harder to find, and probably some dropped out of school. In the test group, only 5% were lost: they were easier to find, and fewer students dropped out because they were given an amazing opportunity. => Comparing only the students who took the test might disadvantage the test group: it contains more dropout-prone students, i.e. students who would have dropped out of a normal school but stayed in school because of the amazing opportunity given to them. => One very important thing to check in a randomized experiment is that the attrition rate is low and balanced between the test and control groups.

26 Advantages & Limits Ethical considerations Is it unfair to select recipients of a program at random? These experiments are usually conducted when there are more applicants than seats: some people would have been refused anyway. Inequality ex post but equality ex ante: everyone has the same chance of getting into the program. Before the experiment, we are not sure the program is effective, so nobody can say that the control group has been disadvantaged. The program might even be detrimental: a useless training program is a waste of time for the unemployed, in which case the control group would end up better off than the treatment group. Even if we assume that the program is beneficial, there is no way to know before the experiment which categories of the population benefit from it most. => There is no fair criterion on which to decide who should be treated and who should not.

27 Advantages & Limits External validity issue The result might depend on the specific population on which the experiment is conducted and on the specific program. For instance, the result of the training program experiment might be due to the particular design of the training program, or to the fact that the population is made up of French unemployed workers. Maybe the result would not hold for a slightly different program intended for Korean unemployed workers. This is an issue for researchers: to be regarded as universal, results should be confirmed by experiments on several programs in several countries. It is less of an issue for decision makers: they want to test the efficacy of a given program on a given population.

28 Advantages & Limits General equilibrium issues (1/2) Sometimes a policy might not yield the same results when tested as an experiment and when generalized. This is a very strong issue for unemployment policies, less so for education or health policies. You run a randomized experiment to evaluate a training program for unemployed people. You select one ANPE agency in France and, from its database, randomly select 1 000 unemployed people who will follow your training program and 1 000 who will not. After 6 months, only 10% of people in the test group are still unemployed, against 40% in the control group. Do you think the French unemployment rate will decrease if you generalize this training to all French unemployed?

29 Advantages & Limits General equilibrium issues (2/2) Maybe not. Think of unemployment as a waiting list: the most employable have the lowest rank and find a job first. During the experiment, maybe the training program only changed the unemployed workers' ordering on the waiting list. The test group gained some places because its members became more employable thanks to the training program, whereas the unemployed in the control group did not. Putting it another way, during the experiment the unemployed in the test group might have found jobs more quickly because they took them from the control group, with perhaps no net impact on overall job creation. If the policy is generalized, everyone becomes more employable => the ordering of the waiting list does not change and nobody finds a job more quickly, because the policy does not create new jobs per se, nor does it improve the matching process between workers and firms.

30 Access to education in developing countries Enrollment Access to education = enrollment + attendance. The United Nations set Millennium Development Goals: a 100% enrollment rate in primary education in developing countries by 2015, and equality between girls and boys. Traditional assumption derived from standard economic analysis: children do not go to school because their parents do not want them to, and parents do not want them to because having their children at school is costly: direct costs (enrollment fees, buying a uniform, etc.) and indirect costs (opportunity costs: the child is not working). => Various policies inspired by this analysis have been tested through randomized experiments.

31 Access to education in developing countries Absenteeism It is not enough that children are enrolled; they must come to class. Absenteeism is a big issue in developing countries: for instance, Kremer and Miguel compute that the absenteeism rate in Kenya in preschool and grades 1 and 2 was around 30%. Assumption: absenteeism might once more be due to opportunity costs (children must take care of their younger brothers and sisters, or work) or to incapacity to come (diseases).

32 Access to education in developing countries Increasing the enrollment rate (1/2) In Kenya, public education is free. The only direct cost of education is the uniform that parents have to buy. Its cost, $6, compared to an annual income per inhabitant of $390, is not negligible. Experiment (Duflo, Dupas and Kremer): distribution of uniforms in Kenya, with the schools where the distribution takes place chosen at random. Results: the distribution reduces the drop-out rate by one third. In schools where uniforms were distributed, two years later the drop-out rate among girls was 12%, against 18% in schools where there had been no distribution. The rates were 9% and 13% for boys. Effective, even though the effect is not tremendous.

33 Access to education in developing countries Increasing the enrollment rate (2/2) Progresa program in Mexico: a subsidy given to poor mothers conditional on their child going to school. No impact on primary school enrollment, a universal result across many countries. Impact on secondary school enrollment: girls' enrollment rises from 67% in control villages to 75% in test villages; boys' enrollment goes from 73% to 77.5%. Here as well, a significant but not tremendous impact.

34 Access to education in developing countries Decreasing the absenteeism rate One quarter of children in the world have intestinal worms => tiredness + anemia. An NGO wanted to implement a deworming program in 75 schools. It was not possible for the NGO to implement it in all the schools the same year; the rollout had to be done over three years. => What the researchers suggested: select at random 25 schools where the program is implemented the first year, and implement it in the other schools during years 2 and 3. This also solves the ethical issue. In these schools, doctors come one day and all the students of the school present on that day receive the deworming treatment. School attendance increased by 0.14 schooling years. The effect even spills over to students who were not dewormed (worms are contagious).

35 Access to education in developing countries Comparison of the cost-effectiveness of these policies Cost-effectiveness = cost of the intervention / output. With the Progresa program, the cost of one additional effective year of schooling (netting out absenteeism) is $1 000. With the uniform program, this cost is $61 for girls and $121 for boys. With the deworming program, it is $3.5. The gain comes from both the numerator and the denominator (more students impacted, at a lower cost). It is apparently less costly to increase children's ability to come to school than to increase parents' willingness to send their children to school by decreasing its cost.

36 Quality of education in developing countries Context Once you have students in the classroom, it is better that they learn something. However, by international standards, students' knowledge is very low in some developing countries. => How can the quality of education be increased? Two directions for research: the educational system is good but lacks resources => do the same thing with more resources; or change the educational system (pedagogy, incentives given to teachers...).

37 Quality of education in developing countries Resources allocated to education Distribution of schoolbooks in schools: no impact. Decrease of the pupil-to-teacher ratio: no impact. Potential explanations: schoolbooks are often in English => useful only for those who already have a good level; and even when you decrease the pupil-to-teacher ratio, the impact is small because of the very high absenteeism rate among teachers. => Just investing more in education seems useless unless you change the rules, the pedagogy, etc.

38 Quality of education in developing countries Changing the pedagogy (1/2) Working in small groups. Pratham, an Indian NGO, hired young women called Balsakhis to work 2 hours per day with students in grades 3 or 4 who do not have a good command of grade 1 pedagogical content. At the end of the year, 61% of the students in the test group could do a subtraction, against 51% in the control group => effective.

39 Quality of education in developing countries Changing the pedagogy (2/2) Implementing tracking: paper by Duflo, Dupas and Kremer. In 61 schools, students were allocated randomly to classrooms => classrooms mixing strong and weak students. In 60 schools, students were allocated based on their achievement on a test => the best with the best and the weakest with the weakest. Ex ante, two possible stories: tracking => more homogeneous classrooms => easier to teach => positive impact; or good students have a positive influence on the weakest ones => tracking disadvantages the weakest. In tracking schools, the average score on a standardized test was higher than in non-tracking schools. Difference = 14% of the standard deviation of the test score in the whole population => important. There was also a positive effect on the weakest students.

40 Quality of education in developing countries Giving financial incentives to teachers Incentives based on students' grades on standardized tests. Experiment in Kenya (Kremer and Glewwe): students get better grades at first, but the effect is not long-lasting. Experiment in India (Muralidharan and Sundararaman): better grades in the short run, no long-run results. But teachers can also push students to cheat during the exam (this happened during an experiment in the US).

41 The demand for credit The demand for credit (1/2) Experiment carried out by Karlan and Zinman with a South African subprime lender. A letter was sent to inactive customers to offer them a new loan. The interest rate was randomly assigned, between 3.5% and 11.75% per month. The design of the letter was also decided at random: notably, on some letters a picture of an attractive woman appeared, on others a picture of an attractive man, and on other letters no photo.

42 The demand for credit The demand for credit (2/2) Results: increasing the interest rate by 1 percentage point decreases the response rate by 0.3 percentage points. Placing a photo of a woman on the advertisement increases the response rate by the same amount as decreasing the interest rate by 2 percentage points (from 10% to 8%, for instance). => Customers are rational to some extent and understand interest rates, but their choices can also be manipulated by marketing techniques.

43 Adverse Selection and Moral Hazard Conceptual framework Adverse selection: if you offer your customers a very high interest rate, only those who know in advance they will not pay you back, or those who are so risky that no bank wants to lend to them at a lower rate, will take your offer. By raising the interest rate, you attract only the highest-risk profiles. This selection happens ex ante, before the loan is granted. => The default rate should be an increasing function of the interest rate. Moral hazard: once granted a loan, borrowers decide whether to repay it, comparing the cost of repaying to the cost of not paying back (moral cost, having some of your belongings seized, etc.). => The incentive not to pay back is an increasing function of the interest rate => the default rate should be an increasing function of the interest rate. But borrowers' ability to pay back is also a decreasing function of the interest rate: the higher the interest rate, the higher the repayments => the lower the borrower's ability to meet them (burden of the debt). => The default rate should again be an increasing function of the interest rate. => So if you observe that the default rate is an increasing function of the interest rate, is it due to adverse selection, to moral hazard, or to the burden of the debt?

44 Adverse Selection and Moral Hazard Experimental protocol (1/2) => They set up a new experiment to answer this question. Customers are sent an offer: some are sent a low-rate offer, others a high-rate offer. Among those who were sent the high-rate offer and answered it, half are finally given a low-rate contract and half keep the high-rate contract. This yields three groups: group A (low-rate offer, low-rate contract), group B (high-rate offer, low-rate contract) and group C (high-rate offer, high-rate contract). Groups A and B are once more randomly split in two: in group A-B1, customers are told that the low rate they have been offered is valid only for this particular loan; in group A-B2, they are told that if they pay back the first loan they will be offered another loan at the same low rate. What do you measure when comparing the default rate in groups A and B? In groups B and C? In groups A-B1 and A-B2?

45 Adverse Selection and Moral Hazard Experimental protocol (2/2) Comparing the percentage of defaults between groups A and B measures adverse selection: the two groups pay the same rate but self-selected at different offer rates. Comparing groups B and C measures the impact of the interest rate paid on default: burden of the debt (involuntary default) + moral hazard (voluntary default); however, we cannot disentangle the two. Comparing the default rate in groups A-B1 and A-B2 determines whether there is pure moral hazard: if default on the first loan is lower in A-B2, it means that giving customers future incentives not to default is enough to decrease the default rate => there is pure moral hazard (voluntary default).

46 Adverse Selection and Moral Hazard Results Mixed evidence of adverse selection and of the burden of the debt; strong evidence of moral hazard.

47 Conclusion Today we have seen a way to truly measure the impact of a treatment: randomized experiments. They are extremely simple compared with all the complicated regression models we have seen previously, and much more convincing. They are applicable to a very large number of fields: public policy evaluation, marketing, medicine, sales... If interested: Esther Duflo, Lutter contre la pauvreté, Tome 1: Le développement humain (in French, coming soon in English); Esther Duflo, Lutter contre la pauvreté, Tome 2: La politique de l'autonomie (in French, coming soon in English).