What is impact evaluation?


1 February 1st, 2012 GEM Project

3 Presentation Outline 1 Defining and Measuring Impact 2 3 4

5 Did the program change anything?

6 How to measure impact? Most policy questions involve cause-and-effect relationships. Impact is defined as the difference between what happened (with the program) and what would have happened (without the program).
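In potential-outcomes notation, which is an assumption here and not used in the slides, write $Y_i(1)$ for the outcome of individual $i$ with the program and $Y_i(0)$ for the outcome without it. The definition of impact can then be sketched as:

```latex
\text{Impact}_i = Y_i(1) - Y_i(0),
\qquad
\text{Average impact} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \,\right]
```

Only one of the two terms is ever observed for a given individual, so the other has to be estimated.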

7 What would have happened in the absence of the program?

8 Difficulty of impact evaluation. What would have happened in the absence of the program: the value of the indicator in the absence of the program. The counterfactual problem: we cannot observe the same individual in two states of nature. The art of impact evaluation is to reconstruct the counterfactual correctly.

9 How to reconstruct the counterfactual? The counterfactual is often constructed by selecting a group not affected by the program. Central question: how to choose a good comparison group? An invalid comparison group implies a biased impact estimate. Two methods: Non-randomized: argue that a certain excluded group mimics the counterfactual. Randomized: use random assignment of the program to create a control group which mimics the counterfactual.

10 Reconstruct the Counterfactual with a comparison group

12 The Balsakhi Program

13 The Balsakhi Program. Implemented by Pratham, an NGO from India. The program provided tutors (balsakhis) to help at-risk children with school work. In Vadodara, the balsakhi program was run in government primary schools in schools got a balsakhi in Standard 4. Balsakhis taught 20 children for 2 hours per day. Teachers decided which children would get the balsakhi.

14 Balsakhi Outcomes. Children were tested in language and math at the beginning of the school year (pre-test) and at the end of the year (post-test), with 100 possible points on the test. How can we estimate the impact of the balsakhi program on test scores?

16 Results - with no comparison. On average, balsakhi children scored 51 points out of 100 at the end of the school year. What can we conclude? We need a benchmark.

18 Before and After. Strategy: look at the average improvement in test scores over the school year for the balsakhi children. In this case, we are controlling for the initial learning level of the balsakhi children.
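A minimal sketch of the before-and-after strategy, assuming made-up scores for five tutored children (the numbers and the `before_after` helper are hypothetical, not taken from the Balsakhi study):

```python
from statistics import mean

# Hypothetical pre-test and post-test scores for five tutored children.
# These numbers are made up for illustration; they are not study data.
pre_scores = [20, 25, 30, 22, 28]
post_scores = [44, 50, 57, 48, 51]


def before_after(pre, post):
    """Before-and-after estimate: average post-test minus average pre-test."""
    return mean(post) - mean(pre)


estimate = before_after(pre_scores, post_scores)
print(estimate)
```

With these made-up numbers the estimate is 25 points. It equals the program's impact only if scores would not have changed over the year without the program, which is unlikely for children attending school.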

19 Results Under what conditions can this difference be interpreted as the impact of the balsakhi program?

20 Results

22 With and without. Balsakhis taught 20 children in each grade; approximately two-thirds of all children stayed with the regular teacher. Strategy: compare post-test scores of children who learned from the extra teacher with those of children who learned from the regular teacher. In this case, the children not learning from the balsakhi form the control group.

23 Let's compare their results... with theirs

24 Results Under what conditions can this difference be interpreted as the impact of the balsakhi program?

26 Results. How to choose the comparison group? Non-beneficiaries may be different from beneficiaries. Why? Programs often target beneficiaries according to specific criteria (first-come, first-served). People choose whether or not to participate in the program (motivation). If non-beneficiaries are different, they cannot represent a good counterfactual. The comparison will be biased because there is selection of the beneficiaries: selection bias.
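Selection bias in a with-and-without comparison can be sketched with a toy example. Everything below is an assumption made for illustration: the baseline scores, the rule that the teacher picks the ten weakest children, and a true effect of 5 points.

```python
from statistics import mean

TRUE_EFFECT = 5  # hypothetical true gain from tutoring, in test points

# Hypothetical baseline scores for 30 children: 10, 12, ..., 68.
baseline = list(range(10, 70, 2))

# Assumed targeting rule: the teacher sends the 10 weakest children to the tutor.
cutoff = sorted(baseline)[9]
treated = [b for b in baseline if b <= cutoff]
untreated = [b for b in baseline if b > cutoff]

# Post-test scores: tutored children gain TRUE_EFFECT, the others gain nothing.
post_treated = [b + TRUE_EFFECT for b in treated]
post_untreated = list(untreated)

# With-and-without estimate: difference in average post-test scores.
naive_estimate = mean(post_treated) - mean(post_untreated)

# Pre-existing gap between the groups, i.e. the selection bias.
selection_bias = mean(treated) - mean(untreated)

print(naive_estimate, selection_bias)
```

In this toy setting the naive comparison equals the true effect plus the selection bias, so a program that helps appears to hurt because the tutored children started far behind.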

27 Selection Bias

28 How to correct the selection bias? Acknowledge that the comparison group is different at the beginning. Look at the trend of the comparison group over time to predict the trend of the treated group. Compare relative levels. This is called the difference-in-differences estimator, or diff-in-diff.

30 Strategy: compare the improvement in test scores during the school year between balsakhi and non-balsakhi children. Our control group still consists of children with no balsakhi, but we are also controlling for the pre-existing difference in test scores between the two groups.

31 Double difference. Let's compare their progress... with theirs

32 Difference-in-Differences Estimator

33 Estimator. Method to follow: (1) Collect baseline data on each group before the introduction of the program. (2) Collect follow-up data on each group after the program. (3) Compute the before-after difference for each group. (4) Subtract the difference in the comparison group from the one in the treatment group.
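The steps of the method can be sketched as follows. The `diff_in_diff` helper and all scores are hypothetical, chosen only to make the arithmetic easy to follow:

```python
from statistics import mean


def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """Difference-in-differences on group averages."""
    treat_change = mean(treat_post) - mean(treat_pre)  # before-after, treatment group
    comp_change = mean(comp_post) - mean(comp_pre)     # before-after, comparison group
    return treat_change - comp_change


# Hypothetical baseline and follow-up scores (not study data).
treat_pre, treat_post = [20, 25, 30], [45, 50, 55]
comp_pre, comp_post = [35, 40, 45], [52, 58, 64]

estimate = diff_in_diff(treat_pre, treat_post, comp_pre, comp_post)
print(estimate)
```

Here the treatment group improves by 25 points and the comparison group by 18, so the estimate is 7 points, even though the comparison group scores higher at both dates.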

34 Results Under what conditions can this difference be interpreted as the impact of the balsakhi program?

35 Is it enough? This method is based on a strong assumption: the trend of the treatment group would have been parallel to the trend of the comparison group in the absence of the program.
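The assumption can be written out explicitly. The notation is an assumption of this sketch, not the slides': $T$ and $C$ denote the treatment and comparison groups, bars denote group averages, and $Y(0)$ is the outcome that would occur without the program.

```latex
% Difference-in-differences estimator
\widehat{\text{DiD}} =
  \left(\bar{Y}^{T}_{\text{post}} - \bar{Y}^{T}_{\text{pre}}\right)
  - \left(\bar{Y}^{C}_{\text{post}} - \bar{Y}^{C}_{\text{pre}}\right)

% Parallel-trends assumption: without the program, both groups
% would have changed by the same amount on average
\mathbb{E}\left[Y^{T}_{\text{post}}(0) - Y^{T}_{\text{pre}}\right]
  = \mathbb{E}\left[Y^{C}_{\text{post}} - Y^{C}_{\text{pre}}\right]
```

If the second equality fails, for example because weaker children catch up faster on their own, the estimator attributes that difference in trends to the program.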

36 Difference-in-Differences

37 Comparing first three methods What can we conclude from this?

40 What is a regression? What is a multiple regression? What does controlling for some variables mean? Other variables can influence the outcome. Compare volunteers and non-volunteers, controlling for other variables. This gives the difference between volunteers and non-volunteers, all other measured variables being equal.

41 Using regression to control for observable characteristics. Strategy: through regression, we can control for characteristics that we can observe. Regression gives you the relationship between two measures, taking into account differences in observed characteristics. This means that if our control group doesn't match our treatment group along observable characteristics, we can account for these differences.
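A minimal sketch of controlling for one observable characteristic, assuming a hand-written least-squares solver and noise-free made-up data in which the post-test equals 10 + 5 x treated + pre-test. The `ols` helper, the data, and the true effect of 5 are all hypothetical.

```python
from statistics import mean


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: tutored children start with lower pre-test scores.
treated = [1, 1, 1, 0, 0, 0]
pretest = [20, 30, 40, 40, 50, 60]
post = [10 + 5 * t + x for t, x in zip(treated, pretest)]

# Simple difference in post-test means, ignoring the pre-test.
naive = mean(p for p, t in zip(post, treated) if t) - mean(
    p for p, t in zip(post, treated) if not t
)

# Regression of post-test on a constant, the treatment dummy and the pre-test.
X = [[1.0, float(t), float(x)] for t, x in zip(treated, pretest)]
intercept, treatment_coef, pretest_coef = ols(X, post)

print(naive, round(treatment_coef, 6))
```

The simple difference in means is -15, while the coefficient on the treatment dummy recovers the assumed effect of 5 once the pre-test is held constant. This works here only because the pre-test is the sole source of difference between groups in the made-up data.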

42 Assumption? The factors that were excluded because they are unobservable and/or have not been measured do not bias results because they are either uncorrelated with the outcome or do not differ between participants and non-participants.

43 Comparing first four methods What can we conclude from this?

45 The control group is constituted after the selection of participants is done. For each individual in the treatment group, find a non-treated individual with similar characteristics.
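One way to sketch this idea is nearest-neighbor matching on a single characteristic. The choice of the pre-test score as the matching variable, matching with replacement, the `nearest_neighbor_match` helper, and the data are all assumptions made for illustration.

```python
from statistics import mean


def nearest_neighbor_match(treated_units, pool):
    """Pair each treated unit with the non-treated unit whose pre-test
    score is closest (matching with replacement)."""
    pairs = []
    for unit in treated_units:
        match = min(pool, key=lambda c: abs(c["pretest"] - unit["pretest"]))
        pairs.append((unit, match))
    return pairs


# Hypothetical children (not study data).
treated_units = [
    {"pretest": 20, "post": 40},
    {"pretest": 30, "post": 52},
]
pool = [
    {"pretest": 21, "post": 35},
    {"pretest": 29, "post": 45},
    {"pretest": 60, "post": 80},
]

pairs = nearest_neighbor_match(treated_units, pool)

# Average post-test gap within matched pairs.
estimate = mean(t["post"] - c["post"] for t, c in pairs)
print(estimate)
```

The child with a pre-test of 60 is never used because no treated child resembles them. The resulting estimate of 6 points is credible only if the matched children do not differ on characteristics that were not measured.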

47 Assumption? The factors that were excluded because they are unobservable and/or have not been measured do not bias results because they are either uncorrelated with the outcome or do not differ between participants and non-participants.

49 This method uses the rule that determines who can participate in the program as an experiment. It allows comparing individuals who lie just above and just below a cutoff.

52 The program must select beneficiaries using an index or score. This requires a clearly defined eligibility cut-off and large enough samples around the cut-off. The method provides local information, on the population around the cut-off.
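A minimal sketch of a comparison around a cut-off, under several assumptions that are not in the slides: children with an index below 50 are eligible, the outcome rises linearly with the index, the true effect is 10 points, and a separate straight line is fitted on each side of the cut-off. The `fit_line` helper and the data are hypothetical.

```python
from statistics import mean

CUTOFF = 50        # hypothetical eligibility cut-off on the index
TRUE_EFFECT = 10   # hypothetical program effect


def fit_line(xs, ys):
    """Slope and intercept of the least-squares line through (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Made-up units with index values 40..59; those below the cut-off get the program.
index = list(range(40, 60))
eligible = [s < CUTOFF for s in index]
outcome = [0.5 * s + (TRUE_EFFECT if e else 0) for s, e in zip(index, eligible)]

below = [(s, y) for s, y, e in zip(index, outcome, eligible) if e]
above = [(s, y) for s, y, e in zip(index, outcome, eligible) if not e]

slope_b, intercept_b = fit_line(*zip(*below))
slope_a, intercept_a = fit_line(*zip(*above))

# Jump in the predicted outcome at the cut-off: treated side minus untreated side.
estimate = (intercept_b + slope_b * CUTOFF) - (intercept_a + slope_a * CUTOFF)
print(round(estimate, 6))
```

The jump at the cut-off recovers the assumed effect of 10. The estimate describes units close to an index of 50 and says nothing about units far from the cut-off.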

54 Taking the bias into account. An estimate of the impact of the program is biased if some other factor, not the program, is influencing your estimate. A poor estimate of the counterfactual could bias your estimate.

55 How to make sure that the comparison group is a good counterfactual? Ideal solution: random assignment, that is, assign subjects randomly to the two groups. Identify subjects that satisfy all selection criteria of the program. Randomly select half of the subjects to participate in the program (treatment group); the other half (control group) won't participate (or will participate later) and will serve as a comparison group. With random assignment, the comparison group is really similar to the treatment group. It eliminates all bias. Why?
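The assignment procedure can be sketched in a few lines. The `assign_randomly` helper, the subject identifiers, and the fixed seed are assumptions made for illustration; the seed only makes the draw reproducible.

```python
import random


def assign_randomly(eligible_ids, seed):
    """Shuffle the eligible subjects and split them into two halves:
    the first half is the treatment group, the second the control group."""
    rng = random.Random(seed)
    shuffled = list(eligible_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


# Hypothetical list of 100 subjects who meet all selection criteria.
eligible_ids = range(100)
treatment, control = assign_randomly(eligible_ids, seed=42)
print(len(treatment), len(control))
```

Every eligible subject ends up in exactly one group, and neither the subjects nor the implementers choose which one.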

57 Based on the law of large numbers: take 1000 people in the street and divide them randomly into two groups. On average, people in group 1 will have the same height as people in group 2, and the same age, income, etc. Note: would it be true if we divided 20 people into two groups? Probably not. But it is possible to increase the chance: stratification.
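The effect of sample size on balance can be checked with a small simulation. The height distribution (mean 170 cm, standard deviation 10 cm), the number of repetitions, and the `average_gap` helper are assumptions made for illustration.

```python
import random


def average_gap(n_people, n_draws, rng):
    """Average absolute difference in mean height between two random
    halves, over n_draws independent random splits."""
    gaps = []
    for _ in range(n_draws):
        # Hypothetical heights in cm, drawn from a normal distribution.
        heights = [rng.gauss(170, 10) for _ in range(n_people)]
        rng.shuffle(heights)
        half = n_people // 2
        group1, group2 = heights[:half], heights[half:]
        gaps.append(abs(sum(group1) / len(group1) - sum(group2) / len(group2)))
    return sum(gaps) / len(gaps)


rng = random.Random(0)
gap_20 = average_gap(20, 200, rng)
gap_1000 = average_gap(1000, 200, rng)
print(round(gap_20, 2), round(gap_1000, 2))
```

The typical gap between the two groups is several times larger with 20 people than with 1000, which is why small samples can be unbalanced by chance even under random assignment.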

58 Strategy: compare test scores between treatment and control children. Our control group should be equivalent to the treatment group at the beginning, except that its children were randomly selected to be in the control group, and therefore by random chance did not receive the treatment.

59 How wrong can you go? What can we conclude from this?

61 Random. What does the term random mean? Is random assignment the same as random sampling?

62 Random samples of voters?

64 Key advantage. Random assignment implies that the distributions of both observable and unobservable characteristics in the treatment and control groups are statistically identical. In other words, there are no systematic differences between the two groups. Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the treatment rather than to other factors.

65 Limitations. Randomized evaluations are still potentially subject to threats to their validity: internal validity, external validity. Most of the threats also affect non-experimental studies. Costly. Ethical issues.

67 Education programmes Kenya, extra-teacher program (to detail) India, Balsakhi program (to detail)

68 Randomized evaluations in Peru / Philippines 18 evaluations ongoing in Philippines (give precise examples) 12 evaluations ongoing in Peru (mostly on micro-credit) (give precise examples)

70 There are many ways to estimate a program's impact. If the design and implementation are correct, we want to find the most credible method to estimate a program's impact. To assess the validity of an impact evaluation, the main question is: how credible is the comparison group?