Measuring Impact. Advanced Course, July 6th, 2011. Rachel Glennerster



2 Outline
- Quick overview of the steps for defining good measures: measuring how, not just whether
- Assessing an outcome measure
- Data collection basics: baseline? administrative data?
- Non-survey data collection approaches
- Example: measuring social capital in Sierra Leone
- Stratification

3 Process of determining outcome measures
- Step 1: map out outcomes and assumptions in the theory of change
- Step 2: determine indicators for each outcome, checking that each indicator is logically valid, measurable, precise, and reliable
- Step 3: decide how to collect the indicators: baseline? endline? administrative data?
- Step 4: field test

4 Step 1: Mapping to the ToC and measuring how
- RCTs are sometimes criticised as a "black box", but they don't have to be
- Outcome measures that test the steps along the route to potential impact let you understand the process: if the programme failed, you can see where the chain broke
- Map out a theory of change for the programme (including potential unintended or negative consequences), and develop at least one outcome measure for each step

5 Theory of change example: HIV education

6 Step 2: Choosing indicators
I. Logically valid: the indicator follows from, and is related to, the outcome
II. Measurable: a measurable indicator is observable, feasible, and detectable
- Observable: indicators must be observable in the real world. E.g. "happiness" is not an indicator; laughter or self-reported happiness could be
- Feasible: indicators must be politically, ethically, and financially measurable. E.g. questions about sexual behaviour, or HIV testing in schools
- Detectable: the study must have an instrument and the statistical power to measure the indicator. E.g. infant mortality may not be a frequent enough event to use as an indicator for a maternal health programme
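The detectability point can be made concrete with a back-of-the-envelope power calculation. The sketch below is illustrative and not from the course: it uses the standard two-sided normal approximation for comparing two proportions, and the 5%-to-4% infant mortality rates are assumed numbers chosen only to show the order of magnitude.

```python
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.8):
    """Approximate sample size per arm needed to detect a change in a
    proportion from p1 (comparison) to p2 (treatment), using the
    two-sided normal approximation for a difference in proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

# Illustrative: detecting a fall in infant mortality from 5% to 4%
# requires several thousand births per arm, which is why a rare event
# can be a poor indicator for a modestly sized evaluation.
print(round(n_per_arm(0.05, 0.04)))
```

Larger effects on more common outcomes need far smaller samples, which is the practical reason to prefer frequent, proximate indicators when power is limited.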

7 Assessing indicators (II)
III. Precise: the more exhaustive and exclusive the indicator, the more precise it is
- Exhaustive: the indicator covers all the key aspects of the outcome. E.g. people may save in many ways, including buying livestock, durables, or gold, or depositing money; a savings measure that covers only some of these will be poor
- Exclusive: the indicator is affected by the outcome of interest and nothing else. E.g. tears are not an exclusive indicator of sadness, as they may be caused by laughter or onions; pregnancy is an exclusive indicator of unprotected sex

8 Assessing indicators (III)
IV. Reliable: indicators are reliable when they are hard to get wrong or to counterfeit
- Forgetting: e.g. people forget how much they spent or earned over the last year. Narrow the questions down and make them more specific
- Deliberate misreporting: e.g. sexual abstinence: a study in East Africa found that 50% of self-reported virgins had sexually transmitted diseases. Use proxy indicators that are correlated with the outcome but harder to fake (e.g. the teenage pregnancy rate), remove the respondent's incentive to lie, or hide the point of your question

9 New or old indicators?
- Indicators used in the existing literature have already been tested, and the literature will suggest the most current outcome measures
- They allow comparability across programmes with the same goal. E.g. what is the most (cost-)effective way to increase educational attainment, measured in years of education induced: conditional cash transfers, school feeding, or school uniforms? This is valuable for policymakers who must decide how to allocate limited resources
- But indicators from the existing literature may not fully reflect the specificities and context of your programme

10 Step 3: How to collect indicators
Administrative data vs. collecting survey data:
- Administrative data is much cheaper
- But it is usually not collected on enough individuals (one exception: schools)
- It may not be accurate: there are often incentives to misreport
- You can't use project data, which differs between treatment and comparison groups
- There is unlikely to be administrative data all the way along the implementation chain
Do I need a baseline?
- If you randomize you don't necessarily need a baseline, but a baseline allows you to demonstrate that treatment and comparison groups were balanced initially, and it gives more statistical power
How many people to interview is covered in the power lecture

11 Step 4: Field testing indicators
- Have you chosen the right respondents?
- Do your instruments pick up variation?
- Are the instruments appropriate given the culture and politics of this context?
- Are questions phrased in a way people understand? E.g. do mothers know what "full immunization" means?
- Is administrative data actually collected, and is it reliable? Use field visits to compare collected data to administrative data
- Is the recall period appropriate?
- Are the surveys too long?
- What is the best time and place to find respondents?

12 Examples of non-survey instruments
- Questionnaires
- Focus groups
- Biomarkers
- Mechanical tracking devices (e.g. cameras)
- Random spot checks
- Spatial demography (e.g. GPS, satellite imagery)
- Participatory resource appraisals
- Incognito enumerators or ride-alongs
- Incognito enumerators and surveys
- Games
- Vignettes (the subject is presented with a hypothetical situation)
- Implicit association tests (e.g. gender-occupation sorting)
More information on all of these instruments is in the handouts

13 Example: GoBifo evaluation
"Experience demonstrates that by directly relying on poor people to drive development activities, CDD [community-driven development] has the potential to make poverty reduction efforts more responsive to demands, more inclusive, more sustainable, and more cost-effective than traditional centrally led programs, achieving immediate and lasting results at the grassroots level." Dongier et al. (2003), World Bank

14 CDD theory of change
Financial grants for local public goods and small enterprise development:
- The GoBifo project in Sierra Leone gave communities $4,667 in 3 tranches (~$100 per household)
- ToC: more and better public goods, and more appropriate public goods
- NB: at baseline there were no differences in preferences over public goods between elites and youth/women, which made the second hypothesis hard to test
Training and facilitation to build durable local collective action capacity:
- Helps communities set up representative VDCs, agree a medium-term development plan, and establish bank accounts
- ToC: these steps reduce the costs of collective action, so communities are more likely to engage in collective action outside of the GoBifo project

15 CDD theory of change (II)
Requirements to increase participation of marginalized groups:
- Women were co-signatories on the community bank accounts
- Women and youths (18-35 years) managed their own projects, e.g. labor and soap-making groups
- ToC: taking these positions of responsibility gives marginalized groups the experience and the confidence to participate outside GoBifo
- Other members of the community learn to appreciate the benefits of including women and youth, and engage them in other decisions
- Working together as a community helps build trust and social capital, and helps reduce conflict

16 CDD theory of change (III)
Possible negative effects of CDD:
- The literature points to the danger of elite capture with decentralization
- ToC: providing grants with discretion to local communities may increase the power of local elites, who will then be better able to expropriate resources coming to the community

17 GoBifo survey questions
- Household panel survey (male, female, youth, and non-youth respondents)
- Questions used elsewhere in the literature: general questions about trust, standard questions on the role of women and on violence, and questions used in older Sierra Leone studies about the role of chiefs
- Specific, context-appropriate questions on trust, group membership, and collective action, with prompts for specific groups. E.g. who helped you re-thatch your house? When did you last ask someone to go to town to buy something for you?
- Field supervisors' direct assessments of local public goods quality
- Village focus group discussions with local leaders, including questions about the community farm, a communal activity that many communities (treatment and comparison) had to make decisions about

18 GoBifo indicators and instruments
Structured community activities (SCAs):
- Matching grant (collective action): communities received six vouchers that could be redeemed, with a co-pay, at a local building materials store (maximum value $300). A direct measure of collective action capacity
- Communal choice (participatory decision-making): communities were presented with two equally valued assets (batteries vs. salt) and enumerators observed the ensuing deliberations, recording the number of male/female and youth/elder speakers as measures of participation and influence
- Managing an asset (elite capture): communities were given a large tarpaulin to use as an agricultural drying floor or roofing material, with a focus on elite capture in a surprise follow-up visit 5 months later

19 The ex ante analysis plan
- RCTs are less subject to data mining than other empirical approaches: outcome measures must be selected beforehand, and the studies take years to run
- But with a large number of outcome measures there is still potential to pick and choose which results to present or emphasize (data mining)
- Before the program began in 2005, the research and project teams together agreed on a set of hypotheses about GoBifo's impacts
- Before analyzing the endline data, they submitted to the J-PAL project registry the exact list of outcome and explanatory variables under each hypothesis, including their grouping into families, and the econometric specifications
- Project success is determined by the mean treatment effect across all outcomes under a given hypothesis (Kling and Liebman 2004)
- Results for all 318 pre-specified outcomes are in the web appendix
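The mean-treatment-effect construction referenced above can be sketched in a few lines. This is an illustrative implementation of the standard approach, not code from the study: each outcome in a family is put on a z-score scale using the control group's mean and standard deviation, and the standardized treatment-control differences are then averaged; the function name and toy data are my own.

```python
from statistics import mean, stdev

def mean_effects_index(treatment, control):
    """Average standardized treatment effect across a family of outcomes.

    `treatment` and `control` are lists of outcome vectors (one list of
    individual-level values per outcome). Standardizing by the control
    group lets outcomes measured in different units be averaged."""
    effects = []
    for y_t, y_c in zip(treatment, control):
        mu, sd = mean(y_c), stdev(y_c)
        effects.append((mean(y_t) - mu) / sd)
    return mean(effects)

# Toy family of two outcomes (hypothetical data, one vector per outcome):
treat = [[3, 4, 5, 4], [1, 1, 0, 1]]
ctrl = [[2, 3, 4, 3], [0, 1, 0, 1]]
print(mean_effects_index(treat, ctrl))
```

Averaging within a pre-specified family reduces the number of hypothesis tests, which is exactly what limits the scope for picking and choosing among 318 outcomes.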

20-21 [Results slides: charts not transcribed; slide footer reads "Stanford May 2011 Reshaping Institutions"]

22 Stratification
- What is it? Dividing the sample into different subgroups, then running a separate random assignment (lottery) in each subgroup
- Example: we have 10 female (red) and 10 male (blue) students in our evaluation sample, and we randomly divide them into treatment and control groups of 10 students each

23 Stratification (continued)
- Balance test: because a separate lottery is run within each gender, the treatment and control groups each contain exactly 5 female and 5 male students [assignment diagram not transcribed]
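The stratified lottery on these slides can be sketched as code. This is an illustrative sketch whose names and structure are my own, not from the course: units are grouped by stratum, each stratum is shuffled separately, and the first half of each stratum goes to treatment.

```python
import random
from collections import defaultdict

def stratified_assign(units, strata, seed=0):
    """Run a separate lottery in each subgroup: shuffle the members of
    each stratum and assign the first half to treatment (T), the rest
    to control (C)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for unit, stratum in zip(units, strata):
        groups[stratum].append(unit)
    assignment = {}
    for members in groups.values():
        rng.shuffle(members)
        half = len(members) // 2
        for unit in members[:half]:
            assignment[unit] = "T"
        for unit in members[half:]:
            assignment[unit] = "C"
    return assignment

# 10 female and 10 male students: stratifying on gender guarantees
# that treatment and control each get exactly 5 of each gender.
students = [f"s{i}" for i in range(20)]
gender = ["F"] * 10 + ["M"] * 10
assignment = stratified_assign(students, gender)
```

With even-sized strata the split is exact by construction, which is the "balance test" the slide illustrates.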

24 When to stratify
- Stratify on variables (or index variables you create) that could have an important impact on the outcome variable (a bit of a guess)
- You can stratify along multiple variables, e.g. gender and programme. Limitations?
- Stratify on subgroups you are particularly interested in (where you think the impact of the program may differ)
- Stratification can increase the precision of our estimate (more in Workshop 5)
- Stratification is more important when the sample size is small
- Can you over-stratify? In theory, but it is very unlikely
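A quick simulation shows why stratification matters most in small samples. This sketch is illustrative and not from the slides: it estimates how often a simple, unstratified lottery over the 20-student example fails to split the genders evenly, something a stratified lottery never does.

```python
import random

def share_imbalanced(n_f=10, n_m=10, trials=2000, seed=1):
    """Fraction of simple (unstratified) lotteries in which the
    treatment group does not contain exactly half of each gender."""
    rng = random.Random(seed)
    units = ["F"] * n_f + ["M"] * n_m
    imbalanced = 0
    for _ in range(trials):
        treatment = rng.sample(units, (n_f + n_m) // 2)
        if treatment.count("F") != n_f // 2:
            imbalanced += 1
    return imbalanced / trials

# Roughly two-thirds of simple lotteries over 10 women and 10 men end
# up gender-imbalanced; stratifying on gender removes this entirely.
print(share_imbalanced())
```

In large samples chance imbalance shrinks and the precision gain from stratification fades, which matches the slide's point that stratification matters most when the sample is small.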