Introduction to Results-Based M&E and Impact Evaluation Philip Davies April 17, 2007
World Bank Workshop: Using Monitoring and Evaluation for Better Education Policy, Planning and Results
Bali, Indonesia, 16-20 April 2007
Introduction to Results-Based M&E and Impact Evaluation
Philip Davies, PhD, American Institutes for Research, Washington, DC 20007
What is Evaluation?
"A family of research methods which seeks to systematically investigate the effectiveness of social interventions in ways that improve social conditions" (Rossi, Freeman and Lipsey, 1999: 20)
What is Evaluation?
Evaluation can be defined as "the process of determining the merit, worth, or value of something, or the product of that process" (Michael Scriven, Evaluation Thesaurus, 4th edition, 1991)
What is Monitoring? Monitoring is about collecting real-time information about the progress of policies, projects and programmes. It is important that this information is collected in a planned, organised and routine way. This information is essential in order to evaluate whether policies are, or are not, working.
The Experimenting Society (Donald T. Campbell)
"a society that would use social science methods and evaluation techniques to vigorously try out possible solutions to recurrent problems and would make hardheaded, multidimensional evaluations of outcomes, and when the evaluation of one reform showed it to have been ineffective or harmful, would move on and try other alternatives" (Campbell, 1999a: 9)
Why Evaluate?
- Modernising Government
- Better Policy Making
- Evidence-Based Policy and Practice
- Greater Accountability
- Performance Management
- Public Spending and Fiscal Control
- Strategic Development
Why Evaluate?
- Need evidence on what works
- Limited budget forces choices
- Bad policies could hurt
- Improve program/policy implementation
  - Design: eligibility, benefits
  - Operations: efficiency & targeting
- Information key to sustainability
  - Budget negotiations
  - Informing beliefs and managing press
(Acknowledgements to Paul Gertler)
Traditional M&E
- Monitoring: outcome trends over time (e.g. poverty, school enrollment, mortality)
- Process evaluation: implementation, efficiency, targeting
- Administrative data; management information systems
(Acknowledgements to Paul Gertler)
Impact Evaluation Answers
- What is the effect of the program on outcomes?
- How much better off are beneficiaries because of the intervention?
- How would outcomes change under alternative program designs?
- Does the program impact people differently (e.g. females, poor, minorities)?
- Is the program cost-effective?
Traditional M&E cannot answer these questions.
(Acknowledgements to Paul Gertler)
For Example, Impact Evaluation Answers
- What is the effect of job training on employment and earnings?
- How much do cash transfers lower poverty?
- Do scholarships increase school attendance for girls more than boys?
- Does contracting out primary health care to the private sector lead to an increase in access?
- Does replacing dirt floors with cement reduce parasites & improve child health?
- Do improved roads increase access to labor markets & raise income for the poor?
(Acknowledgements to Paul Gertler)
Effectiveness of What?
- Intervention effectiveness - what works?
- Resource effectiveness - at what cost/benefit?
- Likely diversity of effectiveness across different groups - what works for whom, and when?
- Implementation effectiveness - how does it work?
- Experiential effectiveness - users' views
Two Main Types of Evaluation
- Impact (or summative) evaluations: does the policy (programme, intervention) work?
- Process (or formative) evaluations: how, why, and under what conditions does the policy (programme, intervention) work?
Types of Impact Evaluation
- Efficacy (proof of concept): pilot under ideal conditions
- Effectiveness: normal circumstances & capabilities
  - Impact will be lower
  - Impact at higher scale will be different
  - Costs will be different, as there are economies of scale from fixed costs
(Acknowledgements to Paul Gertler)
So, Use Impact Evaluation To:
- Scale up pilot interventions/programs
- Kill programs
- Adjust program benefits
- Inform (i.e. finance & press)
e.g. PROGRESA/OPORTUNIDADES (Mexico):
- Transition across presidential terms
- Expansion to 5 million households
- Change in benefits
- Battle with the press
(Acknowledgements to Paul Gertler)
Evaluation Methods: Quantitative Methods
- Social surveys
- Longitudinal studies
- Experimental designs (random allocation)
- Quasi-experimental designs (matched samples, interrupted time series)
- Regression analyses
Evaluation Methods: Qualitative Methods
- In-depth interviews
- Focus groups
- Other consultative designs
- Observational and participant observational studies
Evaluation Methods: Multi-Method Evaluations
- Privileges no single method (quantitative or qualitative)
- Acknowledges the complementarity of quantitative and qualitative methods
- Integrates impact and process evaluations
- Driven by the substantive issues at hand rather than methodological turf wars or a priori preferences
Causation versus Correlation
Correlation is not causation; correlation is a necessary but not sufficient condition.
- Correlation: X and Y are related
  - A change in X is related to a change in Y, and a change in Y is related to a change in X
- Causation: if we change X, how much does Y change?
  - A change in X is related to a change in Y, but not necessarily the other way around
(Acknowledgements to Paul Gertler)
Causation versus Correlation
Three criteria for causation:
- The independent variable precedes the dependent variable
- The independent variable is related to the dependent variable
- There are no third variables that could explain why the independent variable is related to the dependent variable
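The third criterion (no third variables) can be illustrated with a small simulation. This is a hypothetical sketch, not from the slides: a confounder Z drives both X and Y, producing a strong correlation even though X has no causal effect on Y; "intervening" on X (assigning it by lottery, independent of Z) makes the association vanish.

```python
import random

random.seed(0)

def pearson(a, b):
    """Pearson correlation coefficient, computed directly."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Hypothetical example: Z (say, household income) drives both X (private
# tutoring) and Y (test scores); X itself has no effect on Y here.
n = 10_000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]      # X depends only on Z
y = [2 * zi + random.gauss(0, 0.5) for zi in z]  # Y depends only on Z

corr = pearson(x, y)          # strong observed correlation (~0.87)
print(f"corr(X, Y) = {corr:.2f}")

# "Intervene" on X: assign it by lottery, independent of Z.
x_do = [random.gauss(0, 1) for _ in range(n)]
corr_do = pearson(x_do, y)    # near zero: no causal effect of X on Y
print(f"corr(do(X), Y) = {corr_do:.2f}")
```

This is exactly the third-variable problem the criteria above guard against: the observed correlation disappears once X is set independently of the confounder.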
Different Types of Bias in Evaluations
- Selection bias - who gets into the trials
- Performance bias - a difference between what the experimental and control groups receive, other than the intervention being tested
- Attrition bias - bias attributable to respondents lost to follow-up
- Reporting bias - selective use of data; use of inappropriate statistics; statistical errors
Impact Evaluations
- Evaluations of outcome attainment (have targets been met?)
- Evaluations of net effects (against a counterfactual), in increasing strength of internal validity and causal inference:
  - Single group pre- and post-tests
  - Interrupted time series designs
  - Matched comparison designs
  - Difference of differences
  - Propensity score matching
  - Regression discontinuity designs
  - Randomised controlled trials
  - Meta-analysis
Evaluations of Net Effects (Against a Counterfactual)
- Counterfactual: what would have happened without the program
- Need to estimate the counterfactual, i.e. find a control or comparison group
- Counterfactual criteria:
  - Treated & counterfactual groups have identical characteristics, on average
  - The only reason for the difference in outcomes is the intervention
(Acknowledgements to Paul Gertler)
Evaluations of Net Effects (Against a Counterfactual)
[Charts: outcome trends over Years 1-5 for the treated group (Series1) and the comparison group (Series2); the gap between the two lines is the net effect.]
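The net-effect idea can be sketched numerically with a simple difference-of-differences, one of the designs listed earlier. All figures below are made up for illustration: the comparison group's before/after change stands in for the counterfactual trend, and subtracting it from the treated group's change gives the net effect of the program.

```python
# Hypothetical school-enrollment rates before and after a program.
treated_before, treated_after = 0.60, 0.75
comparison_before, comparison_after = 0.58, 0.63

treated_change = treated_after - treated_before            # 0.15 observed change
comparison_change = comparison_after - comparison_before   # 0.05 secular trend
did_effect = treated_change - comparison_change            # 0.10 net effect

print(f"Difference-of-differences estimate: {did_effect:.2f}")
```

Note that a naive before/after comparison of the treated group alone (0.15) would overstate the program's effect, because enrollment was rising anyway.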
Randomised Controlled Trial / Random Allocation Experiment
- The gold standard in impact evaluation
- Gives each eligible unit/individual the same chance of receiving the treatment/intervention
  - Lottery for who receives the benefit
  - Lottery for who receives the benefit first
- Requires allocation independent of service or policy providers
- Best when blind or double-blind, though this is rarely possible in public policy/public service delivery
Randomised Controlled Trial / Random Allocation Experiment
Baseline: eligible population → R (random allocation)
- Intervention group → intervention → outcome O1
- Control group → no intervention → outcome O2
Effect estimate = O1 - O2; the counterfactual is O2
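The allocation scheme above can be sketched in a few lines (all numbers hypothetical): because each eligible unit has the same chance of treatment, the control group's mean outcome O2 estimates the counterfactual, and O1 - O2 recovers the true effect despite large pre-existing differences between individuals.

```python
import random

random.seed(1)

TRUE_EFFECT = 5.0        # effect we build into the simulation
N_ELIGIBLE = 2_000

treated, control = [], []
for _ in range(N_ELIGIBLE):
    baseline = random.gauss(50, 10)            # pre-existing differences
    if random.random() < 0.5:                  # lottery, independent of providers
        treated.append(baseline + TRUE_EFFECT) # outcome O1 under the intervention
    else:
        control.append(baseline)               # outcome O2, no intervention

o1 = sum(treated) / len(treated)
o2 = sum(control) / len(control)
print(f"Effect estimate O1 - O2 = {o1 - o2:.1f}")  # close to the true effect of 5
```

With non-random allocation (e.g. providers enrolling the most motivated applicants), the two groups would differ at baseline and O1 - O2 would no longer isolate the intervention's effect.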
Oportunidades
- National anti-poverty program in Mexico (1997)
- Cash transfers and in-kind benefits, conditional on school attendance and health care visits
- Transfer given preferably to the mother of beneficiary children
- Large program with large transfers: 5 million beneficiary households in 2004
- Transfers capped at:
  - $95 USD for households with children through junior high
  - $159 USD for households with children in high school
(Acknowledgements to Paul Gertler)
Oportunidades Evaluation
- Phasing in of the intervention: 50,000 eligible rural communities
- Random sample of 506 eligible communities in 7 states - the evaluation sample
- Random assignment of benefits by community:
  - 320 treatment communities (14,446 households); first transfers distributed April 1998
  - 186 control communities (9,630 households); first transfers November 1999
(Acknowledgements to Paul Gertler)
Contact
Philip Davies PhD
Executive Director, Senior Research Fellow
American Institutes for Research
1000 Thomas Jefferson Street, NW, Washington DC 20007
Tel: 202 403-5785 | Mobile: 202 445-3640
2 Hill House, Southside, Steeple Aston, Oxfordshire OX25 4SD
Tel: +44 1869 347284 | Mobile: +44 7927 186074
PDavies@air.org