DOES MATCHING OVERCOME LALONDE'S CRITIQUE OF NON-EXPERIMENTAL ESTIMATORS? A POSTSCRIPT

Rajeev Dehejia *
Department of Economics and SIPA, Columbia University, and NBER

* Department of Economics and SIPA, Columbia University, 420 W. 118th Street, Room 1022, New York, NY, 10027; e-mail: rd247@columbia.edu.

1. Introduction

Jeffrey Smith and Petra Todd's comments on Dehejia and Wahba (1999, 2002), in Smith and Todd (2004a), and the subsequent exchange (Dehejia 2004 and Smith and Todd 2004b) have been useful in highlighting some features of our original work and of propensity score methods more generally. However, like their original comment, their rejoinder illustrates common sources of confusion in applying these methods. In these remarks, I focus on three issues: the choice of research sample, the choice of propensity score specification, and the use of balancing tests.

2. The Choice of Sample

Smith and Todd contend that the choice of sample in Dehejia and Wahba is arbitrary and unmotivated, because we present our main results for a subset of Lalonde's (1986) data. On the contrary, in that paper we provide a strong economic motivation for focusing on a subsample of Lalonde's data. The evaluation literature (Ashenfelter 1978, Ashenfelter and Card 1985, Card and Sullivan 1988, and Heckman, Ichimura, Smith, and Todd 1998) underlines the importance of controlling for a sufficiently rich set of pre-treatment covariates and, in the context of labor training programs, of controlling for more than one year of pre-treatment earnings. For the subset of the data that we examine, information is available on two years of pre-treatment earnings, whereas Lalonde's full sample contains information on only one year of pre-treatment earnings. The difficulty lies in matching earnings reports from the experiment to the earnings reported for the comparison groups (the former are expressed in terms of experimental time, whereas the latter are in calendar time). For individuals assigned to the treatment sufficiently early, or individuals who had zero earnings, it is possible to make this match.

Furthermore, in Dehejia and Wahba (1999) we provide results both for Lalonde's original sample and for our sub-sample, and we show that the additional year of earnings information is indeed necessary. Hence, our choice of sample is clearly motivated, and our results are transparent to the choices we make.

3. Choice of Propensity Score Specification

A consistent mistake that Smith and Todd (2004a, 2004b) make in applying propensity score methods is failing to select a propensity score specification for each treatment group-comparison group combination. Consequently, their specifications are biased or inefficient. In Smith and Todd (2004a), they incorrectly apply specifications selected for Lalonde's sample and our sample to the alternate sample that they examine. In Smith and Todd (2004b), they commit a related, albeit more subtle, error. They apply the propensity score specifications that were used in Dehejia (2004), which were selected for various subsets of the NSW treatment group and the two non-experimental comparison groups, to other samples (in particular, the full set of treatment and control experimental units plus the relevant comparison group units).

Though the NSW experimental control units are randomly sampled from the same population as the NSW treated units, and in that sense presumably share the same underlying (population) propensity score, the estimated propensity score should also account for sampling variation in the selection of a particular treatment group. The distinction here is that the estimated propensity score accounts not only for factors that determine selection into the treatment at a population level, but also for sampling variability in the selection of a particular treatment group. Indeed, Hirano, Imbens, and Ridder (2004) have shown that propensity score methods are efficient only when the estimated propensity score is used, not when the true propensity score is used, even if it were available. This implies that propensity score methods can be used even when a random experiment is conducted (see, for example, Hill, Rubin, and Thomas 2000), and that each treatment group-comparison group combination requires its own propensity score specification.

Thus, it is not surprising that the bias estimates reported by Smith and Todd (2004b) for the specifications used in Dehejia (2004) are low when applied to the NSW treatment group but are larger when applied to the two other samples. Another set of propensity score specifications should be selected for these alternative samples before estimating the bias or the treatment effect.

Finally, it should be noted that Smith and Todd inflate their bias estimates by failing to impose the common support condition (i.e., restricting attention to comparison group members that fall within the support of the propensity score distribution of the treatment group). This makes their bias calculations very misleading, since the relevant question is the extent of bias for the estimated treatment effect, and the treatment effect would be estimated using observations in the interval of common support.

4. Balancing Tests

As suggested by Rosenbaum and Rubin (1983), and illustrated in Dehejia and Wahba, balancing tests are useful diagnostics on the suitability of a propensity score specification for a particular treatment group-comparison group combination. As in their bias calculations, Smith and Todd seem to use the full set of comparison observations in their balancing tests. This is misleading because the objective of propensity score methods is to focus attention precisely on the subset of the comparison group that is most comparable to the treatment group. As illustrated in Dehejia and Wahba (1999, 2002), many (indeed, most) of the observations in the PSID and CPS comparison groups are not comparable to the treatment group. Nonetheless, Smith and Todd's observation that there is no consensus on which balancing test to use is useful, and points to the value of ongoing research on this and related topics.

5. Conclusion

In conclusion, it is notable that fundamentally there is more agreement than disagreement between Smith and Todd and myself. We all agree that propensity score methods are a valuable tool in the researcher's arsenal and that these methods are not a silver bullet fix to all evaluation problems.
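As an illustration of the common support restriction discussed in Section 3 and the point in Section 4 that balance should be assessed on the comparable subset of the comparison group, the following is a minimal sketch and not the procedure used in any of the papers cited here. It assumes that propensity scores have already been estimated for the specific treatment group-comparison group combination at hand, and it uses a simple minimum/maximum rule to define the region of common support. All function names and all numbers are hypothetical and chosen only for illustration; the difference in mean pre-treatment earnings stands in for a formal balancing test.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def common_support(
    treated_scores: Sequence[float], comparison_scores: Sequence[float]
) -> Tuple[List[int], Tuple[float, float]]:
    """Return the indices of comparison units whose estimated propensity
    score lies within the [min, max] range of the treated units' scores,
    together with those bounds. (Hypothetical helper, min/max rule assumed.)"""
    lo, hi = min(treated_scores), max(treated_scores)
    kept = [i for i, p in enumerate(comparison_scores) if lo <= p <= hi]
    return kept, (lo, hi)


def mean_gap(treated_x: Sequence[float], comparison_x: Sequence[float]) -> float:
    """Comparison-group mean minus treatment-group mean for one covariate.
    A crude balance diagnostic, not a formal balancing test."""
    return mean(comparison_x) - mean(treated_x)


# Hypothetical estimated propensity scores and pre-treatment earnings.
treated_scores = [0.40, 0.55, 0.70]
treated_earnings = [3000.0, 2500.0, 1500.0]
comparison_scores = [0.01, 0.05, 0.45, 0.60, 0.90]
comparison_earnings = [30000.0, 25000.0, 4000.0, 2000.0, 500.0]

# Keep only comparison units inside the treated units' score range.
kept, bounds = common_support(treated_scores, comparison_scores)
trimmed_earnings = [comparison_earnings[i] for i in kept]

# Compare covariate balance before and after imposing common support.
gap_full = mean_gap(treated_earnings, comparison_earnings)
gap_trimmed = mean_gap(treated_earnings, trimmed_earnings)

print("kept comparison indices:", kept)
print("support bounds:", bounds)
print("earnings gap, full comparison group:", round(gap_full, 2))
print("earnings gap, common support only:", round(gap_trimmed, 2))
```

In this made-up example, the comparison units with scores far below or above the treated range are discarded, and the gap in mean pre-treatment earnings shrinks once only the retained units are used. In an actual application the scores would be re-estimated for each sample before trimming, and the sketch would need a guard for the case in which no comparison unit falls inside the bounds.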

REFERENCES

Ashenfelter, Orley (1978). "Estimating the Effects of Training Programs on Earnings," Review of Economics and Statistics, 60, 47-57.

------ and D. Card (1985). "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs," Review of Economics and Statistics, 67, 648-660.

Card, David, and Daniel Sullivan (1988). "Measuring the Effect of Subsidized Training Programs on Movements in and out of Employment," Econometrica, 56, 497-530.

Dehejia, Rajeev (2004). "Practical Propensity Score Matching: A Reply to Smith and Todd," forthcoming Journal of Econometrics.

------ and Sadek Wahba (1999). "Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs," Journal of the American Statistical Association, 94, 1053-1062.

------ and ------ (2002). "Propensity Score Matching Methods for Nonexperimental Causal Studies," National Bureau of Economic Research Working Paper No. 6829, forthcoming Review of Economics and Statistics.

Heckman, James, Hidehiko Ichimura, Jeffrey Smith, and Petra Todd (1998). "Characterizing Selection Bias Using Experimental Data," Econometrica, 66, 1017-1098.

Hill, Jennifer, D. B. Rubin, and N. Thomas (2000). "The Design of the New York School Choice Scholarship Program Evaluation," in Leonard Bickman (ed.), Research Design: Donald Campbell's Legacy. New York: Sage Publications.

Lalonde, Robert (1986). "Evaluating the Econometric Evaluations of Training Programs," American Economic Review, 76, 604-620.

Rosenbaum, P., and D. Rubin (1983). "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika, 70, 41-55.

Smith, Jeffrey, and Petra Todd (2004a). "Does Matching Overcome Lalonde's Critique of Nonexperimental Estimators," forthcoming Journal of Econometrics.

------ and ------ (2004b). "Rejoinder," forthcoming Journal of Econometrics.