Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).)


There are two data sets; each has the same treatment group of 185 men.

JTRAIN2 includes 260 men as a control group. Assignment to the training program was random, so N = 445 in total, and a simple comparison of means should be sufficient.

JTRAIN3 uses a control group drawn from the Current Population Survey (CPS). Here there are 2,490 men in the control group, and many of them look nothing like the treated group or the control group from the experiment. In particular, overlap is very poor in JTRAIN3.

. * Use the experimental data first.
. use jtrain2
. des

Contains data from jtrain2.dta
  obs:           445

              storage  display    value
variable name   type   format     label      variable label
------------------------------------------------------------------------
train           byte   %9.0g                 1 if assigned to job training
age             byte   %9.0g                 age in 1977
educ            byte   %9.0g                 years of education
black           byte   %9.0g                 1 if black
hisp            byte   %9.0g                 1 if Hispanic
married         byte   %9.0g                 1 if married
nodegree        byte   %9.0g                 1 if no high school degree
mosinex         byte   %9.0g                 # mnths prior to 1/78 in expmnt
re74            float  %9.0g                 real earns., 1974, $1000s
re75            float  %9.0g                 real earns., 1975, $1000s
re78            float  %9.0g                 real earns., 1978, $1000s
unem74          byte   %9.0g                 1 if unem. all of 1974
unem75          byte   %9.0g                 1 if unem. all of 1975
unem78          byte   %9.0g                 1 if unem. all of 1978

. tab train

  1 if assigned |
to job training |      Freq.     Percent        Cum.
----------------+-----------------------------------
              0 |        260       58.43       58.43
              1 |        185       41.57      100.00
----------------+-----------------------------------
          Total |        445      100.00

. sum re78

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
        re78 |   445    5.300765    6.631493          0    60.3079

. count if re78 == 0
  137

. sum unem74 unem75 re74 re75 educ if train

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      unem74 |   185    .7081081    .4558666          0          1
      unem75 |   185          .6    .4912274          0          1
        re74 |   185    2.095574    4.886623          0    35.0401
        re75 |   185    1.532056    3.219251          0    25.1422
        educ |   185    10.34595     2.01065          4         16

. sum unem74 unem75 re74 re75 educ if ~train

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      unem74 |   260         .75    .4338478          0          1
      unem75 |   260    .6846154    .4655651          0          1
        re74 |   260    2.107027    5.687907          0    39.5707
        re75 |   260    1.266909    3.102983          0     23.032
        educ |   260    10.08846    1.614325          3         14

. di (1.532-1.267)/sqrt(3.219^2 + 3.103^2)
.05926978

. * So the normalized difference for re75 is much less than the Imbens-Rubin
. * rule of thumb (ROT), .25. For other variables, even smaller. So overlap
. * seems fine, as it should with random assignment.
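The normalized-difference check can also be reproduced outside Stata. The following is a minimal Python sketch, not part of the original slides; the function name `normalized_difference` is an assumption made for illustration, and the inputs are the rounded re75 means and standard deviations from the two summary tables.

```python
import math

def normalized_difference(mean_t, mean_c, sd_t, sd_c):
    """Difference in group means scaled by sqrt(sd_t^2 + sd_c^2)."""
    return (mean_t - mean_c) / math.sqrt(sd_t ** 2 + sd_c ** 2)

# re75 in JTRAIN2: treated mean/sd versus control mean/sd (rounded as in the log)
nd_re75 = normalized_difference(1.532, 1.267, 3.219, 3.103)

# Compare with the .25 rule of thumb quoted in the log
print(round(nd_re75, 4), abs(nd_re75) < 0.25)
```

The same function can be applied to any other covariate by passing its group means and standard deviations.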

. reg re78 train, robust

Linear regression
  Number of obs = 445
  F(1, 443)     = 7.
  Prob > F      = 0.0078
  R-squared     = 0.0178
  Root MSE      = 6.5795

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   1.794343   .6708247    2.67   0.008     .4759489    3.112737
       _cons |   4.554802   .3402038   13.39   0.000     3.886188    5.223416

. * Difference of means estimate is $1,794, and statistically significant.
. reg re78 train age educ black hisp re74 re75, robust

Linear regression
  Number of obs = 445
  F(7, 437)     = 3.
  Prob > F      = 0.0014
  R-squared     = 0.0548
  Root MSE      = 6.4988

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   1.680049   .6565083    2.56   0.011     .3897432    2.970356
         age |   .0543452   .0372134    1.46   0.145    -.0187943    .1274847
        educ |   .4035985   .1549643    2.60   0.010     .0990305    .7081665
       black |  -2.180068   1.004082   -2.17   0.030    -4.153499   -.2066377
        hisp |   .1435597   1.352448    0.11   0.916    -2.514552    2.801671
        re74 |   .0833058   .1063871    0.78   0.434    -.1257881    .2923997
        re75 |   .0467654   .1198593    0.39   0.697    -.1888069    .2823377
       _cons |   .6740724   2.325163    0.29   0.772    -3.895821    5.243965

. * Slightly smaller estimate, but could just be sampling error.

. gen avgre = (re74 + re75)/2
. reg re78 train age educ black hisp re74 re75 if avgre <= 15, robust

Linear regression
  Number of obs = 433
  F(7, 425)     = 3.
  Prob > F      = 0.0008
  R-squared     = 0.0487
  Root MSE      = 6.4005

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   1.506995   .6603664    2.28   0.023     .2090038    2.804985
         age |   .0541943   .0373255    1.45   0.147    -.0191714    .1275599
        educ |   .3406206   .1469409    2.32   0.021     .0517992     .629442
       black |  -2.388997   1.028978   -2.32   0.021    -4.411517   -.3664775
        hisp |  -.2516622   1.350757   -0.19   0.852    -2.906659    2.403334
        re74 |  -.0554221   .1346864   -0.41   0.681    -.3201565    .2093123
        re75 |   .2339261   .1708404    1.37   0.172    -.1018711    .5697234
       _cons |   1.599062   2.273637    0.70   0.482    -2.869911    6.068035

. logit train age educ black hisp re74 re75

Logistic regression
  Number of obs  = 445
  LR chi2(6)     = 8.
  Prob > chi2    = 0.2165
  Log likelihood = -297.94654
  Pseudo R2      = 0.0137

       train |      Coef.  Std. Err.       z   P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |   .0121785   .0137264    0.89   0.375    -.0147247    .0390818
        educ |   .0651499   .0556165    1.17   0.241    -.0438564    .1741562
       black |  -.3447694   .3571488   -0.97   0.334    -1.044768    .3552293
        hisp |  -.9181503   .4994623   -1.84   0.066    -1.897079    .0607778
        re74 |  -.0225646   .0251521   -0.90   0.370    -.0718618    .0267326
        re75 |   .0508353    .041872    1.21   0.225    -.0312322    .1329029
       _cons |  -.9758737   .7627859   -1.28   0.201    -2.470907    .5191591

. predict phat
(option p assumed; Pr(train))

. gen kate = (train - phat)*re78/(phat*(1-phat))
. * Average the kate to get ATE:
. sum kate

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
        kate |   445    1.630114    17.80095   -63.3213   133.8003

. * Estimate is 1.63, which is pretty close to the regression adjustment
. * estimate of 1.68.
. reg kate

      Source |            SS    df          MS
-------------+--------------------------------
       Model |             0     0           .
    Residual |        140692   444  316.873873
-------------+--------------------------------
       Total |        140692   444  316.873873

  Number of obs = 445
  F(0, 444)     = 0.
  Prob > F      = .
  R-squared     = 0.0000
  Adj R-squared = 0.0000
  Root MSE      = 17.801

        kate |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       _cons |   1.630114    .843846    1.93   0.054    -.0283146    3.288542
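The propensity-score weighting step amounts to averaging the per-observation term (train - phat)*re78/(phat*(1 - phat)). A minimal Python sketch of that calculation follows; the function names and the four-observation toy data set are assumptions made up for illustration and are not drawn from JTRAIN2.

```python
def ipw_ate_terms(w, y, p):
    """Per-observation terms (w - p) * y / (p * (1 - p)); their mean estimates the ATE."""
    return [(wi - pi) * yi / (pi * (1.0 - pi)) for wi, yi, pi in zip(w, y, p)]

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Toy data (hypothetical): two treated and two control units
w = [1, 1, 0, 0]              # treatment indicator
y = [6.0, 4.0, 3.0, 1.0]      # outcome
p = [0.5, 0.5, 0.5, 0.5]      # constant propensity score

terms = ipw_ate_terms(w, y, p)
ate_hat = mean(terms)

# With a constant propensity score equal to the treated share,
# the weighted estimate equals the simple difference in means.
diff_means = mean([6.0, 4.0]) - mean([3.0, 1.0])
print(ate_hat, diff_means)
```

With an estimated, non-constant propensity score the two numbers generally differ, which is the case in the log.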


. * The unadjusted standard error is .844. Theory tells us this is conservative.
. * Generate score from first-stage logit.
. gen uh = train - phat
. gen ageuh = age*uh
. gen educuh = educ*uh
. gen blackuh = black*uh
. gen hispuh = hisp*uh
. gen re74uh = re74*uh
. gen re75uh = re75*uh

. reg kate uh-re75uh

      Source |            SS    df          MS
-------------+--------------------------------
       Model |    58831.7764     7  8404.53949
    Residual |    81860.2234   437  187.323166
-------------+--------------------------------
       Total |        140692   444  316.873873

  Number of obs = 445
  F(7, 437)     = 44.
  Prob > F      = 0.0000
  R-squared     = 0.4182
  Adj R-squared = 0.4088
  Root MSE      = 13.687

        kate |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
          uh |   5.073816   10.29208    0.49   0.622    -15.15432    25.30195
       ageuh |   .3430616   .1898331    1.81   0.071    -.0300378     .716161
      educuh |   1.451498   .7301296    1.99   0.047     .0164955      2.8865
     blackuh |  -8.287474   4.942472   -1.68   0.094    -18.00144    1.426496
      hispuh |   5.940011    6.83781    0.87   0.385     -7.49907    19.37909
      re74uh |   .2905498   .3640986    0.80   0.425    -.4250523    1.006152
      re75uh |   .1264086   .6001848    0.21   0.833    -1.053199    1.306016
       _cons |   1.630114   .6488073    2.51   0.012     .3549433    2.905285

. predict ehat, resid
. * The residuals are kate with the score netted out.
. sum ehat

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
        ehat |   445    1.76e-08    13.57829  -57.20093   119.8001

. di 13.58/sqrt(445)
.64375374

. * .644 is the adjusted standard error, compared with .844.
. * New t statistic is
. di 1.63/.644
2.5310559

. * Much closer to the regression adjustment t statistic.
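Both standard errors in this step are a sample standard deviation divided by the square root of N. A short Python sketch of the arithmetic, using only numbers printed in the log (the variable names are illustrative assumptions):

```python
import math

n = 445
sd_kate = 17.80095     # Std. Dev. of kate, from "sum kate"
sd_ehat = 13.57829     # Std. Dev. of ehat, from "sum ehat"
ate_hat = 1.630114     # mean of kate

se_unadjusted = sd_kate / math.sqrt(n)   # ignores estimation of the propensity score
se_adjusted = sd_ehat / math.sqrt(n)     # score from the first-stage logit netted out
t_adjusted = ate_hat / se_adjusted

print(round(se_unadjusted, 3), round(se_adjusted, 3), round(t_adjusted, 2))
```

The small gap between this t statistic and the 2.531 shown in the log comes from the rounded inputs (13.58, 1.63, .644) used in the `di` commands.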

. * What about regression on the propensity score?
. reg re78 train phat, robust

Linear regression
  Number of obs = 445
  F(2, 442)     = 4.
  Prob > F      = 0.0138
  R-squared     = 0.0218
  Root MSE      = 6.5738

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   1.679599   .6626177    2.53   0.012     .3773257    2.981871
        phat |   6.295762   3.968517    1.59   0.113    -1.503745    14.09527
       _cons |   1.985166   1.662229    1.19   0.233    -1.281689     5.25202

. sum phat

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
        phat |   445    .4157303    .0666689   .2023731   .6781313

. gen train_phat = train*(phat - .456)
. reg re78 train phat train_phat

      Source |            SS    df          MS
-------------+--------------------------------
       Model |    560.349421     3   186.78314
    Residual |    18965.3072   441  43.0052318
-------------+--------------------------------
       Total |    19525.6566   444  43.9767041

  Number of obs = 445
  F(3, 441)     = 4.
  Prob > F      = 0.0050
  R-squared     = 0.0287
  Adj R-squared = 0.0221
  Root MSE      = 6.5578

        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |    2.31026   .7289971    3.17   0.002     .8775199       3.743
        phat |  -.6519878    6.12468   -0.11   0.915    -12.68918     11.3852
  train_phat |   17.01614   9.584992    1.78   0.077    -1.821804    35.85408
       _cons |   4.820913   2.532676    1.90   0.058    -.1567015    9.798528
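The interaction term here is centered at .456, whereas the sample mean of phat reported by `sum phat` is .4157. Under that reading, the coefficient on train (2.31) is the estimated effect at phat = .456, not at the sample mean. A minimal Python sketch of the implied effect at the mean of phat, using only coefficients printed in the log (the variable names are illustrative assumptions):

```python
b_train = 2.31026        # coefficient on train
b_interact = 17.01614    # coefficient on train_phat = train*(phat - .456)
center = 0.456           # centering value used in the gen command
mean_phat = 0.4157303    # sample mean of phat from "sum phat"

# Effect of train evaluated at phat = mean_phat
effect_at_mean = b_train + b_interact * (mean_phat - center)
print(round(effect_at_mean, 3))
```

Evaluated this way the estimate lands near the other ATE estimates for the experimental sample (about 1.63 to 1.68); no standard error is computed here.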

. * Now nearest neighbor matching. Just one nearest neighbor, ATE.
. * Actually, the variance is computed as if we want the sample
. * ATE (which is the same estimate as for population ATE, but
. * calculation of standard error is easier).
. nnmatch re78 train age educ black hisp re74 re75

Matching estimator: Average Treatment Effect
Weighting matrix: inverse variance
  Number of obs         = 445
  Number of matches (m) = 1

        re78 |      Coef.  Std. Err.       z   P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
        SATE |   1.628099   .7782999    2.09   0.036     .1026589    3.153538

Matching variables: age educ black hisp re74 re75

. * Almost identical to other estimates; somewhat less precise.

. * Nonexperimental data: control group is drawn from CPS.
. use jtrain3
. tab train

    1 if in job |
       training |      Freq.     Percent        Cum.
----------------+-----------------------------------
              0 |      2,490       93.08       93.08
              1 |        185        6.92      100.00
----------------+-----------------------------------
          Total |      2,675      100.00

. sum re78

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
        re78 |  2675    20.50238    15.63252          0    121.174

. sum unem74 unem75 re74 re75 educ if train

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      unem74 |   185    .7081081    .4558666          0          1
      unem75 |   185          .6    .4912274          0          1
        re74 |   185    2.095574    4.886623          0    35.0401
        re75 |   185    1.532056    3.219251          0    25.1422
        educ |   185    10.34595     2.01065          4         16

. sum unem74 unem75 re74 re75 educ if ~train

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
      unem74 |  2490    .0863454    .2809298          0          1
      unem75 |  2490          .1    .3000603          0          1
        re74 |  2490    19.42875    13.40688          0    137.149
        re75 |  2490    19.06334    13.59695          0    156.653
        educ |  2490    12.11687    3.082435          0         17

. * Lack of overlap is clearly a problem.
. di (1.532-19.063)/sqrt(3.22^2 + 13.60^2)
-1.2543652

. * The normalized difference for re75 is more than one in absolute
. * value.
. * Can further see the problem if we ask: What if we try to match on
. * the propensity score?

[Figure: density of the estimated propensity score, Pr(train), for the train = 0 group. Vertical axis: Density; horizontal axis: Pr(train).]

[Figure: density of the estimated propensity score, Pr(train), for the train = 1 group. Vertical axis: Density; horizontal axis: Pr(train).]

. * Comparison of means gives very different result now.
. reg re78 train, robust

Linear regression
  Number of obs = 2675
  F(1, 2673)    = 537.
  Prob > F      = 0.0000
  R-squared     = 0.0609
  Root MSE      = 15.152

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |  -15.20478   .6559143  -23.18   0.000    -16.49093   -13.91863
       _cons |   21.55392    .311785   69.13   0.000     20.94256    22.16529

. * Regression adjustment, also controlling for marital status (which matters
. * now):
. reg re78 train age educ black hisp married re74 re75, robust

Linear regression
  Number of obs = 2675
  F(8, 2666)    = 253.
  Prob > F      = 0.0000
  R-squared     = 0.5863
  Root MSE      = 10.

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   .8597703   .7665736    1.12   0.262    -.6433687    2.362909
         age |   -.081537    .020672   -3.94   0.000    -.1220718   -.0410022
        educ |   .5280233    .088394    5.97   0.000     .3546957     .701351
       black |  -.5427091   .4421585   -1.23   0.220    -1.409717    .3242993
        hisp |   2.165568   1.218258    1.78   0.076    -.2232582    4.554394
     married |   1.220271    .496305    2.46   0.014     .2470896    2.193453
        re74 |   .2778865   .0617851    4.50   0.000      .156735     .399038
        re75 |   .5681222   .0665303    8.54   0.000     .4376661    .6985784
       _cons |   .7767343   1.485113    0.52   0.601    -2.135356    3.688824

. * Interactions with continuous variables don't help much:
. sum age educ re74 re75

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
         age |  2675    34.22579    10.49984         17         55
        educ |  2675    11.99439    3.053556          0         17
        re74 |  2675       18.23    13.72225          0    137.149
        re75 |  2675    17.85089    13.87778          0    156.653

. gen train_age = train*(age - 34.23)
. gen train_educ = train*(educ - 12)
. gen train_re74 = train*(re74 - 18.23)
. gen train_re75 = train*(re75 - 17.85)

. reg re78 train age educ black hisp married re74 re75 train_age train_educ train_re74 train_re75, robust

Linear regression
  Number of obs = 2675
  F(12, 2662)   = 184.
  Prob > F      = 0.0000
  R-squared     = 0.5884
  Root MSE      = 10.052

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   -9.41501   3.215613   -2.93   0.003    -15.72036   -3.109657
         age |  -.0903819   .0213275   -4.24   0.000    -.1322021   -.0485617
        educ |   .5110259    .091871    5.56   0.000     .3308801    .6911716
       black |  -.5403092   .4436272   -1.22   0.223    -1.410198    .3295797
        hisp |   2.305321    1.20601    1.91   0.056    -.0594897    4.670132
     married |   1.316718   .4939862    2.67   0.008     .3480827    2.285354
        re74 |   .2853666   .0630723    4.52   0.000     .1616909    .4090423
        re75 |    .566954   .0677195    8.37   0.000     .4341659    .6997422
   train_age |    .178565   .0661498    2.70   0.007     .0488548    .3082751
  train_educ |   .1826352   .2581829    0.71   0.479    -.3236241    .6888945
  train_re74 |  -.2404972   .2528187   -0.95   0.342    -.7362382    .2552438
  train_re75 |  -.5060853   .1970196   -2.57   0.010    -.8924123   -.1197582
       _cons |   1.079186   1.543661    0.70   0.485    -1.947709    4.106082


. * What if we drop all observations with (re74 + re75)/2 > 15? Comparison
. * of means still does not "work":
. reg re78 train if avgre <= 15

      Source |            SS    df          MS
-------------+--------------------------------
       Model |    3811.02398     1  3811.02398
    Residual |    118362.101  1160  102.036294
-------------+--------------------------------
       Total |    122173.125  1161  105.230943

  Number of obs = 1162
  F(1, 1160)    = 37.
  Prob > F      = 0.0000
  R-squared     = 0.0312
  Adj R-squared = 0.0304
  Root MSE      = 10.101

        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |  -5.005321   .8190085   -6.11   0.000    -6.612224   -3.398417
       _cons |    11.1906   .3223455   34.72   0.000     10.55816    11.82305

. * But regression adjustment does.
. reg re78 train age educ black hisp married re74 re75 if avgre <= 15, robust

Linear regression
  Number of obs = 1162
  F(8, 1153)    = 82.
  Prob > F      = 0.0000
  R-squared     = 0.2797
  Root MSE      = 8.7365

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   2.059039   .8011361    2.57   0.010     .4871915    3.630887
         age |  -.1016697   .0238413   -4.26   0.000    -.1484468   -.0548925
        educ |   .4046894    .099396    4.07   0.000     .2096721    .5997068
       black |  -1.226412   .5942647   -2.06   0.039    -2.392374   -.0604508
        hisp |   .2360508   .9030373    0.26   0.794     -1.53573    2.007831
     married |   1.841522   .5986858    3.08   0.002     .6668861    3.016157
        re74 |   .2466141   .0857127    2.88   0.004     .0784438    .4147844
        re75 |   .6600326   .0819874    8.05   0.000     .4991713    .8208939
       _cons |   2.100924   1.639151    1.28   0.200     -1.11513    5.316978

. * But all is not well: matching using the restricted data set does not
. * produce a positive effect:
. nnmatch re78 train age educ black hisp married re74 re75 if avgre <= 15

Matching estimator: Average Treatment Effect
Weighting matrix: inverse variance
  Number of obs         = 1162
  Number of matches (m) = 1

        re78 |      Coef.  Std. Err.       z   P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
        SATE |  -3.846288   2.495099   -1.54   0.123    -8.736593    1.044017

Matching variables: age educ black hisp married re74 re75

. * Now estimate the propensity score by logit using all data:
. logit train age educ black hisp married re74 re75

Logistic regression
  Number of obs  = 2675
  LR chi2(7)     = 872.
  Prob > chi2    = 0.0000
  Log likelihood = -236.23799
  Pseudo R2      = 0.6488

       train |      Coef.  Std. Err.       z   P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |  -.0840291    .014761   -5.69   0.000    -.1129601    -.055098
        educ |  -.0624764   .0513973   -1.22   0.224    -.1632134    .0382605
       black |   2.242955   .3176941    7.06   0.000     1.620286    2.865624
        hisp |   2.094338    .558456    3.75   0.000     .9997842    3.188891
     married |  -1.588358   .2602447   -6.10   0.000    -2.098428   -1.078287
        re74 |   -.117043   .0293604   -3.99   0.000    -.1745882   -.0594977
        re75 |  -.2577589   .0394991   -6.53   0.000    -.3351758   -.1803421
       _cons |   2.302714   .9112558    2.53   0.012     .5166855    4.088743

Note: 158 failures and 0 successes completely determined.

. * It is not good to have failures or successes completely determined.
. * In effect, p(x) = 0 for some values of x.

. predict phat
(option pr assumed; Pr(train))

. sum phat

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
        phat |  2675    .0691589    .1955842   6.84e-27   .9358265

. count if phat < .0000001
  254

. reg re78 train age educ black hisp married re74 re75 if phat > .1 & phat < .9, robust

Linear regression
  Number of obs = 309
  F(8, 300)     = 6.
  Prob > F      = 0.0000
  R-squared     = 0.0798
  Root MSE      = 7.1833

             |             Robust
        re78 |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       train |   1.041732    .863965    1.21   0.229    -.6584677    2.741931
         age |  -.0595533   .0437952   -1.36   0.175     -.145738    .0266314
        educ |   .5206588   .1728426    3.01   0.003     .1805213    .8607963
       black |  -.0656901   1.267393   -0.05   0.959    -2.559797    2.428417
        hisp |  -.4210468   1.842722   -0.23   0.819    -4.047344    3.205251
     married |   1.075984   .9412249    1.14   0.254    -.7762551    2.928224
        re74 |   .1439033   .1313233    1.10   0.274    -.1145283    .4023349
        re75 |   .2658034   .1420903    1.87   0.062    -.0138165    .5454233
       _cons |   .9472794   3.023905    0.31   0.754    -5.003472    6.898031

. gen kate = (train - phat)*re78/(phat*(1 - phat))
. sum train

    Variable |   Obs        Mean   Std. Dev.        Min        Max
-------------+----------------------------------------------------
       train |  2675    .0691589    .2537716          0          1

. gen katt = (train - phat)*re78/(.06916*(1 - phat))
. reg kate

      Source |            SS    df          MS
-------------+--------------------------------
       Model |             0     0           .
    Residual |    2.6303e+09  2674  983639.309
-------------+--------------------------------
       Total |    2.6303e+09  2674  983639.309

  Number of obs = 2675
  F(0, 2674)    = 0.
  Prob > F      = .
  R-squared     = 0.0000
  Adj R-squared = 0.0000
  Root MSE      = 991.

        kate |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       _cons |   11.02854   19.17591    0.58   0.565    -26.57258    48.62966

. reg katt if train

      Source |            SS    df          MS
-------------+--------------------------------
       Model |             0     0           .
    Residual |    2381062.76   184  12940.5585
-------------+--------------------------------
       Total |    2381062.76   184  12940.5585

  Number of obs = 185
  F(0, 184)     = 0.
  Prob > F      = .
  R-squared     = 0.0000
  Adj R-squared = 0.0000
  Root MSE      = 113.

        katt |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       _cons |   91.80372    8.36355   10.98   0.000     75.30294    108.3045

. * The PS weighted estimate of ATT seems unbelievably large and
. * much too significant.
. * Should redo the analysis with observations having phat > .1 & phat < .9,
. * but can get some idea what restricting the sample will do:
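One way to read the 91.8 figure: for a treated observation, (1 - phat) cancels in katt, so the term reduces to re78/.06916. Averaging katt over the train = 1 subsample therefore returns the treated mean of re78 divided by the treated share, whereas the weighting estimator of the ATT is usually written as an average over the full sample. The Python sketch below illustrates the distinction; the function name and the toy data are assumptions made up for illustration, and the final check uses only coefficients printed in the log.

```python
def att_terms(w, y, p, rho):
    """Per-observation terms (w - p) * y / (rho * (1 - p)), with rho the treated share."""
    return [(wi - pi) * yi / (rho * (1.0 - pi)) for wi, yi, pi in zip(w, y, p)]

# Toy data (hypothetical): two treated and two control units
w = [1, 1, 0, 0]
y = [6.0, 4.0, 3.0, 1.0]
p = [0.5, 0.5, 0.5, 0.5]
rho = sum(w) / len(w)

terms = att_terms(w, y, p, rho)
att_full_sample = sum(terms) / len(terms)            # average over all units

treated_terms = [t for t, wi in zip(terms, w) if wi == 1]
att_treated_only = sum(treated_terms) / len(treated_terms)   # equals mean(y | w=1) / rho

# Check against the log: treated mean of re78 (from the JTRAIN3 regression of
# re78 on train: _cons + train) divided by the treated share .06916
treated_mean_re78 = 21.55392 - 15.20478
implied = treated_mean_re78 / 0.06916
print(att_full_sample, att_treated_only, round(implied, 2))
```

In the toy data the full-sample average is 3.0 (the difference in means), while the treated-only average is 10.0; the last line reproduces the 91.8 reported for `reg katt if train`.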

. reg kate if phat > .1 & phat < .9

      Source |            SS    df          MS
-------------+--------------------------------
       Model |             0     0           .
    Residual |    118555.189   308  384.919444
-------------+--------------------------------
       Total |    118555.189   308  384.919444

  Number of obs = 309
  F(0, 308)     = 0.
  Prob > F      = .
  R-squared     = 0.0000
  Adj R-squared = 0.0000
  Root MSE      = 19.619

        kate |      Coef.  Std. Err.       t   P>|t|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
       _cons |   1.024012   1.116107    0.92   0.360    -1.172147     3.22017

. * This is remarkably similar to the regression estimate when restricted
. * to .1 < phat < .9.
. * General point: when the sample has been balanced, method of
. * estimation is much less important.
. * Note that many fewer observations are lost by choosing sample based on
. * avgre, rather than the Imbens/Rubin propensity score rule.
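The propensity-score trimming rule used for the restricted sample can be sketched in a few lines of Python. The function name, the record layout, and the toy scores are assumptions for illustration; the strict inequalities are also an assumption, since the comparison operators are not legible in the transcribed log.

```python
def trim_by_pscore(rows, lo=0.1, hi=0.9):
    """Keep rows whose estimated propensity score lies strictly inside (lo, hi)."""
    return [r for r in rows if lo < r["phat"] < hi]

# Toy records (hypothetical scores, not from JTRAIN3)
rows = [
    {"id": 1, "phat": 0.02},
    {"id": 2, "phat": 0.10},
    {"id": 3, "phat": 0.35},
    {"id": 4, "phat": 0.89},
    {"id": 5, "phat": 0.95},
]

kept = trim_by_pscore(rows)
kept_ids = [r["id"] for r in kept]
print(kept_ids)
```

Any estimator (regression adjustment, weighting, matching) would then be run on the retained rows only.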