Truncated regression: simulated data


. * Data generation
. set seed 26091952
. set obs 48
obs was 0, now 48
. gen age=_n+17
. gen yhat=2000+200*(age-18)
. gen wage = yhat + 2000*invnorm(uniform())
. replace wage=max(0,wage)
(0 real changes made)
. sum wage

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        wage |      48    6438.898   3330.226   317.7541   12083.48

. graph twoway scatter wage age, ytitle("einkommen (Euro)") xtitle("alter in Jahren") legend(off) yline(7000, lc(red)) xline(25 45, lc(green)) || lfit wage age, clc(blue) || lfit wage age if wage<7000, clc(red) || lfit wage age if age>24 & age<46, clc(green)

. reg wage age

      Source |       SS       df       MS              Number of obs =      48
-------------+------------------------------           F(  1,    46) =   71.25
       Model |   316748551     1   316748551           Prob > F      =  0.0000
    Residual |   204500464    46  4445662.27           R-squared     =  0.6077
-------------+------------------------------           Adj R-squared =  0.5991
       Total |   521249015    47  11090404.6           Root MSE      =  2108.5

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   185.4302   21.96804     8.44   0.000     141.2108    229.6495
       _cons |  -1256.453   961.1278    -1.31   0.198    -3191.103    678.1966
------------------------------------------------------------------------------

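. truncreg wage age if wage<7000, ul(7000)
(note: 0 obs. truncated)

Fitting full model:

Iteration 0:   log likelihood = -238.73319
Iteration 1:   log likelihood = -237.46611
Iteration 2:   log likelihood = -237.36871
Iteration 3:   log likelihood = -237.36804
Iteration 4:   log likelihood = -237.36804

Truncated regression
Limit:   lower =       -inf                             Number of obs =     27
         upper =       7000                             Wald chi2(1)  =   4.47
Log likelihood = -237.36804                             Prob > chi2   = 0.0344

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
         age |   148.7481   70.33039     2.11   0.034     10.90309    286.5932
       _cons |  -30.02341   1966.042    -0.02   0.988    -3883.394    3823.347
-------------+----------------------------------------------------------------
sigma        |
       _cons |   2303.502   531.3692     4.34   0.000     1262.038    3344.967
------------------------------------------------------------------------------

. reg wage age if wage<7000

      Source |       SS       df       MS              Number of obs =      27
-------------+------------------------------           F(  1,    25) =    5.88
       Model |  20957500.4     1  20957500.4           Prob > F      =  0.0229
    Residual |  89100407.1    25  3564016.29           R-squared     =  0.1904
-------------+------------------------------           Adj R-squared =  0.1580
       Total |   110057908    26  4232996.44           Root MSE      =  1887.9

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   78.55718    32.3956     2.42   0.023      11.8372    145.2772
       _cons |   1391.272   1129.105     1.23   0.229    -934.1642    3716.707
------------------------------------------------------------------------------

. reg wage age if age>24 & age<46

      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  1,    19) =    2.72
       Model |  12630738.8     1  12630738.8           Prob > F      =  0.1158
    Residual |  88384223.6    19  4651801.24           R-squared     =  0.1250
-------------+------------------------------           Adj R-squared =  0.0790
       Total |   101014962    20  5050748.12           Root MSE      =  2156.8

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   128.0764    77.7258     1.65   0.116     -34.6056    290.7583
       _cons |   883.5241   2760.816     0.32   0.752    -4894.931    6661.979
------------------------------------------------------------------------------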
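
The truncreg call with ul(7000) can be read against the standard truncated normal regression model. The block below is only a sketch under the usual textbook assumptions (linear mean, normal homoskedastic errors, known upper truncation point c = 7000). The symbols y, x, beta, sigma, phi and Phi are notation introduced here and do not appear in the log.

```latex
\begin{align*}
y_i &= x_i\beta + u_i, \qquad u_i \sim N(0,\sigma^2), \qquad
      y_i \text{ observed only if } y_i < c \\[4pt]
f(y_i \mid x_i,\; y_i < c) &=
      \frac{\tfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - x_i\beta}{\sigma}\right)}
           {\Phi\!\left(\dfrac{c - x_i\beta}{\sigma}\right)} \\[4pt]
\ln L(\beta,\sigma) &= \sum_{i}\left[
      -\ln\sigma
      + \ln\phi\!\left(\frac{y_i - x_i\beta}{\sigma}\right)
      - \ln\Phi\!\left(\frac{c - x_i\beta}{\sigma}\right)\right]
\end{align*}
```

In the simulation the true slope is 200 and the true error standard deviation is 2000. OLS on the truncated sample (wage<7000) gives a slope of about 78.6, whereas the truncated regression gives about 148.7 with an estimated sigma of about 2303.5, which is closer to the values used to generate the data.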

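MROZ.DTA: Sample selection with Heckman correction

. use "C:\hja\lehre\daten\wooldridge\stata\MROZ.DTA", clear
. bysort inlf: summ lwage

-> inlf = 0

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       lwage |       0

-> inlf = 1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       lwage |     428    1.190173   .7231978  -2.054164   3.218876

. tab inlf

   =1 if in |
  lab frce, |
       1975 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        325       43.16       43.16
          1 |        428       56.84      100.00
------------+-----------------------------------
      Total |        753      100.00

. reg lwage educ exper expersq

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  3,   424) =   26.29
       Model |  35.0222967     3  11.6740989           Prob > F      =  0.0000
    Residual |  188.305144   424  .444115906           R-squared     =  0.1568
-------------+------------------------------           Adj R-squared =  0.1509
       Total |  223.327441   427  .523015084           Root MSE      =  .66642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1074896   .0141465     7.60   0.000     .0796837    .1352956
       exper |   .0415665   .0131752     3.15   0.002     .0156697    .0674633
     expersq |  -.0008112   .0003932    -2.06   0.040    -.0015841   -.0000382
       _cons |  -.5220406   .1986321    -2.63   0.009    -.9124667   -.1316144
------------------------------------------------------------------------------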

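. heckman lwage educ exper expersq, select(inlf = nwifeinc educ exper expersq age kidslt6 kidsge6) twostep

Heckman selection model -- two-step estimates   Number of obs      =       753
(regression model with sample selection)        Censored obs       =       325
                                                Uncensored obs     =       428

                                                Wald chi2(6)       =    180.10
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage        |
        educ |   .1090655    .015523     7.03   0.000     .0786411      .13949
       exper |   .0438873   .0162611     2.70   0.007     .0120163    .0757584
     expersq |  -.0008591   .0004389    -1.96   0.050    -.0017194    1.15e-06
       _cons |  -.5781032   .3050062    -1.90   0.058    -1.175904     .019698
-------------+----------------------------------------------------------------
inlf         |
    nwifeinc |  -.0120237   .0048398    -2.48   0.013    -.0215096   -.0025378
        educ |   .1309047   .0252542     5.18   0.000     .0814074     .180402
       exper |   .1233476   .0187164     6.59   0.000     .0866641    .1600311
     expersq |  -.0018871      .0006    -3.15   0.002     -.003063   -.0007111
         age |  -.0528527   .0084772    -6.23   0.000    -.0694678   -.0362376
     kidslt6 |  -.8683285   .1185223    -7.33   0.000    -1.100628    -.636029
     kidsge6 |    .036005   .0434768     0.83   0.408     -.049208    .1212179
       _cons |   .2700768    .508593     0.53   0.595    -.7267473    1.266901
-------------+----------------------------------------------------------------
mills        |
      lambda |   .0322619   .1336246     0.24   0.809    -.2296376    .2941613
-------------+----------------------------------------------------------------
         rho |    0.04861
       sigma |  .66362875
      lambda |  .03226186   .1336246
------------------------------------------------------------------------------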

MROZ.DTA: Sample selection with Heckman correction, computed by hand

. probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

Iteration 0:   log likelihood =  -514.8732
Iteration 1:   log likelihood = -405.78215
Iteration 2:   log likelihood = -401.32924
Iteration 3:   log likelihood = -401.30219
Iteration 4:   log likelihood = -401.30219

Probit estimates                                  Number of obs   =        753
                                                  LR chi2(7)      =     227.14
                                                  Prob > chi2     =     0.0000
Log likelihood = -401.30219                       Pseudo R2       =     0.2206

------------------------------------------------------------------------------
        inlf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    nwifeinc |  -.0120237   .0048398    -2.48   0.013    -.0215096   -.0025378
        educ |   .1309047   .0252542     5.18   0.000     .0814074     .180402
       exper |   .1233476   .0187164     6.59   0.000     .0866641    .1600311
     expersq |  -.0018871      .0006    -3.15   0.002     -.003063   -.0007111
         age |  -.0528527   .0084772    -6.23   0.000    -.0694678   -.0362376
     kidslt6 |  -.8683285   .1185223    -7.33   0.000    -1.100628    -.636029
     kidsge6 |    .036005   .0434768     0.83   0.408     -.049208    .1212179
       _cons |   .2700768    .508593     0.53   0.595    -.7267473    1.266901
------------------------------------------------------------------------------

. * Compute the linear predictor: z*gamma
. predict lp, xb
. summ lp

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
          lp |     753    .2058399   .8213258  -2.810314   2.051779

. * Compute the inverse Mills ratio
. gen mills=normden(lp) / (norm(lp))
. reg lwage educ exper expersq mills

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  4,   423) =   19.69
       Model |  35.0479487     4  8.76198719           Prob > F      =  0.0000
    Residual |  188.279492   423  .445105182           R-squared     =  0.1569
-------------+------------------------------           Adj R-squared =  0.1490
       Total |  223.327441   427  .523015084           Root MSE      =  .66716

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1090655   .0156096     6.99   0.000     .0783835    .1397476
       exper |   .0438873   .0163534     2.68   0.008     .0117434    .0760313
     expersq |  -.0008591   .0004414    -1.95   0.052    -.0017267    8.49e-06
       mills |   .0322619   .1343877     0.24   0.810    -.2318889    .2964126
       _cons |  -.5781032    .306723    -1.88   0.060    -1.180994    .0247879
------------------------------------------------------------------------------

. * The standard errors still need to be adjusted!
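
The two manual steps (probit, then OLS with the generated variable mills) correspond to the usual two-step selection model. The block below is a sketch assuming joint normality of the two error terms. The symbols z, gamma, x, beta, v, rho, sigma and lambda are notation chosen here; in the log, lp plays the role of the fitted z*gamma, normden() is the standard normal density and norm() its distribution function.

```latex
\begin{align*}
\text{selection:}\quad
  & \mathit{inlf}_i = 1\,[\,z_i\gamma + v_i > 0\,], \qquad v_i \sim N(0,1) \\[4pt]
\text{inverse Mills ratio:}\quad
  & \lambda(z_i\gamma) = \frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)} \\[4pt]
\text{wage equation on the selected sample:}\quad
  & E[\,\mathit{lwage}_i \mid x_i,\ \mathit{inlf}_i = 1\,]
    = x_i\beta + \rho\,\sigma\,\lambda(z_i\gamma)
\end{align*}
```

Under this reading, the coefficient on mills estimates rho*sigma. The coefficients from the manual regression agree with the heckman two-step output (for example .1090655 on educ), but the OLS standard errors treat mills as observed data rather than an estimate and assume homoskedastic errors. That is why they differ slightly from the heckman ones (.0156096 versus .015523 for educ) and why the final comment in the log asks for an adjustment.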

Simulated data: Sample selection with Heckman correction

. corr2data lwage educ s, n(100) corr(1, .65, .8 \ .65, 1, .78 \ .8, .78, 1) sds(.7 2.3 1) means(1.2 12.3 1.2)
(obs 100)
. gen sel=(s>0)
. tab sel

        sel |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         13       13.00       13.00
          1 |         87       87.00      100.00
------------+-----------------------------------
      Total |        100      100.00

. gen sel_s=s if sel
(13 missing values generated)
. gen nsel_s=s if sel==0
(87 missing values generated)
. graph twoway scatter sel_s nsel_s educ, yline(0, lc(red)) msize(small medlarge) legend(off) clc(blue red) ytitle("arbeitsangebot") xtitle("ausbildung") || lfit s educ, clc(blue)
. gen sel_lwage=lwage if sel
(13 missing values generated)
. gen nsel_lwage=lwage if sel==0
(87 missing values generated)
. label variable sel_lwage "erwerbstätig"
. label variable nsel_lwage " nicht erwerbstätig"
. graph twoway scatter sel_lwage nsel_lwage educ, msize(small medlarge) xtitle("ausbildung") ytitle("logarithm Lohnofferte") || lfit lwage educ, clc(blue) || lfit lwage educ if sel, clc(red) legend(off)

. reg lwage educ

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  1,    98) =   71.70
       Model |  20.4954768     1  20.4954768           Prob > F      =  0.0000
    Residual |  28.0145259    98  .285862509           R-squared     =  0.4225
-------------+------------------------------           Adj R-squared =  0.4166
       Total |  48.5100027    99  .490000027           Root MSE      =  .53466

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1978261   .0233632     8.47   0.000     .1514625    .2441897
       _cons |  -1.233261   .2922994    -4.22   0.000     -1.81332   -.6532023
------------------------------------------------------------------------------

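. reg lwage educ if sel

      Source |       SS       df       MS              Number of obs =      87
-------------+------------------------------           F(  1,    85) =   35.55
       Model |  9.74115817     1  9.74115817           Prob > F      =  0.0000
    Residual |  23.2914145    85  .274016641           R-squared     =  0.2949
-------------+------------------------------           Adj R-squared =  0.2866
       Total |  33.0325726    86  .384099682           Root MSE      =  .52347

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1618271   .0271415     5.96   0.000     .1078624    .2157917
       _cons |   -.722441   .3504985    -2.06   0.042    -1.419326    -.025556
------------------------------------------------------------------------------

. gen hwage=lwage if sel
(13 missing values generated)
. heckman hwage educ, select(sel = educ) twostep

Heckman selection model -- two-step estimates   Number of obs      =       100
(regression model with sample selection)        Censored obs       =        13
                                                Uncensored obs     =        87

                                                Wald chi2(2)       =     34.40
                                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
hwage        |
        educ |   .1763978    .044358     3.98   0.000     .0894577    .2633379
       _cons |  -.9295219   .6097023    -1.52   0.127    -2.124516    .2654726
-------------+----------------------------------------------------------------
sel          |
        educ |   .5324168   .1234912     4.31   0.000     .2903785     .774455
       _cons |  -4.713984   1.277398    -3.69   0.000    -7.217638   -2.210331
-------------+----------------------------------------------------------------
mills        |
      lambda |   .1438366   .3461317     0.42   0.678     -.534569    .8222422
-------------+----------------------------------------------------------------
         rho |    0.27647
       sigma |  .52026357
      lambda |  .14383663   .3461317
------------------------------------------------------------------------------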
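
The last three rows of the heckman output are linked by a simple identity, which can be checked by hand. The sketch below assumes the usual two-step parameterization, in which lambda is the coefficient on the inverse Mills ratio; the hats are notation added here.

```latex
\[
\hat\lambda \;=\; \hat\rho\,\hat\sigma
\;\approx\; 0.27647 \times 0.52026
\;\approx\; 0.1438
\]
```

This matches the reported lambda of .14383663 up to rounding of rho. For the slope on educ, the selected-sample OLS estimate (.1618) lies below the full-sample estimate (.1978), and the Heckman two-step estimate (.1764) lies between the two.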