Truncated regression: simulated data

. * Data generation
. set seed
. set obs 48
obs was 0, now 48
. gen age=_n+17
. gen yhat= *(age-18)
. gen wage = yhat *invnorm(uniform())
. replace wage=max(0,wage)
(0 real changes made)

. sum wage

    Variable |   Obs    Mean    Std. Dev.    Min    Max
-------------+-----------------------------------------
        wage |

. graph twoway scatter wage age, ytitle("einkommen (Euro)")
>     xtitle("alter in Jahren") legend(off) yline(7000, lc(red))
>     xline(25 45, lc(green))
>     || lfit wage age, clc(blue)
>     || lfit wage age if wage<7000, clc(red)
>     || lfit wage age if age>24 & age<46, clc(green)

. reg wage age

      Source |   SS    df    MS          Number of obs =
-------------+-------------------        F(  1,    46) =   71.25
       Model |                           Prob > F      =
    Residual |                           R-squared     =
-------------+-------------------        Adj R-squared =
       Total |                           Root MSE      =

        wage |   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------
         age |
       _cons |

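. truncreg wage age if wage<7000, ul(7000)
(note: 0 obs. truncated)

Fitting full model:

Iteration 0:   log likelihood =
Iteration 1:   log likelihood =
Iteration 2:   log likelihood =
Iteration 3:   log likelihood =
Iteration 4:   log likelihood =

Truncated regression
Limit:   lower =   -inf                  Number of obs =      27
         upper =   7000                  Wald chi2(1)  =    4.47
Log likelihood =                         Prob > chi2   =

        wage |   Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------
eq1          |
         age |
       _cons |
-------------+-------------------------------------------------------
sigma        |
       _cons |

. reg wage age if wage<7000

      Source |   SS    df    MS          Number of obs =
-------------+-------------------        F(  1,    25) =    5.88
       Model |                           Prob > F      =
    Residual |                           R-squared     =
-------------+-------------------        Adj R-squared =
       Total |                           Root MSE      =

        wage |   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------
         age |
       _cons |

. reg wage age if age>24 & age<46

      Source |   SS    df    MS          Number of obs =
-------------+-------------------        F(  1,    19) =    2.72
       Model |                           Prob > F      =
    Residual |                           R-squared     =
-------------+-------------------        Adj R-squared =
       Total |                           Root MSE      =

        wage |   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------
         age |
       _cons |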
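For orientation, here is a sketch of the model that a truncated regression with an upper limit fits, under the usual assumption of normal errors. The symbols ($x_i$, $\beta$, $\sigma$, $c$, $\alpha_i$) are notation introduced here and do not appear in the log; $c$ stands for the limit passed as `ul(7000)`.

```latex
\[
\begin{aligned}
y_i &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^2),
      \qquad y_i \text{ observed only if } y_i < c, \\[4pt]
f(y_i \mid x_i,\, y_i < c) &=
      \frac{\tfrac{1}{\sigma}\,\phi\!\left(\dfrac{y_i - x_i'\beta}{\sigma}\right)}
           {\Phi\!\left(\dfrac{c - x_i'\beta}{\sigma}\right)}, \\[4pt]
\ln L(\beta,\sigma) &= \sum_i \left[ -\ln\sigma
      + \ln\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)
      - \ln\Phi\!\left(\frac{c - x_i'\beta}{\sigma}\right) \right], \\[4pt]
E[y_i \mid x_i,\, y_i < c] &= x_i'\beta - \sigma\,\frac{\phi(\alpha_i)}{\Phi(\alpha_i)},
      \qquad \alpha_i = \frac{c - x_i'\beta}{\sigma}.
\end{aligned}
\]
```

The last line suggests why the three regressions differ. The correction term depends on $x_i$, so OLS on the subsample with `wage<7000` does not recover $\beta$. Restricting the sample on the regressor instead (`age>24 & age<46`) leaves $E[y_i \mid x_i] = x_i'\beta$ unchanged and only costs precision.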

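MROZ.DTA: Sample selection with Heckman correction

. use "C:\hja\lehre\daten\wooldridge\stata\MROZ.DTA", clear

. bysort inlf: summ lwage

-> inlf = 0

    Variable |   Obs    Mean    Std. Dev.    Min    Max
-------------+-----------------------------------------
       lwage |     0

-> inlf = 1

    Variable |   Obs    Mean    Std. Dev.    Min    Max
-------------+-----------------------------------------
       lwage |

. tab inlf

   =1 if in |
  lab frce, |
       1975 |   Freq.   Percent   Cum.
------------+-------------------------
      Total |

. reg lwage educ exper expersq

      Source |   SS    df    MS          Number of obs =
-------------+-------------------        F(  3,   424) =   26.29
       Model |                           Prob > F      =
    Residual |                           R-squared     =
-------------+-------------------        Adj R-squared =
       Total |                           Root MSE      =

       lwage |   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------
        educ |
       exper |
     expersq |
       _cons |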

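. heckman lwage educ exper expersq,
>     select(inlf = nwifeinc educ exper expersq age kidslt6 kidsge6) twostep

Heckman selection model -- two-step estimates   Number of obs   =    753
(regression model with sample selection)        Censored obs    =    325
                                                Uncensored obs  =    428
                                                Wald chi2(6)    =
                                                Prob > chi2     =

             |   Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------
lwage        |
        educ |
       exper |
     expersq |                                   e-06
       _cons |
-------------+-------------------------------------------------------
inlf         |
    nwifeinc |
        educ |
       exper |
     expersq |
         age |
     kidslt6 |
     kidsge6 |
       _cons |
-------------+-------------------------------------------------------
mills        |
      lambda |
-------------+-------------------------------------------------------
         rho |
       sigma |
      lambda |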

MROZ.DTA: Sample selection with Heckman correction, by hand

. probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6

Iteration 0:   log likelihood =
Iteration 1:   log likelihood =
Iteration 2:   log likelihood =
Iteration 3:   log likelihood =
Iteration 4:   log likelihood =

Probit estimates                                Number of obs   =    753
                                                LR chi2(7)      =
                                                Prob > chi2     =
Log likelihood =                                Pseudo R2       =

        inlf |   Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------
    nwifeinc |
        educ |
       exper |
     expersq |
         age |
     kidslt6 |
     kidsge6 |
       _cons |

. * Compute the linear predictor: z*gamma
. predict lp, xb

. summ lp

    Variable |   Obs    Mean    Std. Dev.    Min    Max
-------------+-----------------------------------------
          lp |

. * Compute the inverse Mills ratio
. gen mills=normden(lp) / (norm(lp))

. reg lwage educ exper expersq mills

      Source |   SS    df    MS          Number of obs =
-------------+-------------------        F(  4,   423) =   19.69
       Model |                           Prob > F      =
    Residual |                           R-squared     =
-------------+-------------------        Adj R-squared =
       Total |                           Root MSE      =

       lwage |   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------
        educ |
       exper |
     expersq |                                   e-06
       mills |
       _cons |

. * Standard errors still need to be adjusted!
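The by-hand steps above correspond to the standard two-step argument, sketched here under the textbook assumption of jointly normal errors. The notation ($z_i$, $\gamma$, $v_i$, $u_i$, $\rho$, $\sigma$, $\lambda$) is introduced for illustration and is not taken from the log.

```latex
\[
\begin{aligned}
s_i &= \mathbf{1}\!\left[\, z_i'\gamma + v_i > 0 \,\right], \qquad
y_i = x_i'\beta + u_i \quad \text{(observed only if } s_i = 1\text{)}, \\[4pt]
\begin{pmatrix} u_i \\ v_i \end{pmatrix} &\sim
N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix} \right), \\[4pt]
E[\, y_i \mid x_i, z_i, s_i = 1 \,] &= x_i'\beta + \rho\sigma\,\lambda(z_i'\gamma),
\qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}.
\end{aligned}
\]
```

In these terms, `lp` is the probit estimate of $z_i'\gamma$, and `mills` is $\lambda$ evaluated at `lp`, that is `normden(lp)/norm(lp)`. The coefficient on `mills` in the second-stage regression estimates $\rho\sigma$, which is the quantity the `heckman` two-step output labels `lambda`. Because `mills` is itself estimated, the OLS standard errors from that regression are not valid as printed, which is the point of the closing comment in the log.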

Simulated data: Sample selection with Heckman correction

. corr2data lwage educ s, n(100)
>     corr(1, .65, .8 \ .65, 1, .78 \ .8, .78, 1) sds(7 23 1) means( )
(obs 100)

. gen sel=(s>0)

. tab sel

        sel |   Freq.   Percent   Cum.
------------+-------------------------
      Total |

. gen sel_s=s if sel
(13 missing values generated)

. gen nsel_s=s if sel==0
(87 missing values generated)

. graph twoway scatter sel_s nsel_s educ, yline(0, lc(red))
>     msize(small medlarge) legend(off) clc(blue red)
>     ytitle("arbeitsangebot") xtitle("ausbildung")
>     || lfit s educ, clc(blue)

. gen sel_lwage=lwage if sel
(13 missing values generated)

. gen nsel_lwage=lwage if sel==0
(87 missing values generated)

. label variable sel_lwage "erwerbstätig"

. label variable nsel_lwage " nicht erwerbstätig"

. graph twoway scatter sel_lwage nsel_lwage educ, msize(small medlarge)
>     xtitle("ausbildung") ytitle("logarithm Lohnofferte")
>     || lfit lwage educ, clc(blue)
>     || lfit lwage educ if sel, clc(red) legend(off)

. reg lwage educ

      Source |   SS    df    MS          Number of obs =
-------------+-------------------        F(  1,    98) =   71.70
       Model |                           Prob > F      =
    Residual |                           R-squared     =
-------------+-------------------        Adj R-squared =
       Total |                           Root MSE      =

       lwage |   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------
        educ |
       _cons |

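. reg lwage educ if sel

      Source |   SS    df    MS          Number of obs =
-------------+-------------------        F(  1,    85) =   35.55
       Model |                           Prob > F      =
    Residual |                           R-squared     =
-------------+-------------------        Adj R-squared =
       Total |                           Root MSE      =

       lwage |   Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
-------------+-------------------------------------------------------
        educ |
       _cons |

. gen hwage=lwage if sel
(13 missing values generated)

. heckman hwage educ, select(sel = educ) twostep

Heckman selection model -- two-step estimates   Number of obs   =    100
(regression model with sample selection)        Censored obs    =     13
                                                Uncensored obs  =     87
                                                Wald chi2(2)    =  34.40
                                                Prob > chi2     =

             |   Coef.   Std. Err.   z   P>|z|   [95% Conf. Interval]
-------------+-------------------------------------------------------
hwage        |
        educ |
       _cons |
-------------+-------------------------------------------------------
sel          |
        educ |
       _cons |
-------------+-------------------------------------------------------
mills        |
      lambda |
-------------+-------------------------------------------------------
         rho |
       sigma |
      lambda |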