İnsan Tunalı November 29, 2018 Econ 511: Econometrics I. ANSWERS TO ASSIGNMENT 10: Part II STATA Supplement


STATA Exercise 1

TASK 1:
-------------------------------------------------------------------------------
      name:  <unnamed>
       log:  g:\econ511\heter_housing.log
  log type:  text
 opened on:  26 Nov 2016, 13:20:38

* This exercise uses housing price data discussed in Verbeek Ch. 3.
* Executed by do file housing.do
use f:\courses\grads\econ511\share\housing, clear
des

Contains data from f:\courses\grads\econ511\share\housing.dta
  obs:           546
 vars:            12

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
price           float   %9.0g
lotsize         float   %9.0g
bedrooms        float   %9.0g
bathrms         float   %9.0g
stories         float   %9.0g
driveway        float   %9.0g
recroom         float   %9.0g
fullbase        float   %9.0g
gashw           float   %9.0g
airco           float   %9.0g
garagepl        float   %9.0g
prefarea        float   %9.0g
-------------------------------------------------------------------------------
Sorted by:

sum price lotsize bedrooms bathrms airco

    Variable |    Obs      Mean   Std. Dev.     Min      Max
-------------+-----------------------------------------------
       price |
     lotsize |
    bedrooms |
     bathrms |
       airco |

* Rescale variables
gen price1000=price/1000

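gen lotsize1000=lotsize/1000
sum price1000 lotsize1000 bedrooms bathrms airco

    Variable |    Obs      Mean   Std. Dev.     Min      Max
-------------+-----------------------------------------------
   price1000 |
 lotsize1000 |
    bedrooms |
     bathrms |
       airco |

* Here is the version without scaling:
reg price lotsize bedrooms bathrms airco, robust

Linear regression                               Number of obs =    546
                                                F( 4, 541)    =
                                                Prob > F      =
                                                R-squared     =
                                                Root MSE      =

             |               Robust
       price |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

reg price1000 lotsize1000 bedrooms bathrms airco

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

   price1000 |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lotsize1000 |
    bedrooms |
     bathrms |
       airco |
       _cons |

* Which set of results is easier to interpret: those based on the original data,
* or those based on the rescaled data?
test airco

 ( 1)  airco = 0

       F( 1, 541) =   91.08
         Prob > F =    0.0000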
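The interpretation question can be checked on a toy example. The Python sketch below (standard library only; the numbers are made up and are not the housing data) fits a one-regressor version of the model and shows how the slope responds to rescaling: dividing both price and lotsize by 1000 leaves the lotsize slope unchanged, while dividing only price by 1000 divides the slope by 1000. In the multiple regression above, the same logic implies that the coefficients on the unscaled regressors (bedrooms, bathrms, airco) and the intercept are simply divided by 1000, while t-ratios are unaffected.

```python
# Toy illustration of how rescaling changes OLS slopes.
# Hypothetical data (NOT the housing data): price in dollars, lot size in sq. ft.
price = [50000.0, 62000.0, 71000.0, 58000.0, 90000.0]
lotsize = [3000.0, 4500.0, 6000.0, 4000.0, 8000.0]


def ols_slope(y, x):
    """Slope coefficient from a simple regression of y on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


price1000 = [p / 1000 for p in price]
lotsize1000 = [s / 1000 for s in lotsize]

b_raw = ols_slope(price, lotsize)            # dollars per sq. ft.
b_both = ols_slope(price1000, lotsize1000)   # $1000 per 1000 sq. ft.: same number
b_y_only = ols_slope(price1000, lotsize)     # $1000 per sq. ft.: b_raw / 1000

print(b_raw, b_both, b_y_only)
```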

predict pres, resid
predict phat, xb
scatter pres phat
[Figure: scatter plot of Residuals (y-axis) against Linear prediction (x-axis)]

* Route 1: Eicker-Huber-White heteroskedasticity-consistent inference
* See Handout, Week 10: Heteroskedasticity and Autocorrelation
reg price lotsize bedrooms bathrms airco, robust

Linear regression                               Number of obs =    546
                                                F( 4, 541)    =
                                                Prob > F      =
                                                R-squared     =
                                                Root MSE      =

             |               Robust
       price |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

* Inspection reveals that some standard errors go up and some go down.
* Thus the direction of the bias in the conventional (non-robust) t-tests
* cannot be known in advance.
test airco

 ( 1)  airco = 0

       F( 1, 541) =   78.75
         Prob > F =    0.0000
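The idea behind the `robust` option can be seen in the one-regressor case, where the Eicker-Huber-White variance of the slope has a closed form, sum((x_i - xbar)^2 * e_i^2) / (sum((x_i - xbar)^2))^2. Stata's `robust` option multiplies this by the small-sample factor n/(n - k) (the HC1 variant). The Python sketch below, using hypothetical four-observation data, compares it with the conventional standard error; with these particular numbers the robust one happens to be smaller, which illustrates why the direction of the change cannot be known in advance.

```python
import math


def slope_ses(y, x):
    """Simple regression y = a + b*x + e.

    Returns (b, conventional SE of b, HC1 robust SE of b).
    HC1 = HC0 * n / (n - k), with k = 2 estimated coefficients.
    """
    n, k = len(x), 2
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    e = [yi - a - b * xi for xi, yi in zip(x, y)]          # OLS residuals

    s2 = sum(ei ** 2 for ei in e) / (n - k)                  # homoskedastic sigma^2 estimate
    se_conv = math.sqrt(s2 / sxx)

    hc0 = sum(((xi - xbar) * ei) ** 2 for xi, ei in zip(x, e)) / sxx ** 2  # sandwich
    se_hc1 = math.sqrt(hc0 * n / (n - k))
    return b, se_conv, se_hc1


# Hypothetical data built as y = 2 + 3x + e with e = (1, -2, 1, 0)
x = [1.0, 2.0, 3.0, 4.0]
y = [6.0, 6.0, 12.0, 14.0]
b, se_conv, se_hc1 = slope_ses(y, x)
print(b, se_conv, se_hc1)   # 3.0, sqrt(0.6) ~ 0.775, sqrt(0.28) ~ 0.529
```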

* Check: the robust F(1, n-k-1) statistic equals the squared robust t-ratio:
* (8.87)^2 = 78.68, which matches F = 78.75 up to rounding of the displayed t-ratio.

* Route 2: Test for heteroskedasticity
* See Handout, Week 10: Heteroskedasticity and Autocorrelation
* Artificial regression
gen pressq=pres^2
reg pressq lotsize bedrooms bathrms airco

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =  10.54
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

      pressq |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

* Based on the F-statistic for the overall significance of this artificial
* regression, there is strong evidence of heteroskedasticity. See below for an
* asymptotic version based on nR-squared.
predict pvar
(option xb assumed; fitted values)
sum pvar

    Variable |    Obs      Mean   Std. Dev.     Min      Max
-------------+-----------------------------------------------
        pvar |

* STATA features tests that do not require the artificial regression;
* they can be implemented after running the regression of interest.
* These are described in detail in the STATA manual.
* See: regress postestimation
qui reg price1000 lotsize1000 bedrooms bathrms airco
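The artificial regression in Route 2 regresses the squared OLS residuals on the explanatory variables and judges the fit. The one-regressor Python sketch below assumes the residuals are already available (the numbers are hypothetical) and returns the auxiliary R-squared, the LM statistic nR-squared, and the F-statistic for the slope, the single-regressor analogue of the F( 4, 541) statistic reported above.

```python
def artificial_regression(resid, x):
    """Regress squared residuals on x (with intercept).

    Returns (R-squared, LM = n * R-squared, F for H0: slope = 0).
    """
    n, k = len(x), 1                      # k = number of slope regressors
    u = [e ** 2 for e in resid]           # squared OLS residuals
    xbar = sum(x) / n
    ubar = sum(u) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    suu = sum((ui - ubar) ** 2 for ui in u)
    sxu = sum((xi - xbar) * (ui - ubar) for xi, ui in zip(x, u))
    r2 = sxu ** 2 / (sxx * suu)           # R-squared of a simple regression
    lm = n * r2                           # asymptotically chi2(k) under homoskedasticity
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return r2, lm, f


# Hypothetical residuals from a first-stage OLS fit
resid = [1.0, -2.0, 1.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0]
r2, lm, f = artificial_regression(resid, x)
print(r2, lm, f)
```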

* Breusch-Pagan and related tests in STATA:
* Approach A. Heteroskedasticity is a function of the original explanatory variables.
* Use the option rhs (= use the right-hand-side variables in the test).
estat hettest, rhs fstat
         Variables: lotsize1000 bedrooms bathrms airco
         F(4, 541)    = 10.54
         Prob > F     =
* Asymptotic (chi-square) version of the F-test
estat hettest, rhs
         Variables: lotsize1000 bedrooms bathrms airco
         chi2(4)      = 86.33
         Prob > chi2  =
* LM test (also asymptotic, chi-square based) version (see Verbeek 4.4.2)
estat hettest, rhs iid
         Variables: lotsize1000 bedrooms bathrms airco
         chi2(4)      = 39.49
         Prob > chi2  =
* From the artificial regression reported above: LM-stat = nR-squared
di 546*
* Approach B. Heteroskedasticity is a function of the fitted values.
* Same command as above, but without the option rhs
estat hettest, fstat
         Variables: fitted values of price1000
         F(1, 544)    = 35.11
         Prob > F     = 0.0000
estat hettest
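The `fstat` and `iid` versions of `estat hettest, rhs` are both functions of the same auxiliary R-squared: LM = nR-squared and F = (R2/k) / ((1 - R2)/(n - k - 1)). As a consistency check, the short Python sketch below recovers R-squared from the reported LM statistic (39.49, with n = 546 and k = 4) and converts it into the F form; the result reproduces F(4, 541) = 10.54. The same conversion applied to the Task 2 figure (LM = 11.64) gives the F(4, 541) = 2.95 reported there.

```python
def f_from_lm(lm, n, k):
    """Convert the nR-squared (LM) form of the auxiliary-regression test
    into its F form, F = (R2/k) / ((1 - R2)/(n - k - 1))."""
    r2 = lm / n
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(round(f_from_lm(39.49, 546, 4), 2))   # 10.54 (Task 1)
print(round(f_from_lm(11.64, 546, 4), 2))   # 2.95  (Task 2)
```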

         Variables: fitted values of price1000
         chi2(1)      = 72.37
         Prob > chi2  =
estat hettest, iid
         Variables: fitted values of price1000
         chi2(1)      = 33.10
         Prob > chi2  =

* White test: asymptotic test based on a more general h(x) specification
* See Verbeek 4.4.3
estat imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(13)     = 46.66
         Prob > chi2  =

Cameron & Trivedi's decomposition of IM-test

              Source |     chi2     df      p
---------------------+--------------------------
  Heteroskedasticity |
            Skewness |
            Kurtosis |
---------------------+--------------------------
               Total |

* This test also gives evidence on the normality of the residuals
* (through its skewness and kurtosis components).
hist pres
(bin=23, start= , width= )
[Figure: histogram of the residuals; y-axis: Density, x-axis: Residuals]
kdensity pres
[Figure: kernel density estimate of the residuals; kernel = epanechnikov, bandwidth = 3.9e+03]
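The 13 degrees of freedom of White's test come from the auxiliary regression of the squared residuals on the regressors, their squares and all pairwise cross-products: 4 levels + 4 squares + 6 cross-products = 14 terms, minus the square of the 0/1 variable airco, which duplicates airco itself. The Python sketch below simply enumerates those terms; treating airco as the only binary regressor is an assumption based on its name and on the reported df.

```python
from itertools import combinations


def white_terms(regressors, binary=()):
    """Terms in White's auxiliary regression: levels, squares, and pairwise
    cross-products. Squares of 0/1 variables are dropped because x**2 == x."""
    terms = list(regressors)
    terms += [f"{v}^2" for v in regressors if v not in binary]
    terms += [f"{a}*{b}" for a, b in combinations(regressors, 2)]
    return terms


terms = white_terms(["lotsize1000", "bedrooms", "bathrms", "airco"], binary={"airco"})
print(len(terms))   # 13, matching chi2(13)
```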

* Route 3: Fix heteroskedasticity - FGLS
* See Handout, Week 10: Heteroskedasticity and Autocorrelation
gen logpressq=log(pressq)
reg logpressq lotsize bedrooms bathrms airco

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =  11.61
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

   logpressq |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

predict logpressq_hat, xb
gen pvarest=exp(logpressq_hat)
* Remark: Verbeek (p. 104) calls the variance estimates h-hat.
* STATA allows you to run weighted regressions in which each observation has
* a weight attached to it. We want to attach weights equal to the inverses
* of the variance estimates. For more info, type help weights.
gen inv_pvarest=1/pvarest
reg price lotsize bedrooms bathrms airco [aweight=inv_pvarest]
(sum of wgt is 8.5774e+00)

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =
       Model |  1.3123e+11                      Prob > F      =
    Residual |  1.2174e+11                      R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |  2.5297e+11                      Root MSE      =

       price |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |
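The commands in Route 3 correspond to multiplicative heteroskedasticity, Var(e_i) = sigma^2 * exp(z_i'alpha): regress log(e_i^2) on the regressors, set h-hat_i = exp(fitted value), and reweight by 1/h-hat_i. The one-regressor Python sketch below carries out the two steps on hypothetical data; it assumes no OLS residual is exactly zero, since log(0) is undefined.

```python
import math


def wls(y, x, w):
    """Weighted least squares for y = a + b*x with weights w (e.g. 1/h_i)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    return yw - b * xw, b


def fgls(y, x):
    """Two-step FGLS under multiplicative heteroskedasticity."""
    ones = [1.0] * len(x)
    a0, b0 = wls(y, x, ones)                          # step 0: OLS (equal weights)
    e = [yi - a0 - b0 * xi for xi, yi in zip(x, y)]   # OLS residuals
    loge2 = [math.log(ei ** 2) for ei in e]           # fails if a residual is exactly 0
    g0, g1 = wls(loge2, x, ones)                      # regress log(e^2) on x
    h = [math.exp(g0 + g1 * xi) for xi in x]          # h-hat = exp(fitted value)
    a, b = wls(y, x, [1.0 / hi for hi in h])          # weights 1/h-hat
    return a, b, h


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.8, 7.4, 7.9, 12.6, 11.0]   # hypothetical, spread grows with x
a, b, h = fgls(y, x)
print(a, b, h)
```

The intercept of the log(e^2) regression is not an unbiased estimate of log(sigma^2), but that only multiplies every h-hat by the same factor, which leaves the WLS coefficients unchanged. This is the same reason the scale of aweights does not matter.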

* From STATA help on weights:
*   aweights, or analytic weights, are weights that are inversely proportional
*   to the variance of an observation; i.e., the variance of the jth observation
*   is assumed to be sigma^2/w_j, where w_j are the weights. Typically, the
*   observations represent averages and the weights are the number of elements
*   that gave rise to the average. For most Stata commands, the recorded scale
*   of aweights is irrelevant; Stata internally rescales them to sum to N, the
*   number of observations in your data, when it uses them.
* Check: Feasible GLS is equivalent to OLS on suitably transformed data
gen Tprice=price/sqrt(pvarest)
gen Tlotsize=lotsize/sqrt(pvarest)
gen Tbdrms=bedrooms/sqrt(pvarest)
gen Tbathrms=bathrms/sqrt(pvarest)
gen Tairco=airco/sqrt(pvarest)
gen T_const=1/sqrt(pvarest)
reg Tprice Tlotsize Tbdrms Tbathrms Tairco T_const, noconst

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 5, 541)    =
       Model |  3.1900e+10                      Prob > F      =
    Residual |  1.9124e+09                      R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |  3.3813e+10                      Root MSE      =

      Tprice |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Tlotsize |
      Tbdrms |
    Tbathrms |
      Tairco |
     T_const |

log close
      name:  <unnamed>
       log:  g:\econ511\heter_housing.log
  log type:  text
 closed on:  26 Nov 2016, 13:12:27
-------------------------------------------------------------------------------
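The check at the end of Task 1 rests on a simple identity: minimizing sum(w_i * (y_i - a - b*x_i)^2) with w_i = 1/h_i is the same as OLS without a constant of y_i/sqrt(h_i) on 1/sqrt(h_i) and x_i/sqrt(h_i). That is why T_const takes the place of the intercept and `noconst` is needed. The Python sketch below (one regressor, hypothetical data and variance estimates) computes both and also confirms that rescaling the weights to sum to N, as the help text says Stata does with aweights, leaves the coefficients unchanged.

```python
import math


def wls(y, x, w):
    """WLS for y = a + b*x using weighted means (weights w)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    return yw - b * xw, b


def ols_transformed(y, x, h):
    """OLS with no constant of y/sqrt(h) on [1/sqrt(h), x/sqrt(h)]."""
    c = [1.0 / math.sqrt(hi) for hi in h]          # transformed constant (T_const)
    tx = [xi * ci for xi, ci in zip(x, c)]
    ty = [yi * ci for yi, ci in zip(y, c)]
    # 2x2 normal equations, solved by Cramer's rule
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * ti for ci, ti in zip(c, tx))
    sxx = sum(ti * ti for ti in tx)
    scy = sum(ci * ti for ci, ti in zip(c, ty))
    sxy = sum(ti * si for ti, si in zip(tx, ty))
    det = scc * sxx - scx ** 2
    return (scy * sxx - scx * sxy) / det, (scc * sxy - scx * scy) / det


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 4.8, 7.4, 7.9, 12.6, 11.0]     # hypothetical outcomes
h = [0.5, 0.8, 1.5, 2.0, 3.5, 5.0]       # hypothetical variance estimates h-hat

w = [1.0 / hi for hi in h]
n, sw = len(w), sum(w)
w_rescaled = [wi * n / sw for wi in w]   # aweights rescaled to sum to N

print(wls(y, x, w))
print(ols_transformed(y, x, h))
print(wls(y, x, w_rescaled))
```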

TASK 2:
-------------------------------------------------------------------------------
      name:  <unnamed>
       log:  g:\econ511\heter_housing2.log
  log type:  text
 opened on:  1 Dec 2016, 18:08:54

* This exercise uses housing price data discussed in Verbeek Ch. 3.
* It is motivated by Example 8.4 in Wooldridge, Introductory Econometrics, 4th Ed., 2009.
use "C:\Documents and Settings\itunali1\My Documents\COURSES\Econ 511\Verbeek\Data\chapter 3\housing.dta", clear
gen lprice=log(price)
gen llotsize=log(lotsize)
sum lprice llotsize bedrooms bathrms airco

    Variable |    Obs      Mean   Std. Dev.     Min      Max
-------------+-----------------------------------------------
      lprice |
    llotsize |
    bedrooms |
     bathrms |
       airco |

reg lprice llotsize bedrooms bathrms airco

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

      lprice |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

test airco

 ( 1)  airco = 0

       F( 1, 541) =   79.63
         Prob > F =

predict lpres, resid
predict lphat, xb

scatter lpres lphat
[Figure: scatter plot of Residuals (y-axis) against Linear prediction (x-axis)]

* Route 1: Eicker-Huber-White heteroskedasticity-consistent inference
reg lprice llotsize bedrooms bathrms airco, robust

Linear regression                               Number of obs =    546
                                                F( 4, 541)    =
                                                Prob > F      =
                                                R-squared     =
                                                Root MSE      =

             |               Robust
      lprice |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

test airco

 ( 1)  airco = 0

       F( 1, 541) =   87.85
         Prob > F =    0.0000

* Route 2: Test for heteroskedasticity
* Artificial regression
gen lpressq=lpres^2
reg lpressq llotsize bedrooms bathrms airco

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =   2.95
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

     lpressq |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

predict lpvar
(option xb assumed; fitted values)
sum lpvar

    Variable |    Obs      Mean   Std. Dev.     Min      Max
-------------+-----------------------------------------------
       lpvar |

* STATA features tests that do not require the artificial regression;
* they can be implemented after running the regression of interest.
* These are described in detail in the STATA manual (see: regress postestimation).
qui reg lprice llotsize bedrooms bathrms airco
* Breusch-Pagan and related tests in STATA:
* Approach A. Heteroskedasticity is a function of the original explanatory variables.
* Use the option rhs (= use the right-hand-side variables in the test).
estat hettest, rhs fstat
         Variables: llotsize bedrooms bathrms airco
         F(4, 541)    = 2.95
         Prob > F     = 0.0199

* Asymptotic (chi-square) version of the F-test
estat hettest, rhs
         Variables: llotsize bedrooms bathrms airco
         chi2(4)      = 13.40
         Prob > chi2  =
* LM test (also asymptotic, chi-square based) version (see Verbeek 4.4.2)
estat hettest, rhs iid
         Variables: llotsize bedrooms bathrms airco
         chi2(4)      = 11.64
         Prob > chi2  =
* Approach B. Heteroskedasticity is a function of the fitted values.
* Same command as above, but without the option rhs
estat hettest, fstat
         Variables: fitted values of lprice
         F(1, 544)    = 0.87
         Prob > F     =
estat hettest
         Variables: fitted values of lprice
         chi2(1)      = 1.00
         Prob > chi2  =
estat hettest, iid
         Variables: fitted values of lprice
         chi2(1)      = 0.87
         Prob > chi2  = 0.3513
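The `Prob > chi2` values can be reproduced from the statistic and its degrees of freedom. For integer degrees of freedom the chi-square upper tail has a closed form (built from `math.exp` for even df and `math.erfc` plus `math.gamma` for odd df), so the Python sketch below needs only the standard library. For the Task 2 statistic chi2(1) = 0.87 it gives about 0.351, in line with the reported 0.3513 once the rounding of the displayed 0.87 is taken into account.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df k.

    Uses the regularized upper incomplete gamma Q(k/2, x/2), with y = x/2:
      even k: exp(-y) * sum_{j < k/2} y^j / j!
      odd k:  erfc(sqrt(y)) + exp(-y) * sum_{j=1}^{(k-1)/2} y^(j-1/2) / Gamma(j+1/2)
    """
    y = x / 2.0
    if k % 2 == 0:
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(k // 2))
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (k - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


print(chi2_sf(0.87, 1))                  # about 0.351; Stata reports 0.3513
print(round(chi2_sf(3.841459, 1), 4))    # 0.05, the usual 5% critical value
```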

* White test: asymptotic test based on a more general h(x) specification (see Verbeek 4.4.3)
estat imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(13)     = 16.62
         Prob > chi2  =

Cameron & Trivedi's decomposition of IM-test

              Source |     chi2     df      p
---------------------+--------------------------
  Heteroskedasticity |
            Skewness |
            Kurtosis |
---------------------+--------------------------
               Total |

* Route 3: Fix heteroskedasticity - FGLS
gen loglpressq=log(lpressq)
reg loglpressq llotsize bedrooms bathrms airco

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =   4.72
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

  loglpressq |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

predict loglpressq_hat, xb
gen lpvarest=exp(loglpressq_hat)
* STATA allows you to run weighted regressions in which each observation has
* a weight attached to it. We want to attach weights equal to the inverses
* of the variance estimates. For more info, type help weights.
gen inv_lpvarest=1/lpvarest

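reg price lotsize bedrooms bathrms airco [aweight=inv_lpvarest]
(sum of wgt is 3.6086e+04)

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 4, 541)    =
       Model |  2.8560e+11                      Prob > F      =
    Residual |  1.6461e+11                      R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |  4.5021e+11                      Root MSE      =

       price |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |
    bedrooms |
     bathrms |
       airco |
       _cons |

* Note: the check below transforms lprice and llotsize, so the FGLS regression
* it reproduces is
*   reg lprice llotsize bedrooms bathrms airco [aweight=inv_lpvarest]
* whereas the weighted regression above uses price and lotsize in levels.
* Check: Feasible GLS is equivalent to OLS on suitably transformed data
gen Tlprice=lprice/sqrt(lpvarest)
gen Tllotsize=llotsize/sqrt(lpvarest)
gen Tbdrms=bedrooms/sqrt(lpvarest)
gen Tbathrms=bathrms/sqrt(lpvarest)
gen Tairco=airco/sqrt(lpvarest)
gen T_const=1/sqrt(lpvarest)
reg Tlprice Tllotsize Tbdrms Tbathrms Tairco T_const, noconst

      Source |      SS       df       MS        Number of obs =    546
-------------+----------------------------      F( 5, 541)    =
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

     Tlprice |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
   Tllotsize |
      Tbdrms |
    Tbathrms |
      Tairco |
     T_const |

end of do-file

log close
      name:  <unnamed>
       log:  g:\econ511\heter_housing2.log
  log type:  text
 closed on:  1 Dec 2014, 18:12: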

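STATA Exercise 1

use "K:\My Documents\COURSES\Econ 312\Wooldridge 4th Intl\Stata Data\GPA1.DTA"
regress colgpa skipped ACT hsgpa

      Source |      SS       df       MS        Number of obs =    141
-------------+----------------------------      F( 3, 137)    =  13.92
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

      colgpa |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     skipped |
         ACT |
       hsgpa |
       _cons |

predict rescolgpa, resid
sum soph junior senior senior5

    Variable |    Obs      Mean   Std. Dev.     Min      Max
-------------+-----------------------------------------------
        soph |
      junior |
      senior |
     senior5 |

generate year=2*soph+3*junior+4*senior+5*senior5
tab year

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
------------+-----------------------------------
      Total |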
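The `year` variable collapses the four class dummies into a single code (2 = soph, 3 = junior, 4 = senior, 5 = senior5). If every student falls in exactly one of these classes, as the collinearity note in the auxiliary regression implies, the four dummies sum to one and are perfectly collinear with the constant, so one of them must be dropped. A Python sketch with hypothetical student rows:

```python
# Hypothetical student records: (soph, junior, senior, senior5) class dummies
students = [
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 0, 1),
    (0, 0, 1, 0),
]

# Same coding as: generate year=2*soph+3*junior+4*senior+5*senior5
year = [2 * s + 3 * j + 4 * r + 5 * r5 for s, j, r, r5 in students]

# Each row's dummies sum to 1, i.e. together they reproduce the intercept column
row_sums = [sum(row) for row in students]

print(year)       # [2, 3, 4, 5, 4]
print(row_sums)   # [1, 1, 1, 1, 1]
```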

regress colgpa skipped ACT hsgpa, cluster(year)

Linear regression                               Number of obs =    141
                                                F( 2, 3)      =
                                                Prob > F      =
                                                R-squared     =
                                                Root MSE      =

                                 (Std. Err. adjusted for 4 clusters in year)
             |               Robust
      colgpa |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
     skipped |
         ACT |
       hsgpa |
       _cons |

gen rescolgpasq=rescolgpa^2
reg rescolgpasq soph junior senior senior5
note: senior5 omitted because of collinearity

      Source |      SS       df       MS        Number of obs =    141
-------------+----------------------------      F( 3, 137)    =   1.21
       Model |                                  Prob > F      =
    Residual |                                  R-squared     =
-------------+----------------------------      Adj R-squared =
       Total |                                  Root MSE      =

 rescolgpasq |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
        soph |
      junior |
      senior |
     senior5 |         0  (omitted)
       _cons |

display "NRsq= " _N*e(r2)
NRsq=
di 1-chi2(3,_N*e(r2))
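The last two commands compute the nR-squared statistic from the auxiliary regression and its p-value; Stata's `chi2(df, x)` is the cumulative (lower-tail) probability, hence `1-chi2(...)`. With three slope terms (senior5 having been omitted) the chi-square upper tail has a closed form, used in the Python sketch below. The R-squared passed in is a hypothetical input, not the value from this log.

```python
import math


def nr2_test_df3(n, r2):
    """LM = n * R-squared and its chi-square(3) upper-tail p-value,
    mirroring: display _N*e(r2) and display 1-chi2(3, _N*e(r2))."""
    lm = n * r2
    y = lm / 2.0
    # P(chi2_3 > lm) = erfc(sqrt(y)) + 2*sqrt(y/pi)*exp(-y)
    p = math.erfc(math.sqrt(y)) + 2.0 * math.sqrt(y / math.pi) * math.exp(-y)
    return lm, p


lm, p = nr2_test_df3(141, 0.03)   # hypothetical R-squared, not the logged value
print(lm, p)
```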