İnsan Tunalı November 29, 2018 Econ 511: Econometrics I. ANSWERS TO ASSIGNMENT 10: Part II STATA Supplement


STATA Exercise 1

TASK 1:

--------------------------------------------------------------------------------
      name:  <unnamed>
       log:  g:\econ511\heter_housing.log
  log type:  text
 opened on:  26 Nov 2016, 13:20:38

. * This exercise uses housing price data discussed in Verbeek Ch. 3
. * Executed by do file housing.do

. use f:\courses\grads\econ511\share\housing, clear

. des

Contains data from f:\courses\grads\econ511\share\housing.dta
  obs:           546
 vars:            12                          20 Apr 2000 14:12
 size:        26,208
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
price           float   %9.0g
lotsize         float   %9.0g
bedrooms        float   %9.0g
bathrms         float   %9.0g
stories         float   %9.0g
driveway        float   %9.0g
recroom         float   %9.0g
fullbase        float   %9.0g
gashw           float   %9.0g
airco           float   %9.0g
garagepl        float   %9.0g
prefarea        float   %9.0g
--------------------------------------------------------------------------------
Sorted by:

. sum price lotsize bedrooms bathrms airco

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |       546     68121.6    26702.67      25000     190000
     lotsize |       546    5150.266    2168.159       1650      16200
    bedrooms |       546    2.965201    .7373879          1          6
     bathrms |       546    1.285714    .5021579          1          4
       airco |       546    .3168498     .465675          0          1

. * Rescale variables
. gen price1000=price/1000

. gen lotsize1000=lotsize/1000

. sum price1000 lotsize1000 bedrooms bathrms airco

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   price1000 |       546     68.1216    26.70267         25        190
 lotsize1000 |       546    5.150266    2.168159       1.65       16.2
    bedrooms |       546    2.965201    .7373879          1          6
     bathrms |       546    1.285714    .5021579          1          4
       airco |       546    .3168498     .465675          0          1

. * Here is the version without scaling:
. reg price lotsize bedrooms bathrms airco, robust

Linear regression                                      Number of obs =     546
                                                       F(  4,   541) =  129.13
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5603
                                                       Root MSE      =   17772

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   4.763104   .4486251    10.62   0.000     3.881843    5.644365
    bedrooms |   4914.639   1143.381     4.30   0.000     2668.629     7160.65
     bathrms |   18008.21   2075.693     8.68   0.000      13930.8    22085.61
       airco |   16237.69   1829.726     8.87   0.000     12643.45    19831.93
       _cons |   719.1346   3493.035     0.21   0.837    -6142.439    7580.708
------------------------------------------------------------------------------

. reg price1000 lotsize1000 bedrooms bathrms airco

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =  172.36
       Model |  217739.877     4  54434.9694           Prob > F      =  0.0000
    Residual |  170862.909   541  315.827927           R-squared     =  0.5603
-------------+------------------------------           Adj R-squared =  0.5571
       Total |  388602.786   545  713.032635           Root MSE      =  17.772

------------------------------------------------------------------------------
   price1000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lotsize1000 |   4.763104   .3656241    13.03   0.000     4.044887    5.481321
    bedrooms |   4.914639   1.121373     4.38   0.000     2.711861    7.117417
     bathrms |   18.00821   1.663046    10.83   0.000     14.74139    21.27502
       airco |   16.23769   1.701431     9.54   0.000     12.89547    19.57991
       _cons |   .7191346   3.515165     0.20   0.838    -6.185911     7.62418
------------------------------------------------------------------------------

. * Which set of results is easier to interpret: those on the original data,
. * or those on the rescaled data?

. test airco

 ( 1)  airco = 0

       F(  1,   541) =   91.08
            Prob > F =    0.0000
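The effect of the rescaling on the coefficients can be worked out directly. A short derivation follows, writing \beta_j for the coefficients of the unscaled regression (this notation is assumed here and is not taken from the handout):

```latex
\begin{align*}
\text{price} &= \beta_0 + \beta_1\,\text{lotsize} + \beta_2\,\text{bedrooms}
              + \beta_3\,\text{bathrms} + \beta_4\,\text{airco} + \varepsilon \\[4pt]
\frac{\text{price}}{1000} &= \frac{\beta_0}{1000}
              + \beta_1\,\frac{\text{lotsize}}{1000}
              + \frac{\beta_2}{1000}\,\text{bedrooms}
              + \frac{\beta_3}{1000}\,\text{bathrms}
              + \frac{\beta_4}{1000}\,\text{airco}
              + \frac{\varepsilon}{1000}
\end{align*}
```

Because price and lotsize are both divided by 1000, the lotsize coefficient is the same in the two regressions, while the intercept and the other slopes are divided by 1000 (for bedrooms, 4914.639 becomes 4.914639). R-squared is unaffected. The t-ratios differ between the two outputs only because the first regression uses robust standard errors and the second does not.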

. predict pres, resid

. predict phat, xb

. scatter pres phat

[Figure: scatter plot of Residuals (vertical axis, -50 to 100) against Linear prediction (horizontal axis, 50 to 150)]

. * Route 1: Eicker-Huber-White heteroskedasticity-consistent inference
. * See Handout, Week 10: Heteroskedasticity and Autocorrelation

. reg price lotsize bedrooms bathrms airco, robust

Linear regression                                      Number of obs =     546
                                                       F(  4,   541) =  129.13
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5603
                                                       Root MSE      =   17772

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   4.763104   .4486251    10.62   0.000     3.881843    5.644365
    bedrooms |   4914.639   1143.381     4.30   0.000     2668.629     7160.65
     bathrms |   18008.21   2075.693     8.68   0.000      13930.8    22085.61
       airco |   16237.69   1829.726     8.87   0.000     12643.45    19831.93
       _cons |   719.1346   3493.035     0.21   0.837    -6142.439    7580.708
------------------------------------------------------------------------------

. * Inspection reveals that some standard errors go up, some down.
. * Thus the direction of bias in t-tests cannot be known in advance.

. test airco

 ( 1)  airco = 0

       F(  1,   541) =   78.75
            Prob > F =    0.0000
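The robust F statistic for a single restriction can be compared with the squared robust t-ratio in a few lines. The following is a minimal sketch on Stata's bundled auto data, which is an assumption made so that the example runs without the housing file; the variables price, weight, mpg and foreign belong to that dataset, not to the exercise.

```stata
* Illustration on the bundled auto data (assumption: not the housing data)
sysuse auto, clear
regress price weight mpg foreign, robust

* Squared robust t-ratio for the dummy variable foreign
scalar t2 = (_b[foreign]/_se[foreign])^2

* Robust Wald F-test of the same single restriction
test foreign
scalar Fstat = r(F)

display "squared t-ratio = " scalar(t2) "   F = " scalar(Fstat)
```

With one restriction the two numbers should agree up to rounding, which is the same relation as (8.87)^2 and 78.75 in the housing output.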

. * Check: Robust F-test(1, n-k-1) value equals the squared robust t-ratio:
. * (8.87)^2 = 78.75, up to rounding of the t-ratio

. * Route 2: Test for heteroskedasticity
. * See Handout, Week 10: Heteroskedasticity and Autocorrelation

. * Artificial regression
. gen pressq=pres^2

. reg pressq lotsize bedrooms bathrms airco

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =   10.54
       Model |  16908628.7     4  4227157.17           Prob > F      =  0.0000
    Residual |   216885183   541  400896.826           R-squared     =  0.0723
-------------+------------------------------           Adj R-squared =  0.0655
       Total |   233793811   545   428979.47           Root MSE      =  633.16

------------------------------------------------------------------------------
      pressq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   .0496627   .0130264     3.81   0.000     .0240741    .0752514
    bedrooms |   120.0813   39.95223     3.01   0.003      41.6008    198.5618
     bathrms |   91.29972   59.25094     1.54   0.124    -25.09036    207.6898
       airco |   32.60058   60.61854     0.54   0.591    -86.47597    151.6771
       _cons |  -426.6207   125.2382    -3.41   0.001    -672.6334    -180.608
------------------------------------------------------------------------------

. * Based on the F-stat used for testing regression significance, there is
. * strong evidence in favor of heteroskedasticity.
. * See below for an asymptotic version based on nR-sq.

. predict pvar
(option xb assumed; fitted values)

. sum pvar

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        pvar |       546    312.9357    176.1392   -43.9032   1252.221

. * STATA features tests which do not require the artificial regression;
. * they can be implemented after running the regression of interest.
. * These are described in detail in the STATA manual.
. * See: regress postestimation

. qui reg price1000 lotsize1000 bedrooms bathrms airco

. * Breusch-Pagan and related tests in STATA:
. * Approach A. Heteroskedasticity is a function of the original
. * explanatory variables:
. * Use subcommand rhs (= use right-hand-side variables in the test)

. estat hettest, rhs fstat

         Variables: lotsize1000 bedrooms bathrms airco

         F(4 , 541)   =    10.54
         Prob > F     =   0.0000

. * Asymptotic (Chi-square) version of the F-test
. estat hettest, rhs

         Variables: lotsize1000 bedrooms bathrms airco

         chi2(4)      =    86.33
         Prob > chi2  =   0.0000

. * LM test (also asymptotic, Chi-square based) version (see Verbeek 4.4.2)
. estat hettest, rhs iid

         Variables: lotsize1000 bedrooms bathrms airco

         chi2(4)      =    39.49
         Prob > chi2  =   0.0000

. * From the artificial regression reported above: LM-stat = nR-sq
. di 546*0.0723
39.4758

. * Approach B. Heteroskedasticity is a function of the fitted values:
. * Same command as above, exclude subcommand rhs

. estat hettest, fstat

         Variables: fitted values of price1000

         F(1 , 544)   =    35.11
         Prob > F     =   0.0000

. estat hettest

         Variables: fitted values of price1000

         chi2(1)      =    72.37
         Prob > chi2  =   0.0000

. estat hettest, iid

         Variables: fitted values of price1000

         chi2(1)      =    33.10
         Prob > chi2  =   0.0000

. * White test: Asymptotic test based on more general h(x) specification
. * See Verbeek 4.4.3

. estat imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(13)     =     46.66
         Prob > chi2  =    0.0000

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      46.66     13    0.0000
            Skewness |      19.79      4    0.0005
            Kurtosis |       6.43      1    0.0112
---------------------+-----------------------------
               Total |      72.88     18    0.0000
---------------------------------------------------

. * This test also gives evidence on normality of the residuals

. hist pres
(bin=23, start=-48777.734, width=5687.1179)

[Figure: histogram of Residuals; vertical axis Density, horizontal axis from -50000 to 100000]

. kdensity pres

[Figure: Kernel density estimate of Residuals, horizontal axis from -50000 to 100000; kernel = epanechnikov, bandwidth = 3.9e+03]
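The LM version of the Breusch-Pagan test (option iid) can be reproduced by hand as n times the R-squared of the artificial regression. The sketch below uses Stata's bundled auto data as a stand-in, which is an assumption; ehat, ehat2, LM, pLM and bp are hypothetical names introduced only for this illustration.

```stata
* Illustration on the bundled auto data (assumption: not the housing data)
sysuse auto, clear
quietly regress price weight mpg foreign

* Squared OLS residuals
predict double ehat, resid
generate double ehat2 = ehat^2

* Artificial regression of the squared residuals on the regressors
quietly regress ehat2 weight mpg foreign
scalar LM  = e(N)*e(r2)
scalar pLM = chi2tail(e(df_m), scalar(LM))

* Built-in test after re-running the regression of interest
quietly regress price weight mpg foreign
estat hettest, rhs iid
scalar bp = r(chi2)

display "n*R-sq = " scalar(LM) "   p-value = " scalar(pLM) "   estat chi2 = " scalar(bp)
```

The hand-computed statistic and the one returned by estat hettest, rhs iid should coincide, just as 546*0.0723 = 39.4758 matches chi2(4) = 39.49 in the housing output up to rounding of R-squared.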

. * Route 3: Fix heteroskedasticity - FGLS
. * See Handout, Week 10: Heteroskedasticity and Autocorrelation

. gen logpressq=log(pressq)

. reg logpressq lotsize bedrooms bathrms airco

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =   11.61
       Model |  204.590031     4  51.1475077           Prob > F      =  0.0000
    Residual |  2382.73517   541  4.40431639           R-squared     =  0.0791
-------------+------------------------------           Adj R-squared =  0.0723
       Total |   2587.3252   545  4.74738569           Root MSE      =  2.0986

------------------------------------------------------------------------------
   logpressq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |    .000173   .0000432     4.01   0.000     .0000882    .0002579
    bedrooms |   .4012885   .1324232     3.03   0.003     .1411619    .6614151
     bathrms |   .3449943   .1963895     1.76   0.080     -.040785    .7307736
       airco |   .1126302   .2009224     0.56   0.575    -.2820535    .5073139
       _cons |    1.75605   .4151067     4.23   0.000      .940632    2.571469
------------------------------------------------------------------------------

. predict logpressq_hat, xb

. gen pvarest=exp(logpressq_hat)

. * Remark: Verbeek (p. 104) calls the variance estimates h-hat

. * STATA allows you to run weighted regressions where each observation has
. * a weight attached to it. We want to attach weights which equal inverses
. * of the variance estimates. For more info, type help weights

. gen inv_pvarest=1/pvarest

. reg price lotsize bedrooms bathrms airco [aweight=inv_pvarest]
(sum of wgt is   8.5774e+00)

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =  145.80
       Model |  1.3123e+11     4  3.2807e+10           Prob > F      =  0.0000
    Residual |  1.2174e+11   541   225021472           R-squared     =  0.5188
-------------+------------------------------           Adj R-squared =  0.5152
       Total |  2.5297e+11   545   464157334           Root MSE      =   15001

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   5.233899   .3985344    13.13   0.000     4.451035    6.016764
    bedrooms |   6273.254    1003.59     6.25   0.000     4301.842    8244.665
     bathrms |   15735.08   1818.535     8.65   0.000     12162.82    19307.33
       airco |   12487.16    1622.13     7.70   0.000     9300.713     15673.6
       _cons |  -1605.931   3206.194    -0.50   0.617    -7904.047    4692.184
------------------------------------------------------------------------------

. * From STATA help on weights:
  aweights, or analytic weights, are weights that are inversely proportional to
  the variance of an observation; i.e., the variance of the jth observation is
  assumed to be sigma^2/w_j, where w_j are the weights. Typically, the
  observations represent averages and the weights are the number of elements
  that gave rise to the average. For most Stata commands, the recorded scale of
  aweights is irrelevant; Stata internally rescales them to sum to N, the number
  of observations in your data, when it uses them.

. * Check: Feasible GLS is equivalent to OLS on suitably transformed data

. gen Tprice=price/sqrt(pvarest)

. gen Tlotsize=lotsize/sqrt(pvarest)

. gen Tbdrms=bedrooms/sqrt(pvarest)

. gen Tbathrms=bathrms/sqrt(pvarest)

. gen Tairco=airco/sqrt(pvarest)

. gen T_const=1/sqrt(pvarest)

. reg Tprice Tlotsize Tbdrms Tbathrms Tairco T_const, noconst

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  5,   541) = 1804.85
       Model |  3.1900e+10     5  6.3801e+09           Prob > F      =  0.0000
    Residual |  1.9124e+09   541   3534960.1           R-squared     =  0.9434
-------------+------------------------------           Adj R-squared =  0.9429
       Total |  3.3813e+10   546  61928322.1           Root MSE      =  1880.1

------------------------------------------------------------------------------
      Tprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    Tlotsize |   5.233899   .3985344    13.13   0.000     4.451035    6.016764
      Tbdrms |   6273.254    1003.59     6.25   0.000     4301.842    8244.665
    Tbathrms |   15735.08   1818.535     8.65   0.000     12162.82    19307.33
      Tairco |   12487.16    1622.13     7.70   0.000     9300.714     15673.6
     T_const |  -1605.932   3206.194    -0.50   0.617    -7904.047    4692.184
------------------------------------------------------------------------------

. log close
      name:  <unnamed>
       log:  g:\econ511\heter_housing.log
  log type:  text
 closed on:  26 Nov 2016, 13:12:27
--------------------------------------------------------------------------------
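The equivalence between a regression with analytic weights and OLS on transformed data can be checked on any dataset. The sketch below uses Stata's bundled auto data and a made-up variance function h = weight/1000; both are assumptions chosen only so that the example is self-contained, and h, Tconst and the other T-prefixed names are hypothetical.

```stata
* Illustration on the bundled auto data (assumption: not the housing data)
sysuse auto, clear

* Hypothetical variance function: Var(e_i) proportional to h_i
generate double h = weight/1000

* Weighted least squares with analytic weights equal to 1/h
regress price mpg foreign [aweight=1/h]
scalar b_wls  = _b[mpg]
scalar se_wls = _se[mpg]

* OLS on data divided by sqrt(h); the transformed constant enters as a regressor
generate double Tprice   = price/sqrt(h)
generate double Tmpg     = mpg/sqrt(h)
generate double Tforeign = foreign/sqrt(h)
generate double Tconst   = 1/sqrt(h)
regress Tprice Tmpg Tforeign Tconst, noconstant

* Differences should be zero up to numerical precision
display "coef. difference = " _b[Tmpg] - scalar(b_wls)
display "s.e. difference  = " _se[Tmpg] - scalar(se_wls)
```

As in the housing log, coefficients and standard errors should match across the two regressions, while R-squared and the F statistic differ because the transformed regression has no conventional intercept.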

TASK 2:

--------------------------------------------------------------------------------
      name:  <unnamed>
       log:  g:\econ511\heter_housing2.log
  log type:  text
 opened on:  1 Dec 2016, 18:08:54

. * This exercise uses housing price data discussed in Verbeek Ch. 3
. * It is motivated by Example 8.4 in Wooldridge, Introductory Econometrics,
. * 4th Ed., 2009

. use "C:\Documents and Settings\itunali1\My Documents\COURSES\Econ 511\Verbeek\Data\chapter 3\housing.dta", clear

. gen lprice=log(price)

. gen llotsize=log(lotsize)

. sum lprice llotsize bedrooms bathrms airco

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      lprice |       546    11.05896    .3719849   10.12663   12.15478
    llotsize |       546     8.46663    .3979238   7.408531   9.692766
    bedrooms |       546    2.965201    .7373879          1          6
     bathrms |       546    1.285714    .5021579          1          4
       airco |       546    .3168498     .465675          0          1

. reg lprice llotsize bedrooms bathrms airco

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =  177.41
       Model |   42.790971     4  10.6977427           Prob > F      =  0.0000
    Residual |  32.6221992   541  .060299814           R-squared     =  0.5674
-------------+------------------------------           Adj R-squared =  0.5642
       Total |  75.4131702   545  .138372789           Root MSE      =  .24556

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |   .4004218   .0278122    14.40   0.000     .3457886     .455055
    bedrooms |   .0776997   .0154859     5.02   0.000     .0472798    .1081195
     bathrms |   .2158305   .0229961     9.39   0.000     .1706578    .2610031
       airco |   .2116745   .0237213     8.92   0.000     .1650775    .2582716
       _cons |   7.093777    .231547    30.64   0.000     6.638935    7.548618
------------------------------------------------------------------------------

. test airco

 ( 1)  airco = 0

       F(  1,   541) =   79.63
            Prob > F =    0.0000

. predict lpres, resid

. predict lphat, xb

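. scatter lpres lphat

[Figure: scatter plot of Residuals (vertical axis, -1 to 1) against Linear prediction (horizontal axis, 10.5 to 12)]

. * Route 1: Eicker-Huber-White heteroskedasticity-consistent inference

. reg lprice llotsize bedrooms bathrms airco, robust

Linear regression                                      Number of obs =     546
                                                       F(  4,   541) =  201.11
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5674
                                                       Root MSE      =  .24556

------------------------------------------------------------------------------
             |               Robust
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |   .4004218   .0284978    14.05   0.000     .3444418    .4564018
    bedrooms |   .0776997   .0165802     4.69   0.000     .0451303     .110269
     bathrms |   .2158305   .0240048     8.99   0.000     .1686764    .2629845
       airco |   .2116745   .0225844     9.37   0.000     .1673107    .2560384
       _cons |   7.093777   .2333334    30.40   0.000     6.635426    7.552127
------------------------------------------------------------------------------

. test airco

 ( 1)  airco = 0

       F(  1,   541) =   87.85
            Prob > F =    0.0000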
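For reference, the robust standard errors in Route 1 come from the Eicker-Huber-White sandwich formula. The expression below is a sketch in notation assumed here (x_i is the regressor vector including the constant, \hat{e}_i the OLS residual, k the number of estimated coefficients); the factor n/(n-k) is the small-sample adjustment that Stata's regress applies with the robust option.

```latex
\[
\widehat{V}_{\text{robust}}(\hat\beta)
  = \frac{n}{n-k}\,
    (X'X)^{-1}
    \Bigl(\sum_{i=1}^{n} \hat{e}_i^{\,2}\, x_i x_i'\Bigr)
    (X'X)^{-1}
\]
```

Only the covariance matrix changes; the coefficient estimates are the OLS estimates, which is why the Coef. column is identical with and without the robust option.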

. * Route 2: Test for heteroskedasticity

. * Artificial regression
. gen lpressq=lpres^2

. reg lpressq llotsize bedrooms bathrms airco

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =    2.95
       Model |  .095642149     4  .023910537           Prob > F      =  0.0199
    Residual |  4.39060417   541  .008115719           R-squared     =  0.0213
-------------+------------------------------           Adj R-squared =  0.0141
       Total |  4.48624632   545  .008231645           Root MSE      =  .09009

------------------------------------------------------------------------------
     lpressq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |   .0016589   .0102033     0.16   0.871     -.018384    .0217019
    bedrooms |   .0146183   .0056812     2.57   0.010     .0034583    .0257783
     bathrms |  -.0117954   .0084365    -1.40   0.163    -.0283676    .0047769
       airco |  -.0199028   .0087025    -2.29   0.023    -.0369976    -.002808
       _cons |   .0238275   .0849464     0.28   0.779    -.1430376    .1906925
------------------------------------------------------------------------------

. predict lpvar
(option xb assumed; fitted values)

. sum lpvar

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       lpvar |       546    .0597476    .0132473   .0235617   .1133264

. * STATA features tests which do not require the artificial regression;
. * they can be implemented after running the regression of interest.
. * These are described in detail in the STATA manual (see: regress postestimation)

. qui reg lprice llotsize bedrooms bathrms airco

. * Breusch-Pagan and related tests in STATA:
. * Approach A. Heteroskedasticity is a function of the original explanatory variables:
. * Use subcommand rhs (= use right-hand-side variables in the test)

. estat hettest, rhs fstat

         Variables: llotsize bedrooms bathrms airco

         F(4 , 541)   =     2.95
         Prob > F     =   0.0199

. * Asymptotic (Chi-square) version of the F-test
. estat hettest, rhs

         Variables: llotsize bedrooms bathrms airco

         chi2(4)      =    13.40
         Prob > chi2  =   0.0095

. * LM test (also asymptotic, Chi-square based) version (see Verbeek 4.4.2)
. estat hettest, rhs iid

         Variables: llotsize bedrooms bathrms airco

         chi2(4)      =    11.64
         Prob > chi2  =   0.0202

. * Approach B. Heteroskedasticity is a function of the fitted values:
. * Same command as above, exclude subcommand rhs

. estat hettest, fstat

         Variables: fitted values of lprice

         F(1 , 544)   =     0.87
         Prob > F     =   0.3522

. estat hettest

         Variables: fitted values of lprice

         chi2(1)      =     1.00
         Prob > chi2  =   0.3174

. estat hettest, iid

         Variables: fitted values of lprice

         chi2(1)      =     0.87
         Prob > chi2  =   0.3513
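The F and LM versions of the test under Approach A are both functions of the R-squared of the artificial regression of lpressq on the regressors. The worked arithmetic below uses the sums of squares from that regression; the symbol R_a^2 is notation assumed here.

```latex
\begin{align*}
R_a^2 &= \frac{0.095642149}{4.48624632} \approx 0.021319 \\[4pt]
LM &= n\,R_a^2 = 546 \times 0.021319 \approx 11.64 \;\sim\; \chi^2(4) \\[4pt]
F &= \frac{R_a^2/4}{(1-R_a^2)/541}
   = \frac{0.021319/4}{0.978681/541} \approx 2.95 \;\sim\; F(4,\,541)
\end{align*}
```

These reproduce the values reported by estat hettest, rhs iid and estat hettest, rhs fstat respectively.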

. * White test: Asymptotic test based on more general h(x) specification
. * (see Verbeek 4.4.3)

. estat imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(13)     =     16.62
         Prob > chi2  =    0.2171

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df      p
---------------------+-----------------------------
  Heteroskedasticity |      16.62     13    0.2171
            Skewness |       3.42      4    0.4902
            Kurtosis |       1.68      1    0.1953
---------------------+-----------------------------
               Total |      21.72     18    0.2446
---------------------------------------------------

. * Route 3: Fix heteroskedasticity - FGLS

. gen loglpressq=log(lpressq)

. reg loglpressq llotsize bedrooms bathrms airco

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =    4.72
       Model |   86.695341     4  21.6738353           Prob > F      =  0.0009
    Residual |  2485.50836   541  4.59428533           R-squared     =  0.0337
-------------+------------------------------           Adj R-squared =  0.0266
       Total |   2572.2037   545  4.71963982           Root MSE      =  2.1434

------------------------------------------------------------------------------
  loglpressq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    llotsize |   .3319673   .2427656     1.37   0.172    -.1449113    .8088459
    bedrooms |   .4384695   .1351722     3.24   0.001     .1729428    .7039963
     bathrms |  -.6283065   .2007271    -3.13   0.002    -1.022607   -.2340065
       airco |  -.3534014   .2070564    -1.71   0.088    -.7601344    .0533316
       _cons |  -7.302059   2.021111    -3.61   0.000    -11.27225   -3.331871
------------------------------------------------------------------------------

. predict loglpressq_hat, xb

. gen lpvarest=exp(loglpressq_hat)

. * STATA allows you to run weighted regressions where each observation has
. * a weight attached to it. We want to attach weights which equal inverses
. * of the variance estimates. For more info, type help weights

. gen inv_lpvarest=1/lpvarest

. reg price lotsize bedrooms bathrms airco [aweight=inv_lpvarest]
(sum of wgt is   3.6086e+04)

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  4,   541) =  234.66
       Model |  2.8560e+11     4  7.1401e+10           Prob > F      =  0.0000
    Residual |  1.6461e+11   541   304269785           R-squared     =  0.6344
-------------+------------------------------           Adj R-squared =  0.6317
       Total |  4.5021e+11   545   826079337           Root MSE      =   17443

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lotsize |   5.457405   .3911021    13.95   0.000      4.68914     6.22567
    bedrooms |   5792.989   1193.339     4.85   0.000     3448.843    8137.135
     bathrms |   18196.35    1430.59    12.72   0.000     15386.16    21006.54
       airco |   15782.72   1634.296     9.66   0.000     12572.38    18993.06
       _cons |  -5524.985   3385.861    -1.63   0.103    -12176.03     1126.06
------------------------------------------------------------------------------

. * Check: Feasible GLS is equivalent to OLS on suitably transformed data

. gen Tlprice=lprice/sqrt(lpvarest)

. gen Tllotsize=llotsize/sqrt(lpvarest)

. gen Tbdrms=bedrooms/sqrt(lpvarest)

. gen Tbathrms=bathrms/sqrt(lpvarest)

. gen Tairco=airco/sqrt(lpvarest)

. gen T_const=1/sqrt(lpvarest)

. reg Tlprice Tllotsize Tbdrms Tbathrms Tairco T_const, noconst

      Source |       SS       df       MS              Number of obs =     546
-------------+------------------------------           F(  5,   541) =
       Model |  4431820.73     5  886364.146           Prob > F      =  0.0000
    Residual |  1997.82769   541   3.6928423           R-squared     =  0.9995
-------------+------------------------------           Adj R-squared =  0.9995
       Total |  4433818.56   546  8120.54681           Root MSE      =  1.9217

------------------------------------------------------------------------------
     Tlprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   Tllotsize |     .42023   .0280567    14.98   0.000     .3651165    .4753435
      Tbdrms |       .096   .0161453     5.95   0.000     .0642848    .1277151
    Tbathrms |    .206718   .0194054    10.65   0.000     .1685989    .2448371
      Tairco |     .20864    .022385     9.32   0.000     .1646678    .2526122
     T_const |   6.886545   .2310943    29.80   0.000     6.432593    7.340497
------------------------------------------------------------------------------

end of do-file

. log close
      name:  <unnamed>
       log:  g:\econ511\heter_housing2.log
  log type:  text
 closed on:  1 Dec 2014, 18:12:46
--------------------------------------------------------------------------------

STATA Exercise 1

. use "K:\My Documents\COURSES\Econ 312\Wooldridge 4th Intl\Stata Data\GPA1.DTA"

. regress colgpa skipped ACT hsgpa

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  3,   137) =   13.92
       Model |  4.53313314     3  1.51104438           Prob > F      =  0.0000
    Residual |  14.8729663   137  .108561798           R-squared     =  0.2336
-------------+------------------------------           Adj R-squared =  0.2168
       Total |  19.4060994   140  .138614996           Root MSE      =  .32949

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     skipped |  -.0831131   .0259985    -3.20   0.002    -.1345234   -.0317028
         ACT |   .0147202   .0105649     1.39   0.166    -.0061711    .0356115
       hsgpa |   .4118162   .0936742     4.40   0.000     .2265819    .5970505
       _cons |   1.389554   .3315535     4.19   0.000     .7339295    2.045178
------------------------------------------------------------------------------

. predict rescolgpa, resid

. sum soph junior senior senior5

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        soph |       141    .0212766    .1448194          0          1
      junior |       141    .3829787    .4878462          0          1
      senior |       141    .5035461    .5017699          0          1
     senior5 |       141    .0921986    .2903375          0          1

. generate year=2*soph+3*junior+4*senior+5*senior5

. tab year

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |          3        2.13        2.13
          3 |         54       38.30       40.43
          4 |         71       50.35       90.78
          5 |         13        9.22      100.00
------------+-----------------------------------
      Total |        141      100.00

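. regress colgpa skipped ACT hsgpa, cluster(year)

Linear regression                                      Number of obs =     141
                                                       F(  2,     3) =       .
                                                       Prob > F      =       .
                                                       R-squared     =  0.2336
                                                       Root MSE      =  .32949

                                   (Std. Err. adjusted for 4 clusters in year)
------------------------------------------------------------------------------
             |               Robust
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     skipped |  -.0831131   .0193885    -4.29   0.023    -.1448159   -.0214104
         ACT |   .0147202   .0048222     3.05   0.055     -.000626    .0300665
       hsgpa |   .4118162   .1128976     3.65   0.036     .0525256    .7711068
       _cons |   1.389554   .2884966     4.82   0.017     .4714288    2.307679
------------------------------------------------------------------------------

. gen rescolgpasq=rescolgpa^2

. reg rescolgpasq soph junior senior senior5
note: senior5 omitted because of collinearity

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  3,   137) =    1.21
       Model |  .061927897     3  .020642632           Prob > F      =  0.3103
    Residual |  2.34666483   137   .01712894           R-squared     =  0.0257
-------------+------------------------------           Adj R-squared =  0.0044
       Total |  2.40859272   140  .017204234           Root MSE      =  .13088

------------------------------------------------------------------------------
 rescolgpasq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        soph |   .0648746   .0838287     0.77   0.440     -.100891    .2306401
      junior |   .0105068   .0404328     0.26   0.795    -.0694463      .09046
      senior |   .0491597   .0394824     1.25   0.215    -.0289141    .1272335
     senior5 |          0  (omitted)
       _cons |   .0753237   .0362989     2.08   0.040     .0035451    .1471023
------------------------------------------------------------------------------

. display "NRsq= " _N*e(r2)
NRsq= 3.6252843

. di 1-chi2(3,_N*e(r2))
.30487298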
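The p-value in the last line is the upper tail of a chi-square distribution with 3 degrees of freedom. A minimal sketch of the same calculation follows, with the reported statistic typed in by hand (so no dataset is needed) and using Stata's chi2tail() function as an alternative to 1-chi2(); the scalar name LM is introduced here for illustration.

```stata
* LM statistic n*R-sq from the artificial regression, entered manually
scalar LM = 3.6252843

* Upper-tail probability, written two equivalent ways
display 1 - chi2(3, scalar(LM))
display chi2tail(3, scalar(LM))
```

Both lines should return the p-value of about 0.305 reported above, so homoskedasticity across class years is not rejected at conventional levels.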