Eco311, Final Exam, Fall 2017
Prof. Bill Even
Your Name (Please print)
Directions: Each question is worth 4 points unless indicated otherwise.

Place all answers in the space provided below or within each question. Round all numerical answers to the nearest hundredth (e.g., 1.23) unless told otherwise.

Using a sample of people aged 66 and above drawn from the Current Population Survey, I estimated a regression of annual Social Security income, measured in thousands of dollars (SSINC), on a person's age, years of education, and a dummy variable indicating whether the person is female. I followed this regression with several supplemental regressions.

. reg ssinc female age school
    Number of obs = 17,478
. predict u, residual
. predict y, xb
. gen u2=u^2
. gen y2=y^2
. reg u2 female age school
    Number of obs = 17,478    F(3, 17474) = 3.63
. test female=age=school=0
    ( 1)  female - age = 0
    ( 2)  female - school = 0
    ( 3)  female = 0
    F(3, 17474) = 3.63
. reg u2 y y2
    Number of obs = 17,478    F(2, 17475) = 3.78
. test y=y2=0
    ( 1)  y - y2 = 0
    ( 2)  y = 0
    F(2, 17475) = 3.78
. reg y2 female age school
    Number of obs = 17,478
. test female=age=school=0
    ( 1)  female - age = 0
    ( 2)  female - school = 0
    ( 3)  female = 0
    F(3, 17474) = 1.2e+06
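To see what the auxiliary regressions in this log are doing, here is a minimal Python sketch of the same steps on simulated data. It assumes a setting quite different from the CPS sample: two hypothetical regressors `x1` and `x2`, an error whose spread is built to grow with `x1`, and a small hand-written OLS solver (`ols`, `fitted`, and `r_squared` are illustrative helpers, not library functions). It computes the LM form of each test, \( nR^2 \) from the auxiliary regression of the squared residuals, rather than the F form that Stata's `test` command reports.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fitted(X, b):
    """Fitted values X b."""
    return [sum(x * bj for x, bj in zip(row, b)) for row in X]


def r_squared(X, y):
    """R-squared of the OLS regression of y on the columns of X (X includes a constant)."""
    yhat = fitted(X, ols(X, y))
    ybar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ssr / sst


# Hypothetical simulated data: the error's standard deviation (0.5 + x1) grows with x1.
random.seed(0)
n = 2000
x1 = [random.uniform(0, 10) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * a + 0.5 * b + random.gauss(0, 0.5 + a) for a, b in zip(x1, x2)]

# Main regression, fitted values (like: predict y, xb), squared residuals (like: gen u2=u^2)
X = [[1.0, a, b] for a, b in zip(x1, x2)]
yhat = fitted(X, ols(X, y))
u2 = [(yi - fi) ** 2 for yi, fi in zip(y, yhat)]

# Breusch-Pagan: regress u2 on the original regressors; LM = n * R^2, chi-square with 2 df here
lm_bp = n * r_squared(X, u2)

# Simple White test: regress u2 on the fitted values and their square; LM = n * R^2, 2 df
W = [[1.0, f, f * f] for f in yhat]
lm_white = n * r_squared(W, u2)

print(f"Breusch-Pagan LM = {lm_bp:.1f}, simple White LM = {lm_white:.1f}")
```

Because the heteroskedasticity is built into the simulation, both LM statistics should land far above about 5.99, the 5% critical value of a chi-square with 2 degrees of freedom.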

Use the regressions from the prior page to answer the following questions.

1. Holding education and age constant, women's average Social Security benefits are $3,107 lower than men's.
2. The F-statistic for the Breusch-Pagan test of heteroskedasticity is 3.63 and implies that the null hypothesis of homoskedasticity (should, should not) be rejected at the .01 level of significance.
3. The LM statistic for the Breusch-Pagan test for heteroskedasticity is 10.49 and has 3 degrees of freedom.
4. The F-statistic for the simple version of the White test for heteroskedasticity is 3.78 and implies that the null hypothesis of homoskedasticity (should, should not) be rejected at the .05 level of significance.
5. Based on the regressions presented, indicate whether each of the following is true (T) or false (F). The variance of the residual from the Social Security regression is:
   a. Greater for women than men. F
   b. Greater for older than younger people. F
   c. Greater for more educated people. T
6. If the null hypothesis of homoskedasticity is rejected, indicate whether each of the following is true (T) or false (F).
   a. The standard OLS estimates of the coefficients remain unbiased. T
   b. The standard OLS estimates of the standard errors for the coefficients are incorrect. T
   c. The standard OLS estimates of the coefficients remain efficient. F
7. Suppose you have data on expenditures per pupil by school district and you estimate the following regression:
   \( expendpp_i = \beta_0 + \beta_1 income_i + \beta_2 married_i + u_i \)
   where the subscript \( i \) indexes the school district, expendpp is expenditures per pupil, income is average family income, and married is the percent of households with married couples. The residuals from this regression (\( u \)) are likely to be:
   a. Heteroskedastic, and their variance will be greater in districts with larger populations.
   b. Homoskedastic, and their variance will be greater in districts with larger populations.
   c. Heteroskedastic, and their variance will be greater in districts with smaller populations.
   d. Homoskedastic, and their variance will be greater in districts with smaller populations.
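Questions 2 through 4 turn on p-values that the transcription no longer shows. A rough way to recover them, assuming the denominator degrees of freedom (17,474 and 17,475) are large enough that \( qF \) is approximately chi-square with \( q \) degrees of freedom, is sketched below in Python using only the closed-form chi-square upper tails for 2 and 3 degrees of freedom. The helper names are hypothetical. The sketch also recovers \( R^2 \) from the F-statistic via \( R^2 = qF/(qF + df_{resid}) \) to compute \( LM = nR^2 \).

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """P(chi2_k > x), using the closed forms for k = 2 and k = 3 degrees of freedom."""
    if k == 2:
        return math.exp(-x / 2)
    if k == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only k = 2 or k = 3 is implemented in this sketch")


def f_pvalue_large_df(f_stat: float, q: int) -> float:
    """Approximate P(F(q, d) > f_stat) for very large d, using q * F ~ chi2(q)."""
    return chi2_sf(q * f_stat, q)


def lm_from_f(f_stat: float, q: int, df_resid: int, n: int) -> float:
    """LM = n * R^2, with R^2 recovered from the overall F-statistic of the auxiliary regression."""
    r2 = q * f_stat / (q * f_stat + df_resid)
    return n * r2


n = 17_478
p_bp = f_pvalue_large_df(3.63, q=3)      # Breusch-Pagan, F(3, 17474)
p_white = f_pvalue_large_df(3.78, q=2)   # simple White, F(2, 17475)
print(f"Breusch-Pagan: p = {p_bp:.4f}, reject at .01? {p_bp < 0.01}")
print(f"Simple White:  p = {p_white:.4f}, reject at .05? {p_white < 0.05}")
print(f"LM implied by F:      {lm_from_f(3.63, 3, 17_474, n):.2f}")
print(f"LM from R^2 = 0.0006: {n * 0.0006:.2f}")
```

Under this approximation the Breusch-Pagan p-value is about 0.012 (above .01) and the simple White p-value is about 0.023 (below .05). The LM statistic implied by the F-statistic is about 10.89; the 10.49 in question 3 appears to correspond to \( n \) times an R-squared rounded to 0.0006, which is how Stata displays R-squared to four decimals.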

A recent study¹ examined the effect of wind farms on property values. The objective was to determine whether property values were reduced when a wind farm was built in view of the property. For the sake of simplicity, assume the windfarm was built in 2010. The study used a diff-in-diff methodology to determine whether there was an effect. The table below reports hypothetical average property values that reflect the data in their study.

                | Windfarm visible from property | Windfarm not visible from property
Price in 2009   | $101,600                       | $100,000
Price in 2011   | $89,200                        | $106,…

8. What is the diff-in-diff estimate of the effect of windfarm visibility on property values? Be sure to indicate whether windfarm visibility increased or decreased property values.
(89,200 - 101,600) - (106,… - 100,000) = -$18,…, so windfarm visibility decreased property values.
9. Suppose that windfarms are built in more rural areas and that the price of crops grown on rural properties was falling between 2009 and 2011. Would this cause the diff-in-diff estimate of the windfarm effect on property values to be biased upward or downward? Why?
Falling crop prices would cause land prices to fall more on the treated properties (i.e., where windfarms are visible) than on the comparison properties, for reasons unrelated to the windfarms. This would lead to a downward bias in the estimated effect of windfarms on property values.

Suppose that the windfarm is built in 2010 and that the price of properties is observed in 2009 and 2011. Consider the following variable definitions:
\( p_{it} \) is the price of property \( i \) in period \( t \) (\( t \) = 2009, 2011).
\( y2011_t \) is a dummy variable that equals one in 2011 and zero in 2009.
\( wf_{it} \) is a dummy variable that equals one in period \( t \) if a windfarm is visible from property \( i \) in period \( t \).
\( d_i \) is a dummy variable that equals one in both periods (2009 and 2011) if a windfarm is visible from property \( i \) in 2011, and zero otherwise.

10. Using the above variables, write out a regression equation that allows you to estimate the diff-in-diff effect of windfarm visibility on property values. If you need to create any other variables for your regression, be clear about their definition. If some of the above variables are unnecessary, don't include them in your regression.
\( p_{it} = \pi_0 + \pi_1 y2011_t + \pi_2 d_i + \pi_3 (y2011_t \times d_i) + u_{it} \)
Because the windfarm is built in 2010, the interaction \( y2011_t \times d_i \) equals \( wf_{it} \), so including \( wf_{it} \) as well would be redundant.
11. Which coefficient in the above regression equation provides the diff-in-diff estimate of the effect of windfarms on property values?
\( \pi_3 \)

¹ Sunak, Y., & Madlener, R. (2016). The impact of wind farm visibility on property values: A spatial difference-in-differences analysis. Energy Economics, 55.
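The answer to question 11, that \( \pi_3 \) is the diff-in-diff estimate, can be checked numerically: in a regression on a constant, \( y2011_t \), \( d_i \), and their interaction, the interaction coefficient equals the difference in differences of the four cell means. The Python sketch below assumes a tiny hypothetical panel (the prices are made up; they are neither the study's data nor the table above) and uses a small hand-written OLS solver rather than any particular statistics library.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical panel, prices in $1000s (illustrative numbers only): two properties in each
# group, each observed in 2009 and 2011. Each row is (y2011_t, d_i, p_it).
rows = [
    (0, 1, 100.0), (0, 1, 104.0),  # windfarm visible by 2011; 2009 prices
    (1, 1, 90.0), (1, 1, 92.0),    # windfarm visible; 2011 prices
    (0, 0, 99.0), (0, 0, 101.0),   # windfarm never visible; 2009 prices
    (1, 0, 105.0), (1, 0, 107.0),  # windfarm never visible; 2011 prices
]

# Regressors: constant, y2011_t, d_i, and the interaction y2011_t * d_i
X = [[1.0, t, d, t * d] for t, d, _ in rows]
p_it = [price for _, _, price in rows]
pi = ols(X, p_it)  # [pi_0, pi_1, pi_2, pi_3]


def cell_mean(t, d):
    """Average price in one year/group cell of the 2x2 table."""
    prices = [price for tt, dd, price in rows if tt == t and dd == d]
    return sum(prices) / len(prices)


did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))
print(f"pi_3 = {pi[3]:.2f}, diff-in-diff of cell means = {did:.2f}")
```

With these made-up numbers both quantities equal -17 (thousand dollars), and \( \pi_0 \), \( \pi_1 \), and \( \pi_2 \) reproduce the comparison group's 2009 mean, the comparison group's change, and the 2009 gap between the groups.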

12. (6 points) Suppose that you estimate a regression model using only the 2011 data on property values. You estimate the regression
\( p_i = \beta_0 + \beta_1 WF_i + u_i \)
where \( p_i \) is the price in 2011 of property \( i \) and \( WF_i \) is a dummy variable that equals one if the windfarm is in view of property \( i \) in 2011. Provide a story that would cause the estimate of \( \beta_1 \) to be biased upward and explain why it creates an upward bias.
Suppose windfarms are built near cities to reduce the cost of the high-voltage lines needed to deliver electricity to customers. Moreover, suppose the price of land is higher in cities than in rural areas. This would bias the estimated effect of \( WF_i \) upward because the city effect on land prices is in the error term (\( u_i \)) and is positively correlated with windfarm visibility.
13. Suppose you estimate a linear probability model (LPM) of whether a person works. Indicate whether each of the following statements is true (T) or false (F).
   a. The OLS coefficient estimates are biased. F
   b. The standard OLS estimates of the standard errors are correct. F
   c. The errors in the regression will be heteroskedastic. T
   d. The variance of the residuals in the LPM will be \( p_i(1 - p_i) \), where \( p_i \) is the predicted probability that person \( i \) works. T
14. If you estimate a linear probability model of whether a person works, what could make it inappropriate to use weighted least squares? Explain.
The analytic weights should be \( 1/[p_i(1 - p_i)] \), where \( p_i \) is the predicted probability that person \( i \) works. If any of the predicted probabilities of working fall outside the unit interval, \( p_i(1 - p_i) \) is negative, so the weights would be negative and WLS would be infeasible.
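The feasibility problem in question 14 can be made concrete with a minimal Python sketch of the analytic weights \( 1/[p_i(1 - p_i)] \). The function name and the fitted probabilities are hypothetical. The sketch also rejects fitted values of exactly 0 or 1, where the weight is undefined because \( p_i(1 - p_i) = 0 \).

```python
def lpm_wls_weights(p_hat):
    """Analytic WLS weights 1 / [p(1 - p)] for a linear probability model.

    Raises ValueError if any fitted probability is not strictly between 0 and 1:
    then p(1 - p) is zero (weight undefined) or negative (negative weight).
    """
    weights = []
    for p in p_hat:
        if not 0.0 < p < 1.0:
            raise ValueError(f"fitted probability {p} is outside (0, 1); WLS weights are infeasible")
        weights.append(1.0 / (p * (1.0 - p)))
    return weights


# Hypothetical fitted probabilities from an LPM of whether a person works
ok_weights = lpm_wls_weights([0.5, 0.2, 0.9])
print(ok_weights)

infeasible = False
try:
    lpm_wls_weights([0.3, 1.15])  # one fitted value above 1
except ValueError as err:
    infeasible = True
    print(err)
```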

Answer any 5 of the last 7 questions. Each question is worth 6 points. Write the word SKIPPED for the 2 questions you choose not to answer. If you answer more than 5 questions, I will grade the first 5.

Suppose you work for the Department of Agriculture and have been charged with determining the effect of irrigation on corn yields. You have cross-sectional data on 1000 different farms containing crop yield in bushels per acre (Y) and a dummy variable indicating whether the land is irrigated (IRR). Keep in mind that farmers decide whether to irrigate based on the profitability of such an investment. If a plot of land has very poor soil, irrigation is not likely to be profitable.

15. Suppose you estimate a simple OLS regression model
\( Y_i = \beta_0 + \beta_1 IRR_i + e_i \)
where \( i \) indexes the farm. Is the estimate of \( \beta_1 \) likely to be biased upward or downward? Why? [Note: land quality has not been controlled for in this regression.]
Since land quality (LQ) is not controlled for in the regression, its effect on yields, which is expected to be positive, is included in the error term. Because farms with better land are more likely to find irrigation profitable, \( \text{Cov}(IRR_i, e_i) > 0 \), which leads to an upward bias in the estimated value of \( \beta_1 \).
16. (6 points) Suppose you have panel data for the same 1000 farms on corn yields and irrigation. \( Y_{it} \) represents yields on farm \( i \) in period \( t \) and \( IRR_{it} \) is a dummy indicating whether farm \( i \) has irrigation in period \( t \). A fixed effects model could be estimated by including a dummy variable for each of the 1000 farms. Alternatively, how can the data be transformed to eliminate the need to include 1000 dummy variables? For full credit, your explanation must be unambiguous about how the data are transformed and which regression you would estimate.
For each farm, calculate the farm-specific means of yields (\( \bar{Y}_i \)) and of the irrigation dummy (\( \overline{IRR}_i \)). Compute deviations from the farm-specific means for yields, \( \tilde{Y}_{it} = Y_{it} - \bar{Y}_i \), and for the irrigation dummy, \( \widetilde{IRR}_{it} = IRR_{it} - \overline{IRR}_i \). Then estimate the following regression:
\( \tilde{Y}_{it} = \beta_0 + \beta_1 \widetilde{IRR}_{it} + e_{it} \)
17. If you estimated the model with fixed effects, would you expect the estimate of \( \beta_1 \) (i.e., the effect of irrigation) to increase or decrease relative to the model that did not include fixed effects? Why?
As noted in my answer to (15), the model without fixed effects is likely to generate an estimate of \( \beta_1 \) that is biased upward. By including farm-specific fixed effects, I can difference out the effect of (time-invariant) land quality on yields and eliminate the endogeneity bias. As a result, the estimate of \( \beta_1 \) should decrease when the model is estimated with farm-specific fixed effects.
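The within transformation in question 16 and the direction of the change in question 17 can both be illustrated with simulated data. The Python sketch below assumes a hypothetical data-generating process: 1000 farms observed for 5 years, a time-invariant land quality that raises yields and makes irrigation more likely, irrigation that can switch on and off from year to year, and a true irrigation effect of 1.0. `demean_by_group` and `slope` are illustrative helpers. Pooled OLS leaves land quality in the error term; the within regression uses deviations from farm means.

```python
import random


def demean_by_group(values, groups):
    """Within transformation: subtract each group's mean from that group's observations."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical panel: 1000 farms x 5 years; true irrigation effect = 1.0
random.seed(1)
TRUE_EFFECT = 1.0
farm, irr, yld = [], [], []
for i in range(1000):
    lq = random.gauss(0, 1)  # unobserved, time-invariant land quality
    for _ in range(5):
        # Better land makes irrigation more likely; irrigation can change from year to year
        d = 1.0 if lq + random.gauss(0, 1) > 0 else 0.0
        farm.append(i)
        irr.append(d)
        yld.append(2 + TRUE_EFFECT * d + 3 * lq + random.gauss(0, 1))

pooled = slope(irr, yld)  # land quality is left in the error term
within = slope(demean_by_group(irr, farm), demean_by_group(yld, farm))
print(f"pooled OLS = {pooled:.2f}, fixed effects (within) = {within:.2f}")
```

In this setup the pooled estimate should come out well above 1 (upward bias), while the within estimate should be close to 1. The within slope is numerically the same as the estimate from including all 1000 farm dummies, although standard errors from the demeaned regression need a degrees-of-freedom correction for the estimated farm means.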

18. Suppose that, in addition to corn yields and irrigation, you have information on how much fertilizer is applied per acre (\( FERT_{it} \)). Can this control variable be added to the fixed effects regression you described above? Why or why not?
This variable can be included in the fixed effects regression if the amount of fertilizer used varies over time on some farms. If every farm uses the same amount of fertilizer each year, the deviations from farm-specific means will always equal zero, there will be no variation in the variable, and it cannot be included in the fixed effects regression.
19. Suppose that you have only one year of data on yields and irrigation. However, you discover that some counties have laws that make it easier for farmers to receive permission to irrigate. You have a variable (\( Z_i \)) that ranges from 0 to 10 across counties, where 0 means there are very stringent restrictions on irrigation and 10 means there are no restrictions. Describe, in as few words as possible, the two necessary conditions for \( Z_i \) to be an appropriate instrumental variable for irrigation.
\( \text{Cov}(Z_i, e_i) = 0 \) and \( \text{Cov}(Z_i, IRR_i) \neq 0 \). That is, the instrument cannot be correlated with the unobserved factors that cause yields to vary across farms, but it must be correlated with the likelihood that a farm has irrigation.
20. Suppose that counties with better land quality tend to be richer and have more restrictions on irrigation. Would this violate the necessary assumptions for \( Z_i \) to be an appropriate instrument? If so, which condition discussed in the prior question is violated?
Yes. This would violate the assumption that \( \text{Cov}(Z_i, e_i) = 0 \), since land quality is not controlled for in the regression and its effect on yields would be in the error term.
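The point in question 18 about time-invariant controls is easy to see directly: demeaning by farm turns a variable that never changes within a farm into a column of zeros. The small Python sketch below uses made-up fertilizer amounts for three hypothetical farms; `demean_by_group` is an illustrative helper, not a library function.

```python
def demean_by_group(values, groups):
    """Within transformation: subtract each group's mean from that group's observations."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


# Hypothetical data: 3 farms, each observed for 3 years (fertilizer per acre)
farm = [1, 1, 1, 2, 2, 2, 3, 3, 3]
fert_constant = [50, 50, 50, 80, 80, 80, 65, 65, 65]  # every farm uses the same amount each year
fert_varying = [50, 55, 45, 80, 80, 80, 60, 70, 65]   # farms 1 and 3 change over time

print(demean_by_group(fert_constant, farm))  # all zeros: no within-farm variation left
print(demean_by_group(fert_varying, farm))   # nonzero deviations on farms 1 and 3
```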

21. Describe the 2 steps of two-stage least squares that could be used to estimate yield as a function of irrigation using Z as an instrumental variable. You must describe precisely what is in each of the two stage regressions.
Step 1. Estimate \( IRR_i = \pi_0 + \pi_1 Z_i + v_i \) with OLS and generate the predicted values \( \widehat{IRR}_i \).
Step 2. Estimate \( Y_i = \beta_0 + \beta_1 \widehat{IRR}_i + e_i \) with OLS, where \( \widehat{IRR}_i \) is the predicted value of the IRR dummy from step 1.
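The two steps in question 21 can be sketched in Python on simulated data. Everything in the data-generating process is an assumption for illustration: \( Z \) takes integer values from 0 to 10, land quality is independent of \( Z \) (so the exogeneity condition from question 19 holds by construction), both better land and looser rules raise the chance of irrigating, and the true irrigation effect is 1.0. `slope_intercept` is an illustrative helper.

```python
import random


def slope_intercept(x, y):
    """Intercept and slope of the OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical simulated cross-section of farms; true effect of irrigation = 1.0
random.seed(2)
n = 20_000
TRUE_EFFECT = 1.0
z, irr, yld = [], [], []
for _ in range(n):
    zi = random.randint(0, 10)  # 0 = strict limits on irrigation, 10 = none
    lq = random.gauss(0, 1)     # unobserved land quality, independent of z
    d = 1.0 if 0.3 * zi - 1.5 + lq + random.gauss(0, 1) > 0 else 0.0
    z.append(float(zi))
    irr.append(d)
    yld.append(2 + TRUE_EFFECT * d + 3 * lq + random.gauss(0, 1))

# Step 1: regress IRR on Z by OLS and form the predicted values IRR-hat
a0, a1 = slope_intercept(z, irr)
irr_hat = [a0 + a1 * zi for zi in z]

# Step 2: regress yield on IRR-hat by OLS; the slope is the 2SLS estimate of beta_1
b0, b1 = slope_intercept(irr_hat, yld)

# For comparison: OLS of yield on actual IRR, biased upward by omitted land quality
ols_b1 = slope_intercept(irr, yld)[1]
print(f"first-stage slope = {a1:.3f}, 2SLS = {b1:.2f}, OLS = {ols_b1:.2f}")
```

Running step 2 by hand gives the 2SLS point estimate, but the standard errors it would report are not valid because its residuals are built from \( \widehat{IRR}_i \) rather than \( IRR_i \); a dedicated routine such as Stata's `ivregress 2sls` handles this.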