Semester 2, 2015/2016


1 ECN 3202 APPLIED ECONOMETRICS: 3. MULTIPLE REGRESSION B. Mr. Sydney Armstrong, Lecturer 1, The University of Guyana. Semester 2, 2015/2016

2 MODEL SPECIFICATION. What happens if we omit a relevant variable? If we omit a relevant variable from a multiple regression model, then the Gauss-Markov theorem (and other important properties) may no longer hold: the OLS estimates of the model can be (and are likely to be) biased and inconsistent, and so will be the predictions, confidence intervals, tests, etc. An exception is when the omitted variable is uncorrelated with every included explanatory variable. Should we then include every variable we have, even if irrelevant? If we include an irrelevant variable in a multiple regression model, OLS remains unbiased and consistent but becomes inefficient. The variances (and standard errors) of the estimates can be so large that many estimates appear statistically insignificant, even though they would be significant if the irrelevant variables were excluded.
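The omitted-variable effect can be illustrated with a small simulation (a minimal sketch with artificial data and numpy, not part of the lecture). The true model has two regressors, x2 and x3, with slopes 2.0 and 1.5; a mis-specified regression omits x3, which is correlated with x2, and the OLS slope on x2 is pulled away from its true value of 2.0 toward 2.0 + 1.5(0.8) = 3.2:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 2000
slopes = []
for _ in range(reps):
    x2 = rng.normal(size=n)
    x3 = 0.8 * x2 + rng.normal(size=n)            # omitted variable, correlated with x2
    y = 1.0 + 2.0 * x2 + 1.5 * x3 + rng.normal(size=n)
    # Mis-specified model: regress y on a constant and x2 only, omitting x3
    X = np.column_stack([np.ones(n), x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    slopes.append(b[1])

mean_slope = np.mean(slopes)
print(mean_slope)   # centered near 3.2, not the true 2.0: omitted-variable bias
```

Setting the 0.8 in the construction of x3 to 0 makes the two regressors uncorrelated, and the bias disappears, matching the exception noted above.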


5 Example (family income): Omitting a relevant variable. The estimated regression model with two explanatory variables: [estimates not transcribed]; with one explanatory variable: [estimates not transcribed]; with three explanatory variables: [estimates not transcribed].

6 Omission of a relevant variable leads to omitted-variable bias. The bias increases with the correlation between the included and omitted variables; there is no (or only a small) bias only if that correlation is zero (or close to zero)!

7 Effect of including irrelevant variables: it complicates the model unnecessarily; the OLS estimates remain unbiased and consistent, but their values may change; and the variances of the OLS estimates may increase, sometimes by a lot, so some coefficients may turn insignificant or have wide confidence intervals.
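The inefficiency from an irrelevant variable is easy to see by simulation (again an illustrative numpy sketch with artificial data). Here x3 truly has a zero coefficient but is highly collinear with x2; including it leaves the slope on x2 unbiased while inflating its sampling variance dramatically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 2000
b_small, b_big = [], []
for _ in range(reps):
    x2 = rng.normal(size=n)
    x3 = 0.95 * x2 + 0.1 * rng.normal(size=n)   # irrelevant but nearly collinear with x2
    y = 1.0 + 2.0 * x2 + rng.normal(size=n)     # x3's true coefficient is zero
    X1 = np.column_stack([np.ones(n), x2])          # correct model
    X2 = np.column_stack([np.ones(n), x2, x3])      # over-specified model
    b_small.append(np.linalg.lstsq(X1, y, rcond=None)[0][1])
    b_big.append(np.linalg.lstsq(X2, y, rcond=None)[0][1])

print(np.mean(b_big))                    # still centered near the true 2.0: unbiased
print(np.var(b_big) / np.var(b_small))   # variance ratio far above 1: inefficiency
```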

8 A Practical Approach. We should choose a functional form that is (i) consistent with what economic theory tells us about the relationship between the variables, (ii) compatible with assumptions MR1 to MR5, and (iii) flexible enough to fit the data. In a multiple regression context, this mainly involves hypothesis testing, residual analysis, assessing forecasting performance, comparing information criteria, and applying the principle of parsimony.

9 Hypothesis Testing. The usual t- and F-tests are available for testing simple and joint hypotheses about the coefficients. As usual, failure to reject a null hypothesis can occur simply because the data are not rich enough to disprove it (e.g., low variation in xk, a small sample size, large noise, etc.). If a variable has an insignificant coefficient, it can either be (a) discarded as irrelevant, or (b) retained if there are theoretical reasons for keeping it in the model. The adequacy of a model can also be checked with a general specification test known as RESET (the Ramsey test).

10 Hypothesis Testing cont. The Regression Error Specification Test (RESET) is designed to detect omitted variables and incorrect functional form. Suppose we have estimated the regression model: y = β1 + β2x2 + ... + βKxK + e (1). To conduct the RESET test we estimate two artificial models: y = β1 + β2x2 + ... + βKxK + γ1ŷ² + e (2) and y = β1 + β2x2 + ... + βKxK + γ1ŷ² + γ2ŷ³ + e (3), where the ŷ values are predictions from model (1). We then test for mis-specification by testing H0: γ1 = 0 in model (2) (using a t-test), or H0: γ1 = γ2 = 0 in model (3) (using an F-test). Rejecting H0 implies the model is inadequate and can be improved.
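The RESET logic can be sketched in a few lines of numpy (an illustrative simulation, not the lecture's EViews output). The true relation is quadratic, the fitted model is linear, and augmenting with ŷ² produces a very large F statistic, flagging the mis-specification:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals via least squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(1, 5, size=n)
y = 1.0 + 0.5 * x**2 + rng.normal(scale=0.5, size=n)   # true relation is quadratic

# Model (1): deliberately mis-specified linear model
X1 = np.column_stack([np.ones(n), x])
b1, e1 = ols(X1, y)
yhat = X1 @ b1
sse_r = e1 @ e1

# Model (2): augment with yhat^2 and test H0: gamma1 = 0 with an F-test
X2 = np.column_stack([X1, yhat**2])
_, e2 = ols(X2, y)
sse_u = e2 @ e2
J, K = 1, X2.shape[1]
F = ((sse_r - sse_u) / J) / (sse_u / (n - K))
print(F)   # far above any conventional critical value: reject H0
```

Re-running the sketch with a correctly specified quadratic model as model (1) yields a small F, so H0 is not rejected.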

11 Example (family income) cont. Earlier we estimated the model: [estimates not transcribed]. Applying the RESET test to this equation: [output not transcribed]. The null hypothesis of no misspecification is rejected at the 5% level, perhaps because we omitted KL6, or perhaps we need to try quadratic and interaction terms, or other functional forms, etc. In any case, the test suggests we should try to improve the model.

12 Information Criteria. The adjusted R² can be used to choose between competing nonnested models that have the same dependent variable. Other usable criteria include the Akaike Information Criterion (AIC) and the Schwarz Criterion (SC, also called BIC). Both criteria measure the unexplained variation (SSE) and impose a penalty for including additional variables. The idea: from a set of adequate and parsimonious models, prefer the one that minimizes the unexplained variation with the fewest extra variables.
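A minimal sketch of the trade-off, assuming the SSE-based forms of the criteria used in several econometrics texts (AIC = ln(SSE/N) + 2K/N, SC = ln(SSE/N) + K ln(N)/N); the data here are simulated, with x3 an irrelevant regressor:

```python
import numpy as np

def info_criteria(sse, n, k):
    """AIC and Schwarz criterion in SSE form: smaller is better;
    the second term penalizes each of the k estimated coefficients."""
    aic = np.log(sse / n) + 2 * k / n
    sc = np.log(sse / n) + k * np.log(n) / n
    return aic, sc

rng = np.random.default_rng(2)
n = 150
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)                     # irrelevant regressor
y = 1.0 + 2.0 * x2 + rng.normal(size=n)

def sse_of(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

X_small = np.column_stack([np.ones(n), x2])
X_big = np.column_stack([np.ones(n), x2, x3])
print(info_criteria(sse_of(X_small), n, 2))
print(info_criteria(sse_of(X_big), n, 3))   # SSE falls slightly, penalty rises
```

Adding a regressor can never increase the SSE, which is exactly why the penalty term is needed: without it, the largest model would always "win".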


14 USING NON-SAMPLE INFORMATION. In many estimation problems, economic theory and experience provide information about the model (its parameters, etc.) beyond what is contained in the sample data. If this non-sample information is correct, and if we can combine it with the sample information, then we can estimate the parameters with greater precision. Some non-sample information can be written as linear equality restrictions on the unknown parameters; e.g., constant returns to scale in a Cobb-Douglas production function specified as a log-log model implies that the slopes sum to 1. Homogeneity of demand, supply, cost, and revenue functions implies restrictions on certain coefficients of those models. Sometimes it is possible to incorporate such non-sample information into the estimation process simply by substituting the restrictions into the model.

15 Economic theory tells us that demand functions are homogeneous of degree zero in prices and income: if all prices and income double, the quantity demanded is unchanged. That is, consumers do not suffer from money illusion. But how can we incorporate this knowledge into our model?

16 Example (beer demand) cont. If we use a log-log functional form: [model (1) not transcribed]. The property of homogeneity of degree zero in prices and income will be satisfied if the price and income elasticities sum to zero [restriction (2) not transcribed]. Substituting constraint (2) into model (1) and rearranging the terms yields the restricted model. For example, solving (2) for one coefficient, substituting into (1), and rearranging yields the following restricted model: [not transcribed].
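The substitution step can be sketched with simulated data (an illustration with hypothetical variable names, not the lecture's actual beer-demand data). With log quantity lq, two log prices lpb and lpo, a numeraire log price lpr, and log income li, the zero-sum restriction lets us express one elasticity in terms of the others and estimate a regression on relative (deflated) prices and real income:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
# Hypothetical log prices and log income (names are illustrative, not from the slides)
lpb, lpo, lpr, li = (rng.normal(size=n) for _ in range(4))

# True elasticities obey homogeneity: they sum to zero, so b4 = -(b2 + b3 + b5)
b2, b3, b5 = -1.2, 0.3, 0.9
b4 = -(b2 + b3 + b5)
lq = 2.0 + b2*lpb + b3*lpo + b4*lpr + b5*li + rng.normal(scale=0.1, size=n)

# Substitute the restriction into the model: the regressors become
# relative prices and income deflated by the numeraire price
X = np.column_stack([np.ones(n), lpb - lpr, lpo - lpr, li - lpr])
coef, *_ = np.linalg.lstsq(X, lq, rcond=None)
b2_r, b3_r, b5_r = coef[1], coef[2], coef[3]
b4_r = -(b2_r + b3_r + b5_r)        # recovered from the restriction
print(b2_r, b3_r, b4_r, b5_r)       # four elasticities summing to zero by construction
```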


18 The Restricted Least Squares Estimator. The least squares estimates obtained after imposing the restrictions are known as restricted least squares (RLS) estimates. The RLS estimator is: more efficient, i.e., it has a smaller variance than the unrestricted OLS estimator, whether or not the restrictions are true; unbiased and consistent only if the restrictions are exactly true; and biased and inconsistent if the restrictions are not true! So how can we test whether the restrictions are true? Use an F-test!

19 See tutorial for details

20 MULTICOLLINEARITY. When an exact linear relationship exists among the explanatory variables, perfect multicollinearity exists; in this case we cannot estimate the parameters using OLS. When a near-exact linear relationship exists among the explanatory variables: OLS standard errors may be too large, so t-tests on the coefficients may suggest they are not significant; an F-test on those same coefficients may nevertheless conclude that they are jointly significant; R² can also be very high; accurate forecasts may still be possible; and estimates may be sensitive to the addition or deletion of a few observations, or to the deletion of an apparently insignificant variable.
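The "insignificant t-tests, significant F-test" symptom can be reproduced with simulated data (an illustrative numpy sketch, not the fuel-consumption data used on the next slide). Two nearly collinear regressors both matter, yet their individual standard errors are hugely inflated:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x2 = rng.normal(size=n)
x3 = x2 + 0.05 * rng.normal(size=n)     # nearly an exact linear function of x2
y = 1.0 + 1.0*x2 + 1.0*x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (n - 3)
cov = s2 * np.linalg.inv(X.T @ X)       # OLS covariance matrix estimate
t2 = b[1] / np.sqrt(cov[1, 1])
t3 = b[2] / np.sqrt(cov[2, 2])

# Joint F-test of H0: beta2 = beta3 = 0 (restricted model: intercept only)
e_r = y - y.mean()
F = ((e_r @ e_r - e @ e) / 2) / s2
print(t2, t3, F)   # the individual t's tend to be small while F is very large
```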

21 Example (fuel consumption). Using a sample of size N = 392 on miles per gallon (MPG), number of cylinders (CYL), engine capacity (ENG) and vehicle weight (WGT): [estimates not transcribed]. Note: the F-test of the joint null hypothesis H0: β2 = β3 = 0 (i.e., the coefficients of CYL and ENG are both zero) has a p-value of [not transcribed]. So we reject H0, concluding that at least one of these two coefficients (β2, β3) is significantly different from zero! Yet the t-tests suggest that the coefficient of CYL is not significantly different from zero, and neither is the coefficient of ENG! Isn't this strange?

22 Example (fuel consumption) cont. After deleting the apparently insignificant variable ENG from the model and re-estimating the restricted model, we get: [estimates not transcribed]. Now the variable CYL becomes quite significant, while the other coefficients do not change by much! Results of this type are symptoms of multicollinearity.

23 Detecting Multicollinearity. We can detect multicollinearity by: computing sample correlation coefficients between the explanatory variables (a common rule of thumb is that multicollinearity is a problem if the sample correlation between any pair of variables is greater than 0.9, or even 0.8); or estimating auxiliary regressions, i.e., regressing each explanatory variable on all the other explanatory variables (multicollinearity is usually considered a problem if the R² from an auxiliary regression is greater than about [value not transcribed]).
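The auxiliary-regression diagnostic is often summarized by variance inflation factors, VIF_k = 1/(1 - R_k²), where R_k² comes from regressing regressor k on the others; an auxiliary R² of 0.9 corresponds to a VIF of 10. A small numpy sketch with simulated data (one collinear pair plus one unrelated regressor):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X:
    regress the column on all others (with intercept) and return 1/(1 - R^2)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        target = X[:, j]
        b, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ b
        r2 = 1 - (resid @ resid) / np.sum((target - target.mean())**2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(5)
n = 200
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)   # collinear with x2
x4 = rng.normal(size=n)              # unrelated regressor
v = vif(np.column_stack([x2, x3, x4]))
print(v)   # large for the collinear pair, near 1 for the unrelated regressor
```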

24 Example (fuel consumption) cont. Correlation matrix: [not transcribed].

25 Mitigating the Effects of Multicollinearity. In practice, the multicollinearity problem often arises because the data do not contain enough information about the effects of the individual explanatory variables. We can bring more information into the estimation process by: obtaining more, and better, data (not always possible in non-experimental contexts); introducing non-sample information in the form of restrictions on the parameters, if any; or combining highly correlated variables into one index, where possible and adequate.

26 NONLINEAR RELATIONSHIPS. Relationships between economic variables cannot always be adequately represented by straight lines. We saw in Lecture 3 that we can add flexibility to a regression model by using logarithmic, reciprocal, polynomial (quadratic, cubic, etc.) and other nonlinear-in-the-variables functional forms. In multiple regression models we can also use these and other functional forms, which allows estimating various curvatures (U-shapes, sinusoid shapes, etc.) and interaction terms (see Lecture 4 for details). Remember: with these models, some changes in interpretation are required, for slopes, elasticities, predictions, etc.
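One such interpretation change: in a quadratic model y = β1 + β2x + β3x² + e, the slope is no longer the constant β2 but dy/dx = β2 + 2β3x, so it varies with x. A short illustrative sketch with simulated data (an inverted-U relationship):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(0, 4, size=n)
y = 1.0 + 3.0*x - 0.5*x**2 + rng.normal(scale=0.3, size=n)   # inverted-U shape

# Fit the quadratic: still linear in the parameters, so OLS works as usual
X = np.column_stack([np.ones(n), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

def slope_at(x0):
    """Estimated marginal effect dy/dx = b2 + 2*b3*x evaluated at x0."""
    return b[1] + 2 * b[2] * x0

print(slope_at(0.5), slope_at(3.5))   # positive early, negative past the turning point
```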


28 Example (hamburgers) cont. Consider: [model not transcribed]. Note: a test of the significance of advertising in this model is now a test of the joint null hypothesis H0: β3 = β4 = 0. To test this H0 we can use an F-test (easy in EViews!). The EViews output suggests we should reject the hypothesis that advertising has no impact on sales (even at the 1% significance level).

29 Example (hamburgers) cont.: Search for Optimality. For this model, economic theory says the firm should increase advertising expenditure to the point where an extra $1 of expenditure yields an extra $1 of sales, i.e., where marginal cost (MC) equals marginal revenue (MR). Note that the marginal effect of a change in advertising expenditure on sales here is not just β3, as in the linear model, but β3 + 2β4A. So advertising expenditure should be increased to the level A0 at which β3 + 2β4A0 = 1, which can be estimated from b3 + 2b4Â0 = 1 = MC.

30 Example (hamburgers) cont. The estimated model: [estimates not transcribed]. The estimated optimal level of advertising is found by solving b3 + 2b4Â0 = 1, and the solution is Â0 = 2.014, implying that the estimated optimal level of advertising expenditure is about $2,014. Note the units of measurement! Remember, this is only a point estimate! A confidence interval estimate is also very important: every value inside that interval is a plausible estimate of the optimal level of advertising.
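The arithmetic of this step can be made explicit. The coefficient values below are hypothetical, chosen only so the calculation reproduces the slide's reported solution of about 2.014; the slide's actual estimates were not transcribed:

```python
# Hypothetical estimates for the advertising terms (illustrative values only,
# chosen to reproduce the slide's reported optimum of about 2.014)
b3, b4 = 12.151, -2.768

# Optimality condition: marginal sales of advertising equal 1,
# i.e. b3 + 2*b4*A0 = 1, so A0 = (1 - b3) / (2*b4)
A0 = (1 - b3) / (2 * b4)
print(round(A0, 3))   # 2.014: about $2,014 if A is measured in $1,000s
```

Note that b4 must be negative (diminishing returns to advertising) for this first-order condition to identify a maximum rather than a minimum.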

31 Example (hamburgers) cont. We can also test the H0 that a certain level of A, e.g., $1,900, is the optimal level of advertising for maximizing sales at a given price. For the model just above, such a test can be formalized as: [hypothesis not transcribed]. To test this H0 we can use a t-test (a bit complicated) or an F-test (easy in EViews!). The EViews output suggests we cannot reject that $1,900 is optimal (p-value = 0.33).