Analysis of Cross-Sectional Data Exercise WS 2017/18


1 Analysis of Cross-Sectional Data Exercise WS 2017/18. Exercise VI: Multiple regression. November 22 / 23, 2017. czymara@wiso.uni-koeln.de

2 Where we are. So far: bi- & trivariate regression; partialling out / controlling third variables. Today: comparing variables (standardizing); comparing models ((adjusted) R²); how to write a research report.

3 Comparing effect sizes: standardization (z-transformation). How to standardize a variable: subtract the mean, then divide by the standard deviation. Result: a new variable with mean = 0 and s.d. = 1. Regression with standardized variables yields standardized coefficients. Interpretation (x & y standardized): increasing x by one standard deviation changes y by as many standard deviations as the standardized coefficient indicates.
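The two steps of the z-transformation can be sketched in a few lines of Python. This is only an illustration: the wage values are made up, and the sample standard deviation (n − 1 in the denominator) is assumed.

```python
from statistics import mean, stdev


def standardize(values):
    """z-transform: subtract the mean, then divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical hourly wages, invented for illustration only
wages = [8.0, 10.0, 12.0, 14.0, 16.0]
z_wages = standardize(wages)

# The standardized variable has mean 0 and standard deviation 1 (up to rounding)
print(round(mean(z_wages), 6), round(stdev(z_wages), 6))
```

In Stata, the same variable would typically be created with egen and its std() function.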

4 Standardize & de-standardize coefficients. Standardize coefficients: unstandardized coefficient of independent variable X multiplied by the ratio of the standard deviations of X to Y: $b^{\text{std}}_X = b_X \cdot \frac{s_X}{s_Y}$. De-standardize coefficients: standardized coefficient of X multiplied by the ratio of the standard deviations of Y to X: $b_X = b^{\text{std}}_X \cdot \frac{s_Y}{s_X}$. Since the standard deviations of variables depend on the particular sample, standardizing is sample specific.
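The conversion in both directions can be checked numerically. The sketch below uses a bivariate regression on made-up data (the names educ and wage and all values are hypothetical). It shows that multiplying the unstandardized slope by s_X / s_Y gives the same number as regressing z-scores on z-scores, and that multiplying back by s_Y / s_X recovers the original slope.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def zscores(values):
    """z-transform using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data, invented for illustration only
educ = [9, 10, 12, 12, 14, 16, 18]
wage = [7.5, 8.0, 9.0, 10.5, 11.0, 13.5, 15.0]

b = ols_slope(educ, wage)                               # unstandardized slope
b_std = b * stdev(educ) / stdev(wage)                   # standardize: b * s_X / s_Y
b_std_from_z = ols_slope(zscores(educ), zscores(wage))  # slope on z-scores
b_back = b_std * stdev(wage) / stdev(educ)              # de-standardize: b_std * s_Y / s_X

print(round(b, 4), round(b_std, 4), round(b_std_from_z, 4), round(b_back, 4))
```

Because stdev(educ) and stdev(wage) are computed from this particular sample, b_std would change with a different sample even if b stayed the same.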

5 Standardized coefficients in Stata I
Standardized coefficients can be obtained without manual z-transformation.
Beta option: reg y x, beta
Post-estimation command (i.e., after running reg y x): regress, beta
Both display standardized coefficients.

6 Standardized coefficients in Stata II

7 Comparing effect sizes
Situation | How to compare | Comment
1 model, variables measured on same scale | Unstandardized coefficients | Easy
1 model, variables measured on different scales | Standardized coefficients |
2 or more models, same sample, variables w/ different scales | Standardized coefficients |
2 or more models, different samples, variables w/ same scale | Unstandardized coefficients | Because standardizing is sample specific
2 or more models, different samples, variables w/ different scales | No elegant solution | What's the point?

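8 Comparing models: Adjusted R². Adjusted R² penalizes the inclusion of more independent variables (because "more" isn't always "better"): $R^2_{\text{adj}} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - k - 1}$, with n observations and k independent variables. But: it can't be interpreted as the amount of explained variance.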
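The penalty can be illustrated with the standard textbook formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The function name adjusted_r2 is hypothetical. The inputs are rough values taken from the research report later in this exercise (n = 741, about 13% explained variance, 5 explanatory variables), not exact model output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k independent variables (constant not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Approximate values only: R² of "about 13%" with n = 741 and k = 5
adj = adjusted_r2(0.13, n=741, k=5)
print(round(adj, 3))

# Same R², more variables: the adjusted value drops
print(adjusted_r2(0.13, n=741, k=5) > adjusted_r2(0.13, n=741, k=20))
```

With a large n relative to k, as here, the adjustment is small; it matters more in small samples or with many explanatory variables.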

9 Comparing model fits
Situation | How to compare
same outcome variable, same number of explanatory variables | R² (explained variance)
same outcome variable, different number of explanatory variables | Adjusted R² (or, cautiously, R²)
different outcome variables | Not directly comparable with either measure

10 Exercise VI: Multiple regression (p. 79 ff.) Writing a (short) research report

11 Exercise VI, 1 b): introduction. Income differences are of interest to all social science disciplines. The social reproduction of inequality is a major theme of theoretical sociology. In this analysis, we ask whether the actual wage of a respondent depends on his social background. To this end, we analyze the effect of the father's educational level on the actual hourly wage of the respondents. We use a sample of 935 male employees, which was collected in the US in [...]. Due to some missing values, the number of observations in our final model is 741.

12 Exercise VI, 2 e): research design. We will use OLS regression to analyze whether or not the education of the respondent's father has an influence on the actual wage. We already know that children of more highly educated fathers tend to have higher educational achievements and, hence, higher wages. Now we are going to test whether there is an effect of the father's education on actual earnings even if we control for the characteristics of the person himself (e.g., the person's education). Therefore, we start by building a model containing the human capital factors of an employee as explanatory variables. Our final model contains education in years, labor force experience in years, the years worked for the current employer, and the IQ of the respondents. The first three factors are acquired human capital factors. The IQ, in contrast, is a measurement of natural abilities. After estimating a model with these control variables, we will add an indicator of the respondent's social background as an additional explanatory variable.

13 Exercise VI, 2 e): hypothesis. We expect the education of an employee's father to have additional explanatory power if we control for the characteristics of the employee. In statistical terms, our hypothesis with respect to $\beta_5$, the effect of father's education, is $H_0: \beta_{\text{father's education}} \le 0$ versus $H_A: \beta_{\text{father's education}} > 0$.
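As a sketch of how this one-sided hypothesis is usually tested in OLS: the t statistic is compared with the upper critical value of a t distribution. The degrees of freedom below assume the final model described in this report, with n = 741 observations and k = 5 explanatory variables; the significance level α is left open.

```latex
\begin{align*}
  H_0 &: \beta_5 \le 0 \qquad H_A : \beta_5 > 0 \\
  t &= \frac{\hat{\beta}_5}{\widehat{\mathrm{se}}(\hat{\beta}_5)},
      \qquad \text{reject } H_0 \text{ if } t > t_{1-\alpha,\; n-k-1} \\
  n - k - 1 &= 741 - 5 - 1 = 735
\end{align*}
```

Note that regression output normally reports two-sided p-values; for a one-sided test with the estimate in the hypothesized direction, the two-sided p-value is halved.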

14 Exercise VI, 3 d): results I. Models M1 and M2 estimate the effects of human capital factors of the person himself. Model M3 adds the father's education (measured in years of education). It explains about 13% of the variance in hourly wages. All model estimates are shown in Table I.

15 Exercise VI, 3 d): results II
Table I: Models M1-M3, OLS Regression
Variable M1 M2 M3
Education 0.349*** 0.253*** 0.217***
Experience 0.095*** 0.095*** 0.099***
Tenure 0.037* *
IQ 0.027*** 0.024***
Father's educ **
Constant ** **
n
R²
Adj. R²
legend: * p<0.05; ** p<0.01; *** p<0.001; Source: wage2.dta
If you're interested in exporting nice regression tables, check out the estout ado!

16 Exercise VI, 3 d): results III. The control variables education, experience and tenure show the expected positive sign. The results support the classical human capital theory. However, a comparison between models M1 and M2 shows that the employee's natural abilities (IQ) must be taken into account; otherwise, we would overestimate the effects of education and experience. The educational level of the employee has the largest effect on actual income. Specifically, compared with the effect of the respondent's own education, the effect of his father's education is relatively weak (see Model M3). Nevertheless, the effect is significant at the 1% level, and the explained variance increases by about 1 percentage point. The adjusted R² increases as well. Hence, the social background of an employee (measured in years of father's education) has an effect on the actual hourly wage.

17 Exercise VI, 3 d): results IV. Figure I plots the predicted hourly wages for two employees: one has a father with minimal education (0 years), the other a father with maximal education (18 years). The difference in hourly wages between two employees who share all other characteristics (mean values for experience, tenure and IQ) but whose fathers differ in education could be up to $1.47.
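In a linear model without interaction terms, the gap between two predictions that differ in only one variable equals that variable's coefficient times the difference in its values. The sketch below assumes this holds for Model M3 and backs out the per-year coefficient implied by the reported maximum gap of $1.47. That implied value is an inference from the text, not a figure from the table, and predicted_gap is a hypothetical helper name.

```python
def predicted_gap(coef, x_low, x_high):
    """Difference between two linear predictions that differ only in one variable."""
    return coef * (x_high - x_low)


# Reported maximum gap between fathers with 0 and 18 years of education
max_gap = 1.47
implied_coef = max_gap / (18 - 0)  # implied effect per year of father's education
print(round(implied_coef, 3))  # 0.082

# Under the same assumption: gap between fathers with 12 and 16 years of education
print(round(predicted_gap(implied_coef, 12, 16), 2))
```

The values held constant (experience, tenure, IQ at their means) shift both predictions by the same amount, so they cancel out of the difference.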

18 Multiple choice questions


22 Recall: comparing effect sizes
Situation | How to compare
1 model, variables measured on same scale | Unstandardized coefficients
1 model, variables measured on different scales | Standardized coefficients
2 or more models, same sample, variables w/ different scales | Standardized coefficients
2 or more models, different samples, variables w/ same scale | Unstandardized coefficients
2 or more models, different samples, variables w/ different scales | No elegant solution

23 De-standardize coefficients. Formula: $b_X = b^{\text{std}}_X \cdot \frac{s_Y}{s_X}$. Germany: 0.25 * (960/4) = 60. France: 0.48 * (625/5) = 60.
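The arithmetic above can be replayed in a short sketch. It assumes that 960 and 625 are the standard deviations of Y and that 4 and 5 are the standard deviations of X in the two samples, following the "ratio of Y to X" rule; the function name destandardize is hypothetical.

```python
def destandardize(b_std, sd_y, sd_x):
    """Unstandardized coefficient from a standardized one: b_std * s_Y / s_X."""
    return b_std * sd_y / sd_x


germany = destandardize(0.25, sd_y=960, sd_x=4)
france = destandardize(0.48, sd_y=625, sd_x=5)

# Different standardized coefficients, (practically) identical unstandardized effect
print(round(germany, 6), round(france, 6))
```

The standardized coefficients (0.25 vs. 0.48) suggest a much stronger effect in France, yet the unstandardized effect is the same in both samples. The difference comes from the sample-specific standard deviations, which is why standardized coefficients should not be compared across samples.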
