CHECKING INFLUENCE DIAGNOSTICS IN THE OCCUPATIONAL PRESTIGE DATA

PLS 802, Spring 2018
Professor Jacoby

This handout shows the log from a Stata session that examines the Duncan Occupational Prestige data for influential observations. Recall that, previously, the data seemed to support the meritocracy theory but not the materialistic theory. Does an analysis of the influence statistics lead to a different conclusion about the determinants of occupational prestige?

(FIRST FEW LINES OMITTED FOR SPACE)

. set more off

. #delimit ;
delimiter now ;

Read data from text file, "occprest.txt".

. infile str21 occup income educ prestige
>      using "occprest.txt";
(15 observations read)

Estimate multiple regression model.

. regress prestige income educ, beta;

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  2,    12) =   54.81
       Model |  13395.3858     2  6697.69289           Prob > F      =  0.0000
    Residual |  1466.34754    12  122.195629           R-squared     =  0.9013
-------------+------------------------------           Adj R-squared =  0.8849
       Total |  14861.7333    14  1061.55238           Root MSE      =  11.054

------------------------------------------------------------------------------
    prestige |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
      income |   .6504486   .4101684     1.59   0.139                 .2540748
        educ |   4.015703   .8826033     4.55   0.001                  .728965
       _cons |  -17.59156   6.816913    -2.58   0.024                        .
------------------------------------------------------------------------------

Use post-estimation commands to obtain added variable plots. First, get basic versions of plots, then enhance with graph options.

. avplot income,
>      name(avplot1, replace);

. graph export avplot1.pdf, replace;
(file avplot1.pdf written in PDF format)

. avplot educ,
>      name(avplot2, replace);

. graph export avplot2.pdf, replace;
(file avplot2.pdf written in PDF format)

. avplot income,
>      scheme(s1color)
>      msymbol(oh)
>      mcolor(black)
>      msize(*1.5)
>      xaxis (1 2)
>      yaxis (1 2)
>      ylabel(, axis(2) nolabel)
>      xlabel(, axis(2) nolabel)
>      ylabel(#4, axis(1) labsize(small))
>      xlabel(#3, axis(1) labsize(small))
>      ylabel(#4, axis(2) labsize(small))
>      xlabel(#3, axis(2) labsize(small))
>      xtitle("", axis(2))
>      ytitle("", axis(2))
>      mlabel(occup)
>      mlabsize(small)
>      aspectratio(1)
>      name(avplot3, replace);

. graph export avplot3.pdf, replace;
(file avplot3.pdf written in PDF format)

. avplot educ,
>      scheme(s1color)
>      msymbol(oh)
>      mcolor(black)
>      msize(*1.5)
>      xaxis (1 2)
>      yaxis (1 2)
>      ylabel(, axis(2) nolabel)
>      xlabel(, axis(2) nolabel)
>      ylabel(#4, axis(1) labsize(small))
>      xlabel(#3, axis(1) labsize(small))
>      ylabel(#4, axis(2) labsize(small))
>      xlabel(#3, axis(2) labsize(small))
>      xtitle("", axis(2))
>      ytitle("", axis(2))
>      mlabel(occup)
>      mlabsize(small)
>      aspectratio(1)
>      name(avplot2, replace);

. graph export avplot4.pdf, replace;
(file avplot4.pdf written in PDF format)
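To see what avplot is drawing, the added variable plot for income can be rebuilt by hand. This is only a sketch, not part of the original session: it assumes the data are still in memory, it is written for Stata's default line delimiter (issue "#delimit cr" first if the semicolon delimiter is active), and the variable names e_prestige and e_income are hypothetical.

```stata
* Hypothetical variable names: e_prestige, e_income (not in the handout).

* Step 1: remove the part of prestige explained by the other predictor (educ).
quietly regress prestige educ
predict double e_prestige, residuals

* Step 2: remove the part of income explained by educ.
quietly regress income educ
predict double e_income, residuals

* Step 3: regress one set of residuals on the other. The slope should
* reproduce the income coefficient from the full model.
regress e_prestige e_income

* Step 4: plot the residual pairs with a fitted line, labeling each point.
twoway (scatter e_prestige e_income, mlabel(occup)) ///
       (lfit e_prestige e_income)
```

The slope from Step 3 should match the income coefficient that avplot prints beneath the graph. The standard error will differ slightly, because the residual-on-residual regression does not account for the degree of freedom used by educ.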

Calculate and print influence statistics.

. predict hatvalue, leverage;

. predict studresid, rstudent;

. predict cookdist, cooksd;

. predict dffits, dfits;

. list occup hatvalue studresid cookdist dffits;

     +---------------------------------------------------------------------+
     |                 occup   hatvalue   studresid   cookdist      dffits |
     |---------------------------------------------------------------------|
  1. |            Accountant   .1673161    .0500744   .0001832    .0224463 |
  2. |                Author   .2522005   -.7752217   .0698841   -.4502009 |
  3. |             Professor   .2020227    .6790627   .0407439    .3416762 |
  4. |        Civil Engineer   .1772063    .3141363   .0076597    .1457848 |
  5. |             Physician   .2310396    .2917997   .0092315     .159947 |
     |---------------------------------------------------------------------|
  6. |          RR Conductor   .8124188   -1.787546   3.899598   -3.720082 |
  7. |         Store Manager   .0772897    .1948803   .0011528    .0564022 |
  8. |          Mail Carrier   .0729764   -1.969525   .0820919   -.5525955 |
  9. |             Carpenter   .1208088    1.257002   .0690343    .4659547 |
 10. |             Machinist   .0987655    2.945689   .1933039    .9751482 |
     |---------------------------------------------------------------------|
 11. | Gas Station Attendant    .139751   -1.228837    .078437   -.4952903 |
 12. |           Taxi Driver   .1741431   -.2251263   .0038683   -.1033777 |
 13. |                Barber   .1372752    .1564827   .0014137    .0624203 |
 14. |                  Cook   .1482712   -.1572121   .0015611   -.0655939 |
 15. |               Janitor   .1885149   -.3584203   .0107269   -.1727529 |
     +---------------------------------------------------------------------+

Re-run regression, omitting the influential observation.

. regress prestige income educ if
>      occup != "RR Conductor", beta;

      Source |       SS       df       MS              Number of obs =      14
-------------+------------------------------           F(  2,    11) =   66.00
       Model |  13636.0796     2  6818.03981           Prob > F      =  0.0000
    Residual |  1136.27752    11  103.297956           R-squared     =  0.9231
-------------+------------------------------           Adj R-squared =  0.9091
       Total |  14772.3571    13  1136.33516           Root MSE      =  10.164

------------------------------------------------------------------------------
    prestige |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
      income |   1.976385   .8321258     2.38   0.037                 .7168149
        educ |   1.395073   1.675654     0.83   0.423                 .2512678
       _cons |  -15.61611   6.364348    -2.45   0.032                        .
------------------------------------------------------------------------------
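The handout identifies the influential observation by inspecting the listing of influence statistics. One way to make that inspection more systematic is to flag cases that exceed conventional rule-of-thumb cutoffs. The sketch below is not part of the original session. It assumes the four variables created by the predict commands are in memory, it hard-codes n = 15 observations and p = 3 estimated coefficients (two predictors plus the constant), and it is written for Stata's default line delimiter. The cutoffs are common conventions, not fixed rules, and other texts use different values.

```stata
* Sample size and number of estimated coefficients in the full model.
local n = 15
local p = 3

* Leverage: flag hat values above twice the average hat value, 2p/n.
list occup hatvalue  if hatvalue > 2*`p'/`n' & !missing(hatvalue)

* Discrepancy: flag studentized residuals larger than 2 in absolute value.
list occup studresid if abs(studresid) > 2 & !missing(studresid)

* Influence: flag Cook's distance above 4/n.
list occup cookdist  if cookdist > 4/`n' & !missing(cookdist)

* Influence: flag |DFFITS| above 2*sqrt(p/n).
list occup dffits    if abs(dffits) > 2*sqrt(`p'/`n') & !missing(dffits)
```

With these cutoffs the leverage threshold is 0.4 and the Cook's distance threshold is about 0.267, and RR Conductor is the only occupation in the listing that exceeds either one.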

Create observation-specific dummy variable and include it in regression.

. generate rrcond = 0;

. replace rrcond = 1 if occup == "RR Conductor";
(1 real change made)

. regress prestige income educ rrcond;

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) =   44.29
       Model |  13725.4558     3  4575.15194           Prob > F      =  0.0000
    Residual |  1136.27752    11  103.297956           R-squared     =  0.9235
-------------+------------------------------           Adj R-squared =  0.9027
       Total |  14861.7333    14  1061.55238           Root MSE      =  10.164

------------------------------------------------------------------------------
    prestige |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   1.976385   .8321258     2.38   0.037     .1448888    3.807882
        educ |   1.395073   1.675654     0.83   0.423    -2.293017    5.083163
      rrcond |  -41.94772   23.46666    -1.79   0.101    -93.59748    9.702039
       _cons |  -15.61611   6.364348    -2.45   0.032    -29.62395   -1.608277
------------------------------------------------------------------------------

Create standardized variables and re-run regression.

. egen sprestige = std(prestige);

. egen sincome = std(income);

. egen seduc = std(educ);

. regress sprestige sincome seduc rrcond;

      Source |       SS       df       MS              Number of obs =      15
-------------+------------------------------           F(  3,    11) =   44.29
       Model |  12.9296077     3  4.30986924           Prob > F      =  0.0000
    Residual |  1.07039226    11  .097308388           R-squared     =  0.9235
-------------+------------------------------           Adj R-squared =  0.9027
       Total |          14    14           1           Root MSE      =  .31194

------------------------------------------------------------------------------
   sprestige |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     sincome |    .772005   .3250405     2.38   0.037     .0565957    1.487414
       seduc |   .2532456   .3041792     0.83   0.423    -.4162483    .9227396
      rrcond |  -1.287472   .7202454    -1.79   0.101    -2.872721    .2977778
       _cons |   .0858314   .0937699     0.92   0.380    -.1205547    .2922176
------------------------------------------------------------------------------

. log close;
       log:  l:\pls 802, spring 2018\influence\influence in stata\influence1.smcl
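The dummy-variable regression reproduces the income and educ coefficients from the regression that omits RR Conductor, and the coefficient on rrcond has a direct interpretation: it is the gap between RR Conductor's observed prestige and the value predicted for it by the model fitted to the other 14 occupations. Its t statistic (-1.79) also agrees with the studentized residual for RR Conductor in the earlier listing (-1.787546). The sketch below checks the first point. It is not part of the original session, it is written for Stata's default line delimiter, and the variable names yhat_del and e_del are hypothetical.

```stata
* Hypothetical variable names: yhat_del, e_del (not in the handout).

* Fit the model without the RR Conductor observation.
quietly regress prestige income educ if occup != "RR Conductor"

* predict with xb fills in fitted values for every observation in memory,
* including the omitted one, unless restricted with "if e(sample)".
predict double yhat_del, xb

* Deleted residual: observed prestige minus the out-of-sample prediction.
generate double e_del = prestige - yhat_del

* For RR Conductor, e_del should equal the coefficient on rrcond.
list occup prestige yhat_del e_del if occup == "RR Conductor"
```

Note that this replaces the estimation results in memory, so any post-estimation command that follows refers to the 14-observation model.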

Figure 1: Added variable plot for income. Graphical display created with Stata defaults.
[Plot not reproduced. Horizontal axis: e( income | X ). coef = .65044865, se = .41016836, t = 1.59]

Figure 2: Added variable plot for education. Graphical display created with Stata defaults.
[Plot not reproduced. Horizontal axis: e( educ | X ). coef = 4.0157029, se = .8826033, t = 4.55]

Figure 3: Added variable plot for income. Display is enhanced with graph options.
[Plot not reproduced. Each point is labeled with its occupation name. Horizontal axis: e( income | X ). coef = .65044865, se = .41016836, t = 1.59]

Figure 4: Added variable plot for education. Display is enhanced with graph options.
[Plot not reproduced. Each point is labeled with its occupation name. Horizontal axis: e( educ | X ). coef = 4.0157029, se = .8826033, t = 4.55]