Unit 5 Logistic Regression Homework #7 Practice Problems. SOLUTIONS Stata version

Before You Begin

Download the Stata data set illeetvilaine.dta from the course website page, ASSIGNMENTS (Homeworks and Exams).

Consider the following three variables.

Variable   Codings           Label in Stata
case       1 = yes           Case status
           0 = no
agegp      1 = 25-34         Age, grouped
           2 = 35-44
           3 = 45-54
           4 = 55-64
           5 = 65-74
           6 = 75+
tobgp      1 = 0-9 g/day     Tobacco use, grouped
           2 = 10-19
           3 = 20-29
           4 = 30+

sol_logistic part 2 of 2.docx

1. Understanding the Global Test of the Fitted Model

In this exercise, you will see that the global chi-square test of the current model is equivalent to a likelihood ratio (LR) test that compares the reduced model, containing the intercept only, with the full model, which is the current model. In developing your answer, include as predictors in your full model indicator variables for agegp.

Tip: To fit the intercept-only model explicitly, generate a variable called one (you can name it whatever you like) that is equal to 1 for all subjects; then issue the logit or logistic command with the option noconstant.

SOLUTION

. use "/Users/cbigelow/Desktop/1. Teaching/web640/datasets/illeetvilaine.dta"

. * Fit model of interest. Highlight in yellow the global chi square test
. xi: logistic case i.agegp
i.agegp           _Iagegp_1-6         (naturally coded; _Iagegp_1 omitted)

LR chi2(5)      =            <- This is the global test,
Prob > chi2     =               analogous to the overall F test
Log likelihood  =            Pseudo R2 =

        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
   _Iagegp_2 |
   _Iagegp_3 |
   _Iagegp_4 |
   _Iagegp_5 |
   _Iagegp_6 |
       _cons |

. * LR test to compare reduced = intercept only versus full = model w agegp
. * REDUCED model
. generate one=1
. logistic case one, noconstant

Wald chi2(1)    =
Log likelihood  =            Prob > chi2 =

        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
         one |

. * Save results for use in LR test
. * estimates store NAMEYOUPICK
. estimates store m1_reduced
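As a cross-check on the two fits above, the global LR chi-square can also be rebuilt by hand from stored log likelihoods. The following is a minimal sketch, not part of the original solution. It assumes illeetvilaine.dta sits in the current working directory, and the scalar names (ll_full, ll_null, ll_reduced, lr_global, df_global) are illustrative choices. It relies on the results that logistic leaves in e(): e(ll), e(ll_0), e(chi2) and e(df_m).

```stata
* Assumption: illeetvilaine.dta is in the current working directory
use illeetvilaine.dta, clear

* Model of interest: age group entered as indicator variables
xi: logistic case i.agegp
* Log likelihood of the fitted model
scalar ll_full   = e(ll)
* Log likelihood of the constant-only model, as stored by Stata
scalar ll_null   = e(ll_0)
* The reported global LR chi2 and its degrees of freedom
scalar lr_global = e(chi2)
scalar df_global = e(df_m)

* Explicit intercept-only model, using a column of ones
generate one = 1
logistic case one, noconstant
scalar ll_reduced = e(ll)

* The two intercept-only log likelihoods should agree
display "Constant-only ll (stored) = " ll_null
display "Constant-only ll (refit)  = " ll_reduced

* LR statistic by hand: 2 * (ll of full model - ll of reduced model)
display "LR chi2 by hand  = " 2*(ll_full - ll_reduced)
display "LR chi2 reported = " lr_global
display "p-value          = " chi2tail(df_global, 2*(ll_full - ll_reduced))
```

If the two LR values agree, the global test in the header of the fitted model is the same comparison of intercept-only versus fitted model that lrtest performs.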

. * FULL model
. xi: logistic case i.agegp
i.agegp           _Iagegp_1-6         (naturally coded; _Iagegp_1 omitted)

LR chi2(5)      =
Prob > chi2     =
Log likelihood  =            Pseudo R2 =

        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
   _Iagegp_2 |
   _Iagegp_3 |
   _Iagegp_4 |
   _Iagegp_5 |
   _Iagegp_6 |
       _cons |

. * Save results for use in LR test
. * estimates store NAMEYOUPICK
. estimates store m2_full

. * LR test
. * lrtest REDUCEDNAME FULLNAME
. lrtest m1_reduced m2_full

Likelihood-ratio test                          LR chi2(5)  =       <- Match!
(Assumption: m1_reduced nested in m2_full)     Prob > chi2 =

2. Illustration of Hosmer-Lemeshow Goodness of Fit Test (NULL: model is a good fit)

See also unit 5 notes, pp

For this illustration, fit the model containing design variables for age group and tobacco use. Then perform the Hosmer-Lemeshow goodness of fit test.

SOLUTION

. xi: logit case i.agegp i.tobgp
i.agegp           _Iagegp_1-6         (naturally coded; _Iagegp_1 omitted)
i.tobgp           _Itobgp_1-4         (naturally coded; _Itobgp_1 omitted)

Iteration 0:   log likelihood =
Iteration 1:   log likelihood =
Iteration 2:   log likelihood =
Iteration 3:   log likelihood =
Iteration 4:   log likelihood =
Iteration 5:   log likelihood =
Iteration 6:   log likelihood =

LR chi2(8)      =            <- Better than intercept only
Prob > chi2     =
Log likelihood  =            Pseudo R2 =

        case |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
   _Iagegp_2 |
   _Iagegp_3 |
   _Iagegp_4 |
   _Iagegp_5 |
   _Iagegp_6 |
   _Itobgp_2 |
   _Itobgp_3 |
   _Itobgp_4 |
       _cons |

. estat gof, group(8) table

Logistic model for case, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)

Group    Prob    Obs_1    Exp_1    Obs_0    Exp_0    Total

 number of observations =   975
       number of groups =     8
Hosmer-Lemeshow chi2(6) =  2.30
            Prob > chi2 =

Null is retained; no evidence of departure from good fit.
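The degrees of freedom and tail probability of the Hosmer-Lemeshow statistic can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original solution. It uses only the values reported above (8 groups, chi2 = 2.30) and the usual rule that the statistic is referred to a chi-square distribution with (number of groups - 2) degrees of freedom. The scalar names are illustrative.

```stata
* Values reported by estat gof above
scalar hl_groups = 8
scalar hl_chi2   = 2.30

* Degrees of freedom: number of groups minus 2
scalar hl_df = hl_groups - 2

* Upper-tail chi-square probability for the reported statistic
scalar hl_p = chi2tail(hl_df, hl_chi2)

display "Hosmer-Lemeshow df      = " hl_df
display "Hosmer-Lemeshow p-value = " %6.4f hl_p
```

The resulting p-value is about 0.89, far above conventional significance levels, which is consistent with retaining the null hypothesis of good fit.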

3. Illustration of Link Test (NULL: current model is adequate; no additional predictors needed)

See also unit 5 notes, pp

For the current model, perform the link test for omitted variables.

SOLUTION

. * Preliminary: You must have first fit a model
. * Link test for omitted variables
. linktest

Iteration 0:   log likelihood =
Iteration 1:   log likelihood =
Iteration 2:   log likelihood =
Iteration 3:   log likelihood =
Iteration 4:   log likelihood =
Iteration 5:   log likelihood =

LR chi2(2)      =
Prob > chi2     =
Log likelihood  =            Pseudo R2 =

        case |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        _hat |
      _hatsq |
       _cons |

The null hypothesis is retained (p=.657 for _hatsq). We have no evidence of a need for additional predictors.

4. Illustration of Classification Table

See also unit 5 notes, pp

For the current model, obtain the classification table.

SOLUTION

. * Preliminary: Again, you must have fit a model first
. * Classification table
. estat classification

Logistic model for case

              -------- True --------
Classified |       D          ~D     |    Total
     +     |
     -     |
   Total   |

Classified + if predicted Pr(D) >= .5
True D defined as case != 0

Sensitivity                     Pr( +| D)   10.00%
Specificity                     Pr( -|~D)   98.71%
Positive predictive value       Pr( D| +)   66.67%
Negative predictive value       Pr(~D| -)   80.95%

False + rate for true ~D        Pr( +|~D)    1.29%
False - rate for true D         Pr( -| D)   90.00%
False + rate for classified +   Pr(~D| +)   33.33%
False - rate for classified -   Pr( D| -)   19.05%

Correctly classified                        80.51%

5. Illustration of ROC Curve

See also unit 5 notes, pp

For the current model, produce the ROC curve.

. * Produce ROC curve
. lroc
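The classification statistics in section 4 describe a single point on this ROC curve: the one for the cutoff Pr(D) >= .5. The sketch below reproduces those percentages from a 2x2 table. The cell counts are an assumption. They are not printed in this transcription and were back-solved from the reported percentages and the total of 975 observations, so treat them as a reconstruction. The scalar names are illustrative.

```stata
* Assumed 2x2 counts (reconstructed, not taken from the printed output)
* classified + and true D
scalar true_pos  = 20
* classified + and true ~D
scalar false_pos = 10
* classified - and true D
scalar false_neg = 180
* classified - and true ~D
scalar true_neg  = 765

scalar n_total = true_pos + false_pos + false_neg + true_neg

* Percentages as defined in the estat classification output
scalar sens    = 100*true_pos/(true_pos + false_neg)
scalar spec    = 100*true_neg/(true_neg + false_pos)
scalar ppv     = 100*true_pos/(true_pos + false_pos)
scalar npv     = 100*true_neg/(true_neg + false_neg)
scalar correct = 100*(true_pos + true_neg)/n_total

display "Sensitivity               = " %6.2f sens "%"
display "Specificity               = " %6.2f spec "%"
display "Positive predictive value = " %6.2f ppv "%"
display "Negative predictive value = " %6.2f npv "%"
display "Correctly classified      = " %6.2f correct "%"

* ROC coordinates at this cutoff: x = 1 - specificity, y = sensitivity
display "ROC point at cutoff .5: (" %6.4f (1 - spec/100) ", " %6.4f sens/100 ")"
```

With sensitivity of only 10% at this cutoff, the point lies near the lower left corner of the ROC plot. Other cutoffs can be examined with estat classification, cutoff(#), and after lroc the area under the curve is available in r(area).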