Stata version 12. Lab Session 2 April 2013



Stata version 12
Lab Session 2
April 2013

1. Probability Calculations (p-values and such)
   (a) Binomial
   (b) Chi Square
   (c) F
   (d) Hypergeometric (Central)
   (e) Normal
   (f) Poisson
   (g) Student t
2. Categorical: Single 2x2 Table
   (a) Cohort Study Design
   (b) Case-Control Study Design
3. Categorical: K 2x2 Tables
   (a) Cohort Study Design
   (b) Case-Control Study Design
   (c) Graph of OR (95% CI) Over Stratification Variable
4. Categorical: Test of Trend
   (a) 2xC Table
   (b) RxC Table
5. Logistic Regression
   (a) Estimation and Hypothesis Tests
   (b) Graphical Assessments

(mac)\teaching\stata\stata version 12\stata v 12 lab session 2.doc 4/17/2013 Page 1 of 25

1. Probability Calculations (p-values and such)

Preliminary: Download the module probcalc
This user-created module is used for the following distributions: binomial, Poisson, and normal. Type the following in the command window.

. ssc install probcalc

(a) Binomial Distribution

Binomial(n, pi): Probability of exactly k events, Pr[X = k]
probcalc b ntrials pi exactly k

. * Binomial(n=20, pi=.03) Prob[X = 2]
. probcalc b 20 .03 exactly 2
Distribution: Binomial  n=20  p=.03  option:exactly  x=2
P(X=2)=

Binomial(n, pi): Probability of at most k events, Pr[X <= k]
probcalc b ntrials pi atmost k

. * Binomial(n=20, pi=.03) Prob[X <= 2]
. probcalc b 20 .03 atmost 2
Distribution: Binomial  n=20  p=.03  option:atmost  x=2
P(X=0)=
P(X=1)=
P(X=2)=
pmf Method 1: P(X<=2)=
cdf Method 2: P(X<=2)=

Binomial(n, pi): Probability of less than k events, Pr[X < k]
probcalc b ntrials pi atmost k-1

. * Binomial(n=20, pi=.03) Prob[X < 2]
. probcalc b 20 .03 atmost 1
Distribution: Binomial  n=20  p=.03  option:atmost  x=1
P(X=0)=
P(X=1)=
pmf Method 1: P(X<=1)=
cdf Method 2: P(X<=1)=

Binomial(n, pi): Probability of at least k events, Pr[X >= k]
probcalc b ntrials pi atleast k

. * Binomial(n=20, pi=.03) Prob[X >= 2]
. probcalc b 20 .03 atleast 2
Distribution: Binomial  n=20  p=.03  option:atleast  x=2
P(X=2)=
-- output omitted --
P(X=20)=3.487e-31
pmf Method 1: P(X>=2)=
cdf Method 2: P(X>=2)=
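The probcalc binomial results can be cross-checked with Stata's built-in functions binomialp(), binomial() and binomialtail(). This is a minimal sketch, not part of the original handout; it reuses the handout's example values n=20 and pi=.03. Note the argument order (n, k, p).

```stata
* Built-in binomial functions: arguments are (n, k, p)
display binomialp(20, 2, .03)      // Pr[X  = 2]
display binomial(20, 2, .03)       // Pr[X <= 2]
display binomial(20, 1, .03)       // Pr[X  < 2] = Pr[X <= 1]
display binomialtail(20, 2, .03)   // Pr[X >= 2]
display binomialtail(20, 3, .03)   // Pr[X  > 2] = Pr[X >= 3]
```

Each line should agree with the corresponding "cdf Method" (or single pmf) value that probcalc prints.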

Binomial(n, pi): Probability of more than k events, Pr[X > k]
probcalc b ntrials pi atleast k+1

. * Binomial(n=20, pi=.03) Prob[X > 2]
. probcalc b 20 .03 atleast 3
Distribution: Binomial  n=20  p=.03  option:atleast  x=3
P(X=3)=
-- output omitted --
P(X=20)=3.487e-31
pmf Method 1: P(X>=3)=
cdf Method 2: P(X>=3)=

(b) Chi Square Distribution

Chi Square (degrees of freedom = df): Probability[Y <= y], which is the same as Probability[Y < y] because the distribution is continuous
display chi2(df,y)

. * Pr[Chi square df=2 <= 1.5]
. display chi2(2,1.5)

Chi Square (degrees of freedom = df): Probability[Y >= y], which is the same as Probability[Y > y]
display chi2tail(df,y)

. * Pr[Chi square df=2 >= 1.5]
. display chi2tail(2,1.5)

Chi Square (degrees of freedom = df): Solution for pth quantile
display invchi2(df,p)

. * Chi Square df=2: Solution for 97.5th percentile
. display invchi2(2,.975)

(c) F Distribution

F (degrees of freedom = df1 and df2): Probability[Y <= y], which is the same as Probability[Y < y]
display F(df1,df2,y)

. * Pr[F(df=2,6) < 2.3]
. display F(2,6,2.3)

F (degrees of freedom = df1 and df2): Probability[Y >= y], which is the same as Probability[Y > y]
display Ftail(df1,df2,y)

. * Pr[F(df=2,6) > 2.3]
. display Ftail(2,6,2.3)

F (degrees of freedom = df1 and df2): Solution for pth quantile
Note that Stata function names are case-sensitive: the function is invFtail, with a capital F.
display invFtail(df1,df2,1-p)

. * F with df=2,6: Solution for 95th percentile
. display invFtail(2,6,.05)

. * F with df=2,6: Solution for 5th percentile
. display invFtail(2,6,.95)
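Because the quantile functions come in lower-tail and upper-tail versions, it is easy to pass p where 1-p is needed. The sketch below is an illustration, not taken from the handout: it pairs each lower-tail function with its upper-tail counterpart so the two forms can be checked against each other. All four are built-in Stata functions.

```stata
* 95th percentile of Chi Square(df=2), two equivalent ways
display invchi2(2, .95)        // lower-tail area p = .95
display invchi2tail(2, .05)    // upper-tail area 1-p = .05

* 95th percentile of F(df=2,6), two equivalent ways
display invF(2, 6, .95)        // lower-tail area p = .95
display invFtail(2, 6, .05)    // upper-tail area 1-p = .05
```

Within each pair the two lines should print the same number.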

(d) Hypergeometric Distribution (Central)

                   Disease
Exposure        Yes     No
Yes             a       b       n = total # exposed
No              c       d
                K = total # with disease        N = grand total

Hypergeometric (N total, K disease, n exposed): Probability[exactly a with exposure AND disease]
display hypergeometricp(N,K,n,a)

. * Pr[Hypergeometric N=259, K=4, n=23, a=2]
. display hypergeometricp(259,4,23,2)

Hypergeometric (N total, K disease, n exposed): Probability[a or fewer with exposure AND disease]
display hypergeometric(N,K,n,a)

. * Pr[Hypergeometric N=259, K=4, n=23, a<=2]
. display hypergeometric(259,4,23,2)

Hypergeometric (N total, K disease, n exposed): Probability[a or more with exposure AND disease]
Tips: (1) Use this for p-values; and (2) note that a needs to be reduced by 1.
display 1 - hypergeometric(N,K,n,a-1)

. * Pr[Hypergeometric N=259, K=4, n=23, a>=2]
. display 1-hypergeometric(259,4,23,1)
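To see why a is reduced by 1 in the upper-tail calculation, the tail can also be summed term by term. This is a small illustrative check using the handout's example values; with K=4 diseased, a can be at most 4, so the upper tail from a=2 has three terms.

```stata
* Pr[a >= 2] as an explicit sum of point probabilities (a = 2, 3, 4)
display hypergeometricp(259,4,23,2) + hypergeometricp(259,4,23,3) + hypergeometricp(259,4,23,4)

* The same probability as 1 - Pr[a <= 1]
display 1 - hypergeometric(259,4,23,1)
```

The two displayed values should match, apart from rounding in the last digits.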

(e) Normal Distribution (mean=mu, standard deviation=sigma)

Normal(mu, sigma), between: Probability[a <= X <= b], which is the same as Probability[a < X < b]
probcalc n mu sigma between a b

. * Pr[Normal(mu=100, sigma=15) is between 85 and 115]
. probcalc n 100 15 between 85 115
Distribution: Normal  mean:100  s.d.:15  option:between
cdf Method: P(85<=X<115)=

Normal(mu, sigma), at most: Probability[X <= b], which is the same as Probability[X < b]
probcalc n mu sigma atmost b

. * Pr[Normal(mu=100, sigma=15) is at most 115]
. probcalc n 100 15 atmost 115
Distribution: Normal  mean:100  s.d.:15  option:atmost  x=115
cdf Method: P(X<=115)=

Normal(mu, sigma), at least: Probability[X >= a], which is the same as Probability[X > a]
probcalc n mu sigma atleast a

. * Pr[Normal(mu=100, sigma=15) is at least 85]
. probcalc n 100 15 atleast 85
Distribution: Normal  mean:100  s.d.:15  option:atleast  x=85
cdf Method: P(X>=85)=

Normal(mu, sigma): Solution for pth quantile
display mu+sigma*invnormal(p)

. * Normal(mu=100, sigma=15): Solution for 95th percentile
. display 100+15*invnormal(.95)
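The same normal probabilities can be obtained without probcalc by standardizing to z = (x - mu)/sigma and calling the built-in function normal(), the standard normal cumulative distribution. A minimal sketch with the handout's values mu=100 and sigma=15:

```stata
* Pr[85 < X < 115]: one SD either side of the mean, roughly 0.68
display normal((115-100)/15) - normal((85-100)/15)

* Pr[X <= 115]
display normal((115-100)/15)

* Pr[X >= 85]
display 1 - normal((85-100)/15)
```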

(f) Poisson Distribution

Poisson(mu): Probability of exactly k events, Pr[X = k]
probcalc p mu exactly k

. * Pr[Poisson(mu=1.8) = 6]
. probcalc p 1.8 exactly 6
Distribution: Poisson  mu=1.8  option:exactly  x=6
P(X=6)=

Poisson(mu): Probability of at most k events, Pr[X <= k]
probcalc p mu atmost k

. * Pr[Poisson(mu=1.8) <= 6]
. probcalc p 1.8 atmost 6
Distribution: Poisson  mu=1.8  option:atmost  x=6
P(X=0)=
-- output omitted --
P(X=6)=
pmf Method 1: P(X<=6)=
cdf Method 2: P(X<=6)=

Poisson(mu): Probability of less than k events, Pr[X < k]
probcalc p mu atmost k-1

. * Pr[Poisson(mu=1.8) < 6]
. probcalc p 1.8 atmost 5
Distribution: Poisson  mu=1.8  option:atmost  x=5
P(X=0)=
-- output omitted --
P(X=5)=
pmf Method 1: P(X<=5)=
cdf Method 2: P(X<=5)=

Poisson(mu): Probability of at least k events, Pr[X >= k]
probcalc p mu atleast k

. * Pr[Poisson(mu=1.8) >= 6]
. probcalc p 1.8 atleast 6
Distribution: Poisson  mu=1.8  option:atleast  x=6
P(X=0)=
-- output omitted --
pmf Method 1: P(X>=6)=
cdf Method 2: P(X>=6)=

Poisson(mu): Probability of more than k events, Pr[X > k]
probcalc p mu atleast k+1

. * Pr[Poisson(mu=1.8) > 6]
. probcalc p 1.8 atleast 7
Distribution: Poisson  mu=1.8  option:atleast  x=7
Note: For Poisson "at least" questions, the sum of the lower tail pmf's is subtracted from one. So only variates less than x are reported below.
P(X=0)=
-- output omitted --
P(X=6)=
pmf Method 1: P(X>=7)=
cdf Method 2: P(X>=7)=
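As with the binomial, the Poisson results from probcalc can be cross-checked with built-in functions. The sketch below is an addition for illustration, using the handout's mu=1.8; poissonp(), poisson() and poissontail() take their arguments as (mu, k).

```stata
* Built-in Poisson functions: arguments are (mu, k)
display poissonp(1.8, 6)      // Pr[X  = 6]
display poisson(1.8, 6)       // Pr[X <= 6]
display poisson(1.8, 5)       // Pr[X  < 6] = Pr[X <= 5]
display poissontail(1.8, 6)   // Pr[X >= 6]
display poissontail(1.8, 7)   // Pr[X  > 6] = Pr[X >= 7]
```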

(g) Student-t Distribution

Student-t (degrees of freedom = df): Probability[T <= t], which is the same as Probability[T < t]
display 1 - ttail(df,t)

. * Pr[Student-t(df=12) < 2.1]
. display 1-ttail(12,2.1)

Student-t (degrees of freedom = df): Probability[T >= t], which is the same as Probability[T > t]
display ttail(df,t)

. * Pr[Student-t(df=12) > 2.1]
. display ttail(12,2.1)

Student-t (degrees of freedom = df): Solution for pth quantile
display invttail(df,1-p)

. * Student-t(df=12): Solution for 97.5th percentile
. display invttail(12,.025)
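Since this section is about p-values, it may help to see how the one-sided tail area turns into a two-sided p-value. A short sketch, reusing the handout's example of t=2.1 on 12 degrees of freedom; the doubling assumes the usual symmetric two-sided test.

```stata
* One-sided p-value: Pr[T > 2.1] with df=12
display ttail(12, 2.1)

* Two-sided p-value: Pr[|T| > 2.1]; abs() makes this work for negative t too
display 2*ttail(12, abs(2.1))

* Two-sided 5% critical value, for comparison with the observed t
display invttail(12, .025)
```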

2. Categorical: Single 2x2 Table

Preliminary
We will use the following data for both (a) Cohort Study Design and (b) Case-Control Study Design.

                      Disease (Lung Cancer)
Exposure (Smoking)    Yes     No
Yes                     9     31
No                      2     47

Create a data set called single2x2.dta and save it.

. generate smoking=.
. generate lungca=.
. generate tally=.

-- Click on the data editor to enter the data. Close the window. Data will not be lost! --

. label define smokingf 1 "Smoker" 0 "Non-smoker"
. label values smoking smokingf
. label define lungcaf 1 "Cancer" 0 "Healthy"
. label values lungca lungcaf

. * Use command expand to create data set with individual records
. expand tally
(85 observations created)

. * Check data before saving
. tab2 smoking lungca

-> tabulation of smoking by lungca

                    lungca
smoking       Healthy    Cancer    Total
Non-smoker
Smoker
Total

. save "/Users/carolbigelow/Desktop/single2x2.dta"
file /Users/carolbigelow/Desktop/single2x2.dta saved
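As an alternative to the data editor, the four cells of the table can be typed in directly with the built-in input command, which is easier to rerun from a do-file. This is a sketch rather than the handout's own method; it uses the handout's variable names and coding (1=Smoker/Cancer, 0=Non-smoker/Healthy).

```stata
clear
* One row per cell of the 2x2 table: exposure, disease, cell count
input smoking lungca tally
1 1  9
1 0 31
0 1  2
0 0 47
end

* Turn the 4 tabular rows into individual records, then check
expand tally
tabulate smoking lungca
```

The label define and label values commands from the handout can be run after input in the same way.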

(a) Cohort Study Design

. * Estimation of Relative Risk (RR) and Fisher Exact Test
. * cs DISEASEVARIABLE EXPOSUREVARIABLE, exact
. cs lungca smoking, exact

                    smoking
                 Exposed   Unexposed   Total
Cases
Noncases
Total
Risk

                 Point estimate   [95% Conf. Interval]
Risk difference
Risk ratio
Attr. frac. ex.
Attr. frac. pop

1-sided Fisher's exact P =
2-sided Fisher's exact P =

. * Estimation of Relative Risk (RR) and Fisher Exact Test, IMMEDIATE VERSION
. * csi DISEASED-EXPOSED DISEASED-NOTEXPOSED HEALTHY-EXPOSED HEALTHY-NOTEXPOSED, exact
. csi 9 2 31 47, exact

                 Exposed   Unexposed   Total
Cases
Noncases
Total
Risk

                 Point estimate   [95% Conf. Interval]
Risk difference
Risk ratio
Attr. frac. ex.
Attr. frac. pop

1-sided Fisher's exact P =
2-sided Fisher's exact P =

(b) Case-Control Study Design

. * Estimation of Odds Ratio (OR) and Fisher Exact Test
. * cc DISEASEVARIABLE EXPOSUREVARIABLE, exact
. cc lungca smoking, exact

                                        Proportion
                 Exposed   Unexposed   Total   Exposed
Cases
Controls
Total

                 Point estimate   [95% Conf. Interval]
Odds ratio                                               (exact)
Attr. frac. ex.                                          (exact)
Attr. frac. pop

1-sided Fisher's exact P =
2-sided Fisher's exact P =

. * Estimation of Odds Ratio (OR) and Fisher Exact Test, IMMEDIATE VERSION
. * cci DISEASED-EXPOSED DISEASED-NOTEXPOSED HEALTHY-EXPOSED HEALTHY-NOTEXPOSED, exact
. cci 9 2 31 47, exact

                                        Proportion
                 Exposed   Unexposed   Total   Exposed
Cases
Controls
Total

                 Point estimate   [95% Conf. Interval]
Odds ratio                                               (exact)
Attr. frac. ex.                                          (exact)
Attr. frac. pop

1-sided Fisher's exact P =
2-sided Fisher's exact P =
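The point estimates reported by cs and cc can be reproduced by hand from the 2x2 table, which is a useful check that the cells were entered in the right order. A minimal sketch using the lung cancer counts from the Preliminary (9 and 31 among the 40 smokers, 2 and 47 among the 49 non-smokers):

```stata
* Relative risk: risk in exposed / risk in unexposed
display (9/40)/(2/49)

* Odds ratio: cross-product ratio ad/bc
display (9*47)/(31*2)
```

These should agree with the "Risk ratio" line of cs and the "Odds ratio" line of cc; the confidence intervals and Fisher exact p-values still require the commands themselves.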

3. Categorical: K 2x2 Tables

Preliminary
The data are provided for you in a data set called coffeemi_full.dta. It consists of K=4 2x2 tables. The exposure of interest is coffee (1 = 5 or more cups/day, 0 = fewer than 5 cups/day). The outcome is MI (1=yes, 0=no). The stratifying variable is cigarette smoking (1=former smoker, 2=1-14 cigarettes/day, 3=35-44 cigarettes/day, 4=45+ cigarettes/day).

Stratum 1: FORMER SMOKER
Cups Coffee per day     MI     Control
>= 5
< 5

Stratum 2: 1-14 CIGARETTES/DAY
Cups Coffee per day     MI     Control
>= 5
< 5

Stratum 3: 35-44 CIGARETTES/DAY
Cups Coffee per day     MI     Control
>= 5
< 5

Stratum 4: 45+ CIGARETTES/DAY
Cups Coffee per day     MI     Control
>= 5
< 5

Input the data set coffeemi_full.dta and check.

. clear
. use "

(a) Cohort Study Design

. * Check
. * table DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable) row column
. table mi coffee, by(smoking) row col

Stratum of Smoking
and MI-Myocardial        Cups of Coffee/Day
Infarction               Less     5+ cups     Total

Former Smoker
  Non-MI
  MI
  Total

1-14 cigs/day
  Non-MI
  MI
  Total

35-44 cigs/day
  Non-MI
  MI
  Total

45+ cigs/day
  Non-MI
  MI
  Total

. * Compact display of the proportion experiencing DISEASE. This works because MI is coded
. * 1=disease, 0=healthy, so the mean of MI is the proportion with MI.
. tabulate smoking coffee, summarize(mi) means

Means of MI-Myocardial Infarction

Stratum of      Cups of Coffee/Day
Smoking         Less     5+ cups     Total
Former Sm
1-14 cigs
35-44 cig
45+ cigs/
Total

. * Stratified analysis of Relative Risk (Event=mi) with Exposure (Coffee)
. * cs DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. cs mi coffee, by(smoking)

Stratum of Smoki     RR     [95% Conf. Interval]     M-H Weight
Former Smoker
1-14 cigs/day
35-44 cigs/day
45+ cigs/day
Crude
M-H combined

Test of homogeneity (M-H)    chi2(3) =        Pr>chi2 =

(b) Case-Control Study Design

. sort smoking

. * Stratified analysis of Odds Ratio (Event=mi) with Exposure (Coffee)
. * cc DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. cc mi coffee, by(smoking)

Stratum of Smoki     OR     [95% Conf. Interval]     M-H Weight
Former Smoker                                                    (exact)
1-14 cigs/day                                                    (exact)
35-44 cigs/day                                                   (exact)
45+ cigs/day                                                     (exact)
Crude                                                            (exact)
M-H combined

Test of homogeneity (M-H)    chi2(3) =        Pr>chi2 =
Test that combined OR = 1:
    Mantel-Haenszel chi2(1) = 1.65
    Pr>chi2 =

. * Test of Trend in Odds Ratio (Event=mi, Exposure=coffee) over increasing strata
. * tabodds DISEASEVARIABLE STRATUMVARIABLE, or
. tabodds mi smoking, or

smoking       Odds Ratio     chi2     P>chi2     [95% Conf. Interval]
Former ~r
1-14 ci~y
35-44 c~y
45+ cig~y

Test of homogeneity (equal odds): chi2(3) =        Pr>chi2 =
Score test for trend of odds:     chi2(1) =        Pr>chi2 =

(c) Graph of OR (95% CI) Over Stratification Variable

. * Obtain stratum specific OR and 95% CI limits
. * mhodds DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. mhodds mi coffee, by(smoking)

Maximum likelihood estimate of the odds ratio
Comparing coffee==1 vs. coffee==0
by smoking

smoking       Odds Ratio     chi2(1)     P>chi2     [95% Conf. Interval]
Former S
1-14 cigs
35-44 ci
45+ cigs

Mantel-Haenszel estimate controlling for smoking

Odds Ratio     chi2(1)     P>chi2     [95% Conf. Interval]

Test of homogeneity of ORs (approx): chi2(3) =        Pr>chi2 =

. **** Create a new "little" data set containing the information to be plotted
. clear
. generate or=.
. generate high=.
. generate low=.
. generate smoking=.


Click on the DATA EDITOR icon to enter the data. Close this window when done.

. * graph twoway (scatter or STRATUMVARIABLE) (rcap low high STRATUMVARIABLE)
. graph twoway (scatter or smoking, msymbol(d)) (rcap low high smoking), yline(1, lwidth(thin) lpattern(dash) lcolor(black)) xlabel(0 "Overall" 1 "Former" 2 "1-14 cigs" 3 "35-44 cigs" 4 "45+ cigs", angle(45)) title("Relative Odds Myocardial Infarction") subtitle("Associated with High Coffee Consumption") ytitle("Odds Ratio, 95% CI") legend(off) caption("mi_or.png", size(vsmall))

4. Categorical: Test of Trend

Preliminary
The data are provided for you in a data set called esophageal_cancer.dta. It is in tabular form, so you will need to use the command expand. This data set has n=117 observations on cases (#=25) of esophageal cancer and controls (#=92). The exposure of interest is alcohol consumption (g/day) at 4 levels, ranging from 0-39 g/day to 120+ g/day.

. use "
. expand tally
(224 observations created)

(a) 2xC Table

Use the command tabodds when you have case-control data (1=cases, 0=controls) and more than 2 levels of exposure.

. * Test of Trend for 2xC table
. * tabodds CASEVARIABLE EXPOSUREVARIABLE
. tabodds case alcohol

alcohol     cases     controls     odds     [95% Conf. Interval]

Test of homogeneity (equal odds): chi2(3) =        Pr>chi2 =
Score test for trend of odds:     chi2(1) =        Pr>chi2 =

(b) RxC Table

The command tabodds will not work (you will get an error message) if the number of rows is more than 2. For testing trend in the more general setting of an RxC table, use the command nptrend.

. * Test of Trend for RxC table: must use command nptrend
. sort alcohol
. nptrend case, by(alcohol)

alcohol     score     obs     sum of ranks

        z = 4.57
Prob > |z| =

5. Logistic Regression

Preliminary
The data are provided for you in a data set called depress_small.dta. This data set has n=294 observations, including 50 cases of depression (depressed=1). For this illustration, we consider 3 predictors: female gender (female=1), alcohol (drinker=1) and age.

. clear
. use "

Descriptives and Creation of Indicator Variables

. summarize

Variable     Obs     Mean     Std. Dev.     Min     Max
sex
age
cases
drink
depressed

. * Recode 1/2 variables to 0/1 variables. Note: I happen to know the codes already.
. * SEX: males (1->0), females (2->1)
. recode sex (1=0) (2=1), generate(female)
. label variable female "female"

. * DRINK: Non-drinker (2->0)
. recode drink (2=0), generate(drinker)
. label variable drinker "drinker"

. * AGE: Create 0/1 indicators for quartiles
. centile age, centile(25 50 75)

                                          -- Binom. Interp. --
Variable     Obs     Percentile     Centile     [95% Conf. Interval]
age

* Lower (upper) confidence limit held at minimum (maximum) of sample

. * The recode ranges share end points; recode applies the first rule that matches,
. * so age 28 goes to quartile 1, 42.5 to quartile 2, and 59 to quartile 3.
. recode age (18/28=1) (28/42.5=2) (42.5/59=3) (59/89=4), generate(age_quartile)
. generate age1828=(age_quartile==1)
. generate age2843=(age_quartile==2)
. generate age4359=(age_quartile==3)
. generate age5989=(age_quartile==4)

(a) Estimation and Hypothesis Tests

. * Fit model: Command logit yields betas. Command logistic yields ORs.
. logistic depressed female drinker age2843 age4359 age5989

Logistic regression                    Number of obs = 294
                                       LR chi2(5)    =
                                       Prob > chi2   =
Log likelihood =                       Pseudo R2     =

depressed     Odds Ratio     Std. Err.     z     P>|z|     [95% Conf. Interval]
female
drinker
age2843
age4359
age5989
_cons

. * Display predicted probabilities of event=depressed using command adjust with option pr
. adjust, by(female) pr

Dependent variable: depressed     Equation: depressed     Command: logistic
Variables left as is: drinker, age2843, age4359, age5989

female     pr

Males, on average, have an estimated pr[depression] =
Females, on average, have an estimated pr[depression] =

. adjust, by(female drinker) pr

Dependent variable: depressed     Equation: depressed     Command: logistic
Variables left as is: age2843, age4359, age5989

              drinker
female

Females who drink, on average, have an estimated pr[depression] =
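The remark that logit yields betas while logistic yields odds ratios describes only the default display; either command can show either scale. The sketch below is an illustration that assumes depress_small.dta is loaded and the indicator variables above have already been created. The or and coef options are standard options of logit and logistic respectively.

```stata
* Betas (log odds ratios) by default
logit depressed female drinker age2843 age4359 age5989

* Redisplay the model just fitted as odds ratios, without refitting
logit, or

* Fit with logistic but report betas instead of odds ratios
logistic depressed female drinker age2843 age4359 age5989, coef
```

The fitted model is identical in all three displays; each odds ratio is the exponential of the corresponding beta.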

. * Wald test of age modeled using 3 0/1 indicators
. test (age2843=0) (age4359=0) (age5989=0)

 ( 1)  [depressed]age2843 = 0
 ( 2)  [depressed]age4359 = 0
 ( 3)  [depressed]age5989 = 0

           chi2(  3) = 3.65
         Prob > chi2 =

Not significant. We can probably drop age.

. * LR test of age modeled using 3 0/1 indicators
. * full model
. quietly: logistic depressed female drinker age2843 age4359 age5989
. estimates store full
. * reduced model
. quietly: logistic depressed female drinker
. estimates store reduced
. lrtest full reduced

Likelihood-ratio test                        LR chi2(3)  = 3.78
(Assumption: reduced nested in full)         Prob > chi2 =

. * Nice tabular display of fit of several models, showing betas and SE(beta)
. quietly: logistic depressed female
. estimates store modelf
. quietly: logistic depressed drinker
. estimates store modeld
. quietly: logistic depressed age2843 age4359 age5989
. estimates store modelage
. quietly: logistic depressed female drinker
. estimates store reduced
. quietly: logistic depressed female drinker age2843 age4359 age5989
. estimates store full
. estimates table modelf modeld modelage reduced full, b(%7.2f) se(%7.2f) stfmt(%7.4g)

Variable     modelf     modeld     model~e     reduced     full
female
drinker
age2843
age4359
age5989
_cons
                                                    legend: b/se
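The p-values for the Wald and likelihood-ratio tests of age can be recomputed from the reported statistics with chi2tail() from Section 1(b). This is a small sketch added for illustration; it takes the statistics 3.65 and 3.78, each on 3 degrees of freedom, from the output above.

```stata
* Wald test of the three age indicators: chi2(3) = 3.65
display chi2tail(3, 3.65)

* Likelihood-ratio test, full vs reduced model: LR chi2(3) = 3.78
display chi2tail(3, 3.78)

* 5% critical value for chi2 on 3 df, for comparison
display invchi2tail(3, .05)
```

Both statistics fall below the critical value, which is consistent with the handout's conclusion that age is not significant.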

(b) Regression Diagnostics

. * Must first fit model to be evaluated
. logistic depressed female drinker age2843 age4359 age5989

Logistic regression                    Number of obs = 294
                                       LR chi2(5)    =
                                       Prob > chi2   =
Log likelihood =                       Pseudo R2     =

depressed     Odds Ratio     Std. Err.     z     P>|z|     [95% Conf. Interval]
female
drinker
age2843
age4359
age5989
_cons

. * Hosmer-Lemeshow Goodness of Fit Test
. estat gof, group(10)

Logistic model for depressed, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
(There are only 9 distinct quantiles because of ties)

number of observations = 294
      number of groups = 9
Hosmer-Lemeshow chi2(7) = 2.45
           Prob > chi2 =

. * Link Test for Omitted Predictors
. linktest
-- output omitted --

Logistic regression                    Number of obs = 294
                                       LR chi2(2)    =
                                       Prob > chi2   =
Log likelihood =                       Pseudo R2     =

depressed     Coef.     Std. Err.     z     P>|z|     [95% Conf. Interval]
_hat
_hatsq                                                          Good. NS
_cons

. * Identify observations with standardized residuals > 2
. * Note: option residuals (abbreviated res) returns Pearson residuals;
. * option rstandard returns standardized Pearson residuals.
. predict std_residual, res
. label variable std_residual "Standardized Residual"

. * I don't have an id variable, so I am creating one here for illustration purposes
. generate id=_n
. list id std_residual drinker female age if abs(std_residual)>2

. * None > 2. Let's try 1 for illustration purposes.
. list id std_residual drinker female age if abs(std_residual)>1

id     std_re~l     drinker     female     age
-- rest of output omitted --

. * Plot of Cook's Distance
. predict cook, dbeta
. label variable cook "Cook's Distance"
. scatter cook id, mlabel(id) msize(1) mlabsize(2) jitter(*10) title("Plot of Cook's Distance") subtitle("by Study ID") caption("cooks.png")
