Stata version 12. Lab Session 2 April 2013


1. Probability Calculations (p-values and such)
   (a) Binomial
   (b) Chi Square
   (c) F
   (d) Hypergeometric (Central)
   (e) Normal
   (f) Poisson
   (g) Student t
2. Categorical: Single 2x2 Table
   (a) Cohort Study Design
   (b) Case-Control Study Design
3. Categorical: K 2x2 Tables
   (a) Cohort Study Design
   (b) Case-Control Study Design
   (c) Graph of OR (95% CI) Over Stratification Variable
4. Categorical: Test of Trend
   (a) 2xC Table
   (b) RxC Table
5. Logistic Regression
   (a) Estimation and Hypothesis Tests
   (b) Graphical Assessments

(mac)\teaching\stata\stata version 12\stata v 12 lab session 2.doc 4/17/2013 Page 1 of 25

1. Probability Calculations (p-values and such)

Preliminary: Download the module probcalc
This user-created module is used for the following distributions: binomial, Poisson, and normal. Type the following in the command window.

. ssc install probcalc

(a) Binomial Distribution

Binomial(n, pi): Probability of exactly k events, Pr[X = k]
    probcalc b ntrials pi exactly k

. * Binomial(n=20, pi=.03) Prob[X=2]
. probcalc b 20 .03 exactly 2
Distribution: Binomial n=20 p=.03 option:exactly x=2
P(X=2)=.09882967

Binomial(n, pi): Probability of at most k events, Pr[X <= k]
    probcalc b ntrials pi atmost k

. * Binomial(n=20, pi=.03) Prob[X <= 2]
. probcalc b 20 .03 atmost 2
Distribution: Binomial n=20 p=.03 option:atmost x=2
P(X=0)=.54379434
P(X=1)=.33636763
P(X=2)=.09882967
pmf Method 1: P(X<=2)=.97899164
cdf Method 2: P(X<=2)=.97899164

Binomial(n, pi): Probability of less than k events, Pr[X < k]
    probcalc b ntrials pi atmost k-1

. * Binomial(n=20, pi=.03) Prob[X < 2]
. probcalc b 20 .03 atmost 1
Distribution: Binomial n=20 p=.03 option:atmost x=1
P(X=0)=.54379434
P(X=1)=.33636763
pmf Method 1: P(X<=1)=.88016198
cdf Method 2: P(X<=1)=.88016198

Binomial(n, pi): Probability of at least k events, Pr[X >= k]
    probcalc b ntrials pi atleast k

. * Binomial(n=20, pi=.03) Prob[X >= 2]
. probcalc b 20 .03 atleast 2
Distribution: Binomial n=20 p=.03 option:atleast x=2
P(X=2)=.09882967
-- output omitted --
P(X=20)=3.487e-31
pmf Method 1: P(X>=2)=.11983802
cdf Method 2: P(X>=2)=.11983802
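The binomial probabilities so far can be cross-checked without the user-written module. This is a minimal sketch, assuming the built-in functions binomialp(n,k,p), binomial(n,k,p) and binomialtail(n,k,p), none of which appear in the handout, are available in your Stata release. Each result should agree with the corresponding probcalc value quoted in the comment.

```stata
* Binomial(n=20, pi=.03) using built-in functions instead of probcalc
* Pr[X = 2]: compare with P(X=2)=.09882967
display binomialp(20, 2, .03)
* Pr[X <= 2]: compare with P(X<=2)=.97899164
display binomial(20, 2, .03)
* Pr[X < 2] = Pr[X <= 1]: compare with P(X<=1)=.88016198
display binomial(20, 1, .03)
* Pr[X >= 2]: compare with P(X>=2)=.11983802
display binomialtail(20, 2, .03)
```

Note the argument order: the built-in functions take the number of events k before the probability, whereas probcalc takes the probability first.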

Binomial(n, pi): Probability of more than k events, Pr[X > k]
    probcalc b ntrials pi atleast k+1

. * Binomial(n=20, pi=.03) Prob[X > 2]
. probcalc b 20 .03 atleast 3
Distribution: Binomial n=20 p=.03 option:atleast x=3
P(X=3)=.01833953
-- output omitted --
P(X=20)=3.487e-31
pmf Method 1: P(X>=3)=.02100836
cdf Method 2: P(X>=3)=.02100836

(b) Chi Square Distribution

Chi Square (degrees of freedom = df): Probability [Y <= y] is the same as Probability [Y < y]
    display chi2(df,y)

. * Pr[Chi square df=2 <= 1.5]
. display chi2(2,1.5)
.52763345

Chi Square (degrees of freedom = df): Probability [Y >= y] is the same as Probability [Y > y]
    display chi2tail(df,y)

. * Pr[Chi square df=2 >= 1.5]
. display chi2tail(2,1.5)
.47236655

Chi Square (degrees of freedom = df): Solution for pth quantile
    display invchi2(df,p)

. * Chi Square df=2: Solution for 97.5th percentile
. display invchi2(2,.975)
7.3777589

(c) F Distribution

F (degrees of freedom = df1 and df2): Probability [Y <= y] is the same as Probability [Y < y]
    display F(df1,df2,y)

. * Pr[F(df=2,6) < 2.3]
. display F(2,6,2.3)
.81864223

F (degrees of freedom = df1 and df2): Probability [Y >= y] is the same as Probability [Y > y]
    display Ftail(df1,df2,y)

. * Pr[F(df=2,6) > 2.3]
. display Ftail(2,6,2.3)
.18135777

F (degrees of freedom = df1 and df2): Solution for pth quantile
    display invFtail(df1,df2,1-p)

. * F with df=2,6: Solution for 95th percentile
. display invFtail(2,6,.05)
5.1432528

. * F with df=2,6: Solution for 5th percentile
. display invFtail(2,6,.95)
.0517343
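A quantile can be requested from either tail, which is a handy check on which probability was typed. This is a short sketch, assuming the built-in functions invchi2tail() and invF(), which the handout does not use. invchi2tail(df,1-p) should return the same value as invchi2(df,p), and invF(df1,df2,p) the same value as invFtail(df1,df2,1-p). Function names are case-sensitive, so the F functions need a capital F.

```stata
* Chi square df=2: the 97.5th percentile requested two ways
display invchi2(2, .975)
display invchi2tail(2, .025)

* F with df=2,6: the 95th percentile requested two ways
display invFtail(2, 6, .05)
display invF(2, 6, .95)
```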

(d) Hypergeometric Distribution (Central)

                     Disease
Exposure         Yes       No
Yes               a         b        n = total # exposed
No                c         d
                  K = total # with disease        N = grand total

Hypergeometric (N total, K disease, n exposed): Probability [exactly a with exposure AND disease]
    display hypergeometricp(N,K,n,a)

. * Pr[Hypergeometric N=259, K=4, n=23, a=2]
. display hypergeometricp(259,4,23,2)
.03829914

Hypergeometric (N total, K disease, n exposed): Probability [a or less with exposure AND disease]
    display hypergeometric(N,K,n,a)

. * Pr[Hypergeometric N=259, K=4, n=23, a<=2]
. display hypergeometric(259,4,23,2)
.99767001

Hypergeometric (N total, K disease, n exposed): Probability [a or more with exposure AND disease]
Tips: (1) Use this for p-values; and (2) Note that a needs to be reduced by 1.
    display 1 - hypergeometric(N,K,n,a-1)

. * Pr[Hypergeometric N=259, K=4, n=23, a>=2]
. display 1-hypergeometric(259,4,23,1)
.04062914

(e) Normal Distribution (mean=mu, standard deviation=sigma)

Normal(mu, sigma), between: Probability[a <= X <= b] is the same as Probability[a < X < b]
    probcalc n mu sigma between a b

. * Pr[Normal(mu=100, sigma=15) is between 85 and 115]
. probcalc n 100 15 between 85 115
Distribution: Normal mean:100 s.d.:15 option:between x=85 115
cdf Method: P(85<=X<115)=.68268949

Normal(mu, sigma), at most: Probability[X <= b] is the same as Probability[X < b]
    probcalc n mu sigma atmost b

. * Pr[Normal(mu=100, sigma=15) is at most 115]
. probcalc n 100 15 atmost 115
Distribution: Normal mean:100 s.d.:15 option:atmost x=115
cdf Method: P(X<=115)=.84134475

Normal(mu, sigma), at least: Probability[X >= a] is the same as Probability[X > a]
    probcalc n mu sigma atleast a

. * Pr[Normal(mu=100, sigma=15) is at least 85]
. probcalc n 100 15 atleast 85
Distribution: Normal mean:100 s.d.:15 option:atleast x=85
cdf Method: P(X>=85)=.84134475

Normal(mu, sigma): Solution for pth quantile
    display mu+sigma*invnormal(p)

. * Normal(mu=100, sigma=15): Solution for 95th percentile
. display 100+15*invnormal(.95)
124.6728
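The normal probabilities above can also be obtained by standardizing and using the built-in normal() function, which returns the cumulative standard normal probability. This is a minimal sketch of that alternative, not part of the original handout. The results should agree with the probcalc values quoted in the comments.

```stata
* Normal(mu=100, sigma=15): standardize with z = (x - mu)/sigma
* Pr[85 < X < 115]: compare with .68268949
display normal((115-100)/15) - normal((85-100)/15)
* Pr[X <= 115]: compare with .84134475
display normal((115-100)/15)
* Pr[X >= 85]: compare with .84134475
display 1 - normal((85-100)/15)
```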

(f) Poisson Distribution

Poisson(mu): Probability of exactly k events, Pr[X = k]
    probcalc p mu exactly k

. * Pr[Poisson(mu=1.8) = 6]
. probcalc p 1.8 exactly 6
Distribution: Poisson mu=1.8 option:exactly x=6
P(X=6)=.00780859

Poisson(mu): Probability of at most k events, Pr[X <= k]
    probcalc p mu atmost k

. * Pr[Poisson(mu=1.8) <= 6]
. probcalc p 1.8 atmost 6
Distribution: Poisson mu=1.8 option:atmost x=6
P(X=0)=.16529889
-- output omitted --
P(X=6)=.00780859
pmf Method 1: P(X<=6)=.99743055
cdf Method 2: P(X<=6)=.99743055

Poisson(mu): Probability of less than k events, Pr[X < k]
    probcalc p mu atmost k-1

. * Pr[Poisson(mu=1.8) < 6]
. probcalc p 1.8 atmost 5
Distribution: Poisson mu=1.8 option:atmost x=5
P(X=0)=.16529889
-- output omitted --
P(X=5)=.02602862
pmf Method 1: P(X<=5)=.98962196
cdf Method 2: P(X<=5)=.98962196

Poisson(mu): Probability of at least k events, Pr[X >= k]
    probcalc p mu atleast k

. * Pr[Poisson(mu=1.8) >= 6]
. probcalc p 1.8 atleast 6
Distribution: Poisson mu=1.8 option:atleast x=6
P(X=0)=.16529889
-- output omitted --
pmf Method 1: P(X>=6)=.01037804
cdf Method 2: P(X>=6)=.01037804

Poisson(mu): Probability of more than k events, Pr[X > k]
    probcalc p mu atleast k+1

. * Pr[Poisson(mu=1.8) > 6]
. probcalc p 1.8 atleast 7
Distribution: Poisson mu=1.8 option:atleast x=7
Note: For Poisson ''at least'' questions, the sum of the lower tail pmf's is subtracted from one. So only variates less than x are reported below.
P(X=0)=.16529889
-- output omitted --
P(X=6)=.00780859
pmf Method 1: P(X>=7)=.00256945
cdf Method 2: P(X>=7)=.00256945
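As with the binomial, the Poisson results can be cross-checked with built-in functions. This is a minimal sketch, assuming poissonp(m,k), poisson(m,k) and poissontail(m,k), which the handout does not use, are available in your release. Each result should agree with the probcalc value quoted in the comment.

```stata
* Poisson(mu=1.8) using built-in functions instead of probcalc
* Pr[X = 6]: compare with P(X=6)=.00780859
display poissonp(1.8, 6)
* Pr[X <= 6]: compare with P(X<=6)=.99743055
display poisson(1.8, 6)
* Pr[X < 6] = Pr[X <= 5]: compare with P(X<=5)=.98962196
display poisson(1.8, 5)
* Pr[X >= 6]: compare with P(X>=6)=.01037804
display poissontail(1.8, 6)
* Pr[X > 6] = Pr[X >= 7]: compare with P(X>=7)=.00256945
display poissontail(1.8, 7)
```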

(g) Student-t Distribution

Student-t (degrees of freedom = df): Probability [T <= t] is the same as Probability [T < t]
    display 1 - ttail(df,t)

. * Pr[Student-t(df=12) < 2.1]
. display 1-ttail(12,2.1)
.97122753

Student-t (degrees of freedom = df): Probability [T >= t] is the same as Probability [T > t]
    display ttail(df,t)

. * Pr[Student-t(df=12) > 2.1]
. display ttail(12,2.1)
.02877247

Student-t (degrees of freedom = df): Solution for pth quantile
    display invttail(df,1-p)

. * Student-t(df=12): Solution for 97.5th percentile
. display invttail(12,.025)
2.1788128
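Because the Student-t distribution is symmetric about zero, a two-sided p-value is twice the upper-tail area beyond |t|. The line below is a small sketch of that calculation for the example above. It is not part of the original handout, and the result should be twice the one-sided value .02877247.

```stata
* Two-sided p-value for t = 2.1 with df = 12
display 2*ttail(12, abs(2.1))
```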

2. Categorical: Single 2x2 Table

Preliminary
We will use the following data for both: (a) Cohort Study Design and (b) Case-Control Study Design

                        Disease (Lung Cancer)
Exposure (Smoking)        Yes       No
Yes                         9       31
No                          2       47

Create a data set called single2x2.dta and save it.

. generate smoking=.
. generate lungca=.
. generate tally=.
-- click on data editor to enter data. Close window. Data will not be lost! --
. label define smokingf 1 "Smoker" 0 "Non-smoker"
. label values smoking smokingf
. label define lungcaf 1 "Cancer" 0 "Healthy"
. label values lungca lungcaf
. * Use command expand to create data set with individual records
. expand tally
(85 observations created)
. * Check data before saving
. tab2 smoking lungca

-> tabulation of smoking by lungca

           |        lungca
   smoking |   Healthy     Cancer |     Total
-----------+----------------------+----------
Non-smoker |        47          2 |        49
    Smoker |        31          9 |        40
-----------+----------------------+----------
     Total |        78         11 |        89

. save "/Users/carolbigelow/Desktop/single2x2.dta"
file /Users/carolbigelow/Desktop/single2x2.dta saved
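The four cell counts can also be typed directly with the input command, which avoids the Data Editor and keeps the data entry reproducible in a do-file. This is a minimal sketch of that alternative, using the same variable names and 0/1 codes as above. It is not the handout's own method.

```stata
* Enter the 2x2 table as four rows: exposure, outcome, cell count
clear
input smoking lungca tally
1 1  9
1 0 31
0 1  2
0 0 47
end

* Attach the same value labels used in the handout
label define smokingf 1 "Smoker" 0 "Non-smoker"
label values smoking smokingf
label define lungcaf 1 "Cancer" 0 "Healthy"
label values lungca lungcaf

* One record per person, then check against the table
expand tally
tabulate smoking lungca
```

After expand the data set should hold 89 individual records, matching the total in the tabulation above.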

(a) Cohort Study Design

. * Estimation of Relative Risk (RR) and Fisher Exact Test
. * cs DISEASEVARIABLE EXPOSUREVARIABLE, exact
. cs lungca smoking, exact

                 | smoking                |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         9           2  |         11
        Noncases |        31          47  |         78
-----------------+------------------------+------------
           Total |        40          49  |         89
                 |                        |
            Risk |      .225    .0408163  |   .1235955
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .1841837       |    .0434157    .3249517
      Risk ratio |           5.5125       |    1.262212    24.07491
 Attr. frac. ex. |         .8185941       |    .2077403     .958463
 Attr. frac. pop |         .6697588       |
                 +-------------------------------------------------
                      1-sided Fisher's exact P = 0.0100
                      2-sided Fisher's exact P = 0.0108

. * Estimation of Relative Risk (RR) and Fisher Exact Test IMMEDIATE VERSION
. * csi DISEASED-EXPOSED DISEASED-NOTEXPOSED HEALTHY-EXPOSED HEALTHY-NOTEXPOSED, exact
. csi 9 2 31 47, exact

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         9           2  |         11
        Noncases |        31          47  |         78
-----------------+------------------------+------------
           Total |        40          49  |         89
                 |                        |
            Risk |      .225    .0408163  |   .1235955
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .1841837       |    .0434157    .3249517
      Risk ratio |           5.5125       |    1.262212    24.07491
 Attr. frac. ex. |         .8185941       |    .2077403     .958463
 Attr. frac. pop |         .6697588       |
                 +-------------------------------------------------
                      1-sided Fisher's exact P = 0.0100
                      2-sided Fisher's exact P = 0.0108

(b) Case-Control Study Design

. * Estimation of Odds Ratio (OR) and Fisher Exact Test
. * cc DISEASEVARIABLE EXPOSUREVARIABLE, exact
. cc lungca smoking, exact

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |         9           2  |         11       0.8182
        Controls |        31          47  |         78       0.3974
-----------------+------------------------+------------------------
           Total |        40          49  |         89       0.4494
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         6.822581       |    1.263628    67.65054 (exact)
 Attr. frac. ex. |         .8534279       |    .2086276    .9852182 (exact)
 Attr. frac. pop |         .6982592       |
                 +-------------------------------------------------
                      1-sided Fisher's exact P = 0.0100
                      2-sided Fisher's exact P = 0.0108

. * Estimation of Odds Ratio (OR) and Fisher Exact Test IMMEDIATE VERSION
. * cci DISEASED-EXPOSED DISEASED-NOTEXPOSED HEALTHY-EXPOSED HEALTHY-NOTEXPOSED, exact
. cci 9 2 31 47, exact

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |         9           2  |         11       0.8182
        Controls |        31          47  |         78       0.3974
-----------------+------------------------+------------------------
           Total |        40          49  |         89       0.4494
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         6.822581       |    1.263628    67.65054 (exact)
 Attr. frac. ex. |         .8534279       |    .2086276    .9852182 (exact)
 Attr. frac. pop |         .6982592       |
                 +-------------------------------------------------
                      1-sided Fisher's exact P = 0.0100
                      2-sided Fisher's exact P = 0.0108
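The point estimates reported by cs and cc can be reproduced by hand from the four cell counts, which helps confirm that the arguments were given in the right order. This is a short sketch using display as a calculator. The cell labels a, b, c, d are introduced here for illustration only.

```stata
* Cell counts: a=9 smokers with cancer,     b=31 smokers without
*              c=2 non-smokers with cancer, d=47 non-smokers without
* Risk ratio = [a/(a+b)] / [c/(c+d)]: compare with 5.5125
display (9/40)/(2/49)
* Risk difference = a/(a+b) - c/(c+d): compare with .1841837
display 9/40 - 2/49
* Odds ratio = (a*d)/(b*c): compare with 6.822581
display (9*47)/(31*2)
```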

3. Categorical: K 2x2 Tables

Preliminary
The data are provided for you in a data set called coffeemi_full.dta. It consists of K=4 2x2 tables. The exposure of interest is coffee (1 = 5+ cups/day, 0 = <5 cups/day). The outcome is MI (1=yes, 0=no). The stratifying variable is cigarettes (1=former smoker, 2=1-14 cigarettes/day, 3=35-44 cigarettes/day, 4=45+ cigarettes/day).

Stratum 1: FORMER SMOKER
Cups Coffee per day      MI    Control
>= 5                      7       18
< 5                      20      112

Stratum 2: 1-14 CIGARETTES/DAY
Cups Coffee per day      MI    Control
>= 5                      7       24
< 5                      33       11

Stratum 3: 35-44 CIGARETTES/DAY
Cups Coffee per day      MI    Control
>= 5                     27       24
< 5                      55       58

Stratum 4: 45+ CIGARETTES/DAY
Cups Coffee per day      MI    Control
>= 5                     30       17
< 5                      34       17

Input the data set coffeemi_full.dta and check.

. clear
. use "http://people.umass.edu/biep640w/datasets/coffeemi_full.dta"

(a) Cohort Study Design

. * Check
. * table DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable) row column
. table mi coffee, by(smoking) row col

------------------------------------------
Stratum of     |
Smoking and    |
MI-Myocardial  |   Cups of Coffee/Day
Infarction     |    Less  5+ cups    Total
---------------+--------------------------
Former Smoker  |
        Non-MI |     112       18      130
            MI |      20        7       27
         Total |     132       25      157
---------------+--------------------------
1-14 cigs/day  |
        Non-MI |      11       24       35
            MI |      33        7       40
         Total |      44       31       75
---------------+--------------------------
35-44 cigs/day |
        Non-MI |      58       24       82
            MI |      55       27       82
         Total |     113       51      164
---------------+--------------------------
45+ cigs/day   |
        Non-MI |      17       17       34
            MI |      34       30       64
         Total |      51       47       98
------------------------------------------

. * Compact display of % experiencing DISEASE because MI is coded 1=disease 0=healthy
. tabulate smoking coffee, summarize(mi) means

        Means of MI-Myocardial Infarction

Stratum of |  Cups of Coffee/Day
   Smoking |       Less     5+ cups |      Total
-----------+------------------------+-----------
 Former Sm |  .15151515         .28 |  .17197452
 1-14 cigs |        .75   .22580645 |  .53333333
 35-44 cig |  .48672566   .52941176 |         .5
 45+ cigs/ |  .66666667   .63829787 |  .65306122
-----------+------------------------+-----------
     Total |  .41764706   .46103896 |  .43117409

. * Stratified analysis of Relative Risk (Event=mi) w Exposure (Coffee)
. * cs DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. cs mi coffee, by(smoking)

Stratum of Smoki |       RR       [95% Conf. Interval]     M-H Weight
-----------------+-------------------------------------------------
   Former Smoker |      1.848      .8755072    3.900715     3.184713
   1-14 cigs/day |   .3010753      .1534833    .5905939        13.64
  35-44 cigs/day |   1.087701       .789336    1.498845     17.10366
    45+ cigs/day |   .9574468      .7165748    1.279286     16.30612
-----------------+-------------------------------------------------
           Crude |   1.103896      .8930748    1.364484
    M-H combined |   .8800312       .719581    1.076258
-------------------------------------------------------------------
Test of homogeneity (M-H)      chi2(3) = 15.526   Pr>chi2 = 0.0014

(b) Case-Control Study Design

. sort smoking
. * Stratified analysis of Odds Ratio (Event=mi) w Exposure (Coffee)
. * cc DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. cc mi coffee, by(smoking)

Stratum of Smoki |       OR       [95% Conf. Interval]     M-H Weight
-----------------+-------------------------------------------------
   Former Smoker |   2.177778      .6752163    6.360992     2.292994 (exact)
   1-14 cigs/day |   .0972222      .0281342    .3212659        10.56 (exact)
  35-44 cigs/day |   1.186364      .5805495    2.429507      8.04878 (exact)
    45+ cigs/day |   .8823529       .353352    2.204447     5.897959 (exact)
-----------------+-------------------------------------------------
           Crude |   1.192771      .7976463    1.781013              (exact)
    M-H combined |   .7751256      .5172801    1.161498
-------------------------------------------------------------------
Test of homogeneity (M-H)      chi2(3) =    19.92  Pr>chi2 = 0.0002

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =      1.65
                                                Pr>chi2 =    0.1992

. * Test of Trend in Odds Ratio (Event=mi, Exposure=smoking stratum) over increasing strata
. * tabodds DISEASEVARIABLE STRATUMVARIABLE, or
. tabodds mi smoking, or

---------------------------------------------------------------------------
     smoking |  Odds Ratio       chi2       P>chi2     [95% Conf. Interval]
-------------+-------------------------------------------------------------
   Former ~r |    1.000000          .            .             .          .
   1-14 ci~y |    5.502646      32.13       0.0000      2.833492  10.686146
   35-44 c~y |    4.814815      38.37       0.0000      2.777569   8.346306
   45+ cig~y |    9.063181      60.61       0.0000      4.617804  17.787945
---------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3)  =    68.90
                                  Pr>chi2  =   0.0000

Score test for trend of odds:     chi2(1)  =    58.91
                                  Pr>chi2  =   0.0000
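The M-H combined odds ratio reported by cc is a weighted average of the stratum-specific odds ratios, and it can be reproduced from the four 2x2 tables in the Preliminary. This is a minimal sketch using the standard Mantel-Haenszel formula, sum(a*d/N) / sum(b*c/N). The scalar names num and den are hypothetical and chosen only for this illustration.

```stata
* Within each stratum: a = MI & 5+ cups,  b = control & 5+ cups,
*                      c = MI & <5 cups,  d = control & <5 cups,  N = a+b+c+d
* Strata in order: former smoker, 1-14, 35-44, 45+ cigarettes/day
scalar num = 7*112/157 + 7*11/75  + 27*58/164 + 30*17/98
scalar den = 18*20/157 + 24*33/75 + 24*55/164 + 17*34/98

* Mantel-Haenszel combined OR: compare with M-H combined = .7751256
display num/den
```

The four terms of den are the values in the M-H Weight column of the cc output (2.292994, 10.56, 8.04878, 5.897959).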

(c) Graph of OR (95% CI) Over Stratification Variable

. * Obtain stratum specific OR and 95% CI limits
. * mhodds DISEASEVARIABLE EXPOSUREVARIABLE, by(stratumvariable)
. mhodds mi coffee, by(smoking)

Maximum likelihood estimate of the odds ratio
Comparing coffee==1 vs. coffee==0
by smoking

-------------------------------------------------------------------------------
  smoking | Odds Ratio    chi2(1)      P>chi2       [95% Conf. Interval]
----------+--------------------------------------------------------------------
 Former S |   2.177778       2.42      0.1197        0.79694      5.95115
 1-4 cigs |   0.097222      19.81      0.0000        0.02717      0.34791
 35-44 ci |   1.186364       0.25      0.6139        0.61031      2.30612
 45+ cigs |   0.882353       0.09      0.7693        0.38203      2.03793
-------------------------------------------------------------------------------

    Mantel-Haenszel estimate controlling for smoking
    ----------------------------------------------------------------
     Odds Ratio    chi2(1)        P>chi2        [95% Conf. Interval]
    ----------------------------------------------------------------
       0.775126       1.65        0.1992         0.524827   1.144795
    ----------------------------------------------------------------

Test of homogeneity of ORs (approx): chi2(3) = 20.53
                                     Pr>chi2 = 0.0001

. **** Create a new "little" data set containing the information to be plotted
. clear
. generate or=.
. generate high=.
. generate low=.
. generate smoking=.
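Instead of generating empty variables and typing values into the Data Editor, the little data set can be entered with input. This is a sketch under one assumption: that the rows to plot are the stratum-specific estimates and the Mantel-Haenszel estimate from the mhodds output above, with smoking=0 standing for the overall estimate, as the xlabel() of the graph command suggests.

```stata
* Odds ratios and 95% CI limits copied from the mhodds output
* smoking: 0=overall (Mantel-Haenszel), 1-4 = the four smoking strata
clear
input smoking or low high
0 0.775126 0.524827 1.144795
1 2.177778 0.79694  5.95115
2 0.097222 0.02717  0.34791
3 1.186364 0.61031  2.30612
4 0.882353 0.38203  2.03793
end
list
```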

Click on the DATA EDITOR icon to enter the data. Close this window when done.

. * graph twoway (scatter or STRATUMVARIABLE) (rcap low high STRATUMVARIABLE)
. graph twoway (scatter or smoking, msymbol(d)) (rcap low high smoking),
>     yline(1, lwidth(thin) lpattern(dash) lcolor(black))
>     xlabel(0 "Overall" 1 "Former" 2 "1-4 cigs" 3 "35-44 cigs" 4 "45+ cigs", angle(45))
>     title("Relative Odds Myocardial Infarction")
>     subtitle("Associated with High Coffee Consumption")
>     ytitle("Odds Ratio, 95% CI") legend(off) caption("mi_or.png", size(vsmall))

4. Categorical: Test of Trend

Preliminary
The data are provided for you in a data set called esophageal_cancer.dta. It is in tabular form, so you will need to use the command expand. This data set has n=117 observations on cases (#=25) of esophageal cancer and controls (#=92). The exposure of interest is alcohol consumption (g/day) at 4 levels: 0-39 g/day, 40-79 g/day, 80-119 g/day and 120+ g/day.

. use "http://people.umass.edu/biep640w/datasets/esophageal_cancer.dta"
. expand tally
(224 observations created)

(a) 2xC Table
Use the command tabodds when you have case-control data (1=cases, 0=controls) and you have more than 2 levels of exposure.

. * Test of Trend for 2xC table
. * tabodds CASEVARIABLE EXPOSUREVARIABLE
. tabodds case alcohol

--------------------------------------------------------------------------
    alcohol |      cases     controls       odds      [95% Conf. Interval]
------------+-------------------------------------------------------------
      0-39g |          2           47    0.04255       0.01034     0.17518
      40-79 |          9           31    0.29032       0.13822     0.60979
    80-119g |          9            9    1.00000       0.39695     2.51919
       120+ |          5            5    1.00000       0.28950     3.45420
--------------------------------------------------------------------------
Test of homogeneity (equal odds): chi2(3)  =    22.22
                                  Pr>chi2  =   0.0001

Score test for trend of odds:     chi2(1)  =    20.85
                                  Pr>chi2  =   0.0000

(b) RxC Table
The command tabodds will not work (you will get an error message) if the number of rows is more than 2. For testing trend in the more general setting of an RxC table, use the command nptrend.

. * Test of Trend for RxC table must use command nptrend
. sort alcohol
. nptrend case, by(alcohol)

     alcohol    score     obs    sum of ranks
           1        1      49        2395.5
           2        2      40        2386.5
           3        3      18        1363.5
           4        4      10         757.5

          z  =  4.57
  Prob > |z| =  0.000

5. Logistic Regression

Preliminary
The data are provided for you in a data set called depress_small.dta. This data set has n=294 observations, including 50 cases of depression (depressed=1). For this illustration, we consider 3 predictors: female gender (female=1), alcohol (drinker=1) and age.

. clear
. use "http://people.umass.edu/biep640w/datasets/depress_small.dta"

Descriptives and Creation of Indicator Variables

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |       294    1.622449     .485601          1          2
         age |       294    44.41497    18.08544         18         89
       cases |       294     .170068    .3763331          0          1
       drink |       294    1.204082    .4037161          1          2
   depressed |       294     .170068    .3763331          0          1

. * recode 1/2 variables to 0/1 variables. Note- I happen to know the codes already.
. * SEX: males (1->0), females (2->1)
. recode sex (1=0) (2=1), generate(female)
. label variable female "female"
. * DRINK: Non-drinker (2->0)
. recode drink (2=0), generate(drinker)
. label variable drinker "drinker"
. * AGE: Create 0/1 indicators for quartiles
. centile age, centile(0 25 50 75 100)

                                                       -- Binom. Interp. --
    Variable |     Obs  Percentile    Centile        [95% Conf. Interval]
-------------+-------------------------------------------------------------
         age |     294          0          18              18          18*
             |                 25          28              26          31
             |                 50        42.5              38          47
             |                 75          59              57          61
             |                100          89              89          89*

* Lower (upper) confidence limit held at minimum (maximum) of sample
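After recoding, it is worth confirming that each original code landed where intended before fitting any model. This is a brief sketch of one way to check, cross-tabulating each original variable against its recoded version. It assumes the variables female and drinker were created as above.

```stata
* Each original code should map to exactly one new code
* The missing option shows any missing values as their own row/column
tabulate sex female, missing
tabulate drink drinker, missing
```

Every row of each table should have all of its observations in a single column. Any row that is split across columns points to a recode problem.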

. recode age (18/28=1) (28/42.5=2) (42.5/59=3) (59/89=4), generate(age_quartile)
. generate age1828=(age_quartile==1)
. generate age2843=(age_quartile==2)
. generate age4359=(age_quartile==3)
. generate age5989=(age_quartile==4)

(a) Estimation and Hypothesis Tests

. * Fit model: Command logit yields betas. Command logistic yields ORs.
. logistic depressed female drinker age2843 age4359 age5989

Logistic regression                               Number of obs   =        294
                                                  LR chi2(5)      =      13.15
                                                  Prob > chi2     =     0.0220
Log likelihood = -127.48507                       Pseudo R2       =     0.0491

------------------------------------------------------------------------------
   depressed | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   2.983863   1.137352     2.87   0.004     1.413606    6.298388
     drinker |   1.227484   .5076479     0.50   0.620     .5457409    2.760867
     age2843 |   1.011404   .4173488     0.03   0.978     .4504867    2.270738
     age4359 |   .5978412   .2625704    -1.17   0.241     .2527785    1.413942
     age5989 |   .4849052   .2323103    -1.51   0.131     .1896094    1.240092
       _cons |   .1060434   .0593354    -4.01   0.000     .0354163    .3175149
------------------------------------------------------------------------------

. * Display predicted probabilities of event=depressed using command adjust w option pr
. adjust, by(female) pr

------------------------------------------------------------------------------------------
     Dependent variable: depressed     Equation: depressed     Command: logistic
     Variables left as is: drinker, age2843, age4359, age5989
------------------------------------------------------------------------------------------

----------------------
   female |         pr
----------+-----------
        0 |    .086565      males, on average, have an estimated pr[depression] = .09
        1 |    .212992      females, on average, have an estimated pr[depression] = .21
----------------------

. adjust, by(female drinker) pr

------------------------------------------------------------------------------------------
     Dependent variable: depressed     Equation: depressed     Command: logistic
     Variables left as is: age2843, age4359, age5989
------------------------------------------------------------------------------------------

----------------------------
          |     drinker
   female |       0        1
----------+-----------------
        0 | .068138  .090081
        1 | .174429  .226424      females who drink, on average, have an estimated pr[depression] = .23
----------------------------
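To see how the odds ratios translate into a probability, a single covariate pattern can be worked by hand. In the logistic output, _cons is the estimated odds of depression when every predictor is 0 (a male non-drinker in the youngest age group), and each odds ratio multiplies those odds. This sketch computes the estimated probability for a female non-drinker in the youngest age group. The scalar name odds is hypothetical. The result is for one specific covariate pattern, so it is not expected to equal the adjust values above, which leave the other covariates "as is" within each group.

```stata
* Odds for a female non-drinker aged 18-28:
* baseline odds (_cons) x odds ratio for female
scalar odds = .1060434 * 2.983863

* Convert odds to probability: p = odds/(1 + odds)
display odds/(1 + odds)
```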

. * Wald test of age modeled using 3 0/1 indicators
. test (age2843=0) (age4359=0) (age5989=0)

 ( 1)  [depressed]age2843 = 0
 ( 2)  [depressed]age4359 = 0
 ( 3)  [depressed]age5989 = 0

           chi2(  3) =    3.65
         Prob > chi2 =    0.3023     Not significant. We can probably drop age.

. * LR test of age modeled using 3 0/1 indicators
. * full model
. quietly: logistic depressed female drinker age2843 age4359 age5989
. estimates store full
. * reduced model
. quietly: logistic depressed female drinker
. estimates store reduced
. lrtest full reduced

Likelihood-ratio test                                 LR chi2(3)  =      3.78
(Assumption: reduced nested in full)                  Prob > chi2 =    0.2859

. * nice tabular display of fit of several models, showing betas and SE(beta)
. quietly: logistic depressed female
. estimates store modelf
. quietly: logistic depressed drinker
. estimates store modeld
. quietly: logistic depressed age2843 age4359 age5989
. estimates store modelage
. quietly: logistic depressed female drinker
. estimates store reduced
. quietly: logistic depressed female drinker age2843 age4359 age5989
. estimates store full
. estimates table modelf modeld modelage reduced full, b(%7.2f) se(%7.2f) stfmt(%7.4g)

----------------------------------------------------------------
    Variable |    modelf    modeld   model~e   reduced      full
-------------+--------------------------------------------------
      female |      1.04                          1.07      1.09
             |      0.38                          0.38      0.38
     drinker |                0.19                0.32      0.20
             |                0.40                0.41      0.41
     age2843 |                         -0.03                0.01
             |                          0.40                0.41
     age4359 |                         -0.52               -0.51
             |                          0.43                0.44
     age5989 |                         -0.71               -0.72
             |                          0.47                0.48
       _cons |     -2.31     -1.73     -1.30     -2.59     -2.24
             |      0.33      0.36      0.28      0.49      0.56
----------------------------------------------------------------
                                                    legend: b/se
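The statistic that lrtest reports is twice the difference in log likelihoods between the full and reduced models, referred to a chi-square distribution with degrees of freedom equal to the number of parameters dropped (here 3, the age indicators). As a rough sketch of the same calculation done by hand (the scalar names ll_full, ll_reduced, and lr are illustrative and do not appear in the original session):

```stata
* Sketch: likelihood-ratio test computed by hand from the stored log likelihoods
quietly logistic depressed female drinker age2843 age4359 age5989
scalar ll_full = e(ll)          // log likelihood of the full model

quietly logistic depressed female drinker
scalar ll_reduced = e(ll)       // log likelihood of the reduced model

scalar lr = 2*(ll_full - ll_reduced)
display "LR chi2(3)  = " %6.2f lr
display "Prob > chi2 = " %6.4f chi2tail(3, lr)
```

If the data match the session above, the two displayed values should reproduce the lrtest output.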

(b) Regression Diagnostics

. * Must first fit model to be evaluated
. logistic depressed female drinker age2843 age4359 age5989

Logistic regression                               Number of obs   =        294
                                                  LR chi2(5)      =      13.15
                                                  Prob > chi2     =     0.0220
Log likelihood = -127.48507                       Pseudo R2       =     0.0491

------------------------------------------------------------------------------
   depressed | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   2.983863   1.137352     2.87   0.004     1.413606    6.298388
     drinker |   1.227484   .5076479     0.50   0.620     .5457409    2.760867
     age2843 |   1.011404   .4173488     0.03   0.978     .4504867    2.270738
     age4359 |   .5978412   .2625704    -1.17   0.241     .2527785    1.413942
     age5989 |   .4849052   .2323103    -1.51   0.131     .1896094    1.240092
       _cons |   .1060434   .0593354    -4.01   0.000     .0354163    .3175149
------------------------------------------------------------------------------

. * Hosmer Lemeshow Goodness of Fit Test
. estat gof, group(10)

Logistic model for depressed, goodness-of-fit test

  (Table collapsed on quantiles of estimated probabilities)
  (There are only 9 distinct quantiles because of ties)

       number of observations =       294
             number of groups =         9
      Hosmer-Lemeshow chi2(7) =         2.45
                  Prob > chi2 =         0.9311

. * Link Test for Omitted Predictors
. linktest

-- output omitted --

Logistic regression                               Number of obs   =        294
                                                  LR chi2(2)      =      13.74
                                                  Prob > chi2     =     0.0010
Log likelihood = -127.19157                       Pseudo R2       =     0.0512

------------------------------------------------------------------------------
   depressed |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   2.285572    1.69331     1.35   0.177    -1.033255    5.604399
      _hatsq |   .3809407   .4903572     0.78   0.437    -.5801417    1.342023     Good. NS
       _cons |   .9506792   1.316593     0.72   0.470    -1.629796    3.531154
------------------------------------------------------------------------------
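The link test refits the outcome on the model's linear predictor (_hat) and its square (_hatsq); a non-significant _hatsq suggests no evidence of misspecification. The sketch below reproduces that idea by hand and also rechecks the Hosmer-Lemeshow p-value, whose degrees of freedom are the number of groups minus 2 (9 - 2 = 7). The variable names hat and hatsq are hypothetical and chosen only for this illustration.

```stata
* Sketch: what linktest does, step by step
quietly logistic depressed female drinker age2843 age4359 age5989
predict double hat, xb              // linear predictor (log odds)
generate double hatsq = hat^2       // squared linear predictor
logit depressed hat hatsq           // the test of interest is on hatsq

* Hosmer-Lemeshow p-value from the reported chi2(7) = 2.45
display "Prob > chi2 = " %6.4f chi2tail(7, 2.45)
```

The coefficient table from the hand-built logit should correspond to the linktest rows _hat, _hatsq, and _cons shown above.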

. * Identify observations with standardized residuals > 2
. * Note: option res (residuals) requests Pearson residuals; option rstandard requests standardized Pearson residuals
. predict std_residual, res
. label variable std_residual "Standardized Residual"
. * I don't have an id variable, so I am creating one here for illustration purposes
. generate id=_n
. list id std_residual drinker female age if abs(std_residual)>2
. * none > 2. Let's try 1 for illustration purposes.
. list id std_residual drinker female age if abs(std_residual)>1

     +-----------------------------------------+
     | id   std_re~l   drinker   female   age |
     |-----------------------------------------|
  4. |  4   1.139302         0        1    50 |
  7. |  7   1.139302         0        1    58 |
 12. | 12   1.139302         0        1    57 |
 18. | 18   1.139302         0        1    55 |
 25. | 25   1.139302         0        1    43 |
     |-----------------------------------------|
 57. | 57   1.139302         0        1    48 |
 58. | 58   1.139302         0        1    55 |

-- rest of output omitted --

. * Plot of Cook's Distance (option dbeta requests Pregibon's delta-beta influence statistic, an analog of Cook's distance)
. predict cook, dbeta
. label variable cook "Cook's Distance"
. scatter cook id, mlabel(id) msize(1) mlabsize(2) jitter(*10) title("plot of Cook's Distance") subtitle("by Study ID") caption("cooks.png")