Calculating Absolute Rate Differences and Relative Rate Ratios in SAS/SUDAAN and STATA. Ashley Hirai, PhD May 20, 2014


2 Outline
- Importance of absolute and relative measures
- Problems with odds ratios as a relative measure
- Estimating adjusted rate differences (RDs) and rate ratios (RRs) in SAS and STATA
- Additive and multiplicative interactions

3 Measures of Association
- Differences between groups (person, place, time) in an outcome can be assessed by simple differences (subtraction) or by ratios (division)
- Absolute risk/rate difference (attributable risk): $RD = P(\text{outcome})_1 - P(\text{outcome})_2$
- Relative risk/rate ratio: $RR = P(\text{outcome})_1 / P(\text{outcome})_2$

4 Absolute Measures
- Absolute risk/prevalence differences carry the advantage of assessing actual impact:
  - Potentially avertable or excess cases (attributable fractions)
  - Number needed to treat (cost-benefit)
  - Decomposition analyses (Kitagawa, PPOR)
- However, it can be difficult to compare the strength of association across risk factors, outcomes, etc.
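As a hedged illustration of the number-needed-to-treat idea, the sketch below uses the standard relation NNT = 1/|RD|. The function name and the two risks are hypothetical and are not taken from the slides.

```python
import math


def number_needed_to_treat(risk_treated, risk_untreated):
    """Return 1 / |RD|: people treated (or exposed) per case averted (or caused)."""
    rd = risk_treated - risk_untreated
    # With RD = 0 no cases are averted, so the NNT is unbounded.
    return math.inf if rd == 0 else 1.0 / abs(rd)


# Hypothetical risks: 20% with treatment vs. 30% without, so RD = -0.10.
nnt = number_needed_to_treat(0.20, 0.30)
print(f"NNT = {nnt:.1f}")
```

An NNT of 10 means about ten people must be treated to avert one case, a statement that only the absolute scale supports.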

5 Relative Measures of Association
- Help to control or standardize assessments across indicators or person/place/time, to compare the strength or magnitude of association
- More typically used in etiologic studies
- However, they can be misleading: a doubling of risk (RR = 2) sounds dramatic
  - 1% to 2%: RR = 2, but the absolute increase is 1 percentage point; still very unlikely to have outcome Y
  - 30% to 60%: RR = 2, but the absolute increase is 30 percentage points; now more likely than not to have outcome Y
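The two scenarios above can be checked numerically. This minimal sketch (the helper name is illustrative) computes both measures and shows an identical RR alongside a thirty-fold difference in RD.

```python
def risk_difference_and_ratio(p_exposed, p_reference):
    """Return (RD, RR) comparing an exposed group with a reference group."""
    return p_exposed - p_reference, p_exposed / p_reference


# The slide's two scenarios; both double the risk.
scenarios = [(0.01, 0.02), (0.30, 0.60)]  # (reference risk, exposed risk)
results = []
for p_ref, p_exp in scenarios:
    rd, rr = risk_difference_and_ratio(p_exp, p_ref)
    results.append((rd, rr))
    print(f"{p_ref:.0%} -> {p_exp:.0%}: RR = {rr:.1f}, RD = {rd * 100:.0f} percentage points")
```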

6 Absolute and Relative Measures
- Provide unique information and are complementary
- When evaluating the effect of a single factor within one group or time period, there is qualitative concordance:
  - A positive RD corresponds to RR > 1
  - A negative RD corresponds to RR < 1
- However, the indicators can be inconsistent when comparing the effect in two groups or time periods (interactions)

7 2) Ratio Measures Can't Be Easily Compared

8 [Figure: Black vs. White rates per 100,000 population; recoverable values: 28.8 per 100,000 population and 10.3]

9 Healthy People
- Decline in both absolute and relative differences is the best evidence of progress in disparity elimination
- Relative measures of disparity are the primary indicator of progress, because they adjust for changes in the level of the reference point over time
- Relative measures also have the advantage of adjusting for differences in the reference point when comparisons are made across objectives
Keppel KG, Pearcy JN, Klein RJ. Measuring progress in Healthy People. Healthy People 2010 Stat Notes. Sep;(25):1-16.

10 Additive versus Multiplicative Interaction
- Multiplicative interaction may be an extreme standard; there are cases where multiplicative interaction is not present but additive interaction is, with important public health implications
[Table: Stroke incidence per 1,000 by smoking (Smoke -, Smoke +) and OC pill use, with risk differences and relative risks (values not recoverable)]
- Joint effects exhibit additive interaction: an increase of 50 cases versus the expected 30
- Multiplicative interaction is not present: 3 × 2 = 6, so an RR of 6 is both expected and observed
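The slide's numbers (expected excess of 30, observed excess of 50, separate RRs of 3 and 2, joint RR of 6) are only mutually consistent with a baseline of 10 per 1,000, since 10 × (3 - 1) + 10 × (2 - 1) = 30 and 10 × (6 - 1) = 50. The sketch below assumes that inferred baseline and labels the exposures generically as A (RR = 3) and B (RR = 2), because the transcript does not say which is which. It also reports RERI (relative excess risk due to interaction), a standard additive-interaction index that is not on the slide.

```python
def interaction_on_both_scales(r00, r10, r01, r11):
    """Risks per 1,000: r00 neither factor, r10 factor A only, r01 factor B only, r11 both."""
    rr10, rr01, rr11 = r10 / r00, r01 / r00, r11 / r00
    return {
        # Additive scale: joint excess vs. the sum of the separate excesses.
        "expected_excess_if_additive": (r10 - r00) + (r01 - r00),
        "observed_excess": r11 - r00,
        # Multiplicative scale: joint RR vs. the product of the separate RRs.
        "expected_rr_if_multiplicative": rr10 * rr01,
        "observed_rr": rr11,
        # RERI > 0 indicates more-than-additive joint effects.
        "reri": rr11 - rr10 - rr01 + 1,
    }


# ASSUMED baseline of 10 per 1,000 (inferred), RR_A = 3, RR_B = 2, joint RR = 6.
summary = interaction_on_both_scales(r00=10, r10=30, r01=20, r11=60)
for name, value in summary.items():
    print(f"{name}: {value:g}")
```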

11 Why both absolute and relative measures matter
- Absolute measures quantify actual risks and the number affected
  - Necessary to evaluate/interpret the meaning of a given RR
- Relative measures allow standardized comparisons across groups, time periods, and indicators
- The lack of correspondence for interactions creates controversy over which is better, but they provide complementary information

12 Problems with the Odds Ratio and why it should be avoided
- Non-intuitive; difficult to communicate and interpret correctly
- Exaggerates the relative risk for common outcomes (e.g., breastfeeding, obesity, medical home)
- Not collapsible across strata: the crude OR can be closer to the null than the stratum-specific ORs even when there is no confounding, which can produce apparent (negative) confounding when none exists, or cause true positive confounding to go undetected
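Non-collapsibility is easiest to see with two strata that are not confounders. The sketch below assumes two equally sized strata with the same exposure prevalence (hypothetical risks chosen so both stratum ORs equal 4); the crude OR still falls below 4, while the crude RR lands between the stratum RRs.

```python
def odds(p):
    return p / (1 - p)


def odds_ratio(p_exposed, p_unexposed):
    return odds(p_exposed) / odds(p_unexposed)


# Two equally sized strata; exposure prevalence is the same in each,
# so the stratifier is NOT a confounder. Pairs are (risk unexposed, risk exposed).
strata = [(0.2, 0.5), (0.5, 0.8)]

stratum_or = [odds_ratio(p1, p0) for p0, p1 in strata]
stratum_rr = [p1 / p0 for p0, p1 in strata]

# With equal stratum sizes and equal exposure prevalence, crude risks are simple averages.
crude_p0 = sum(p0 for p0, _ in strata) / len(strata)
crude_p1 = sum(p1 for _, p1 in strata) / len(strata)
crude_or = odds_ratio(crude_p1, crude_p0)
crude_rr = crude_p1 / crude_p0

print("stratum ORs:", [round(v, 2) for v in stratum_or], "crude OR:", round(crude_or, 2))
print("stratum RRs:", [round(v, 2) for v in stratum_rr], "crude RR:", round(crude_rr, 2))
```

Because nothing is confounded here, the gap between the crude OR (about 3.45) and the common stratum OR (4) is purely a property of the odds scale.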

13 RR versus OR
$RR = \frac{P_2}{P_1}$
$OR = \frac{P_2/(1-P_2)}{P_1/(1-P_1)} = \frac{P_2}{P_1}\cdot\frac{1-P_1}{1-P_2} = RR\cdot\frac{1-P_1}{1-P_2}, \qquad \frac{OR}{RR} = \frac{1-P_1}{1-P_2}$
- For RRs > 1, a doubling (OR = 2 × RR) occurs when $1-P_2 = (1-P_1)/2$, i.e., $P_2 = (P_1+1)/2$:
  - When P1 is small and P2 is much greater (large effect size): for P1 = 0.1, P2 = (0.1 + 1)/2 = 0.55; RR = 5.5, OR = 11
  - As P1 increases, P2 doesn't have to be as far from it (high prevalence): for P1 = 0.5, P2 = (0.5 + 1)/2 = 0.75; RR = 1.5, OR = 3
- Basically, this requires high prevalence in at least one group
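The identity OR/RR = (1 - P1)/(1 - P2) can be verified directly. This sketch recomputes the slide's two doubling examples and adds one hypothetical rare-outcome pair, where the OR and RR nearly coincide.

```python
def rr_and_or(p1, p2):
    """RR and OR for group 2 vs. reference group 1, as defined on the slide."""
    rr = p2 / p1
    odds_ratio = (p2 / (1 - p2)) / (p1 / (1 - p1))
    return rr, odds_ratio


# A hypothetical rare-outcome pair, then the slide's two "doubling" examples,
# where P2 = (P1 + 1) / 2.
pairs = [(0.01, 0.02), (0.1, (0.1 + 1) / 2), (0.5, (0.5 + 1) / 2)]
rows = []
for p1, p2 in pairs:
    rr, odds_ratio = rr_and_or(p1, p2)
    rows.append((rr, odds_ratio))
    print(f"P1={p1:.2f} P2={p2:.3f}: RR={rr:.2f} OR={odds_ratio:.2f} "
          f"OR/RR={odds_ratio / rr:.2f} (1-P1)/(1-P2)={(1 - p1) / (1 - p2):.2f}")
```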

14 Estimation Options for Risk Differences and Risk Ratios
- Showing code in SAS and STATA
- Examples with non-sampled and complex survey data
Acknowledgement: Jay Kaufman, PhD, McGill University

15 Model Options
1) Linear Probability Model
2) Generalized Linear Model (Binomial, Poisson)
3) Logistic Model (probability conversions)

16 Simple Data Example
- Birth certificate data
- Outcome: Preterm birth (<37 weeks gestation)
- Covariates: Smoking, race/ethnicity, maternal age
- The example applies to cohort or cross-sectional data generally, and to population-level (non-sampled) data or simple random samples

17 1) Linear Probability Model
Advantages:
- very easy to fit
- single uniform estimate of RD
- economists will love you
Disadvantages:
- possible to get impossible estimates (predicted probabilities outside [0, 1])
- does not directly estimate RR
- biostatisticians will hate you
Fit an OLS linear regression on the binary outcome variable:
$\Pr(Y=1 \mid X=x) = \beta_0 + \beta_1 x$
Note: The homoskedasticity assumption cannot be met, since the variance of a binary outcome, $p(1-p)$, is a function of p. Therefore, use robust variance.
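To make the "use robust variance" note concrete, here is a minimal pure-Python sketch of OLS on a 0/1 outcome with an HC0 sandwich standard error, assuming a single binary regressor and hypothetical data (2/10 unexposed and 5/10 exposed cases). The function names are illustrative, not from any package.

```python
def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def lpm_with_robust_se(x, y):
    """OLS of binary y on an intercept and one regressor x, with an HC0 sandwich SE for the slope."""
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    xtx_inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^-1
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    # "Meat" of the sandwich: sum of e_i^2 * x_i x_i', with x_i = (1, x_i).
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        e2 = (yi - b0 - b1 * xi) ** 2
        row = (1.0, xi)
        for j in range(2):
            for k in range(2):
                meat[j][k] += e2 * row[j] * row[k]
    vcov = matmul2(matmul2(xtx_inv, meat), xtx_inv)
    return b0, b1, vcov[1][1] ** 0.5


# Hypothetical data: 2/10 unexposed and 5/10 exposed have the outcome.
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5
b0, rd, se_rd = lpm_with_robust_se(x, y)
print(f"baseline risk = {b0:.2f}, RD = {rd:.2f}, robust SE = {se_rd:.4f}")
```

With one binary exposure the slope is exactly the difference in proportions, and the HC0 variance reduces to p1(1 - p1)/n1 + p0(1 - p0)/n0. Stata's `vce(robust)` after `regress` additionally applies the small-sample factor n/(n - k).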

18 Linear Probability Model (OLS)
proc surveyreg order=formatted;
  class racex;
  model ptb = smoke mager mager*mager racex / clparm solution;
run;
[Output: Regression Analysis for Dependent Variable ptb; Estimated Regression Coefficients (Estimate, Standard Error, t Value, Pr > |t|, 95% Confidence Interval) for Intercept, smoke, mager, mager*mager, and racex (a Non-Hispanic Black, b Hispanic, c Other, d Non-Hispanic White) (values not recoverable)]
Adjusted RD for smoking = (95% CI 0.028, 0.031)

19 regress ptb smoke c.mager##c.mager i.racex, vce(robust) cformat(%6.4f)
[Output: Linear regression with robust standard errors; Coef., Std. Err., t, P>|t|, 95% Conf. Interval for smoke, mager, c.mager#c.mager, racex, _cons (values not recoverable)]
Adjusted RD for smoking = (95% CI: , )

20 2) Generalized Linear Model
Advantages:
- single uniform estimate
- biostatisticians will love you
Disadvantages:
- can be difficult to fit
- still possible to get impossible values
Fit a GLM with a binomial or Poisson distribution:
- For RD: identity link
- For RR: log link
$g[\Pr(Y=1 \mid X=x)] = \beta_0 + \beta_1 x$
Generally, fit a Poisson model when the binomial model fails to converge; robust standard errors must then be used, because the Poisson variance is misspecified for binary data.
Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. Am J Epidemiol 2005 Aug 1;162(3):
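The link function determines what the exposure coefficient measures. In a model containing only a binary exposure the fitted risks equal the observed ones, so the coefficient is g(p1) - g(p0). The sketch below illustrates this with hypothetical risks of 0.10 and 0.13; it is not a GLM fitter.

```python
import math

# Each link maps a risk p to the linear-predictor scale.
LINKS = {
    "identity": lambda p: p,                   # coefficient = RD
    "log": math.log,                           # exp(coefficient) = RR
    "logit": lambda p: math.log(p / (1 - p)),  # exp(coefficient) = OR
}


def saturated_coefficient(p_exposed, p_unexposed, link):
    """Exposure coefficient g(p_exposed) - g(p_unexposed) in a binary-exposure-only model."""
    g = LINKS[link]
    return g(p_exposed) - g(p_unexposed)


p0, p1 = 0.10, 0.13  # hypothetical risks
b_identity = saturated_coefficient(p1, p0, "identity")
b_log = saturated_coefficient(p1, p0, "log")
b_logit = saturated_coefficient(p1, p0, "logit")
print(f"identity: RD = {b_identity:.3f}")
print(f"log:      RR = exp(b) = {math.exp(b_log):.3f}")
print(f"logit:    OR = exp(b) = {math.exp(b_logit):.3f}")
```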

21 Binomial Model Risk Difference, Identity Link
proc genmod descending;
  class racex / order=formatted;
  model ptb = smoke mager mager*mager racex / dist=bin link=identity;
  format racex racex.;
run;
[Output: Analysis Of Maximum Likelihood Parameter Estimates (DF, Estimate, Standard Error, Wald 95% Confidence Limits, Wald Chi-Square) for Intercept, smoke, mager, mager*mager, racex (a Non-Hispanic Black, b Hispanic, c Other, d Non-Hispanic White), and Scale (values not recoverable)]
Adjusted RD for smoking = (95% CI , )

22 glm ptb smoke c.mager##c.mager i.racex, fam(b) lin(ident)
binreg ptb smoke c.mager##c.mager i.racex, ml rd
[Output: Generalized linear models, Optimization: ML; Variance function: V(u) = u*(1-u) [Bernoulli]; Link function: g(u) = u [Identity]; OIM Coef., Std. Err., z, P>|z|, 95% Conf. Interval for smoke, mager, c.mager#c.mager, racex, _cons (values not recoverable)]
Coefficients are the risk differences.

23 Binomial Model Risk Ratio, Log Link
proc genmod descending;
  class racex / order=formatted;
  model ptb = smoke mager mager*mager racex / dist=bin link=log;
  estimate 'RR smoke' smoke 1;
  format racex racex.;
run;
[Output: parameter estimates (DF, Estimate, Standard Error, Wald 95% Confidence Limits, Wald Chi-Square) for Intercept, smoke, mager, mager*mager, racex levels, and Scale; Contrast Estimate Results (Mean Estimate, Mean Confidence Limits, L'Beta Estimate, Standard Error, Alpha, L'Beta Confidence Limits) for RR smoke (values not recoverable)]
With a log link, the Mean Estimate column is exp(L'Beta), i.e., the RR.
Adjusted RR for smoking = 1.31 (95% CI 1.30, 1.33)

24 glm ptb smoke c.mager##c.mager i.racex, fam(b) lin(log) eform
binreg ptb smoke c.mager##c.mager i.racex, ml rr
[Output: Generalized linear models, Optimization: ML; Variance function: V(u) = u*(1-u) [Bernoulli]; Link function: g(u) = ln(u) [Log]; OIM Risk Ratio, Std. Err., z, P>|z|, 95% Conf. Interval for smoke, mager, c.mager#c.mager, racex (values not recoverable)]
Adjusted RR for smoking = 1.31 (95% CI: 1.30, 1.33)

25 Additive Interaction
proc genmod descending;
  class racex / order=formatted;
  model ptb = smoke mager mager*mager racex smoke*racex / dist=bin link=id;
  estimate 'smoking among NH Black' smoke 1 smoke*racex 1 0 0;
  format racex racex.;
run;
[Output: Analysis Of Maximum Likelihood Parameter Estimates for Intercept, smoke, mager, mager*mager, racex and smoke*racex (a Non-Hispanic Black, b Hispanic, c Other, d Non-Hispanic White), and Scale, with Pr > ChiSq < .0001 for the intercept, main effects, and racex levels a to c; Contrast Estimate Results for 'smoking among NH Black', Pr > ChiSq < .0001 (values not recoverable)]
The effect of smoking is greater among Black than White women

26 glm ptb i.smoke##i.racex c.mager##c.mager, fam(bin) lin(id)
binreg ptb i.smoke##i.racex c.mager##c.mager, ml rd
lincom 1.smoke + 1.smoke#2.racex // smoking for Black women
[Output: OIM Coef., Std. Err., z, P>|z|, 95% Conf. Interval for smoke, mager, c.mager#c.mager, racex, smoke#racex, _cons (coefficients are the risk differences); lincom ( 1) 1.smoke + 1.smoke#2.racex = 0 (values not recoverable)]

27 Multiplicative Interaction
proc genmod descending;
  class racex / order=formatted;
  model ptb = smoke mager mager*mager racex smoke*racex / dist=bin link=log;
  estimate 'smoking among White' smoke 1;
  estimate 'smoking among NH Black' smoke 1 smoke*racex 1 0 0;
  format racex racex.;
run;
[Output: Analysis Of Maximum Likelihood Parameter Estimates for Intercept, smoke, mager, mager*mager, racex and smoke*racex levels, and Scale; Contrast Estimate Results (Mean Estimate, Mean Confidence Limits) for 'smoking among White' and 'smoking among NH Black' (values not recoverable)]
Additive but not multiplicative interaction

28 glm ptb i.smoke##i.racex c.mager##c.mager, fam(bin) lin(log)
binreg ptb i.smoke##i.racex c.mager##c.mager, ml rr
lincom 1.smoke, eform // smoking for White
lincom 1.smoke + 1.smoke#2.racex, eform // smoking for Black
[Output: OIM Coef., Std. Err., z, P>|z|, 95% Conf. Interval for smoke, mager, c.mager#c.mager, racex, smoke#racex, _cons; lincom exp(b), Std. Err., z, P>|z|, 95% Conf. Interval for Smoking for White and Smoking for Black (values not recoverable)]

29 If the binomial model fails to converge, try starting with a negative intercept:
proc genmod data=ahs.sample_10 descending;
  class racex / order=formatted;
  model ptb = smoke mager mager*mager racex / dist=bin link=log intercept=-4;
run;
binreg ptb smoke c.mager##c.mager i.racex, ml rr search
Otherwise, try modified Poisson (less efficient, but more likely to converge).
SAS: generate a unique id number in a data step (id=_n_;), then
proc genmod;
  class id racex / order=formatted;
  model ptb = smoke mager mager*mager racex / dist=poi link=log;
  repeated subject=id / type=ind;
run;
Stata:
glm ptb i.smoke c.mager##c.mager i.racex [freq=count], fam(p) lin(log) vce(robust)
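Why modified Poisson needs the robust variance can be shown with a model containing only a binary exposure. In that case the Poisson MLE gives log RR = log(p1/p0), and, by direct calculation, the model-based and sandwich variances reduce to the closed forms below. The event counts are hypothetical.

```python
import math


def log_rr_variances(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Log RR from a log-link Poisson model with only a binary exposure,
    with its model-based and robust (HC0 sandwich) variances."""
    p1 = events_exposed / n_exposed
    p0 = events_unexposed / n_unexposed
    log_rr = math.log(p1 / p0)
    # Model-based Poisson variance assumes Var(Y) = mean, which overstates it for 0/1 data.
    var_model = 1 / (n_exposed * p1) + 1 / (n_unexposed * p0)
    # The sandwich uses the empirical variance p(1 - p) instead.
    var_robust = (1 - p1) / (n_exposed * p1) + (1 - p0) / (n_unexposed * p0)
    return log_rr, var_model, var_robust


log_rr, var_model, var_robust = log_rr_variances(5, 10, 2, 10)
z = 1.959964  # standard normal 97.5th percentile
lo, hi = (math.exp(log_rr + s * z * math.sqrt(var_robust)) for s in (-1, 1))
print(f"RR = {math.exp(log_rr):.2f}; model SE = {math.sqrt(var_model):.3f}, "
      f"robust SE = {math.sqrt(var_robust):.3f}")
print(f"robust 95% CI: {lo:.2f} to {hi:.2f}")
```

The robust variance here matches the familiar 2×2-table variance of log RR, 1/a - 1/n1 + 1/c - 1/n0, while the uncorrected Poisson variance is larger, so unadjusted Poisson CIs would be too wide.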

30 3) Logistic Regression or Probit Regression Model
Advantages:
- always fits easily
- can never get impossible estimates
- epidemiologists will love you
Disadvantages:
- does not give a single uniform estimate
- must choose between different formulations
Fit a standard logistic regression model:
$\ln\left[\frac{\Pr(Y=1 \mid X=x)}{1-\Pr(Y=1 \mid X=x)}\right] = \alpha + \beta_1 x$
then just obtain and contrast the predicted probabilities:
$\Pr(Y=1 \mid X=x) = \frac{e^{\alpha+\beta_1 x}}{1+e^{\alpha+\beta_1 x}}$

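31 logit ptb smoke c.mager##c.mager i.racex [freq=count], nolog
[Output: Logistic regression with LR chi2(6) and Pseudo R2; Coef., Std. Err., z, P>|z|, 95% Conf. Interval for smoke, mager, c.mager#c.mager, racex, _cons (values not recoverable)]
Predicted probability of PTB for a 25-year-old non-Hispanic White woman smoker (NH White is the base race category, so no race term enters):
$\Pr(PTB=1 \mid X=x) = \frac{e^{\hat\beta_0 + \hat\beta_{smoke} + 25\,\hat\beta_{mager} + 25^2\,\hat\beta_{mager^2}}}{1 + e^{\hat\beta_0 + \hat\beta_{smoke} + 25\,\hat\beta_{mager} + 25^2\,\hat\beta_{mager^2}}}$
with the fitted age terms 25 × 0.0984 and 25² × 0.0020 (the intercept and smoking coefficient are not recoverable).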
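The predicted-probability formula above can be sketched in code for one covariate pattern. The coefficients below are HYPOTHETICAL placeholders because the fitted values are not recoverable; the point is the mechanics: build the linear predictor, apply the inverse logit, then contrast smoker and non-smoker.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL coefficients, not the slide's estimates.
COEF = {"_cons": -2.0, "smoke": 0.30, "mager": -0.05, "mager_sq": 0.001}


def predicted_ptb(smoke, age, coef=COEF):
    """Pr(PTB = 1) for a non-Hispanic White woman (base race, so no race term)."""
    eta = coef["_cons"] + coef["smoke"] * smoke + coef["mager"] * age + coef["mager_sq"] * age ** 2
    return inv_logit(eta)


p_smoker = predicted_ptb(smoke=1, age=25)
p_nonsmoker = predicted_ptb(smoke=0, age=25)
print(f"smoker {p_smoker:.4f}, non-smoker {p_nonsmoker:.4f}")
print(f"RD = {p_smoker - p_nonsmoker:.4f}, RR = {p_smoker / p_nonsmoker:.3f}")
```

The log odds ratio recovered from the two probabilities equals the smoking coefficient, but the RD and RR depend on the age (and race) chosen, which is why a single pattern is not a general answer.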

32 But this is for a specific covariate pattern (in this case, NH White smokers aged 25)
- Could evaluate the RD and RR holding all covariates at their means: the marginal effect at the mean
  - But there may be no one in the data set with this covariate combination: no woman is 52% White, 10% Black, and 31% Hispanic, or even 27.5 years old (age is recorded in integer years rather than exact age)
- A better alternative is to average the individual RDs, setting everyone to smoking and then to no smoking: the average marginal effect
  - But the difference is generally small in large samples
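The two summaries described above can be contrasted on a toy example. This sketch assumes a HYPOTHETICAL logistic model (intercept, smoking, and linear age terms) and six hypothetical maternal ages; it computes the average marginal effect by counterfactually setting everyone to smoke = 1 and smoke = 0, and the marginal effect at the mean by evaluating once at the mean age.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL model coefficients and sample of maternal ages.
B0, B_SMOKE, B_AGE = -3.0, 0.4, 0.03
ages = [18, 22, 25, 31, 38, 44]


def risk(smoke, age):
    return inv_logit(B0 + B_SMOKE * smoke + B_AGE * age)


# Average marginal effect: everyone set to smoke = 1, then to smoke = 0; average the differences.
ame = sum(risk(1, a) - risk(0, a) for a in ages) / len(ages)

# Marginal effect at the mean: evaluate once at the mean age (a "woman" who may not exist).
mean_age = sum(ages) / len(ages)
mem = risk(1, mean_age) - risk(0, mean_age)

print(f"AME = {ame:.4f}, MEM = {mem:.4f} (mean age {mean_age:.2f})")
```

The two differ only slightly here, in line with the slide's remark, because the logistic curve is nearly linear over this range of predicted risks.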

33 But Stata has a handy utility that makes this easier:
quietly logit ptb i.smoke c.mager##c.mager i.racex
margins, dydx(smoke)
[Output: dy/dx, Delta-method SE, z, P>|z|, 95% Conf. Int. for smoke (values not recoverable)]
Average individual adjusted RD = (95% CI: , )
quietly logit ptb i.smoke c.mager##c.mager i.racex
margins, dydx(smoke) atmeans
[Output: at mager = (mean), 1.racex to 4.racex = (mean); dy/dx, Delta-method Std. Err., z, P>|z|, 95% Conf. Interval for smoke (values not recoverable)]
Adjusted RD for the average woman = (95% CI: , )

34 Margins also works on sub-populations to look at additive interactions:
margins, dydx(smoke) over(racex)
[Output: Average marginal effects; dy/dx, Delta-method Std. Err., z, P>|z|, 95% Conf. Int. for smoke by racex (values not recoverable)]
margins, dydx(smoke) atmeans over(racex)
[Output: Conditional marginal effects; dy/dx, Delta-method Std. Err., z, P>|z|, 95% Conf. Interval for smoke by racex (values not recoverable)]
Note: dy/dx for factor levels is the discrete change from the base level.

35 Test if the NH Black RD is larger than the NH White RD:
margins smoke, at(racex=(1 2)) post
[Output: Predictive margins; Expression: Pr(ptb), predict(); 1._at: racex = 1; 2._at: racex = 2; Margin, Delta-method Std. Err., z, P>|z|, 95% Conf. Int. for _at#smoke (values not recoverable)]
lincom (_b[2._at#1.smoke]-_b[2._at#0.smoke]) - (_b[1._at#1.smoke]-_b[1._at#0.smoke])
test (_b[2._at#1.smoke]-_b[2._at#0.smoke]) = (_b[1._at#1.smoke]-_b[1._at#0.smoke])
[Output: lincom Coef., Std. Err., z, P>|z|, 95% Conf. Int.; test chi2(1) and Prob > chi2 (values not recoverable)]
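The `lincom` step is a linear combination of the posted margins, with variance c'Vc. This sketch reproduces that arithmetic for the difference in RDs using HYPOTHETICAL margins and a HYPOTHETICAL diagonal covariance matrix; real posted margins are correlated, so their off-diagonal covariances would also enter.

```python
import math


def lincom(weights, estimates, vcov):
    """Estimate and SE of a linear combination c'b, using Var(c'b) = c' V c."""
    est = sum(c * b for c, b in zip(weights, estimates))
    k = len(weights)
    var = sum(weights[i] * vcov[i][j] * weights[j] for i in range(k) for j in range(k))
    return est, math.sqrt(var)


# HYPOTHETICAL margins, ordered (NHW nonsmoker, NHW smoker, NHB nonsmoker, NHB smoker).
margins = [0.10, 0.13, 0.14, 0.18]
# HYPOTHETICAL covariance matrix (diagonal only, for simplicity).
vcov = [
    [1e-6, 0.0, 0.0, 0.0],
    [0.0, 4e-6, 0.0, 0.0],
    [0.0, 0.0, 4e-6, 0.0],
    [0.0, 0.0, 0.0, 9e-6],
]
# (NHB smoker - NHB nonsmoker) - (NHW smoker - NHW nonsmoker)
weights = [1, -1, -1, 1]
diff, se = lincom(weights, margins, vcov)
z = diff / se
print(f"difference in RDs = {diff:.3f}, SE = {se:.4f}, z = {z:.2f}")
```

For a single restriction, the Wald chi2(1) reported by `test` equals the square of this z statistic.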

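36 This is a different model, however, from one that includes a race × smoking interaction explicitly (the model above still lets the RD differ by race, but only through the nonlinearity of the logistic function):
logit ptb i.smoke##i.racex c.mager##c.mager, nolog
[Output: Logistic regression; Coef., Std. Err., z, P>|z|, 95% Conf. Interval for smoke, racex, smoke#racex, mager, c.mager#c.mager, _cons (values not recoverable)]
margins, dydx(smoke) at(racex=(1 2))
[Output: Delta-method dy/dx, Std. Err., z, P>|z|, 95% Conf. Int. for the smoking RD among NHW and NHB women; the additive interaction is the difference between the two RDs (values not recoverable)]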

37 Rate Ratio
quietly logit ptb i.smoke c.mager##c.mager i.racex
margins i.smoke, post
[Output: Margin, Delta-method Std. Err., z, P>|z|, 95% Conf. Interval for smoke (values not recoverable)]
nlcom _b[1.smoke] / _b[0.smoke]
[Output: _nl_1 Coef., Std. Err., z, P>|z|, 95% Conf. Interval (values not recoverable)]
quietly logit ptb i.smoke c.mager##c.mager i.racex
adjrr smoke // add-on called adjrr, computes adjusted RR and RD
R1 = (0.0008) 95% CI (0.1262, )
R0 = (0.0002) 95% CI (0.0970, )
ARR = (0.0086) 95% CI (1.2943, )
ARD = (0.0008) 95% CI (0.0287, )
p-value (R0 = R1):
p-value (ln(R1/R0) = 0):
Norton et al. Computing adjusted risk ratios and risk differences in Stata. Stata Journal, 2013
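`nlcom` obtains the SE of R1/R0 by the delta method. A minimal sketch of that calculation, using the gradient (1/R0, -R1/R0²) and HYPOTHETICAL margins and (co)variances rather than the slide's output:

```python
import math


def ratio_delta_method(r1, r0, var_r1, var_r0, cov_r1_r0=0.0):
    """R1/R0 with a first-order (delta-method) standard error.

    The gradient of f(R1, R0) = R1/R0 is (1/R0, -R1/R0**2), so Var(f) ~= g' V g.
    """
    ratio = r1 / r0
    g1, g0 = 1 / r0, -r1 / r0 ** 2
    var = g1 ** 2 * var_r1 + g0 ** 2 * var_r0 + 2 * g1 * g0 * cov_r1_r0
    return ratio, math.sqrt(var)


# HYPOTHETICAL predictive margins and (co)variances.
arr, se = ratio_delta_method(r1=0.13, r0=0.10, var_r1=4e-6, var_r0=1e-6, cov_r1_r0=5e-7)
z = 1.959964
print(f"ARR = {arr:.3f}, SE = {se:.4f}, 95% CI {arr - z * se:.3f} to {arr + z * se:.3f}")
```

A positive covariance between the two margins shrinks the SE of their ratio, which is why the posted covariance matrix matters.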

38 SAS Logistic Average Marginal Effect
- Complicated macro, but well annotated; must run one variable of interest at a time and rerun the macro
- No possibility of assessing additive interaction
- Handles survey and non-survey data
- Need an IDNUMR in your data set to identify unique observations
- Inputs:
  - Key predictor of interest, and the contrast you want
  - Covariate list (including that predictor)
  - Specify categorical variables
  - Outcome variable
Kleinman LC, Norton EC. What's the Risk? A simple approach for estimating adjusted risk measures from nonlinear models including logistic regression. Health Serv Res Feb;44(1):
lawrence.kleinman@mssm.edu

39 SAS Output
The MEANS Procedure: Results comparing smoke = 1 to smoke = 0 (Variable (Label): Mean)
ARR (Adjusted Risk Ratio)
SE_ARR (Standard Error of ARR)
UCL_ARR (95% UCL ARR)
LCL_ARR (95% LCL ARR)
T_arr (T statistic ARR)
P_ARR (p-value ARR): 6.90E-175
ARD (Adjusted Risk Difference)
SE_ARD (Standard Error ARD)
UCL_ARD (95% UCL ARD)
LCL_ARD (95% LCL ARD)
t_ard (T statistic ARD)
P_ARD (p-value ARD): 2.97E-189
Risk0 (Unexposed Risk)
SE0 (Std Err Unexposed Risk): 3.66E-07
Risk1 (Exposed Risk)
SE1 (Std Err Exposed Risk): 6.92E-07

40 SAS-callable SUDAAN
- Logistic model: average marginal effect (pred_eff) or marginal effect at the mean (cond_eff)
- Robust SEs, even though we're using non-sampled data
- Also cannot assess additive interaction without including a multiplicative term
PROC RLOGIST design=srs;
  class smoke racex / dir=descending;
  model ptb = smoke mager mager_2 racex;
  effects smoke=(1 -1) / exp name="or:smoke";
  predmarg smoke / adjrr;
  pred_eff smoke=(1 -1) / name="rd:smoke";
  rformat racex racex.;
  SETENV decwidth=4;
run;
Bieler GS, Brown GG, Williams RL, Brogan DJ. Estimating model-adjusted risks, risk differences, and risk ratios from complex survey data. Am J Epidemiol Mar 1;171(5):

41 [Output: Variance Estimation Method: Taylor Series (SRS); SE Method: Robust (Binder, 1983); Working Correlations: Independent; Link Function: Logit; Response variable PTB; odds ratio with 95% limits for SMOKE; predicted marginals (PREDMARG #1) with SE and 95% limits; contrasted predicted marginal RD:smoke with SE, T-Stat, and P-value; predicted marginal risk ratio for SMOKE 1 vs. 0 with SE and 95% limits (values not recoverable)]
- Same point estimates as in STATA, but robust SEs
- PTB is not very common, so the OR is not greatly inflated, but the RR is more interpretable

42 PROC RLOGIST design=srs data=ahs.sper_example;
  class smoke racex / dir=descending;
  model ptb = smoke mager mager_2 racex smoke*racex;
  predmarg smoke*racex;
  pred_eff racex=( )*smoke=(1 -1) / name="rd:smoke for non-hispanic White";
  pred_eff racex=( )*smoke=(1 -1) / name="rd:smoke for non-hispanic Black";
  pred_eff racex=( )*smoke=(1 -1) / name="rd:smoke for Black versus White";
  rformat racex racex.;
  SETENV decwidth=4;
run;
[Output: contrasted predicted marginals (Contrast, SE, T-Stat, P-value) for RD:smoke for non-Hispanic White, RD:smoke for non-Hispanic Black, and RD:smoke for Black versus White (values not recoverable)]
- Slightly different from the STATA output
- Still shows an additive interaction of about 0.01; greater effect for Black women

43 Complex Survey Example
- 2007 National Survey of Children's Health
- Design: Children sampled within state-level strata; weights account for unequal probability of selection, non-response, and population totals
- Outcome: Breastfed to 6 months, among the subpopulation of children aged 6 months to 5 years
- Covariates: poverty (multiply imputed), race/ethnicity
- Direct models, logistic margins
- Interpretation of OR, RR, and RD

44 Common Outcome
PROC CROSSTAB data=example design=wr;
  nest State idnumr;
  subpopn FLG_06_MNTH=0 and ageyr_child<=5;
  WEIGHT NSCHWT;
  class breastfed duration_6;
  TABLE breastfed duration_6;
  PRINT nsum wsum rowper serow lowrow uprow / style=nchs nsumfmt=f10.0 wsumfmt=f10.0;
Run;
[Output: Variance Estimation Method: Taylor Series (WR); For Subpopulation: FLG_06_MNTH = 0 AND AGEYR_CHILD <= 5; by Breastfed for 6 months: Sample Size, Weighted Size, Row Percent, SE Row Percent, Lower and Upper 95% Limits (values not recoverable)]
With a prevalence of 45%, we will see inflated ORs

45 SAS: Linear Probability Model (OLS)
PROC REGRESS DATA=mimp1 design=wr mi_count=5;
  nest State idnumr;
  subpopn FLG_06_MNTH=0 and ageyr_child<=5;
  WEIGHT NSCHWT;
  subgroup povl hisprace;
  levels 4 5;
  reflevel povl=1 hisprace=2;
  rformat povl povl.;
  rformat hisprace hisprace.;
  model duration_6 = povl hisprace;
run;
[Output: Variance Estimation Method: Taylor Series (WR), Using Multiply Imputed Data; SE Method: Robust (Binder, 1983); Working Correlations: Independent; Link Function: Identity; Beta Coeff., SE Beta, Lower and Upper 95% Limits, T-Test B=0 for Intercept, HH Federal Poverty Level categories (<100% and higher), and Race/Ethnicity (Hispanic, NH white, NH black, NH multi, NH other) (values not recoverable)]
Or use PROC SURVEYREG in SAS

46 STATA: Linear Probability Model
mi estimate: svy, subpop(subpop): regress duration_6 i.poverty ib2.hisprace
[Output: Multiple-imputation estimates, Imputations = 5; Survey: Linear regression; Number of strata = 51; Within VCE type: Linearized; DF adjustment: Small sample; Model F test: Equal FMI; Coef., Std. Err., t, P>|t|, 95% Conf. Interval for poverty, hisprace, _cons (values not recoverable)]

47 SAS: Generalized Linear Model (GLM)
- Poisson with a log link may be the only SUDAAN option, so RRs only
- No equivalent SAS survey procedure
PROC LOGLINK DATA=mimp1 design=wr mi_count=5;
  nest State idnumr;
  subpopn FLG_06_MNTH=0 and ageyr_child<=5;
  WEIGHT NSCHWT;
  subgroup povl hisprace;
  levels 4 5;
  reflevel povl=1 hisprace=2;
  rformat povl povl.;
  rformat hisprace hisprace.;
  model duration_6 = povl hisprace;
run;
[Output: Incidence Density Ratio with Lower and Upper 95% Limits for Intercept, HH Federal Poverty Level categories, and Race/Ethnicity (Hispanic, NH white, NH black, NH multi, NH other) (values not recoverable)]

48 STATA: Generalized Linear Model
mi estimate: svy, subpop(subpop): glm duration_6 i.poverty ib2.hisprace, family(bin) link(identity)
[Output: Multiple-imputation estimates, Imputations = 5; Survey: Generalized linear models; Number of strata = 51; Within VCE type: Linearized; DF adjustment: Small sample; Coef., Std. Err., t, P>|t|, 95% Conf. Interval for poverty, hisprace, _cons (values not recoverable)]

49 STATA: Generalized Linear Model
mi estimate, saving(miest): svy, subpop(subpop): glm duration_6 i.poverty ib2.hisprace, family(bin) link(log)
mi estimate (rr: exp(_b[4.poverty])) using miest
[Output: Coef., Std. Err., t, P>|t|, 95% Conf. Interval for poverty, hisprace, _cons; Transformations: rr: exp(_b[4.poverty]) with Coef., Std. Err., t, P>|t|, 95% Conf. Interval (values not recoverable)]
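`mi estimate` combines the per-imputation results with Rubin's rules. The sketch below shows only that pooling arithmetic for a log RR, using HYPOTHETICAL estimates from m = 5 imputations; pooling on the log scale and exponentiating at the end is one common choice, and Stata's degrees-of-freedom adjustment is not reproduced.

```python
import math


def rubin_pool(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules: T = W + (1 + 1/m) B."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = w + (1 + 1 / m) * b
    return q_bar, total


# HYPOTHETICAL log-RR estimates and SEs from m = 5 imputed data sets.
log_rr = [0.40, 0.42, 0.41, 0.39, 0.43]
se = [0.05] * 5
pooled, total_var = rubin_pool(log_rr, [s ** 2 for s in se])
print(f"pooled RR = {math.exp(pooled):.3f}, pooled SE(log RR) = {math.sqrt(total_var):.4f}")
```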

50 SAS: Logistic Model
- SUDAAN: predmarg, pred_eff
- For SAS only, try the new macro
PROC RLOGIST DATA=mimp1 design=wr mi_count=5;
  nest State idnumr;
  subpopn FLG_06_MNTH=0 and ageyr_child<=5;
  WEIGHT NSCHWT;
  subgroup povl hisprace;
  levels 4 5;
  reflevel povl=1 hisprace=2;
  rformat povl povl.;
  rformat hisprace hisprace.;
  model duration_6 = povl hisprace;
  predmarg povl(1) / adjrr;
  predmarg hisprace(2) / adjrr;
  pred_eff povl=( ) / name="RD: %FPL v. <100% FPL";
  pred_eff povl=( ) / name="RD: %FPL v. <100% FPL";
  pred_eff povl=( ) / name="RD: 400%+ FPL v. <100% FPL";
  pred_eff hisprace=( ) / name="RD: NH Black v. NH White";
  pred_eff hisprace=( ) / name="RD: Hispanic v. NH White";
run;

51 Risk Difference: Poverty
[Output: predicted marginals (PREDMARG #1) with SE, 95% limits, and T:Marg=0 by HH Federal Poverty Level; contrasted predicted marginals (PREDMARG #2) with Contrast, SE, T-Stat, and P-value for RD: %FPL v. <100% FPL and RD: 400%+ FPL v. <100% FPL (values not recoverable)]

52 Advantage of Absolute Scale
- Can calculate actual numbers affected, and excess cases attributable to a factor
  - Risk difference × number with the factor = excess cases
  - Excess cases / total cases = PAF
- Weighted N for children <100% FPL is 5.1 million
- If children <100% FPL had the same probability of being breastfed to 6 months as children at 400%+ FPL, 0.18 × 5.1 = 0.9 million more children would have been breastfed to 6 months
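The excess-cases arithmetic above is a single multiplication; this sketch (helper name illustrative) uses the slide's RD of 0.18 and weighted N of 5.1 million children below 100% FPL.

```python
def excess_cases(risk_difference, n_with_factor):
    """Risk difference x number with the factor = excess (here, additional) cases."""
    return risk_difference * n_with_factor


# From the slide: RD = 0.18 (400%+ FPL vs. <100% FPL); 5.1 million children <100% FPL.
additional_breastfed_millions = excess_cases(0.18, 5.1)
print(f"{additional_breastfed_millions:.2f} million more children breastfed to 6 months")
```

Dividing such an excess by the total number of cases would give the PAF, but the total is not reported here, so it is not computed.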

53 OR versus RR: Poverty
[Output: odds ratios with 95% limits for Intercept and HH Federal Poverty Level categories; predicted marginal risk ratios (PREDMARG #1) with SE and 95% limits for each poverty category vs. <100% (values not recoverable)]
The excess risk estimate is doubled for the OR versus the RR (~100% vs. 50% for 400%+ FPL)

54 STATA: Logistic Model
The margins command can't be used with multiple imputation, so select a single imputation:
mi extract 1
svy, subpop(subpop): logistic duration_6 i.poverty ib2.hisprace
[Output: Survey: Logistic regression; Number of strata = 51; Design df; F(7, 90861); Linearized Odds Ratio, Std. Err., t, P>|t|, 95% Conf. Interval for poverty and hisprace (values not recoverable)]

55 STATA Logistic: Rate Difference
- Use margins with subpop(), since we are analyzing a subset of the total sample (age <= 5)
- Use vce(unconditional) to adjust SEs for the survey design
svy, subpop(subpop): logistic duration_6 i.poverty ib2.hisprace
margins, subpop(subpop) dydx(*) vce(unconditional)
[Output: Average marginal effects; Expression: Pr(duration_6), predict(); dy/dx w.r.t. 2.poverty 3.poverty 4.poverty 1.hisprace 3.hisprace 4.hisprace 5.hisprace; Linearized dy/dx, Std. Err., t, P>|t|, 95% Conf. Interval (values not recoverable)]
Note: dy/dx for factor levels is the discrete change from the base level.

56 STATA Logistic: Rate Ratio
svy, subpop(subpop): logistic duration_6 i.poverty ib2.hisprace
margins poverty, subpop(subpop) vce(unconditional) post
[Output: Predictive margins; Expression: Pr(duration_6), predict(); Linearized Margin, Std. Err., t, P>|t|, 95% Conf. Interval by poverty (values not recoverable)]
nlcom _b[4.poverty] / _b[1.poverty]
[Output: _nl_1: _b[4.poverty] / _b[1.poverty]; Coef., Std. Err., t, P>|t|, 95% Conf. Interval (values not recoverable)]

57 / /relative-riskabsolute-comic-health-medicalreporting.htm

58 Alcohol Use and Breast Cancer
- Appropriately interpreted as a 51% increase in breast cancer risk comparing no daily intake to 2+ drinks/day, translating to a 1.3% increase in the incidence of breast cancer over 10 years
- "While the increased risk found in this study is real, it is quite small. Women will need to weigh this slight increase in breast cancer risk with the beneficial effects alcohol is known to have on heart health," said Dr. Wendy Chen, of Brigham and Women's Hospital in Boston. "Any woman's decision will likely factor in her risk of either disease," Chen said. (MSNBC)
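Reported together, a relative and an absolute increase pin down the underlying risks, since RD = p0 × (RR - 1). This sketch ASSUMES the "1.3%" above is an absolute 10-year increase of 1.3 percentage points; under that reading the implied baseline 10-year risk is roughly 2.5%.

```python
def implied_risks(relative_increase, absolute_increase):
    """Back out reference and exposed risks from a relative and an absolute increase.

    RR - 1 = relative_increase and RD = absolute_increase, so p0 = RD / (RR - 1).
    """
    p0 = absolute_increase / relative_increase
    p1 = p0 + absolute_increase
    return p0, p1


# ASSUMPTION: 51% relative increase and 1.3 percentage-point absolute increase over 10 years.
p0, p1 = implied_risks(relative_increase=0.51, absolute_increase=0.013)
print(f"10-year risk: {p0:.2%} (no alcohol) vs {p1:.2%} (2+ drinks/day); RR = {p1 / p0:.2f}")
```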


More information

Categorical Data Analysis

Categorical Data Analysis Categorical Data Analysis Hsueh-Sheng Wu Center for Family and Demographic Research October 4, 200 Outline What are categorical variables? When do we need categorical data analysis? Some methods for categorical

More information

Tabulate and plot measures of association after restricted cubic spline models

Tabulate and plot measures of association after restricted cubic spline models Tabulate and plot measures of association after restricted cubic spline models Nicola Orsini Institute of Environmental Medicine Karolinska Institutet 3 rd Nordic and Baltic countries Stata Users Group

More information

Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS

Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 22, 2015 This handout steals heavily

More information

The study obtains the following results: Homework #2 Basics of Logistic Regression Page 1. . version 13.1

The study obtains the following results: Homework #2 Basics of Logistic Regression Page 1. . version 13.1 Soc 73994, Homework #2: Basics of Logistic Regression Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised January 14, 2018 All answers should be typed and mailed to

More information

Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies

Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies Belen Alejos Ferreras Centro Nacional de Epidemiología Instituto de Salud Carlos III 19 de Octubre

More information

Interactions made easy

Interactions made easy Interactions made easy André Charlett Neville Q Verlander Health Protection Agency Centre for Infections Motivation Scientific staff within institute using Stata to fit many types of regression models

More information

Applying Regression Analysis

Applying Regression Analysis Applying Regression Analysis Jean-Philippe Gauvin Université de Montréal January 7 2016 Goals for Today What is regression? How do we do it? First hour: OLS Bivariate regression Multiple regression Interactions

More information

Post-Estimation Commands for MLogit Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017

Post-Estimation Commands for MLogit Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017 Post-Estimation Commands for MLogit Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017 These notes borrow heavily (sometimes verbatim) from Long &

More information

Logistic (RLOGIST) Example #9

Logistic (RLOGIST) Example #9 Logistic (RLOGIST) Example #9 SUDAAN Statements and Results Illustrated Calculation of response rates and standard errors PREDSTAT RESPRATE SETENV NEST Input Data Set(s): ELS.SAS7bdat Example Using data

More information

Week 10: Heteroskedasticity

Week 10: Heteroskedasticity Week 10: Heteroskedasticity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline The problem of (conditional)

More information

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models Statistical Modelling for Social Scientists Manchester University January 20, 21 and 24, 2011 Graeme Hutcheson, University of Manchester Modelling categorical variables using logit models Software commands

More information

Working with Stata Inference on proportions

Working with Stata Inference on proportions Working with Stata Inference on proportions Nicola Orsini Biostatistics Team Department of Public Health Sciences Karolinska Institutet Outline Inference on one population proportion Principle of maximum

More information

Logistic Regression Part II. Spring 2013 Biostat

Logistic Regression Part II. Spring 2013 Biostat Logistic Regression Part II Spring 2013 Biostat 513 132 Q: What is the relationship between one (or more) exposure variables, E, and a binary disease or illness outcome, Y, while adjusting for potential

More information

CROSSTAB Example #8. This example illustrates the variety of hypotheses and test statistics now available on the TEST statement in CROSSTAB.

CROSSTAB Example #8. This example illustrates the variety of hypotheses and test statistics now available on the TEST statement in CROSSTAB. CROSSTAB Example #8 SUDAAN Statements and Results Illustrated Stratum-specific Chi-square (CHISQ) Test Stratum-adjusted Cochran-Mantel-Haenszel (CMH) Test ANOVA-type (ACMH) Test ALL Test option DISPLAY

More information

Never Smokers Exposure Case Control Yes No

Never Smokers Exposure Case Control Yes No Question 0.4 Never Smokers Exosure Case Control Yes 33 7 50 No 86 4 597 29 428 647 OR^ Never Smokers (33)(4)/(7)(86) 4.29 Past or Present Smokers Exosure Case Control Yes 7 4 2 No 52 3 65 69 7 86 OR^ Smokers

More information

Foley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling

Foley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling Foley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling Amber Barnato MD MPH MS University of Pittsburgh Scott Halpern MD PhD University of Pennsylvania Learning objectives 1. List

More information

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway.

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway. Statistical Modelling for Business and Management J.E. Cairnes School of Business & Economics National University of Ireland Galway June 28 30, 2010 Graeme Hutcheson, University of Manchester Luiz Moutinho,

More information

* STATA.OUTPUT -- Chapter 5

* STATA.OUTPUT -- Chapter 5 * STATA.OUTPUT -- Chapter 5.*bwt/confounder example.infile bwt smk gest using bwt.data.correlate (obs=754) bwt smk gest -------------+----- bwt 1.0000 smk -0.1381 1.0000 gest 0.3629 0.0000 1.0000.regress

More information

Example Analysis with STATA

Example Analysis with STATA Example Analysis with STATA Exploratory Data Analysis Means and Variance by Time and Group Correlation Individual Series Derived Variable Analysis Fitting a Line to Each Subject Summarizing Slopes by Group

More information

SAS program for Alcohol, Cigarette and Marijuana use for high school seniors:

SAS program for Alcohol, Cigarette and Marijuana use for high school seniors: SAS program for Alcohol, Cigarette and Marijuana use for high school seniors: options number date; data ; input $ $ $ count @@; datalines; 9 9 proc genmod data= order=data; class ; model count = / dist=poi

More information

Example Analysis with STATA

Example Analysis with STATA Example Analysis with STATA Exploratory Data Analysis Means and Variance by Time and Group Correlation Individual Series Derived Variable Analysis Fitting a Line to Each Subject Summarizing Slopes by Group

More information

COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION

COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION PLS 802 Spring 2018 Professor Jacoby COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION This handout shows the log of a STATA session that compares alternative estimates of

More information

Checking the model. Linearity. Normality. Constant variance. Influential points. Covariate overlap

Checking the model. Linearity. Normality. Constant variance. Influential points. Covariate overlap Checking the model Linearity Normality Constant variance Influential points Covariate overlap 1 Checking the model: linearity Average value of outcome initially assumed to be linear function of continuous

More information

Notes on PS2

Notes on PS2 17.871 - Notes on PS2 Mike Sances MIT April 2, 2012 Mike Sances (MIT) 17.871 - Notes on PS2 April 2, 2012 1 / 9 Interpreting Regression: Coecient regress success_rate dist Source SS df MS Number of obs

More information

Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups

Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 We saw that the

More information

A Survey on Survey Statistics: What is done, can be done in Stata, and what s missing?

A Survey on Survey Statistics: What is done, can be done in Stata, and what s missing? A Survey on Survey Statistics: What is done, can be done in Stata, and what s missing? Frauke Kreuter & Richard Valliant Joint Program in Survey Methodology University of Maryland, College Park fkreuter@survey.umd.edu

More information

Getting Started With PROC LOGISTIC

Getting Started With PROC LOGISTIC Getting Started With PROC LOGISTIC Andrew H. Karp Sierra Information Services, Inc. 19229 Sonoma Hwy. PMB 264 Sonoma, California 95476 707 996 7380 SierraInfo@aol.com www.sierrainformation.com Getting

More information

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro.

Biostatistics 208. Lecture 1: Overview & Linear Regression Intro. Biostatistics 208 Lecture 1: Overview & Linear Regression Intro. Steve Shiboski Division of Biostatistics, UCSF January 8, 2019 1 Organization Office hours by appointment (Mission Hall 2540) E-mail to

More information

All analysis examples presented can be done in Stata 10.1 and are included in this chapter s output.

All analysis examples presented can be done in Stata 10.1 and are included in this chapter s output. Chapter 9 Stata v10.1 Analysis Examples Syntax and Output General Notes on Stata 10.1 Given that this tool is used throughout the ASDA textbook this chapter includes only the syntax and output for the

More information

Correlated Random Effects Panel Data Models

Correlated Random Effects Panel Data Models NONLINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Why Nonlinear Models? 2. CRE versus

More information

CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware

CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION These examples are intended to provide guidance on how to use the commands/procedures for analysis

More information

Lecture 2a: Model building I

Lecture 2a: Model building I Epidemiology/Biostats VHM 812/802 Course Winter 2015, Atlantic Veterinary College, PEI Javier Sanchez Lecture 2a: Model building I Index Page Predictors (X variables)...2 Categorical predictors...2 Indicator

More information

(LDA lecture 4/15/08: Transition model for binary data. -- TL)

(LDA lecture 4/15/08: Transition model for binary data. -- TL) (LDA lecture 4/5/08: Transition model for binary data -- TL) (updated 4/24/2008) log: G:\public_html\courses\LDA2008\Data\CTQ2log log type: text opened on: 5 Apr 2008, 2:27:54 *** read in data ******************************************************

More information

Compartmental Pharmacokinetic Analysis. Dr Julie Simpson

Compartmental Pharmacokinetic Analysis. Dr Julie Simpson Compartmental Pharmacokinetic Analysis Dr Julie Simpson Email: julieas@unimelb.edu.au BACKGROUND Describes how the drug concentration changes over time using physiological parameters. Gut compartment Absorption,

More information

THE QUANDARY OF SURVEY DATA: Comparison of SAS Procedures and SUDAAN Procedures

THE QUANDARY OF SURVEY DATA: Comparison of SAS Procedures and SUDAAN Procedures THE QUANDARY OF SURVEY DATA: Comparison of SAS Procedures and SUDAAN Procedures Katherine Baisden, SRI International, Menlo Park, California ABSTRACT Have you ever worked with survey data that are based

More information

Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).)

Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).) Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).) There are two data sets; each as the same treatment group of 185 men. JTRAIN2 includes 260

More information

Center for Demography and Ecology

Center for Demography and Ecology Center for Demography and Ecology University of Wisconsin-Madison A Comparative Evaluation of Selected Statistical Software for Computing Multinomial Models Nancy McDermott CDE Working Paper No. 95-01

More information

This example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation indicator.

This example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation indicator. Analysis Example-Stata 11.0 sgmediation Command with Survey Data Correction March 25, 2011 This example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation

More information

Survey Data Analysis in Stata 10: Accessible and Comprehensive

Survey Data Analysis in Stata 10: Accessible and Comprehensive Survey Data Analysis in Stata 10: Accessible and Comprehensive Christine Wells Statistical Consulting Group Academic Technology Services University of California, Los Angeles Thursday, October 25, 2007

More information

Module 20 Case Studies in Longitudinal Data Analysis

Module 20 Case Studies in Longitudinal Data Analysis Module 20 Case Studies in Longitudinal Data Analysis Benjamin French, PhD Radiation Effects Research Foundation University of Pennsylvania SISCR 2016 July 29, 2016 Learning objectives This module will

More information

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use "k:mydirectory,

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use k:mydirectory, Table XTMIXED Procedure in STATA with Output Systolic Blood Pressure, 2001. use "k:mydirectory,. xtmixed sbp nage20 nage30 nage40 nage50 nage70 nage80 nage90 winter male dept2 edu_bachelor median_household_income

More information

Examples of Using Stata v11.0 with JRR replicate weights Provided in the NHANES data set

Examples of Using Stata v11.0 with JRR replicate weights Provided in the NHANES data set Examples of Using Stata v110 with JRR replicate weights Provided in the NHANES 1999-2000 data set This document is designed to illustrate comparisons of methods to use JRR replicate weights sometimes provided

More information

Categorical Data Analysis for Social Scientists

Categorical Data Analysis for Social Scientists Categorical Data Analysis for Social Scientists Brendan Halpin, Sociological Research Methods Cluster, Dept of Sociology, University of Limerick June 20-21 2016 Outline 1 Introduction 2 Logistic regression

More information

3. The lab guide uses the data set cda_scireview3.dta. These data cannot be used to complete assignments.

3. The lab guide uses the data set cda_scireview3.dta. These data cannot be used to complete assignments. Lab Guide Written by Trent Mize for ICPSRCDA14 [Last updated: 17 July 2017] 1. The Lab Guide is divided into sections corresponding to class lectures. Each section should be reviewed before starting the

More information

Multilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria

Multilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria Multilevel Mixed-Effects Generalized Linear Models in aaaa Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria SUMMARY - Theoretical Fundamentals of Multilevel Models. - Estimation of Multilevel Mixed-Effects

More information

THE CONTINUING QUANDARY OF SURVEY DATA PART II: Comparison of SAS Procedures and SUDAAN Procedures

THE CONTINUING QUANDARY OF SURVEY DATA PART II: Comparison of SAS Procedures and SUDAAN Procedures THE CONTINUING QUANDARY OF SURVEY DATA PART II: Comparison of SAS Procedures and SUDAAN Procedures Katherine Baisden, SRI International, Menlo Park, California ABSTRACT Once upon a time in the days of

More information

Ille-et-Vilaine case-control study

Ille-et-Vilaine case-control study Ille-et-Vilaine case-control study Cases: 200 males diagnosed in one of regional hospitals in French department of Ille-et-Vilaine (Brittany) between Jan 1972 and Apr 1974 Controls: Random sample of 778

More information

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Data Set for this Assignment: Download from the course website: Stata Users: framingham_1000.dta Source: Levy (1999) National

More information

Methods for Multilevel Modeling and Design Effects. Sharon L. Christ Departments of HDFS and Statistics Purdue University

Methods for Multilevel Modeling and Design Effects. Sharon L. Christ Departments of HDFS and Statistics Purdue University Methods for Multilevel Modeling and Design Effects Sharon L. Christ Departments of HDFS and Statistics Purdue University Talk Outline 1. Review of Add Health Sample Design 2. Modeling Add Health Data a.

More information

Week 11: Collinearity

Week 11: Collinearity Week 11: Collinearity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Regression and holding other

More information

Introduction to Survey Data Analysis. Focus of the Seminar. When analyzing survey data... Young Ik Cho, PhD. Survey Research Laboratory

Introduction to Survey Data Analysis. Focus of the Seminar. When analyzing survey data... Young Ik Cho, PhD. Survey Research Laboratory Introduction to Survey Data Analysis Young Ik Cho, PhD Research Assistant Professor University of Illinois at Chicago Fall 2008 Focus of the Seminar Data Cleaning/Missing Data Sampling Bias Reduction When

More information

CHAPTER 5 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN

CHAPTER 5 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN CHAPTER 5 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION These examples are intended to provide guidance on how to use the commands/procedures for analysis

More information

Module 6 Case Studies in Longitudinal Data Analysis

Module 6 Case Studies in Longitudinal Data Analysis Module 6 Case Studies in Longitudinal Data Analysis Benjamin French, PhD Radiation Effects Research Foundation SISCR 2018 July 24, 2018 Learning objectives This module will focus on the design of longitudinal

More information

This is a quick-and-dirty example for some syntax and output from pscore and psmatch2.

This is a quick-and-dirty example for some syntax and output from pscore and psmatch2. This is a quick-and-dirty example for some syntax and output from pscore and psmatch2. It is critical that when you run your own analyses, you generate your own syntax. Both of these procedures have very

More information

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each

More information

Modeling Contextual Data in. Sharon L. Christ Departments of HDFS and Statistics Purdue University

Modeling Contextual Data in. Sharon L. Christ Departments of HDFS and Statistics Purdue University Modeling Contextual Data in the Add Health Sharon L. Christ Departments of HDFS and Statistics Purdue University Talk Outline 1. Review of Add Health Sample Design 2. Modeling Add Health Data a. Multilevel

More information

Final Exam Spring Bread-and-Butter Edition

Final Exam Spring Bread-and-Butter Edition Final Exam Spring 1996 Bread-and-Butter Edition An advantage of the general linear model approach or the neoclassical approach used in Judd & McClelland (1989) is the ability to generate and test complex

More information

Lab 1: A review of linear models

Lab 1: A review of linear models Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need

More information

MULTIPLE IMPUTATION. Adrienne D. Woods Methods Hour Brown Bag April 14, 2017

MULTIPLE IMPUTATION. Adrienne D. Woods Methods Hour Brown Bag April 14, 2017 MULTIPLE IMPUTATION Adrienne D. Woods Methods Hour Brown Bag April 14, 2017 A COLLECTIVIST APPROACH TO BEST PRACTICES As I began learning about MI last semester, I realized that there are a lot of guidelines

More information

Case study: Modelling berry yield through GLMMs

Case study: Modelling berry yield through GLMMs Case study: Modelling berry yield through GLMMs Jari Miina Finnish Forest Research Institute (Metla) European NWFPs network Action FP1203 www.nwfps.eu TRAINING SCHOOL Modelling NWFP El Escorial, 29 th

More information

Guideline on evaluating the impact of policies -Quantitative approach-

Guideline on evaluating the impact of policies -Quantitative approach- Guideline on evaluating the impact of policies -Quantitative approach- 1 2 3 1 The term treatment derives from the medical sciences and has more meaning when is used in that context. However, this term

More information

BIO 226: Applied Longitudinal Analysis. Homework 2 Solutions Due Thursday, February 21, 2013 [100 points]

BIO 226: Applied Longitudinal Analysis. Homework 2 Solutions Due Thursday, February 21, 2013 [100 points] Prof. Brent Coull TA Shira Mitchell BIO 226: Applied Longitudinal Analysis Homework 2 Solutions Due Thursday, February 21, 2013 [100 points] Purpose: To provide an introduction to the use of PROC MIXED

More information

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis In any longitudinal analysis, we can distinguish between analyzing trends vs individual change that is, model

More information

Introduction to Survey Data Analysis

Introduction to Survey Data Analysis Introduction to Survey Data Analysis Young Cho at Chicago 1 The Circle of Research Process Theory Evaluation Real World Theory Hypotheses Test Hypotheses Data Collection Sample Operationalization/ Measurement

More information

*STATA.OUTPUT -- Chapter 13

*STATA.OUTPUT -- Chapter 13 *STATA.OUTPUT -- Chapter 13.*small example of rank sum test.input x grp x grp 1. 4 1 2. 35 1 3. 21 1 4. 28 1 5. 66 1 6. 10 2 7. 42 2 8. 71 2 9. 77 2 10. 90 2 11. end.ranksum x, by(grp) porder Two-sample

More information

Count model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa

Count model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa Count model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa Presented by Kodjo Kondo PhD Candidate, UNE Business School Supervisors Emeritus Prof. Euan

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON4137 Applied Micro Econometrics Date of exam: Thursday, May 31, 2018 Grades are given: June 15, 2018 Time for exam: 09.00 to 12.00 The problem set covers

More information

Nested or Hierarchical Structure School 1 School 2 School 3 School 4 Neighborhood1 xxx xx. students nested within schools within neighborhoods

Nested or Hierarchical Structure School 1 School 2 School 3 School 4 Neighborhood1 xxx xx. students nested within schools within neighborhoods Multilevel Cross-Classified and Multi-Membership Models Don Hedeker Division of Epidemiology & Biostatistics Institute for Health Research and Policy School of Public Health University of Illinois at Chicago

More information

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference Unit 6: Simple Linear Regression Lecture 2: Outliers and inference Statistics 101 Thomas Leininger June 18, 2013 Types of outliers in linear regression Types of outliers How do(es) the outlier(s) influence

More information

SUGGESTED SOLUTIONS Winter Problem Set #1: The results are attached below.

SUGGESTED SOLUTIONS Winter Problem Set #1: The results are attached below. 450-2 Winter 2008 Problem Set #1: SUGGESTED SOLUTIONS The results are attached below. 1. The balanced panel contains larger firms (sales 120-130% bigger than the full sample on average), which are more

More information

How to reduce bias in the estimates of count data regression? Ashwini Joshi Sumit Singh PhUSE 2015, Vienna

How to reduce bias in the estimates of count data regression? Ashwini Joshi Sumit Singh PhUSE 2015, Vienna How to reduce bias in the estimates of count data regression? Ashwini Joshi Sumit Singh PhUSE 2015, Vienna Precision Problem more less more bias less 2 Agenda Count Data Poisson Regression Maximum Likelihood

More information

Analysis of Longitudinal Survey Data

Analysis of Longitudinal Survey Data Analysis of Longitudinal Survey Data Introduction to Generalized Estimating Equations with Examples from the ITC Survey Pete Driezen June 13, 2016 Introduction To date, an ITC Survey has been conducted

More information

ESS Round 8 Sample Design Data File: User Guide

ESS Round 8 Sample Design Data File: User Guide ESS Round 8 Sample Design Data File: User Guide Peter Lynn INSTITUTE FOR SOCIAL AND ECONOMIC RESEARCH, UNIVERSITY OF ESSEX 07 February 2019 v2 Contents Page Number 1. Introduction 1 2. Variables 2 2.1

More information

********************************************************************************************** *******************************

********************************************************************************************** ******************************* 1 /* Workshop of impact evaluation MEASURE Evaluation-INSP, 2015*/ ********************************************************************************************** ******************************* DEMO: Propensity

More information

Analyzing Repeated Measures and Cluster-Correlated Data Using SUDAAN Release 7.5

Analyzing Repeated Measures and Cluster-Correlated Data Using SUDAAN Release 7.5 Software for the Statistical Analysis of Correlated Data Analyzing Repeated Measures and Cluster-Correlated Data Using SUDAAN Release 7.5 by Gayle S. Bieler gbmac@rti.org Research Triangle Institute and

More information

I am an experienced SAS programmer but I have not used many SAS/STAT procedures

I am an experienced SAS programmer but I have not used many SAS/STAT procedures Which Proc Should I Learn First? A STAT Instructor s Top 5 Modeling Procedures Catherine Truxillo, Ph.D. Manager, Analytical Education SAS Copyright 2010, SAS Institute Inc. All rights reserved. The Target

More information

Valuation of Lost Productivity (VOLP) questionnaire and outcomes. The paid work productivity loss obtained from the VOLP included three components: 1)

Valuation of Lost Productivity (VOLP) questionnaire and outcomes. The paid work productivity loss obtained from the VOLP included three components: 1) APPENDIX. ADDITIONAL METHODS AND RESULTS Valuation of Lost Productivity (VOLP) questionnaire and outcomes The paid work productivity loss obtained from the VOLP included three components: 1) absenteeism:

More information

PSC 508. Jim Battista. Dummies. Univ. at Buffalo, SUNY. Jim Battista PSC 508

PSC 508. Jim Battista. Dummies. Univ. at Buffalo, SUNY. Jim Battista PSC 508 PSC 508 Jim Battista Univ. at Buffalo, SUNY Dummies Dummy variables Sometimes we want to include categorical variables in our models Numerical variables that don t necessarily have any inherent order and

More information

Appendix C: Lab Guide for Stata

Appendix C: Lab Guide for Stata Appendix C: Lab Guide for Stata 2011 1. The Lab Guide is divided into sections corresponding to class lectures. Each section includes both a review, which everyone should complete and an exercise, which

More information

C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING. CAS Annual Meeting November 7-10

C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING. CAS Annual Meeting November 7-10 1 C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING CAS Annual Meeting November 7-10 GLM Process 2 Data Prep Model Form Validation Reduction Simplification Interactions GLM Process 3 Opportunities

More information