Calculating Absolute Rate Differences and Relative Rate Ratios in SAS/SUDAAN and STATA. Ashley Hirai, PhD May 20, 2014
2 Outline
- Importance of absolute and relative measures
- Problems with odds ratios as a relative measure
- Estimating adjusted Rate Differences (RDs) and Rate Ratios (RRs) in SAS and STATA
- Additive and multiplicative interactions
3 Measures of Association
- Differences between groups (person, place, time) in an outcome can be assessed by simple differences (subtraction) or ratios (division)
- Absolute risk/rate difference (attributable risk) = $P(\text{outcome})_1 - P(\text{outcome})_2$
- Relative risk/rate ratio = $P(\text{outcome})_1 / P(\text{outcome})_2$
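As a minimal sketch of the two measures, Stata's immediate command `csi` reports both from a 2x2 table. The counts below are hypothetical and not taken from the slides: with risks of 30% and 10%, the difference is 0.20 and the ratio is 3.

```stata
* Hypothetical 2x2 counts (illustration only, not from the slides):
*   exposed:   30 cases, 70 non-cases
*   unexposed: 10 cases, 90 non-cases
* csi argument order: exposed cases, unexposed cases,
*                     exposed non-cases, unexposed non-cases
csi 30 10 70 90, or

* The same two measures by hand from the group risks
scalar p_exposed   = 30/100
scalar p_unexposed = 10/100
display "RD = " %5.3f (p_exposed - p_unexposed)
display "RR = " %5.3f (p_exposed / p_unexposed)
```

The `or` option adds the odds ratio to the `csi` output, which is convenient for seeing how far it sits from the risk ratio in the same table.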
4 Absolute Measures
- Absolute risk/prevalence differences carry the advantage of assessing actual impact
  - Potentially avertable or excess cases (attributable fractions)
  - Number needed to treat (cost-benefit)
  - Decomposition analyses (Kitagawa, PPOR)
- However, it can be difficult to compare strength of association across risk factors, outcomes, etc.
5 Relative Measures of Association
- Help to control or standardize assessments across indicators or person/place/time to compare strength or magnitude of association
- More typically used in etiologic studies
- However, can be misleading: a doubling of risk (RR=2) sounds dramatic
  - 1% to 2%: RR=2, but the absolute increase is 1 percentage point; still very unlikely to have outcome Y
  - 30% to 60%: RR=2, but the absolute increase is 30 percentage points; now more likely than not to have outcome Y
6 Absolute and Relative Measures
- Provide unique information and are complementary
- When evaluating the effect of a single factor within one group or time period, there is qualitative concordance
  - A positive RD will correspond with RR>1
  - A negative RD will correspond with RR<1
- However, indicators can be inconsistent when comparing the effect in two groups or time periods (interactions)
7 2) Ratio Measures Can't Be Easily Compared
8 [Figure: Black versus White comparison, rates per 100,000 population; surviving values 28.8 and 10.3]
9 Healthy People
- Decline in both absolute and relative differences is the best evidence of progress in disparity elimination
- Relative measures of disparity are the primary indicator of progress because they adjust for changes in the level of the reference point over time
- Relative measures also have the advantage of adjusting for differences in reference point when comparisons are made across objectives
Keppel KG, Pearcy JN, Klein RJ. Measuring progress in Healthy People Healthy People 2010 Stat Notes Sep;(25):1-16.
10 Additive versus Multiplicative Interaction
- Multiplicative interaction may be an extreme standard; there are cases where multiplicative interaction is not present but additive interaction is, with important public health implications
[Table: stroke incidence per 1,000 by smoking (Smoke -, Smoke +) and OC pill use, with risk difference and relative risk columns]
- Joint effects exhibit additive interaction: increase of 50 cases versus expected 30
- Multiplicative interaction not present: 3*2=6, RR of 6 expected and observed
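The slide's table cells did not survive, so the worked arithmetic below uses assumed rates that are merely consistent with the two summary statements (an excess of 50 versus an expected 30, and 3*2=6). The baseline of 10 per 1,000, and which exposure carries the RR of 3 versus 2, are assumptions for illustration. Here $R_{00}$ is the rate with neither exposure, $R_{10}$ and $R_{01}$ the rates with one exposure only, and $R_{11}$ the rate with both.

```latex
\begin{aligned}
\text{Assumed rates per 1,000: } & R_{00}=10,\quad R_{10}=30,\quad R_{01}=20,\quad R_{11}=60 \\
\text{Additive scale: } & (R_{10}-R_{00}) + (R_{01}-R_{00}) = 20 + 10 = 30 \;<\; R_{11}-R_{00} = 50 \\
\text{Multiplicative scale: } & \frac{R_{10}}{R_{00}} \times \frac{R_{01}}{R_{00}} = 3 \times 2 = 6 \;=\; \frac{R_{11}}{R_{00}} = 6
\end{aligned}
```

Under these assumed values the joint excess exceeds the sum of the separate excesses (additive interaction), while the joint ratio equals the product of the separate ratios (no multiplicative interaction).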
11 Why both absolute and relative measures matter
- Absolute measures quantify actual risks and number affected
  - Necessary to evaluate/interpret the meaning of a given RR
- Relative measures allow standardized comparisons across groups, time periods, indicators
- Lack of correspondence for interactions creates controversy over which is better, but they provide complementary information
12 Problems with the Odds Ratio and why it should be avoided
- Non-intuitive, difficult to communicate/interpret correctly
- Exaggeration of relative risk for common outcomes
  - Breastfeeding, obesity, medical home
- Not collapsible across strata; crude OR < average of stratum-specific ORs
  - Can lead to apparent (negative) confounding when none exists, or to positive confounding going undetected
13 RR versus OR
$$RR = \frac{P_2}{P_1}, \qquad OR = \frac{P_2/(1-P_2)}{P_1/(1-P_1)} = \frac{P_2}{P_1}\cdot\frac{1-P_1}{1-P_2} = RR\cdot\frac{1-P_1}{1-P_2}, \qquad \frac{OR}{RR} = \frac{1-P_1}{1-P_2}$$
- For RRs>1, a doubling (OR = 2*RR) can occur
  - When P1 is small and P2 is much greater (high effect size)
    - For p1=.1, p2=(.1+1)/2=.55; RR=5.5; OR=11
  - As P1 increases, the distance to P2 doesn't have to be as large (high prevalence)
    - For p1=.5, p2=(.5+1)/2=0.75; RR=1.5; OR=3
- Basically, high prevalence in at least one stratum
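The two numeric examples on this slide can be checked with a few scalar calculations. This is only a sketch of the arithmetic; the scalar names are arbitrary, and p2 is set to (p1 + 1)/2, the value at which the OR is exactly twice the RR.

```stata
* First slide example: low baseline risk, large effect
scalar p1 = 0.10
scalar p2 = (p1 + 1)/2
display "p2 = " %4.2f p2 "   RR = " %6.2f (p2/p1) "   OR = " %6.2f ((p2/(1 - p2))/(p1/(1 - p1)))

* Second slide example: high baseline risk, smaller effect
scalar p1 = 0.50
scalar p2 = (p1 + 1)/2
display "p2 = " %4.2f p2 "   RR = " %6.2f (p2/p1) "   OR = " %6.2f ((p2/(1 - p2))/(p1/(1 - p1)))
```

In both cases the ratio OR/RR equals (1 - p1)/(1 - p2), so the inflation depends on how close either risk is to 1.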
14 Estimation Options for Risk Differences and Risk Ratios
- Showing code in SAS and STATA
- Examples with non-sampled and complex survey data
Acknowledgement: Jay Kaufman, PhD, McGill University
15 Model Options
1) Linear Probability Model
2) Generalized Linear Model (Binomial, Poisson)
3) Logistic Model (probability conversions)
16 Simple Data Example
- Birth certificate data
- Outcome: Preterm Birth (<37 weeks gestation)
- Covariates: Smoking, race/ethnicity, maternal age
- Example applies to cohort or cross-sectional data generally, and to population-level (non-sampled) data or simple random samples
17 1) Linear Probability Model
Advantages:
- very easy to fit
- single uniform estimate of RD
- economists will love you
Disadvantages:
- possible to get impossible estimates
- does not directly estimate RR
- biostatisticians will hate you
Fit an OLS linear regression on the binary outcome variable: $\Pr(Y=1 \mid X=x) = \beta_0 + \beta_1 X$
Note: The homoskedasticity assumption cannot be met, since the variance is a function of p. Therefore, use robust variance.
18 Linear Probability Model (OLS)
    proc surveyreg order=formatted;
      class racex;
      model ptb = smoke mager mager*mager racex / clparm solution;
    run;
[Output: Regression Analysis for Dependent Variable ptb; estimated regression coefficients (estimate, standard error, t value, Pr > |t|, 95% confidence interval) for Intercept, smoke, mager, mager*mager, and racex (a Non-Hispanic Black, b Hispanic, c Other, d Non-Hispanic White)]
Adjusted RD for smoking= (95% CI 0.028, 0.031)
19
    regress ptb smoke c.mager##c.mager i.racex, vce(robust) cformat(%6.4f)
[Output: linear regression with robust standard errors; coefficients (Coef., Std. Err., t, P>|t|, 95% Conf. Interval) for smoke, mager, c.mager#c.mager, racex, _cons]
Adjusted RD for smoking = (95% CI: , )
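The "impossible estimates" drawback of the linear probability model can be checked directly after fitting. The sketch below uses Stata's downloadable `lbw` teaching dataset (low birth weight) as a stand-in, because the birth certificate file from the slides is not available; the variable names `low`, `smoke`, `age`, and `race` belong to that dataset, not to the slides, and `webuse` needs internet access.

```stata
* Stand-in data (not the slides' birth certificate file)
webuse lbw, clear

* Linear probability model with robust variance;
* the coefficient on 1.smoke is the adjusted risk difference
regress low i.smoke c.age##c.age i.race, vce(robust)

* Fitted probabilities from the linear model
predict phat_lpm, xb

* Count fitted values outside the [0, 1] range
count if phat_lpm < 0 | phat_lpm > 1
summarize phat_lpm
```

A nonzero count means that some covariate patterns receive a fitted "probability" below 0 or above 1, which is the price of the single uniform RD.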
20 2) Generalized Linear Model
Advantages:
- single uniform estimate
- biostatisticians will love you
Disadvantages:
- can be difficult to fit
- still possible to get impossible values
Fit a GLM with a binomial or Poisson distribution
- For RD: identity link
- For RR: log link
$g[\Pr(Y=1 \mid X=x)] = \beta_0 + \beta_1 X$
Generally fit Poisson when binomial fails to converge; must use robust standard errors because the data are binary
Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. Am J Epidemiol 2005 Aug 1;162(3):
21 Binomial Model Risk Difference, Identity Link
    proc genmod descending;
      class racex / order=formatted;
      model ptb = smoke mager mager*mager racex / dist=bin link=identity;
      format racex racex.;
    run;
[Output: Analysis Of Maximum Likelihood Parameter Estimates (DF, estimate, standard error, Wald 95% confidence limits, Wald chi-square) for Intercept, smoke, mager, mager*mager, racex (a Non-Hispanic Black, b Hispanic, c Other, d Non-Hispanic White), Scale]
Adjusted RD for smoking = (95% CI , )
22
    glm ptb smoke c.mager##c.mager i.racex, fam(b) lin(ident)
    binreg ptb smoke c.mager##c.mager i.racex, ml rd
[Output: generalized linear model, ML optimization; variance function V(u) = u*(1-u) [Bernoulli]; link function g(u) = u [Identity]; OIM standard errors; coefficients for smoke, mager, c.mager#c.mager, racex, _cons]
Coefficients are the risk differences.
23 Binomial Model Risk Ratio, Log Link
    proc genmod descending;
      class racex / order=formatted;
      model ptb = smoke mager mager*mager racex / dist=bin link=log;
      estimate 'RR smoke' smoke 1;
      format racex racex.;
    run;
[Output: parameter estimates (DF, estimate, standard error, Wald 95% confidence limits, Wald chi-square) for Intercept, smoke, mager, mager*mager, racex, Scale; Contrast Estimate Results for 'RR smoke' (mean estimate and confidence limits, L'Beta estimate, standard error, alpha, confidence limits)]
Adjusted RR for smoking = 1.31 (95% CI 1.30, 1.33)
24
    glm ptb smoke c.mager##c.mager i.racex, fam(b) lin(log) eform
    binreg ptb smoke c.mager##c.mager i.racex, ml rr
[Output: generalized linear model, ML optimization; variance function V(u) = u*(1-u) [Bernoulli]; link function g(u) = ln(u) [Log]; BIC = -2.93e+07; OIM standard errors; risk ratios for smoke, mager, c.mager#c.mager, racex]
Adjusted RR for smoking = 1.31 (95% CI: 1.30, 1.33)
25 Additive Interaction
    proc genmod;
      class id smoke racex / param=ref ref=first;
      model ptb = smoke mager mager*mager racex smoke*racex / dist=bin link=id;
      estimate 'smoking among NH Black' smoke 1 smoke*racex 1 0 0;
      format racex racex.;
    run;
[Output: Analysis Of Maximum Likelihood Parameter Estimates for Intercept, smoke, mager, mager*mager, racex, smoke*racex, Scale (Pr > ChiSq <.0001 for Intercept, smoke, mager, mager*mager, and racex a, b, c); Contrast Estimate Results for 'smoking among NH Black' (Pr > ChiSq <.0001)]
Effect of smoking greater among Black than White women
26
    glm ptb smoke c.mager##c.mager i.racex smoke##i.racex, fam(bin) lin(id)
    binreg ptb smoke c.mager##c.mager i.racex, ml rd
    lincom smoke + 1.smoke#2.racex // smoking for Black women
[Output: OIM standard errors; coefficients for smoke, mager, c.mager#c.mager, racex, smoke#racex, _cons; lincom result for (1) smoke + 1.smoke#2.racex = 0]
Coefficients are the risk differences.
27 Multiplicative Interaction
    proc genmod;
      class racex / order=formatted;
      model ptb = smoke mager mager*mager racex smoke*racex / dist=bin link=log;
      estimate 'smoking among White' smoke 1;
      estimate 'smoking among NH Black' smoke 1 smoke*racex 1 0 0;
      format racex racex.;
    run;
[Output: Analysis Of Maximum Likelihood Parameter Estimates for Intercept, smoke, mager, mager*mager, racex, smoke*racex, Scale; Contrast Estimate Results (mean estimate and confidence limits) for 'smoking among White' and 'smoking among NH Black']
Additive but not multiplicative interaction
28
    glm ptb smoke c.mager##c.mager i.racex smoke##i.racex, fam(bin) lin(log)
    binreg ptb smoke c.mager##c.mager i.racex, ml rr
    lincom smoke, eform // smoking for White
    lincom smoke + 1.smoke#2.racex, eform // smoking for Black
[Output: OIM standard errors; coefficients for smoke, mager, c.mager#c.mager, racex, smoke#racex, _cons; exp(b) with 95% confidence intervals for "Smoking for White" and "Smoking for Black"]
29 If Binomial fails to converge, try starting with a negative intercept
    proc genmod data=ahs.sample_10;
      class racex / order=formatted;
      model ptb = smoke mager mager*mager racex / dist=bin link=log intercept=-4;
    run;

    binreg ptb smoke c.mager##c.mager i.racex, ml rr search
Otherwise, try Modified Poisson: less efficient but more likely to converge
SAS: generate a unique id number in a data step (id=_n_;)
    proc genmod;
      class id racex / order=formatted;
      model ptb = smoke mager mager*mager racex / dist=poi link=id;
      repeated subject=id / type=ind;
    run;

    glm ptb i.smoke c.mager##c.mager i.racex [freq=count], fam(p) lin(log) vce(robust)
Note: with dist=poi, link=id yields risk differences; link=log yields risk ratios, as in the Stata glm line.
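The fallback sequence described on this slide can be sketched in Stata as follows. The `lbw` dataset and its variable names are stand-ins for the slides' birth data, and whether the log-binomial model converges on any given dataset is not guaranteed, which is why the first fit is wrapped in `capture noisily`.

```stata
* Stand-in data (not the slides' birth certificate file)
webuse lbw, clear

* Step 1: log-binomial model for the risk ratio.
* capture noisily shows the output but lets a do-file continue
* if the model fails to converge.
capture noisily binreg low i.smoke c.age i.race, rr

* Step 2: modified Poisson. Poisson family with log link and
* robust (sandwich) standard errors; eform reports exp(b),
* interpreted here as risk ratios.
glm low i.smoke c.age i.race, family(poisson) link(log) vce(robust) eform
```

The robust variance matters in step 2 because the Poisson variance assumption is wrong for a binary outcome; without it the standard errors would be too large.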
30 3) Logistic Regression or Probit Regression Model
Advantages:
- always fits easily
- can never get impossible estimates
- epidemiologists will love you
Disadvantages:
- does not give a single uniform estimate
- choose between different formulations
Fit a standard logistic regression model:
$$\ln\left[\frac{\Pr(Y=1 \mid X=x)}{1-\Pr(Y=1 \mid X=x)}\right] = \alpha + \beta_1 x$$
then just obtain and contrast the predicted probabilities:
$$\Pr(Y=1 \mid X=x) = \frac{e^{\alpha+\beta_1 x}}{1+e^{\alpha+\beta_1 x}}$$
31
    logit ptb smoke c.mager##c.mager i.racex [freq=count], nolog
[Output: logistic regression; coefficients (Coef., Std. Err., z, P>|z|, 95% Conf. Interval) for smoke, mager, c.mager#c.mager, racex, _cons]
Predicted probability of PTB for a 25 year old non-Hispanic White woman smoker:
$$\Pr(PTB=1 \mid X=x) = \frac{e^{\eta}}{1+e^{\eta}}, \qquad \eta = \ldots\,(25 \times 0.0984) + (25^2 \times 0.0020)$$
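The hand calculation on this slide (plug one covariate pattern into the fitted linear predictor, then apply the inverse logit) can be reproduced after any `logit` fit. The sketch below uses the `lbw` stand-in dataset, so its coefficients are not the slide's; race category 1 is that dataset's base level, which is why no race term appears in the sum.

```stata
* Stand-in data (not the slides' birth certificate file)
webuse lbw, clear
logit low i.smoke c.age##c.age i.race, nolog

* Predicted probability for a 25-year-old smoker in the base race category:
* intercept + smoking coefficient + age terms, passed through the inverse logit
display invlogit(_b[_cons] + _b[1.smoke] + 25*_b[age] + 25^2*_b[c.age#c.age])

* The same covariate pattern via margins, which also gives a standard error
margins, at(smoke=1 age=25 race=1)
```

The `display` result and the `margins` estimate should agree, since both evaluate the fitted model at the same covariate pattern.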
32
- But this is for a specific covariate pattern (in this case, unmarried NH-white women aged 25)
- Could evaluate the RD & RR holding all covariates at their means: marginal effect at the mean
  - But there may be no one in the data set with this covariate combination and marginal effect
  - No woman is 52% White, 10% Black, 31% Hispanic or even 27.5 years old (integer year rather than exact age)
- Better alternative is to take the average of each individual RD, setting everyone to smoking and then no smoking (average marginal effect)
  - But generally only a small difference in large samples
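The average marginal effect described in the last bullet can be computed by hand, which makes the definition concrete: predict everyone's risk as a smoker, predict it again as a non-smoker, and average the individual differences. This sketch again uses the `lbw` stand-in data; the helper variable names (`smoke_obs`, `p1`, `p0`, `rd_i`) are made up for the illustration.

```stata
* Stand-in data (not the slides' birth certificate file)
webuse lbw, clear
quietly logit low i.smoke c.age##c.age i.race

* Keep the observed exposure so it can be restored afterwards
generate byte smoke_obs = smoke

* Everyone set to smoking, then everyone set to not smoking
replace smoke = 1
predict p1, pr
replace smoke = 0
predict p0, pr
replace smoke = smoke_obs

* Individual risk differences and their average
generate rd_i = p1 - p0
summarize rd_i

* Same point estimate from margins, with a delta-method standard error
margins, dydx(smoke)
```

The mean of `rd_i` should match the `margins, dydx(smoke)` estimate; the ratio of the mean of `p1` to the mean of `p0` would give the corresponding adjusted risk ratio.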
33 But Stata has a handy utility that makes this easier:
    quietly logit ptb i.smoke c.mager##c.mager i.racex
    margins, dydx(smoke)
[Output: dy/dx with delta-method SE, z, P>|z|, 95% Conf. Int. for smoke]
Average individual adjusted RD = (95% CI: , )
    quietly logit ptb i.smoke c.mager##c.mager i.racex
    margins, dydx(smoke) atmeans
[Output: covariates held at their means (mager, 1.racex, 2.racex, 3.racex, 4.racex); dy/dx with delta-method SE, z, P>|z|, 95% Conf. Interval for smoke]
Adjusted RD for the average woman = (95% CI: , )
34 Margins also works on sub-populations to look at additive interactions
    margins, dydx(smoke) over(racex)
[Output: average marginal effects; dy/dx with delta-method SE, z, P>|z|, 95% Conf. Int. for smoke by racex]
    margins, dydx(smoke) atmeans over(racex)
[Output: conditional marginal effects; dy/dx with delta-method SE, z, P>|z|, 95% Conf. Interval for smoke by racex]
Note: dy/dx for factor levels is the discrete change from the base level.
35 Test if NH Black RD is larger than the NH White RD:
    margins smoke, at(race=(1 2)) post
[Output: predictive margins; Expression: Pr(ptb), predict(); 1._at: racex = 1; 2._at: racex = ; margins with delta-method SE, z, P>|z|, 95% Conf. Int. for _at#smoke]
    lincom (_b[2._at#1.smoke]-_b[2._at#0.smoke])-(_b[1._at#1.smoke]-_b[1._at#0.smoke])
[Output: Coef., Std. Err., z, P>|z|, 95% Conf. Int. for (1)]
    test (_b[2._at#1.smoke]-_b[2._at#0.smoke]) = (_b[1._at#1.smoke]-_b[1._at#0.smoke])
[Output: chi2(1) and Prob > chi2]
36 This is a different model, however, than one which includes a race x treatment interaction explicitly:
    logit ptb i.smoke##i.racex c.mager##c.mager, nolog
[Output: logistic regression; coefficients for smoke, racex, smoke#racex, mager, c.mager#c.mager, _cons]
    margins, dydx(smoke) at(race=(1 2))
[Output: dy/dx with delta-method SE, z, P>|z|, 95% Conf. Int., rows labeled "smoke RD NHW" and "smoke RD NHB"]
Additive Interaction =
37 Rate Ratio
    quietly logit ptb i.smoke c.mager##c.mager i.racex
    margins i.smoke, post
[Output: margins with delta-method SE, z, P>|z|, 95% Conf. Interval for smoke]
    nlcom _b[1.smoke] / _b[0.smoke]
[Output: Coef., Std. Err., z, P>|z|, 95% Conf. Interval for _nl_1]
    quietly logit ptb i.smoke c.mager##c.mager i.racex
    adjrr smoke // add-on called adjrr, computes adjusted RR and RD
    R1  = (0.0008)  95% CI (0.1262, )
    R0  = (0.0002)  95% CI (0.0970, )
    ARR = (0.0086)  95% CI (1.2943, )
    ARD = (0.0008)  95% CI (0.0287, )
    p-value (R0 = R1):
    p-value (ln(R1/R0) = 0):
Norton et al. Computing adjusted risk ratios and risk differences in Stata. STATA Journal, 2013
38 SAS Logistic Average Marginal Effect
- Complicated macro but well annotated; must run one variable of interest at a time and rerun the macro
- No possibility of assessing additive interaction
- Handles survey and non-survey data
- Need an IDNUMR in your data set to identify unique observations
- Inputs
  - Key predictor of interest, contrast you want
  - Covariate list (including that predictor)
  - Specify categorical variables
  - Outcome variable
Kleinman LC, Norton EC. What's the Risk? A simple approach for estimating adjusted risk measures from nonlinear models including logistic regression. Health Serv Res Feb;44(1): lawrence.kleinman@mssm.edu
39 SAS Output
The MEANS Procedure
Results comparing smoke = 1 to smoke = 0
    Variable   Label                       Mean
    ARR        Adjusted Risk Ratio
    SE_ARR     Standard Error of ARR
    UCL_ARR    95% UCL ARR
    LCL_ARR    95% LCL ARR
    T_arr      T statistic ARR
    P_ARR      p-value ARR                 6.90E-175
    ARD        Adjusted Risk Difference
    SE_ARD     Standard Error ARD
    UCL_ARD    95% UCL ARD
    LCL_ARD    95% LCL ARD
    t_ard      T statistic ARD
    P_ARD      p-value ARD                 2.97E-189
    Risk0      Unexposed Risk
    SE0        Std Err Unexposed Risk      3.66E-07
    Risk1      Exposed Risk
    SE1        Std Err Exposed Risk        6.92E-07
40 SAS-callable SUDAAN Logistic Model
- Average Marginal Effect (pred_eff) or Marginal Effect at Mean (cond_eff)
- Robust SEs even though we're using non-sampled data
- Also cannot assess additive interaction without including a multiplicative term
    PROC RLOGIST design=srs;
      class smoke racex / dir=descending;
      model ptb = smoke mager mager_2 racex;
      effects smoke=(1 -1) / exp name="or:smoke";
      predmarg smoke / adjrr;
      pred_eff smoke=(1 -1) / name="rd:smoke";
      rformat racex racex.;
      SETENV decwidth=4;
    run;
Bieler GS, Brown GG, Williams RL, Brogan DJ. Estimating model-adjusted risks, risk differences, and risk ratios from complex survey data. Am J Epidemiol Mar 1;171(5):
41
[Output: Variance Estimation Method: Taylor Series (SRS); SE Method: Robust (Binder, 1983); Working Correlations: Independent; Link Function: Logit; Response variable PTB. Tables: odds ratio for SMOKE with 95% limits; predicted marginals for SMOKE (SE, 95% limits, t statistic); contrasted predicted marginal "RD:smoke" (contrast, SE, t statistic, p-value); predicted marginal risk ratio for the SMOKE contrast (SE, 95% limits)]
- Same point estimates as in STATA but robust SEs
- PTB is not very common so OR is not greatly inflated, but RR is more interpretable
42
    PROC RLOGIST design=srs data=ahs.sper_example;
      class smoke racex / dir=descending;
      model ptb = smoke mager mager_2 racex smoke*racex;
      predmarg smoke*racex;
      pred_eff racex=( )*smoke=(1 -1) / name="rd:smoke for non-hispanic White";
      pred_eff racex=( )*smoke=(1 -1) / name="rd:smoke for non-hispanic Black";
      pred_eff racex=( )*smoke=(1 -1) / name="rd:smoke for Black versus White";
      rformat racex racex.;
      SETENV decwidth=4;
    run;
[Output: contrasted predicted marginals (contrast, SE, t statistic, p-value) for "RD:smoke for non-Hispanic White", "RD:smoke for non-Hispanic Black", and "RD:smoke for Black versus White"]
- Slightly different than STATA output
- Still shows an additive interaction of about 0.01; greater effect for Black women
43 Complex Survey Example
- 2007 National Survey of Children's Health
- Design: Children sampled within State-level strata; weights to account for unequal probability of selection, non-response, and population totals
- Outcome: Breastfed to 6 months among subpopulation of children 6 months to 5 years
- Covariates: poverty (multiply imputed), race/ethnicity
- Direct models, logistic margins
- Interpretation of OR, RR, and RD
44 Common Outcome
    PROC CROSSTAB data=example design=wr;
      nest State idnumr;
      subpopn FLG_06_MNTH=0 and ageyr_child<=5;
      WEIGHT NSCHWT;
      class breastfed duration_6;
      TABLE breastfed duration_6;
      PRINT nsum wsum rowper serow lowrow uprow / style=nchs nsumfmt=f10.0 wsumfmt=f10.0;
    run;
[Output: Variance Estimation Method: Taylor Series (WR); For Subpopulation: FLG_06_MNTH = 0 AND AGEYR_CHILD <= 5; by: Breastfed for 6 months; columns: sample size, weighted size, row percent, SE row percent, lower and upper 95% limits of row percent]
Prevalence of 45%, we will see inflated ORs
45 SAS: Linear Probability Model (OLS)
    PROC REGRESS DATA=mimp1 design=wr mi_count=5;
      nest State idnumr;
      subpopn FLG_06_MNTH=0 and ageyr_child<=5;
      WEIGHT NSCHWT;
      subgroup povl hisprace;
      levels 4 5;
      reflevel povl=1 hisprace=2;
      rformat povl povl.;
      rformat hisprace hisprace.;
      model duration_6 = povl hisprace;
    run;
[Output: Variance Estimation Method: Taylor Series (WR), Using Multiply Imputed Data; SE Method: Robust (Binder, 1983); Working Correlations: Independent; Link Function: Identity; beta coefficients (SE, 95% limits, t-test) for Intercept, HH Federal Poverty Level (< 100% plus three further categories), and Race/Ethnicity (Hispanic, NH white, NH black, NH multi, NH other)]
Or use PROC SURVEYREG in SAS
46 STATA: Linear Probability Model
    mi estimate: svy, subpop(subpop): regress duration_6 i.poverty ib2.hisprace
[Output: multiple-imputation estimates, Imputations = 5; survey linear regression; Number of strata = 51; DF adjustment: small sample; model F test: equal FMI; within VCE type: linearized; coefficients (Coef., Std. Err., t, P>|t|, 95% Conf. Interval) for poverty, hisprace, _cons]
47 SAS: Generalized Linear Model (GLM)
- Poisson with log link may be the only SUDAAN option, so RRs only
- No SAS survey procedure
    PROC LOGLINK DATA=mimp1 design=wr mi_count=5;
      nest State idnumr;
      subpopn FLG_06_MNTH=0 and ageyr_child<=5;
      WEIGHT NSCHWT;
      subgroup povl hisprace;
      levels 4 5;
      reflevel povl=1 hisprace=2;
      rformat povl povl.;
      rformat hisprace hisprace.;
      model duration_6 = povl hisprace;
    run;
[Output: incidence density ratios (IDR) with 95% limits for Intercept, HH Federal Poverty Level (< 100% plus three further categories), and Race/Ethnicity (Hispanic, NH white, NH black, NH multi, NH other)]
48 STATA: Generalized Linear Model
    mi estimate: svy, subpop(subpop): glm duration_6 i.poverty ib2.hisprace, family(bin) link(identity)
[Output: multiple-imputation estimates, Imputations = 5; survey generalized linear models; Number of strata = 51; DF adjustment: small sample; within VCE type: linearized; coefficients (Coef., Std. Err., t, P>|t|, 95% Conf. Interval) for poverty, hisprace, _cons]
49 STATA: Generalized Linear Model
    mi estimate, saving(miest): svy, subpop(subpop): glm duration_6 i.poverty ib2.hisprace, family(bin) link(log)
    mi estimate (rr: exp(_b[4.poverty])) using miest
[Output: coefficients (Coef., Std. Err., t, P>|t|, 95% Conf. Interval) for poverty, hisprace, _cons; Transformations: rr: exp(_b[4.poverty]), with Coef., Std. Err., t, P>|t|, 95% Conf. Interval for rr]
50 SAS: Logistic Model
- SUDAAN predmarg, pred_eff
- For SAS only, try new macro
    PROC RLOGIST DATA=mimp1 design=wr mi_count=5;
      nest State idnumr;
      subpopn FLG_06_MNTH=0 and ageyr_child<=5;
      WEIGHT NSCHWT;
      subgroup povl hisprace;
      levels 4 5;
      reflevel povl=1 hisprace=2;
      rformat povl povl.;
      rformat hisprace hisprace.;
      model duration_6 = povl hisprace;
      predmarg povl(1) / adjrr;
      predmarg hisprace(2) / adjrr;
      pred_eff povl=( ) / name="RD: %FPL v. <100% FPL";
      pred_eff povl=( ) / name="RD: %FPL v. <100% FPL";
      pred_eff povl=( ) / name="RD: 400%+ FPL v. <100% FPL";
      pred_eff hisprace=( ) / name="RD: NH Black v. NH White";
      pred_eff hisprace=( ) / name="RD: Hispanic v. NH White";
    run;
51 Risk Difference: Poverty
[Output: predicted marginals (SE, 95% limits, t statistic) by HH Federal Poverty Level; contrasted predicted marginals (contrast, SE, t statistic, p-value) for each poverty level v. <100% FPL, including "RD: 400%+ FPL v. <100% FPL"]
52 Advantage of Absolute Scale
- Can calculate actual numbers affected, excess cases attributable to a factor
  - Risk Difference x Number with factor = excess cases
  - Excess cases / Total cases = PAF
- Weighted N for children <100% FPL is 5.1 million
- If children <100% FPL had the same probability of being breastfed to 6 months as children 400%+ FPL, 0.18*5.1 = 0.9 million more children would have been breastfed to 6 months
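Written out, the slide's calculation takes the following form. The risk difference of 0.18 and the weighted count of 5.1 million come from the slide; the total number of cases needed for the PAF is not given there, so that part is left symbolic.

```latex
\text{Excess cases} = RD \times N_{\text{with factor}} = 0.18 \times 5.1\ \text{million} \approx 0.92\ \text{million},
\qquad
PAF = \frac{\text{Excess cases}}{\text{Total cases}}
```

The slide rounds the product to 0.9 million.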
53 OR versus RR: Poverty
[Output: odds ratios with 95% limits for Intercept and HH Federal Poverty Level; predicted marginal risk ratios (SE, 95% limits) for each poverty level vs. <100%]
Excess risk estimate is doubled for OR versus RR (~100% v. 50% for 400%+ Poverty)
54 STATA: Logistic Model
Margins command can't be used with multiple imputation, so select a single imputation
    mi extract 1
    svy, subpop(subpop): logistic duration_6 i.poverty ib2.hisprace
[Output: survey logistic regression; Number of strata = 51; F( 7, 90861); linearized SEs; odds ratios (Std. Err., t, P>|t|, 95% Conf. Interval) for poverty and hisprace]
55 STATA Logistic: Rate Difference
- Use margins with the subpop since analyzing a subset of total sample (age<=5)
- Use vce(unconditional) to adjust SEs for survey design
    svy, subpop(subpop): logistic duration_6 i.poverty ib2.hisprace
    margins, subpop(subpop) dydx(*) vce(unconditional)
[Output: average marginal effects; Expression: Pr(duration_6), predict(); dy/dx w.r.t.: 2.poverty 3.poverty 4.poverty 1.hisprace 3.hisprace 4.hisprace 5.hisprace; dy/dx with linearized Std. Err., t, P>|t|, 95% Conf. Interval for poverty and hisprace]
Note: dy/dx for factor levels is the discrete change from the base level.
56 STATA Logistic: Rate Ratio
    svy, subpop(subpop): logistic duration_6 i.poverty ib2.hisprace
    margins poverty, subpop(subpop) vce(unconditional) post
[Output: predictive margins; Expression: Pr(duration_6), predict(); margins with linearized Std. Err., t, P>|t|, 95% Conf. Interval for poverty]
    nlcom _b[4.poverty] / _b[1.poverty]
[Output: _nl_1: _b[4.poverty] / _b[1.poverty]; Coef., Std. Err., t, P>|t|, 95% Conf. Interval]
57 / /relative-riskabsolute-comic-health-medicalreporting.htm
58 Alcohol Use and Breast Cancer
- Appropriately interpreted as a 51% increase in breast cancer risk comparing 0 daily intake to 2+ drinks/day, translating to a 1.3% increase in the incidence of breast cancer over 10 years
- "...while the increased risk found in this study is real, it is quite small. Women will need to weigh this slight increase in breast cancer risk with the beneficial effects alcohol is known to have on heart health," said Dr. Wendy Chen, of Brigham and Women's Hospital in Boston. "Any woman's decision will likely factor in her risk of either disease," Chen said. (MSNBC)
Interactions made easy André Charlett Neville Q Verlander Health Protection Agency Centre for Infections Motivation Scientific staff within institute using Stata to fit many types of regression models
More informationApplying Regression Analysis
Applying Regression Analysis Jean-Philippe Gauvin Université de Montréal January 7 2016 Goals for Today What is regression? How do we do it? First hour: OLS Bivariate regression Multiple regression Interactions
More informationPost-Estimation Commands for MLogit Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017
Post-Estimation Commands for MLogit Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 13, 2017 These notes borrow heavily (sometimes verbatim) from Long &
More informationLogistic (RLOGIST) Example #9
Logistic (RLOGIST) Example #9 SUDAAN Statements and Results Illustrated Calculation of response rates and standard errors PREDSTAT RESPRATE SETENV NEST Input Data Set(s): ELS.SAS7bdat Example Using data
More informationWeek 10: Heteroskedasticity
Week 10: Heteroskedasticity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline The problem of (conditional)
More informationStatistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models
Statistical Modelling for Social Scientists Manchester University January 20, 21 and 24, 2011 Graeme Hutcheson, University of Manchester Modelling categorical variables using logit models Software commands
More informationWorking with Stata Inference on proportions
Working with Stata Inference on proportions Nicola Orsini Biostatistics Team Department of Public Health Sciences Karolinska Institutet Outline Inference on one population proportion Principle of maximum
More informationLogistic Regression Part II. Spring 2013 Biostat
Logistic Regression Part II Spring 2013 Biostat 513 132 Q: What is the relationship between one (or more) exposure variables, E, and a binary disease or illness outcome, Y, while adjusting for potential
More informationCROSSTAB Example #8. This example illustrates the variety of hypotheses and test statistics now available on the TEST statement in CROSSTAB.
CROSSTAB Example #8 SUDAAN Statements and Results Illustrated Stratum-specific Chi-square (CHISQ) Test Stratum-adjusted Cochran-Mantel-Haenszel (CMH) Test ANOVA-type (ACMH) Test ALL Test option DISPLAY
More informationNever Smokers Exposure Case Control Yes No
Question 0.4 Never Smokers Exosure Case Control Yes 33 7 50 No 86 4 597 29 428 647 OR^ Never Smokers (33)(4)/(7)(86) 4.29 Past or Present Smokers Exosure Case Control Yes 7 4 2 No 52 3 65 69 7 86 OR^ Smokers
More informationFoley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling
Foley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling Amber Barnato MD MPH MS University of Pittsburgh Scott Halpern MD PhD University of Pennsylvania Learning objectives 1. List
More informationStatistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway.
Statistical Modelling for Business and Management J.E. Cairnes School of Business & Economics National University of Ireland Galway June 28 30, 2010 Graeme Hutcheson, University of Manchester Luiz Moutinho,
More information* STATA.OUTPUT -- Chapter 5
* STATA.OUTPUT -- Chapter 5.*bwt/confounder example.infile bwt smk gest using bwt.data.correlate (obs=754) bwt smk gest -------------+----- bwt 1.0000 smk -0.1381 1.0000 gest 0.3629 0.0000 1.0000.regress
More informationExample Analysis with STATA
Example Analysis with STATA Exploratory Data Analysis Means and Variance by Time and Group Correlation Individual Series Derived Variable Analysis Fitting a Line to Each Subject Summarizing Slopes by Group
More informationSAS program for Alcohol, Cigarette and Marijuana use for high school seniors:
SAS program for Alcohol, Cigarette and Marijuana use for high school seniors: options number date; data ; input $ $ $ count @@; datalines; 9 9 proc genmod data= order=data; class ; model count = / dist=poi
More informationExample Analysis with STATA
Example Analysis with STATA Exploratory Data Analysis Means and Variance by Time and Group Correlation Individual Series Derived Variable Analysis Fitting a Line to Each Subject Summarizing Slopes by Group
More informationCOMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION
PLS 802 Spring 2018 Professor Jacoby COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION This handout shows the log of a STATA session that compares alternative estimates of
More informationChecking the model. Linearity. Normality. Constant variance. Influential points. Covariate overlap
Checking the model Linearity Normality Constant variance Influential points Covariate overlap 1 Checking the model: linearity Average value of outcome initially assumed to be linear function of continuous
More informationNotes on PS2
17.871 - Notes on PS2 Mike Sances MIT April 2, 2012 Mike Sances (MIT) 17.871 - Notes on PS2 April 2, 2012 1 / 9 Interpreting Regression: Coecient regress success_rate dist Source SS df MS Number of obs
More informationGroup Comparisons: Using What If Scenarios to Decompose Differences Across Groups
Group Comparisons: Using What If Scenarios to Decompose Differences Across Groups Richard Williams, University of Notre Dame, https://www3.nd.edu/~rwilliam/ Last revised February 15, 2015 We saw that the
More informationA Survey on Survey Statistics: What is done, can be done in Stata, and what s missing?
A Survey on Survey Statistics: What is done, can be done in Stata, and what s missing? Frauke Kreuter & Richard Valliant Joint Program in Survey Methodology University of Maryland, College Park fkreuter@survey.umd.edu
More informationGetting Started With PROC LOGISTIC
Getting Started With PROC LOGISTIC Andrew H. Karp Sierra Information Services, Inc. 19229 Sonoma Hwy. PMB 264 Sonoma, California 95476 707 996 7380 SierraInfo@aol.com www.sierrainformation.com Getting
More informationBiostatistics 208. Lecture 1: Overview & Linear Regression Intro.
Biostatistics 208 Lecture 1: Overview & Linear Regression Intro. Steve Shiboski Division of Biostatistics, UCSF January 8, 2019 1 Organization Office hours by appointment (Mission Hall 2540) E-mail to
More informationAll analysis examples presented can be done in Stata 10.1 and are included in this chapter s output.
Chapter 9 Stata v10.1 Analysis Examples Syntax and Output General Notes on Stata 10.1 Given that this tool is used throughout the ASDA textbook this chapter includes only the syntax and output for the
More informationCorrelated Random Effects Panel Data Models
NONLINEAR MODELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M. Wooldridge Michigan State University 1. Why Nonlinear Models? 2. CRE versus
More informationCHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware
CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION These examples are intended to provide guidance on how to use the commands/procedures for analysis
More informationLecture 2a: Model building I
Epidemiology/Biostats VHM 812/802 Course Winter 2015, Atlantic Veterinary College, PEI Javier Sanchez Lecture 2a: Model building I Index Page Predictors (X variables)...2 Categorical predictors...2 Indicator
More information(LDA lecture 4/15/08: Transition model for binary data. -- TL)
(LDA lecture 4/5/08: Transition model for binary data -- TL) (updated 4/24/2008) log: G:\public_html\courses\LDA2008\Data\CTQ2log log type: text opened on: 5 Apr 2008, 2:27:54 *** read in data ******************************************************
More informationCompartmental Pharmacokinetic Analysis. Dr Julie Simpson
Compartmental Pharmacokinetic Analysis Dr Julie Simpson Email: julieas@unimelb.edu.au BACKGROUND Describes how the drug concentration changes over time using physiological parameters. Gut compartment Absorption,
More informationTHE QUANDARY OF SURVEY DATA: Comparison of SAS Procedures and SUDAAN Procedures
THE QUANDARY OF SURVEY DATA: Comparison of SAS Procedures and SUDAAN Procedures Katherine Baisden, SRI International, Menlo Park, California ABSTRACT Have you ever worked with survey data that are based
More informationApplication: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).)
Application: Effects of Job Training Program (Data are the Dehejia and Wahba (1999) version of Lalonde (1986).) There are two data sets; each as the same treatment group of 185 men. JTRAIN2 includes 260
More informationCenter for Demography and Ecology
Center for Demography and Ecology University of Wisconsin-Madison A Comparative Evaluation of Selected Statistical Software for Computing Multinomial Models Nancy McDermott CDE Working Paper No. 95-01
More informationThis example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation indicator.
Analysis Example-Stata 11.0 sgmediation Command with Survey Data Correction March 25, 2011 This example demonstrates the use of the Stata 11.1 sgmediation command with survey correction and a subpopulation
More informationSurvey Data Analysis in Stata 10: Accessible and Comprehensive
Survey Data Analysis in Stata 10: Accessible and Comprehensive Christine Wells Statistical Consulting Group Academic Technology Services University of California, Los Angeles Thursday, October 25, 2007
More informationModule 20 Case Studies in Longitudinal Data Analysis
Module 20 Case Studies in Longitudinal Data Analysis Benjamin French, PhD Radiation Effects Research Foundation University of Pennsylvania SISCR 2016 July 29, 2016 Learning objectives This module will
More informationTable. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use "k:mydirectory,
Table XTMIXED Procedure in STATA with Output Systolic Blood Pressure, 2001. use "k:mydirectory,. xtmixed sbp nage20 nage30 nage40 nage50 nage70 nage80 nage90 winter male dept2 edu_bachelor median_household_income
More informationExamples of Using Stata v11.0 with JRR replicate weights Provided in the NHANES data set
Examples of Using Stata v110 with JRR replicate weights Provided in the NHANES 1999-2000 data set This document is designed to illustrate comparisons of methods to use JRR replicate weights sometimes provided
More informationCategorical Data Analysis for Social Scientists
Categorical Data Analysis for Social Scientists Brendan Halpin, Sociological Research Methods Cluster, Dept of Sociology, University of Limerick June 20-21 2016 Outline 1 Introduction 2 Logistic regression
More information3. The lab guide uses the data set cda_scireview3.dta. These data cannot be used to complete assignments.
Lab Guide Written by Trent Mize for ICPSRCDA14 [Last updated: 17 July 2017] 1. The Lab Guide is divided into sections corresponding to class lectures. Each section should be reviewed before starting the
More informationMultilevel Mixed-Effects Generalized Linear Models. Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria
Multilevel Mixed-Effects Generalized Linear Models in aaaa Prof. Dr. Luiz Paulo Fávero Prof. Dr. Matheus Albergaria SUMMARY - Theoretical Fundamentals of Multilevel Models. - Estimation of Multilevel Mixed-Effects
More informationTHE CONTINUING QUANDARY OF SURVEY DATA PART II: Comparison of SAS Procedures and SUDAAN Procedures
THE CONTINUING QUANDARY OF SURVEY DATA PART II: Comparison of SAS Procedures and SUDAAN Procedures Katherine Baisden, SRI International, Menlo Park, California ABSTRACT Once upon a time in the days of
More informationIlle-et-Vilaine case-control study
Ille-et-Vilaine case-control study Cases: 200 males diagnosed in one of regional hospitals in French department of Ille-et-Vilaine (Brittany) between Jan 1972 and Apr 1974 Controls: Random sample of 778
More informationUnit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users
Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users Data Set for this Assignment: Download from the course website: Stata Users: framingham_1000.dta Source: Levy (1999) National
More informationMethods for Multilevel Modeling and Design Effects. Sharon L. Christ Departments of HDFS and Statistics Purdue University
Methods for Multilevel Modeling and Design Effects Sharon L. Christ Departments of HDFS and Statistics Purdue University Talk Outline 1. Review of Add Health Sample Design 2. Modeling Add Health Data a.
More informationWeek 11: Collinearity
Week 11: Collinearity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Regression and holding other
More informationIntroduction to Survey Data Analysis. Focus of the Seminar. When analyzing survey data... Young Ik Cho, PhD. Survey Research Laboratory
Introduction to Survey Data Analysis Young Ik Cho, PhD Research Assistant Professor University of Illinois at Chicago Fall 2008 Focus of the Seminar Data Cleaning/Missing Data Sampling Bias Reduction When
More informationCHAPTER 5 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN
CHAPTER 5 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN GENERAL NOTES ABOUT ANALYSIS EXAMPLES REPLICATION These examples are intended to provide guidance on how to use the commands/procedures for analysis
More informationModule 6 Case Studies in Longitudinal Data Analysis
Module 6 Case Studies in Longitudinal Data Analysis Benjamin French, PhD Radiation Effects Research Foundation SISCR 2018 July 24, 2018 Learning objectives This module will focus on the design of longitudinal
More informationThis is a quick-and-dirty example for some syntax and output from pscore and psmatch2.
This is a quick-and-dirty example for some syntax and output from pscore and psmatch2. It is critical that when you run your own analyses, you generate your own syntax. Both of these procedures have very
More informationECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011
ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each
More informationModeling Contextual Data in. Sharon L. Christ Departments of HDFS and Statistics Purdue University
Modeling Contextual Data in the Add Health Sharon L. Christ Departments of HDFS and Statistics Purdue University Talk Outline 1. Review of Add Health Sample Design 2. Modeling Add Health Data a. Multilevel
More informationFinal Exam Spring Bread-and-Butter Edition
Final Exam Spring 1996 Bread-and-Butter Edition An advantage of the general linear model approach or the neoclassical approach used in Judd & McClelland (1989) is the ability to generate and test complex
More informationLab 1: A review of linear models
Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need
More informationMULTIPLE IMPUTATION. Adrienne D. Woods Methods Hour Brown Bag April 14, 2017
MULTIPLE IMPUTATION Adrienne D. Woods Methods Hour Brown Bag April 14, 2017 A COLLECTIVIST APPROACH TO BEST PRACTICES As I began learning about MI last semester, I realized that there are a lot of guidelines
More informationCase study: Modelling berry yield through GLMMs
Case study: Modelling berry yield through GLMMs Jari Miina Finnish Forest Research Institute (Metla) European NWFPs network Action FP1203 www.nwfps.eu TRAINING SCHOOL Modelling NWFP El Escorial, 29 th
More informationGuideline on evaluating the impact of policies -Quantitative approach-
Guideline on evaluating the impact of policies -Quantitative approach- 1 2 3 1 The term treatment derives from the medical sciences and has more meaning when is used in that context. However, this term
More informationBIO 226: Applied Longitudinal Analysis. Homework 2 Solutions Due Thursday, February 21, 2013 [100 points]
Prof. Brent Coull TA Shira Mitchell BIO 226: Applied Longitudinal Analysis Homework 2 Solutions Due Thursday, February 21, 2013 [100 points] Purpose: To provide an introduction to the use of PROC MIXED
More informationSOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis
SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis In any longitudinal analysis, we can distinguish between analyzing trends vs individual change that is, model
More informationIntroduction to Survey Data Analysis
Introduction to Survey Data Analysis Young Cho at Chicago 1 The Circle of Research Process Theory Evaluation Real World Theory Hypotheses Test Hypotheses Data Collection Sample Operationalization/ Measurement
More information*STATA.OUTPUT -- Chapter 13
*STATA.OUTPUT -- Chapter 13.*small example of rank sum test.input x grp x grp 1. 4 1 2. 35 1 3. 21 1 4. 28 1 5. 66 1 6. 10 2 7. 42 2 8. 71 2 9. 77 2 10. 90 2 11. end.ranksum x, by(grp) porder Two-sample
More informationCount model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa
Count model selection and post-estimation to evaluate composite flour technology adoption in Senegal-West Africa Presented by Kodjo Kondo PhD Candidate, UNE Business School Supervisors Emeritus Prof. Euan
More informationUNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS
UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON4137 Applied Micro Econometrics Date of exam: Thursday, May 31, 2018 Grades are given: June 15, 2018 Time for exam: 09.00 to 12.00 The problem set covers
More informationNested or Hierarchical Structure School 1 School 2 School 3 School 4 Neighborhood1 xxx xx. students nested within schools within neighborhoods
Multilevel Cross-Classified and Multi-Membership Models Don Hedeker Division of Epidemiology & Biostatistics Institute for Health Research and Policy School of Public Health University of Illinois at Chicago
More informationUnit 6: Simple Linear Regression Lecture 2: Outliers and inference
Unit 6: Simple Linear Regression Lecture 2: Outliers and inference Statistics 101 Thomas Leininger June 18, 2013 Types of outliers in linear regression Types of outliers How do(es) the outlier(s) influence
More informationSUGGESTED SOLUTIONS Winter Problem Set #1: The results are attached below.
450-2 Winter 2008 Problem Set #1: SUGGESTED SOLUTIONS The results are attached below. 1. The balanced panel contains larger firms (sales 120-130% bigger than the full sample on average), which are more
More informationHow to reduce bias in the estimates of count data regression? Ashwini Joshi Sumit Singh PhUSE 2015, Vienna
How to reduce bias in the estimates of count data regression? Ashwini Joshi Sumit Singh PhUSE 2015, Vienna Precision Problem more less more bias less 2 Agenda Count Data Poisson Regression Maximum Likelihood
More informationAnalysis of Longitudinal Survey Data
Analysis of Longitudinal Survey Data Introduction to Generalized Estimating Equations with Examples from the ITC Survey Pete Driezen June 13, 2016 Introduction To date, an ITC Survey has been conducted
More informationESS Round 8 Sample Design Data File: User Guide
ESS Round 8 Sample Design Data File: User Guide Peter Lynn INSTITUTE FOR SOCIAL AND ECONOMIC RESEARCH, UNIVERSITY OF ESSEX 07 February 2019 v2 Contents Page Number 1. Introduction 1 2. Variables 2 2.1
More information********************************************************************************************** *******************************
1 /* Workshop of impact evaluation MEASURE Evaluation-INSP, 2015*/ ********************************************************************************************** ******************************* DEMO: Propensity
More informationAnalyzing Repeated Measures and Cluster-Correlated Data Using SUDAAN Release 7.5
Software for the Statistical Analysis of Correlated Data Analyzing Repeated Measures and Cluster-Correlated Data Using SUDAAN Release 7.5 by Gayle S. Bieler gbmac@rti.org Research Triangle Institute and
More informationI am an experienced SAS programmer but I have not used many SAS/STAT procedures
Which Proc Should I Learn First? A STAT Instructor s Top 5 Modeling Procedures Catherine Truxillo, Ph.D. Manager, Analytical Education SAS Copyright 2010, SAS Institute Inc. All rights reserved. The Target
More informationValuation of Lost Productivity (VOLP) questionnaire and outcomes. The paid work productivity loss obtained from the VOLP included three components: 1)
APPENDIX. ADDITIONAL METHODS AND RESULTS Valuation of Lost Productivity (VOLP) questionnaire and outcomes The paid work productivity loss obtained from the VOLP included three components: 1) absenteeism:
More informationPSC 508. Jim Battista. Dummies. Univ. at Buffalo, SUNY. Jim Battista PSC 508
PSC 508 Jim Battista Univ. at Buffalo, SUNY Dummies Dummy variables Sometimes we want to include categorical variables in our models Numerical variables that don t necessarily have any inherent order and
More informationAppendix C: Lab Guide for Stata
Appendix C: Lab Guide for Stata 2011 1. The Lab Guide is divided into sections corresponding to class lectures. Each section includes both a review, which everyone should complete and an exercise, which
More informationC-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING. CAS Annual Meeting November 7-10
1 C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING CAS Annual Meeting November 7-10 GLM Process 2 Data Prep Model Form Validation Reduction Simplification Interactions GLM Process 3 Opportunities
More information