MULTILOG Example #3. SUDAAN Statements and Results Illustrated. Input Data Set(s): IRONSUD.SSD. Example. Solution

Similar documents
MULTILOG Example #1. SUDAAN Statements and Results Illustrated. Input Data Set(s): DARE.SSD. Example. Solution

Analyzing Repeated Measures and Cluster-Correlated Data Using SUDAAN Release 7.5

Logistic (RLOGIST) Example #2

LOGLINK Example #2. Using the 2006 National Health Interview Survey (NHIS), Predict Self-Reported Doctor s Visits During the Past 2 Weeks.

Logistic (RLOGIST) Example #9

CROSSTAB Example #8. This example illustrates the variety of hypotheses and test statistics now available on the TEST statement in CROSSTAB.

SUDAAN Analysis Example Replication C6

Logistic (RLOGIST) Example #4

CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN

CHAPTER 11 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN

THE CONTINUING QUANDARY OF SURVEY DATA PART II: Comparison of SAS Procedures and SUDAAN Procedures

APPENDIX 2 Examples of SAS and SUDAAN Programs Combining Respondent and Interval File Data Using SAS

SUDAAN Analysis Example Replication C5

CHAPTER 5 ASDA ANALYSIS EXAMPLES REPLICATION-SUDAAN

THE QUANDARY OF SURVEY DATA: Comparison of SAS Procedures and SUDAAN Procedures

CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2

SAS program for Alcohol, Cigarette and Marijuana use for high school seniors:

A Survey on Survey Statistics: What is done, can be done in Stata, and what s missing?

Getting Started with HLM 5. For Windows

The SAS System 1. RM-ANOVA analysis of sheep data assuming circularity 2

Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved.

Improving long run model performance using Deviance statistics. Matt Goward August 2011

Calculating Absolute Rate Differences and Relative Rate Ratios in SAS/SUDAAN and STATA. Ashley Hirai, PhD May 20, 2014

Analysis of Longitudinal Survey Data

STATISTICS PART Instructor: Dr. Samir Safi Name:

Foley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling

BIO 226: Applied Longitudinal Analysis. Homework 2 Solutions Due Thursday, February 21, 2013 [100 points]

Module 7: Multilevel Models for Binary Responses. Practical. Introduction to the Bangladesh Demographic and Health Survey 2004 Dataset.

Multilevel Models. David M. Rocke. June 6, David M. Rocke Multilevel Models June 6, / 19

Getting Started With PROC LOGISTIC

Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models

SAS Log. 1 The SAS System 17:05 Friday, January 5, 2001

Telecommunications Churn Analysis Using Cox Regression

Computer Handout Two

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway.

CREDIT RISK MODELLING Using SAS

Center for Demography and Ecology

Never Smokers Exposure Case Control Yes No

Examples of Statistical Methods at CMMI Levels 4 and 5

Multilevel/ Mixed Effects Models: A Brief Overview

SPSS 14: quick guide

Unit 5 Logistic Regression Homework #7 Practice Problems. SOLUTIONS Stata version

Bios 312 Midterm: Appendix of Results March 1, Race of mother: Coded as 0==black, 1==Asian, 2==White. . table race white

MCUAAAR Workshop. Structural Equation Modeling with AMOS Part I: Basic Modeling and Measurement Applications

A SAS Macro to Analyze Data From a Matched or Finely Stratified Case-Control Design

Application Information Bulletin: Lymphocyte Subset Analysis

Using Weights in the Analysis of Survey Data

Final Exam Spring Bread-and-Butter Edition

Elementary tests. proc ttest; title3 'Two-sample t-test: Does consumption depend on Damper Type?'; class damper; var dampin dampout diff ;

Example Analysis with STATA

Example Analysis with STATA

Using Stata 11 & higher for Logistic Regression Richard Williams, University of Notre Dame, Last revised March 28, 2015

Introduction to Survey Data Analysis

Small Business advice seeking behaviour technical report. An analysis of the 2018 small business legal need survey July 2018

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference

RESIDUAL FILES IN HLM OVERVIEW

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

Logistic Regression Analysis

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN

Latent Growth Curve Analysis. Daniel Boduszek Department of Behavioural and Social Sciences University of Huddersfield, UK

Segmentation, Targeting and Positioning in the Diaper Market

Advanced Analytics through the credit cycle

Failure to take the sampling scheme into account can lead to inaccurate point estimates and/or flawed estimates of the standard errors.

Multiple Imputation and Multiple Regression with SAS and IBM SPSS

R-SQUARED RESID. MEAN SQUARE (MSE) 1.885E+07 ADJUSTED R-SQUARED STANDARD ERROR OF ESTIMATE

CHAPTER 10 ASDA ANALYSIS EXAMPLES REPLICATION IVEware

Lab 1: A review of linear models

Industrial organisation and procurement in defence firms: An economic analysis of UK aerospace markets.

How to reduce bias in the estimates of count data regression? Ashwini Joshi Sumit Singh PhUSE 2015, Vienna

Introduction to Survey Data Analysis. Linda K. Owens, PhD. Assistant Director for Sampling & Analysis

1. Understand & evaluate survey. What is survey data? When analyzing survey data... General information. Focus of the webinar

Survey Analysis: Options for Missing Data

Introduction to Survey Data Analysis. Focus of the Seminar. When analyzing survey data... Young Ik Cho, PhD. Survey Research Laboratory

Nested or Hierarchical Structure School 1 School 2 School 3 School 4 Neighborhood1 xxx xx. students nested within schools within neighborhoods

GETTING STARTED WITH PROC LOGISTIC

Visits to a general practitioner

Dealing with missing data in practice: Methods, applications, and implications for HIV cohort studies

Modeling Contextual Data in. Sharon L. Christ Departments of HDFS and Statistics Purdue University

Timing Production Runs

Tutorial Segmentation and Classification

A Methodological Note on a Stochastic Frontier Model for the Analysis of the Effects of Quality of Irrigation Water on Crop Yields

GETTING STARTED WITH PROC LOGISTIC

Economic impact of Water users cooperatives: Institutional and economic dynamics in Cauvery Basin, India

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use "k:mydirectory,

Categorical Data Analysis

Are Energy Audits Worth It? Teasing Apart the Role of Audits in Driving Customer Efficiency Actions

CHAPTER FIVE CROSSTABS PROCEDURE

White Paper. AML Customer Risk Rating. Modernize customer risk rating models to meet risk governance regulatory expectations

ROBUST ESTIMATION OF STANDARD ERRORS

PhD Student (presenting): Ian Gregory.

CHAPTER 4 EXAMPLES: EXPLORATORY FACTOR ANALYSIS

FOLLOW-UP NOTE ON MARKET STATE MODELS

An Application of Categorical Analysis of Variance in Nested Arrangements

********************************************************************************************** *******************************

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users

(LDA lecture 4/15/08: Transition model for binary data. -- TL)

Advanced Tutorials. SESUG '95 Proceedings GETTING STARTED WITH PROC LOGISTIC

B. Kedem, STAT 430 SAS Examples SAS3 ===================== ssh tap sas82, sas <--Old tap sas913, sas <--New Version

Transcription:

MULTILOG Example #3 SUDAAN Statements and Results Illustrated REFLEVEL CUMLOGIT option SETENV LEVELS WEIGHT Input Data Set(s): IRONSUD.SSD Example Using data from the NHANES I and its Longitudinal Follow-up Study, model the effects of body iron stores, age group, and smoking status on follow-up cancer status. Compare the results using the default reference cells for categorical covariates against the same model formulated with the first level of each categorical variable defined as the reference level (using the REFLEVEL statement). Solution The following example comes from the NHANES I Study and its Longitudinal Follow-up Study conducted 10 years later. In this analysis, we wish to determine whether follow-up cancer status (CANCER12, 1=yes, 2=no) is associated with a measure of body iron stores at the initial exam (B_TIBC, total iron-binding capacity), while adjusting for age group at initial exam (AGEGROUP, 1=20-49, 2=50+) and smoking status (SMOKE, 1=current, 2=former, 3=never, 4=unknown). First, we supply results with the default reference cells, which is the last level of each categorical covariate that appears on the SUBGROUP statement (i.e., SMOKE=4 and AGEGROUP=2). The program was run in SAS-Callable SUDAAN, and the SAS program and *.LST files are provided. Page 1 of 6

Exhibit 1. SAS-Callable MULTILOG Code (Default Reference Cells) LIBNAME IN V604 "C:\Program Files\SUDAAN\SUDAAN10\data"; PROC MULTILOG DATA=IN.IRONSUD FILETYPE=SAS DESIGN=WR DEFT2; NEST Q_STRATA PSU1; WEIGHT B_WTIRON; SUBGROUP CANCER12 AGEGROUP SMOKE; LEVELS 2 2 4; MODEL CANCER12 = B_TIBC AGEGROUP SMOKE / CUMLOGIT; SETENV COLSPCE=1 LABWIDTH=25 COLWIDTH=8 DECWIDTH=4; PRINT BETA="BETA" SEBETA="S.E." DEFT="DESIGN EFFECT" T_BETA="T:BETA=0" P_BETA="P-VALUE" DF WALDCHI WALDCHP / T_BETAFMT=F8.2 DEFTFMT=F6.2 WALDCHIFMT=F8.2 DFFMT=F8.0; RTITLE "Default Reference Cell Model"; CANCER12 is the outcome variable in the model, while B_TIBC, AGEGROUP and SMOKE are covariates (Exhibit 1). In MULTILOG, the categorical response variable and all covariates that are to be modeled as categorical must appear on the SUBGROUP/LEVELS or CLASS statement. The SETENV and PRINT statements are optional. The PRINT statement is used in this example to request individual statistics of interest, to change default labels for those statistics, and to specify a variety of formats for those printed statistics. Without the PRINT statement, default statistics are produced from each PRINT group, with default formats. The SETENV statement sets up default formats for printed statistics and further manipulates the printout to the needs of the user. Page 2 of 6

Exhibit 2. First Page of MULTILOG Output S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute November 2011 Release 11.0.0 DESIGN SUMMARY: Variances will be computed using the Taylor Linearization Method, Assuming a With Replacement (WR) Design Sample Weight: B_WTIRON Stratification Variables(s): Q_STRATA Primary Sampling Unit: PSU1 Independence parameters have converged in 6 iterations Number of observations read : 3290 Weighted count: 40570323 Observations used in the analysis : 3290 Weighted count: 40570323 Denominator degrees of freedom : 35 Maximum number of estimable parameters for the model is 6 File IN.IRONSUD contains 67 Clusters 67 clusters were used to fit the model Maximum cluster size is 111 records Minimum cluster size is 15 records Sample and Population Counts for Response Variable CANCER12 Based on observations used in the analysis 1: Sample Count 232 Population Count 1745695 2: Sample Count 3058 Population Count 38824628-2 * Normalized Log-Likelihood with Intercepts Only : 1167.64-2 * Normalized Log-Likelihood Full Model : 1026.95 Approximate Chi-Square (-2 * Log-L Ratio) : 140.70 Degrees of Freedom : 5 Note: The approximate Chi-Square is not adjusted for clustering. Refer to hypothesis test table for adjusted test. Exhibit 2 indicates that there are 3,290 records on the file, corresponding to 67 clusters, with a minimum and maximum cluster size of 15 and 111, respectively. There are no missing values in the data set and no SUBPOPN statement to subset the analysis, so all observations on the file are used in fitting the model. SUDAAN displays the weighted and unweighted frequency distributions of the response in the data and the number of iterations needed to estimate the regression coefficients. Page 3 of 6

Exhibit 3. Estimated Regression Coefficients (Default Reference Cells) Default Reference Cell Model ------------------------------------------------------------------------------ CANCER12 (cum-logit), Independent Variables DESIGN and Effects BETA S.E. EFFECT T:BETA=0 P-VALUE ------------------------------------------------------------------------------ CANCER12 (cum-logit) Intercept 1-0.8618 0.6605 0.94-1.30 0.2004 TOTAL IRON BINDING CAPACITY -0.0024 0.0018 1.10-1.29 0.2052 Age Cohort 20-49 yrs. -2.2525 0.3343 1.89-6.74 0.0000 50+ yrs. 0.0000 0.0000... Smoking Status Current -0.5858 0.2771 0.77-2.11 0.0417 Former -0.9418 0.2922 0.84-3.22 0.0027 Never -0.4998 0.2743 0.85-1.82 0.0770 Unknown 0.0000 0.0000... ------------------------------------------------------------------------------ In this analysis (Exhibit 3), each smoking group is automatically compared to the unknown smoking status (SMOKE=4), which may not be very meaningful. Exhibit 4. ANOVA Table Default Reference Cell Model ---------------------------------------------------------- Contrast Degrees P-value of Wald Wald Freedom ChiSq ChiSq ---------------------------------------------------------- OVERALL MODEL 6 708.28 0.0000 MODEL MINUS INTERCEPT 5 64.47 0.0000 B_TIBC 1 1.67 0.1967 AGEGROUP 1 45.39 0.0000 SMOKE 3 10.60 0.0141 ---------------------------------------------------------- In the ANOVA table (Exhibit 4), we see that age group and smoking status are significantly associated with follow-up cancer status, but total iron-binding capacity is not (p=0.1967). Using the REFLEVEL Statement Next, using the REFLEVEL statement, we redefine the reference cells to be the first level of each categorical variable. Note that the only differences in the results are in the estimates of the regression Page 4 of 6

coefficients, where the expected value of the response for each level of the categorical covariate(s) is now compared to the user-specified first level instead of the last. The main effects tests remain unchanged. Exhibit 5. SAS-Callable MULTILOG Code (REFLEVEL Statement) PROC MULTILOG DATA="IN.IRONSUD" FILETYPE=SAS DESIGN=WR DEFT2; NEST Q_STRATA PSU1; WEIGHT B_WTIRON; REFLEVEL AGEGROUP=1 SMOKE=1; SUBGROUP CANCER12 AGEGROUP SMOKE; LEVELS 2 2 4; MODEL CANCER12 = B_TIBC AGEGROUP SMOKE / CUMLOGIT; SETENV COLSPCE=1 LABWIDTH=25 COLWIDTH=8 DECWIDTH=4; PRINT BETA="BETA" SEBETA="S.E." DEFT="DESIGN EFFECT" T_BETA="T:BETA=0" P_BETA="P-VALUE" DF WALDCHI WALDCHP / T_BETAFMT=F8.2 DEFTFMT=F6.2 WALDCHIFMT=F8.2 DFFMT=F8.0; RTITLE "Using the REFLEVEL Statement"; Exhibit 6. Estimated Regression Coefficients (REFLEVEL Statement) Using the REFLEVEL Statement -------------------------------------------------------------------- CANCER12 (cum-logit), Independent Variables DESIGN and Effects BETA S.E. EFFECT T:BETA=0 P-VALUE -------------------------------------------------------------------- CANCER12 (cum-logit) Intercept 1-3.7002 0.6967 1.06-5.31 0.0000 TOTAL IRON BINDING CAPACITY -0.0024 0.0018 1.10-1.29 0.2052 Age Cohort 20-49 yrs. 0.0000 0.0000... 50+ yrs. 2.2525 0.3343 1.89 6.74 0.0000 Smoking Status Current 0.0000 0.0000... Former -0.3560 0.2716 1.16-1.31 0.1985 Never 0.0860 0.2500 1.26 0.34 0.7330 Unknown 0.5858 0.2771 0.77 2.11 0.0417 -------------------------------------------------------------------- Now each smoking group is compared to the current smokers (SMOKE=1), and we see immediately that current smokers are not significantly different from former smokers (p=0.1985), nor from those who have never smoked (p=0.7330). These results are displayed in Exhibit 6, above. Page 5 of 6

Exhibit 7. ANOVA Table Using the REFLEVEL Statement ---------------------------------------------------- Contrast Degrees P-value of Wald Wald Freedom ChiSq ChiSq ---------------------------------------------------- OVERALL MODEL 6 708.28 0.0000 MODEL MINUS INTERCEPT 5 64.47 0.0000 B_TIBC 1 1.67 0.1967 AGEGROUP 1 45.39 0.0000 SMOKE 3 10.60 0.0141 ---------------------------------------------------- As noted earlier in this example, the tests of main effects are the same no matter which groups are designated as the reference cells. Page 6 of 6