BIO 226: Applied Longitudinal Analysis. Homework 2 Solutions Due Thursday, February 21, 2013 [100 points]

Similar documents
Elementary tests. proc ttest; title3 'Two-sample t-test: Does consumption depend on Damper Type?'; class damper; var dampin dampout diff ;

Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17

The SAS System 1. RM-ANOVA analysis of sheep data assuming circularity 2

Example Analysis with STATA

Foley Retreat Research Methods Workshop: Introduction to Hierarchical Modeling

Read and Describe the SENIC Data

Example Analysis with STATA

Regression diagnostics

Multiple Imputation and Multiple Regression with SAS and IBM SPSS

Nested or Hierarchical Structure School 1 School 2 School 3 School 4 Neighborhood1 xxx xx. students nested within schools within neighborhoods

Multilevel Models. David M. Rocke. June 6, David M. Rocke Multilevel Models June 6, / 19

A SAS Macro to Analyze Data From a Matched or Finely Stratified Case-Control Design

Timing Production Runs

SAS program for Alcohol, Cigarette and Marijuana use for high school seniors:

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011

. *increase the memory or there will problems. set memory 40m (40960k)

Two Way ANOVA. Turkheimer PSYC 771. Page 1 Two-Way ANOVA

Stata Program Notes Biostatistics: A Guide to Design, Analysis, and Discovery Second Edition Chapter 12: Analysis of Variance

Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology

Computer Handout Two

CHAPTER 6 ASDA ANALYSIS EXAMPLES REPLICATION SAS V9.2

B. Kedem, STAT 430 SAS Examples SAS3 ===================== ssh tap sas82, sas <--Old tap sas913, sas <--New Version

Lab 1: A review of linear models

MULTILOG Example #3. SUDAAN Statements and Results Illustrated. Input Data Set(s): IRONSUD.SSD. Example. Solution

Sensitivity Analysis of Nonlinear Mixed-Effects Models for. Longitudinal Data That Are Incomplete

Final Exam Spring Bread-and-Butter Edition

DAY 2 Advanced comparison of methods of measurements

Getting Started with HLM 5. For Windows

Multilevel/ Mixed Effects Models: A Brief Overview

FOLLOW-UP NOTE ON MARKET STATE MODELS

Quantification of Harm -advanced techniques- Mihail Busu, PhD Romanian Competition Council

Getting Started With PROC LOGISTIC

(LDA lecture 4/15/08: Transition model for binary data. -- TL)

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian Two Wave Panel Data Analysis

Bios 312 Midterm: Appendix of Results March 1, Race of mother: Coded as 0==black, 1==Asian, 2==White. . table race white

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved.

APPENDIX 2 Examples of SAS and SUDAAN Programs Combining Respondent and Interval File Data Using SAS

Center for Demography and Ecology

Soci Statistics for Sociologists

Unit 2 Regression and Correlation 2 of 2 - Practice Problems SOLUTIONS Stata Users

SPSS Guide Page 1 of 13

Improving long run model performance using Deviance statistics. Matt Goward August 2011

MNLM for Nominal Outcomes

Module 20 Case Studies in Longitudinal Data Analysis

A Simple Methodology for Customer Classification in Two Dimensions

Correlations. Regression. Page 1. Correlations SQUAREFO BEDROOMS BATHS ASKINGPR

ROBUST ESTIMATION OF STANDARD ERRORS

Categorical Data Analysis

SUDAAN Analysis Example Replication C6

SAS Proc STATESPACE and the forecasting of Bureau of Labor Statistics JOLT and Diffusion Index Martin Jetton, Kronos, Inc, Beaverton OR

EFFICACY OF ROBUST REGRESSION APPLIED TO FRACTIONAL FACTORIAL TREATMENT STRUCTURES MICHAEL MCCANTS

COMPARING MODEL ESTIMATES: THE LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION

ALL POSSIBLE MODEL SELECTION IN PROC MIXED A SAS MACRO APPLICATION

!! NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA ! NOTE: The SAS System used:!

Please respond to each of the following attitude statement using the scale below:

MULTILOG Example #1. SUDAAN Statements and Results Illustrated. Input Data Set(s): DARE.SSD. Example. Solution

Appendix 6: Proposal 6 Supporting Evidence: 2015 AOSA Elymus referee.ppt June 2015

Unit 6: Simple Linear Regression Lecture 2: Outliers and inference

All Possible Mixed Model Selection - A User-friendly SAS Macro Application

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis

CHAPTER 5 RESULTS AND ANALYSIS

Regression Analysis I & II

Module 6 Case Studies in Longitudinal Data Analysis

Paper MPSF-073. Model Validity Checks In Data Mining: A Luxury or A Necessity?

Case study: Modelling berry yield through GLMMs

Multilevel Modeling and Cross-Cultural Research

Problem Points Score USE YOUR TIME WISELY SHOW YOUR WORK TO RECEIVE PARTIAL CREDIT

Applied Econometrics

Week 10: Heteroskedasticity

Apply Response Surface Analysis for Interaction of Dose Response Combine Treatment Drug Study

STAT 350 (Spring 2016) Homework 12 Online 1

Online Appendix to Card, Heining, and Kline (2013)

*STATA.OUTPUT -- Chapter 13

Need Additional Statistics in Your Report? - ODS OUTPUT to the Rescue!

Logistic (RLOGIST) Example #2

Logistic Regression, Part III: Hypothesis Testing, Comparisons to OLS

Chapter 2 Part 1B. Measures of Location. September 4, 2008

Table. XTMIXED Procedure in STATA with Output Systolic Blood Pressure, use "k:mydirectory,

Statistical Modelling for Social Scientists. Manchester University. January 20, 21 and 24, Modelling categorical variables using logit models

5 CHAPTER: DATA COLLECTION AND ANALYSIS

The Multivariate Regression Model

SPSS 14: quick guide

F u = t n+1, t f = 1994, 2005

Bootstrap simulations to estimate overall survival based on the distribution of a historical control

= = Intro to Statistics for the Social Sciences. Name: Lab Session: Spring, 2015, Dr. Suzanne Delaney

GCTA/GREML. Rebecca Johnson. March 30th, 2017

Week 3 [16+ Sept.] Class Activities. File: week-03-15sep08.doc/.pdf Directory: \\Muserver2\USERS\B\\baileraj\Classes\sta402\handouts

Statistical Modelling for Business and Management. J.E. Cairnes School of Business & Economics National University of Ireland Galway.

The Dummy s Guide to Data Analysis Using SPSS

STATISTICS PART Instructor: Dr. Samir Safi Name:

Sylvain Tremblay SAS Canada

Creative Commons Attribution-NonCommercial-Share Alike License

STAT 430 SAS Examples SAS1 ==================== ssh tap sas913 (or sas82), sas

Biostatistics 208 Data Exploration

Statistics: Data Analysis and Presentation. Fr Clinic II

The study obtains the following results: Homework #2 Basics of Logistic Regression Page 1. . version 13.1

Model Building Process Part 2: Factor Assumptions

Analyzing Ordinal Data With Linear Models

Week 11: Collinearity

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014

Transcription:

Prof. Brent Coull TA Shira Mitchell BIO 226: Applied Longitudinal Analysis Homework 2 Solutions Due Thursday, February 21, 2013 [100 points] Purpose: To provide an introduction to the use of PROC MIXED for analyzing data from a single-group repeated measures design. Instructions: 1. For each question requiring data analysis, support your conclusions by including the relevant SAS output in your answer. 2. Include your SAS program as an Appendix to your solutions (PLEASE dont include the output from your program!). 3. Homework should be turned by class on the due date. Joint Modeling of Mean and Covariance in the Dental Growth Study: In a study of dental growth, measurements of the distance (mm) from the center of the pituitary gland to the pteryomaxillary fissure were obtained on 11 girls and 16 boys at ages 8, 10, 12 and 14 years (Potthoff and Roy, 1964). The data are in the file dental.txt on the course web page. Each row of the data set contains the following six variables: subject ID, gender ( F or M ), and the measurements at ages 8, 10, 12 and 14 years, respectively. For all analyses in this homework, subset the dataset to contain only the measurements for the girls. PART A: Descriptive Analyses Problem 1 [15 points: 5 for plot, 10 for description that includes mention of linearity and increasing. Minus 2 if person doesn t mention linearity, and minus 2 if doesn t mention increasing trend.] Plot the observed trajectories of distance for the 11 girls (all on one plot). Briefly, describe the important patterns that are apparent in the plot. Measurements seems to increase over time for all eleven girls in the study at approximately the same rate. data dental1; infile dental.txt ; input id gender $ y8 y10 y12 y14; data girls1; set dental1; 1

if gender="f"; data dental2; set dental1; age=8; age8 = 0; agecat=8; y=y8; output; age=10; age8 = 2; agecat=10; y=y10; output; age=12; age8 = 4; agecat=12; y=y12; output; age=14; age8 = 6; agecat=14; y=y14; output; keep id gender age agecat age8 y; data girls2; set dental2; if gender="f"; proc gplot data=girls2; symbol1 interpol=join value=dot; symbol2 interpol=join value=triangle; plot y*age=id /haxis=axis1 vaxis=axis2; axis1 label=("years"); axis2 label=("growth"); title Growth against Age for Each Girl ; 2

3

Problem 2 [15 points: 5 for statistics, 5 for plot, 5 for comment.] Obtain descriptive statistics for the distances in girls including means and standard deviations at each measurement occasion. Plot the means against time and comment on the shape of their trajectory. Also, comment on how the standard deviations are changing with time. proc means data=girls2 n mean std var nway; var y; class gender age; output out=meandata mean=mngrowth; ---------------------------------------------------------------------------- The MEANS Procedure Analysis Variable : y N gender age Obs N Mean Std Dev Variance ------------------------------------------------------------------------------- F 8 11 11 21.1818182 2.1245320 4.5136364 10 11 11 22.2272727 1.9021519 3.6181818 12 11 11 23.0909091 2.3645103 5.5909091 14 11 11 24.0909091 2.4373980 5.9409091 ------------------------------------------------------------------------------- ----------------------------------------------------------------------------- proc gplot data=meandata; symbol1 color=red interpol=join value=dot; symbol2 interpol=join value=triangle; plot mngrowth*age=gender; title Mean Growth against Age ; The mean growth increases over time in a roughly linear way. The standard deviations appear relatively constant over time, with only a slight increase. 4

5

Problem 3 [15 points: 5 for covariances, 5 for correlations, 5 for comments on correlations.] Obtain the variancecovariance and correlation matrices for the repeated measurements over time. Briefly, describe the important patterns in the correlations. proc corr data=girls1 cov; by gender; var y8 y10 y12 y14; Covariance Matrix, DF = 10 y8 y10 y12 y14 y8 4.513636364 3.354545455 4.331818182 4.356818182 y10 3.354545455 3.618181818 4.027272727 4.077272727 y12 4.331818182 4.027272727 5.590909091 5.465909091 y14 4.356818182 4.077272727 5.465909091 5.940909091 ----------------------------------------------------------------------------------------- Pearson Correlation Coefficients, N = 11 Prob > r under H0: Rho=0 y8 y10 y12 y14 y8 1.00000 0.83009 0.86231 0.84136 0.0016 0.0006 0.0012 y10 0.83009 1.00000 0.89542 0.87942 0.0016 0.0002 0.0004 y12 0.86231 0.89542 1.00000 0.94841 0.0006 0.0002 <.0001 y14 0.84136 0.87942 0.94841 1.00000 0.0012 0.0004 <.0001 The correlations are roughly constant across time. They are quite high and positive, at above 0.8. Problem 4 [10 points] Identify one unexpected feature of this correlation structure generated in question 3. Provide a possible reason for this unexpected feature. The correlation does not really seem to decrease over time. Note that the y8 and y10 correlation is lower than the y8 and y12 correlation. Part of the reason may be small sample sizes, which would mean that the certainty about these correlations is low. 6

PART B: Repeated Measures Models Problem 5 [20 points: 5 for fitting correct models, 5 for each part a,b,c] Use PROC MIXED to fit a repeated measures model in which there is separate parameter for the mean distance at each of the four ages [HINT: Use the NOINT option on the MODEL statement]. Use an unstructured variance-covariance structure. For estimating parameters, fit the model using (i) the METHOD=ML option and (ii) the METHOD=REML option on the PROC MIXED statement. title ML method ; proc mixed data=girls2 method=ml; class id agecat age; model y=agecat / noint chisq s; /* NOINT - excludes fixed-effect intercept from model */ repeated age / type=un subject=id r rcorr; title; ---------------------------------------------------------------------------------- Estimated R Matrix for id 1 Row Col1 Col2 Col3 Col4 1 4.1033 3.0496 3.9380 3.9607 2 3.0496 3.2893 3.6612 3.7066 3 3.9380 3.6612 5.0826 4.9690 4 3.9607 3.7066 4.9690 5.4008 Estimated R Correlation Matrix for id 1 Row Col1 Col2 Col3 Col4 1 1.0000 0.8301 0.8623 0.8414 2 0.8301 1.0000 0.8954 0.8794 3 0.8623 0.8954 1.0000 0.9484 4 0.8414 0.8794 0.9484 1.0000 Standard Effect agecat Estimate Error DF t Value Pr > t agecat 8 21.1818 0.6108 11 34.68 <.0001 agecat 10 22.2273 0.5468 11 40.65 <.0001 agecat 12 23.0909 0.6797 11 33.97 <.0001 agecat 14 24.0909 0.7007 11 34.38 <.0001 ----------------------------------------------------------------------------------- title REML method ; proc mixed data=girls2; class id agecat age; 7

model y=agecat / noint chisq s; repeated age / type=un subject=id r rcorr; title; ---------------------------------------------------------------------------------- Estimated R Matrix for id 1 Row Col1 Col2 Col3 Col4 1 4.5136 3.3545 4.3318 4.3568 2 3.3545 3.6182 4.0273 4.0773 3 4.3318 4.0273 5.5909 5.4659 4 4.3568 4.0773 5.4659 5.9409 Estimated R Correlation Matrix for id 1 Row Col1 Col2 Col3 Col4 1 1.0000 0.8301 0.8623 0.8414 2 0.8301 1.0000 0.8954 0.8794 3 0.8623 0.8954 1.0000 0.9484 4 0.8414 0.8794 0.9484 1.0000 Standard Effect agecat Estimate Error DF t Value Pr > t agecat 8 21.1818 0.6406 11 33.07 <.0001 agecat 10 22.2273 0.5735 11 38.76 <.0001 agecat 12 23.0909 0.7129 11 32.39 <.0001 agecat 14 24.0909 0.7349 11 32.78 <.0001 (a) Does either method of estimation give the same estimates of the variances and covariances as in your answer to question 3? If so, which one or ones? REML gives the same estimates of the variances and covariances as in question 3. ML does not. (b) Compare the variances and covariances for the two methods of estimation (i.e. ML versus REML). If they are not the same, comment on why they are different and state which method should be preferred and why? REML gives a less biased estimate of the variances and covariances than ML because it takes into account the fact that β is also estimated. Thus, REML gives a preferred estimate for the variances and covariances. (c) Compare the estimated betas in the regression model to the sample means calculated in question 2. Provide a reason for any relationship you notice between these two sets of estimates. The estimated βs are the same for both ML and REML, and both are identical to the sample means because we fit the saturated model to the data. 8

Problem 6 [25 points: 10 for part a, 10 for part b, and 5 for part c] Using METHOD=ML and an unstructured variance-covariance structure, fit a model which assumes that the mean distance changes linearly with time. proc mixed data=girls2 method=ml; class id agecat age; model y=agecat / noint chisq s; /* NOINT - excludes fixed-effect intercept from model */ repeated age / type=un subject=id r rcorr; ----------------------------------------------------------------------------- Fit Statistics -2 Log Likelihood 130.5 ---------------------------------------------------------------------------------- proc mixed data=girls2 method=ml; class id age; model y=agecat / chisq s; repeated age / type=un subject=id r rcorr; --------------------------------------------------------------------------------------- Fit Statistics -2 Log Likelihood 130.6 (a) Why can this model be considered to be nested within the model fitted in question 5? As in slide 28 Lab 2, E[growth ij ] = β c 0I(age ij = 8) + β c 1I(age ij = 10) + β c 2I(age ij = 12) + β c 3I(age ij = 14) E[growth ij ] = β 0 + β 1 age ij at age 8 E[growth] = β c 0 = β 0 + 8β 1 at age 10 E[growth] = β c 1 = β 0 + 10β 1 at age 12 E[growth] = β c 2 = β 0 + 12β 1 at age 14 E[growth] = β c 3 = β 0 + 14β 1 so we see that model 2 is a special case of model 1. Model 2 is nested in model 1. (b) By undertaking a likelihood ratio test, evaluate the goodness of fit of this model compared with the model fitted (with METHOD=ML) in question 5. Which model do you prefer and why? As in slide 29 Lab 2, χ 2 2 = 2(log ˆL full log ˆL reduced ) = ( 2 log ˆL reduced ) ( 2 log ˆL full ) = 130.6 130.5 = 0.1 9

DATA pvalues; chsq = SDF( chisquare,0.1,2); RUN; PROC PRINT DATA=pvalues; RUN; ------------------------------------------------------------------------- Obs chsq 1 0.95123 Thus, we do not reject the linear model. We prefer the simpler linear model because it fits well. (c) Why should you not use the REML log likelihood to compare models in this question? REML should not be used to compare different mean functions because the penalty term in REML depends on the regression mean model specification. See Lecture 5, slide 25. 10