Sensitivity Analysis of Nonlinear Mixed-Effects Models for Longitudinal Data That Are Incomplete


Shelley A. Blozis, University of California, Davis, CA

ABSTRACT

Appropriate applications of methods for missing data are essential to behavioral research. A central problem lies in the relationship between the analytic method used for data analysis and the source of the missing data. Specifically, the problem concerns whether the fact that a response is missing (known as missingness) is ignorable or nonignorable under the plan for analysis. Ignorable missingness means that parameter estimates are unbiased with regard to missing data; under nonignorable missingness, estimates may be biased unless the missingness is addressed. The major types of missing data are briefly reviewed: missing completely at random, missing at random, and missing not at random. Analytic methods appropriate under these different types of missing data are reviewed, with special attention given to methods for data that are missing not at random. Methods are illustrated by an application to longitudinal measures for a sample of individuals diagnosed with multiple sclerosis. SAS PROC NLMIXED is used to perform a sensitivity analysis of parameter estimates of a longitudinal model under different assumptions about the missing data process. This talk assumes the audience is familiar with the application of mixed-effects models for longitudinal data.

INTRODUCTION

Mixed-effects models provide a major framework for the analysis of longitudinal data. The models may be applied to data arising from a variety of distributions. With regard to describing change or growth in a response, these models may rely on various mathematical functions, including those that are linear or nonlinear in their parameters. The times at which responses are recorded may be unique to each individual, and data need not be complete.
With regard to missing data, statistical inference is considered valid if data are missing completely at random (MCAR) or missing at random (MAR). Data are MCAR if whether or not data are missing, known as missingness, is not related to either the missing or the observed data. Complete-case analysis (a.k.a. listwise deletion) is valid under MCAR. Data are MAR if the missingness may be related to the observed data but, given the observed data, is not related to the missing data. Methods that rely on full information maximum likelihood for estimation are valid under MAR (and MCAR). If data are MCAR or MAR, the source of the missing data may be ignored and inference is valid from a mixed-effects model (Laird, 1988). In other words, it is not necessary to include in the data analysis information about the source of the missing data. If data are missing not at random (MNAR), such that the missingness is related to the missing data, then it may be necessary to include this information in the data analysis to draw valid inference (Molenberghs & Kenward, 2007).

Methods that specifically address data that are MNAR have been developed for mixed-effects models. Xu and Blozis (2011) describe a nonlinear mixed-effects selection model. This paper describes a nonlinear mixed-effects pattern-mixture model for longitudinal data that are MNAR and for which a nonlinear function is needed to describe the longitudinal response. A nonlinear mixed-effects model is a model in which random coefficients may enter the model in a nonlinear manner. SAS PROC NLMIXED may be used to obtain maximum likelihood estimates for a nonlinear mixed-effects model that includes a pattern-mixture model to address data that are MNAR. An example is presented that uses an empirical set of longitudinal data.

METHODS FOR DATA THAT ARE MNAR

Major frameworks for data that are MNAR include selection models (which include a shared parameter model as a special case) and pattern-mixture models.
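The three missing-data mechanisms and the two modeling frameworks named above can be stated compactly. The notation in the block below is an assumption introduced for illustration and does not appear in the paper: $Y = (Y_{\mathrm{obs}}, Y_{\mathrm{mis}})$ is the complete response vector for an individual, $R$ is the vector of missingness indicators, and $\theta$ and $\psi$ are the parameters of the response model and the missingness model.

```latex
% Notation assumed for this sketch (not taken from the paper):
% Y = (Y_obs, Y_mis): complete responses; R: missingness indicators;
% theta: response-model parameters; psi: missingness-model parameters.
\begin{align*}
\text{MCAR:}\quad & f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) = f(R \mid \psi) \\
\text{MAR:}\quad  & f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) = f(R \mid Y_{\mathrm{obs}}, \psi) \\
\text{MNAR:}\quad & f(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi)
                    \text{ depends on } Y_{\mathrm{mis}} \text{ even given } Y_{\mathrm{obs}} \\[6pt]
\text{Selection model:}\quad       & f(Y, R \mid \theta, \psi) = f(Y \mid \theta)\, f(R \mid Y, \psi) \\
\text{Pattern-mixture model:}\quad & f(Y, R \mid \theta, \psi) = f(Y \mid R, \theta)\, f(R \mid \psi)
\end{align*}
```

Both frameworks describe the same joint distribution of responses and missingness; they differ only in which conditional distribution is modeled directly.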
Mixed-effects selection models for longitudinal data that are MNAR allow missingness to depend on the longitudinal data (Little, 1995; Wu & Carroll, 1988). Separate models are specified for the longitudinal response and the missingness. The two models are joined, for example, by allowing the missingness to depend on the longitudinal response, which may include both the observed and the missing values (Diggle & Kenward, 1994), or by allowing the missingness to depend on the random coefficients of the longitudinal model at the second level (Wu & Carroll, 1988).

In a pattern-mixture model, the longitudinal response depends on the missingness. In this latter approach, a longitudinal response may be studied according to patterns of missingness. Estimates of the overall fixed effects may be obtained as a weighted average of the effects obtained for each pattern of missing data.

Multiple imputation (MI) using correlates of the missingness or missing data (referred to as auxiliary variables) is another strategy for addressing data that are MNAR. Using MI, two or more imputed data sets are generated from the observed data with the use of auxiliary variables. Data analysis then proceeds on the imputed data using only the variables of the data model. If correlates of the missingness or missing data are informative, then the missingness may be ignorable.

Another approach that also makes use of auxiliary variables is one in which auxiliary variables are included in a data model as correlates of the variables of the data model, such as correlates of predictors or response variables or the residuals that may result from a regression (Graham, 2003). As with MI, if the auxiliary variables provide the needed information about the missingness or missing data, then nonresponse is ignorable.

SENSITIVITY ANALYSIS

For all three data types, MCAR, MAR, and MNAR, it is not possible to empirically test the assumption that the missingness and the missing data are independent. Even under the most flexible formulation of a mixed-effects model in which a missing data process has been addressed in some manner with MNAR assumed, definitive conclusions about the missing data process should not be drawn because the missing data are not available to evaluate such hypotheses. Given this problem, sensitivity analysis has gained popularity as an approach to understanding how parameter estimates and statistical inference may depend on assumptions that are made about a missing data process (Molenberghs & Kenward, 2007). Sensitivity analysis may proceed by considering a model for a longitudinal response under different assumptions about a given missing data problem. Changes in parameter estimates or differences in statistical inference across models that make different assumptions about the missing data process may suggest that the missing data process is important and thus ought to be addressed in the evaluation of the longitudinal response.

PATTERN-MIXTURE NONLINEAR MIXED-EFFECTS MODELS

Hedeker and Gibbons (1998) describe a pattern-mixture random-effects model in which the parameters enter the random-effects model linearly.
In a pattern-mixture random-effects model, the random coefficients at the second level of the longitudinal model depend on indicators of missingness that represent patterns of missing data, such as a pattern of monotonic dropout. The random coefficients depend on these indicators as they would for any fixed person-level covariate, such as sex. A pattern-mixture random-effects model for which the parameters enter the model in a linear manner may be fitted using either PROC MIXED or PROC NLMIXED. PROC MIXED code for fitting a pattern-mixture random-effects model (using maximum likelihood estimation) in which a longitudinal response is assumed to follow a random intercept and slope model with an indicator of dropout is:

    PROC MIXED METHOD=ML;
      CLASS id;
      MODEL y = time dropout time*dropout / SOLUTION;
      RANDOM int time / type=un SUB=id;
    RUN;

The same model may be fitted using PROC NLMIXED:

    proc nlmixed;
      predv = (b0+u0) + (b1+u1)*time + b2*dropout + b3*time*dropout;
      /* MODEL statement added so the step runs; s2e is an assumed name for the residual variance */
      model y ~ normal(predv, s2e);
      random u0 u1 ~ normal([0,0],[s2u0,cu0u1,s2u1]) subject=id;
    run;

A pattern-mixture random-effects model for which one or more of the random coefficients of the longitudinal model enter in a nonlinear manner may be fitted using PROC NLMIXED. PROC NLMIXED code for fitting a pattern-mixture random-effects model in which a longitudinal response is assumed to follow an exponential function with a random coefficient that enters the model in a nonlinear manner is:

    proc nlmixed;
      f0 = b0+u0;   /* level at time = 0 */
      f1 = b1+u1;   /* level approached as time increases (for f2 > 0) */
      f2 = b2+u2;   /* rate parameter */
      predv = f1 - (f1 - f0)*exp(-f2*time);
      /* MODEL statement added so the step runs; s2e is an assumed name for the residual variance */
      model y ~ normal(predv, s2e);
      random u0 u1 u2 ~ normal([0,0,0],[s2u0,
                                        c10,s2u1,
                                        c20,c21,s2u2]) subject=id;
    run;

EXAMPLE

The study of functional limitations over time is essential for the development of interventions that aim to promote health in individuals diagnosed with a chronic illness. Seven annual repeated measures of functional limitations for a sample of n = 606 adults diagnosed with multiple sclerosis are studied here. Higher scores represent greater levels of impairment. About 20% of the sample had a pattern of attrition such that data were available until a certain time point and were missing thereafter. To identify these individuals, an indicator variable was created: $D_i = 1$ for dropouts and $D_i = 0$ for completers. All other patterns of missing data were assumed to be MAR because the sample sizes were too small for reliable tests of these various patterns of missing data.

GROWTH MODELS ASSUMING MAR

A two-parameter exponential function was fitted to the repeated measures of functional limitations, denoted by $y_{ti}$ for individual $i$ at occasion $t$:

$$y_{ti} = \beta_{1i}\exp(-\beta_{2i}\,\mathrm{time}_{ti}) + e_{ti},$$

where, at the second level of the model,

$$\beta_{1i} = \beta_1 + u_{1i}, \qquad \beta_{2i} = \beta_2 + u_{2i},$$

where $\beta_1$ is the level of functional limitations at time = 0, and $\beta_2$ is a weight that, together with the measures of time, governs change in the functional limitations scores. In this model, no information is provided about the missing data. Inference from the model is based on an assumption that the data are MAR. PROC NLMIXED syntax for fitting this exponential growth model is:

    proc nlmixed method=firo;
      f1 = b1 + u1;
      f2 = b2 + u2;
      predv = f1*exp(-f2*time);
      /* MODEL statement added so the step runs; s2e is an assumed name for the residual variance */
      model y ~ normal(predv, s2e);
      random u1 u2 ~ normal([0,0],[s2u1,
                                   c21,s2u2]) subject=id;
    run;

Results from fitting the model to a sample of n = 606 individuals are: [PROC NLMIXED output table]
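The dropout indicator described in the EXAMPLE section has to exist as a variable in the analysis data set before a pattern-mixture model can be fitted. The following is a minimal sketch of one way it might be derived, not the procedure used in the paper. All names are hypothetical: `ms_long` is an assumed long-format input data set with one row per person per wave, `id` identifies the person, `time` is assumed to be coded 0 through 6 for the seven annual waves, `y` is the response, and `ms_d` is the output data set with the indicator `d`.

```sas
/* Hypothetical names: ms_long (input), ms_d (output), id, time, y, d. */
/* Assumes long format and integer wave coding time = 0, 1, ..., 6.    */
%let last_wave = 6;

proc sql;
  create table ms_d as
  select a.*,
         /* d = 1 when the person was observed at every wave up to     */
         /* last_obs and at no wave after it (monotone dropout)        */
         (b.last_obs < &last_wave and b.n_obs = b.last_obs + 1) as d
  from ms_long as a
       inner join   /* persons with no observed response are dropped  */
       (select id,
               max(time) as last_obs,   /* last wave with a response   */
               count(y)  as n_obs       /* number of observed responses */
        from ms_long
        where y is not missing
        group by id) as b
       on a.id = b.id
  order by a.id, a.time;
quit;
```

Under this rule, a person with an intermittent gap is coded `d = 0`, which matches the decision in the text to treat all non-dropout patterns as MAR.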

PATTERN-MIXTURE NONLINEAR MIXED MODEL

An indicator of dropout, $D_i$, may be added to the exponential growth model at the second level as a predictor of each of the two random coefficients to account for participant attrition. The exponential growth model with an indicator of dropout is:

$$y_{ti} = \beta_{1i}\exp(-\beta_{2i}\,\mathrm{time}_{ti}) + e_{ti},$$

where, at the second level of the model,

$$\beta_{1i} = \beta_1 + \beta_{11} D_i + u_{1i}, \qquad \beta_{2i} = \beta_2 + \beta_{21} D_i + u_{2i},$$

where, for those not considered to have dropped from the study, $\beta_1$ is the level of functional limitations at time = 0 and $\beta_2$ is a weight that, together with the measures of time, governs change in the functional limitations scores. The coefficients $\beta_{11}$ and $\beta_{21}$ are the differences in the growth coefficients between those considered to have dropped from the study and those considered to have completed the study. Inference from the model is based on an assumption that the missingness is ignorable conditional on the dropout indicator. PROC NLMIXED syntax for fitting the model is:

    proc nlmixed method=firo;
      f1 = b1 + b11*d + u1;
      f2 = b2 + b21*d + u2;
      predv = f1*exp(-f2*time);
      /* MODEL statement added so the step runs; s2e is an assumed name for the residual variance */
      model y ~ normal(predv, s2e);
      random u1 u2 ~ normal([0,0],[s2u1,
                                   c21,s2u2]) subject=id;
    run;

The corresponding output is: [PROC NLMIXED output table]
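The weighted average over dropout patterns mentioned earlier can be requested within the same PROC NLMIXED step through ESTIMATE statements. With a proportion $\pi$ of dropouts, the pattern-averaged coefficients are $\beta_1 + \pi\beta_{11}$ and $\beta_2 + \pi\beta_{21}$. The sketch below is an illustration under stated assumptions and is not the analysis reported in this paper: the data set name `ms_d`, the residual variance name `s2e`, and the starting values are hypothetical, and the weight 0.20 is taken from the approximate dropout rate given in the EXAMPLE section.

```sas
proc nlmixed data=ms_d method=firo;
  /* Starting values are placeholders, not estimates from the paper */
  parms b1=10 b11=0 b2=0.05 b21=0
        s2u1=1 c21=0 s2u2=0.01 s2e=1;
  f1 = b1 + b11*d + u1;
  f2 = b2 + b21*d + u2;
  predv = f1*exp(-f2*time);
  model y ~ normal(predv, s2e);
  random u1 u2 ~ normal([0,0],[s2u1,
                               c21,s2u2]) subject=id;
  /* Pattern-averaged coefficients: completers weighted 0.80,       */
  /* dropouts weighted 0.20 (assumed proportion, treated as known)  */
  estimate 'avg level at time 0' b1 + 0.20*b11;
  estimate 'avg rate parameter'  b2 + 0.20*b21;
run;
```

The standard errors reported for these ESTIMATE statements treat the weight 0.20 as a fixed constant, so they do not reflect sampling uncertainty in the dropout proportion. Comparing these averaged coefficients with the estimates from the model that assumes MAR is one way to carry out the sensitivity analysis described above.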

Based on the results, the growth coefficients differed between those considered to have dropped from the study and those who did not. Specifically, the level of functional limitations is estimated to be 1.65 points higher on average at the start of the study for those who dropped relative to those who completed the study. Further, the two groups differed by .014 with regard to the change rate parameter.

CONCLUSION

A nonlinear mixed-effects pattern-mixture model provides an approach to the study of how patterns of missing data may be related to particular aspects of a nonlinear growth model for longitudinal data. The approach is analogous to including fixed covariates in the second level of a growth model, making estimation of the model relatively straightforward.

REFERENCES

Diggle, P., & Kenward, M. G. (1994). Informative drop-out in longitudinal data analysis. Applied Statistics, 43.
Graham, J. W. (2003). Adding missing-data-relevant variables to FIML-based structural equation models. Structural Equation Modeling, 10.
Laird, N. M. (1988). Missing data in longitudinal studies. Statistics in Medicine, 7.
Little, R. J. A. (1995). Modeling the drop-out mechanism in longitudinal studies. Journal of the American Statistical Association, 90.
Molenberghs, G., & Kenward, M. G. (2007). Missing Data in Clinical Studies. John Wiley and Sons Ltd, Chichester, UK.
Wu, M. C., & Carroll, R. J. (1988). Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics, 44.
Xu, S., & Blozis, S. A. (2011). Sensitivity analysis of mixed models for incomplete longitudinal data. Journal of Educational and Behavioral Statistics, 36(2).

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Shelley A. Blozis
Enterprise: University of California, Davis
Address: One Shields Avenue
City, State ZIP: Davis, CA

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.