Topics in Testing Mediation Models: Power, Confounding, and Bias


Topics in Testing Mediation Models: Power, Confounding, and Bias

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By Robert Arthur Agler, Graduate Program in Psychology, The Ohio State University, 2015

Dissertation Committee: Dr. Paulus De Boeck (Advisor), Dr. Robert Cudeck, Dr. Andrew Hayes, Dr. Duane Wegener

Copyrighted by Robert Arthur Agler 2015

Abstract

In this dissertation we consider different statistical methodologies to be employed at all stages of testing mediation claims. We begin by examining the relative performance of various methods of testing direct and indirect effects, both in terms of statistical power and the risk of Type I errors. Specifically, we compare a normal-theory approach to testing direct and indirect effects (Sobel, 1982), using either regression or structural equation models with different estimation methods, to bootstrapping techniques (Efron, 2003). We then discuss factor models as an alternative to mediation models for cases where they make conceptual sense, and as a method of examining worst-case confounding scenarios. We present formulae that describe the relationships between the two models, and investigate the use of structural equation modeling as a way to distinguish between them. Finally, we investigate the utility of fungible weights (Waller, 2008) for examining parameter sensitivity in mediation. Fungible weights describe the dependent variable almost as well as the optimal weights, yet may be quite discrepant from the optimal weights and suggest alternative interpretations. We also provide a function to facilitate their use.

Acknowledgments

I do not believe that I can adequately express my gratitude for the opportunities and support I have been given by my friends, family, and colleagues. Specifically, I wish to thank my advisor Dr. Paulus De Boeck for the opportunity to study quantitative psychology, my parents for always believing in me, and my girlfriend for being by my side through this process. I have come further and overcome far more than I could have ever believed before I began my schooling, and it is because of the many chances and words of encouragement that my friends and family have given me.

Vita

B.S. Psychology, James Madison University
M.A. Social Psychology, The Ohio State University
Graduate Fellow, Department of Psychology, The Ohio State University
present: Graduate Teaching Associate, Department of Psychology, The Ohio State University

Publications

Arkin, R. M., & Agler, R. A. (2012). Focus on individual differences: A throwback and a throw down. PsycCRITIQUES, 57(23).

Carroll, P. J., Agler, R. A., & Newhart, D. W. (2015). Beyond cause to consequence: The road from possible to core self-revision. Self and Identity, 14(4).

Hayes, A. F., & Agler, R. A. (2014). On the standard error of the difference between independent regression coefficients in moderation analysis. Multiple Linear Regression Viewpoints, 40(2).

Fields of Study

Major Field: Psychology

Table of Contents

Abstract
Acknowledgments
Vita
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Relative Performance of Methods of Testing Mediation Effects
Chapter 3: Factor Model as an Alternative Explanation
Chapter 4: Testing a Factor Model against a Mediation Model
Chapter 5: Fungible Weights in Mediation
Chapter 6: General Discussion
References
Appendix A: Full Results for Chapter
Appendix B: Formulas for Converting Correlations to Factor Loadings, One Factor
Appendix C: Formulas for Converting Regression Weights to Factor Loadings, One Factor
Appendix D: Formulas for Converting Correlations to Factor Loadings, Two Factors
Appendix E: Fungible Mediation Function
Appendix F: Fungible Mediation Example

List of Tables

Table 1. Power for testing the direct effect for all methods, collapsed across all effect size combinations.
Table 2. Type I error rates when testing the direct effect for all methods, collapsed across all effect size combinations.
Table 3. Power for testing the indirect effect for all methods, collapsed across all effect size combinations.
Table 4. Type I error rates when testing the indirect effect for all methods, collapsed across all effect size combinations.
Table 5. Sample correlation and regression coefficients based on vector angles and lengths, for four select cases, and vector lengths of 0.8 and
Table 6. Comparison of model fit statistics for the models we estimate here. EL = equal loadings. CE = correlated errors. LL = lag-lag from the latent variable at t0 to the one at t
Table 7. Regression results for predictors of the fungible interval of the direct and indirect effects in the single mediator case.
Table 8. Regression results for the predictors of the fungible interval of the direct and indirect effects. The results for the indirect effect ab2 are not shown, but the results are of the same nature for ab2, excepting that the effects are related to , , and rather than , , and .
Table 9. Results presented in Chapter 3, based on possible factor space angles and vector lengths of .8 for a given combination of mediation results.
Table 10. Results presented in Chapter 3, based on possible factor space angles and vector lengths of .5 for a given combination of mediation results.

List of Figures

Figure 1. ROC curve for the direct effect and N = 50, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.
Figure 2. ROC curve for the direct effect and N = 100, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.
Figure 3. ROC curve for the direct effect and N = 200, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.
Figure 4. ROC curve for the indirect effect and N = 50, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.
Figure 5. ROC curve for the indirect effect and N = 100, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.
Figure 6. ROC curve for the indirect effect and N = 200, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.
Figure 7. Difference between the standard deviation of ab estimates across all replications for a given condition, compared to the observed standard deviation of the replications. Calculated as the mean estimated SE for regression across all replications − SD of replications.
Figure 8. Difference between the standard deviation of ab estimates across all replications for a given condition, compared to the observed standard deviation of the replications. Calculated as the mean estimated SE for SEM with ML across all replications − SD of replications.
Figure 9. Mean of the mean for bootstrap resamples across all replications, compared to the mean of all replications.
Figure 10. Median of the median for bootstrap resamples across all replications, compared to the median of all replications.
Figure 11. Mean of the skew for bootstrap resamples across all replications, compared to the skew of all replications.
Figure 12. Mean of the kurtosis for bootstrap resamples across all replications, compared to the kurtosis of all replications.
Figure 13. Mean of the standard deviation for bootstrap resamples across all replications, compared to the standard deviation of all replications.
Figure 14. Difference between the mean ab across all replications for a given condition, compared to the mean ab from each bootstrapped distribution. Calculated as the mean of the means of the bootstrapped distributions − mean of replications.
Figure 15. Difference between the median ab across all replications for a given condition, compared to the median ab from each bootstrapped distribution. Calculated as the median of the medians of the bootstrapped distributions − median of replications.
Figure 16. Difference between the skew of the distribution of the estimated values of ab for a given condition across all replications, compared to the mean skew from each bootstrapped distribution. Calculated as the mean skew of the bootstrapped distributions − skew of replications.
Figure 17. Difference between the kurtosis of the distribution of the estimated values of ab for a given condition across all replications, compared to the mean kurtosis from each bootstrapped distribution. Calculated as the mean kurtosis of the bootstrapped distributions − kurtosis of replications.
Figure 18. Difference between the standard deviation of the distribution of the estimated values of ab for a given condition across all replications, compared to the mean standard deviation from each bootstrapped distribution. Calculated as the mean standard deviation of the bootstrapped estimates − standard deviation of replications.

Figure 19. One-factor longitudinal model. X0, M1, and Y2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.
Figure 20. Affect circumplex (taken with permission from Yik, Russell, & Steiger, 2011).
Figure 21. Sample mediation triangle placed within factorial space. The three variables from the core affect example are represented as vectors. The specific situation shown is the case where the vector length of each variable is .8, X and Y are orthogonal, and the mediator is 45° from both X (frustrated) and Y (depressed). In this case, X and Y are uncorrelated, and the mediator is correlated with both X and Y at r = .45. Variable labels are somewhat arbitrary and for illustrative purposes only.
Figure 22. Heat map of the calculated indirect and direct effects when rXY = 0. Values vary as a function of the magnitude of rXM and rMY.
Figure 23. Calculated direct effects in a shared factor space. The x-axis represents the magnitude of angle MY as a proportion of angle XY. For example, for XY = 60° and proportion = .33 (1/3), XM = 20° and MY = 40°.
Figure 24. Calculated indirect effects in a shared factor space. The x-axis represents the magnitude of MY as a proportion of XY. For example, for XY = 60° and proportion = .33 (1/3), XM = 20° and MY = 40°.
Figure 25. One-factor longitudinal model. X0, M1, and Y2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.

Figure 26. Plot of fungible weights with three predictors when all variable correlations are r = .5 and = .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c = .25. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.
Figure 27. Plot of fungible weights with three predictors when all variable correlations are r = .3 and = .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c = .19. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.
Figure 28. Comparison of fungible intervals and confidence intervals with R². The lines represent the intervals about the OLS estimated weights. As a given value of R² may result in different fungible intervals depending on the correlations between the variables, the lines have been jittered about R². Grey lines indicate confidence intervals, and black lines indicate fungible intervals. Confidence intervals are based on the Sobel standard error and N =.
Figure 29. Relation of fungible intervals and confidence intervals with the magnitude of the indirect effect. The lines represent the intervals about the OLS estimated weights. Grey lines indicate confidence intervals, and black lines indicate fungible intervals. The lines are overlain because a given value of R² may result in different fungible intervals, depending on the correlations between the variables. Confidence intervals are based on the Sobel standard error and N =.

Figure 30. Range of fungible intervals for the direct effect in the one mediator case as a function of , based on . The results are the same for the range of b as a function of .
Figure 31. Range of fungible intervals for indirect effects in the one mediator case as a function of , based on .
Figure 32. Fungible intervals for the direct effect based on a criterion value r crit derived from the standard error of R² and N =.
Figure 33. Fungible intervals for the indirect effect based on a criterion value derived from the standard error of R² and N =.

Chapter 1: Introduction

Theories that describe causal relationships are made more convincing when one can provide evidence of intervening processes. Doing so requires researchers to develop and test causal hypotheses in a variety of ways, both in terms of the methodologies employed and the statistical analyses used. With regard to the statistical analysis, whatever the model employed, it is necessary both to test the statistical significance of its parameter estimates and to examine the quality of the estimates (e.g., degree of bias) that the model yields, as both are crucial components of demonstrating the validity of a causal theory.

Perhaps the most popular analysis explicitly intended for making causal claims is mediation. As the many citations of Baron and Kenny's (1986) seminal article suggest, using mediation is straightforward. In addition to the numerous software packages created for mediation analyses (e.g., PROCESS, Hayes, 2013; mediation, Tingley, Yamamoto, Hirose, Keele, & Imai, 2014), its use is facilitated by the fact that the logic of mediation is straightforward: it supposes that one variable causes one or more variables, which then cause some ultimate outcome. In the simplest case, this takes the form of an independent variable (X), a mediator (M), and a dependent variable (Y), with X predicting M, and both then predicting Y, although it is easy to add additional mediators, either in parallel or in serial. When using regression, parallel and single mediator models may be represented by the following formulas:

M_i = i_{M_i} + a_i X + e_{M_i}    (1)

Y = i_Y + c'X + Σ_i b_i M_i + e_Y    (2)

where i_{M_i} represents the intercept for M_i regressed on X, a_i represents the regression weight for X in the regression equation for M_i, i_Y the intercept for Y regressed on X and all Ms, and b_i the regression coefficients for each mediator in the regression equation for Y. The indirect effects are quantified by the product terms a_i b_i, and c' represents the direct effect of X on Y. If there is no missing data, these two quantities then sum to the total effect c, the coefficient from regressing Y on X alone (equal to the covariance between X and Y for standardized variables).

Although the model itself is easy to use and understand, neither testing nor estimation has proven to be simple. Significance testing requires methods with excellent Type I and Type II error rates, and in the case of mediation this has proven difficult because the product term ab is non-normal, so many standard methods of testing significance have dubious statistical properties, with deflated Type I error rates and low power (MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002). Accurate estimation of the parameters of interest is similarly difficult, as the b and c' paths are necessarily partly correlational (Sobel, 2008). As with any analysis, establishing a clear relationship between variables is difficult because of the common problems of missing variables, unmodeled moderators, redundancy of measurement (e.g., M and Y being the same variable), etc. For mediation analysis this difficulty is compounded because mediation necessarily has at least two hypothesized effects on Y

(this remains true even in the absence of a significant direct effect), and it is further the case that the indirect effect is comprised of two estimated parameters. In general, the estimated effects for a given model can be expected to be biased if the model is in some way inaccurate and not fully reflective of the truth, an issue that many would consider all but a given when using statistical analyses (cf. Box, 1976). The degree of bias may be minor and safely ignorable, but in extreme cases bias may strongly affect how a set of variable relationships is interpreted.

Before continuing, it is useful to further explicate the issues that arise when testing and estimating mediation models. There is of course a rich literature on issues that arise when using regression in general, which applies here as well, as regression is how mediation models are often tested. However, in the interest of conciseness we limit ourselves to discussing work exclusively focused on mediation.

Testing Effects

The original causal steps approach associated with Baron and Kenny (1986) required significant a, b, and c paths before any claims of mediation could be made. This approach has been outmoded for a variety of reasons, including its lack of quantification of the indirect effect, and because it has low power (Hayes, 2013; MacKinnon et al., 2002). This is because, contrary to what might be expected given that the indirect effect requires estimating two parameters rather than one, the test of the total effect often has lower power than the test of the indirect effect (Kenny & Judd, 2013; Rucker, Preacher, Tormala, & Petty, 2011). Further, the indirect effect(s) and the direct effect may be in opposing directions, resulting in suppression (MacKinnon, Krull, & Lockwood, 2000).
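To make equations (1) and (2) concrete, the sketch below fits a single-mediator model by ordinary least squares and checks the identity that the direct and indirect effects sum to the total effect. The data are simulated purely for illustration; the true path values (a = 0.5, b = 0.4, c' = 0.2) are arbitrary choices, not values used elsewhere in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a single-mediator model: X -> M -> Y, plus a direct X -> Y path.
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)                # a = 0.5
y = 0.4 * m + 0.2 * x + rng.normal(size=n)      # b = 0.4, c' = 0.2

def ols(predictors, outcome):
    """Least-squares coefficients (intercept first) for outcome on predictors."""
    design = np.column_stack([np.ones(len(outcome))] + predictors)
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

a = ols([x], m)[1]                  # equation (1): M regressed on X
_, c_prime, b = ols([x, m], y)      # equation (2): Y regressed on X and M
indirect = a * b                    # indirect effect ab
total = ols([x], y)[1]              # total effect c
print(indirect, c_prime + indirect - total)   # second value is ~0: c = c' + ab
```

With complete data the decomposition c = c' + ab holds exactly (up to floating point), which the last line verifies.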

Together, these issues make the causal steps approach's requirement of a significant total effect an unnecessarily high bar for making mediation claims, one that may prevent detection of genuine effects.

Baron and Kenny (1986) also suggested the use of a normal-theory standard error derived by Sobel (1982) using the multivariate delta method, se(ab) = √(a²s_b² + b²s_a²). Although the performance of this method of testing indirect effects is clearly superior to that of the causal steps approach, with much higher power (MacKinnon et al., 2002), it nonetheless suffers from a few shortcomings related to the distribution of ab. The distribution of a product term is typically non-normal without large sample sizes (Kisbu-Sakarya, MacKinnon, & Miočević, 2014), and so the normal-theory approximation rarely holds in practice. The result is that the Sobel test is overly conservative, with relatively low power and Type I error rates well below the nominal and normative α = .05, and users of mediation are advised to avoid using it (e.g., Hayes & Scharkow, 2013).

At present, the preferred approach for testing indirect effects is to make use of bootstrapped confidence intervals. Bootstrapping as used in mediation simply resamples the data naively with replacement some number of times (e.g., 5000), and for each sample an estimate of the indirect effect ab is obtained. These estimates may be used to obtain significance values, using either percentile cutoffs (e.g., 2.5% and 97.5% in the case of α = .05) or some form of correction: for bias in the case of the bias-corrected bootstrap, or for bias and skew in the case of the bias-corrected and accelerated bootstrap (Efron, 1987). Though these bootstrapped confidence intervals differ in their details, each

generally outperforms the Sobel test in terms of statistical power (Hayes & Scharkow, 2013). Their Type I error rates are generally more accurate than those of the Sobel test as well, although still below the nominal rates in some cases (Hayes & Scharkow, 2013; MacKinnon et al., 2002), and inflated in others. Specifically, the bias-corrected and accelerated bootstrap shows some signs of an inflated Type I error rate when either the a or the b path is large and the other path is zero, or when the sample size is small (Koopman, Howe, Hollenbeck, & Sin, 2015).

An additional approach to testing mediation is the use of structural equation modeling (SEM). This is typically done with latent variable models, but strictly manifest variable models are easily employed as well. Latent variable mediation models introduce additional considerations beyond the scope of this dissertation, and so we generally limit ourselves to SEM-based mediation models using manifest variables alone (an exception may be found in Chapter 4, where we use SEM as a way to test the validity of mediation claims). To the author's knowledge, only one article has explicitly advocated for the use of structural equation modeling over regression, regardless of whether latent or manifest variables are used. Iacobucci and colleagues (Iacobucci, Saldanha, & Deng, 2007) argued that structural equation modeling always outperforms ordinary least squares (OLS) regression, and so should always be used when testing mediation. Although Iacobucci et al. are unique in their suggestion to use SEM exclusively over regression, their results are not, as Cheung (2009) compared multiple methods of estimating standardized confidence intervals for the indirect effect and showed that, at least when using the Sobel standard error, SEM outperforms regression.
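The percentile bootstrap described above is simple to sketch. The following illustration (simulated data; the sample size, resample count, and seed are arbitrary) builds the bootstrap distribution of ab and reads off a 95% percentile interval. It is a minimal demonstration, not a substitute for established implementations such as PROCESS.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data with a genuine indirect effect (a = 0.5, b = 0.4).
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + rng.normal(size=n)

def ab_estimate(x, m, y):
    """Indirect effect ab from the two OLS regressions."""
    a = np.polyfit(x, m, 1)[0]                        # slope of M on X
    design = np.column_stack([np.ones(len(y)), x, m])
    b = np.linalg.lstsq(design, y, rcond=None)[0][2]  # weight of M in Y equation
    return a * b

boot = np.empty(5000)
for i in range(boot.size):
    idx = rng.integers(0, n, size=n)          # resample cases with replacement
    boot[i] = ab_estimate(x[idx], m[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])     # percentile 95% CI for ab
print(lo, hi)    # ab is deemed significant if the interval excludes 0
```

The bias-corrected variants adjust the percentile cutoffs using the proportion of bootstrap estimates below the sample estimate (and, for the accelerated version, a skew correction), but the resampling core is the same.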

Although the suggestion by Iacobucci et al. (2007) to use SEM exclusively over regression is not without merit, it must be qualified by the fact that they based their claims on the use of the Sobel test and did not consider bootstrapped confidence intervals. The issue that arises when comparing SEM to regression is that both yield identical parameter estimates for a mediation model, and so will also yield identical bootstrapping performance. For testing indirect effects, then, whether one uses regression or SEM is irrelevant if one also makes use of bootstrapped confidence intervals (cf. Cheung, 2009).

Further, Iacobucci et al. (2007) used only one of many methods to estimate structural equation models. Although they do not state which method they used, it is safe to assume that they used maximum likelihood (ML) because of its popularity and statistical properties (e.g., efficiency). Nonetheless, weighted least squares (WLS), unweighted least squares (ULS), diagonally weighted least squares (DWLS), and generalized least squares (GLS) are also available to estimate SEMs. Although these estimation methods are asymptotically equivalent (Browne, 1974; Shapiro, 1985) and will further yield identical parameter estimates for a saturated model such as mediation, their performance in finite sample sizes can be expected to differ in terms of Type I and Type II error rates due to their differing assumptions regarding the error terms associated with the predicted covariance matrix, e.g., homoscedasticity, independence, etc. (Savalei, 2014). Of course, some a priori predictions may be made based on the assumptions each method makes, but it is difficult to anticipate the tradeoff between types of errors when testing indirect effects because of the unusual distribution of ab. Applying past work is further

complicated by the fact that most such work has focused on latent variable models (which are to be recommended in their own right, of course) rather than a strictly observed and saturated model like mediation, and so it is unclear which estimator is to be preferred when testing for mediation. The superiority of SEM is then conditional on the behavior of these various estimation methods relative to bootstrapping, rather than being a general truth.

Examining Effects

In addition to significance testing, it is also important to consider any bias or confounding that may affect the parameters of interest. Such bias may be modest and mostly ignorable, but in extreme cases the estimates may be so biased as to yield oppositely signed estimates that would entail completely different interpretations. The issue of biased estimates has been raised in one form or another by many authors (e.g., Bullock, Green, & Ha, 2010; Jo, 2008; MacKinnon, Krull, & Lockwood, 2000; Sobel, 2008; VanderWeele & Vansteelandt, 2009), who point out that in general many strong assumptions must be satisfied in order for mediation models to yield unbiased estimates of the indirect effect (cf. Jo, 2008; Sobel, 2008). These assumptions will be enumerated later in this dissertation (in Chapter 5), but discussions chiefly focus on the frequency with which confounding may occur (e.g., a spurious mediator, or a proposed mediator that is simply a correlate of the dependent variable; Bullock, Green, & Ha, 2010; Fiedler, Schott, & Meiser, 2011).

The need for methods to consider possible bias in the estimated direct and indirect effects has not been ignored in the mediation literature, and there are multiple methods to examine what effects such variables may have on the trustworthiness of the effect

estimates. As with any analysis, the most straightforward method of dealing with biased estimates is to statistically control for additional covariates. Somewhat similarly, the design of the study itself may be changed to one that better supports testing competing explanations. In addition to experimental studies, which when possible are always to be preferred over correlational studies (Stone-Romero & Rosopa, 2008), it is also advisable to make use of longitudinal designs. One such example may be found in Maxwell and Cole (2007), who showed that with cross-sectional designs estimates may be so biased that genuine effects appear nonsignificant or of opposite sign, in addition to the possibility that variables with no direct relationship may yield significant paths. To deal with this possibility it is necessary to measure X, M, and Y each at three different time points using a cross-lagged panel design, and to analyze the data accordingly. Latent growth curve and latent difference score models may also be used to reduce bias, provided that they are more appropriate for the data (cf. Selig & Preacher, 2009).

In cases where alternative study designs or measuring possible confounds are not possible (e.g., because of ethical or financial constraints), there are also methods that instead make use of formulas to determine the sensitivity of the estimated weights to a potential confounder. Rosenbaum and Rubin (1983) and VanderWeele (2010) worked with binary confounders, and derived formulas that may make use of arbitrarily selected bias estimates to examine parameter sensitivity. VanderWeele and Arah (2011) extended those approaches to be much more general, able to deal with categorical or continuous variables in a way that is not restricted to any given estimation method (Cox, Kisbu-Sakarya, Miočević, & MacKinnon, 2013). Additional methods make use of correlated residuals, based on the logic that if there is no confounding, then the residuals of the two regressions shown in equations (1) and (2) should be uncorrelated (e.g., Imai, Keele, & Yamamoto, 2010).

As confounders are typically latent in nature, it is a simple extension to view them through the lens of factor analysis. For mediation this is perhaps not often considered, because factor models typically require a larger number of variables than are often used in mediation models, and further because factors are typically used for scale construction. In such cases, the factors serve as a useful summary of variable relationships that also ideally explains the relationships. As such, our argument here is that factor models may be used to examine a worst-case confounding scenario by representing a perfectly confounded relationship rather than mediation. This remains true regardless of the true number of confounding variables, or of the number of estimated factors, and the factor(s) may be viewed as composites that summarize all confounding effects, with a weight for each manifest or latent variable.

In addition to describing relationships, factors may also be used to rule out some confounding explanations. This is because the possible patterns of loadings on a given number of factors are limited to some degree, with some correlation patterns requiring at least two factors (i.e., confounds). With that being said, even if a single factor is adequate to describe the variable relationships, it may simply represent a composite of multiple confounding variables. Showing that one factor is adequate is then only a starting point,

but demonstrating as much is useful for establishing the validity of mediation claims, as at least some competing explanations are ruled out.

In addition to establishing that the general conceptual form of a mediation model is appropriate for describing a set of variables (i.e., one variable causing others, which in turn cause another), it is still necessary to consider the effects of less extreme confounding, and the effects of other forms of model violations (e.g., inaccurate functional forms of the relationships or inappropriate error term assumptions). More specifically, it is necessary to consider the degree to which the estimates may be biased as a result of such possibilities (e.g., Imai et al., 2010). Parameter sensitivity is perhaps not often considered in the practice of mediation (though it is advised in the literature, e.g., VanderWeele, 2010), but doing so is necessary because in practice one rarely knows just how a model is inaccurate. Rather, it is necessary to consider the degree to which a model must be wrong before the conclusions drawn from it are invalid (Box, 1976).

Despite the fact that the reason a model is wrong is rarely known, the previously mentioned approaches generally assume some knowledge of the form of the inaccuracy, in that using them implies a belief that the model is biased only because the relationships are confounded. They are then inappropriate for other violations of model assumptions. Such violations arise because a model is necessarily incomplete, a few examples being missing paths, measures with less than perfect reliability, missing terms (e.g., quadratic), and issues related to error terms such as non-normal residuals or outliers. Although all approaches relevant to regression apply to mediation (e.g., outlier detection

and residual inspection), at present no general techniques have been developed and applied to mediation specifically. The general approach to parameter sensitivity in mediation that we will explore is based on fungible parameters (Waller, 2008): alternative weight sets that all yield an identical value of R². In brief, the appeal of fungible parameters is that all such sets explain the dependent variable equally well, and do so only slightly worse than the optimal (e.g., ordinary least squares) weights. If these weights are highly discrepant yet still explain the dependent variable almost as well, then any conclusions drawn from them should be considered less trustworthy (Green, 1977; Waller, 2008).

Purpose

Fully proving or disproving mediation claims is of course often impossible, but the plausibility of mediation claims may nonetheless be improved by considering sampling variability, and by testing mediation against alternative explanations. The purpose of this dissertation, then, is to examine the performance of various methods of testing direct and indirect effects, and to develop tools that researchers may use to assess the quality of the mediation estimates obtained. That is not to say that the methodologies developed here will be sufficient in all cases (and they will of course require some refinement), but they may serve as useful and simple tools applicable to many situations. All studies presented herein focus on mediation between manifest variables of the Gaussian outcome type with normally distributed error terms, in keeping with common practice in mediation research.
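The fungible-parameters idea just described can be illustrated numerically. The sketch below generates weight vectors whose predictions correlate with the criterion at a fixed fraction theta of the optimal R². This follows the geometric logic of fungible weights but is a simplified construction rather than Waller's (2008) exact algorithm, and the correlation values used are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Standardized two-predictor setting (e.g., Y regressed on X and M).
R = np.array([[1.0, 0.5],
              [0.5, 1.0]])       # predictor intercorrelations
r = np.array([0.5, 0.4])         # predictor-criterion correlations
b = np.linalg.solve(R, r)        # OLS weights; here R^2 = b'r
r2 = b @ r

def fungible(theta=0.95, n_sets=1000):
    """Weight sets whose predictions correlate sqrt(theta * R^2) with y."""
    scale = np.sqrt(b @ R @ b)
    b_unit = b / scale
    out = np.empty((n_sets, len(b)))
    for i in range(n_sets):
        z = rng.normal(size=len(b))
        u = z - (z @ R @ b_unit) * b_unit   # make u R-orthogonal to b
        u = u / np.sqrt(u @ R @ u)
        a = np.sqrt(theta) * b_unit + np.sqrt(1 - theta) * u
        out[i] = a * scale                  # rescale to the metric of b
    return out

sets = fungible()
# Every row predicts y almost as well as b does, yet the weights range widely:
print(sets.min(axis=0), sets.max(axis=0))
```

Each generated set sacrifices only 5% of the explained variance, yet the individual weights can differ substantially from the OLS values, which is exactly the sensitivity question fungible weights are meant to raise.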

We will begin by comparing various methods of testing direct and indirect effects in Chapter 2, with special consideration paid to structural equation modeling with manifest variables, due to the relatively few examinations of it in the literature. For each method employed, we will examine the Type I and Type II error rates across a variety of α levels (rather than just the normative α = .05 level) so as to obtain a fuller picture of the performance and quality of each method. This will be facilitated by the use of receiver operating characteristic (ROC) curves (Hanley & McNeil, 1982), which compare the risks of Type I and Type II errors across decision thresholds (i.e., α levels in this case). As an extension of considering alternative α levels, we will also attempt to better understand the performance of bootstrapping by investigating the shape of the bootstrap distribution for indirect effects.

We will take two approaches to biased estimates. The first approach to model inaccuracies focuses on confounding effects and makes use of an alternative logic to explain the relationships between variables. All possible confounding effects can be captured through a factor model with as many factors as there are variables in the mediation model minus one (e.g., two factors for three variables). Alternatively, factor models may be viewed as viable alternative explanations in their own right. We will first consider how mediation results may look given a factor model in Chapter 3, and consider under what conditions the models may diverge. In Chapter 4, we will also make use of the flexibility of SEM to propose an analysis for longitudinal designs that is a mediation model in its own right, in the sense that a variable may mediate itself. In brief, the approach simply tests whether a latent variable is sufficient to explain the

variable correlations, and whether or not the inclusion of additional paths that would yield claims consistent with a mediation model of the sort shown in equations 1 and 2 is warranted. In Chapter 5, we will consider a general approach to model inaccuracies that may be used without specification, or for that matter knowledge, of the type of model inaccuracy. For this area of research simulations are not of much use, as there are infinitely many possible model violations. Instead, we will work with the basic idea that if there are model inaccuracies, then the parameters from a valid model (vis-à-vis the true weights) can lead to a decreased R² when used in an inaccurate model (e.g., a model missing a mediator or moderator). However, when the weights do not change much given such a decrease, then one may trust the results even when they stem from an inaccurate model. We investigate the range of the alternative, fungible weights as a means of quantifying parameter sensitivity and the uncertainty associated with a possibly biased model.
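The fungible-weights logic can be made concrete with a small numerical sketch. The following is an illustrative Python example (the dissertation's own tooling is written in R, and all variable names here are ours). It defines the R² of a fixed weight pair as the squared correlation between the weighted composite and the criterion, computes the OLS weights for two simulated predictors, and then searches along a direction away from those weights for an alternative pair that retains 98% of the optimal R²:

```python
import math
import random

def corr(u, v):
    # Pearson correlation, dependency-free
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

rng = random.Random(1)
n = 500
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * a + 0.3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

def r2(w1, w2):
    # squared correlation between the composite w1*x1 + w2*x2 and y
    comp = [w1 * a + w2 * b for a, b in zip(x1, x2)]
    return corr(comp, y) ** 2

# OLS weights via the 2x2 normal equations (these maximize r2)
s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
s1y, s2y = cov(x1, y), cov(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

r2_opt = r2(b1, b2)
target = 0.98 * r2_opt   # fungible weights: keep 98% of the optimal R^2

# move away from the OLS weights along a fixed direction until R^2
# drops to the target, located by bisection
d1, d2 = -b2, b1
lo, hi = 0.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2
    if r2(b1 + mid * d1, b2 + mid * d2) > target:
        lo = mid
    else:
        hi = mid
f1, f2 = b1 + lo * d1, b2 + lo * d2   # one set of fungible weights
```

With the seed used here, the fungible weights (f1, f2) should differ noticeably from the OLS weights while giving up only 2% of the optimal R², which is exactly the sensitivity phenomenon described above.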

Chapter 2: Relative Performance of Methods of Testing Mediation Effects

How best to test for mediating effects has been a recurrent point in the mediation literature. Much of this has been driven by the fact that the distribution of the indirect effect is non-normal, and so it has been necessary both to closely examine approaches that assume normality (e.g., MacKinnon et al., 2002) and to make use of approaches that do not (e.g., bootstrapping; Shrout & Bolger, 2002). The historically most popular approach does not test the indirect effect per se, but rather the individual significance levels of the two constituent terms a and b (Baron & Kenny, 1986). Known as the causal steps approach, it has been criticized on numerous grounds, including lack of power, the lack of quantification of the indirect effect, and the absence of a test of the indirect effect itself (Hayes, 2013). The limitations of this approach are such that mediation methodologists uniformly agree that it should not be used by researchers (e.g., Hayes, 2013; MacKinnon et al., 2002). In addition to their popularized approach, Baron and Kenny (1986) also suggested using a normal-theory standard error based upon the multivariate delta method (Sobel, 1982). The formula for the standard error is as follows:

s_ab = sqrt(a²·s_b² + b²·s_a²),

where s_ab is the estimated standard error, a² and b² are the squared estimates of the path coefficients, and s_a² and s_b² are their respective squared standard errors. A test of significance is performed by way of the following formula:

z = ab / s_ab,

with z then compared to the appropriate critical value of the standard normal distribution. This approach is also advised against, as it tends to perform relatively poorly at small sample sizes: it assumes normality, but the distribution of the product ab approaches normality only with large sample sizes (e.g., MacKinnon, Fritz, Williams, & Lockwood, 2007).

At present, the preferred approach for testing indirect effects is to create confidence intervals using nonparametric bootstrapping, with the percentile bootstrap or the bias-corrected and accelerated (BCa) approaches being preferred over other alternatives. Much of the advantage of bootstrapping is owed to the fact that it does not assume normality of the indirect effect, and so it has higher power than the Sobel test (e.g., Fritz & MacKinnon, 2007; Hayes & Scharkow, 2013). Bootstrapping methods also tend to have Type I error rates more in line with the nominal α level than the Sobel test does. However, this is conditional on the magnitude of the non-zero path and the sample size. For smaller sample sizes (e.g., N < 200), deflated Type I error rates are observed when both paths are zero or one path is small (Fritz, Taylor, & MacKinnon, 2012; Koopman et al., 2015).

Another approach that may be used to test mediation effects is structural equation modeling. In addition to the generally higher power afforded by SEM over regression (cf. Cheung, 2009; Iacobucci, Saldanha, & Deng, 2007), such an approach is appealing

because mediation models are global models meant to describe the relationships between multiple variables, and estimating all paths at once is consistent with that fact. Despite the promise of SEM for testing mediation, a potential limitation is that SEM requires additional decisions that regression does not, and its advantages must be weighed while keeping in mind the various estimation methods available for SEM. Each estimation method affects the standard errors of the parameters, and so their power and Type I error rates. It is further the case that SEMs tend to perform poorly at small sample sizes, and the apparent power that comes with their use may simply reflect a downwardly biased standard error estimate, because ML is considered a large sample estimator (Bentler & Yuan, 1999).

An additional consideration when evaluating SEMs for mediation is that regression and structural equation modeling yield identical parameter estimates, and so will perform identically when bootstrapped. Iacobucci et al. (2007) neglected this point, and so their claims must be qualified by that fact. Rather than claiming that SEM is superior to regression for testing mediation, it is instead a question of the relative performance of the significance tests that result from each method of estimating the standard errors of each parameter of interest.

Model and Estimation Method Assumptions

There are multiple common estimation methods for structural equation modeling, and here we make use of the five readily available in the lavaan package in R (Rosseel, 2012), which are maximum likelihood (ML), unweighted least squares (ULS),

generalized least squares (GLS), weighted least squares (WLS; also known as asymptotically distribution free estimation), and diagonally weighted least squares (DWLS). These estimation methods are readily available in most statistical packages and are therefore easy to implement in practice. Although each method yields identical parameter estimates for a saturated model like mediation, the relative performance of each method with regard to Type I and Type II error rates will depend on the assumptions (or lack thereof) that each method makes about the error terms, and so these assumptions will be the focus of our investigation. To better understand the importance of the assumptions made by an SEM estimation method, it is useful to view SEM as a non-linear regression model that predicts a covariance matrix rather than individual data points (Savalei, 2014).

There are three relevant assumptions made by the estimation methods regarding the residuals: the distribution of the errors (normality), the equality of the error variances (homoscedasticity), and the independence of the errors of the covariance matrix. Weighted least squares makes none of these assumptions; maximum likelihood and generalized least squares assume normality; diagonally weighted least squares assumes normality and independence; and unweighted least squares assumes all three (Savalei, 2014; Schumacker & Lomax, 2004).

As with linear regression, the assumptions of each method will affect the standard errors of the parameter estimates. The first two assumptions, normality and equality of variance of the error terms, are not necessarily violated when estimating SEMs (Savalei, 2014), although the normality assumption for the covariance error terms may not be satisfied in

small sample sizes, because the variances and covariances of the residuals are only asymptotically normally distributed (Savalei, 2014). The remaining assumption, independence of errors, is particularly crucial for our purposes here. For ordinary least squares regression this assumption is easily satisfied: as long as no observation affects another (e.g., participants are all measured individually and do not have any pre-existing relationships), the error terms may be assumed to be independent. In contrast, for SEM this assumption is necessarily violated, because it is not individual observations that are predicted but rather variances and covariances, which necessarily share at least some information in their calculation. As a result, most SEM estimation methods do not make such an assumption. Nonetheless, unweighted least squares (ULS) and diagonally weighted least squares (DWLS) assume independence of the covariance residuals, and so will always be inefficient in an SEM context and perform poorly without some form of correction (Savalei, 2014).

Such corrections, often known as robust standard errors, serve to account for the model misspecification made by certain inefficient estimators such as ULS and DWLS, as well as for violations affecting other estimation methods that may arise due to small sample sizes, such as non-normality of the residuals. Robust standard errors are calculated using the following sandwich formula (adapted from Savalei, 2014):

Ω_LS = (Δ′WΔ)⁻¹ Δ′W Γ W Δ (Δ′WΔ)⁻¹,   (3)

where (Δ′WΔ)⁻¹ includes the naïve covariance matrix of the parameter estimates, Γ is the true asymptotic covariance matrix as estimated by the sample covariance matrix of the estimates, W is the weight matrix of the fit function, and Δ is the matrix of the model derivatives evaluated at the parameter

estimates for F_LS. The subscript LS refers to any fit function that results in model residuals of a quadratic form, which here includes the estimators ULS, DWLS, GLS, and WLS. The naïve covariance matrix is correct only when the fit function is correctly specified. For when it is not, the sandwich form serves to correct misspecification that may arise when, e.g., the residuals associated with the covariance matrix are not independent.

In addition to using robust standard errors to correct for the inefficiency of LS estimators, one may also make use of robust standard errors to minimize the impact of violations of the assumptions of ML. Under the ideal conditions we will investigate here (i.e., normally distributed variables with standard normal error), robust standard errors for maximum likelihood may seem an odd inclusion, because the corrections are largely unnecessary. Even so, robust standard errors are considered useful at small sample sizes (Savalei, 2014), and the corrections may meaningfully affect testing of the effects of interest in mediation.

Another alternative standard error one may use for ML makes use of first-order partial derivatives of the marginal log-likelihood, rather than the second-order partial derivatives. This method typically underestimates standard errors (Cai, 2008), and so might be expected to have upwardly biased Type I error rates for testing mediation. However, such an underestimate may prove appropriate here, because most methods of testing indirect effects have very low Type I error rates. An underestimate may then result in more accurate Type I error rates (ideally without exceeding the nominal rate), as well as higher power for testing the indirect effect, and so we consider their use here.
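The sandwich form in equation 3 can be checked numerically. Below is a small dependency-free Python sketch of the computation (the names Delta, W, and Gamma mirror the symbols above; the example values are our own invention, not taken from any real model). When Γ equals W⁻¹, here with both set to the identity, the sandwich collapses to the naïve matrix (Δ′WΔ)⁻¹, which the example verifies:

```python
def mm(A, B):
    # matrix product of two lists-of-lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def tr(A):
    # transpose
    return [list(row) for row in zip(*A)]

def inv2(A):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sandwich(Delta, W, Gamma):
    # Omega_LS = (D'WD)^-1 D'W Gamma W D (D'WD)^-1, cf. equation 3
    bread = inv2(mm(mm(tr(Delta), W), Delta))
    meat = mm(mm(mm(mm(tr(Delta), W), Gamma), W), Delta)
    return mm(mm(bread, meat), bread)

# toy example: 3 sample moments, 2 parameters
Delta = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
omega = sandwich(Delta, I3, I3)   # here equals the naive (D'D)^-1
```

When Γ differs from W⁻¹, as with the inefficient ULS and DWLS estimators discussed above, the meat term no longer cancels and the sandwich yields the corrected covariance matrix of the parameter estimates.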

Note that these concerns are not simply a matter of considering the asymptotic efficiency of an estimator, even for the direct effect. In finite sample sizes it is not straightforward to rank the efficiency of all estimators across all conditions (Savalei, 2014), and similarly Type I error rates need not be constant, let alone always reflect the nominal rate. This issue is compounded by the fact that the distribution of the indirect effect complicates significance testing, making it further unclear how well each estimation method will perform when testing mediation.

Bootstrapping

Relative to the other methods of testing significance we have discussed, bootstrapping is unique in that tests are performed without any assumptions regarding the distribution of the parameter of interest. This has proven advantageous for dealing with the non-normal nature of the indirect effect; investigations into its performance have consistently shown that it is more powerful than the Sobel (1982) test, and it has become the standard approach for testing mediation. Two often-used bootstrap confidence intervals for tests of mediation are the percentile bootstrap (Efron, 1979) and the bias-corrected and accelerated bootstrap (Efron, 1987). The former simply applies percentile cut-offs to the bootstrapped resamples to yield a confidence interval. The latter additionally applies a correction for bias in the estimated effect, as well as a correction for skewness in the distribution (Efron, 1987). Which method is to be preferred is somewhat unclear at present due to concerns regarding Type I errors (e.g., Fritz, Taylor, & MacKinnon, 2012; Hayes & Scharkow, 2013), but nonetheless both methods are considered superior to the Sobel test.
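To make the Sobel test and the percentile bootstrap concrete, the following illustrative Python sketch simulates one data set from the mediation model (a = b = .39, c′ = 0, N = 200), computes the Sobel z for the indirect effect, and builds a percentile bootstrap confidence interval. The dissertation's own simulations use R with 5000 resamples; this sketch uses 2000 resamples for brevity, and all function names are ours:

```python
import math
import random

rng = random.Random(7)
n, a_t, b_t, c_t = 200, 0.39, 0.39, 0.0
X = [rng.gauss(0, 1) for _ in range(n)]
M = [a_t * x + rng.gauss(0, 1) for x in X]
Y = [c_t * x + b_t * m + rng.gauss(0, 1) for x, m in zip(X, M)]

def center(v):
    mv = sum(v) / len(v)
    return [x - mv for x in v]

def fit(X, M, Y):
    # OLS estimates of a (M on X) and b (Y on M, controlling for X), with SEs
    x, m, y = center(X), center(M), center(Y)
    n = len(x)
    sxx = sum(u * u for u in x)
    smm = sum(u * u for u in m)
    sxm = sum(u * v for u, v in zip(x, m))
    smy = sum(u * v for u, v in zip(m, y))
    sxy = sum(u * v for u, v in zip(x, y))
    a = sxm / sxx
    det = smm * sxx - sxm ** 2
    b = (smy * sxx - sxy * sxm) / det    # partial slope of Y on M
    c = (sxy * smm - smy * sxm) / det    # partial slope of Y on X
    s2_m = sum((mi - a * xi) ** 2 for xi, mi in zip(x, m)) / (n - 2)
    s2_y = sum((yi - b * mi - c * xi) ** 2
               for xi, mi, yi in zip(x, m, y)) / (n - 3)
    se_a = math.sqrt(s2_m / sxx)
    se_b = math.sqrt(s2_y * sxx / det)
    return a, b, se_a, se_b

a, b, se_a, se_b = fit(X, M, Y)
se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)  # Sobel (1982)
z = a * b / se_ab

# percentile bootstrap: resample cases, re-estimate ab, take simple cut-offs
boot = []
for _ in range(2000):
    s = [rng.randrange(n) for _ in range(n)]
    est = fit([X[i] for i in s], [M[i] for i in s], [Y[i] for i in s])
    boot.append(est[0] * est[1])
boot.sort()
ci = (boot[int(0.025 * 2000)], boot[int(0.975 * 2000) - 1])
```

The BCa interval applies its bias and acceleration corrections to these same resamples (Efron, 1987); we omit it here for brevity.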

However, despite the popularity of bootstrapping, the reasons it performs so well have not been thoroughly investigated. Although bootstrapping is known to generally perform well for assumption-free distributions such as that of the indirect effect, little is known regarding the empirical distribution of the bootstrapped resamples, e.g., its bias, dispersion, and skewness. Further, although recommended for small sample sizes (Shrout & Bolger, 2002), bootstrapping is in fact not necessarily stable at small sample sizes and may suffer from increased Type I error rates when one path is large (Koopman et al., 2015). That this is the case suggests that the bootstrapped distribution of the indirect effect does not always accurately reflect the true distribution of the indirect effect.

Comparing Methods across α Levels

Nearly all research regarding methods of testing indirect effects has focused on α = .05 (one exception is MacKinnon, Lockwood, and Williams (2004), who also considered α levels of .1 and .2, but did so largely in passing). This focus on α = .05 seems likely due to two reasons. The first is that it is the normative level for researchers, who are unlikely to use higher α levels. The second is that focusing on α = .05 is to some degree a matter of necessity, as investigations of the performance of tests of the indirect effect typically present results in a number of tables, often one table for Type I error, another for statistical power, and perhaps another for coverage rates. Considering other α levels is then difficult, if only because of space limitations. Further, it is a great deal of information to integrate, and may be done only with considerable effort and risk of error.

Despite this difficulty, there is some merit in considering both lower and higher α levels for tests of the indirect effect, as doing so may help us better understand the performance of the bootstrap, as well as whether or not its use should be encouraged at other α levels. Ideally, the bootstrap should have higher power at all α levels relevant to practice (typically .1 or less), but the asymmetry of the bootstrapped distribution may result in the superiority of bootstrapping over the Sobel test being reduced, if not reversed, at different decision thresholds.

One way to deal with the difficulty of considering other α levels is to make use of receiver operating characteristic (ROC) curves (Hanley & McNeil, 1982). Within the context of statistical testing, ROC curves compare the ability of a method to detect an effect given that it is there (i.e., sensitivity, power, or 1 − the risk of a Type II error [β]) with the risk of false positives when there is no effect (i.e., specificity, or 1 − the risk of a Type I error [α]), based on various decision thresholds (in this case, α levels). These values are then used to create a curve, and method comparison may be accomplished by considering the area under the curve (AUC). The AUC is equal to the probability that a testing method will correctly classify a randomly drawn effect (or lack thereof) from two categories as being present or not across all decision thresholds, and may range from 0 to 1. A value of .5 indicates chance performance, and higher values are associated with better classification methods.

The obvious advantage of ROC curves is that they afford a simple means to consider the relative performance of methods across a variety of decision thresholds. If two curves do not cross, then the relative ranking of the two methods is constant across

all α levels. The parametric methods we use here can be expected to behave in such a manner, with each method performing consistently better (or worse) than another across decision thresholds. In contrast, the ROC curves for nonparametric bootstrapping might be expected to cross those of the Sobel-based approaches, as bootstrapping does not assume symmetry of the distribution of the parameter of interest. Whether or not this is the case is unclear, but it is possible that bootstrapping may not be as clearly preferable at some decision thresholds, and may in fact be worse in some cases.

Purpose

The primary purpose of this study is simply to compare the relative performance of a variety of methods of testing for indirect effects, and to do so across different α levels, both as a means of informing practice and as a means of better understanding bootstrapping. In service to that, we will also discuss the direct effect, because it serves as a control condition and provides a simple starting point before discussing the more complex indirect effect. A secondary purpose of this study is to better understand the performance of bootstrapping by way of considering its distribution. We do so because it is well known that the BCa has higher power than the percentile bootstrap (e.g., Hayes & Scharkow, 2013), but at present it is unclear why its corrections are necessary to achieve this result. Both purposes will be facilitated by the use of ROC curves.

In regards to SEM estimation methods and regression, in general it can be expected that the estimation methods will perform comparably well, with a slight

advantage for ML because of its efficiency for multivariate normal data, which is what we make use of here. Caveats to the hypothesized superiority of SEM are the estimation methods ULS and DWLS, which are known to be inefficient in an SEM context because of their use of a diagonal weighting matrix (Savalei, 2014). As such, they are likely to have much lower power than other estimation methods, and so lower AUC. The expected behavior of the ROC curve for bootstrapping is less clear, however. Work on bootstrapping in mediation has typically, if not exclusively, focused on the relative performance of each method at a nominal α of .05, but because of the positive skew of the distribution of the product the advantages of bootstrapping may be reduced, if not reversed, at other α levels. At what α levels this may be true was not predicted a priori.

Method

Design.

Sample sizes. The present study made use of sample sizes of N = 50, 100, and 200. These sample sizes were chosen as being reasonably representative of common sample sizes employed in psychological research, and have often been employed in past mediation simulation research (e.g., Hayes & Scharkow, 2013).

Regression weights. We used regression weights of 0, .14, .39, and .59, as these weights are often used in mediation simulation work (e.g., Hayes & Scharkow, 2013), and represent no, small, medium, and large effect sizes, respectively.

Data Generation. Data were generated by drawing values of X from a standard normal distribution, with M generated

from the values of X using the appropriate effect size and standard normal error (e.g., M = .14·X + e). Values of Y were then generated from the values of X and M using the appropriate weights, again with standard normal error. Together there are 64 possible effect size combinations, and with sample size considered there were in total 192 conditions. For each condition there were 1000 replications, for a total of 192,000 simulated data sets.

Methods of Significance Testing. For each method listed below, we calculated confidence intervals and rejection rates based on α levels of .01, .02, .03, .04, .05, .06, .08, .10, .12, .14, .16, .18, .20, .25, .30, .40, and .50. Additionally, we also calculated the average standard error of each method per condition, so as to compare it to the standard deviation of the estimates themselves. This afforded a method of examining any bias that might occur in the estimation of standard errors for parametric methods. For bootstrapping, we saved the standard deviation of the bootstrapped resamples for each replication, as well as their mean, skewness, and kurtosis, so as to provide some sense of the average shape of the bootstrapped distribution and to compare it to the distribution of the estimates across all replications. It is worth acknowledging that the standard deviation is somewhat irrelevant for bootstrapping, because of its use of cut-offs from an empirical distribution and the asymmetry of the indirect effect, but it nonetheless serves as a measure of dispersion (albeit a non-optimal one, because of its reliance on the mean of a skewed distribution) that may be used to describe the distribution of the bootstrapped resamples.
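The resample summaries mentioned above (mean, standard deviation, skewness, and kurtosis) can be computed with a few lines of Python; the sketch below is our own helper, not the dissertation's R code. The example also illustrates why these summaries are of interest: the product of two positive-mean normal estimates, the form the indirect effect takes, is right-skewed:

```python
import math
import random

def moments(v):
    # mean, SD, skewness, and excess kurtosis of a list of estimates
    n = len(v)
    m = sum(v) / n
    s2 = sum((x - m) ** 2 for x in v) / n
    s = math.sqrt(s2)
    skew = sum((x - m) ** 3 for x in v) / (n * s ** 3)
    kurt = sum((x - m) ** 4 for x in v) / (n * s2 ** 2) - 3
    return m, s, skew, kurt

# the product of two positive-mean normal estimates is right-skewed,
# mirroring the non-normality of the indirect effect ab
rng = random.Random(3)
prods = [rng.gauss(0.39, 0.1) * rng.gauss(0.39, 0.1) for _ in range(20000)]
mean_p, sd_p, skew_p, kurt_p = moments(prods)
```

Applied to a set of bootstrapped resamples of ab, these four numbers give the average shape of the bootstrap distribution described in the text.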

Regression. For regression models, tests of the direct and indirect effects were calculated using the normal-theory approaches: the Sobel standard error for the indirect effect, and typical regression standard errors for the direct effect.

Structural equation modeling. We made use of five SEM estimation methods, and refer to them using the abbreviations from the package lavaan (Rosseel, 2012). Specifically, we used maximum likelihood (ML), unweighted least squares (ULS), generalized least squares (GLS), weighted least squares (WLS), and diagonally weighted least squares (DWLS). Further, we also considered alternative standard errors. In the case of maximum likelihood, we made use of standard errors based on first-order partial derivatives (MLF), as well as robust standard errors (MLM). We applied robust standard errors only to ULS (ULSM), because any least squares estimation method for a mediation model yields identical results when using robust standard errors.

Bootstrapping. Each simulated data set was bootstrapped 5000 times, and we made use of two methods for bootstrapping confidence intervals. The first was the percentile bootstrap, which simply selects values at the appropriate percentiles to obtain a significance test (Efron, 1979). The second was the bias-corrected and accelerated bootstrap (BCa; Efron, 1987), which includes corrections for bias and skewness in the bootstrapped distribution.

Results

To begin, we created ROC curves to compare power to detect effects of any magnitude, relative to the risk of Type I errors when at least one path in the indirect

effect (a or b) was equal to 0. This allowed a comparison of Type I error rates against the gains in power that are afforded by increasing α levels, and of whether or not the relative performance of a method changed across α levels. In order to create each curve and to calculate the AUC, an additional α level of 1 was added. Values between these points were then linearly interpolated to yield smooth curves, with the exception of the straight lines observable in the curves at lower levels of specificity as a result of the gap between α = .5 and α = 1. Additionally, so as to avoid confusion, all percentages referred to here are absolute percentages or differences; e.g., if one method rejected 20% of the time and another 60% of the time, this is reported as a 40% difference.

ROC curves for the direct effect are shown in Figures 1, 2, and 3, and for the indirect effect in Figures 4, 5, and 6. Each of these figures provides plots of the sensitivity across the full range of specificity values, as well as additional plots of the sensitivity as a function of nominal α levels for a restricted range of values (α = .01 to .2), because the uniformly low observed Type I error rates result in overlapping lines in the AUC plots that are difficult to visually distinguish. Tables 1 and 2 provide sharper resolution regarding the correct and incorrect rejection rates for the direct effect for αs of .01, .05, .10, and .20, and Tables 3 and 4 do so for the indirect effect.
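The ROC construction just described reduces to a small amount of code. The sketch below (illustrative Python; the point values are invented for demonstration, not taken from our tables) linearly interpolates (observed Type I error rate, power) pairs, that is, (1 − specificity, sensitivity), and computes the AUC with the trapezoidal rule:

```python
def auc(points):
    # points are (observed Type I error rate, power) pairs;
    # linear interpolation between them makes the AUC a trapezoidal sum
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# a chance-level test rejects null effects as often as real ones,
# so its curve is the diagonal and its AUC is .5
chance = [(a, a) for a in (0.0, 0.01, 0.05, 0.2, 0.5, 1.0)]
# a hypothetical better test has higher power at every threshold
better = [(0.0, 0.0), (0.01, 0.30), (0.05, 0.55),
          (0.2, 0.80), (0.5, 0.95), (1.0, 1.0)]
```

In the study itself, the thresholds are the nominal α levels, with the final point at α = 1 added so that each curve reaches (1, 1).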

Figure 1. ROC curve for the direct effect and N = 50, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 2. ROC curve for the direct effect and N = 100, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 3. ROC curve for the direct effect and N = 200, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Table 1. Power for testing the direct effect for all methods (Regression, Percentile, BCa, ML, MLM, MLF, WLS, ULS, ULSM, GLS, DWLS), by α level and sample size, collapsed across all effect size combinations. [Table values not recoverable from this transcription.]

Table 2. Type I error rates when testing the direct effect for all methods (Regression, Percentile, BCa, ML, MLM, MLF, WLS, ULS, ULSM, GLS, DWLS), by α level and sample size, collapsed across all effect size combinations. [Table values not recoverable from this transcription.]

Direct Effect. We begin with the direct effect because it is normally distributed and simple to examine before discussing the more complex indirect effect. Immediately apparent is that most methods performed comparably well for testing c, with minimal differences between them across α levels and sample size combinations. As there are clear groupings of estimator performance, we discuss them in four blocks.

The first grouping consisted of regression, maximum likelihood (ML), and generalized least squares (GLS), which performed similarly across conditions. Interestingly, regression had a slight advantage in AUC over both estimation methods when averaged across conditions. However, this was driven partly by the fact that regression was observed here to have Type I error rates below the nominal rate, whereas SEM had more accurate Type I error rates. Further, SEM using ML had higher power than did regression, at least for detecting small effects (for larger effects the differences were minimal, e.g., 0.2%). Where c = .14, regression rejected 17.1%, 26.8%, and 48.4% of the time for N = 50, 100, and 200, respectively. In contrast, ML rejected 18.6%, 28.3%, and 49.1% of the time. This is of course a modest difference, but it does appear to show that SEM outperformed regression in terms of power and in terms of the accuracy of Type I error rates.

The second grouping consisted of weighted least squares (WLS), unweighted least squares with robust SEs (ULSM), and maximum likelihood with robust SEs (MLM), which all performed about the same. These methods were generally worse than ML or regression in terms of AUC, albeit only slightly so. This difference was driven by the fact that although these methods generally had slightly higher power than, e.g., ML (roughly 1-2%

higher), this gain in power came at the cost of Type I error rates that were 1-2% higher than the nominal rate for N = 50 or N = 100; this difference decreased for N = 200.

The third category, which performed clearly worse than all other methods, consisted of the remaining parametric methods: maximum likelihood with first-order SEs (MLF), unweighted least squares (ULS), and diagonally weighted least squares (DWLS). In regards to MLF, although its AUC values were comparable to ML, this similarity belies the fact that MLF had both low Type I error rates and low power, particularly for small sample sizes. The remaining two, ULS and DWLS, performed particularly poorly. These methods had Type I error rates 1-2% lower than the nominal rate, but their power was far less than that of all other approaches for N = 50 or N = 100; further, for small effects these methods correctly rejected the null up to about 7% less often than the other methods (see Tables 1 and 2). DWLS performed particularly poorly: for medium effects it rejected 50% less often given N = 50, and 30% less often given N = 100. For large effects, the difference in power was as much as 80% for N = 50, and 44% for N = 100.

The fourth category consists of both bootstrapping methods. These confidence intervals performed well, albeit with slightly lower power than maximum likelihood; this is unsurprising given that the simulated data adhere to the assumptions of maximum likelihood. Across effect sizes, the percentile bootstrap performed slightly better than the BCa in terms of AUC. Both methods had Type I error rates in line with the nominal rates, and had higher power than regression alone, but lower power than ML, WLS, GLS, and the robust standard error methods MLM and ULSM.

It should be acknowledged that there was not a linear effect of sample size observed here. Specifically, for N = 100, all methods, both parametric and nonparametric, had increased Type I error rates relative to N = 50 and N = 200. As the true Type I error rate does not depend on sample size (only our simulated Type I error rate here does), we do not interpret this pattern.

Figure 4. ROC curve for the indirect effect and N = 50, collapsed across all effect size combinations. The full plot comparing specificity (1 − observed Type I error rate) and sensitivity (1 − observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 5. ROC curve for the indirect effect and N = 100, collapsed across all effect size combinations. The full plot comparing specificity (1 - observed Type I error rate) and sensitivity (1 - observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Figure 6. ROC curve for the indirect effect and N = 200, collapsed across all effect size combinations. The full plot comparing specificity (1 - observed Type I error rate) and sensitivity (1 - observed Type II error rate) is on the right, and the plot on the left shows a limited range of nominal α levels for comparison.

Table 3. Observed power for methods of testing the indirect effect, by α and N, collapsed across all effect size combinations. Columns: α, N, Regression, Percentile, BCa, ML, MLM, MLF, WLS, ULS, ULSM, GLS, DWLS.

Table 4. Observed Type I error rates when testing the indirect effect, by α and N, collapsed across all effect size combinations. Columns: α, N, Regression, Percentile, BCa, ML, MLM, MLF, WLS, ULS, ULSM, GLS, DWLS.

Indirect Effect. As in the case of the direct effect, we will again present the methods as falling into four categories, with the first three categories parametric in nature and the fourth the nonparametric bootstrapping approaches. Specifically, the parametric estimation methods may be considered to fall into three categories: the first including ML, GLS, and regression; the second WLS and the robust standard error approaches (ULSM and MLM); and the third ULS, DWLS, and MLF. In general, the relative performance of the parametric methods was as might be expected from the results concerning the direct effect, with again modest differences in AUC values for different methods. Those methods in the first category again had the highest values of AUC. Further, as in the case of the direct effect, the differences between regression and SEM with ML were small, with the latter appearing to be preferable in that both methods had approximately equal Type I error rates, but SEM with ML had higher power (cf. Iacobucci et al., 2008). This may be seen in Tables 3 and 4. Given the recommendation of Iacobucci et al. (2008) to use SEM exclusively over regression, and that SEM estimation methods tend to require larger sample sizes, we also considered possible bias in the estimated standard error, relative to the observed standard deviation of the estimated effects across all 1000 replications, which serves as an estimate of the true distribution of the indirect effect. As can be seen in Figures 7 and 8, as sample size increased so too did the accuracy of both OLS regression and SEM with ML, such that for N = 200 both approaches yielded reasonably accurate estimates of the true distribution. In contrast, for N = 50 both approaches yielded clearly biased estimates. Interestingly, the bias for these two methods was in opposite directions, such that for N

= 50 regression tended to overestimate the standard deviation, whereas SEM with ML tended to underestimate it. The remaining parametric approaches - MLF, DWLS, and ULS - again had the lowest values of AUC among the parametric methods, but the differences were quite small compared to those observed for testing the direct effect. This however belies the fact that for small α levels and sample sizes these methods have much lower power than the other methods, as is readily apparent from the clearly lower lines in Figures 4, 5, and 6. In regards to bootstrapped confidence intervals, both the percentile and the BCa bootstrap approaches generally had lower AUC values than the parametric approaches. At first glance this appears to stand in contrast to standard practice and recommendations regarding the use of bootstrapping, but in fact our results here do not conflict with such recommendations. The gain in power afforded by bootstrapping approaches comes at the modest cost of higher (but still generally acceptable) Type I error rates, with this trade-off resulting in similar AUC values.
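The normal-theory (Sobel) test and the percentile bootstrap compared above can both be written in a few lines. A sketch on one simulated dataset; the path values, and the use of simple unadjusted regressions for brevity, are our own illustrative assumptions rather than the simulation design used here:

```python
import math
import random
import statistics

random.seed(1)
n = 100
a_true = b_true = 0.39  # illustrative standardized path values

# One standardized single-mediator dataset: X -> M -> Y, no direct effect
x = [random.gauss(0, 1) for _ in range(n)]
m = [a_true * xi + random.gauss(0, math.sqrt(1 - a_true**2)) for xi in x]
y = [b_true * mi + random.gauss(0, math.sqrt(1 - b_true**2)) for mi in m]

def slope(pred, out):
    """Simple-regression OLS slope of out on pred, with its standard error."""
    mp, mo = statistics.fmean(pred), statistics.fmean(out)
    sxx = sum((p - mp) ** 2 for p in pred)
    b = sum((p - mp) * (o - mo) for p, o in zip(pred, out)) / sxx
    rss = sum((o - mo - b * (p - mp)) ** 2 for p, o in zip(pred, out))
    return b, math.sqrt(rss / (len(pred) - 2) / sxx)

a_hat, se_a = slope(x, m)
b_hat, se_b = slope(m, y)  # unadjusted for X, purely for brevity

# Sobel (1982) first-order test statistic for ab
z_sobel = (a_hat * b_hat) / math.sqrt(b_hat**2 * se_a**2 + a_hat**2 * se_b**2)

# Percentile bootstrap confidence interval for ab
boots = []
for _ in range(2000):
    idx = [random.randrange(n) for _ in range(n)]
    ab, _ = slope([x[i] for i in idx], [m[i] for i in idx])
    bb, _ = slope([m[i] for i in idx], [y[i] for i in idx])
    boots.append(ab * bb)
boots.sort()
ci = (boots[int(0.025 * len(boots))], boots[int(0.975 * len(boots)) - 1])
print(round(z_sobel, 2), tuple(round(c, 3) for c in ci))
```

The Sobel test compares z to a normal reference distribution, whereas the percentile interval simply reads off quantiles of the bootstrapped ab values; the latter makes no symmetry assumption, which is the source of its power advantage for the skewed ab distribution.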

Figure 7. Difference between the mean estimated standard error of the ab estimates and the observed standard deviation of the estimates across all replications for a given condition. Calculated as the mean estimated SE for regression across all replications minus the SD of the replications.
Figure 8. Difference between the mean estimated standard error of the ab estimates and the observed standard deviation of the estimates across all replications for a given condition. Calculated as the mean estimated SE for SEM with ML across all replications minus the SD of the replications.
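The quantity plotted in Figures 7 and 8 (mean estimated SE minus the SD of the estimates across replications) is straightforward to compute by simulation. A sketch for a first-order standard error of ab; the path values and replication count are illustrative assumptions, not our design:

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(pred, out):
    """Simple-regression OLS slope and its standard error."""
    pc, oc = pred - pred.mean(), out - out.mean()
    sxx = pc @ pc
    slope = (pc @ oc) / sxx
    rss = float(((oc - slope * pc) ** 2).sum())
    return slope, np.sqrt(rss / (len(pred) - 2) / sxx)

def se_bias(n, a=0.39, b=0.39, reps=500):
    """Mean first-order (Sobel) SE of ab minus the empirical SD of ab."""
    ab_hats, ses = [], []
    for _ in range(reps):
        x = rng.standard_normal(n)
        m = a * x + np.sqrt(1 - a**2) * rng.standard_normal(n)
        y = b * m + np.sqrt(1 - b**2) * rng.standard_normal(n)
        a_hat, se_a = ols_slope(x, m)
        b_hat, se_b = ols_slope(m, y)
        ab_hats.append(a_hat * b_hat)
        ses.append(np.sqrt(b_hat**2 * se_a**2 + a_hat**2 * se_b**2))
    return float(np.mean(ses) - np.std(ab_hats, ddof=1))

for n in (50, 200):
    # positive = SE overestimates the empirical SD; negative = underestimates
    print(n, round(se_bias(n), 4))
```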

Comparison of Distributions

In the previous section we found that methods of testing the indirect effect perform roughly as might be expected from the results for the direct effect, and further as might be expected from the literature regarding testing indirect effects. However, it remains unclear how well bootstrapping recovers the true distribution of the indirect effect. To help better understand the performance of the percentile bootstrap, and by extension why it is necessary to provide some form of correction (as the BCa does) in order to maximize power and coverage of the true effect (cf. Hayes & Scharkow, 2013), we considered the average properties of the distribution of the estimates for a condition (the initial estimates for each replication), relative to the averaged bootstrapped distributions (1000 replications, with 5000 bootstraps each). Stated less precisely, we compared the observed distribution of the estimates to the bootstrapping distributions. Specifically, we considered the mean, median, standard deviation vs. standard error, skewness, and kurtosis of these two distributions. Comparisons of these measures are shown in Figures 9 through 13. The differences between these two distributions as they relate to the magnitude of ab are shown in Figures 14 through 18.

Figure 9. Mean of the mean of the bootstrap resamples across all replications, compared to the mean of all replications.
Figure 10. Median of the median of the bootstrap resamples across all replications, compared to the median of all replications.

Figure 11. Mean of the skew of the bootstrap resamples across all replications, compared to the skew of all replications.
Figure 12. Mean of the kurtosis of the bootstrap resamples across all replications, compared to the kurtosis of all replications.

Figure 13. Mean of the standard deviation of the bootstrap resamples across all replications, compared to the standard deviation of all replications.

Figure 14. Difference between the mean ab across all replications for a given condition and the mean ab from each bootstrapped distribution. Calculated as the mean of the bootstrapped-distribution means minus the mean of the replications.
Figure 15. Difference between the median ab across all replications for a given condition and the median of the median ab from each bootstrapped distribution. Calculated as the median of the bootstrapped-distribution medians minus the median of the replications.

Figure 16. Difference between the skew of the distribution of the estimated values of ab for a given condition across all replications and the mean skew from each bootstrapped distribution. Calculated as the mean skew of the bootstrapped distributions minus the skew of the replications.
Figure 17. Difference between the kurtosis of the distribution of the estimated values of ab for a given condition across all replications and the mean kurtosis from each bootstrapped distribution. Calculated as the mean kurtosis of the bootstrapped distributions minus the kurtosis of the replications.

Figure 18. Difference between the standard deviation of the distribution of the estimated values of ab for a given condition across all replications and the mean standard deviation from each bootstrapped distribution. Calculated as the mean standard deviation of the bootstrapped estimates minus the standard deviation of the replications.
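The comparisons in Figures 9 through 18 amount to contrasting the distribution of ab across replications with the average moments of the per-replication bootstrap distributions. A minimal sketch of that bookkeeping under the null; the replication and resample counts are scaled down for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

def ab_hat(x, m, y):
    """Product of the two simple OLS slopes: a (M on X) times b (Y on M)."""
    a = np.cov(x, m)[0, 1] / np.var(x, ddof=1)
    b = np.cov(m, y)[0, 1] / np.var(m, ddof=1)
    return a * b

def skew(v):
    z = (v - v.mean()) / v.std()
    return float((z ** 3).mean())

n, reps, n_boot = 50, 200, 150
rep_ab, boot_sds, boot_skews = [], [], []
for _ in range(reps):
    x, m, y = rng.standard_normal((3, n))  # null condition: a = b = 0
    rep_ab.append(ab_hat(x, m, y))
    bs = np.empty(n_boot)
    for j in range(n_boot):
        i = rng.integers(0, n, n)  # one bootstrap resample of the cases
        bs[j] = ab_hat(x[i], m[i], y[i])
    boot_sds.append(bs.std(ddof=1))
    boot_skews.append(skew(bs))

rep_ab = np.asarray(rep_ab)
print('SD  :', round(float(rep_ab.std(ddof=1)), 4),
      'vs mean bootstrap SD  :', round(float(np.mean(boot_sds)), 4))
print('skew:', round(skew(rep_ab), 3),
      'vs mean bootstrap skew:', round(float(np.mean(boot_skews)), 3))
```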

Bootstrapping generally accurately estimated the mean of the distributions (Figures 9 and 14). Figure 14 suggests a slight tendency for overestimation of the mean when a = b = .59, but the magnitude of the difference is so small as to be considered zero and ignorable. As such, bootstrapping may be considered to perform well in terms of recovery of the mean of the indirect effect for all effect sizes. For the remaining distribution characteristics, it is useful to distinguish between cases when ab = 0 and when ab > 0. We will begin with the cases where ab = 0. For such cases, bootstrapping accurately estimated the median (Figures 10 and 15), with increasing sample size further improving accuracy. In contrast, bootstrapping tended to underestimate the magnitude of skew such that the skew of the bootstrapping distributions was reduced to near zero, despite the distribution of the replications having negative skew (Figures 11 and 16). Similarly, under the null bootstrapping grossly underestimated more extreme values of kurtosis for the indirect effect distribution (e.g., a kurtosis of 12 compared to a kurtosis of 7; Figures 12 and 17). Finally, consistent with the well-known fact that the use of percentile bootstrapped confidence intervals rarely results in a Type I error, bootstrapping clearly overestimated the standard deviation of the true distribution when ab = 0 (Figures 13 and 18). In sum then, when there was no indirect effect, bootstrapping seems to have accurately captured measures of central tendency (the mean and the median), overestimated dispersion, and underestimated the magnitude of skew and kurtosis. We now turn to the cases where ab > 0. Here, bootstrapping again accurately estimated the mean of the true distribution, with very little difference between the bootstrapping distribution and the replication distribution. However, bootstrapping

showed a tendency to underestimate the median (Figure 10), with the magnitude of the underestimation increasing with ab (Figure 15) and decreasing with N. This bias was modest, however, and was at most an underestimate of .011 for ab = .3481 and N = 50. In regards to skew (Figures 11 and 16), bootstrapping again tended to underestimate the magnitude of skew, with the result being that the bootstrapping distribution was less skewed than the replication distribution. The degree of underestimation decreased with ab. For kurtosis, unlike when ab = 0, the kurtosis of the true and bootstrapping distributions were similar. Additionally, as ab increased, bootstrapping more precisely estimated the kurtosis of the distribution of the replications. In regards to the standard deviation when the null was false, we found that in general bootstrapping overestimated the standard error of the distribution of the replications. However, this effect was curvilinear such that the degree of overestimation decreased until a = b = .39, and so ab = .1521, but again increased when either a or b was equal to .59, and so ab = .2301 or .3481.

Discussion

Past work on methods of testing indirect effects has almost exclusively focused on α = .05, and such work has clearly established the superiority of bootstrapping over the Sobel test at that decision criterion. As we have shown here, it is also the case that bootstrapping is preferable at other decision thresholds as well, including those relevant to common practice (i.e., α = .01 or .10). For testing direct or indirect effects with parametric methods, there was relatively little difference between the AUC values of most methods, and most performed comparably well across α levels, excepting the poor performance of ULS, DWLS, and ML with first-order standard errors (MLF). Of note is that, consistent with Iacobucci et

al. (2007), ML with naïve standard errors had higher power at lower α levels, as well as more accurate Type I error rates, than regression. Nonetheless, although ML had more accurate Type I error rates and higher power, the AUC of ML and regression typically agreed to the second or third decimal place, with regression outperforming SEM under some conditions. Perhaps unsurprisingly, it is then a matter of one's weighting of Type I and Type II errors in deciding between these two methods. In regards to bootstrapping there were two main findings. The first was simply that at all α levels we considered here, and regardless of parameter estimation method, bootstrapping maintained its power advantage over the Sobel test. As such, at least when using strictly manifest variable mediation models, the use of SEM or regression will yield identical conclusions regarding the indirect effect if one makes use of bootstrapping, and the recommendation of Iacobucci et al. (2008) to use SEM over regression must be qualified by that fact. The second finding is that, interestingly, the distribution of the bootstrapped indirect effect does not accurately reflect the true distribution of the indirect effect, at least not for the sample sizes we considered here. What we present here is not the first example of bootstrapping failing to accurately recover the true distribution. Indeed, the failure to do so when creating confidence intervals for standard errors is originally what led Efron (1987) to create the bias-corrected and accelerated (BCa) bootstrap. Although our results here are of course preliminary, they do serve to provide some explanation for the superiority of bootstrapping. It is well known that the percentile bootstrap has low Type I error rates (e.g., Koopman et al., 2015), and as we show here this may be attributed to the fact that under the null bootstrapping clearly has wider

distributions than is accurate. In contrast, for small effects this overestimation is much smaller, which may serve to explain in part why bootstrapping performs so well for such effects. In practice, which bootstrapped confidence interval is to be preferred depends upon one's weighting of Type I and Type II errors (cf. Hayes & Scharkow, 2013), as well as effect size. If one is most concerned about false positives, then the percentile bootstrap appears to be the best option, as in general it has Type I error rates below the nominal level yet is still more powerful than Sobel-based approaches. Additionally, the percentile bootstrap also had higher values of the AUC than the BCa, and in general will result in more accurate conclusions (note that this is across α levels, rather than specific to α = .05). If one is most concerned about power, then the BCa is to be preferred when testing indirect effects. For α = .05, correct rejection rates were 3-5% higher than those of the percentile bootstrap. However, this gain in power comes at a cost of increased Type I error rates, both relative to the decreased error rates of other methods and in absolute terms, as false positives may exceed the nominal α level if one path is large or sample size is small (cf. Koopman et al., 2015). In such cases, Type I error rates are roughly 1-4% higher than the nominal rates.

Future Directions

Here we considered only the single mediator case because of its popularity and because it is a natural starting point for mediation research, but it is unlikely that only a single process mediates the relationships of interest in psychology (Baron & Kenny, 1986). A clear next step for future research then is to consider the relative performance of

the methods we considered here in a parallel or serial mediator context, which would have at least two benefits. The first is that it would consider the performance of these methods under conditions that are generally considered more likely to reflect the underlying processes. The second is that it would also allow some investigation into the effects of misspecification were one to falsely assume that the path between mediators was equal to zero (i.e., to treat a serial mediator case as a parallel mediator case). This is perhaps not often of interest to users of mediation, but given that the interpretation of the variable relationships may be strongly contingent on the relationship between the two mediators it is worth considering. Regarding the performance of bootstrapping, what we showed here was only an initial investigation into the reasons for its performance. In addition to future research considering the above point regarding parallel and serial mediator cases, a more thorough investigation into the effects of both sample size and effect size is warranted. This is because, consistent with past work showing that bootstrapping tends to have poor small sample size performance (Koopman et al., 2015), we found that as sample size increased the accuracy of the bootstrapped distributions improved. In regards to effect size, the bias observed for both skew and the standard deviation was clearly non-constant across effect sizes, and a deeper investigation would likely reveal how this relates to the observed performance of bootstrapping. Such an investigation may also examine why there was no apparent crossing of the ROC curves. Speculatively, it is possible that this is due to the non-constant bias in skew and dispersion as effect sizes change, but a more in-depth analysis is required to understand these findings.

Finally, although we considered why the corrections offered by the BCa are necessary when testing the indirect effect, we did not investigate the effectiveness of the corrections in relation to the true distribution. The BCa is known to have inflated Type I error rates in some cases (e.g., small sample sizes), and so it is clear that the corrections do not perfectly yield the true distribution of the indirect effect. This may perhaps be investigated in greater detail by comparing the analytic approximation to the BCa developed by Efron (2003) to the distribution of the product developed by MacKinnon and colleagues (MacKinnon, Lockwood, & Williams, 2004).
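For reference, the BCa corrections themselves (Efron, 1987) are compact: a bias-correction term z0 computed from the proportion of bootstrap estimates below the point estimate, and an acceleration term computed from the skewness of jackknife estimates. A standard-library sketch for a generic one-sample statistic (this is an illustration of the general method, not our simulation code):

```python
import random
from statistics import NormalDist, fmean

def bca_interval(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap CI for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    theta = statistic(data)
    boots = sorted(statistic([data[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    # Bias correction: how far the bootstrap distribution sits from theta
    prop = sum(b < theta for b in boots) / n_boot
    prop = min(max(prop, 1 / n_boot), 1 - 1 / n_boot)  # avoid infinite z0
    z0 = NormalDist().inv_cdf(prop)
    # Acceleration: skewness of the jackknife (leave-one-out) estimates
    jack = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    jbar = fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    acc = num / den if den else 0.0

    def endpoint(level):
        z = z0 + NormalDist().inv_cdf(level)
        p = NormalDist().cdf(z0 + z / (1 - acc * z))
        return boots[min(n_boot - 1, max(0, int(p * n_boot)))]

    return endpoint(alpha / 2), endpoint(1 - alpha / 2)

random.seed(3)
sample = [random.gauss(1.0, 1.0) for _ in range(60)]
lo, hi = bca_interval(sample, fmean)
print(round(lo, 3), round(hi, 3))
```

When z0 and the acceleration are both zero, the endpoints reduce to the plain 2.5th and 97.5th percentiles, which is why the BCa can be viewed as a corrected percentile interval.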

Chapter 3: Factor Model as an Alternative Explanation

In practice it is rarely the case that all confounding variables are known, and there is often limited information regarding the nature, effects, and number of the unknown confounders. Even so, it is possible to consider the sensitivity of the estimated effects to possible confounding. Doing so provides some information about how wrong a model must be in order for any conclusions drawn from it to be undermined, or alternatively may be considered to provide information regarding alternative ways of viewing the data one has. Sensitivity analysis for mediation may take a few forms. Ideally, parameter sensitivity is considered by way of measured variables that are statistically controlled for. Without such variables, sensitivity analysis may be conducted using formulae that allow one to consider arbitrarily strong unknown confounders. These approaches have developed to the point that one may consider dichotomous and continuous confounders without being confined to a particular model (e.g., VanderWeele & Arah, 2011). A third approach that is not often explicitly discussed when using mediation is to make use of different models as a means of examining parameter sensitivity. Selig and Preacher (2009) discuss a few relevant models, including autoregressive models, latent growth curves, and latent difference scores. Additionally, Maxwell and Cole (2007) discuss the use of cross-lagged panel designs to reduce bias in the estimated mediation

effects. The approach we take here also considers the utility of alternative models in examining the quality of mediation claims. Specifically, we make use of factor models, which are formally equivalent to mediation models in that both may perfectly recreate the correlations between a set of variables, provided that enough factors are employed (at most, one less than the number of variables involved in a mediation scheme). Using factor models to examine parameter estimates has a few advantages. The first is that it captures the potential effects of confounding in an elegant way, and it is simple and straightforward to use and interpret. Factor models have the useful property of constraining all correlations between variables to be 0, conditional on the effects of the latent variable. Applied to mediation, this yields a way to quantify the worst-case confounding scenario, in which all apparent direct and indirect effects are in fact due to other variables not included in the mediation scheme. Regardless of the true number of missing variables, the factors themselves may be interpreted as an amalgamation of all confounding effects, with the loadings serving as weights that summarize their effects upon the manifest variables of interest. In addition to utilizing factor models to capture all confounding effects in a worst-case scenario, a factor model may also be considered a viable alternative model in its own right. This is particularly true when the variables involved in the study are of a similar kind (e.g., emotions, attitudes, etc.), as the more related a set of measured variables are, the more likely it is that they simply represent the same variable measured repeatedly. Indeed, this is well known to users of mediation and reflected in concerns that M and Y may represent the same variable (Bullock, Green, & Ha, 2010). The result of this then is

that researchers must prove that they are in fact measuring different variables for X, M, and Y (or at least more than a single variable). Further, psychological research and theory often assume that a small number of latent variables are responsible for a variety of observed variables. This is clearest in the case of self-reported scales that use multiple indicators for a latent variable, but a few general examples include stereotype threat effects explained by working memory (Schmader & Johns, 2003), persuasion by levels of thought confidence (Petty, Briñol, & Tormala, 2002), and performance by arousal (e.g., Anderson, 1994). Of course, variables that are not conceptually or methodologically similar (e.g., self-report vs. physiological measures) are less likely to share a factor space, though they may well share a common cause (e.g., a stressor affecting both reported anxiety levels and cortisol levels). In cases where a factor model is appropriate, either as an alternative explanation or as a method of capturing confounding effects, the implication is that it is not necessarily the variables of interest that are responsible for any effects, but rather the underlying factors that are responsible for the relationships and the changes in the variables of interest. Woody (2011) briefly discussed this possibility, and showed with a single example that it is easy to use scale items to test for mediation, resulting in statistically significant results. Specifically, he showed that using three items from a dieting scale (the Restraint Scale; Herman & Mack, 1975) as X, M, and Y resulted in apparent mediation. Such an example is perhaps a bit contrived, but it nonetheless illustrates a case where a mediation model would be in conflict with both standard practice and intuition. That is not to say that mediation as it is often described could not occur between single items of a self-reported measure, but rather that because of the similarity of the variables

it would be more difficult to prove that there are any causal relationships in the manner supposed by mediation.

Factor Spaces in Mediation

Before we begin illustrating how mediation results may appear when a factor space underlies the variables, we will first further develop why a factor space may be used to capture confounding effects or serve as an alternative explanation in its own right. Because how a factor model is interpreted is largely up to its user, we will not belabor the distinction between confounding and alternative models, and instead leave it to the user to decide which is most appropriate for a set of variables. The viability of a factor model is most evident in the case of non-experimental, cross-sectional data using conceptually similar variables, and so we will begin there. We focus primarily on conceptually similar variables as they seem likely to be the cases where a factor model would be of the greatest utility. By conceptually similar variables, we refer to variables that may be considered to be of the same type (e.g., emotions, indicators of intelligence or self-esteem, interpersonal styles, etc.). The similarity of such variables makes it likely that they share similar causes, and in cases involving conceptually similar variables, a low-dimensional factor space is a simple and self-evident alternative to a mediation scheme that is consistent with common practice. The possibility of a smaller number of variables being responsible for a set of variable relationships is of course one of the core motivations behind latent variable modeling more generally, and situations where theory describes a shared factor space underlying the observed variables are not uncommon. In general, psychological research and theory assume that a small number of latent variables are responsible for a variety of

observed variables. Some circumplex examples include emotions (Russell, 1980; 2003) and interpersonal relationships (Gurtman, 2009), and one or two factors are also described for domains such as self-monitoring (Lennox & Wolfe, 1984; Snyder, 1974), narcissism (Dickinson & Pincus, 2003; Raskin & Terry, 1988), and self-esteem (Rosenberg, 1965; Tafarodi & Swann, 2001). One might be inclined to believe that the issue is resolved if latent variables are used to capture X, M, and Y with multiple indicators for each factor, but doing so does not preclude the possibility of a shared factor space. Factors are simply summaries intended to capture some dimension of a measure, and may themselves be summarized by still higher-order dimensions. A few examples include self-compassion and its six sub-scales (Neff, 2003) and the higher-order factors of stability and plasticity in relation to the Big Five personality dimensions (DeYoung, Peterson, & Higgins, 2002). The importance of this fact is that while structural equation modeling may be used to great effect for testing mediation (cf. Iacobucci et al., 2007; Selig & Preacher, 2009), it does not provide prima facie evidence that a shared factor space does not underlie the variables of interest. Although we focus on conceptually similar variables as being most likely to share a factor space, we wish to also note that the issue we discuss here is potentially present when dealing with methodologically similar variables, in particular self-report variables. Methodological factors may result in a shared factor space that affects multiple measures, and such factor spaces may occur simply due to a response process, as when respondents use overlapping information about themselves or others (e.g., Borkenau, 1986; Wojciszke, 1994). For example, participants (particularly apathetic ones, as any researcher familiar with undergraduate participant pools may attest) may use the same

behavior (e.g., doing a difficult favor for a friend) to answer questions about friendliness, helpfulness, and kindness, and perhaps even competency, loyalty, and reliability. Similarly, affect has sufficient heuristic value as to influence a variety of dependent variables (cf. Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). As a result, even conceptually distinct variables may nonetheless share a factor space, and so a factor model may still be used to great effect. A point that we have neglected thus far is that a set of variables may follow a factor model even if there are direct relationships between them beyond the relationships described by the factor model. Such relationships are a violation of the factor model assumption of local independence, i.e., that given the latent variables the observed variables do not influence each other in a direct way. Detection of such effects would eliminate a factor space interpretation, but it is not possible to do so with only three variables because of the model equivalency. However, if there are multiple indicators for the variables of interest then it is possible to make use of the differences between the two models to better validate and test mediation claims. We do not discuss this further here, and instead develop this point in greater detail in Chapter 4. For now, it is sufficient to simply assume that one has only three variables, as that is not uncommon when using mediation models.

Shared Factor Spaces and Study Designs

Broadly, the design of a study may differ in two respects. The first is the degree of manipulation of X, and the second is the amount of time that passes between measurements of the variables of interest. From a causal interpretation point of view, the ideal design is experimental and longitudinal, with random assignment to some manipulation of X, with M following X, and Y following M (cf. Maxwell & Cole, 2007;

Stone-Romero & Rosopa, 2008). In practice, however, it is not always possible to have such a design, as some variables are difficult if not impossible to manipulate (e.g., socioeconomic status, gender, or health). It is also not possible to simultaneously apply an experimental manipulation to M without interfering with the mediation of effects (Bullock, Green, & Ha, 2010), as mediation concerns the transference of an effect X has on M. Randomly assigning both variables in the same experiment interferes with any effect X may have on M, and the effect on Y caused by a manipulation of M may not be the same effect that mediates the X-Y relationship. One may experimentally manipulate M in a separate study as a test of the mediation relationship between M and Y, but it is not possible to show that an effect of X is being passed through M if a manipulation is applied to both variables simultaneously. As a consequence, even experimental mediation designs are necessarily partly correlational (cf. Sobel, 2008) and do not eliminate the possibility of a shared factor space. Specifically, if X is a manipulation, the shared factor space idea still applies to the mediator(s) and the dependent variable. However, because the factor space idea is necessarily limited for experimental studies, we will not pursue this design. It is simply worth making clear that experimental manipulations, while generally preferable in examining mediation, do not circumvent the possibility of a shared factor space or otherwise confounded relationships (Bullock, Green, & Ha, 2010; Jo, 2008; Sobel, 2008). A similar issue applies in regards to the time course of a study. Although longitudinal designs are generally preferable over cross-sectional designs, their use does not eliminate the possibility of confounding. Instead, the advantages they afford relate to reducing bias in the estimates of the direct and indirect effects (Cole & Maxwell, 2003;

Maxwell & Cole, 2007; Maxwell, Cole, & Mitchell, 2011). Still, it is worth briefly describing the necessary modifications to the factor space idea in the case of a longitudinal design. Maxwell and Cole (2007) describe both an autoregressive approach and a random effects approach to testing for mediation. Figure 19 gives a representation that can be used for both approaches. We will not use such a design here (but will do so in the next chapter); it is simply to illustrate that a factor model may still apply regardless of the time between measurements.

Figure 19. One-factor longitudinal model, with the latent variable θ measured at t = 0, 1, and 2 by indicators X_t, M_t, and Y_t (loadings λ_X, λ_M, λ_Y) and autoregressive paths β_01 and β_12. X_0, M_1, and Y_2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.

What looks like three different latent variables in Figure 19 is in fact one latent variable that possibly changes across time, measured at t = 0, t = 1, and t = 2. For the autoregressive model, θ_{t+1} = β_0 + β_{t,t+1} θ_t + ε_{t+1}, with the additional constraint that β_01 = β_12, i.e., there is a fixed autoregressive effect, constant across time. Alternatively, for the latent growth curve model, there are random effects such that θ_t = θ_{t=0} + βt + ε_t, with θ_{t=0} the random intercept, β the random slope, and t a measure of the time elapsed since t = 0. In contrast to the autoregressive approach, the regression coefficients β in Figure 19 would not be equal. Instead, they would depend on t and on the variance and covariance of the random variables of the growth model. Although the above models are comprised of three factors with three indicators each, in practice one may make use of an incomplete longitudinal design that uses each indicator only once, with X, M, and Y measured at three distinct time points. However, as in the case of a cross-sectional design, a one-factor model would likely be sufficient to recreate their correlations, and therefore also able to explain the results from a mediation analysis. The nature of the study design has no effect on this, as the correlations are agnostic to study design. A factor model may apply equally well regardless.

Purpose and Plan

The relevance of factor models to mediation models will be developed in two parts. First, we start from factor models in order to illustrate what the mediation analysis results might look like in such cases. This is our main purpose, as what can be expected from mediation analysis results when the true model is a factor model has not been illustrated. Second, although estimating factor models using just three variables is

difficult and is likely the primary reason that they are not considered when conducting tests of mediation, we will show it is possible to calculate some factor structures using just the three variables in a mediation model.

To illustrate our point regarding factor and mediation model equivalence, we will show what sorts of mediation results may occur given different positions of a set of variables in a shared factor space. We have chosen the case of a single mediator between the independent and dependent variables because it is the simplest mediation model, and it is sufficient for illustrative purposes. Note, however, that regardless of the number of variables in a mediation scheme, a factor model may be used either to capture confounding effects or to serve as an alternative model entirely.

We shall work with X, M, and Y as manifest variables that are located in a two-dimensional factor space together with a number of other variables. In the present case, the variables may be considered to be measured with perfect reliability, such that the variance that is not contained in the factor space is unique variance rather than error variance. The effects of unreliability are then eliminated so that the effects of belonging to a factor space can be seen in a pure form, and without any distortions of the paths due to unreliability.

For didactic purposes we shall use Russell's (1980; 2003; Yik, Russell, & Steiger, 2011; see Figure 20) model of emotions,1 as the factor space is well established. Further,

1 We acknowledge that this model represents core affect and not emotions per se (for more information, see Russell, 2003), but for the sake of explicating our point we will treat this circumplex as a model of emotions.

emotions have the advantage of being plausibly linked to one another across a variety of settings due to the rich phenomenological diversity of manifest emotions, and because they are often measured as discrete states. The circumplex is also easily understood as representing either confounding or an alternative explanation. For example, a researcher may wish to see if being frustrated (X) may lead to being miserable (M), and then potentially to feeling depressed (Y). Alternatively, a researcher may wish to see if being delighted (X) may lead to being happy (M), and then potentially to being glad (Y). Such hypotheses seem reasonable, and it is relatively easy to imagine situations where such transitions between emotional states may occur, and so it is easy to generate an enticing rationale to conduct tests of mediation. However, past work on emotions suggests that these emotions do not influence each other. Instead, it is the underlying dimensions of valence and arousal that are responsible for their intercorrelations (cf. Russell, 2003). A mediation model involving three or more emotions would then be confounded with these two latent variables.

Figure 20. Affect circumplex (taken with permission from Yik, Russell, & Steiger, 2011).

Regarding distinguishing between factor and mediation models, we will provide formulae that may be easily used. These formulae are based on simple derivations that one can use to translate the parameters from one model to those of the other model. Perhaps their greatest utility is that they demonstrate that not all mediation estimates are possible given a one-factor model, and so if such patterns do occur then one can be confident that, at the very least, more than one latent variable is involved. As such, the formulae we provide may be used under some conditions to rule out at least some alternative explanations for a set of mediation results.

From Factor Models to Mediation Analysis Results

Although it is not ideal from a practical perspective to work from a factor model to a mediation model, because researchers are likely to have mediation results before they are interested in any alternative explanations, it is necessary because for two factors there are an infinite number of factor loading patterns that may result in the same set of regression weights. However, the reverse is not true; a set of factor loadings will yield only one set of regression weights. We therefore derive all results here working from a factor space.

Orthogonal Independent and Dependent Variables

We begin with the simple case where the independent and dependent variables are orthogonal to each other. This case is relatively straightforward to discuss, and allows a convenient starting point because it results in rXY = 0, and so no total effect c. To illustrate such a relationship we will use frustrated (X), miserable (M), and depressed (Y).2

2 If these emotions were precisely at the three reference lines we use here (X, IX, and VIII in Figure 20), the angle between frustrated and depressed would be only 60°. The angles we use are approximations useful for didactic purposes.

Roughly, these emotions represent activated displeasure, displeasure, and deactivated displeasure, respectively (see Figure 20). One may hypothesize that feeling frustrated leads to feeling miserable and that feeling miserable leads to feeling depressed; in other words, that feeling miserable mediates the relationship between feeling frustrated and feeling depressed. This hypothesis is easily tested with a mediation analysis to obtain estimated direct and indirect effects.

Figure 21 illustrates the vectors for these emotions, with their placement approximately the typical emotion circumplex rotated 90° clockwise and then mirrored. Frustrated is then in the upper left quadrant, depressed in the upper right quadrant, and miserable between them (45° from both frustrated and depressed). Further, Figure 21 presents the mediation model superimposed on the factor space. The indirect path roughly follows a semi-circle from frustrated on the left, over miserable in the middle, to depressed on the right. The direct path goes straight from frustrated on the left to depressed on the right.

Figure 21. Sample mediation triangle placed within the factor space. The three variables from the core affect example are represented as vectors. The specific situation shown is the case where the vector length of each variable is .8, X and Y are orthogonal, and the mediator is 45° from both X (frustrated) and Y (depressed). In this case, X and Y are uncorrelated, and the mediator is correlated with both X and Y at r = .45. Variable labels are somewhat arbitrary and for illustrative purposes only.

For this specific case, if we assume that all vectors are 0.8 in length (with standardized variables the length may vary from 0 to 1, and longer vectors indicate that the variables are better explained), the respective factor loadings on Factor I (Valence: unpleasant vs. pleasant) and Factor II (Arousal: passive vs. active) are −0.566 and 0.566 for frustrated (X; displeasure and activated), −0.8 and 0 for miserable (M; displeasure),3 and −0.566 and −0.566 for depressed (Y; displeasure and deactivated). These loadings yield the following correlations between the three variables: rXY = 0, rXM = .453, and rMY = .453. The mediation regression weights are then a = 0.453, b = 0.569, ab = .258, and c′ = −.258. Such large effects are quite likely to be statistically significant, and to be interpreted as meaningful.

However, the relationships between the variables follow directly from their position in the factor space. Specifically, when the direct and indirect effects have opposing signs, the explanation from a confounding perspective is that X and Y are in fact orthogonal, with M a vector between the two, with no effect of one feeling on another in the way a mediation model would imply. Similarly, from a factor model perspective, the variable relationships are due to the shared underlying variables of valence and arousal, and any changes would be due to the underlying factor(s).

To better illustrate the results under conditions where rXY = 0, we derived how mediation effects depend on the position of the M vector relative to X and Y. We allowed for the full range of possible vector lengths (0 to 1), and allowed M to vary between X and Y in the two-dimensional factor space. We opted to do this because the position of M in between X and Y seems the most natural one for a mediator, but it is nonetheless

3 Placing the mediator directly on the Y-axis is by convention, and this placement is equivalent to all other rotations.

possible to select mediating variables that are not between X and Y in the factor space. The resultant range for rXM and rMY is then between r = 0 and r = 0.707, which is the highest value that the two correlations can have simultaneously when rXY = 0. This would occur when the vector lengths are 1 and M is at a 45° angle from both X and Y (similar to the example in Figure 21). We then used the calculated correlations to conduct tests of mediation.

As Figure 22 illustrates, in the absence of a total effect, mediation necessarily creates opposing effects that increase rapidly in magnitude as rXM and rMY do. Because rXM and rMY are positive, the indirect effects are also positive and increase in magnitude as the correlations do. The left side of Figure 22 illustrates this. Similarly, the direct effects are necessarily negative and increase in magnitude as the correlations do, as the right side of Figure 22 shows. These two panels are mirror images because the magnitude of the indirect effect equals minus the magnitude of the direct effect if the total effect (the correlation between X and Y) is zero. More specifically, the precise absolute magnitude of both effects is rXM rMY/(1 − r²XM), which is the semi-partial correlation of Y with M controlling for X multiplied by rXM/√(1 − r²XM). The larger the product of the nonzero correlations and the larger the squared correlation between X and M, the larger the indirect effect is and the larger the negative value of the direct effect is.
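The arithmetic of the worked example can be verified directly. A minimal sketch (NumPy assumed available), using the loadings and formulas given above; variable names are our own:

```python
import numpy as np

# Loadings (Valence, Arousal) for vectors of length .8, as in the example:
# frustrated (X), miserable (M), depressed (Y).
load = {
    "X": np.array([-0.566, 0.566]),
    "M": np.array([-0.8, 0.0]),
    "Y": np.array([-0.566, -0.566]),
}

# With orthogonal factors and standardized variables, the model-implied
# correlation between two indicators is the dot product of their loadings.
r_xm = load["X"] @ load["M"]
r_my = load["M"] @ load["Y"]
r_xy = load["X"] @ load["Y"]

# Standardized simple-mediation coefficients from the three correlations.
a = r_xm
b = (r_my - r_xm * r_xy) / (1 - r_xm**2)
indirect = a * b
direct = (r_xy - r_xm * r_my) / (1 - r_xm**2)

# When r_XY = 0, indirect and direct effects are mirror images, both with
# magnitude r_XM * r_MY / (1 - r_XM**2), as stated in the text.
closed_form = r_xm * r_my / (1 - r_xm**2)
```

Running this reproduces rXM = rMY ≈ .453, ab ≈ .258, and c′ ≈ −.258, with the total effect exactly zero.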

Figure 22. Heat map of the calculated indirect and direct effects when rXY = 0. Values vary as a function of the magnitude of rXM and rMY.

From a confounding perspective, the implication of the above is that when one finds that the direct and indirect effects are large and in opposition to one another, X and Y are orthogonal in the shared factor space, and M is simply somewhere between them. As the position of M changes in the factor space, a very large range of values for the direct and indirect effects may be obtained. In some cases, the magnitude of the effects is such that one would likely be quite willing to interpret them as meaningful, and as a rather unique finding given that there would be no total effect of X on Y. However, in a factor space it simply means that X and Y are orthogonal to each other, and no interpretation beyond that is necessary.

Generalizing to Other Cases

In order to consider more broadly the behavior of mediation analyses involving variables stemming from a shared factor space, we next derived what sort of results may occur given other angles between X and Y. For this study, we allowed the angle between the X and Y vectors in the shared factor space to be 0°, 30°, 45°, 60°, 90°, 120°, 150°, or 180°. Based on the logic that a mediator is a variable in the middle, the angle between X and M was then set to be a proportion of the angle between X and Y: 0/3, 1/3, 1/2, 2/3, or 3/3. The angle between M and Y was then the remainder of the proportion (i.e., 3/3, 2/3, 1/2, 1/3, or 0/3, respectively). For example, in the case of ∠XY = 90°, if ∠XM was 30°, then ∠MY was 60°. In the case of ∠XY = 60°, if ∠XM was 40°, then ∠MY was 20°. When the angle between X and Y is 0°, the special case of a unidimensional structure within the two-dimensional space is the result. In order to focus on the effect of ∠XY and the position of M between these two, all vector lengths were set to be equal to each other, and either .5 or .8 in length (recall that longer vectors indicate

that the factors explain the variables better). We then calculated the loadings for each of the three manifest variables, calculated their correlations, and ultimately conducted tests of mediation. We shall focus on the larger vector length, as the effects are of clearly larger magnitude (the full details of the results are available in Appendix A).

Figures 23 and 24 illustrate the direct and indirect effects, respectively. To use the outdated Baron and Kenny (1986) terminology, the combinations of effects range from the absence of mediation to apparent complete mediation. Further, there are two general trends apparent in the figures:

1. Effect of the angle between X and Y: The direct and indirect effects show the same trend in that they both decrease with the angle between X and Y.

2. Effect of the relative position of M between X and Y: The direct and indirect effects show opposite trends. The direct effect is smaller (including more negative) and the indirect effect is larger, the closer M comes to roughly halfway between X and Y. In other words, when M moves toward the middle between X and Y, the indirect effect increases and the direct effect decreases.

Interestingly, a mediator is literally a variable in the middle, and the trends reflect this. However, this is only roughly the case, since in fact the maximum indirect and minimum direct effects are reached slightly before or after halfway. How close this point is to the middle also depends on the vector length. Finally, that the curves in Figures 23 and 24 cross at certain points follows from the nonlinear nature of the formulas for the direct and indirect effects.
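The derivation grid just described is easy to reproduce. A sketch under the stated assumptions (equal vector lengths, angles in degrees); the function name is our own:

```python
import numpy as np

def effects_from_angles(angle_xy, prop, length):
    """X, M, and Y as equal-length vectors in a two-dimensional factor
    space; `prop` is the X-M angle as a proportion of the X-Y angle."""
    angle_xm = prop * angle_xy
    # Correlation between two vectors of common length l separated by an
    # angle: l**2 * cos(angle).
    r = lambda deg: length**2 * np.cos(np.radians(deg))
    r_xy, r_xm, r_my = r(angle_xy), r(angle_xm), r(angle_xy - angle_xm)
    indirect = r_xm * (r_my - r_xm * r_xy) / (1 - r_xm**2)
    direct = (r_xy - r_xm * r_my) / (1 - r_xm**2)
    return indirect, direct

# M halfway between orthogonal X and Y: opposing effects of equal size.
ab, c_prime = effects_from_angles(90, 0.5, 0.8)

# All three vectors coincide: indirect and direct effects share a sign.
ab0, c_prime0 = effects_from_angles(0, 0.5, 0.8)
```

Sweeping `angle_xy` and `prop` over the grids listed above regenerates the curves plotted in the figures.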

Figure 23. Calculated direct effects in a shared factor space. The x-axis represents the magnitude of angle XM as a proportion of angle XY. For example, for ∠XY = 60° and proportion = .33 (1/3), ∠XM = 20° and ∠MY = 40°.

Figure 24. Calculated indirect effects in a shared factor space. The x-axis represents the magnitude of angle XM as a proportion of angle XY. For example, for ∠XY = 60° and proportion = .33 (1/3), ∠XM = 20° and ∠MY = 40°.

Vector Length  Angle XY  Angle XM  Angle MY   r_XY   r_XM   r_MY      c      a      b     ab     c′
0.8                  90        45        45   .000   .453   .453   .000   .453   .569   .258  −.258
0.8                  60        30        30   .320   .554   .554   .320   .554   .544   .302   .018
0.8                 120       120         0  −.320  −.320   .640  −.320  −.320   .599  −.192  −.128
0.8                   0         0         0   .640   .640   .640   .640   .640   .390   .250   .390
0.5                  90        45        45   .000   .177   .177   .000   .177   .182   .032  −.032
0.5                  60        30        30   .125   .217   .217   .125   .217   .199   .043   .082
0.5                 120       120         0  −.125  −.125   .250  −.125  −.125   .238  −.030  −.095
0.5                   0         0         0   .250   .250   .250   .250   .250   .200   .050   .200

Table 5. Sample correlation and regression coefficients based on vector angles and lengths, for four select cases, and vector lengths of 0.8 and 0.5.

Table 5 details the relationships between select cases that we wish to draw further attention to. The first is the case where X and Y are orthogonal, and the mediator is 45° from either. This is the situation of Figure 22. It leads to an indirect effect that is equal in magnitude to the direct effect and thus of opposite sign. Note that there are an infinite number of cases where X and Y are orthogonal when dealing with a shared factor space, and also that there may be a large number of intermediately located variables that would all yield an indirect effect with a corresponding opposite direct effect.

The second is a case where M is also exactly in between X and Y but the angle between X and Y is smaller (∠XY = 60°, ∠XM = 30°, and ∠MY = 30°). This more accurately represents our example where the mediation model is that frustrated feelings cause miserable feelings, which then cause depressed feelings. With variables at these angles, a test of mediation results in an indirect effect and a near-zero direct effect. Though this may better fit with intuition, it would still be misleading if in fact the apparent relationships were confounded by the underlying factors. It also illustrates that whether or not a direct effect results depends on the angle between X and Y.

Third, when two of the three vectors are identical, such as when ∠XY and ∠XM are both equal to 120°, and thus M and Y have identical positions in the factor space, apparent direct and indirect effects still appear. It is also worth noting that such apparent effects also occur when X and M have identical positions and Y is distinct, and that apparent mediation when M is identical to X or Y occurs for nearly all possible angles of X and Y (excepting cases where X and Y are orthogonal).

Finally, a special situation is where the angle between all three variables is 0°. In this case, mediation yields both a positive indirect effect and a positive direct effect. For

the confounding perspective, if one finds direct and indirect effects of the same sign then one may in fact be measuring the same variable repeatedly. Of course, even given suspicion that this is the case, it is quite difficult to detect in practice: with short vector lengths such as 0.5 the magnitude of the correlations is as small as 0.25, and one would not even suspect that the measures may simply be measures of the same variable(s). Even given a longer vector length of 0.8, the correlations between the three variables are r = 0.64, and it is therefore easy to argue that the measurements of a study do in fact represent distinct variables. Nonetheless, the effects are completely confounded.

Factor Models and Mediation Models: Equivalence and Differentiation

In practice, factor models are mostly used for a rather large number of variables, whereas mediation models are commonly used for a small number of variables. This is in some sense a matter of convention, in that factor models are not often used outside of scale validation efforts, but there are also practical reasons, in that estimating a factor model is difficult with a small number of variables. Nonetheless, as we will show, it is possible to determine the factor loadings for the case of three variables and one factor. It remains impossible to estimate factor loadings for two factors and three variables, and even if one imposes numerous restrictions only one of many possible loading patterns may be obtained. Even so, it is useful to know what a test of mediation may suggest regarding the underlying dimensionality of the three variables, as ruling out a single-factor explanation is a useful strategy when making causal claims because it establishes that, at the very least, more than one variable is necessary to explain the observed relationships. We will first formulate the relationships between correlations, factor

loadings, and mediation effects for the one-dimensional case, before doing so for the two-dimensional case.

One factor case

A one-factor model is appealing in that it is parsimonious and easily used as a method of considering the effects of confounding in a mediation scheme. In a one-dimensional factor space, one can derive the squared factor loadings from the equations as explained in Appendix B. Two conditions can be derived from the equations. First, either all three correlations between the variables must be positive, or two of the correlations must be negative and the third positive; equivalently, the product of the three correlations must be nonnegative. This is a necessary condition. If only one correlation is negative, it follows that two factors are needed. As such, the one-factor model can be empirically rejected in practice. Second, the absolute value of each correlation must be equal to or greater than the absolute value of the product of the other two correlations. This is a necessary and sufficient condition (see Appendix B).

The mediation coefficients can be expressed in terms of the factor loadings, and of course the reverse is also true, as explained in Appendix B. Equation (6) in Appendix B implies that if the direct effect has a sign opposite to the indirect effect, then the one-factor model is violated and thus two factors are needed. However, the direct and indirect effects having the same sign is not a sufficient condition for a one-factor model, as it is necessary that this be true for all alternative orderings of the three variables (e.g., Y-X-M) before a one-factor explanation may be considered valid. This implies that in the one-factor case, changing the roles of X, M, and Y in the mediation analysis (e.g., with M as the independent variable, Y as the mediator, and X as the dependent variable), the results

will never include direct and indirect effects of opposing sign. The equal-signs condition, when it holds for all orderings, is a sufficient condition for a one-factor solution (see Appendix B).

Two factor case

In the case of a two-dimensional factor space it is necessary to fix multiple loadings. One such case is shown in Appendix C, where the first factor explains rXM and rXY, with the second factor explaining the remaining relationship of Y with M. This solution was chosen because it nicely separates the effects of X and M. Even so, this is only one such possibility, and the problem of rotational indeterminacy remains. As such, a two-factor model cannot be rejected with only three variables.

Discussion

The issue we have discussed here regards the use of an alternative model to consider the effects of confounding for a mediation model, or to serve as a viable alternative explanation. We showed what a factor space would look like given a set of three variables. We have also presented and discussed methods for how one can translate the parameters from one model to the other when the models are formally equivalent, and we have presented and discussed methods for how they can be differentiated in case they are not formally equivalent.

A factor space may be considered a variable in its own right, or it may be viewed as a summary of confounding effects or model violations. For the two perspectives we discuss, confounding and alternative explanations, the interpretations are largely interchangeable; from the mediation point of view the factors are confounding variables, whereas from the factor model perspective the mediation paths are violations of local independence. They are simply two different perspectives, and both are worth

consideration when explaining the relationships between a set of variables. Whether due to an underlying structure or a methodological artifact, the possibility of shared factor spaces and thus confounding is a challenge to researchers' claims of mediation.

Despite the theoretical utility of a factor model when discussing apparent mediation, we wish to reiterate that even in cases where a factor model of relatively low dimensionality may apply, it is often not possible to differentiate the factor model from the mediation model when using only three variables (each at only one time point), and what we have provided here does not change that. In such cases, a mediation model can explain the data quite well, even if fully confounded with underlying latent variables. To deal with such a possibility, it is necessary to make use of longitudinal designs with repeated measurement of each variable of interest, and to estimate alternative models (cf. Selig & Preacher, 2009).

Future Directions. We have discussed a factor model as either a means of capturing a confounded model, or a theoretically meaningful alternative. For the confounding perspective, the next step would then be to more directly compare it to other means of examining parameter sensitivity in mediation given a confounded relationship (e.g., VanderWeele, 2010). Such approaches are typically focused on a single confounding variable, but the factor model approach we use here affords some limited flexibility in considering a larger number of confounding variables (as shown in Appendices B and C), and so may provide some means of extending such approaches.

Another avenue for future research is not statistical, but rather more theoretical in nature. We argued here that a factor model is most appropriate given conceptually similar

variables, and further refinement of this point may be useful to encourage researchers to make use of the approach we consider here. In general, it seems likely that a factor model (or, more generally, an interpretation of a single variable being responsible for the observed effects) is increasingly appropriate as the correlations between variables increase. As we showed, however, a high observed correlation between variables is not a necessary condition for a factor space to be present, nor is it sufficient, given the possibility of spurious correlations, and so other relevant conditions should be considered.
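The one-factor conditions for three variables lend themselves to a direct check. A minimal sketch (function name ours; assumes all three correlations are nonzero):

```python
import math

def one_factor_loadings(r_xm, r_xy, r_my):
    """Loadings for one factor and three indicators, from
    lam_x * lam_m = r_XM and its cyclic permutations. Returns None
    when a single factor is impossible. Returned loadings are
    nonnegative; signs are determined only up to reflection."""
    if r_xm * r_xy * r_my < 0:
        # A negative product means one (or all three) correlations are
        # negative, which the sign condition rules out.
        return None
    squared = (r_xm * r_xy / r_my,   # lam_x squared
               r_xm * r_my / r_xy,   # lam_m squared
               r_xy * r_my / r_xm)   # lam_y squared
    if max(squared) > 1:
        # Each |correlation| must be at least the |product| of the other
        # two; a squared loading above 1 violates that condition.
        return None
    return tuple(math.sqrt(s) for s in squared)
```

For example, three correlations of .64 (the fully confounded case discussed earlier) return loadings of .8 each, while a triad with exactly one negative correlation returns None.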

Chapter 4: Testing a Factor Model against a Mediation Model

The previous chapter argued that factor models may be used to consider the worst-case possibility regarding a confounded relationship between X, M, and Y. Further, in many cases factor models are themselves a viable alternative that should be ruled out when making claims of mediation. Nonetheless, factor models are rarely considered when testing for mediation, presumably because researchers often only make use of manifest variables and because estimating factor models usually requires more than three variables. Even with additional variables, researchers are presumably unaware of how to make use of the variables to distinguish between factor and mediation models. The primary purpose of the work here, then, is to develop and illustrate an easy-to-use methodology to test the two competing explanations.

In order to do so, we will make use of an approach that is already often discussed in the mediation literature, i.e., longitudinal designs (Maxwell & Cole, 2007; Selig & Preacher, 2009). In general, such designs are considered desirable because they provide additional support for a causal interpretation of a set of variable relationships, because they allow adequate time to distinguish between cause and effect. More relevant to our purpose here is that longitudinal designs are useful for reducing bias in the estimates of the direct and indirect effects, and for modeling alternative explanations. For example, cross-lagged panel designs with X, M, and Y measured at each time point may be used to reduce bias in the estimated effects (Maxwell & Cole, 2007). With the nine observed variables from such a design it

is possible to estimate more complex models such as autoregressive models (e.g., Maxwell, Cole, & Mitchell, 2011) and growth curve based mediation models (Selig & Preacher, 2009) that provide a fuller picture of the relationships between the variables of interest.

Here we will propose another model that may be used to describe the set of relationships between X, M, and Y. The model we make use of is similar to a latent autoregressive model, except that the autoregressive parameter may change depending on the pair of consecutive points in time, and we estimate additional parameters that allow one to test mediation claims at the same time. This model then has the advantage of comparing both factor and mediation models at the same time, and is also quite easy to estimate and use.

Methodology

The general logic of the method we propose is straightforward. It is based on the supposition that if adding latent variables to a model makes the direct relationships non-significant, then mediation is unlikely. Conceptually, this is scarcely different from common practice with regression: if adding an additional variable (e.g., controlling for an alternative explanation) results in another relationship becoming non-significant, then no claims may be made regarding the original variable's relationship with a dependent variable. Here we simply control for latent variables, rather than manifest variables as in the case of regression.

Three General Models.

The approach we present here makes use of three broad classes of models that are compared based on both model fit and the significance of paths between variables. For

ease of exposition, our presentation here is limited to comparing simple mediation models to factor models with a single indicator for X, M, and Y, each of which is measured three times. Nonetheless, it is straightforward to include additional mediators, whether in parallel or in serial, while still comparing mediation and factor space explanations.

The first class of models makes use of only a single factor for all indicators across time. Such models are typically limited to cross-sectional designs, but may easily be used for longitudinal designs because correlations are agnostic to the source of any covariation. Doing so does make relatively strong claims, but one may hypothesize that all responses are due to something relatively constant about people (e.g., personality), or one may be concerned that some unmeasured cause(s) before t0 may explain all observed variables. In such cases a one-factor model is a defensible approach. An additional reason to consider a one-factor model is that it provides a point of comparison for more complex models, and so serves as a useful starting point when considering the relationships between the variables we present here.

The second class of models assumes that there is still a single latent variable underlying all three indicators, but one that may vary over time, and so there are three factors, one for each time point, with paths between them. The skeleton of this model is shown in Figure 25. Note that this class of models does not include any paths of the sort implied by the typical mediation scheme. Instead, the latent variable mediates itself over time. This model is essentially a latent autoregressive model, but we allow the autoregressive parameter to vary. Each time-point-specific factor is then indicated by the observed variables measured at that time point.

Finally, the third class of models tests for the possibility of mediation between the observed variables, as in the case of a simple mediation scheme. Each item can be expected to have unique variance not attributable to the factor of interest, which allows room for adding paths from Xt0 to Mt1 and Mt1 to Yt2, as well as from Mt0 to Yt1 and from Xt1 to Mt2 if so desired. This approach is taken because the mediation logic would imply such paths. If these added paths are statistically significant (whether by way of normal-theory approaches or by way of bootstrapping methods), and further result in an improved goodness of fit (e.g., lower RMSEA), then this serves as evidence in support of a mediation hypothesis between X, M, and Y, because it suggests that the shared factor space is insufficient to explain the relationships between the three variables, and that mediation paths need to be added to the model.
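To make the second class of models concrete, the covariance matrix it implies for the nine indicators can be built directly. A sketch with purely illustrative parameter values (none estimated from any data set), assuming a stationary latent process with unit variance and a constant autoregressive coefficient:

```python
import numpy as np

beta = 0.6                               # illustrative autoregressive effect
lam = np.array([0.8, 0.7, 0.75])         # loadings of X, M, Y, equal across time

# Stationary latent covariance across t = 0, 1, 2:
# Cov(theta_s, theta_t) = beta ** |s - t|.
phi = np.array([[beta ** abs(s - t) for t in range(3)] for s in range(3)])

# Loading matrix: the indicator block at occasion t loads only on factor t;
# indicators are ordered X0, M0, Y0, X1, M1, Y1, X2, M2, Y2.
Lam = np.zeros((9, 3))
for t in range(3):
    Lam[3 * t:3 * t + 3, t] = lam

theta_u = 0.3 * np.eye(9)                # illustrative unique variances

# Standard factor-analytic decomposition of the implied covariance.
Sigma = Lam @ phi @ Lam.T + theta_u
```

Comparing such an implied matrix against the observed covariance matrix is exactly what the model fit statistics mentioned above (e.g., RMSEA) summarize; freeing the autoregressive parameters, or adding the cross-variable mediation paths of the third model class, changes `Sigma` accordingly.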

Figure 25. One-factor longitudinal model. X0, M1, and Y2 are in bold to indicate that in an incomplete longitudinal design they would be the variables that are measured and subject to a mediation analysis.

Additional Considerations.

For each class of models there are a few decisions that must be made regarding the nature of the latent variables. Each decision results in implicit claims about the underlying nature of the variables involved, and so should be considered carefully. Further, these decisions will affect model fit statistics to some degree, and possibly alter the conclusions made regarding the appropriate model if one were to rely solely on model fit statistics. As such, one should be mindful of the theoretical rationale when determining which set of constraints is appropriate.

Correlated indicator residuals. As mentioned previously, each variable can be expected to have some unique variance not attributable to the factor of interest. When estimating the models we discuss here, we strongly recommend the use of correlated residuals in most cases. It is unlikely that there is no systematic variance attributable to a variable measured using the exact same method of measurement (e.g., item wording) multiple times, and failing to account for any systematic variance attributable to the specific items will likely result in poor model fit and biased parameter estimates.

Equal loadings across time. Another decision to make is whether or not to constrain the loadings for each indicator variable to be equal across time. Note that this decision is unique within the methodology we discuss here, in that it is the only imposed restriction; the other decisions are relaxations of restrictions, allowing a path to vary. Ultimately, this is a question regarding the nature of the latent variable and the quality of measurement. The use of equal loadings assumes measurement invariance, and so that the latent variable measured is the same at each time point.

If one believes that the relationship between the latent variable and the indicator variables is constant, then it follows that the loadings for each variable should be fixed for each latent variable. This is in fact an assumption made by latent autoregressive models. If the loadings are constrained to be equal then the result will likely be poorer model fit, and the difference may lead to different conclusions were one to strictly and unwisely adhere to well-known guidelines (cf. Steiger, 2007).

Path from t0 to t2. It is also up to the user to decide whether or not to include a path from the latent variable at t0 to the latent variable at t2. Although not immediately obvious, such a path is somewhat akin to a direct effect in a mediation model, because both are double lags. For what we present here, we do not advocate the use of such a path, because it makes more sense to state that the most proximal state of the latent variable is more causative than the more distal state. Further, whereas a simple mediation model involves three variables, only one variable is involved here, and so considerations regarding missing pathways or variables are less relevant. That being said, we will nonetheless consider such a path between the latent variables at t0 and t2. We do so because, although we believe it to be unnecessary, its conceptual relationship to the direct effect merits at least some testing of such an effect. Additionally, users of the model we discuss here may find such a path relevant to their research topic, and we do not wish to preclude such a possibility.

Empirical Example

In order to demonstrate our model we make use of data from Crocker, Canavello, Breines, and Flynn (2010). Crocker et al.'s data have been used to demonstrate a variety of effects of interpersonal goals, including effects on depression (Garcia &

Crocker, 2008), academic performance (Crocker, Olivier, & Nuer, 2009), and well-being (Crocker, 2008). The data were collected over the course of a semester, with 10 measurements per participant spaced a week apart (excepting slow responders, resulting in delayed measurement). The data comprise N = 230 participants, split into 115 roommate dyads. Eighty-six (75%) of these pairs were female, and participant age ranged from 18 to 21, with M = 18.1 and SD = .36.

We wish to make clear that we do not make any substantive claims regarding the variable relationships in the results presented here. In addition to the fact that we ignore the obvious dyadic dependencies in our example, our choice of measures reflects only a desire to stay consistent with the previous chapter. As such, although it would be simple to demonstrate a case where a mediation model seems clearly better than a factor model, we make use of an example where a factor model is adequate to describe the variable relationships. Any interpretation we offer is only for the sake of explication and demonstration, and our choice of items is strictly for didactic purposes; it is not meant to make any claims regarding the superiority of a factor model approach over a mediation approach for these data. Instead, the items were chosen only because they seem likely to be unidimensional.

Measure. The measure we make use of here is an adapted version of a scale meant to measure objective and subjective burdens placed upon an individual by a depressed significant other (Coyne et al., 1987). Crocker et al. used the subjective burden items, and altered the wording of the original measure to instead reflect perceptions of the burdens placed upon them by their roommate.

Dimensionality. As an exploratory analysis to determine the items to be used here, we first conducted a principal-components analysis of all scale items at t0. The results suggested that one or two dimensions are sufficient to describe them, as the first five eigenvalues for the 14 items are 7.86, 1.53, .881, .777, and .655. Closer investigation of the items suggests that the items measuring discouraged, ashamed, and depressed may be unidimensional, as these items load strongly on a single principal component, and so they are suitable for our demonstration here.

Drawing inspiration from past work on the development of depression, a reasonably plausible mediation story may be told using these data. Specifically, feeling discouraged may lead individuals to feel shame (e.g., Miller, 2013), which then may lead to feeling depressed (e.g., Andrews, Qian, & Valentine, 2002; Shepard & Rabinowitz, 2013). In the case of the present data, the story may be something akin to roommates feeling discouraged by their inability to correct or control their roommates' behaviors, which may lead to feelings of shame because their roommates' actions may speak to their own character or competency, which then ultimately leads to feeling depressed by their roommates' behavior.

Regression analyses. On the surface, the data support a mediation hypothesis, using discouraged at t = 0, ashamed at t = 1, and depressed at t = 2 as X, M, and Y, respectively. The correlations between the three variables are rX0Y2 = .116, rX0M1 = .272, and rM1Y2 = .292, and these correlations result in apparent mediation using regression. Specifically, there is a significant effect of discouraged on ashamed, a = .272, p < .05, and a significant effect of

ashamed on depressed, b = .282, p < .05. This yields an indirect effect of ab = .077, and a bootstrapped confidence interval does not include 0, 95% CI [.014, .208]. There was no significant direct effect of feeling discouraged on feeling depressed, c = .040, p = .56.

Structural equation modeling analyses. For presentational purposes, we will focus on the three classes of models, because doing so best reflects the logic of the approach we discuss here. For each class, we will also consider the effects of (un)correlated errors, (un)equal loadings, and the presence or absence of a path between the latent variables at t0 and t2. As mentioned previously, these aspects reflect claims about the nature of the underlying factor structure and so may be considered topic-specific rather than central to our point; we leave it to the reader to choose which is most appropriate for their research problem. Additionally, all models were estimated using maximum likelihood.

All model fit statistics are shown in Table 6. Model 1 represents a single factor for all three time points, Model 2 a factor for each time point, and Model 3 includes the additional mediated path between the indicator variables. The significance tests represent likelihood ratio tests comparing a given model to the previous one with the same constraints or lack thereof (as a reminder, equal loadings represent an additional imposed constraint upon the estimated model, whereas correlated errors and a path from t0 to t2 are relaxations of constraints). For example, the test for Model 2 with equal loadings and correlated errors (Model 2D) was compared to the variant of Model 1 with equal loadings and correlated errors (Model 1D). There is of course no clean comparison to be made for variants of Model 2 with a t0 to t2 path for the latent variable, and so they were simply

compared to Model 1 variants with the same restrictions regarding loadings and error terms (or lack thereof; e.g., Model 2H to Model 1D).
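The likelihood ratio tests just described can be sketched in code. The log-likelihood values below are hypothetical placeholders, not values from Table 6:

```python
def likelihood_ratio_stat(loglik_restricted, loglik_general):
    """Twice the log-likelihood difference between nested models; compare to a
    chi-square with df equal to the difference in free parameters."""
    return 2.0 * (loglik_general - loglik_restricted)

# Hypothetical log-likelihoods for a nested pair of models differing by 3 df
stat = likelihood_ratio_stat(-1510.0, -1502.5)   # = 15.0
# The chi-square critical value for df = 3 at alpha = .05 is about 7.81,
# so here the less restricted model would be preferred.
significant = stat > 7.81
```

The same computation underlies every comparison in Table 6; only the pair of fitted models and the df difference change.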

Table 6. Comparison of model fit statistics (AIC, BIC, log-likelihood, ΔDF, p, RMSEA) for the models we estimate here. EL = equal loadings. CE = correlated errors. LL = lag-lag path from the latent variable at t0 to the one at t2. Each model class was estimated under the following constraint patterns: A = no EL, no CE; B = CE only; C = EL only; D = EL and CE; and, for Models 2 and 3 only, E = LL only; F = CE and LL; G = EL and LL; H = EL, CE, and LL.
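Before turning to the structural models, the regression-based mediation test reported above (estimating a, b, and a percentile bootstrap confidence interval for ab) can be sketched as follows. The data here are simulated with hypothetical effect sizes loosely matching those reported; this is not the actual study data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate standardized data with hypothetical paths a ~ .27, b ~ .28, c ~ .04
n = 230
x = rng.standard_normal(n)
m = 0.27 * x + rng.standard_normal(n) * np.sqrt(1 - 0.27**2)
y = 0.28 * m + 0.04 * x + rng.standard_normal(n)

def indirect_effect(x, m, y):
    """a*b: slope of M on X times the slope of Y on M controlling for X."""
    a = np.polyfit(x, m, 1)[0]
    design = np.column_stack([np.ones_like(x), x, m])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return a * coef[2]

# Percentile bootstrap of the indirect effect
boot = np.empty(2000)
for i in range(2000):
    idx = rng.integers(0, n, n)
    boot[i] = indirect_effect(x[idx], m[idx], y[idx])
ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])
```

Resampling cases (rows), rather than residuals, mirrors the nonparametric bootstrap commonly used for indirect effects.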

We begin by considering the possibility that a single latent variable may explain all nine observed variables. In general this class of model yields a very poor fit to the data, with excessively high values of RMSEA. Given that the indicators were measured a week apart, this is unsurprising. Specifically, Models 1A, 1C, and 1D yielded RMSEA ≈ .23, and Model 1B yielded RMSEA = .265. An overall factor solution for all nine variables is clearly not to be preferred, and is instead rejected.

Next, we considered our second class of models, which is based on a three-factor approach. For this model, each of the three burden items is considered an indicator of the same latent variable that varies over time, with the latent variable at t = 0 predicting the latent variable at t = 1, which then predicts the latent variable at t = 2. All models of this class fit significantly better than the one-factor alternatives, with all chi-square tests yielding ps < .001. Similarly, all had superior AIC, BIC, and RMSEA values. In regards to the specific constraints, models with correlated residuals yielded the best fit, with Model 2B yielding the best fit at RMSEA = .061, as well as superior AIC, BIC, and log-likelihood values relative to all other models in this class. Model 2D fit somewhat worse and yielded RMSEA = .096, and further comparison of AIC and BIC values between the two models suggests that unequal loadings are to be preferred. In contrast, the models with uncorrelated residuals fit poorly, with RMSEAs of .160 and .154 for Models 2A and 2C, respectively. Finally, the inclusion of a path between the latent variables at t0 and t2 (LL) for Models 2 and 3 resulted in AIC, BIC, and RMSEA values similar to the more restricted models, and so does not suggest that such a path improves model fit.

In regards to the estimated latent variables, for the four models with correlated errors the item loadings were positive and of similar magnitude. For each item the loadings were roughly .35 ± .1 for both models, and all item loadings were statistically significant, ps < .001, suggesting that any change in the nature of the underlying dimension was minimal. Additionally, for both models the autoregressive parameters were roughly .6 at each time point. Specifically, the autoregressive parameters for Model 2B were bt0t1 = .527 and bt1t2 = .643, and for Model 2D they were bt0t1 = .590 and bt1t2 = .608.

For testing the mediation claim, we now turn to our third class of models. These models simply add correlated residuals between the indicator items of the sort that mediation would imply, e.g., Xt0 to Mt1 to Yt2. Although a single underlying latent variable that varies over time seems plausible based on the previous results, each item has its own unique variance unaccounted for by the factor, and this remaining variance may still yield a mediated effect. To test such a possibility, we allow for correlated errors between discouraged at t = 0 and ashamed at t = 1, and between ashamed at t = 1 and depressed at t = 2. As a reminder, the logic of our approach is that if these paths are significant and model fit is significantly improved, then a mediation interpretation remains a viable explanation. Here, this was not true for any set of restrictions: adding these paths did not significantly improve model fit, with all chi-square tests yielding ps > .05, and with AIC, BIC, and RMSEA yielding similar results.³ In addition, the paths between X0 and M1 and between M1 and Y2 across time were not significant, further diminishing the strength of any mediation claims that could be made using these data. Specifically, bootstrapping the product of these two paths, as in the case of ab, resulted in an indirect effect of .000, 95% CI [-.001, .001], for Model 3B, and for Model 3D an indirect effect of .000, 95% CI [-.002, .001]. A mediation explanation for the relationships between discouraged, ashamed, and depressed is therefore not supported. Instead, a single latent variable that varies over time, with some additional variance attributable to each indicator, appears an adequate explanation for these data, and it is not rejected by our tests.

³ We also estimated similar models with paths from M0 to Y1, X1 to M2, and X0 to Y2, as in the case of a cross-lagged panel design. The results were similar to those of the models we presented here, and so are not reported.

Discussion

The general approach we discussed here is intended to be a way one may distinguish between factor and mediation model explanations for a set of variable relationships. It represents a practical extension of the previous chapter that affords a flexible method of considering a factor model interpretation, one that may be applied in all cases where one makes use of full longitudinal designs. Our approach is based on a latent autoregressive model, but it is unique in that we compare two competing explanations within the same framework. As in the previous chapter, the factors themselves may be viewed either as alternative explanations or as representing confounded relationships. The example we presented here is most consistent with an alternative explanation, but for other cases the confounding interpretation may be more appropriate. In such cases, the utility of our approach is that it provides a test of the robustness of any mediation claims when there are unmeasured confounders.

Additionally, our approach is easily extended to allow for tests of more complex mediation models. In order to do so, it is simply necessary to make use of additional time points and to allow for additional residual correlations. The logic would be as we present

here: If adding such paths improves model fit and the additional paths are significant, then a mediation model remains a plausible explanation for the variable relationships. Modeling more than one factor for the same set of variables at each point in time is somewhat more difficult, but only marginally so. In order to estimate the loadings for two factors one would need at least five indicators per time point; for three factors, eight indicators are necessary. If one has fewer indicators it is necessary to fix at least one loading. Further, one concern is that more complex factor spaces might be somewhat unwieldy and difficult to interpret. Still, precise interpretations of the factors are not necessary, because the crucial point is simply whether or not the additional paths explain the covariance between the items in a way that is consistent with a mediation hypothesis.

Future Directions. Because the shared factor spaces we focus on here are themselves an example of a misspecified model, future research on this model may focus on specific forms of misspecification of the longitudinal model we discuss here. This may concern the number of latent variables (e.g., one vs. two factors), but also incorrect functional forms of the relationships (e.g., non-linear) or inappropriate assumptions (e.g., falsely assuming equal loadings).

Further, a comparison to other approaches for considering bias and alternative interpretations beyond that offered by a simple mediation scheme would likely prove useful. These may be longitudinal models such as the cross-lagged panel design that Maxwell and Cole (2007) advocate, or other methods of considering parameter sensitivity (e.g., VanderWeele, 2010). Comparing these approaches may be done by way of testing the performance of one given that another is appropriate, or by way of other forms of

misspecification and comparing the relative ability of each method to nonetheless yield results that would be interpreted appropriately.

Chapter 5: Fungible Weights in Mediation

Whenever estimating a statistical model, there is always some uncertainty regarding the validity of the estimated weights. In general, many assumptions must be made regarding the relationships between variables. Though there are differing schools of thought regarding these assumptions (cf. Jo, 2008), for the SEM and regression approach that we have used throughout, these assumptions are detailed in Sobel (2008), and are as follows:

Ignorability of mediator status: This is considered the most important assumption in the SEM/regression approach (Jo, 2008). It is in fact composed of two assumptions (Imai et al., 2010). The first is that, when conditioned on the observed pretreatment covariates, X is independent of all possible values of M and Y. This assumption is satisfied with an experimental manipulation of X with random assignment. The second assumption requires that individuals have the same characteristics regardless of their mediator status, again after conditioning on the observed covariates. In other words, when using mediation models one assumes that participants may be treated as having been randomly assigned to their mediator status. If this condition is not satisfied, then neither the direct nor the indirect effect may be considered a causal parameter.

Constant effect: This assumption is that there is no unmodeled moderation whatsoever for the relationship between X and M; i.e., the variables do not interact, and so effects on Y do not change across levels of the mediator. This concern is to some degree addressed by the so-called MacArthur approach (Kraemer, Kiernan, Essex, & Kupfer, 2008), which requires the inclusion of such an interaction term. However, users of mediation typically make use only of simple mediation models, and so it remains a concern.

Linearity: This assumption requires that the dependent variable linearly increases or decreases across levels of the mediator, rather than following, e.g., a logistic or quadratic relationship.

In short, in a simple mediation model, identifying causal parameters using the SEM/regression approach requires that there is no confounding (similarly, no missing additional mediators), no interactions, and no deviations from linearity. These conditions may not be met in practice. In general then, the resultant parameter estimates for an estimated model, Mest, may be biased to some degree whenever it is not isomorphic with the true model, Mtrue. So as to avoid the stickiness associated with the notion of true models (Edwards, 2013), here we refer simply to models that would have the best cross-validated goodness of fit.

The discrepancies between the two models are of two types. Sobel (2008) details the first type, as it relates to missing terms in the equation, e.g., omitted variables, omitted interactions, and nonlinear effects. The second type of discrepancy is related to error. This includes variables with measurement error, as well as aberrances in the error term such as outliers that may bias the estimates. Measurement error is likely to be an issue

when using regression, as regression assumes that the predictors are measured without error, but it is extremely rare that this assumption is satisfied when using psychological data. Outliers are similarly of concern, and are quite difficult to deal with, as evidenced by the many methods developed to detect them (e.g., Cook's D and residual plots). In general then, excepting rare cases, if any aspect of the estimated model is incorrect then the estimated parameters are likely biased to some degree. The degree of bias may be large or small, but regardless of the quality and size of a sample the issue remains, as it is an issue more closely associated with models than with data (Green, 1977).

Although parameter uncertainty applies to any estimated model, the issue is compounded when testing mediation models, because indirect effects are somewhat unique in that they are quantified as the product of two regression coefficients. As a result, the uncertainty of one weight is multiplied by that of another, and so too is the need to consider bias in the estimated effects before drawing conclusions based on them. Inaccurate weights are of course quite likely, but the important issue is not whether they are wrong, but to what degree they must be wrong before affecting any conclusions drawn regarding the relationships between variables. Indeed, in some cases the bias associated with an inaccurate model may be substantial enough that even the simple approach of determining sign and significance (without concern for the magnitude of the effect) will yield incorrect interpretations (e.g., Maxwell & Cole, 2007).

Further, mediation methodologists generally assume biased effect estimates, because the relationship between a given X and Y is presumed to be multiply mediated (Baron & Kenny, 1986; 2007; Preacher & Hayes, 2008). At the very least then, bias is to be expected when conducting tests using only a single mediator, because if multiple

mediators transmit the effects of X on Y, then X is correlated with an unknown number of Ms that are themselves likely correlated, with the number, direction, magnitude, etc. all unknown.

Broadly, considering the possible consequences of an inaccurate model is accomplished by way of methods for examining parameter sensitivity. Such methods place non-optimal weights in a model and examine the degree to which doing so reduces model fit (e.g., Green, 1977) or how much other weights are affected (e.g., VanderWeele & Arah, 2011). The approach we will work with is based on the fact that the resultant bias of an inaccurate Mest has the somewhat counter-intuitive consequence that the true, unbiased weights may actually perform worse than the biased weights when they are used in an estimated model that differs from the true model.

Specifically, we will make use of fungible parameters (Waller, 2008). Models using fungible parameters all yield a predefined decrease in model fit (e.g., RMSEA or R²) compared to the estimates optimized for a given Mest. As a result, the sets of weights are all equally discrepant with the optimal estimates, and all explain the data equally well. To date, fungible parameters have been applied to regression (Waller, 2008), logistic regression (Jones, 2013), and structural equation modeling and latent growth curves (MacCallum, Lee, & Browne, 2012).

Within the context of regression models, fungible parameters are termed fungible weights (Waller, 2008). Weights may be considered insensitive if small changes in R² result in only modest changes in the weights. In contrast, they are considered sensitive if small changes in R² yield large changes in the weights. If a parameter is sensitive to small changes in R², then the strength of conclusions that may be drawn regarding its relationship with the dependent variable is

limited, as the parameter estimates are considered less trustworthy (Green, 1977; Waller, 2008).

In the case of two predictors there are two pairs of fungible weights, and thus two different weights per predictor. These two pairs yield the same correlation between the predicted and observed values of the dependent variable. More generally, if there are three or more predictors there are an infinite number of fungible weights. As each set of fungible weights predicts the dependent variable with equal effectiveness, they cannot be distinguished from one another based on this criterion alone.

Though the mathematics that describe fungible weights are beyond the scope of this document (see Waller, 2008; Waller & Jones, 2009), in general examining them is quite simple and requires only inputting correlations into an R function (R Core Team, 2012) that Waller (2008) created. Conducting a sensitivity analysis then simply requires selecting a lower R² by way of the critical correlation between the OLS predictions and the fungible predictions. High values of this criterion correlation result in sets of weights that are only slightly less effective than the OLS weights at describing the data and yield only a minor drop in predictive value; e.g., an OLS correlation of .5 combined with a criterion of .98 yields a fungible correlation of .5 × .98 = .49.

An example of fungible weights is shown in Figure 26, based on the case where all correlations are equal to .5, and so all bs = .25. Easily visible is that even a modest drop in variance explained based on a criterion of .98 yields a large discrepancy between the minimum and maximum values of the fungible weights, and that although the sign remains the same, the interpretation of the importance of the variables would be affected quite strongly. Another example, with a smaller discrepancy, is shown in Figure 27, based on the case where all correlations are equal to .3.

Figure 26. Plot of fungible weights with three predictors when all variable correlations are r = .5 and the criterion correlation is .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c = .25. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.

Figure 27. Plot of fungible weights with three predictors when all variable correlations are r = .3 and the criterion correlation is .98. The point in the center is the OLS estimate of the weights, b1 = b2 = c = .19. Histograms illustrate the spread of the fungible weights, and show the clear bimodality associated with them.
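The construction behind plots like these can be sketched directly. The following is a minimal implementation of fungible weights for standardized predictors, following the geometry in Waller (2008); the function name and random sampling scheme are our own, not Waller's R code:

```python
import numpy as np

def fungible_weights(Rxx, rxy, rstar, n_draws=500, seed=0):
    """Sample fungible weight vectors for standardized regression.

    Each returned vector a satisfies a' Rxx a = R^2 and a' rxy = rstar * R^2,
    so the fungible predictions correlate rstar with the OLS predictions
    and rstar * R with the criterion.
    """
    Rxx = np.asarray(Rxx, float)
    rxy = np.asarray(rxy, float)
    b = np.linalg.solve(Rxx, rxy)                  # OLS weights
    R2 = float(b @ rxy)                            # squared multiple correlation
    vals, vecs = np.linalg.eigh(Rxx)
    S = vecs @ np.diag(np.sqrt(vals)) @ vecs.T     # Rxx^(1/2)
    S_inv = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    v = S @ b                                      # ||v||^2 = R^2
    # Orthonormal basis for the subspace orthogonal to v
    Q, _ = np.linalg.qr(np.column_stack([v, np.eye(len(b))]))
    W = Q[:, 1:len(b)]
    # Random unit directions in that subspace
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n_draws, W.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    U = rstar * v + np.sqrt(1.0 - rstar**2) * np.sqrt(R2) * (D @ W.T)
    return b, U @ S_inv.T                          # rows are fungible vectors

# Three predictors, all correlations .5, as in Figure 26: OLS weights are .25
Rxx = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
rxy = np.full(3, 0.5)
b, A = fungible_weights(Rxx, rxy, rstar=0.98)
```

With three or more predictors the fungible vectors form a continuous set (sampled at random here); with two predictors the orthogonal subspace is one-dimensional, and only the two fungible pairs exist.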

Although fungible weights may be examined using plots that provide a great deal of information, there is great utility in quantifying uncertainty. This is exemplified by the familiar confidence interval. Confidence intervals serve to estimate the uncertainty regarding the precision of the estimates obtained for a given estimated model, as expected by sampling theory. Unfortunately, quantifying bias is not so easily accomplished, because of the many ways it may occur, and there is little information regarding how large such intervals should be. Fungible weights do not circumvent this problem, but by calculating the range of the fungible weights associated with conservative values of the criterion correlation it is possible to establish at least some indication of the size of a validity interval that may serve as an analogy to the confidence interval. The range of the fungible weights, which we will call the fungible interval, provides an informative picture of the weights associated with a given criterion value.

The bulk of the results and discussion in this chapter is therefore focused on the range of the fungible weights, defined as the minimum and maximum values of the fungible weights associated with a given predictor for a given correlation matrix. The range will be used for a few reasons. The first is that there is heavy bimodality in the distribution of the fungible weights, with peaks at the boundaries (as shown in Figures 26 and 27), so that the range per predictor weight results in the loss of relatively little information, as most weights are near the extremes of the range. Additionally, reporting a range per predictor is also common practice for confidence intervals, and its interpretation is simple. That being said, the range does not capture that the value of each fungible weight is a tightly constrained function of the other weights (Waller, 2008). The result of this constraint is that the maximum value of a weight at a given criterion is

associated with smaller, more modest values of the other weights. Discussing the range does not reflect this, because the range is predictor-specific, but it is sufficient for illustration purposes.

Fungible Intervals for the Single Mediator Model

To begin, we shall illustrate fungible weights in the case of a single mediator, as this is the most common form of mediation analysis (Maxwell & Cole, 2007). Further, in the case of two predictors there are only two pairs of fungible regression weights for a given criterion value, and each pair consists of a different value for each of the two weights. This affords simplicity in determining the range and reporting the results.

In regards to the logic of this study, we approach it from a hypothetical perspective. Given a set of correlations from a sample, we may wonder how much the mediation effects could change due to a drop in R² that is possibly due to using the true regression weights. Note that the result does not depend on the cause of the drop, and that therefore the conclusions also apply when the cause is a model violation. If a modest drop in R² results in highly discrepant weights, then the model is considered sensitive to any unknown violations, and less trustworthy. However, if even larger drops do not result in highly discrepant effects, then the inference is more robust and may be treated as such.

Method

There is no absolute basis for the choice of a criterion value, just as there is not one in the case of model fit (e.g., RMSEA cutoffs; Browne & Cudeck, 1993; Steiger, 2007), reliability (e.g., Cronbach's α = .8 or .9), or confidence intervals (e.g., the familiar 95% and 99% confidence intervals). In each of these cases, what qualifies as acceptable is primarily a matter of convention and of possible considerations one may have

regarding the level of uncertainty. Here we generally follow the example of confidence intervals and use three different criterion values: .90, .95, and .98. These criterion values result in only modest drops in variance explained, and the differences are unlikely to be considered noteworthy in practice. For example, for a correlation of .5 between the predicted and observed values of the dependent variable, the analogous correlations obtained with the fungible weights would be .45, .475, and .49 for criterion values of .90, .95, and .98, respectively.

For this study, the possible correlations between the independent variable, mediator, and dependent variable were positive and negative values of .1, .2, .3, .4, and .5, which resulted in 10³ = 1000 matrices, because there are three correlations in a simple mediation scheme. However, 20 of these matrices were not valid correlation matrices, and so our results here are based on 980 correlation matrices. For each valid matrix, we calculated the OLS weights in addition to the two fungible weight pairs for each of the three criterion values. To determine the OLS weights and the fungible weights no data simulation is needed, as both types of weights may be calculated directly from a correlation matrix. As no simulation was necessary here, our results for this study are simply calculations of the fungible intervals for the direct effect and the indirect effect. To be clear, these intervals do not depend on sample size, and so are fixed for a given correlation matrix.

To provide some sense of the magnitude of the fungible intervals, we also calculated confidence intervals for the direct and indirect effects for N = 100. For the indirect effect we used Sobel's (1982) standard error to calculate confidence intervals; although it is not recommended in practice, it is computationally simple and sufficient for

the purposes of allowing some sense of the size of the fungible interval. Specifically, the use of normal-theory confidence intervals allows for a straightforward comparison between the uncertainty that might be expected as a consequence of random sampling and the uncertainty stemming from possible model inaccuracy.

Results and Discussion

Comparison with confidence intervals. As a reminder, whereas confidence intervals will decrease in magnitude as sample size increases, fungible weights are fully independent of sample size. Nonetheless, their differences are informative, as there are three general trends apparent in the behavior of fungible intervals that contrast with the behavior of confidence intervals. The first is that whereas confidence intervals tend to become narrower as R² increases, the reverse is true of fungible intervals. This is shown in Figure 28, with the dashed line representing the OLS estimate, and the end points of each line representing the minimum and maximum values of an interval. Whereas the fungible intervals for c, shown in black, widen as R² increases, the confidence intervals, shown in grey, narrow. The same pattern holds for ab because it holds for b. The fungible intervals of the direct and indirect effects tend to increase as R² does.
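For the two-predictor (single mediator) case the computation is fully closed-form, since the direction orthogonal to the OLS solution is unique up to sign and only two fungible pairs exist. As a worked input, the sketch below reuses the correlations from the empirical example in the previous chapter (rX0M1 = .272, rX0Y2 = .116, rM1Y2 = .292); the helper function is ours, not part of Waller's code:

```python
import numpy as np

def fungible_interval(r_xm, r_xy, r_my, rstar=0.98):
    """Two fungible (c, b) pairs for Y regressed on X and M, standardized data.

    Returns the OLS weights [c, b] and the fungible intervals (min, max) for
    the direct effect c and the indirect effect a*b, where a = r_xm.
    """
    Rxx = np.array([[1.0, r_xm], [r_xm, 1.0]])
    rxy = np.array([r_xy, r_my])
    cb = np.linalg.solve(Rxx, rxy)                # OLS [c, b]
    a = r_xm                                      # X -> M path
    R2 = float(cb @ rxy)
    vals, vecs = np.linalg.eigh(Rxx)
    S = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    S_inv = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    v = S @ cb
    w = np.array([-v[1], v[0]]) / np.sqrt(R2)     # unit vector orthogonal to v
    pairs = [S_inv @ (rstar * v + s * np.sqrt(1.0 - rstar**2) * np.sqrt(R2) * w)
             for s in (1.0, -1.0)]
    c_vals = sorted(p[0] for p in pairs)
    ab_vals = sorted(a * p[1] for p in pairs)
    return cb, (c_vals[0], c_vals[-1]), (ab_vals[0], ab_vals[-1])

cb, c_int, ab_int = fungible_interval(0.272, 0.116, 0.292)
```

Because the fungible interval depends only on the correlation matrix and the criterion value, it is unchanged at any sample size, which is the contrast with confidence intervals drawn in the text.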

Figure 28. Comparison of fungible intervals and confidence intervals with R². The lines represent the intervals about the OLS estimated weights. As a given value of R² may result in different fungible intervals depending on the correlations between the variables, the lines have been jittered about R². Grey lines indicate confidence intervals, and black lines indicate fungible intervals. Confidence intervals are based on the Sobel standard error and N = 100.
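As a sketch of the confidence-interval side of this comparison, Sobel's standard error can be computed from the standardized estimates. The standardized-metric SE formulas below are textbook approximations we assume for illustration; the dissertation's exact computation may differ:

```python
import math

def sobel_ci(a, b, R2, r_xm, N, z=1.96):
    """Normal-theory CI for the indirect effect ab using Sobel's SE.

    se_a comes from the simple regression of M on X; se_b from the
    two-predictor regression of Y on X and M, where 1 - r_xm^2 reflects
    the variance inflation due to the X-M correlation.
    """
    se_a = math.sqrt((1.0 - a**2) / (N - 2))
    se_b = math.sqrt((1.0 - R2) / ((1.0 - r_xm**2) * (N - 3)))
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    ab = a * b
    return ab - z * se_ab, ab + z * se_ab, se_ab

# Values from the earlier empirical example (a = .272, b = .281, R^2 ~ .087)
lo, hi, se_ab = sobel_ci(a=0.272, b=0.281, R2=0.087, r_xm=0.272, N=100)
```

This interval can then be set beside the fungible interval for ab computed from the same correlation matrix; unlike the confidence interval, the fungible interval would be identical at any N.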


More information

Predicting ratings of peer-generated content with personalized metrics

Predicting ratings of peer-generated content with personalized metrics Predicting ratings of peer-generated content with personalized metrics Project report Tyler Casey tyler.casey09@gmail.com Marius Lazer mlazer@stanford.edu [Group #40] Ashish Mathew amathew9@stanford.edu

More information

Understanding the Dimensionality and Reliability of the Cognitive Scales of the UK Clinical Aptitude test (UKCAT): Summary Version of the Report

Understanding the Dimensionality and Reliability of the Cognitive Scales of the UK Clinical Aptitude test (UKCAT): Summary Version of the Report Understanding the Dimensionality and Reliability of the Cognitive Scales of the UK Clinical Aptitude test (UKCAT): Summary Version of the Report Dr Paul A. Tiffin, Reader in Psychometric Epidemiology,

More information

Report on the Examination

Report on the Examination Version 1.0 General Certificate of Education (A-level) January 2011 Economics ECON1 (Specification 2140) Unit 1: Markets and Market Failure Report on the Examination Further copies of this Report on the

More information

Chapter -7 STRUCTURAL EQUATION MODELLING

Chapter -7 STRUCTURAL EQUATION MODELLING Chapter -7 STRUCTURAL EQUATION MODELLING STRUCTURAL EQUATION MODELLING Chapter 7 7.1 Introduction There is an increasing trend in usage of structural equation modelling (SEM) in management research. Conceptual

More information

CHAPTER 8 T Tests. A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test

CHAPTER 8 T Tests. A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test CHAPTER 8 T Tests A number of t tests are available, including: The One-Sample T Test The Paired-Samples Test The Independent-Samples T Test 8.1. One-Sample T Test The One-Sample T Test procedure: Tests

More information

The Effects of Model Misfit in Computerized Classification Test. Hong Jiao Florida State University

The Effects of Model Misfit in Computerized Classification Test. Hong Jiao Florida State University Model Misfit in CCT 1 The Effects of Model Misfit in Computerized Classification Test Hong Jiao Florida State University hjiao@usa.net Allen C. Lau Harcourt Educational Measurement allen_lau@harcourt.com

More information

ESTIMATION OF SITE-SPECIFIC ELECTRICITY CONSUMPTION IN THE ABSENCE OF METER READINGS: AN EMPIRICAL EVALUATION OF PROPOSED METHODS

ESTIMATION OF SITE-SPECIFIC ELECTRICITY CONSUMPTION IN THE ABSENCE OF METER READINGS: AN EMPIRICAL EVALUATION OF PROPOSED METHODS ESTIMATION OF SITE-SPECIFIC ELECTRICITY CONSUMPTION IN THE ABSENCE OF METER READINGS: AN EMPIRICAL EVALUATION OF PROPOSED METHODS David L. Ryan and Denise Young April 2007 CBEEDAC 2007 RP-05 DISCLAIMER

More information