Two-way CASE ANOVA Using SPSS

Copyright 2015, J. Toby Mordkoff

Premises

1. We want the most external validity with respect to the subjects. Therefore, we want to maximize the number of subjects that are included in each error-term.
2. We want the most power, especially for tests of between-subject effects. Therefore, we want the highest possible degrees of freedom in each error-term, which, again, requires the maximum number of subjects.
3. We want differences in significance to depend primarily on differences in the sizes of the effects and not on differences in the (unexplained) within-group variances. Therefore, whenever possible, we want to pool the separate within-group variances into a single, common estimate of error that can then be used for all relevant tests. At the same time, we do not wish to include data from conditions that are irrelevant to the question at hand unless forced to do so in order to include every subject. Therefore, we do not want to pool across levels of a within-subjects factor, as this will not add any more subjects to the analysis.
4. We want to be consistent. Therefore, for every level of the analysis, we want to use the same decision with regard to any particular issue, including which type of error-term should be used for a given type of test and/or whether and what type of correction for multiple tests should be used.

These premises are summarized by the label: consistent all-subjects-error (CASE) ANOVA.

Choice of Error-term(s) for Two-way Analysis

Following from the above:

When conducting a factorial (pure between-subjects) analysis, the pooled (common) error-term from the initial ANOVA should be used for all tests, provided that no strong evidence against the assumption of equal variance across groups is found.
This will include all subjects in all tests, provide the most degrees of freedom, cause differences in significance (p-values) to depend on differences in the means, and be entirely consistent across levels of the analysis.

When conducting a repeated-measures (pure within-subjects) analysis, new and unique error-terms should be created for each test, as long as consistent corrections for any violations of sphericity [defined below] are applied (or the analysis switches to employing MANOVA). All subjects will still be included in every error-term, so there will be no reduction in the degrees of freedom, but this will prevent variations in irrelevant conditions from influencing the conclusions concerning effects. The same logic can be applied at all levels of the analysis by generating new error-terms for every subsequent test.

When conducting a mixed-factor analysis, the error-terms should be pooled across levels of the between-subjects factor while being unique across levels of the within-subjects factor, provided that no evidence against equal variance across groups is observed and consistent corrections are applied for any violations of sphericity. As above, this will always include every subject and maximize the degrees of freedom, use identical error-terms across tests involving different groups, prevent irrelevant variance from influencing conclusions, and be consistent across levels of the analysis.

These decisions with regard to the type of error-term to use for each type of test are not atypical; in fact, they appear to be standard, if not universal (even if the reasons for these choices have rarely been stated).

The Equality Assumptions for Two-way ANOVA

In order to use a common error-term, pooled across groups, for all tests, as we do for factorial designs, the separate within-group variances must not be (significantly) different from each other. This assumption, which is referred to as homogeneity of variance or homoscedasticity, is usually assessed via Levene's Test. This test has equal variance as the null hypothesis, such that a significant Levene's is evidence against the assumption, which would imply that the standard ANOVA model should not be used. Because a factorial design has only one error-term (which is used for all tests), only one application of Levene's is required.

In SPSS, Levene's Test must be requested. This is done by clicking Homogeneity tests in the Options sub-menu of the appropriate ANOVA procedure (i.e., Analyze → General Linear Model → Univariate for a factorial design or Analyze → General Linear Model → Repeated Measures for a mixed design).

The other equality assumption concerns the bivariate correlations (across subjects) between levels of each within-subject factor. These must all be approximately equal, which is known as sphericity. (Note that sphericity always holds for a two-level, within-subjects factor, because only one correlation is possible, so all correlations are automatically equal.) This assumption is usually assessed via Mauchly's Test and, as was true for Levene's, the null hypothesis is that the assumption is being obeyed, such that a significant Mauchly's is evidence against sphericity. When a significant violation of sphericity is found, some form of correction (e.g., Greenhouse-Geisser or Huynh-Feldt) should be applied to the degrees of freedom, to prevent an increase in the rate of Type-I errors. A two-way, repeated-measures design will require up to three different Mauchly's: one for each factor with three or more levels and one for the interaction. Only a 2 × 2 design will not require any tests of sphericity.
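As a rough illustration of what a correction "applied to the degrees of freedom" amounts to, the following Python sketch multiplies both the effect and the error degrees of freedom by an epsilon value. The function names and arguments are hypothetical (they are not part of SPSS), and the degrees-of-freedom formulas assume a design with a single within-subjects factor.

```python
def epsilon_corrected_df(levels, n_subjects, epsilon, n_groups=1):
    """Scale the degrees of freedom of a within-subjects F test by epsilon.

    Assumes one within-subjects factor with `levels` levels, measured on
    `n_subjects` subjects split into `n_groups` between-subjects groups.
    All names are illustrative only.
    """
    if levels < 2 or n_subjects <= n_groups:
        raise ValueError("need at least two levels and more subjects than groups")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Estimates above 1 (possible for Huynh-Feldt) are treated as 1.
    eps = min(epsilon, 1.0)
    df_effect = levels - 1
    df_error = (levels - 1) * (n_subjects - n_groups)
    return eps * df_effect, eps * df_error


def lower_bound_epsilon(levels):
    """Smallest possible epsilon for a factor with the given number of levels."""
    return 1.0 / (levels - 1)


# Three-level factor, ten subjects, epsilon of .75 (made-up value).
print(epsilon_corrected_df(3, 10, 0.75))
# The lower-bound correction reduces the test to 1 and (n - 1) degrees of freedom.
print(epsilon_corrected_df(3, 10, lower_bound_epsilon(3)))
```

The F-value itself is unchanged by the correction; only the degrees of freedom used to look up the p-value are reduced.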
In SPSS, Mauchly's Test is automatically run when Analyze → General Linear Model → Repeated Measures is used. The output will include three different epsilon values (Greenhouse-Geisser, Huynh-Feldt, and Lower Bound), which are alternative correction-multipliers for the degrees of freedom. The ANOVA table will also include four separate rows for every within-subjects test: one for the uncorrected test and three others, one for each of the epsilon values. It is up to the user to decide which part of the output to keep; the least conservative, Huynh-Feldt, has been shown to be adequate.

In the case of a mixed-factor design, both of the equality assumptions apply, albeit in ways that are less obvious and not particularly straightforward. What needs to be kept in mind is that any error-term that involves the pooling of separate values from separate groups needs to be homoscedastic, while any error-term that concerns a within-subjects effect needs to be spherical.

The error-term for the test of the main effect of the between-subjects factor (in a mixed-factor analysis) only needs to be homoscedastic. There is no sphericity assumption for this test because each subject only contributes a single value to the analysis: their overall mean across the levels of the within-subjects factor. To assess this assumption, Levene's Test should be applied to the subject means.

At least two alternatives to the application of Levene's Test to the subject means have been used. The first alternative argues that a non-significant Box's Test (which concerns the co-variances) is sufficient to ensure equal variance across groups for the between-subjects main effect. The second alternative is to run separate Levene's on each of the levels of the within-subjects factor. The first alternative is only appropriate when the correlations across levels of the within-subjects factor are uniformly very high (as opposed to merely being equal), which is seldom true and rarely verified.
(Mauchly's Test only verifies that the correlations are equal, not that they're uniformly high.) The second alternative runs the risk of producing false evidence against the required assumption by conducting multiple tests. Therefore, and to follow the premise of CASE ANOVA that rules be applied consistently, the homogeneity test on the subject means should always be conducted, regardless of the outcome of Mauchly's.
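To make the "Levene's Test on the subject means" idea concrete, here is a minimal Python sketch that averages each subject's scores across the within-subjects levels and then computes a Levene statistic on those means. It assumes the mean-centered form of the test (absolute deviations from each group's mean); the data and the function names are invented for illustration, and the sketch stops at the test statistic and its degrees of freedom rather than producing a p-value.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def subject_means(rows):
    """One mean per subject; each row holds that subject's within-subjects scores."""
    return [mean(row) for row in rows]


def levene_w(groups):
    """Mean-centered Levene statistic for a list of groups of single scores.

    Returns (W, df_between, df_within). W is referred to an F distribution
    with those degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group's mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_bar = [mean(zi) for zi in z]
    grand = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (zb - grand) ** 2 for zi, zb in zip(z, z_bar))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Made-up data: two groups, three subjects each, three within-subjects levels.
group_1 = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
group_2 = [(1, 2, 3), (3, 4, 5), (5, 6, 7)]
w, df1, df2 = levene_w([subject_means(group_1), subject_means(group_2)])
print(round(w, 3), df1, df2)
```

Because each subject contributes exactly one value (their mean), only a single homogeneity test is run, which is the point of the recommendation above.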

The error-term for the test of the main effect of the within-subjects factor (in a mixed-factor analysis), which is also the error-term for the test of the interaction, needs to be both spherical and homoscedastic (because it involves multiple measures from each subject and pooling across separate groups). The sphericity assumption is assessed using Mauchly's and, when a significant violation is found, an epsilon correction (e.g., Huynh-Feldt) is applied to the degrees of freedom for both tests; this is the same as was done for a (pure) repeated-measures design. In contrast to before, however, the assumption of equal variance is now an assumption of equal co-variance, because each subject is providing more than one value to these tests. Therefore, instead of using Levene's Test (of equal variance), one must use Box's Test (of equal co-variance).

In total, therefore: for a mixed-factor analysis to proceed safely, an application of Levene's Test to the subject means should be non-significant, an application of Box's Test to the entire set of data needs to be non-significant, and an epsilon needs to be applied to the degrees of freedom of both within-subject tests if Mauchly's reveals a sphericity problem.

In SPSS, Box's Test must be specifically requested by clicking Homogeneity tests under Options when using Analyze → General Linear Model → Repeated Measures on mixed-factor data. Separate Levene's will then also be run on each level of the within-subjects factor, but these are not appropriate for verifying equal variance for the between-subjects main effect, nor are they useful for testing the assumptions required for the within-subjects main effect or the interaction.
Instead, the mean across all levels of the within-subjects factor should be calculated for each subject separately (which is easily done using Transform → Compute Variable) and this new, single column of means should then be analyzed as a one-way, between-subjects design (using Analyze → General Linear Model → Univariate) with Homogeneity tests under Options switched on. Then, during the main analysis, the same option should be clicked, such that Box's Test is run on the full data-set.

Initial Analysis

While it is typical to report the main effects before the interaction, the interaction is the most important for the analysis. If the interaction is significant, then the main effects are (mostly) ignored and a set of simple main effects is conducted instead, which may then be followed by pair-wise comparisons. If the interaction is not significant, but at least one main effect with at least three levels is significant, then one should proceed directly to pair-wise comparisons. Otherwise, one is done.

SPSS does the initial analysis in a manner consistent with CASE ANOVA: a common (pooled) error-term is used for all between-subject effects, while unique error-terms are used for within-subject effects. Furthermore, as will be shown next, SPSS does pair-wise comparisons (and post-hocs) for significant main effects in the way that we want, requiring only a few extra clicks of some buttons. Only the analysis of interactions is complicated. (The next section will assume no interaction and go directly to the pair-wise comparisons. The procedures needed to continue the analysis when the interaction is significant will be dealt with second.)

Pair-wise Comparisons for Main Effects

Assuming no significant interaction, but at least one significant main effect with three or more levels, some pair-wise comparisons are required to determine which levels of the significant factor are different from which other levels.
Under CASE ANOVA, the same decisions with regard to error-terms should apply to these follow-up tests: between-subject comparisons should continue to use the pooled error-term from the initial ANOVA, while within-subject comparisons should use new and unique error-terms. The only question is whether a correction for multiple tests will be applied and, if so, which correction will be employed. On the assumption that the user wishes to maintain a family-wise false-alarm rate of α, the various options for correcting for multiple tests will be briefly reviewed.

In the case of a between-subjects effect, there is a wide variety of procedures for correcting for multiple tests. These procedures may be divided into two general categories: those that apply the correction to the p-values (e.g., Bonferroni) and those that apply the correction to the common error-term (e.g., Tukey's HSD). In contrast, only p-value corrections are available for within-subject effects (because within-subject pairs have unique error-terms). The strongest argument in favor of using error-term corrections, such as Tukey's HSD, for between-subject effects, is that they have more power than either of the two well-known p-value corrections, which are Bonferroni and Dunn-Šidák. Given that we wish to maximize power (as long as the family-wise rate of Type-I errors remains at α or less), this might suggest that we relax our requirement that all decisions be applied in a consistent manner, allowing between-subject effects to be processed using error-term correction while within-subject effects use p-value correction.

Parsing an Interaction into Simple Main Effects

Generally speaking, an interaction (in a two-way design) is when the effect of one factor depends on the level of the other factor. When exploring the specifics, an interaction may be parsed in either direction: one can examine the effect of Factor 1 at each level of Factor 2, or one can examine the effect of Factor 2 at each level of Factor 1. There are many heuristics for choosing a parsing, but these are beyond the scope of this discussion because most concern theory, instead of statistics. But regardless of how this decision is made, the factor whose effect will be tested (at each level of the other factor) becomes the examined or to-be-examined or moderated factor, while the factor whose levels define the separate tests is the moderating factor or just the moderator. The name for these tests is simple main effects (SMEs).
Simple Main Effects for a Two-way, Factorial Design

The previous decision to employ a common error-term for all between-subject tests implies that the simple main effects for a factorial design cannot be conducted by simply running new and separate analyses of the examined factor at each level of the moderating factor. To do so would go against all four of the premises of CASE ANOVA, as each of these tests would ignore many subjects, have fewer than the maximum degrees of freedom, allow differences in significance to occur due to differences in the (unexplained) within-group variances, and not be consistent with the initial ANOVA. Instead, while new and separate explained variances must be found for each SME (i.e., each SME will have its own numerator for its F test), the error-term should be the same for all SMEs, each being equal to the original error-term from the initial ANOVA, as that is the only error-term that includes every subject.

The same holds for any pair-wise comparisons that might be required (i.e., when an SME is significant and has three or more levels). The original, pooled error-term should be used for these tests as well, as only this will include all of the subjects, have the highest degrees of freedom, and be consistent with the previous steps. The p-values from these tests should then be corrected for multiple comparisons.

Unfortunately, while the information necessary to conduct SMEs using the original error-term from the two-way analysis is available to the SPSS Analyze → General Linear Model → Univariate procedure, there appears to be no menu-driven, point-and-click method to get SPSS to use it. Instead, in order to avoid using, e.g., Split File and then recalculating all of the F-values by hand, one must use syntax (i.e., text-based commands) to get a set of SMEs with a common error-term.
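The logic of "separate numerators, one common denominator" can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a replacement for the SPSS procedure: the function name, the dictionary layout, and the data are all hypothetical, and the sketch returns F-ratios and degrees of freedom without looking up p-values.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def sme_f_ratios(cells):
    """F-ratios for the simple main effects of the examined factor.

    `cells` maps (moderator_level, examined_level) -> list of scores.
    Every SME gets its own numerator, but all share the pooled
    within-cell error-term from the full two-way design.
    Returns {moderator_level: (F, df_effect, df_error)}.
    """
    n_total = sum(len(scores) for scores in cells.values())
    # Pooled error: squared deviations from each cell's own mean, over all cells.
    ss_error = 0.0
    for scores in cells.values():
        cell_mean = mean(scores)
        ss_error += sum((x - cell_mean) ** 2 for x in scores)
    df_error = n_total - len(cells)
    ms_error = ss_error / df_error

    results = {}
    for level in sorted({key[0] for key in cells}):
        groups = [scores for key, scores in cells.items() if key[0] == level]
        level_mean = mean([x for g in groups for x in g])
        ss_effect = sum(len(g) * (mean(g) - level_mean) ** 2 for g in groups)
        df_effect = len(groups) - 1
        results[level] = (ss_effect / df_effect / ms_error, df_effect, df_error)
    return results


# Made-up 2 x 2 data: IV1 is the moderator, IV2 is the examined factor.
data = {
    ("a1", "b1"): [1, 3], ("a1", "b2"): [5, 7],
    ("a2", "b1"): [2, 4], ("a2", "b2"): [2, 4],
}
for level, (f, df1, df2) in sme_f_ratios(data).items():
    print(level, f, df1, df2)
```

Note that the error degrees of freedom are the same for every SME (total N minus the number of cells), which is what running separate one-way analyses on each sub-set of the data would lose.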
The easiest way to do this is to start by setting up the two-way analysis using the menus of Analyze → General Linear Model → Univariate; this is probably what you did to run the initial ANOVA. Then, instead of clicking OK, go into the Options sub-menu and move only the interaction term from the left to the right at the top (and do not click Compare main effects). Then return to the main menu and click Paste. This will open a syntax box with all of your current settings already typed in. Find the line in the syntax that starts with /EMMEANS=TABLES and add COMPARE(moderated) at the end, where moderated is replaced by the name of the variable (in the active data-set's spreadsheet) that contains the levels of the to-be-examined factor. Then also add ADJ(SIDAK) to that same line. Then Run the syntax. The output will not only include the desired set of SMEs (each using the same, original, common error-term), but it will also include all of the pair-wise comparisons that you might need if you find a significant SME with more than two levels.

Assuming that the factor-level codes are in columns called IV1 and IV2, the data are in a column called DV, and you have decided to parse the interaction by looking at the simple effects of IV2 at each level of IV1, here is what the final syntax should be (the part that you added is COMPARE(IV2) ADJ(SIDAK)):

    UNIANOVA DV BY IV1 IV2
      /METHOD = SSTYPE(3)
      /INTERCEPT = INCLUDE
      /EMMEANS = TABLES(IV1*IV2) COMPARE(IV2) ADJ(SIDAK)
      /PRINT = ETASQ
      /CRITERIA = ALPHA(.05)
      /DESIGN = IV1 IV2 IV1*IV2.

Simple Main Effects for a Two-way, Repeated-measures Design

In the case of a repeated-measures design, unique error-terms are used for every test under CASE ANOVA, since these will always include all of the subjects. Note, however, that each new error-term evokes new questions about the required assumptions. In this case, each of these new error-terms should be checked for sphericity using Mauchly's (if the examined factor has more than two levels). Those SMEs that exhibit significant violations of sphericity should have their degrees of freedom reduced using an epsilon; those with non-significant Mauchly's do not need correction.

To conduct SMEs for a (pure) repeated-measures design using SPSS, run Analyze → General Linear Model → Repeated Measures with the moderator removed from the model. Repeat this procedure for each sub-set of the data (i.e., each sub-set of columns) that corresponds to one level of the moderator.
Remember to look at each new Mauchly's and make a new and separate decision as to whether to use the uncorrected or a corrected row from the output.

When one finds a significant SME with more than two levels, pair-wise comparisons should be conducted, again using new and unique error-terms. Because each of these tests will concern only two levels of a within-subjects factor, they cannot violate sphericity, so neither Mauchly's nor an epsilon is ever needed.

To conduct pair-wise comparisons as the follow-up to a significant repeated-measures SME with more than two levels, either run Analyze → Compare Means → Paired-Samples T Test requesting all possible pairs of the to-be-examined factor at the appropriate level of the moderating factor, or simply move the examined factor from left to right at the top of the Options sub-menu, click Compare main effects, and request Sidak when running the separate SMEs. The latter method avoids having to correct the p-values by hand.

Simple Main Effects for a Two-way, Mixed-factor Design

This can be the most complicated analysis and the details depend on whether the between-subjects factor is acting as the moderator or is the to-be-examined factor. When the moderator is the between and the examined is the within, the SMEs should use the within-subjects error-term from the original analysis; otherwise, they will not include every subject. When the moderator is the within and the examined is the between, then unique error-terms may be used for each SME, as each new test will include all of the subjects; however, a new test of assumptions will also be needed, because a new form of pooling across groups will occur. From a purely statistical perspective, the former parsing (i.e., examine the within at each level of the between) is greatly preferred, as it has much more power because the SMEs will all employ a within-subjects error-term. Because it's preferred, it shall be discussed first.

To meet the requirements of CASE ANOVA (for a two-way, mixed-factor design), the correct error-term to use for examining the SMEs of the within-subjects factor at each level of the between-subjects factor is the original error-term that was used to test both the main effect of the within-subjects factor and the interaction. Using new and separate error-terms (which is what would happen if separate within-subject analyses were conducted for each of the groups) would go against our premises. Under CASE ANOVA, the error-terms should always be pooled across all of the groups.

This is the most difficult analysis to conduct using SPSS, because only some procedures will provide the SMEs in the form of a repeated-measures analysis, which is what we need to be consistent with the initial ANOVA. As will be shown later, other procedures can do the appropriate test, but will only provide the output in the form of multivariate tests; yet, these other procedures will be required for the pair-wise comparisons.

As was true for factorial designs, the needed information is available, but not using menu-based, point-and-click methods. One way to get the SMEs (as repeated-measures tests) is to enter the following into an empty new syntax window. It was written for a 2 × 3 mixed design with the level-codes for the between in a column called nbtwn (with values 1 and 2, since this particular procedure requires numerical codes for between-subjects factors) and the three levels of the within-subjects factor in columns called W1, W2, and W3, which are given the (arbitrary) name of Wthn while running the procedure:

    MANOVA W1 W2 W3 BY nbtwn(1 2)
      /WSFACTORS Wthn (3)
      /DESIGN MWITHIN nbtwn(1) MWITHIN nbtwn(2).
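Since the MANOVA command follows a regular pattern (one MWITHIN term per level of the between-subjects factor), it can be convenient to generate the text rather than type it. The helper below is a hypothetical Python convenience, not anything provided by SPSS; it assumes the between-subjects levels are coded as consecutive integers starting at 1 and simply reproduces the pattern of the syntax shown above.

```python
def mwithin_syntax(within_columns, between_name, n_between_levels,
                   within_name="Wthn"):
    """Build MANOVA syntax for the SMEs of the within factor at each group.

    within_columns: column names holding the within-subjects levels.
    between_name: column with integer codes 1..n_between_levels (assumed).
    within_name: arbitrary label given to the within-subjects factor.
    """
    if len(within_columns) < 2 or n_between_levels < 2:
        raise ValueError("each factor needs at least two levels")
    columns = " ".join(within_columns)
    design = " ".join(
        f"MWITHIN {between_name}({level})"
        for level in range(1, n_between_levels + 1)
    )
    return (
        f"MANOVA {columns} BY {between_name}(1 {n_between_levels})\n"
        f"  /WSFACTORS {within_name} ({len(within_columns)})\n"
        f"  /DESIGN {design}."
    )


# Reproduces the 2 x 3 example: two groups, three within-subjects columns.
print(mwithin_syntax(["W1", "W2", "W3"], "nbtwn", 2))
```

The printed text can be pasted into an SPSS syntax window; the column names must match those in the active data-set.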
Note that because the MANOVA procedure of SPSS hasn't been updated, the output will not appear in the same (pretty) tables as other SPSS output. Furthermore, while an application of Mauchly's to all of the data will be provided (which will be the same as what was found in the two-way mixed analysis), along with all three epsilon values, the ANOVA table at the end will only include the uncorrected tests. If sphericity is being violated, apply the same value of epsilon as was used for the (mixed) interaction and then look up the p-values separately.

If the above example were increased in size to have a four-level between and a five-level within (with all labels being roughly the same as above), the new syntax would be:

    MANOVA W1 W2 W3 W4 W5 BY nbtwn(1 4)
      /WSFACTORS Wthn (5)
      /DESIGN MWITHIN nbtwn(1) MWITHIN nbtwn(2) MWITHIN nbtwn(3) MWITHIN nbtwn(4).

With regard to any subsequent pair-wise comparisons, these should use an error-term that is pooled across the levels of the between-subjects factor, while being unique to the pair of within-subject levels that are being compared. This will continue to include every subject, but eliminate the influence of within-subject conditions that are irrelevant to the current pair. While each of these error-terms will be new and unique, no new tests of sphericity are required because each will concern only two levels of the within-subjects factor, such that sphericity cannot be violated (by definition). [2] Once obtained, the p-values for these pair-wise comparisons should be corrected for multiple tests.

[2] It could be argued that these new and unique error-terms all need to have their equality of co-variance verified by separate applications of Box's Test. However, if equality of co-variance was previously verified across all levels of the within-subjects factor, then new tests on specific pairs of levels are not needed and could cause the analysis to halt due to multiple opportunities for a Type-I error to occur (with regard to this assumption).

As mentioned above, in order to get pair-wise comparisons that use error-terms that are pooled across groups while being unique for within-subject pairs, a different SPSS procedure (from above) must be used via syntax. Fortunately, as was true for factorial designs, a menu-driven option may be used to do most of the set-up; in this case, use Analyze → General Linear Model → Repeated Measures, as you did when performing the initial two-way analysis. Under the Options sub-menu, move only the interaction term from the left to the right at the top (and do not click Compare main effects); then click Paste, instead of OK. Locate the line in the syntax that starts with /EMMEANS=TABLES and add COMPARE(within-name) and ADJ(SIDAK) at the end, where within-name is replaced by the arbitrary name that you gave to the within-subjects factor (which will appear in the syntax in the /WSFACTOR= row; they must match). Assuming the same data as in the 2 × 3 example used for the SMEs, the final syntax will be (the part that you added is COMPARE(Wthn) ADJ(SIDAK)):

    GLM W1 W2 W3 BY nbtwn
      /WSFACTOR = Wthn 3 Polynomial
      /METHOD = SSTYPE(3)
      /EMMEANS = TABLES(nbtwn*Wthn) COMPARE(Wthn) ADJ(SIDAK)
      /PRINT = ETASQ
      /CRITERIA = ALPHA(.05)
      /WSDESIGN = Wthn
      /DESIGN = nbtwn.

(Note that the GLM procedure doesn't require numerical coding of the between-subjects factor, so nbtwn may be replaced by the name of the column that contains the string version of the level of the between-subjects factor. Note, also, that in the special case of a two-level within-subjects factor, MANOVA and repeated-measures ANOVA are identical [because sphericity does not apply], so the multivariate tests provided by GLM will be the same as the tests provided by MANOVA, so only the GLM procedure is needed.)

(This ends the discussion of parsing and analyzing a mixed interaction by examining the within at each level of the between.)

In contrast, if one examines the SMEs of the between-subjects factor at each level of the within-subjects factor, new and separate error-terms are appropriate, as each of these new error-terms will include all of the subjects, will reduce the influence of unexplained variability in irrelevant levels of the within-subjects factor, and will be consistent with how the initial ANOVA was conducted.
Note, however, that these new error-terms all need to be checked for homoscedasticity, which wasn't verified when the initial ANOVA was being conducted. This should be done using Levene's Test. Recall from earlier that SPSS produces separate Levene's for each level of the within-subjects factor when Analyze → General Linear Model → Repeated Measures is used on mixed-factor data with Homogeneity tests under Options switched on. These were ignored for the initial ANOVA, but now they are needed, as these are the appropriate tests for each of the SMEs. After verifying that all of these are non-significant, conduct separate one-way, between-subject analyses (using Analyze → General Linear Model → Univariate) on each column of data (i.e., one for each level of the within-subjects factor). Note that locating the original Levene's in the previous output is not actually necessary, as you can request that these tests be conducted again when running these one-ways to get the SMEs. This is done, as always, by clicking Homogeneity tests in the Options sub-menu.

Any necessary pair-wise comparisons (between groups, for a given level of the within-subjects factor) should use the same error-term as the previous, associated SME. This is done to involve all of the subjects, to maximize the degrees of freedom, and to prevent differences in significance from being caused by differences in the (unexplained) within-group variances. These pair-wise comparisons should then be corrected for multiple tests. To get the pair-wise comparisons using the desired error-term, move the main effect (of the examined factor) from left to right at the top of the Options sub-menu while conducting the SMEs, then click Compare main effects and ask for Sidak.
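For readers who end up correcting p-values by hand (e.g., after running separate paired t-tests), the two p-value corrections named earlier can be sketched in Python as follows. The function names are hypothetical; the sketch assumes the usual definitions, in which the Sidak-adjusted value is 1 - (1 - p)^m for m comparisons and the Bonferroni-adjusted value is m times p (capped at 1).

```python
def sidak_adjust(p, n_tests):
    """Sidak-adjusted p-value for one of `n_tests` comparisons."""
    if not 0.0 <= p <= 1.0 or n_tests < 1:
        raise ValueError("p must be in [0, 1] and n_tests at least 1")
    return 1.0 - (1.0 - p) ** n_tests


def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    if not 0.0 <= p <= 1.0 or n_tests < 1:
        raise ValueError("p must be in [0, 1] and n_tests at least 1")
    return min(1.0, p * n_tests)


# Three pair-wise comparisons among the levels of a three-level factor,
# with made-up uncorrected p-values.
raw_p_values = [0.01, 0.04, 0.30]
for p in raw_p_values:
    print(p, round(sidak_adjust(p, 3), 6), round(bonferroni_adjust(p, 3), 6))
```

The Sidak-adjusted value never exceeds the Bonferroni-adjusted one, so Sidak is (slightly) less conservative; each adjusted value is then compared against the original α.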