Online Supplement: Achievement Variance Decomposition


Ones, D. S., Wiernik, B. M., Wilmot, M. P., & Kostal, J. W. (2016). Conceptual and methodological complexity of narrow trait measures in personality-outcome research: Better knowledge by partitioning variance from multiple latent traits and measurement artifacts. European Journal of Personality, 30(4).

Address correspondence to Brenton M. Wiernik, wiernik@workpsy.ch. Copyright Deniz S. Ones, Brenton M. Wiernik, Michael P. Wilmot, Jack W. Kostal.

Abstract

In Ones et al. (2016), we reported results for analyses using meta-analytic data to conduct a variance decomposition of a typical other-rated Achievement measure. In this online supplement, we present the details of those analyses.

Estimating Measurement Artifacts in Achievement Measures Using Meta-Analytic Data

We estimated variance components for each source of measurement error variance using meta-analytic data. Connelly (2008) supplied values for other-rated Achievement scales for the coefficient of equivalence (CE; estimated using coefficient alpha, α = .77; p. 256), the coefficient of stability (CS; test-retest reliability, r_tt = .77; p. 264), and interrater reliability between non-self raters (IRR; r_rr = .34; p. 274).[1] Connelly's estimates of these reliability coefficients for other-rated facet scales are shown in Table 2. Note that these estimates include other-ratings from raters with varying levels of familiarity with the individuals being evaluated (i.e., they include friends, coworkers, intimate partners, acquaintances, and strangers). Rating quality and interrater reliability increase with greater familiarity, so the interrater reliability estimate reported here would be higher if the included samples were restricted to friends, coworkers, and intimate partners.

[1] Gnambs (2015) presented meta-analytic reliability estimates based on larger ks. However, those results were for measures of the global Conscientiousness factor. We are interested in estimating measurement error variance components for Conscientiousness facet measures, so we used the results presented by Connelly (2008). We present results for other-rated measures because no meta-analytic estimates of reliability coefficients for self-rated Conscientiousness facet measures are available.

We used Connelly's (2008) reliability estimates to estimate measurement error variance components for a typical other-rated Achievement scale using the computations below. These methods are summarized in Table 1, and an R sketch implementing the computations appears at the end of this section.

1. Transient error variance (TEV) is score variance due to the particular time of measurement (e.g., from random fluctuations in ratee behavior over time). Ideally, TEV is estimated as the difference between the coefficient of equivalence (CE; parallel forms reliability; internal consistency estimates such as coefficient alpha, α, or coefficient omega, ω) and the coefficient of equivalence and stability (CES; parallel forms of a measure administered at separate times; Schmidt, Le, & Ilies, 2003). However, Connelly (2008) did not report CES results for Achievement measures, so we used the estimate of transient error variance reported by Gnambs (2015) for global Conscientiousness measures (TEV = .10). We subtracted this value from Connelly's estimate of CS (.77) to obtain a mean CES = .67 for Achievement scales.

2. Item-specific variance (ISV) is score variance due to the particular items on a scale (i.e., domain sampling from the universe of possible items assessing a construct). ISV is estimated as the difference between the coefficient of stability (CS; test-retest reliability) and the coefficient of equivalence and stability (CES; delayed parallel forms reliability): ISV = CS − CES. We used Connelly's (2008) estimate of CS (.77) and our above estimate of CES (.67) to obtain an estimated variance component of ISV = .10.

3. Random response error variance (RRE) is score variance due to momentary variations in attention and mental processes within a measurement occasion (i.e., truly random influences on responses to each item). RRE is estimated as: RRE = 1 + CES − CS − CE. By inserting the values for CE, CS, and CES described above, we estimated the variance component as RRE = .13.

4. Scale-specific variance (SSV) is score variance due to idiosyncratic trait conceptualizations across scales. SSV is estimated using the difference between the coefficient of equivalence (CE; internal consistency or parallel forms reliability) and the generalized coefficient of equivalence (GCE; the correlation between two different measures of the same construct administered at the same time; Le, Schmidt, & Putka, 2009): SSV = CE − GCE. Meta-analytic estimates of the convergence between alternative measures of Achievement (and other Conscientiousness facets) are not available, so we estimated this value by first disattenuating Gnambs' (2015) GCE estimate for Conscientiousness factor measures (.74) using his CE estimate for Conscientiousness factor measures (.83) and then re-attenuating it using Connelly's (2008) CE estimate for Achievement facet measures (.77) to estimate GCE = .69 for Achievement measures. This value is similar to correlations between the IPIP and NEO PI-R achievement-striving scales (Goldberg, 1999; Johnson, 2014). Using this value of GCE = .69, we estimated the scale-specific variance component as SSV = .08.

5. Rater-specific variance (RSV) is variance unique to a particular rater or data source. RSV may have substantive meaning (e.g., as a general self-evaluative trait; Davies, Connelly, Ones, & Birkland, 2015), but it nevertheless reflects variance that is not shared across raters, so it is often treated as a source of error, especially for other-ratings of personality. RSV is estimated using interrater reliability (IRR) estimates and the random response error variance component: RSV = 1 − IRR − RRE. This formula assumes that the ratings from both raters are obtained at the same time (or at a relatively short interval); if the ratings are widely separated in time, then TEV should also be subtracted. We used Connelly's (2008) estimate of IRR (.34) and our above estimate of RRE (.13) to obtain an estimated variance component of RSV = .53 for Achievement.

Table 1. Allocation of variance components by different reliability estimators and methods for computing variance components

Reliability estimator | Examples | True score variance components | Error score variance components
Coefficient of equivalence (CE) | Coefficient alpha (α); coefficient omega (ω); parallel forms reliability | TRV + SSV + TEV + RSV | ISV + RRE
Coefficient of stability (CS) | Test-retest reliability | TRV + ISV + SSV + RSV | TEV + RRE
Coefficient of equivalence and stability (CES) | Delayed parallel forms reliability | TRV + SSV + RSV | ISV + TEV + RRE
Generalized coefficient of equivalence (GCE) | Convergent validity of alternative measures of the construct | TRV + TEV + RSV | ISV + SSV + RRE
Generalized coefficient of equivalence and stability (GCES) | Delayed convergent validity of alternative measures of the construct | TRV + RSV | ISV + SSV + TEV + RRE
Interrater reliability (IRR) | Interrater correlation; intraclass correlation (see Putka & Hoffman, 2015; Putka, Le, McCloy, & Diaz, 2008 for other estimators) | TRV + ISV + SSV + TEV:ratee | TEV:rater + RSV + RRE
Delayed interrater reliability (DIRR) | Delayed interrater correlation | TRV + ISV + SSV | TEV:ratee + TEV:rater + RSV + RRE

Variance component | Computation method
TRV | 1 − ISV − SSV − TEV − RRE − RSV; GCES − RSV; GCES [a]
ISV | CS − CES
SSV | CE − GCE
TEV | CE − CES
RRE | 1 − CE − CS + CES
RSV | CS − DIRR; CE + CS − CES − IRR [b]
TEV:ratee | IRR − DIRR
TEV:rater | DIRR − IRR + CE − CES

Note. TRV = trait-related variance; ISV = item-specific variance; SSV = scale-specific variance; TEV = transient error variance (total ratee-specific + rater-specific); RRE = random response error variance; RSV = rater-specific variance (main effects); TEV:ratee = transient error variance (ratee-specific); TEV:rater = transient error variance (rater-specific). [a] This formula assumes that rater-specific variance is true score variance (e.g., self-ratings of job satisfaction). [b] This formula assumes that TEV:rater is zero.
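As a worked check on the RRE entry in Table 1, the formula follows directly from the error allocations listed for CE, CS, and CES (a short derivation; the allocation identities are taken from the table above):

```latex
% Error allocations from Table 1:
%   1 - CE  = ISV + RRE
%   1 - CS  = TEV + RRE
%   1 - CES = ISV + TEV + RRE
\begin{align}
  (1 - CE) + (1 - CS) - (1 - CES)
    &= (ISV + RRE) + (TEV + RRE) - (ISV + TEV + RRE) \\
    &= RRE \\
  \Rightarrow\quad RRE &= 1 + CES - CE - CS
\end{align}
```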

6. After estimating each of the above variance components, the proportion of the observed variance in a measure that is due to trait-related variance shared across raters (TRV) can be estimated by subtracting the variance components from one: TRV = 1 − ISV − SSV − TEV − RRE − RSV. This value can also be estimated by subtracting rater-specific variance (RSV) from the generalized coefficient of equivalence and stability (GCES; the correlation between two different measures of the same construct administered at different times; Le et al., 2009). If rater-specific variance is treated as true-score variance rather than error (e.g., as one might consider self-ratings of job satisfaction), then GCES is a direct estimate of TRV. Using the measurement error variance components described above, we obtained an estimate of the trait-related variance component for Achievement measures of TRV = .06.

Results for a typical Achievement measure are shown in Table 2. To estimate error variance components for a single Achievement item, we used Connelly's (2008) estimate of single-item CE (.30). Connelly reported test-retest reliability coefficients corrected for alpha in both measures; we attenuated these estimates using the single-item CE to estimate single-item CS (CS1 = .99 × .30 ≈ .30). We estimated values for CES, GCE, GCES, and IRR by first disattenuating their full-scale values using the full-scale CE (.77) and then re-attenuating them using the single-item CE (e.g., we estimated single-item interrater reliability to be IRR1 = .34/.77 × .30 ≈ .13). From these values, we estimated the single-item variance components shown in Table 2.

Bifactor Modeling of Latent Traits Contributing to Conscientiousness Facet Scales

We estimated the latent trait variance components underlying typical scales assessing the NEO PI-R Conscientiousness facets using a bifactor structural equation model (SEM). We estimated this model using scale intercorrelations provided by T. Judge (personal communication, September 20, 2013; based on Judge, Rodell, Klinger, Simon, & Crawford, 2013). Judge and colleagues' study meta-analytically estimated correlations among self-rated personality scales classified as measuring constructs assessed by the NEO PI-R facet scales.[2]

[2] Note that the methods described above assume that rater × time period interactions (e.g., random rater fluctuations in evaluations of the ratee, such as those due to differences in rater mood or rater attention) are negligible. To the degree that rater fluctuations are non-negligible, the formulas above will underestimate TRV. This is because the interaction term is included in the estimates of both TEV and RSV. When TEV is estimated using single-rater test-retest reliability, this estimate includes both fluctuations in ratee behaviors (TEV:ratee) and fluctuations in rater characteristics (TEV:rater). Similarly, when RSV is estimated by subtracting RRE from (1 − IRR), this estimate includes both rater main effects and rater × time period interactions (TEV:rater). A more accurate formula for estimating RSV, one that accounts for TEV:rater, would be: RSV = CS − DIRR. Here, DIRR is a delayed interrater reliability estimate (ratings completed by different raters at different times). This formula prevents TEV:rater from being counted twice. However, meta-analytic estimates of DIRR are not available, so we could not use this formula in our variance decompositions. Accordingly, the results we present overestimate RSV and underestimate TRV. We anticipate that the magnitude of rater × time period interaction effects is likely to be relatively small, but when TRV estimates are also small, impossible results, such as negative TRV, are possible.
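To make steps 1-6 concrete, the following is a minimal R sketch of the arithmetic described above. All object names are ours; the values are the rounded meta-analytic estimates from Connelly (2008) and Gnambs (2015) used in the text.

```r
# Meta-analytic reliability estimates for other-rated Achievement scales
ce  <- .77   # coefficient of equivalence (coefficient alpha; Connelly, 2008)
cs  <- .77   # coefficient of stability (test-retest reliability)
irr <- .34   # interrater reliability (non-self raters)

# Coefficients not reported for Achievement, derived as described in the text
tev_c <- .10             # transient error variance, global C (Gnambs, 2015)
ces   <- cs - tev_c      # coefficient of equivalence and stability = .67
gce   <- .74 / .83 * ce  # generalized coefficient of equivalence, ~.69
gces  <- gce - tev_c     # generalized coeff. of equivalence and stability, ~.59

# Variance components (formulas from Table 1)
tev <- ce - ces                          # transient error variance     = .10
isv <- cs - ces                          # item-specific variance       = .10
rre <- 1 + ces - cs - ce                 # random response error        = .13
ssv <- ce - gce                          # scale-specific variance      ~ .08
rsv <- 1 - irr - rre                     # rater-specific variance      = .53
trv <- 1 - isv - ssv - tev - rre - rsv   # trait-related variance       ~ .06

# Single-item analogues: disattenuate full-scale coefficients using the
# full-scale CE, then re-attenuate using the single-item CE (.30)
ce1  <- .30
cs1  <- .99 * ce1        # corrected test-retest (.99) times single-item CE
ces1 <- ces / ce * ce1
gce1 <- gce / ce * ce1
irr1 <- irr / ce * ce1   # ~.13
```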

Table 2. Reliability estimates from Connelly (2008) and variance components for typical other-rated Achievement measures

Type of reliability | Estimate
Coefficient of equivalence (CE; coefficient α) | .77
Coefficient of stability (CS; test-retest reliability) | .77
Interrater reliability (IRR) | .34
Coefficient of equivalence and stability (CES) [a] | .67
Generalized coefficient of equivalence (GCE) [b] | .69
Generalized coefficient of equivalence and stability (GCES) [c] | .59

Error variance component | Achievement scale | Achievement item
Trait-related variance (TRV) | .06 |
Item-specific variance (ISV) | .10 |
Scale-specific variance (SSV) | .08 |
Transient error variance (TEV) | .10 |
Random response error (RRE) | .13 |
Rater-specific variance (RSV) | .53 |

Note. [a] Estimated by subtracting Gnambs' (2015) estimate of transient error variance for global Conscientiousness measures (TEV = .10) from CE. [b] Estimated by disattenuating Gnambs' (2015) GCE for global Conscientiousness measures (GCE = .74) using his estimate of CE for global Conscientiousness measures (CE = .83) and then re-attenuating it using Connelly's (2008) CE estimate for Achievement measures (GCE = .74/.83 × .77 ≈ .69). [c] Estimated by subtracting Gnambs' (2015) estimate of transient error variance for global Conscientiousness measures (TEV = .10) from GCE.

We specified the bifactor model based on the empirically derived Conscientiousness trait structure described by DeYoung (2015; DeYoung, Quilty, & Peterson, 2007). In this structure, the Conscientiousness trait family contains the Conscientiousness factor, two meso-level aspects, and several narrow facets. The Conscientiousness factor refers to individuals' capacity to protect their goals from disruption. The two aspects are Industriousness, which refers to individuals' tendency to prioritize long-term or abstract goals over immediate goals, and Orderliness, which refers to individuals' tendency to avoid disruptive experiences by following rules set by themselves or others. For narrow facet traits, the most widely used taxonomy of narrow Conscientiousness facets is the rationally developed trait taxonomy created by Costa and McCrae for the NEO PI-R (Costa & McCrae, 1992). This taxonomy includes six Conscientiousness facet constructs: competence, order, dutifulness, achievement-striving, self-discipline, and deliberation. Judge et al. (2013) used the NEO taxonomy to classify personality scales and report intercorrelations among facet scales, so we also used this taxonomy for the present analyses.

We specified a bifactor model wherein each of the six observed Conscientiousness scales (achievement-striving, dutifulness, self-discipline, deliberation, competence, order) loaded onto a corresponding latent facet factor as well as a latent global Conscientiousness factor. Following the results of DeYoung et al. (2007), we also allowed achievement-striving, dutifulness, self-discipline, competence, and deliberation to load onto a latent Industriousness aspect factor. The Orderliness aspect was assessed using only a single observed scale (order), so no separate aspect-level factor was specified. All loadings were constrained to be positive. The intercorrelations reflect within-inventory, same-rater correlations between facet scales, so the error variance for each observed variable was fixed to 1 − CES, as estimated above. Models were estimated using OpenMx (Boker et al., 2015) in R (R Core Team, 2015).

The initial bifactor model showed aberrant results for the deliberation and competence scales, with both scales hitting the zero lower bound for their loadings on Industriousness. When the lower bound constraint was freed, deliberation showed a negligible negative loading on the Industriousness aspect (−.05), suggesting that the measures in this category reflect only variance from the corresponding facet and the Conscientiousness factor, but not the Industriousness aspect. When its lower bound was freed, competence showed a moderate negative loading on the Industriousness aspect (−.22). This result likely stems from the particular scales classified as measuring competence by Judge et al. (2013), many of which do not appear to reflect narrow Conscientiousness-related traits (e.g., the Eysenck Personality Inventory inferiority scale). As a result, the specific factor variance of this measure in our analyses likely reflects the influence of other Big Five traits, rather than Conscientiousness aspect or facet traits. We refit the bifactor model after removing the loadings of competence and deliberation onto Industriousness.
This model showed excellent fit (RMSEA = .058 [95% CI], TLI = .956), and fit was essentially identical to that of the less parsimonious model including free Industriousness loadings for competence and deliberation (RMSEA = .062 [95% CI], TLI = .952), so we retained the more parsimonious model. For achievement, factor loadings in the retained model were Conscientiousness (.56), Industriousness (.36), Achievement (.48), and error (.57).
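For illustration, a minimal OpenMx sketch of the retained bifactor model follows. This is our reconstruction under stated assumptions, not the authors' original script: facet_cors stands in for the 6 × 6 meta-analytic correlation matrix from Judge et al. (2013), which is not reproduced here, and n_harmonic is a hypothetical sample size.

```r
library(OpenMx)  # Boker et al., 2015

# Observed facet scales; facet_cors is assumed to be a 6 x 6 correlation
# matrix with these row/column names (Judge et al., 2013; not shown here)
facets <- c("achievement", "dutifulness", "selfdiscipline",
            "deliberation", "competence", "order")
n_harmonic <- 1000  # hypothetical placeholder for the meta-analytic N

bifactor <- mxModel(
  "ConscientiousnessBifactor", type = "RAM",
  manifestVars = facets,
  latentVars   = c("C", "Industriousness", paste0("spec_", facets)),
  # Global Conscientiousness factor: all six scales load, bounded positive
  mxPath(from = "C", to = facets, values = .5, lbound = 0),
  # Industriousness aspect: competence and deliberation loadings were
  # removed in the retained model
  mxPath(from = "Industriousness",
         to = c("achievement", "dutifulness", "selfdiscipline"),
         values = .3, lbound = 0),
  # Orthogonal specific (facet) factors, one per scale
  mxPath(from = paste0("spec_", facets), to = facets,
         values = .4, lbound = 0),
  # Latent variances fixed to 1
  mxPath(from = c("C", "Industriousness", paste0("spec_", facets)),
         arrows = 2, free = FALSE, values = 1),
  # Error variance for each observed scale fixed to 1 - CES = .33
  mxPath(from = facets, arrows = 2, free = FALSE, values = 1 - .67),
  mxData(observed = facet_cors, type = "cov", numObs = n_harmonic)
)

fit <- mxRun(bifactor)
summary(fit)  # inspect the estimated loadings
```

Relaxing the lbound = 0 constraints on the Industriousness paths reproduces the diagnostic step described above, in which the negative loadings for competence and deliberation were examined before those paths were dropped.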

References

Boker, S. M., Neale, M. C., Maes, H. H., Wilde, M. J., Spiegel, M., Brick, T. R., ... Team OpenMx. (2015). OpenMx 2.0 user guide (Release No. ). Charlottesville, VA: University of Virginia.

Connelly, B. S. (2008, August). The reliability, convergence, and predictive validity of personality ratings: An other perspective (Doctoral dissertation). University of Minnesota, Minneapolis, MN. Retrieved from University of Minnesota Digital Conservancy (60223).

Costa, P. T., & McCrae, R. R. (1992). Professional manual for the NEO Personality Inventory (NEO PI-R) and NEO Five Factor Inventory (NEO-FFI) (Technical manual). Odessa, FL: Psychological Assessment Resources.

Davies, S. E., Connelly, B. S., Ones, D. S., & Birkland, A. S. (2015). The general factor of personality: The Big One, a self-evaluative trait, or a methodological gnat that won't go away? Personality and Individual Differences, 81.

DeYoung, C. G. (2015). Cybernetic big five theory. Journal of Research in Personality, 56.

DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93(5).

Gnambs, T. (2015). Facets of measurement error for scores of the Big Five: Three reliability generalizations. Personality and Individual Differences, 84.

Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In I. Mervielde, I. J. Deary, F. De Fruyt, & F. Ostendorf (Eds.), Personality psychology in Europe (Vol. 7, pp. 7-28). Tilburg, The Netherlands: Tilburg University Press.

Johnson, J. A. (2014). Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory: Development of the IPIP-NEO-120. Journal of Research in Personality, 51.

Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., & Crawford, E. R. (2013). Hierarchical representations of the five-factor model of personality in predicting job performance: Integrating three organizing frameworks with two theoretical perspectives. Journal of Applied Psychology, 98(6).

Le, H., Schmidt, F. L., & Putka, D. J. (2009). The multifaceted nature of measurement artifacts and its implications for estimating construct-level relationships. Organizational Research Methods, 12(1).

Putka, D. J., & Hoffman, B. J. (2015). The reliability of job performance ratings equals .52. In C. E. Lance & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends (pp. ). New York: Routledge.

Putka, D. J., Le, H., McCloy, R. A., & Diaz, T. (2008). Ill-structured measurement designs in organizational research: Implications for estimating interrater reliability. Journal of Applied Psychology, 93(5).

R Core Team. (2015). R: A language and environment for statistical computing (Version 3.2.1). Vienna, Austria: R Foundation for Statistical Computing.

Schmidt, F. L., Le, H., & Ilies, R. (2003). Beyond alpha: An empirical examination of the effects of different sources of measurement error on reliability estimates for measures of individual-differences constructs. Psychological Methods, 8(2).