Psychological Assessment

Similar documents
Reliability and interpretation of total scores from multidimensional cognitive measures evaluating the GIK 4-6 using bifactor analysis

Investigating the Theoretical Structure of the Differential Ability Scales Second Edition Through Hierarchical Exploratory Factor Analysis

Factor Structure of the Differential Ability Scales-Second Edition: Exploratory and. Hierarchical Factor Analyses with the Core Subtests

On the Problem of Spurious Non-Linear Effects in Aggregated Scores: Investigating Differentiation of Cognitive Abilities using Item Level Data

Theory and Characteristics

Understanding the Dimensionality and Reliability of the Cognitive Scales of the UK Clinical Aptitude test (UKCAT): Summary Version of the Report

Verbal Comprehension. Perceptual Reasoning. Working Memory

Conference Presentation

Running head: KABC-II CHC EFA 1

SUBTESTS, FACTORS, AND CONSTRUCTS: WHAT IS BEING MEASURED BY TESTS

Investigation of the Factor Structure of the Wechsler Adult Intelligence Scale Fourth Edition (WAIS IV): Exploratory and Higher Order Factor Analyses

WISC V Construct Validity: Hierarchical EFA with a Large Clinical Sample

Sex differences in latent general and broad cognitive abilities for children and youth: Evidence from higher-order MG-MACS and MIMIC models

1. BE A SQUEAKY WHEEL.

Bifactor Modeling and the Estimation of Model-Based Reliability in the WAIS-IV

Hierarchical Factor Structure of the Cognitive Assessment System: Variance Partitions From the Schmid Leiman (1957) Procedure

Overview of WASI-II (published 2011) Gloria Maccow, Ph.D. Assessment Training Consultant

RATIONALE for IQ TEST

Millions of students were administered the Wechsler Intelligence Scale for

Running head: LURIA MODEL CFA 1

Introducing WISC-V Spanish Anise Flowers, Ph.D.

RIST-2 Score Report. by Cecil R. Reynolds, PhD, and Randy W. Kamphaus, PhD

t) I WILEY Gary L. Canivez 1 Stefan C. Dombrowski 2 Marley W. Watkins 3 Abstract RESEARCH ARTICLE

Bifador Modeling in Construct Validation of Multifactored Tests: Implications for Understanding Multidimensional Constructs and Test Interpretation

Test and Measurement Chapter 10: The Wechsler Intelligence Scales: WAIS-IV, WISC-IV and WPPSI-III

Exploratory and Higher-Order Factor Analyses of the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV) Adolescent Subsample

A COMPARISON OF CONFIRMATORY FACTOR ANALYSIS AND TASK ANALYSIS OF FLUID INTELLIGENCE COGNITIVE SUBTESTS

Factor Analysis of the Korean Adaptation of the Kaufman Assessment Battery for Children (K-ABC-K) for Ages 2 ½ through 12 ½ Years

Exploratory and hierarchical factor analysis of the WJ-IV Cognitive at school age. Stefan C. Dombrowski. Rider University. Ryan J.

Chapter 17. Review of the WISC-V. Daniel C. Miller and Ryan J. McGill. Texas Woman s University. Denton, Texas

Acknowledgments Series Preface About the Companion Website Resources on the Companion Website

Equivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests

Administration duration for the Wechsler Adult Intelligence Scale-III and Wechsler Memory Scale-III

Coming December 2013

Using the WASI II with the WAIS IV: Substituting WASI II Subtest Scores When Deriving WAIS IV Composite Scores

Description of Cognitive Subtests

Psychological Science

The uses of the WISC-III and the WAIS-III with people with a learning disability: Three concerns

Construct Validity of the WISC-V in Clinical Cases: Exploratory and Confirmatory Factor Analyses of the 10 Primary Subtests

Frequently Asked Questions (FAQs)

Common Space Analysis of Several Versions of The Wechsler Intelligence Scale For Children

Psychological Science

Short-forms of the UK WAIS-R: Regression equations and their predictive validity in a general population sample

Canterbury Christ Church University s repository of research outputs.

Running head: LURIA MODEL VALIDITY 1

The Joint WAIS III and WMS III Factor Structure: Development and Cross-Validation of a Six-Factor Model of Cognitive Functioning

Research Note. Community/Agency Trust: A Measurement Instrument

Profile Analysis on the WISC-IV and WAIS-III in the low intellectual range: Is it valid and reliable? Simon Whitaker. And.

JOINT ANALYSIS OF TWO ABILITY TESTS: TWO THEORIES, ONE OUTCOME

Copyright 2014 Pearson Education and its Affiliates. All rights reserved. Page 1

Statistics & Analysis. Confirmatory Factor Analysis and Structural Equation Modeling of Noncognitive Assessments using PROC CALIS

FOR TRAINING ONLY! WISC -V Wechsler Intelligence Scale for Children -Fifth Edition Score Report

FOUR LAYERS APPROACH FOR DEVELOPING A TOOL FOR ASSESSING SYSTEMS THINKING

Item response theory analysis of the cognitive ability test in TwinLife

A MULTIVARIATE ANALYSIS TECHNIQUE: STRUCTURAL EQUATION MODELING

Frequently Asked Questions

Mastering Modern Psychological Testing Theory & Methods Cecil R. Reynolds Ronald B. Livingston First Edition

WPPSI -IV Wechsler Preschool and Primary Scale of Intelligence-Fourth Edition Score Report

Putting Spearman s Hypothesis to Work: Job IQ as a Predictor of Employee Racial Composition

Putting Spearman s Hypothesis to Work: Job IQ as a Predictor of Employee Racial Composition

FREQUENTLY ASKED QUESTIONS

Ante s parents have requested a cognitive and emotional assessment so that Ante can work towards fulfilling his true potential.

Introducing the WISC-V Integrated Gloria Maccow, Ph.D., Assessment Training Consultant

Design of Intelligence Test Short Forms

Multidimensional Aptitude Battery-II (MAB-II) Extended Report

WPPSI -IV A&NZ Wechsler Preschool and Primary Scale of Intelligence-Fourth Edition: Australian & New Zealand Score Report

Multidimensional Aptitude Battery-II (MAB-II) Clinical Report

Predictive Validity of Non-g Residuals of Tests: More Than g

PSY-6220: COGNITIVE ASSESSMENT Syllabus, Fall 2013; September 05, 2017 Class: M 1:00-3:40, UH 1610 Lab: R 4:00-5:00, UH 1840

Running head: THE MEANING AND DOING OF MINDFULNESS

Chapter 11. Multiple-Sample SEM. Overview. Rationale of multiple-sample SEM. Multiple-sample path analysis. Multiple-sample CFA.

Chapter 3. Basic Statistical Concepts: II. Data Preparation and Screening. Overview. Data preparation. Data screening. Score reliability and validity

Despite the plethora of IQ tests available for psychologists to use

Development of the WAIS-III estimate of premorbid ability for Canadians (EPAC)

Higher-order models versus direct hierarchical models: g as superordinate or breadth factor?

Construct Validity of the Wechsler Intelligence Scale for Children Fifth UK Edition:

Author's personal copy

RIAS-2 Score Report. by Cecil R. Reynolds, PhD, and Randy W. Kamphaus, PhD

Running head: WRAML2 FACTOR STRUCTURE 1

The Mahalanobis Distance index of WAIS-R subtest scatter: Psychometric properties in a healthy UK sample

PREDICTIVE AND CONSTRUCT VALIDITY OF THE DEVELOPING COGNITIVE ABILITIES TEST: RELATIONS WITH THE IOWA TESTS OF BASIC SKILLS gary l.

. ~ ~ z H. 0.. l:tj < ~ 3:: 1-' I» I» to ::::! ~ 0 (') (') ~ CD :;:,- to ~ ~ C-j~t:;::j "'C. CJ) I:J:j. C/)1-'CJ) I»C.O:;,- 0" c.

WPPSI -IV A&NZ Wechsler Preschool and Primary Scale of Intelligence-Fourth Edition: Australian & New Zealand Score Report

Correlations Between the Wide Range Intelligence Test (WRIT) and the Wechsler Abbreviated Scale of Intelligence (WASI): Global and Subtest Comparisons

WISC-IV Matrix Reasoning and Picture Concepts Subtests: Does the Use of Verbal Mediation Confound Measurement of Fluid Reasoning?

Chapter 7. Measurement Models and Confirmatory Factor Analysis. Overview

Comparability of GATE Scores for Immigrants and Majority Group Members: Some Dutch Findings

FOR TRAINING ONLY! Score Report. DAS II: School Age Battery CORE BATTERY. Core Cluster and Composite Scores and Indexes

Core Abilities Assessment

WAIS R subtest scatter: Base-rate data from a healthy UK sample

GENERAL MENTAL ABILITY, NARROWER COGNITIVE ABILITIES, AND JOB PERFORMANCE: THE PERSPECTIVE OF THE NESTED-FACTORS MODEL OF COGNITIVE ABILITIES

Supplementary material

WAIS IV on Q-interactive

Which is the best way to measure job performance: Self-perceptions or official supervisor evaluations?

Measurement Model of Evaluation Utilization: External Evaluation

Psychological Testing History, Principles, and Applications Robert J. Gregory Sixth Edition

Statistics: General principles

WPPSI-IV IV Informational Series

Methodology. Inclusive Leadership: The View From Six Countries

Transcription:

Psychological Assessment How Well Is Psychometric g Indexed by Global Composites? Evidence From Three Popular Intelligence Tests Matthew R. Reynolds, Randy G. Floyd, and Christopher R. Niileksela Online First Publication, August 12, 2013. doi: 10.1037/a0034102 CITATION Reynolds, M. R., Floyd, R. G., & Niileksela, C. R. (2013, August 12). How Well Is Psychometric g Indexed by Global Composites? Evidence From Three Popular Intelligence Tests. Psychological Assessment. Advance online publication. doi: 10.1037/a0034102

Psychological Assessment 2013 American Psychological Association 2013, Vol. 25, No. 4, 000 1040-3590/13/$12.00 DOI: 10.1037/a0034102 How Well Is Psychometric g Indexed by Global Composites? Evidence From Three Popular Intelligence Tests Matthew R. Reynolds University of Kansas Randy G. Floyd University of Memphis Christopher R. Niileksela University of Kansas Global composites (e.g., IQs) calculated in intelligence tests are interpreted as indexes of the general factor of intelligence, or psychometric g. It is therefore important to understand the proportion of variance in those global composites that is explained by g. In this study, we calculated this value, referred to as hierarchical omega, using large-scale, nationally representative norming sample data from 3 popular individually administered tests of intelligence for children and adolescents. We also calculated the proportion of variance explained in the global composites by g and the group factors, referred to as omega total, or composite reliability, for comparison purposes. Within each battery, g was measured equally well. Using total sample data, we found that 82% 83% of the total test score variance was explained by g. The group factors were also measured in the global composites, with both g and group factors explaining 89% 91% of the total test score variance for the total samples. Global composites are primarily indexes of g, but the group factors, as a whole, also explain a meaningful amount of variance. Keywords: intelligence, composite reliability, omega, psychometric g, IQ Intelligence tests are among the most popular and useful of all measurement devices in psychology (Flanagan & Harrison, 2012; Kaufman, 2009). The factor structure of the subtest scores of the intelligence test relative to the measurement structure of the test is the focus of much factor-analytic research. Multiple latent sources of variance in those subtest scores are described in that research via factor loadings (e.g., Canivez & Watkins, 2010; Reynolds, Keith, Fine, Fisher, & Low, 2007). The general factor (psychometric g or simply g) is often interpreted as one latent source of variance in all subtest scores, with all subtests having positive nonzero loadings on g (Jensen, 1998). A global composite (e.g., an IQ) calculated as the sum of subtest scores, and not a score from an individual subtest, is intended to index g. Therefore, although subtest loadings on the g factor (i.e., g loadings) provide estimates of how strongly g is measured in the individual subtests, and consequently, these g loadings provide hints about g measurement in a global composite when they are averaged across subtests, it is not clear how well global composites measure g. Such estimates are important if these composites are interpreted as indexes of g. In this study, the proportion of variance Matthew R. Reynolds, Department of Psychology and Research in Education, University of Kansas; Randy G. Floyd, Department of Psychology, University of Memphis; Christopher R. Niileksela, Department of Psychology and Research in Education, University of Kansas. Correspondence concerning this article should be addressed to Matthew R. Reynolds, Psychology and Research in Education, Joseph R. Pearson Hall, 1122 West Campus Rd., University of Kansas, Lawrence, KS 66045. E-mail: mreynolds@ku.edu in global composite scores that is accounted for by the g factor was calculated for three popular individually administered intelligence tests designed for children and adolescents. Psychometric intelligence is described in multifactored hierarchical models (Gustafsson, 1984). One such model, based on Cattell Horn Carroll theory (CHC), is a general framework that currently includes g at the apex, about eight to 10 broad cognitive ability factors (e.g., Verbal Comprehension Knowledge [Gc], Novel Reasoning [Gf], Visual Spatial Ability [Gv], Short-Term Memory [Gsm], and Processing Speed [Gs]), and numerous narrow abilities (Schneider & McGrew, 2012). CHC theory is a work in progress, and in fact Carroll (1993) and Horn (Horn & Blankson, 2005) disagreed on evidence of a theory of g, but CHC theory is a useful framework for research and applied practice because it describes intelligence at three levels of generality, with scores from intelligence batteries often reflecting each of these levels (Keith & Reynolds, 2010). The global composite (IQ) is considered an indicator of g, broad composite scores as indicators of CHC broad abilities (otherwise known as group factors), and individual subtest scores as indicators of narrow abilities. In this study, the focus was on the indicator of g within a test battery, the global composite. More than 85 years ago, Spearman (1927, Appendix A) proposed an estimate that was supposed to represent the g loading of the global composite (also see Jensen, 1998, p. 104). More recently, model-based estimates calculated from structural equation models have been proposed (e.g., McDonald, 1999). McDonald has referred to these estimates as coefficient omega. In terms of classical test theory, omega is both a reliability and validity index because it is the true score variance in the composite that is attributed to a common factor (Bollen, 1989; Gustafsson & Åberg- 1

2 REYNOLDS, FLOYD, AND NIILEKSELA Bengtsson, 2010). Omega is a general estimate and may be used to determine the saturation of one factor (e.g., g) in a composite even when there are multiple factors (e.g., g and group factors) contributing to the composite, and it may also be used to calculate the saturation of all of the factors contributing to the composite. The former estimate has been referred to as hierarchical omega ( h, Zinbarg, Yovel, Revelle, & McDonald, 2006), whereas the latter has been subsequently referred to as omega total ( T ; Revelle & Zinbarg, 2009). That is, h may be used to quantify the saturation of g in a global composite, whereas T may be used to quantify the saturation of all of the common factors in a global composite (e.g., g and group factors) and is also therefore described as composite reliability. The distinction is important because despite the multifactorial nature of intelligence, in practice, global composites are mostly interpreted as indexes of g, not of intelligence in general (g group factors). In this study, we compared these two estimates because although global composites are intended to index g, it is also important to understand how well they index all of the common factors. Purpose Because global composite scores are interpreted as indexing g, a latent construct that has been subjected to years of study and has a wide range of predictive validity (Jensen, 1998), it is especially important to estimate how well global composite scores measure this construct. The primary purpose of this study was to estimate the g saturation in global composites from three popular individually administered intelligence tests designed for children and adolescents. This g saturation was indexed by h. The g saturation ( h ) was also compared with the saturation of all of the factors by calculating T., or composite reliability. Method Participants The participants were children and adolescents included in the norming samples for three intelligence tests. These norming samples were representative of the United States (Elliott, 2007b; Kaufman & Kaufman, 2004; Wechsler, 2003). The total samples for each test ranged from 2175 to 2200. There were 200 participants per 1-year age group for the second edition of the Differential Abilities Scales (DAS II; Elliott, 2007a), 125 to 200 participants per 1-year age group for the second edition of the Kaufman Assessment Battery for Children (KABC II; Kaufman & Kaufman, 2004), and 200 participants per 1-year age group for the fourth edition of the Wechsler Intelligence Scale for Children (WISC IV; Wechsler, 2003). Measurement Instruments Differential Abilities Scales Second Edition (DAS II). The DAS II (Elliott, 2007a) is an individually administered intelligence test developed for children and adolescents. Fifteen of the school-age battery subtests for children ages 7 17 were used in this study. The DAS II provides composites representing six group factors: Nonverbal Reasoning, Verbal, Spatial, Verbal Short-Term Memory, Visual Verbal Memory, and Cognitive Speed. The g factor is operationalized in the test as General Conceptual Ability (GCA), comprising six subtests. Composite score reliability estimates for the GCAs ranged from.96 to.97 in the ages used in this study (Elliott, 2007b, pp. 127 128). Kaufman Assessment Battery for Children Second Edition (KABC II). The KABC II is an individually administered measure of cognitive abilities developed for children and adolescents. Data from 16 subtests used with children ages 7 18 were included in this study. The KABC II provides composites representing five group factors: Crystallized Knowledge, Fluid Reasoning, Visual Spatial Ability, Long-Term Retrieval, and Short-Term Memory. The g factor is operationalized in the test as the Fluid Crystallized Index (FCI), comprising 10 subtests. Composite score reliability estimates for the FCIs ranged from.96 to.97 in the ages used in this study (Kaufman & Kaufman, 2004, p. 88). Wechsler Intelligence Scale for Children Fourth Edition (WISC IV). The WISC IV is an individually administered measure of cognitive abilities developed for children and adolescents ages 6 16. Fifteen subtests were used in this study. The WISC IV provides composites representing four group factors: Perceptual Reasoning, Verbal Comprehension, Processing Speed, and Working Memory. The g factor is operationalized in the test as the Full Scale IQ (FSIQ), comprising 10 subtests. Composite score reliability estimates for the FSIQs ranged from.96 to.97 in the ages used in this study (Wechsler, 2003, p. 35). Analytic Approach Nested factor models. Covariance matrices for total samples and for each 1-year age group for each test (DAS II, KABC II, WISC IV) were created based on the subtest correlations and standard deviations for the norm samples provided in the test manuals (standard deviations for the KABC II had to be obtained from raw data files that were available because they are not provided in the test manual). These matrices were used as input data. Mplus (Muthén & Muthén, 1998 2010) was used for all analyses. Nested factor models were estimated for each intelligence test with the total sample and with each1-year age group data (see Gustafsson & Balke, 1993; Reynolds & Keith, 2013). In a nested factor model, the g factor is modeled as a first-order factor, with direct effects on every subtest. More specific first-order factors (or group factors) are also modeled for a subset of interrelated subtests; thus, there are direct effects from two latent factors (i.e., g and group) on most subtests. All of these factors are independent from one another. We used nested factor models, rather than higher order models, to account for the hierarchical and multidimensional nature of intelligence. Although higher order models, in which g has direct effects on first-order factors and indirect effects on subtests through the first-order factors, are more consistent with intelligence theory (Jensen, 1998), the nested factor models are more consistent with the scoring structure of intelligence tests (Keith & Reynolds, 2012). Regardless, nested factor models and higher order models typically produce similar h values because the average g loadings from the higher order model (calculated as total g effects, normally only indirect effects through the first-order factors) are similar to the average g loadings in the nested factor model (Zinbarg et al., 2006; see Niileksela, Reynolds, & Kaufman, 2013, for an example).

GENERAL FACTOR MEASUREMENT 3 All subtests included in each test battery were included in the nested factor models so that the best estimates of g loadings were obtained, and because in many instances, it allowed for the group factors to be empirically identified without having to include equality constraints on the loadings. Group factors are often indexed by composites comprising two subtests. A factor not correlated with other factors and with only two subtest indicators is empirically underidentified. In these circumstances in which there are only two indicators, constraints must be included in the model (e.g., fix the two factor loadings to be equal) for proper model identification. However, test batteries often include additional measures of these group factors; therefore, this problem of underidentification was mostly circumvented in this study by including these additional subtests in the nested factor models. The estimates associated with these additional subtests, however, were not included the calculations of the omegas; only those associated with the global composite were included in those calculations. 1 The DAS II nested factor model included 13 subtests that loaded directly on the first-order g factor. In children ages 12 and younger, g was also indicated by Phonological Processing, although this subtest was not associated with an additional firstorder factor. In addition to g, six first-order group factors associated with subsets of those subtests (i.e., two to three subtests) were included. These group factors were independent of each other and of g. Only one group factor was indicated by more than two subtests, so equality constraints were applied to the loadings of group factors with two indicators to properly identify the model. For purposes of calculating the omegas, however, only factor loadings and residual variances from the six subtests contributing to the GCA were used: Pattern Construction, Recall of Designs, Matrices, Sequential and Quantitative Reasoning, Word Definitions, and Verbal Similarities. The KABC II model included 16 subtests with direct loadings on g. The model included four first-order group factors because there was not reliable Fluid Reasoning variance remaining once g was accounted for (Gustafsson, 1984; Reynolds & Keith, 2007). Factor loadings and residual variances from 10 subtests contributing to the FCI were used in the omega calculations; the Gestalt Closure, Expressive Vocabulary, Hand Movements, Atlantis Delayed, and Rebus Delayed subtests were omitted. The WISC IV model included 15 subtests loading directly on g. There were four first-order group factors. Factor loadings and residual variances from 10 subtests contributing to the FSIQ were used in the omega calculations; the Picture Completion, Arithmetic, Information, Word Reasoning, and Cancellation subtests were omitted. Model evaluation. We evaluated the models globally by examining the fit indexes. Global indexes of stand-alone fit included chi-square test of fit statistic, root-mean-square error of approximation (RMSEA), comparative fit index (CFI), and standardized root-mean-square residual (SRMR). Models with good global fit typically have RMSEA values close to or less than.05, CFI values close to or greater than.95, and SRMR values close to or less than.05 (Schermelleh-Engel, Moosbrugger, & Müller, 2003). In addition, the quality of the parameter estimates, as indicated by implausible values, was evaluated. Omega calculations were based only on models that had acceptable global fit and parameter estimates. Omega calculations. Both h and T values were calculated from the nested factor model-based estimates. The estimates were calculated in the total sample and in each 1-year age group for each test. First, h was calculated as the square of the sum of subtest g loadings divided by the total variance in the subtest scores included in the composite (Gustafsson, 2002). This estimate represented the proportion of variance in the global composite that was accounted for by g. The square root of h represented the correlation between the composite score and latent g, which is the g loading for the composite (McDonald, 1999). Second, T estimates were calculated; these estimates were based on all of the common factors in the nested factor model, including g and the other first-order group factors. The sum of subtest g loadings squared, and the sum of each group s factor subtest loadings squared, were summed and divided by the total variance in the subtest scores. Results and Discussion Models for each test using the total samples and their respective fit indexes are shown in Figures 1, 2, and 3. Only the fit indexes for the total samples are reported for each test because the fit statistics for the age-specific models were similar (except that the model chi-square value was larger in the total samples because of their large sample sizes). All of the models fit very well. Some slight adjustments, noted in Table 1, were made to models for convergence purposes in some age groups. The bolded g loadings in Figures 1 3 were the estimates included in h calculations; the bolded g and group factor loadings were included in the T calculations. As shown in Table 1, the global composites from each test the GCA (Elliott, 2007a), the FCI (Kaufman & Kaufman, 2004), and the FSIQ (Wechsler, 2003) measure their respective g factors similarly. In the total sample for the DAS II, the g saturation ( h ) was.83, and across age groups, values ranged from.79 for 17- year-olds to.87 for 8- and 9-year-olds (SD.03). In the total sample for the KABC II, the g saturation was.82, and values ranged from.78 for 7- and 16-year-olds to.87 for 15-year-olds (SD.03). Last, in the total sample for the WISC IV, the g saturation was.83, and values ranged from.79 for 6-year-olds to.85 for 15-year-olds (SD.02). Therefore, using total sample data within each test, 82% 83% of the variance in the global composite score was explained by g. About 82% 84% of the variance in the global composite scores within each test was explained by g when the estimates were averaged across the age groups (see last row in Table 1). As shown in the third column for each test in Table 1, although g was predominantly measured by the global composites, the group factors were also measured. In the total sample for the DAS II, the combined g and group factor saturation ( T ) was.89, and values ranged from.87 for 17-year-olds to.93 for 8- and 9-year-olds (SD.02) across age groups. In the total sample for 1 Models were estimated using only the subtests included in the global composites (i.e., other subtests were not included), and the findings were similar. Moreover, models were estimated that calculated omega values using all available subtests and creating a composite from those subtests. As per the principle of aggregation, the omega values were larger when all subtests were used to create a composite score. Interested readers should contact the first author to obtain those results.

4 REYNOLDS, FLOYD, AND NIILEKSELA Figure 1. Differential Abilities Scales Second Edition nested-factor model using the total sample data. Bolded g and bolded g and group factor estimates were used in the calculations of hierarchical omega and omega total, respectively. Residuals variances have been removed from the figure. CFI comparative fit index; RMSEA root-mean-square error of approximation; SRMR standardized root-mean-square residual. the KABC II, the combined g and group factor saturation was.89, and values ranged from.87 for 8-year-olds to.92 for 15-year-olds (SD.02). Last, in the total sample for the WISC IV, the combined g and group factor saturation was.91, and values ranged from.88 for 6- and 7-year-olds to.93 for 15-year-olds (SD.02). Therefore, using total sample data within each test, 89% 91% of the variance explained in the global composite was explained by g and the group factors. About 90% 91% of the global composite variance was explained by both g and the group factors when the estimates were averaged across the age ranges (see last row in Table 1). The variance explained by the group-factors was much less than g, but the amount was not trivial, with an additional 5% 12% of the variance explained by these group factors. Global composites are primarily indexes of g, but they are also indexes of intelligence in general. One practical implication of these findings is related to the calculation of confidence intervals around global composites. A simple true-score confidence interval for a global composite score, using the reliability estimate based on internal consistency, will be narrower than the construct-, or validity-, based confidence interval for a global composite as an estimate of g. If clinicians interpret global composites as estimates of g, then it seems to make more sense to use a validity-based confidence interval. For example, confidence intervals are often calculated from Cronbach s alpha, which for global composites from the tests used in this study ranged from.96.97. Using these estimates, along with the typical 15-point standard deviation, an average standard error of measurement could be calculated as 15 1.97, which equals 2.59. A 95% confidence interval approximately equals 5.00 (i.e., 1.96 2.59) and for a score of 100 would range from 95 to 105. Alternatively, the validity-based confidence interval for the global composite as an estimate of g is based on h (i.e.,.82.84 using the total samples, or.78.87 in various age groups). Therefore, an average standard error of measurement, for example, is equal to 15 1.83, or 6.18. A 95% confidence interval around the global composite as an estimate of g is approximately 12, and for a score of 100 would range from 88 to 112. The confidence interval

GENERAL FACTOR MEASUREMENT 5 Figure 2. Kaufman Assessment Battery for Children Second Edition nested-factor model using the total sample data. Bolded g and bolded g and group factor estimates were used in the calculations of hierarchical omega and omega total, respectively. Residuals variances have been removed from the figure. CFI comparative fit index; RMSEA root-mean-square error of approximation; SRMR standardized root-mean-square residual. based on the construct of interest, therefore, is much larger when the valid variance related to g is considered. 2 Hence, a reliability estimate based on internal consistency should not be confused with a validity estimate (see Schneider, 2013). Our findings may be generalized to another study in which similar estimates were calculated with a different method. As noted, h is the saturation of g in a global composite, or said differently, it is the g loading squared. In a study by Keith, Kranzler, and Flanagan (2001), children completed two intelligence tests, the Woodcock Johnson III Tests of Cognitive Abilities (WJ III; Woodcock, McGrew, & Mather, 2001) and Cognitive Assessment System (CAS; Naglieri & Das, 1997). Second-order g factors from those batteries were found to be perfectly correlated. Thus, the authors also correlated the g factor from the WJ III with the global composite from the CAS. The resulting g loading the correlation between g and global composite was.79, which is a g saturation of 62%, and less than all of the h values in this study. The CAS subtests are not as g saturated as subtests in the tests we included in this study (Canivez, 2011), which may account for this difference. Nevertheless, although h may seem abstract, it is conceptualized as the square of the g loading of a global composite. One area for future research is related to g measurement as a function of g. Recent research, including research with intelligence tests used in this study, has shown that g is measured less well in intelligence tests as g increases (Detterman & Daniel, 1989; Reynolds, 2013; Reynolds, Keith, & Beretvas, 2010; Tucker-Drob, 2010). For example, in one study using estimates from a one-factor g model of DAS II group factor-based composites, omegas were shown to decrease substantially across the IQ range (Reynolds, 2013). In that study, the omega values, calculated from models 2 We thank a reviewer for suggesting that we report this information.

6 REYNOLDS, FLOYD, AND NIILEKSELA with DAS II group factor based composites, at IQs of 100 were similar if not identical to h values calculated from the models with the DAS II subtests in this study. Thus, the findings may generalize; however, future research is needed for a definitive answer. If they do generalize, the implications are that the confidence intervals about global composites, used as indicators of g, become wider as g increases (and confidence in g measurement decreases). Limitations Figure 3. Wechsler Intelligence Scale for Children Fourth Edition nested-factor model using the total sample data. Bolded g and bolded g and group factor estimates were used in the calculations of hierarchical omega and omega total, respectively. Residuals variances have been removed from the figure. CFI comparative fit index; RMSEA root-mean-square error of approximation; SRMR standardized root-mean-square residual. We did not investigate whether g exists or whether g is a meaningful psychological construct. Rather, we investigated how much of g, if such a construct exists, is measured in global composites of three popular intelligence tests. In general, this approach regards g as a reflective latent variable and not as a formative composite or an aggregate of multiple abilities. It is important to recognize, however, that the assumption of whether g is a reflective construct is debatable (van der Maas et al., 2006). All models require assumptions, and some researchers may not recognize the existence of g as a valid assumption for a model. Thus, it should be noted that omega values may also be calculated for each of the group factor indexes, without the assumption of g or with g partialled. Such decisions should be based on the theoretical model of the researcher. Estimating all of these values was beyond the purpose of this study. Similarly, it should be noted that the intelligence tests used in this study were either fully or to some extent based on theory, and they most notably have been studied with CHC theory. We modeled the data, however, so that they were more closely aligned with the intended factor structure of the tests. We could have used CHC theory-based predictions to improve model fit, and in some cases, those were used when the models did not converge. Nevertheless, the focus of this study was not on the group factors but the g factor. Future studies may investigate how our omission of theory influenced the results (including the use of a more theory-based g factor model, the higher order factor model, rather than the nested factor model used in this study).

GENERAL FACTOR MEASUREMENT 7 Table 1 Hierarchical Omega and Omega Total Estimates for the Differential Abilities Scales II, Kaufman Assessment Battery for Children II, and Wechsler Intelligence Scale for Children IV Global Composites for the Total Samples and by Age Differential Abilities Scales II GCA Kaufman Assessment Battery for Children II FCI a Wechsler Intelligence Scale for Children IV FSIQ Sample N or n Hierarchical omega Omega total N or n Hierarchical omega Omega total N or n Hierarchical omega Omega total Total 2,200.83.89 2,175.82.89 2,200.83.91 Age 6 200.79.88 Age 7 200.85.92 200.78.88 200 d.82.88 Age 8 200.87.93 200.81.87 200 e.82.90 Age 9 200.87.93 200.81.88 200.84.91 Age 10 200 b.86.91 200.79.88 200.83.92 Age 11 200.84.91 200.83.89 200.83.91 Age 12 200.84.90 200.82.91 200 de.83.92 Age 13 200.81.88 200.82.91 200.82.91 Age 14 200.84.90 200.83.90 200 ef.80.91 Age 15 200.83.90 150.87.92 200 g.85.93 Age 16 200.86.91 150.78.88 200.81.92 Age 17 200.79.87 150 c.85.91 Age 18 125.83.91 M h.84.91.82.90.82.91 Note. GCA General Conceptual Ability; FCI Fluid Crystallized Index; FSIQ Full Scale IQ. a Rebus and Atlantis were constrained equal for all ages. b Digits Forward and Digits Backward were constrained to be equal. c No Gestalt Closure loading on broad factors. d Matrix Reasoning and Picture Concepts were constrained to be equal. e Letter Number Sequencing and Digit Span were constrained to be equal. f Block Design, Matrix Reasoning, and Picture Concepts were constrained to be equal. g Perceptual Reasoning factor not included. h Fisher s z formula used to calculate mean. Conclusion The g factor saturation was estimated in three popular individually administered intelligence tests using large-scale, nationally representative norming sample data. The g factor was measured equally well in the global composites within each test, but a combination of group factors were also measured in those scores. Previous research has demonstrated that for the most part, the same g factors are measured across intelligence batteries (e.g., Floyd, Reynolds, Farmer, & Kranzler, in press; Johnson, Bouchard, Krueger, McGue, & Gottesman, 2004; Keith et al., 2001). 3 Given those findings, it would appear that an insight made over 85 years ago by Spearman (1927, pp. 197 198) is supported. That is, different intelligence test composites show high correlations with each other and with g itself, and thus measure g similarly. Spearman called it the theorem of the indifference of the indicator. In this study, we have shown how well g is measured by those indicators. 3 These findings are restricted to intelligence measures. Although g factors from achievement and intelligence measures are highly correlated, they are less than perfectly correlated (Kaufman, Reynolds, Liu, Kaufman, & McGrew, 2012). References Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley. Canivez, G. L. (2011). Hierarchical structure of the Cognitive Assessment System: Variance partitions from the Schmid Leiman (1957) procedure. School Psychology Quarterly, 26, 305 317. doi:10.1037/a0025973 Canivez, G. L., & Watkins, M. W. (2010). Exploratory and higher order factor analyses of the Wechsler Adult Intelligence Scale Fourth Edition (WAIS IV) adolescent subsample. School Psychology Quarterly, 25, 223 235. doi:10.1037/a0022046 Carroll, J. B. (1993). Human cognitive abilities. Cambridge, England: Cambridge University Press. Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are highest for low-iq groups. Intelligence, 13, 349 359. doi:10.1016/s0160-2896(89)80007-8 Elliott, C. D. (2007a). Differential Ability Scales Second Edition. San Antonio, TX: Pearson. Elliott, C. D. (2007b). Differential Ability Scales Second Edition: Introductory and technical handbook. San Antonio, TX: Pearson. Flanagan, D. P., & Harrison, P. H. (Eds.). (2012). Contemporary intellectual assessment, third edition: Theories, tests, and issues. New York, NY: Guilford Press. Floyd, R. G., Reynolds, M. R., Farmer, R. L., & Kranzler, J. H. (in press).are the general factors from different child and adolescent intelligence tests the same? Results from a five-sample, six-test analysis. School Psychology Review. Gustafsson, J.-E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179 203. doi:10.1016/0160-2896(84)90008-4 Gustafsson, J.-E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 73 95). Mahwah, NJ: Erlbaum. Gustafsson, J.-E., & Åberg-Bengtsson, L. (2010). Unidimensionality and interpretability of psychological instruments. In S. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 97 121). Washington, DC: American Psychological Association. doi:10.1037/12074-005 Gustafsson, J.-E., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28, 407 434. doi:10.1207/s15327906mbr2804_2 Horn, J. L., & Blankson, A. (2005). Foundations for better understanding

8 REYNOLDS, FLOYD, AND NIILEKSELA of cognitive abilities. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment, third edition: Theories, tests, and issues (pp. 73 98). New York, NY: Guilford Press. Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger. Johnson, W., Bouchard, T. J., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95 107. doi:10.1016/s0160-2896(03)00062-x Kaufman, A. S. (2009). IQ testing 101. New York, NY: Springer. Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children Second Edition. Circle Pines, MN: American Guidance Service. Kaufman, S. B., Reynolds, M. R., Liu, X., Kaufman, A. S., & McGrew, K. S. (2012). Are cognitive g and academic g one and the same g? An exploration on the Woodcock Johnson and Kaufman tests. Intelligence, 40, 123 138. doi:10.1016/j.intell.2012.01.009 Keith, T. Z., Kranzler, J. H., & Flanagan, D. P. (2001). What does the Cognitive Assessment System (CAS) measure? Joint confirmatory factor analysis of the CAS and the Woodcock Johnson Tests of Cognitive Ability (3rd edition). School Psychology Review, 30, 89 119. Keith, T. Z., & Reynolds, M. R. (2010). CHC theory and cognitive abilities: What we ve learned from 20 years of research. Psychology in the Schools, 47, 635 650. Keith, T. Z., & Reynolds, M. R. (2012). Using confirmatory factor analysis to aid in understanding the constructs measured by intelligence tests. In D. F. Flanagan & P. H. Harrison (Eds.), Contemporary intellectual assessment, third edition: Theories, tests, and issues (pp. 758 799). New York, NY: Guilford Press. McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum. Muthén, L. K., & Muthén, B. O. (1998 2010). Mplus user s guide (6th ed.). Los Angeles, CA: Author. Naglieri, J. A., & Das, J. P. (1997). Das Naglieri Cognitive Assessment System. Itasca, IL: Riverside. Niileksela, C. R., Reynolds, M. R., & Kaufman, A. S. (2013). An alternative Cattell Horn Carroll (CHC) factor structure of the WAIS IV: Age invariance of an alternative model for ages 70 90. Psychological Assessment, 25, 391 4045. doi:10.1037/a0031175 Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: Comments on Sijtsma. Psychometrika, 74, 145 154. doi: 10.1007/s11336-008-9102-z Reynolds, M. R. (2013). Interpreting intelligence test composite scores in light of Spearman s law of diminishing returns. School Psychology Quarterly, 28, 63 76. doi:10.1037/spq0000013 Reynolds, M. R., & Keith, T. Z. (2007). A test of Spearman s law of diminishing returns in hierarchical models of intelligence. Intelligence, 35, 267 281. doi:10.1016/j.intell.2006.08.002 Reynolds, M. R., & Keith, T. Z. (2013). Measurement and statistical issues in child assessment research. In D. H. Soklofske, C. R. Reynolds, & V. Schwean (Eds.), The Oxford handbook of child and adolescent assessment (pp. 48 83). New York, NY: Oxford University Press. doi: 10.1093/oxfordhb/9780199796304.013.0003 Reynolds, M. R., Keith, T. Z., & Beretvas, S. N. (2010). Use of factor mixture modeling to capture Spearman s law of diminishing returns. Intelligence, 38, 231 241. doi:10.1016/j.intell.2010.01.002 Reynolds, M. R., Keith, T. Z., Fine, J. G., Fisher, M. E., & Low, J. (2007). Confirmatory factor structure of the Kaufman Assessment Battery for Children Second Edition: Consistency with Cattell Horn Carroll theory. School Psychology Quarterly, 22, 511 539. doi:10.1037/1045-3830.22.4.511 Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: Tests of significance and descriptive goodness-of-fit measures. Methods of Psychological Research Online, 8, 23 74. Schneider, W. J. (2013). What if we took our models seriously? Estimating latent scores in individuals. Journal of Psychoeducational Assessment, 31, 186 201. doi:10.1177/0734282913478046 Schneider, W. J., & McGrew, K. S. (2012). The Cattell Horn Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment, third edition: Theories, tests, and issues (pp. 91 144.). New York, NY: Guilford Press. Spearman, C. (1927). The abilities of man: Their nature and measurement. New York, NY: Macmillan. Tucker-Drob, E. M. (2009). Differentiation of cognitive abilities across the life span. Developmental Psychology, 45, 1097 1118. doi:10.1037/ a0015864 van der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychological Review, 113, 842 861. Wechsler, D. (2003). The Wechsler Intelligence Scale for Children Fourth Edition. San Antonio, TX: The Psychological Corporation. Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock Johnson III Tests of Cognitive Abilities. Itasca, IL: Riverside. Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale s indicators: A comparison of estimators for h. Applied Psychological Measurement, 30, 121 144. doi:10.1177/0146621605278814 Received April 8, 2013 Revision received July 8, 2013 Accepted July 9, 2013