A MATTER OF CONTEXT: A META-ANALYTIC INVESTIGATION OF THE RELATIVE VALIDITY OF CONTEXTUALIZED AND NONCONTEXTUALIZED PERSONALITY MEASURES

PERSONNEL PSYCHOLOGY 2012, 65

JONATHAN A. SHAFFER
West Texas A&M University

BENNETT E. POSTLETHWAITE
Pepperdine University

The empirical evidence that has accumulated in support of the notion that personality is a valid predictor of employee performance is vast, yet debate on the matter continues. This study investigates frame-of-reference effects as they relate to the validity of self-report measures of personality. Specifically, we compare the validities of general, noncontextualized personality measures and work-specific, contextualized measures. The findings suggest that personality measures are more valid predictors of performance when the scale items or instructions are framed to reference work-specific behaviors. We found that the validities for noncontextualized measures of personality ranged from .02 to .22, with a mean validity of .11. The validities for contextualized measures ranged from .14 to .30, with a mean of .24. Additional moderator analyses were conducted to examine several alternative explanations for these validity differences. Specifically, we examined differences between the developmental purpose (general use vs. workplace use) and the reliabilities of each type of personality measure. We also compared the validities from published studies to those from unpublished studies. Results suggest that these moderators did not have an impact on the validity differences between noncontextualized and contextualized measures.

We thank Frank Schmidt and Amy Colbert for their helpful comments on earlier versions of this paper. We also thank two anonymous reviewers for their comments. Finally, we thank all of the test publishers that provided us with specific information related to the personality scales included in our analyses.

Text in bold represents corrections added on 15 May 2013 after initial online publication on 2 August. Values in tables and text have been corrected with respect to publication bias and the validity of Conscientiousness in Appendix B. In the initial publication, Appendix B included a study that was not reflected in the meta-analysis and omitted another study that was included in the meta-analysis. In addition, the ks, Ns, and estimates reported in Table 3 for four of the Conscientiousness analyses were incorrect, and several of the rows for Agreeableness also contained errors. For the most part, the errors resulted in observed correlations that were off by .01, which affected the calculated confidence intervals for those estimates. Nevertheless, the discussion of the authors' findings for Table 3 is still valid. The authors apologize for any inconvenience caused by the errors and thank Sven Kepes and Mike McDaniel for making them aware of the need for the corrections.

Correspondence and requests for reprints should be addressed to Jonathan A. Shaffer, Assistant Professor of Management, Department of Management, Marketing, and General Business, West Texas A&M University, Box 60809, Canyon, TX 79016; jshaffer@wtamu.edu.

© 2012 Wiley Periodicals, Inc.

We are one thing to one man and another thing to another. There are parts of the self which exist only for the self in relationship to itself. We divide ourselves up in all sorts of different selves in reference to our acquaintances. We discuss politics with one and religion with another. There are all sorts of different selves answering to all sorts of different social reactions. It is the social process itself that is responsible for the appearance of the self; it is not there as a self apart from this type of experience. A multiple personality is in a certain sense normal...
George H. Mead (1934, p. 142)

The relationship between self-report measures of personality and job performance has been the focus of an enormous body of research. Although early studies concluded that personality was not a meaningful predictor of job performance (Ghiselli, 1973; Guion & Gottier, 1965; Schmitt, Gooding, Noe, & Kirsch, 1984), later work concluded that personality is, in fact, a useful predictor of performance (Barrick & Mount, 1991; Salgado, 1997). The accumulated evidence that personality is a valid predictor of performance seems robust enough that some scholars have gone so far as to recommend that researchers focus their efforts on other avenues of study (Barrick, Mount, & Judge, 2001). Despite this recommendation, debate about the validity of self-report measures of personality continues. Proponents of self-report personality measures argue that their validity is sufficient to warrant their use in most, if not all, selection contexts (Ones, Dilchert, Viswesvaran, & Judge, 2007).
Still, others maintain that the validity of self-reported personality tends to be disappointingly low and not very impressive (Morgeson et al., 2007, p. 693). In light of the disagreement surrounding the use of personality measures in employee selection, further examination of the issue is needed. One issue recently raised by Morgeson et al. (2007) is that current self-report measures of personality do not adequately predict job behaviors because the measures themselves may be deficient. For example, different personality scales have been shown not to correlate highly with each other, even though they are based on the same personality model (Hough, 1992; Hough, Eaton, Dunnette, Kamp, & McCloy, 1990). One potential solution to this issue that has been implemented in previous research is to apply statistical corrections for construct unreliability to the validity estimates of personality measures (Mount & Barrick, 1995; Salgado, 1997). However, Schmitt (2004) suggests that such statistical corrections do not address the underlying issue of low observed validities. Schmitt further notes that the observed validities reported in various meta-analyses have not changed in decades and that, instead of applying numerous corrections for measurement error to the observed validities, researchers should endeavor to improve the personality measures themselves. Specifically, it has been proposed that the validity of personality for predicting workplace outcomes is lower than might be expected partly because personality measures designed to capture broad, global differences between individuals, as opposed to more specific, work-related differences, are typically not highly job relevant (Robie, Schmit, Ryan, & Zickar, 2000). Morgeson and colleagues (2007) also raised concerns about the job relevance of personality measures and suggested that simply contextualizing personality scales by making them refer specifically to the workplace might increase their validity. The authors go on to offer a straightforward solution to this issue: simply add "at work" to each personality scale item. This notion is not a new one. Previous research has suggested that context-specific, or contextualized, personality measures should be stronger predictors of performance than broad, noncontextualized measures (Hunthausen, Truxillo, Bauer, & Hammer, 2003; Robie et al., 2000; Schmit, Ryan, Stierwalt, & Powell, 1995). Schmit et al. (1995) referred to this phenomenon as the frame-of-reference (FOR) effect. FOR effects occur when responses to personality scales, and the subsequent validity of those scales, vary with the specific behavioral context that respondents choose as a referent when completing individual scale items.
FOR effects can present problems for studies of the validity of personality because personality measures that are designed to assess broad, noncontextualized personality may not be the most effective predictors of situation-specific, contextualized behavioral outcomes (Heller, Watson, Komar, Min, & Perunovic, 2007). Thus, for predicting performance in an employment setting, noncontextualized personality scales are not ideal because individuals may present themselves differently across situations (e.g., work, home, and school). Davison and Bing (2009) suggest that when personality is measured for the purposes of personnel selection, "if the test taker is uncertain as to whether the item calls for presentation of the work self or the nonwork self, then accurate self-presentation may be hindered, and the criterion-related validity of the personality test would be reduced" (p. 501). The implication is that contextualized personality scales may show higher validities than noncontextualized scales do. Contextualized scales give test takers a reference point for describing their work-specific behaviors, and those descriptions are stronger predictors of work-specific performance.

Interest in the validity of contextualized measures of personality has been increasing (Barrick, Stewart, & Piotrowski, 2002; Berry, Page, & Sackett, 2007; Gill & Hodgkinson, 2007; Lounsbury, Gibson, & Hamrick, 2004; Loveland, Gibson, Lounsbury, & Huffstetler, 2005). However, direct comparisons of the validity of contextualized and noncontextualized personality measures have been limited mainly to academic settings (Bing, Whanger, Davison, & VanHook, 2004; Lievens, De Corte, & Schollaert, 2008; Reddock, Biderman, & Nguyen, 2011; Robie, Born, & Schmit, 2001; Schmit et al., 1995). The extent to which contextualizing personality measures increases their validity for predicting job performance has not been fully explored. This study fills this void in the literature by meta-analytically comparing the validity of noncontextualized personality measures with that of contextualized personality measures for predicting job performance. In the remainder of this study, we first review theoretical and empirical work suggesting that contextualized measures may be more valid than noncontextualized measures. Second, we discuss several important moderators that may account for the hypothesized validity differences between the two types of measures. Last, we present the results of our analyses and discuss the implications of our findings for both research and practice.

Theory and Hypotheses

The theoretical basis for exploring FOR effects on the validity of personality can be found in person–situation interaction theory (Mischel, 1973). Person–situation interaction theory rests on the notion that personality is not necessarily a consistent predictor of behavior across situations. Instead, the theory predicts that individual behavior in a given situation is a function of both the personality of the individual and the situation itself.
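The person–situation argument can be made concrete with a toy simulation. The sketch below does not come from any of the studies reviewed here. It rests on several hypothetical assumptions:

- Each person has a general trait level.
- The trait level expressed at work equals that general level plus a person-specific work deviation.
- A noncontextualized score mostly reflects the general level, whereas a work-framed score reflects the work-level expression.
- Performance depends only on the work-level expression.

Under these assumptions, the work-framed score should correlate more strongly with performance than the general score does, which is the pattern the FOR argument predicts. All parameter values and function names are illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_for_effect(n=5000, work_deviation_sd=0.8, measurement_sd=0.6, seed=1):
    """Toy person-situation model; every parameter here is hypothetical.

    Returns (validity of a general score, validity of a work-framed score).
    """
    rng = random.Random(seed)
    general_scores, work_scores, performance = [], [], []
    for _ in range(n):
        general_level = rng.gauss(0, 1)
        # Trait as expressed at work: the general level plus a person-specific
        # deviation that appears only in the work setting.
        work_level = general_level + rng.gauss(0, work_deviation_sd)
        # Assumption: a noncontextualized item mostly reflects the general level.
        general_scores.append(general_level + rng.gauss(0, measurement_sd))
        # Assumption: a work-framed item reflects the work-level expression.
        work_scores.append(work_level + rng.gauss(0, measurement_sd))
        # Performance depends on work-level behavior plus unrelated factors.
        performance.append(0.3 * work_level + rng.gauss(0, 1))
    return pearson_r(general_scores, performance), pearson_r(work_scores, performance)


r_general, r_work = simulate_for_effect()
print(f"general score r = {r_general:.2f}, work-framed score r = {r_work:.2f}")
```

With these settings, the population correlations work out to roughly .24 for the general score and .32 for the work-framed score. Setting `work_deviation_sd=0.0`, so that there is no situation-specific variation, removes the difference in expectation.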
From this perspective, "although some situations may be powerful determinants of behavior, others are likely to be exceedingly trivial" (Mischel, 1973, p. 255). Wright and Mischel (1987) later referred to this view of the relationship between personality and behavior as the conditional model of dispositions. As an example of the potential interaction between an individual's personality and the situation in which individual behaviors are expressed, consider an individual who scores low on Conscientiousness. This person may demonstrate highly conscientious behavior at work (e.g., maintaining a tidy workspace, keeping a detailed schedule) but may exhibit highly unconscientious behaviors at home (e.g., letting dirty laundry accumulate, not cleaning up after oneself). The conditional model of dispositions would predict that this individual may be more likely to display conscientious behaviors at work because such behaviors are highly valued in a work environment. At home, however, the situation may offer fewer incentives for exhibiting conscientious behaviors. Thus, although an individual's level of Conscientiousness may be a good predictor of his or her behavior in some situations, the situation itself may be a stronger predictor of behavior in others. Some researchers have likened this phenomenon to playing a social role. They argue that (a) different situations call for individuals to display different roles and (b) individuals' self-perceptions of personality can systematically change based on the role that is being fulfilled (Allport, 1961; Donahue & Harary, 1998; Heller, Ferris, Brown, & Watson, 2009).

The possibility that situation-specific personality measures may be more valid than nonspecific measures has been explored in academic and employment contexts and has met with initial support. Schmit et al. (1995) compared the validity of noncontextualized measures of Conscientiousness with that of contextualized measures that had been revised to reflect Conscientiousness at school. The authors found that the school-specific Conscientiousness measure was a stronger predictor of cumulative college grade point average (GPA) than was the noncontextualized measure. Bing et al. (2004) extended these results by showing that school-specific, contextualized Conscientiousness incrementally predicted college GPA above and beyond ACT scores and noncontextualized measures of Conscientiousness. Lievens et al. (2008) assessed Conscientiousness in a group of undergraduate students using two contextualized measures, one specific to a school setting and one specific to a workplace setting.
The results showed that work-specific Conscientiousness was a poor predictor of GPA, whereas school-specific Conscientiousness was a strong predictor of GPA. This suggests that personality measures may be most valid when they are framed to reflect the context in which the criterion of interest ultimately occurs. Some studies have examined FOR effects on the validity of personality in a work setting. Hunthausen et al. (2003) assessed the Big Five personality traits in two groups of customer service managers. The first group completed a noncontextualized personality measure, whereas the second group completed a contextualized measure. The contextualized measure was a better predictor of performance than the noncontextualized measure for four of the Big Five traits (Conscientiousness, Emotional Stability, Extraversion, and Openness to Experience). Furthermore, the contextualized personality measures incrementally predicted job performance after controlling for cognitive ability, whereas the noncontextualized measures added no incremental prediction over cognitive ability. Other studies that have directly compared the validity of contextualized and noncontextualized personality measures have met with mixed results (DeGroot & Kluemper, 2007; Pace & Brannick, 2010; Small & Diefendorff, 2006).

In summary, the conditional model of dispositions predicts that revising personality measures so that they are contextualized to a workplace setting will increase their validity. A noncontextualized personality scale may not predict job performance as well because such scales are not designed to capture work-specific behaviors. Based on the above discussion, we expect:

Hypothesis 1: The validity of personality measures that contain a FOR that is specific to a workplace context will be greater than the validity of noncontextualized personality measures.

Moderators

Any differences between the validities of contextualized and noncontextualized personality measures might result from something other than the FOR contained in the measures. That is, even if the validities of contextualized and noncontextualized personality measures are not equivalent, it is important to examine alternative explanations for these differences. Thus, we identified several moderators that may influence the validity of contextualized and noncontextualized personality for predicting job performance.

Developmental Purpose

It is important to determine whether personality measures that were originally designed for general purposes are as valid as measures that were designed specifically for use in the workplace. Both types of measures assess the same basic personality dimensions. However, those designed for general use are intended to predict behaviors across a wide range of contexts, whereas those designed for workplace use are intended to predict behaviors that are more relevant to a work context. The use of general-purpose personality assessments for predicting job performance can be suboptimal because such measures were not intended for use in personnel selection (Gill & Hodgkinson, 2007).
Personality scales that are designed for general use are constructed using empirically driven, exploratory methods. These methods are meant to identify scale items that show high factor loadings on a given trait while also showing low factor loadings on other traits. The typical approach to constructing a personality measure designed for general use would be to generate a large pool of representative scale items for each of the Big Five traits and then conduct several factor analyses on those items. Throughout the factor analysis process, items that do not load highly on a given trait are eliminated. Items with the highest factor loadings for a given trait are retained in the scale for that trait, the end result being a measure that contains distinct scales for each of the Big Five traits (Digman, 1990; Goldberg, 1992; Saucier, 1994).

In contrast, personality measures that have been designed for use in the workplace typically consist of items chosen for their theoretical connection to the criterion of interest: job performance. For these measures, the development process may begin with a job analysis that determines the activities or duties that workers are expected to perform. Personality scales are then constructed from items designed not only to fit the five-factor model of personality but also to display high levels of face validity relative to the activities and duties that are relevant to a work environment (Page, 2009; Schmit, Kihm, & Robie, 2000). For example, the items "I have a rich vocabulary" and "I do things according to a plan" can both be found in the International Personality Item Pool (Goldberg, 1999). However, the second item is more face valid relative to job performance criteria. Thus, the latter item might be retained by a test developer for use in a workplace measure of personality, whereas the former would not. A second way that developers may attempt to make a personality measure more relevant to the workplace is to revise scale items so that they use work-related vocabulary, making the items easier for job incumbents or applicants to understand. The test manual for one personality measure designed for workplace use explained that during scale development the authors attempted to eliminate psychological language from items completely (Abraham & Morrison, 2009).
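The factor-analytic retention step described above for general-purpose measures can be sketched as a simple filter over a loading matrix. The sketch makes several assumptions:

- The loadings have already been estimated.
- The cut-offs (.40 on the target trait, at most .30 on any other trait) are hypothetical choices for illustration. They were not reported by the authors or by Goldberg (1999).
- All loading values are invented.

The sketch covers only the general-purpose approach. The job-analysis and face-validity judgments used for workplace measures do not reduce to a rule like this.

```python
# Big Five factor order used for each row of hypothetical loadings.
FACTORS = ["Extraversion", "Agreeableness", "Conscientiousness",
           "Emotional Stability", "Openness"]


def retain_items(loadings, target, primary_min=0.40, cross_max=0.30):
    """Keep items that load highly on `target` and weakly on every other factor.

    loadings: dict mapping item text -> list of loadings in FACTORS order.
    Returns the retained items, highest target loading first.
    """
    t = FACTORS.index(target)
    retained = []
    for item, row in loadings.items():
        primary = abs(row[t])
        cross = max(abs(v) for i, v in enumerate(row) if i != t)
        if primary >= primary_min and cross <= cross_max:
            retained.append(item)
    retained.sort(key=lambda item: abs(loadings[item][t]), reverse=True)
    return retained


# Hypothetical loadings for four illustrative items.
example_loadings = {
    "I have a rich vocabulary.":        [0.10, 0.03, 0.15, 0.01, 0.62],
    "I do things according to a plan.": [0.05, 0.10, 0.68, 0.04, 0.02],
    "I pay attention to details.":      [0.02, 0.12, 0.55, 0.10, 0.08],
    "I take charge.":                   [0.45, 0.05, 0.41, 0.00, 0.10],
}

print(retain_items(example_loadings, "Conscientiousness"))
print(retain_items(example_loadings, "Openness"))
```

With these made-up loadings, the planning and detail items are kept for Conscientiousness, and the vocabulary item is kept for Openness. "I take charge." is dropped for both Extraversion and Conscientiousness because it cross-loads, which is the kind of item the exploratory procedure is meant to weed out.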
We note that FOR (contextualized vs. noncontextualized) and developmental purpose (general vs. workplace) are two independent dimensions. Thus, a scale classified as contextualized may or may not have been originally developed for use in the workplace. For example, validity studies have been conducted using personality measures that were originally designed for general purposes but were subsequently revised to reflect a work-specific FOR (e.g., Halfhill, Nielsen, Sundstrom, & Weilbaecher, 2005; Hunthausen et al., 2003). Other studies have used measures that were designed for use in the workplace but consist of noncontextualized items that do not explicitly measure work-specific personality (e.g., Postlethwaite, Robbins, Oh, & Casillas, 2010; Witt & Carlson, 2006).¹

¹ A summary of the information that we used to categorize each of the personality measures included in this study, including example items and instructional anchors, can be found in Appendix A.

Because measures that have been designed specifically for use in the workplace are intended to identify the worker characteristics that are most relevant to actual job performance (Pearlman & Sanchez, 2010), such measures should be more valid predictors of job performance.

Hypothesis 2: The validity of personality measures that have been designed for specific use in the workplace will be greater than the validity of those designed for general use.

Reliability Differences

Contextualized personality measures are thought to have higher levels of internal reliability than noncontextualized measures do (Lievens et al., 2008). When responding to a noncontextualized personality measure, respondents may describe themselves based on a different situational referent for each test item. A respondent may describe his or her behavior at work on one item, behavior at home on another item, and behavior at school on yet another item (Wang, Bowling, & Eschleman, 2010). When responding to a contextualized personality measure, on the other hand, respondents should be more likely to limit their self-descriptions to the specific context to which the measure refers. From this perspective, response variance that arises because respondents use different situational referents for different items is error variance, and it reduces the reliability of noncontextualized personality measures (Robie et al., 2000).

Only a few studies have tested whether FOR effects influence the reliability of personality tests. Schmit et al. (1995) tested four of the Big Five traits (excluding Openness). They found that error variance was lower for contextualized measures of personality, which resulted in higher reliability estimates for such measures. Robie et al. (2000) extended this research by analyzing facet-level measures of noncontextualized and school-specific, contextualized Conscientiousness in a large sample of undergraduate students. They found greater amounts of error variance, and thus lower reliabilities, for five of the six facets when the measures were noncontextualized. Lievens et al. (2008) examined the reliability of all of the Big Five traits, and their results showed that contextualizing personality measures increased the reliability of the personality scales. Few studies have compared the reliability of contextualized and noncontextualized measures of personality. Even so, this early evidence indicates that contextualized measures are more reliable, which suggests that the higher validity of contextualized measures may be due in part to their higher internal reliability.
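The reliability argument above treats item-to-item shifts in situational referent as error variance. One way to sketch that reasoning is to compute Cronbach's alpha for two simulated scales that differ only in the size of that noise. The sketch rests on a simplifying and hypothetical assumption: each item score equals the respondent's trait level plus independent noise driven by shifting referents.

Alpha is computed with the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance). The function names and parameter values are illustrative.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix (list of rows)."""
    k = len(rows[0])
    item_variances = [statistics.variance(column) for column in zip(*rows)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def simulate_scale(n_people, n_items, referent_sd, seed):
    """Item score = trait + noise from shifting situational referents (hypothetical model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_people):
        trait = rng.gauss(0, 1)
        rows.append([trait + rng.gauss(0, referent_sd) for _ in range(n_items)])
    return rows


# Hypothetical: noncontextualized items let the referent drift more (larger noise).
noncontextualized = simulate_scale(2000, 10, referent_sd=1.5, seed=7)
contextualized = simulate_scale(2000, 10, referent_sd=1.0, seed=7)
print(f"alpha, noncontextualized: {cronbach_alpha(noncontextualized):.2f}")
print(f"alpha, contextualized:    {cronbach_alpha(contextualized):.2f}")
```

Under this model, the inter-item correlation is 1/(1 + referent_sd²). Stepping that up to 10 items gives population alphas of about .82 and .91, so the simulated estimates should land near those values.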

Hypothesis 3: Contextualized measures of personality will have higher internal reliabilities than will noncontextualized measures of personality.

Publication Status

Finally, the issue of publication bias as it relates to the validity of personality has garnered recent interest. Publication bias refers to the notion that only a portion of all studies conducted on a given topic has been published and that the results of the published studies differ from those of the unpublished studies in a systematic and meaningful way (McDaniel, Rothstein, & Whetzel, 2006). Publication bias has also been called the "file drawer problem." The term refers to the possibility that significant results are published, whereas nonsignificant results are put in the back of a file drawer and forgotten (Rosenthal, 1979). Hunter and Schmidt (2004) refer to this issue as availability bias. They explain the general implication of publication bias for meta-analysis: if studies that contain nonsignificant results are not readily available for inclusion in a given meta-analysis, the results of that meta-analysis will be upwardly biased. Publication bias is particularly relevant to this study for several reasons. First, McDaniel et al. (2006) analyzed validity information provided by four test publishers and found initial evidence of publication bias in several of the data sets containing validity studies of personality. Thus, there is already some indication that publication bias is present in the personality validity literature. Second, studies comparing the validity of contextualized and noncontextualized personality measures may be more likely to be published if they find significant differences between the validities of the two types of measures. More specifically, published studies may be more likely to be those that find higher validities for contextualized measures. Based on both the conceptual nature of publication bias and the empirical evidence that is currently available, we expect:

Hypothesis 4: The validities reported in published studies of personality measures will be higher than the validities reported in unpublished studies.

Method

We conducted an extensive search for both published and unpublished papers to include in this meta-analysis. First, we conducted an electronic search of PsycINFO, PsycARTICLES, EBSCO, Web of Science, ProQuest Dissertations and Theses, and the Defense Technical Information Center (DTIC). The electronic search included, but was not limited to, the following keywords: personality, Big Five, five factor, Conscientiousness, Agreeableness, Emotional Stability, Neuroticism, Openness, Extraversion, and job performance. Second, we conducted a manual search of the following journals for the period from 1977 to 2011: Journal of Applied Psychology, Personnel Psychology, Psychological Bulletin, International Journal of Selection and Assessment, Human Performance, Organizational Behavior and Human Decision Processes, Journal of Vocational Behavior, Journal of Organizational Behavior, Journal of Occupational Psychology, Journal of Occupational and Organizational Psychology, Journal of Management, and Academy of Management Journal. Third, we searched the conference programs of both the Society for Industrial and Organizational Psychology and the Academy of Management meetings for the time period of … for additional unpublished papers. Fourth, we searched the reference sections of previously published, relevant meta-analyses for studies that had not been uncovered in our previous searches (e.g., Barrick & Mount, 1991; Hurtz & Donovan, 2000; Salgado, 1997). Fifth, we contacted test publishers to request technical reports. Finally, we sent out an electronic request through the Academy of Management mailing list servers for any additional unpublished data that were available.

Inclusion Criteria

After acquiring all promising studies, we examined the abstracts and evaluated the results of each study to determine its relevance to our study purposes. We used several decision rules to determine whether a study should be included in this analysis. First, the study had to be empirical in nature and had to examine job performance in a field setting; laboratory studies were therefore excluded.
Second, the study needed to include a measure of at least one of the Big Five personality traits and a measure of job performance. Third, the study had to report sample sizes and correlations between personality and job performance. Alternatively, it had to report enough information that the reported statistics (univariate F values, t values, chi-square values, difference scores, or means and standard deviations) could be converted into usable effect sizes. Fourth, the study must have reported data based on an independent sample. We found several studies that reported results that seemed to be based on the same data set; in such cases, we included only the study with the largest sample size. Following these criteria, we identified a total of 90 studies from which we obtained usable data. We obtained multiple effect sizes from some studies, so the number of effect sizes and total sample sizes available for each personality trait varies across our analyses. We obtained effect sizes for noncontextualized measures of personality (N = 10,866–16,078) and for contextualized measures (N = 2,178–3,478).

Description of Coding Procedures

After developing a set of coding instructions and a coding sheet, each author independently coded each of the studies included in our analysis.² We held periodic meetings to cross-check and discuss our coding procedures, implementing changes when needed. We resolved any disagreements through discussion and reached complete agreement before performing any data analyses. To ensure the independence of our data, we recorded the uncorrected observed correlations and sample sizes as listed in the studies. With regard to the dependent variables, some studies reported several effect sizes for job performance. When this occurred, we employed three main decision rules to determine which coefficient to retain. First, we prioritized supervisory ratings of overall job performance if such ratings were available. Second, when measures of multiple job performance facets were provided (e.g., ratings of task performance and contextual performance were reported separately), we computed a composite correlation for overall job performance whenever possible.³ Third, in the few cases in which multiple job performance criteria were reported but a composite correlation could not be computed, we averaged the correlations for the individual performance measures to obtain an effect size for overall job performance. Finally, one study reported correlations between personality traits and performance ratings from more than one supervisor; in this case, we averaged the resulting correlations.

² A summary of the meta-analytic database can be found in Appendix B.

³ To compute composite correlations, we used the formula given by Hunter and Schmidt (2004, p. 435). The formula uses the following terms:
- x is a single predictor variable.
- Y is a composite criterion measure derived from the individual criterion measures y_1, y_2, …, y_n.
- n is the number of criterion measures included in the composite.
- $\bar{r}_{y_i y_j}$ is the average correlation among the y measures.

The formula is as follows:

$$r_{xY} = \frac{\sum_{i=1}^{n} r_{xy_i}}{\sqrt{n + n(n-1)\,\bar{r}_{y_i y_j}}}$$

Description of Variables

Personality

Only personality measures that were designed based on the five-factor model of personality were included in this study (Hurtz & Donovan, 2000). The initial level of agreement between coding decisions for this moderator was 95%.

Performance outcomes

We included only studies that reported supervisory ratings of job performance. Some studies included facet-level job performance measures such as task, contextual, and counterproductive work performance. However, too few of these studies, if any, reported results for contextualized measures of personality to allow a meaningful analysis of these criteria.

Scale characteristics

We coded each study on two dimensions. The first was whether the personality measure used was a noncontextualized measure or a contextualized measure of work-specific personality. The second was whether the measure was designed for general or workplace use. To be considered a work-specific, contextualized personality measure, a measure had to include explicit language that gave its items a work-specific FOR. We coded a personality measure as work-specific if it met one of three conditions:
(a) it consisted of a preponderance of items that included a specific reference to the workplace in the items themselves;
(b) it contained instructions that directed respondents to describe themselves exclusively in terms of their workplace behaviors; or
(c) it contained both items and instructions that referenced the workplace.

Ninety-nine percent of our initial coding decisions were in agreement for this moderator. To determine whether a given personality test was designed for general or workplace use, we reviewed published papers and test manuals for details describing how each measure was developed. In some cases, it was necessary to contact the publisher of a given personality measure to obtain this information. Measures that were intended for use in predicting a wide range of criteria for psychological, medical, or research purposes were coded as general-purpose measures.
Those measures that were based on job analysis or were designed specifically for use in predicting job performance criteria were coded as having a workplace development purpose. Initial coding agreement was 96% for this moderator. Publication status Dissertations, theses, conference papers, and validity studies provided by test publishers were coded as unpublished studies. Studies found in journals were coded as published studies.
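The coding rule for a work-specific FOR, and the percent-agreement figures reported for each moderator, can be expressed compactly. This is a hypothetical Python sketch: the function names are illustrative, and reading "preponderance of items" as a strict majority is our assumption, not a threshold stated by the authors.

```python
from typing import Optional, Sequence


def code_frame_of_reference(item_refs_work: Sequence[bool],
                            instructions_ref_work: bool) -> Optional[str]:
    """Return 'items', 'instructions', 'both', or None (noncontextualized).

    item_refs_work        -- one flag per item: does the item itself reference the workplace?
    instructions_ref_work -- do the instructions restrict responses to workplace behavior?
    """
    # Assumption: "preponderance" is treated as more than half of the items.
    items_work = sum(item_refs_work) > len(item_refs_work) / 2
    if items_work and instructions_ref_work:
        return "both"          # criterion (c)
    if items_work:
        return "items"         # criterion (a)
    if instructions_ref_work:
        return "instructions"  # criterion (b)
    return None                # no work-specific FOR: noncontextualized


def percent_agreement(coder_a: Sequence, coder_b: Sequence) -> float:
    """Percentage of cases on which two coders made the same decision."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must rate the same, nonempty set of cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)
```

For example, a measure whose items are all general but whose instructions ask respondents to describe themselves at work falls under (b), so the function returns "instructions".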

Meta-Analytic Procedure and Artifact Corrections
We analyzed our data using the methods developed by Hunter and Schmidt (2004). This method computes the sample-weighted mean and standard deviation of the observed correlations from the original studies and then corrects them for statistical artifacts, including predictor unreliability, criterion unreliability, and range restriction. Because not all studies in our data set included artifact information, we computed separate artifact distributions for each of the Big Five traits from the data available in our data set. Because it is extremely unlikely that the individuals in our sample were selected top-down on the basis of personality scores, corrections for direct range restriction were not appropriate for this study. Therefore, to correct for range restriction we used the procedures for indirect range restriction outlined by Hunter and Schmidt (2004). We computed a separate $u_x$ value for each Big Five trait by combining data from each of the studies in our data set with the normative data provided in the test manuals for the personality scales included in our data set. This procedure resulted in $u_x$ values that ranged from .91 (Conscientiousness and Agreeableness) to .93 (Emotional Stability and Openness). These estimates are virtually identical to those reported in previous meta-analyses (Barrick & Mount, 1991; Hurtz & Donovan, 2000). We also computed estimates of predictor reliability and found mean alpha reliabilities that ranged from .78 (Openness to Experience) to .85 (Emotional Stability). These estimates are very similar to those reported by Hurtz and Donovan (2000), which were also based on explicit Big Five measures. Although we did not correct for unreliability in the predictor, predictor reliabilities were used to test Hypothesis 3 and in the process of correcting for indirect range restriction.
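The first step of the Hunter and Schmidt (2004) approach, before any artifact correction, is a "bare-bones" analysis of the observed correlations. The Python sketch below shows that step using the standard formulas: the sample-size-weighted mean r, the weighted observed variance, and the expected sampling-error variance (1 − r̄²)²/(N̄ − 1). It is an illustration, not the authors' code, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def bare_bones(rs: Sequence[float], ns: Sequence[int]) -> Tuple[float, float, float]:
    """Return (mean r, SD of observed r, residual SD after removing sampling error)."""
    if len(rs) != len(ns) or not rs:
        raise ValueError("need one sample size per correlation")
    k = len(rs)
    total_n = sum(ns)
    # Sample-size-weighted mean of the observed correlations.
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample-size-weighted variance of the observed correlations.
    var_r = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average sample size.
    n_bar = total_n / k
    var_e = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    # Residual variance cannot be negative; truncate at zero.
    var_res = max(var_r - var_e, 0.0)
    return r_bar, math.sqrt(var_r), math.sqrt(var_res)
```

With three hypothetical studies of N = 100 each reporting r = .10, .20, and .30, the mean r is .20 and the observed variance (about .0067) is smaller than the expected sampling-error variance (about .0093), so the residual SD is zero.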
To correct for unreliability in job performance ratings, we first attempted to derive an estimate of mean interrater reliability from the studies included in our data set. However, only one study reported such data. Because interrater reliability estimates were largely unavailable, we elected to correct for criterion unreliability using a meta-analytic estimate from previous research. In choosing an estimate of interrater reliability, we considered the results of several meta-analyses that reported estimates of mean interrater reliability based on large samples. Hunter (1986) estimated mean interrater reliability to be .60, but this estimate was based on a relatively limited data set. Based on larger, independent data sets, Rothstein (1990) and Viswesvaran, Ones, and Schmidt (1996) estimated mean interrater reliability to be between .48 and .52. Salgado et al. (2003) also reported a meta-analytically derived reliability estimate of .52 for job performance ratings in a sample of European studies. After reviewing the available research on this issue, we chose to use the estimate of .52 provided by Viswesvaran et al. (1996) because it was derived from the largest available data set, was meta-analytically derived, and has been replicated in an independent set of primary studies.

TABLE 1
Validity Estimates as Moderated by Personality Scale Type

Analysis                        k     N       r    SDr    ρ    SDρ    95% CI (Lower, Upper)
Conscientiousness
  Overall
  Noncontextualized             91    16,…
    General                     69    12,…
    Workplace                   22    3,…
  Contextualized                22    3,…
    General                     7     1,…
    Workplace                   15    2,…
  General (all measures)        76    13,…
  Workplace (all measures)      37    6,…
Emotional Stability
  Overall                       86    13,…
  Noncontextualized             68    10,…
    General                     52    8,…
    Workplace                   16    2,…
  Contextualized                18    2,…
    General
    Workplace                   14    2,…
  General (all measures)        56    8,…
  Workplace (all measures)      30    4,…
Extraversion
  Overall                       90    14,…
  Noncontextualized             72    11,…
    General                     54    9,…
    Workplace                   18    2,…
  Contextualized                18    2,…
    General
    Workplace                   14    2,…
  General (all measures)        58    9,…
  Workplace (all measures)      32    5,…
Agreeableness
  Overall                       94    15,…
  Noncontextualized             73    11,…
    General                     56    9,…
    Workplace                   17    2,…
  Contextualized                21    3,…
    General
    Workplace                   15    2,…
  General (all measures)        62    10,…
  Workplace (all measures)      32    5,…
Openness to Experience
  Overall                       80    13,…
  Noncontextualized             66    10,…
    General                     51    8,…
    Workplace                   15    2,…
  Contextualized                14    2,…
    General
    Workplace                   10    1,…
  General (all measures)        55    8,…
  Workplace (all measures)      25    4,…

Note. k = number of validity coefficients; N = total sample size; r = observed sample-weighted mean validity; SDr = observed sample-weighted standard deviation; ρ = mean operational validity corrected for indirect range restriction; SDρ = corrected standard deviation; CI = confidence interval.

Results
Hypothesis 1 predicted that contextualized measures of personality would be more valid than noncontextualized measures of personality. As shown in Table 1, contextualized measures of Conscientiousness were more valid (ρ = .30, k = 22, N = 3,478) than noncontextualized measures (ρ = .22, k = 91, N = 16,078), and the confidence intervals for these estimates did not overlap. (The validity estimates discussed in this section are estimates of operational validity and are therefore not corrected for unreliability in the predictor.) Contextualized measures of Emotional Stability were also more valid (ρ = .27, k = 18, N = 2,619) than noncontextualized measures (ρ = .11, k = 68, N = 10,946), again with no overlap in the confidence intervals for the two estimates.
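The ρ values above are operational validities: they are corrected for criterion unreliability and indirect range restriction but not for predictor unreliability. The following is a minimal Python sketch of an indirect range restriction correction in the spirit of Hunter and Schmidt (2004), often called Case IV. It assumes that the predictor reliability passed in is the unrestricted (applicant-pool) value and that the criterion reliability (.52 here, following Viswesvaran et al., 1996) applies to the restricted incumbent sample. The function name and example inputs are illustrative and are not taken from the authors' computations.

```python
import math


def operational_validity_indirect_rr(r_obs: float, u_x: float,
                                     rxx_a: float, ryy_i: float) -> float:
    """Correct an observed validity for criterion unreliability and indirect
    range restriction, leaving predictor unreliability in (operational validity).

    r_obs -- observed predictor-criterion correlation in the incumbent sample
    u_x   -- ratio of incumbent to applicant SD on the observed predictor
    rxx_a -- predictor reliability in the unrestricted (applicant) population (assumed)
    ryy_i -- criterion reliability in the incumbent sample (e.g., .52)
    """
    if not (0 < u_x <= 1) or u_x ** 2 <= 1 - rxx_a:
        raise ValueError("u_x must be in (0, 1] and exceed sqrt(1 - rxx_a)")
    # Predictor reliability in the restricted (incumbent) group.
    rxx_i = 1 - (1 - rxx_a) / u_x ** 2
    # Range restriction ratio on predictor true scores.
    u_t = math.sqrt((u_x ** 2 - (1 - rxx_a)) / rxx_a)
    # Remove measurement error in both variables within the restricted group.
    rho_i = r_obs / math.sqrt(rxx_i * ryy_i)
    # Thorndike Case II correction applied at the true-score level (U_T = 1 / u_T).
    big_u = 1 / u_t
    rho_a = big_u * rho_i / math.sqrt(1 + (big_u ** 2 - 1) * rho_i ** 2)
    # Reintroduce predictor unreliability to obtain operational validity.
    return rho_a * math.sqrt(rxx_a)


# Illustrative inputs only: observed r = .15, u_x = .91, applicant alpha = .83.
print(round(operational_validity_indirect_rr(0.15, 0.91, 0.83, 0.52), 2))  # prints 0.24
```

With u_x = 1 (no restriction), the function reduces to r_obs / √ryy_i, the correction for criterion unreliability alone, which is a useful sanity check.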
This pattern, which is further depicted in Figure 1, held for contextualized (ρ = .25, k = 18, N = 2,692) and noncontextualized (ρ = .08, k = 72, N = 11,876) measures of Extraversion, contextualized (ρ = .24, k = 21, N = 3,357) and noncontextualized (ρ = .10, k = 73, N = 11,831) measures of Agreeableness, and contextualized (ρ = .14, k = 14, N = 2,178) and noncontextualized (ρ = .02, k = 66, N = 10,866) measures of Openness to Experience. None of the confidence intervals for these estimates overlapped. Overall, the results suggest that contextualized personality measures are better predictors of performance than noncontextualized measures, supporting Hypothesis 1.

[Figure 1: Comparison of Validity Estimates for Supervisory Ratings of Overall Job Performance. Noncontextualized vs. contextualized measures of Conscientiousness, Emotional Stability, Extraversion, Agreeableness, and Openness to Experience.]

Hypothesis 2 proposed that personality measures designed for use in the workplace would be more valid than those designed for general use. As shown in Table 1, for Conscientiousness the overall validity estimate for measures designed for workplace use (ρ = .30, k = 37, N = 6,177) was higher than that for measures designed for general-purpose use (ρ = .21, k = 76, N = 13,379). The same was true for workplace (ρ = .20, k = 30, N = 4,850) and general-purpose (ρ = .11, k = 56, N = 8,715) measures of Emotional Stability, for workplace (ρ = .19, k = 32, N = 5,133) and general-purpose (ρ = .10, k = 62, N = 10,055) measures of Agreeableness, and for workplace (ρ = .11, k = 25, N = 4,317) and general-purpose (ρ = .00, k = 55, N = 8,727) measures of Openness to Experience. There was no overlap in the confidence intervals for these estimates. These findings lend initial support to Hypothesis 2. However, to examine Hypothesis 2 more closely, we disentangled the effects of contextualization from those of scale development by examining the developmental purpose of personality measures separately within the noncontextualized and contextualized categories. Turning first to noncontextualized personality measures, scales designed for use in the workplace (ρ = .28, k = 22, N = 3,795) were more valid than those designed for general use (ρ = .21, k = 69, N = 12,283) for Conscientiousness. This was also the case for workplace (ρ = .15, k = 17, N = 2,751) and general-use (ρ = .08, k = 56, N = 9,080) measures of Agreeableness and for workplace (ρ = .09, k = 15, N = 2,540) and general-use (ρ = .02, k = 51, N = 8,326) measures of Openness to Experience. Turning next to contextualized measures of personality, only for Conscientiousness were measures designed for use in the workplace (ρ = .31, k = 15, N = 2,382) more valid than those designed for general use (ρ = .27, k = 7, N = 1,096). In general, once the FOR of a personality measure is taken into account, there is no consistent indication that measures designed for use in the workplace are more valid than measures designed for general use. Overall, these results do not support Hypothesis 2.

Hypothesis 3 predicted that contextualized measures of personality would have higher internal consistency reliabilities than noncontextualized measures of personality. As shown in Table 2, the mean alpha reliability estimates for the two types of measures were very similar, ranging from .77 to .84 (M = .81) for noncontextualized measures and from .81 to .87 (M = .84) for contextualized measures. For Conscientiousness, we obtained reliability estimates of .83 (k = 59, N = 9,202) and .83 (k = 10, N = 1,648) for noncontextualized and contextualized measures, respectively. The corresponding estimates were .84 (k = 44, N = 6,849) and .87 (k = 6, N = 789) for Emotional Stability; .81 (k = 47, N = 7,326) and .87 (k = 7, N = 943) for Extraversion; .78 (k = 48, N = 7,301) and .81 (k = 9, N = 1,527) for Agreeableness; and .77 (k = 45, N = 7,170) and .83 (k = 7, N = 872) for Openness to Experience.
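The summary means for Hypothesis 3 can be reproduced from the trait-level alphas just listed. The Python sketch below assumes that M is the unweighted mean across the five traits (an assumption that matches the reported .81 and .84). It also includes the standardized-alpha formula, α = k·r̄ / (1 + (k − 1)·r̄), to illustrate why alpha rises with the number of items when the mean inter-item correlation is held fixed. Names are illustrative.

```python
from statistics import fmean

# Mean alpha per trait, (noncontextualized, contextualized), as reported in the text.
ALPHAS = {
    "Conscientiousness": (0.83, 0.83),
    "Emotional Stability": (0.84, 0.87),
    "Extraversion": (0.81, 0.87),
    "Agreeableness": (0.78, 0.81),
    "Openness to Experience": (0.77, 0.83),
}

noncontextualized = [non for non, _ in ALPHAS.values()]
contextualized = [ctx for _, ctx in ALPHAS.values()]
mean_non = fmean(noncontextualized)  # unweighted mean across traits (assumed)
mean_ctx = fmean(contextualized)


def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized alpha; for a fixed mean inter-item r it increases with n_items."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)


print(f"noncontextualized M = {mean_non:.2f}, range {min(noncontextualized)}-{max(noncontextualized)}")
print(f"contextualized M = {mean_ctx:.2f}, range {min(contextualized)}-{max(contextualized)}")
```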
In addition, within each personality trait the confidence intervals surrounding the reliability estimates overlapped. Because coefficient alpha depends on the number of items a measure contains (Cortina, 1993), we examined whether the mean number of items in noncontextualized personality measures differed from the mean number of items in contextualized measures. We found no statistically significant difference in the mean number of items for the two types of measures. This suggests that the validity estimates for contextualized and noncontextualized personality scales were not unduly influenced by the number of items contained in each scale. Taken as a whole, these results do not support Hypothesis 3.

TABLE 2
Reliability Estimates for Noncontextualized and Contextualized Measures of Personality

Analysis                 k     N       Item mean   Item SD   α    SDα    95% CI (Lower, Upper)
Conscientiousness
  Overall                69    10,…
  Noncontextualized      59    9,…
  Contextualized         10    1,…
Emotional Stability
  Overall                50    7,…
  Noncontextualized      44    6,…
  Contextualized
Extraversion
  Overall                54    8,…
  Noncontextualized      47    7,…
  Contextualized
Agreeableness
  Overall                57    8,…
  Noncontextualized      48    7,…
  Contextualized         9     1,…
Openness to Experience
  Overall                52    8,…
  Noncontextualized      45    7,…
  Contextualized

Note. k = number of validity coefficients; N = total sample size; α = observed sample-weighted mean alpha reliability; SDα = observed sample-weighted standard deviation of alpha; CI = confidence interval.

Hypothesis 4 predicted that the validities reported in published studies of contextualized personality measures would be higher than the validities reported in unpublished studies. To test this hypothesis, for each personality trait we compared the validity estimates and 95% confidence intervals of published and unpublished studies of noncontextualized and contextualized scales (Dudley, Orvis, Lebiecki, & Cortina, 2006). As shown in Table 3, there was little to no difference between the published and unpublished validity estimates for noncontextualized measures of Emotional Stability, Extraversion, Agreeableness, and Openness to Experience. Where small differences exist, the confidence intervals for the estimates overlap. The same pattern holds for contextualized measures of personality, for which there is likewise little difference between validity estimates. For contextualized measures of Extraversion, the validity estimates reported in published studies (ρ = .25, k = 12, N = 2,018) are very similar to those reported in unpublished studies (ρ = .23, k = 6, N = 674). The same is true of published (ρ = .24, k = 14, N = 2,602) and unpublished studies (ρ = .27, k = 7, N = 755) of Agreeableness and of published (ρ = .12, k = 9, N = 1,543) and unpublished studies (ρ = .17, k = 5, N = 635) of Openness to Experience. The exception to this pattern is seen for measures of Conscientiousness. For contextualized measures of Conscientiousness, the mean validity reported in published studies was slightly lower (ρ = .28, k = 15, N = 2,723; 95% CI = .26 to .30) than the mean validity reported in unpublished studies (ρ = .34, k = 7, N = 755; 95% CI = .34 to .34). Taken together, these results do not support Hypothesis 4.

TABLE 3
Validity Estimates as Moderated by Publication Status

Analysis                 k     N       r    SDr    ρ    SDρ    95% CI (Lower, Upper)
Conscientiousness
  Overall
    Published            67    11,…
    Unpublished          46    7,…
  Noncontextualized
    Published            52    9,…
    Unpublished          39    6,…
  Contextualized
    Published            15    2,…
    Unpublished
Emotional Stability
  Overall
    Published            46    7,…
    Unpublished          40    6,…
  Noncontextualized
    Published            35    5,…
    Unpublished          33    5,…
  Contextualized
    Published            11    1,…
    Unpublished
Extraversion
  Overall
    Published            54    8,…
    Unpublished          36    5,…
  Noncontextualized
    Published            42    6,…
    Unpublished          30    5,…
  Contextualized
    Published            12    2,…
    Unpublished
Agreeableness
  Overall
    Published            54    9,…
    Unpublished          40    5,…
  Noncontextualized
    Published            40    6,…
    Unpublished          33    5,…
  Contextualized
    Published            14    2,…
    Unpublished
Openness to Experience
  Overall
    Published            47    7,…
    Unpublished          33    5,…
  Noncontextualized
    Published            38    6,…
    Unpublished          28    4,…
  Contextualized
    Published            9     1,…
    Unpublished

Note. k = number of validity coefficients; N = total sample size; r = observed sample-weighted mean validity; SDr = observed sample-weighted standard deviation; ρ = mean operational validity corrected for indirect range restriction; SDρ = corrected standard deviation; CI = confidence interval.

A reviewer suggested that we examine potential differences in the validity of contextualized personality measures based on how a work-specific FOR is achieved, that is, whether measures (a) consisted of items that made specific references to the workplace, (b) contained instructions that specifically referenced workplace behaviors, or (c) contained both items and instructions that referenced the workplace. The results of this analysis appear in a supplementary table in Appendix C. They suggest that the validity of personality measures can be increased when both instructions and scale items reference work-specific behaviors. However, these results were not consistent across all five personality traits and should be interpreted with caution given the small number of studies in many of the analyses.
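Hypotheses 1, 2, and 4 are evaluated by whether 95% confidence intervals overlap. The paper does not spell out its interval formula, so the Python sketch below assumes a common Hunter and Schmidt (2004) style approximation: the standard error of the mean observed r is SD_r/√k, rescaled by the ratio ρ̄/r̄ to place it on the corrected metric. The function names and the inputs to `ci95_mean_rho` are hypothetical; only the two Conscientiousness intervals in the usage line come from the text.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def ci95_mean_rho(rho_bar: float, r_bar: float, sd_r: float, k: int) -> Interval:
    """Approximate 95% CI for a mean corrected validity (assumed formula)."""
    if k < 1 or r_bar == 0:
        raise ValueError("need k >= 1 and a nonzero mean observed r")
    se_r = sd_r / math.sqrt(k)         # SE of the mean observed correlation
    se_rho = se_r * (rho_bar / r_bar)  # rescaled by the overall correction factor
    half_width = 1.96 * se_rho
    return rho_bar - half_width, rho_bar + half_width


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Contextualized Conscientiousness, published vs. unpublished (intervals from the text).
published = (0.26, 0.30)
unpublished = (0.34, 0.34)
print(intervals_overlap(published, unpublished))  # prints False
```

Non-overlap of two 95% intervals is a conservative criterion for a difference; overlapping intervals do not by themselves establish that two means are equal.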
Discussion
The purpose of this meta-analysis was to investigate the relative validity of contextualized and noncontextualized measures of self-reported personality. At the overall level of analysis, the validity estimates that we