t L I-i RB Margaret K. Schultz Princeton, New Jersey May 25, 1954 William H. Angof'f THE DEVELOPMENT OF NEW SCALES FOR THE APTITUDE AND ADVANCED

Similar documents
Equating and Scaling for Examination Programs

Understanding and Interpreting Pharmacy College Admission Test Scores

PRINCIPLES AND APPLICATIONS OF SPECIAL EDUCATION ASSESSMENT

Kaufman Test of Educational Achievement Normative Update. The Kaufman Test of Educational Achievement was originally published in 1985.

Reliability & Validity

Estimating Reliabilities of

Career Guidance Tools for Practical Applications I

R&D Connections. The Facts About Subscores. What Are Subscores and Why Is There Such an Interest in Them? William Monaghan

Bennett Mechanical Comprehension Test -II (BMCT -II) FREQUENTLY ASKED QUESTIONS

KeyMath Revised: A Diagnostic Inventory of Essential Mathematics (Connolly, 1998) is

The Time Crunch: How much time should candidates be given to take an exam?

Validation of the General Classification Test For Selection Of Commissioning From the Ranks Candidates. Captain Deborah A.

The Standards for Educational and Psychological Testing: Zugzwang for the Practicing Professional?

A Comprehensive Evaluation of Regression Uncertainty and the Effect of Sample Size on the AHRI-540 Method of Compressor Performance Representation

Minnesota Teacher Licensure Examinations. Technical Report

UK Clinical Aptitude Test (UKCAT) Consortium UKCAT Examination. Executive Summary Testing Interval: 1 July October 2016

Understanding Your GACE Scores

Watson-Glaser II Critical Thinking Appraisal Profile Report

The Reliability of a Linear Composite of Nonequivalent Subtests

The Impact of SEM Programs on Customer Participation Dan Rubado, JP Batmale and Kati Harper, Energy Trust of Oregon

Using Your ACT Results

GUIDE TO GRADUATION. B.S. in BUSINESS ADMINISTRATION HUMAN RESOURCE. management SPECIALIZATION

CUSTOMER SERVICE REPRESENTATIVE TEST BATTERY

Bachelor of Science in Environmental Science

PLANT OPERATOR SELECTION SYSTEM

The Effects of Management Accounting Systems and Organizational Commitment on Managerial Performance

What Pearson Tests Do I Need to Take?

Report of RADTECH Student Performance on the PSB-Health Occupations Aptitude Examination-Revised

A91003: RESEARCH METHODOLOGY AND STATISTICAL ANALYSIS TEACHING PLAN UNIT TOPIC NUMBER OF

Hierarchical Linear Modeling: A Primer 1 (Measures Within People) R. C. Gardner Department of Psychology

Telecommunications Churn Analysis Using Cox Regression

GUIDE TO GRADUATION. B.S. in BUSINESS ADMINISTRATION MANAGEMENT. SPECIALIZATION

Draft Poof - Do not copy, post, or distribute

On-the-Job Search and Wage Dispersion: New Evidence from Time Use Data

OREGON ELECTRICITY SURVEY

CALMAC Study ID PGE0354. Daniel G. Hansen Marlies C. Patton. April 1, 2015

Replications and Refinements

demographic of respondent include gender, age group, position and level of education.

STATE BOARD OF EDUCATION Action Item November 18, SUBJECT: Approval of Amendment to Rule 6A , Florida Teacher Certification Examinations.

THE SUCCESS AND FAILURE OF DANISH FOOD SECTOR PRODUCT DEVELOPMENT IN THE. Working paper no 48. December 1997

Interest Patterns of Management Personnel as a Function of Organizational Level

Mechanical Engineering (n=127) PSU Acoustics

How to Get More Value from Your Survey Data

Woodcock Reading Mastery Test Revised (WRM)Academic and Reading Skills

Intercultural Communication (Also meets Cultural Diversity Requirement) Total Hours 9

Regression analysis of profit per 1 kg milk produced in selected dairy cattle farms

Harbingers of Failure: Online Appendix

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved.

Certification Areas Grade Levels Test Titles Fees Early Childhood Education & Elementary Education

RIST-2 Score Report. by Cecil R. Reynolds, PhD, and Randy W. Kamphaus, PhD

20. Alternative Ranking Using Criterium Decision Plus (Multi-attribute Utility Analysis)

The Multivariate Dustbin

Environmental Science

Advocacy and Advancement A Study by the Women s Initiatives Committee of the AICPA

Majors and Careers: Making the Right Choices. By Jamal Shareef

The Assessment Center Process

Online Student Guide Types of Control Charts

Calculation of Spot Reliability Evaluation Scores (SRED) for DNA Microarray Data

INDUSTRIAL ENGINEERING

Factor Decomposition of the Gender Job Satisfaction Paradox: Evidence from Japan

Foreign-owned Plants and Job Security

Administration duration for the Wechsler Adult Intelligence Scale-III and Wechsler Memory Scale-III

What Can Coaches Do for You?

Linking Forecasting with Operations and Finance. Bill Tonetti November 15, 2017 IIF Foresight Practitioner Conference

Mechanical properties and strength grading of Norway spruce timber of different origins

Overview of Statistics used in QbD Throughout the Product Lifecycle

EARTH AND ATMOSPHERIC SCIENCES

The Relationships among Organizational Climate, Job Satisfaction and Organizational Commitment in the Thai Telecommunication Industry

CROSS-CULTURAL COMMUNICATION & MULTICULTURAL COMPETENCIES New York Library Association (NYLA) Annual Convention 2017 Saratoga Springs, New York

Variable Selection Methods for Multivariate Process Monitoring

A STUDY OF SERVICE RECOVERY EFFORTS IN BANKS

SUPPLIER EVALUATION AND PURCHASING IN JIT ENVIRONMENT-A SURVEY OF INDIAN FIRMS

SDRT 4/SDMT 4 Administration Mode Comparability Study

Ricardo Model. Sino-American Trade and Economic Conflict

BIOMOLECULAR SCIENCE PROGRAM

Gender Composition of Occupations and Earnings: Why Enter a Female Dominated Occupation?

Comm Advertising Research Perceptual Mapping

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Trip Generation Characteristics of Free- Standing Discount Stores: A Case Study

Four + One Master of Forestry (MF) Degree Proposal School of Forest Resources (SFR) University of Maine

A SIMULATION STUDY OF THE ROBUSTNESS OF THE LEAST MEDIAN OF SQUARES ESTIMATOR OF SLOPE IN A REGRESSION THROUGH THE ORIGIN MODEL

SPECIAL CONTROL CHARTS

Online Price Dispersion: An International Comparison

Mechanical Engineering (n=127) PSU Acoustics

CASE STUDIES IN USING WHOLE BUILDING INTERVAL DATA TO DETERMINE ANNUALIZED ELECTRICAL SAVINGS

CONSTRUCTING A STANDARDIZED TEST

Equivalence of Q-interactive and Paper Administrations of Cognitive Tasks: Selected NEPSY II and CMS Subtests

Analyzing the Relationship between Organizational Culture and HMIS Use among Homeless Service Providers

STATISTICS PART Instructor: Dr. Samir Safi Name:

(By order of the Governor) M.SHERIF Under Secretary to Government

Nisreen F. Alshubaily Al Imam Mohammad Ibn Saud Islamic University (IMSIU) Riyadh, Saudi Arabia

THE Minnesota Vocational Test for Clerical Workers has

The 1995 Stanford Diagnostic Reading Test (Karlsen & Gardner, 1996) is the fourth

NSC Employee Perception Surveys

Profiling Vendor Performance

The Importance of Culture Fit

HRM and Dairy. Research Questions. Purpose of the Study. Dependent Variable. Explanatory Variables

The Job Outlook for The Class of 2014

Using your CASIO fx-82tl In Statistics Mode

Transcription:

RB-54-15 [ s [ B U t L I-i TI N THE DEVELOPMENT OF NEW SCALES FOR THE APTITUDE AND ADVANCED TESTS OF THE GRADUATE :RECORD EXAMINATIONS Margaret K. Schultz and William H. Angof'f EDUCATIONAL TESTING SERVICE Princeton, New Jersey May 25, 1954

THE DEVELOPMENT OF NEW SCALES FOR THE APTITUDE AND AnVANCED TESTS OF THE GRADUATE RECORD EXAMINATIONS Margaret K. Schultz and William H. Angoff Abstract The new scales for the Aptitude and Advanced Tests of the Graduate Record Examinations have been developed with the use of data collected on 2,095 second semester seniors in eleven colleges whicp are generally representative of the schools'who normally administer the tests to their students. Scores on the Verbal and Quantitative parts of the Aptitude Test were converted to yield scaled score means of 500 and standard deviations of 100. Scores on the Advanced Tests were adjusted to reflect differences in Aptitude Test scores among the groups taking the separate Advanced Tests. It is considered that this type of score scale would lead to fewer errors of interpretation than the previous scales, in which raw scores on each of the Advanced Tests were converted directly to a scale with a mean of 500 and standard deviation of 100.

THE DEVELOPMENT OF NEW SCALES FOR THE APTITUDE MTD ADVANCED TESTS OF THE GRADUATE RECORD EXAMINATIONS * The Graduate Record Examinations offer, among other tests, an Aptitude Test} yielding a Verbal score and a Quantitative score, and a series of Advanced Tests in the usual major areas of college study -- biology, chemistry, history, psychology, and so forth. These tests are taken primarily by college seniors and graduate students in testing programs sponsored by their colleges, and by applicants to graduate schools which require or recommend the tests for admission. The scaled score system for these tests and for the other tests in the Graduate Record Examinations program is one which defines a scaled score of 500 as the mean score for the particular standardization or reference group on which the scale is based, and 100 as the standard deviation of these scores. Such a score system has little inherent normative value, however, if the standardization group whose perforwce is the basis for the scale is not reasonably representative of groups who currently use the tests. In the case of the Aptitude Test, groups now being tested are far more heterogeneous and generally lower in ability than the original standardization group, which was composed of first-year graduate men tested at four eastern universities in 191. This change in the characteristics of the groups being tested was one reason for the development of a new scale for the Aptitude Test. A second reason was the fact that the Aptitude Test itself had undergone revisions in content and scope since the original scaling. It might be noted that the Aptitude Test, as such, was introduced in 1949. Prior to that time, the Verbal and Quantitative scores were obtained as two parts of the Profile Tests. Aptitude Test scores were first reported on this new scale in the fall of 1952. At the same time, new scales for seventeen of the Advced Tests were also introduced. There were two reasons for rescaling the Advanced Tests. For one thing, fourteen of them had been extensively revised since the previous scales had been established, and the allotted testing time had been extended from 1 3/4 to three hours. Secondly, it seemed advisable to make a change in the type of score scale used for the Advanced Tests. The previous scales} established in 1947, had been so set that 500 was the mean scaled score on each particular test for a group of seniors majoring in the field covered by that test, and the standard deviation of their scores was 100. A different standardization group was therefore used for each test, with no adjustment made for oup differences in ability. It is * Paper presented at the 1953 meeting of the American Psychological Association.

generally found that some fields of specialization attract more able groups of students, while other fields of specialization attract less able students. Also, some fields attract a more heterogeneous group of students than do others. As a result of this kind of self-selection, the groups of candidates taking different Advanced Tests differ markedly from one another with respect to level and range of ability. In the development of the newadvacedtest score scales, it was decided to design them to reflect these differences in ability as'measured by performance on the Aptitude Test. It should be emphasized that both this new type of scale and the 1947 scale preserve the rank order of raw scores on a test and therefore both yield the same percentile ranks fora given group of students. The data for the rescaling of the Aptitude and' Advanced Tests were collected in the spring of 1952. A single standardization group was chosen. This group consisted of 2,095 graduating seniors, 686 women and.l,409 men, in eleven colleges. The colleges were selected in such a way as to yield a fairly wide range of scores and a level of performance near the average of the groups who normally take the tests. In addition, it was judged that the colleges chosen were widely enough known so that the performance of their students could be evaluated by other test users. It was felt that this knowledge would add to the normative value of the scores themselves. The colleges in the standardization group have been listed in the norms bulletins for the tests. Each student in the standardization group took both parts of the Aptitude Test -- that is, Verbal and Quantitative. The new scale for each part of the Aptitude Test was established by a linear transformation of the raw scores obtained by the entire group on that part so as to yield scaled scores with a mean of 500 and a standarddeviatlon of 100. In the scaling of the Advanced Tests, a more complex statistical procedure was employed, in order to yield scales which would reflect the performance of the particular Advanced Test group on the Aptitude Test..In addition to taking the AptitUde Test, each of the 2,095 seniors also took the Advanced Test in his major field of study. For each Advanced Test subgroup, regression coefficients were determined for predicting Advanced Test scores from Verbal and Quantitative Aptitude Test scores. Estimations were then made of the raw score mean and variance of the entire standardization group on each Advanced Test, utilizing the regression coefficients and also the measured differences between the entire group and each Advanced Test subgroup on the two Aptitude Test scores. The equations used in this type of score scaling were developed by Dr. Ledyard R Tucker of Educational Testing Service. The equations are given on the page following the present text.

-3- Once estimates of the raw score mean and variance of the entire standardization group on each Advanced Test had been obtained, linear transformations were determined which would result in a scaled score mean of 500 and a standard deviation of 100 on each test for the entire standardization group. Figure 1 shows in graphical form the level and dispersion of scores for each Advanced Test subgroup on the two parts of the Aptitude Test and also on the Advanced.Test taken by that group. The total group has, by definition of the new scale, a mean of 500 and a standard deviation of 100 on each part of the Aptitude Test. It can be seen that the Advanced Test subgroups differ rather markedly from one another, and from the total group, iathe Aptitude Test scores. For example, there is a difference of 7 points in the Verbal score and 62 points in the Quantitative Bcore between the Chemistry subgroup and the entire standardization group. It is this kind of difference which has been taken into account in establishing the Advanced Test scales, and reflects itself in a high mean score (530) for the Chemistry subgroup on the Advanced Test in Chemistry. Table 1 shows the scaled score means and standard deviations on the Aptitude and Advanced Tests obtained by each of the subgroups--essentially the same data as given in Figure 1, but in.numerical fofm. Tne fact that the Advanced Test mean and standard deviation for any particular subgroup are above or below 500 and 100 means that the Aptitude Test scores for that group differed correspondingly from the Aptitude Test scores for the entire group, and that some degree of correlation existed between Aptitude and: Advanced Test scores for that group. The Advanced Test means range from 446, for the Education Test subgroup, to 549, for the Philosophy Test subgroup. The standard deviations range from 92 to 101. The 1947 scaling method for the Advanced Tests would have yielded a mean of 500 and a standard deviation of 100 for each of these groups. In the new scales, then, the 500-point on each Advanced Test is the estimated average score of the entire standardization group, hot the obtained average score of any particular subgroup. In the 1947 scales, the 500-point on each Advanced Test was the obtained average score for the particular group of seniors majoring in the field of the test. As indicated above, the extent to which the level and range of Aptitude Test scores for a particular Advanced Test subgroup are reflected in the Advanced Test scale depends upon the correlation of each part of the Aptitude Test with the particular Advanced Test. Table 2 shows these zero-order correlations and the multiple correlations. For the total group of 2,095 seniors, the correlation

-4- between the two parts of the Aptitude Test was.39. correlations are above.60, although only five are above.70. two Advanced Tests particularly, Gerrrn were extremely low --.28 and.33. the scaling is relatively unreliable. All but 4 of the multiple In the case of and Spanish, the multiple correlations For those tests, it would be expected that In the extreme case, where the multiple correlation is zero, the best estimate of the performance of the total group 1 on the particular Advanced Test is the performance of the subgroup (taking the Advanced Test) itself. Since it is these estimates which are arbitrarily translated to a mean and standard deviation of 500 and 100 respectively, the type of scale resulting would be identical to the one previously used for the Graduate Record Examinations. That is, it would be one where selective factors affecting the level and range of general verbal and quantitative ability (as measured by the Aptitude Test) w'ould essentially be disregarded in setting the separate Advanced Test scales, since knowledge of that performance does not aid in the estimation process. It should be pointed out that in the case of six of the Advanced Tests the numbers of cases for the SUbgroups were less than 50. of German this number was only 10. unreliable. In the case Obviously, then, these scales are extremely However, since the primary effort in setting these scales was to determine the performance of a generally representative total standardization group, the distribution of cases among the various major fields of study was accepted as it was found. To add more cases to individual subgroups and thus to change the proportions in the separate major field groups arbitrarily would have meant changing the nature and composition of the total standardization group, which, in turn, yielded the best estimate of the proportions in the populations. At any rate, it was felt that since the unreliable scales involved tests that were taken by relatively few examinees, extensive harm would not be done. points can be made. In summarizing the characteristics of the new GRE scales, two main First, the new' scale for the Aptitude Test incorporates normative data for a group of students which is clearly identified and fairly representative of a large portion of the test users. Second, the new scales for the Advanced Tests are adjusted for the differences among the groups in Aptitude Test performance. The scales for all tests are therefore based upon

-5- the performance of a single, over-all standardization, or reference group. This type of scale system reduces a major source of possible error in interpretation that was present in the old-type scales where each test was separately standardized on a different group of students. It is our belief that score scales for tests of different functions cannot be adjusted to yield direct comparability in all respects at once. Under the scale system now used for the Advanced Tests, it is not claimed, for example, that a score of 600 on the Chemistry Test represents the same level of achievement in Chemistry that a score of 600 on the Economics Test represents in that field. However, it can be said that a scale system of the kind described here, one which makes an adjustment for the differences in ability among the several groups, does reduce the likelihood of misinterpretation in the comparison of scores obtained by students who take different optional examinations.

-6- Equations Used for the Scaling of the GRE Advanced Tests Notation: v, q = scaled scores on the Verbal and Quantitative parts of the Aptitude Test x = raw scores on the Advanced Test t = entire standardization group s = subgroup taking the particular Advanced Test M, C? = estimated mean and variance M, 0 2 = obtained mean and variance b = regression coefficient C vq = covariance between Verbal and Quantitative Aptitude Test scores =M xs + 2 b b (C- C ) xv.q xq.v va, vq 55 S Linear Transformation to Obtain Scaled Scores (y): y=ax+b, where r-j and B = 500 - A M xt

FIG1h"'.E 1 i-1eans AND STAIIDA.1=ill DEVIA'I'IONS OF SCALED SCORES ON TEE GFE APrlTUDE AIID PINANCED STS, OBTATh"ED BY EACH ADVANCED TEST SUBGROUP IN TEE STA.'\"DARDlZATION o Verbal Quantitative Advanced Test Scaled Score 750 700 (Mean score is indicated by a horizontal line drawn through the center of each bar; the standard deviation is indicated by the ength of the bar above and below the mea..'1) 650 600 550 1!1! :::: lm.,a'! I:::::<!i::: 450 400 IIW :::: HH ;:;:. 350 300" 250. Cf.l til n t>:l t>:l ";I 0 G Q c+... :::T (') P. 'i (1/ (l) 0 III 0 g s;; Gl 0 ::st<j... < (')... ::s... EJ Q p.::s 0... III Q g i» '1 c+ (1J c+... :::T ::s c+...... Q 0,", '1 Q 0 'i (1) '"' <!' '< III ::s... ::s 0 c+ s;; '0! I,

FIGURE. 1 (Continued) MEAIS AND STA!."'IDARD DEVIATIONS OF SCALED SCORES ON THE are APl'ITUDE AND ADVANCED TES'l'S, OBTAINED BY EACH ADVANCED TEST SUBGROUP m THE STANDARDIZATION PROOF-AM W Verbal Quantitative Advanced Test Scaled Score 750 (l4ean score is indicated by a horizontal line drawn through tbe center of each bar; the standard deviation is indicated by the length of the bar above a..'1d below the mean) 700 55.0 600 \iw m 1m 550 m..., 500 I l: :,'%j I.:.:. 450 Illl ::; :::: :::: 300" 250 I CD ::>:: t"' '" "j "j 'd en en. CD 0 'd... l-'- ::T aj c+... '< () t=:j c+ ro III CD g.... :;l "';:l 0 'i ro 0... 0... '" c+ Ii CD () 0... Q... '< c+ 0 CD... 0 ::T t;::l 0'1 s:: 'i ro c+ '1... 'g. i '< (l> 0 '< '< CJl I 0::> I

Table 1 Scaled Score Means and Standard Deviations on Aptitude and Advanced Tests, by Advanced Test Subgroup Aptitude Test- Aptitude Test- Verbal Quantitative Advanced Test Advanced Test. Standard Standard Standard SUbgroup N Mean Deviation Mean Deviation Mean Deviation Biology 209 486 94 499 87 495 96 Chemistry 180 507 96 562 99 530 102 Economics 239 476 89 516 99 494 97 Education 180 438 86 434 83 446 93 Engineering 151 454 92 570 86 497 98 French 32 520 92 453 72 533 92. Geology 35 473 92 500 86 488 97 German 10 543 69 495 79 505 99 Government 146 498 93 485 89 496 97 History 181 517 93 468 80 506 97 Literature 239 564 98 463 86 548 99 Mathematics 81 508 93 587 90 542 97 Philosophy 31 563 96 521 90 549 97 Physics 49 531 106 633 79 546 101 Psychology 171 527 94 495 86 522 96, Sociology 127 482 93 447 77 474 96 Spanish 34 529 102 451 83 520 99 Entire Stand. Group 2,095 500 100 500 100

-10- Table 2 Correlations Among Scores or Each Advanced Test Subgroup Correlations Multiple R: Aptitude Test Aptitude Verbal Aptitude Quant. Adv. Test Advanced Test Verbal vs vs vs vs Apt. Subgro'up' N Quantitative Advanced Test Advanced Test Verbal, Quant. Biology 209.47 50 58.63 Cp,emistry 180 57.38 53 53 Economics 239.44 57 51.64 Education 180 57.74 50 75 Engineering 151 55 58.48.61 French 32.56.54.10.60 Geology 35 50 57 52.63 German 10 -.09.09 -.27.28 Government 146.47.69.45 71 History 181.49.61.38.62 Literature 239.49.82.45.82 Mathematics 81.29.20.46.47 Philosophy 31.49 75.22 77 Physcis 49 50.62.44.64 Psychology 171.38.60.46.65 Sociology 127 53.67 53 70 Spanish 34.66.31.12.33 Entire Stand. Group 2,095.39