
A New Bias Correction Method for the DIMTEST Procedure

Amy G. Froelich
Department of Statistics
Iowa State University

William F. Stout
Department of Statistics
University of Illinois, Urbana-Champaign

Abstract

Developed by Stout (1987), DIMTEST is a nonparametric procedure that provides a hypothesis test of unidimensionality for a test data set. A new bias correction method for the DIMTEST procedure based on the nonparametric IRT parametric bootstrap method (Kim, 1994; Habing, 2001) is presented. Using a specified examinee ability distribution and nonparametric item response function estimates for each test item, a second test data set is generated under the assumption of unidimensionality. The DIMTEST statistic calculated from this generated data set serves to correct for the bias present in the DIMTEST statistic calculated using the original data set. Using results from both Stout (1987) and Douglas (1997), the new DIMTEST procedure is shown to have an asymptotically standard normal distribution under fairly general regularity conditions and assumptions as both the number of items and the number of examinees tend to infinity. A Monte Carlo simulation study shows this new version of the DIMTEST procedure has an average Type I error rate slightly below the nominal rate of $\alpha = 0.05$ and very high power to detect multidimensionality in a variety of realistic multidimensional models.

1 Background on Dimensionality Assessment

One of the many areas of research in the field of Item Response Theory concerns the assessment of the dimensionality of a test. Three main methods are used to assess the number of dimensions on a test: linear factor analysis (Hambleton & Traub, 1973; Hattie, 1985; Reckase, 1979), non-linear factor analysis (Etezadi-Amoli & McDonald, 1983; Gessaroli & De Champlain, 1996) and conditional covariance based procedures, such as DIMTEST (Stout, 1987; Nandakumar & Stout, 1993), HCA/CCPROX (Roussos, Stout & Marden, 1998) and DETECT (Zhang & Stout, 1999b).
Dimensionality assessment using these procedures can be divided into two main areas: 1) determining if a test is unidimensional or multidimensional and 2) if necessary, determining the multidimensional structure of the test items.

Unidimensionality is an important concept in Item Response Theory for many reasons. Many common IRT models assume unidimensionality. Consequently, many IRT based procedures for item parameter and examinee ability estimation, for example BILOG (Mislevy & Bock, 1990), assume tests measure only one dimension. Serious bias of item parameter and examinee ability estimates can be obtained from these procedures if the assumption of unidimensionality is violated (Ackerman, 1989; Kirisci & Hsu, 1995). For these reasons, and many others, it is essential to have a reliable procedure to test for unidimensionality of test data.

Stout (1987) developed the DIMTEST procedure to provide a nonparametric hypothesis test of unidimensionality for a test data set. To test for unidimensionality, the DIMTEST procedure selects an Assessment Subtest (called AT1) of potentially dimensionally distinct items from the test. These items are then tested to determine if they are truly dimensionally distinct from the rest of the test items. In this way, DIMTEST can be used to confirm whether item clusters found by linear factor analysis, non-linear factor analysis, HCA/CCPROX, DETECT, or some other statistical method are truly dimensionally distinct from the remaining test items. DIMTEST can also be used to confirm the dimensional distinctiveness of item clusters obtained through substantive analysis of the test.

The DIMTEST procedure produces a statistic that has a known asymptotic distribution (as both test length and number of examinees tend to infinity). However, for a finite length test, the DIMTEST statistic is statistically biased. To correct for this serious statistical bias, Stout (1987) selected another Assessment Subtest (called AT2). The DIMTEST statistic calculated using the AT2 Subtest serves as an estimate of the bias present in the DIMTEST statistic calculated using the AT1 Subtest.

Unfortunately, this methodology had three serious weaknesses. One, for certain applications, some statistical bias still remained in the procedure, resulting in unacceptable levels of Type I hypothesis testing error. Two, the use of DIMTEST was precluded on short tests or when the AT1 Subtest contained more than one-third of the test items. Three, this bias correction method is not easily adapted to tests with polytomous items, or to computer adaptive tests.

Several research projects have been conducted over the past few years (Gao, 1997; Stout, Froelich, & Gao, 2001) with the goal of removing the AT2 bias correction method from the DIMTEST procedure and replacing it with a new bias correction method based on the nonparametric IRT parametric bootstrap method (Kim, 1994; Habing, 2001). While the new bias correction method was promising, its implementation has resulted in high Type I error rates for the DIMTEST procedure, especially for tests with a small number of items.
The purpose of this paper is to present an improved bias correction method for the DIMTEST procedure based on the same NIRT parametric bootstrap method. With this improved bias correction method, the DIMTEST procedure has average Type I error rates at or slightly lower than the nominal rate and very high power to detect multidimensionality in a variety of realistic test situations. Since the AT2 Subtest is no longer needed, the new DIMTEST procedure also provides much greater flexibility than the original procedure presented in Stout (1987). This bias correction method can also be extended to situations where choosing an AT2 Subtest is either very difficult or impossible, such as with polytomous items (Froelich, 2001) or items on a Computer Adaptive Test (Froelich, 2000). Finally, this paper also includes a theoretical justification of the DIMTEST procedure with the new bias correction method.

2 Conditional Covariances and Dimensionality

In order to fully discuss the DIMTEST procedure, the following review of the literature and notation is necessary. Let $U_i$ be the response of a randomly sampled examinee on the $i$th item of a test, and let $U^{(n)} = (U_1, U_2, \ldots, U_n)^T$ denote the response pattern of a randomly sampled examinee to an $n$ item test. The focus of this paper is limited to dichotomous items, so $U_i = 1$ if the examinee answers item $i$ correctly, and $U_i = 0$ if the examinee answers item $i$ incorrectly. Item Response Theory assumes examinee responses to test items depend upon both the characteristics of the items themselves and upon the latent examinee random ability vector $\Theta$. The probability a randomly sampled examinee with ability vector $\Theta = \theta$ answers item $i$ correctly is given by the conditional probability

$$P_i(\theta) = P(U_i = 1 \mid \Theta = \theta). \tag{1}$$

The model in equation (1) is referred to as the Item Response Function or IRF of item $i$.

For the latent variable model defined in equation (1) to be psychometrically reasonable, two assumptions are often made. First, the latent variable probability model is assumed to be monotone increasing coordinatewise in $\theta$ for each item. Second, the latent variable probability model is assumed to be locally independent. A latent variable model is locally independent if

$$P(U^{(n)} = u^{(n)} \mid \Theta = \theta) = \prod_{i=1}^{n} P(U_i = u_i \mid \Theta = \theta) \tag{2}$$

holds for all $\theta$ and all possible response patterns $u^{(n)}$. Thus, examinee responses to test items are independent conditioned upon the value of the latent examinee ability vector. The dimensionality of a test is then defined as the minimum number of dimensions of the latent vector $\Theta$ required to produce a locally independent and monotone increasing latent variable probability model. In other words, the latent ability vector $\Theta$ contains all dimensions or abilities that affect examinee performance on the test items.

To determine the dimensionality of a test, the assumption of local independence requires proving $2^n$ equations hold for each possible value of $\theta$. In practice, the requirement of local independence of the latent variable model is usually replaced with the requirement of weak local independence. A latent variable model displays weak local independence if

$$\mathrm{Cov}(U_i, U_l \mid \Theta = \theta) = 0 \tag{3}$$

for each of the $n(n-1)/2$ item pairs $(i, l)$ and for every $\theta$. Proving weak local independence thus requires showing only $n(n-1)/2$ equations hold for every $\theta$.

Investigating dimensionality through the use of conditional covariances can be developed further using a geometric representation for the items of a multidimensional test (see, for example, Ackerman, 1996). Assume a test has two dimensions, denoted by $\theta_1$ and $\theta_2$. The dimensions $\theta_1$ and $\theta_2$ can be represented by a coordinate system with $\theta_1$ corresponding to the horizontal axis and $\theta_2$ corresponding to the vertical axis, as shown in Figure 1. (The use of orthogonal axes in no way implies the correlation between dimensions $\theta_1$ and $\theta_2$ is zero.)
In Figure 1, Item 1 represents an item that measures the $\theta_1$ ability but not the $\theta_2$ ability, while Item 2 represents an item that measures the $\theta_2$ ability but not the $\theta_1$ ability. Item 3 represents an item that measures both the $\theta_1$ and $\theta_2$ abilities. Thus, Item 3 best measures some composite ability of $\theta_1$ and $\theta_2$.

Figure 1: Two-Dimensional Coordinate System (item vectors for Items 1, 2 and 3 on the $\theta_1$ and $\theta_2$ axes)

This multidimensional representation of item vectors can be used to explain the role of conditional covariances in determining the dimensionality of a test (Zhang & Stout, 1999a). Figure 2 represents a two dimensional test with the dominant ability the test is designed to measure, also called the direction of best measurement of the test, denoted as $\Theta$. (The direction of best measurement of both an item and a test is rigorously defined in Zhang & Stout, 1999a.)

Figure 2: Graphical Representation of Conditional Covariances (item vectors for Items 1 through 5 on the $\theta_1$ and $\theta_2$ axes, together with the direction of best measurement $\Theta$)

The conditional covariances between two items on the same side of the latent conditioning variable (Items 1 and 2 and Items 3 and 4) will be positive, and the conditional covariances between two items on different sides of the latent conditioning variable (Items 1 and 3, Items 1 and 4, Items 2 and 3, and Items 2 and 4) will be negative. In addition, if either item of an item pair lies in the same direction as the latent conditioning variable (all item pairs containing Item 5), the conditional covariance for that item pair will be zero. (Zhang & Stout (1999a) show similar, but more complex, results hold for conditional covariances when the dimensionality of a test is greater than two.) The magnitude and sign of the conditional covariance of an item pair thus provide information about the dimensional structure of test items given a particular conditioning variable.

If a test is unidimensional, each test item will lie in the same direction as the direction of best measurement of the test. Therefore, all conditional covariances between item pairs conditioned on $\Theta$ will be zero. However, if a test is multidimensional, not all test items will necessarily lie in the same direction as $\Theta$. Therefore, some of the conditional covariances between item pairs conditioned on $\Theta$ will be different from zero. These concepts serve as the basis for the DIMTEST procedure.

3 Review of DIMTEST procedure

Developed by Stout (1987) and refined by Nandakumar & Stout (1993), the DIMTEST procedure provides a nonparametric hypothesis test of unidimensionality of a data set.
For completeness and comparison with the new DIMTEST procedure, a step by step description of the original procedure is given below.

Step 1. Select the Assessment Subtest 1 (AT1), Assessment Subtest 2 (AT2), and the examinee Partitioning Subtest (PT).

Step 1.1 Select a group of $m$ items from the total $n$ items on the test to form the Assessment Subtest 1 (AT1). The AT1 Subtest is tested for dimensional distinctiveness relative to the remaining test items. Denote the direction of best measurement of the remaining test items as $\Theta$. The AT1 items should be chosen so that they are 1) dimensionally homogeneous and 2) dimensionally distinct from $\Theta$. In other words, the directions of best measurement of the AT1 items should be close to each other and far away from $\Theta$.

Selecting the AT1 Subtest in this manner implies two conditions. First, if the AT1 items lie in the same direction as $\Theta$, the conditional covariance between two AT1 items given $\Theta$ will be zero, thus indicating unidimensionality of the test data. Second, if the AT1 items measure a dimension different than $\Theta$, the conditional covariance between two AT1 items given $\Theta$ will be positive, thus indicating multidimensionality is present in the test data.

The effectiveness of the DIMTEST procedure in detecting multidimensionality in the test data is entirely dependent upon the selection of the AT1 Subtest. Figures 3 and 4 represent two different choices for the AT1 Subtest for a two-dimensional test. In Figure 3, the AT1 items are dimensionally homogeneous relative to $\Theta$ and dimensionally distinct from $\Theta$. Therefore, the conditional covariances of all AT1 item pairs will be positive. Thus, DIMTEST will likely conclude that the test is multidimensional. However, in Figure 4, the AT1 items are not dimensionally homogeneous relative to $\Theta$. Only the conditional covariances between Items 1 and 2 and between Items 3 and 4 will be positive, while the rest of the conditional covariances between the four AT1 items will be negative. Thus, while the test is multidimensional, DIMTEST will likely not detect the multidimensionality present and conclude the test is unidimensional.

Figure 3: A Good Choice of AT1 (item vectors for Items 1 through 4 on the $\theta_1$ and $\theta_2$ axes)

There are generally two ways to select a good AT1 Subtest. The first method is based on a substantive analysis of item content.
This analysis can come from previous dimensionality analysis of the subject matter, test specifications, or characteristics of the items themselves, such as items from the same reading passage. The second method is to use an exploratory statistical data analysis method, such as linear factor analysis, non-linear factor analysis, HCA/CCPROX, DETECT, etc., to select AT1. Stout (1987) originally suggested using linear factor analysis to select the AT1 Subtest. Although not a part of the DIMTEST procedure itself, using linear factor analysis to select the AT1 Subtest has become ingrained in the implementation of the DIMTEST computer program.

Figure 4: A Poor Choice of AT1 (item vectors for Items 1 through 4 on the $\theta_1$ and $\theta_2$ axes)

Step 1.2 Select a second group of $m$ items from the $(n - m)$ remaining test items to form the Assessment Subtest 2 (AT2). The AT2 items are chosen to have, on average, a similar item difficulty distribution as the AT1 items. (See page 595 of Stout (1987) for complete details.) The AT2 items serve as a bias correction method for the DIMTEST statistic.

Step 1.3 Form the PT Subtest using the remaining $(n - 2m)$ test items.

Step 2. Define the $k$th examinee subgroup as all examinees whose total score on the PT Subtest, denoted as $Z_{PT}$, is equal to $k$. Define $J_k$ as the number of examinees in subgroup $k$. An examinee subgroup $k$ is eliminated from the DIMTEST statistic calculation if $J_k$ is less than a specified minimum size (typically the minimum size of $J_k$ ranges from 2 to 20). Denote the number of subgroups used in the calculation of the DIMTEST statistic as $K$. Let $U_{ij}^{(k)}$ denote the response of the $j$th examinee from subgroup $k$ to the $i$th assessment item. For each examinee subgroup $k$, calculate the following quantities:

$$Y_j^{(k)} = \sum_{i=1}^{m} U_{ij}^{(k)}, \qquad \bar{Y}^{(k)} = \frac{1}{J_k} \sum_{j=1}^{J_k} Y_j^{(k)}, \qquad \hat{p}_i^{(k)} = \frac{1}{J_k} \sum_{j=1}^{J_k} U_{ij}^{(k)},$$

$$\hat{\sigma}_k^2 = \frac{1}{J_k} \sum_{j=1}^{J_k} \left( Y_j^{(k)} - \bar{Y}^{(k)} \right)^2, \quad \text{and} \tag{4}$$

$$\hat{\sigma}_{U,k}^2 = \sum_{i=1}^{m} \hat{p}_i^{(k)} \left( 1 - \hat{p}_i^{(k)} \right). \tag{5}$$

The quantity $Y_j^{(k)}$ is the total score of the $j$th examinee from subgroup $k$ on the assessment subtest, $\bar{Y}^{(k)}$ is the average total score of the examinees in subgroup $k$ on the assessment subtest, and $\hat{p}_i^{(k)}$ is the proportion of examinees in subgroup $k$ who correctly answered assessment item $i$. The quantity $\hat{\sigma}_k^2$ is the usual estimate of the variance of the total score on the assessment subtest for examinees in the $k$th subgroup and $\hat{\sigma}_{U,k}^2$ is an estimate of the same variance assuming the test is unidimensional.

Step 3. For each examinee subgroup $k$, calculate the statistic

$$T_{L,k} = \hat{\sigma}_k^2 - \hat{\sigma}_{U,k}^2 = 2 \sum_{i < l \in AT1} \widehat{\mathrm{Cov}}(U_i, U_l \mid Z_{PT} = k) \tag{6}$$

where $\widehat{\mathrm{Cov}}(U_i, U_l \mid Z_{PT} = k)$ is the usual estimate of the covariance between two items conditioned on the set of all examinees with PT Subtest score $k$ (Gao, 1997). If the AT1 items are dimensionally similar to the PT items, the conditional covariance between each pair of AT1 items conditioned on total score on the PT items will be small for each $k$. However, if the AT1 items measure a different dimension than the PT items, the conditional covariance between each pair of AT1 items will be positive and large for each $k$. Thus, the value of the statistic $T_{L,k}$ is an estimate of the dimensional distinctiveness between the AT1 Subtest and the PT Subtest for each examinee subgroup $k$.

Step 4. To obtain a statistical test of the null hypothesis, the asymptotic variance of $T_{L,k}$, denoted as $S_k^2$, is calculated as

$$S_k^2 = \frac{(\hat{\mu}_{4,k} - \hat{\sigma}_k^4) - \hat{\delta}_{4,k}}{J_k}, \tag{7}$$

where

$$\hat{\mu}_{4,k} = \frac{1}{J_k} \sum_{j=1}^{J_k} \left( Y_j^{(k)} - \bar{Y}^{(k)} \right)^4 \quad \text{and} \quad \hat{\delta}_{4,k} = \sum_{i=1}^{m} \hat{p}_i^{(k)} \left( 1 - \hat{p}_i^{(k)} \right) \left( 1 - 2\hat{p}_i^{(k)} \right)^2.$$

The DIMTEST statistic $T_L$ is given by

$$T_L = \frac{\sum_{k=1}^{K} T_{L,k}}{\sqrt{\sum_{k=1}^{K} S_k^2}}. \tag{8}$$

Under the null hypothesis of unidimensionality, Stout (1987) proved the DIMTEST statistic $T_L$ has an asymptotic standard normal distribution under certain regularity conditions as both the number of items and the number of examinees tend to infinity. However, for a finite length test, Holland & Rosenbaum (1986) proved a general result about conditional association that implies the theoretical conditional covariance, $\mathrm{Cov}(U_i, U_l \mid Z_{PT} = k)$, is non-negative for any two non-PT items $i$ and $l$ even when unidimensionality holds. Thus, $E(T_{L,k}) \geq 0$ for all $k$, making the DIMTEST statistic $T_L$ positively biased under the null hypothesis of unidimensionality. Stout (1987) corrected for this bias by calculating another DIMTEST statistic using the second assessment subtest, AT2.

Step 5. Repeat Steps 2 through 4 using the $m$ items from the AT2 Subtest to calculate another DIMTEST statistic, denoted as $T_B$.
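Steps 2 through 4 can be sketched in Python as follows. This is a minimal illustration rather than the DIMTEST program itself: the function names `subgroup_terms` and `dimtest_tl`, the list-of-lists data layout, and the default `min_size=2` are assumptions made for the example, and the variance term follows equation (7) as written above.

```python
import math
from collections import defaultdict


def subgroup_terms(at_rows):
    """Return (T_Lk, S_k^2) for one subgroup of assessment-item rows (0/1 lists)."""
    j_k = len(at_rows)
    m = len(at_rows[0])
    y = [sum(row) for row in at_rows]                      # assessment total scores
    y_bar = sum(y) / j_k
    var_k = sum((v - y_bar) ** 2 for v in y) / j_k         # eq. (4)
    p = [sum(row[i] for row in at_rows) / j_k for i in range(m)]
    var_u = sum(q * (1 - q) for q in p)                    # eq. (5)
    mu4 = sum((v - y_bar) ** 4 for v in y) / j_k
    delta4 = sum(q * (1 - q) * (1 - 2 * q) ** 2 for q in p)
    t_lk = var_k - var_u                                   # eq. (6)
    s2_k = ((mu4 - var_k ** 2) - delta4) / j_k             # eq. (7)
    return t_lk, s2_k


def dimtest_tl(responses, at_items, pt_items, min_size=2):
    """Pool the subgroup terms over PT score groups into T_L, eq. (8)."""
    groups = defaultdict(list)
    for row in responses:
        z_pt = sum(row[i] for i in pt_items)               # PT total score
        groups[z_pt].append([row[i] for i in at_items])
    sum_t = sum_s2 = 0.0
    for at_rows in groups.values():
        if len(at_rows) < min_size:
            continue                                       # subgroup too small
        t_lk, s2_k = subgroup_terms(at_rows)
        sum_t += t_lk
        sum_s2 += s2_k
    if sum_s2 <= 0.0:
        raise ValueError("non-positive pooled variance estimate")
    return sum_t / math.sqrt(sum_s2)
```

For example, `dimtest_tl(data, at_items=[0, 1], pt_items=[2, 3, 4])` treats the first two columns of `data` as the assessment subtest and the next three as the partitioning subtest. The guard on the pooled variance is a choice made for this sketch; the source does not say how such a case is handled.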
When the test is unidimensional, $T_B$ serves as an estimate of the bias in the $T_L$ statistic. The bias corrected DIMTEST statistic is then given by

$$T = \frac{T_L - T_B}{\sqrt{2}}. \tag{9}$$

The DIMTEST statistic $T$ has an asymptotic standard normal distribution under the null hypothesis of unidimensionality as the number of examinees and the number of items tend to infinity (see Stout, 1987). Thus, the null hypothesis is rejected with asymptotic level $\alpha$ if the value of $T$ is greater than the $100(1 - \alpha)$th percentile of the standard normal distribution.

Remark: Stout (1987) found the DIMTEST statistic $T$ had an unacceptably high level of Type I error in simulation studies. To lower the Type I error rate of the DIMTEST procedure, two changes to the computation of the DIMTEST statistic were made. One, the variance of $T_{L,k}$ was replaced by an overestimate calculated as

$$S_k^{\prime 2} = \frac{(\hat{\mu}_{4,k} - \hat{\sigma}_k^4) + \hat{\delta}_{4,k} + 2\sqrt{(\hat{\mu}_{4,k} - \hat{\sigma}_k^4)\,\hat{\delta}_{4,k}}}{J_k}, \tag{10}$$

where $\hat{\mu}_{4,k}$ and $\hat{\delta}_{4,k}$ are the same as in Step 4 above. Two, the form of the $T_L$ statistic was changed to

$$T_L' = \frac{1}{\sqrt{K}} \sum_{k=1}^{K} \frac{T_{L,k}}{S_k'}. \tag{11}$$

The final DIMTEST statistic

$$T' = \frac{T_L' - T_B'}{\sqrt{2}} \tag{12}$$

has only an approximate asymptotic standard normal distribution as both the number of items and the number of examinees tend to infinity.

4 Correcting the Bias in $T_L$

Generally, the method of bias correction used by Stout (1987) has been successful (shown to usually have Type I error rates near the nominal rate and good power to detect multidimensionality in test data) in simulation and real data studies (see Stout, 1987; Nandakumar & Stout, 1993; Hattie, Krakowski, Rogers, & Swaminathan, 1995). However, there are important situations in which the DIMTEST statistic has an inflated Type I error rate when unidimensionality holds. When the AT1 Subtest contains items with relatively large discriminations and/or similar difficulties when compared to the remaining test items, the AT2 bias correction method often fails to correct enough of the bias in the $T_L$ statistic to produce acceptable Type I error and/or power results. In addition, because the test must be split into three subtests, the DIMTEST procedure can produce unreliable results when the test is relatively short, or when the AT1 Subtest is relatively long. Finally, the AT2 method of bias correction is difficult to implement in other situations, such as for polytomous items or for computer adaptive items.

In order to develop a new bias correction method for the DIMTEST procedure, the source and amount of the bias in the DIMTEST statistic when the test is unidimensional must be determined. Let $Z_{PT}$ be the total score on the PT Subtest and let $U_i$ and $U_l$ be the responses to AT1 items $i$ and $l$ respectively. When the test is unidimensional, the following equality holds.
$$\mathrm{Cov}(U_i, U_l \mid Z_{PT} = k) = \mathrm{Cov}(P_i(\Theta), P_l(\Theta) \mid Z_{PT} = k) \tag{13}$$

where by definition (assuming the distribution of $\Theta$ given $Z_{PT}$ is continuous with density $g(\theta) = f(\theta \mid Z_{PT} = k)$)

$$\mathrm{Cov}(P_i(\Theta), P_l(\Theta) \mid Z_{PT} = k) = \int P_i(\theta) P_l(\theta) g(\theta)\, d\theta - \int P_i(\theta) g(\theta)\, d\theta \int P_l(\theta) g(\theta)\, d\theta. \tag{14}$$

(Proofs of these two equations are given in Section 11.) Clearly, from equations (13) and (14), the theoretical value of the conditional covariance between two AT1 items conditioned on a given PT score depends on three functions: the IRFs of the two AT1 items, $P_i(\theta)$ and $P_l(\theta)$, and the conditional distribution of $\Theta$ given total score on the PT items, $f(\theta \mid Z_{PT} = k)$.

For the AT2 bias correction method, the DIMTEST statistics from both the AT1 and AT2 Subtests are calculated using the same set of examinees with the same PT Subtest scores. Therefore, the conditional distribution of $\Theta$ given total score on the PT Subtest is the same for both assessment subtests. Thus, in order to correct for the bias in the $T_L$ statistic, the $i$th item of the AT2 Subtest needs to have roughly the same IRF as the $i$th item of the AT1 Subtest throughout the range of $\Theta$. However, the AT2 items are chosen only to have, on average, a similar item difficulty distribution as the AT1 items. The $i$th item from AT1 and the $i$th item from AT2 can have different discrimination and guessing parameters, and therefore markedly different item response functions. Thus, the AT2 bias correction method can fail to correct for at least some of the bias present in the DIMTEST statistic $T_L$. This failure will be extreme in certain cases, such as when the AT1 items have large discriminations or similar difficulties when compared to the other test items. In other cases, when the test is short or the AT1 Subtest is long, there are simply not enough test items left after selecting the AT1 Subtest to choose a reasonable AT2 Subtest. These shortcomings can adversely affect the performance of the DIMTEST procedure.

5 New Bias Correction Method

The goal of the new bias correction method is to replace the AT2 Subtest with a method that estimates the three functions in equation (14) under the null hypothesis of unidimensionality. The new bias correction method is based on the nonparametric IRT parametric bootstrap method (Kim, 1994; Habing, 2001). The NIRT parametric bootstrap method uses the nonparametric method of kernel smoothing to estimate each item's response function.

Kernel smoothing was developed by Nadaraya (1964) and Watson (1964) for nonparametric regression estimation and was first applied to nonparametric IRF estimation by Ramsay (1991). In applying kernel smoothing to IRF estimation, the independent variable is an estimate of examinee ability and the dependent variable is the examinee response on the test item. The kernel smoothed estimate of the IRF for a test item at a value $\theta$ is found by essentially calculating a weighted moving average of examinee responses on the item.
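The weighted moving average can be sketched in Python as below. This is only an illustration under stated assumptions: the names `kernel` and `smooth_irf` are hypothetical, the weight function is taken to be the quadratic $1 - x^2$ on $[-1, 1]$, the bandwidth `h` is supplied by the caller, ability estimates are assumed to lie on a (0, 1) scale, and no correction is applied near the ends of that scale.

```python
def kernel(x):
    """Quadratic weight function: 1 - x^2 on [-1, 1], zero elsewhere."""
    return 1.0 - x * x if abs(x) <= 1.0 else 0.0


def smooth_irf(abilities, item_responses, points, h):
    """Kernel-weighted moving average of 0/1 item responses at each point.

    abilities      -- ability estimates, one per examinee
    item_responses -- 0/1 responses to a single item, same order as abilities
    points         -- ability values at which the IRF is estimated
    h              -- bandwidth controlling how far weights reach
    """
    estimates = []
    for t in points:
        weights = [kernel((t - a) / h) for a in abilities]
        total = sum(weights)
        if total == 0.0:
            estimates.append(float("nan"))  # no examinee within one bandwidth
        else:
            weighted = sum(w * u for w, u in zip(weights, item_responses))
            estimates.append(weighted / total)
    return estimates
```

Examinees whose estimated ability is closer to the evaluation point receive larger weights, so a wider `h` borrows information from more examinees and produces a smoother curve.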
Examinee responses to the item are weighted by the kernel function $\kappa$, which is generally taken to be a quadratic or Gaussian function. Kernel smoothing also uses a bandwidth $h$ to control the amount of smoothing or borrowing of information from examinees judged to be close to $\theta$ in ability. When the independent variable (examinee ability) is measured without error, the commonly accepted value of the bandwidth that minimizes the mean square error of the kernel smoothed estimates is proportional to $J^{-0.2}$, where $J$ is the number of observations. Ramsay (1993) found this bandwidth still minimized the mean square error for kernel smoothed estimates of the IRF in a variety of simulated examinee response data.

The kernel smoothed estimates of each item's response function are calculated using a specific distribution for the ability $\theta$. The purpose of the distribution is simply to set a scale for the IRF estimates. Generally, kernel smoothed estimates of the IRFs are calculated for a fixed number of points equally spaced throughout this distribution. Since the actual distribution selected is completely arbitrary, the scale of the kernel smoothed estimates is chosen to be the Uniform distribution with lower bound zero and upper bound one.

Douglas (1997) showed that, under fairly general regularity conditions, the method of kernel smoothing produces consistent item response function estimates as both the number of items and the number of examinees tend to infinity. For a finite length test, empirical results show the kernel smoothing estimation procedure tends to produce a good estimate of the IRF for each item when the value of $\theta$ is towards the middle of the distribution (Ramsay, 1991). However, for low values of $\theta$, the kernel smoothing procedure produces positively biased estimates of $P_i(\theta)$. Likewise, for high values of $\theta$, the kernel smoothing procedure produces negatively biased estimates of $P_i(\theta)$.

Several different methods to correct for this bias near the endpoints of the distribution have been developed. Rice (1984) used a second smoothing with a wider bandwidth near the endpoints of the distribution, while Müller (1991) corrected for this bias by using a different kernel function at the endpoints of the distribution. Habing & Nitcheva (2000) found using Müller's kernel function with a wider bandwidth minimized the mean square error of IRF estimates in a variety of simulated data. Thus, for values of $\theta$ less than one bandwidth away from the endpoints of the distribution, the kernel smoothed estimate of $P_i(\theta)$ is found using this variation of Müller's method.

Once the IRF estimates have been obtained from the kernel smoothing procedure, the NIRT parametric bootstrap method uses these IRF estimates and the specified Uniform(0,1) distribution to generate a second complete examinee data set under the null hypothesis of unidimensionality. If the null hypothesis of unidimensionality holds for the original data, the generated data will have approximately the same IRFs for the AT1 items and approximately the same conditional distribution of $\Theta$ given PT score as the original data. Thus, according to equations (13) and (14), the DIMTEST statistic calculated from this unidimensional generated data set should have approximately the same amount of bias as the DIMTEST statistic calculated from the original data. However, if the null hypothesis is false and if the assessment subtest is well chosen, the difference between the two DIMTEST statistics should be large, reflecting the true multidimensional structure of the original data.

6 DIMTEST without AT2

The new version of DIMTEST uses the NIRT parametric bootstrap method to correct for the bias in the DIMTEST statistic $T_L$. The eight steps in the new DIMTEST procedure are explained below. The differences between the original DIMTEST procedure presented in Section 3 and the new version are highlighted.

Step 1. Choose $m$ items for the AT Subtest. (Since there is no AT2 Subtest, we simply denote the assessment subtest as AT.) The methods available for selecting the AT items are the same as in the original DIMTEST procedure.
When multidimensionality is present in the test data, the goal is still to choose AT so that the items are 1) dimensionally homogeneous and 2) dimensionally distinct from the direction of best measurement of the remaining test items.

Step 2. Place the remaining $(n - m)$ items in the PT Subtest. Define the $k$th examinee subgroup as all examinees whose total score on the PT Subtest, denoted as $Z_{PT}$, is equal to $k$. Define $J_k$ as the number of examinees in subgroup $k$. An examinee subgroup $k$ is eliminated from the DIMTEST statistic calculation if $J_k$ is less than a specified minimum size (typically the minimum size of $J_k$ ranges from 2 to 20). Denote the number of subgroups used in the calculation of the DIMTEST statistic as $K$. Let $U_{ij}^{(k)}$ denote the response of the $j$th examinee from subgroup $k$ to the $i$th AT item. For each examinee subgroup $k$, calculate the following quantities:

$$Y_j^{(k)} = \sum_{i=1}^{m} U_{ij}^{(k)}, \qquad \bar{Y}^{(k)} = \frac{1}{J_k} \sum_{j=1}^{J_k} Y_j^{(k)}, \qquad \hat{p}_i^{(k)} = \frac{1}{J_k} \sum_{j=1}^{J_k} U_{ij}^{(k)},$$

$$\hat{\sigma}_k^2 = \frac{1}{J_k} \sum_{j=1}^{J_k} \left( Y_j^{(k)} - \bar{Y}^{(k)} \right)^2, \quad \text{and} \tag{15}$$

$$\hat{\sigma}_{U,k}^2 = \sum_{i=1}^{m} \hat{p}_i^{(k)} \left( 1 - \hat{p}_i^{(k)} \right). \tag{16}$$

Step 3. For each examinee subgroup $k$, calculate the statistic

$$T_{L,k} = \hat{\sigma}_k^2 - \hat{\sigma}_{U,k}^2 = 2 \sum_{i < l \in AT} \widehat{\mathrm{Cov}}(U_i, U_l \mid Z_{PT} = k) \tag{17}$$

where $\widehat{\mathrm{Cov}}(U_i, U_l \mid Z_{PT} = k)$ is the usual estimate of the covariance between two items conditioned on the set of all examinees with PT Subtest score $k$ (Gao, 1997).

Step 4. The asymptotic variance of $T_{L,k}$, denoted as $S_k^2$, is calculated as

$$S_k^2 = \frac{(\hat{\mu}_{4,k} - \hat{\sigma}_k^4) - \hat{\delta}_{4,k}}{J_k}, \tag{18}$$

where

$$\hat{\mu}_{4,k} = \frac{1}{J_k} \sum_{j=1}^{J_k} \left( Y_j^{(k)} - \bar{Y}^{(k)} \right)^4 \quad \text{and} \quad \hat{\delta}_{4,k} = \sum_{i=1}^{m} \hat{p}_i^{(k)} \left( 1 - \hat{p}_i^{(k)} \right) \left( 1 - 2\hat{p}_i^{(k)} \right)^2.$$

The DIMTEST statistic $T_L$ is given by

$$T_L = \frac{\sum_{k=1}^{K} T_{L,k}}{\sqrt{\sum_{k=1}^{K} S_k^2}}. \tag{19}$$

Remark: The adjustments made by Stout (1987) to the calculation of $S_k^2$ and $T_L$ are no longer needed. The asymptotic form of the DIMTEST statistic with the new bias correction method has acceptable levels of both Type I error and power. See Section 8 for simulation results.

Step 5. For each test item, calculate an estimate of the item's response function. Adapted from Ramsay (1991) and Douglas (1997), the kernel smoothing estimation procedure requires two steps.

Step 5.1. Estimate each examinee's ability for each test item.

Step 5.1.1 Calculate each examinee's total score. Add a Uniform(0,1) variable to each examinee's total score to break any ties. This random breaking of ties makes the kernel smoothing procedure computationally easier to implement.

Step 5.1.2 For each item $i$ and each examinee $j$, subtract the score on the $i$th item from each examinee's adjusted total score obtained from Step 5.1.1. Denote this value as $\hat{T}_{(i,j)}$ for $i = 1, \ldots, n$; $j = 1, \ldots, J$.

Step 5.1.3 For each item $i$, rank the $\hat{T}_{(i,j)}$ values from the smallest to the largest. Divide the rank for each examinee by $J + 1$, where $J$ is the number of examinees. Denote this value as $\hat{\theta}_{(i,j)}$ for $i = 1, \ldots, n$; $j = 1, \ldots, J$.

Step 5.2 Estimate the item response function for each test item.

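Step 5.2.1 Define the evaluation points of the item response function as $\theta^{(l)} = l/41$ for $l = 1, \ldots, 40$, and define the bandwidth of the kernel smoothing function as $h = 0.9 J^{-0.2}$. If $h < \theta^{(l)} < 1 - h$, the IRF estimate $\hat{P}_i(\theta^{(l)})$ for each item $i$ is given by

$$\hat{P}_i(\theta^{(l)}) = \frac{\sum_{j=1}^{J} \kappa\!\left( \frac{\theta^{(l)} - \hat{\theta}_{(i,j)}}{h} \right) U_{ij}}{\sum_{j=1}^{J} \kappa\!\left( \frac{\theta^{(l)} - \hat{\theta}_{(i,j)}}{h} \right)} \tag{20}$$

where $U_{ij}$ is the response of the $j$th examinee to the $i$th item and the kernel function is $\kappa(x) = 1 - x^2$ for $|x| \leq 1$, $\kappa(x) = 0$ otherwise.

Step 5.2.2 If $\theta^{(l)} < h$, define $\rho = \theta^{(l)}/h$ and the new bandwidth as $h^* = h(2 - \rho)$. Define $\gamma = \theta^{(l)}/h^*$ and $\zeta_j = (\hat{\theta}_{(i,j)} - \theta^{(l)})/h^*$. The IRF estimate $\hat{P}_i(\theta^{(l)})$ for each item $i$ is given by

$$\hat{P}_i(\theta^{(l)}) = \frac{\sum_{j=1}^{J} \kappa^*(\gamma, \zeta_j)\, U_{ij}}{\sum_{j=1}^{J} \kappa^*(\gamma, \zeta_j)}, \tag{21}$$

where the kernel function $\kappa^*(\gamma, \zeta_j)$ is defined as

$$\kappa^*(\gamma, \zeta_j) = \frac{(1 + \zeta_j)(\gamma - \zeta_j)}{(1 + \gamma)^3} \left\{ 1 + 5 \left( \frac{1 - \gamma}{1 + \gamma} \right)^2 + 10\, \frac{1 - \gamma}{(1 + \gamma)^2}\, \zeta_j \right\}, \tag{22}$$

for $-1 < \zeta_j < \gamma$, $\kappa^*(\gamma, \zeta_j) = 0$ otherwise.

Step 5.2.3 If $\theta^{(l)} > 1 - h$, define $\rho = 1 - \theta^{(l)}$ and the new bandwidth as $h^* = h(2 - \rho)$. Define $\gamma = (1 - \theta^{(l)})/h^*$ and $\zeta_j = (\hat{\theta}_{(i,j)} - \theta^{(l)})/h^*$. The IRF estimate $\hat{P}_i(\theta^{(l)})$ for each item $i$ is given by equation (21) with the kernel function $\kappa^*(\gamma, \zeta_j)$ given by equation (22).

Step 6. Generate examinee responses to all test items using the estimated IRFs calculated in Step 5. A response pattern for the $j$th simulated examinee, $j = 1, \ldots, J$, is obtained as follows.

Step 6.1. Generate the examinee's ability, denoted as $\theta_j$, from the Uniform(0,1) distribution.

Step 6.2. Set $\theta^{(0)} = 0$ and $\theta^{(41)} = 1$. Define the value of the IRF at the endpoints of the distribution to be $\hat{P}_i(0) = \hat{c}$ and $\hat{P}_i(1) = 1$, where $\hat{c}$ is an estimate of examinee guessing on the test. (Anchoring the kernel smoothed estimates at the endpoints of the distribution resulted in slight improvements in the Type I error of the procedure.) For each item $i$, the value of $P_i(\theta_j) = P(U_{ij} = 1 \mid \theta_j)$ is calculated by linearly interpolating the kernel smoothed estimates obtained in Step 5. Thus, for $l = 1, \ldots, 41$, if $\theta^{(l-1)} < \theta_j \leq \theta^{(l)}$, then

$$P_i(\theta_j) = \left( \frac{\theta_j - \theta^{(l-1)}}{\theta^{(l)} - \theta^{(l-1)}} \right) \hat{P}_i(\theta^{(l)}) + \left[ 1 - \left( \frac{\theta_j - \theta^{(l-1)}}{\theta^{(l)} - \theta^{(l-1)}} \right) \right] \hat{P}_i(\theta^{(l-1)}).$$

Step 6.3. For each item $i$, determine the value of $U_{ij}$ by comparing $P_i(\theta_j)$ to a randomly generated value from the Uniform(0,1) distribution, denoted as $u$. If $P_i(\theta_j) > u$, $U_{ij} = 1$; otherwise, $U_{ij} = 0$.

Step 7. Using this generated data set and the same (AT, PT) partition as the original data, calculate another DIMTEST statistic according to Steps 2 through 4 and denote this statistic as $T_G$.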
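The data generation in Step 6 can be sketched in Python as follows. It is a minimal illustration, not the authors' program: the function names `interpolate_irf` and `generate_responses` are hypothetical, each item's kernel smoothed estimates are assumed to be supplied as a list of 40 values at the points $l/41$, and a single guessing estimate `c_hat` is assumed to be shared by all items.

```python
import math
import random


def interpolate_irf(theta, p_hat, c_hat):
    """Linearly interpolate IRF estimates given at l/(L+1), l = 1..L (Step 6.2).

    The curve is anchored at c_hat for ability 0 and at 1.0 for ability 1.
    """
    n_int = len(p_hat) + 1                      # 41 intervals for 40 points
    values = [c_hat] + list(p_hat) + [1.0]
    # index l of the interval with (l-1)/n_int < theta <= l/n_int
    l = min(max(math.ceil(theta * n_int), 1), n_int)
    w = theta * n_int - (l - 1)                 # position inside the interval
    return w * values[l] + (1.0 - w) * values[l - 1]


def generate_responses(irf_estimates, c_hat, n_examinees, rng):
    """Simulate a unidimensional 0/1 data set from the estimated IRFs."""
    data = []
    for _ in range(n_examinees):
        theta = rng.random()                            # Step 6.1
        row = []
        for p_hat in irf_estimates:
            p = interpolate_irf(theta, p_hat, c_hat)    # Step 6.2
            row.append(1 if p > rng.random() else 0)    # Step 6.3
        data.append(row)
    return data
```

Passing a seeded generator, for example `generate_responses(irfs, 0.2, 1000, random.Random(1))`, makes a simulated data set reproducible; the value 0.2 for the guessing estimate is only a placeholder.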

Step 8. To reduce the random variation in the $T_G$ statistic, Steps 6 and 7 are repeated $N$ times and the average of the $N$ values of $T_G$, denoted as $\bar{T}_G$, is calculated.

Under the assumption of unidimensionality, the final DIMTEST statistic,

$$T = \frac{T_L - \bar{T}_G}{\sqrt{1 + 1/N}}, \qquad (23)$$

has an asymptotically standard normal distribution under certain regularity conditions and assumptions as the number of items and the number of examinees tend to infinity (see Section 7). The null hypothesis of unidimensionality is rejected at asymptotic level $\alpha$ if $T$ is larger than the $100(1-\alpha)$th percentile of the standard normal distribution.

7 Theoretical Justification for the New DIMTEST Procedure

In this section, several theoretical results are presented for the new DIMTEST procedure. These results are, in part, a combination of the results from Stout (1987) on the DIMTEST procedure and Douglas (1997) on the kernel smoothing procedure for estimating IRFs. For a full description of their results and proofs, the reader should refer to Stout (1987) and Douglas (1997).

Define a triangular array of items as $(i, n)$ where $i = (1, \ldots, n)$ and $n = (1, \ldots, \infty)$. Let $P_{i,n}(\theta)$ denote the IRFs for each test of length $n$ for all $\theta \in \Theta$. Let $J_n$ denote the number of examinees on the test of length $n$. Let $J_n^{(k)}$ denote the number of examinees in the $k$th PT examinee subgroup and let $K_n$ denote the number of PT examinee subgroups for each test of length $n$. The following assumptions are made on the model.

A1. The latent vector space is unidimensional as defined in Section 2.

A2. For all pairs $(i, n)$, $P_{i,n}(\theta)$ is continuous, differentiable and strictly increasing in $\theta$ for all $\theta \in \Theta$.

A3. Let the density of $\theta$ be denoted as $f(\theta)$ and denote the derivative of the IRF as $P'_{i,n}(\theta)$ for $\theta \in \Theta$. There exists a compact interval $[a, b]$ such that for some $\epsilon > 0$ and fixed constant $C < \infty$,
• $f(\theta) \ge \epsilon$ for all $\theta \in [a, b]$.
• $P'_{i,n}(\theta) \ge \epsilon$ for all $\theta \in [a, b]$ and all pairs $(i, n)$.
• $P'_{i,n}(\theta) \le C$ for all $\theta \in \Theta$ and all pairs $(i, n)$.
• $P_{i,n}(\theta)(1 - P_{i,n}(\theta)) \ge \epsilon$ for all $\theta \in [a, b]$ and all pairs $(i, n)$.

A4. The number of items in the AT Subtest is fixed as the number of test items tends to infinity.

A5. There exists a positive constant $c$ such that
$$\frac{\min_{1 \le k \le K_n} J_n^{(k)}}{\max_{1 \le k \le K_n} J_n^{(k)}} \ge c \quad \text{for all } n.$$

A6. As $n \to \infty$,
$$\min_{1 \le k \le K_n} J_n^{(k)} \to \infty \quad \text{and} \quad \frac{\max_{1 \le k \le K_n} J_n^{(k)} \, K_n}{n^2} \to 0.$$

A7. There exist constants $M_L$, $M_U$ and $r$ such that for all $n$, $M_L n^{3/2} < J_n < M_U n^r$.
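Before turning to the asymptotic results, the mechanics of Steps 6 through 8 and of the statistic in (23) can be sketched in Python. This is a minimal illustration under stated assumptions, not the DIMTEST software: the function names are hypothetical, the matrix of success probabilities is assumed to be already available, and the values of $T_L$ and $T_G$ are taken as given because Steps 2 through 4 are not reproduced here.

```python
import math
import random
from statistics import NormalDist, mean


def generate_responses(probs, rng):
    """Step 6: convert success probabilities into 0/1 item responses.

    probs[j][i] is the probability that examinee j answers item i correctly.
    A response is 1 when that probability exceeds a Uniform(0,1) draw u.
    """
    return [[1 if p > rng.random() else 0 for p in row] for row in probs]


def corrected_statistic(t_l, t_g_values):
    """Equation (23): T = (T_L - mean of the N T_G values) / sqrt(1 + 1/N)."""
    n_rep = len(t_g_values)
    return (t_l - mean(t_g_values)) / math.sqrt(1.0 + 1.0 / n_rep)


def reject_unidimensionality(t, alpha=0.05):
    """Reject when T exceeds the 100(1 - alpha)th standard normal percentile."""
    return t > NormalDist().inv_cdf(1.0 - alpha)


# Hypothetical values, for illustration only.
t_values = [1.0] * 50          # N = 50 bootstrap statistics T_G
t_final = corrected_statistic(3.0, t_values)
decision = reject_unidimensionality(t_final)
```

With $N = 50$ the denominator is $\sqrt{1.02}$, so averaging over the bootstrap replications leaves the scale of the corrected statistic almost unchanged.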

Result 2.1. (From Stout, 1987) Under assumptions A1 through A7, $T_L \to N(0, 1)$ as $n \to \infty$, $J_n \to \infty$, and $T_G \to N(0, 1)$ as $n \to \infty$, $J_n \to \infty$.

Result 2.2. (From Douglas, 1997) Define the kernel smoothed estimates of the IRFs for the $i$th item of the test with length $n$ as $\hat{P}_{i,n}(\theta)$. Under assumptions A1 through A7,

$$\max_{i} \sup_{\theta \in (a, b)} \left| \hat{P}_{i,n}(\theta) - P_{i,n}(\theta) \right| \to 0 \qquad (24)$$

with probability 1 as $n \to \infty$.

Result 2.3. Assume the DIMTEST statistics $T_L$ and $T_G$ are asymptotically independent. Then under assumptions A1 through A7, the final DIMTEST statistic

$$T = \frac{T_L - \bar{T}_G}{\sqrt{1 + 1/N}}$$

has an asymptotic standard normal distribution as $n \to \infty$, $J_n \to \infty$.

Proof. From Result 2.1, the difference $T_L - \bar{T}_G$ has an asymptotically normal distribution with mean 0 and variance given by

$$\mathrm{Var}(T_L - \bar{T}_G) = \mathrm{Var}(T_L) + \mathrm{Var}(\bar{T}_G) - 2\,\mathrm{Cov}(T_L, \bar{T}_G) = \mathrm{Var}(T_L) + (1/N)\,\mathrm{Var}(T_G) - 2\,\mathrm{Cov}(T_L, \bar{T}_G).$$

By assumption, the statistics $T_L$ and $T_G$ are asymptotically independent, so the covariance term vanishes in the limit. Thus, $\mathrm{Var}(T_L - \bar{T}_G) \to 1 + 1/N$ as $n \to \infty$, $J_n \to \infty$. The DIMTEST statistic $T$ therefore has an asymptotically standard normal distribution as $n \to \infty$, $J_n \to \infty$.

The above proof of the asymptotic distribution for the DIMTEST statistic $T$ assumes the statistics $T_L$ and $T_G$ are asymptotically independent. While this assumption is not proven, it appears to be reasonable given Result 2.2 above. Here is a heuristic argument for the validity of this assumption. The value of the DIMTEST statistic $T_G$ depends on the kernel smoothed IRF estimates obtained from the original data. The DIMTEST statistic $T_L$ is also calculated directly from the original data. Since both DIMTEST statistics $T_L$ and $T_G$ are dependent on the original data, the two statistics are themselves dependent. However, as the test length increases, Result 2.2 (Douglas, 1997) says that the kernel smoothed IRF estimates for each item of the $n$ item test will be within some arbitrary value of the true item response function $P_{i,n}(\theta)$ for all $\theta$ in some interval $(a, b)$. Thus, the amount of error in the kernel smoothed estimates $\hat{P}_{i,n}(\theta)$, which is due to the variations in the original data, will tend to zero as $n \to \infty$. Thus, asymptotically, the kernel smoothed estimates of the IRFs will no longer depend on the original data. Therefore, the DIMTEST statistic $T_G$ will no longer be dependent on the original data and $T_L$ and $T_G$ will be asymptotically independent.

8 Monte-Carlo Simulation Study for DIMTEST

The purpose of this simulation study is to assess the performance of the new version of DIMTEST. The simulation study is split into two parts according to whether the data are unidimensional (measuring Type I error) or multidimensional (measuring power).

8.1 Type I Error Study

Examinee response data were simulated for the Type I error study using the unidimensional three parameter logistic (3PL) model

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp[-1.7 a_i(\theta - b_i)]}, \qquad (25)$$

where $a_i$, $b_i$ and $c_i$ are the item's discrimination, difficulty, and guessing parameters and $\theta$ is the examinee's ability. Examinee abilities were generated from the $N(0, 1)$ distribution while the item parameters used in the study were estimated from three real tests: an Armed Services Vocational Aptitude Battery (ASVAB) Auto Shop test with 25 items (from Mislevy & Bock, 1984), an ACT Math (ACTM) test with 40 items (from Drasgow, 1987) and a SAT Verbal (SATV) test with 80 items (from Lord, 1968).

For all DIMTEST runs, an exploratory linear factor analysis program based on tetrachoric correlations was used to calculate the second factor loadings for each test item (Stout, 1987). The items in the AT Subtest were then chosen based on the size and direction of these factor loadings. The number of items in the AT Subtest was fixed at either $n/4$ or $n/2$ (see Stout, 1987) or allowed to vary between trials based on the observed factor loadings (see Nandakumar & Stout, 1993).

Four levels of the number of examinees were used in the study (750, 1000, 1500, 2000). For each DIMTEST run, the first (250, 350, 500, 750) examinees were used to select the AT Subtest and the remaining (500, 650, 1000, 1250) examinees were used to calculate the DIMTEST statistic. For all DIMTEST runs, the minimum size of $J_k$ used in calculating the DIMTEST statistic was set to two, the estimate of examinee guessing on the test was set to $\hat{c} = 0.17$, and the number of bootstrap samples of the data was set to $N = 50$.

All levels of the design (3 tests, 4 examinee levels, and 3 AT Subtest sizes) were fully crossed, giving 36 different unidimensional models. Each model was simulated 400 times and the rate of rejection per 100 DIMTEST runs for each simulated model is recorded in Table 1. The nominal rate of rejection is $\alpha = 0.05$.
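The data generation for one replication of the Type I error study can be sketched as follows. This is an illustrative sketch only: the item parameters are made-up placeholders rather than the ASVAB, ACTM or SATV estimates, and the function names are hypothetical. The split into a selection sample and a statistic sample mirrors the 750-examinee level described above.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """Equation (25): 3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def simulate_unidimensional(items, n_examinees, seed=0):
    """Draw abilities from N(0, 1), then 0/1 responses for each (a, b, c) item."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        # A response is correct when the 3PL probability exceeds a uniform draw.
        data.append([1 if p_3pl(theta, a, b, c) > rng.random() else 0
                     for (a, b, c) in items])
    return data


# Hypothetical (a, b, c) triples; NOT the parameters used in the study.
items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.2)]
responses = simulate_unidimensional(items, n_examinees=750)

# First 250 examinees select the AT Subtest; the remaining 500 give the statistic.
selection_sample, statistic_sample = responses[:250], responses[250:]
```

Keeping the two samples disjoint means the items chosen for AT are not selected on the same responses that are later used to test them.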
[Table 1: DIMTEST: Type I Error Results. Columns: test (ASVAB, ACTM, SATV) by AT Subtest size ($n/4$, $n/2$, or varying); rows: number of examinees $J$.]

8.2 Power Study

Examinee response data were simulated for the power study using a two dimensional version of the three parameter logistic model,

$$P_i(\boldsymbol{\theta}) = c_i + \frac{1 - c_i}{1 + \exp[-1.7\, \mathbf{a}_i^T(\boldsymbol{\theta} - \mathbf{b}_i)]}, \qquad (26)$$

where the vector $\mathbf{a}_i = (a_{i,1}, a_{i,2})^T$ consists of discrimination parameters for each dimension for item $i$, the vector $\mathbf{b}_i = (b_{i,1}, b_{i,2})^T$ consists of difficulty parameters for each dimension for item $i$, and $c_i$ is the guessing parameter for item $i$. The examinee ability vector $\boldsymbol{\theta} = (\theta_1, \theta_2)$ was generated from the bivariate normal distribution $N(0, 0, 1, 1, \rho)$ where $\rho = 0.3$ or $0.7$.

Three different two dimensional models were used for the item response functions: simple structure, approximate simple structure, and no structure. Let $\beta_i$ denote the angle between the $i$th item's direction of best measurement and the $\theta_1$ axis as shown in Figure 5.

[Figure 5: Angle for Power Simulations. The angle $\beta_i$ of item $i$ is measured from the $\theta_1$ axis in the $(\theta_1, \theta_2)$ plane.]

For the simple structure model, half of the test items were randomly chosen to have $\beta_i = 0^\circ$, while the other half of the items were assigned $\beta_i = 90^\circ$. For the approximate simple structure model, half of the test items were randomly chosen to have $0^\circ \le \beta_i \le 20^\circ$, while the other half of the items were assigned to have $70^\circ \le \beta_i \le 90^\circ$. The particular value of $\beta_i$ for each item was generated from the Uniform(0,20) or from the Uniform(70,90) distribution respectively. For the no structure model, the angle $\beta_i$ for each item was randomly generated from the Uniform(0,90) distribution.

The discrimination parameters $a_{i,1}$ and $a_{i,2}$ varied depending on the two dimensional model used for the item response functions. Let $a_i$ denote the value of the item discrimination parameter for item $i$ from the three tests used in the Type I error study: ASVAB, ACTM, and SATV. The item discrimination parameters $a_{i,1}$ and $a_{i,2}$ were determined for each item by the relationships

$$a_{i,1} = a_i \cos(\beta_i) \quad \text{and} \quad a_{i,2} = a_i \sin(\beta_i).$$

For the simple structure model, the item difficulty parameters $b_{i,1}$ and $b_{i,2}$ were taken from the three unidimensional tests used in the Type I error study. For the other two multidimensional models, the item difficulty parameters $b_{i,1}$ and $b_{i,2}$ were generated from the standard normal distribution and truncated at $-1.5$ and $1.5$. Finally, for all three two dimensional models, the $c_i$ parameters were taken from the three unidimensional tests used in the Type I error study.

Three different methods were used to select the AT Subtest.
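The data-generating model just described can be sketched in Python before turning to the selection methods. The sketch rests on several assumptions that are not in the original: the function names are hypothetical, "half" is taken as the integer part of $n/2$ when the test length is odd, and correlated abilities are drawn with a standard two-step construction from independent normals.

```python
import math
import random


def item_angles(n_items, structure, rng):
    """Assign each item an angle in degrees from the theta_1 axis."""
    order = list(range(n_items))
    rng.shuffle(order)
    first_half = set(order[: n_items // 2])   # randomly chosen half of the items
    angles = []
    for i in range(n_items):
        if structure == "simple":
            angles.append(0.0 if i in first_half else 90.0)
        elif structure == "approximate":
            low, high = (0.0, 20.0) if i in first_half else (70.0, 90.0)
            angles.append(rng.uniform(low, high))
        elif structure == "none":
            angles.append(rng.uniform(0.0, 90.0))
        else:
            raise ValueError(f"unknown structure: {structure}")
    return angles


def discriminations(a, angle_deg):
    """Split a unidimensional discrimination a into (a cos, a sin) components."""
    r = math.radians(angle_deg)
    return a * math.cos(r), a * math.sin(r)


def draw_abilities(rho, rng):
    """One draw from the bivariate normal N(0, 0, 1, 1, rho)."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


def p_2d(theta, a_vec, b_vec, c):
    """Equation (26): two dimensional 3PL response probability."""
    z = sum(a * (t - b) for a, t, b in zip(a_vec, theta, b_vec))
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * z))
```

Because $a_{i,1}^2 + a_{i,2}^2 = a_i^2$, the angle only redistributes an item's discrimination between the two dimensions and leaves its overall magnitude unchanged.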
The first two methods used an exploratory linear factor analysis program to calculate the second factor loadings for each test item. Method 1 allowed the number of items in the AT Subtest to vary between trials based on these observed factor loadings, while Method 2 fixed the number of items in the AT Subtest at $n/2$. Method 3 used a confirmatory approach to determine the power of the DIMTEST procedure when the AT Subtest was correctly specified for each trial. For Method 3, all items with angle $\beta_i$ greater than $45^\circ$ were placed in the AT Subtest.

Four levels of the number of examinees were used in the study (750, 1000, 1500, 2000). For the two methods of selecting the AT Subtest based on the linear factor analysis program, the first (250, 350, 500, 750) examinees were used to determine the AT Subtest and the remaining (500, 650, 1000, 1250) examinees were used to calculate the DIMTEST statistic. For the confirmatory method of selecting the AT Subtest, the first (250, 350, 500, 750) examinees were ignored and the remaining (500, 650, 1000, 1250) examinees were used to calculate the DIMTEST statistic. For all DIMTEST runs, the minimum size of $J_k$ used in calculating the DIMTEST statistic was set to two, the estimate of examinee guessing on the test was set to $\hat{c} = 0.17$, and the number of bootstrap samples of the data was set to $N = 50$.

All levels of the design (3 two-dimensional models, 3 tests, 4 examinee levels, 3 AT selection methods, and 2 examinee ability correlations) were fully crossed, giving 216 different two-dimensional models. Each model was simulated 100 times and the number of rejections of the null hypothesis of unidimensionality is recorded in Table 2 for the simple structure model, in Table 3 for the approximate simple structure model, and in Table 4 for the no structure model. The nominal rate of rejection is $\alpha = 0.05$.

[Table 2: DIMTEST: Power Results, Simple Structure Model. Factors shown: test (ASVAB, ACTM, SATV), ability correlation $\rho$, AT method, and number of examinees $J$.]

[Table 3: DIMTEST: Power Results, Approximate Simple Structure Model. Factors shown: test (ASVAB, ACTM, SATV), ability correlation $\rho$, AT method, and number of examinees $J$.]

[Table 4: DIMTEST: Power Results, No Structure Model. Factors shown: test (ASVAB, ACTM, SATV), ability correlation $\rho$, AT method, and number of examinees $J$.]

9 Discussion of Results

The simulation results from Table 1 show the new DIMTEST procedure has Type I error rates at or slightly below the nominal rate of rejection of $\alpha = 0.05$ in all but two of the 36 simulated unidimensional models. For the three different tests, the average rejection rate was 2.35% for the ASVAB test, 3.44% for the ACTM test, and 4.19% for the SATV test. For the three different AT Subtest sizes, the average rejection rate was 2.46% for $n/4$ items, 4.60% for $n/2$ items and 2.92% when the number of items in the AT Subtest was allowed to vary between trials. Although the average rate of rejection when the AT Subtest size was $n/2$ was near the nominal rate of rejection, this method includes the two cases where the Type I error rate was greater than or equal to 10%. Thus, some caution should be used in interpreting results, particularly for a large test, when the number of items in the AT Subtest is close to one-half of the test items. However, it seems unlikely this type of situation would appear often in practice.

The simulation results in Table 2 and Table 3 show the new DIMTEST procedure has very high power to detect multidimensionality in the test data when the two-dimensional model has simple structure or approximate simple structure with low correlation between examinee abilities. However, when the two-dimensional model has an approximate simple structure with high correlation between examinee abilities or has no structure, the simulation results from Table 3 and Table 4 show the power of the DIMTEST procedure for the confirmatory AT Subtest selection method is significantly (in some cases, markedly) higher than the power results for the other two AT Subtest selection methods. For example, for the no structure model with high correlation between examinee abilities, the average rejection rate for the AT selection methods using linear factor analysis was 22% while the average rejection rate for the confirmatory AT selection method was 86.75%. In these cases, the AT Subtest selection methods using linear factor analysis were not selecting enough dimensionally similar items for AT, thus severely reducing the power of the DIMTEST procedure.

The results of this simulation study suggest replacing the AT Subtest selection method using linear factor analysis with a method that will significantly increase the power of the DIMTEST procedure. A natural choice for a new AT selection method would be an exploratory method based on conditional covariances, such as the procedures HCA/CCPROX or DETECT. Research in this area is ongoing (Froelich & Habing, 2001).
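When reading the simulated rejection rates discussed in this section, it can help to keep their Monte-Carlo precision in mind. The sketch below is not part of the original study: it assumes the replications of a model are independent, so that an observed rejection rate behaves like a binomial proportion, and the function name is hypothetical.

```python
import math


def mc_standard_error(p, n_replications):
    """Binomial standard error of a rejection rate estimated from independent runs."""
    return math.sqrt(p * (1.0 - p) / n_replications)


# Type I error study: 400 replications per model, nominal rate 0.05.
se_type1 = mc_standard_error(0.05, 400)

# Power study: 100 replications per model; p = 0.5 gives the largest possible value.
se_power_max = mc_standard_error(0.5, 100)
```

Under this assumption a single Type I error cell has a standard error of about 1.1 percentage points at the nominal rate, and a single power cell has a standard error of at most 5 percentage points, so averages over many cells are considerably more stable than any individual entry.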

10 Conclusions

The DIMTEST procedure was developed by Stout (1987) to provide a nonparametric hypothesis test of unidimensionality for a test data set. To correct for a serious statistical bias in the DIMTEST statistic, Stout (1987) split the test items into three subtests, the dimensionality Assessment Subtests AT1 and AT2 and the examinee Partitioning Subtest PT. The DIMTEST statistic calculated using the AT2 Subtest was used to estimate the bias in the DIMTEST statistic calculated using the AT1 Subtest. Unfortunately, there were several problems with using a second subtest to correct for this bias. For certain applications, some statistical bias still remained in the procedure, resulting in unacceptable levels of Type I hypothesis testing error. In addition, since the test was split into three subtests, the use of the DIMTEST procedure was precluded on short tests or when the Assessment Subtest contained more than one-third of the test items.

In order to solve the problems associated with the AT2 Subtest, a new bias correction method based on the nonparametric IRT parametric bootstrap method was developed. With this method, the DIMTEST procedure has better statistical performance and provides greater flexibility than the original DIMTEST procedure. The new DIMTEST procedure is also theoretically justified, and a comprehensive simulation study shows the new version of the DIMTEST procedure has a Type I error rate near the nominal rate of $\alpha = 0.05$ and very high power to detect multidimensionality present in the data when the AT Subtest is correctly specified. When the AT Subtest is chosen using an exploratory linear factor analysis program, the power of the DIMTEST procedure to detect multidimensionality in the data is severely limited for certain two-dimensional IRT models. These results indicate the need to develop a new selection method for the AT Subtest that will increase the power of the DIMTEST procedure while keeping the Type I error rate of the procedure near the nominal rate.
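11 Proof of Results

Proof of Equation (13): The following result from elementary probability theory holds for any four random variables $W$, $X$, $Y$, and $V$:

$$\mathrm{Cov}(X, Y \mid W) = E[\mathrm{Cov}(X, Y \mid W, V) \mid W] + \mathrm{Cov}[E(X \mid W, V), E(Y \mid W, V) \mid W].$$

Setting $X = U_i$, $Y = U_l$, $W = Z_{PT}$, and $V = \theta$ gives

$$\mathrm{Cov}(U_i, U_l \mid Z_{PT}) = E[\mathrm{Cov}(U_i, U_l \mid Z_{PT}, \theta) \mid Z_{PT}] + \mathrm{Cov}[E(U_i \mid Z_{PT}, \theta), E(U_l \mid Z_{PT}, \theta) \mid Z_{PT}].$$

By definition, $E(U_i \mid Z_{PT}, \theta) = P_i(\theta)$ and $E(U_l \mid Z_{PT}, \theta) = P_l(\theta)$, and by local independence, $\mathrm{Cov}(U_i, U_l \mid Z_{PT}, \theta) = 0$. Therefore, we have

$$\mathrm{Cov}(U_i, U_l \mid Z_{PT}) = \mathrm{Cov}(P_i(\theta), P_l(\theta) \mid Z_{PT}).$$

Proof of Equation (14): By definition,

$$\begin{aligned}
\mathrm{Cov}(P_i(\theta), P_l(\theta) \mid Z_{PT} = k) &= E(P_i(\theta) P_l(\theta) \mid Z_{PT} = k) - E(P_i(\theta) \mid Z_{PT} = k)\, E(P_l(\theta) \mid Z_{PT} = k) \\
&= E_{\theta \mid Z_{PT} = k}(P_i(\theta) P_l(\theta)) - E_{\theta \mid Z_{PT} = k}(P_i(\theta))\, E_{\theta \mid Z_{PT} = k}(P_l(\theta)) \\
&= \int P_i(\theta) P_l(\theta) g(\theta)\, d\theta - \int P_i(\theta) g(\theta)\, d\theta \int P_l(\theta) g(\theta)\, d\theta,
\end{aligned}$$

where $g(\theta) = f(\theta \mid Z_{PT} = k)$ is the density of $\theta$ given PT score $k$.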

References

Ackerman, T. (1996). Graphical representation of multidimensional item response theory. Applied Psychological Measurement, 20.

Ackerman, T. (1989). Unidimensional IRT calibration of compensatory and noncompensatory multidimensional items. Applied Psychological Measurement, 13.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord and M. R. Novick, Statistical theories of mental test scores. Menlo Park, CA: Addison-Wesley Publishing Company.

Douglas, J. (1997). Joint consistency of nonparametric item characteristic curve and ability estimation. Psychometrika, 62.

Drasgow, F. (1987). A study of the measurement bias of two standardized psychological tests. Journal of Applied Psychology, 72.

Etezadi-Amoli, J., & McDonald, R.P. (1983). A second generation nonlinear factor analysis. Psychometrika, 48.

Froelich, A.G., & Habing, H. (2001). Refinements of the DIMTEST methodology for testing unidimensionality and local independence. Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle.

Froelich, A.G. (2001). Assessing the unidimensionality of polytomous test items: Poly-DIMTEST. Manuscript.

Froelich, A.G. (2000). Assessing the unidimensionality of test items and some asymptotics of parametric item response theory. Unpublished Doctoral Dissertation. University of Illinois at Urbana-Champaign, Department of Statistics.

Gao, F. (1997). DIMTEST enhancements and some parametric IRT asymptotics. Unpublished Doctoral Dissertation. University of Illinois at Urbana-Champaign, Department of Statistics.

Gessaroli, M.E., & De Champlain, A. (1996). Using an approximate chi-square statistic to test the number of dimensions underlying the responses to a set of items. Journal of Educational Measurement, 2,

Habing, B. (2001). Nonparametric regression and the parametric bootstrap for local dependence assessment. Applied Psychological Measurement, 25.

Habing, B., & Nitcheva, D. (2000). Optimization of kernel smoothing for IRF estimation. Technical Report

Hambleton, R.K., & Traub, R.E. (1973). Analysis of empirical data using two logistic latent trait models. British Journal of Mathematical and Statistical Psychology, 26.

Hattie, J., Krakowski, K., Rogers, J., & Swaminathan, H. (1996). An assessment of Stout's index of essential dimensionality. Applied Psychological Measurement, 20.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9.