The 27th Annual Conference of the Japanese Society for Artificial Intelligence, Shu-Chen Cheng Guan-Yu Chen I-Chun Pan

2C4-IOS-3c-6

An estimation method of item difficulty index combined with the particle swarm optimization algorithm for the computerized adaptive testing

Shu-Chen Cheng, Guan-Yu Chen, I-Chun Pan
Department of Computer Science and Information Engineering, Southern Taiwan University of Science and Technology

Computerized adaptive testing provides items that are consistent with the current ability of the testee and decides the difficulty of the next selected item according to the correctness of the testee's answer. It achieves the goals of adaptive learning by dynamically adjusting item difficulty, which accelerates the test process and reduces the number of items in a test. A prerequisite of computerized adaptive testing is that the difficulties of items are estimated correctly. In this study, we describe the parameters of items by an adjusted approach that considers each knowledge block as an independent dimension and gives a value for each dimension of the difficulty. Combined with the particle swarm optimization algorithm, a dynamic item selection strategy is proposed to develop an adaptive testing system. The system likewise assesses abilities along multiple dimensions by giving a value for each dimension of the ability. Through dynamic item selection in computerized adaptive testing, the selected items are highly correlated and more consistent with the current actual abilities of the testees.

1. Introduction

Learning is generally accompanied immediately by testing, and testing is an integral part of the learning process. The results of testing provide feedback to both instructors and learners. An instructor can correct teaching directions based on such feedback, and learners can correct their learning. With the popularity of computers and the Internet, teachers and researchers have started to construct computerized test systems. Computerized adaptive testing (CAT) has been developed to solve the problem that traditional computerized testing gives inappropriate test items.
It provides test items whose difficulties are consistent with the testee's ability, and it creates personalized test content by means of dynamic item selection. For a big test item bank with many test items, a CAT system faces two important challenges. The first is how to estimate the item difficulty index of each test item correctly and quickly. In this study, the testees' abilities are taken into account in the estimation of the item difficulty indices. A wrong answer from a testee of higher ability, or a correct answer from a testee of lower ability, is regarded as an abnormal answer. The concept of the answers abnormal rate is proposed to develop an estimation method of item difficulty indices. The second is how to quickly locate a test item suitable for a learner's ability. This study adopts the knowledge structure concept for multiple ability evaluation of testees and, based on the particle swarm optimization (PSO) algorithm, develops a dynamic item selection strategy.

Through the proposed estimation method based on the answers abnormal rate, the item difficulty indices and the testees' abilities can be estimated mutually. Each test item can also be estimated independently. Therefore, the test item bank can be expanded easily at any time without abundant pre-test samples. The dynamic item selection system then adopts the PSO algorithm as its core and integrates it with the knowledge structure concept. The quick search advantage of the PSO algorithm and the characteristics of the knowledge structure allow the most suitable test items for a testee's ability to be identified quickly, even in a big test item bank.

Contact: Shu-Chen Cheng, Department of Computer Science and Information Engineering, Southern Taiwan University of Science and Technology, No.1, Nantai St., Yongkang Dist., Tainan City 710, Taiwan (R.O.C.), +886-6-2533131#3228, kittyc@mail.stust.edu.tw

2. Literature Reviews

2.1 Computerized Adaptive Testing

Traditional testing is based on classical test theory. It gives all the testees the same test paper with the same test items. However, this is not appropriate for certain types of tests.
For testees with higher or lower abilities, the same test items may be too easy or too difficult. Inappropriate items not only fail to discriminate the testees' abilities accurately, but may even undermine the testees' positive attitude or confidence. Hence, the tests lose their significance (Cheng, Lin, & Huang, 2009; Huang, Lin, & Cheng, 2009).

To remedy this shortcoming of traditional testing, the basic concept of computerized adaptive testing is to select the test item whose difficulty is most consistent with the testee's current ability. When a test item has been completed, the test system assesses the testee's ability immediately, and the next test item is selected according to this ability. In other words, whether the testee's answer is correct affects the difficulty of the next selected test item. Testees with higher abilities need not be given test items that are too easy, and testees with lower abilities need not be given test items that are too difficult. Through this kind of dynamic item selection strategy, computerized adaptive testing can be conducted according to the different testees' abilities. Adaptive testing is thus a test that is created individually for each testee. Therefore, adaptive testing is widely used in different areas (Anatchkova, Saris-Baglama, Kosinski, & Bjorner, 2009; El-Alfy & Abdel-Aal, 2008; Badaracco & Martínez, 2013). Because its dynamic item selection strategy follows the testees' abilities, computerized adaptive testing can not only reduce the number of test items but also assess the testees' abilities accurately. It achieves the goal of individualized learning (Cheng, Lin, & Huang, 2009; Huang, Lin, & Cheng, 2009).

2.2 Particle Swarm Optimization

The basic concept of the particle swarm optimization (PSO) algorithm is derived from a simulation of social group behavior, first used by Eberhart & Kennedy (1995) to develop an optimization method based on characteristics of foraging behavior in fish shoals and bird flocks. The method assumes that a flock of birds forages for food in an area in which there is only one place where the food is. The birds do not know the position of the food, but they know how far they are from it. The simplest and most effective strategy to find the food is therefore to search the area around the bird that is currently closest to the food.

Since the PSO algorithm was formally proposed, it has been widely applied due to its many advantages, including its simple structure, few parameters, fast convergence, and applicability to dynamic environments and to almost all optimization problems. Many relevant studies have applied the PSO algorithm to digital learning: it has been used in adaptive testing systems for conducting item searches (Cheng, Lin, & Huang, 2009; Huang, Lin, & Cheng, 2009), in online learning systems for teaching (Huang, Huang, & Cheng, 2008), in automatic learning partner recommendation (Lin, Huang, & Cheng, 2010), and in blogs for searching recommended posts (Huang, Cheng, & Huang, 2009).

There are two important functions in the PSO algorithm: the fitness function and the velocity function (Mussi, Daolio, & Cagnoni, 2011). The fitness value, calculated by the fitness function, determines whether the position where the particle falls is good or bad.
The velocity function is used to determine the particle's velocity, as in (1), and the particle's new position is then determined by (2).

$$v_{id} \leftarrow w\,v_{id} + C_1 R_1 (Pbest_{id} - x_{id}) + C_2 R_2 (Gbest_{d} - x_{id}) \tag{1}$$

where $v_{id}$ is the velocity of the i-th particle in the d-th dimension; $w$ is the inertia weight; $C_1$ and $C_2$ are the acceleration coefficients, which are usually set to 2; $R_1$ and $R_2$ are random values between 0 and 1; $Pbest_{id}$ is the particle's optimal solution, that is, the position of the optimal solution of the i-th particle in the d-th dimension; $Gbest_{d}$ is the global optimal solution, that is, the position of the current optimal solution among all particles in the d-th dimension.

$$x_{id} \leftarrow x_{id} + v_{id} \tag{2}$$

where $x_{id}$ is the position of the i-th particle in the d-th dimension.

In its initial state, the PSO algorithm randomly generates particles in a search space, where each particle has a different velocity. After using the fitness function to obtain a fitness value for the current position, the algorithm determines whether the current position is good or bad. Each particle memorizes its own optimal fitness value, which is named the particle's optimal solution (Pbest). The particles then exchange this information to identify the position with the optimal fitness value among the positions passed by all particles, which is named the global optimal solution (Gbest). The algorithm then uses (1) to compute a new velocity for each particle and (2) to determine a new position for the particle, and it updates Pbest and Gbest iteratively until the optimal solution is found. Fig. 1 shows a flowchart for the PSO algorithm.

Fig. 1 PSO algorithm flowchart.

2.3 Item Difficulty Index

There are usually two methods to estimate the item difficulty indices. In the first, the item difficulty index of a test item is represented by the percentage of correct answers, as shown in (3).

$$P = \frac{R}{N} \times 100\% \tag{3}$$

where $P$ is the item difficulty index; $N$ is the number of all the testees; $R$ is the number of testees who answered correctly.

The second method works with extreme groups. First, the testees are sorted by their scores.
Then, the groups with the highest and the lowest scores are designated as the higher score group and the lower score group, and the percentage of correct answers is computed for each of these two groups. Finally, the average of the two percentages is taken as the item difficulty index, as shown in (4).

$$P = \frac{P_H + P_L}{2} \tag{4}$$

where $P_H$ is the percentage of correct answers in the higher score group and $P_L$ is the percentage of correct answers in the lower score group. Typically, each of these two extreme groups is taken as 25%, 27%, or 33% of the testees (Haladyna, 1999; Suen, 1990).
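The two classical estimates in (3) and (4) can be illustrated with a minimal Python sketch. The function names, the record layout (pairs of total score and correctness), and the sample data are hypothetical and chosen only for illustration; the results are returned as fractions between 0 and 1 rather than percentages.

```python
def difficulty_by_proportion(correct_flags):
    """Equation (3): fraction of testees who answered the item correctly."""
    if not correct_flags:
        raise ValueError("at least one response is required")
    return sum(correct_flags) / len(correct_flags)


def difficulty_by_extreme_groups(records, fraction=0.27):
    """Equation (4): mean of the correct rates of the top and bottom groups.

    records is a list of (total_score, answered_correctly) pairs;
    fraction is the share of testees placed in each extreme group.
    """
    if not records:
        raise ValueError("at least one record is required")
    ranked = sorted(records, key=lambda rec: rec[0], reverse=True)
    size = max(1, round(len(ranked) * fraction))
    p_high = sum(ok for _, ok in ranked[:size]) / size
    p_low = sum(ok for _, ok in ranked[-size:]) / size
    return (p_high + p_low) / 2


# Hypothetical data: ten testees, total score and correctness on one item.
records = [(10, True), (9, True), (8, True), (7, True), (6, False),
           (5, True), (4, False), (3, False), (2, True), (1, False)]
p_all = difficulty_by_proportion([ok for _, ok in records])
p_groups = difficulty_by_extreme_groups(records)
print(p_all, p_groups)
```

With these ten records the extreme groups contain three testees each, so the second estimate averages a correct rate of 1 in the higher group with 1/3 in the lower group.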

3. Methods

3.1 Item Difficulty Index Estimation

In this study, an estimation method of item difficulty indices based on the answers abnormal rate is proposed with reference to the item response theory (IRT) model. The testees' abilities are taken into account in the estimation of item difficulty indices.

For the testees whose abilities are greater than the item difficulty index, the answers abnormal rate of a test item is represented by the wrong answer rate, as shown in (5).

$$haar_i = \frac{hw_i}{hn_i} \tag{5}$$

where $haar_i$ is the answers abnormal rate of the higher ability group for a test item if its difficulty is the i-th level; $hw_i$ is the number of wrong answers of the higher ability group for that item if its difficulty is the i-th level; $hn_i$ is the number of all the testees of the higher ability group for that item if its difficulty is the i-th level.

For the testees whose abilities are smaller than the item difficulty index, the answers abnormal rate of a test item is represented by the correct answer rate, as shown in (6).

$$laar_i = \frac{lr_i}{ln_i} \tag{6}$$

where $laar_i$ is the answers abnormal rate of the lower ability group for a test item if its difficulty is the i-th level; $lr_i$ is the number of correct answers of the lower ability group for that item if its difficulty is the i-th level; $ln_i$ is the number of all the testees of the lower ability group for that item if its difficulty is the i-th level.

For the testees whose abilities are equal to the item difficulty index, the answers abnormal rate of a test item is represented by the absolute value of the difference between the correct answer rate and 0.5, as shown in (7).

$$eaar_i = \left| \frac{er_i}{en_i} - 0.5 \right| \tag{7}$$

where $eaar_i$ is the answers abnormal rate of the equal ability group for a test item if its difficulty is the i-th level; $er_i$ is the number of correct answers of the equal ability group for that item if its difficulty is the i-th level; $en_i$ is the number of all the testees of the equal ability group for that item if its difficulty is the i-th level.

Adding the three parts (5), (6), and (7) together gives the answers abnormal rate of a test item. It is shown as (8).
$$AAR_i = haar_i + eaar_i + laar_i \tag{8}$$

where $AAR_i$ is the answers abnormal rate for a test item if its difficulty is the i-th level.

The level with the minimum answers abnormal rate is taken as the item difficulty index of the test item, as shown in (9).

$$D = \arg\min_i (AAR_i) \tag{9}$$

where $D$ is the item difficulty index of the test item.

3.2 Fitness Function of PSO

The fitness function of the PSO algorithm determines the strengths and weaknesses of a particle based on its position. In this study, the three evaluation criteria are the test item difficulty, the relevance degree between test item and knowledge block, and the number of times a test item has been selected. These criteria are used to locate the most suitable test item for the learner's ability using the relevant parameters.

The main purpose of (10) is to evaluate the gap between the testee's current ability and the difficulty of the selected item. A smaller gap gives a smaller value, where $0 \le DL_k \le 1$. The optimal situation is that there is no gap between the selected item and the testee's ability, in which case the value is 0.

$$DL_k = \frac{\sum_{i=1}^{m} \left| d_{ik} - D_i \right| r_{ik}}{q_k} \tag{10}$$

where $D_i$ is the testee's current ability in the i-th knowledge block, with $0 < D_i < 1$; $d_{ik}$ is the difficulty of the currently selected item in the i-th knowledge block, with $0 < d_{ik} < 1$; $q_k$ is the number of relevant knowledge blocks for the currently selected item; $m$ is the number of relevant knowledge blocks; $r_{ik}$ is the relevance of the k-th test item and the i-th relevant knowledge block, whose value is 1 if the item is relevant to the block and 0 otherwise.

Equation (11) gives the relevance degree between the test item and the weight value of each knowledge block set by the testee, where $0 \le RD_k \le 1$. A smaller value indicates greater relevance.

$$RD_k = 1 - \frac{\sum_{i=1}^{m} \left( w_i - \frac{U_i}{T} \right) r_{ik}}{q_k} \tag{11}$$

where $U_i$ is the number of items currently selected from the i-th knowledge block and $T$ is the total number of test items expected to be given in the test; $w_i$ is a weight value that can be set separately for each relevant knowledge block, where $0 \le w_i \le 1$.

Equation (12) is the exposure control factor.
It balances the number of times that each test item is selected, with item exposure rate control as its objective. It therefore takes the selection counts of the most frequently selected item in the item bank and of the currently selected item as the major evaluation terms. To maintain the main purpose of adaptive testing, the system selects the items that meet the learner's ability and relevant knowledge; to improve the item selection accuracy, the factor uses $RD_k$ as a constraint condition. A smaller value indicates that the item has previously been selected fewer times, where $0 \le ECF_k \le 1$.

$$ECF_k = \frac{n_k}{\max(n_1, \ldots, n_k, \ldots, n_N)} \, (1 - RD_k) \tag{12}$$

where $n_k$ is the number of times that the currently selected test item has been selected, with $0 \le n_k$; $\max(n_1, \ldots, n_k, \ldots, n_N)$ is the number of times that the most frequently selected item in the item bank has been selected, with $0 \le \max(n_1, \ldots, n_k, \ldots, n_N)$; $N$ is the number of test items in the test item bank.

Adding (10), (11), and (12) gives the fitness function (13) of the PSO algorithm.

$$\text{Minimize } Z(X_k) = DL_k + RD_k + ECF_k \tag{13}$$

where $Z$ is the fitness value and $X_k = \{DL_k, RD_k, ECF_k\}$ is the particle's position vector. In this algorithm, a smaller fitness value denotes a closer fit between item difficulty, relevant knowledge, and the testee's current ability; in other words, it indicates that the test item is more suitable for the testee.
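As a rough illustration of how the three terms combine into the fitness value, the following Python sketch evaluates one item. It assumes the reading of (10) to (12) in which the absolute differences and the terms w_i - U_i/T are averaged over the relevant knowledge blocks, and it treats the exposure factor as 0 while no item has been selected yet. The function name, argument layout, and sample values are hypothetical.

```python
def fitness(difficulty, relevance, ability, weights, selected_per_block,
            total_items, times_selected, max_times_selected):
    """Fitness Z = DL + RD + ECF of one item (smaller is better).

    All list arguments are indexed by knowledge block. relevance holds
    1 for blocks the item is relevant to and 0 otherwise.
    """
    q = sum(relevance)  # number of relevant knowledge blocks
    if q == 0:
        raise ValueError("the item must be relevant to at least one block")

    # Difficulty gap between item and testee, averaged over relevant blocks.
    dl = sum(abs(d - a) * r
             for d, a, r in zip(difficulty, ability, relevance)) / q

    # Relevance degree: block weight minus the share already tested.
    rd = 1 - sum((w - u / total_items) * r
                 for w, u, r in zip(weights, selected_per_block, relevance)) / q

    # Exposure control; assumed to be 0 before any item has been selected.
    if max_times_selected > 0:
        ecf = times_selected / max_times_selected * (1 - rd)
    else:
        ecf = 0.0
    return dl + rd + ecf


# Hypothetical item relevant to the first two of three knowledge blocks.
z = fitness(difficulty=[0.5, 0.7, 0.0], relevance=[1, 1, 0],
            ability=[0.5, 0.5, 0.5], weights=[0.5, 0.3, 0.2],
            selected_per_block=[0, 0, 0], total_items=10,
            times_selected=2, max_times_selected=4)
print(z)
```

In this example DL is 0.1, RD is 0.6, and ECF is 0.2, so the item scores about 0.9. Whether the original system computes the terms in exactly this way cannot be confirmed from the text alone.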

3.3 Velocity Function of PSO

After defining the fitness function, we come to the other important function in the PSO algorithm: the velocity function. Velocity affects the direction and distance of a particle's movement in the search space. It is shown as (14).

$$V_{t+1} = W\,V_t + C_1 R_1 (X_p - X_t) + C_2 R_2 (X_g - X_t) \tag{14}$$

where $V_t$ is the velocity of a particle in the t-th iteration; $V_{t+1}$ is the velocity of a particle in the (t+1)-th iteration; $C_1$ and $C_2$ are learning factors that weight the influence of the individual and global optimal solutions in each iteration; $R_1$, $R_2$, and $W$ all help prevent the search from falling into a local optimal solution; $X_p$ is the particle's individual optimal solution; $X_g$ is the global optimal solution; $X_t$ is the position of a particle in the t-th iteration.

After each particle obtains its velocity according to (14), (15) is used to update the particle's position.

$$X_{t+1} = X_t + V_{t+1} \tag{15}$$

where $X_{t+1}$ updates the parameters of the selected test item and determines the particle's position in the (t+1)-th iteration.

3.4 Dynamic Item Selection Strategy

CAT primarily seeks to allow a system to provide test items in line with the testee's ability when the learner takes an online computerized test, and to determine the difficulty of the next item according to each answer. The system uses this mechanism to achieve computerized adaptive learning objectives and to estimate the user's ability accurately. This study realizes CAT with the knowledge structure concept to interpret the relationship between test item knowledge and the testee's ability. It adopts the PSO algorithm as the core to find the most suitable test item for the testee's current ability from a big test item bank. Fig. 2 shows the flowchart for PSO adaptive testing.

4. Experiments and Results

4.1 Searching Speed

This experiment observes the search time of the computerized adaptive dynamic item selection model proposed in this study over different sizes of test item banks with different parameter settings, and compares it with the search time of a sequential search.
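The two search schemes compared in this experiment can be sketched in Python as follows. This is a simplified illustration, not the system's implementation: it assumes that every item's fitness value has been precomputed into a list, and it moves each particle along a one-dimensional item index using the update rules (14) and (15), whereas the paper's particles carry the position vector {DL, RD, ECF}. The function names, the default parameters, and the synthetic item bank are hypothetical.

```python
import random


def sequential_search(fitness_values):
    """Scan every item and return the index with the smallest fitness."""
    return min(range(len(fitness_values)), key=fitness_values.__getitem__)


def pso_search(fitness_values, particles=10, iterations=20,
               w=0.7, c1=2.0, c2=2.0, seed=0):
    """Return the index of the best item found by a small particle swarm."""
    rng = random.Random(seed)
    last = len(fitness_values) - 1

    def value_at(x):
        # A continuous position is mapped to the nearest item index.
        return fitness_values[int(round(x))]

    pos = [rng.uniform(0, last) for _ in range(particles)]
    vel = [rng.uniform(-1, 1) for _ in range(particles)]
    pbest = list(pos)
    pbest_val = [value_at(x) for x in pos]
    best = min(range(particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[best], pbest_val[best]

    for _ in range(iterations):
        for p in range(particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update (14), then position update (15).
            vel[p] = (w * vel[p] + c1 * r1 * (pbest[p] - pos[p])
                      + c2 * r2 * (gbest - pos[p]))
            pos[p] = min(float(last), max(0.0, pos[p] + vel[p]))
            val = value_at(pos[p])
            if val < pbest_val[p]:
                pbest[p], pbest_val[p] = pos[p], val
            if val < gbest_val:
                gbest, gbest_val = pos[p], val
    return int(round(gbest))


# Synthetic bank of 100 items whose best item sits at index 37.
bank = [abs(i - 37) / 100 for i in range(100)]
exact = sequential_search(bank)
approx = pso_search(bank)
print(exact, bank[approx])
```

The sequential search evaluates every item, so its cost grows with the bank size, while the swarm evaluates at most particles × (iterations + 1) positions regardless of bank size. The price is that the swarm is not guaranteed to return the optimum.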
The experiment is run by giving different numbers of particles and iterations to the computerized adaptive dynamic item selection model. There are 10 and 20 particles; 5, 10, 15 and 20 iterations; and 10 item selection processes are performed. Fig. 3 and Fig. 4 compare the PSO and sequential search times. These two comparison figures show that, when the number of test items is below 1000, the average PSO and sequential search times are not significantly different; when the number of test items exceeds 1000 and approaches 5000, the PSO search is significantly faster than the sequential search. This shows that PSO search is effective for item selection in a big test item bank.

Fig. 2 PSO adaptive testing flowchart.

Fig. 3 Search time comparison of sequential search and PSO search (10 particles with different iterations).

Fig. 4 Search time comparison of sequential search and PSO search (20 particles with different iterations).

4.2 Searching Accuracy

This experiment uses three search methods (PSO, random, and sequential search) to perform item selection over item banks of different sizes and compares the fitness values of the selected items. Each method conducts 10 item selections over 7 different sizes of test item banks. The parameters for this experiment are $w_1 = 0.5$, $w_2 = 0.3$, $w_3 = 0.2$; the testee's abilities in knowledge blocks are all set to 0.5; the optimal fitness value is 0 and the worst fitness value is 2. Fig. 5 shows that all fitness values selected by PSO searches are close to the optimal solution, except for the condition with 5 particles and 5 iterations. Although a sequential search ensures that the optimal solution is always found, the search speed experiments suggest that a sequential search increases the time cost and that the PSO search is significantly faster than the sequential search, even in a big test item bank.

Fig. 5 Comparison of search accuracy rate. p=particles.

Fig. 6 shows that, as the number of particles increases, fewer iterations are required to locate test items with the optimal fitness values. The reason is that the particle distribution range expands when the number of particles increases, so the probability that an optimal solution lies near some particle is greater and the opportunity to locate an optimal solution also increases. During the initial setup, the number of iterations can be reduced correspondingly; i.e., an optimal solution can be found without too many iterations. The experiment also shows that the search is more stable when a search scheme with 10 particles and iterations is applied.

Fig. 6 Changes of fitness values of different number of particles in iterations.

4.3 Item Difficulty Index Estimation

The CAT system used in this study is developed in an online English learning system. That system provides the learning materials for technology English, and CAT is used to assess the learners' outcomes (Cheng, Lin, & Huang, 2009). The item difficulty indices for the item bank in the test system are estimated by the method based on the answers abnormal rate proposed in this study. Combined with the PSO dynamic item selection strategy in that system (Huang, Lin, & Cheng, 2009), a complete and robust CAT system is constructed.

The experiment in this study is conducted through online tests. The item difficulty indices have 9 levels ranging from 0.1 to 0.9, and their initial values are all set to 0.5. The participants are students who take the course named Technology English in the computer- and information-related departments of a university in southern Taiwan. The students' abilities are also divided into 9 levels ranging from 0.1 to 0.9, and their initial values are all set to 0.2. The experimental period is 6 weeks, during which the students practice freely in their after-school time. The item difficulty indices are then estimated automatically every week according to the results of the tests.

Fig. 7 shows the distribution of the item difficulty indices in the test item bank after this experiment. Although the test data are limited and more data are needed for corroboration, the results suggest that once a test item has been selected a sufficient number of times, its estimate becomes stable. Fig. 8 shows the number of adjusted test items at each estimation during this experiment; the number of adjusted test items decreases quickly. Fig. 9 shows the average adjusted levels of item difficulty indices at each estimation. Their values fall between 0.1 and 0.2, which indicates that the average change of the item difficulty indices at each estimation is not large.

Fig. 7 Distribution of Item Difficulty Index.

Fig. 8 Number of Adjusted Items for Each Week.
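The weekly re-estimation relies on the answers abnormal rate of equations (5) to (9). A minimal Python sketch of that rule is given below, under several assumptions that the text does not state: the nine levels 0.1 to 0.9 are encoded as the integers 1 to 9 to avoid floating-point comparisons, a group with no testees contributes 0 to the rate, and ties are resolved in favor of the lowest level. The function names and the response data are hypothetical.

```python
LEVELS = range(1, 10)  # integer codes standing for difficulty 0.1 .. 0.9


def answers_abnormal_rate(level, responses):
    """AAR of one item if its difficulty were `level`, per (5) to (8).

    responses is a list of (ability_level, answered_correctly) pairs.
    """
    higher = [ok for ability, ok in responses if ability > level]
    lower = [ok for ability, ok in responses if ability < level]
    equal = [ok for ability, ok in responses if ability == level]

    # (5): wrong answers from testees whose ability exceeds the level.
    haar = higher.count(False) / len(higher) if higher else 0.0
    # (6): correct answers from testees whose ability is below the level.
    laar = lower.count(True) / len(lower) if lower else 0.0
    # (7): deviation of the equal-ability correct rate from 0.5.
    eaar = abs(equal.count(True) / len(equal) - 0.5) if equal else 0.0
    return haar + eaar + laar


def estimate_difficulty(responses, levels=LEVELS):
    """(9): the level with the minimum answers abnormal rate."""
    return min(levels, key=lambda lv: answers_abnormal_rate(lv, responses))


# Hypothetical responses: weaker testees fail, stronger testees succeed,
# and testees at level 5 answer correctly half of the time.
responses = [(2, False), (3, False), (4, False), (5, True),
             (5, False), (6, True), (7, True), (8, True)]
estimated = estimate_difficulty(responses)
print(estimated)
```

For these responses level 5 has an abnormal rate of 0, so the item is placed at the level standing for 0.5. Because each item is scored from its own responses only, a new item can be added to the bank and estimated without re-estimating the others.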

Fig. 9 Average Adjusted Difficulty Levels for Each Week.

5. Conclusions and Future Works

In the method of item difficulty index estimation based on the answers abnormal rate proposed in this study, the testees' abilities are taken into account in the estimation process. The item difficulty indices and the testees' abilities can be estimated mutually at the same time, which speeds up the stabilization of the item difficulty index estimates. Every test item is deemed to be independent, so its item difficulty index can be estimated independently. The item bank can therefore be expanded easily at any time: new test items and existing ones work together in the system, and their item difficulty indices can be estimated quickly and reasonably.

The dynamic item selection system adopts the PSO algorithm as its core and integrates it with the knowledge structure concept. The quick search advantage of the PSO algorithm and the characteristics of knowledge structures allow the most suitable test items for a testee's ability to be selected quickly, even in a big test item bank.

In this study, we discuss only the estimation of item difficulty indices. However, with respect to the mathematical model of IRT, it would be more suitable to also discuss the item discrimination indices and the item guess indices for choice-type items, since they make the description of item parameters more complete. Therefore, the item discrimination indices and the item guess indices will be studied on the basis of the answers abnormal rate in the future.

The participants of this study are university students in computer- and information-related departments, so the results of the experiments are case dependent. In the future, the experiments can involve participants from departments in different fields to obtain more general results.

Acknowledgements

This research was partially supported by the National Science Council, Taiwan, ROC, under Contract No.: NSC 100-2511-S-218-008-MY3 and NSC 101-2511-S-218-004-MY2.

References

[Anatchkova 2009] Anatchkova, M. D., Saris-Baglama, R. N., Kosinski, M., & Bjorner, J. B., Development and Preliminary Testing of a Computerized Adaptive Assessment of Chronic Pain, The Journal of Pain, 10(9), pp. 932-943, 2009.
[Badaracco 2013] Badaracco, M. & Martínez, L., A fuzzy linguistic algorithm for adaptive test in Intelligent Tutoring System based on competences, Expert Systems with Applications, 40(8), pp. 3073-3086, 2013.
[Cheng 2009] Cheng, S.-C., Lin, Y.-T., & Huang, Y.-M., Dynamic question generation system for web-based testing using particle swarm optimization, Expert Systems with Applications, 36(1), pp. 616-624, 2009.
[El-Alfy 2008] El-Alfy, E.-S. M. & Abdel-Aal, R. E., Construction and analysis of educational tests using abductive machine learning, Computers & Education, 51(1), pp. 1-16, 2008.
[Haladyna 1999] Haladyna, T. M., Developing and validating multiple-choice exam items (2nd ed.), Mahwah, NJ: Lawrence Erlbaum Associates, 1999.
[Huang 2009] Huang, T. C., Cheng, S. C., & Huang, Y. M., A Blog Article Recommendation Generating Mechanism Using an SBACPSO Algorithm, Expert Systems with Applications, 36(7), pp. 10388-10396, 2009.
[Huang 2008] Huang, T. C., Huang, Y. M., & Cheng, S. C., Automatic and Interactive e-learning Auxiliary Material Generation Utilizing Particle Swarm Optimization, Expert Systems with Applications, 35(4), pp. 2113-2122, 2008.
[Huang 2009] Huang, Y.-M., Lin, Y.-T., & Cheng, S.-C., An adaptive testing system for supporting versatile educational assessment, Computers & Education, 52(1), pp. 53-67, 2009.
[Kennedy 1995] Kennedy, J. & Eberhart, R. C., Particle Swarm Optimization, Proceedings of the IEEE International Conference on Neural Networks, 4, pp. 1942-1948, 1995.
[Lin 2010] Lin, Y. T., Huang, Y. M., & Cheng, S. C., An Automatic Group Composition System for Composing Collaborative Learning Groups Using Enhanced Particle Swarm Optimization, Computers & Education, 55(4), pp. 1483-1493, 2010.
[Mussi 2011] Mussi, L., Daolio, F., & Cagnoni, S., Evaluation of parallel particle swarm optimization algorithms within the CUDA architecture, Information Sciences, 181(5), pp. 4642-4657, 2011.
[Suen 1990] Suen, H. K., Principles of exam theories, Hillsdale, NJ: Lawrence Erlbaum Associates, 1990.