Conditional Item-Exposure Control in Adaptive Testing Using Item-Ineligibility Probabilities


Journal of Educational and Behavioral Statistics, December 2007, Vol. 32, No. 4. © 2007 AERA and ASA.

Wim J. van der Linden and Bernard P. Veldkamp, University of Twente

Two conditional versions of the exposure-control method with item-ineligibility constraints for adaptive testing in van der Linden and Veldkamp (2004) are presented. The first version is for unconstrained item selection, the second for item selection with content constraints imposed by the shadow-test approach. In both versions, the exposure rates of the items are controlled using probabilities of item ineligibility given $\theta$ that adapt the exposure rates automatically to a goal value for the items in the pool. In an extensive empirical study with an adaptive version of the Law School Admission Test, the authors show how the method can be used to drive conditional exposure rates below goal values as low as [value lost in transcription]. Obviously, the price to be paid for minimal exposure rates is a decrease in the accuracy of the ability estimates. This trend is illustrated with empirical data.

Keywords: adaptive testing; conditional item-exposure control; item eligibility method; uniform exposure rates

Items in adaptive tests are selected as a solution to an optimization problem in which an objective function is maximized over the item pool. If the test has to meet a set of content specifications, the optimization becomes an instance of a more complicated constrained combinatorial optimization problem. A popular choice for the objective function in these problems is the value of the information function at the current ability estimate for the items in the pool.
Suppose the items have been calibrated using the three-parameter logistic (3-PL) model

$$p_i(\theta) \equiv \Pr\{U_{ij} = 1\} \equiv c_i + (1 - c_i)\,\frac{\exp[a_i(\theta - b_i)]}{1 + \exp[a_i(\theta - b_i)]}, \tag{1}$$

where $\theta \in (-\infty, \infty)$ is a parameter representing the ability of the test taker and $b_i \in (-\infty, \infty)$, $a_i > 0$, and $c_i \in [0, 1]$ are the difficulty, discriminating power, and guessing parameter for item $i = 1, \ldots, N$ in the pool, respectively (Birnbaum, 1968). For this model, the item information function is

$$I_i(\theta) = \frac{[p_i'(\theta)]^2}{p_i(\theta)[1 - p_i(\theta)]}. \tag{2}$$

Let $g = 1, \ldots, n$ denote the items in the adaptive test and $\hat{\theta}^{(g-1)}$ the estimate of $\theta$ after the first $g - 1$ items. When the $g$th item is selected, the objective function that is maximized over the items in the pool is the information in Equation 2 at $\theta = \hat{\theta}^{(g-1)}$.

Because we optimize, the items in the test tend to be picked only from a small subset of the pool. This point is illustrated in the topmost curve in Figure 1, which is composed of the segments of the information functions in Equation 2 that are locally best among the items in a pool for a section from the Law School Admission Test (LSAT). We refer to this curve as the Level 1 dominance curve for the item pool. The small subset of items on which this curve is based (in this case, only 6 items from a pool of 305) dominates the other items in the pool everywhere over the interval; consequently, they are always chosen. The basic message from this curve is that unless special precaution is taken, the majority of the items in the pool are bound to remain inactive.

FIGURE 1. Dominance curves for levels 1, 2, 3, 4, 5, 10, 20, ..., 100 for an item pool from the Law School Admission Test. Note: Each curve is composed of the segments of the information functions of the items that dominate the other items at the $\theta$ values.

Authors' note: The authors are grateful to Wim M. M. Tielen for his computational assistance.

If the adaptive test is administered in a continuous testing program, a set of dominant items is easily memorized. If the stakes for the test takers are high, it is thus possible for a few of them to plot and identify a critical portion of the item pool. Subsequent test takers are then able to familiarize themselves with these items and increase their test scores.

Fortunately, selecting items below the Level 1 dominance set does not necessarily involve a large loss. The second curve in Figure 1 shows the Level 2 dominance curve for the same item pool, that is, the curve consisting of the segments of the information functions for the locally second-best items in the pool. Because the curves for the two levels hardly differ in height (except at the upper end of the scale, where the pool had a few strongly discriminating items), not much accuracy of scoring would be lost if we relaxed the criterion for item selection somewhat and selected items from both subsets.

This fact has led to the idea of probabilistic control of item exposure in adaptive testing. The first probabilistic method was proposed by McBride and Martin (1983). Their method simply consisted of picking the items randomly from the first five levels of dominance along the $\theta$ scale (i.e., the first five curves in Figure 1). The fact that the method is probabilistic is important because it treats all test takers at a given ability level equally fairly, in the sense that each of them has the same probability of getting each item.

A more advanced probabilistic method was proposed by Sympson and Hetter (1985; see also Hetter & Sympson, 1997). This method, which will be discussed in more detail in the next section, is based on a probabilistic experiment that is conducted each time an item is selected. The outcome of this experiment is either the decision to go on and administer the item or to pass and rerun the experiment for the next best item in the pool. The conditional probabilities of administration given the selection of an item are the control parameters used to restrict the item-exposure rates. The values of these parameters are to be set through an iterative process of simulated adaptive test administrations.
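The model and the selection ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the six (a, b, c) triples are hypothetical, and the random pick among the five most informative items only approximates the McBride and Martin idea as it is described in the text.

```python
import math
import random


def p_3pl(theta, a, b, c):
    """Response probability under the 3-PL model (Equation 1)."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (1.0 - c) * logistic


def item_information(theta, a, b, c):
    """Item information (Equation 2): [p'(theta)]^2 / (p * (1 - p))."""
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    p = c + (1.0 - c) * logistic
    # Derivative of p with respect to theta.
    dp = (1.0 - c) * a * logistic * (1.0 - logistic)
    return dp * dp / (p * (1.0 - p))


def rank_by_information(theta, pool):
    """Item indices ordered from most to least informative at theta."""
    return sorted(range(len(pool)),
                  key=lambda i: item_information(theta, *pool[i]),
                  reverse=True)


def select_randomized(theta, pool, levels=5, rng=random):
    """Pick one item at random among the `levels` most informative items."""
    return rng.choice(rank_by_information(theta, pool)[:levels])


# Hypothetical (a, b, c) parameters, for illustration only.
pool = [(1.2, -1.0, 0.20), (0.8, 0.0, 0.20), (1.5, 0.1, 0.15),
        (1.0, 1.0, 0.25), (0.6, -0.5, 0.20), (1.1, 0.4, 0.20)]

if __name__ == "__main__":
    print("Level 1 item at theta = 0:", rank_by_information(0.0, pool)[0])
    print("Randomized pick:", select_randomized(0.0, pool))
```

Setting `levels=1` reduces the randomized rule to pure maximum-information selection, which is the case in which only the Level 1 dominance set is ever used.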
An alternative method of probabilistic exposure control was presented in van der Linden and Veldkamp (2004). The probability experiment in this method differs from that in the Sympson-Hetter method in three respects. First, the experiment is not conducted each time after an item is selected but only once, before a test taker begins the test. Second, the critical parameters in the experiment are not the conditional probabilities of administering an item given that it has been selected but the probabilities of the items being eligible for the test taker. If an item is eligible, it is available for administration to the test taker. If the item is ineligible, it is removed from the pool for the test taker. Third, the probabilities of ineligibility are used in a recurrence relation that allows them to adapt automatically to appropriate levels during testing. The differences between the two methods will become precise when we discuss them in more detail below.

The present article serves three different goals. Our first goal is to generalize the item-ineligibility method, which was developed for use with constrained item selection through the shadow-test method, to any type of item selection in adaptive testing. The generalization makes the basic structure of the method transparent and allows us to highlight some of its features. Our second goal is to formulate a version of the method for conditional item-exposure control given $\theta$. Conditional control is generally accepted as more effective because it reduces the likelihood that test takers of approximately the same ability level detect the subset of items in the pool specific to them (Stocking & Lewis, 1998). As will become clear below, to formulate a conditional version of the method, we have to reconceptualize the adaptive test as one conducted from multiple item pools. Our final goal is to explore the behavior of the new versions of these methods when the exposure rates of the items are driven to their minimum. This appears to be possible provided we are willing to pay the price of a decrease in the accuracy of the ability estimates. The Level 10 through Level 100 curves in Figure 1 explain the decrease in accuracy: If the exposure rates are set lower and lower, we are required to select items from the lower dominance levels in the pool and eventually have to accept considerable loss of accuracy in testing. In addition, if content constraints are imposed on the test, lowering the exposure rates may eventually lead to overconstraining of the item selection, namely, the case where no feasible test is left in the pool. Thus, if the conditional item-exposure rates are chosen to be too low, the test becomes ineffective at some of the ability levels.

Sympson-Hetter Method

To highlight the differences with the ineligibility method in the next sections, we briefly discuss the Sympson-Hetter (hereafter SH) method, which is based on the following two events for each item in the pool:

$S_i$: item $i$ is selected;
$A_i$: item $i$ is administered.

Because an item can never be administered without being selected, it always holds that

$$A_i \subset S_i \tag{3}$$

for all $i$.
Hence, for the conditional exposure rate of item $i$ given $\theta$ (i.e., probability $P(A_i \mid \theta)$), it follows that

$$P(A_i \mid \theta) = P(A_i, S_i \mid \theta) = P(A_i \mid S_i, \theta)\,P(S_i \mid \theta) \tag{4}$$

for all possible values of $\theta$. The SH method is used to force the item-exposure rates of all items in the pool below an upper bound $r^{\max}$. Typically, $r^{\max}$ is chosen to be in the range of .20 to .30. From Equations 3 and 4 it follows that the bound is realized when

$$P(A_i \mid S_i, \theta)\,P(S_i \mid \theta) \le r^{\max} \tag{5}$$

for $i = 1, \ldots, N$.
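The probability experiment behind Equation 4 can be sketched as follows. This is only an illustration: the function name `sh_administer`, the list-based representation of the control parameters, and the choice to return `None` when every candidate fails its experiment are assumptions made for the example, not details given in the article.

```python
import random


def sh_administer(ranked_items, p_admin_given_select, rng=random):
    """Walk down the items from best to worst and run one experiment per item.

    ranked_items: item indices ordered by the selection criterion.
    p_admin_given_select[i]: control parameter P(A_i | S_i, theta) of item i.
    Returns the first item that passes its experiment, or None if none passes
    (the None fallback is an assumption of this sketch).
    """
    for i in ranked_items:
        if rng.random() < p_admin_given_select[i]:
            return i  # the selected item is administered
        # Otherwise pass and rerun the experiment for the next best item.
    return None


if __name__ == "__main__":
    rng = random.Random(7)
    controls = [0.3, 0.8, 1.0]  # hypothetical control parameters
    picks = [sh_administer([0, 1, 2], controls, rng) for _ in range(10000)]
    # Item 0 is always selected first, so its exposure rate is close to 0.3.
    print(picks.count(0) / len(picks))
```

In this toy run the best item is selected with probability 1, so by Equation 4 its exposure rate equals its control parameter.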

The probability of item selection, $P(S_i \mid \theta)$, depends on a variety of factors, including the distribution of the item parameters in the pool, the objective function that is optimized, and the initial item that is chosen. Because each of these factors is fixed by design, the only parameters left in Equation 5 to manipulate the exposure rates are the conditional probabilities $P(A_i \mid S_i, \theta)$.

It is always possible to meet the bound in Equation 5 by setting the probabilities $P(A_i \mid S_i, \theta)$ at low values for all items. However, implicit in Equation 4 is the idea that the exposure rates for the best items in the pool should not be much lower than $r^{\max}$ because we do not want to lose them entirely. In other words, $r^{\max}$ should be viewed as a goal value that has to be approached from below rather than just an upper bound.

Values for $P(A_i \mid S_i, \theta)$ that approach the goal value cannot be found by an analytic method. Sympson and Hetter therefore proposed to find them using an iterative process of computer simulations that has to be continued until admissible exposure rates are found. At each step in this process, a large set of simulated test administrations is replicated at the $\theta$ values for which the exposure rates have to be controlled. At the end of the step, the new conditional exposure rates of the items given these values are estimated and the control parameters are adjusted. The fact that the control parameters are set at the true $\theta$ values, whereas in operational testing we control the exposure rates at estimated $\theta$ values, is not much of a problem provided the set of $\theta$ values is well chosen (Stocking & Lewis, 2000).

Let $t = 1, 2, \ldots$ denote the sets of simulations. The adjustment rule used in the SH method is

$$P^{(t+1)}(A_i \mid S_i, \theta) := \begin{cases} 1 & \text{if } P^{(t)}(S_i \mid \theta) \le r^{\max}, \\ r^{\max} / P^{(t)}(S_i \mid \theta) & \text{if } P^{(t)}(S_i \mid \theta) > r^{\max}, \end{cases} \tag{6}$$

where $i = 1, \ldots, N$.
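A minimal sketch of one application of the adjustment rule in Equation 6 is given below. The function names and the three selection probabilities are hypothetical. The helper `implied_exposure` applies Equation 4 under the assumption that the selection probabilities stay the same after the adjustment, which in practice they do not; that is why the rule has to be iterated through simulations.

```python
def sh_adjust(p_select, r_max):
    """One Sympson-Hetter update of P(A_i | S_i, theta) for all items (Equation 6)."""
    return [1.0 if p <= r_max else r_max / p for p in p_select]


def implied_exposure(p_select, p_admin_given_select):
    """Exposure rates from Equation 4, holding the selection probabilities fixed."""
    return [ps * pa for ps, pa in zip(p_select, p_admin_given_select)]


if __name__ == "__main__":
    p_select = [0.10, 0.50, 0.25]   # hypothetical estimates of P(S_i | theta) at step t
    controls = sh_adjust(p_select, r_max=0.25)
    print(controls)                               # [1.0, 0.5, 1.0]
    print(implied_exposure(p_select, controls))   # [0.1, 0.25, 0.25]
```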
The rule is based on the idea that if at step $t$ an item was selected with a probability smaller than $r^{\max}$, no control is needed. However, if an item was selected with a conditional probability larger than $r^{\max}$, its control parameter should have been set such that $P(A_i \mid \theta) = r^{\max}$. From Equation 4 it follows that this requirement would have been realized if $P^{(t)}(A_i \mid S_i, \theta)$ had been equal to $r^{\max} / P^{(t)}(S_i \mid \theta)$. Hence, the new value of $P^{(t+1)}(A_i \mid S_i, \theta)$ is set at this level. For a more formal treatment of the SH method and some alternative methods based on variations of this adjustment rule, the reader should consult van der Linden (2003).

In practical settings, the use of the SH method has been found to be time consuming. The number of $\theta$ values at which the exposure rates are controlled is usually in the range of 10 to 12. The number of iterations of the adaptive test simulations for one $\theta$ value is generally of the same order. It is therefore not uncommon to have to run 100 to 200 simulations before an admissible set of exposure rates is found. In addition, if some of the items have to be replaced during operational testing because they are compromised or appear to be flawed, the values of the control parameters become invalid and the procedure has to be repeated (Chang & Harris, 2002).

A more fundamental problem with the SH method is the fact that the item-exposure rates do not necessarily converge to values below $r^{\max}$ during the adjustment process. It can be regularly observed that the rates of some of the overexposed items increase rather than decrease after adjustment. Also, rates that were below $r^{\max}$ for some steps may suddenly jump back to a value larger than this target. Because of this behavior, it is necessary to inspect the exposure rates of all items after each step and use personal judgment to decide when to stop.

Conditional Item-Ineligibility Methods

We first discuss the case of adaptive testing without content constraints. The modifications of the method necessary to deal with adaptive testing with content constraints are introduced in a later section.

Testing Without Content Constraints

To formulate the item-ineligibility method we consider the following two events:

$E_i$: item $i$ is eligible;
$A_i$: item $i$ is administered.

If the item is eligible, it remains in the pool during the entire test for the test taker; otherwise it is removed. Unlike the SH method, it is not necessary to allow for an event $S_i$ of selecting item $i$: An item is always administered if it is selected. More formally, it holds that $A_i = S_i$, and we need not consider the latter.

Analogous to Equation 3,

$$A_i \subset E_i \tag{7}$$

for all $i$. We are therefore able to write

$$P(A_i \mid \theta) = P(A_i, E_i \mid \theta) = P(A_i \mid E_i, \theta)\,P(E_i \mid \theta) \tag{8}$$

for all possible values of $\theta$. If we impose $r^{\max}$ as a goal value for the exposure rates $P(A_i \mid \theta)$, we obtain

$$P(A_i \mid \theta) = P(A_i \mid E_i, \theta)\,P(E_i \mid \theta) \le r^{\max}, \tag{9}$$

or

$$P(E_i \mid \theta) \le \frac{r^{\max}}{P(A_i \mid E_i, \theta)}, \tag{10}$$

with $P(A_i \mid E_i, \theta) > 0$. From Equation 7,

$$P(A_i \mid E_i, \theta) = \frac{P(A_i \mid \theta)}{P(E_i \mid \theta)}, \tag{11}$$

with $P(E_i \mid \theta) > 0$. Hence, Equation 10 can be rewritten as

$$P(E_i \mid \theta) \le \frac{r^{\max}}{P(A_i \mid \theta)}\,P(E_i \mid \theta), \tag{12}$$

still with $P(A_i \mid \theta) > 0$.

The basic idea is to conceive of Equation 12 as a recurrence relation. Suppose $j$ test takers have already taken the test, and we want to establish the probabilities of eligibility for test taker $j + 1$. If $r^{\max}$ is our goal value for the exposure rates at selected points $\theta_k$, $k = 1, \ldots, K$, the probabilities of eligibility $P^{(j+1)}(E_i \mid \theta_k)$ can be calculated as

$$P^{(j+1)}(E_i \mid \theta_k) = \min\left\{ \frac{r^{\max}}{P^{(j)}(A_i \mid \theta_k)}\,P^{(j)}(E_i \mid \theta_k),\; 1 \right\}, \tag{13}$$

with $P^{(j)}(A_i \mid \theta_k) > 0$.

The rationale for the recurrence relation in Equation 13 is that it automatically maintains $r^{\max}$ as a goal value for the exposure rates. It is easy to show that

$$\begin{aligned} &\text{if } P^{(j)}(A_i \mid \theta_k) > r^{\max}, \text{ then } P^{(j+1)}(E_i \mid \theta_k) < P^{(j)}(E_i \mid \theta_k); \\ &\text{if } P^{(j)}(A_i \mid \theta_k) = r^{\max}, \text{ then } P^{(j+1)}(E_i \mid \theta_k) = P^{(j)}(E_i \mid \theta_k); \\ &\text{if } P^{(j)}(A_i \mid \theta_k) < r^{\max}, \text{ then } P^{(j+1)}(E_i \mid \theta_k) > P^{(j)}(E_i \mid \theta_k). \end{aligned} \tag{14}$$

Thus, if an exposure rate is larger than the goal value, the probability of eligibility of the item always goes down. As a result, because of Equation 7, the expected exposure rate also goes down. On the other hand, if an exposure rate is below $r^{\max}$, the probability of eligibility of the item goes up.

Practical Implementation

To implement the method for exposure control conditional on a set of values $\theta_k$, $k = 1, \ldots, K$, we have to abandon the idea of an adaptive test from a single item pool. Before a person takes the test, the current probabilities of eligibility are used to decide which items are eligible for each of the values $\theta_k$. The result of these experiments is $K$ different versions of the item pool, one at each $\theta_k$. During the test, the person visits the version of the item pool that is closest to his or her current ability estimate.

To estimate the probabilities of eligibility, we have to record the following two counts:

$a_{ijk}$: number of test takers through $j$ who visited item pool $k$ and took item $i$;
$e_{ijk}$: number of test takers through $j$ who visited item pool $k$ when item $i$ was eligible.

$P^{(j)}(A_i \mid \theta_k)$ and $P^{(j)}(E_i \mid \theta_k)$ can then be estimated as $a_{ijk}/j$ and $e_{ijk}/j$, and the estimated probability of eligibility for test taker $j + 1$ in Equation 13 is obtained as

$$\hat{P}^{(j+1)}(E_i \mid \theta_k) = \min\left\{ \frac{r^{\max}\, e_{ijk}}{a_{ijk}},\; 1 \right\}, \tag{15}$$

with $a_{ijk} > 0$.

The estimates in Equation 15 ignore the differences between the test taker's true and estimated ability. Just as for the SH method, we expect the impact of estimation error on the actual exposure rates to be negligible for all practical purposes (Stocking & Lewis, 2000). The simulation results presented later in this article confirm this expectation.

Two different implementations of the method are possible:

1. The method can be used to control the exposure rates on the fly, that is, without any prior adjustment of the eligibility probabilities.
2. The probabilities can be adjusted prior to operational testing through a computer simulation of administrations of the test.

In the previous study with the unconditional version of the method (van der Linden & Veldkamp, 2004), we were able to report that for a typical goal value of $r^{\max} = .25$, both the probabilities of eligibility and the exposure rates were already stable after 1,000 test administrations.
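The practical implementation described above can be sketched as three small functions: the estimate in Equation 15, the eligibility experiments that produce one version of the pool per $\theta_k$, and the choice of the pool closest to the current ability estimate. All names are hypothetical. Returning 1 while an item has not yet been administered is an assumption of this sketch, modeled on the initialization the article recommends for the constrained case.

```python
import random


def eligibility_prob(a_count, e_count, r_max):
    """Estimated eligibility probability for the next test taker (Equation 15).

    a_count: test takers who visited the pool and took the item.
    e_count: test takers who visited the pool when the item was eligible.
    """
    if a_count <= 0:
        return 1.0  # assumption: keep the item eligible until it has been used
    return min(r_max * e_count / a_count, 1.0)


def draw_eligible_pools(p_elig, rng=random):
    """One Bernoulli experiment per item and theta point.

    p_elig[k][i] is the eligibility probability of item i at theta_k.
    Returns a list of K sets holding the eligible item indices.
    """
    return [{i for i, p in enumerate(row) if rng.random() < p} for row in p_elig]


def closest_pool(theta_hat, theta_points):
    """Index k of the theta point closest to the current ability estimate."""
    return min(range(len(theta_points)),
               key=lambda k: abs(theta_points[k] - theta_hat))


if __name__ == "__main__":
    # An item taken by 400 of the 800 test takers for whom it was eligible.
    print(eligibility_prob(400, 800, r_max=0.25))   # 0.5
    print(closest_pool(0.3, [-2.0, -1.0, 0.0, 1.0, 2.0]))   # 2
```

In the first call the item's estimated exposure rate exceeds the goal value, so its eligibility probability drops below 1, which is the adaptive behavior stated in Equation 14.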
In the empirical study reported later in this article, we minimized the goal values and found that for values close to their minimum (see Equation 25 below), the method should be used in combination with the technique of fading to get stability after the same number of administrations. (The technique of fading is explained in a later section.) Whatever implementation is used, it is always possible to deal on the fly with the replacement of a few items in the operational pool, for instance, because they appear to be compromised.

Testing With Content Constraints

Usually, content constraints are to be imposed on the adaptive test. If so, the shadow-test approach offers an effective implementation. Shadow tests are full-size tests calculated prior to the selection of the items that (a) are optimal at the last ability estimate, (b) meet all content constraints, and (c) include all items already administered to the current person. The next item to be administered is the best item among the free items in the current shadow test. Shadow tests can be easily assembled using the technique of 0-1 integer programming (van der Linden, 2000, 2005; van der Linden & Reese, 1998).

A natural way to implement the control in adaptive testing with content constraints is through the inclusion of ineligibility constraints in the models for the shadow tests. If the decision is that item $i$ is ineligible for the test taker, the following constraint is added to the model:

$$x_i = 0, \tag{16}$$

where $x_i$ is the 0-1 decision variable for item $i$ in the model (that is, item $i$ is selected if $x_i = 1$ and not selected if $x_i = 0$). If item $i$ remains eligible, no constraint is added.

The extension of the test-assembly model for the shadow test with these ineligibility constraints gives rise to two new issues. The first is potential overconstraining of the item selection. In principle, it is possible that a temporary combination of content and ineligibility constraints in the model is unfortunate and no feasible solution is left. Generally, however, for a typical testing program, the number of ineligibility constraints in the model is small, and the likelihood of an infeasible solution can be ignored. In fact, in our empirical studies with the unconditional version of this method, infeasibility never occurred (van der Linden & Veldkamp, 2004). We expect the same to happen with the conditional version of the method except when the exposure rates are driven to their minimum (see the next section). Also, the likelihood of an infeasibility depends entirely on the appropriateness of the item pool for the test.
A method of item-pool assembly that guarantees a balanced distribution of the items in the pool with respect to the content constraints is presented in van der Linden, Ariel, and Veldkamp (2006). For methods of item-pool design that guarantee such distributions, see van der Linden (2005). If infeasibility occurs, a straightforward solution is to remove all ineligibility constraints from the model and use the full pool for item selection. This measure may lead to an occasional extra exposure for some of the items, but the adaptive mechanism in Equation 14 automatically corrects for them.

The second issue has to do with the fact that the use of shadow tests reinvokes the distinction between item selection and administration on which the SH method rests. An item can now be selected for the shadow test but may not be administered because it was dominated by the other free items in the test.

To deal with these new issues, we distinguish the following possible events:

$E_i$: item $i$ is eligible;
$F$: the model for the shadow test with the ineligibility constraints is feasible;
$S_i$: item $i$ is selected in a shadow test;
$A_i$: item $i$ is administered.

It holds for these four events that

$$A_i \subset S_i \subset \{E_i \cup \bar{F}\}, \tag{17}$$

where $\bar{F}$ is the event of an infeasible shadow test. Following the same argument as in van der Linden and Veldkamp (2004, Equations 8 through 13), an upper bound $r^{\max}$ on the conditional exposure rates $P(A_i \mid \theta)$ can be shown to lead to

$$P(E_i \mid \theta) \le 1 - \frac{1}{P(F \mid \theta)} + \frac{r^{\max}\,P(E_i \cup \bar{F} \mid \theta)}{P(A_i \mid \theta)\,P(F \mid \theta)}, \tag{18}$$

with $P(A_i \mid \theta) > 0$ and $P(F \mid \theta) > 0$. This inequality implies the following version of Equation 13 for the case of adaptive testing with content constraints:

$$P^{(j+1)}(E_i \mid \theta_k) = \min\left\{ 1 - \frac{1}{P^{(j)}(F \mid \theta_k)} + \frac{r^{\max}\,P^{(j)}(E_i \cup \bar{F} \mid \theta_k)}{P^{(j)}(A_i \mid \theta_k)\,P^{(j)}(F \mid \theta_k)},\; 1 \right\}, \tag{19}$$

still with $P^{(j)}(A_i \mid \theta_k) > 0$ and $P^{(j)}(F \mid \theta_k) > 0$.

This equation looks more complicated than Equation 13, but the relation between the two becomes clear if we consider the case that the shadow test is always feasible and set $P^{(j)}(F \mid \theta_k) = 1$. It then holds that $P^{(j)}(E_i \cup \bar{F} \mid \theta_k) = P^{(j)}(E_i \mid \theta_k)$, and Equation 13 directly follows from Equation 19. In addition, Equations 13 and 19 have the same adaptive behavior. The only difference between the two relations is a correction of the probability of item eligibility for possible infeasibility in Equation 19. For these and other details, see van der Linden and Veldkamp (2004).

To estimate the right-hand probabilities in Equation 19, we need the following counts:

$n_{jk}$: number of test takers through $j$ who visited item pool $k$;
$a_{ijk}$: number of test takers through $j$ who visited item pool $k$ and took item $i$;
$\varphi_{jk}$: number of test takers through $j$ who visited item pool $k$ when the test was feasible;
$r_{ijk}$: number of test takers through $j$ who visited item pool $k$ when item $i$ was eligible or the test was infeasible.

The left-hand probabilities are then estimated as

$$\hat{P}^{(j+1)}(E_i \mid \theta_k) = \min\left\{ 1 - \frac{n_{jk}}{\varphi_{jk}} + \frac{r^{\max}\, n_{jk}\, r_{ijk}}{a_{ijk}\, \varphi_{jk}},\; 1 \right\}, \tag{20}$$

with $a_{ijk} > 0$ and $\varphi_{jk} > 0$.
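The estimate in Equation 20 is a one-line computation once the four counts are available. The sketch below uses hypothetical argument names (`n`, `a`, `phi`, `r` for the four counts). Two choices are assumptions of the sketch rather than part of the equation: it returns 1 while the conditions on the counts are not met, and it clamps the result at 0 so that the value can always be used as a probability.

```python
def eligibility_prob_constrained(n, a, phi, r, r_max):
    """Estimated eligibility probability with content constraints (Equation 20).

    n:   test takers who visited the pool.
    a:   test takers who visited the pool and took the item.
    phi: test takers who visited the pool when the shadow test was feasible.
    r:   test takers who visited the pool when the item was eligible
         or the shadow test was infeasible.
    """
    if a <= 0 or phi <= 0:
        return 1.0  # conditions of Equation 20 not yet satisfied
    value = 1.0 - n / phi + r_max * n * r / (a * phi)
    # Lower clamp at 0 is a safeguard added in this sketch, not in Equation 20.
    return max(0.0, min(value, 1.0))


if __name__ == "__main__":
    # Always feasible (phi == n): the result reduces to r_max * r / a.
    print(eligibility_prob_constrained(1000, 400, 1000, 800, 0.25))
    # Some infeasible shadow tests: the correction term 1 - n/phi is negative.
    print(eligibility_prob_constrained(1000, 300, 900, 850, 0.25))
```

With `phi == n` the function gives the same value as the unconstrained estimate, mirroring the way Equation 13 follows from Equation 19 when the shadow test is always feasible.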

To begin the test, it is recommended to set $\hat{P}^{(j+1)}(E_i \mid \theta_k) = 1$ for item $i$ until both $a_{ijk} > 0$ and $\varphi_{jk} > 0$. This initialization prevents indeterminate values while the conditions in Equation 20 are not yet satisfied.

Fading

In van der Linden and Veldkamp (2004), it is recommended to update the counts using the technique of fading, which is used in applications of Bayesian networks for updating posterior probabilities. The basic idea underlying this technique is to weight the old data by a fading factor when new data are added. As a result, the effective size of the sample remains fixed at a predetermined level (Jensen, 2001), and the probabilities of eligibility have the same level of stability throughout operational testing. For a demonstration of the effectiveness of fading for updating eligibility probabilities, see van der Linden and Veldkamp (2004).

Suppose we use a fading factor $w$. The number of test takers $n_{jk}$ who visited item pool $k$ in Equation 20 is no longer a direct count but a number updated as

$$n_{(j+1)k} = w\,n_{jk} + 1. \tag{21}$$

The updates of the other counts are analogous. For example, the number of test takers who visited pool $k$ and received item $i$ is updated as

$$a_{i(j+1)k} = \begin{cases} w\,a_{ijk} + 1, & \text{if item } i \text{ was administered to test taker } j; \\ w\,a_{ijk}, & \text{otherwise.} \end{cases} \tag{22}$$

These updates produce estimates of the probabilities based on a moving window with weights close to one for recent test takers but approaching zero for earlier test takers. The method can be shown to have estimates based on an effective sample size equal to $1/(1 - w)$ (Jensen, 2001). In the following empirical study, we used $w = .999$, which amounts to a sample size of 1,000. The use of fading is particularly important when the goal value for the exposure rates is set close to its minimum (see the next section).
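The fading updates in Equations 21 and 22 share one form, which the following sketch captures in a single hypothetical helper. The loop in the demo illustrates, under the assumption that every test taker visits the same pool, how the faded visit count levels off at the effective sample size $1/(1 - w)$.

```python
def fade_update(count, w, increment):
    """Faded count update: w * count + 1 if the event occurred, else w * count."""
    return w * count + (1.0 if increment else 0.0)


def effective_sample_size(w):
    """Effective sample size 1 / (1 - w) implied by fading factor w."""
    return 1.0 / (1.0 - w)


if __name__ == "__main__":
    w = 0.999
    n_visits = 0.0
    for _ in range(20000):          # every simulated test taker visits the pool
        n_visits = fade_update(n_visits, w, True)
    print(round(n_visits, 3), round(effective_sample_size(w), 3))
```

Because the faded count stops growing, a change in the administration count of a late-activated item has the same relative weight as it would have had early in the program.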
Typically, the probabilities of eligibility for the item pool go down to a value smaller than 1 in an order determined by the level of dominance of the items (see Figure 1). When the goal value approaches its minimum, the process has to be continued until the last items in the pool are reached. But when these items become active, the numbers of test takers that have visited the item pools, $n_{jk}$, have already grown large. As a result, Equation 20 adapts only slowly to the changes in the counts of the item administrations, $a_{ijk}$, which had been close to zero thus far. The technique of fading prevents this slower adaptation because the weighted counts in Equation 20 are based on the same effective number of test takers for later items as for earlier ones.

Minimizing Exposure Rates

It is tempting to explore how low $r^{\max}$ can be set before the method breaks down. Unfortunately, although we are able to discuss a useful lower bound on the marginal exposure rates, exact bounds for conditional rates are hard to find.

Marginal Rates

For the marginal exposure rates, the following relation holds for any population of test takers:

$$\sum_{i=1}^{N} P(A_i) = n, \tag{23}$$

where $n$ is the length of the adaptive test. The relation was presented without formal proof in van der Linden (2003). However, a straightforward proof runs as follows: Let $x_{ij}$ be an indicator variable equal to 1 if examinee $j$ takes item $i$ and equal to 0 otherwise. Then, for a population of $J$ test takers, $\sum_{j=1}^{J} x_{ij}/J$ is the empirical marginal exposure rate of item $i$ and $\sum_{i=1}^{N} x_{ij} = n$ is the common length of the test. Hence, the sum of the exposure rates can be written as

$$\begin{aligned} \sum_{i=1}^{N} P(A_i) &= \sum_{i=1}^{N} \mathcal{E}\left( \sum_{j=1}^{J} x_{ij} / J \right) \\ &= \mathcal{E}\left[ \sum_{j=1}^{J} \left( \sum_{i=1}^{N} x_{ij} \right) / J \right] \\ &= n, \end{aligned} \tag{24}$$

where $\mathcal{E}$ denotes the expectation over replicated test administrations. The transition from the second to the third line in Equation 24 is valid because the test length is supposed to be the same for all $j$.
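The identity in Equation 23 can be checked numerically. In the sketch below, random sampling without replacement stands in for adaptive item selection; that substitution, the function name, and the small pool and test sizes are assumptions made for illustration. The identity does not depend on the selection rule, only on the fixed test length.

```python
import random


def empirical_exposure_rates(pool_size, test_length, n_takers, rng):
    """Empirical marginal exposure rates when every test has the same length."""
    counts = [0] * pool_size
    for _ in range(n_takers):
        # Stand-in for adaptive selection: any rule that picks `test_length`
        # distinct items per test taker gives the same sum of rates.
        for i in rng.sample(range(pool_size), test_length):
            counts[i] += 1
    return [c / n_takers for c in counts]


if __name__ == "__main__":
    rates = empirical_exposure_rates(pool_size=20, test_length=5,
                                     n_takers=200, rng=random.Random(0))
    print(sum(rates))   # equals the test length up to floating-point rounding
```

Dividing the test length by the pool size gives the only common value at which all rates can be equal, here 5/20 = .25.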

From a security point of view, it would be ideal if the exposure rates of all items were distributed uniformly with a common low value. From Equation 23, it follows that this type of distribution is only possible for

$$P(A_i) = nN^{-1}, \quad \text{for all } i. \tag{25}$$

To illustrate the capability of the current method to realize a uniform distribution of exposure rates, we simulated adaptive administrations of a test of 25 items from a pool of 305 items. The pool and the test are described in the section on the empirical study that follows. For this combination of test length and pool size, uniform exposure rates are only possible with $P(A_i) = 25/305 = .0814$ for all items. We simulated 10,000 administrations of the test to random test takers at $\theta = -2.0, -1.5, \ldots, 2.0$ for each of the goal values $r^{\max} = .25, .20, .15, .10$, and $.0814$. (To make the results comparable, the range of $\theta$ values was chosen to be identical to that in the main study that follows.)

The resulting exposure rates for the items are shown in Figure 2. The lower the goal value, the lower the maximum exposure rate and the larger the portion of the item pool that became active in the test. The curve for $r^{\max} = .0814$ shows a uniform distribution except for minor disturbances at its extremes due to the probabilistic nature of the method. It is interesting to observe the difference between the curves for $r^{\max} = .10$ and $r^{\max} = .0814$. Although the difference between the two goal values seems negligible, for the former, some 50 items were still inactive, whereas all items became active for the latter.

Conditional Rates

For conditional exposure rates, the version of Equation 23 is more complicated. In this case, we have a different set of rates for each item pool at $\theta_k$. In addition, the number of test takers who visit a pool is no longer fixed but random.
Consequently, to derive an expression for the sum of the exposure rates, we have to weight the sums of the rates for the individual pools by the probability of a visit to the pool. Because the probability of a visit depends on how far the test taker is in the test, we arrive at the following version of Equation 23:

$$\sum_{k=1}^{K} \sum_{g=1}^{n} \sum_{i=1}^{N} P_g(A_i \mid \theta_k)\, P_{gk} = n, \qquad (26)$$

where $g = 1, \ldots, n$ still indexes the items in the adaptive test, $P_{gk}$ is the probability of a test taker visiting the item pool at $\theta_k$ for the selection of item $g$, and $P_g(A_i \mid \theta_k)$ is the probability of administering item $i$ during this visit. Because all probabilities in Equation 26 are dependent on each other, it is impossible to use this relation for deriving lower bounds on the conditional item-exposure rates. This conclusion holds a fortiori for adaptive testing with item content constraints. We therefore took an empirical approach and explored what happened to the conditional exposure rates when the goal values $r^{\max}$ were systematically decreased in a series of simulated test administrations.

FIGURE 2. Estimated marginal exposure rates of the items in the pool for the goal values $r^{\max} = .25, .20, .15, .10$, and [...]. Note: For each curve, the items on the horizontal axis are ordered by their exposure rates.

Empirical Study

An adaptive version of a section of the LSAT was simulated for 10,000 test takers at each of $\theta_k = -2.0, -1.5, \ldots, 2.0$. The test consisted of 25 discrete items, and the pool contained 305 items. The following versions of the test were simulated:

1. Tests with and without content constraints.
2. Tests with conditional item-exposure control at $\theta_k = -2.0, -1.5, \ldots, 2.0$ with the goal values $r^{\max} = 1$ (no control), .20, .15, .10, .05, and .025.

The content constraints were the usual constraints for the paper-and-pencil version of the LSAT section; they dealt with such attributes as item type and content, answer key distribution, word counts, and gender/minority orientation of the items. The constraints were imposed on the adaptive test using the shadow-test approach. The total number of constraints was 30. The last two levels of exposure control were lower than the bound of .0814 calculated from Equation 25 for the marginal exposure control in the preceding section; they were therefore expected to be critical.

The test administrations were simulated with the maximum-information criterion for item selection in Equation 2. The initial ability estimate was set equal to $\hat{\theta}^{(0)} = 0$ for all test takers. The interim estimates of $\theta$ were calculated using the method of expected a posteriori (EAP) estimation with a noninformative prior. During the simulations, we recorded (a) the actual exposure rates and (b) the errors in the ability estimates at each $\theta_k$.

The exposure rates for adaptive testing without the content constraints are shown in Figure 3. The four panels in this figure are for $\theta_k = -1.5, -0.5, 0.5$, and 1.5. (The results at the other values of $\theta_k$ fitted the patterns in these panels exactly and are omitted for space.) Without exposure control, the maximum exposure rates tended to be close to 1.0. The largest rate was .99, observed at $\theta_k = 0.5$. For the conditions with control, the maximum rate decreased systematically with $r^{\max}$. The lowest goal value of $r^{\max} = .025$ had a maximum rate of .04, observed for a few items at each $\theta_k$. The number of items that were still inactive for this goal value varied from 10 to 15 items at $\theta_k = -1.5$ and 1.5 to 60 to 70 items at $\theta_k = -0.5$ and 0.5. These results suggest that for the current item pool, $r^{\max} = .025$ must have been close to the minimum value possible at the two outermost values of $\theta_k$ but that a lower value might have been possible at the $\theta_k$ values in the middle of the scale.

FIGURE 3. Estimated conditional exposure rates of the items given $\theta = -1.5, -0.5, 0.5$, and 2.0 for adaptive testing without the content constraints. Note: Each curve is for a different goal value $r^{\max} = .25, .20, .15, .10, .05$, and .025. For each curve, the items on the horizontal axis are ordered by their exposure rates. The maximum values for the curves decrease with $r^{\max}$.

FIGURE 4. Estimated conditional exposure rates of the items given $\theta = -1.5, -0.5, 0.5$, and 2.0 for adaptive testing with the content constraints. Note: Each curve is for a different goal value $r^{\max} = .25, .20, .15, .10, .05$, and .025. For each curve, the items on the horizontal axis are ordered by their exposure rates. The maximum values for the curves decrease with $r^{\max}$. The only exception is the maximum value for $r^{\max} = .025$, which has moved to the second position for $\theta = -1.5$ and $\theta = 1.5$.
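The conditional exposure rates discussed above are relative frequencies over the simulated administrations at a single ability level. The following Python sketch shows one way such rates could be tabulated. The function name, the data layout (one list of item indices per simulated test taker), and the toy numbers are assumptions made for illustration; they are not taken from the study.

```python
from collections import Counter


def conditional_exposure_rates(administrations, pool_size):
    """Estimate item-exposure rates at a single ability level theta_k.

    administrations: one sequence of administered item indices per
    simulated test taker (hypothetical layout, not the authors' format).
    """
    counts = Counter()
    for items in administrations:
        counts.update(set(items))  # an item counts once per test taker
    n_takers = len(administrations)
    return [counts[i] / n_takers for i in range(pool_size)]


# Toy data: 4 simulated test takers, tests of length 2, pool of 6 items.
toy = [[0, 1], [0, 2], [0, 1], [3, 1]]
rates = conditional_exposure_rates(toy, pool_size=6)

max_rate = max(rates)                    # to be compared with the goal value
n_inactive = sum(r == 0 for r in rates)  # items never administered
total = sum(rates)                       # equals the test length in this toy case
```

In this toy case the rates sum to the test length, which illustrates why, for a fixed-length test and a given pool, the maximum rate cannot be pushed arbitrarily low.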

TABLE 1. Percentage of Feasible Shadow Tests During Adaptive Testing With Content Constraints for $r^{\max} = .025$ (columns: $\theta_k$, % Feasible)

The exposure rates for adaptive testing with the content constraints in Figure 4 show the same general pattern. The only exceptions are the rates for $r^{\max} = .025$ at $\theta_k = -1.5$ and 1.5. The curves for this goal value jumped to the second position (that is, directly after the condition without control). The reason for this jump was that the goal value had become much too low to produce feasible shadow tests at each of the values $\theta_k$. Table 1 shows the percentages of feasible shadow tests for $r^{\max} = .025$ during the simulations. Although infeasibility was hardly a problem at $\theta_k = 0$, for the more extreme values of $\theta_k$ the percentage of feasible shadow tests quickly decreased to 7.1 ($\theta_k = -2.0$) and 16.4 ($\theta_k = 2.0$). Recall that when infeasibility occurs, the items are returned to the pool. For the higher values of $r^{\max}$, this return is an occasional random event, which is automatically corrected for by the adaptive mechanism in Equation 14. However, for $r^{\max} = .025$ we appeared to have dived below what was possible at the majority of the $\theta_k$ values, and the method was no longer able to cope with the number of returns.

It is interesting to note that for the combination of $r^{\max} = .025$ and adaptive testing without the content constraints in Figure 3, only 2 out of the $9 \times 10{,}000$ simulated administrations resulted in an infeasible shadow test. The reason for $r^{\max} = .025$ being too low for adaptive testing with the content constraints was thus not that the number of eligible items in the pools was smaller than the length of the test but that none of the possible combinations of eligible items satisfied the set of content constraints for the test.

We also recorded the errors in the ability estimates during the test. Figures 5 and 6 show the estimated bias and mean square error (MSE) functions calculated from these errors.
As expected, the curves were generally ordered in the values of $r^{\max}$; for larger goal values, the curves were closer to the horizontal axis. It should be observed that for $r^{\max} = .025$, the errors were smaller for adaptive testing with the content constraints than without them. This finding may seem surprising because the addition of content constraints to adaptive testing generally leaves less room for optimizing the item selection and consequently produces larger errors. However, the finding is explained by the fact discussed earlier that $r^{\max} = .025$ was too low to satisfy the content constraints at the majority of the $\theta_k$ values. Because the algorithm frequently had to return all the items to the pool, the exposure rates of the more dominant items went up and the accuracy of the ability estimation improved.
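For readers who want to reproduce curves of this kind, the sketch below computes bias and MSE at one ability level from simulated ability estimates. It assumes the conventional definitions (mean error and mean squared error of the estimates around the true value); the function name and the toy estimates are hypothetical and are not results from the study.

```python
def bias_and_mse(estimates, true_theta):
    """Return (bias, MSE) of ability estimates at one true ability level.

    Assumes bias = mean(estimate - theta) and
    MSE = mean((estimate - theta) ** 2).
    """
    errors = [est - true_theta for est in estimates]
    n = len(errors)
    bias = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    return bias, mse


# Toy final estimates for four simulated test takers with true theta = 0.5.
final_estimates = [0.4, 0.6, 0.5, 0.9]
bias, mse = bias_and_mse(final_estimates, true_theta=0.5)
```

Evaluating such a function at each $\theta_k$ and for each goal value would give one point per curve in plots like those described above.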

FIGURE 5. Estimated conditional bias and MSE functions for adaptive testing without content constraints for $r^{\max} = .25, .20, .15, .10, .05$, and .025. (Panels: bias and MSE, each plotted against $\theta$.) Note: The curves in both panels are ordered in $r^{\max}$. MSE = mean square error.

FIGURE 6. Estimated conditional bias and MSE functions for adaptive testing with content constraints for $r^{\max} = .25, .20, .15, .10, .05$, and .025. (Panels: bias and MSE, each plotted against $\theta$.) Note: The curves in both panels are ordered in $r^{\max}$. MSE = mean square error.

Practical Conclusion

For a high-stakes testing program, we expect item security to be the number one priority. On the other hand, the accuracy of the ability estimates cannot be sacrificed too much. The empirical study in this article was based on only one combination of test and item pool. It is therefore dangerous to generalize. But if we had to choose a goal value for the conditional exposure rates for this combination, our choice would have been $r^{\max} = .15$ or .10. The bias and MSE curves for the two lower levels of .05 and .025 in Figures 5 and 6 are much less favorable. Such levels may only become acceptable if the item pool is made larger and/or the test shorter (see the lower bound on the exposure rates in Equation 25).

References

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Chang, S.-W., & Harris, D. J. (2002, April). Redeveloping the exposure control parameters of CAT items when a pool is modified. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Hetter, R. D., & Sympson, J. B. (1997). Item-exposure in CAT-ASVAB. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association.
Jensen, F. V. (2001). Bayesian networks and decision graphs. New York: Springer.
McBride, J. R., & Martin, J. T. (1983). Reliability and validity of adaptive ability tests in a military setting. In D. J. Weiss (Ed.), New horizons in testing. San Diego, CA: Academic Press.
Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23.
Stocking, M. L., & Lewis, C. (2000). Methods of controlling the exposure of items in CAT. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer.
Sympson, J. B., & Hetter, R. D. (1985, October). Controlling item-exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the Military Testing Association. San Diego, CA: Navy Personnel Research and Development Center.
van der Linden, W. J. (2000). Constrained adaptive testing with shadow tests. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Boston: Kluwer.
van der Linden, W. J. (2003). Some alternatives to Sympson-Hetter item-exposure control in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 28.
van der Linden, W. J. (2005). Linear models for optimal test design. New York: Springer.
van der Linden, W. J., Ariel, A., & Veldkamp, B. P. (2006). Assembling a CAT item pool as a set of linear test forms. Journal of Educational and Behavioral Statistics, 31.
van der Linden, W. J., & Reese, L. M. (1998). A model for optimal constrained adaptive testing. Applied Psychological Measurement, 22.
van der Linden, W. J., & Veldkamp, B. P. (2004). Constraining item exposure in computerized adaptive testing with shadow tests. Journal of Educational and Behavioral Statistics, 29.

Authors

WIM J. VAN DER LINDEN is professor, Department of Research Methodology, Measurement, and Data Analysis, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands; w.j.vanderlinden@utwente.nl. His areas of interest are test theory, applied statistics, and research methods.

BERNARD P. VELDKAMP is assistant professor, Department of Research Methodology, Measurement, and Data Analysis, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands. His areas of interest are educational measurement and statistics.

Manuscript received June 21, 2005
Accepted March 17, [...]


More information
