Course Evaluation Validation using Data Envelopment Analysis
Joseph Sarkis, Clark University
Inshik Seol, Clark University


THE ACCOUNTING EDUCATORS JOURNAL, Volume XX, 2010

Abstract

In this paper we detail a methodology, and some of its variations, for evaluating the validity of a teaching and course evaluation tool. The methodology applies a mathematical programming-based multifactor productivity approach called data envelopment analysis (DEA). The paper shows how DEA can be effectively applied to check the validity of the evaluation instrument using actual accounting student responses for an accounting instructor and course. The results show that DEA can be used as a supplement to existing course evaluation systems by making the interpretation of the course evaluation instrument more meaningful.

Introduction

Performance evaluation criteria are a major concern to most, if not all, accounting (as well as other) faculty at higher education institutions. For example, Hunt et al. (2009) found that accounting faculty are quite concerned with the performance evaluation criteria used for promotion and tenure decisions when they search for new jobs and/or relocation. One of the most widely used evaluation criteria is teaching effectiveness. Read et al. (2001) surveyed 250 administrators of accounting programs in AACSB-accredited institutions and found that teaching effectiveness was weighted between 34 and 50 percent in tenure evaluation and promotion decisions. Thus, the importance of effective and accurate teaching evaluation and performance criteria cannot be overstated for academic faculty.

Among the many different teaching evaluation methods (e.g., self-assessment, peer review, outside consultants), the most popular is evaluation by students. Petersen et al. (2008) argue that most institutions of higher education use student evaluations because they provide administrators with useful information for faculty reappointment, tenure, and promotion processes, as well as for merit and teaching awards.
Specifically, student teaching evaluation is popular because (1) it provides direct feedback to the faculty and (2) it guides faculty toward better pedagogical performance in the classroom (Petersen et al., 2008). Other advantages of student evaluations over other evaluative methods include arguments that students are: (1) customers; (2) not biased, compared to a one-time outside reviewer; (3) an inexpensive source of data; and (4) anonymous (Seiler & Seiler, 2002). Read et al. (2001) found that course evaluations received more than 50 percent of the relative weight in evaluating teaching. Crumbley & Fliedner (2002) also found that accounting administrators would not replace their current course evaluation system with an alternative, even though there are some problems with the current student evaluation system.

One of the problems of student course evaluations, however, is their reliability and validity. Even though some papers show that student evaluations are valid and reliable mechanisms for teaching evaluation (Wachtel, 1998), others do not agree (Nimmer & Stone, 1991). The common finding is that student course evaluations are affected by many other factors, including student characteristics (e.g., expected grade), instructor characteristics (e.g., gender), course characteristics (e.g., class size), and other environmental characteristics (e.g., ambience of the classroom). Crumbley et al. (2001) surveyed over 500 accounting students to determine their perceptions of teaching evaluations and found that students punish instructors who engage in a number of well-known learning/teaching techniques, which encourages instructors to increase student evaluation scores at the expense of the learning process. They conclude that using student data as a surrogate for teaching performance is an illusory performance measurement system and suggest that other measures should be employed.

Most studies, however, focus on outside (e.g., situational) factors, and few papers discuss the validity of student responses to course evaluations or how to assess that validity. This paper tries to fill this gap by introducing a methodology to investigate the validity of a course evaluation instrument. This validity is based on the student responses to input and output evaluation items from a course evaluation questionnaire and on the level of consistency (bias) between these inputs and outputs. The methodology relies on a multi-factor productivity model called Data Envelopment Analysis (DEA). DEA allows for the simultaneous comparison of multiple input and output factors that determine the relative evaluations of a number of students. We apply DEA to an analysis of the evaluation instrument for an accounting instructor. The following section introduces the DEA methodology.
DEA: A SIMPLE GRAPHICAL EXAMPLE¹

To help understand some of the basic foundations of DEA, we provide a simple graphical example. This example is easiest to show using the envelopment side of DEA (as mathematically described in expression (A4) in the appendix). Let us assume that we are using the teaching evaluations of three students, A, B, and C, within an accounting class. We assume there is only one input (e.g., the Instructor Class Preparation rating) and that this evaluation is the same for each student (say, a rating of 3). Figure 1 shows two performance output factors reported by students: the rating for overall teacher effectiveness, along the Y axis, and the rating for knowledge gained from class, along the X axis. Thus, given an equal rating on the input, the higher the values on the Y and X axes, the better the evaluated performance of the instructor (assuming a 0-4 scoring range, with higher numbers representing better evaluations).

Figure 1 shows that Student A evaluated the knowledge gained from the course relatively highly (a rating of 4) but gave a poor rating for overall teacher effectiveness, while Student C gave a high teacher effectiveness rating (4) and a lower knowledge gained rating (1). Student B gave ratings of 1 for teacher effectiveness and 2 for knowledge gained. If linear combinations of the student evaluations are allowed, then the line segment connecting A and C shows the possible outputs for virtual students formed from varying combinations of the efficient student evaluations. Similar line segments can be drawn between A and B or between B and C. However, since segment AC lies beyond segments AB and BC (toward the upper right of the quadrant), only combinations of A and C produce efficient virtual student evaluations, that is, evaluations with the highest output ratings for a given set of input ratings. This line (segment AC) is called the efficient frontier.

Student B's evaluation lies below (within) the efficient frontier and is therefore deemed inefficient. Student B's relative efficiency score is determined by comparing it to a virtual student evaluation V, formed by combining evaluation points A and C. The virtual evaluation V is a linear combination of approximately 64 percent of Student C's evaluation and 36 percent of Student A's evaluation. The efficiency of Student B's evaluation is then the fraction of the input that the virtual evaluation V would need in order to produce Student B's output ratings. This score is found by looking at the line from the origin, O, through B to V: the efficiency of Student B's evaluation is OB/OV, or about 68 percent. Mathematically, this evaluation can be expanded to numerous input and output dimensions, as shown in the Appendix.

¹ This simple graphical example is based on Tim Anderson's baseball example located at

Application of the DEA Model: A Case Example

To show the applicability of the DEA approach to evaluating the validity of a student evaluation instrument, we used a sample of 99 student responses from four accounting classes taught by an individual accounting instructor. The evaluation instrument is composed of (1) twelve evaluation questions that use a Likert scale rating scheme ranging from 1 (poor) to 5 (excellent) and (2) other questions, covering student characteristics as well as open-ended questions. We selected three inputs and two outputs to show the robustness of the solution without overly complicating the analysis. The three inputs are the student evaluation scores of:
1. Instructor's ability to present material clearly.
2. Instructor's preparation for classes.
3. Instructor's overall organization of the course.
The two outputs are the student evaluation scores of:
1. Contribution of this course to the student's acquisition of new knowledge.
2. Overall effectiveness of the instructor.
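The OB/OV ratio in the graphical example can be reproduced with a little coordinate geometry. The sketch below is an illustration under stated assumptions: the function name `radial_efficiency` is hypothetical, points are (knowledge gained, teacher effectiveness) pairs, and the coordinates are hypothetical ratings (Student A's "poor" effectiveness rating is taken as 1), not the exact values plotted in Figure 1. It finds where the ray from the origin through B meets the frontier segment joining two efficient evaluations and returns B's score (OB/OV) together with the weight placed on the first endpoint.

```python
def radial_efficiency(b, p, q):
    """Score of point b against the frontier segment joining p and q.

    Solves t*b = w*p + (1 - w)*q for the stretch factor t and the weight w on p.
    The virtual evaluation is V = t*b, so the efficiency OB/OV equals 1/t.
    """
    (bx, by), (px, py), (qx, qy) = b, p, q
    dx, dy = px - qx, py - qy
    det = by * dx - bx * dy
    if det == 0:
        raise ValueError("the ray through b is parallel to segment p-q")
    t = (qy * dx - qx * dy) / det
    w = (bx * qy - by * qx) / det
    if t <= 0 or not 0 <= w <= 1:
        raise ValueError("the ray through b misses segment p-q")
    return 1 / t, w


# Hypothetical (knowledge gained, teacher effectiveness) ratings.
A, B, C = (4, 1), (2, 1), (1, 4)
efficiency, weight_on_a = radial_efficiency(B, A, C)
print(round(efficiency, 3), round(weight_on_a, 3))  # 0.6 0.778
```

With these hypothetical ratings the virtual evaluation V sits at about 78 percent A and 22 percent C, and B scores 0.6; different plotted values move both numbers, but the computation is the same.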
The selection is based on previous research showing that students consider preparation, classroom presentation, and organization, among others, to be important dimensions of their learning (O'Toole et al., 2000; Tang, 1997). These input and output factors were also selected because most of the students fully responded to them (students with "not applicable" responses for our selected inputs and outputs were not included in the DEA evaluation), which gave us a more complete data set for evaluation. The raw data and the efficiency scores for each individual are shown in Table 1. To better describe the efficiency score results and analyze them for validity and bias, we have graphed them in Figure 2.

If the results across all students were consistent (e.g., all ratings of 3 on the inputs and all ratings of 3 on the outputs), then all the students would have an equal relative efficiency score of 1.00, and the graph would be a straight line across at 1.00. Yet if any one student is inconsistent, with lower input ratings and higher output ratings (representing a more efficient solution), then that student's evaluation would receive a score of 1.00 and the other scores would probably move down in relative terms. If it were only one student, the remaining students might still be relatively consistent, and most of the results would still cluster around a certain efficiency score or a couple of scores, depending on the number of input and output factors. If there is more dispersion and many levels of DEA scores, then we could argue that there is less validity in the rankings.
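The consistency argument can be checked in the simplest possible setting. Assuming a single input rating and a single output rating per student (a simplification of the three-input, two-output case used here), the CCR score reduces to each student's output/input ratio divided by the largest such ratio in the group. The function name `ccr_scores_1x1` and the ratings below are hypothetical.

```python
def ccr_scores_1x1(ratings):
    """CCR scores for (input, output) rating pairs, one pair per student.

    With one input and one output, maximizing the weighted ratio subject to
    every student's ratio being at most 1 gives output/input divided by the
    largest output/input ratio in the group.
    """
    ratios = [out / inp for inp, out in ratings]
    best = max(ratios)
    return [r / best for r in ratios]


consistent = [(3, 3)] * 5
print(ccr_scores_1x1(consistent))  # [1.0, 1.0, 1.0, 1.0, 1.0]

one_inconsistent = [(3, 3)] * 4 + [(2, 4)]
print(ccr_scores_1x1(one_inconsistent))  # [0.5, 0.5, 0.5, 0.5, 1.0]
```

Identical ratings put every student on the frontier. A single student with a lower input and a higher output becomes the only efficient unit and pulls everyone else down to 0.5, yet the remaining scores stay at one level because those students still agree with one another.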

Note that high efficiency scores do not equate to bias. Bias (or inconsistency) refers to the variance in the efficiency scores, not the actual efficiency scores. Low inputs with high outputs lead to the most efficient scores, and in some cases high efficiency scores are outliers. If the rest of the sample consistently applies its individual weighting schemes, the other decision making units will have lower but relatively similar efficiency scores. Such bias may arise from one of two types of occurrence. The first occurs when students give high input ratings and lower output ratings (meaning a smaller relative efficiency DEA score, further from a value of 1.00). The second occurs when students give lower input ratings and higher output ratings (meaning a higher relative efficiency score, closer to a value of 1.00). Thus, larger dispersion in efficiency scores would represent bias in the instrument.

We have two ways of observing bias with DEA-based data. The first is to check whether the efficiency scores are relatively consistent by determining the heterogeneity of the data using cluster analysis (or the levels of lines on a graph); the second is to investigate the dispersion of the data. As mentioned, clustering and cluster analysis of the efficiency scores may provide insight into the validity and bias of the student evaluations. When a larger number of clusters exists, each with a non-trivial set of students assigned to it, it is very likely that there are biases and less validity in the responses. Looking at the size and number of clusters together with the amount of diffusion in the data will give a better idea of the overall validity of the student responses based on the input and output factors. These clusters may also be evaluated statistically to determine whether their differences are significant. A cluster analysis was carried out on our data set using SPSS software and its TwoStep Cluster Analysis module. We used a Euclidean distance measure and two separate clustering criteria (Akaike's Information Criterion and Schwarz's Bayesian Criterion).
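The TwoStep procedure in SPSS chooses the number of clusters automatically using AIC or BIC, which the sketch below does not attempt. As a simpler stand-in, it runs one-dimensional k-means (Euclidean distance) on efficiency scores with a fixed number of clusters and reports the percentage of students in each cluster, in the spirit of Table 2(a). The function names `assign` and `kmeans_1d`, the deterministic initialization, and the scores are all hypothetical.

```python
def assign(scores, centroids):
    """Group each score with its nearest centroid (Euclidean distance in 1-D)."""
    groups = [[] for _ in centroids]
    for x in scores:
        nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        groups[nearest].append(x)
    return groups


def kmeans_1d(scores, k, iters=100):
    """Lloyd's k-means on scalar efficiency scores with a fixed k."""
    xs = sorted(scores)
    # Deterministic start: spread the initial centroids across the sorted scores.
    centroids = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = assign(scores, centroids)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(scores, centroids)


# Hypothetical CCR scores for ten students.
scores = [1.0, 1.0, 0.80, 0.81, 0.79, 0.80, 0.80, 0.60, 0.61, 0.59]
centroids, groups = kmeans_1d(scores, k=3)
shares = [round(100 * len(g) / len(scores)) for g in groups]
print(shares)  # [30, 50, 20]
```

Fixing k keeps the sketch short; choosing k with an information criterion, as TwoStep does, would require fitting several values of k and comparing them.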
For dispersion of the data we used the simple standard deviation and coefficient of variation statistics. The results show that there are three clusters and that the coefficient of variation was about …; since perfectly unbiased results would have only one cluster and a coefficient of variation equal to zero, we clearly have some bias in the results. Determining what is acceptable and unacceptable bias (the egregiousness of the bias) will typically be judgmental. Thus, let us look at the characteristics of the clusters. Table 2(a) shows that 6 percent, 62 percent, and 32 percent of the students fall in the respective clusters. The raw data show that the first cluster comprises respondents who undervalued the inputs and/or overvalued the outputs of the evaluation (e.g., Student 3, who assigned a value of 3 to one of the inputs while assigning a value of 4 to the outputs). This was the smallest cluster, with only 6 percent having inconsistent evaluations that make the instructor and/or class seem better than expected. Within the second cluster, we see less bias and more consistency in the data (e.g., Students 2 and 4, who rated all of our factors 5). This is the largest cluster, which signifies that a large percentage of the students were consistent in their appraisals. The final cluster comprises those students who put higher values on the input elements and lower values on the output elements. An example is Student 1, who rated the inputs as excellent but the outputs as good (ratings of 5 and 4, respectively). Given these observations, we do not see the extreme weightings and differences that would indicate definite biases, one way or the other. The dispersion of the data and the number of clusters do not necessarily point to great disagreement. Thus, we can state with some confidence that the instrument for this sample of classes did not contain much bias. Table 2 summarizes the results.

SUMMARY AND DISCUSSION

Course evaluation by students is the most widely used tool for measuring the teaching effectiveness of college instructors. However, many previous researchers have questioned the validity of this evaluation approach and its instruments. Our paper introduces a multi-factor productivity model called Data Envelopment Analysis (DEA) and shows how DEA can be applied to check the validity of course evaluation instruments. Using actual data, we show how the results can be interpreted and further evaluated using standard statistical tools.

DEA, however, has some limitations. First, DEA techniques tend to use extreme weightings to make a student's relative efficiency score as large as possible. To overcome this limitation, weight restrictions or ranges of weight restrictions (assurance regions) may be introduced by decision makers and analysts (Thompson et al., 1990; Wong & Beasley, 1990). These weight restrictions may be used to reflect the relative importance of each input and output factor. Second, the interpretation of the results can be judgmental: DEA does not give a single number that tells us whether there is a true bias. One way to overcome this limitation is to compare the results to benchmark scores that depend on characteristics such as the type of course, the workload of the course, the grade distribution of the course, gender, and the level of the course (Morgan et al., 2003; Whitworth et al., 2002). Therefore, collecting additional data or examining the impact of exogenous factors will add further insight. Another limitation of the approach developed here is the possible bias and loss of validity associated with the halo effect. The technique does not capture the possibility of students responding positively based on output perceptions and adjusting their input ratings accordingly.
Some of this, however, has been mitigated by mixing question types (positive and negative) in the survey questionnaire. For example, instead of a question stating "This instructor was effective" (on a 1 to 4 scale), it would be stated in the negative, "This instructor was ineffective," to help prevent some of this systemic bias. By applying DEA, administrators and faculty can reduce the discrepancy between their views on using course evaluations as a means of measuring teaching effectiveness. For example, Morgan et al. (2003) found that accounting administrators believe student evaluations measure teaching effectiveness to a greater degree than faculty do, while faculty members believe their personality is the primary determinant of ratings on student evaluations. Even though DEA cannot solve all the problems related to course evaluation, the method can be used as a supplement to the existing system to interpret possible problems regarding the validity of evaluation instruments and/or to make the interpretation of the course evaluation instrument more meaningful. As suggested by others, however, best practice is that student evaluations should not be the only measure of faculty teaching; multiple criteria should be used rather than a single mechanism.

REFERENCES

Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimation of technical and scale efficiencies in data envelopment analysis. Management Science, 30(9).
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2.
Crumbley, D. L., Henry, B. K., & Kratchman, S. H. (2001). Students' perceptions of the evaluation of college teaching. Quality Assurance in Education, 9(4).
Crumbley, D. L., & Fliedner, E. (2002). Accounting administrators' perceptions of student evaluation of teaching (SET) information. Quality Assurance in Education, 10(4).
Hunt, S. C., Eaton, T. V., & Reinstein, A. (2009). Accounting faculty job search in a seller's market. Issues in Accounting Education, 24(2).
Morgan, D. A., Sneed, J., & Swinney, L. (2003). Are student evaluations a valid measure of teaching effectiveness: Perceptions of accounting faculty members and administrators. Management Research News, 26(7).
Nimmer, J. G., & Stone, E. F. (1991). Effects of grading practices and time of rating on student ratings of faculty performance and student learning. Research in Higher Education, 32 (April).
O'Toole, D. M., Spinelli, M. A., & Wetzel, J. N. (2000). The important learning dimensions in the school of business: A survey of students and faculty. Journal of Education for Business, 75(6).
Petersen, R. L., Berenson, M. L., Misra, R. B., & Radosevich, V. J. (2008). An evaluation of factors regarding students' assessment of faculty in a business school. Decision Sciences Journal of Innovative Education, 6(2).
Read, W. J., Rama, D. V., & Raghunandan, K. (2001). The relationship between student evaluations of teaching and faculty evaluations. Journal of Education for Business, 76(4).
Seiler, V. L., & Seiler, M. J. (2002). Professors who make the grade. Review of Business, 23(2).
Tang, T. L. P. (1997). Teaching evaluation at a public institution of higher education: Factors related to the overall teaching effectiveness. Public Personnel Management, 25.
Thompson, R. G., Langemeier, L. N., Lee, C. T., & Thrall, R. M. (1990). The role of multiplier bounds in efficiency analysis with application to Kansas farming. Journal of Econometrics, 46(1/2).
Wachtel, H. K. (1998). Student evaluation of college teaching effectiveness: A brief review. Assessment and Evaluation in Higher Education, 23(2).
Whitworth, J. E., Price, B. A., & Randall, C. H. (2002). Factors that affect college of business student opinion of teaching and learning. Journal of Education for Business, 77(5).
Wong, Y-H. B., & Beasley, J. E. (1990). Restricting weight flexibility in data envelopment analysis. Journal of the Operational Research Society, 41(9).

Appendix: Mathematical Formulation for the Basic DEA Model

A DEA productivity model for a given decision-making unit (DMU) can use ratios based on the amount of outputs (ratings) per given set of inputs (ratings). The definition of a DMU can vary greatly, from individuals (students) to classes to schools, as long as the unit can be modeled with input and output values. DEA allows for the simultaneous analysis of multiple inputs and multiple outputs, a multi-factor productivity approach. The general efficiency measure used by DEA is best summarized by equation (A1):

$$E_{ks} = \frac{\sum_{y} O_{sy} v_{ky}}{\sum_{x} I_{sx} u_{kx}} \tag{A1}$$

where $E_{ks}$ is the efficiency or productivity measure of DMU $s$, using the weights of test DMU $k$; $O_{sy}$ is the value of output $y$ for DMU $s$; $I_{sx}$ is the value of input $x$ for DMU $s$; $v_{ky}$ is the weight assigned to DMU $k$ for output $y$; and $u_{kx}$ is the weight assigned to DMU $k$ for input $x$.

In the basic DEA ratio model developed by Charnes, Cooper, and Rhodes (1978) (CCR), the objective is to maximize the efficiency value of a test DMU $k$ from among a reference set of DMUs by selecting the optimal weights associated with the input and output measures. The maximum efficiencies are constrained to 1. The formulation is represented in expression (A2):

$$\begin{aligned} \text{maximize} \quad & E_{kk} = \frac{\sum_{y} O_{ky} v_{ky}}{\sum_{x} I_{kx} u_{kx}} \\ \text{subject to} \quad & E_{ks} \le 1 \quad \forall \text{ DMUs } s \\ & u_{kx}, v_{ky} \ge 0 \end{aligned} \tag{A2}$$

This nonlinear programming formulation (A2) is equivalent to formulation (A3) (see Charnes et al. (1978) for a complete explanation of the transformation):

$$\begin{aligned} \text{maximize} \quad & E_{kk} = \sum_{y} O_{ky} v_{ky} \\ \text{subject to} \quad & E_{ks} \le 1 \quad \forall \text{ DMUs } s \\ & \sum_{x} I_{kx} u_{kx} = 1 \\ & u_{kx}, v_{ky} \ge 0 \end{aligned} \tag{A3}$$

The transformation is completed by constraining the efficiency ratio denominator from (A2) to a value of 1, represented by the constraint $\sum_{x} I_{kx} u_{kx} = 1$.
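Because (A3) becomes a linear program once each constraint $E_{ks} \le 1$ is multiplied through by its positive denominator, giving $\sum_y O_{sy} v_{ky} - \sum_x I_{sx} u_{kx} \le 0$, any LP solver can compute the CCR score. The sketch below uses a small exact-arithmetic simplex method with Bland's rule so that it runs with the Python standard library alone; the names `simplex_max` and `ccr_efficiency` are hypothetical, the ratings are hypothetical values in the style of the graphical example (one input rated 3, two outputs), and this is an illustration rather than the implementation behind Table 1.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.z subject to A z <= b and z >= 0, assuming every b_i >= 0.

    Dense tableau simplex in exact Fraction arithmetic. Bland's rule (smallest
    index for both the entering and the leaving variable) prevents cycling,
    which matters here because most right-hand sides are 0 (a degenerate LP).
    """
    m, n = len(A), len(c)
    # Each row is [A_i | slack identity | b_i]; the slacks form the first basis.
    T = [[Fraction(a) for a in A[i]]
         + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    # Objective row holds -c; its last entry tracks the current objective value.
    z = [Fraction(-cj) for cj in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]
    while True:
        entering = next((j for j in range(n + m) if z[j] < 0), None)
        if entering is None:
            return z[-1]  # no improving column left: optimal
        rows = [i for i in range(m) if T[i][entering] > 0]
        if not rows:
            raise ValueError("LP is unbounded")
        best = min(T[i][-1] / T[i][entering] for i in rows)
        leave = min((i for i in rows if T[i][-1] / T[i][entering] == best),
                    key=lambda i: basis[i])
        pivot = T[leave][entering]
        T[leave] = [a / pivot for a in T[leave]]
        for i in range(m):
            if i != leave and T[i][entering] != 0:
                f = T[i][entering]
                T[i] = [a - f * p for a, p in zip(T[i], T[leave])]
        f = z[entering]
        z = [a - f * p for a, p in zip(z, T[leave])]
        basis[leave] = entering


def ccr_efficiency(inputs, outputs, k):
    """CCR score E_kk* of DMU k from the multiplier form (A3).

    Decision variables are [v_k1..v_kY, u_k1..u_kX]. Each E_ks <= 1 is written
    linearly as sum_y O_sy v_ky - sum_x I_sx u_kx <= 0. The normalization
    sum_x I_kx u_kx = 1 is relaxed to <= 1: the ratio constraints are
    homogeneous, so a solution with a smaller denominator could be scaled up
    to raise the objective, and the optimum therefore meets it with equality.
    """
    n_out = len(outputs[0])
    c = list(outputs[k]) + [0] * len(inputs[0])
    A = [list(o) + [-i for i in inp] for inp, o in zip(inputs, outputs)]
    b = [0] * len(A)
    A.append([0] * n_out + list(inputs[k]))
    b.append(1)
    return simplex_max(c, A, b)


# Hypothetical ratings: one input per student, outputs (knowledge, effectiveness).
inputs = [(3,), (3,), (3,)]
outputs = [(4, 1), (2, 1), (1, 4)]
scores = [ccr_efficiency(inputs, outputs, k) for k in range(len(inputs))]
print([float(s) for s in scores])  # [1.0, 0.6, 1.0]
```

Exact fractions avoid tolerance issues with Likert-scale data; for larger samples a library LP solver would be the more practical choice. The same function accepts the three-input, two-output layout used in the case example.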

The result of formulation (A3) (the CCR formulation) is an optimal simple or technical efficiency value ($E_{kk}^*$) that is at most equal to 1 (this formulation has also been called the constant returns to scale formulation). If $E_{kk}^* = 1$, then no other DMU is more efficient than DMU $k$ for its selected weights; that is, DMU $k$ lies on the optimal frontier and is not dominated by any other DMU. If $E_{kk}^* < 1$, then DMU $k$ does not lie on the optimal frontier, and there is at least one other DMU that is more efficient for the optimal set of weights determined by (A3). Formulation (A3) is solved once for each DMU.

The dual of the CCR formulation (also called the envelopment side) is represented by model (A4):

$$\begin{aligned} \text{minimize} \quad & \theta \\ \text{subject to} \quad & \sum_{s} \lambda_s I_{sx} - \theta I_{kx} \le 0 \quad \forall \text{ inputs } x \\ & \sum_{s} \lambda_s O_{sy} - O_{ky} \ge 0 \quad \forall \text{ outputs } y \\ & \lambda_s \ge 0 \quad \forall \text{ DMUs } s \end{aligned} \tag{A4}$$

The CCR model assumes constant returns to scale for the inputs and outputs. To take variable returns to scale into consideration, the model introduced by Banker, Charnes, and Cooper (1984) (BCC) is used. The BCC model aids in determining the scale efficiency of a set of units by identifying which units are technically efficient under variable returns to scale. This model adds a convexity constraint that requires the multiplier weights ($\lambda$) to sum to one:

$$\sum_{s} \lambda_s = 1 \tag{A5}$$

Using the CCR and BCC models together helps determine the overall technical and scale efficiencies of the DMU respondents and whether the data exhibit varying returns to scale.

Figure 1: Simple graphical example of DEA for evaluation purposes (horizontal axis: Knowledge Gained from Class Rating; vertical axis: Overall Teacher Effectiveness Rating; points A, B, C and virtual point V, with segment AC marked as the efficient frontier)

Figure 2: Graph of the constant returns to scale DEA model for student evaluation efficiency scores (horizontal axis: Student; vertical axis: Relative Efficiency Score)

Table 1: The actual input and output factor scores as identified by the 99 students, and model results
Student   Input 1   Input 2   Input 3   Output 1   Output 2   CCR*   BCC**

* Charnes, Cooper, and Rhodes (1978) model
** Banker, Charnes, and Cooper (1984) model
See the appendix for model descriptions.

Table 2(a): Cluster distribution (columns: Cluster, N, % of Total; rows: Clusters 1 to 3 and Total)

Table 2(b): Clusters and centroids for DEA relative efficiency scores for the student evaluation data (for each of the CCR* and BCC** models: Mean, Std. Deviation, Coef. of Variation***; rows: Clusters 1 to 3 and Combined)
* Charnes, Cooper, and Rhodes (1978) model
** Banker, Charnes, and Cooper (1984) model
*** The coefficient of variation is determined by dividing the sample standard deviation by its mean.
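The coefficient of variation defined in the note to Table 2(b) is straightforward to compute. The sketch below uses Python's `statistics.stdev` (the sample standard deviation, with an n - 1 denominator, which needs at least two scores) and `statistics.mean`; the function name `coefficient_of_variation` and the scores are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean, as in the Table 2(b) note."""
    return stdev(scores) / mean(scores)


print(coefficient_of_variation([1.0, 1.0, 1.0]))  # 0.0
print(round(coefficient_of_variation([0.6, 0.8, 1.0]), 3))  # 0.25
```

A set of identical scores gives a coefficient of variation of zero, matching the perfectly unbiased case described in the text; applying the function to each cluster and to the combined sample yields the layout of Table 2(b).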