An Application of Data Envelopment Analysis to Software Quality Assessment

Georgia Paschalidou, Department of Applied Informatics, University of Macedonia, 156 Egnatia str., 54006 Thessaloniki, gpaschalid@uom.edu.gr
Emmanouil Stiakakis, Department of Applied Informatics, University of Macedonia, 156 Egnatia str., 54006 Thessaloniki, stiakakis@uom.gr
Alexander Chatzigeorgiou, Department of Applied Informatics, University of Macedonia, 156 Egnatia str., 54006 Thessaloniki, achat@uom.gr

ABSTRACT
Data Envelopment Analysis (DEA) is a non-parametric technique which uses linear programming methods to measure the efficiency of a homogeneous set of units. These units are known as Decision Making Units (DMUs) and are defined by multiple input and output data. Efficiencies are measured relative to a piecewise surface (the efficient frontier) which envelops the data, thus justifying the name of the technique. Although DEA has mostly been used in production economics, its application in the context of software quality evaluation appears to be a promising approach. This study provides an application of DEA to assess the evolution of two open-source software projects in terms of selected metric values for successive versions of each project. What is particularly interesting in DEA is that a single efficiency score is calculated for each version, despite the often convoluted overall picture of the metric values. According to a simplified view of DEA, there are two categories of units: the efficient ones (on the efficient frontier) and the inefficient ones. Each inefficient unit is characterized by a reference set of peers, which comprises the efficient units operating closest to that unit. Through consideration of the reference set of the inefficient versions of each project, the metrics that require improvement, as well as the extent of improvement, can be estimated. These results could assist software developers in identifying design issues that require further improvement. Notwithstanding the fact that a number of issues remain to be investigated, the applicability of DEA and other operations research tools in the context of software quality might yield interesting results.

Categories and Subject Descriptors
D.2.7 [Distribution, Maintenance, and Enhancement]: Restructuring, reverse engineering, and reengineering; D.2.8 [Metrics]: Product metrics

General Terms
Measurement, Performance, Design.

Keywords
DEA, Software Evolution, Software Quality, Design Metrics.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
BCI'13, September 19-21, 2013, Thessaloniki, Greece.
Copyright 2013 ACM 978-1-4503-1851-8/13/09...$15.00.

1. INTRODUCTION
The investigation of the history of a software project can reveal important information regarding its evolution and past design decisions. In particular, the evolution of software quality can be assessed by means of the trends of selected metrics over successive software versions. Several approaches have been proposed in the software engineering community to confront the issue of combining different metrics and unifying their values.
According to ISO 9126, maintainability, one of the main quality characteristics, can be assessed in terms of four sub-characteristics, i.e. analyzability, changeability, stability, and testability [1]. For the assessment of each one of these sub-characteristics, several sets of metrics have been proposed [2, 3, 4]. However, the resulting trends might be convoluted, limiting the ability of researchers to obtain a clear view of software quality change. In order to confront this problem, we have applied a non-parametric technique, namely Data Envelopment Analysis (DEA), borrowed from the field of production economics. DEA, which is considered one of the most successful approaches for providing a single view of various economic factors, was originally proposed by Charnes et al. [5]. It should be noted that the application of DEA to assess the evolution of software quality over a set of successive versions is a new approach. This methodology can provide an overall picture of the quality of a particular unit compared to other similar units, based on selected variables that can serve as output metrics of the units under examination. Briefly, the advantages of DEA consist in the ability to unify several output items, the fact that the approach does not require knowledge of their interrelationships, the identification of peers that can serve as benchmarks, and the simplicity of its application. In the case of software evolution analysis, DEA provides a ranking of all examined versions based on their efficiency scores, a reference set of efficient peers, and the changes required to improve each inefficient software version.

2. BRIEF DESCRIPTION OF DEA
Data Envelopment Analysis (DEA) is a non-parametric linear programming methodology that estimates the relative efficiency of a set of homogeneous, comparable, peer entities called Decision Making Units (DMUs). According to Johansson [6], efficiency in production is achieved when the outputs of firms are produced in the best and most profitable way. The evaluation of efficiency was first introduced in 1957 by Farrell [7], who explains that efficiency consists of two components: technical efficiency and allocative efficiency. The former estimates the capacity of a firm to attain the potential increase in output with a given level of inputs, while the latter reflects the capacity of a firm to apply the inputs in optimal quantities at particular prices, that is, to minimize cost.

Calculating a combination of technical and allocative efficiency provides a measure of economic efficiency [8].

The definition of a DMU is generic and flexible. According to Cooper et al. [9], a DMU is regarded as an entity which converts multiple inputs into multiple outputs and whose performance is to be evaluated. Examples of DMUs could be hospitals, bank branches, schools, department stores, supermarkets, etc. For each DMU we calculate its relative efficiency by forming the ratio of a weighted sum of outputs to a weighted sum of inputs. The advantage of DEA at this point is that it does not require the user to prescribe weights for each input and output; DEA allows each DMU to select the weights which maximize its efficiency. Subsequently, a surface called the efficient frontier is constructed, which is formed by the units with the maximum efficiency score and envelops all the other, less efficient units [10]. Thus, the efficient frontier designates the efficiency of the most efficient units, while simultaneously allowing the efficiency of the other DMUs to be measured by calculating their deviations from it [9]. Since its introduction in 1978, the above-mentioned methodology has been widely used in many industries to examine efficiency levels. More specifically, DEA has been successfully applied in a great variety of contexts, such as health care, US Air Force wings, education, cities, courts, business firms, sports, market research, etc.

To illustrate how DEA works, we assume a software system of k versions. We suppose that there are data on n inputs and m outputs for each of the k different versions of the software project. Thus, the data for all k versions, referred to as Decision Making Units, are represented in the n x k input matrix X and the m x k output matrix Y. An efficiency measure of a DMU, for example the 1st version, is given by the ratio of a weighted sum of outputs to a weighted sum of inputs, i.e. $u'y_1 / v'x_1$, where u is an m x 1 vector of output weights and v is an n x 1 vector of input weights (u' and v' are the transposes of u and v, respectively). The DEA problem is then formulated as follows: find optimal values for u and v such that the efficiency measure for the 1st version is maximized, subject to the constraints that the efficiency measures of all DMUs must be less than or equal to unity. The optimal values are obtained by the solution of the mathematical programming problem (P.1); this problem has to be solved k times, once for each version of the project:

$$\max_{u,v} \frac{\sum_{r=1}^{m} u_r y_{r1}}{\sum_{j=1}^{n} v_j x_{j1}}$$

subject to

$$\frac{\sum_{r=1}^{m} u_r y_{ri}}{\sum_{j=1}^{n} v_j x_{ji}} \le 1, \quad i = 1, 2, \ldots, k, \qquad u_r, v_j \ge 0 \qquad \text{(P.1)}$$

Since this problem has an infinite number of solutions, a reformulation of the DEA model is required, aiming to convert the objective function into a linear function. Hence, we can arbitrarily require that the weighted sum of inputs for the 1st DMU equals one:

$$\sum_{j=1}^{n} v_j x_{j1} = 1$$

Every linear problem (primal) can be converted into another linear problem (dual). According to duality theory, the dual problem can be constructed by applying a set of transformations to the primal one, and both the primal and the dual model produce precisely the same solution. If the primal problem is a maximization problem, the corresponding dual is a minimization problem; if an optimal solution exists, the two linear problems have the same value of the objective function. In most cases (depending on the structure of the linear problem), instead of solving the primal model it is much easier to solve the corresponding dual model. The dual form in the case of DEA (the "envelopment form") is presented below:

$$\min_{\theta, \lambda} \theta$$

subject to

$$-y_{r1} + \sum_{i=1}^{k} y_{ri}\lambda_i \ge 0, \quad r = 1, \ldots, m$$
$$\theta x_{j1} - \sum_{i=1}^{k} x_{ji}\lambda_i \ge 0, \quad j = 1, \ldots, n$$
$$\lambda_i \ge 0, \quad i = 1, 2, \ldots, k \qquad \text{(P.2)}$$

where θ is a scalar (0 < θ ≤ 1) and λ is a k x 1 vector of constants.

If the number of input and output items is less than the number of versions plus one (i.e., n + m < k + 1), as actually occurs in the two cases of this paper, the envelopment form, i.e. the dual model, involves fewer constraints and is much easier to solve than the primal form [8]. The values of θ correspond to the scores calculated in our analysis (see Table 2). A value θ = 1 corresponds to an efficient DMU, indicating a point on the efficient frontier, i.e. the surface which envelops all the inefficient DMUs. Efficiency scores are calculated relative to this surface, so DEA measures only relative efficiencies. The lambda (λ) values are the raw weights assigned to the fully efficient units that operate closest to the corresponding inefficient one when solving the DEA problem (see next section); the larger the value of lambda, the more similar the operation of the inefficient DMU is to that efficient one. The model described above is the basic DEA model; many alternative DEA formulations have been proposed in the literature. It should also be mentioned that there are no particular requirements that software projects have to meet in order for the application of DEA to be feasible. Moreover, the standard implementation offered by many DEA tools is sufficient to apply the methodology, and the time consumed by the application of DEA to a project is negligible.
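To make the envelopment form (P.2) concrete, the following sketch sets it up as a standard linear program and solves it with a general-purpose LP routine. This is a minimal illustration under our own assumptions (SciPy available, hypothetical data, our own function and variable names); it is not the DEA-Solver implementation used later in the paper.

```python
# Minimal sketch of the input-oriented CCR envelopment model (P.2),
# solved once per DMU with scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

def ccr_theta(X, Y, o):
    """Return theta and lambda for DMU o, given inputs X (n x k) and outputs Y (m x k)."""
    n, k = X.shape
    m = Y.shape[0]
    # Decision variables: z = [theta, lambda_1, ..., lambda_k]; minimise theta.
    c = np.zeros(k + 1)
    c[0] = 1.0
    # Output constraints: -sum_i lambda_i * y_ri <= -y_ro  (outputs at least matched)
    A_out = np.hstack([np.zeros((m, 1)), -Y])
    b_out = -Y[:, o]
    # Input constraints: -theta * x_jo + sum_i lambda_i * x_ji <= 0
    A_in = np.hstack([-X[:, [o]], X])
    b_in = np.zeros(n)
    bounds = [(None, None)] + [(0, None)] * k    # theta free, lambdas non-negative
    res = linprog(c, A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.concatenate([b_out, b_in]),
                  bounds=bounds, method="highs")
    return res.x[0], res.x[1:]

# Hypothetical example: one constant input and two outputs for three versions.
X = np.array([[1.0, 1.0, 1.0]])
Y = np.array([[6.0, 11.0, 16.0],
              [4.0, 12.0, 9.0]])
for o in range(X.shape[1]):
    theta, lam = ccr_theta(X, Y, o)
    print(f"DMU {o + 1}: theta = {theta:.3f}")
```

Under constant returns to scale the input- and output-oriented scores are reciprocals of each other, so the same routine can be re-oriented; a production implementation would also report output slacks, which this sketch omits.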
Most of the times (depending on the structure of the linear problem) instead of solving the primal model, it is much easier to solve the corresponding dual model. The dual form in the case of DEA ( envelopment form ) is presented below: subject to min θ θ, λ k r + i yri i= y λ 0 k j i ji i= λ i 0 θ x λ x 0 i =, 2,, k (P.2) where: θ is a scalar (0 < θ ) λ is a k x vector of constants. If the sum of input and output items is less than the number of versions plus one (i.e., n + m < k + ), as actually occurs in the two cases of the paper, the envelopment form, i.e. the dual model, involves fewer constraints and is much easier to be solved than the primal form [8]. The values θ correspond to the scores calculated in our analysis (see Table 2). A value θ= corresponds to an efficient DMU, indicating a point on the efficient frontier, i.e. the surface which envelops all the inefficient DMUs. Efficiency scores are calculated relative to this surface, so DEA measures only relative efficiencies. The lambda (λ) values are the raw weights assigned to the fully efficient units that operate closer to the corresponding inefficient one, when solving DEA (see next section). The larger the value of lambda, the more similar is the operation of the inefficient DMU to the efficient one. The model described above is the basic DEA model, since many alternative approaches of the DEA methodology have been proposed in the literature. It should also be mentioned that there are no particular specifications that the software projects have to meet in order for the application of DEA to be feasible. Moreover, the standard implementation offered by many DEA tools is sufficient to apply the methodology. Furthermore, the time consumed for the application of DEA on a project is negligible. 3. DEMONSTRATION OF DEA APPLICATION In order to demonstrate how DEA works and especially the notion of efficient and inefficient units, we are going to present a simplified example of its application. Let us consider a software system consisting of 0 versions (v., v.2, v.3,, v.0). We 229

3. DEMONSTRATION OF DEA APPLICATION
In order to demonstrate how DEA works, and especially the notion of efficient and inefficient units, we present a simplified example of its application. Let us consider a software system consisting of 10 versions (v.1, v.2, v.3, ..., v.10). We assume that each version can be assessed using two design metrics, say y1 and y2, for which the goal is to maximize their values. In the context of DEA, we consider these design metrics as the "outputs" of the design process according to which the quality of each software version can be assessed. We do not consider any input which can affect the efficiency of the software system; in other words, we assume a constant input equal to "1" for all versions. For this reason the input does not appear in the corresponding graphical representation. Table 1 contains the sample data (values are hypothetical).

Table 1. Sample data for DEA example
Version   (I) Input   (O) Output 1   (O) Output 2
v.1       1           6              4
v.2       1           13             7
v.3       1           3              9
v.4       1           4              11
v.5       1           11             12
v.6       1           15             4
v.7       1           6              12
v.8       1           16             9
v.9       1           10             8
v.10      1           11             3

We used a DEA solver in order to apply the method and extract the results shown in Table 2. More specifically, we employed the output-oriented basic DEA model proposed by Charnes et al. [5]. Figure 1 shows a graphical depiction of the involved units and the efficient frontier. As can be observed, there are two fully efficient DMUs, namely versions v.5 and v.8 (they lie on the efficient frontier). Version v.7 is on the efficient frontier but cannot be considered fully efficient, since there is still room for improvement (i.e., output y1 can be increased to move the corresponding unit to v.5). All other versions are inefficient (i.e., they have an efficiency score less than unity). According to their relative efficiency, a full ranking of the examined versions can be obtained, representing the extent to which a software version can be considered well designed in terms of the selected metrics. In Table 2, the rank for each version is shown in the 4th column, while in Figure 1 ranks are shown in parentheses next to each unit.

Table 2. Score and reference set for each version
No.   DMU    Score   Rank   Reference set (lambda)
1     v.1    0.408   10     v.5 (0.263), v.8 (0.736)
2     v.2    0.812   6      v.8
3     v.3    0.75    8      v.7
4     v.4    0.916   5      v.5
5     v.5    1       1      v.5
6     v.6    0.937   4      v.8
7     v.7    1       3      v.5
8     v.8    1       1      v.8
9     v.9    0.752   7      v.5 (0.542), v.8 (0.457)
10    v.10   0.687   9      v.8

Figure 1. Efficient frontier for the examined versions of the sample software project (units plotted in the y1, y2 plane; ranks shown in parentheses)

An important piece of information that can be obtained by the application of DEA is the construction of the reference set for each inefficient unit. The reference set consists of the fully efficient units that operate closest to the corresponding inefficient one. For example, the reference set for version v.9 consists of versions v.5 and v.8, which, as can be observed in Figure 1, are the closest fully efficient neighbors of v.9. An interpretation of the reference set is that if someone wishes to improve certain metrics for a particular inefficient software system, he/she could investigate the properties of the corresponding systems in its reference set, to look for opportunities for improvement. In the context of software evolution analysis, it would make sense to consider the version in the reference set that a) precedes in time the version under examination and b) is closer in terms of elapsed versions. Consequently, for version v.9 it would be preferable to study the properties of version v.8. Lambda values, given in the last column of Table 2, measure the relative importance of the other DMUs comprising the reference set of a particular DMU.
More specifically, lambda is a vector describing the proportions in which the efficient DMUs are combined in order for the examined DMU to obtain the maximum efficiency score. Each inefficient unit could theoretically be moved (projected) onto the efficient frontier and thus obtain an efficiency score equal to one, assuming that the underlying system can be restructured so as to improve its outputs. For each inefficient unit a projection difference can be obtained for each of the outputs. In Table 3, we can see the proposed projection for the outputs of each inefficient version. If we modify the values of all outputs according to the suggested projections and re-run the DEA solver, we obtain the efficiency scores illustrated in Figure 2. As can be observed, all units now lie on the efficient frontier. Nevertheless, only 9 units have become fully efficient; more specifically, v.3 remains inefficient. This means that there is still room for improvement in its output value y1.
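As an informal cross-check of the example, the sketch below applies the output-oriented counterpart of the model to the Table 1 data (again with SciPy, under our own naming; the results reported in the paper come from a dedicated DEA solver). It maximises a common expansion factor phi for both outputs and prints the efficiency score 1/phi together with the radially expanded outputs.

```python
# Output-oriented CCR sketch for the 10 hypothetical versions of Table 1
# (single constant input equal to 1 for every version).
import numpy as np
from scipy.optimize import linprog

# Rows: v.1 ... v.10; columns: Output 1 (y1) and Output 2 (y2) from Table 1.
Y = np.array([[6, 4], [13, 7], [3, 9], [4, 11], [11, 12],
              [15, 4], [6, 12], [16, 9], [10, 8], [11, 3]], dtype=float)
k, m = Y.shape

for o in range(k):
    # Decision variables: z = [phi, lambda_1, ..., lambda_k]; maximise phi.
    c = np.zeros(k + 1)
    c[0] = -1.0                                  # linprog minimises, so negate phi
    # phi * y_ro - sum_i lambda_i * y_ri <= 0 for each output r
    A_out = np.hstack([Y[[o], :].T, -Y.T])
    # sum_i lambda_i <= 1 (the single constant input)
    A_in = np.hstack([np.zeros((1, 1)), np.ones((1, k))])
    res = linprog(c, A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.concatenate([np.zeros(m), [1.0]]),
                  bounds=[(None, None)] + [(0, None)] * k, method="highs")
    phi = res.x[0]
    print(f"v.{o + 1}: score = {1 / phi:.3f}, radial target = {phi * Y[o]}")
```

The radial factor should reproduce the scores of Table 2 up to rounding; for projections onto weakly efficient parts of the frontier (e.g. v.3), the full targets of Table 3 additionally include slacks, which this sketch does not compute.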

Table 3. Projections on the efficient frontier
          Output 1 (y1)                          Output 2 (y2)
Version   Actual   Projected   Difference (%)    Actual   Projected   Difference (%)
v.1       6        14.684      144.74%           4        9.789       144.74%
v.2       13       16          23.08%            7        9           28.57%
v.3       3        6           100%              9        12          33.33%
v.4       4        11          175%              11       12          9.09%
v.5       11       11          0%                12       12          0%
v.6       15       16          6.67%             4        9           125%
v.7       6        11          83.33%            12       12          0%
v.8       16       16          0%                9        9           0%
v.9       10       13.285      32.86%            8        10.628      32.86%
v.10      11       16          45.45%            3        9           200%

Figure 2. Efficient frontier after the projection of inefficient units

Let us now consider that we re-examine the aforementioned system, modifying the values of only one output each time. The first case involves changing only the values of output 1 (y1) according to the projections. The results obtained show that 6 versions, namely v.2, v.5, v.6, v.7, v.8, and v.10, attain the maximum efficiency score (that is, unity), while only 3 of them, namely v.5, v.7, and v.8, are fully efficient. Versions v.2, v.6 and v.10 have not become efficient although they lie on the efficient frontier. This practically means that there are some slacks for these units; more specifically, a slack exists when one of the outputs (in our case) can be further improved. The second case involves modifying only the values of output 2 (y2). According to the results, 5 versions, namely v.3, v.4, v.5, v.7 and v.8, obtain an efficiency score equal to unity. However, only 2 of them, v.5 and v.8, are fully efficient, while the rest have not become efficient in spite of lying on the efficient frontier. Once again, this means that there are slacks for the particular units.

4. CASE STUDIES

4.1 Context
In our approach, we have applied DEA in order to examine its applicability and suitability for assessing the evolution of two open-source software projects, namely JFreeChart and JMol. JFreeChart is an open-source chart library. It supports bar charts, pie charts, line charts, time series charts, scatter plots, histograms, simple Gantt charts and more. The JFreeChart project was founded in 2000 by David Gilbert. Today, JFreeChart is the most widely used chart library for Java. In our research we have employed 22 successive versions, starting from version 0.9.0. For the analysis of this project we have used several metrics as outputs. However, we have not employed any inputs, since the presence of inputs could heavily affect the final results, overshadowing the importance of the output metrics. More specifically, we have considered as outputs cohesion, fan-in coupling, and fan-out coupling.

JMol is a free, open-source molecule viewer for students, educators, and researchers in chemistry and biochemistry. It is cross-platform, running on Windows, Mac OS X, and Linux/Unix systems. Its first version, JMol 10.0, was released in December 2004 and since then it has been continually evolving. For the purpose of this paper, we have used 26 successive versions, ranging from 11.0 to 11.6. In the same way, for the analysis of JMol we have employed coupling, cohesion, and complexity as outputs and we have not used any inputs. Concerning the measurement of the outputs, we have used Message Passing Coupling (MPC) for coupling, Lack of Cohesion of Methods (LCOM) for cohesion, and Cyclomatic Complexity per Method for complexity. The selection of these three metrics was based on the fact that they constitute the most common way of evaluating the design quality of an object-oriented system.
It should be noted that all computations have been carried out employing the software package DEA-Solver [11].

4.2 Employed Metrics
In this section we briefly describe what the above-mentioned metrics actually measure.

Message Passing Coupling (MPC) measures the complexity of message passing among classes [12]. Generally, in object-oriented programming, message passing is the primary means of communication: the objects of a project can send messages to each other, for example when an object requests a service that other objects provide. A message consists of the object ID, the service (method) that the object requests, and the parameter list for the method. However, the types of messages sent are defined in classes; for this reason, message passing is calculated at the class level. According to Li and Henry [12], MPC is the number of send statements defined in a class and reflects the amount of message passing among objects. Essentially, it captures the dependence of the implementation of local methods on the methods of other classes. The desirable value of this metric is low.

LCOM measures the lack of cohesion of a class, that is, the degree to which the methods in a class are unrelated to each other [13]. More specifically, it is calculated as the number of disjoint sets of local methods, where disjoint sets constitute a collection of sets which do not intersect with each other [12]. When a class is cohesive, its local methods use the same set of variables. High cohesion is desirable as it implies good class subdivision. Generally, high cohesion entails simplicity and facilitates reusability and maintainability of the software system. On the other hand, lack of cohesion usually increases complexity and, consequently, the occurrence of errors during the development process. A solution to reduce the lack of cohesion is to split the class into two or more classes.
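Since the paper treats LCOM as the number of disjoint sets of methods, a small sketch may help make that reading concrete. The class model, names and input format below are our own illustrative assumptions (a mapping from each method to the instance variables it uses); real metric tools, and the LCOM variants of [12, 13], differ in details.

```python
# Illustrative LCOM-style count: the number of disjoint groups of methods,
# where two methods belong to the same group if they share at least one attribute.
def lcom(method_attrs: dict[str, set[str]]) -> int:
    methods = list(method_attrs)
    parent = {m: m for m in methods}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]        # path compression
            m = parent[m]
        return m

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            if method_attrs[a] & method_attrs[b]:
                union(a, b)                      # shared variable -> same group
    return len({find(m) for m in methods})

# Hypothetical class with two unrelated clusters of methods: LCOM = 2,
# suggesting the class could be split into two more cohesive classes.
example = {
    "load":   {"path", "buffer"},
    "parse":  {"buffer", "tokens"},
    "render": {"canvas"},
    "resize": {"canvas", "scale"},
}
print(lcom(example))   # prints 2
```

A perfectly cohesive class, whose methods all touch a common set of variables, would yield a single group.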

Cyclomatic Complexity per Method is a complexity metric proposed by McCabe in 1976 [14]. Generally, this metric measures the number of linearly independent paths in a program's source code and provides a single numerical score for each method. Its value can be calculated from the control flow graph of the examined program. Cyclomatic complexity can be applied to individual functions, modules, methods or classes. Moreover, it provides an upper limit for the number of test cases required to attain complete branch coverage for a method and can thereby help control the size of a program. The desirable value for cyclomatic complexity is low; methods with a high value are usually overly complex and should be split.

Fan-in coupling measures the number of inputs that a function uses. Inputs can be parameters and global variables that are used. There are two approaches to calculating fan-in coupling: the structural and the informational [15, 16]. The informational approach considers data communication (e.g., through parameter passing), whereas the structural approach considers the exchange of program control (e.g., via function calls or method invocations) [17]. Fan-out coupling is a structural metric equal to the number of modules called within a given module. The outputs can be parameters or (modified) global variables. There are also two approaches to calculating fan-out coupling: the structural and the informational [15, 16].

4.3 Case Study 1: JFreeChart
The data for the output variables (metrics) of JFreeChart are given in Appendix A, Table A.1. The results obtained by DEA for this project reveal that there are only 3 efficient versions, which have attained an efficiency score equal to unity. The remaining 19 versions are slightly inefficient, given that their efficiency scores are smaller, but very close to one. The scores, ranking and reference set for each version are shown in Table 4. According to the summary produced by DEA, the maximum score is equal to 1, the minimum is 0.950, and the average score over all versions is 0.982. Moreover, the versions which are proposed as reference sets to the other DMUs are the efficient ones.

Table 4. Scores and reference set of each version of JFreeChart
No.   DMU       Score   Rank
1     0.9.0     0.994   8
2     0.9.1     0.996   6
3               1       1
4               1       1
5     0.9.4     0.957   19
6     0.9.5     0.955   20
7     0.9.6     0.955   20
8     0.9.7     0.958   18
9     0.9.8     0.959   17
10    0.9.9     0.950   22
11    0.9.10    0.972   16
12    0.9.11    0.977   15
13    0.9.12    0.983   14
14    0.9.13    0.993   11
15              1       1
16    0.9.15    0.998   4
17    0.9.16    0.997   5
18    0.9.17    0.993   9
19    0.9.18    0.996   7
20    0.9.19    0.989   13
21              0.993   10
22              0.991   12

Figure 3 provides a more comprehensive view of the evolution of the efficiency score over all versions of JFreeChart. It should be noted that the efficiency score is not always increasing during the evolution of the project.
More specifically, as we have already mentioned, the efficient versions have attained the maximum efficiency score, while their posterior ones, even the most recent versions (with a single exception), are evaluated as less efficient. This fact comes as no surprise, since it is common for software systems to suffer from the so-called software aging symptom, according to which design quality deteriorates over time [18]. As already mentioned, DEA can assist in identifying opportunities for improvement. The projections of the inefficient units onto the efficient frontier indicate the extent to which the selected output metrics should be improved in order for an examined version to become efficient. For example, version 0.9.19, which according to DEA has an efficiency score less than one, could be projected onto the efficient frontier by increasing fan-in coupling and cohesion and by decreasing fan-out coupling, as would be reasonable. How much each of the selected metrics should be changed is automatically extracted by the application of DEA, and the percentages of change are shown in Appendix A, Table A.2. For version 0.9.19, if fan-in is increased by 1.06%, fan-out is decreased by 1.05%, and cohesion is increased by 38.24%, then the corresponding version would be efficient.

score.0 0.99 0.98 0.97 0.96 0.95 0.94 0.93 0.92 0.9.0 0.9. 0.9.4 0.9.5 0.9.6 0.9.7 0.9.8 0.9.9 0.9.0 versions Figure 3. Efficiency scores for the JFreeChart versions To look out for opportunities for improvement, as already explained, one has to consult the reference set of each inefficient version. For version 0.9.9, according to Table 4, the reference set contains efficient versions (against which the examined version resembles by 22%) and (against which the examined version resembles by 78%). Since version is the closest preceding version, it can be chosen as a benchmark to investigate which properties of the design in version 0.9.9 have deteriorated compared to version. 4.4 Case Study 2: JMol The data for the output variables (metrics) of JMol are given in Appendix B, Table B.. The results provided by DEA for JMol reveal that initially there are only 3 efficient versions, which are characterized by an efficiency score equal to one:.0,, and. It should be noted that in this case, there are versions that have attained the maximum efficiency score, such as.0.2 and.0.3, which however are not fully efficient (as in the example in section 3). Table 5 contains the scores, the ranking, and the reference set for each version, while Figure 4 provides a more representative view of the efficiency scores for all the versions of the project. According to the summary produced by DEA, the maximum efficiency score is equal to ; the minimum is 0.938, while the average score is 0.989. Furthermore, the versions most frequently proposed as reference set to other versions are and. At this point, we should mention that version.0, in spite of being efficient, is not proposed as reference set to none. Table 5. Scores and reference set of each version of JMol No. DMU Score Rank Reference set (lambda).0.0 2 3.0.2 4 4.0.3 4 5.2.0 0.986 2 (0.767) (0.232) 6.2. 0.986 3 (0.77) (0.228) 0.9. 0.9.2 0.9.3 0.9.5 0.9.6 0.9.7 0.9.8 0.9.9 0 7.2.3 0.985 4 8.2.4 0.985 4 9.2.5 0.985 8 0.2.6 0.985 9.2.7 0.985 6 2.2.8 0.985 7 3.2.9 0.985 20 4.2.0 0.985 2 5.2. 0.984 24 6.2.2 0.984 22 7.2.3 0.984 22 8.2.4 0.984 25 9 20.4. 0.999 6 2.4.2 0.999 7 22.4.3 0.999 8 23.4.4 0.999 8 24.4.5 0.999 0 25.4.6 0.999 26.6 0.938 26 (0.772) (0.772) (0.766) (0.764) (0.76) (0.764) (0.769) (0.766) (0.76) (0.76) (0.76) (0.766) (0.269) (0.227) (0.227) (0.233) (0.235) (0.238) (0.235) (0.230) (0.233) (0.238) (0.238) (0.238) (0.233) (0.987) (0.985) (0.983) (0.983) (0.988) (0.985) (0.730) In Figure 4, we can observe the evolution of software quality according to the efficiency score of each version. The overall conclusion that can be drawn is that there are no dramatic changes in software quality (according to the selected metrics) except for the final version. According to the projections, we can comprehend the extent by which selected output metrics should be increased or decreased in order to convert an examined version to an efficient one. For example, version.6, which has an efficiency score equal to 0.938, could be projected onto the efficient frontier by increasing cohesion and by decreasing coupling and complexity. DEA, as mentioned before, computes automatically the corresponding changes required for each of the selected metrics. The percentage of change is shown in Appendix B, Table B.2. For version.6, if CC is decreased by 6.3%, LCOM is decreased by 3.76%, and MPC is decreased by 6.3%, then the corresponding version would be efficient. 233

Figure 4. Efficiency scores for the JMol versions

According to Table 5, the reference set of version 11.6 contains two efficient versions, which the examined version resembles by 27% and by 73%, respectively. Thus, the closest preceding of these two versions can be used as a benchmark to look for possible improvement opportunities.

5. CONCLUSIONS
In this paper, we have illustrated the employment of Data Envelopment Analysis as a tool to analyze the evolution of software quality over successive versions. In particular, we stressed the ability of DEA to highlight which versions are inefficient based on a set of selected metrics, how and to what extent these versions can be improved, and which other versions can serve as benchmarks. Inefficient software versions can be compared against their efficient peers in order to understand which aspects of design quality can be improved. By computing a single efficiency score for each version of a software project, a clear picture of the evolution of the project can be acquired. This is a significant advantage of DEA compared to the analysis of the often convoluted trends of individual metrics in the field of software engineering; the results of applying DEA provide a unified view of these metrics. To summarize, the contribution of this work is that it demonstrates the use of a methodology well known in production economics in the context of software quality assessment.

6. REFERENCES
[1] Correia, J.P., Kanellopoulos, Y. and Visser, J. 2009. A survey-based study of the mapping of system properties to ISO/IEC 9126 maintainability characteristics. 25th IEEE International Conference on Software Maintenance (ICSM 2009) (Edmonton, Canada). 61-70.
[2] Bansiya, J. and Davis, C.G. 2002. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering. 28, 1, 4-17.
[3] Spinellis, D., Gousios, G. and Samoladas, K. 2009. Evaluating the quality of open source software. Electronic Notes in Theoretical Computer Science. 233, 5-28.
[4] Burger, S. and Hummel, O. 2012. Lessons learnt from gauging software metrics of cabin software in a commercial airliner. ISRN Software Engineering. Article ID 62305.
[5] Charnes, A., Cooper, W.W. and Rhodes, E. 1978. Measuring the efficiency of decision making units. European Journal of Operational Research. 2, 6, 429-444.
[6] Johansson, H. 2005. Technical, allocative, and economic efficiency in Swedish dairy farms: The Data Envelopment Analysis versus the stochastic frontier approach. In XIth International Congress of the European Association of Agricultural Economists (Copenhagen, Denmark, August 24-27).
[7] Farrell, M.J. 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society. 120, 253-281.
[8] Coelli, T. 1996. A guide to DEAP version 2.1: A Data Envelopment Analysis (computer) program. CEPA Working Papers No. 8. University of New England, Australia.
[9] Cooper, W.W., Seiford, L.M. and Tone, K. 2007. Data Envelopment Analysis: A Comprehensive Text with Models, Applications, References and DEA-Solver Software. 2nd edition. Springer Science+Business Media.
[10] Chatzigeorgiou, A. and Stiakakis, E. 2013. Combining metrics for software evolution assessment by means of Data Envelopment Analysis. Journal of Software Maintenance and Evolution: Research and Practice. 25, 3, 303-324.
[11] SAITECH Inc. Products: DEA-Solver PRO. http://www.saitech-inc.com/products/prod-dsp.asp, March 2013.
[12] Li, W. and Henry, S. 1993. Object-oriented metrics that predict maintainability. Journal of Systems and Software. 23, 2, 111-122.
[13] Chidamber, S.R. and Kemerer, C.F. 1991. Towards a metrics suite for object oriented design. In Proceedings of the 6th International Conference on Object-Oriented Programming Systems, Languages, and Applications (Phoenix, Arizona, October 06-11). 197-211.
[14] McCabe, T.J. 1976. A complexity measure. IEEE Transactions on Software Engineering. SE-2, 4, 308-320.
[15] Henry, S. and Kafura, D. 1981. Software structure metrics based on information flow. IEEE Transactions on Software Engineering. 7, 5, 510-518.
[16] Lee, Y., Yang, J. and Chang, K.H. 2007. Metrics and evolution in open source software. In Proceedings of the 7th International Conference on Quality Software (Portland, Oregon, October 11-12). 191-197.
[17] Chowdhury, I. and Zulkernine, M. 2011. Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities. Journal of Systems Architecture. 57, 294-313.
[18] Parnas, D.L. 1972. On the criteria to be used in decomposing systems into modules. Communications of the ACM. 15, 12, 1053-1058.

Appendix A

Table A.1. JFreeChart data
Version    Avg. fan-in coupling   Avg. fan-out coupling   Average cohesion
0.9.0      1.9                    2.1                     2.7
0.9.1      2                      2.2                     2.9
           1.8                    2                       2.9
           1.6
0.9.4      3.1                    4.1                     2.8
0.9.5      2.8                    3.8                     2.6
0.9.6      2.8                    3.8                     2.6
0.9.7      2.7                    3.5                     2.4
0.9.8      2.8                    3.6                     2.5
0.9.9      3                      4.2                     2.7
0.9.10     3.2                    4.1                     3.2
0.9.11     3.4                    4.4                     3.4
0.9.12     3.6                    4.4                     3.5
0.9.13     4                      4.8                     3.8
           5.3                    5.4                     4.1
0.9.15     5.2                    5.3                     4
0.9.16     5.1                    5.2                     3.8
0.9.17     5                      5.2                     9.8
0.9.18     4.7                    4.7                     9.9
0.9.19     4.2                    4.3                     9.8
           4.3                    4.3                     9.8
           4.5                    4.6                     9.8

Table A.2. Required changes according to the projections (JFreeChart)
Version    Req. change in fan-in coupling (%)   Req. change in fan-out coupling (%)   Req. change in cohesion (%)
0.9.0      0.54%      -0.54%     0.54%
0.9.1      0.3%       -0.3%      0.3%
           0.00%      0.00%      0.00%
           0.00%      0.00%      0.00%
0.9.4      4.4%       -4.22%     4.4%
0.9.5      4.63%      -4.43%     4.63%
0.9.6      4.63%      -4.43%     4.63%
0.9.7      4.30%      -4.2%      4.30%
0.9.8      4.7%       -4.00%     4.7%
0.9.9      5.9%       -4.93%     5.9%
0.9.10     4.36%      -2.78%     2.86%
0.9.11     5.55%      -2.2%      2.26%
0.9.12     4.54%      -1.67%     1.70%
0.9.13     5.06%      -0.70%     0.70%
           0.00%      0.00%      0.00%
0.9.15     0.1%       -0.1%      0.37%
0.9.16     0.23%      -0.23%     1.47%
0.9.17     0.6%       -0.6%      42.64%
0.9.18     0.34%      -0.34%     39.9%
0.9.19     1.06%      -1.05%     38.24%
           0.67%      -0.67%     38.5%
           0.82%      -0.8%      39.83%

Appendix B

Table B.1. JMol data
Version    CC       LCOM    MPC
11.0       599      28      4200
           5993     28      498
11.0.2     5995     28      499
11.0.3     5995     28      420
11.2.0     9200     48      7436
11.2.1     920      49      7439
11.2.3     925      49      7449
11.2.4     925      49      7449
11.2.5     9234     49      7459
11.2.6     9238     49      7460
11.2.7     9234     49      7456
11.2.8     9236     49      7458
11.2.9     924      49      7463
11.2.10    9245     49      7463
11.2.11    9252     48      7465
11.2.12    925      48      7465
11.2.13    925      48      7465
11.2.14    9265     48      747
           2797     408     9743
11.4.1     2798     409     9754
11.4.2     280      409     9756
11.4.3     2804     409     9758
11.4.4     2804     409     9758
11.4.5     283      409     9759
11.4.6     283      409     9764
11.6       24643    383     2282

Table B.2. Required changes according to the projections (JMol)
Version    Req. change in average CC (%)   Req. change in average LCOM (%)   Req. change in average MPC (%)
11.0       -0.00%     -0.00%     -0.00%
           -0.00%     -0.00%     -0.00%
11.0.2     -0.0%      -0.00%     -0.0%
11.0.3     -0.0%      -0.00%     -0.02%
11.2.0     -1.34%     -4.52%     -1.34%
11.2.1     -1.35%     -4.79%     -1.35%
11.2.3     -1.4%      -4.80%     -1.4%
11.2.4     -1.4%      -4.80%     -1.4%
11.2.5     -1.47%     -4.63%     -1.47%
11.2.6     -1.48%     -4.57%     -1.48%
11.2.7     -1.46%     -4.49%     -1.46%
11.2.8     -1.47%     -4.56%     -1.47%
11.2.9     -1.49%     -4.72%     -1.49%
11.2.10    -1.49%     -4.64%     -1.49%
11.2.11    -1.5%      -4.32%     -1.5%
11.2.12    -1.5%      -4.35%     -1.5%
11.2.13    -1.5%      -4.35%     -1.5%
11.2.14    -1.54%     -4.49%     -1.54%
           -0.00%     -0.00%     -0.00%
11.4.1     -0.04%     -1.35%     -0.04%
11.4.2     -0.05%     -1.50%     -0.05%
11.4.3     -0.06%     -1.63%     -0.06%
11.4.4     -0.06%     -1.63%     -0.06%
11.4.5     -0.07%     -1.24%     -0.07%
11.4.6     -0.09%     -1.52%     -0.09%
11.6       -6.3%      -3.76%     -6.3%