THE CORRELATION BETWEEN DEVELOPER-ORIENTED AND USER-ORIENTED SOFTWARE QUALITY MEASUREMENTS (A CASE STUDY)


M. Xenos, D. Stavrinoudis and D. Christodoulakis

Summary

This paper presents a case study on the correlation of user-oriented and developer-oriented measurements. Developer-oriented measurements were performed on 46 different software projects. These projects were given to a number of users, who acted as quality evaluators according to our user-oriented measurements method. The results of this case study (measurements of 46 projects and evaluation responses from 33 users for each project) are presented and the correlation between developer-oriented and user-oriented product quality measurements is analysed.

Michalis Xenos, Computer Technology Institute, Research Unit II: Software Engineering and Applications, 3 Kolokotroni Street, Patra, 26110, Greece.

1. Introduction

Product quality characteristics were initially described by Boehm (1978) and standardised in ISO-9126 (1991). According to ISO-9126, software quality can be broken down into six major characteristics (functionality, reliability, usability, efficiency, maintainability and portability). These product quality characteristics can be divided into two different sets: user-oriented and developer-oriented characteristics. Users care about quality characteristics such as functionality, reliability, usability, efficiency, flexibility, friendliness and simplicity, simply because these are the characteristics that can easily be seen through use of the product. On the other hand, developers care about quality characteristics such as maintainability, portability, reusability and testability, because these characteristics relate to their development efforts. According to McCall's (1977) model, developer-oriented characteristics can be measured by assigning to each characteristic a number of criteria and to each criterion a number of metrics.
In contrast, very few user-oriented characteristics can be measured in a similar manner. Reliability, for example, could be related to the measurable number of reported defects, but the same does not apply to friendliness and usability. Such user-oriented characteristics can only be 'measured'¹ by querying the end-users about them. Therefore, such user-oriented 'measurements' must be handled as objectively as possible, using a method that minimises errors when measuring users' opinions.

¹ A measure is an empirical, objective assignment of a number to an entity to characterise a specific attribute.

The case study presented in this paper is the measurement, according to two different methods, of 46 software projects with the same underlying purpose. The first method is based on automated developer-oriented measurements using a triptych of metrics (size, complexity and data metrics). The second method is based on user-oriented measurements using a total of 1551 questionnaires to evaluate product quality. Section 2 presents the existing research problems that prompted this case study, while section 3 briefly presents both the methods and the metrics used. Section 4 presents the measurement results and illustrates the findings. Finally, section 5 summarises the conclusions reached as a result of this case study.

2. The problem

Modern software developers that have a quality assurance program use metrics and standards not only to assure developer-oriented quality characteristics, like reusability and maintainability, but also to improve product quality as perceived by the end-users of the product (the customers). The questions which present themselves are thorny ones: firstly, whether the satisfactory use of such metrics ensures end-user satisfaction; secondly, whether the metrics can be relied on by the end-user. According to Fenton's (1991) axiom, good internal structure provides good external quality. Naturally, a properly established quality assurance program will help an enterprise to improve overall software production by improving many process and product quality characteristics. A part of this external quality is the customer's perception of quality, but how, and to what degree, is it related to the satisfaction of the developer-oriented product metric standards that enterprises adopt? Such questions can be generalised as follows: a) Will a product that has succeeded in fulfilling developer-oriented metric standards also satisfy the end-user? b) Will a product that has failed internal metric standards also be seen as a failure by the end-user?

3. Metrics and Measurements

Developer-oriented measurements use metrics that measure internal product characteristics, without conferring with the users. User-oriented measurements are direct measures of external product quality characteristics, obtained by questioning the end-users. Various methods have been developed to measure both types of characteristics. Sections 3.1 and 3.2 present the methods used in this case study and how the measurements were interpreted and analysed. The advantages and disadvantages of the developer-oriented and user-oriented measurement methods are presented in table 1.

3.1 Developer-Oriented Measurements

In order to measure developer-oriented product quality characteristics, three sets of metrics were used: Halstead's (1975) software science metrics, McCabe's (1976) cyclomatic complexity metrics and Tsai's (1986) data structure complexity metrics. These metrics, which make up a complete set of internal metrics, constitute a set that covers the triptych of size, complexity and data structures. In this case study these metrics were used within the scope of a developer-oriented method (Xenos, 1994). Choosing from a variety of software science metrics, this case study has focused on the language level λ = N·log₂(n)·(2n_2/(n_1·N_2))², which measures how well the programming language is used, and on the essential size ratio R = (n_1·log₂n_1 + n_2·log₂n_2)/N, which detects whether the code contains any code impurities (where n_1, n_2 are the numbers of distinct operators and operands, N_1, N_2 the numbers of total operators and operands, N = N_1+N_2 is the program size and n = n_1+n_2 is the vocabulary). Laying greater weight on larger routines, the language level of the whole project is defined as λ_sum = (ΣN_i·λ_i)/ΣN_i. Equivalently, for cyclomatic complexity the factor 10/V(g)_sum was used, where 10 is the maximum complexity proposed by McCabe, V(g)_sum = (ΣN_i·V(g)_i)/(ΣN_i) is a measure of the complexity of the program and V(g)_i = e−n+2 is the complexity of a routine (where e is the number of edges and n is the number of nodes of its control flow graph). Finally, from data structure complexity the factor 1/(T+1) was used, where T is the greatest degree of the polynomials of the project's routines that derive from Tsai's formula C(K) = Σ C(K_i) + S(K), with S(K) = (v+e)·x^L, where v is the number of nodes, e is the number of edges and L is the number of simple circular paths in the reduced data graph. In the derived polynomials, as the degree of a polynomial grows, the data structure complexity of the program also grows; the factor 1/(T+1) is therefore inversely proportional to data structure complexity.
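As a minimal sketch, the factors above can be computed from raw counts (operators, operands, control-flow edges and nodes) once a measurement tool has extracted them; the function names below are ours, not the paper's:

```python
from math import log2

def language_level(n1, n2, N1, N2):
    """Halstead language level: lambda = N*log2(n) * (2*n2/(n1*N2))**2,
    with vocabulary n = n1 + n2 and program size N = N1 + N2."""
    n, N = n1 + n2, N1 + N2
    return N * log2(n) * (2 * n2 / (n1 * N2)) ** 2

def essential_size_ratio(n1, n2, N1, N2):
    """R = (n1*log2(n1) + n2*log2(n2)) / N; values well below 1 suggest code impurities."""
    return (n1 * log2(n1) + n2 * log2(n2)) / (N1 + N2)

def cyclomatic(edges, nodes):
    """McCabe's V(g) = e - n + 2 for one routine's control flow graph."""
    return edges - nodes + 2

def size_weighted(values, sizes):
    """Project-level aggregate weighted by routine size,
    e.g. lambda_sum = sum(N_i * lambda_i) / sum(N_i)."""
    return sum(v * s for v, s in zip(values, sizes)) / sum(sizes)
```

The same `size_weighted` helper serves both λ_sum and V(g)_sum, since the paper weights both factors by routine size N_i.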
Table 1 - Advantages and disadvantages

Developer-oriented measurement methods
  Advantages: Easy to automate and easy to collect. Cost effective. Easy to analyse with statistical methods.
  Disadvantages: Difficult to interpret. Measure internal quantities with a vague relation to external quality characteristics.

User-oriented measurement methods
  Advantages: Directly measure the desired external product quality characteristics. Based on the definition of quality (satisfied users).
  Disadvantages: Not objective. Not cost effective. Difficult to analyse, due to high error rates and various data scale types.

The formula DO = 0.2λ_sum + 0.2R + 0.4(10/V(g)_sum) + 0.2(1/(T+1)) was used in order to summarise all developer-oriented metrics used in this case study. It must be made clear that the above formula does not measure a physical quantity of the product; it is a customised collective formula created especially for this particular case study, a means by which the results of all the metrics can be combined. The purpose of DO is to facilitate the presentation of the results by providing an estimate of the overall performance of each product with regard to the metrics used. The results presented in the following section are illustrated not only according to the correlation of DO with the user-oriented measurements, which the reader may find easier to study, but also separately for each of the metrics used. DO, as a

customised formula, focuses more on size and complexity and less on data structure since, in this particular case study, data structure complexity was limited for all projects and polynomial degrees vary only from 0 to 2, resulting in 1/(T+1) values of 1, 0.5 and 0.33 (actually, 87% of the values were equal to 1).

3.2 User-Oriented Measurements

In order to measure external product characteristics, a method (Xenos, 1995) that measures users' opinions of product quality and weights those opinions according to the users' qualifications has been used. The QWUO (Qualifications Weighted Users Opinion) is measured as shown in the following equation:

QWUO = (Σ_{i=1}^{n} O_i·E_i) / (Σ_{i=1}^{n} E_i)

where O_i ∈ [0,1] is the opinion of user i for the program that is measured and E_i ∈ (0,1] is the measurement of user i's qualifications. The opinion O_i of user i was measured by asking his opinion on quality characteristics that can easily be detected when using the product. Such characteristics include functionality, reliability, friendliness, simplicity, etc. In order to estimate the qualifications E_i of user i, three different aspects of the user's knowledge were measured: personal background, syntactic knowledge of the product and semantic knowledge of the application. Using the above equation, the opinions of users with different qualifications within the field of application contribute to the overall software evaluation in accordance with their qualifications. In order to collect the user-oriented measurements, the questionnaires evaluating both product quality and users' qualifications were structured so as to guide the user to select predefined responses ordered in interval scales (with choice bars, percentage estimations, etc.). When a multiple-choice format was used in the questionnaire, the end-users were told that the differences among their possible answers were of equal gravity. In this way, statistical analysis of the results was possible and non-parametric statistical analysis was not required.

4. The findings

This case study presents measurement results from 46 projects written in ANSI C by fourth-semester students of the Department of Computer Engineering and Informatics at the University of Patras. These projects were created to produce an environment of language tools operating under MS-DOS. Each project consists of an average of 4 modules and each module consists of 1 to 14 routines. In order to collect user-oriented measurements for these projects, the projects' deliverables were given for evaluation to 33 reviewers (software evaluators acting as possible users), in

order to evaluate the projects according to the method briefly presented in section 3.2. The overall effort consisted of the collection of 1518 sets of questionnaires, increased by the 33 sets used for the evaluation of the users' qualifications, summing up to 1551 sets. The questionnaires were based only on the user-oriented characteristics extracted from the ISO-9126 quality characteristics. The evaluators, some of whom were students and post-graduates of the Department of Computer Engineering and Informatics at the University of Patras while others came from local enterprises, and whose knowledge of computers varied widely, were given these 46 projects in order to evaluate them. Evaluation was based solely on day-to-day use of the projects. It must be stressed that, according to Thomas (1995), when customers are about to purchase a product, their judgement is affected not only by its quality (as fitness for use), but also by other characteristics such as the price, the company that produced it, the accolade or condemnation it has received from software critics, its relation to other products, etc. In this case study, since the products were not for sale, such problems did not occur and the users' judgement was focused entirely on product quality. The ungrouped distribution of the QWUO measurements is shown in table 2.

Table 2 - QWUO measurement results: Qualifications Weighted Users Opinion for the 46 projects

The users were asked to evaluate the projects equitably, which means that if they were strict (or charitable) with one project, they had to remain equally strict (or charitable) with the others. Besides, the aim was not the isolated marking of each project, but the correlation between the projects, i.e. how closely the developer-oriented measurements track the user-oriented measurements.
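The two collective scores defined in section 3 can be sketched in a few lines; the function names are ours, and the inputs are assumed to be already normalised to the ranges given earlier:

```python
def qwuo(opinions, qualifications):
    """Qualifications Weighted Users Opinion:
    QWUO = sum(O_i * E_i) / sum(E_i), with O_i in [0, 1] and E_i in (0, 1],
    so better-qualified evaluators weigh more heavily on the result."""
    return sum(o * e for o, e in zip(opinions, qualifications)) / sum(qualifications)

def do_score(lambda_sum, R, vg_sum, T):
    """The customised developer-oriented formula from section 3.1:
    DO = 0.2*lambda_sum + 0.2*R + 0.4*(10/V(g)_sum) + 0.2*(1/(T+1))."""
    return 0.2 * lambda_sum + 0.2 * R + 0.4 * (10.0 / vg_sum) + 0.2 * (1.0 / (T + 1))
```

Note that when every evaluator has the same qualification E, `qwuo` reduces to the plain average of the opinions, as the equation in section 3.2 implies.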
The results of the QWUO measurements follow a rather normal² distribution (as expected), with standard deviation 0.219, as can be seen from the bell curve of the grouped frequency distribution with class range 0.13 around the marked midpoints, illustrated in figure 1.

² Considering the small sample size of 46 projects, the data distribution can be considered normal, despite the slight anomalies.

Figure 1: Distribution of QWUO. Figure 2: Distribution of DO.

Developer-oriented measurements were automatically conducted using the software measurement and metrics environment ATHENA (Tsalidis, 1991). This environment provides completely automated measurements and, therefore, the collection of raw data was effortless, as it should be in any enterprise that collects developer-oriented measurements. Table 3 shows the data for the factor λ_sum. The projects are in the same order as in table 2. As can be seen by studying tables 2 and 3, the correlation between QWUO and λ_sum is highly positive, measured at 81.43%. Table 4 shows the data for the factor R, in the same order as in table 2. Although the correlation between QWUO and R is not as high as that between QWUO and λ_sum, it is definitely positive, measured at 66.62%.

Table 3 - The language level measurement results (λ_sum factor)

Table 4 - R factor measurement results

Table 5 shows the data for the factor 10/V(g)_sum, also in the same order as in table 2. The correlation between QWUO and 10/V(g)_sum was measured at 54.94%. Finally, table 6 shows the data for the customised formula DO, also sorted as in table 2. The correlation between QWUO and DO was found to be 70.44%. DO also follows a normal distribution with

standard deviation 0.363, as can be seen from the bell curve of the grouped frequency distribution, illustrated in figure 2.

Table 5 - Cyclomatic complexity measurement results (10/V(g)_sum factor)

Table 6 - DO formula results

The aforementioned tables show a positive correlation between QWUO and each of the distinct developer-oriented measurements and, therefore, a positive correlation between QWUO and the DO formula. This correlation is not very high (apart from the correlation between QWUO and λ_sum) and cannot lead to the conclusion that projects which have satisfied some or all developer-oriented measurements will also satisfy the user-oriented measurements. The situation becomes clear when studying the scatter plots that show the correlation between QWUO and the other factors. The following figures illustrate this correlation. Figure 3 shows the correlation between QWUO (on the horizontal axis) and λ_sum (on the vertical axis). Figure 4 shows the correlation between QWUO and R, figure 5 between QWUO and 10/V(g)_sum, and figure 6 between QWUO and DO. In each scatter plot, the horizontal axis shows the QWUO measurements and the vertical axis the developer-oriented metric. Projects are marked as black points at the coordinates given by the two measurements. The diagonal line (called the correlation line) is where all points would lie if the two measurement methods were 100% correlated. The points marked below the correlation line are measurements with a higher score in the user-oriented method relative to their score in the developer-oriented measurements.
These points represent projects for which, although the developer-oriented score was low, the end-users' perception of quality was not equally low. The points marked above the correlation line are measurements with a lower score in the user-oriented method relative to their score in the developer-oriented measurements. These points represent projects for which, although the developer-oriented score was high, the users' perception of quality was not equally high.
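Both the correlation percentages and this reading of the scatter plots can be reproduced from two lists of scores. The sketch below assumes a Pearson product-moment correlation (the paper does not name the coefficient) and a common [0, 1] scale for the diagonal comparison, which the figures imply but the paper does not spell out:

```python
def pearson_pct(xs, ys):
    """Pearson product-moment correlation of two equal-length lists, as a percentage."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 100.0 * cov / (sx * sy)

def split_by_correlation_line(points, tol=1e-9):
    """Partition (user_score, internal_score) pairs relative to the diagonal
    correlation line internal == user (both scores assumed scaled to [0, 1])."""
    above = [p for p in points if p[1] - p[0] > tol]   # metrics high, users less impressed
    below = [p for p in points if p[0] - p[1] > tol]   # users scored higher than the metrics
    on = [p for p in points if abs(p[1] - p[0]) <= tol]
    return above, below, on
```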

Figure 3: QWUO and λ_sum. Figure 4: QWUO and R.

From the scatter plots of figures 3, 4, 5 and 6 it is obvious that programs with low developer-oriented measurements also score poorly in the user-oriented measurements. Moreover, there is no program which has low developer-oriented measurements while having high, or even average, user-oriented measurements. On the contrary, high developer-oriented measurements do not necessarily imply high user-oriented measurements. As the scatter plots show, there are many projects that score high in the developer-oriented measurements but score poorly in the user-oriented measurements. In any case, the points in the scatter plots are gathered around and above the correlation line. So, the outcome of this case study is that poor developer-oriented measurements always imply poor user-oriented measurements. Therefore, it is generally acceptable to set a limit on internal metrics in order to preserve the external quality of a program. For example, routines with a very low language level or a very high cyclomatic complexity can automatically be rejected, because they will yield poor external quality. But, having rejected these routines, there is no reliable method to find out whether the routines with a high developer-oriented score will score equally high in the user-oriented measurements. Internal metrics are only a first indication, and a good one, of external quality, but they cannot always guarantee successful user-oriented measurements as well. To sum up, although the answer to the second question of section 2 is positive, we cannot, with any amount of precision, answer the first question positively.
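The rejection rule suggested above can be expressed as a simple gate. The complexity ceiling of 10 is McCabe's proposed maximum; the language-level floor is an illustrative assumption, not a threshold taken from the paper:

```python
def reject_routines(routines, max_vg=10, min_lambda=1.0):
    """Return the names of routines failing the internal-metric gate:
    cyclomatic complexity above max_vg or language level below min_lambda.
    Each routine is a (name, V(g), lambda) triple."""
    return [name for name, vg, lam in routines if vg > max_vg or lam < min_lambda]
```

As the case study indicates, such a gate screens out likely external-quality failures, but passing it does not by itself guarantee a high user-oriented score.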

Figure 5: QWUO and 10/V(g)_sum. Figure 6: QWUO and DO.

5. Conclusion

Developer-oriented measurements provide an easy and inexpensive way to detect and correct possible causes of low product quality as it will be perceived by the customers. Setting up measurement programs and metric standards will help prevent failures to satisfy the customers' demand for quality. However, satisfaction of internal quality standards does not a priori guarantee success in fulfilling this demand: programs that succeed in the developer-oriented measurements may not receive the same acknowledgement from the customers. Periodic deployment of user-oriented measurements could be used to test the soundness of the developer-oriented measurement programs and, occasionally, even to calibrate the internal metrics. Quality assurance teams must never forget that, despite what internal measurements indicate, the final judge of the quality of the produced software is the customer.

References

Boehm B. W., Brown J. R., Lipow M., MacLeod G. J., and Merritt M. J., Characteristics of Software Quality, New York: Elsevier North-Holland, 1978.
Fenton N. E., Software Metrics: A Rigorous Approach, Chapman & Hall, Great Britain, 1991.
Halstead M. H., Elements of Software Science, Elsevier Publications, North-Holland, 1975.
ISO/IEC Standard ISO-9126, "Software Product Evaluation - Quality Characteristics and Guidelines for their Use", 1991.
McCabe T. J., A complexity measure, IEEE Transactions on Software Engineering, SE-2(4), 1976.
McCall J. A., Richards P. K., Walters G. F., Factors in Software Quality, US Rome Air Development Centre Reports NTIS AD/A, 015, 055, 1977.
Thomas B., The Human Dimension of Quality, McGraw-Hill, London, 1995.
Tsai W. T., Lopez M. A., Rodreguez V., Volovik D., An Approach to Measuring Data Structure Complexity, COMPSAC86, 1986.

Tsalidis C., Christodoulakis D., and Maritsas D., Athena: A Software Measurement and Metrics Environment, Software Maintenance: Research and Practice, 1991.
Xenos M., and Christodoulakis D., An Applicable Methodology to Automate Software Quality Measurements, IEEE International Conference on Software Testing, Reliability and Quality Assurance, STRQA'94, New Delhi, India, 1994.
Xenos M., and Christodoulakis D., Software Quality: The User's Point of View, Software Quality and Productivity, Chapman & Hall, Great Britain, 1995.