Comparison of various Techniques for Software Effort Estimation

Rshma Chawla (1), Deepak Ahlawat (2)
(1,2) MMICTBM, MMU Mullana
(1) rshma.chawla@mmumullana.org, (2) deepakahlawat1983@gmail.com

Abstract: Software development effort prediction is one of the most significant activities in software project management. Software development effort estimation is the process of predicting the most realistic amount of effort required to develop software, based on a set of parameters. Estimating effort at the early stages of project development is of great significance for the industry in meeting the competitive demands of today's market. In this paper, software development effort estimation using the fuzzy Triangular, GBell, Gauss2 and Trapezoidal membership functions is implemented with a Mamdani-type fuzzy inference system in the Fuzzy Logic Toolbox of MATLAB 7.5, and the results of these membership functions are compared with each other and with the COCOMO model. The fuzzy logic model using the Trapezoidal membership function is found to give the best results. However, if a smooth transition between phases is desired, the Gauss2 membership function gives better results than the GBell membership function. The criterion chosen for comparing the results is the Mean Magnitude of Relative Error (MMRE).

Keywords: Software Effort Estimation, Fuzzy Logic, COCOMO Model, Membership Functions, NASA63 dataset.

I. INTRODUCTION
Software development is the computer programming, documenting, testing, and bug fixing involved in creating and maintaining applications and frameworks across a software release life cycle, resulting in a software product.

Software Development Effort Estimation
Software effort estimation is the prediction, at an early stage of a project, of the likely amount of cost, time, and staffing required to build a software system.
It is difficult to obtain an effort estimate during the preliminary stages because of the limited resources available at that time [7].

Need for Estimation
1. It facilitates increased control of time and overall cost benefit in the software development life cycle.
2. Software development effort estimates are the basis for project bidding and planning.
3. Software effort estimation has been identified as one of the three most demanding challenges in software application areas.
During the development process, cost and time estimates are useful for initial rough validation and for monitoring the project's progress toward completion. In addition, these estimates may be useful in project productivity assessment.

Size Estimation
Size estimation is a critical and difficult area of project planning, and it has been recognized as a crucial step from the very beginning. The metric used for size estimation is Source Lines of Code. A line of code is any line of program text that is not a comment or a blank line, regardless of the number of statements or fragments of statements on the line. This specifically includes all lines containing program headers, declarations, and executable and non-executable statements. The unit of measurement is KLOC (thousands of lines of code). Several cost, schedule, and effort estimation models use LOC as an input parameter, including the widely used Constructive Cost Model (COCOMO) invented by Barry Boehm [7].

COCOMO Model
The Constructive Cost Model (COCOMO) is an algorithmic software cost estimation model developed by Barry W. Boehm. COCOMO was first published in Boehm's 1981 book Software Engineering Economics as a model for estimating the effort, cost, and schedule of software projects when planning a new software development activity. The model estimates the total effort in person-months of the technical project staff.
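The LOC definition given under Size Estimation can be made concrete with a small counter. This is a minimal sketch that assumes a language whose comments are whole lines beginning with a single prefix (here `#`, a hypothetical default); block comments and comment-like text inside string literals are not handled. The helper names `count_loc` and `to_kloc` are illustrative and do not come from any standard tool.

```python
def count_loc(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor comment-only.

    Assumption: only whole-line comments starting with `comment_prefix` are
    recognised. A line holding code plus a trailing comment still counts once,
    as does a line holding several statements.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line: excluded by the definition
        if stripped.startswith(comment_prefix):
            continue  # comment-only line: excluded by the definition
        count += 1  # headers, declarations, executable and non-executable statements
    return count


def to_kloc(loc: int) -> float:
    """Convert a raw line count to KLOC (thousands of lines of code)."""
    return loc / 1000.0


if __name__ == "__main__":
    sample = """# compute a value
import math

x = 1; y = 2   # two statements on one line count as one LOC
def f(a):
    return math.sqrt(a)
"""
    loc = count_loc(sample)
    print(loc, to_kloc(loc))
```

The resulting KLOC figure is the kind of size input that COCOMO, described next, consumes.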
COCOMO is a hierarchy of software cost estimation models comprising basic, intermediate and detailed sub-models. Boehm proposed three modes of projects:
1. Organic mode: simple projects that engage small teams working in known and stable environments.
2. Semi-detached mode: projects that engage teams with a mixture of experience; this mode lies between the organic and embedded modes.
3. Embedded mode: complex projects that are developed under tight constraints with changing requirements.

Intermediate COCOMO Model
The basic model initially developed by Barry Boehm allowed a quick, rough estimate, but it lacked accuracy. Boehm therefore reduced a large number of candidate factors to a relatively manageable set of 15 factors (predictors), or cost driver attributes, that can be used for practical software cost estimation. The resulting model is referred to as the Intermediate COCOMO model; it is part of the original COCOMO (COCOMO 81) and is distinct from the later COCOMO II. The cost drivers are grouped into the following four categories [7, 23]:
1. Product attributes
(a) Required software reliability (RELY)
(b) Database size (DATA)
(c) Product complexity (CPLX)
2. Computer attributes
(a) Execution time constraints (TIME)
(b) Main storage constraints (STOR)
(c) Virtual machine volatility (VIRT)
(d) Computer turnaround time (TURN)
3. Personnel attributes
(a) Analyst capability (ACAP)
(b) Application experience (AEXP)
(c) Programmer capability (PCAP)
(d) Virtual machine experience (VEXP)
(e) Programming language experience (LEXP)
4. Project attributes
(a) Modern programming practices (MODP)
(b) Use of software tools (TOOL)
(c) Required development schedule (SCED)
Each of these cost driver attributes determines a multiplying factor that estimates the effect of the attribute on software development effort. These multipliers are applied to the nominal COCOMO development effort estimate to obtain a refined estimate of software development effort [23].
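How the cost-driver multipliers refine the nominal estimate can be sketched as follows, using the nominal coefficients of Table 1.1 and the rule that the final effort is EAF times the nominal effort Ei. This is an illustrative sketch only: the mode keys, the function names and the two example ratings are assumptions, size is taken in KDSI (the unit of Table 1.1), and any cost driver left out of the dictionary is treated as Nominal (multiplier 1.00).

```python
from math import prod

# Intermediate COCOMO nominal coefficients (a, b) for (MM)_NOM = a * KDSI**b (Table 1.1).
NOMINAL_COEFFS = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def nominal_effort(kdsi: float, mode: str) -> float:
    """Nominal effort Ei in person-months for `kdsi` thousand delivered source instructions."""
    a, b = NOMINAL_COEFFS[mode]
    return a * kdsi ** b


def effort_adjustment_factor(multipliers: dict) -> float:
    """EAF = product of the cost-driver multipliers; omitted drivers count as Nominal (1.00)."""
    return prod(multipliers.values())


def intermediate_cocomo(kdsi: float, mode: str, multipliers: dict) -> float:
    """Adjusted development effort E = EAF * Ei."""
    return effort_adjustment_factor(multipliers) * nominal_effort(kdsi, mode)


if __name__ == "__main__":
    # Hypothetical project with illustrative ratings: high reliability, low complexity.
    ratings = {"RELY": 1.15, "CPLX": 0.85}
    print(round(intermediate_cocomo(32.0, "embedded", ratings), 1))
```

Because the EAF is a plain product, a single rating far from Nominal scales the whole estimate proportionally, which is why the choice of ratings matters as much as the size input.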
Table 1.1: Intermediate COCOMO nominal effort estimating equations
Development Mode    Nominal Effort Equation, Ei
Organic             $(MM)_{NOM} = 3.2\,(\mathrm{KDSI})^{1.05}$
Semi-detached       $(MM)_{NOM} = 3.0\,(\mathrm{KDSI})^{1.12}$
Embedded            $(MM)_{NOM} = 2.8\,(\mathrm{KDSI})^{1.20}$

Table 1.2: Multipliers for different cost drivers
Columns: S. No, Cost Driver Symbol, Very Low, Low, Nominal, High, Very High, Extra High
Cost drivers (rows 1 to 15): RELY, DATA, CPLX, TIME, STOR, VIRT, TURN, ACAP, AEXP, PCAP, VEXP, LEXP, MODP, TOOL, SCED

The multiplying factors for all 15 cost drivers are multiplied together to obtain the effort adjustment factor (EAF). The final development effort estimate, E, is obtained by multiplying the nominal estimate Ei by the EAF:

$$E = \mathrm{EAF} \times E_i \qquad (1)$$

Fuzzy Logic Approach
Since Lotfi Zadeh laid the foundation of fuzzy logic in 1965, it has been the subject of important investigations [15, 16]. It is a mathematical tool for dealing with uncertainty, and it also provides a technique for dealing with imprecision and information granularity. The fuzzy logic model uses the fuzzy logic concepts introduced by Lotfi Zadeh. Fuzzy reasoning consists of three main components:
fuzzification, inference from fuzzy rules, and defuzzification. Fuzzification is the process in which an objective term is transformed into a fuzzy concept: the membership functions are applied to the actual values of the variables to determine the confidence factor, or membership value (μ). Fuzzification allows the inputs and outputs to be expressed in linguistic terms. Inference evaluates the conditions of the rules and propagates the confidence factors of the conditions to the conclusions of the rules. In defuzzification, a number of rules fire and the inference engine assigns the particular outcome with the maximum membership value among all the fired rules.

Parameters Analysis
The main parameter for evaluating cost estimation models is the Magnitude of Relative Error (MRE). It is calculated for each observation i whose effort is predicted:

$$\mathrm{MRE}_i = \frac{|E_i - \hat{E}_i|}{\hat{E}_i}$$

where E is the estimated effort and Ê is the actual effort. The MRE values over multiple observations (N) are aggregated through the Mean MRE (MMRE):

$$\mathrm{MMRE} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MRE}_i \qquad (2)$$

A lower value of MMRE is better [24].

II. FUZZY IDENTIFICATION
A fuzzy model [6] is used when a system is not suitable for analysis by conventional approaches or when the available data is uncertain, inaccurate or vague [15, 17]. The point of fuzzy logic is to map an input space to an output space using a list of if-then statements called rules. All rules are evaluated in parallel, and the order of the rules is unimportant. Before the rules can be written, the inputs and outputs of the system must be identified. To obtain a fuzzy model from the available data, the following steps are followed:
1. Select a Mamdani-type fuzzy inference system (FIS).
2. Define the input variables MODE and SIZE and the output variable EFFORT.
3. Set the type of the membership functions (triangular, gbell, gauss2 or trapezoidal).
4. Write the appropriate fuzzy rules in the Rule Editor.
5. Select the Rule Viewer and calculate the estimated fuzzy effort.
The framework is shown in Figure 2.1.

Figure 2.1: Fuzzy Framework for Effort Estimation (inputs MODE and SIZE, fuzzy rules, output Estimated Effort)

In the FIS, the range of MODE is given first for the particular membership function, followed by the range of SIZE. The range of the output, development effort, is taken from the actual effort values given in the NASA63 dataset.

Fuzzy Rules
Our rules [24] are based on the fuzzy sets of MODE, SIZE and EFFORT and take the following form:
If MODE is Organic and SIZE is S1 then EFFORT is EF1
If MODE is Semidetached and SIZE is S1 then EFFORT is EF2
If MODE is Embedded and SIZE is S1 then EFFORT is EF3
If MODE is Organic and SIZE is S2 then EFFORT is EF4
If MODE is Semidetached and SIZE is S2 then EFFORT is EF5
If MODE is Embedded and SIZE is S3 then EFFORT is EF5
If MODE is Embedded and SIZE is S4 then EFFORT is EF3
If MODE is Organic and SIZE is S3 then EFFORT is EF4
If MODE is Embedded and SIZE is S5 then EFFORT is EF6
If MODE is Organic and SIZE is S4 then EFFORT is EF
These rules are represented in MATLAB as shown in Figure 2.2.

Figure 2.2: Fuzzy Rules
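The rule evaluation carried out by the Mamdani FIS can be illustrated outside MATLAB. The sketch below is a simplified, hypothetical stand-in for the Fuzzy Logic Toolbox, not the authors' model: MODE is encoded numerically (1 = Organic, 2 = Semidetached, 3 = Embedded), all fuzzy sets are triangular with invented parameters (the actual ranges of S1 to S5 and EF1 to EF6 are not reproduced), AND and implication use min, aggregation uses max, and the crisp output is obtained by centroid defuzzification, which is the toolbox default for Mamdani systems and not necessarily the setting used in this study. Only three of the rules listed above are encoded.

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangular membership function; assumes a < b < c."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def tri(a, b, c):
    """Return a triangular membership function with fixed (hypothetical) parameters."""
    return lambda x: trimf(x, a, b, c)


def mamdani(inputs, rules, universe):
    """Evaluate Mamdani rules and defuzzify the aggregated output by its centroid.

    inputs   -- dict mapping an input name to a crisp value
    rules    -- list of (antecedents, consequent); antecedents is a list of
                (input name, membership function), consequent is an output set
    universe -- 1-D array sampling the output variable (effort)
    """
    aggregated = np.zeros_like(universe, dtype=float)
    for antecedents, consequent in rules:
        # AND of the conditions: the minimum membership is the rule's firing strength.
        strength = min(float(mf(inputs[name])) for name, mf in antecedents)
        # Implication clips the consequent set; aggregation keeps the pointwise maximum.
        aggregated = np.maximum(aggregated, np.minimum(strength, consequent(universe)))
    if aggregated.sum() == 0.0:
        raise ValueError("no rule fired for these inputs")
    return float(np.sum(universe * aggregated) / np.sum(aggregated))


# Hypothetical fuzzy sets (illustrative parameters only).
ORGANIC, SEMIDETACHED = tri(0, 1, 2), tri(1, 2, 3)
S1, S2 = tri(0, 5, 20), tri(10, 40, 80)
EF1, EF2, EF4 = tri(0, 20, 60), tri(20, 60, 120), tri(60, 150, 300)

rules = [
    ([("mode", ORGANIC), ("size", S1)], EF1),       # Organic and S1 -> EF1
    ([("mode", SEMIDETACHED), ("size", S1)], EF2),  # Semidetached and S1 -> EF2
    ([("mode", ORGANIC), ("size", S2)], EF4),       # Organic and S2 -> EF4
]

effort_universe = np.linspace(0.0, 600.0, 601)

if __name__ == "__main__":
    print(mamdani({"mode": 1.0, "size": 15.0}, rules, effort_universe))
```

With SIZE = 15 both the S1 and S2 rules fire partially, so the estimate falls between the EF1 and EF4 sets; swapping the triangular sets for the other membership shapes changes only the `tri` factory.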

2.2. Membership Functions
A membership function (μ) is a curve that defines how each point in the input space is mapped to a membership value (or degree of membership) between 0 and 1. The input space is sometimes referred to as the universe of discourse, a fancy name for a simple concept. One of the most commonly used examples of a fuzzy set is the set of tall people. In this case the universe of discourse is all potential heights, say from 3 feet to 9 feet, and the word tall corresponds to a curve that defines the degree to which any person is tall. If the set of tall people is given the well-defined (crisp) boundary of a classical set, we might say that all people taller than 6 feet are officially considered tall. But such a distinction is clearly absurd. It may make sense to consider the set of all real numbers greater than 6, because numbers belong on an abstract plane, but when we talk about real people it is unreasonable to call one person short and another tall when they differ in height by the width of a hair [31].

Fuzzy Logic Membership Functions used in this Research
1. Trimf: triangular-shaped built-in membership function
Syntax: y = trimf(x, params), i.e. y = trimf(x, [a b c])
Description: The triangular curve is a function of a vector, x, and depends on three scalar parameters a, b, and c, as given by
$$f(x; a, b, c) = \max\left(\min\left(\frac{x - a}{b - a}, \frac{c - x}{c - b}\right), 0\right) \qquad (3)$$
2. Gbellmf: generalized bell-shaped built-in membership function
Syntax: y = gbellmf(x, params)
Description: The generalized bell function depends on three parameters a, b, and c, as given by
$$f(x; a, b, c) = \frac{1}{1 + \left|\frac{x - c}{a}\right|^{2b}} \qquad (4)$$
where the parameter b is usually positive and the parameter c locates the center of the curve. The parameter vector params, the second argument of gbellmf, is the vector whose entries are a, b, and c, respectively.
3. Trapmf: trapezoidal-shaped built-in membership function
Syntax: y = trapmf(x, [a b c d])
Description: The trapezoidal curve is a function of a vector, x, and depends on four scalar parameters a, b, c, and d, as given by
$$f(x; a, b, c, d) = \max\left(\min\left(\frac{x - a}{b - a}, 1, \frac{d - x}{d - c}\right), 0\right) \qquad (5)$$
The parameters a and d locate the feet of the trapezoid, and the parameters b and c locate the shoulders.
4. Gauss2mf: two-sided Gaussian membership function
Syntax: y = gauss2mf(x, [sig1 c1 sig2 c2])
Description: The Gaussian function depends on two parameters, σ and c, as given by
$$f(x; \sigma, c) = e^{-\frac{(x - c)^2}{2\sigma^2}} \qquad (6)$$
The function gauss2mf combines two such Gaussian curves. The first, specified by sig1 and c1, determines the shape of the leftmost curve; the second, specified by sig2 and c2, determines the shape of the rightmost curve. Whenever c1 < c2, the gauss2mf function reaches a maximum value of 1; otherwise, the maximum value is less than one.

III. EXPERIMENTAL RESULTS
Experiments were carried out using the original data from the NASA63 dataset [7]. The software development effort estimates obtained with Intermediate COCOMO and with each of the membership functions were recorded.

Figure 3.1: Output of the results in MATLAB
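Equations (3) to (6) can be checked numerically with a NumPy transcription of the four membership functions. The sketch follows the formulas and parameter orders documented for trimf, gbellmf, trapmf and gauss2mf, but it is not the toolbox's own code, and it assumes non-degenerate parameters (a < b < c for the triangle, a < b <= c < d for the trapezoid, nonzero a and sigma).

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangle with feet at a and c and peak at b (Equation 3)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def gbellmf(x, a, b, c):
    """Generalized bell with width a, slope b and centre c (Equation 4)."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))


def trapmf(x, a, b, c, d):
    """Trapezoid with feet a, d and shoulders b, c (Equation 5)."""
    x = np.asarray(x, dtype=float)
    rising = (x - a) / (b - a)
    falling = (d - x) / (d - c)
    return np.maximum(np.minimum(np.minimum(rising, 1.0), falling), 0.0)


def gaussmf(x, sig, c):
    """Single Gaussian curve (Equation 6)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - c) ** 2) / (2.0 * sig ** 2))


def gauss2mf(x, sig1, c1, sig2, c2):
    """Two-sided Gaussian: curve (sig1, c1) left of c1, curve (sig2, c2) right of c2."""
    x = np.asarray(x, dtype=float)
    left = np.where(x <= c1, gaussmf(x, sig1, c1), 1.0)
    right = np.where(x >= c2, gaussmf(x, sig2, c2), 1.0)
    return left * right


if __name__ == "__main__":
    xs = np.linspace(0.0, 10.0, 11)
    print(trimf(xs, 2, 5, 8))
    print(gauss2mf(xs, 1.0, 6.0, 1.0, 4.0).max())  # c1 > c2, so the peak stays below 1
```

The last line illustrates the gauss2mf remark: with c1 > c2 the two halves overlap and the curve never reaches 1, whereas with c1 < c2 it has a flat top of height 1 between the two centres, which is the smooth-transition behaviour discussed in the conclusion.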

Table 3.1: Development Effort Estimates using Various Techniques
Columns: S. No, Project ID, MODE, SIZE, EAF, Actual Effort, Int. COCOMO Est., Est. using Trap, Est. using T, Est. using GBell, Est. using Gauss
MODE values, in row order (E = Embedded, O = Organic, SD = Semi-detached): E, O, O, E, E, E, E, E, SD, O, E, E, E, O, O, O, SD, O, O, E, E

Table 3.2: Calculated MRE
Columns: MRE using Inter. COCOMO Model, MRE using Trap, MRE using T, MRE using GBell, MRE using Gauss
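The MRE values of Table 3.2 and the MMRE values of Table 3.3 follow directly from the MRE definition and Equation (2). A minimal sketch, using hypothetical effort figures rather than the paper's data and assuming every actual effort is positive:

```python
from typing import Sequence


def mre(actual: float, estimated: float) -> float:
    """Magnitude of relative error for one project, relative to the actual effort."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(estimated - actual) / actual


def mmre(actuals: Sequence[float], estimates: Sequence[float]) -> float:
    """Mean MRE over N projects (Equation 2); lower is better."""
    if len(actuals) != len(estimates) or len(actuals) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)


if __name__ == "__main__":
    # Hypothetical person-month figures, not taken from Table 3.1.
    actual = [100.0, 200.0, 50.0]
    trap_estimates = [110.0, 180.0, 55.0]
    print(f"MMRE = {mmre(actual, trap_estimates):.3f}")
```

Comparing the models then amounts to calling `mmre` once per estimate column of Table 3.1 and choosing the model with the smallest value.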

Table 3.3: Computed MMRE
Columns: Development Model, MMRE
Models (rows): Intermediate COCOMO, FL Trapezoidal, FL Triangular, FL GBell, FL Gauss

Figure 3.2: Comparison of MMRE (y-axis: MMRE; x-axis: development models Intermediate COCOMO, FL TRAP, FL T, FL GBell, FL Gauss2)

IV. CONCLUSION AND FUTURE RESEARCH
This research aims to provide a technique for software cost estimation that performs better than other techniques in terms of effort-estimation accuracy. An improved approach to software project effort estimation is proposed, using fuzzy sets rather than classical intervals in the COCOMO model. The study explores four fuzzy logic membership functions: the Triangular, GBell, Gauss2 and Trapezoidal membership functions. The results show that fuzzy logic can predict effort more accurately than the classical model (COCOMO). The MMRE for the Trapezoidal membership function is the lowest, so it is the best of the four fuzzy logic functions used. However, if a smooth transition between phases is required, piecewise-linear functions such as the Trapezoidal and Triangular are not suitable; in that case, Gauss2 was found to have a lower MMRE than GBell and is therefore the better choice.

This work can further be analyzed in terms of feasibility and acceptance in industry. It can be deployed in a COCOMO II environment, with experts providing the information required to develop the fuzzy sets and an appropriate rule base. Future research could apply other techniques, such as neural networks, to software effort estimation [3]. The fuzzy logic techniques can also be compared using other criteria such as VAF, BRE, MARE and Pred(%). With further effort, other customized membership functions could be developed that represent the inputs more closely, tolerating uncertainty and imprecision in the inputs and thereby preventing it from propagating to the outputs.
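For the future comparison mentioned above, three of the listed criteria can be computed from the same actual and estimated efforts. The definitions below are assumptions based on common usage in effort-estimation studies, since the paper does not define them: Pred(l) is the percentage of projects whose MRE is at most l, BRE is the balanced relative error |E - Ê| / min(E, Ê) averaged over projects, and VAF is the variance accounted for, (1 - var(Ê - E) / var(Ê)) x 100, with Ê the actual and E the estimated effort. The function names are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def pred(actuals: Sequence[float], estimates: Sequence[float], level: float = 0.25) -> float:
    """Pred(level): percentage of projects whose MRE does not exceed `level`."""
    hits = sum(1 for a, e in zip(actuals, estimates) if abs(e - a) / a <= level)
    return 100.0 * hits / len(actuals)


def mean_bre(actuals: Sequence[float], estimates: Sequence[float]) -> float:
    """Mean balanced relative error: |estimate - actual| / min(actual, estimate)."""
    return sum(abs(e - a) / min(a, e) for a, e in zip(actuals, estimates)) / len(actuals)


def vaf(actuals: Sequence[float], estimates: Sequence[float]) -> float:
    """Variance accounted for, in percent; 100 means the errors have zero variance."""
    residuals = [a - e for a, e in zip(actuals, estimates)]
    return (1.0 - pvariance(residuals) / pvariance(actuals)) * 100.0


if __name__ == "__main__":
    # Hypothetical efforts in person-months, not taken from the NASA63 results.
    actual = [100.0, 200.0, 300.0]
    estimated = [110.0, 300.0, 290.0]
    print(pred(actual, estimated), mean_bre(actual, estimated), vaf(actual, estimated))
```

VAF ignores a constant offset between estimates and actuals (a uniformly biased model still scores 100), which is one reason such criteria are usually reported alongside MMRE rather than instead of it.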
REFERENCES
[1] A. R. Gray, S. G. MacDonell, Applications of Fuzzy Logic to Software Metric Models for Development Effort Estimation, Fuzzy Information Processing Society 1997, NAFIPS '97, Annual Meeting of the North American, 21-24 September 1997, pp. 394-399.
[2] A. C. Hodgkinson, P. W. Garratt, A Neurofuzzy Cost Estimator, in Proceedings of the 3rd International Conference on Software Engineering and Applications (SAE).
[3] A. R. Venkatachalam, Software Cost Estimation Using Artificial Neural Networks, Proceedings of the 1993 International Joint Conference on Neural Networks.
[4] Abou Bakar Nauman, Romana Aziz, Development of Simple Effort Estimation Model based on Fuzzy Logic using Bayesian Networks, IJCA Special Issue on Artificial Intelligence Techniques - Novel Approaches & Practical Applications (AIT).
[5] A. J. Albrecht, J. E. Gaffney, Software Function, Source Lines of Code, and Development Effort Prediction: A Software Science Validation, IEEE Transactions on Software Engineering, November 1983.
[6] A. Idri, T. M. Khoshgoftaar, A. Abran, Can Neural Networks be Easily Interpreted in Software Cost Estimation, IEEE Transactions on Software Engineering, 2002.
[7] B. W. Boehm, Software Engineering Economics, Englewood Cliffs, NJ: Prentice Hall, 1981.
[8] B. W. Boehm, C. Abts, S. Chulani, Software Development Cost Estimation Approaches - A Survey, Technical Report, University of Southern California Center for Software Engineering, usccse.
[9] G. D. Boetticher, An Assessment of Metric Contribution in the Construction of a Neural Network-Based Effort Estimator, Proceedings of the Second International Workshop on Soft Computing Applied to Software Engineering.
[10] Geetika Batra, Mahima Trivedi, A Fuzzy Approach for Software Effort Estimation, International Journal on Cybernetics & Informatics (IJCI), Vol. 2, No. 1, pp. 9-15, February.
[11] Iman Attarzadeh, Siew Hock Ow, Software Development Effort Estimation Based on a New Fuzzy Logic Model, International Journal of Computer Theory and Engineering, Vol. 1, No. 4, October.
[12] Iman Attarzadeh, Siew Hock Ow, A Novel Algorithmic Cost Estimation Model Based on Soft Computing Technique, Journal of Computer Science 6 (2), Science Publications.
[13] C. Jones, Estimating Software Costs, McGraw-Hill.
[14] L. H. Putnam, A General Empirical Solution to the Macro Software Sizing and Estimating Problem, IEEE Transactions on Software Engineering, SE-4(4).
[15] Lotfi A. Zadeh, Fuzzy Sets, Information and Control, Vol. 8.
[16] Lotfi A. Zadeh, Fuzzy Algorithms, Information and Control, Vol. 12.
[17] Lotfi A. Zadeh, Outline of a New Approach to the Analysis of Complex Systems and Decision Processes, IEEE Transactions on Systems, Man and Cybernetics, Vol. 3, No. 1.
[18] Lotfi A. Zadeh, Making Computers Think like People, IEEE Spectrum, 8.
[19] Lotfi A. Zadeh, Fuzzy Logic, Neural Networks and Soft Computing, Communications of the ACM 37(3).
[20] Lotfi A. Zadeh, The Future of Soft Computing, in Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Vancouver, Canada, 2001.
[21] E. H. Mamdani, Advances in the Linguistic Synthesis of Fuzzy Controllers, International Journal of Man-Machine Studies, Vol. 8.
[22] E. H. Mamdani, Applications of Fuzzy Logic to Approximate Reasoning Using Linguistic Synthesis, IEEE Transactions on Computers, Vol. 26, No. 12.
[23] Pankaj Jalote, An Integrated Approach to Software Engineering, Third Edition, Narosa Publishing House.
[24] Prasad Reddy, Sudha K. R., Rama Sree P, Ramesh, Fuzzy Based Approach for Predicting Software Development Effort, International Journal of Software Engineering (IJSE), Vol. 1, No. 1, pp. 1-11, May.
[25] Rahul Kumar Yadav, S. Niranjan, Software Effort Estimation Using Fuzzy Logic: A Review, International Journal of Engineering Research & Technology (IJERT), Vol. 2, No. 5, May.
[26] Roger S. Pressman, Software Engineering - A Practitioner's Approach, Tata McGraw-Hill Edition.
[27] Roheet Bhatnagar, Mrinal Kanti Ghose, Vandana Bhattacharjee, Predicting the Early Stage Software Development Effort using Mamdani FIS, International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 2, No. 4.
[28] Rshma Chawla, Deepak Ahlawat, Mukesh Kumar, Software Development Effort Estimation Techniques: A Review, International Journal of Electronics Communication and Computer Engineering, Vol. 5, No. 5, September.
[29] Rshma Chawla, Deepak Ahlawat, Mukesh Kumar, Improved Software Development Effort Estimation Based on Fuzzy Logic Functions, International Journal of Engineering Sciences & Research Technology, Vol. 3, No. 12, December.
[30] Rshma Chawla, Deepak Ahlawat, Improved Software Development Effort Estimation Based on Four Fuzzy Logic Functions, International Journal on Advanced Computer Theory and Engineering (IJACTE), IRD India, Vol. 4, No. 2, pp. 9-13, March.
[31] The MathWorks Inc., Fuzzy Logic Toolbox User's Guide, September.