Software Efforts & Cost Estimation Matrices and Models. By: Sharaf Hussain


Techniques for estimating software cost
- Lines of Code (LOC)
- Function Points (FP)
- COCOMO
- SLIM

Lines of Code (LOC)
- NCLOC: non-commented lines of code
- CLOC: commented lines of code
- Total length (LOC) = NCLOC + CLOC
- KLOC = LOC / 1000 (thousands of lines of code)
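The counts above can be sketched in a few lines of Python. This is a minimal illustration, not a standard counting tool: the function name `count_loc`, the `#` comment prefix, and the rule that blank lines count toward neither total are all assumptions made for the example.

```python
def count_loc(source: str, comment_prefix: str = "#") -> dict:
    """Classify each non-blank line as a comment line (CLOC) or a code line (NCLOC)."""
    ncloc = cloc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # assumption: blank lines count toward neither total
        if stripped.startswith(comment_prefix):
            cloc += 1
        else:
            ncloc += 1  # includes code lines that carry a trailing comment
    total = ncloc + cloc
    return {"NCLOC": ncloc, "CLOC": cloc, "LOC": total, "KLOC": total / 1000}


sample = "\n".join([
    "# add two numbers",
    "def add(a, b):",
    "    return a + b  # trailing comment",
    "",
    "# done",
])
counts = count_loc(sample)
print(counts)
```

Different counting conventions (for example, how mixed code-and-comment lines are treated) change the totals, so the convention should be fixed before sizes are compared.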

Function Point
Function points (FP) measure size in terms of the amount of functionality in a system. Function points are computed by first calculating an unadjusted function point count (UFC). Counts are made for five categories: external inputs, external outputs, external inquiries, external files, and internal files (Fenton, 1997).

One of the primary goals of Function Point Analysis is to evaluate a system's capabilities from a user's point of view. The analysis is based upon the various ways users interact with computerized systems. From a user's perspective, a system assists them in doing their job by providing five basic functions. Two of these address the data requirements of an end user and are referred to as Data Functions. The remaining three address the user's need to access data and are referred to as Transactional Functions.

Function Points (FP)
Based on the number and complexity of the system functions to be delivered to the customer.
Steps:
(1) Categorize the functions according to type (input, output, database, interface, etc.) and complexity (simple, moderate, average, complex, highly complex).
(2) Derive the number of function points: multiply the number of functions in each category by the appropriate complexity weight and total the number of FP.
(3) Determine the total Project Influence Factors (PIF): PIF types (distributed processing, multiple development sites, etc.) and levels of difficulty (from 0, no difficulty, through 3, average difficulty, to 5, great difficulty).
(4) Compute the total effort.

The Five Components of Function Points
Data Functions
- Internal Logical Files
- External Interface Files
Transactional Functions
- External Inputs
- External Outputs
- External Inquiries

- Internal logical files: logical master files in the system
- External interface files: machine-readable interfaces to other systems
- External inputs: those items provided by the user that describe distinct application-oriented data (such as file names and menu selections)
- External outputs: those items provided to the user that generate distinct application-oriented data (such as reports and messages, rather than the individual components of these)
- External inquiries: interactive inputs requiring a response

[Figure: function point components and the application boundary. Labels: External Input, External Output, External Inquiry, Internal Logical File, External Interface File, Application Boundary, Other Applications.]

Once this data has been collected, a complexity rating is associated with each count according to Table 1. Each count is multiplied by its corresponding complexity weight and the results are summed to provide the UFC.

Item                 Simple  Average  Complex
External inputs         3       4        6
External outputs        4       5        7
External inquiries      3       4        6
External files          7      10       15
Internal files          5       7       10

Table 1. Function point complexity weights.

With every item rated average, the weighted sum is UFC = 4I + 5O + 4E + 10L + 7F, where the coefficients are the Average column of Table 1 taken in row order.

Spell-Checker Specification
The checker accepts as input a document file and an optional personal dictionary file. The checker lists all words not contained in either of these files. The user can query the number of words processed and the number of spelling errors found at any stage during processing.
[Figure: spelling checker data flow. Labels: User, Document file, Personal dictionary, Dictionary, Spelling Checker, words processed enquiry, errors found enquiry, # words processed message, # errors message, report on misspelled words.]

Spell-Checker Specification
A. The two external inputs: document filename, personal dictionary name
B. The three external outputs: misspelled word report, number-of-words-processed message, number-of-errors-so-far message
C. The two external inquiries: words processed, errors so far
D. The two external files: document file, personal dictionary
E. The one internal file: dictionary

A = # external inputs = 2
B = # external outputs = 3
C = # inquiries = 2
D = # external files = 2
E = # internal files = 1

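Assume first that the complexity of each item is average:
$UFC = 4A + 5B + 4C + 10D + 7E = 4(2) + 5(3) + 4(2) + 10(2) + 7(1) = 58$
If instead we learn that the dictionary file and the misspelled word report are considered complex, one of the three outputs takes weight 7 and the internal file takes weight 10:
$UFC = 4A + (5 \times 2 + 7 \times 1) + 4C + 10D + 10E$
$UFC = 4(2) + (5 \times 2 + 7 \times 1) + 4(2) + 10(2) + 10(1)$
$UFC = 8 + 17 + 8 + 20 + 10$
$UFC = 63$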
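The weighted sum for the spell checker can be reproduced with a short sketch. The weights are those of Table 1; the names `WEIGHTS`, `unadjusted_fp`, and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Complexity weights from Table 1 (simple, average, complex).
WEIGHTS = {
    "external_inputs":    {"simple": 3, "average": 4,  "complex": 6},
    "external_outputs":   {"simple": 4, "average": 5,  "complex": 7},
    "external_inquiries": {"simple": 3, "average": 4,  "complex": 6},
    "external_files":     {"simple": 7, "average": 10, "complex": 15},
    "internal_files":     {"simple": 5, "average": 7,  "complex": 10},
}


def unadjusted_fp(counts):
    """Sum count * weight over every (category, complexity) pair.

    `counts` maps category -> {complexity: number of items}.
    """
    return sum(
        n * WEIGHTS[category][complexity]
        for category, by_complexity in counts.items()
        for complexity, n in by_complexity.items()
    )


# Spell checker with the misspelled word report and the dictionary rated complex.
spell_checker = {
    "external_inputs":    {"average": 2},
    "external_outputs":   {"average": 2, "complex": 1},
    "external_inquiries": {"average": 2},
    "external_files":     {"average": 2},
    "internal_files":     {"complex": 1},
}
ufc = unadjusted_fp(spell_checker)
print(ufc)  # 63
```

Keeping the complexity rating per item, rather than per category, is what lets one output be rated complex while the other two stay average.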

The adjusted function point count (FP) is calculated by multiplying the UFC by a technical complexity factor (TCF). Components of the TCF are listed in Table 2.

Table 2. Components of the technical complexity factor.
F1   Reliable back-up and recovery
F2   Data communications
F3   Distributed functions
F4   Performance
F5   Heavily used configuration
F6   Online data entry
F7   Operational ease
F8   Online update
F9   Complex interface
F10  Complex processing
F11  Reusability
F12  Installation ease
F13  Multiple sites
F14  Facilitate change

Each component is rated from 0 to 5, where 0 means the component has no influence on the system and 5 means the component is essential (Pressman, 1997). The technical complexity factor (TCF) can then be calculated as:
$TCF = 0.65 + 0.01 \sum_{i=1}^{14} F_i$
The factor varies from 0.65 (if each $F_i$ is set to 0) to 1.35 (if each $F_i$ is set to 5) (Fenton, 1997). The final function point calculation is:
$FP = UFC \times TCF$

Factors and Questions
1. Data communication: Are data communications required?
2. Distributed data processing: Are there distributed processing functions?
3. Performance: Is performance critical?
4. Heavily used configuration: Will the system run in an existing, heavily utilized operational environment?
5. Transaction rate: Does the online data entry require the input transaction to be built over multiple screens or operations?
6. Online data entry: Does the system require online data entry?
7. End-user efficiency: Are the inputs, outputs, files or inquiries complex?
8. Online update: Are the master files updated online?
9. Complex processing: Is the internal processing complex?
10. Reusability: Is the code designed to be reusable?
11. Installation ease: Are conversions and installation included in the design?
12. Operational ease: Does the system require reliable backup and recovery?
13. Multiple sites: Is the system designed for multiple installations in different organizations?
14. Facilitate change: Is the application designed to facilitate change and ease of use by the user?

TCF computation for the spell checker
After reading the specification, we assume that F3, F5, F9, F11, F12 and F13 are 0; that F1, F2, F6, F7, F8 and F14 are 3; and that F4 and F10 are 5. Thus we calculate the TCF as
$TCF = 0.65 + 0.01(18 + 10) = 0.93$
Since UFC is 63,
$FP = UFC \times TCF = 63 \times 0.93 \approx 59$
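The adjustment step can be checked with a small sketch. The function names and the validation rules are illustrative assumptions; the formula and the fourteen ratings are those given for the spell checker.

```python
def technical_complexity_factor(ratings):
    """TCF = 0.65 + 0.01 * sum(F_i) for the 14 component ratings F1..F14."""
    if len(ratings) != 14:
        raise ValueError("expected 14 ratings, F1 through F14")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("each rating must lie between 0 and 5")
    return 0.65 + 0.01 * sum(ratings)


def adjusted_fp(ufc, ratings):
    """FP = UFC * TCF."""
    return ufc * technical_complexity_factor(ratings)


# Spell-checker ratings in order F1..F14, as assumed in the example.
spell_checker_ratings = [3, 3, 0, 5, 0, 3, 3, 3, 0, 5, 0, 0, 0, 3]
tcf = technical_complexity_factor(spell_checker_ratings)
fp = adjusted_fp(63, spell_checker_ratings)
print(round(tcf, 2), round(fp))  # 0.93 59
```

Because the TCF is bounded between 0.65 and 1.35, the adjustment can move the unadjusted count by at most 35 percent in either direction.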

Language SLOC per Function Point

Language               SLOC per FP
1GL Default Language   320
2GL Default Language   107
3GL Default Language    80
4GL Default Language    20
Access                  35
Assembler               62
C                      337
C++                    162
COBOL                   77
Excel                   46
Java                    63
JavaScript              58
JSP                     59
Oracle                  30
Perl                    60
RPG II/III              61
Smalltalk               26
SQL                     40
VBScript                36
Visual Basic            47

For the spell checker written in Java: 59 FP x 63 SLOC/FP = 3717 LOC = 3.717 KLOC
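Converting a function point count into an estimated size is a single multiplication, sketched below. The dictionary holds only a few rows copied from the table above, and the name `estimated_kloc` is an illustrative choice.

```python
# SLOC per function point, copied from a few rows of the table above.
SLOC_PER_FP = {"Java": 63, "C++": 162, "COBOL": 77, "Visual Basic": 47, "SQL": 40}


def estimated_kloc(function_points, language):
    """Estimated size in KLOC = FP * (SLOC per FP) / 1000."""
    return function_points * SLOC_PER_FP[language] / 1000


spell_checker_kloc = estimated_kloc(59, "Java")
print(spell_checker_kloc)  # 3.717
```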

Example 2: Stock Control System
Scenario: J.A. Roberts is a company that sells 200 different electrical goods over the phone. To do this they want you to create a computerized stock control system. This system should have the following functionality:
- Allow the operator to enter an existing customer's number or, for new customers, their details (up to 100 customers).
- Check the credit rating of customers and reject those with poor ratings.
- Allow the operator to enter the goods being ordered.
- Check the availability of the goods being ordered.
  Where there are sufficient goods in stock, supply all the goods.
  Where there are not sufficient goods, supply the number available and create a back order to be supplied when goods become available.
- Update the stock levels and customer account details.
- Produce a dispatch note and invoice.
- Update stock levels based on delivery of goods.
- Update customer account details based on payment by a customer.

External Inputs = 7
- Customer Number
- New Customer Details
- Order Details
- Stock Delivery Details
- Customer Payment Details
- Main Menu Selection
External Outputs = 6
- Credit Rating
- Invoice
- Dispatch Note
- Customer Details Information
- Order Details Information
- Stock Details
External Inquiries = 3
- Customer Details Request
- Order Details Request
- Stock Details Request
External Files = 0
Internal Files = 4
- Goods Transaction File
- Customer File
- Goods File
- Customer Transaction File

Function Point Complexity Weighting
External Inputs:     Simple, 3
External Outputs:    Simple, 4
External Inquiries:  Simple, 3
External Files:      None
Internal Files:      Simple, 5

Unadjusted Function Point Count = 3(7) + 4(6) + 3(3) + 7(0) + 5(4)
Unadjusted Function Point Count = 65 (the value carried into the steps that follow; as written, the terms above sum to 21 + 24 + 9 + 0 + 20 = 74)

Technical Complexity Factors

 i   Question                                                                                            F_i
 1   Does the system require reliable backup and recovery?                                                3
 2   Are data communications required?                                                                    0
 3   Are there distributed processing functions?                                                          0
 4   Is performance critical?                                                                             4
 5   Will the system run in an existing, heavily utilized operational environment?                        3
 6   Does the system require online data entry?                                                           4
 7   Does the online data entry require the input transaction to be built over multiple screens or operations?  3
 8   Are the master files updated online?                                                                 5
 9   Are the inputs, outputs, files, or inquiries complex?                                                0
10   Is the internal processing complex?                                                                  2
11   Is the code designed to be reusable?                                                                 4
12   Are conversion and installation included in the design?                                              0
13   Is the system designed for multiple installations in different organizations?                        0
14   Is the application designed to facilitate change and ease of use by the user?                        3
     Total                                                                                               31

$FP = 65 \times (0.65 + 0.01 \times 31) = 65 \times 0.96 = 62.4$
In Java: 62.4 FP x 63 SLOC/FP = 3931.2 LOC = 3.9312 KLOC

Constructive Cost Model (COCOMO)
COCOMO is a relatively straightforward model based on inputs relating to the size of the system and a number of cost drivers that affect productivity. The basic model is
$E = b \times KLOC^{c}$
where
E = effort in person-months
KLOC = lines of code / 1000
b, c = development mode constants

Boehm has defined three development modes:
- Organic mode (b = 2.4, c = 1.05): relatively simple projects in which small teams work to a set of informal requirements (e.g. a thermal transfer program developed for a heat transfer group).
- Semi-detached mode (b = 3.0, c = 1.12): an intermediate project in which mixed teams must work to a mix of rigid and less than rigid requirements (e.g. a transaction processing system with fixed requirements for terminal hardware and software).
- Embedded mode (b = 3.6, c = 1.20): a project that must operate within a tight set of constraints (e.g. flight control software for aircraft).

Effort in person-months
Organic: $E = 2.4 \times KLOC^{1.05}$; Semi-detached: $E = 3.0 \times KLOC^{1.12}$; Embedded: $E = 3.6 \times KLOC^{1.20}$

KLOC    Organic   Semi-detached   Embedded
1          2.4         3.0            3.6
10        26.9        39.6           57.1
50       145.9       239.4          392.9
100      302.1       521.3          904.2
1000    3390.0      6872.0        14333.0
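The basic model is easy to evaluate directly. The sketch below uses the three sets of mode constants given earlier; the names `BASIC_MODES` and `basic_effort` are illustrative. Computed values may differ slightly from published table entries because of rounding.

```python
# (b, c) constants for the basic COCOMO development modes.
BASIC_MODES = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_effort(kloc, mode):
    """Basic COCOMO effort in person-months: E = b * KLOC ** c."""
    b, c = BASIC_MODES[mode]
    return b * kloc ** c


for size in (1, 10, 100, 1000):
    row = [round(basic_effort(size, m), 1) for m in BASIC_MODES]
    print(size, row)
```

Because c is greater than 1 in every mode, effort grows faster than size: a tenfold increase in KLOC costs more than ten times the effort.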

Example: To predict the effort required to implement the software for a major telephone switching system, we are told that the system will require approximately 5000 KLOC (KDSI). The software is embedded, since it is a real-time system that is part of a large, complex hardware system. Calculate the person-months of effort using the COCOMO method.
$E = 3.6 \times 5000^{1.2} \approx 100\,000$ person-months
Previous example 1 (spell checker, organic mode): $E = 2.4 \times 3.717^{1.05} \approx 9.53$ person-months
Previous example 2 (stock control system, organic mode): $E = 2.4 \times 3.9312^{1.05} \approx 10.10$ person-months

COCOMO Duration

Mode            a     b
Organic         2.5   0.38
Semi-detached   2.5   0.35
Embedded        2.5   0.32

$D = a \times E^{b}$, with the duration D in months.

Telephone switching system (embedded):
$D = 2.5 \times 100000^{0.32} \approx 99.5$ months
Avg. staffing: 100000 / 99.5 = about 1005 persons
Avg. productivity: 5000000 / 100000 = 50 LOC/PM

Example 1 continued (organic):
$D = 2.5 \times 9.53^{0.38} \approx 5.89$ months
Avg. staffing: 9.53 / 5.89 = about 1.62 persons
Avg. productivity: 3717 / 9.53 = about 390 LOC/PM

Example 2 continued (organic):
$D = 2.5 \times 10.10^{0.38} \approx 6.02$ months
Avg. staffing: 10.10 / 6.02 = about 1.68 persons
Avg. productivity: 3931 / 10.10 = about 389 LOC/PM
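Duration, average staffing, and productivity all follow from the effort figure, as the sketch below shows. The function names are illustrative, the defaults a = 2.5 and b = 0.38 are the organic-mode constants, and the input of 26.9 person-months for a 10 KLOC project is taken from the organic column of the effort table purely as a sample value.

```python
def duration_months(effort_pm, a=2.5, b=0.38):
    """COCOMO duration D = a * E ** b (defaults are the organic-mode constants)."""
    return a * effort_pm ** b


def average_staffing(effort_pm, months):
    """Average head count over the project: effort divided by duration."""
    return effort_pm / months


def productivity(loc, effort_pm):
    """Delivered lines of code per person-month."""
    return loc / effort_pm


effort = 26.9   # person-months, sample value: 10 KLOC organic project
months = duration_months(effort)
staff = average_staffing(effort, months)
loc_per_pm = productivity(10_000, effort)
print(round(months, 2), round(staff, 2), round(loc_per_pm, 1))
```

Passing a different `b` gives the duration for the other modes; the staffing figure is an average and says nothing about how head count varies over the project.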

Intermediate COCOMO
The Intermediate COCOMO model computes effort as a function of program size and a set of cost drivers (Pressman, 1997). The Intermediate COCOMO equation is:
$E = a \times KLOC^{b} \times EAF$
The factors a and b for the Intermediate COCOMO model are shown in Table 4 (Boehm, 1981).

Mode            a     b
Organic         3.2   1.05
Semi-detached   3.0   1.12
Embedded        2.8   1.20

The effort adjustment factor (EAF) is calculated using 15 cost drivers. The cost drivers are grouped into four categories:
1. Product
2. Computer
3. Personnel
4. Project
Each cost driver is rated on a six-point ordinal scale ranging from Very Low to Extra High. Based on the rating, an effort multiplier is determined using Table 5 (Boehm, 1981). The product of all effort multipliers is the EAF.

Cost Driver  Description                    Very Low  Low   Nominal  High  Very High  Extra High
Product
RELY         Required software reliability  0.75      0.88  1.00     1.15  1.40       -
DATA         Database size                  -         0.94  1.00     1.08  1.16       -
CPLX         Product complexity             0.70      0.85  1.00     1.15  1.30       1.65
Computer
TIME         Execution time constraint      -         -     1.00     1.11  1.30       1.66
STOR         Main storage constraint        -         -     1.00     1.06  1.21       1.56
VIRT         Virtual machine volatility     -         0.87  1.00     1.15  1.30       -
TURN         Computer turnaround time       -         0.87  1.00     1.07  1.15       -

Cost Driver  Description                   Very Low  Low   Nominal  High  Very High  Extra High
Personnel
ACAP         Analyst capability            1.46      1.19  1.00     0.86  0.71       -
AEXP         Applications experience       1.29      1.13  1.00     0.91  0.82       -
PCAP         Programmer capability         1.42      1.17  1.00     0.86  0.70       -
VEXP         Virtual machine experience    1.21      1.10  1.00     0.90  -          -
LEXP         Language experience           1.14      1.07  1.00     0.95  -          -
Project
MODP         Modern programming practices  1.24      1.10  1.00     0.91  0.82       -
TOOL         Software tools                1.24      1.10  1.00     0.91  0.83       -
SCED         Development schedule          1.23      1.08  1.00     1.04  1.10       -
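The EAF is the product of one multiplier per cost driver, which the sketch below computes from the multiplier table. The data layout, the function names, the rule that unlisted drivers default to nominal, and the three sample ratings are illustrative assumptions; the multipliers and the (a, b) constants are those given in the tables.

```python
from math import prod

RATINGS = ["very_low", "low", "nominal", "high", "very_high", "extra_high"]

# Effort multipliers per cost driver; None marks a rating the table leaves blank.
MULTIPLIERS = {
    "RELY": [0.75, 0.88, 1.00, 1.15, 1.40, None],
    "DATA": [None, 0.94, 1.00, 1.08, 1.16, None],
    "CPLX": [0.70, 0.85, 1.00, 1.15, 1.30, 1.65],
    "TIME": [None, None, 1.00, 1.11, 1.30, 1.66],
    "STOR": [None, None, 1.00, 1.06, 1.21, 1.56],
    "VIRT": [None, 0.87, 1.00, 1.15, 1.30, None],
    "TURN": [None, 0.87, 1.00, 1.07, 1.15, None],
    "ACAP": [1.46, 1.19, 1.00, 0.86, 0.71, None],
    "AEXP": [1.29, 1.13, 1.00, 0.91, 0.82, None],
    "PCAP": [1.42, 1.17, 1.00, 0.86, 0.70, None],
    "VEXP": [1.21, 1.10, 1.00, 0.90, None, None],
    "LEXP": [1.14, 1.07, 1.00, 0.95, None, None],
    "MODP": [1.24, 1.10, 1.00, 0.91, 0.82, None],
    "TOOL": [1.24, 1.10, 1.00, 0.91, 0.83, None],
    "SCED": [1.23, 1.08, 1.00, 1.04, 1.10, None],
}

INTERMEDIATE_MODES = {
    "organic": (3.2, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def effort_adjustment_factor(ratings):
    """Multiply the multipliers of the rated drivers; unlisted drivers are nominal (1.00)."""
    factors = []
    for driver, rating in ratings.items():
        multiplier = MULTIPLIERS[driver][RATINGS.index(rating)]
        if multiplier is None:
            raise ValueError(f"{driver} has no multiplier for rating {rating!r}")
        factors.append(multiplier)
    return prod(factors)


def intermediate_effort(kloc, mode, ratings):
    """Intermediate COCOMO effort: E = a * KLOC ** b * EAF."""
    a, b = INTERMEDIATE_MODES[mode]
    return a * kloc ** b * effort_adjustment_factor(ratings)


# Hypothetical project: high reliability, very high complexity, capable analysts.
sample_ratings = {"RELY": "high", "CPLX": "very_high", "ACAP": "high"}
eaf = effort_adjustment_factor(sample_ratings)
effort = intermediate_effort(10, "organic", sample_ratings)
print(round(eaf, 4), round(effort, 1))
```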

Empirical Estimation
$E = A + B \times (e_v)^{C}$
where A, B and C are empirically derived constants, E is effort in person-months, and $e_v$ is the estimation variable (either LOC or FP).

Proposed models
LOC-oriented models:
- $E = 5.2 \times (KLOC)^{0.91}$ (Walston-Felix model)
- $E = 5.5 + 0.73 \times (KLOC)^{1.16}$ (Bailey-Basili model)
- $E = 3.2 \times (KLOC)^{1.05}$ (Boehm simple model)
- $E = 5.288 \times (KLOC)^{1.04}$ (Doty model, for KLOC > 9)
FP-oriented models:
- $E = -91.4 + 0.355 \times FP$ (Albrecht and Gaffney model)
- $E = -37 + 0.96 \times FP$ (Kemerer model)
- $E = -12.88 + 0.405 \times FP$ (small project regression model)
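Evaluating the models side by side shows how widely they disagree for the same input. The sketch below codes each formula as listed; the function names and the sample inputs (10 KLOC and 300 FP) are illustrative assumptions.

```python
def walston_felix(kloc):
    return 5.2 * kloc ** 0.91


def bailey_basili(kloc):
    return 5.5 + 0.73 * kloc ** 1.16


def boehm_simple(kloc):
    return 3.2 * kloc ** 1.05


def doty(kloc):
    if kloc <= 9:
        raise ValueError("the Doty model is stated for KLOC > 9")
    return 5.288 * kloc ** 1.04


def albrecht_gaffney(fp):
    return -91.4 + 0.355 * fp


def kemerer(fp):
    return -37 + 0.96 * fp


def small_project(fp):
    return -12.88 + 0.405 * fp


kloc, fp = 10, 300  # hypothetical project size
loc_estimates = {f.__name__: round(f(kloc), 1)
                 for f in (walston_felix, bailey_basili, boehm_simple, doty)}
fp_estimates = {f.__name__: round(f(fp), 1)
                for f in (albrecht_gaffney, kemerer, small_project)}
print(loc_estimates)
print(fp_estimates)
```

The spread between the results is a reminder that each model was calibrated on a particular set of projects and should be recalibrated for local conditions before use.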

The Norden-Rayleigh Curve
The Norden-Rayleigh curve represents manpower as a function of time. Norden observed that the Rayleigh distribution provides a good approximation of the manpower curve for various development processes (Pillai, 1997).
[Figure: Norden-Rayleigh curve. Horizontal axis: time. Vertical axis: % of total effort.]
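The slides show the curve without a formula. One commonly used form of the Rayleigh staffing curve is m(t) = 2 K a t exp(-a t^2), where K is the total effort and a = 1 / (2 t_d^2) places the staffing peak at time t_d. The sketch below assumes that form; the function names and the sample values (100 person-months, peak at month 12) are illustrative.

```python
import math


def rayleigh_staffing(t, total_effort, peak_time):
    """Staffing level at time t: m(t) = 2 * K * a * t * exp(-a * t**2)."""
    a = 1.0 / (2.0 * peak_time ** 2)
    return 2.0 * total_effort * a * t * math.exp(-a * t * t)


def cumulative_effort(t, total_effort, peak_time):
    """Effort spent up to time t: K * (1 - exp(-a * t**2))."""
    a = 1.0 / (2.0 * peak_time ** 2)
    return total_effort * (1.0 - math.exp(-a * t * t))


K, t_peak = 100.0, 12.0  # hypothetical: 100 person-months, staffing peak at month 12
for month in (3, 6, 12, 18, 24, 36):
    print(month,
          round(rayleigh_staffing(month, K, t_peak), 2),
          round(cumulative_effort(month, K, t_peak), 1))
```

Under this form, roughly 39 percent of the total effort has been spent by the time staffing peaks, and the long tail after the peak accounts for the rest.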

SLIM
Applicable when the program size exceeds 7000 lines of code. The SLIM model is expressed as two equations describing the relation between the development effort and the schedule.

First equation
The software equation states that development effort is proportional to the cube of the size and inversely proportional to the fourth power of the development time. The software equation is expressed as:
$E = \left[ \frac{LOC \times B^{0.333}}{P} \right]^{3} \times \frac{1}{t^{4}}$
where
E = effort in person-months or person-years
t = project duration in months or years
B = special skills factor (for KLOC = 5 to 15, B = 0.16; for KLOC > 70, B = 0.39)
P = productivity parameter (real-time embedded = 2,000; telecom = 10,000; system software = 28,000; scientific software = 12,000)

The technology factor is a composite cost driver involving 14 components. It primarily reflects:
1. Overall process maturity and management practices
2. The extent to which good software engineering practices are used
3. The level of programming languages used
4. The state of the software environment
5. The skills and experience of the software team
6. The complexity of the application
The software equation includes a fourth power and therefore has strong implications for resource allocation on large projects. Relatively small extensions in delivery date can result in substantial reductions in effort (Pressman, 1997).
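The effect of the fourth power can be seen by evaluating the software equation for two schedules. In the sketch below the function name and the baseline duration of 1.0 time units are illustrative assumptions; LOC = 10,000 with B = 0.16 and P = 12,000 are sample values chosen from the ranges listed for the equation.

```python
def slim_effort(loc, productivity, skill_factor, duration):
    """Software equation: E = (LOC * B**0.333 / P)**3 * (1 / t**4)."""
    return (loc * skill_factor ** 0.333 / productivity) ** 3 / duration ** 4


base = slim_effort(10_000, 12_000, 0.16, 1.0)
stretched = slim_effort(10_000, 12_000, 0.16, 1.1)  # schedule extended by 10%
saving = 1 - stretched / base
print(f"effort saved by a 10% longer schedule: {saving:.1%}")
```

The saving depends only on the ratio of the two durations, so the same 10 percent extension yields the same proportional reduction whatever the project size.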

Second equation
The manpower-buildup equation states that the effort is proportional to the cube of the development time. Putnam introduced the manpower-buildup equation:
$t_{min} = 8.14 \, (LOC / P)^{0.43}$ in months, for $t_{min} > 6$ months
$E = 180 \, B \, t^{3}$ in person-months, for $E \ge 20$ person-months (with t in years)
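The two relations can be chained: first the minimum schedule, then the effort at that schedule. The sketch below is hedged in three ways: the function names are illustrative, the inputs (33,200 LOC, P = 12,000, B = 0.28) are hypothetical sample values, and the conversion of t to years in the effort equation is an assumption following Pressman's presentation of the model.

```python
def minimum_duration_months(loc, productivity):
    """t_min = 8.14 * (LOC / P) ** 0.43, stated for t_min > 6 months."""
    return 8.14 * (loc / productivity) ** 0.43


def buildup_effort(skill_factor, duration_years):
    """E = 180 * B * t**3 in person-months, stated for E >= 20 (t assumed in years)."""
    return 180 * skill_factor * duration_years ** 3


t_min = minimum_duration_months(33_200, 12_000)   # hypothetical project
effort = buildup_effort(0.28, t_min / 12)         # convert months to years
print(round(t_min, 1), round(effort, 1))
```

Both results fall inside the stated validity ranges (more than 6 months and at least 20 person-months); for smaller projects the equations are outside their stated range.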

The manpower acceleration is 12.3 for new software with many interfaces and interactions with other systems, 15 for standalone systems, and 27 for reimplementations of existing systems. Using the software and manpower-buildup equations, we can solve for effort (Fenton, 1997):
$E = (S / C)^{9/7} \, D^{4/7}$
where S is the size, C the technology factor, and D the manpower acceleration. This equation is interesting because it shows that effort is proportional to size to the power 9/7, or about 1.286, which is similar to Boehm's exponent, which ranges from 1.05 to 1.20.
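The exponents 9/7 and 4/7 follow from eliminating the development time between the two equations. The derivation below is a sketch that assumes the software equation is written in the form S = C E^{1/3} t^{4/3} and that the manpower acceleration is defined as D = E / t^3; those two forms are assumptions about notation, with S the size, C the technology factor, E the effort, and t the development time.

```latex
\begin{align*}
S &= C\,E^{1/3}\,t^{4/3}, \qquad D = \frac{E}{t^{3}}
  && \text{(assumed forms of the two equations)} \\
t &= \left(\frac{E}{D}\right)^{1/3}
  && \text{(solve the second for } t\text{)} \\
S &= C\,E^{1/3}\left(\frac{E}{D}\right)^{4/9}
   = C\,E^{7/9}\,D^{-4/9}
  && \text{(substitute into the first)} \\
E^{7/9} &= \frac{S}{C}\,D^{4/9} \\
E &= \left(\frac{S}{C}\right)^{9/7} D^{4/7}
  && \text{(raise both sides to the power } 9/7\text{)}
\end{align*}
```

The size exponent 9/7 is therefore a direct consequence of the 1/3 and 4/3 powers in the software equation rather than an independently fitted constant.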