
1 Spelunking Tools - Estimation Techniques for Software Projects
Dan Houston, Ph.D.
Phoenix IEEE-CS, May 3, 2006

2 Software Costing and Sizing Accuracy vs. Phase
[Figure: cone-of-uncertainty chart of estimate accuracy by project phase, taken from the COCOMO II Model Definition Manual.]

3 Estimation Accuracy = f(knowledge)
[Figure: knowledge scale running from None (guessing) to Substantial (estimating).]
Kinds of knowledge:
- Application domain
- Product type
- Software engineering
- Project information
Estimation is not prediction:
- Document assumptions
- Influence the outcome
- Technical feasibility

4 Kinds of estimates (in order of increasing information required)
- Rough Order of Magnitude: initial job or SPI project selection
- High level: budgeting, trade-off/risk analysis, portfolio investment analysis
- Detailed: fixed-cost proposal, project planning & control

5 Estimation Principles
- Use all available project information: charter, specification, design, etc.
- Use documented facts about similar past projects.
- Use your organization's knowledge and data (as opposed to no data or industry data). It eliminates much of the variation due to:
  - application domain
  - development platform and tools
  - product type
  - development processes
  - cultural factors
- Use industry figures (e.g., uncalibrated estimation tools, reuse guidelines such as RCR = .2, RCWR = 1.6) to disprove optimistic estimates.
- Don't rely on guessing and intuition.
- Estimate ranges and probabilities, not points.
- Estimate in phases.

6 Plan the project's estimation process
- Tie estimates to your development phases.
- When will estimates be updated?
- Estimation resources
- Estimation scope (development, testing, documentation, etc.)
- How will estimates be produced?
- Bases of revised or updated estimates?
- Desired confidence levels

7 Regression models
- Generally accepted as the best method
- Uses previous project data
- Relates effort to size: effort = a * size^b
- Incorporated into automated software estimation tools: SLIM, COCOMO, PRICE-S, SEER-SEM, etc.

8 Regression steps
effort = a * size^b
log(effort) = log(a * size^b) = log(a) + b * log(size)
y = A + b * x, where y = log(effort), A = log(a), x = log(size)
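To make the log-log fit concrete, here is a minimal Python sketch. The project data below is hypothetical, invented only to show the mechanics:

```python
# Fit effort = a * size^b by linear regression in log space.
import numpy as np

size_kloc = np.array([3.2, 7.5, 12.0, 20.1, 35.4, 48.0])   # hypothetical
effort_hrs = np.array([310, 720, 1150, 1900, 3400, 4600])  # hypothetical

x = np.log10(size_kloc)
y = np.log10(effort_hrs)

b, A = np.polyfit(x, y, 1)   # fits y = A + b*x; polyfit returns [slope, intercept]
a = 10 ** A                  # back-transform the intercept

print(f"effort ~= {a:.1f} * size^{b:.2f}")
```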

9 Example
[Figure: fitted line plot of logten(effort (hrs)) vs. logten(size (kloc)); R-Sq = 97.6%, R-Sq(adj) = 97.3%. Resulting model: effort = 42 * size^b * ε.]

10 Regression models
Regression assumptions:
- Normality of residuals
- Independence
- Identically distributed residuals
- Similar objects
- Valid for the regression range
Inputs:
- Size: LOC or FP

11 Check normality
[Figure: normal probability plot of the residuals (response is logten(effort (hrs))).]

12 Check residual distribution
[Figure: residuals versus fitted values (response is logten(effort (hrs))).]
Evenly distributed residuals mean no bias in the model.
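A hedged sketch of these two residual checks, continuing the hypothetical fit above (`x`, `y`, `A`, and `b` are the names from the earlier regression sketch; scipy and matplotlib are assumed available):

```python
import scipy.stats as stats
import matplotlib.pyplot as plt

fitted = A + b * x          # fitted values from the log-log regression
residuals = y - fitted

# Normality check (slide 11): probability plot plus a Shapiro-Wilk test
stats.probplot(residuals, dist="norm", plot=plt)
w, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p:.3f}")  # large p => no evidence against normality

# Bias check (slide 12): residuals should scatter evenly around zero
plt.figure()
plt.scatter(fitted, residuals)
plt.axhline(0)
plt.xlabel("Fitted value")
plt.ylabel("Residual")
plt.show()
```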

13 Regression analysis problems
- Enough data: 5-8 projects for a logarithmic fit
- Similarity of objects: a failure point for automated tools
- How to get size inputs?

14 Techniques for starting
- Function points
- Analogy
- Pairwise comparison
- Three point (PERT) with decomposition
- Wideband Delphi (pooled expertise)

15 Function points
- Count transaction data and files as a measure of functional size.
- Requires a complete functional specification.
- Independent of language, tools, and technology.

16 Function point analysis
- Define the software boundary.
- Count transactions: external inputs, external outputs, external inquiries.
- Count files: external, internal.
- Adjust counts for system characteristics: Adjusted FP = Raw FP * (0.65 + 0.01 * Sum(Ratings))
For more information, see the Function Points Training Manual.
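A small sketch of the adjustment step, assuming the standard IFPUG value adjustment factor (0.65 plus 0.01 per rating point across the 14 general system characteristics); the raw count and ratings below are made up:

```python
# Adjusted FP = Raw FP * (0.65 + 0.01 * sum of 14 GSC ratings, each 0-5)
def adjusted_fp(raw_fp: float, gsc_ratings: list) -> float:
    assert len(gsc_ratings) == 14 and all(0 <= r <= 5 for r in gsc_ratings)
    vaf = 0.65 + 0.01 * sum(gsc_ratings)   # value adjustment factor, 0.65..1.35
    return raw_fp * vaf

print(adjusted_fp(120, [3] * 14))  # 120 * (0.65 + 0.42) = 128.4
```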

17 Translating between languages
Language       Factor (SLOC/FP)
Access          38
ASP             62
Assembler      315
C              109
C++             53
FORTRAN        210
HTML            53
SQL             37
Visual Basic    42
Source: Quantitative Software Management, 2002
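For illustration, a gearing-factor lookup that backs a function point count into an approximate SLOC figure; the 250 FP input is hypothetical:

```python
# QSM 2002 gearing factors (SLOC per function point) from the table above.
GEARING = {
    "Access": 38, "ASP": 62, "Assembler": 315, "C": 109, "C++": 53,
    "FORTRAN": 210, "HTML": 53, "SQL": 37, "Visual Basic": 42,
}

def fp_to_sloc(fp: float, language: str) -> float:
    return fp * GEARING[language]

print(fp_to_sloc(250, "C++"))  # 250 FP * 53 SLOC/FP = 13250 SLOC
```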

18 Sources of Variation: LOC vs. FP
Source of variation | LOC | Function Points
Counting | Can be reduced w/ automation. | Median 12% (Kemerer 1993)
Representation | High; can be reduced w/ strict coding rules. | Low due to partially empirical weights; language independent.

Other considerations | LOC | Function Points
Ease of measurement | Automated | Takes practice
Verifiability | Completed code | Specification

19 Estimation by analogy
- Estimates size or effort
- Works very well when:
  - one or more similar jobs are available
  - attributes have similar values
- Enables assessment of the degree of similarity

20 Procedure
- Choose variables for characterizing the work.
- Using the selected variables, gather data on completed jobs and the new job. Include actual outcomes for the completed jobs.
- Select good analogies. Any other extraordinary factors?
- Weight the analogies based on similarity.
- Take the weighted average of the outcomes (see the sketch below).
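The last two steps reduce to a similarity-weighted average, as in this sketch; it reuses the 75%/25% weights and the 283 md and 966 md actuals from the worked example on the next two slides:

```python
# Weight each chosen analogy by judged similarity, then take the weighted
# average of their actual outcomes.
def analogy_estimate(analogies):
    """analogies: (similarity_weight, actual_effort) pairs for selected projects."""
    total_w = sum(w for w, _ in analogies)
    return sum(w * effort for w, effort in analogies) / total_w

print(analogy_estimate([(0.75, 283), (0.25, 966)]))  # ~454 md, as on slide 22
```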

21 Analogy example

No. | Application Domain | Product Type | Major Functions | Critical Qualities | Phases
1 | Environmental control & security system | Passenger aircraft cabin | Air temperature, humidity, pressure; lighting; audio-video; depressurization alarm | Responsiveness; false alarm rate | Spec-UT
2 | Video security system | Video motion detection system | Visual monitoring, recording, and alarming | Low error rate; operational economy | Spec-ST
3 | Environmental control & security system | Office building control system | Air temperature & humidity, lighting, visual monitoring, fire detection | Application generality; ease of configuration | Req-ST
4 | Environmental control system | Greenhouse control system | Air temperature & humidity, lighting | Accuracy, responsiveness | Spec-UT
5 | Environmental control system | Self-storage facility control system | Air temperature & humidity, lighting, break-in detection | Robust to user errors; reliability | Req-ST
New | Environmental control & security system | Passenger cruise ship | Air temperature, humidity, fire detection, break-in detection | Robust to user errors; reliability | Req-ST

No. | User Doc? | # Env. Factors | # Main Control Stations | # User Controls | Remote control? | # Zones | # Libraries Used | API? | Actual Effort (md)
1 | Yes | - | - | - | No | 1 | 2 | No | -
2 | No | - | - | - | Yes | 4 | 1 | No | -
3 | Yes | - | - | - | Yes | 12 | 2 | Yes | 966
4 | No | - | - | - | No | 2 | 1 | No | -
5 | Yes | - | - | 0 | No | 3 | 1 | No | 283
New | Yes | - | - | - | No | - | - | Yes | ?

22 Example Answer
Which of the variables are useful? All except # User Controls and # Libraries.
Which projects provide suitable analogies? No. 5 is very similar in major functions, critical qualities, phases, user doc, # environmental factors, # main control stations, and need for remote control. However, No. 5 doesn't have an API, has no user controls, and has only 3 zones. No. 3 has 12 zones and an API, which means customization is through an API rather than by setting parameter values. Overall, No. 5 is a better analogy than No. 3, but No. 3 accounts for ~25% of the functionality lacking in No. 5.
How much confidence can be placed in the analogies? This application is a variation on existing products, so the analogies offer good guidance.
Predict the effort for the new product. Based on the description above, use a weighted average, giving 75% weight to No. 5 and 25% to No. 3:
0.75 * 283 md + 0.25 * 966 md = 454 md

23 Pairwise comparison
- For a set of items in which one value is unknown, compare the unknown item to each of the known items.
- Improves subjective judgments
- Assumes that accuracy increases with consistency
- Effort is proportional to (n^2 - n) / 2.

24 Procedure with 1 Unknown
1. Enter the name of the item being estimated in the first row.
2. Below the name of the unknown, enter the names of all items having known values. All names will be shown across the top of the matrix.
3. Enter a value of 1 for each item compared to itself (on the diagonal).
4. Enter the known values on the Known Values line in the respective columns. The ratios of these values for each pair of known items are calculated.
5. Compare the unknown item with each known item and estimate the ratio of values: enter the ratio Item(1)/Item(j) in cell 1j. In cell j1, enter the inverse of the ratio. Continue until the unknown item has been compared to each known item.
6. Check the consistency ratio and revise the ratings if necessary.
7. The Relative Values and the Known Values are used to calculate the estimates, one for each known value.
8. Use the average and standard deviation to calculate a desired prediction interval (see the sketch below).
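A compact sketch of steps 5-8, skipping the consistency check of step 6; the ratio judgments and known module sizes below are hypothetical:

```python
# Each ratio r_j is the judged value of the unknown item relative to known
# item j, so each comparison yields one estimate r_j * known_j.
import statistics

def pairwise_estimate(ratios, known_values, z=1.0):
    """z = 1 for a ~70% prediction interval, z = 2 for ~95%."""
    estimates = [r * v for r, v in zip(ratios, known_values)]
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    return mean, (mean - z * sd, mean + z * sd)

mean, interval = pairwise_estimate([1.5, 0.8, 2.0, 1.2], [10, 18, 7, 12])
print(mean, interval)  # point estimate (KLOC) and prediction interval
```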

25 Example
See Estimation Examples for IEEE-CS.xls.

26 Three point (PERT)
Starting with a Work Breakdown Structure (WBS), produce 3 estimates for each item:
- Optimistic (5% likelihood)
- Most likely (50% likelihood)
- Pessimistic (5% likelihood)
For each task (a = optimistic, b = most likely, c = pessimistic):
Mean = (a + 4b + c) / 6
Variance = (c - a)^2 / 36
Combine the task estimates:
- Sum the means and variances.
- Take the square root of the total variance to get the standard deviation of the total.
70% prediction interval: Total Mean ± Standard Deviation
95% prediction interval: Total Mean ± 2 * Standard Deviation
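A minimal sketch of the roll-up across a WBS; the four (optimistic, most likely, pessimistic) triples are hypothetical:

```python
# Per-task PERT mean and variance, summed over the WBS, then prediction
# intervals from the total standard deviation. Values in hours.
import math

tasks = [(10, 16, 30), (20, 28, 60), (5, 8, 14), (40, 55, 100)]

total_mean = sum((a + 4 * b + c) / 6 for a, b, c in tasks)
total_var = sum((c - a) ** 2 / 36 for a, b, c in tasks)
sd = math.sqrt(total_var)

print(f"70% interval: {total_mean - sd:.0f} .. {total_mean + sd:.0f} hrs")
print(f"95% interval: {total_mean - 2*sd:.0f} .. {total_mean + 2*sd:.0f} hrs")
```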

27 Example
See Estimation Examples for IEEE-CS.xls.
Caveat for long task lists. Alternative distributions:
Triangular distribution:
Mean = (a + b + c) / 3
Variance = (a^2 + b^2 + c^2 - ab - ac - bc) / 18
Pearson distribution (Lau & Lau, 1998):
Mean = .63 x.5 + .185 (x.05 + x.95)
Standard deviation = .28 (x.5 - x.05) + .34 (x.95 - x.5)
Lau, A. H. L., & Lau, H. S. (1998). An improved PERT-type formula for standard deviation. IIE Transactions, 30.

28 Wideband Delphi
A structured form of expert judgment. Participants:
- Understand estimation
- Know the application domain
Designed to remove bias:
- Collaborative
- Moderated
- Anonymous
- Iterative
Karl E. Wiegers provides a very good explanation and an example of the Wideband Delphi technique in his article "Stop Promising Miracles."

29 Estimation strategy
- Plan for phased estimation.
- Use techniques in combination:
  - as a situation allows
  - in parallel, for confidence

30 Estimation capability
Relative error: RE = ((Ê - A) / A) * 100%, where Ê = estimated value and A = actual value.
Average relative error: ARE = (1/n) Σ RE
Prediction at level R: PRED(R) = 100% * k / n, where n = number of projects and k = number of projects with RE ≤ R.
RE heuristics:
- <10% excellent
- <25% good
- >50% can do better
- >100% problems
PRED heuristics:
- PRED(25%) = 100% excellent
- PRED(25%) > 75% good
- PRED(25%) > 50% can do better
- PRED(25%) = 0% problems
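A small sketch of these capability metrics; the estimate/actual pairs are invented for illustration, and RE magnitudes are used when averaging and counting (an assumption consistent with the heuristics above):

```python
# Relative error per project, average RE, and PRED(R).
def relative_error(estimate, actual):
    return (estimate - actual) / actual * 100.0

def pred(pairs, r=25.0):
    res = [abs(relative_error(e, a)) for e, a in pairs]
    k = sum(1 for re in res if re <= r)   # projects within R of the actual
    return 100.0 * k / len(pairs)

pairs = [(900, 1000), (1300, 1100), (480, 500), (700, 1050)]  # (estimate, actual)
ares = [abs(relative_error(e, a)) for e, a in pairs]
print(f"ARE = {sum(ares)/len(ares):.1f}%  PRED(25%) = {pred(pairs):.0f}%")
```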

31 Improving estimation capability
- Track actuals against estimates.
- Use actuals to update models.