1 Software Measurement Pitfalls & @jstvssr

2 Introductions

3 It takes a village: Tiago Alves, José Pedro Correia, Christiaan Ypma, Miguel Ferreira, Dennis Bijlsma, Tobias Kuipers, Ilja Heitlager, Bart Luijten

4 And what about you?

5 Why talk about software measurement? "You can't control what you can't measure." DeMarco, Tom. Controlling Software Projects: Management, Measurement and Estimation. ISBN

6 Why talk about software measurement? "You can't improve what you can't measure."

7 Software measurement is used for: cost and effort estimation, productivity measures and models, data collection, reliability models, performance evaluation and models, structural and complexity metrics, capability-maturity assessments, management by metrics, evaluation of methods and tools, quality models and measures. Fenton et al., Software Metrics: a rigorous and practical approach. ISBN

8 Which software measures do you use?

9 (Software) Measurement

10 What is measurement? "Formally, we define measurement as a mapping from the empirical world to the formal, relational world. A measure is the number or symbol assigned to an entity by this mapping in order to characterize an attribute." Fenton et al., Software Metrics: a rigorous and practical approach. ISBN

11 Entity Attribute Mapping Measure

12 Entities. Product: Specifications, Architecture diagrams, Designs, Code, Test Data, ... Process: Constructing specification, Detailed design, Testing, ... Resources: Personnel, Teams, Software, Hardware, Offices, ... Fenton et al., Software Metrics: a rigorous and practical approach. ISBN

13 Attributes. External: Usability, Reliability. Internal: Size, Structuredness, Functionality

14 Mapping. Park, Robert E., Software Size Measurement: A Framework for Counting Source Statements, Technical Report CMU/SEI-92-TR-020

15 Representation Condition. Attribute: Size. Candidate measures per entity: Foo: 500 LOC, 1 file, Large; Bar: 100 LOC, 5 files, Small
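
The representation condition says the mapping must preserve the empirical relations between entities. A minimal check in Python, using the Foo/Bar numbers from the slide: line count preserves "Foo is larger than Bar", while file count would invert it, so file count fails as a size measure for this pair.

```python
# Representation condition: the empirical relation "Foo is larger than Bar"
# must be preserved by the numbers a measure assigns (data from the slide).
systems = {
    "Foo": {"loc": 500, "files": 1},  # empirically the large system
    "Bar": {"loc": 100, "files": 5},  # empirically the small system
}

# LOC respects the relation: larger system -> larger number.
assert systems["Foo"]["loc"] > systems["Bar"]["loc"]

# File count inverts it, so "number of files" is not a valid size
# measure here under the representation condition.
assert not (systems["Foo"]["files"] > systems["Bar"]["files"])
```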

16 Measure: scale types. Nominal: =, ≠ (example: A, B, C, D, E). Ordinal: =, ≠, <, > (example: small, large). Interval: =, ≠, <, >, +, - (example: start date). Ratio: all operations (example: LOC). Absolute: all operations (example: -)
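
One way to keep the scale-type table honest in analysis code is to record which summary statistics are meaningful per scale. The sketch below encodes the standard measurement-theory rules (mode for nominal, median from ordinal up, mean from interval up); this is an interpretation consistent with the table above rather than content from the slide.

```python
# Which summary statistics are meaningful per scale type (measurement theory):
# nominal: mode; ordinal: + median; interval: + mean; ratio/absolute: + ratios.
MEANINGFUL = {
    "nominal":  {"mode"},
    "ordinal":  {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio":    {"mode", "median", "mean", "ratio"},
    "absolute": {"mode", "median", "mean", "ratio"},
}

def check(statistic: str, scale: str) -> bool:
    """True if the statistic is meaningful for a measure on the given scale."""
    return statistic in MEANINGFUL[scale]

# Averaging LOC (ratio scale) is fine; averaging risk categories (ordinal) is not.
print(check("mean", "ratio"))    # True
print(check("mean", "ordinal"))  # False
```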

17 Concept summary Attribute (Length) Entity (Child) Measure (cm) Mapping (feet on ground)

18 Why does this matter?

19 It determines what you want Alves. Categories of Source Code in Industrial Systems, ESEM 2011:

20 It determines who wants it System Component Module Unit

21 Aggregation exercise From Unit to System

22 Unit measurement: T. McCabe, IEEE Transactions on Software Engineering, 1976. Academic: number of independent paths per method. Intuitive: number of decisions made in a method. Reality: the number of if statements (and while, for, ...). Example method: McCabe = 4
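
The "reality" definition lends itself to a crude counter: one plus the number of decision keywords. Below is a minimal sketch along those lines; it is a simplification (it ignores boolean operators, ternaries, strings and comments) and not the analyzer used at SIG.

```python
import re

# Crude cyclomatic complexity: 1 + number of decision points in a method body.
# Simplification of McCabe (1976): ignores && / ||, ternaries, comments, strings.
DECISION = re.compile(r"\b(if|while|for|case|catch)\b")

def mccabe(method_body: str) -> int:
    return 1 + len(DECISION.findall(method_body))

body = """
    if (a) {
        for (int i = 0; i < n; i++) {
            if (items[i] > 0) { total += items[i]; }
        }
    }
    return total;
"""
print(mccabe(body))  # 1 + 3 decisions = 4, like the example method on the slide
```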

23 Available data. For four projects, per unit: lines of code, McCabe complexity. In which system is unit testing a given method the most challenging?

24 Option 1: Summing. Ratio of total McCabe to total LOC: Crawljax 0,260; GOAL 0,259; Checkstyle 0,288; Springframework 0,288

25 Option 2: Average. Average McCabe per method: Crawljax 1,87; GOAL 2,45; Checkstyle 2,46; Springframework 1, [Histogram: number of methods per McCabe value, 1 to 32]

26 Option 3: Quality profile. Risk categories for cyclomatic complexity: 1-5 low, 6-10 moderate, 11-25 high, > 25 very high; sum lines of code per category. Example, lines of code per risk category: 70 % low, 12 % moderate, 13 % high, 5 % very high. [Stacked bar chart: LOC distribution over risk categories, 0% to 100%, for Springframework, Checkstyle, GOAL and Crawljax]
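
A sketch of what option 3 computes: bin every method by its complexity, sum lines of code per risk category, and report each bin as a percentage of total LOC. The category boundaries follow the slide (with 25 assumed as the upper bound of the high category); the input data is invented for illustration.

```python
# Quality profile: % of total LOC that sits in methods of each risk category.
# Thresholds from the slide: 1-5 low, 6-10 moderate, 11-25 high, >25 very high.
THRESHOLDS = [(5, "low"), (10, "moderate"), (25, "high")]

def risk_category(mccabe: int) -> str:
    for limit, name in THRESHOLDS:
        if mccabe <= limit:
            return name
    return "very high"

def quality_profile(methods):
    """methods: iterable of (loc, mccabe) per method; returns % of LOC per category."""
    totals = {"low": 0, "moderate": 0, "high": 0, "very high": 0}
    for loc, mccabe in methods:
        totals[risk_category(mccabe)] += loc
    all_loc = sum(totals.values()) or 1
    return {cat: round(100.0 * loc / all_loc, 1) for cat, loc in totals.items()}

# Illustrative input, not the benchmark data from the talk.
methods = [(10, 1), (25, 3), (40, 7), (60, 12), (30, 30)]
print(quality_profile(methods))
# {'low': 21.2, 'moderate': 24.2, 'high': 36.4, 'very high': 18.2}
```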

27 Always take into account: Volume, Explanation, Distribution

28 Pitfalls in using measurements What are they and how to counter them?

29 One-track metric

30 When only looking at size

31 Combination shows problems on different levels Equals HashCode

32 Metrics Galore

33 So what should we measure?

34 GQM: Goal, Question, Metric (one goal, several questions, several metrics per question). Basili et al., The Goal Question Metric Approach, chapter in Encyclopedia of Software Engineering, Wiley, 1994.
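
GQM is easiest to keep straight as an explicit tree from one goal down to questions and metrics. The decomposition below is a generic illustration with assumed content, not the example used in the talk.

```python
# Generic GQM tree (illustrative goal, questions and metrics; not from the talk).
gqm = {
    "goal": "Improve the maintainability of system X from the developers' viewpoint",
    "questions": [
        {"question": "How analysable is the code base?",
         "metrics": ["volume", "duplication", "unit size"]},
        {"question": "How testable are the units?",
         "metrics": ["unit complexity", "unit size"]},
    ],
}

for q in gqm["questions"]:
    print(q["question"], "->", ", ".join(q["metrics"]))
```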

35 GQM - Example

36 Treating the metric

37 Metric in a bubble [chart with labels SB, CSU, CB and quadrants I, II, III, IV]

38 Software Quality as a goal

39 Software Quality waves: 1970s higher-order languages, 1980s design methods and tools (CASE), 1990s process, 2000s product, 2010s? Focus on the product to improve quality. "The CMM [..] is not a quality standard." "In CMMI: good process doesn't always lead to good quality." Bill Curtis, co-author of the Capability Maturity Model (CMM), the People CMM, and the Business Process Maturity Model; interview for TechTarget, June

40 The Bermuda Triangle of Software Quality. Process (organizational): CMMI (SCAMPI). People (individual): J2EE (IBM), MCP (Microsoft). Project (individual): Prince2, RUP (IBM). Product: ISO

41 ISO 25010

42 Software product quality standards. Previous: separate standards. Currently: the ISO/IEC 25000 series, a coherent family of standards

43 ISO/IEC 25010 quality perspectives. Product quality: the software product itself, in the development phase (build and test). Quality in use: the effect of the software product, once deployed.

44 ISO/IEC 25010 software product quality characteristics: Functional Suitability, Performance Efficiency, Compatibility, Usability, Reliability, Security, Maintainability, Portability

45 The sub-characteristics of maintainability in ISO 25010: Modularity, Reusability, Analysability, Modifiability, Testability

46 A Practical Model for Measuring Maintainability

47 Some requirements: simple to understand, allow root-cause analyses, technology independent, easy to compute. Heitlager et al., A Practical Model for Measuring Maintainability, QUATIC 2007

48 Suggestions for metrics?

49 Measuring ISO 25010 maintainability using the SIG model. System properties: Volume, Duplication, Unit size, Unit complexity, Module coupling, Unit interfacing, Component independence, Component balance. Each sub-characteristic is measured by a subset of these properties: Analysability (4 properties), Modifiability (3), Testability (3), Modularity (3), Reusability (2)

50 From measurement to rating: a benchmark-based approach

51 Embedding: a benchmark-based approach (note: example thresholds). Sort the benchmark systems and assign ratings by distribution: 5% HHHHH, 30% HHHHI, 30% HHHII, 30% HHIII, 5% HIIII. Example system: HHIII

52 From measurements to ratings: the same matrix of system properties (Volume, Duplication, Unit size, Unit complexity, Module coupling, Unit interfacing, Component independence, Component balance) against sub-characteristics (Analysability, Modifiability, Testability, Modularity, Reusability)

53 Heitlager et al., A Practical Model for Measuring Maintainability, QUATIC 2007. Remember the quality profiles? 1. Compute source code metrics per method / file / module 2. Summarize the distribution of measurement values in quality profiles. Risk categories for cyclomatic complexity: 1-5 low, 6-10 moderate, 11-25 high, > 25 very high; sum lines of code per category. Example, lines of code per risk category: 70 % low, 12 % moderate, 13 % high, 5 % very high

54 First-level calibration

55 Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010. The formal six-step process

56 Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010. Visualizing the calculated metrics

57 Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010. Choosing a weight metric

58 Calculate for a benchmark of systems: 70%, 80%, 90%. Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010

59 SIG Maintainability Model: derivation of metric thresholds. 1. Measure systems in benchmark 2. Summarize all measurements 3. Derive thresholds that bring out the metric's variability 4. Round the thresholds. Risk categories for cyclomatic complexity: 1-5 low, 6-10 moderate, 11-25 high, > 25 very high. [Figures from the paper: (a) metric distribution, (b) box plot per risk category; Fig. 13: Module Interface Size (number of methods per file)] Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010
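
The derivation step can be sketched as a LOC-weighted quantile computation over the pooled benchmark: sort methods by metric value, accumulate their lines of code, and read off the metric value where the chosen share of the code is reached (70%, 80%, 90% as above). This is a simplification of the methodology of Alves et al. (ICSM 2010), with invented input data.

```python
def weighted_quantile_thresholds(methods, quantiles=(0.70, 0.80, 0.90)):
    """methods: (loc, metric_value) pairs pooled over the whole benchmark.
    Returns the metric value reached at each LOC-weighted quantile."""
    methods = sorted(methods, key=lambda m: m[1])  # sort by metric value
    total_loc = sum(loc for loc, _ in methods)
    thresholds, cumulative, i = [], 0, 0
    for q in quantiles:                            # quantiles must be increasing
        target = q * total_loc
        while i < len(methods) and cumulative + methods[i][0] < target:
            cumulative += methods[i][0]
            i += 1
        thresholds.append(methods[min(i, len(methods) - 1)][1])
    return thresholds

# Illustrative benchmark of (LOC, cyclomatic complexity) pairs.
benchmark = [(30, 1), (25, 2), (20, 3), (10, 6), (8, 9), (5, 14), (2, 31)]
print(weighted_quantile_thresholds(benchmark))  # [3, 6, 9] before rounding
```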

60 Second-level calibration

61 How to rank quality profiles? Unit complexity profiles for 20 random systems. HHHHH HHHHI HHHII ? HHIII HIIII. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

62 Ordering by highest category is not enough! HHHHH HHHHI HHHII ? HHIII HIIII. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

63 A better ranking algorithm: order categories, define thresholds of given systems, find smallest possible thresholds. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

64 Which results in a more natural ordering: HHHHH HHHHI HHHII HHIII HIIII. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

65 Second-level thresholds: unit size example. Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

66 SIG Maintainability Model: mapping quality profiles to ratings. 1. Calculate quality profiles for the systems in the benchmark 2. Sort quality profiles 3. Select thresholds based on a 5% / 30% / 30% / 30% / 5% distribution (HHHHH HHHHI HHHII HHIII HIIII). Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011
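
With second-level thresholds in place, rating a system is a lookup: compare its quality profile against the maximum percentage of code each star level allows in the moderate, high and very-high categories. The threshold numbers below are hypothetical placeholders; the real ones come out of the benchmark calibration described above.

```python
# Hypothetical second-level thresholds per rating: maximum % of LOC allowed in
# the (moderate, high, very high) categories. Real values come from calibration.
RATING_THRESHOLDS = [
    (5, (25.0,  0.0, 0.0)),   # HHHHH
    (4, (30.0,  5.0, 0.0)),   # HHHHI
    (3, (40.0, 10.0, 0.0)),   # HHHII
    (2, (50.0, 15.0, 5.0)),   # HHIII
]

def rate(profile):
    """profile: dict with % of LOC per risk category; returns 1..5 stars."""
    observed = (profile["moderate"], profile["high"], profile["very high"])
    for stars, limits in RATING_THRESHOLDS:
        if all(o <= limit for o, limit in zip(observed, limits)):
            return stars
    return 1  # HIIII

print(rate({"low": 70.0, "moderate": 12.0, "high": 13.0, "very high": 5.0}))  # 2
```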

67 SIG measurement model: putting it all together. Measurements, then quality profiles, then property ratings, then quality (sub-characteristic) ratings, then one overall rating.
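
The last aggregation steps can be sketched as plain averaging over the mapping matrix shown earlier: property ratings are averaged into sub-characteristic ratings, and those into an overall rating. The concrete property-to-characteristic mapping below is an assumption for illustration (the exact X positions are not recoverable from this transcript), and unweighted means are assumed.

```python
# Assumed mapping from ISO 25010 maintainability sub-characteristics to SIG
# system properties (illustrative; the actual matrix is on the earlier slide).
MAPPING = {
    "analysability": ["volume", "duplication", "unit size", "component balance"],
    "modifiability": ["duplication", "unit complexity", "module coupling"],
    "testability":   ["unit complexity", "unit size", "module coupling"],
    "modularity":    ["module coupling", "component independence", "component balance"],
    "reusability":   ["unit size", "unit interfacing"],
}

def mean(values):
    return sum(values) / len(values)

def aggregate(property_ratings):
    """property_ratings: property name -> rating on a 1..5 (star) scale."""
    sub = {c: mean([property_ratings[p] for p in props]) for c, props in MAPPING.items()}
    return sub, mean(list(sub.values()))

ratings = {"volume": 4, "duplication": 3, "unit size": 3, "unit complexity": 2,
           "module coupling": 3, "unit interfacing": 4, "component independence": 3,
           "component balance": 4}
sub_ratings, overall = aggregate(ratings)
print(sub_ratings, round(overall, 1))  # overall is approximately 3.1 here
```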

68 Does it measure maintainability?

69 SIG Maintainability Model: empirical validation. Internal: maintainability; external: issue handling. 16 projects (2.5 MLOC), 150 versions, K issues. The Influence of Software Maintainability on Issue Handling, MSc thesis, Technical University Delft. Indicators of Issue Handling Efficiency and their Relation to Software Maintainability, MSc thesis, University of Amsterdam. Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010

70 Empirical validation The life-cycle of an issue

71 Empirical validation: defect resolution time. Luijten et al., Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010

72 Empirical validation: quantification. Luijten et al., Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010

73 Empirical validation Quantification Bijlsma. Indicators of Issue Handling Efficiency and their Relation to Software Maintainability, MSc thesis, University of Amsterdam

74 Does it measure maintainability?

75 Is it useful?

76 Software Risk Assessment Kick-off session Strategy session Technical session Technical validation session Risk validation session Final presentation Final Report Analysis in SIG Laboratory

77 Example Which system to use? HHHII HHHHI HHIII

78 Example: Should we accept delay and cost overrun, or cancel the project? [Diagram: two layered architectures (User Interface, Business Layer, Data Layer), one on a vendor framework, one custom]

79 Software Monitoring: every week the source code goes through automated analysis, the website is updated, and the team interprets, discusses, manages and acts.

80 Software Product Certification

81 Summary

82 Measurement challenges. Concepts: Entity, Attribute, Mapping, Measure. Meaning (too little / too much): metric in a bubble, treating the metric. Number of metrics (too little / too much): one-track metric, metrics galore.

83 A benchmark-based model: the matrix of system properties (Volume, Duplication, Unit size, Unit complexity, Module coupling, Unit interfacing, Component independence, Component balance) against sub-characteristics (Analysability, Modifiability, Testability, Modularity, Reusability), thresholds calibrated against a benchmark, and quality profiles mapped to ratings (HHHHH to HIIII). [Screenshots: Fig. 13 and conclusion of Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010; Software Risk Assessment process]

84 The most important things to remember: Goal, Entity, Attribute, Mapping, Context. Validate in theory and practice.