B.H. Far

1 SENG 521 Software Reliability & Software Quality
Chapter 7: Defining Necessary Reliability
Department of Electrical & Computer Engineering, University of Calgary
B.H. Far

2 Contents
- Steps in defining necessary reliability
- Failure severity class (FSC)
- Failure intensity objective (FIO)
- Strategies to meet FIO
- System reliability

3 SRE: Process /1
The SRE process has 5 steps:
1. Define necessary reliability
2. Develop operational profiles
3. Prepare for test
4. Execute test
5. Apply failure data to guide decisions
far@ucalgary.ca

4 How to Define Necessary Reliability?

5 Reliability and Risk
- Necessary reliability depends on the risk: higher-risk software requires higher reliability.
- Necessary reliability also depends on profitability, budget, manpower, etc.
Q. "What are you going to test?"
A. "The most important things."
Q. "And how do you know what the most important things are?"
Reference: Software Testing Fundamentals: Methods and Metrics, Marnie L. Hutcheson, ISBN: X, John Wiley & Sons, 2003 (408 pages), Chapter

6 What is Risk?
- Risk is a combination of the occurrence of an abnormal event or failure and the consequences of that event or failure to a system's operators, users, or environment.
- A risk can range from catastrophic to negligible.
- Risks are also categorized according to the likelihood of occurrence. [David Gluch]

Severity       Likelihood of occurrence
               Probable      Occasional    Remote        Improbable
Catastrophic   High          High          High-Medium   Medium
Critical       High          High-Medium   Medium        Medium-Low
Marginal       High-Medium   Medium        Medium-Low    Low
Negligible     Medium        Medium-Low    Low           Low
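
The risk matrix on this slide can be read as a simple lookup. The sketch below is only an illustration of that reading; the names `RISK_MATRIX`, `LIKELIHOODS`, and `risk_level` are hypothetical and do not come from the slides.

```python
# Risk matrix transcribed from the slide: severity (rows) x likelihood (columns).
LIKELIHOODS = ("Probable", "Occasional", "Remote", "Improbable")

RISK_MATRIX = {
    "Catastrophic": ("High", "High", "High-Medium", "Medium"),
    "Critical": ("High", "High-Medium", "Medium", "Medium-Low"),
    "Marginal": ("High-Medium", "Medium", "Medium-Low", "Low"),
    "Negligible": ("Medium", "Medium-Low", "Low", "Low"),
}


def risk_level(severity: str, likelihood: str) -> str:
    """Return the risk rating for a severity/likelihood pair."""
    return RISK_MATRIX[severity][LIKELIHOODS.index(likelihood)]


print(risk_level("Critical", "Remote"))  # Medium
```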

7 Necessary Reliability: How to
1) Define failure with failure severity classes (FSC) for the product.
2) Set a failure intensity objective (FIO) for each system to be tested.
3) Choose a common scale for all associated systems.
4) Find the developed software failure intensity objective.
5) Engineer strategies to meet the software failure intensity objective.

8 1. Failure Severity Classes
- Failures usually differ by their impact on the system.
- A failure severity class (FSC) is a set of failures that have the same per-failure impact on users, according to a failure classification criterion.
- Common classification criteria: cost, system capability, human life, environment. (Note: other rankings are possible.)
- Failure severity is different from failure complexity.
- Severity can change with the time of failure and can be subjective.

9 Where to Find Failures?
- Failure: the termination of the ability of a system to perform a required function (IEC).
- To identify all potential failures of a system, one must identify all functions of the system and the associated functional requirements.
- Various functions: essential functions, auxiliary functions, protective functions, measurement functions, interface functions.
SENG521 (Winter 2008)

10 FSC: Common Classification
Common classification criteria: Cost
- What does this failure cost in terms of operational cost, repair cost, loss of business, disruption, etc.?
- Severity classes based on cost may be scaled by a factor of 10. Usually 4 ranges are enough.

Severity class   Definition ($)
1                > 100,000
2                10,000 - 100,000
3                1,000 - 10,000
4                < 1,000
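
As a rough sketch of how cost-based classes scaled by a factor of 10 might be applied, the function below maps a per-failure cost to one of four classes. The thresholds assume the ranges > 100,000, 10,000 - 100,000, 1,000 - 10,000 and < 1,000 dollars; the function name and the handling of exact boundary values are assumptions made for illustration.

```python
def cost_severity_class(cost_dollars: float) -> int:
    """Map the cost of a single failure to a severity class (1 = most severe).

    Boundary handling (>= at 10,000 and 1,000) is an assumption; the slide
    only gives the ranges.
    """
    if cost_dollars > 100_000:
        return 1
    if cost_dollars >= 10_000:
        return 2
    if cost_dollars >= 1_000:
        return 3
    return 4


for cost in (250_000, 40_000, 2_500, 300):
    print(cost, "->", "class", cost_severity_class(cost))
```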

11 FSC: Common Classification
Common classification criteria: System capability (services)
- May include factors such as loss of data, downtime, recoverability, etc.

Severity class   Definition
1                Basic service interruption
2                Basic service degradation
3                Inconvenience, correction not deferrable
4                Minor tolerable effects, correction deferrable

12 FSC: Common Classification
Common classification criteria: Environment
- May include factors such as harm to the environment, loss of wildlife, etc.
- Applicable to the nuclear industry, chemical industry, etc.

Severity class   Definition
1                Severe and unrecoverable damage to environment and/or wildlife
2                Severe but partially recoverable damage to environment
3                Minor damage to environment or wildlife
4                Minor but recoverable deficiencies

13 FSC: Common Classification
Common classification criteria: Human life
- May include factors such as harm to humans or the environment, loss of human life, etc.
- Applicable to aeronautical, automotive, nuclear, health care, and military systems, etc.

Severity class   Definition
1                Possible loss of human life
2                Severe damage to human immune system or environment
3                Minor damage to human immune system or environment
4                Minor but recoverable deficiencies

14 How to Define FSC?
- Experience based: analyze functions, ask users, stakeholders, and developers, compare to similar products, use FTA and/or FMEA.
- List all factors that may be considered as failure severity for the project.
- Narrow the list down to the most critical and/or measurable ones.
- Some factors may be hard to measure, such as impact on company reputation.

15 FSC: Conflicting Concerns
- Conflicting viewpoints (concerns) between the software developer and the customer regarding failure severity classes (FSC) should be resolved before setting the target failure intensity objective.
- Comparing the FSC for the software with those of a similar product is usually useful.

16 Documenting FSC
For each user profile (type or concern), list the failures under each classification criterion in columns Class 1 to Class 4, as an ordered list starting with the most severe ones.
Classification criteria (rows):
- Cost: f1, f2, f5, f9, ...
- System capability (services)
- Human life
- Environment
- Other (specify)
Define classes for each criterion separately.

17 2. Failure Intensity Objective (FIO)
- The failure intensity objective (FIO) reflects an estimate of the bugs allowed to remain in the product at release time.
- FIO is an alternative way of expressing reliability.

18 Failure Intensity Objective
- Failure intensity is usually given as the number of failures per unit of time (or some other defined unit), e.g., 3 alarms per 100 hours of operation, 5 failures per 1000 print jobs, etc.
- The failure intensity of a system is the sum of the failure intensities of all of its components (assuming a serial system and the exponential model).
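
A minimal sketch of the summation rule for a serial system under the exponential model. It also checks that the product of the component reliabilities equals the reliability computed from the summed intensity. The component values and function names are hypothetical, chosen only for illustration.

```python
import math


def system_failure_intensity(component_intensities):
    """Serial system, exponential model: intensities simply add up."""
    return sum(component_intensities)


def reliability(failure_intensity, hours):
    """R(t) = exp(-lambda * t) for the exponential model."""
    return math.exp(-failure_intensity * hours)


# Hypothetical component failure intensities, in failures per hour.
components = [0.01, 0.02, 0.005]
lam_system = system_failure_intensity(components)

# In a serial system every component must survive, so reliabilities multiply,
# which is the same as adding the exponents (the failure intensities).
t = 10.0
product = math.prod(reliability(lam, t) for lam in components)
print(lam_system, reliability(lam_system, t), product)
```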

19 How to Set FIO /1
- Mainly experience based; depends on the project.
- Depends on the trade-off among quality characteristics (development time and development cost), functionality, and technology.
- Rule of thumb: estimate the project's total cost (C), e.g., using COCOMO's Early Design Model, and set FIO to 1 over C (i.e., $\lambda = 1/C$, where C is the total cost, assuming that the cost of the highest-impact failure is roughly equal to the total development cost).

20 How to Set FIO /2
Typical FIO for various projects

Failure impact                     Typical FIO                 Time between failures (MTTF)
More than $1,000,000,000 cost      1 per 1,000,000,000 hours   114,000 years
More than $1,000,000 cost          1 per 1,000,000 hours       114 years
Around $1,000 cost                 1 per 1,000 hours           6 weeks
Around $100 cost                   1 per 100 hours             100 h
Around $10 cost                    1 per 10 hours              10 h
Around $1 cost                     1 per hour                  1 h
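
The time-between-failures column is just the reciprocal of the FIO expressed in calendar units. The sketch below applies the 1-over-cost rule of thumb and converts the result. It assumes continuous operation (24 hours a day, 365 days a year); the function names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24
HOURS_PER_YEAR = 365 * 24  # assumes continuous, round-the-clock operation


def rule_of_thumb_fio(failure_cost_dollars: float) -> float:
    """Rule of thumb: FIO (failures per hour) = 1 / cost of the failure impact."""
    return 1.0 / failure_cost_dollars


def mttf_hours(fio_per_hour: float) -> float:
    """For a constant failure intensity, MTTF is its reciprocal."""
    return 1.0 / fio_per_hour


for cost in (1e9, 1e6, 1e3, 1e2, 10, 1):
    hours = mttf_hours(rule_of_thumb_fio(cost))
    print(
        f"${cost:,.0f}: MTTF = {hours:,.0f} h "
        f"= {hours / HOURS_PER_WEEK:,.1f} weeks "
        f"= {hours / HOURS_PER_YEAR:,.1f} years"
    )
```

Under these assumptions, 1,000,000 hours comes out at about 114 years and 1,000 hours at about 6 weeks, matching the table.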

21 How to Set FIO: Reliability
Setting FIO in terms of reliability:
$\lambda = -\frac{\ln R}{t}$ or $\lambda \approx \frac{1 - R}{t}$ for $R \ge 0.95$
where $\lambda$ is failure intensity, $R$ is reliability, and $t$ is the natural unit (time, etc.).
For reliability around for 8 hours of operation, $\lambda$ is set to
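
A small sketch of the conversion from a reliability target to a failure intensity objective, using the exact relation $\lambda = -\ln R / t$ and the approximation $\lambda \approx (1 - R)/t$ for high reliability. The numbers below (R = 0.99 over 10 hours) are hypothetical and are not the values from the slide; the function names are also assumptions.

```python
import math


def fio_from_reliability(reliability: float, mission_time: float) -> float:
    """Exact relation for the exponential model: lambda = -ln(R) / t."""
    return -math.log(reliability) / mission_time


def fio_from_reliability_approx(reliability: float, mission_time: float) -> float:
    """Approximation for R >= 0.95: lambda ~ (1 - R) / t."""
    return (1.0 - reliability) / mission_time


# Hypothetical target: reliability 0.99 over a 10-hour mission.
exact = fio_from_reliability(0.99, 10.0)
approx = fio_from_reliability_approx(0.99, 10.0)
print(f"exact:  {exact:.6f} failures/hour")
print(f"approx: {approx:.6f} failures/hour")
```

For reliabilities this close to 1 the two results differ by well under one percent, which is why the simpler form is usually good enough.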

22 Reliability & Failure Intensity
Table: reliability for 1 hour mission time versus failure intensity. Failure intensity units by row: failure/hour, failure/1000 hours, failure/day, failure/1000 hours, failure/week, failure/month, failure/1000 hours, failure/year.
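
The relationship behind this table is $R = e^{-\lambda t}$ with $t$ = 1 hour. The sketch below computes the reliability for a few failure intensities. It assumes exactly one failure per stated period and a 30-day month; those counts are assumptions made for illustration and are not taken from the table.

```python
import math


def reliability_for_mission(failures_per_hour: float, mission_hours: float = 1.0) -> float:
    """Exponential model: R = exp(-lambda * t)."""
    return math.exp(-failures_per_hour * mission_hours)


# Assumed rates: one failure per period (hypothetical, for illustration only).
hours_per_period = {
    "1 failure / hour": 1,
    "1 failure / day": 24,
    "1 failure / week": 7 * 24,
    "1 failure / month": 30 * 24,   # assumes a 30-day month
    "1 failure / 1000 hours": 1000,
    "1 failure / year": 365 * 24,
}

for label, hours in hours_per_period.items():
    r = reliability_for_mission(1.0 / hours)
    print(f"{label:24s} R(1 h) = {r:.5f}")
```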

23 How to Set FIO: Availability
Setting FIO in terms of system availability (A) for the exponential model:
$A = \frac{1}{1 + \lambda t_m}$ or $\lambda = \frac{1 - A}{A \, t_m}$
where $\lambda$ is failure intensity and $t_m$ is downtime per failure.
E.g., if a product must be available 99% of the time and downtime is 6 min per failure, then FIO is about 1 failure per 10 hours.

24 Example
Suppose we want 99 percent availability of a human-machine team. Assume that a service interruption requires an average recovery time of 14 minutes for the person involved, since he/she must refresh his/her memory before restarting. Assume the average machine downtime at each failure is 1 minute. The total downtime is 15 minutes (0.25 hr).
$\lambda = \frac{1 - A}{A \, t_m} = \frac{1 - 0.99}{0.99 \times 0.25} \approx 0.04$ failures per hour, or approximately 4 failures per 100 hr.
(Example from Musa's book)
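
Both availability examples can be checked with a few lines of code based on $\lambda = (1 - A) / (A \, t_m)$. This is only a sketch; the function names are hypothetical and downtime is expressed in hours.

```python
def fio_from_availability(availability: float, downtime_hours: float) -> float:
    """lambda = (1 - A) / (A * t_m), with t_m the downtime per failure in hours."""
    return (1.0 - availability) / (availability * downtime_hours)


def availability_from_fio(fio_per_hour: float, downtime_hours: float) -> float:
    """Inverse relation: A = 1 / (1 + lambda * t_m)."""
    return 1.0 / (1.0 + fio_per_hour * downtime_hours)


# 99% availability, 6 minutes of downtime per failure.
lam_6min = fio_from_availability(0.99, 6 / 60)
# 99% availability, 15 minutes of downtime per failure (human-machine team).
lam_15min = fio_from_availability(0.99, 15 / 60)

print(f"6 min downtime:  {lam_6min:.4f} failures/hour")
print(f"15 min downtime: {lam_15min:.4f} failures/hour")
```

The first result is roughly 1 failure per 10 hours and the second roughly 4 failures per 100 hours, in line with the two examples.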

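25 How to Set FIO: MTTF
Using MTTF:
$\lambda = \frac{1}{MTTF}$, $MTBF = MTTF + MTTR$, $A = \frac{MTTF}{MTTF + MTTR} = \frac{MTTF}{MTBF}$
where $\lambda$ is failure intensity, MTTR is mean time to repair, and MTTF is mean time to failure.
Another definition of availability:
$A = \frac{1}{1 + \lambda \cdot MTTR} = \frac{1}{1 + MTTR/MTTF} = \frac{MTTF}{MTTF + MTTR}$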
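
The two forms of availability are algebraically the same once $\lambda = 1/MTTF$ and the downtime per failure is taken to be MTTR. The sketch below checks this numerically with hypothetical values (MTTF = 100 h, MTTR = 2 h); the function names are assumptions.

```python
import math


def availability_from_mttf(mttf_hours: float, mttr_hours: float) -> float:
    """A = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def availability_from_intensity(fio_per_hour: float, mttr_hours: float) -> float:
    """A = 1 / (1 + lambda * MTTR), treating MTTR as the downtime per failure."""
    return 1.0 / (1.0 + fio_per_hour * mttr_hours)


# Hypothetical values chosen only for illustration.
mttf, mttr = 100.0, 2.0
a1 = availability_from_mttf(mttf, mttr)
a2 = availability_from_intensity(1.0 / mttf, mttr)
print(a1, a2, math.isclose(a1, a2))
```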

26 How to Set FIO: Hazard Rate
- Hazard rate z(t): the probability that a component will fail in a given time interval, given that it has not failed prior to the interval.
- A hazard rate of 0.05 means that there is a 5% chance that the first failure will occur in the specified time interval and not before.
- For the exponential distribution, z(t) is constant and equal to the failure intensity $\lambda$.
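
To illustrate the constant hazard rate of the exponential model, the sketch below computes the conditional probability of failing in an interval, given survival up to its start, and shows that it does not depend on when the interval begins. The rate 0.05 per hour and the function names are hypothetical.

```python
import math


def survival(lam: float, t: float) -> float:
    """R(t) = exp(-lambda * t): probability of no failure up to time t."""
    return math.exp(-lam * t)


def conditional_failure_probability(lam: float, start: float, length: float) -> float:
    """P(fail in [start, start + length] | no failure before start)."""
    return (survival(lam, start) - survival(lam, start + length)) / survival(lam, start)


lam = 0.05  # hypothetical failure intensity, failures per hour

early = conditional_failure_probability(lam, start=0.0, length=1.0)
late = conditional_failure_probability(lam, start=100.0, length=1.0)

# Both equal 1 - exp(-lambda * length), independent of the start time.
print(early, late, 1.0 - math.exp(-lam * 1.0))
```

For a one-hour interval the value is slightly below 0.05, because the hazard rate is the limit of this conditional probability per unit time as the interval shrinks.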

27 How to Set FIO: Profitability
- Analyze experience with previous or similar systems: compare field measurements of major quality characteristics, and the degree of user satisfaction with them, against similar measurements for a previous release or a similar product.
- Compare trade-off trends between profitability and failure intensity.

28 Example
Tip: select a range that leads to the highest profit margin.
(Example from Musa's book)

29 Reliability vs. Availability
- Why specify reliability when availability is better understood and has better intuitive appeal?
- Availability has a subjective appeal to the user, and there are usually workarounds that make the system available without increasing its intrinsic reliability.
- Example: using a replica server in case the main server goes down increases the availability of the system, but it does not necessarily increase the reliability of the server software/hardware.

30 Developed Software Product
The developed software product is usually only a part of the whole system.
Example: stand-alone system
- Interface to other systems
- Acquired components
- Developed components
- OS, system software
- Hardware

31 Developed Software Product
Example: layered system. The application uses the services of everything that goes between your developed software and the hardware:
- Developed software
- Components, ...
- Data access libraries, ...
- Windowing, widgets, ...
- Networking middleware, ...
- Virtual machine
- Device drivers, ...
- Operating system
- Hardware

32 3. Choose a Common Scale
There may be various scales for expressing FIO for various parts of the project.
Example:
- System failure intensity objective = 30 failures/1,000,000 transactions
- MTTF for the OS is 3,000 hours for 10 million transactions
- Hardware failure intensity is 1 failure per 30 hours of operation
One must define a single common scale for all FIOs.

33 FIO for Developed Product
How to compute the failure intensity objective for the developed software?
1. Set the FIO for the whole system.
2. Set a common measurement unit for failure intensity for the whole system.
3. Subtract the expected failure intensity of acquired components from the FIO.
4. Subtract the expected failure intensity of the environment (OS, interface systems) that the developed software will run on.
5. The remainder is the failure intensity objective for the developed software components.

34 Computing Developed FIO
Example 1:
- System failure intensity objective = 100 failures/1,000,000 transactions
- Failure intensity for hardware = 0.1 failure/hour; at a load of 100,000 transactions per hour this is 1 failure/1,000,000 transactions
- OS failure intensity for a load of 100,000 transactions per hour = 0.4 failure/hour = 4 failures/1,000,000 transactions
- Therefore, developed software FIO = 100 - 1 - 4 = 95 failures/1,000,000 transactions
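
Example 1 can be reproduced by first converting every per-hour rate to the common scale (failures per 1,000,000 transactions) and then subtracting. This sketch assumes the load of 100,000 transactions is per hour, which is the reading that yields the 4 failures per 1,000,000 transactions quoted for the OS; all names are hypothetical.

```python
TRANSACTIONS_PER_HOUR = 100_000  # assumed: the stated load is per hour


def per_hour_to_per_million(failures_per_hour: float, transactions_per_hour: float) -> float:
    """Convert failures/hour to failures per 1,000,000 transactions."""
    return failures_per_hour / transactions_per_hour * 1_000_000


system_fio = 100.0  # failures per 1,000,000 transactions
hardware = per_hour_to_per_million(0.1, TRANSACTIONS_PER_HOUR)
operating_system = per_hour_to_per_million(0.4, TRANSACTIONS_PER_HOUR)

# What is left of the system objective is the budget for the developed software.
developed_fio = system_fio - hardware - operating_system
print(f"developed software FIO = {developed_fio:.1f} failures per 1,000,000 transactions")
```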

35 Computing Developed FIO
Example 2: Database system running on Win 2K
- System failure intensity objective = 30 failures/1,000,000 transactions
- OS: MTTF for Windows Server 2K is around 3,000 hours for 10 million transactions
- Hardware: hardware failure intensity is 1 failure per 30 hours
- Other: failure rate for other systems is 9 failures per one million transactions
What is the FIO for the developed software?

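36 Computing Developed FIO
$\lambda_{os} = \frac{1}{MTTF} = \frac{1}{3000}$ per hour
$\lambda_{hardware} = \frac{1}{30} = \frac{100}{3000}$ per hour
$\lambda_{os} + \lambda_{hardware} = \frac{101}{3000}$ per hour, i.e., 101 failures for $10^7$ transactions
$\lambda_{other} = 90$ for $10^7$ transactions
$\lambda_{total} = 101 + 90 = 191$ for $10^7$ transactions
$\lambda_F = 300$ for $10^7$ transactions
therefore $\lambda_{developed\_software} = 300 - 191 = 109$ for $10^7$ transactions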
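
The same subtraction can be sketched in code for Example 2, using failures per 10^7 transactions as the common scale. The conversion relies on the example's statement that 3,000 hours of operation correspond to 10 million transactions; variable names are hypothetical.

```python
HOURS_PER_10M_TRANSACTIONS = 3000  # from the example: 3,000 hours ~ 10 million transactions

# Convert everything to failures per 10^7 transactions.
system_fio = 30 * 10                               # 30 per 10^6 -> 300 per 10^7
os_fi = HOURS_PER_10M_TRANSACTIONS / 3000          # 1 failure per 3,000 h -> 1
hardware_fi = HOURS_PER_10M_TRANSACTIONS / 30      # 1 failure per 30 h   -> 100
other_fi = 9 * 10                                  # 9 per 10^6 -> 90 per 10^7

total_non_developed = os_fi + hardware_fi + other_fi
developed_fio = system_fio - total_non_developed

print(f"non-developed total: {total_non_developed:.0f} per 10^7 transactions")
print(f"developed software FIO: {developed_fio:.0f} per 10^7 transactions")
```

Under these assumptions the budget left for the developed software is 109 failures per 10^7 transactions, or about 10.9 per million.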

37 4. Strategies to Meet FIO
Engineer strategies to meet the software failure intensity objective for the developed software.
4 main strategies:
- Fault prevention
- Fault removal
- Fault tolerance
- Fault/failure forecasting

38 a) Fault Prevention
To avoid fault occurrences by construction.
Activities:
- Requirement review
- Design review
- Clear code
- Establishing standards (ISO, etc.)
- Using CASE tools with built-in check mechanisms
Effectiveness factor:
- Proportion of the faults remaining after prevention activities

39 b) Fault Removal
To detect, by verification and validation, the existence of faults and eliminate them.
Activities:
- Code review
- Testing
Effectiveness factor:
- Reduction of failure intensity due to code review
- Ratio of failure intensity after test to failure intensity before test

40 c) Fault Tolerance
To provide, by redundancy, service complying with the specification in spite of fault occurrences.
Activities:
- Designing and implementing redundancy
Effectiveness factor:
- Reduction of failure intensity as a result of redundant design

41 d) Fault / Failure Forecasting
To estimate, by evaluation, the presence of faults and the occurrences of failures.
Activities:
- Establishing a reliability model
- Collecting failure data
- Analysis and interpretation of results
Effectiveness factor:
- Reduction of failure intensity as a result of applying reliability engineering
