Reliability Prediction of a Trajectory Verification System


Bojan Cukic, Diwakar Chakravarthy
Department of Computer Science and Electrical Engineering
West Virginia University
PO Box 6109, Morgantown, WV
cukic@ece.wvu.edu

Dan McCaugherty
Intermetrics, Inc.
NASA Software IV&V Facility
University Drive, Fairmont, WV
mccaugherty@intermetrics.com

Abstract

The existence of software faults in safety-critical systems is not tolerable. The goals of software reliability assessment are estimating the failure probability of the program, θ, and gaining statistical confidence that θ is realistic. This paper presents practical problems and challenges encountered in an ongoing effort to assess and quantify the software reliability of NASA's Day-of-Launch I-Load Update (DOLILU II) system. The DOLILU II system has been in operational use for several years. A Bayesian framework was chosen for reliability assessment because it allows the incorporation of failure-free executions observed in the operational environment into the reliability prediction.

1. Introduction

Quality assurance of software is a notoriously difficult problem. Due to the complexity of software systems and to the issue of scalability, most techniques that are sufficiently precise to ensure quality are too cumbersome to scale up. Therefore, the use of software in safety-critical systems remains a somewhat paradoxical issue. The properties of software that can be exploited to improve the control and safety functions in critical applications are exactly the qualities that lead to our concern about the reliability of the systems being produced.

In this paper, we present practical problems and challenges encountered in an ongoing effort to assess the software reliability of NASA's Day-of-Launch I-Load Update (DOLILU) system. The Day-of-Launch I-Load Update system for the Space Shuttle program was developed to allow modification of the Shuttle first-stage guidance commands based on actual wind conditions measured in the hours preceding launch. This system consists of the trajectory software required to generate and verify the new I-Loads, to evaluate wind and trajectory conditions, and to recommend decisions to fly (or not to fly) with the new I-Loads. From this short description, it is apparent that DOLILU is a high-consequence system, i.e., there is a very high cost associated with the eventual occurrence of a failure.

Software reliability is defined as the probability of failure-free execution given a specific environment and a fixed time interval. The goal of software reliability assessment is not just to estimate the failure probability of the program, θ, but to gain statistical confidence that θ is realistic. In practice, the required failure probability θ and the confidence level C are application-specific and predefined. Our discussions with NASA IV&V personnel revealed that the DOLILU system requires demonstration that the failure probability is under 10^-4. Due to the criticality of the program, the required confidence level should surpass 0.99. The state of the practice in software reliability engineering indicates that these reliability levels are practically achievable [But93, Law94].

In principle, software reliability can be quantified through program verification or statistical testing. The requirements specification for the DOLILU system is written in English, and no attempt has been made to formalize it in any form of mathematical notation. Furthermore, the size and the complexity of the specification documents make formal program verification virtually impossible. Therefore, the reliability assessment of the DOLILU system is to be obtained from program testing.

The rest of the paper is organized as follows. Section 2 introduces the DOLILU system. Section 3 outlines the approach used in the reliability assessment of DOLILU. Section 4 presents the theoretical background of Bayesian estimation.
Section 5 describes the required testing effort for the reliability assessment of DOLILU. Section 6 concludes the paper with a summary.

[Figure 1. Integrated Day-of-Launch I-Load Update (DOLILU) System Diagram (not reproduced). The diagram shows the primary system, in which an Executive module drives wind profile generation, guidance command design, SVDS trajectory simulation, DIVDT trajectory and guidance command verification, range data generation, and result evaluation; guidance commands and range data are transmitted to the Shuttle Orbiter and Range Safety; an independently developed secondary system repeats the trajectory simulation and verification.]

2. Description of the DOLILU System

The DOLILU system diagram is shown in Figure 1. In order to increase fault detection capabilities, the DOLILU system consists of two independently developed computational lanes. Guidance commands are considered valid only if the outputs of both computational lanes are the same. For the purpose of this study, we are interested in the reliability assessment of the primary system.

The DOLILU system functions as follows. When provided with the wind and atmospheric data, the executive module invokes the Day-of-Launch Ascent Design System (DADS). The DADS module generates guidance commands for the day-of-launch conditions. The guidance commands are passed to the Space Vehicle Dynamic Simulation (SVDS) processor, which generates trajectories for reference winds and atmospheric conditions. Only successfully simulated trajectories are forwarded to the verification module.

Near real-time verification is the most critical function of the DOLILU system. Successfully simulated trajectories and their corresponding I-Loads are verified for conformance with safety-related rules, called envelopes. The envelopes have been derived from previous experience (experience envelopes) and known system constraints (system envelopes). If any of the system constraint rules is violated, the violation must be reported and the trajectory must be dismissed.
The Day-of-Launch I-Load Verification Table (DIVDT) processor performs trajectory verification, and it should detect all potentially unsafe flight conditions [Roc93]. Therefore, the reliability quantification of the DIVDT processor is highly desirable. Since it verifies the outputs of all the other modules of the DOLILU system, our study is limited to the reliability assessment of the DIVDT processor.
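The verification step described above amounts to checking each point of a simulated trajectory against a set of interval rules and dismissing the trajectory on any violation. The following minimal sketch illustrates the idea; the parameter names, envelope bounds, and data layout are hypothetical and are not taken from the DOLILU specification.

```python
# Hypothetical sketch of envelope-based trajectory verification: a trajectory
# is dismissed if any safety rule (envelope) is violated at any step.

def verify_trajectory(trajectory, envelopes):
    """Return a list of violation messages; an empty list means the
    trajectory conforms to every envelope."""
    violations = []
    for name, (lo, hi) in envelopes.items():
        for t, state in enumerate(trajectory):
            value = state[name]
            if not (lo <= value <= hi):
                violations.append(
                    f"{name} = {value} outside [{lo}, {hi}] at step {t}")
    return violations

# Illustrative envelopes: one "system" limit and one "experience" limit.
envelopes = {"dynamic_pressure": (0.0, 820.0), "pitch_rate": (-2.5, 2.5)}
trajectory = [{"dynamic_pressure": 640.0, "pitch_rate": 0.8},
              {"dynamic_pressure": 910.0, "pitch_rate": 1.1}]

violations = verify_trajectory(trajectory, envelopes)
# A violated rule must be reported and the trajectory dismissed.
print("dismiss" if violations else "accept", violations)
```

The rule set is deliberately a plain mapping from a monitored quantity to an interval; in the real system the envelopes are derived from flight experience and documented system constraints.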

3. Reliability Assessment of DIVDT

Dynamic testing is a conventional method of program checking. A program under test is executed with different combinations of input data, and the results are compared with the expected values. The outcome of testing is used to predict the reliability of the program. Note, however, that software reliability depends on software quality as well as on its operational usage. Since testing cannot guarantee the absence of faults, exposing the program to the operations anticipated to be the most frequently used should catch the failures that are most likely to appear during field use. It is assumed that these failures, if detected, are the ones that matter the most to the user. The quantitative approximation of the system's field use is called the operational profile [Mus93].

In order to predict the software reliability of DIVDT based on testing, we considered three different groups of methods: time domain methods, fault-based methods, and input domain methods. Interestingly, at the time of its reliability assessment, the DOLILU system had already been in operational use by NASA for several years. This fact distinguishes our study from most other reported case studies in software reliability engineering. Rigorous quality assurance procedures, described in [NASA93], were performed prior to the deployment of DOLILU. These included fault-based testing and several other white-box testing techniques. No assumptions were drawn concerning the distribution of the remaining faults in the program, since any such prediction would have been very difficult to justify. Similar to fault-based methods, reliability growth models (time domain methods) make difficult-to-justify assumptions about the size of the faults that are removed at each step. These models attempt to extrapolate future software behavior from its past. During the validation testing and operational use of the DOLILU system, no faults or failures have been revealed.
Therefore, there is no data available to build a reliability growth model and predict software reliability. Consequently, an input domain based approach was chosen for the reliability prediction of the DIVDT processor. In this approach, the reliability of a program is the probability of failure-free operation for specific, statistically independent inputs. A sound foundation for reliability assessment in the input domain is provided by statistical sampling theory [Bas94]. The practical drawbacks of input domain based software reliability assessment are the following:

1. A large number of test cases is required,
2. Reliability estimation depends upon the ability to closely approximate/predict the operational profile of the field use, and
3. The existence of a test oracle is assumed.

The fact that the DOLILU system has been in use for several years helped us to cope with the first problem. A Bayesian estimation framework, described by Miller et al. [Mil92] and Littlewood [Lit95], is an effective reliability prediction framework. It allowed us to take into account failure-free executions of the DOLILU system that were observed in its operational environment. Due to its ability to take prior information into account, the Bayesian reliability assessment framework provided a significant reduction in the number of required test cases, when compared with the number of test cases inferred from statistical sampling theory. Furthermore, we used slicing transformations [Cuk97] to speed up testing. The application of slicing transformations to the DIVDT processor allowed parallel execution of test cases on independent parts of the program.

4. Bayesian Estimation of Software Reliability

A Bayesian estimation framework, described by Miller et al. [Mil92] and Littlewood et al. [Lit95], provides the foundation for an effective reliability prediction framework that can incorporate past data (prior information) as a predictive measure for future executions.
Let θ denote the probability of failure of the given program and let C represent the required confidence level (due to the criticality of the system under test, this should surpass 0.99). The system in its present usage requires a demonstration that the probability of failure is under 10^-4. Mathematically, this can be written as

P(θ < 10^-4) ≥ C    ...(1)

The probability density function of the Beta distribution with parameters a and b is

f(θ) = θ^(a-1) (1-θ)^(b-1) / B(a,b)    ...(2)

where B(a,b) is the complete Beta function. The parameters a and b can be adjusted to reflect our prior beliefs about the reliability of the software under test.

4.1 Prior Assumptions

Assuming an ignorance prior entails that any value of θ in the range between 0.0 and 1.0 is equally likely. Setting the values of a and b to the constant 1 (a = b = 1) reflects the ignorance prior [Joh69]. This yields a rectangular function depicting that θ can assume any value between 0.0 and 1.0 with equal probability, as shown in the following formula:

f(θ) = θ^0 (1-θ)^0 / B(1,1) = 1    ...(3)

The selection of a and b in Equation 2 allows prior information about the software to influence the interpretation of the test results. Equation 3 suggests that when no prior knowledge of software reliability is available, parameters a and b should be set to 1. One method of incorporating prior information (gathered through software inspection, for example [Mil92]) into Bayesian reliability estimation is through the assumption that the given information is equivalent to a certain number of successfully executed random tests. For example, when the value of a is set to 1 and b is set to x+1, this encodes the belief that, prior to testing, the program has executed x tests without encountering a failure (these test cases having been generated in accordance with the operational profile). In general, any prior assumption that is based on a justifiable prediction of a mean and a variance for θ can be converted into values of the parameters a and b [Mil92].

4.2 Sampling vs. Bayesian Estimation

As mentioned earlier, the goal of reliability estimation is to estimate the failure probability of the program, θ, and to gain statistical confidence that θ is realistic. In practice, the required failure rate θ and the confidence level C are usually predefined. The remaining question is how much testing needs to be done to satisfy these reliability requirements.
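The Beta machinery of Equations (2) and (3) can be checked numerically. The sketch below (helper names are ours) evaluates the density using the identity B(a,b) = Γ(a)Γ(b)/Γ(a+b), computed through the log-gamma function so that large b does not overflow; it confirms that a = b = 1 gives a flat density, and that a prior worth x failure-free tests, Beta(1, x+1), concentrates the belief near θ = 0.

```python
import math

def complete_beta(a, b):
    # B(a, b) = Γ(a)Γ(b)/Γ(a+b), computed via log-gamma to avoid overflow
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def beta_pdf(theta, a, b):
    # Equation (2): f(θ) = θ^(a-1) (1-θ)^(b-1) / B(a, b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / complete_beta(a, b)

# Equation (3): the ignorance prior a = b = 1 is flat on (0, 1)
for theta in (0.1, 0.5, 0.9):
    assert abs(beta_pdf(theta, 1, 1) - 1.0) < 1e-9

# A prior equivalent to x failure-free random tests: Beta(1, x+1),
# with density (1-θ)^x / B(1, x+1)
x = 500
assert abs(beta_pdf(0.0, 1, x + 1) - (x + 1)) < 1e-6   # f(0) = x + 1
assert beta_pdf(1e-2, 1, x + 1) < beta_pdf(1e-3, 1, x + 1)  # mass near θ = 0
```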
Let T be a random variable denoting the total number of test cases that are executed until the first failure is detected. To achieve the required confidence, an unknown number N of test cases needs to be executed, such that

P(T ≤ N) ≥ C    ...(4)

The distribution of T is geometric, and the probability that T assumes a particular value t is

P(T = t) = θ (1-θ)^(t-1)    ...(5)

By combining (4) and (5) we have

Σ_{t=1}^{N} θ (1-θ)^(t-1) ≥ C    ...(6)

The left-hand side of equation (6) is a geometric series that sums to 1 - (1-θ)^N. When solved for N, the required number of test cases can be expressed as

N ≥ ln(1-C) / ln(1-θ)    ...(7)

Within the Bayesian estimation framework, the assumption of the ignorance prior (a = b = 1) mimics random sampling. The posterior distribution f(θ) that reflects the ignorance prior, after the failure-free execution of N independent test cases, is

f(θ) = (1-θ)^N / B(1, N+1)    ...(8)

In Bayesian estimation, the posterior distribution follows the same distribution as the prior, thus forming a conjugate family of distributions [Joh69]. B(1, N+1) is the complete Beta function. According to [Mil92], the complete Beta function B(a,b) can be expressed as

B(a,b) = ∫₀¹ θ^(a-1) (1-θ)^(b-1) dθ    ...(9)

Hence, B(1, N+1) can be expressed as

B(1, N+1) = ∫₀¹ (1-θ)^N dθ = 1/(N+1)    ...(10)

Practical reliability requirements usually state that the failure rate θ should be less than a premeditated value P₀. Reliability assessment must establish the confidence level C, expressing the evidence supporting the claim that the desired reliability has been achieved. This can be written as follows:

P(θ ≤ P₀) ≥ C    ...(11)

P(θ ≤ P₀) is given by the cumulative distribution function

P(θ ≤ P₀) = ∫₀^P₀ f(θ) dθ

Thus, Equation (11) can be rewritten as

∫₀^P₀ f(θ) dθ ≥ C    ...(12)

Substituting for f(θ) from (8) into (12) entails

∫₀^P₀ (1-θ)^N / B(1, N+1) dθ ≥ C    ...(13)

Combination of (10) and (13) provides

(N+1) ∫₀^P₀ (1-θ)^N dθ ≥ C

After integration, the following expression emerges:

1 - (1-P₀)^(N+1) ≥ C    ...(14)

Simplification of Equation (14) provides the expression for the total number of test cases N:

N ≥ ln(1-C) / ln(1-P₀) - 1    ...(15)

Formulae (7) and (15) clearly indicate that, when prior information is not available (or not rigorously justifiable), essentially the same number of failure-free test case runs is required in both reliability estimation frameworks: statistical sampling and Bayesian estimation. If prior information can be taken into consideration according to the method outlined above, a is set to 1 and b is set to T₀+1, where T₀ is a number of test cases proportional to the strength of the prior belief in program correctness. The posterior density function is therefore

f(θ) = (1-θ)^(N+T₀) / B(1, N+T₀+1)    ...(16)

Then, the corresponding complete Beta function is

B(1, N+T₀+1) = ∫₀¹ (1-θ)^(N+T₀) dθ = 1/(N+T₀+1)

After substitution in (13) and simplification, the number of failure-free test runs needed to achieve confidence level C in software reliability is given by the expression below:

N ≥ ln(1-C) / ln(1-P₀) - T₀ - 1    ...(17)

In essence, the strength of the prior evidence in the reliability of the software may reduce the testing effort. In the following section, we demonstrate the impact of this reduction in testing on the reliability assessment process of DOLILU.
5. Required Testing Effort for DOLILU II Reliability Assessment

According to the reliability requirements of the DOLILU II system, the probability of software failure should be less than 10^-4 failures per

execution, and the confidence should surpass 0.99. Substituting in (17),

N ≥ ln(1-0.99) / ln(1-10^-4) - T₀ - 1 ≈ 46,052 - T₀

For example, under the assumption that DOLILU II has executed 500 times without encountering a failure (T₀ = 500), the required number of test cases is N = 46,052 - 500 = 45,552.

Table 1 and Table 2 provide better insight into the required number of test cases as a function of the reliability requirements. The number of test cases in both tables assumes that no prior information has been taken into account for reliability assessment purposes (T₀ = 0). Table 1 indicates that an additional order of magnitude in required reliability requires an additional order of magnitude in the number of tests. The confidence level in Table 1 is constant, C = 0.99.

Value of θ    Number of test cases
10^-3         4,602
10^-4         46,049
10^-5         460,514
10^-6         4,605,167

Table 1: Number of test cases as a function of the required failure rate θ, with C = 0.99

In Table 2, the required failure rate of the program is set to a constant, θ = 10^-4, while the confidence level varies between 0.92 and 0.999.

Value of C    Number of test cases
0.92          25,256
0.94          28,132
0.96          32,187
0.98          39,118
0.99          46,049
0.999         69,074

Table 2: Number of test cases as a function of the required confidence level C, with θ = 10^-4

6. Summary

We presented a Bayesian framework for the quantification of the software reliability of NASA's DOLILU II system. The quantification of reliability is inferred from continual failure-free operation of the system. Since the DOLILU II system has been in operational use for several years, one of the important features of the described reliability assessment methodology is its ability to incorporate information on the system's (failure-free) past performance into the reliability estimation. Our current activities target automated test data generation (trajectories for the Space Shuttle) and the identification of suitable testing oracles.

References

[NASA93] NASA, Space Shuttle: DOLILU-II, Definition and Requirements Document, Volume VI, Quality Assurance Rules, May 1993.
[Roc93] Rockwell International, Software Requirements Specifications, Flight Design and Dynamics Ascent Discipline, Ascent Subsystem, Day-of-Launch Function (DOLILU-II) DIVDT Program Function, Ver. 4.2 FD, March 1993.

[Bas94] F. B. Bastani, A. Pasquini, "Assessment of a Sampling Method for Measuring Safety-Critical Software Reliability," Proc. ISSRE'94, Monterey, CA, Nov. 1994.

[But93] R. W. Butler, G. B. Finelli, "The Infeasibility of Quantifying the Reliability of Life-Critical Real-Time Software," IEEE Trans. Software Eng., Vol. 19, No. 1, Jan. 1993.

[Cuk97] B. Cukic, "Transformational Approach to Software Reliability Assessment," Doctoral Dissertation, Department of Computer Science, University of Houston, July 1997.

[Joh69] N. L. Johnson, S. Kotz, Distributions in Statistics: Continuous Univariate Distributions, Wiley Series in Probability and Mathematical Statistics, 1969.

[Law94] J. D. Lawrence, W. L. Persons, G. G. Preckshot, J. Gallagher, "Evaluating Software for Safety Systems in Nuclear Power Plants," Proc. COMPASS '94, Gaithersburg, MD, June 1994.

[Lit95] B. Littlewood, D. Wright, "Stopping Rules for the Operational Testing of Safety-Critical Software," Proc. FTCS-25, Pasadena, CA, June 1995.

[Mil92] K. Miller, L. J. Morell, R. E. Noonan, S. K. Park, D. M. Nicol, B. W. Murrill, J. W. Voas, "Estimating the Probability of Failure When Testing Reveals No Failures," IEEE Trans. Software Eng., Vol. 18, No. 1, Jan. 1992.

[Mus93] J. D. Musa, "Operational Profiles in Software Reliability Engineering," IEEE Software, Mar. 1993.