Using Auto-Generated Diagnostic Trees for Verification of Operational Procedures in Software-Hardware Systems

1 Using Auto-Generated Diagnostic Trees for Verification of Operational Procedures in Software-Hardware Systems
Tolga Kurtoglu, Mission Critical Technologies / NASA Ames Research Center, tolga.kurtoglu@nasa.gov
Robyn Lutz, NASA Jet Propulsion Laboratory/CIT and Iowa State University, robyn.r.lutz@jpl.nasa.gov
Ann Patterson-Hine, NASA Ames Research Center, ann.patterson-hine@nasa.gov

2 What are procedures? A procedure is a detailed set of instructions specifying how a piece of equipment is operated or how a task is to be performed. Each step of a procedure may have conditions that must be satisfied before it can take place, and effects that must be understood when considering its implications for other procedure steps. Procedure execution involves issuing commands to spacecraft, robots, or systems; interpreting the responses of those systems; and choosing the next step in the procedure based on those responses. Procedures embody the engineering knowledge of the system or equipment involved in the tasks, and cover both nominal and off-nominal cases that arise. (Franks, 2008)

3 What do procedures include?
- software checks and calibrations
- conditional commands
- manual inputs and checks of console data
- inspection of physical equipment
- recovery actions
Why verify operational procedures? (1) mission safety and (2) accomplishment of the scientific mission objectives are highly dependent on the correctness of procedures.

4 Key technical challenges for verification of procedures
- traditional procedure development is labor-intensive and critically dependent on human expertise
shortcomings of traditional procedures:
- difficult to handle changes to the system configuration (change risk)
- actions may depend on system health conditions
- multiple faults are typically not accounted for

5 Existing verification techniques for software-hardware systems
manual verification: inspection and reviews
- conformance of command programs to procedure definitions
automated verification: static checkers
- correctness of the syntax of procedures
- variable declarations, run-time errors, null pointers
- operational bounds, order of procedure calls, etc.
automated verification: model checkers
- systematic exploration of a system's state space
- deadlocks, race conditions
- verification of reaching a desired system state

6 What are we proposing? A model-based perspective on the verification of operational procedures, exploiting the knowledge and automated analysis techniques applied to the diagnostic process by model-based diagnosis (MBD) systems (the TEAMS tool suite). The research problem we are studying is how to use diagnostic trees auto-generated from existing software-hardware system models to verify and improve a procedure's sequence of diagnostic checks and recovery actions.

7 The Diagnostic Tree for Verification (DTV) Method

8 The DTV Method: Modeling Environment TEAMS Overview
- Testability Engineering and Maintenance System (from QSI)
- test properties, hardware/software properties
- diagnostic tree, variety of analysis options, testability figures of merit

9 The DTV Method: Modeling Environment TEAMS Overview
cause-effect dependency modeling using multi-signal directed graphs:
1. collect and review all available system documentation
2. create a hierarchical structural model of the system
3. add links between the modules indicating the dependency flow (electrical, mechanical, hydraulic, commands, etc.)
4. add test points and tests to the model
5. perform various testability analyses on the model
6. review the diagnostic tree and the resulting diagnostic strategy
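To make steps 2 through 4 concrete, here is a toy sketch of the multi-signal dependency modeling idea in Python. It is not the proprietary TEAMS model format; all module, link, and test names are hypothetical. Failures propagate along directed dependency links, so a test observes every failure source that can reach its test point.

```python
# Toy sketch of multi-signal dependency modeling (not the TEAMS format).
from collections import defaultdict, deque

# Steps 2-3: modules and directed dependency links (cause -> effect).
links = {
    "Battery": ["Relay"],
    "Relay": ["Inverter"],
    "Inverter": ["Load"],
    "Load": [],
}

# Step 4: test points, each observing the output of one module.
test_points = {"T_voltage": "Battery", "T_position": "Relay", "T_temp": "Load"}

def reachable_from(source):
    """All modules whose outputs are affected by a failure in `source`."""
    seen, frontier = {source}, deque([source])
    while frontier:
        node = frontier.popleft()
        for nxt in links[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Dependency matrix: 1 if the test can observe the failure source.
d_matrix = defaultdict(dict)
for module in links:
    affected = reachable_from(module)
    for test, observed in test_points.items():
        d_matrix[module][test] = int(observed in affected)

for module in links:
    print(module, d_matrix[module])
```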

10 The DTV Method: Modeling Environment TEAMS Overview
cause-effect dependency modeling using multi-signal directed graphs
failure modes of components are embedded inside the modules
[figure: example TEAMS model showing module-level failure modes, tests, and symptoms]

11 The DTV Method: Modeling Environment TEAMS Overview
reasoning is done via a dependency matrix that captures which failure sources can be observed by each of the checks ("tests")
[figure: example dependency matrix with failure sources in Module1 through Module4 as rows and tests T1 through T4 as columns]
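A minimal sketch of how such a matrix supports reasoning, assuming a hypothetical 4x4 dependency matrix (the actual TEAMS inference engine is more sophisticated): a failed test implicates the failure sources it can observe, and a passed test exonerates them.

```python
# Hypothetical dependency matrix: d_matrix[source][test] = 1 if the
# test can observe that failure source. Values are made up.
d_matrix = {
    "Module1": {"T1": 1, "T2": 0, "T3": 1, "T4": 0},
    "Module2": {"T1": 1, "T2": 1, "T3": 0, "T4": 0},
    "Module3": {"T1": 0, "T2": 1, "T3": 1, "T4": 1},
    "Module4": {"T1": 0, "T2": 0, "T3": 0, "T4": 1},
}

def isolate(results):
    """results: dict test -> 'pass' | 'fail'. Returns the remaining suspects."""
    suspects = set(d_matrix)
    for test, outcome in results.items():
        observers = {m for m, row in d_matrix.items() if row[test]}
        if outcome == "fail":
            suspects &= observers   # only observed sources can explain a failure
        else:
            suspects -= observers   # a passing test exonerates what it observes
    return suspects

print(isolate({"T1": "fail", "T2": "pass"}))   # -> {'Module1'}
```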

12 The DTV Method: Diagnostic Tree Overview
test results (i.e., pass/fail) are used to produce a diagnostic tree of the checks needed to detect and isolate failures and to model recovery actions
[figure: example diagnostic tree showing the original symptom, tests (pass/fail), set-up actions, and recovery actions]
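The tree-generation step can be pictured with a toy greedy splitter, sketched below. TEAMS uses its own optimization algorithms, so this only conveys the idea, with hypothetical matrix values: at each node, pick the test that divides the remaining suspects most evenly, then recurse on the fail and pass outcomes.

```python
# Toy greedy diagnostic-tree generation (not the TEAMS algorithm).
def build_tree(suspects, tests, d_matrix):
    if len(suspects) <= 1 or not tests:
        return sorted(suspects)                  # leaf: isolated suspect(s)
    # Choose the test whose observed set splits the suspects most evenly.
    def split_score(t):
        observed = sum(d_matrix[m][t] for m in suspects)
        return abs(observed - len(suspects) / 2)
    test = min(tests, key=split_score)
    fail_side = {m for m in suspects if d_matrix[m][test]}
    pass_side = suspects - fail_side
    if not fail_side or not pass_side:
        return sorted(suspects)                  # test is uninformative here
    rest = [t for t in tests if t != test]
    return {test: {"fail": build_tree(fail_side, rest, d_matrix),
                   "pass": build_tree(pass_side, rest, d_matrix)}}

d_matrix = {
    "Module1": {"T1": 1, "T2": 0, "T3": 1},
    "Module2": {"T1": 1, "T2": 1, "T3": 0},
    "Module3": {"T1": 0, "T2": 1, "T3": 1},
}
print(build_tree(set(d_matrix), ["T1", "T2", "T3"], d_matrix))
```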

13 The DTV Method: Procedure Overview
[figure: example operational procedure annotated with the original symptom, tests (pass/fail), set-up actions, and recovery actions]

14 The DTV Method: Analysis of Procedures
the comparison between an operational procedure and the auto-generated diagnostic tree is currently manual; however, we are working towards integrating DTV with the work of others to make future analyses automated

15 The DTV Method: Success Measures and Metrics for Evaluation
1. Correctness
- path coverage: determine whether the operational procedure covers all the paths in the diagnostic tree auto-generated from the model
- branch coverage: determine whether the operational procedure covers all the branches (i.e., includes all the tests) in the diagnostic tree auto-generated from the model
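Assuming both artifacts can be reduced to sets of tests and root-to-leaf paths, a rough sketch of checking these two metrics against a tree in the format of the build_tree sketch above (the procedure contents are hypothetical):

```python
# Rough sketch of the correctness metrics on a build_tree-style tree.
def tree_tests(node):
    """All tests (branch points) appearing in the diagnostic tree."""
    if not isinstance(node, dict):
        return set()
    (test, branches), = node.items()
    return {test} | tree_tests(branches["pass"]) | tree_tests(branches["fail"])

def tree_paths(node, prefix=()):
    """All root-to-leaf paths as tuples of (test, outcome) pairs."""
    if not isinstance(node, dict):
        return [prefix]
    (test, branches), = node.items()
    return (tree_paths(branches["pass"], prefix + ((test, "pass"),)) +
            tree_paths(branches["fail"], prefix + ((test, "fail"),)))

tree = {"T1": {"fail": {"T2": {"fail": ["Module2"], "pass": ["Module1"]}},
               "pass": ["Module3"]}}
procedure_tests = {"T1", "T3"}          # tests the written procedure performs
procedure_paths = {(("T1", "pass"),)}   # outcome sequences the procedure handles

print("branch coverage:", procedure_tests >= tree_tests(tree))
print("missed paths:", [p for p in tree_paths(tree) if p not in procedure_paths])
```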

16 The DTV Method: Success Measures and Metrics for Evaluation
2. Reduced Complexity
- shorter path: does the diagnostic tree identify a procedure that isolates the same fault(s) as the operational procedure but contains fewer steps?
- fewer branches: does the diagnostic tree identify a procedure that isolates the same fault(s) as the operational procedure but contains fewer tests?

17 The DTV Method: Success Measures and Metrics for Evaluation
3. Improved Efficiency
- resource usage: does the diagnostic tree identify a procedure that isolates the same fault(s) as the operational procedure but uses fewer resources?
- reduced cost: does the diagnostic tree, when directed to use the costs (financial, power, or duration) associated with specific tests, identify an alternative, lower-cost troubleshooting strategy?
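A speculative sketch of how cost could steer test selection (the cost values and the weighting scheme below are assumptions for illustration, not TEAMS behavior): penalize a test's suspect-splitting score by its cost, so cheaper tests are preferred when equally informative.

```python
# Hypothetical per-test costs (financial, power, or duration).
test_cost = {"T1": 1.0, "T2": 5.0, "T3": 0.5}

d_matrix = {
    "Module1": {"T1": 1, "T2": 0, "T3": 1},
    "Module2": {"T1": 1, "T2": 1, "T3": 0},
    "Module3": {"T1": 0, "T2": 1, "T3": 1},
}

def cost_aware_pick(suspects, tests):
    """Prefer tests that split the suspects evenly, penalized by cost."""
    def score(t):
        observed = sum(d_matrix[m][t] for m in suspects)
        balance = abs(observed - len(suspects) / 2)  # 0 = perfect half-split
        return balance + 0.1 * test_cost[t]          # weight is a tunable assumption
    return min(tests, key=score)

print(cost_aware_pick(set(d_matrix), ["T1", "T2", "T3"]))  # -> T3, cheapest even splitter
```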

18 The DTV Method: Success Measures and Metrics for Evaluation
3. Improved Efficiency (continued)
- increased autonomy: does the diagnostic tree identify an alternative troubleshooting strategy with increased opportunity for autonomy over an existing manual procedure?
- improved sensor and test placement: does the diagnostic tree identify an alternative troubleshooting strategy with increased opportunity for improvements in procedures resulting from the addition/deletion/change of sensors and test points?

19 Case Study: ADAPT Electrical Power System
- power storage: batteries
- power distribution: relays, circuit breakers, inverter
- variable load configuration
- a data acquisition and control system
- various instrumentation points and over 100 sensors
[figure: the simplified schematic of the ADAPT EPS system (from Ghosal and Azam, 2008)]

20 Case Study: ADAPT Software Challenges
DAQ module faults:
- sensor input faults: absent, incorrect timing/order, duplicate
- command faults: absent, blocked, incorrect timing/order, duplicate

21 Case Study: ADAPT Fault Scenarios
two scenarios and analysis of two associated procedures that illustrate how the procedures can be verified for branch coverage (metric) and fewer branches (metric)
- first scenario: battery output voltage low anomaly
- second scenario: load bank relay position anomaly

22 Case Study: ADAPT Fault Scenarios
first scenario: battery output voltage low anomaly
1. verify the operational mode (or configuration) of the EPS system (in this case, Battery 1 powers AC Load A1),
2. check the battery output voltage (EI 135 reading), and if low,
3. command Battery 1 off and Battery 2 on,
4. command Relays EY 241, EY 260, and EY 274 closed,
5. check the temperature of AC Load A2 (TE 505),
6. verify the reconfigured operational mode (or configuration) of the EPS system (now Battery 2 powers AC Load B2).
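For the comparative analysis described earlier to become automated, a procedure like this first needs a machine-readable form. A hypothetical encoding of the steps above as check/command records (the field names are illustrative, not an existing procedure language):

```python
# Hypothetical machine-readable encoding of the scenario-1 procedure.
# Each step is either a check (a named test with an expected condition)
# or a command; "on_true" names the step taken when the check holds.
procedure_battery_low = [
    {"id": 1, "check": "EPS mode", "expect": "Battery 1 powers AC Load A1"},
    {"id": 2, "check": "EI 135", "expect": "low", "on_true": 3},
    {"id": 3, "command": "Battery 1 off, Battery 2 on"},
    {"id": 4, "command": "close relays EY 241, EY 260, EY 274"},
    {"id": 5, "check": "TE 505", "expect": "nominal"},
    {"id": 6, "check": "EPS mode", "expect": "Battery 2 powers AC Load B2"},
]

# The set of tests the procedure performs; comparing it against the tests
# in the auto-generated diagnostic tree exposes missing checks, such as a
# test disambiguating an EI 135 sensor fault from a battery fault.
procedure_tests = {s["check"] for s in procedure_battery_low if "check" in s}
print(procedure_tests)
```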

23 Case Study: ADAPT Fault Scenarios
- the procedure checks for a battery failure and reconfigures the system to use a redundant battery to power an identical load type
- however, it is missing a test that could have disambiguated between a false alarm due to a sensor failure (EI 135) and an actual battery failure (Battery 1); as a result, it directly prompts for reconfiguration of the system to use the redundant battery power
- the TEAMS model and the auto-generated diagnostic tree can easily identify this missing test (metric), which would rule out a sensor failure and confirm an actual battery failure

24 Case Study: ADAPT Fault Scenarios
second scenario: load bank relay position anomaly
1. verify the operational mode (or configuration) of the EPS system (in this case, Battery 1 powers AC Load A1),
2. verify the relay position sensor output (ESH 170 reading) to be open,
3. verify Inverter 1 output voltage (EI 165) is within operational limits,
4. if true, check the temperature output of AC Load A1 (TE 500),
5. if within operational limits, conclude ESH 170 sensor failure, or
6. if outside of operational limits, go to Procedure Inverter 1 Output Voltage Anomaly,
7. if zero, conclude EY 170 relay failure, or
8. if false, go to.

25 Case Study: ADAPT Fault Scenarios
- the procedure checks for a load bank relay failure, disambiguates between a relay and a relay sensor failure, and reconfigures the system to use the redundant load bank in case of a relay failure
- the procedure includes 3 checks (ESH 170, EI 165, and TE 500) to conclude that the anomaly is a relay sensor failure; however, the same diagnosis can be made using only two of the available tests (ESH 170 followed by TE 500)
- the TEAMS model and the auto-generated diagnostic tree can easily identify this path with fewer branches (metric)
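A hypothetical reconstruction of this redundancy check (the pass/fail signatures below are illustrative assumptions, not the actual ADAPT dependency matrix): if a two-test subset already gives every candidate failure a distinct outcome signature, the third check adds no diagnostic value.

```python
# Hypothetical outcome signatures: 1 = the check would flag an anomaly
# under that candidate failure. Values are assumptions for illustration.
from itertools import combinations

signatures = {
    "EY 170 relay failure":   {"ESH 170": 1, "EI 165": 0, "TE 500": 1},
    "ESH 170 sensor failure": {"ESH 170": 1, "EI 165": 0, "TE 500": 0},
    "Inverter 1 failure":     {"ESH 170": 0, "EI 165": 1, "TE 500": 1},
}

def distinguishes(tests):
    """True if every candidate failure gets a distinct outcome signature."""
    rows = [tuple(sig[t] for t in tests) for sig in signatures.values()]
    return len(set(rows)) == len(rows)

# Any two-test subset that still distinguishes all candidates makes the
# remaining check diagnostically redundant.
for subset in combinations(["ESH 170", "EI 165", "TE 500"], 2):
    if distinguishes(subset):
        print("sufficient subset:", subset)
```

Under these assumed signatures, ESH 170 followed by TE 500 suffices, matching the fewer-branches finding above.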

26 DTV: Summary
we presented the Diagnostic Tree for Verification (DTV) Method
unique aspects of the DTV method:
- identifies limitations and potential improvements for procedures
- explores alternative ways of performing diagnosis/recovery
- uses system models already constructed by NASA
- fuses information from multiple sensors/test points
- reduces the risk that change introduces in procedures
preliminary results are promising

27 Future Outlook
- expanding the model and procedure definitions to include software faults
- representing checks in the procedures that involve human elements (e.g., intervention), which are not currently captured in the model
- scalability of the approach to larger systems
- developing a representation to automate the comparative analysis
- integration with formal methods for V&V

28 This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and at NASA Ames Research Center, under a contract with the National Aeronautics and Space Administration. The work was sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA Software IV&V Facility. This activity is managed locally at JPL through the Assurance and Technology Program Office.

29 Questions?