STPA: A New Hazard Analysis Technique. Presented by Sanghyun Yoon

Similar documents
A Systems Approach to Risk Management Through Leading Indicators

Assuring Safety of NextGen Procedures

Software Safety Testing Based on STPA

Boeing Engineering. Overview of

A System s Approach to Safety. Prof. Nancy Leveson Aeronautics and Astronautics MIT

STAMP Applied to Workplace Safety

Delivering Safety Through Design Using Early Analysis Methods. Mark A. Vernacchia, MSES, PE General Motors Company; Milford, Michigan, USA

A Human Factors Approach to Root Cause Analysis:

Lecture 7. Safety Analysis: Failure Modes and Effect Analysis (FMEA) Functional Hazard Assessment (FHA)

Iterative Application of STPA for an Automotive System

Requirement Error Taxonomy

STAMP Experienced Users Tutorial. John Thomas Blandine Antoine Cody Fleming Melissa Spencer Qi Hommes Tak Ishimatsu John Helferich

Engineering a Safer and More Secure World

INTEGRATED modular avionics, or IMA, is a shared set of flexible, reusable, and interoperable hardware and software resources that, when

Systems Theoretic Process Analysis Applied to Air Force Acquisition Technical Requirements Development

Application of STPA to ESF-CCS. Dong-Ah Lee, Jang-Soo Lee, Se-Woo Cheon, and Junbeom Yoo

Intelligent-Controller Extensions to STPA. Dan Mirf Montes

Using STPA in Compliance with ISO26262 for developing a Safe Architecture for Fully Automated Vehicles

The Path to More Cost-Effective System Safety

CORE TOPICS Core topic 3: Identifying human failures. Introduction

COAL - HAZOP SYSTEMS ANALYSIS GRINDING AND FIRING

Hazard Analysis Technique Selection

Using System Theoretic Process Analysis (STPA) for a Safety Trade Study

Incident Management Process

Incident [Accident] Investigations

Process Safety Management (PSM)

COMPARISON OF PROCESS HAZARD ANALYSIS (PHA) METHODS

Incident Management Process

Six Keys to Managing Workplace Electrical Safety

Automated System Validation By: Daniel P. Olivier & Curtis M. Egan

Application of STAMP to Improve the Evaluation of Safety Management Systems

Resilience Engineering and FRAM Today. (June 7, 2012)

Techniques and benefits of incorporating Safety and Security analysis into a Model Based System Engineering Environment

Enabling Improvement through the Product Lifecycle: Change Management within a PQS

USING STPA FOR EVALUATING AVIATION SAFETY MANAGEMENT SYSTEMS (SMS)

STAMP A SIMPLE GUIDE TO HAZARD ANALYSIS. Michael Killaars Ruby Weener Thom van den Engel Oktober 2016

CIS 890: High-Assurance Systems

Human Failure. Overview. People are never 100% reliable. Andy Brazier. Types of human failure Slips Mistakes Violations

Elements of an Effective Safety and Health Program

Human Factor in Functional Safety

Regulatory Compliance Inspections at MAH Offices: Implications for Manufacturers

Brief Summary of Last Lecture. Model checking of timed automata: general approach

Applying System Engineering to Pharmaceutical Safety

REGULATION REGARDING EQUIPMENT PROCUREMENT PROCESS AND APPROVAL OF MANUFACTURERS FOR NUCLEAR FACILITIES

Building Behavioral Competency into STPA Process Models for Automated Driving Systems

Guideline content. Page 1

Using STPA in Compliance with ISO26262 for developing a Safe Architecture for Fully Automated Vehicles

Towards a STAMP-Based Safety Plans Approach for Construction Projects

Strategy Analysis. Chapter Study Group Learning Materials

Applying STAMP/STPA to Analyze the Causes of the Unexpected Fire Happening at the Heat Treatment Process

FRAM First Steps How to Use FRAM and the FMV

Cause and Effect Relationship with Trending

Chapter 6. Software Quality Management & Estimation

Project vs Operation. Project Constraints. Pankaj Sharma, Pankaj Sharma,

What Extent is Risk Reduction Required?

Implementing Hazard & Operability (HAZOP) Studies Throughout the Project Life Cycle

Self-Inspection and its potential benefits via ICH Q9 & Q10

Process Hazard Analysis Fundamentals. Walt Frank, P.E. Frank Risk Solutions F R A N K R I S K S O L U T I O N S

Systemic Accident Analysis Methods What are they? How feasible in healthcare?

A systems-theoretic approach

Contents of the Failure Mode Effects Analysis the Plant Wellness Way Distance Education Course FMEA Training Online

ITIL Intermediate Capability Stream:

NBAA SAFETY CULTURE SURVEY

Risk analysis of serious air traffic incident based on STAMP HFACS and fuzzy sets

Engineering systems to avoid disasters

9. Verification, Validation, Testing

Getting Started with Risk in ISO 9001:2015

FMEA Failure Mode Effects Analysis. ASQ/APICS Joint Meeting May 10, 2017

SOFTWARE FAILURE MODES EFFECTS ANALYSIS OVERVIEW

Using STAMP to investigate decisionmaking. a DSB Case Study Proposal

STATE NUCLEAR POWER SAFETY INSPECTORATE Of THE REPUBLIC OF LITHUANIA (VATESI) REGULATIONS

V1.2 A LEADER S GUIDE TO DR. MARK FLEMING CN PROFESSOR OF SAFETY CULTURE SAINT MARY S UNIVERSITY

Risk Module: Risk Management, Fault Trees and Failure Mode Effects Analysis Exploration Systems Engineering, version 1.0

Engineering for Humans: A New Extension to STPA

Accident. and Incident Investigations. Investigation Procedures. Self-study Review and Answer Sheet

CHAPTER 1. Business Process Management & Information Technology

Challenge H: For an even safer and more secure railway

Business Focused Maintenance

Chapter 3 Structuring Decisions

Safety Committee Training December 6, 2017

Risk Management Using Spiral Model for Information Technology

A New Approach to Hazard Analysis for Rotorcraft

Managing Safely Online Course User Guide

Functional Analysis Module

APM Reliability Classic from GE Digital Part of our On-Premise Asset Performance Management Classic Solution Suite

Identify Risks. 3. Emergent Identification: There should be provision to identify risks at any time during the project.

FRAM: Setting the scene. Erik Hollnagel

COMPARISON - SMS AND QMS

The investigation of safety management systems, and safety culture

HAZOPS AND HUMAN FACTORS

Implement Effective Computer System Validation. Noelia Ortiz, MME, CSSGB, CQA

Resilience engineering Building a Culture of Resilience

Lesson 31- Non-Execution Based Testing. October 24, Software Engineering CSCI 4490

Risk Assessment: Chapter 12

Data Collection Systems

HUMAN ERROR ANALYSIS AT A REFINERY

A New Approach to System Safety Engineering

NANCY G. LEVESON JOHN P. THOMAS

electrical engineering division Arc Flash Study Planning Kit Everything You Need To Ensure The Success of Your Arc Flash Project

Practical Strategies for Improving Quality Risk Management activities in GMP environments

Transcription:

STPA: A New Hazard Analysis Technique Presented by Sanghyun Yoon

Introduction Hazard analysis can be described as investigating an accident before it occurs. Potential causes of accidents can be eliminated or controlled in design or operations before damage occurs. The most widely used existing hazard analysis techniques were developed fifty years ago and have serious limitations. This chapter describes a new approach to hazard analysis, bases on the STAMP causality model, called STPA (System-Theoretic Process Analysis). 2

Goals for a New Hazard Analysis Technique Old hazard analysis techniques Fault Tree Analysis Event Tree Analysis HAZOP FMEA 3

Goals for a New Hazard Analysis Technique (cont d) Including the new causal factors identified in STAMP that are not handled by the older techniques. Old techniques did not include design errors, software flaws, component interaction accidents, and so on. The goal is to identify accident scenarios that encompass the entire accident process, not just the electro-mechanical components. Old techniques have different assumptions which are different from new causal factors. To provide guidance to the users in getting good results. Fault and event tree: the tree itself is the result of analysis HAZOP: guidewords more than, less than and opposite It can be used before a design has been created. 4

The STPA Process (1) STPA can be used at any stage of the system lifecycle. STPA uses a control diagram, the functional requirements, system hazards, the safety constraints and safety requirements for components as defined in chapter 7. When STPA is applied to an existing design (chapter 8) This Information is available when analysis process begins. When STPA is used for safety-guided design, (chapter 9) Only system the system-level requirements and constraints may be available at the beginning of the process. These requirements and constraints are refined and traced to individual system components as the iterative design and analysis process proceeds. 5

The STPA Process (2) Main steps 1. Identify the potential for inadequate control of the system that could lead to a hazardous state. a. A required control action is not provided or not followed; b. An incorrect or unsafe control action is provided; c. A potentially safe control action is provided too early or too late, that is, at the wrong time or in the wrong sequence; d. A correct control action is stopped too soon; 6

The STPA Process (3) Main steps 2. Determine how each potentially hazardous control action identified in step 1 could occur. a. Augment the control structure with process models for each control component. b. For each unsafe control action, examine the parts of the control loop to see if they could cause it. Design controls and mitigation measures if they do not already exist or evaluate existing measures if the analysis is being performed on an existing design. For multiple controllers of the same component or safety constraint, identify conflicts and potential coordination problems. c. Consider how the designed controls could degrade over time and build in protection, including 1) Management of change procedures to ensure safety constraints are enforced in planned changes. 2) Performance audits where the assumptions underlying the hazard analysis are the preconditions for the operational audits and controls so that unplanned changes that violated the safety constraints can be detected. 3) Accident and incident analysis to trace anomalies to the system design. 7

The STPA Process (4) Control structure example 8

Hazard: Inadvertent launch 9

Identifying Potentially Hazardous Control Actions (Step 1) Assess the safety controls to determine the potential for leading to a hazard. Control actions can be hazardous in four ways a. A required control action is not provided or not followed. b. An incorrect or unsafe control action is provided that lead to hazard. c. A potentially safe control action is provided too early, too late, or out of sequence. d. A correct control action is stopped too soon (for a continuous or nondiscrete control action). 10

Identifying Potentially Hazardous Control Actions (Step 1) - Interlock example Safety constrains The power must always be off when the door is open; A power off command must be provided within x milliseconds after the door is opened A power on command must never be issued when the door is open The power on command must never be given until the door is fully closed 11

Identifying Potentially Hazardous Control Actions (Step 1) - FMIS example (1) 12

Identifying Potentially Hazardous Control Actions (Step 1) - FMIS example (2) 13

STPA: A New Hazard Analysis Technique DETERMINING HOW UNSAFE CONTROL ACTIONS COULD OCCUR (STEP 2) 14

Purpose of Step 2 Step 1 provides the component safety requirements(constraints). Step 2 used to identify the ways that the software safety constraint might still not be enforced by the software logic and system design. Step 2 identifies the scenarios or paths to hazard found in a classic analysis. 15

Step 2a: Augment the control structure with process models To create causal scenarios, the control structure diagram is first augmented with process models for each component. If the system exists, The content of these models should be easily determined by looking at the system functional design and its documentation. If the system does not yet exist, The analysis can start with a best guess then be refined and changed as the analysis process proceeds. 16

17

18

Step 2b: Determine how hazardous control actions could occur 1. Identify how hazardous control action could happen. 2. For each potential causes Design controls and mitigation measures if do not already exist Evaluate existing measure if the analysis is being performed on an existing design 19

Identify how hazardous control action could happen 20

Try to eliminate the hazard completely Interlock example Design: add a circuit which is broken as soon as the door opens Feedback: add a light that identifies whether the power supply is on or off Incorrect assumption More about the safety design? Chapters 16 and 17 of Safeware and Chapter 9 of this book Changes must be analyzed for new hazards or causal scenarios. 21

Step 2b: FMIS example (1) A missing case in the requirements A common occurrence in accidents where software is involved Described in next slide. 22

track Radar detected a missile -> Weapons Free -> Fire Enable 23

Step 2b: FMIS example (1) (cont d) When an operator make a track inactive? A given period passes with no radar input The total predicted impact time elapses for the track An intercept is confirmed One case was note considered by the designers If an operator deselects all of the options, no tracks will be marked as inactive -> FIRE ENABLE command would be send to the launch station immediately, even if there were no threats Solution Fix the software requirements and the software design to include the missing case. 24

Step 2b: FMIS example (2) confusion between the regular and the test software Launch Station has two operating modes Testing Live fire Causal sequence Small window of time before the launch station notifies the fire control component of the change 25

Step 2c: Consider the degradation of controls over time Build in protection against the degradation. Degradations, interlock example: warning light Operator(human who opens the door) might create workarounds to bypass the light They do not understand the purpose The light might be partially blocked from view because of workplace changes -> The assumptions and required audits should be identified during the system design process 26

Step 2c: Consider the degradation of controls over time (cont d) Management of change procedures need to be developed and the STPA analysis revisited. (chapter 10) After accidents and incidents, the design and the hazard analysis should be revisited to determine why the controls were not effective. (chapter 12) 27

Human controllers (1) Humans in the system can be treated in the same way as automated components in Step 1 of STPA. Difference: Humans need an additional process model All controllers need a model of the process they are controlling directly Human controllers also need a model of any process 28

29

Human controllers (2) Difference: algorithm Automated system: basically static control algorithms Humans: dynamic control algorithms that they change as a result of feedback and changes in goals Human error is best modeled and understood using feedback loops Example: Driving a car Driving algorithm is changed by car s performance, in goals and motivation, or driving experience. 30

Human controllers (3) Hazard analysis must identify the specific human behaviors that can lead to the hazard. We are not able to redesign humans. Training can be helpful, but not nearly enough Take the information obtained in the hazard analysis about worst case human behavior, Use it in the design of the other system components 31

STPA: A New Hazard Analysis Technique USING STPA ON ORGANIZATIONAL COMPONENTS OF THE SAFETY CONTROL STRUCTURE 32

Programmatic and organizational risk analysis NASA Independent Technical Authority (ITA) function risk analysis. 33

34

Gap Analysis Mapping the identified safety requirements and constraints to the assigned responsibilities in the safety control structure and identify any gaps. The responsibility for the system-level safety requirement: 1a. Sate-of-the art safety standards and requirements for NASA missions must be established, implemented, enforced, and maintained that protect the astronauts, the workforce, and the public 35

36

Gap Analysis (cont d) finding holes Omission: technical conscience No defined way to express concerns about the warrant holders. Omission: person(s) who see that engineers and managers are trained to use the results of hazard analyses in their decision-making. 37

Hazard analysis to identify organizational and programmatic risks Risks in STAMP terms can be divided into two types Basic inadequacies in the way individual components in the control structure fulfill their responsibilities Risks involved in the coordination of activities and decision-making that can lead to unintended interactions and consequences 38

Hazard analysis to identify organizational and programmatic risks basic risks (1) General types of risks: Unsafe decisions are made or approved by the Chief Engineer or warrant holders. Safe decisions are disallowed (e.g., overly conservative decision-making that undermines the goals of NASA and long-term support for ITA) Decision-making takes too long, minimizing impact and also reducing support for the ITA. Good decisions are made by the ITA, but do not have adequate impact on system design, construction, and operation. General responsibilities must be refined into more specific control actions to omit the control actions. 39

Hazard analysis to identify organizational and programmatic risks basic risks (2) The control actions the Chief Engineer has available to implement this responsibility are: To develop, monitor, and maintain technical standards and policy In coordination with programs and projects, to establish or approve the technical requirements and ensure they are enforced and implemented in the programs and projects (ensure the design is compliant with the requirements) To approve all changes to the initial technical requirements To approve all variance (waivers, deviations, exceptions to the requirements) Etc. 40

Hazard analysis to identify organizational and programmatic risks basic risks (3) To develop, monitor, and maintain technical standards and policy is identified using STPA Step 1 include: General technical and safety standards are not created. Inadequate standards and requirements are created. Standards degrade over time due to external pressure to weaken them. The process for approving changes is flawed. Standards are not changed over time as the environment changes. 41

Hazard analysis to identify organizational and programmatic risks coordination risks Coordination risks arise when multiple people or groups control the same process. The types of unsafe interactions that may result include Both controllers assume the other is performing the control responsibilities and as a result nobody does Controllers provide conflicting control actions that have unintended side effects. When similar responsibilities related to the same system requirement are identified, the potential for new coordination risks needs to be considered. 42

Use of the analysis and potential extensions The application of Step 2 of STPA to identify causes of the risks will help to provide better control measures. Used Fig. 8.6 43

44

Use of the analysis and potential extensions The application of Step 2 of STPA to identify causes of the risks will help to provide better control measures. Used Fig. 8.6 The risks are categorized. Categories for ITA: Immediate concern: An immediate and substantial concern that should be part of a near term-assessment. Longer-term concern: A substantial longer-term concern that should potentially be part of future assessments, as the risk will increase over time or cannot be evaluated without future knowledge of the system or environment behavior. Standard Process: An important concern that should be addressed through standard processes, such as inspections, rather than an extensive special assessment procedure. 45

STPA: A New Hazard Analysis Technique RE-ENGINEERING A SOCIO-TECHNICAL SYSTEM: PHARMACEUTICAL SAFETY AND THE VIOXX TRAGEDY 46

This section provides an example using pharmaceutical safety. Couturier has performed a STAMP-based causal analysis of the incidents associated with the introduction and withdrawal of Vioxx. Vioxx events 47

48

49

50

51

52