A Systems Approach to Risk Management Through Leading Indicators

Similar documents
STPA: A New Hazard Analysis Technique. Presented by Sanghyun Yoon

The Path to More Cost-Effective System Safety

A System s Approach to Safety. Prof. Nancy Leveson Aeronautics and Astronautics MIT

STAMP Applied to Workplace Safety

Software Safety Testing Based on STPA

Assuring Safety of NextGen Procedures

Boeing Engineering. Overview of

LIFE CYCLE FACILITY ASSET MANAGEMENT. Presented by Pedro Dominguez Managing Principal, The Invenio Group

Using STAMP to investigate decisionmaking. a DSB Case Study Proposal

Project risk management

STAMP and Workplace Safety

Continuous Improvement Toolkit. Risk Analysis. Continuous Improvement Toolkit.

Use of PSA to Support the Safety Management of Nuclear Power Plants

Iterative Application of STPA for an Automotive System

Despite a strong focus on personnel safety

A Human Factors Approach to Root Cause Analysis:

Software Quality. Unit 6: System Quality Requirements

USING STPA FOR EVALUATING AVIATION SAFETY MANAGEMENT SYSTEMS (SMS)

Investigating and Analysing Human and Organizational Factors

Cause and Effect Relationship with Trending

Project Planning & Management. Lecture 11 Project Risk Management

Delivering Safety Through Design Using Early Analysis Methods. Mark A. Vernacchia, MSES, PE General Motors Company; Milford, Michigan, USA

CORE TOPICS Core topic 3: Identifying human failures. Introduction

HOW TO DEVELOP A STRONG PROJECT DESIGN. User Guide #9

STAMP Experienced Users Tutorial. John Thomas Blandine Antoine Cody Fleming Melissa Spencer Qi Hommes Tak Ishimatsu John Helferich

COMPARISON OF PROCESS HAZARD ANALYSIS (PHA) METHODS

Risk Informed Project Management & Risk Assessments. John Holak Risk Assessment Manager Urban Engineers, Inc. (PMOC) Philadelphia, PA

Application of STAMP to Improve the Evaluation of Safety Management Systems

WINTER. Safety Culture High Reliability Strategies for High Consequence Professions. Who Else? Socio-Technical Systems. Template

Investigating and Analysing Human and Organizational Factors

Task identification for Human Factors Safety Critical Task Analysis

Risk-Based Thinking: An Operating Philosophy for Long-Term Success. Tony Muschara, CPT. Hu Conference March 2015 Atlanta, GA

Challenges of Strategic Supply Chain Planning and Modeling

Risk analysis in performancebased regulation (PBR) of safety

ECONOMIC MACHINE LEARNING FOR FRAUD DETECTION

Systemic Accident Analysis Methods What are they? How feasible in healthcare?

Preventing Fatal & Life Changing Injury Events Frank Baker, CSP, CFPS, ALCM

Introducing Risk Based Testing to Organizations

Utilizing Optimization Techniques to Enhance Cost and Schedule Risk Analysis

A STAMP-based approach for designing maritime safety management systems

Watson-Glaser III Critical Thinking Appraisal (US)

Engineering for Humans: A New Extension to STPA

Risk Assessment: Chapter 12

Organizational Introspection

Towards a STAMP-Based Safety Plans Approach for Construction Projects

Watson Glaser III (US)

Dynamic Simulation and Supply Chain Management

Intelligent-Controller Extensions to STPA. Dan Mirf Montes

NBAA SAFETY CULTURE SURVEY

SOFTWARE FAILURE MODES EFFECTS ANALYSIS OVERVIEW

A Guide to Develop Safety Performance Indicators (Draft no.1 22/5/2016)

Stress-Testing Frameworks and Techniques in the Banking Industry Donovan Hutchinson

Critical Systems Specification. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 9 Slide 1

Accident. and Incident Investigations. Investigation Procedures. Self-study Review and Answer Sheet

METRIC FOR THE SELF-ASSESSMENT OF AVIATION SAFETY MANAGEMENT SYSTEMS

Complexity of Organizational Cybersecurity Capability Development

Protecting Critical Infrastructure When the Costs and Benefits are Uncertain

RESILIENCE IN RISK ANALYSIS AND RISK ASSESSMENT

Identify Risks. 3. Emergent Identification: There should be provision to identify risks at any time during the project.

Turbulent Times Strategies for dealing with catastrophes, Black Swan events & Crisis Management

White Paper. Managing for the Unexpected. Kathleen M. Sutcliffe Stephen M. Ross School of Business University of Michigan

Objectives. Dependability requirements. Topics covered. Stages of risk-based analysis. Risk-driven specification. Critical Systems Specification

Economic Machine Learning for Fraud Detection

Dependability requirements. Risk-driven specification. Objectives. Stages of risk-based analysis. Topics covered. Critical Systems Specification

IMPROVE. Chapter 3-6 FMEA Institute of Industrial Engineers 3-6-1

FDA Seminar Miami October FDA Within SMS. Capt Paul DUBOIS Manager SMS & FDA Assistance AIRBUS

STAMP A SIMPLE GUIDE TO HAZARD ANALYSIS. Michael Killaars Ruby Weener Thom van den Engel Oktober 2016

ISO/DIS 9001: 2014 comparison with ISO 9001:2008. ISO 9001:2015 Updates. (Based on Draft International Standard, DIS) ISO/DIS 9001 ISO 9001:2008

UKCIP Adaptation Wizard

CONTEMPORARY APPROACHES TO PROJECT RISK MANAGEMENT: ASSESSMENT & RECOMMENDATIONS

INPO s Approach to Human Performance in the U.S. Commercial Nuclear Industry

LEVERAGING UNCERTAINTY. Developing strategy in times of uncertainty ROBIN CLELAND CLARITY. FOCUS. EMPOWERMENT.

Risk analysis of serious air traffic incident based on STAMP HFACS and fuzzy sets

Developing a Culture of Reliability and Preventing Catastrophic Events

Guideline content. Page 1

Purpose of this Presentation

Technology and Processes for Software Assurance Evaluations

Available online at ScienceDirect. Procedia CIRP 28 (2015 ) rd CIRP Global Web Conference

RISK MANAGEMENT STRATEGY AND POLICY

DOCUMENTATION OF ASSUMPTIONS AND SYSTEM VULNERABILITY MONITORING: THE CASE OF SYSTEM THEORETIC PROCESS ANALYSIS (STPA)

Risk-Informed Changes to the Licensing Basis - I

APPOLO 2 PROJECT RISK ASSESSMENT OF CAMAÇARI INDUSTRIAL COMPLEX

Capacity Planning with Rational Markets and Demand Uncertainty. By: A. Kandiraju, P. Garcia-Herreros, E. Arslan, P. Misra, S. Mehta & I.E.

Recent Developments in Risk Assessment: Future Perspectives

Engineering Safety-Critical Systems in the 21 st Century

A Real-time & Predictive Process Safety Dashboard

JOB SITE FOREMAN - SAFETY RESPONSIBILITIES

Applying System Dynamics Modeling to SMC Acquisitions

The Five Futures Glasses

Supporting Document 03. Information to support our proposed base capital expenditure programme

Duty of Care: from must to accelerator?

PV Investments Cost Benchmarking & Gap Analysis LCOE Simulation & Sensitivity Analysis

Risk. Risk Categories. Project Risk (aka Development Risk) Technical Risks. Business Risk. Example: Project Risk. Lecture 5, Part 1: Risk

IDENTIFY RISK AND APPLY RISK MANAGEMENT PROCESSES CANDIDATE RESOURCE & ASSESSMENT BSBRSK401A

Process Safety Culture and Environmental Management Systems. Bracewell LLP January 17, 2017

SOFTWARE HAZARD AND REQUIREMENTS ANALYSIS

Chapter 6-1: Failure Modes Effect Analysis (FMCEA)

Just Culture. Leading Through Shared Values and Expectations

ICHEME SYMPOSIUM SERIES NO. 144 TOP LEVEL RISK STUDY - A COST EFFECTIVE QUANTIFIED RISK ASSESSMENT

Fatality Prevention/Risk Management

Transcription:

A Systems Approach to Risk Management Through Leading Indicators Nancy Leveson MIT

Goal To identify potential for an accident before it occurs Underlying assumption: Major accidents not due to a unique set of random, proximal events Instead result from Migration of organization to state of heightened risk over time As safeguards and controls relaxed over time Due to conflicting goals and tradeoffs Detection not enough---need to embed in a risk management program

Current State of the Art: Industry Much effort, particularly in petrochemicals Trying to find generally applicable indicators (e.g., maintenance backlogs, minor incidents, equipment failure rates, surveys on employee culture (assumes that all or most accidents caused by employee misbehavior) Tend to focus on occupational safety Identification of leading indicators from hazard analysis Use standard techniques so limited types of causes Use likelihood to reduce scope of search May result in overlooking low likelihood events

Current State of the Art: Research Identify precursors to accidents Use incident reporting system (only look for things that have happened before) Use probabilistic risk analysis Hazard analysis (but limited scenarios) Organizational precursors Identify small number of factors (5 or 6) Quantitative risk analysis (fault trees, Bayesian analysis) Lack a model of framework on how and why accidents occur

Current State of the Art: PRA Risk and Risk Assessment Little data validating PRA or methods for calculating it Other problems May be significant divergence between modeled system and as-built system Interactions between social and technical part of system may invalidate technical assumptions underlying analysis Effectiveness of mitigation measures may change over time Why are likelihood estimates inaccurate in practice? Important factors left out (operator error, flawed decision making, software) because don t have probability estimates Non-stochastic errors involved in leading indicator events Heuristic biases

Heuristic Biases Confirmation bias (tend to deny uncertainty and vulnerability) Construct simple causal scenarios If none comes to mind, assume impossible Tend to identify simple, dramatic events rather than events that are chronic or cumulative Incomplete search for causes Once one cause identified and not compelling, then stop search Defensive avoidance Downgrade accuracy or don t take seriously Avoid topic that is stressful or conflicts with other goals

Controlling Heuristic Biases Cannot eliminate completely but can reduce Use structured method for identifying, detecting, and managing leading indicators Following a structured process and rules to follow can diminish power of biases and encourage more thorough search Concentrate on plausibility (vulnerability) rather than likelihood Think about whether an assumption could fail, not whether likely to do so Concentrate on causal mechanisms vs. likelihood Use worst case analysis (vs. design basis accident )

Assumption-Based Leading Indicators Hypothesis: Useful leading indicators can be identified based on Assumptions underlying safety engineering practices and Vulnerability of those assumptions (rather than likelihood of loss events) Monitor assumptions on which safety of system was assured to find Assumptions that were originally incorrect Those that have become incorrect over time

Leading Indicator Definitions 1. Warning sign used to detect when a safety-related assumption is broken or dangerously weak and action needed to prevent an accident 2. Warning signal that validity or vulnerability of an assumption is changing Shaping Actions Used to maintain assumptions, prevent hazards, and control migration to states of higher risk, e.g., Interlocks Dessicant to prevent corrosion Design human operation to be easy and hard to omit Feedforward control

Hedging (Contingency) Actions Prepare for possibility an assumption will fail Generate scenarios from broken assumptions (worst case analysis) to identify actions that might be taken Feedback control Examples: Signposts Performance audits Fail-safe design (e.g., protection and shutdown systems) Points in future where changes in safety controls (shaping and hedging actions) may be necessary or advisable Examples: New construction or known future changes may trigger a planned response or MOC

Assumption Checking Checking whether assumptions underlying safety design are still valid (Signposts identified during design and development and specific responses specified) Monitor operations to determine if assumptions still valid Might focus on signposts or on assumptions that have not been adequately handled by shaping and hedging actions Accidents often occur after a change Signposts used for planned or expected changes Assumption checking used for detecting unplanned and potentially unsafe change

Assumptions about Why Accidents Occur Starting point for identifying assumption-based leading indicators Development and Implementation Inadequate hazard analysis (assumptions about system hazards or hazard analysis process do not hold) Inadequate design of control and mitigation measures for identified hazards perhaps due to Inadequate engineering knowledge Inappropriate assumptions about operations

Assumptions about Why Accidents Occur (2) Operations Controls designers assumed would exist are not adequately implemented or used Controls are implemented but changes over time violate assumptions underlying original design of the controls New hazards arise with changing conditions, not anticipated during design and development, or dismissed as unlikely to occur Physical controls and mitigation measures degrade over time in ways not accounted for in design Components (including humans) behave differently over time (violate assumptions made during design) System environment changes over time

Assumptions about Why Accidents Occur (3) Management Safety management system design is flawed Safety management system does not operate as was designed (assumed) perhaps because (examples) Safety culture degrades over time Behavior of those making safety-critical decisions may be influenced by competitive, financial, or other pressures. To prevent accidents must Eliminate or reduce occurrence of these causes Response may take form of shaping or hedging actions Leading indicators program can be used to try to detect them before an accident occurs.

Vulnerability vs. Likelihood Vulnerability: assessment of whether assumption could plausibly fail during lifetime of system, not specific probability of that happening If assumption vulnerable, then should protect against it in some way Vulnerability may change over time Need to identify when vulnerability has changed Helps reduce heuristic biases: Whether assumption could fail to hold, not whether likely to Concentrate on causal mechanisms rather than likelihood

Is Vulnerability just a Proxy for Likelihood? Difference is potential for error Not assigning a probability or a relative category Frequent, probable, occasional, remote, improbable, impossible Categories usually undefined oir poorly defined Instead only two categories: possible or impossible If likelihood not zero, then needs to be included in leading indicators program Does not mean costly controls must be used Does mean that cannot dismiss it at beginning of development and never consider again (until first accident)

Identifying Safety-Critical Assumptions Process based on STAMP model Specify assumptions made during engineering development Use STPA to identify safety constraints and controls needed as well as specific assumptions underlying shaping and hedging actions designed to prevent hazards/losses Real examples and detailed explanation in paper Identify severity and vulnerability Generate leading indicators along with Associated assumptions How will be checked When will be checked Hedging actions to take if indicator is true

Checking Leading Indicators Many if not most can be handled through shaping and hedging actions. Then need to check these are effective. Accident/incident analysis process and error reporting system Periodic performance audits Signposts Monitoring program Dokas: EWaSAP (Early Warning Sign Analysis using STPA

Managing a Leading Indicators Program Integrate into risk management program Communicate to decision makers when fail Develop detailed action plans and triggers for implementing them before assumptions found to be invalid To lessen denial and avoidance behavior To overcome organizational and cultural blinders May need to assign responsibility to independent organization and not project managers or those with conflicting pressures Periodically revisit list of leading indicators. Establish a continuous improvement process

Feasibility Considerations Most assumptions identified and considered during development so just need to document them. I ve done it for TCAS II (technical) and NASA ITA program (management) Dokas has successfully used EWaSAP on industrial systems One reason for not handling worst case and just likely case is hazard analysis cost (remove through PHA). STPA turning out to be much cheaper than older methods Can create protection for where assumed risk even if do not know all the causes. May not be most efficient but also not very likely to be needed,