The Path to More Cost-Effective System Safety

Similar documents
A System s Approach to Safety. Prof. Nancy Leveson Aeronautics and Astronautics MIT

STAMP Applied to Workplace Safety

A Systems Approach to Risk Management Through Leading Indicators

Engineering systems to avoid disasters

Software Safety Testing Based on STPA

STPA: A New Hazard Analysis Technique. Presented by Sanghyun Yoon

Engineering for Humans: A New Extension to STPA

Just Culture. Leading Through Shared Values and Expectations

The investigation of safety management systems, and safety culture

Information sheet: STRATEGIC CASE: DEFINING PROBLEMS AND BENEFITS WELL

Effective Way of Conducting Highly Accelerated Life Testing Linking the Failure Mode Effects Analysis and Finite Element Analysis

Investigating and Analysing Human and Organizational Factors

Fifteen Undeniable Truths About Project Cost Estimates, or Why You Need an Independent Cost Estimate

4 Enterprise-Deep Risk Management

Assuring Safety of NextGen Procedures

Systems-Based Approaches for Effective Problem Solving

Using STPA in Compliance with ISO26262 for developing a Safe Architecture for Fully Automated Vehicles

Colorado Springs PMI Risk Management in Five Easy Pieces Colorado Springs, Colorado May 8 th, 2008

CS 147: Computer Systems Performance Analysis

Improving Organizational Politics. Ben Dattner, Ph.D. Allison Dunn

Use of PSA to Support the Safety Management of Nuclear Power Plants

ELEMENTS OF A HIGH PERFORMING SAFETY PROGRAM

The Common Language of Nuclear Safety Culture (and how it affects you!) 8/13/2012. The Problem: The Uncommon Language of Nuclear Safety

Safety Science. Applying systems thinking to analyze and learn from events. Nancy G. Leveson * abstract. Contents lists available at ScienceDirect

4. Hazard Analysis. CS 313 High Integrity Systems/ CS M13 Critical Systems. Limitations of Formal Methods. Limitations of Formal Methods

Managers at Bryant University

Safe and Just Culture. Lori A. Paine RN, MS KHA Quality Conference March 18, 2015

Guideline content. Page 1

Resilience engineering Building a Culture of Resilience

FRAM: Four Principles. Erik Hollnagel

A Trade Union Perspective on The New View of Health and Safety

Human Performance, Zero Harm and the Siemens Safety Journey

A Human Factors Approach to Root Cause Analysis:

Learning from Risk Management Integrating Root Cause Analysis and Failure Mode & Effect Analysis into Your Compliance Program

Prevention vs. Blame

Investigating and Analysing Human and Organizational Factors

The Board of Trustees of the University of Illinois,

ARE WE OBLIVIOUS TO UNSAFE WORKING PRACTICES PETER CORFIELD NASS, DIRECTOR GENERAL

Accident Investigation Procedures for Supervisors

Need for Hazard Analysis. Limitations of Formal Methods

The 5 Building Blocks of a CAPA Solution. Managing Corrective Actions/Preventive Actions for the Consumer Products Industry

Challenge H: For an even safer and more secure railway

Regulatory Compliance Health Check

Using Bayesian Networks to Model Accident Causation in the UK Railway Industry

Organizational Introspection

Before You Start Modelling

Strategic or Bust: Why Project Execution Needs to Change

getabstract compressed knowledge Motivation Management Overall Applicability Innovation Style

A systems approach to cyber-risk management

What Extent is Risk Reduction Required?

A Shift In Safety Thinking

Watson-Glaser III Critical Thinking Appraisal (US)

HENRY FAYOL AND GENERAL MANAGEMENT THEORY

Seven Principles for Performance

Reducing Human Error: Processes and Strategies that Get to the Root of the Problem. Objectives

Change Management for

Personnel Accountability Policy (Rev 3)

23 February Sir David Tweedie Chairman International Accounting Standards Board 30 Cannon Street London EC4M 6XH. Dear David

Leadership Worker Involvement Toolkit LWIT Graeme McMinn HM Inspector of Health and Safety

Managing Your Audit. Effective Preparation to Successful Certification. John Kukoly BRC Global Standards. BRC Global Standards. Trust in Quality.

Watson Glaser III (US)

GUIDELINES FOR WRITING COMMERCIAL OFFERS

The Psychology of Safety with regards to Confined Space Entry

Critical Thinking in Project Management. By: Matthew Holtan

Helicopter Association International March 2017

From Professional HR. Evidence-Based People Management & Development. Chapter 4. Educating the professional, evidence-based manager.

Using Wrong Tool Causes Injury

Qm 2 A community of consultants helping museums and cultural nonprofits Getting Results

Resilience Engineering and Crisis management

THE LEADER S GUIDE TO SAFETY IMPROVEMENT: FOUR LESSONS FROM SAFETY ASSESSMENTS

Giving to pilots some backup methods to address risks that could affect control of the flight. Could such backup be provided by training?

SOFTWARE FAILURE MODES EFFECTS ANALYSIS OVERVIEW

Feedback Report. ESCI - University Edition. Sample Person Hay Group 11/21/06

STAMP and Workplace Safety

Engineering a Safer and More Secure World

Injury Investigation Process. Using Root Cause Analysis

Hazard Perception Test is not fit for the purpose

Safety Management as communication

BUILDING A SAFER, MORE PRODUCTIVE WORKFORCE

Systemic Accident Analysis Methods What are they? How feasible in healthcare?

Leadership Case Study

Heinrich Deconstructed

Misinformation Systems

Techniques and benefits of incorporating Safety and Security analysis into a Model Based System Engineering Environment

Presented by. John McGraw. San Juan, Puerto Rico. January 27, 2015

Jaclyn Gault, Urban Studies Department

Guidelines for investigation of logistics incidents and identifying root causes

Just Culture for safety performance (can it work across national cultures?) Keven Baines Director 25 th March 2015 CHC Safety Summit

CONFLICT MANAGEMENT. Goal conflict is situation in which desired end states or preferred outcomes appear to be incompatible.

SESA Transportation Working Group

PROVIDE LEADERSHIP ACROSS THE ORGANISATION CANDIDATE RESOURCE & ASSESSMENT BSBMGT605B

Steam Valve Locked Out in Wrong Position

Monday, July 26, 2010, Minutes Same room: Please be about 10 minutes early!

Intelligent-Controller Extensions to STPA. Dan Mirf Montes

ENEA, Italian National Agency for New Technologies, Energy and Sustainable Economic Development Bologna, Italy

WHY THE BUSY SUPERVISOR?

MANAGING CONFLICT IN A MATRIX

FOLLOWERSHIP: THE OTHER SIDE OF LEADERSHIP

Meeting Tight Software Schedules Through Cycle Time Reduction

PLANNING BEST PRACTICES

Transcription:

The Path to More Cost-Effective System Safety Nancy Leveson Aeronautics and Astronautics Dept. MIT

Changes in the Last 50 Years New causes of accidents created by use of software Role of humans in systems and in accidents has changed Increased recognition of importance of organizational and social factors in accidents Faster pace of technological change Learning from experience ( fly-fix-fly ) no longer as effective Introduces unknowns and new paths to accidents Less exhaustive testing is possible Increasing complexity Decreasing tolerance for single accidents

Why Our Efforts are Often Not Cost-Effective Efforts superficial, isolated, or misdirected Often isolated from engineering design Spend too much time and effort on assurance not building safety in from the beginning Focusing on making arguments that systems are safe rather than making them safe Safety cases : Subject to confirmation bias Traditional system safety tries to prove the system is unsafe (looks for paths to hazards), not that it is safe Safety must be built in, it cannot be assured in or argued in

Why our Efforts are Often Not Cost-Effective (2) Safety efforts start too late 80-90% of safety-critical decisions made in early system concept formation (C.O. Miller) Cannot add safety to an unsafe design Most of our techniques require a relatively complete design to work

Why our Efforts are Often Not Cost-Effective (3) Using inappropriate techniques for systems built today Mostly used hazard analysis techniques created 40-50 years ago Developed for relatively simple electromechanical systems New technology increasing complexity of system designs and introducing new accident causes Complexity is creating new causes of accidents Need new, more powerful safety engineering approaches to dealing with complexity and new causes of accidents

Relation of Complexity to Safety In complex systems, behavior cannot be thoroughly Planned Understood Anticipated Guarded against Critical factor is intellectual manageability Leads to unknowns in system behavior Need tools to Stretch our intellectual limits Deal with new causes of accidents

Types of Complexity Relevant to Safety Interactive Complexity: arises in interactions among system components Non-linear complexity: cause and effect not related in an obvious way Dynamic complexity: related to changes over time

Chain-of-Events Model Explains accidents in terms of multiple events, sequenced as a forward chain over time. Simple, direct relationship between events in chain Events almost always involve component failure, human error, or energy-related event Forms the basis for most safety-engineering and reliability engineering analysis: e,g, FTA, PRA, FMECA, Event Trees, etc. and design: e.g., redundancy, overdesign, safety margins,.

Limitations of Chain-of-Events Causation Models Oversimplifies causality Excludes or does not handle Component interaction accidents (vs. component failure accidents) Indirect or non-linear interactions among events Systemic factors in accidents Human errors System design errors (including software errors) Adaptation and migration toward states of increasing risk

Reliability Engineering Approach to Safety Many accidents occur without any component failure Caused by equipment operation outside parameters and time limits upon which reliability analyses are based. Caused by interactions of components all operating according to specification. Highly reliable components are not necessarily safe

Reliability is NOT equal to safety in complex systems

It s only a random failure, sir! It will never happen again.

Software and Safety Software is very different from hardware. We cannot just apply techniques developed for hardware assumptions and expect them to work. We need something new that fits software properties.

Software-Related Accidents Are usually caused by flawed requirements Incomplete or wrong assumptions about operation of controlled system or required operation of computer Unhandled controlled-system states and environmental conditions Merely trying to get the software correct or to make it reliable will not make it safer under these conditions.

Human Error Human error is a symptom, not a cause All behavior affected by context (system) in which occurs To do something about error, must look at system in which people work: Design of equipment Usefulness of procedures Existence of goal conflicts and production pressures Role of operators in our systems is changing Supervising rather than directly controlling Systems are stretching limits of comprehensibility Designing systems in which operator error inevitable and then blame accidents on operators rather than designers

Why our Efforts are Often Not Cost-Effective (4) Focus efforts only on technical components of systems Ignore or only superficially handle Management decision making Operator error (and operations in general) Safety culture Focus on development and often ignore operations

Why our Efforts are Often Not Cost-Effective (4) Inadequate risk communication Applying probabilistic risk analysis for events that are not random Software errors are design errors, not random failures Human error is not random (slips vs. mistakes) Component interaction accidents (system design errors) are not random End up either leaving things out or making up numbers Need better ways to assess and communicate risk

Why our Efforts are Often Not Cost-Effective (4) Limited learning from events Oversimplification of accident causation Blame is the enemy of safety Focus on who and not why Root cause seduction Believing in a root cause appeals to our desire for control Leads to a sophisticated whack a mole game Fix symptoms but not process that led to those symptoms In continual fire-fighting mode Having the same accident over and over

The Starting Point: Questioning Our Assumptions It s never what we don t know that stops us, it s what we do know that just ain t so. (Attributed to many people) Many of our hazard analysis and System Safety techniques are based on assumptions that no longer hold

Assumptions No Longer True Safety is increased by increasing system or component reliability If components do not fail, then mishaps will not occur Mishaps are caused by chains of directly related events. We can understand mishaps and assess risk by examining the events leading to the loss Probabilistic risk analysis based on event chains is the best way to assess and communicate safety and risk information.

Assumptions No Longer True (2) Most mishaps are caused by operator error. Rewarding safe behavior, punishing unsafe behavior, and better training will eliminate or reduce accidents significantly Human error is random and can be assessed using probabilities Highly reliable software is safe Most major mishaps occur from the chance simultaneous occurrence of random events Assigning blame is necessary to learn from and prevent mishaps

Summary The world of engineering is changing. If system safety does not change with it, it will become more and more irrelevant. Trying to shoehorn new technology and new levels of complexity into old methods does not work

So What Do We Need to Do? Engineering a Safer World Expand our accident causation models Create new, more powerful and inclusive hazard analysis techniques Use new system design techniques Safety-guided design Integrate System Safety more into system engineering Improve accident analysis and learning from events Improve control of safety during operations Improve management decision-making and safety culture

Additional information in: Nancy Leveson, Engineering a Safer World: MIT Press, January 2012