Resilience of integrated operations

Size: px
Start display at page:

Download "Resilience of integrated operations"

Transcription

1 Resilience of integrated operations Erik Hollnagel ndustrial Safety Chair École des Mines de Paris, Pôle Cindyniques Sophia Antipolis, France use of T Why integrated operations? ntegrated operations: The use of information technology to change work processes to reach better decisions, remote-control equipment and processes, and to move functions and personnel onshore nformation technology is changed work processes used as a strategic tool to make work tasks and decision processes in the extended oil and gas sector more efficient ntegrated operations moving functions Extensive reallocation of work tasks between sea and land and between operators and suppliers. Use of T will create opportunities in the form of new advanced functions, but also a greater degree of automation of functions currently handled manually.? How should be designed to ensure adequate control of new functions and redefined systems? 1

2 Going down the river You have to look back (remember) to be able to anticipate what may lie ahead. PAST PRCESS Being in control is like going down a river - ENTTY You have to notice where you are and how the river is flowing PRESENT LCATN CNTRL You have to look ahead to see what is coming (bends, rocks, rapids, ) FUTURE CNTRL RM A location designed for for an an entity to to be be in in control of of a process. What is control? LCATN Place of controlling entity with necessary and sufficient / devices CNTRL Minimise or eliminate unwanted process variability to achieve a desired outcome. ENTTY A (joint) cognitive system (distributed) PRCESS An activity or set of function(s) with its own dynamics 2

3 Dilemma of integrated operations Accelerated and increased production Reduced operating costs Higher safety levels ncreased recovery Longer life-span How can it be done? How much risk is acceptable? What can go wrong? SAFETY = FREEDM FRM UNACCEPTABLE RSK Control requires predictability Regular events Events and functions that fall within the design base ( normal conditions). Can be supported by interface / interaction design, procedures, routines and work organisation. Predictable events Events and malfunctions that can be identified by risk analyses (regular and irregular threats). Unacceptable risks can be prevented through barriers, precautions and regulations. Unpredictable events nternal and external performance variability that cannot be captured analytically. Such events which can be either threats or opportunities are not addressed by normal design practice. 3

4 New designs reduce predictability ld system: Experience from normal operations increases the number of regular and predictable events. The number of unpredictable events is so large that changes are insignificant. New system: New operating practices reduces the number of regular and predictable events. ne consequence of integrated operations is that the proportion of regular and predictable events becomes smaller (temporarily or permanently). Problems arising from regular and predictable events are treated by safety management systems. Unpredictable events require resilience engineering. Threats to predictability Environment: Growing complexity and increasing demands System: Tighter (nontrivial) couplings and extended boundaries Analysis: Principal and practical problems in understanding the future (models, time) Work: Limited scope of design solutions (interface, procedures, organisation) The uncertainty and variety that cannot be eliminated or contained must be matched by the system s functional variety (resilience). 4

5 Maintenance oversight Certification Aircraft design knowledge Aircraft nterval approvals nterval approvals Aircraft High workload design Procedures Limiting Redundant design stabilizer End-play checking Allowable end-play Controlled Mechanics stabilizer Expertise Jackscrew Equipment up-down Excessive Horizontal High workload end-play stabilizer T Procedures Lubrication Lubrication Jackscrew P replacement Grease Expertise FAA T C Limited stabilizer Aircraft pitch control C R Accounting for the unpredictable Risk assessment requires an adequate representation or model of the possible future events. Accident model / risk model What may happen? The representation must be powerful enough to capture the functional complexity of the system being analysed. How should we respond? Simple, linear cause-effect model Assumption: Accidents are the (natural) culmination of a series of events, which occur in a specific and recognisable order. Domino model (Heinrich, 1930) Hazards-risks: Due to component failures (technical, human, organisational) Consequence: Accidents are prevented by finding and eliminating possible causes. Safety is ensured by improving the organisation s capability to respond. 5

6 High workload End-play checking Mechanics High workload P T P Lubrication R Expertise Equipment C R FAA Maintenance oversight Certification Aircraft design knowledge Aircraft nterval approvals nterval approvals Aircraft design Procedures Limiting Redundant design stabilizer Allowable end-play Controlled stabilizer Limited stabilizer Jackscrew up-down Excessive Horizontal end-play stabilizer Procedures Lubrication Aircraft pitch control Jackscrew replacement Grease Expertise T C P R Complex, linear cause-effect model Assumption: Accidents result from a combination of active failures (unsafe acts) and latent conditions (hazards). Swiss cheese model (Reason, 1990) Hazards-risks: Due to degradation of components (organisational, human, technical) Consequence: Accidents are prevented by strengthening barriers and defences. Safety is ensured by measuring/sampling performance indicators. Non-linear accident model Assumption: Accidents result from unexpected combinations (resonance) of normal performance variability. Functional Resonance Accident Model Hazards-risks: Emerges from combinations of normal variability (socio-technical system) Consequence: Accidents are prevented by monitoring and damping variability. Safety requires continuous anticipation of future events. 6

7 Normal behaviour is variable Human failures cannot be modelled as deviations from designed and required performance: - humans are not designed. - conditions of work are usually underspecified - humans are multifunctional, and can do many different things Performance variability is natural in sociotechnical systems, and a valuable part of normal performance. The many small adjustments enable humans to cope with the complexity and uncertainty of work (underspecification). The adjustments allow the system to be more efficient by sacrificing details that under normal conditions are unnecessary. Humans are adept at developing working methods that allow them to take shortcuts, thereby often saving valuable time and resources. Airprox As the analysis shows there is no root cause. Deeper investigation would most probably bring up further contributing factors. A set of working methods that have been developed over many years, suddenly turn out as insufficient for this specific combination of circumstances. Time - and environmental- constraints created a demand resource mismatch in the attempt to adapt to the developing situation. This also included coordination breakdowns and automation surprises (TCAS). 7

8 Reactive view Proactive view Success and failure Failure is normally explained as a breakdown or malfunctioning of a system and/or its components. Successes and failures are of a fundamentally different nature. Efficiency / safety can be achieved by preventing failures ndividuals and organisations must adjust to the current conditions in everything they do. Such adjustments are normal! Because information, resources and time are finite these adjustments will always be approximate. Success is due to the ability correctly to make such adjustments and to anticipate failures before they occur. Failure is due to the absence of that ability either temporarily or permanently. Safety is the strengthening this ability, rather than just avoiding or eliminating failures. Normal variability cannot be eliminated Designing for variability and surprises Reactive nteractive (attentive) Proactive (resilient) View on surprises Disturbances, or disruptions, which challenge the proper functioning of a process. Exceptions that must be regimented. Uncertainty about the future. A need constantly to update definitions of the difference between success and failure. Suggests that models and plans are likely to incomplete, despite best efforts. Focus of responses Try to keep process under control and ensure people do not exceed given limits. mprove ability to respond when challenged. Prepare routines and plans. dentify probable variations; ensure ability to handle these when they occur. Search for the boundaries of our assessments in order to learn and revise. 8

9 Equilibrium vs. adaptability Resilience is a measure of the persistence of systems and of their ability to absorb change and disturbance and still maintain the same relationships between populations or state variables. Holling (1973) Machines (technical systems) are built to maintain a set of equilibrium states. Safety is basically reactive trying to prevent harmful influences and re-establish normal functioning as soon as possible. Machines are built to be stable rather than resilient. Dynamic systems are always in a transient state, to ensure their continued existence. For such systems safety must be proactive. rganisations are not socio-technical machines but dynamic (living) systems. They must therefore be resilient rather than stable. Safety management must anticipate and attenuate external and internal variability, to enable survival of the organisation. The resilient organisation Resilience engineering changes the basis for operations and safety from an over-reliance on analysis techniques to adaptive and coadaptive measures and models. Reactive organisations Resilient (proactive) organisations Accidents are the result of a propagation (chain) of events Looks for separable or independent factors Variability is a threat (necessary evil) Solutions try to constrain variability (norms, routines, and standards) Accidents result from nontrivial couplings and functional resonance. Recognises that safety emerges from everyday actions. Variability is ubiquitous, but is a risk as well as a resource. Solutions based on harnessing variability (monitoring + damping).. 9

10 Components of resilience Dynamic developments Updating Anticipation Attention Response Learning Knowing what to expect (anticipation) Knowledge Knowing what to look for (attention) Competence Knowing what to do (rational response) Resources Resilience is the intrinsic ability of an organisation to absorb changes and disturbances, that allows it to continue operations even after a major mishap or in presence of continuous stress. Practice of resilience engineering Resilience Engineering must comprise the following critical tools or techniques: Ways to audit (analyse & measure) and monitor the resilience of organisations in their operating environment. Tools and methods to improve an organisation s resilience vis-àvis the environment. Techniques to model and predict the short- and long-term effects of change and decisions on risk. The overall aim is to improve the availability and readiness on all levels to: Respond to regular and irregular threats in a robust yet flexible manner, Monitor and revise risk models Anticipate potential disruptions and pressures. Resilience will both improve safety and enhance control, hence enable the organisation to predict, plan, and produce. 10

11 Conclusions ntegrated operations will bring advantages and disadvantages alike. extends boundaries of the joint system and increases control demands, and will therefore initially reduce the number of regulated and predictable events. Safety is not achieved by making everything regulated and predictable, but by making the organisation able to cope with unpredictability, hence making it more resilient. This requires the ability continuously to: dentify types of normal performance variability ( smart work) Recognise where and when this is likely (working conditions) dentify how couplings can arise (functional resonance) Finding ways of monitoring and damping resonance Safety is not a question of the absence of accidents but of the presence of resilience. 11