Developing resilience What companies really need to measure

Dik Gregory and Paul Shanahan
gs partnership ltd

A Möbius Strip
Where the inside and the outside are on the same side

The journey towards operational safety is continuous. It may also be viewed both as a management directive, and a personal experience. In fact, it is just a matter of perspective. In this position paper, we show how organisations can add significant leverage in their quest for continued safety improvements, via a new kind of safety measure.

gs partnership ltd

The authors, Dik Gregory and Paul Shanahan, assert their moral rights to be identified with this work in accordance with the Copyright, Designs and Patents Act. Permission is hereby granted for any part of this paper to be freely reproduced as long as its source is fully attributed, the authors' moral rights and copyright are preserved, and no fee, royalty or income is earned from it.

gs partnership ltd, organisational psychologists
gsp@gspartnership.co.uk

IMPROVING SAFETY THROUGH IMPROVING ORGANISATIONAL RESILIENCE
What is resilience? How can we increase it? How do we measure it?

What is resilience?

Organisations are complex systems with many moving parts. For example, shipping companies have ships, crews, vessel managers, cargo agents, HR managers, training managers, safety managers, Board Directors, etc. These parts all interact with each other, as well as externally with an outside world of regulatory, environmental, physical, economic and human factors.

To remain successful, such systems must be able to maintain their stability in the face of constant perturbation, both from outside and from within. Resilience is the quality that enables complex systems to do this. Specifically, resilience is the ability of a system or organisation to react to and recover from disturbances at an early stage with minimal effect on its dynamic stability¹.

Resilience is most cost-effectively addressed at early stages of disturbance, when unwelcome resonance can be damped. At later stages, the system may be too far out of control to deal with wild oscillation without enormous cost and effort, if at all.

Why is it important?

Complex systems are very different from systems that are merely complicated. If a commercial organisation's world were just complicated, there would come a time when all possibilities were anticipated, all possible responses documented and all uncertainties eliminated. Work would become a giant, fool-proof look-up table that could be resolved into a computer program and handed over to machines.

However hard we might wish for such certainty, it can't happen in a complex system. This is because the normal variation of performance of the many interacting parts in a complex system creates irreducible uncertainty. The way in which these parts may combine in the future is unpredictable in principle.
To continue with our maritime example, the precise actions needed in a particular set of circumstances on board (weather, cargo, manpower, equipment availability, working conditions, knowledge and skills capability) need to be determined in the moment by those involved.

Because they are both highly adaptive and adaptable, people are very good at fixing things in the moment. This is why, rather than being the weakness in a complex system, people are the means by which it operates, as well as being the final defence when things start going wrong. And they do this not just in the moment, but moment by moment throughout the operational day. This is also why blame is usually inappropriate and must be replaced by fair-minded accountability, in which people are held accountable only for those interactions for which they had reasonably accepted responsibility and over which they actually had control.

¹ Erik Hollnagel, Resilience - the challenge of the unstable. In Resilience Engineering: Concepts and Precepts, Hollnagel, Woods and Leveson (Eds), Ashgate, 2006.

v1.0

How can we increase resilience?

The journey from blame to accountability

Some enlightened organisations have already embarked on a journey designed to bring about a new company culture based on fair accountability rather than blame. If you work for one, you will have a view on its success so far. The fact is that such culture change journeys are intrinsically difficult and take time². The point of the journey is to help people migrate from an institutionally-imposed, reactive response to safety, to a proactive and personally-felt responsibility for it. The key insight is that safety is generated in the moment by people's behaviour, beliefs and attitudes at the time. Behaviour at work is the concern of the human element³, which is why insights in this area should feature as an explicit part of all technical training programmes.

The journey from external imposition (by management) to internal appreciation (by everybody) is rather like progressing over the surface of the Möbius Strip on the cover of this paper. A Möbius Strip is a real three-dimensional figure with the odd property that it only has one surface.
You can make one easily from a strip of paper by giving it a single twist and sticking the ends together. Starting from any given point results in traversing both sides of the structure and ending up at the same point. Outside and inside are simply matters of perspective. The strip also illustrates why the journey never ends, but rather develops in maturity with each cycle.

Organisations need to find ways to make it easy - and expected - for people to sign up for the journey. This helps to create a powerful psychological contract. To be maximally effective, however, this contract needs to be supported by other mechanisms, all acting in concert. Two such mechanisms are appropriate competency and investigation frameworks.

² See the gsp position paper "Creating change: how to implement change that sticks" for an analysis of change strategies and why some are likely to lead to much more promising results than others.

³ Dik Gregory & Paul Shanahan, The Human Element: a guide to human behaviour in the shipping industry, published by TSO, 2010, and available as a free download via our website.

Competency frameworks

A competency framework specifies the capabilities and behaviours against which staff will be appraised and assessed for development and promotion. Organisations tend to get the behaviour they measure. Because competency frameworks can specify the responsible, personal ownership of target behaviours and attitudes that is expected, they represent an excellent opportunity to help assure an appropriately resilient workforce⁴.

An enhanced incident investigation framework

Another mechanism for transmitting a clear requirement for personal responsibility and fair accountability lies in the kind of conclusions reached by the incident investigation process. Another of our position papers⁵ shows how an existing investigative process can be enhanced by mindset analysis. The core idea in mindset analysis is to establish the local rationality of the people involved, so that fair accountability can be assured and corporate attention can be steered away from relatively unhelpful discussions about blame, and towards the identification of systemic brittleness (the opposite of resilience).

How do we measure it?

Mechanisms which support and amplify a culture of increased operational mindfulness all work together to increase organisational resilience. However, there is an additional way to make progress towards resilience, and that is for organisations to become sensitive to the indicators (aka markers) that are known to be associated with resilience or its opposite, brittleness.

What are the organisational markers of resilience/brittleness?

It helps to distinguish between organisational markers of resilience and more traditional indicators of safety.

Traditional indicators of safety

Traditionally, organisations have relied on measuring safety by collecting and analysing incident data. This results in measures such as numbers of lost time injuries (LTI) and numbers of days or months since the last accident.
Such measures are known as lagging indicators, since any safety initiative must await hoped-for reductions in incidents before its merits can be judged and improvements in the safety record demonstrated. While such indicators are instructive, their use as a safety management tool is a bit like a car driver looking in their rear-view mirror to figure out the road ahead.

To improve this situation, increasing attention is being paid to the development of leading indicators of safety. Here, the focus is on proactive management of factors which are thought to be precursors of unsafe acts and accidents. In this way, organisations try to improve safety via attention to matters like risk identification and management, staff competency and training, inspections and audits against laid-down procedures, and following up on audit results to ensure recommendations are being heeded.

⁴ At the same time, care needs to be taken that other organisational demands do not provide informal measures that work against the explicit requirements of such competency frameworks. In the interests of efficiency, people always play the system via short-cuts, work-arounds and the development of tick-box mentalities.

⁵ See the gsp position paper "Utilising mindset: how to learn from what happened" for how the powerful, but after-the-fact, perspective of hindsight can be enhanced by understanding the rationality of people involved in the incident.

While these leading indicators are wholesome enough, their use as a safety management tool tends to miss the fundamental nature of operational life. If they were applied to a world that was merely complicated and therefore predictable, they would be sufficient. But this is not the case. As we saw earlier, the world of operations is complex, and therefore uncertain in principle. As a result, operational life proceeds as a constant, moment-by-moment trade-off between the competing realities of efficiency and thoroughness⁶.

This means that while safety considerations form one type of input to operational decision-making, there are many others to do with (for example) economics, efficiencies of operation, available resources and time constraints. It follows that safety itself, or the lack of it, tends to emerge as an output of the decisions made.

So while risk management (for example) represents an improved forward-facing position for our car driver, driving is still heavily constrained by a focus only on hazards that can be anticipated, which itself is based upon an analysis of past hazards. But cars still crash, usually because a precise combination of circumstances arises that was not predictable and won't arise again. In such circumstances, further prescriptive rules or re-doubled attempts to predict an unknowable future (or to instruct drivers to be "more careful") are futile. Instead, what we need is to improve our sense of when things are becoming fragile and dangerous - and that's why we need organisational markers of resilience.

Organisational markers of resilience

Organisational markers of resilience are indicators that signal the breakability of a complex, adaptive system, such as a commercial organisation, as it organises itself and operates in its environment.
To work, these markers must have something to say about one or another of the properties of complex, adaptive systems that make them vulnerable to increasing brittleness. We already know about a number of these properties, as explained in the table below.

The basic idea is that if we can develop indicators that are sensitive to these system properties, then we can create the possibility of informed organisational intervention before brittleness gives way to catastrophe. Not least, we can produce a new kind of leading indicator of safety: one that measures the ability of an organisation to deal effectively with the complexity it faces, rather than one that simply tries to avert known hazards⁷.

⁶ Erik Hollnagel, The ETTO Principle: Efficiency-Thoroughness Trade-offs - Why Things That Go Right Sometimes Go Wrong, Ashgate.

⁷ For clarity, there is nothing wrong with averting hazards that can be identified ahead of time. The problem from a resilience point of view is that this will not be enough to deal with circumstances that cannot be anticipated.

Ways in which complex systems are vulnerable to brittleness⁸

System property 1: Buffering
The size or kinds of disruption the system can absorb or adapt to without breakdown.
Examples of system vulnerabilities:
- Overly lean manning or just-in-time processes and operations that result in insufficient contingency
- Insufficient knowledge of disruptibility, its consequences and realistic contingencies

System property 2: Flexibility
The system's ability to restructure itself in response to external changes or pressures.
Examples of system vulnerabilities:
- Too many rules, procedures and processes focused on the past, rather than adapting to the present
- Over-prescriptive operational practices that are insufficiently sensitive to changing operational circumstances
- Insufficient knowledge or ability to break outdated or inappropriate rules

System property 3: Margin
How closely or how precariously the system - or a specific component - is currently operating relative to one or another kind of performance boundary.
Examples of system vulnerabilities:
- Chronically too much to do in too little time with too little resource (numbers, quality or both), perhaps due to an over-demanding organisational, management or appraisal culture, or else harsh economic realities
- Insufficient knowledge of performance limits, their proximity, or the damage that can arise from exceeding them

System property 4: Tolerance
How a system behaves near a performance boundary: whether it gracefully degrades as pressure increases, or collapses quickly when pressure exceeds adaptive capacity.
Examples of system vulnerabilities:
- Insufficient fall-back positions in the event of sudden changes to demands or circumstances
- Insufficient knowledge of what will happen - and the speed and extent of its spread - as performance limits are approached

⁸ Based on David Woods, "Essential characteristics of resilience", in Resilience Engineering: Concepts and Precepts, Hollnagel, Woods and Leveson (Eds), Ashgate, 2006.

System property 5: Divergence
The degree to which a system is working at cross-purposes with itself, ie behaviour that is optimal at one level may produce maladaptive behaviour at another level.
Examples of system vulnerabilities:
- Downwards, operator resilience may be degraded by senior mis-management of goal conflicts, or poor automation design leading to unfair (and counter-productive) demands for responsibility
- Upwards, management resilience may be degraded by local adaptations of operational staff, leading to unworkable management expectations of compliance with industry standards

Examples of resilience markers

The specific markers that should be developed for any particular organisation are a matter for its own consideration. The starting point for their development is the table above. However, they may also be informed by some of the types of example set out below, all of which have emerged from organisational resilience studies.

- Ability of people to make "sacrifice decisions", and the reaction of their peers and managers when they do. A sacrifice decision is one that gives up immediate profit or efficiencies for the additional thoroughness, safety or standards considered appropriate by the decision maker - whatever their seniority level. The maturity and insight with which sacrifice decisions are made and responded to provides a good indicator of the organisation's inherent resilience.

- Difference in understanding between how things are imagined (or prescribed) to be done by senior management, and how they are actually done by operational staff. Large differences indicate that organisational leadership is ill-calibrated to the challenges and risks encountered in real operations. For example, there may be a large gap between checklist-based, procedure-driven processes mandated by the management, and the work practices developed by the workforce to get around real world problems and so get the job done within the available time. A revealing question here is to analyse how quickly productivity disappears, or how suddenly targets become unreachable, if everyone works to rule.

- Evidence that the organisation continues to scrutinise its risk models even when things seem safe. Some organisations do this by insisting on a constant state of edginess and investigation, or by increasing suspicion when things seem "too quiet around here". Other organisations adopt a "pinging" process in which factors that could herald a systemic change in risk are identified and monitored (pinged). Such factors include a sudden need for more people, stalling of expected progress, higher reported workloads, common tasks (eg getting permits) not performed or performed late, a decline in communications (eg unreturned calls or emails), and slowing or stoppage of routine maintenance. Importantly, the pinging process returns high benefits when it becomes part of everyday practice for all staff. This can be contrasted with an approach in which a Safety Department is created to carry out this process, which may send inappropriate messages that seem to take safety away from the considerations of everyday decision makers.

- Increasing time to recover from disruptions. This has been shown to be a good indicator that a system is failing to adapt any further because it is close to its performance limits and is therefore close to collapse.

- Maintenance of, and commitment to, professional standards. Insistence on adhering to professional standards - and management support for doing so - is a key protector of resilience in the context of economic or production pressures.

- Paying attention to the effectiveness of what is learned. This refers to evidence that the organisation is not only collecting feedback about its operations and initiatives (sometimes known as "single loop learning"), but is also analysing what the pattern of learned lessons means. This kind of "double loop learning" is essential to an organisation's ability to recognise how effective its learning is, and how urgently it may therefore need to learn new ways to work and adapt in a context of changing pressures and opportunities.

Where does the human element fit?

Organisations are made up of people, and organisational resilience depends on the personal resilience of its operational and management staff. The relevance of our recent human element book⁹ is that it provides information and insight into the key sources of influence on human behaviour, as well as ways to deal positively with these influences.
While much can be done at individual and small group levels to harness knowledge of these influences to increase personal resilience, the book also makes clear that organisations as a whole must become more resilient if the efforts of their people are to be effective. This paper is aimed at making progress towards the organisational resilience that complex human organisations will require if they are to maintain high standards and achieve their commercial objectives both profitably and safely.

⁹ Dik Gregory and Paul Shanahan, The Human Element: a guide to human behaviour in the shipping industry, published by TSO, 2010, and available as a free download via our website.

What can be done?

Safety in a system that is merely complicated (ie any designed system of components whose interactions are fully specifiable) is best addressed by focusing on leading and lagging indicators of component failure. This is the world of mean time between failures, preventive maintenance, probabilistic assessment of known risks, and root cause analysis¹⁰ to discover and eliminate sources of definable, recurrent error.

Systems become complex and adaptive when humans get involved, imbuing them with multiple agendas, multiple centres of control, and unknowable combinations of many degrees of freedom.

Safety in complex, adaptive systems is concerned with assessing the vulnerability of the patterns of interactions that develop between their components as they adapt to each other in the collective pursuit of system goals.

In order to tune in to this vulnerability, organisations need to develop and calibrate their own organisational markers of resilience. This is not hard to do, but it does require the adoption of a new perspective based on complex systems thinking, mindfulness, trust and the reality of human agency, rather than illusory notions of hindsight, rule-based control, and the linear models of cause and effect that are suited to rather simpler assumptions about the way the world works.

Dik Gregory & Paul Shanahan
gs partnership ltd

¹⁰ However, we take this opportunity to record our agreement with Jim Reason that root causes are essentially political decisions, not objective findings. For Reason, a root cause is the contributing factor that you are working on when the money or the time runs out. See Revisiting the Swiss Cheese Model of Accidents, EEC Note No. 13/06, Project Safbuild, Eurocontrol, Oct 2006.