Are You Being Ruined by Best Efforts: Does your Maintenance & Reliability Strategy Really Support Defect Elimination and Incident Avoidance?

Reliability Consulting White Paper, June 2017
By Mike Whittaker

Much has been publicised in recent years about high-profile industrial failures and the impact these have had on people and the environment, not least on the operating companies involved (reputation and costs). These industrial-scale failures are often attributed to organisations' failure to manage risks, to ineffective leadership, and to failure to follow prescribed standards, whether industry-specific, national/international, or internal company standards and procedures. In the case of maintenance and reliability standards, there is now widespread acceptance and knowledge of what are recognised as best practices, whether from an asset management, legislative compliance, or process safety perspective. Why, then, do we still regularly see incidents in which companies have failed to adhere to the rules, and an ever-growing number of organisations, people and environments suffering from such failings? Surely these standards should by now be fundamental engineering, maintenance and operational (asset care) basics? Too often, however, organisations adopt best practices and industry standards with a view to being seen to be doing something, or select only the elements of best practice that they regard as key.
We can see many examples of this:
- Many companies now have some level of condition monitoring or predictive maintenance programme in place, using technologies such as vibration analysis, infrared thermal imaging, oil wear debris analysis and other non-destructive testing techniques.
- Most organisations have invested in a CMMS (computerised maintenance management system), which all too often has been driven and selected by business, financial reporting and IT strategy requirements rather than by the maintenance and engineering functions that are handed the responsibility for using it.
- Similarly, the need for planning and scheduling functions is understood, and in some cases additional resources are allocated to provide this functional support to the maintenance organisation and its reliability objectives.
- Many companies have also been sold RCM as a panacea, investing many hours of resources and money in the expectation that at the end of it they will be reliable and compliant.
- Recent years have also seen a drive for Continuous Improvement and Operational Excellence, launched as business initiatives or indeed proclaimed as "not an initiative, but business as usual".

So why, with all this effort being put in, do we still experience defects and incidents across industries, whether from a health and safety, environmental, or operational and financial perspective? The answer is not simple, but it lies in the selective way in which organisations adopt best practices and then implement them, and this is also what ultimately differentiates the pretenders from the winners.

In their attempt to do the right things, companies often put their best efforts into setting up the tangible elements, such as CMMS software and hardware, and condition monitoring hardware and software. They may also have been able to make the case to increase headcount, or at least to redeploy maintenance resources into offices to take up planning and scheduling roles. Even less tangible elements, such as introducing RCM to an organisation, may be treated as simple and transactional. For example, a company may raise a purchase order for an external consultant to facilitate the analysis process, including training its own staff, or it may undertake the RCM analysis itself over time, generating reports that are transferred into the CMMS as maintenance plans and routines.

Where condition monitoring and predictive maintenance are applied to support the work identification process, there must be a robust link between the systems and processes that collect and analyse predictive and condition-based inspection findings (work identification) and those that handle the work request, approval, planning, scheduling and execution (work management). Integration of information and reporting systems must give the planning function easy, ready access to the information and recommendations provided by condition monitoring inspections. The planning function also needs to be confident that the findings and recommendations are appropriate and credible, so that corrective or remedial work is correctly prioritised. The organisation therefore needs to ensure that its people are competent to gather, analyse, interpret and act on the outputs of the predictive maintenance tools. Planners should focus on the results of predictive and preventive maintenance (quantitative inspections) and be able to assimilate the available data and information to trigger well-planned remedial work.
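One way to picture the link from work identification to work management is as a simple state machine in which every hand-over has a named owner. The sketch below is a minimal Python illustration only: the stage names, the `WorkRequest` class and the `advance` method are hypothetical, and it does not represent the configuration of any particular CMMS.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names following the flow described above:
    # work identification first, then the work management steps.
    FINDING = "condition monitoring finding"
    REQUESTED = "work request raised"
    APPROVED = "approved"
    PLANNED = "planned"
    SCHEDULED = "scheduled"
    EXECUTED = "executed"
    CLOSED = "closed with feedback"


# Each stage may only advance to the next one, so a finding cannot jump
# straight to execution or be closed without passing through planning.
NEXT_STAGE = {
    Stage.FINDING: Stage.REQUESTED,
    Stage.REQUESTED: Stage.APPROVED,
    Stage.APPROVED: Stage.PLANNED,
    Stage.PLANNED: Stage.SCHEDULED,
    Stage.SCHEDULED: Stage.EXECUTED,
    Stage.EXECUTED: Stage.CLOSED,
}


@dataclass
class WorkRequest:
    asset: str
    finding: str
    stage: Stage = Stage.FINDING
    # Each entry: (stage being left, person or role who handed it on).
    history: list = field(default_factory=list)

    def advance(self, by: str) -> Stage:
        """Hand the request on to the next stage, recording who did it."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"work request for {self.asset} is already closed")
        self.history.append((self.stage, by))
        self.stage = NEXT_STAGE[self.stage]
        return self.stage


# Illustrative use with a hypothetical asset tag.
req = WorkRequest("GB-1", "gear-mesh activity at input shaft frequency")
req.advance(by="condition monitoring analyst")  # request moves to Stage.REQUESTED
```

Recording the role behind each hand-over is the point of the sketch: a finding that nobody is responsible for turning into a work request simply stays at the first stage.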
These requirements mean a significant shift from traditional planning activities (time-based, overhaul) to a more analytical, reliability-centred approach. This change needs to be supported through training, mentoring, coaching and continuous reinforcement of the benefits to the organisation and to the individuals charged with delivery. Establishing a reliability engineering function can do much to support this change, by providing the interface and arbitration between inspection findings and the generation of remedial work. The reliability function's role in ensuring that work order feedback and close-out are reviewed, and are effective in capturing and validating findings after the remedial work, is key, and it also serves the function's own needs: proper feedback and coding of the work order, so that the problem or fault, its cause and the remedy are effectively described and categorised, is what enables the reliability function to analyse bad actors, common causes, etc. through the defect elimination and continuous improvement processes. Unfortunately, many of these efforts are not tied into the overall business strategy, and are often not delivered in a way that pulls the organisation's reliability efforts in the same direction.
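To illustrate why close-out coding matters, the sketch below counts coded work orders per asset and per cause code to surface bad actors and common causes, and lists orders closed without a cause. The `ClosedWorkOrder` record layout and the example codes and asset tags are hypothetical; real coding schemes vary by site and standard.

```python
from collections import Counter
from typing import NamedTuple


class ClosedWorkOrder(NamedTuple):
    # Hypothetical close-out record: problem, cause and remedy codes.
    asset: str
    problem: str
    cause: str
    remedy: str


def bad_actors(orders, top=3):
    """Assets with the most corrective work orders, most frequent first."""
    return Counter(o.asset for o in orders).most_common(top)


def common_causes(orders, top=3):
    """Cause codes recurring across assets; uncoded orders are skipped."""
    return Counter(o.cause for o in orders if o.cause).most_common(top)


def uncoded(orders):
    """Orders closed without a cause code are invisible to the analysis."""
    return [o for o in orders if not o.cause]


# Illustrative records only, not data from this paper.
orders = [
    ClosedWorkOrder("PUMP-01", "NOISE", "WEAR", "REPLACE"),
    ClosedWorkOrder("PUMP-01", "NOISE", "WEAR", "DRESS"),
    ClosedWorkOrder("FAN-02", "LEAK", "SEAL", "REPLACE"),
    ClosedWorkOrder("MIXER-03", "TRIP", "", "RESET"),
]
print(bad_actors(orders, top=1))     # [('PUMP-01', 2)]
print(common_causes(orders, top=1))  # [('WEAR', 2)]
print(len(uncoded(orders)))          # 1
```

The last function is arguably the most useful in practice: every order it returns is a piece of history that the reliability function cannot learn from.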

An example of this is outlined below.

[Figure 1: Failed gear-shaft]
[Figure 2: Damaged planet gears]

One of six identical machines failed mid-campaign, reducing production by approximately 15% for 12 hours while the maintenance team replaced the gearbox, which had suffered a catastrophic failure. As these machines were critical to the process, a company spare was held.

On inspection of the failed gearbox, the input gear-shaft was found to be destroyed, together with a number of the planetary gears (see Figures 1 and 2).

Investigation into the failure found a history behind this particular gearbox going back several years. Inspection reports had identified flaking of the case-hardened surface of the gear-shaft, which was polished, dressed and reported as being in good condition with ample service life left. The same gearbox was returned to the workshop a year later with similar issues, but was again returned to service. During this period and the following several years, oil sampling and analysis was conducted and reported on; the tribology laboratory service provider identified the iron content as high, but it was not raised as significant (see the graph of iron in parts per million, Fe ppm, for the failed gearbox in Figure 3).

[Figure 3: Oil analysis, Fe ppm levels for the failed gearbox, plotted against oil analysis report date, with the gearbox change marked]

Annual oil changes for the machines were carried out as per the CMMS planned maintenance schedule and OEM recommendations. Vibration analysis monitoring was also conducted on all six machines, and from as early as four years before the eventual failure the gearbox was reported to be exhibiting signs of an early gear-mesh fault at the frequency of the input gear-shaft. The condition monitoring vendor provided recommendations outlining the link to the wear particles reported on the oil analysis report and the need to verify the findings through an internal inspection. This remained the case for the following two years.
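Here the vendor's recommendation tied two independent techniques to the same gearbox. A simple rule that escalates any asset flagged by more than one technique would push such a case towards a work request rather than leaving it in two separate reports. The sketch below assumes a hypothetical `Finding` record, hypothetical asset tags and an assumed two-technique threshold; none of these come from the paper or from a particular CMMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    asset: str
    technique: str  # e.g. "vibration" or "oil analysis"
    summary: str


def corroborated(findings, min_techniques=2):
    """Group findings by asset and keep assets flagged by at least
    `min_techniques` different techniques (an assumed escalation rule)."""
    by_asset = {}
    for f in findings:
        by_asset.setdefault(f.asset, []).append(f)
    return {
        asset: fs
        for asset, fs in by_asset.items()
        # Repeated findings from the same technique count only once.
        if len({f.technique for f in fs}) >= min_techniques
    }


# Illustrative findings only.
findings = [
    Finding("GB-1", "vibration", "gear-mesh activity at input shaft frequency"),
    Finding("GB-1", "oil analysis", "elevated Fe ppm"),
    Finding("GB-2", "oil analysis", "elevated Fe ppm"),
]
for asset, fs in corroborated(findings).items():
    print(asset, [f.technique for f in fs])
# GB-1 ['vibration', 'oil analysis']
```

Findings that are not corroborated are not discarded by this rule; in a fuller process they would still be recorded and monitored, only without automatic escalation.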

Machine Condition Summary Report
ID/Machine: A1
Area: Mill End
Problem Description: Gear noise.
Survey Date: 31st January 2008
Priority: 2 (Monitor)
Analysis Detail: Data gathered at the input and output shafts show more active patterns that are characteristic of gear tooth noise (an internal gear shaft rotating at 428 rpm with input at 702 rpm).
Recommended Action: The oil sample also taken from this gearbox appeared to show evidence of wear particles; if this is confirmed by the oil analysis report, it would be wise to inspect this gearbox during the off-season.
[Figure 4: Vibration analysis report highlighting gear-mesh fault and recommendations]

Availability for conducting maintenance on these machines was not an issue, but the annual plan was based on general OEM recommendations for the replacement of wear parts, including the gearbox oil change (the annual oil change probably extended the life of this gearbox).

In this case, the company had made the effort to invest in multiple predictive maintenance techniques, and was indeed receiving reports that there was an incipient problem with the gearbox years before the final catastrophic failure. Moreover, there were five other identical machines, all monitored to the same level, with vibration analysis and oil wear debris analysis reports provided by external vendors. When the oil analysis reports for all six machines were pulled together, it was clear that the gearbox that failed had significantly higher levels of iron wear particles (Fe ppm) than the other five machines (see Figure 5).
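Comparing each machine with its identical siblings, rather than reading each report in isolation, is straightforward to automate. The sketch below flags any machine whose latest Fe ppm reading exceeds a multiple of the median of the other machines. The machine IDs, the readings and the 2x screening factor are all hypothetical, not the Figure 5 data or a laboratory limit.

```python
from statistics import median


def fleet_outliers(latest_fe_ppm, factor=2.0):
    """Flag machines whose latest iron reading is well above their peers.

    latest_fe_ppm maps machine ID -> most recent Fe ppm result.
    Each machine is compared with the median of the *other* machines,
    so one high reading cannot hide itself by raising the baseline.
    `factor` is an assumed screening multiple, not a laboratory limit.
    """
    if len(latest_fe_ppm) < 2:
        return {}  # no peers to compare against
    flagged = {}
    for machine, fe in latest_fe_ppm.items():
        peers = [v for m, v in latest_fe_ppm.items() if m != machine]
        baseline = median(peers)
        if baseline > 0 and fe > factor * baseline:
            # Report how many times the peer median this reading is.
            flagged[machine] = round(fe / baseline, 1)
    return flagged


# Hypothetical readings for six identical machines.
readings = {"M1": 300, "M2": 40, "M3": 35, "M4": 50, "M5": 45, "M6": 38}
print(fleet_outliers(readings))  # {'M1': 7.5}
```

Because an oil change removes most of the accumulated wear debris, a fuller version would likely need to compare samples taken at similar points in the oil's service life, or compare trends rather than single readings.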

[Figure 5: Comparison of Fe ppm oil analysis reports for all six identical machines, plotted against oil analysis report date, annotated "Iron ppm analysis reported", "Note scale", "Oil analysis reports for all 6 machines were treated in isolation" and "Failed gearbox changed"]

The dedicated planning team did not get to see the oil analysis reports, or necessarily the vibration analysis reports, as these were collated as electronic or paper reports by the maintenance technician and production engineers responsible for the area. At the time of the failure, there was no single person or function (e.g. a reliability engineer), and no defined workflow, responsible for collating the analysis or for entering corrective or remedial work requests in the CMMS for approval and inclusion by the planning function in the outage scope. In this case the result was not a significant safety or environmental consequence, but it demonstrates that, even with the best efforts and intentions in adopting best practices and what is known to be the right things to do, organisations remain at risk of unplanned equipment failures, even when the evidence is there to see, unless there is a process in place to look for it.

Conclusions

Unfortunately, the systems and procedures that are the glue bringing all these things together are so often missing. This is not particular to a single organisation or industry sector; it is seen across all industries, and the same variability is seen within individual companies. The key is a holistic approach to reliability and asset management, to maximise benefits and ultimately achieve the business goals and objectives, whether financial, health, safety, environmental or product/process safety and compliance.

Therefore, we must look at the whole of the systems, processes and people aspects, and develop the organisation's ability to channel best practices into a coherent, functioning asset management system of working:
- CMMS systems must be configured not just to capture costs or issue work orders, but to contain the technical data required and to provide work processes that enable efficient management of work through to the execution of remedial, corrective and preventive work, and the utilisation of resources (labour, materials and contracted services).
- Condition monitoring and predictive maintenance techniques must be adopted with the end result in mind, i.e. defect and fault identification to drive effective planning and defect elimination activities. Identified faults must be prioritised and recorded in the CMMS (including a detailed narrative and coding feedback) and defined within the gate-keeping processes, to ensure that operations and maintenance are aligned and that the risks associated with plant access requirements are fully agreed and complied with.
- A process of reliability analysis and review is provided for as a fundamental function, to ensure that common causes and identified defects are tracked and re-prioritised accordingly, and that they feed into PM optimisation and materials planning (key for the timing of corrective maintenance intervention), reliability analysis, defect elimination, and RCM or RBI (Risk-Based Inspection) processes, etc.
- The CMMS should be the source and data warehouse (failure and work order close-out coding, trending of engineering inspection point data, etc.).
- The maintenance organisation is aligned with achieving the strategy, with the correct number of planning, scheduling and supervisory personnel for the number of maintenance (craft) and contracted personnel.
- Asset management processes are aligned and integrated with the overall business and its functions, i.e. health, safety, environmental, procurement, operations, commercial and human resources (role definition and competency requirements, training and development, staff recruitment and retention, personal objectives and appraisal systems, etc.).

With any holistic reliability transformation programme, it is absolutely imperative to fully appreciate and plan for the cultural impact, and for the process of successfully securing people's acceptance of change and new ways of working. Just as in the safety culture changes of the past few decades, in which responsibility for safety came to lie with everyone in the organisation and not just with a few in-house safety professionals or appointees (the safety department), reliability is everyone's responsibility.

[Figure: Reliability Excellence Maturity Model, from "breakdown heroes" and unplanned downtime/incidents to operational excellence, through four stages: Reactive ("zero is unrealistic"; uptime by luck), Planned ("zero is difficult"; management commitment, CMMS/planning discipline), Condition Based ("zero is attainable"; ability to predict, plan and schedule) and Proactive ("zero is sustainable"; continuous improvement, engineered reliability, defect elimination, benchmark setter/top performer). Caption: "Cultural acceptance to change is more than half the battle..."]

Just as organisational safety maturity has defined states through which an organisation and its people transition, the reliability and asset management transition follows similar phases. All of these must start with the recognition that standards, and the discipline to adhere to them, need to be rigorously managed with just leadership and accountability. Maturing through this change means that, as the organisation gets better at doing the right things and learning from events and history, the aspiration of operational excellence and true asset management becomes realistic. Therefore, as part of a holistic implementation, the technical and cultural aspects must be integrated and effectively communicated throughout the life cycle of the change programme and beyond, into business as usual.

Finally, remember: don't let best efforts ruin your company's performance.
- Best efforts without knowledge are just best efforts.
- Be results focused rather than task focused.
- Ensure your system is designed to deliver the results you want to achieve; your systems are perfectly designed to deliver the results you get!

© Emerson. All rights reserved.
Emerson Reliability Consulting, 1100 Buckingham St., Watertown, CT, USA
The Emerson logo is a trademark and service mark of Emerson Electric Co. All other marks are the property of their respective owners. The contents of this publication are presented for informational purposes only, and while every effort has been made to ensure their accuracy, they are not to be construed as warranties or guarantees, express or implied, regarding the products or services described herein or their use or applicability. All sales are governed by our terms and conditions, which are available on request. We reserve the right to modify or improve the designs or specifications of our products at any time without notice.