Root-Cause-Analysis for the Rest of Us

Howard W. Penrose, Ph.D., CMRP
President, SUCCESS by DESIGN

Introduction

Root-Cause-Analysis (RCA) is a powerful tool for the maintenance and reliability professional. However, depending on whom you discuss the process with, it can be considered a very expensive tool. Does it need to be? No. The basic concept of RCA is to provide a logical process for determining the actual cause, or causes, of a failure or of repetitive failures. There are a few occasions that call for a more advanced process, requiring additional resources and time. There are far more occasions where a simple process can be used to evaluate an average failure.

The purpose of this paper is to outline a simple process that can be used for common instances. The particular process outlined here is the 5-Why process, which is based on the concept that, for each failure, it takes an average of five "why" questions to reach a conclusion. A basic worksheet is also included at the end of this paper.

Common Failure Causes

Some of the common causes that are found include:

- Misapplication
- Improper operation
- Maintenance
  o Poor practices
  o Maintenance that reduced reliability
- Age
- Corrective action issues
  o Poor practices
  o Poor communication of requirements
  o Changes to design without impact review

However, these are not considered root-causes. Root-causes tend to be much deeper, such as one or more of the following:

- Inadequate or no
  o Training
  o Procedure(s)
  o Specification(s)
  o Acceptance criteria
  o Vendor review
  o Design review
  o Maintenance practice(s)
- Equipment age
- Random failure

The concept is to provide a process that goes beyond the cause of failure and gets to the root of the problem. Some of the benefits of the 5-Why logic approach include:

- Helping the problem-solving team focus on the best areas to address with short- and long-term solutions;
- Identifying, analyzing and eliminating gaps between the current situation and reliable equipment specifications;
- Reducing waste;
- Improving fault communication;
- Creating an environment that encourages the surfacing of problems as opportunities for improvement; and
- Reducing repetitive failures relating to the same root-cause.

One of the key reasons that the 5-Why process lends itself to common RCA applications is that it does not require statistical analysis. It is therefore a process that can be applied by virtually anyone in the maintenance and reliability arena.

The 5-Why Process

The process is performed on systems where there are repetitive failures, safety issues, regulatory issues or interruptions to production, or where the failure incurs a significant cost. Depending on the resources available to the maintenance and reliability professionals involved, a more rigorous RCA process may be required.

The first step in the process is to contain the failure, which means stopping the event from continuing to occur. Once the failure has been stopped, determine what damage has been done and how much. Contain the effects of the damage and notify affected personnel and departments.

The second step is to form a team of stakeholders. The team can be as small as one or two people or much larger, and can include production, maintenance, vendors and others who are directly affected or who can provide information for the investigation.

The third step is to clearly identify the problem. This involves determining the scope of the problem and how many problems are involved.
Which systems are affected by the problem, and what is the impact on the plant or facility? How often does the problem occur? Once the problem is defined, state it in simple terms: the event question should be short, simple, concise, focused on one problem, and should begin with the word "Why". It must not state what caused the event, instruct what to do next or explain the event.

The fourth step is the analysis. Each successive question must begin with "Why", and the questioning must continue along every branch and for every cause until each branch has been worked down to its logical end. The answer to each question must be supported by evidence from research, investigation and interviews. Many of the root-causes identified this way may not relate directly to the problem at hand, but may point to issues that still need to be addressed to prevent future problems. Once each branch is complete, review the original problem statement and confirm that it is consistent with the evidence found.

The fifth step is to develop solutions. Develop preventive actions that bring about positive change, weighing feasibility, effectiveness, budget, employee involvement, a focus on systems and contingency planning. Some guidelines for solution development include:

- There may not be an absolute correct solution;
- Do not rush to a solution;
- Always be willing to challenge the root-cause as a symptom of a larger problem;
- Never accept an assumption without significant data;
- Will the action reduce risk to a reasonable level?
- Are there any adverse effects from applying the corrective action?

If a solution is unacceptable, take detailed notes on the reasons for rejecting it. For each acceptable solution, assign responsibility for carrying it out and set the associated timeline.

The sixth and final step is to assess the solutions and complete the RCA. Schedule a follow-up date within a reasonable time to confirm the success of the solutions. Determine whether everything was accomplished as stated in the report and whether the tasks were completed within the established timelines. Finally, determine whether the actions were effective, then complete the RCA and make the findings available.
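The branching analysis of the fourth step can be recorded as a simple tree. The Python sketch below is one possible way to do so and is not part of the 5-Why method as published: the `Why` class and the `root_causes` and `unsupported` helpers are hypothetical names. It rejects any question that does not begin with "Why", keeps the evidence behind each answer, and reports the answer at the logical end of every branch together with the number of Whys it took to reach it, which can be compared against the five-question rule of thumb.

```python
from dataclasses import dataclass, field


@dataclass
class Why:
    """One 'Why?' question in a 5-Why analysis, its answer and the evidence behind it."""
    question: str
    answer: str = ""
    evidence: list[str] = field(default_factory=list)
    branches: list["Why"] = field(default_factory=list)

    def ask(self, question: str, answer: str = "", evidence: tuple[str, ...] = ()) -> "Why":
        """Start a new branch below this answer; every question must begin with 'Why'."""
        if not question.startswith("Why"):
            raise ValueError(f"question must begin with 'Why': {question!r}")
        child = Why(question, answer, list(evidence))
        self.branches.append(child)
        return child


def root_causes(node: Why, depth: int = 1) -> list[tuple[int, str]]:
    """Answer at the logical end of each branch, with the number of Whys used to reach it."""
    if not node.branches:
        return [(depth, node.answer)]
    causes: list[tuple[int, str]] = []
    for child in node.branches:
        causes.extend(root_causes(child, depth + 1))
    return causes


def unsupported(node: Why) -> list[str]:
    """Questions anywhere in the tree whose answers have no recorded evidence."""
    missing = [] if node.evidence else [node.question]
    for child in node.branches:
        missing.extend(unsupported(child))
    return missing


if __name__ == "__main__":
    # Two branches paraphrased from the case study; evidence strings are placeholders.
    problem = Why(
        "Why was there an increase in starting and transition time?",
        "The wrong material was used in the rotor bars.",
        ["repair records"],
    )
    problem.ask(
        "Why was the wrong material used in the rotor bars?",
        "The engineer assumed it matched a similar single-speed motor.",
        ("interview with repair shop engineer",),
    )
    problem.ask(
        "Why was the motor owner unaware of the changes to the motor design?",
        "The shop relied on a calculated torque curve.",
    )
    for depth, cause in root_causes(problem):
        print(f"after {depth} Whys: {cause}")
    print("needs evidence:", unsupported(problem))
```

The `__main__` block records two branches from the case study that follows. The second branch is deliberately left without evidence so that `unsupported` flags it, mirroring the rule that every answer must be backed by research, investigation or interviews.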

Case Study: PAM Motor Rebuild Failure

The electric motor in question is a 5000/9000 horsepower, 712/886 RPM, 6600 Vac, 441/690 Amp, 1.0 service factor Westinghouse PAM motor that drives a utility ID fan. PAM motors are single-winding, two-speed, squirrel-cage induction motors that can operate at two fixed pole speeds. The motor had damage to the rotor above the rotor bars that had been noted more than a decade before it was finally repaired. The motor was removed during a two-week shutdown, and the repair shop replaced the rotor laminations, replaced the rotor bars and re-fabricated the end-rings. It was determined and agreed that the rotor bars would be set 0.055 inches deeper into the rotor and that the edges of the bars would be rounded, so that there would be less chance of damage to the rotor teeth in the future.

Once the unit was returned to operation, the initial startup was found to be seconds, with a voltage drop from 7200 Vac to 5800 Vac. It was determined that a voltage drop had also occurred in the past; however, the startup at that time was calculated at 26 seconds. The motor was originally designed for a 14-second startup at 6600 Vac. The following were the results of the extended startup time:

- The winding (RTD) temperature increased from 16°C to 46°C between start and low speed;
- The current was 2400 Amps through the majority of the startup.

When the motor was switched from low speed to high speed, the transfer took 39 seconds, with the current increasing to 2460 Amps and the voltage dropping to 6000 Vac for the duration of the switch. The winding temperature increased from 26°C to 76°C. In the past, the transfer from low to high speed had taken 5 to 7 seconds.

A team consisting of engineers from the utility, the owners of the motor repair shop and Dr. Penrose as the motor/RCA consultant and facilitator was assembled. All records, personnel and test data were made available for review and analysis.

1. Problem: Why was there an increase in starting and transition time, starting current and starting temperature?
   a. Changes to the rotor bars (increased impedance) or to the rotor core material can cause a decrease in starting torque, which produces the type of problems identified when the motor was started.
   b. A review of the records identified that the wrong material was used in the rotor bars.
2. Why was the wrong material used in the rotor bars?
   a. There was less than two weeks to perform the repair.
   b. The owner would not shut down the motor long enough to take a rotor bar and end-ring material sample.
   c. The repair shop engineer assumed that the material used in this motor was the same as that used by the same manufacturer in a similar-sized single-speed motor.
3. Why was the motor owner unaware of the changes to the motor design?
   a. The repair shop engineer calculated the torque curve, which appeared correct on paper. A review of the original material versus the material used showed that the calculated torque curve was incorrect.
   b. Based on the calculated torque curve, the repair shop determined that it did not need to provide the information.
4. Why did the motor owner not observe the test run of the motor at the repair facility?
   a. The repair specification did not require on-site observation of the final test or review of the repair records.
   b. The repair facility was unable to perform a full-voltage test of the motor.
5. Why did the repair specification not call for on-site observation of testing at full voltage?
   a. The repair specification was generic to all motor sizes.

Note: You will notice that several additional branches could have been followed, including a number of internal issues at the motor repair shop. During the actual RCA, each of these branches was explored fully.

Following the 5-Why analysis, several solutions were considered:

- Remove the motor and have the rotor rebuilt with the proper materials;
- Require the motor repair shop to request approval for all changes to the original motor design;
- Require that material analysis be performed on rotor materials for future rotor bar replacements;
- Change the repair specifications so that larger motors (i.e., any motor over 600 volts) have their own specification. Include the requirement for an on-site inspection and review of repair records and any changes. All changes must be approved in writing.

It was determined that the rebuild was not feasible at the time. Instead, the repair facility provided an extended warranty, and a special starting procedure was put in place. The remaining three recommendations were built into the new repair specifications. The utility's rotating machine engineer was made responsible for the modified specification, and a timeline was set.

Conclusion

The 5-Why RCA process is one of many logical processes available to the maintenance and reliability professional. RCA processes are powerful tools for continuous improvement of the manufacturing, administrative or other processes in a plant that affect safety, regulatory compliance, production or costly equipment. The process presented in this paper is a simple method for applying RCA. For complex or costly systems, an RCA facilitator or formal RCA training is recommended.

Attachment 1: 5-Why Analysis

Supervisor:
Equipment:
Date of Event:
Time of Event:
Type of Event: [ ] Maintenance [ ] Training [ ] Supplies [ ] Meeting [ ] Material Flow [ ] Part Availability [ ] Leadership [ ] Equipment Failure
Priority Ranking: [ ] One-Time Issue [ ] Repetitive Failure [ ] Safety/Regulatory [ ] Operations [ ] Equipment Cost [ ] Other:
Containment:
Containment Action:
Downtime (in minutes): ____ x Per Minute Cost: $____ = Loss: $____
Team Members (name, affiliation, phone, ):

Investigation (add paper as appropriate):

Problem Definition (State as Simply as Possible):

5-Why Analysis:

Root Causes:

Countermeasure (Preventive Actions):

Responsibility:
Deadline:
Verification: No recurrence in ____ months
Signed:
[ ] Close-Out RCA [ ] Continue RCA [ ] No Further Action (File)
RCA Hours:
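The worksheet contains two small calculations: downtime multiplied by a per-minute cost gives the loss, and verification requires no recurrence over a set number of months. A minimal Python sketch of both follows. The `Worksheet` class, its field names and the `months_between` helper are hypothetical, and the length of the verification window is supplied by the user because the form leaves it blank.

```python
from dataclasses import dataclass, field
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


@dataclass
class Worksheet:
    """Hypothetical digital form of Attachment 1 (only the fields used below)."""
    equipment: str
    downtime_minutes: float
    cost_per_minute: float
    verification_months: int  # "No recurrence in ____ months", chosen by the team
    root_causes: list[str] = field(default_factory=list)
    countermeasures: list[str] = field(default_factory=list)

    @property
    def loss(self) -> float:
        """Downtime (in minutes) x per-minute cost = loss."""
        return self.downtime_minutes * self.cost_per_minute

    def verified(self, implemented: date, today: date, recurrences: list[date]) -> bool:
        """True once the verification window has passed with no recurrence since the fix."""
        window_elapsed = months_between(implemented, today) >= self.verification_months
        recurred = any(failure >= implemented for failure in recurrences)
        return window_elapsed and not recurred
```

For instance, with hypothetical figures of 90 minutes of downtime at $250 per minute, `Worksheet("ID fan", 90, 250.0, 6).loss` evaluates to `22500.0`. Counting recurrences only from the date the countermeasure was implemented is a design choice of this sketch, so that failures recorded before the fix do not block close-out.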

Attachment 2: RCA Flow Chart

Failure Event Occurs
-> Meets RCA Rules?
   No: Corrective Action
   Yes: Containment
     -> Define Problem
     -> Form RCA Team
     -> Identify Problem in Simple Terms
     -> Gather & Verify Data
     -> Analyze Data
     -> Determine Impact
     -> Develop Solution(s)
     -> Assessment
     -> Resolved Root Cause?
        Yes: Close-Out RCA
        No: Live with Root Cause?
            Yes: Close-Out RCA
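The decision points in the RCA flow chart can also be expressed as a short function. This Python sketch is an interpretation rather than a transcription of the chart. It assumes that an event which does not meet the RCA rules receives corrective action only, that the rules are the trigger conditions listed at the start of "The 5-Why Process" (the flag strings are hypothetical labels), and that a "No" at "Live with Root Cause?" sends the team back into the RCA, matching the "Continue RCA" box on the worksheet. The boxes from containment through assessment correspond to the steps described earlier and appear only as a comment.

```python
from enum import Enum


class Outcome(Enum):
    CORRECTIVE_ACTION = "corrective action without a full RCA"
    CLOSE_OUT = "close-out RCA"
    CONTINUE = "continue RCA"


# Trigger conditions from "The 5-Why Process"; the flag strings are hypothetical labels.
RCA_TRIGGERS = frozenset(
    {"repetitive failure", "safety", "regulatory", "production interruption", "significant cost"}
)


def meets_rca_rules(event_flags: set[str]) -> bool:
    """An event qualifies for RCA when it matches at least one trigger condition."""
    return not RCA_TRIGGERS.isdisjoint(event_flags)


def rca_outcome(
    event_flags: set[str], root_cause_resolved: bool, can_live_with_root_cause: bool
) -> Outcome:
    """Follow the decision diamonds of the flow chart for one failure event."""
    if not meets_rca_rules(event_flags):
        return Outcome.CORRECTIVE_ACTION
    # Containment, problem definition, team formation, data gathering and analysis,
    # solution development and assessment take place here.
    if root_cause_resolved:
        return Outcome.CLOSE_OUT
    if can_live_with_root_cause:
        return Outcome.CLOSE_OUT
    # Assumption: a "No" here means the RCA continues (the "Continue RCA" box).
    return Outcome.CONTINUE
```

For example, `rca_outcome({"repetitive failure"}, False, False)` returns `Outcome.CONTINUE`, while an event with none of the trigger flags returns `Outcome.CORRECTIVE_ACTION` regardless of the other two arguments.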