UNIT 11 RELIABILITY


Structure
11.1 Introduction
     Objectives
11.2 System Reliability
11.3 Achieving Reliability
     Series Structure
     Parallel Structure
     Combination Structure
11.4 Design : Important Aspect of Reliability
11.5 Maintainability
11.6 Availability
11.7 Improving Reliability
11.8 Summary
11.9 Key Words
Answers to SAQs

11.1 INTRODUCTION

This unit introduces you to the reliability of a system. Reliability is a measure of the ability of a product, part or system to perform its intended function under a prescribed set of conditions. Information about the reliability of a product's performance and service generates confidence in the user. It is one of the major considerations in the mind of a prospective user when opting for a particular product of a particular brand. The worldwide demand for electronic goods made in Japan, which rests largely on their high reliability, is one of the best examples in support of this claim.

In fact, the reliability of a system is its ability to remain fit, over its life span, for the purpose or function for which it was designed and developed. It is governed by factors such as the probability of failure, the performance level expected, the duration of performance and the environmental conditions. The reliability of a device or system also depends on the reliability of its components or subsystems, which may be connected in series, in parallel, or in a combination of both.

Reliability is expressed as a number in the range of zero to one. A reliability factor of one would mean that the device is expected to perform satisfactorily for the prescribed duration under the given environmental conditions. Similarly, a reliability factor of zero would mean that in almost all cases the equipment would fail to meet the required performance level.
Objectives

After studying this unit, you should be able to

define reliability,
calculate the reliability of a system,
understand the term maintainability,
perform simple calculations on availability, and
describe how to improve reliability of a system.

11.2 SYSTEM RELIABILITY

If a system does not perform the intended functions specified by the system designer, it is said to have failed. Thus failure includes an actual breakdown of the system as well as unsatisfactory performance. A system which is capable of sustaining performance over a longer period of time is said to be more reliable. There are three issues concerning reliability :

Reliability as a probability,
Definition of failure, and
Prescribed operating conditions.

Reliability as a Probability

Suppose that a component of a machine has a reliability of 0.85. This means that it has an 85 per cent probability of functioning as intended. The probability of failure is 1 - 0.85 = 0.15 or 15 per cent. One conclusion we can draw is that, on average, about 15 out of every 100 such components will fail. Similarly, a reliability of 0.995 implies 5 failures per 1000 trials.

Definition of Failure

The term failure is used to describe a situation in which an item does not perform as intended. For example, a smoke alarm might fail to respond to the presence of smoke, either by not operating at all or by sounding an alarm that is inadequate as a warning (substandard performance), or it might sound an alarm even though no smoke is present (unintended response).

Prescribed Operating Conditions

Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions encountered. The operating conditions may include noise, temperature and humidity ranges as well as operating procedures and maintenance schedules.

11.3 ACHIEVING RELIABILITY

Increased emphasis is being given to product reliability. One of the reasons for this emphasis is that products are generally becoming more complicated. At one time the washing machine was a simple device that agitated the clothes in a hot, soapy solution.
Today, a washing machine has different agitating speeds, different rinse speeds, different cycle times, different water temperatures, different water levels, and provisions to dispense a number of washing ingredients at precise times in the cycle. An additional reason for the increased emphasis on reliability is automation; in many cases people are not able to operate the product manually if an automated component fails.

The possibility of system failure increases as a product becomes more complex (has more components). The way the components are arranged also affects the reliability of the entire system. Reliability theory provides a way to examine a multiple-component system and calculate its overall reliability, i.e. the probability that the system will work. Many systems can be modeled using a series structure, a parallel (redundant) structure, or a combination of both. In the best case, a high reliability system would have many parallel systems in series. A system designer must therefore have a deep understanding of series and parallel structures. Let us first consider the case of the series structure.

Series Structure

A system is a series system if the failure of any one of its components (units) makes the system inoperative. Hence the reliabilities of all the components affect the overall reliability of the system. The following are examples of series structures :

[Figure 11.1 : Series Structure (components 1 and 2 in series)]

For such a system to work properly, both components 1 and 2 must work properly.

[Figure 11.2 : Series Structure with n Components (components 1, 2, ..., n in series)]

There may be any number of components in a series. If n is the number of series components in a system, then all n components must work in order for the whole system to work. Assuming that the components fail independently of one another,

$R_s = R_1 \cdot R_2 \cdot R_3 \cdots R_n = \prod_{i=1}^{n} R_i$

where $R_s$ = the reliability of the series system, and $R_i$ = the reliability of unit (i).

Example 11.1

Consider the system, which we call System I, shown diagrammatically below.

[Figure 11.3 : Series Combination (components a and b in series)]

For the system to work, both components a and b must be in working order. Suppose component a has a 75% chance of working and component b has a 60% chance of working. We would say the reliability of a is 0.75 and the reliability of b is 0.60. What is the probability that the whole system would work?

Solution

Here $R_1 = 0.75$, $R_2 = 0.60$. The probability that the whole system would work is

$R_s = R_1 \cdot R_2 = 0.75 \times 0.60 = 0.45$

Parallel Structure

Now we will consider a parallel system. A parallel system with two components, c and d, may be represented graphically as shown in Figure 11.4.

[Figure 11.4 : Parallel Structure (components c and d in parallel)]

Here

$R_p = 1 - (1 - R_1)(1 - R_2)$

Hence, for a parallel structure of n components

$R_p = 1 - (1 - R_1)(1 - R_2) \cdots (1 - R_n) = 1 - \prod_{i=1}^{n} (1 - R_i)$

where $R_p$ is the reliability of the parallel system and $R_i$ is the reliability of the ith component. For this system to work, we need at least one of the components to work.

Example 11.2

Consider a parallel system, which we call System II, having two components c and d (please refer to Figure 11.4). Let the reliability of component c be 80% and the reliability of component d be 40%. What is the probability that System II will work?

Solution

Here $R_1 = 0.80$, $R_2 = 0.40$

$R_p = 1 - (1 - R_1)(1 - R_2) = 1 - (1 - 0.8)(1 - 0.4) = 1 - 0.12 = 0.88$

Combination Structure

A system may also combine both parallel and series structures, as shown in Figure 11.5. Such a structure is called a combination structure.

[Figure 11.5 : Combination Structure]

Next, we will consider the reliability of a system with a combination of series and parallel components. Let us analyse the combination structure shown in Figure 11.6.
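The series and parallel formulas above can be checked with a few lines of Python. This is only a minimal sketch: the function names `series_reliability` and `parallel_reliability` are illustrative and not part of the unit, and the calculation assumes that components fail independently of one another. The last line nests one helper inside the other, using the component values 0.7, 0.5 and 0.6 that the text assigns to System III.

```python
from math import prod


def series_reliability(reliabilities):
    """R_s = R_1 * R_2 * ... * R_n: every component must work."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """R_p = 1 - (1 - R_1)(1 - R_2)...(1 - R_n): at least one component must work."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Example 11.1 (System I): components a and b in series.
system_1 = series_reliability([0.75, 0.60])

# Example 11.2 (System II): components c and d in parallel.
system_2 = parallel_reliability([0.80, 0.40])

# A combination structure: A in series with the parallel pair (B, C).
system_3 = series_reliability([0.7, parallel_reliability([0.5, 0.6])])

print(round(system_1, 2), round(system_2, 2), round(system_3, 2))
# prints: 0.45 0.88 0.56
```

Because each helper takes and returns plain probabilities, any combination structure can be evaluated by nesting calls from the innermost parallel or series group outwards.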

[Figure 11.6 : Simple Combination Structure (component A in series with the parallel pair B and C)]

To begin with, we assume the following reliabilities for the various components of the system, which we call System III :

Table 11.1

Component    Reliability
A            0.7
B            0.5
C            0.6

The key is to reduce the system to a series system, treating each subsystem that has only parallel components as a single component. As a first step, we calculate the reliability of each such parallel subsystem. So we want to simplify the right side of the system, shown in Figure 11.7, into a new component with one reliability value. We will call this new component D, and then we will have a series system with two components, as shown in Figure 11.8.

[Figure 11.7 : components B and C in parallel]

[Figure 11.8 : components A and D in series]

To calculate the reliability of a combined system, we first find the reliability of the subsystem shown in Figure 11.9.

[Figure 11.9 : components B and C in parallel]

The reliability of the subsystem is $1 - (1 - 0.5)(1 - 0.6) = 1 - 0.2 = 0.8$ or 80%. So we now attach this as the reliability of component D in the system below :

[Figure : components A and D in series]

Component A has a reliability of 0.7. So the reliability of the series of A and D is simply the product of the reliabilities :

$R_s = 0.7 \times 0.8 = 0.56$

So the reliability of System III is 0.56 or 56%.

Here are a few problems to try.

SAQ 1

Determine the reliability of each system :

(i) (ii) (iii) (iv) (v) (vi)

(vii)

[Figure : system diagrams for SAQ 1]

11.4 DESIGN : IMPORTANT ASPECT OF RELIABILITY

A product's reliability depends on a number of factors including its design, manufacture, transportation, and maintenance.

The most important aspect of reliability is the design. It should be as simple as possible. As previously discussed, the greater the number of components, the greater the chance of product failure, especially when these components are in series. If a system has 50 components in series, and each component has a reliability of 0.95, the system reliability is

$R_s = R^n = 0.95^{50} \approx 0.08$

The fewer the components in series, the better the reliability.

Another way of achieving reliability is to have a backup or redundant component. When the primary component fails, another component is activated. This concept was illustrated by the parallel arrangement of components. It is frequently cheaper to achieve a particular reliability with inexpensive redundant components than with a single expensive component.

Reliability can also be enhanced by overdesign, that is, by building safety factors into the product at the time of design and development. For example, a one-inch rope may be used in place of a half-inch rope even though the half-inch rope would suffice for the purpose.

When the failure of a product can lead to a fatality or substantial financial loss, a fail-safe type of device should be used. Thus, disabling injuries to the extremities from power-press operations can be minimized by the use of a clutch. The clutch must be engaged for the ram and die to descend. If there is a malfunction of the clutch-activation system, the press will fail to operate.

The maintenance of the system is an important factor in reliability. Products that are easy to maintain are likely to be more reliable. In some situations, if the system is designed and developed taking various safety measures into consideration, it may be more practical to eliminate the need for maintenance.
For example, oil-impregnated bearings do not need lubrication for the life of the product.

Environmental conditions such as dust, temperature, moisture, and vibration can be the cause of failure. The designer must protect the product from these conditions. Heat shields, rubber vibration mounts, and filters are used to increase the reliability under adverse environmental conditions.

There is a definite relationship between investment in reliability (cost) and resultant reliability. However, after a certain point, there is only a slight improvement in reliability for a large increase in product cost. For example, assume that a component of a given cost has a certain reliability. If the cost is increased to Rs. 10,000, the reliability becomes 0.90; if the cost is increased to Rs. 15,000, the reliability becomes 0.94; and if the cost is increased to Rs. 20,000, the reliability rises only a little further. As can be seen from this hypothetical example, there is a diminishing reliability return for the invested rupee.

Manufacturing

The manufacturing process is the second most important aspect of reliability. Basic quality control techniques will minimize the risk of product failure. Emphasis should be placed on those components which are least reliable. Manufacturing personnel can take action to ensure that the equipment used is right for the job and can check new equipment as it becomes available. In addition, they can experiment with process conditions to determine which conditions produce the most reliable product.

Transportation

The third aspect of reliability is the transportation of the product to the customer. No matter how well conceived the design or how carefully the goods are manufactured, the actual performance of the product as judged by the customer is the final evaluation. The reliability of the product at the point of use can be greatly affected by the way the product is handled in transit. Good packaging techniques and shipment evaluation are essential.

Maintenance

While designers try to eliminate the need for customer maintenance, there are many situations where this is not practical or possible. In such cases, the customer should be given ample warning. For example, provision can be made for a warning light or buzzer when a component needs lubricant. Maintenance should be simple and easy to perform.

The second way of looking at reliability is through the incorporation of a time dimension : probabilities are determined relative to a specified length of time. This approach is commonly used in product warranties, which pertain to a given period of time after purchase of a product.

A typical profile of product failure rate over time is illustrated by the failure-rate curve in the accompanying figure. Because of its shape, it is sometimes referred to as a Bathtub Curve.
Frequently, a number of products fail shortly after they are put into service, not because they wear out, but because they are defective to begin with. The rate of failures decreases rapidly once the truly defective items are weeded out. During the second phase, there are fewer failures because most of the defective items have been eliminated, and it is too soon to encounter items that fail because they have worn out. In some cases, this phase covers a relatively long period of time. In the third phase, failures occur because the products are worn out, and the failure rate increases.

[Figure 11.2 : Typical Life History of a Complex Product for an Infinite Number of Items. Vertical axis : failure rate (λ); horizontal axis : time (t); phases : debugging phase, chance failure phase, wear out phase]

11.5 MAINTAINABILITY

Maintainability is the ability or probability that a device will be restored to a state in which it can perform the required function when maintenance action is performed in accordance with prescribed procedures. Maintainability is measured in terms of :

Mean Time to Repair (MTTR) : It is the ratio of total corrective maintenance time to total number of corrective maintenance actions during a given period of time. This parameter is thus the inverse of the repair rate.

Mean Time for Service or Mean Preventive Maintenance Time (MPMT) : It is the average time recorded for carrying out preventive maintenance in a given time interval.

Like reliability, maintainability is a design characteristic of a system. The various concepts of maintainability are considered so that :

all maintenance tasks are accomplished with the minimum number of people and in the shortest possible downtime of equipment,
there is minimum expenditure on spare parts,
the least number and types of tools are required for testing equipment and facilities etc.,
the task of maintenance is safely accomplished, and
minimum training of maintenance staff is required.

Mathematically, maintainability is defined as the probability of restoring a system within time (t). The equation for maintainability in terms of the allowable time for repair (assuming exponentially distributed repair times, i.e. repair completions that follow a Poisson process) is given by

$M = 1 - e^{-\mu t}$ ... (11.1)

where

M = maintainability, the probability of repairing in time (t),
t = maximum allowable time to repair, and
µ = maintenance action rate, or average number of maintenance actions per period of time.

The repair time (t) can be a management decision, designated by the terms of the contract, or determined by engineering considerations or the conditions of the problem.
It has no relation to actual or average repair times, although they may occasionally be the same. On the other hand, µ is determined by actual sampling, by dividing the total number of units repaired by the total hours needed to repair those units. The formula is

$\mu = \frac{\text{Repaired units}}{\text{Total repair hours}}$ ... (11.2)

Since the number of repaired units almost always equals the number of failures, the formula can often be rewritten as

$\mu = f / \text{hours}$ ... (11.3)

where f is the number of failures and hours is the total repair hours. Another form of the maintainability formula is

$M = 1 - e^{-t/\phi}$ ... (11.4)

where

M = maintainability,
t = maximum allowable time to repair, and
φ = average hours per maintenance action.

Here φ is determined by dividing the total number of repair hours by the number of repairs. This is just the reciprocal of µ (µ = 1/φ) and thus

$M = 1 - e^{-\mu t} = 1 - e^{-t/\phi}$ ... (11.5)

Note that the probability of not repairing in time (t) is

$1 - M = 1 - (1 - e^{-\mu t}) = e^{-\mu t} = e^{-t/\phi}$ ... (11.6)

Information on the distribution and length of each phase of the failure-rate curve requires the collection and analysis of historical data. It often turns out that the mean time between failures (MTBF) can be modeled by a negative exponential distribution, such as that depicted in the figure below. Equipment failures as well as product failures may occur in this pattern. In such cases, the exponential distribution can be used to determine various probabilities of interest. The probability that equipment or a product put into service at time 0 will fail before some specified time, T, is equal to the area under the curve between 0 and T. Reliability is specified as the probability that a product will last at least until time T; it is equal to the area under the curve beyond T. (Note that the total area under the curve in each phase is treated as 100 percent for computational purposes.) Observe that as the specified length of service increases, the area under the curve to the right of that point (i.e. the reliability) decreases.

[Figure : An Exponential Distribution. Horizontal axis : time; the area between 0 and T is $1 - e^{-T/MTBF}$; the area to the right of T is the reliability, $e^{-T/MTBF}$]

Determining values for the area under the curve to the right of a given T becomes a relatively simple matter using a table of exponential values. An exponential distribution is completely described by a single parameter, the distribution mean, which reliability engineers often refer to as the mean time between failures. Using the symbol T to represent length of service, the probability that failure will not occur before time T (i.e. the area in the right tail) is easily determined :

$P(\text{no failure before } T) = e^{-T/MTBF}$

where

e = base of the natural logarithm, 2.7183...,
T = length of service before failure, and
MTBF = mean time between failures.

Example 11.3

By means of extensive testing, a manufacturer has determined that its Super Sucker Vacuum Cleaner models have an expected life that is exponential with a mean of four years. Find the probability that one of these cleaners will have a life that ends

(i) after the initial four years of service,
(ii) before four years of service are completed, and
(iii) not before six years of service.

Solution

MTBF = 4 years

(i) T = 4 years

$T/MTBF = \frac{4 \text{ years}}{4 \text{ years}} = 1.0$

Now, $e^{-1.0} = 0.3679$

(ii) The probability of failure before T = 4 years is $1 - e^{-1.0}$, or 1 - 0.3679 = 0.6321.

(iii) T = 6 years

$T/MTBF = \frac{6 \text{ years}}{4 \text{ years}} = 1.5$

Reliability $= e^{-1.5} = 0.2231$

The probability that failure will occur before time T is 1.00 minus that amount :

$P(\text{failure before } T) = 1 - e^{-T/MTBF} = 1 - e^{-1.5} = 0.7769$

Product life can sometimes be modeled by a normal distribution.

11.6 AVAILABILITY

A related measure of importance to customers, and hence to designers, is availability. It measures the fraction of time a piece of equipment is expected to be operational (as opposed to being down for repair). Availability can range from zero (never available) to 1.00 (always available). Companies that can offer equipment with a high availability factor have a competitive advantage over companies that offer equipment with lower availability values. Availability is a function of both the mean time between failures and the mean time to repair. The availability factor can be computed using the following formula :

$\text{Availability} = \frac{MTBF}{MTBF + MTTR}$

where,

MTBF = mean time between failures, and
MTTR = mean time to repair.

Two implications for design are revealed by the availability formula. One is that availability increases as the mean time between failures increases. The other is that availability also increases as the mean repair time decreases. It would seem obvious that designers would want to design products that have a long time between failures. However, some design options enhance repairability instead, and these can also be incorporated into the product. Laser printers, for example, are designed with print cartridges that can easily be replaced.

There are three types of availability :

(i) Inherent availability
(ii) Achieved availability
(iii) Operational availability.

Inherent Availability

Inherent availability is the probability that a system or equipment, when used under stated conditions in an ideal support environment (i.e. readily available tools, spares, maintenance personnel etc.), will operate satisfactorily at any point in time as required. It excludes preventive or scheduled maintenance actions, logistics delay time, and administrative delay time, and is expressed as :

$A_i = \frac{MTBF}{MTBF + MTTR}$ ... (11.7)

where MTBF = mean time between failures, and MTTR = mean time to repair.

Achieved Availability

Achieved availability is the probability that a system or equipment, when used under stated conditions in an ideal support environment (i.e. readily available tools, spares, personnel etc.), will operate satisfactorily at any point in time. This definition is similar to the definition of $A_i$, except that preventive (i.e. scheduled) maintenance is included. It excludes logistics delay time and administrative delay time, and is expressed as :

$A_a = \frac{MTBM}{MTBM + M}$ ... (11.8)

where MTBM is the mean time between maintenance and M is the mean active maintenance time. MTBM and M are functions of corrective (unscheduled) and preventive (scheduled) maintenance actions and times, respectively.
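The maintainability, exponential reliability and availability formulas above can be tried out numerically. The following Python sketch is illustrative only: the function names are not part of the unit, the maintainability inputs (µ = 0.5 repairs per hour, t = 2 hr) are hypothetical values chosen for the demonstration, and the availability inputs (100 hr between failures, 10 hr to repair) are likewise just sample figures.

```python
import math


def maintainability(mu, t):
    """M = 1 - exp(-mu * t): probability that a repair is completed within time t."""
    return 1.0 - math.exp(-mu * t)


def reliability(T, mtbf):
    """P(no failure before T) = exp(-T / MTBF) for exponentially distributed life."""
    return math.exp(-T / mtbf)


def availability(mean_up_time, mean_down_time):
    """Ratio form shared by the availability measures.

    Inherent availability: mean_up_time = MTBF, mean_down_time = MTTR.
    Achieved availability: mean_up_time = MTBM, mean_down_time = M.
    """
    return mean_up_time / (mean_up_time + mean_down_time)


# Hypothetical maintenance data: 0.5 repairs per hour, 2 hr allowed for repair.
m_2hr = maintainability(mu=0.5, t=2.0)

# Example 11.3: MTBF = 4 years, evaluated at T = 4 and T = 6 years.
r_4yr = reliability(T=4.0, mtbf=4.0)
r_6yr = reliability(T=6.0, mtbf=4.0)

# Sample inherent availability: MTBF = 100 hr, MTTR = 10 hr.
a_i = availability(100.0, 10.0)

print(round(m_2hr, 4), round(r_4yr, 4), round(r_6yr, 4), round(a_i, 4))
# prints: 0.6321 0.3679 0.2231 0.9091
```

Note that `maintainability(mu, t)` and `1 - reliability(t, 1 / mu)` give the same number: both rest on the same exponential model, with the mean repair time 1/µ playing the role that MTBF plays for failures.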
Operational Availability

Operational availability is the probability that a system or equipment, when used under stated conditions in an actual operational environment, will operate satisfactorily when called upon. It is expressed as :

$A_o = \frac{MTBM}{MTBM + MDT}$ ... (11.9)

where MDT is the mean maintenance downtime. The reciprocal of MTBM is the frequency of maintenance, which in turn is a significant factor in determining logistic support requirements. MDT includes active maintenance time (M), logistics delay time, and administrative delay time.

Example 11.4

A system has a mean time between failures of 100 hr and a mean time to repair of 10 hr. What is the inherent availability?

Solution

Inherent availability is

$A_i = \frac{MTBF}{MTBF + MTTR}$

Here, MTBF = 100 hr and MTTR = 10 hr. Therefore,

$A_i = \frac{100}{100 + 10} = 0.909$ or 90.9%.

SAQ 2

(a) The total number of failures is 106. The total number of maintenance hours used to correct the 106 failures is 646. Calculate maintainability for 1, 2 and 10 hr.

(b) Equipment is required to meet an inherent availability of $A_i$ = 0.985 and a mean time between failures (MTBF) = 100 hr. What is the permissible mean time to repair (MTTR)?

(c) Determine the operational availability of a system with a mean of time intervals between corrective maintenance actions, MTBM = 900 hr. The sum of the mean corrective maintenance time intervals including supply downtime and administrative downtime, MDT = 10 hr.

11.7 IMPROVING RELIABILITY

Reliability can be improved in a number of ways, some of which are listed as follows :

Improve component design.
Improve production and/or assembly techniques.
Improve testing.
Use redundancy.

Improve preventive maintenance procedures.
Improve user education.
Improve system design.

Because overall system reliability is a function of the reliability of individual components, improvements in their reliability can increase system reliability. Unfortunately, inadequate production or assembly procedures can negate even the best of designs, and these are often sources of failures. As you have seen, system reliability can be increased by the use of backup components. Failures in actual use can often be reduced by upgrading user education and refining maintenance recommendations or procedures. Finally, it may be possible to increase the overall reliability of the system by simplifying the system (thereby reducing the number of components that could cause the system to fail) or altering component relationships (e.g. increasing the reliability of interfaces).

A fundamental question concerning improving reliability is : How much reliability is needed? The answer depends on the potential benefits of improvements and on the cost of those improvements. Generally speaking, reliability improvements become increasingly costly. Thus, although benefits initially may increase at a much faster rate than costs, the opposite eventually becomes true. The optimal level of reliability is the point where the incremental benefit received equals the incremental cost of obtaining it. In the short term, this trade-off is made in the context of relatively fixed parameters (e.g. costs). However, in the longer term, efforts to improve reliability and reduce costs will lead to higher optimal levels of reliability.

Maintainability and Availability

Many systems in use today are highly sophisticated and fulfill most expectations when operating. However, experience has indicated that the reliability of many of these systems is marginal and that these systems are inoperative much of the time.
Unreliable systems are unable to fulfill the mission for which they were designed and require extensive maintenance. In an environment where resources are becoming scarcer, it is essential that system maintenance requirements be minimized and that maintenance costs be reduced. Therefore, it is essential that maintainability be considered as a major system parameter in the design process.

11.8 SUMMARY

Let us summarize what we have learnt in this unit. The reliability of a system is its ability to perform a required function under stated conditions for a stated period of time. Reliability determines the frequency of occurrence of failures. We have also learnt to compute the reliability of a system whose subcomponents are connected in series, in parallel, or in a combination of both. Maintainability is the ability or probability that a device will be restored to a state in which it can perform the required function when maintenance action is performed in accordance with prescribed procedures. Lastly, we have understood the term availability, which is the ability or probability that a system or equipment will be available for a standard level of operation at some specified instant of time.

11.9 KEY WORDS

Customer Satisfaction : Customer's perception of the degree to which the customer's requirements have been fulfilled.
System : Set of interrelated or interacting elements.

Redundancy : The existence of more than one means of performing a given function.
Reliability : The ability of an item to perform a required function under stated conditions for a stated period of time.

ANSWERS TO SAQs

SAQ 1

(i) 0.72 (ii) (iii) 0.92 (iv) 0.94 (v) 0.56 (vi) (vii)

SAQ 2

(a) $\mu = \frac{f}{\text{hours}} = \frac{106}{646} = 0.164$

$\phi = \frac{1}{\mu} = \frac{646}{106} = 6.09$ hr (average time to repair)

Maintainability for 1 hour

$M(1 \text{ hr}) = 1 - e^{-\mu t} = 1 - e^{-(0.164)(1)} = 1 - 0.849 = 0.151$ or 15%

Maintainability for 2 hours

$M(2 \text{ hr}) = 1 - e^{-0.164(2)} = 0.280$ or 28%

Maintainability for 10 hours

$M(10 \text{ hr}) = 1 - e^{-0.164(10)} = 0.806$ or 80.6%

(b) $A_i = \frac{MTBF}{MTBF + MTTR}$

Solving for MTTR,

$MTTR = \frac{MTBF (1 - A_i)}{A_i}$

Therefore, $MTTR = \frac{100 (1 - 0.985)}{0.985} = 1.52$ hr

(c) $A_o = \frac{MTBM}{MTBM + MDT} = \frac{900}{900 + 10} = 0.989$

QUALITY TOOLS OTHERS

The philosophical elements of Total Quality Management stress the operation of the organisation using quality as the integrating element. In Block 2, you have already learnt that the generic tools consist of various Statistical Process Control (SPC) methods that are used by the quality team for problem solving and continuous improvement. In Block 3, you will learn more about Quality Tools.

Unit 8 gives a brief introduction to Quality Function Deployment (QFD), which is typically used by managers to drive the voice of the customer into the design specification of a product. Basically, QFD is a set of powerful product development tools that were developed in Japan to integrate the manufacturing process into the new product development process.

Unit 9 deals with Failure Mode and Effects Analysis (FMEA) and Failure Modes, Effects and Criticality Analysis (FMECA).

Cause and effect diagrams, also known as fishbone charts (due to their shape) or Ishikawa charts (named after the person who developed this tool), identify potential causes and help to direct problem-solving and data-collection efforts towards the most likely causes of observed defects. All these details will be discussed in Unit 10.

You will be introduced to the concept of Reliability in Unit 11. Reliability is the probability of a device performing its purpose adequately for the period of time intended under the operating conditions encountered. Failure, on the other hand, may be defined as the inability of equipment to perform its required function. We will also learn that reliability is the capability of equipment not to break down in operation. Equipment that works well, and works whenever called upon to do the job for which it was designed, is said to be reliable.