ENG-301 Reliability and Maintainability Engineering Overview

This document supplements the material in ENG-301 Modules 12 and 13. It is essential that you become familiar with its contents before these modules are covered in class. As you go through this read-ahead, consider how you might apply its concepts to your system. This read-ahead contains material for five topics:

1. Reliability and Maintainability (R&M) Requirements. What are the essentials that you need to consider as an engineering leader regarding R&M? Per the Defense Acquisition Guidebook (DAG) 4.3.18.19, R&M is a design consideration. R&M comprises a mandatory key performance parameter (KPP), key system attributes (KSAs), and additional performance attributes (APAs) that you should address, so it is important that you understand these requirements and their associated terminology.
2. R&M Policy and Planning. DoD Instruction 5000.02 sets out specific R&M matters for you to address. DAG Table 4.3.18.19.T1 delineates R&M activities by phase. These resources can help you put together a comprehensive R&M program for your system.
3. Designing for R&M. What are the factors you should consider when designing for reliability and maintainability? How do you measure progress? Reliability growth is a required measurement. What does that involve?
4. R&M Verification. There are many methods to verify that your system is meeting its R&M requirements. Their use depends on the specifics of your system (or component) and the current acquisition phase. What methods of verification apply to your system (or components)?
5. System Supportability. System supportability and system R&M are inextricably linked. How will you ensure system supportability? What R&M factors do you need to consider when putting together a Life-Cycle Sustainment Plan (LCSP)? How does a life-cycle sustainment strategy influence design for R&M?
1.0 R&M Requirements

In addition to the following paragraphs, Table 1-1 provides definitions of some R&M parameters, arranged by discipline.

1.1 SUSTAINMENT KPP

System sustainment KPP requirements are established via the Joint Capabilities Integration and Development System (JCIDS) mandatory KPP and KSAs. The Reliability, Availability, Maintainability, and Cost (RAM-C) Rationale Report process derives these parameters/attributes:

- Availability (KPP)
- Reliability (KSA)
- Operations & Support Cost (KSA)
- Additional Performance Attributes (APAs)

23 January 2015 ENG-301 Leadership in Engineering Defense Systems Page 1

RAM-C Report

The RAM-C Manual describes an approach that the program manager and combat developer use to develop valid sustainment requirements based on stated user needs. The RAM-C Report documents the sustainment requirements along with the underlying assumptions and analyses that support them. The report provides a quantitative basis for reliability requirements and improves cost estimates and program planning. The RAM-C Report is developed jointly by the acquisition program manager and the combat developer (requirements officer). The report is attached to the Systems Engineering Plan (SEP) at Milestone A and updated in support of the Development RFP Release Decision Point, Milestone B, and Milestone C.

1.1.1 Availability KPP

Materiel Availability. A measure of the percentage of the total inventory of a system that is operationally capable, based on materiel condition, of performing an assigned mission (enterprise perspective).

- Calculated as the number of operationally available end items divided by the total population.
- Includes systems for training, attrition reserve, depot maintenance, etc.
- Covers the timeframe from placement into operational service through the planned end of service life.
- Takes into account all calendar time that a system is in the inventory, including out-of-reporting status.

Operational Availability. A measure of the percentage of time that a system or group of systems within a unit is operationally capable of performing an assigned mission (unit perspective). It can be expressed as uptime divided by the sum of uptime and downtime.

Determining the optimum value for Operational Availability (A_O) requires a comprehensive analysis of the system and its planned CONOPS, including the planned operating environment, operating tempo, reliability and maintenance concepts, and supply chain solutions. Materiel Availability (A_M) may be equivalent to A_O if the total number of a system or group of systems within a unit is the same as the total inventory.
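The two availability ratios described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the sample numbers are assumptions made for this sketch, and a real program would take its counting rules for uptime, downtime, and inventory from its own definitions.

```python
def operational_availability(uptime_hours: float, downtime_hours: float) -> float:
    """A_O (unit perspective): uptime / (uptime + downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("uptime plus downtime must be positive")
    return uptime_hours / total


def materiel_availability(operational_end_items: int, total_population: int) -> float:
    """A_M (enterprise perspective): operationally available end items / total population."""
    if total_population <= 0:
        raise ValueError("total population must be positive")
    return operational_end_items / total_population


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration.
    a_o = operational_availability(uptime_hours=900.0, downtime_hours=100.0)
    a_m = materiel_availability(operational_end_items=72, total_population=96)
    print(f"A_O = {a_o:.3f}")
    print(f"A_M = {a_m:.3f}")
```

Note that the A_M denominator is the whole inventory (including training, attrition reserve, and depot assets), which is why A_M is usually lower than a unit's A_O.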
1.1.2 Reliability KSA

Measures the probability that the system will perform without failure over a specified interval under specified conditions. More than one reliability metric may be specified as KSAs and APAs for a system, as appropriate. Reliability parameters fall into two categories:

- Mission Reliability. The measure of the ability of an item to perform its required function for the duration of a specified mission profile, defined as the probability that the system will not fail to complete the mission, considering all possible redundant modes of operation.
- Logistics Reliability. The measure of the ability of an item to operate without placing a demand on the logistics support structure for repair or adjustment, including all failures of the system and all maintenance demand resulting from system operations.

1.1.3 Operations and Support (O&S) Cost KSA

Provides balance to the sustainment solution by ensuring that the total O&S costs across the projected life cycle associated with availability and reliability (e.g., maintenance, spares, fuel, support) are considered in making program decisions. The O&S cost should cover the planned O&S timeframe, consistent with the timeframe and system population identified in the Sustainment KPP.

1.1.4 Maintainability Attributes

Maintainability. A measure of the ability of the system to be brought to a state of normal function or utility. Maintainability includes supportability attributes, such as diagnostics capabilities. The following attributes may be considered either KSAs or APAs:

- Preventive Maintenance. Actions intended to prolong the operational life of the equipment and keep the product safe to operate.
- Corrective Maintenance. All actions performed because of any failure, to restore a system, subsystem, or component to a required condition.
- Mission Maintainability. The ability of the system to be retained in or restored to a specified mission condition.
- Maintenance Burden. A measure of maintainability related to the system's demand for maintenance labor.
- Built-In Test (BIT) Fault Detection. A measure of recorded BIT indications that lead to confirmed hardware failures.
- BIT Fault Isolation. A measure of recorded BIT indications that correctly identify the faulty replaceable unit, either directly or through prescribed maintenance procedures.
- BIT False Alarms. A measure of recorded BIT indications showing a failure when none has occurred.
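The three BIT attributes are usually reported as ratios. The sketch below shows one common way to compute them from maintenance records. The exact numerators and denominators are assumptions made for illustration (programs fix them in their failure definition and scoring criteria), and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitCounts:
    """Hypothetical tallies from maintenance records over a reporting period."""
    confirmed_failures: int   # all confirmed hardware failures
    detected_by_bit: int      # confirmed failures that BIT flagged
    correctly_isolated: int   # flagged failures where BIT named the right replaceable unit
    false_indications: int    # BIT indications with no confirmed failure


def bit_metrics(c: BitCounts) -> dict[str, float]:
    """Return detection, isolation, and false-alarm fractions (assumed definitions)."""
    total_indications = c.detected_by_bit + c.false_indications
    if c.confirmed_failures <= 0 or c.detected_by_bit <= 0 or total_indications <= 0:
        raise ValueError("counts must be positive to form the ratios")
    return {
        "fault_detection": c.detected_by_bit / c.confirmed_failures,
        "fault_isolation": c.correctly_isolated / c.detected_by_bit,
        "false_alarm": c.false_indications / total_indications,
    }


if __name__ == "__main__":
    # Illustrative counts only.
    counts = BitCounts(confirmed_failures=50, detected_by_bit=45,
                       correctly_isolated=40, false_indications=5)
    for name, value in bit_metrics(counts).items():
        print(f"{name}: {value:.3f}")
```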

Table 1-1. Some Commonly Used R&M Parameters

2.0 R&M Policy and Planning

2.1 R&M PER DOD INSTRUCTION 5000.02

- Program Managers shall formulate a comprehensive R&M program.
- RAM engineering activities, such as reliability block diagram analysis and Failure Modes, Effects, and Criticality Analysis (FMECA), must be included in the program's SEP.
- Attach the RAM-C Report to the SEP at MS A and update it at the Development RFP Release Decision Point, MS B, and MS C.
- The Acquisition Strategy (AS) must specify how system sustainment requirements are translated into R&M specifications.
- Reliability Growth (RG) curves shall be included in the SEP at MS A and updated in the TEMP beginning at MS B.
- Monitor and report RG throughout the acquisition process.

2.2 R&M IN THE ACQUISITION STRATEGY

- Specify how testing and systems engineering requirements, including life-cycle management and sustainability requirements, have been incorporated into contract requirements.
- Identify the engineering activities to be included in the RFP and required of the contractor to demonstrate the achievement of R&M design requirements.
- Provide a table to specify how the sustainment key performance parameter thresholds have been translated into R&M design and contract specifications.

Table 2-1 illustrates the cross-walk of R&M requirements from the Acquisition Strategy to the System Performance Specification (SPS) and Statement of Work (SOW) for the Next Generation Jammer (XYZ program). Table 2-2 provides R&M tailoring guidance.

Table 2-1. Acquisition Strategy R&M Requirements
An example from the XYZ program, showing how the R&M requirements in the Acquisition Strategy cross-walk to the contract.

Parameter | Threshold | Contractual Requirements
Reliability | 500 hr MTBF | SPS 3.7.1; SOW 3.a.(2)
Maintainability | 2.7 hr MTTR | SPS 3.7.2; SOW 3.b.(2)

Table 2-2. R&M Tailoring Guidance
Depending on your program phase and equipment type, use this guidance to tailor the specific R&M requirements for your program.

Notes:
1) Excludes parts count or stress analysis prediction; analysis generally limited to equipment end item
2) Maintainability analysis generally limited to equipment end item
3) Applicable to the interfaces of COTS/NDI equipment
4) Applicable to the modified portions and interfaces

2.3 R&M IN THE SEP

Program staff are expected to understand that the content of R&M artifacts should be consistent with the level of design knowledge that makes up each technical baseline. The SEP must describe the planning and timing for generating the following R&M engineering artifacts:

- R&M Allocations. R&M requirements assigned to individual items to attain the desired system-level performance. Preliminary allocations are expected by SFR, with final allocations completed by PDR.
- R&M Block Diagrams. The R&M block diagrams and math models prepared to reflect the equipment/system configuration. Preliminary block diagrams are expected by SFR, with the final completed by PDR.
- R&M Predictions. The R&M predictions provide an evaluation of the proposed design or a comparison of alternative designs. Preliminary predictions are expected by PDR, with the final by CDR.
- Failure Definition/Scoring Criteria (FD/SC). These criteria provide the basis for assessing R&M contract requirements and compliance with operational requirements.
- FMECA. Analyses performed to assess the severity of the effects of component/subsystem failures on system performance. Preliminary analyses are expected by PDR, with the final by CDR.
- Maintainability and BIT. Assessment of the quantitative and qualitative maintainability and BIT characteristics of the design.
- RG Testing at the System and Subsystem Levels. Reliability testing of development systems to identify failure modes which, if uncorrected, could cause the equipment to exhibit unacceptable levels of reliability performance during operational usage.
- Failure Reporting, Analysis, and Corrective Action System (FRACAS). An engineering activity during development, production, and sustainment that provides management visibility and control for R&M improvement of hardware and associated software through timely and disciplined use of failure data to generate and implement effective corrective actions that prevent failure recurrence.
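To make the block-diagram math models and allocations listed above more concrete, here is a minimal Python sketch. It assumes independent items, simple series and active-parallel blocks, and an equal-apportionment allocation rule; these are textbook simplifications chosen for illustration, not the method prescribed by the SEP, and all names and values are hypothetical.

```python
import math


def series_reliability(reliabilities: list[float]) -> float:
    """All items must work: R = product of R_i (independence assumed)."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities: list[float]) -> float:
    """At least one redundant item must work: R = 1 - product of (1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def equal_allocation(system_requirement: float, n_items: int) -> float:
    """Equal apportionment for n series items: each gets R_sys ** (1 / n)."""
    if not 0.0 < system_requirement <= 1.0 or n_items < 1:
        raise ValueError("need 0 < requirement <= 1 and at least one item")
    return system_requirement ** (1.0 / n_items)


if __name__ == "__main__":
    # Hypothetical system: one unit, a redundant pair, and another unit in series.
    redundant_pair = parallel_reliability([0.90, 0.90])
    system = series_reliability([0.99, redundant_pair, 0.98])
    print(f"Predicted system reliability: {system:.4f}")

    # Allocate a 0.95 system requirement equally across three series items.
    print(f"Per-item allocation: {equal_allocation(0.95, 3):.4f}")
```

Equal apportionment is only a starting point; preliminary allocations are typically adjusted for item complexity and maturity as the design firms up toward PDR.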
Table 2-3 is an example of SEP R&M activities for the XYZ program during the Technology Development (TD) Phase, now called the Technology Maturation and Risk Reduction Phase, and later phases/milestones.

Table 2-3. Example of SEP Summary of R&M Activities
This example is excerpted from the XYZ program and shows the timing of various R&M tasks by XYZ program phase.

R&M Engineering Activity | Planning and Timing
R&M Allocations | Will be made during TD Phase and presented at PDR.
R&M Predictions | Have been made for preliminary conceptual configurations by XYZ program primes and validated by NAVAIR R&M engineers; will be refined during TD Phase and presented at PDR.
FD/SC | These criteria for the reliability qualification test that verifies the MTBF requirement will be included in the EMD SOW.
FMECA | Will be included in EMD SOW and performed by XYZ program primes. FMECA work will be accomplished in time to influence system design.
Maintainability and BIT Demonstrations | Maintainability demonstration will be performed by the Government during EMD on the aircraft with support from the XYZ program primes. BIT demonstration will be performed by the Government during EMD with support from the XYZ program primes, in a lab environment initially and on the aircraft during DT/OT.
RG Testing at the System and Subsystem Levels | RG tests will be completed post-CDR, during the EMD Phase of development.
FRACAS | The RAM team, combined with the XYZ program primes team, will develop and maintain a FRACAS to document the results of all design and test efforts, from initial laboratory development and system-level testing through flight test. All organizational-level maintenance data and reliability and health maintenance system statistics will be used to support this database and form the basis for corrective actions. Data collected will include, but will not be limited to: mean time between operational mission failure, mean flight hours between failures, MTTR, direct maintenance man-hours per flight hour, BIT fault detection rate, BIT fault isolation rate, mean flight hours between false alarms, and A_O.
3.0 Designing for R&M

3.1 PHYSICS OF FAILURE

Physics of Failure (PoF) is a science-based approach to reliability that uses modeling and simulation to design in reliability. It helps to understand system performance and reduce decision risk during design and after the equipment is fielded. This approach models the root causes of failure, such as fatigue, fracture, wear, and corrosion. Computer-Aided Design (CAD) tools have been developed to address various failure mechanisms and sites. An example of a failure mechanism is the fatigue cracking of electronic solder joints.

PoF saves time and money and improves reliability. It is useful for both future and fielded systems. Performing PoF early in the design cycle allows dominant failure mechanisms to be identified, so failures may be eliminated through redesign prior to testing. Applying PoF early in the design process can significantly reduce testing and provide a high return on investment. Additionally, PoF analyses help find life-limiting failures in fielded products and determine the root cause of failures. Often, simple, inexpensive design changes can be found that significantly decrease O&S costs (source: AMSAA).

3.2 DESIGN FOR RELIABILITY

Design for Reliability (DfR) is a process specifically geared toward achieving high long-term reliability. Effective integration of a variety of tools and methods over the product life cycle is used to accomplish this objective. Figure 3-1 illustrates the DfR process. Figure 3-2 shows two strategies for implementing DfR: design strength versus usage/environmental stress, and architecture patterns.

Figure 3-1. Example Design for Reliability Process
This graphic shows how various tools and methods are integrated over the product life cycle to implement DfR (Source: http://www.reliasoft.com/newsletter/v8i2/reliability.htm; accessed 2014-09-08).

A typical design life cycle begins with definition of the initial requirements and of the operational and environmental loads on the system, assemblies, subassemblies, and components. The initially proposed system design is laid out via block diagramming. This leads to creation of a system reliability model to investigate the interconnectivity of assemblies and components, which in turn allows examination of the cause-and-effect relationships inherent in complex multi-level systems. Block diagramming also helps determine the various failure points within the design. Examining these failure points and relationships through top-down Fault Tree Analysis provides a system-level view of potential loss of functionality. In addition, block diagramming facilitates component-level failure mode analysis of system reliability using a Failure Modes, Effects, and Criticality Analysis or a Failure Modes and Effects Analysis approach.

Early in the design process, Highly Accelerated Life Testing (HALT) is used to expose early prototypes and existing components to the full range of expected operating conditions within a controlled environment. Any deficiencies identified during HALT are inspected using a Physics of Failure (PoF) approach or are addressed directly in the refinement of the conceptual design. At this phase, PoF Computer-Aided Design (CAD) practices, including dynamic modeling and simulation, finite element stress and heat transfer analysis, and component fatigue analysis toolsets, are used to predict failure mechanisms and conduct reliability assessments on the proposed design and any subsequent design revisions.

As the iterative design process progresses, early prototype quality testing is employed to validate design changes and assumptions as well as the results derived from HALT and PoF analysis. The iterative DfR process reduces early physical testing and traditional test-fix-test cycles, while ensuring that the reliability level of the Preliminary Design Review (PDR) design candidate equals or exceeds the minimum level identified by reliability growth modeling. The design candidate's initial reliability can be estimated through a combination of modeling and simulation along with lower-level testing. Milestone B requirements are typically met at this point, and the design process moves to the complete system prototype phase.

Post Milestone B, complete system prototypes undergo exhaustive testing to capture both hardware and software reliability metrics. Reliability growth testing is conducted in parallel with HALT, Accelerated Life Testing, and Environmental Testing to provide engineering confirmation and feedback data for mathematical modeling. Information captured from previous PoF and HALT analysis is leveraged during test to ensure that any areas of concern are properly instrumented and tracked.
Training strategies are also investigated for comprehension and effectiveness. Corrective actions are identified to mitigate the reliability deficiencies that arise during the test phase. These actions are typically addressed via engineering redesign of mechanical components, software recoding, or adjustments to training practices. In the case of engineering redesign, PoF mechanisms assist in root cause analysis and provide insight for prototype design revision. The PoF toolset is the same as that used before Milestone B, and its application again helps reduce test-fix-test cycling. Accelerated tests can also be used at this point to quickly verify corrective actions. The resulting reduction in time between failure and robust redesign is a large benefit of the enhanced iterative design process.

As design testing proceeds and interim reliability goals are demonstrated through test results, the prototype design moves toward Low Rate Initial Production (LRIP) level maturity. As LRIP begins, Highly Accelerated Stress Screening (HASS) is implemented to ensure production line reliability. LRIP assets enter Operational Test and Evaluation (OT&E) to verify that final designs meet operational reliability requirements. Engineering rework, software recoding, and training practice corrective actions are identified for any failure modes found through HASS or operational testing. PoF and HALT techniques are employed to shorten the time between any potential failures and corrective actions. They also help to reduce the length and complexity of any necessary follow-on test and evaluation. This reduces the time between LRIP and the move to full rate production and fielding.

Figure 3-2. DfR Strategies
The top portion compares and adjusts strength, which is a property of component design, versus stress, which is a property of the environment and usage. The bottom portion shows examples of reliability impacts due to different architecture patterns.

3.2 RELIABILITY GROWTH (RG)

RG is the positive improvement in a reliability parameter over time due to implementation of corrective actions (fixes) to system design, operation or maintenance procedures, or the associated manufacturing process. RG of a complex system involves surfacing and analyzing failure modes and implementing corrective actions. Reliability growth is possible at any point in the system life cycle. Changes accomplished early in the life cycle cost less and affect reliability more significantly, but the information on which early changes are based tends to contain many unknown factors. Design changes made later in the life cycle tend to be better applied because there are fewer unknowns in the information. Programs should not rely only on testing but should use a number of information sources to grow reliability. Figure 3-3 shows an idealized RG curve. Figure 3-4 is a sample of the mandated RG curve that must be included in the SEP at MS A and updated at each successive milestone.

Figure 3-3. Idealized RG Curve
The MTBF is a function of test time, starting with the initial MTBF. Design margin exists between the required MTBF and the goal MTBF and varies, depending on acceptable risk and demonstration cost. A set of operating characteristic curves will be shown and discussed later in this document.

For RG, the SEP expectation is that program staff should understand the amount of testing, test schedule, and resources available for achieving the specification requirement. Program staff should consider the following:

- Develop the growth planning curve as a function of appropriate life units (hours, cycles, etc.) to grow to the specification value.
- Understand how the starting point that represents the initial value of system reliability was determined.
- Know how the rate of growth was determined. Rigorous test programs that foster the discovery of failures, coupled with management-supported analysis and timely corrective action, will result in a faster growth rate. The rate of growth should be tied to realistic management metrics governing the fraction of the initial failure rate to be addressed by corrective actions, along with the effectiveness of the corrective actions.
- Describe the growth tracking and projection methodology that will be used to monitor RG during system-level test (e.g., AMSAA-Crow Extended).
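As an illustration of growth tracking, the sketch below fits the basic Crow-AMSAA (power-law) model to cumulative failure times from a single time-terminated test. This is the simpler basic model, not the Crow Extended variant mentioned above, and it is offered only as a sketch under that assumption. The function name and the failure times are hypothetical.

```python
import math


def crow_amsaa_mle(failure_times: list[float], total_test_time: float) -> tuple[float, float, float]:
    """Fit N(t) = lam * t**beta to a time-terminated test (basic Crow-AMSAA model).

    Returns (beta, lam, instantaneous MTBF at end of test).
    beta < 1 indicates reliability growth; beta > 1 indicates decay.
    """
    n = len(failure_times)
    if n == 0 or any(t <= 0 or t > total_test_time for t in failure_times):
        raise ValueError("need at least one failure, all within (0, total_test_time]")
    log_sum = sum(math.log(total_test_time / t) for t in failure_times)
    if log_sum == 0:
        raise ValueError("all failures at the end of test; beta is undefined")
    beta = n / log_sum                      # shape (growth) parameter
    lam = n / total_test_time ** beta       # scale parameter
    # Failure intensity at T is lam * beta * T**(beta - 1); MTBF is its reciprocal.
    mtbf_inst = 1.0 / (lam * beta * total_test_time ** (beta - 1.0))
    return beta, lam, mtbf_inst


if __name__ == "__main__":
    # Hypothetical cumulative failure times (hours) in a 1000-hour test.
    times = [30.0, 110.0, 260.0, 480.0, 800.0]
    beta, lam, mtbf = crow_amsaa_mle(times, total_test_time=1000.0)
    print(f"beta = {beta:.3f}, lambda = {lam:.4f}, instantaneous MTBF = {mtbf:.0f} hr")
```

Failures that arrive further and further apart, as in the sample data, produce a beta below 1, which is the signature of growth on the tracking curve.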

Figure 3-4. Sample of Mandated SEP RG Curve
Every SEP, regardless of ACAT status, must include an RG curve beginning at MS A. This is a sample curve from the SEP Outline.

3.3 DESIGN FOR MAINTAINABILITY

Methods for implementing Design for Maintainability include the following:

- Select standardized components; minimize the number of components
- Incorporate built-in test, diagnostics, and prognostics
- Design so that items can be repaired using a minimum number of common/standard tools
- Ensure installed items are accessible (that is, enabled for quick removal and replacement); items that are expected to need the most maintenance should be most accessible
- Use a modular, functional partitioning approach; minimize interdependency of components
- Avoid use of short-life components and components that require frequent maintenance
- Incorporate the proper amount of labeling and identification of components

Many methods that improve maintainability also improve producibility, and vice versa.

4.0 R&M Verification

It is important that test planning reflect the operational mode summary/mission profile for the system.

4.1 MAINTAINABILITY DEMONSTRATION

A Maintainability Demonstration (M-Demo) is a joint contractor and procuring activity effort to determine whether specific maintainability contractual requirements have been achieved (formal proof). The M-Demo objective is to verify by demonstration the actual maintainability characteristics of a system against the maintainability requirements or objectives. Appendix B of MIL-HDBK-470A provides a detailed approach for conducting an M-Demo. Some considerations:

- Test cases are based on analysis of expected maintenance actions (frequency, severity)
- Perform each maintenance action and measure the time required for each step
- Assess ease of maintenance, maintenance procedures, manuals, tools, etc.
- Results drive system design improvements
- Some organizations perform M-Demo on all maintenance actions (preventive, corrective) prior to fielding
- M-Demos sometimes are conducted on prototypes
- M-Demos sometimes are conducted virtually

4.2 BUILT-IN TEST DEMONSTRATION

A Built-In Test Demonstration (BIT-Demo) is a verification via demonstration of system testability. Three capability parameters are usually tested in a BIT-Demo:

1. Fault detection
2. Fault isolation
3. False alarm rate

Fault detection and isolation parameters are demonstrated using induced faults, while false alarm demonstrations are based on naturally occurring events. BIT maturation is a continuous process of eliminating false alarms and improving fault detection and isolation as the system matures.

4.3 RG & DEMONSTRATION TESTING

RG testing is performed to assess current reliability, identify and eliminate faults, and forecast future reliability.
Actual improvements in reliability are achieved through corrective-action configuration changes made during the test or between test phases.

Reliability demonstration is employed toward the end of the growth testing period or during OT to verify that a specific reliability level has been achieved. During a demonstration test, the configuration is set and frozen, just as it would be in field use.

Figure 4-1 shows a set of corrective actions being applied at the end of each test phase, with resultant increases in MTBF.

Figure 4-1. Generic RG Curve
The MTBF grows due to corrective actions being applied at the end of each test phase.

Figure 4-2 shows a set of operating characteristic (OC) curves being used to determine sufficient test durations and goal MTBF for demonstrating the reliability requirement for an unnamed system. The independent demonstration test or operational test of the fixed-configuration, production-representative system should be scoped to provide reasonable levels of consumer's risk (accepting an unreliable unit) and producer's risk (rejecting a reliable unit). For example, in Figure 4-2, if the RG goal were to achieve twice the requirement, then a test duration of 10 times the requirement would provide a high probability (87 percent) of the system successfully demonstrating the requirement in an operational test. However, if the system were designed to achieve only 1.5 times the requirement, a test duration of 20 times the requirement would provide a comparable level of risk. Resource requirements, including test articles and expendables, should reflect conducting all reliability test and evaluation activities within allowable test risks.
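The trade-off behind an OC curve can be sketched with a simple Poisson calculation. The sketch assumes a fixed-duration test, exponentially distributed times between failures, and a pass criterion of "no more than a given number of failures". The allowable failure count of 7 used below is an assumption made for this illustration; the source does not state the settings behind its figure.

```python
import math


def prob_pass_demo(test_duration: float, true_mtbf: float, max_failures: int) -> float:
    """Probability of seeing at most max_failures in a fixed-duration test.

    Assumes a constant failure rate, so the failure count is Poisson with
    mean test_duration / true_mtbf.
    """
    if test_duration <= 0 or true_mtbf <= 0 or max_failures < 0:
        raise ValueError("duration and MTBF must be positive; max_failures non-negative")
    mean = test_duration / true_mtbf
    return sum(math.exp(-mean) * mean ** k / math.factorial(k)
               for k in range(max_failures + 1))


if __name__ == "__main__":
    # Times are expressed in multiples of the required MTBF.
    # True MTBF at twice the requirement, test length 10x the requirement.
    p_pass = prob_pass_demo(test_duration=10.0, true_mtbf=2.0, max_failures=7)
    # Consumer's risk: chance of passing when true MTBF only equals the requirement.
    consumer_risk = prob_pass_demo(test_duration=10.0, true_mtbf=1.0, max_failures=7)
    print(f"P(pass | MTBF = 2x requirement) = {p_pass:.3f}")
    print(f"P(pass | MTBF = 1x requirement) = {consumer_risk:.3f}")
```

Under these assumed settings the pass probability comes out near 0.87, in line with the 87 percent quoted above, though the figure's actual parameters may differ. Sweeping `true_mtbf` over a range of values traces out one OC curve for a given test plan.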

Figure 4-2. OC Curve Analysis
OC curve analysis is used to determine the goal reliability needed for a given set of test constraints (test duration and number of allowable failures) to demonstrate the required reliability with a specified level of confidence.

4.4 ACCELERATED LIFE TESTING

Accelerated life testing (ALT) consists of quantitative tests designed to quantify the life characteristics of a component or system. ALT can be accomplished via:

- Usage acceleration. Data collected is analyzed using the same methods used to analyze regular failure data.
- Overstress acceleration. Data collected is used to extrapolate from accelerated conditions to normal use conditions.

As shown in Figure 4-3, one should choose ALT overstress levels that accelerate the failure modes under consideration without introducing failure modes that would never occur under normal use conditions. Analysis of ALT data consists of selecting an underlying life distribution, or probability density function (PDF), combining it with a stress-life relationship, and using test data for parameter estimation. The PDF is stated as a function of time and stress level (temperature, voltage, mechanical stress, etc.). From the PDF, one can formulate all other metrics of interest. Figure 4-4 illustrates a sample ALT analysis.
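One widely used stress-life relationship for temperature is the Arrhenius model. The sketch below uses it to extrapolate a life observed at an elevated test temperature back to a use temperature. The source does not say which relationship its sample analysis uses, so the choice of Arrhenius, the activation energy, and the temperatures are all assumptions made for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev: float,
                                  use_temp_c: float,
                                  stress_temp_c: float) -> float:
    """AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], temperatures in kelvin."""
    t_use_k = use_temp_c + 273.15
    t_stress_k = stress_temp_c + 273.15
    if t_use_k <= 0 or t_stress_k <= 0:
        raise ValueError("temperatures must be above absolute zero")
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical values: Ea = 0.7 eV, use at 40 C, test at 85 C.
    af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)
    life_at_stress_hr = 1_000.0  # illustrative characteristic life observed on test
    print(f"Acceleration factor: {af:.1f}")
    print(f"Extrapolated life at use conditions: {af * life_at_stress_hr:,.0f} hr")
```

The extrapolation is only as good as the assumed activation energy and the assumption that the same failure mechanism dominates at both temperatures, which is why overstress levels must not introduce failure modes foreign to normal use.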

4.5 ENVIRONMENTAL STRESS SCREENING

Environmental Stress Screening (ESS) is a process or series of processes in which environmental stimuli are applied to electronic items in order to reveal latent defects. ESS exposes newly manufactured products to environmental stresses in order to identify and eliminate latent defects introduced during the manufacturing process. It is part of the manufacturing process and is therefore performed on 100% of the items manufactured. ESS is not a simulation of the product's mission environment (and thus is not a substitute for environmental qualification testing) and has no relationship to the end use of the product. ESS is designed to apply appropriate stimulation (thermal, vibration, etc.) of sufficient magnitude to cause defective parts and workmanship errors to precipitate. To avoid accelerating fatigue or causing damage, the applied stimulation should not approach the mechanical, electrical, or thermal stress limits of any component. Each screening profile must be tailored for each item undergoing ESS.

Figure 4-3. ALT Overstress Levels
ALT overstress levels (shaded pink in the figure) should be chosen to fall outside system specification limits but inside destruct limits. (Figure labels along the stress axis: Destruct Limits, Design Limits, Specification Limits, Design Limits, Destruct Limits.)

Figure 4-4. Sample ALT Analysis
The PDF is stated as a function of time and stress level (temperature here) for an unnamed system; test data was used for parameter estimation.

4.6 HIGHLY ACCELERATED LIFE TESTING

Highly Accelerated Life Testing (HALT) is a method for discovering and then improving weak links in the product in the design phase. Performed on a small number of test articles, HALT exposes an item to stresses beyond what it would normally see in field use. By highly accelerating the testing, HALT compresses the test time required and quickly reveals weaknesses that would cause field failures. A series of step-stress approaches is used to determine and expand the operating and destruct limits of the system or product:

- Temperature Step Stress Test
- Thermal Transition Test
- Vibration Step Stress Test
- Combined Environment Stress Test

Once a failure has been detected, root cause analysis (RCA) and corrective action design/implementation become an integral part of the HALT process. HALT is not designed to yield data that can be used to quantify the life (reliability) characteristics of the product under normal use conditions.

Figure 4-5. Typical HALT Profile
HALT uses the two most common testing stimuli, temperature and vibration. HALT precipitates failures faster than traditional testing approaches by applying stresses beyond the expected field environment for short durations. (Source: https://www.youtube.com/watch?v=3z5ta-dxit0, accessed 1/23/2012)

4.7 HIGHLY ACCELERATED STRESS SCREENING

Manufacturing variations and vendor changes can result in reliability problems, whether in a high-dollar, low-volume product or in one to be used in critical applications where failure can be very expensive or dangerous. HASS is a quality control activity used to maintain reliability during the production/manufacturing process. HASS attempts to accelerate the removal of infant mortality failures. Stress profiles for HASS are derived from HALT results; thus, HASS generally is not possible unless a comprehensive HALT has been performed. The typical HASS analysis is a closed-loop process of at least six steps: precipitation, detection, failure analysis, corrective action, corrective action verification, and database maintenance. Figure 4-5 shows a sample HALT/HASS chamber.

Figure 4-5. Sample HALT/HASS Chamber
Accelerated testing is achieved by inducing stress on the item under test via various methods, including vibration and ramping temperatures ("shake and bake").

HASS can take the place of ESS. The limiting factor for HASS is that a HALT must be performed first to determine the destruct limits of the product and to establish a liberal margin between those limits and the HASS stress levels. ESS always uses lower stress levels that are within the product operating limits, so ESS testing does not have this overstress potential. This also means that ESS tests tend to take longer.

4.8 ENVIRONMENTAL TESTING
MIL-STD-810G Testing
MIL-STD-810G provides environmental lab test methods and test tailoring guidelines that simulate a system's life cycle exposure to a broad range of environmental conditions. Tailored test methods should be included in a system's performance specification. Test methods include 501.6 (high temperature) and 507.6 (humidity).

Electromagnetic Environmental Effects Testing
DoD Instruction 3222.03 controls the Electromagnetic Environmental Effects (E3) program via policy and instructions to ensure mutual electromagnetic compatibility and effective E3 control among ground-, air-, maritime-, and space-based platforms, electronic and electrical systems, subsystems, and equipment, and with the existing natural and man-made electromagnetic environment. E3 includes the mitigation of hazards of electromagnetic radiation to ordnance, personnel, and fuel before conducting military exercises, operations, and activities.

4.9 PRODUCTION RELIABILITY ACCEPTANCE TESTING
Production Reliability Acceptance Testing (PRAT) is performed to ensure that the reliability of hardware is not degraded as the result of changes in tooling, processes, workflow, design, parts quality, or any other variables affecting production. PRAT involves testing a sample of items drawn from a production batch or lot. The results obtained from sample testing allow informed decisions to be made regarding the reliability of the entire population from which the sample was drawn. Equipment tested during PRAT must be representative of the population, and the environmental conditions under which items are tested should be as close as possible to the in-service environment agreed within the contract. The items to be tested should have been subjected to all standard production processes and tests; for example, each item should have been subjected to production-standard ESS and acceptance testing.

4.10 FRACAS
FRACAS was discussed briefly in section 2.3; this section continues that discussion. FRACAS is a closed-loop process applied throughout the product life cycle to provide a systematic way to report, organize, and analyze failure data. MIL-HDBK-2155 provides guidance on implementing FRACAS. Figure 4-6 shows the FRACAS process, which covers:
Failure Reporting. Capture failures and faults during testing and operation of the system.
Failure Analysis. Determine the root cause of each reported failure.
Design Modification. Document details of corrective actions.
Failure Verification. Can the failure be reconstructed/repeated?

Figure 4-6. FRACAS Process
This closed-loop process is applied throughout the product life cycle to provide a systematic way to report, organize, and analyze all failure data.
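The closed-loop character of FRACAS described in section 4.10 can be sketched as a small record type that stays open until every activity has been logged. This is only an illustration under stated assumptions: the activity names mirror the four items listed above, while the FailureRecord class, its fields, and the sample entries are hypothetical and do not come from MIL-HDBK-2155 or any real FRACAS tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Activity names follow the four items listed in the text;
# the data model itself is a hypothetical illustration.
ACTIVITIES = ("failure_reporting", "failure_analysis",
              "design_modification", "failure_verification")


@dataclass
class FailureRecord:
    record_id: str
    notes: Dict[str, str] = field(default_factory=dict)

    def log(self, activity: str, note: str) -> None:
        """Record the outcome of one FRACAS activity for this failure."""
        if activity not in ACTIVITIES:
            raise ValueError(f"unknown activity: {activity}")
        self.notes[activity] = note

    def open_activities(self) -> List[str]:
        """Activities that have not been logged yet, in the listed order."""
        return [a for a in ACTIVITIES if a not in self.notes]

    def is_closed(self) -> bool:
        """The loop is closed only when every activity has an entry."""
        return not self.open_activities()


# Made-up example entries, for illustration only.
rec = FailureRecord("FR-001")
rec.log("failure_reporting", "Power dropout observed during vibration test")
rec.log("failure_analysis", "Root cause traced to a cracked solder joint")
print(rec.open_activities())
print(rec.is_closed())
```

Keeping the list of outstanding activities on the record itself is one simple way to make sure no reported failure drops out of the loop before its corrective action has been documented and the failure verified.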

5.0 System Supportability

5.1 SUPPORTABILITY ANALYSIS DEFINITIONS
Supportability analysis includes various integrated analytical techniques for designing and developing an effective and efficient logistics/supportability approach. As with all the "-ilities" (reliability, maintainability, availability, etc.), supportability is inherent to the system design. In addition, factors external to the design make up, support, and constrain the logistics infrastructure. Supportability analysis is applied in conjunction with R&M engineering analyses (FMECA, R&M allocation, Fault Tree Analysis, etc.) to support product support decisions, and includes:
Maintenance Task Analysis. Process for assessing the maintainability characteristics of a system configuration, and for evaluating the resources that configuration requires for sustaining maintenance and support, including quantity and skill level of maintenance personnel, spare parts, tools, test equipment, facilities, computer resources, packaging and handling, and technical data.
Level of Repair Analysis. Analytical methodology used to determine whether it is more cost-effective to repair, replace, or discard each item, based on operational readiness requirements. If an item is economically reparable, this analysis determines the levels of repair (organizational, intermediate, depot).
Reliability Centered Maintenance Analysis. Systematic approach to developing an effective and cost-efficient preventive maintenance program for a system.
Condition Based Maintenance Analysis. Approach to developing a preventive maintenance system in which maintenance is performed after one or more indicators show that equipment is going to fail or that equipment performance is deteriorating, rather than after a fixed number of elapsed miles, hours, or cycles.
For hardware, system health monitoring and management is accomplished using embedded sensors.

5.2 R&M AND SUPPORTABILITY
R&M engineering facilitates the supportability analysis process and the development of the logistics product database. This database provides the foundation for designing the support for a system in the form of 12 product support elements. As Figure 5-1 shows, it is essential that there be close coordination and collaboration between the engineering and logistics functional areas. One of the keys to achieving system sustainment requirements is the development of effective support processes, which stem from effective, thorough R&M engineering. DoD has adopted GEIA-STD-0007 as the standard for defining how logistics data is structured for Defense systems. The Systems Engineering Plan (SEP) should account for R&M and supportability analysis activities and be consistent with and complementary to the Life Cycle Sustainment Plan.

Figure 5-1. Supportability Framework R&M engineering and logistics/supportability should be integrated and complementary throughout the life cycle to design for support, design the support, and support the design.