Through Life Availability Simulation

1 Through Life Availability Simulation
Greg van Bavel
Department of National Defence, Centre for Operational Research and Analysis
Presented to the OmegaPS Analyzer UGM 2017 November 9
Her Majesty the Queen in Right of Canada (Department of National Defence), 2017

2 Acknowledgements
- Graham Brum: Pennant, ΩPS Analyzer Subject Matter Expert
- Dave Evans: DND Canadian Surface Combatant PMO, Supportability Engineer
- Neil Harris: Pennant, ΩPS Analyzer Programmer Extraordinaire

3 Overview
- Availability is the measure of interest
  - Individual equipment: from systems-of-systems to machines with one LRU
  - Entire fleet: land, air, or sea
- Availability depends on many physical (observable) attributes, including:
  - LRU properties: mean time between failures
  - The configuration of systems: types of redundancies, mission criticality
  - Maintenance facility capacity and multi-echelon structure
  - Management policies for spare parts and inventory
  - Mission types and frequency

4 Introduction: Through Life Availability Simulation (TLAS)
Some of the main features:
- Scalable: from one LRU to an entire fleet
- Reconfigurable: multi-indenture, multi-echelon
- Uses standard GUI elements
- Simulates activities from the fleet level down to the SRU level
- Records availability-related parameters across these levels
- Mission generation: random time and fixed schedule
- Stock management: LRUs and SRUs

5 TLAS Schematic: Objects
Label   Definition
LRU     Line Replaceable Unit
MF      Maintenance Facility
MFn     nth-line MF
MFn.k   nth-line MF, number k
PE      Prime Equipment
SRU     Shop Replaceable Unit

6 TLAS Schematic: Events
1. Prime Equipment has an LRU failure; the PE goes down
2. Prime Equipment waits in queue for LRU R/R
3. LRU replaced, which restores the PE
4. Uses LRU stock item (stock management)
5. Is the LRU repaired at LOM1?
6. If the LRU is not discarded, send it to LOM2

7 The LRU State
- An LRU ages (is subject to wear and tear) only when operating in the PE
- The LRU model uses:
  - Mean Time Between Failures (MTBF)
  - Exponential, Weibull, etc., distributions
  - A random number generator
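The following is a minimal sketch of how a time to failure could be drawn from an MTBF, assuming Python's standard `random` module stands in for the random number generator. The name `sample_time_to_failure` and the `shape` parameter are illustrative and are not taken from TLAS. With `shape=1.0` the Weibull draw reduces to the Exponential case.

```python
import math
import random


def sample_time_to_failure(rng, mtbf, shape=1.0):
    """Draw one time to failure whose expected value equals mtbf."""
    # Weibull mean = scale * Gamma(1 + 1/shape), so solve for the scale.
    scale = mtbf / math.gamma(1.0 + 1.0 / shape)
    return rng.weibullvariate(scale, shape)


rng = random.Random(2017)  # fixed seed so the run is repeatable
exponential_draws = [sample_time_to_failure(rng, mtbf=500.0) for _ in range(20000)]
weibull_draws = [sample_time_to_failure(rng, mtbf=500.0, shape=2.0) for _ in range(20000)]

# Both sample means should sit close to the requested MTBF of 500.
print(sum(exponential_draws) / len(exponential_draws))
print(sum(weibull_draws) / len(weibull_draws))
```

Solving for the scale from the MTBF keeps the MTBF as the only required input, whichever distribution shape is assumed.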

8 TLAS GUI Example: PE as a Multi-Indentured System

9 First Line MF: Direct vs. Indirect Support
- Direct support
  - Replacement
  - Repair
  - Disposal
- Indirect support
  - Normal or urgent replenishment of stock
- All operations are modeled as random processes

10 TLAS GUI Example: Multi-Echelon Maintenance Facilities

11 Second Line MF & Supplier
- Support to lower echelons
- Repair
- Disposal
- Replenishment
- Procurement
- The user can adjust parameters to simulate various second-line maintenance policies

12 Time Advancement: Fixed Increments
- The time advancement mechanism is a fundamental characteristic of a discrete event simulation
- For the simulation of failure and repair of a physical system, the key process to simulate is the intermittent accumulation of wear
- TLAS uses fixed increments, repeating the following two-step procedure:
  1. Advance the simulation time by a fixed amount; and
  2. Update the state of the simulated objects:
     - Determine what events occurred during the elapsed time interval
     - Randomly generate new event times entailed by all the changes of state
     - Record required information about the performance of the simulated objects
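The two-step procedure can be sketched as a plain loop. This is an illustration only: `run_fixed_increment` and the `update` callback are hypothetical names, and the real TLAS update step does far more than append to a list.

```python
def run_fixed_increment(end_time, time_step, update):
    """Advance simulated time in fixed steps, calling update(now, time_step) after each."""
    steps = int(end_time / time_step + 1e-9)  # whole steps that fit before the end time
    now = 0.0
    for i in range(1, steps + 1):
        now = i * time_step      # step 1: advance the clock by a fixed amount
        update(now, time_step)   # step 2: update the state of the simulated objects
    return now


log = []
final_time = run_fixed_increment(10.0, 0.5, lambda now, dt: log.append(now))
print(len(log), final_time)
```

Computing `now` as `i * time_step`, instead of adding the step repeatedly, avoids accumulating floating-point error over long runs.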

13 LRU Lifecycle: the Core of the Simulation
- Time to failure: the result of randomly sampling the Exponential Probability Density Function
  - The LRU's MTBF is the mean, or expected, value
- The LRU's time to failure is iteratively reduced as the simulation advances in time, only if the PE is operating
- For each LRU, input to TLAS includes: identification, MTBF
- For each LRU, output from TLAS includes:
  - The number of times the LRU failed
  - The number of times the LRU was replaced
  - The number of times a failed LRU was returned to stock
  - The number of LRU demands not filled (i.e., stock-out)
  - The number of LRU procurements
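A small sketch of the ageing rule described here: the remaining time to failure is reduced only while the PE is operating, and a failure is counted when it runs out. The `Lru` class and `age_lru` function are hypothetical names, and the starting time to failure is fixed by hand so that the behaviour is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Lru:
    name: str
    mtbf: float
    time_to_failure: float
    failures: int = 0


def age_lru(lru, time_step, pe_operating):
    """Reduce the remaining time to failure; return True if the LRU fails in this step."""
    if not pe_operating:
        return False  # no wear accumulates while the PE is not operating
    lru.time_to_failure -= time_step
    if lru.time_to_failure <= 0.0:
        lru.failures += 1
        return True
    return False


lru = Lru(name="LRU1", mtbf=100.0, time_to_failure=2.5)
# One idle step followed by three operating steps of length 1.0.
history = [age_lru(lru, 1.0, pe_operating=op) for op in (False, True, True, True)]
print(history, lru.failures)
```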

14 PE Availability
- TLAS distinguishes between mission time and calendar time
- PE-related outputs:
  - Total mission uptime and total mission downtime, which yield availability
  - Operating schedule type (fixed or random)
  - PE calendar uptime in each run
- Calendar downtime is the duration of the simulation minus the calendar uptime
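As a worked illustration, the sketch below assumes the usual ratio definition of availability, uptime divided by uptime plus downtime. The slide does not state the formula, so this is an assumption, and both function names are hypothetical.

```python
def mission_availability(mission_uptime, mission_downtime):
    """Fraction of mission time the PE was up (assumed ratio definition)."""
    total = mission_uptime + mission_downtime
    if total == 0:
        return None  # no mission time was recorded, so availability is undefined
    return mission_uptime / total


def calendar_downtime(simulation_duration, calendar_uptime):
    """Calendar downtime is the simulation duration minus the calendar uptime."""
    return simulation_duration - calendar_uptime


print(mission_availability(90.0, 10.0))
print(calendar_downtime(8760.0, 8000.0))
```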

15 LRU Key Definitions
- Procure. To purchase an LRU (or other replaceable item) from an outside source after the LRU is condemned (i.e., selected for disposal).
- Replenish. To obtain an LRU (or other replaceable item) from the stock of spare parts already held at another location within the PE fleet's In-Service Support (ISS) organisation.
- LRU Repair. The maintenance activity that restores the LRU itself to serviceable condition such that the LRU may be returned to stock.
- LRU Replacement. The maintenance activity that restores the PE to serviceable condition by replacing a failed LRU with a serviceable LRU.

16 Systems and Systems-of-Systems (SoS)
- Allows TLAS to handle the issues of redundancy and criticality
- For each system or SoS, inputs to TLAS include:
  - The parent PE or system
  - Redundancy specifications, if any
  - Mission criticality
- For each system or SoS, outputs from TLAS include:
  - The calendar uptime in each run (calendar downtime is the duration of the simulation minus the calendar uptime)
  - The total mission uptime and total mission downtime, which yield mission availability
  - The total mission stopped time: the total time that an item is up while the mission is on, but some other mission-critical item is down
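One way redundancy and criticality inputs could determine whether a system is up is sketched below. The rule is an assumption made for illustration, not the documented TLAS logic: with a redundancy specification, the system is up when at least `min_up` children are up; without one, every mission-critical child must be up. The name `system_is_up` is hypothetical.

```python
def system_is_up(children, min_up=None):
    """children is a list of (is_up, mission_critical) pairs for the direct descendants."""
    if min_up is not None:
        # Redundant group: enough children must be up, whichever ones they are.
        return sum(1 for is_up, _ in children if is_up) >= min_up
    # No redundancy: only failures of mission-critical children bring the system down.
    return all(is_up for is_up, critical in children if critical)


print(system_is_up([(True, True), (False, False)]))            # non-critical child is down
print(system_is_up([(True, True), (False, True)]))             # critical child is down
print(system_is_up([(True, True), (False, True)], min_up=1))   # one of two is enough
```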

17 System of Systems Example
[Figure: hierarchy diagram labelled HOT REDUNDANCY. Top level: System of Systems. Systems: System X, System Y. PEs: PE1, PE1, PE2, PE2, PE3. LRUs: LRU1, LRU1, LRU2, LRU2, LRU3. SRUs: SRU1.1, SRU1.1, SRU2.1, SRU2.1, SRU3.1, SRU1.2, SRU1.2, SRU2.2, SRU2.2, SRU3.2.]

18 MF Key Attributes
- TLAS allows replenishment between MFs at the same level
  - Sometimes referred to as lateral resupply
- For each MF, inputs to TLAS include:
  - Days and hours of operation
  - Echelon level
  - For each LRU: repair-by-replacement data, procurement data, and replacement data
- For each MF, outputs from TLAS include:
  - The number of hours each line and shop was busy and/or idle
  - The number of times a SHOP ITEM (i.e., a failed LRU) was repaired and returned to stock

19 The Mission Generator: Key Attributes
- There can be more than one mission generator in the simulation; this allows for independent and distinct categories of missions
- Inputs include:
  - The different types of missions
  - The mission duration (fixed or variable)
  - The required PE or systems-of-PE, and their operating schedules (fixed or variable)
- Outputs include:
  - The number of times a mission is generated
  - The number of times a mission is assigned
  - The number of times a mission expires in the Primary Mission (PM) queue
  - The total mission uptime
  - The total mission downtime
  - The total number of critical failures (i.e., entailed mission downtime)
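A minimal sketch of generating mission start times, either on a fixed schedule or at random. The exponential gap between random missions is an assumption made for illustration, since the slides do not name the distribution, and `mission_start_times` is a hypothetical name.

```python
import random


def mission_start_times(end_time, interval, rng=None):
    """Return mission start times before end_time.

    With rng=None the missions follow a fixed schedule every `interval` time units;
    otherwise the gaps are exponential draws with mean `interval` (an assumption).
    """
    times = []
    t = 0.0
    while True:
        gap = interval if rng is None else rng.expovariate(1.0 / interval)
        t += gap
        if t >= end_time:
            break
        times.append(t)
    return times


fixed = mission_start_times(100.0, 30.0)
random_times = mission_start_times(100.0, 30.0, rng=random.Random(1))
print(fixed)
print(random_times)
```

Running several generators with different intervals would give the independent mission categories that the slide mentions.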

20 Algorithm Summary: Steps 1 and 2
1. Initialize the Monte Carlo Engine (MCE)
   - Seed the random number generator
2. Initialize performance-related parameters to zero; for every LRU in every PE, initialize:
   - time to failure
   - time to replace
   - calendar and mission uptimes
   - calendar and mission queue times
   - mission downtime
   - mission stopped time
   - number of failures
   - number of replacements

21 Algorithm Summary: Steps 3 to 6
3. Initialize state variables of all TLAS objects
   - E.g., randomly generate time to failure for every LRU
4. Clear initial events that occur in the first time-step
   - TLAS uses fixed time-steps of finite length
5. Advance the simulation by one time-step
   - Compare to the simulation's end time
6. Update object states
   - Resolve the state of all peripheral objects
     - This enables the generation of data relevant to the impact of missions, system criticality and/or redundancies, maintenance facilities, and in-service support policies on the life cycle of every LRU
   - Resolve the state of all LRUs
     - If an LRU was operating, subtract the operating time from its time to failure

22 Algorithm Summary: Steps 7 to 9
7. Update cumulative sums, for example:
   - MISSION COMMAND AND CONTROL (MC2) state data: mission uptime vs. downtime
   - System or system-of-systems: mission stopped time
     - A system is stopped because of a failure in another system
   - For any LRU in a service queue during a mission: update the total mission queue time
8. Update calendar-based timers
   - MISSION COMMAND AND CONTROL (MC2) state data: time to readiness transition
   - MISSION GENERATOR: time in Primary Mission queue
9. Clear pending instant events that occur within the current time-step
   - Example: readiness transitions, remove stale missions from queue

23 Algorithm Summary: Steps 10 to 12
10. Some LRU is down
    - If some LRU in the current PE is in replacement & the MF is open, then subtract the time-step from the LRU's time to replacement
    - If some LRU reaches the front of the queue and is replaced, carry out all inventory-related functions and randomly generate the replacement's time to failure
11. PE is running and every LRU is up
    - If the current PE is running, then subtract the time-step from the time to failure of every operating LRU in the current PE
12. Some LRU is in queue and no LRU is in replacement
    - For every MF, if the MF is open & no LRU is in replacement, then for every LRU that is in queue, proceed in maintenance priority (from highest to lowest) to:
      - check the spare availability
      - commence replacement of the highest-priority LRU for which a spare is available
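The queue scan in step 12 might look like the sketch below. The data layout is an assumption made for illustration: the queue holds `(priority, lru_type)` pairs in which a larger number means a higher priority, and `stock` maps each LRU type to the spares on hand. The name `next_lru_to_replace` is hypothetical.

```python
def next_lru_to_replace(queue, stock):
    """Pick the highest-priority queued LRU that has a spare, or None if none does.

    The chosen entry is removed from the queue and one spare is drawn from stock.
    """
    for priority, lru_type in sorted(queue, key=lambda entry: entry[0], reverse=True):
        if stock.get(lru_type, 0) > 0:
            stock[lru_type] -= 1
            queue.remove((priority, lru_type))
            return lru_type
    return None


queue = [(1, "LRU1"), (3, "LRU2"), (2, "LRU3")]
stock = {"LRU1": 1, "LRU2": 0, "LRU3": 2}
chosen = next_lru_to_replace(queue, stock)
print(chosen, queue, stock)
```

Here LRU2 has the highest priority but no spare, so the scan moves on to LRU3 and LRU2 stays in the queue.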

24 Algorithm Summary: Steps 13 to 16
13. Clear pending instant events after processing all PE
    - Example: if an LRU's time to replace is less than a time-step, then transition the LRU from a down state to an up state
14. Some SHOP ITEM is in repair
    - If the MF is open & some shop item at the MF's shop is in repair, then subtract the time-step from the shop item's time to repair
    - Perform all inventory updates and actions, including any affected SRU stocks
15. Some SHOP ITEM is in queue and no SHOP ITEM is in repair
    - If the MF is open, proceed in maintenance priority to check the failure diagnosis, until one SHOP ITEM's maintenance progress is in repair or all SHOP ITEMs in queue have been checked
16. Check number of runs: are we there yet?

25 Algorithm Summary: Steps 17 to 19
17. Initiate LRU replenishment
    - Urgent. Some LRU-in-PE that is in queue has its spare availability transition from unknown to out of stock
    - Normal. The inventory level of some STOCK ITEM is equal to or less than the reorder point
    - Detailed updates to inventory-related states and actions, as well as tracking performance-related indicators
18. Initiate SRU replenishment
    - Similar to LRU replenishment
19. Initiate SHOP ITEM
    - After a failed LRU is removed from a PE, a SHOP ITEM is created and the LRU repair process begins
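The two replenishment triggers can be sketched as a single decision function. This is a simplification under stated assumptions: the function name, its parameters, and the string return values are hypothetical, and the urgent case is reduced to a queued demand that finds no spare on hand.

```python
def replenishment_request(on_hand, reorder_point, demand_waiting):
    """Return "urgent", "normal", or None for one stock item."""
    if demand_waiting and on_hand == 0:
        return "urgent"   # a queued LRU-in-PE found the item out of stock
    if on_hand <= reorder_point:
        return "normal"   # inventory is at or below the reorder point
    return None


print(replenishment_request(on_hand=0, reorder_point=1, demand_waiting=True))
print(replenishment_request(on_hand=1, reorder_point=1, demand_waiting=False))
print(replenishment_request(on_hand=4, reorder_point=1, demand_waiting=False))
```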

26 Algorithm Summary: Steps 20 to 22
20. Failure diagnosis
    - Given the fraction of failures of each SRU in the LRU, etc., randomly generate the failure diagnosis of a SHOP ITEM
21. SHOP ITEM maintenance priority
    - Adaptable: items that have suffered one or more stock-outs always have higher priority than items that have never stocked out
22. Propagate the effects of top-level mission code transitions
    - Depends on the mission code transition of the top-level system or SoS
    - Mission requirements and cold standby redundancies affect which descendants (i.e., indentured items/systems) are activated when the mission code transitions from off to on
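Steps 20 and 21 can be illustrated with a weighted random draw and a stable sort. Both functions are sketches under assumptions: `diagnose_failure` treats the per-SRU failure fractions as sampling weights, and `maintenance_order` applies only the stock-out rule, keeping the existing order otherwise. Neither name comes from TLAS.

```python
import random


def diagnose_failure(rng, sru_failure_fractions):
    """Pick the failed SRU at random, weighted by each SRU's share of LRU failures."""
    names = list(sru_failure_fractions)
    weights = [sru_failure_fractions[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def maintenance_order(shop_items):
    """Order (name, stock_outs) pairs so items with any stock-out come first."""
    # False sorts before True, and sorted() is stable, so ties keep their order.
    return sorted(shop_items, key=lambda item: item[1] == 0)


rng = random.Random(7)
fractions = {"SRU1.1": 0.75, "SRU1.2": 0.25}
diagnoses = [diagnose_failure(rng, fractions) for _ in range(10000)]
share = diagnoses.count("SRU1.1") / len(diagnoses)
order = maintenance_order([("A", 0), ("B", 2), ("C", 0), ("D", 1)])
print(share, order)
```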

27 Algorithm Summary: Steps 23 to 25
23. Propagate the effects of upness transitions
    - The issues of mission criticality and redundancy have a direct impact on how the effects of upness transitions are propagated
24. Propagate the effects of a PE operating code transition to running in a cold standby redundancy
    - If the number of running items is at least the minimum up, then for all of the PE's ancestors (i.e., parents, grandparents, et cetera), set the upness to up
    - Stop after changing the upness of the final-up item, which is either the highest-level item or the item that has a mission-critical indicator of "no"
25. Update MISSION GENERATOR state data
    - Assign pending missions and clear expired missions in the mission queue
    - Create new missions, either by a fixed schedule or a random time to next mission
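The ancestor walk in step 24 can be sketched as follows, assuming each item keeps a reference to its parent. The `Item` class, the `propagate_up` function, and the example hierarchy are hypothetical, and the sketch ignores everything except the minimum-up check and the stopping rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    name: str
    mission_critical: bool = True
    parent: Optional["Item"] = None
    up: bool = False


def propagate_up(pe, running_count, min_up):
    """Set ancestors of pe to up; return the names changed, in order."""
    changed = []
    if running_count < min_up:
        return changed  # not enough running items, so nothing propagates
    node = pe.parent
    while node is not None:
        node.up = True
        changed.append(node.name)
        if not node.mission_critical:
            break  # this is the final-up item: stop after changing it
        node = node.parent
    return changed


sos = Item("SoS")
system_x = Item("System X", mission_critical=False, parent=sos)
group = Item("PE1 group", parent=system_x)
pe1 = Item("PE1", parent=group)
changed = propagate_up(pe1, running_count=1, min_up=1)
print(changed, sos.up)
```

In this example the walk stops at System X because it is not mission critical, so the top-level SoS is left unchanged.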

28 Algorithm Summary: Steps 26 and 27
26. Readiness update to MISSION COMMAND AND CONTROL (MC2) data
    - Determine the next readiness value from the user-specified readiness schedule and the current simulation time
27. Mission-phase update to MC2 data
    - Affects a top-level object's MISSION COMMAND AND CONTROL (MC2) state
    - Depends upon the MC2 platform name of the top-level object whose mission-phase changed, the mission name, and the new mission-phase
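A readiness lookup against a schedule could be sketched with a binary search. The schedule layout, a sorted list of `(start_time, readiness)` pairs, and the readiness labels are assumptions made for illustration; `readiness_at` is a hypothetical name.

```python
from bisect import bisect_right


def readiness_at(schedule, now):
    """Return the readiness value in force at time `now`, or None before the first entry."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, now) - 1  # last entry whose start time is <= now
    if index < 0:
        return None
    return schedule[index][1]


schedule = [(0.0, "normal"), (100.0, "high"), (250.0, "normal")]
print(readiness_at(schedule, 99.9))
print(readiness_at(schedule, 100.0))
print(readiness_at(schedule, 400.0))
```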

29 Through Life Availability Simulation: Wrap Up
- It's all about availability
- Availability depends on many physical (observable) attributes
- TLAS is:
  - Scalable
  - Reconfigurable
  - Multi-indenture
  - Multi-echelon
  - GUI-based
- TLAS does:
  - Simulations from the fleet level down to the SRU level
  - Data collection and presentation
  - Mission generation
  - Stock management

30 Questions?

32 Supplementary Slides

33 Dynamical State Elements: LRU and STOCK ITEM

34 Dynamical State Elements: SHOP ITEM

35 Dynamical State Elements: SYSTEM

36 Dynamical State Elements: PE & MISSION COMMAND AND CONTROL

37 Dynamical State Elements: MISSION GENERATOR & MISSION

38 Dynamical State Elements: PLATFORM REQUIREMENTS, PERFORMANCE RECORDS & MISSION PHASE