CS626 Data Analysis and Simulation


CS626 Data Analysis and Simulation
Instructor: Peter Kemper, R 14A, phone 221-3462, email: kemper@cs.wm.edu
Office hours: Monday, Wednesday 2-4 pm
Today: Stochastic Input Modeling, based on the WSC 2010 Tutorial by Biller and Gunes, CMU, slides used with permission
Reference: Law/Kelton, Simulation Modeling and Analysis, Ch. 6

Big Picture: Model-based Analysis of Systems
[Figure. Labels: real world, portion/facet, perception, description, real-world problem, transformation, formal model (probability model, stochastic process), formal / computer-aided analysis, solution (rewards, qualitative and quantitative properties), presentation, transfer, solution to real-world problem, decision.]

What is input modeling?
Input modeling: deriving a representation of the uncertainty or randomness in a stochastic simulation.
Common representations:
- Measurement data
- Distributions derived from measurement data (the focus of input modeling): theoretical distributions or an empirical distribution. This usually requires that the samples are i.i.d. and that the corresponding random variables in the simulation model are i.i.d. (i.i.d. = independent and identically distributed).
- Time-dependent stochastic processes
- Other stochastic processes
Examples include the time to failure for a machining process; demand per unit time for inventory of a product; number of defective items in a shipment of goods; times between arrivals of calls to a call center.

Why are input models stochastic?
We just cannot assume randomness away.
Example (Nelson and Biller 2003): Suppose you are a supplier of a component that you know has a mean time to failure of 2 years. A client is willing to pay $1000 for your component, but wants you to pay a penalty of $5000 if failure occurs in less than one year. Should you take this contract?
- No uncertainty: You will pocket $1000 for each component you sell.
- Uncertainty: If you know that the time to failure is well modeled as exponentially distributed (an input model) with mean 2 years, then F(1) = 0.39 and you can expect to lose $950 on each component you sell (0.39 x $5000 - $1000).
- If you know that the time to failure is well modeled as uniformly distributed (an input model) between 0 and 4 years (so that the mean lifetime is 2 years), then F(1) = 0.25 and the expected loss on each component is $250.
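The arithmetic behind this example can be checked with a short sketch. Because the penalty is five times the sale price, the code works in units of the sale price; the function name `expected_net_per_unit` and the parameter `penalty_ratio` are illustrative choices, not part of the slides.

```python
import math

def expected_net_per_unit(prob_early_failure, penalty_ratio=5.0):
    """Expected net revenue per component, in units of the sale price.

    The supplier always collects 1 price unit and pays penalty_ratio price
    units with probability prob_early_failure = F(1).
    """
    return 1.0 - penalty_ratio * prob_early_failure

mean_life = 2.0   # mean time to failure in years (from the example)
horizon = 1.0     # penalty applies if failure occurs before 1 year

# Input model 1: exponential lifetime with mean 2 years, F(1) = 1 - exp(-1/2)
f_exp = 1.0 - math.exp(-horizon / mean_life)

# Input model 2: uniform lifetime on [0, 4] years, F(1) = 1/4
f_unif = (horizon - 0.0) / (4.0 - 0.0)

net_exp = expected_net_per_unit(f_exp)     # about -0.97 price units
net_unif = expected_net_per_unit(f_unif)   # exactly -0.25 price units

print(f"exponential: F(1) = {f_exp:.4f}, expected net = {net_exp:.3f} x price")
print(f"uniform:     F(1) = {f_unif:.4f}, expected net = {net_unif:.3f} x price")
```

The slide rounds F(1) to 0.39 before multiplying, which gives a loss of 0.95 price units instead of about 0.97. Both input models have the same mean, yet the expected loss differs by roughly a factor of four.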

Learning objectives
- Concept of input modeling and its fit in simulation model development.
- Input modeling with data: physical basis for distributions; fitting and checking.
- Input modeling without data: sources of information; incorporating expert opinion.

What is input modeling?
Input modeling: deriving a representation of the uncertainty or randomness in a stochastic simulation.
Randomness? A way to describe the behavior of a subsystem that
- (lack of knowledge) we cannot describe as a deterministic system, or
- (lack of interest, abstraction from details) we do not want to describe as a deterministic system.

What is input modeling?
Example model: G/G/n/m FCFS queue
- Customers (tasks) arrive according to some general distribution G.
- Customers are served for a time according to some general distribution G.
- n servers are available to serve customers in parallel.
- Customers are scheduled first-come-first-served (FCFS).
- m is the capacity of the queue (customers hitting a full system are turned away).
Design question: What values of n and m are necessary to limit the waiting time for 90% of all customers to 1 min and to limit the fraction of customers that get turned away to 5% in the long run?
What pieces of information does input modeling contribute to this simulation study?
Photo: Stuart Richards (Left-hand), Flickr, Creative Commons

Cookbook recipe for conducting a simulation study
[Flowchart. Box labels as extracted: Statement of the decision problem and objectives; System Analysis; Data Collection; Input Modeling; Model Building; Rough-cut Model Development; Static (Spreadsheet) Simulation; Dynamic System Simulation; Static Models; Dynamic Models; Design and coding of the simulation program; Verification; Validation; Experimental Design; Removal of initial-condition bias; Determination of the replication number for error control; Simulation runs; Output Analysis; Statistical analysis of results and system design comparison; Comparison via Simulation; Simulation Optimization; Recommendation for decisions and implementation of the model; Final documentation.]

Simulation model development
[Figure. Labels: Real-World Process or Phenomenon; Simulation Modeling; Simulation Model; Simulation Programming; Simulation Program; Simulation Input Modeling; Random Input Model; Random Variate Programming; Random Variate Generator.]

G/G/n/m FCFS queueing model revisited
Conceptual model
- Customers (tasks) arrive according to some general distribution G.
- Customers are served for a time according to some general distribution G.
- n servers are available to serve customers in parallel.
- Customers are scheduled first-come-first-served (FCFS).
- m is the capacity of the queue (customers hitting a full system are turned away).
Design question: What values of n and m are necessary to limit the waiting time for 90% of all customers to 1 min and to limit the fraction of customers that get turned away to 5% in the long run?
Input model
Measurement data for task arrivals and service times, collected over a certain time.
- Option 1: Trace-driven simulation. Use the measurement data to feed a simulation run.
- Option 2: The simulation draws from a probability distribution. This needs selection/configuration of a distribution (distribution fitting); alternative: an empirical distribution.
- Option 3: The simulation executes a stochastic process (later).
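Options 1 and 2 differ only in where the interarrival and service times come from, which a small simulator makes concrete. The following is a minimal sketch, not the course's simulator: the function name `simulate_queue` is hypothetical, and it assumes that m bounds the total number of customers in the system (waiting plus in service).

```python
import heapq
import random

def simulate_queue(interarrivals, services, n, m):
    """FCFS queue with n servers and at most m customers in the system.

    interarrivals, services: iterables of times (a trace or random samples).
    Returns (waits, rejected): waiting times of admitted customers and the
    number of customers turned away.
    """
    free_at = [0.0] * n   # min-heap: time at which each server is next free
    in_system = []        # min-heap: departure times of customers still present
    waits, rejected, t = [], 0, 0.0
    for gap, service in zip(interarrivals, services):
        t += gap
        # Drop customers that have departed by the time of this arrival.
        while in_system and in_system[0] <= t:
            heapq.heappop(in_system)
        if len(in_system) >= m:
            rejected += 1          # system full: customer is turned away
            continue
        # FCFS: the arriving customer takes the server that frees up first.
        start = max(t, heapq.heappop(free_at))
        heapq.heappush(free_at, start + service)
        heapq.heappush(in_system, start + service)
        waits.append(start - t)
    return waits, rejected

# Option 1: trace-driven, feeding (made-up) measurement data directly.
trace_result = simulate_queue([1.0, 1.0, 1.0], [5.0, 5.0, 5.0], n=1, m=2)
print(trace_result)   # ([0.0, 4.0], 1)

# Option 2: draw from fitted distributions (exponential used as a placeholder).
rng = random.Random(1)
gaps = [rng.expovariate(1.0) for _ in range(10_000)]            # mean gap 1.0
services = [rng.expovariate(1.0 / 0.8) for _ in range(10_000)]  # mean service 0.8
waits, rejected = simulate_queue(gaps, services, n=1, m=10)
print(len(waits), rejected)
```

Because the simulator only consumes two sequences of numbers, swapping the input model does not require touching the simulation logic.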

Input model development
There is no true model for any stochastic input. The best we can hope for is an approximation that yields useful results.
"Essentially, all models are wrong, but some are useful." (Box, George E. P.; Norman R. Draper. Empirical Model-Building and Response Surfaces. Wiley, 1987.)
A key distinction in input modeling problems is the presence or absence of data:
- When we have data, we fit a model to the data. Software support:
  - Special-purpose software, e.g., ExpertFit by A. Law
  - Simulation environments include this, e.g., Arena by Rockwell Automation
  - Statistics packages provide key functionality, e.g., R (www.r-project.org)
- When no data are available, we have to creatively use what we can get to construct an input model.

Collecting data
Generally hard, expensive, frustrating, boring:
- The system might not exist.
- Data may be available on the wrong things; the model might have to change according to what is available.
- Incomplete, dirty data.
- Too much data (!)
Sensitivity of outputs to uncertainty in inputs.
Match model detail to quality of data.
Cost should be budgeted in the project.
Capture the variability in the data; model validity depends on it.

Example: Traffic measured at a node in a network
[Plot: sequence of time stamps for a series of requests (arrival stream).]
Observations:
- The data are a concatenation of several measurements, visible as restarts close to 0 or as unreasonably wide gaps to higher time stamp values.
- Thresholds are needed to automate subsequence detection: x = 2s for a drop in time, y = 1s for an increase.
Note: Check consistency ahead of any numerical analysis!
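Threshold-based subsequence detection can be sketched in a few lines. The function name `split_trace`, the parameter names, and the threshold values in the usage example are illustrative assumptions, not the values from the measurement study.

```python
def split_trace(timestamps, max_drop, max_gap):
    """Split a list of time stamps into subsequences.

    A new subsequence starts whenever the time stamp drops by more than
    max_drop (a restart) or jumps forward by more than max_gap.
    """
    segments, current = [], []
    for ts in timestamps:
        if current:
            step = ts - current[-1]
            if step < -max_drop or step > max_gap:
                segments.append(current)
                current = []
        current.append(ts)
    if current:
        segments.append(current)
    return segments

# Made-up trace: a restart after 1.2 and a wide gap before 50.0.
trace = [0.0, 0.5, 1.2, 0.1, 0.4, 50.0, 50.3]
segments = split_trace(trace, max_drop=1.0, max_gap=10.0)
print(segments)   # [[0.0, 0.5, 1.2], [0.1, 0.4], [50.0, 50.3]]

# Consistency check per subsequence: is it in non-decreasing order?
ordered = [seg == sorted(seg) for seg in segments]
print(ordered)    # [True, True, True]
```

Small backward steps below `max_drop` stay inside a subsequence, so the ordering check afterwards flags subsequences that still contain out-of-order entries.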

Example: Traffic measured at a node in a network
[Plot: sequence of time differences for the first 2k events.]
Observations: a closer look reveals that the subsequences are not necessarily accurately ordered.
Options?
1. Remove out-of-order entries.
2. Consider ordered subsequences.
3. Sort each subsequence.
Note: Check consistency ahead of any numerical analysis!

Input model development: approaches
[Figure. Labels: Real-World Process; Collecting Data; Fitting Probability Distributions; Using Data Itself; Expert Opinion; Goodness of the Fit; Validation; Input Model (Fit).]

Notes on using data itself: Trace-driven simulation
Example: The simulator needs the arrival of the i-th customer: pick the i-th arrival from the data.
Limitations and challenges:
- Can never go outside your observed data. No tail and nothing in the gaps.
- Difficult to reflect dependencies in the inputs.
- Need to change the data when the input process changes.
- May not have enough data for long or many runs.
- Difficult to configure, e.g., customers arrive twice as fast...
- A huge amount of data requires a huge amount of space.
On the positive side, measurement data
- can naturally incorporate all kinds of qualitative and quantitative constraints and the details necessary for a realistic run;
- allow for a direct comparison of the real system with the simulated system, and for validation.

Fitting Probability Distributions
Precondition:
- The i.i.d. assumption holds for the sample data used in fitting.
- The i.i.d. assumption for the RVs in the real system must be validated.
- Corresponding graphical techniques/statistical tests... later!
Focus: univariate distributions (i.e. just one RV)
Most probability distributions were invented to represent a particular physical situation. If we know the physical basis for a distribution, then we can match it to the situation we have to model.
Examples:
- Binomial
- Poisson and Exponential
- Normal and Lognormal
- Beta, Pert, and Triangular
- Uniform
(See Law, Chapter 6 (2007) for a detailed list.)

G/G/m/n FCFS Example refined (from Law, Example 6.1)
Does the selection of the distribution really matter?
- Arrivals: exponential, rate λ = 1, m = 1, n = ∞
- Service times: given 200 samples, distribution unknown
- Exercise different distributions, with parameters fitted to match the data.
- Make 100 independent simulation runs using each of the 5 distributions; continue each of the 500 runs to collect 1000 delays; observe the impact of the selected distribution:

Distribution      Delay in queue   Number in queue   Prop. delays >= 20
Exponential       6.71             6.78              0.064
Gamma             4.54             4.60              0.019
Weibull (best)    4.36             4.41              0.013
Lognormal         7.19             7.30              0.078
Normal            6.04             6.13              0.045

Some Distributions
[Figure: plots of the exponential, gamma, Weibull, lognormal, and normal distributions.]

Parameterization of distributions
Parameters are of 3 basic types:
Location
- specifies an x-axis location point of a distribution's range of values, usually the midpoint (e.g. the mean for the normal distribution) or the lower end point of the distribution's range
- sometimes called shift parameter, since changing its value shifts the distribution to the left or right, e.g., for Y = X + γ
Scale
- determines the scale (unit) of measurement of the values in the range of the distribution (e.g. the standard deviation σ for the normal distribution)
- changing its value compresses/expands the distribution but does not alter its basic form, e.g., for Y = β X
Shape
- determines the basic form/shape of a distribution
- changing its value alters a distribution's properties, e.g. skewness, more fundamentally than a change in location or scale
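The difference between the three parameter types can be illustrated numerically. The sketch below uses the gamma distribution as an example; the helper `skewness` and the particular values of γ, β, and the shape parameter are arbitrary choices for illustration.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean of the cubed standardized values."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)

rng = random.Random(42)

# X ~ gamma with shape 2 and scale 1.
x = [rng.gammavariate(2.0, 1.0) for _ in range(50_000)]

# Location and scale change: Y = gamma_shift + beta_scale * X.
gamma_shift, beta_scale = 10.0, 3.0
y = [gamma_shift + beta_scale * v for v in x]

# Shape change: same family, shape 8 instead of 2.
x_other_shape = [rng.gammavariate(8.0, 1.0) for _ in range(50_000)]

print("mean:", statistics.fmean(x), "->", statistics.fmean(y))      # shifted and scaled
print("std: ", statistics.pstdev(x), "->", statistics.pstdev(y))    # scaled by beta_scale
print("skew:", skewness(x), "->", skewness(y))                      # unchanged
print("skew with other shape:", skewness(x_other_shape))            # changed
```

Location and scale move and stretch the sample but leave the skewness untouched; only the shape parameter changes it.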

Physical basis for binomial distribution
Binomial: Models the number of successes in n independent Bernoulli trials, with probability p of success in each trial.
Example: The number of defective components found in a lot of n components, with probability p of picking a defective component.
E[X] = np, Var[X] = np(1-p)
[Figure: plots of Binomial(5, 0.2) and Binomial(5, 0.8).]

Physical basis for Poisson distribution
Poisson: Models the number of independent events that occur in a fixed amount of time.
Example: Number of customers arriving at a store during 1 hr.
E[X] = λ, Var[X] = λ
[Figure: plots of Poisson(1) and Poisson(5).]

Physical basis for exponential distribution
Exponential: Models the time between independent events, or a process time which is memoryless.
Example: The time to failure for a system that has a constant failure rate over time.
Note: If the time between events is exponential, then the number of events is Poisson.
E[X] = λ, Var[X] = λ², where λ here denotes the mean of the distribution (the event rate is 1/λ), matching the Expon(λ) plot labels.
[Figure: plots of Expon(1) Shift=-2.5 and Expon(3) Shift=-2.5.]
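The link between exponential gaps and Poisson counts can be checked by simulation. In this sketch `rate` is the event rate, so the mean time between events is 1/rate; the function name `counts_per_interval` and the chosen rate are assumptions made for illustration.

```python
import random
import statistics

def counts_per_interval(rate, n_intervals, rng):
    """Count events in n_intervals unit-length intervals.

    Events are generated with exponential gaps of mean 1/rate.
    """
    counts = [0] * n_intervals
    t = rng.expovariate(rate)
    while t < n_intervals:
        counts[int(t)] += 1      # event falls into interval [k, k+1)
        t += rng.expovariate(rate)
    return counts

rng = random.Random(7)
counts = counts_per_interval(rate=5.0, n_intervals=20_000, rng=rng)

mean_count = statistics.fmean(counts)
var_count = statistics.pvariance(counts)
print(f"mean count per interval:     {mean_count:.3f}")
print(f"variance of count per interval: {var_count:.3f}")
```

For a Poisson count both the mean and the variance equal the rate, so both printed values should be close to 5.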

Physical basis for normal distribution
Normal distribution: Models quantities that are the sum of a large number of other quantities.
Example: Time to assemble a product.
Normal: E[X] = µ, Var[X] = σ²
Student t distribution: Very similar to the normal, but with heavier tails.
[Figure: Normal(0, 1) vs Student(6), with markers at X <= -1.645 (5.0%) and X <= 1.645 (95.0%).]

Physical basis for lognormal distribution
Lognormal: Models the distribution of a process that can be thought of as the product of a number of component processes.
Examples:
- The rate of return on an investment, when interest is compounded, is the product of the returns for a number of periods.
- Time to perform some task.
- Quantities that are the product of a large number of others (by virtue of the central limit theorem).
[Figure: plots of Lognorm(2.5, 2) Shift=-2.5 and Lognorm(2.5, 5) Shift=-2.5.]

Physical basis for beta distribution
Beta: An extremely flexible distribution used to model bounded (fixed upper and lower limits) random variables.
- Used as a rough model in the absence of data.
- Distribution of a random proportion, such as the proportion of defective items in a shipment.
- Time to complete a task, e.g. in a PERT network.
Example: Proportion of defective items in a shipment.
[Figure: plots of Beta(1.5, 5) and Beta(5, 1.5).]

Physical basis for Pert (Beta) distribution
Pert (Beta in disguise): Used to model the activity times in project management problems and defined by three point estimates: min, mode, max.
Example: Time to complete a task in a PERT network.
PERT is a method to analyze the tasks involved in completing a given project, especially the time needed to complete each task, and to identify the minimum time needed to complete the total project.
[Figure: plots of Pert(5, 6, 15) and Pert(5, 13, 15).]
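The slides do not spell out how the three point estimates map to a beta distribution. The sketch below assumes the commonly used PERT parameterization with weight 4 on the mode; the function names `pert_shapes`, `pert_mean`, and `pert_sample` are hypothetical.

```python
import random
import statistics

def pert_shapes(low, mode, high, weight=4.0):
    """Beta shape parameters for a PERT(low, mode, high) distribution.

    Assumes the common convention with the given weight on the mode.
    """
    span = high - low
    alpha = 1.0 + weight * (mode - low) / span
    beta = 1.0 + weight * (high - mode) / span
    return alpha, beta

def pert_mean(low, mode, high, weight=4.0):
    """Mean of the PERT distribution: (low + weight * mode + high) / (weight + 2)."""
    return (low + weight * mode + high) / (weight + 2.0)

def pert_sample(low, mode, high, rng, weight=4.0):
    """Draw one PERT variate by rescaling a standard beta variate to [low, high]."""
    alpha, beta = pert_shapes(low, mode, high, weight)
    return low + (high - low) * rng.betavariate(alpha, beta)

rng = random.Random(11)
alpha, beta = pert_shapes(5, 6, 15)
samples = [pert_sample(5, 6, 15, rng) for _ in range(20_000)]

print("shape parameters:", alpha, beta)
print("theoretical mean:", pert_mean(5, 6, 15))
print("sample mean:     ", statistics.fmean(samples))
```

With min 5, mode 6, and max 15, the right-skewed result has a mean well above the mode, which is what distinguishes it from simply using the mode as a point estimate.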

Physical basis for triangular distribution
Triangular: Models a process when only the minimum, most likely, and maximum values of the distribution are known.
Example: The minimum, most likely, and maximum inflation rate we will have this year.
[Figure: plots of Triang(5, 6, 15) and Triang(5, 13, 15).]

Physical basis for uniform distribution
Discrete Uniform: Models complete uncertainty, since all outcomes are equally likely.
Example: A first model for a quantity that varies among the integers 1 through 4, but about which little else is known.
[Figure: plot of DUniform({x}).]

Distributions
Many theoretical distributions with nice properties:
- experience with scenarios when to apply those
- well-studied properties, parameters, characteristics
- compact representation of data
- software support for sampling in simulation runs
- software support to perform parameter fitting
- easy to vary by modification of parameters
- some allow for closed-form analytical formulas for system analysis (queueing networks)
- less sensitive to data irregularities than an empirical distribution
- but: may allow for numbers beyond reasonable limits, e.g. negative values or very high values, such that truncation may be necessary
For distributions and their relationships see also:
- Wheyming Song and Yi-Chun Chen, Simulation Input Models: Relationships Among Eighty Univariate Distributions Displayed in a Matrix Format, Proceedings Winter Simulation Conference 2010.
- Larry Leemis: Univariate Distribution Relationships, www.math.wm.edu/~leemis/chart/udr/udr.html

Overview of fitting with data
1. Select one or more candidate distributions, based on physical characteristics of the process and graphical examination of the data.
2. Fit the distribution to the data: determine values for its unknown parameters.
3. Check the fit to the data via statistical tests and via graphical analysis.
4. If the distribution does not fit, select another candidate and repeat the process, or use an empirical distribution.
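The select/fit/check loop can be sketched with the standard library alone. This is an illustration under stated assumptions, not the procedure used by ExpertFit, Arena, or R: the two candidates (exponential and lognormal), the maximum-likelihood fits, and the Kolmogorov-Smirnov distance as the check are choices made here, and the data are synthetic.

```python
import math
import random

def fit_exponential(data):
    """Maximum-likelihood estimate of the exponential mean."""
    return sum(data) / len(data)

def fit_lognormal(data):
    """Maximum-likelihood estimates (mu, sigma) of the log of the data."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def exponential_cdf(x, mean):
    return 1.0 - math.exp(-x / mean)

def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF and the candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Synthetic "measurements": lognormal service times (assumption for the demo).
rng = random.Random(3)
data = [rng.lognormvariate(0.0, 0.5) for _ in range(500)]

mean_hat = fit_exponential(data)
mu_hat, sigma_hat = fit_lognormal(data)

d_exp = ks_statistic(data, lambda x: exponential_cdf(x, mean_hat))
d_logn = ks_statistic(data, lambda x: lognormal_cdf(x, mu_hat, sigma_hat))

print(f"exponential fit: mean = {mean_hat:.3f}, KS distance = {d_exp:.3f}")
print(f"lognormal fit:   mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}, KS distance = {d_logn:.3f}")
```

The smaller distance points to the better candidate. Note that standard Kolmogorov-Smirnov critical values do not apply directly when the parameters are estimated from the same data, so the distance is used here only to compare candidates, alongside graphical checks.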