A STUDY ON THE APPLICATION OF SUPERVISED AND UNSUPERVISED MACHINE LEARNING TECHNIQUES IN MANUFACTURING FLOW TIME ESTIMATION IN A DYNAMIC JOB SHOP


A STUDY ON THE APPLICATION OF SUPERVISED AND UNSUPERVISED MACHINE LEARNING TECHNIQUES IN MANUFACTURING FLOW TIME ESTIMATION IN A DYNAMIC JOB SHOP

THESIS

Submitted in partial fulfillment of the requirements for the award of the degree of

DOCTOR OF PHILOSOPHY
IN THE DEPARTMENT OF MECHANICAL ENGINEERING

By
V. THIAGARAJAN
(Regn. No. SP09MEDA37)

DEPARTMENT OF MECHANICAL ENGINEERING
St. PETER'S INSTITUTE OF HIGHER EDUCATION AND RESEARCH
St. PETER'S UNIVERSITY
CHENNAI

SEPTEMBER 2017

CERTIFICATE

I hereby certify that the thesis entitled "A STUDY ON THE APPLICATION OF SUPERVISED AND UNSUPERVISED MACHINE LEARNING TECHNIQUES IN MANUFACTURING FLOW TIME ESTIMATION IN A DYNAMIC JOB SHOP", revised and resubmitted to St. Peter's University for the award of the Degree of Doctor of Philosophy, is the record of research work done by the candidate V. THIAGARAJAN (SP09MEDA37) under my guidance, and that the thesis has not previously formed the basis for the award of any degree, diploma, associateship, fellowship or other similar title.

Dr. T. N. SRIKANTHA DATH
SUPERVISOR

Place :
Date :

DECLARATION

Certified that the thesis entitled "A STUDY ON THE APPLICATION OF SUPERVISED AND UNSUPERVISED MACHINE LEARNING TECHNIQUES IN MANUFACTURING FLOW TIME ESTIMATION IN A DYNAMIC JOB SHOP" is the bona fide record of independent work done by me under the supervision of Dr. T. N. SRIKANTHA DATH. Certified further that the work reported herein does not form part of any other thesis or dissertation on the basis of which a degree or award was conferred earlier.

Dr. T. N. SRIKANTHA DATH                                V. THIAGARAJAN
SUPERVISOR

Place :
Date :

ACKNOWLEDGEMENTS

I owe a deep sense of gratitude to our Chairperson, Dr. Francis C. Peter, Vice Chancellor, Dr. S. Gunasekaran, D.Sc., Dean (R&D), Dr. M. A. Dorai Rangaswamy, Registrar, and Dr. P. Periyasamy, Head, Department of Mechanical Engineering, St. Peter's University, for providing a conducive atmosphere and the facilities for carrying out my research work.

I would like to express my sincere gratitude to my supervisor, Dr. T. N. Srikantha Dath, Professor, Department of Mechanical and Manufacturing Engineering, M.S. Ramaiah University of Applied Sciences, Peenya, Bengaluru, who not only inspired me to take up this research project but also constantly motivated me throughout. His support during times of despair was invaluable, and I thoroughly enjoyed every moment of this journey with him.

I am eternally indebted to Dr. C. Rajendran, Professor, Department of Management Sciences, IIT Madras, for his continuous guidance and support at every stage of this work, for offering scholarly advice, and for teaching me the importance of attention to detail.

I would like to thank the DC members, Dr. N. Venketaswaran, Professor, Department of Computer Science, SSN College of Engineering, Chennai, and Dr. K. Purushothaman, Head, Department of Mechanical Engineering, St. Peter's College of Engineering and Technology, Chennai, for taking the time to review my work and offering suggestions for improvement.

I am extremely thankful to Dr. L. Mahesh Kumar, Director (Academic), St. Peter's University, Chennai, for his unwavering support throughout this research project.

I am aware that I am standing on the shoulders of the many researchers who have worked in this area, and I am grateful to all of them.

ABSTRACT

Customers generally want early due date promises from manufacturers because of competitive pressure in their businesses, while manufacturers typically prefer extended due dates / lead times in order to plan better production schedules and balance workloads. Flow time is the time a job spends in the shop from order release to completion. As a result of the complex structure of the job shop, flow time elements become uncertain, which makes accurate prediction of flow time more difficult. Early research on due date assignment methods, which are based on flow time prediction, focused mainly on developing heuristics for due date setting in steady-state systems. Later, research focused on the dynamic nature of job shops, and several methodologies for adjusting flow allowances based on current shop conditions were proposed. Static rules ignore the current status of the shop floor when an order arrives and may set unrealistic due dates for individual orders. Dynamic rules tend to be information intensive but more accurate. Previous research has established that priority rules, shop configuration, work centre utilization, and job and shop characteristics have a definite influence on flow patterns. Against this background, it is worthwhile to explore the possibility of using data mining as a tool for predicting manufacturing flow time. Data and knowledge mining is learning from data, and this learning comes in two variants: supervised and unsupervised learning. This work studies the suitability of both methods for solving this particular problem and attempts to establish the best approach, with the following objectives:

To propose the use of a model-tree induction approach, a supervised machine learning technique, for manufacturing flow time estimation in a dynamic job shop.

To propose the use of Self-Organizing Maps (SOM), an unsupervised machine learning technique, for the prediction of manufacturing flow time in a dynamic job shop.

To compare the performance of the supervised and unsupervised machine learning techniques for manufacturing flow time estimation.

Shop floor simulation is used to generate the data for this study. Two methods are proposed in this work: MT-TWK, a method for flow time estimation using the model-tree induction algorithm M5 (a supervised learning algorithm), implemented in the KNIME workbench; and SOM-RA, a method for flow time estimation using Self-Organizing Maps (an unsupervised learning algorithm), implemented in Tanagra. Mean Absolute Lateness (MAL) and Mean Squared Lateness (MSL) are used to measure the performance of the proposed methods. The performance is also compared with the conventional Total Work Content (TWK) and Dynamic Total Work Content (DTWK) methods. The results demonstrate that both proposed methods perform better than the conventional methods, and that the SOM-based method in particular performs very well compared with all the other methods.

CONTENTS

Certificate ii
Declaration iii
Acknowledgements iv
Abstract v
List of Tables xi
List of Figures xii
List of Symbols xiv
List of Abbreviations xv

CHAPTER 1 INTRODUCTION
1.1 Preamble
1.2 The job shop environment
1.3 Issues in predicting manufacturing flow time
    Flow time estimation and due date assignment
    Dimensions of the due date assignment problem
        Static vs dynamic
        Preemptive vs non-preemptive
        Stochastic vs deterministic processing times
        Setup time/cost
        Server reliability
        Single vs multiple classes of customers
1.4 Factors in due date assignment methods
1.5 Performance measures
1.6 Need for the study
1.7 Data mining
1.8 Machine learning
    Supervised machine learning
    Unsupervised machine learning
1.9 Manufacturing simulation
    Advantages of manufacturing simulation
1.10 Statement of the problem
    Challenges
    Proposed solution
1.11 Objectives of the study
1.12 Methodology of the study
    Model tree induction method
    Self-organizing maps (SOM)
1.13 Merits of this research work
1.14 Organization of the thesis

CHAPTER 2 REVIEW OF LITERATURE
2.1 Due date assignment methods
2.2 Direct procedures
2.3 Simulation method
2.4 Analytical method
2.5 Statistical analysis
2.6 Artificial neural networks
2.7 Data mining techniques
2.8 Heuristics
2.9 Findings from literature review and research gaps

CHAPTER 3 METHODOLOGY
3.1 Data classification
    Overview of data classification techniques
    General approach to solving classification problems
    Hunt's algorithm
    Decision tree induction
    Extracting classification rules from decision trees
    C4.5 algorithm
    Splitting criteria
    Stopping criteria
    Pruning methods
3.2 Flow time estimation using the C4.5 algorithm
3.3 Flow time estimation using regression trees
3.4 Flow time estimation using model tree induction
3.5 Model tree induction method
    Methods and procedures
    Data collection
    Job shop production system
3.6 Simulation software
3.7 Job shop simulation model
    Job arrival section
    Manufacturing section
    Disposal section
    Validation and verification
    Simulation output
3.8 Processing using KNIME
3.9 Advantages of model tree induction

CHAPTER 4 FLOW TIME ESTIMATE USING CLUSTERING METHOD
4.1 Unsupervised learning
4.2 Clustering: problem definition
    Distance measures
        Euclidean distance measure
        Pearson distance measure
        Mahalanobis distance measure
        Rectilinear distance measure
        Minkowski distance measure
    Distance between clusters
4.3 Hierarchical clustering vs non-hierarchical clustering
4.4 K-means clustering
4.5 Hierarchical clustering
4.6 Density-based clustering
4.7 Self-organizing maps
4.8 Estimation of flow time using SOM
    Conceptual framework for the proposed method
    Macro modeling approaches used for flow time prediction
    Methods and procedures
4.9 Summary

CHAPTER 5 RESULTS AND DISCUSSION
5.1 Performance measures
5.2 Results for MT-TWK method
5.3 Results for SOM-RA method
5.4 Comparative study of SOM-RA and MT-TWK performance
5.5 Theoretical and managerial contributions

CHAPTER 6 CONCLUSION AND SCOPE FOR FURTHER STUDY
6.1 Conclusions
6.2 Implications of the present work
6.3 Scope for further study

REFERENCES 110
PUBLICATIONS 115

LIST OF TABLES

3.1 Job shop problem data
3.2 Simulation output (sample)
3.3 Sample input data for MT-TWK model (learning mode)
3.4 Sample output data with assigned class and predicted class
3.5 Sample output data with predicted flow time
3.6 Experimental results with respect to mean absolute lateness (MAL)
4.1 Sample input data for SOM clustering software
4.2 Sample output data from SOM clustering software
4.3 Cluster characteristics and flow time prediction equations
5.1 Experimental results with respect to mean absolute lateness (MAL)
5.2 Experimental results with respect to mean squared lateness (MSL)
5.3 Percentage change in MAL and MSL with respect to utilization rate
5.4 Three Factor ANOVA - MAL (α = 0.05)
5.5 Three Factor ANOVA - MSL (α = 0.05)
5.6 Experimental results for mean absolute lateness (MAL) (SOM-RA)
5.7 Experimental results for mean squared lateness (MSL) (SOM-RA)
5.8 Three Factor ANOVA - MAL (α = 0.05) (SOM-RA)
5.9 Three Factor ANOVA - MSL (α = 0.05) (SOM-RA)
5.10 Experimental results with respect to all methods studied for MAL
5.11 Experimental results with respect to all methods studied for MSL

LIST OF FIGURES

1.1 Knowledge Discovery Process
2.1 ANN-based due date assignment model
3.1 Overall flow of MT-TWK model for flow time estimation
3.2 General approach for building the classification model
3.3 Basic algorithm for inducing a decision tree from training samples
3.4 RTWK model
3.5 Due date allowance factor k vs the frequency distribution of occurrences
3.6 CREATE template
3.7 ARENA model used in this study
3.8 Proposed MT-TWK model
3.9 KNIME work flow diagram
3.10 Model tree generated using KNIME workbench (partial view)
4.1 Overall flow of SOM-RA model
4.2 k-means algorithm
4.3 Tanagra work flow diagram
4.4 SOM parameters used in this study
4.5 MAP topology and quality
4.6 SOM-RA model
5.1 MAL performance with respect to dispatching rules at 75% utilization (MT-TWK)
5.2 MAL performance with respect to dispatching rules at 85% utilization (MT-TWK)
5.3 MSL performance with respect to dispatching rules at 75% utilization (MT-TWK)
5.4 MSL performance with respect to dispatching rules at 85% utilization (MT-TWK)
5.5 MAL performance with respect to dispatching rules at 75% utilization (SOM-RA)
5.6 MAL performance with respect to dispatching rules at 85% utilization (SOM-RA)
5.7 MSL performance with respect to dispatching rules at 75% utilization (SOM-RA)
5.8 MSL performance with respect to dispatching rules at 85% utilization (SOM-RA)
5.9 MAL performance with respect to dispatching rules at 75% utilization (all methods)
5.10 MAL performance with respect to dispatching rules at 85% utilization (all methods)
5.11 MSL performance with respect to dispatching rules at 75% utilization (all methods)
5.12 MSL performance with respect to dispatching rules at 85% utilization (all methods)

LIST OF SYMBOLS

ε_ij - Error coefficient
σ_p - Standard deviation of processing time
µ_p - Mean processing time
k_1, k_2 - Parameter coefficients
β_0, β_1, β_2, β_3 - Parameter coefficients

LIST OF ABBREVIATIONS

ADRES - Adaptive Response Rate Exponential Smoothing
AFT - Average Flow Time
ANN - Artificial Neural Networks
CART - Classification and Regression Tree
CON - Constant
CV - Central Value
DDAM - Due Date Assignment Method
DTWK - Dynamic Total Work Content
DM - Data Mining
EDD - Earliest Due Date
FCFS - First Come, First Served
FT - Flow Time
GEP - Gene Expression Programming
ID3 - Iterative Dichotomiser 3
JIQ - Jobs in Queue
JIS - Jobs in System
LDP - Last Data Point
LT - Lead Time
L_i - Lateness of job i
MT-TWK - Model Tree Total Work Content
MPE - Mean Percentage Error
MAPE - Mean Absolute Percentage Error
MAL - Mean Absolute Lateness
MSL - Mean Squared Lateness
ME - Mean Earliness
MT - Mean Tardiness
NOP - Number of Operations
OGL - Open Grants License
ORR - Order Review and Release
PPW - Process Plus Waiting
PSP - Pre Shop Pool
RMR - Response Mapping Rule
RTWK - Rule-based Total Work Content
RAN - Random
RMS - Root Mean Square
SPT - Shortest Processing Time
SOM - Self-Organizing Maps
SLK - Slack
2D - Two-dimensional
3D - Three-dimensional
WIQ - Work in Queue

CHAPTER 1
INTRODUCTION

1.1 Preamble

Manufacturing flow time prediction is critical in a typical job shop, as it affects both customer relations and shop floor management practices. Flow time is the time a job spends in the shop from order release to completion. Customers demand the shortest possible delivery time, while production desires a comfortable flow time to ensure prompt delivery. It is in this context that flow time prediction becomes critical in shop floor management. Short, accurate and precise flow time estimates are desirable. This research work proposes a machine learning approach for estimating flow time in a dynamic job shop.

This chapter presents an overview of the background to the research problem, data mining, machine learning, and simulation. The research problem and the objectives of this research work are stated, and the methodology used is outlined. The chapter concludes with a summary of the benefits of this research work and the organization of the thesis.

1.2 The Job Shop Environment

In a typical job shop environment, the factors that determine flow time are (a) processing time of the job, (b) transportation time between workstations, (c) waiting time at workstations, and (d) setup times. As a result of the complex structure of the job shop, flow time elements become uncertain, which makes accurate prediction of flow time more difficult. It is a well-known fact that the better part of flow time is spent in queues waiting in front of machines for processing rather than in actual processing. Even though actual process needs can be predefined with sufficient accuracy, the

delays to be faced depend upon shop status in real time, which is hard to infer in advance. Prediction becomes even harder with longer lead times, because uncertainty increases as the variances in job mixes and quantities accumulate over time.

The uncertainty that prevails and the dependencies on various factors are addressed in almost all methods previously proposed by researchers. However, the associations among factors, the conditional presence of relations, and the clustering of flow times into response classes have rarely been addressed in previous studies. The reasons this could not be done were the non-availability of large amounts of data, and data being available in different formats, semantics and quality. The inability of existing mathematical modelling methods to handle high-dimensional data was another reason. The advent of machine learning (ML) techniques enabled the processing of large amounts of data. Machine learning techniques such as Support Vector Machines (SVMs) are designed to analyse large amounts of data and handle high dimensionality (>1000) very well (Yang and Trewn, 2004). Machine learning techniques have a definite advantage over mathematical models in that they can learn and adapt to a changing environment (Alpaydin, 2010). Hopp and Spearman (2001) suggested that predictive systems, such as scheduling tools, due date quoting systems and capacity planning procedures, should use the most accurate data available, including actual historical data where appropriate.

Early research on due date assignment methods, which are based on flow time prediction, focused mainly on developing heuristics for due date setting in steady-state systems. Later research focused on the dynamic nature of job shops, and several methodologies, such as those of Cheng and Jiang (1998) and Veral (2001), were proposed for adjusting flow allowances based on current shop conditions.
Static rules ignore the current status of the shop floor when an order arrives and hence may set unrealistic due dates for individual orders. Dynamic rules tend to be information intensive, but more accurate.
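The contrast can be made concrete with the classical Total Work Content (TWK) rule, which quotes a due date as the release time plus a fixed multiple k of the job's total processing time. The sketch below is illustrative: the static function is the standard TWK form, while the congestion-scaled variant merely hints at how a dynamic rule such as DTWK folds current shop status into the allowance; the exact DTWK formulation in the literature differs.

```python
from dataclasses import dataclass

@dataclass
class Job:
    release_time: float   # order release (arrival) time
    total_work: float     # total processing time over all operations

def twk_due_date(job: Job, k: float) -> float:
    """Static TWK rule: due date = release time + k * total work content.
    The allowance factor k is fixed, regardless of current shop load."""
    return job.release_time + k * job.total_work

def dynamic_due_date(job: Job, jobs_in_system: int, mean_jobs: float, k: float) -> float:
    """Illustrative dynamic variant (not the literature's exact DTWK):
    scale the allowance by current congestion relative to its long-run mean."""
    congestion = jobs_in_system / mean_jobs if mean_jobs > 0 else 1.0
    return job.release_time + k * job.total_work * congestion

job = Job(release_time=100.0, total_work=12.0)
print(twk_due_date(job, k=3.0))                                         # 136.0
print(dynamic_due_date(job, jobs_in_system=40, mean_jobs=25.0, k=3.0))  # 157.6
```

With the shop 60% more loaded than average, the dynamic variant stretches the same job's allowance from 36 to 57.6 time units, which is precisely the information-intensive adjustment static rules cannot make.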

Previous research has established that priority rules, shop configuration, work center utilization, and job and shop characteristics have a definite influence on flow patterns (Cheng, 1986; Ragartz and Mabert, 1984; Sabuncuoglu and Comlekci, 2002).

1.3 Issues in Predicting Manufacturing Flow Time

Estimation of job flow times has been an important issue in the job shop scheduling literature since the late 1960s. Since flow time estimation is used to assign order due dates, the problem has mostly been studied within the context of due date assignment (Sabuncuoglu and Comlekci, 2002). The importance of meeting promised delivery dates, or due dates, in manufacturing and service industries is recognized by practicing production managers and academic researchers (Ragartz and Mabert, 1984). The due date is the date by which an order or a job is required to be delivered to the customer. Before a job is released to the shop floor of a dynamic job shop for processing, its due date needs to be assigned. Due date performance depends not only on the scheduling procedure followed but also on the reasonableness of the assigned due dates. There are two aspects of due date performance: delivery reliability and delivery speed (Hill, 1991). Delivery reliability, also referred to as missed due date (Cheng and Jiang, 1998), is the ability to consistently meet promised delivery dates. Delivery speed is the ability to deliver orders to customers with the shortest lead times (Philipoom, 2000). Due date performance is becoming increasingly important in today's volatile production environment. The current trend has been towards due date assignment investigations that concentrate on dynamic flow time estimation models (Vig and Dooley, 1991). In fact, the flow time prediction problem is really the crux of the due date management problem.
The due date assignment process consists of making an estimate of flow time for a job and then setting a due date on the basis of that estimate and some performance criteria (Ragartz and Mabert, 1984).

Due dates can be set either externally by the customer (exogenous method) or internally by the scheduling system (endogenous method). When due dates are externally set, the scheduling system is charged with appropriate prioritization and synchronization to provide a timely flow of operations. Internally set due dates usually reflect current job shop congestion levels, manufacturing system capacity and job content. In both cases, tight due dates and on-time completion of jobs are challenges for the scheduler.

Several measurement criteria have been proposed in the literature for measuring the effectiveness of different due date setting methods and evaluating the performance of dispatching rules. The commonly used measures include mean tardiness, mean absolute lateness, the standard deviation of tardiness, the standard deviation of lateness, and the percentage of jobs tardy (Veral and Mohan, 1999).

1.3.1 Flow Time Estimation and Due Date Assignment

A survey of due date based research reveals that due dates are usually treated as given information and taken as input to a scheduling problem. However, in actual practice, the due date can be a decision variable within the domain of the scheduling problem. The former type of scheduling problem is reviewed in depth by Sen and Gupta (1984) and Gupta and Kyparisis (1987). Static and single-machine scheduling problems with given due dates have been analyzed in depth. Ragartz and Mabert (1984) provided a conceptual model of the due date management problem which, among other important variables, identifies a variety of due date assignment rules. They recognize the need for further research using different due date assignment rules. Smith and Seidmann (1983) presented a comprehensive classification of due date selection procedures, from which three major categories are derived: direct procedures (rules), heuristic procedures and simulation (Cheng and Gupta, 1989).

During the last decade, several important developments have taken place which necessitate an updated survey of due date assignment research and the identification of areas for further study.

1.3.2 Dimensions of the Due Date Assignment Problem

Several dimensions distinguish different due date management problems. A combination of these dimensions will affect the appropriate mathematical model for studying DDAMs in a given setting.

Static vs dynamic: In a static setting, all the information about the problem, such as job arrival and processing times, is available at the beginning of the scheduling horizon. In a dynamic setting, future arrivals are not known with certainty, and the information about a job becomes available only at the time of its arrival.

Pre-emptive vs non-pre-emptive: Interruption of the processing of a job is allowed in the pre-emptive setting. In a non-pre-emptive mode, once the processing of a job starts, it must be completed without interruption.

Stochastic vs deterministic processing times: When processing times are not known with certainty, it is usually assumed that they follow a probability distribution with known mean and variance.

Setup times/costs: When changing from one order (or one customer class) to another, there may be a transition time. Most of the current literature ignores setup times.

Server reliability: The capacity of the resources may be known (deterministic) over a planning horizon, or there may be random fluctuations (e.g., machine breakdowns).

Single vs multiple classes of customers: Customers (or jobs) can be divided into different classes based on the revenues (or margins) they generate, or based on their demand characteristics, such as (average) processing times, lead time related penalties, or maximum acceptable lead times.

1.4 Factors in Due Date Assignment Methods

Most methods include one or more of the following factors to assign a due date for a job in a dynamic job shop (Chang, 1997):

1. Total processing time for the job.
2. Number of operations of the job.
3. Number of jobs in work center queues on a job's routing when it is released to the shop.
4. Total processing time for all jobs in work center queues on a job's routing when it is released to the shop.
5. Number of jobs in the system when a job is released to the shop.
6. Mean flow time in the system.
7. Standard deviation of flow time in the system.
8. Mean queuing time in the system.
9. Standard deviation of queuing time in the system.

10. Mean queuing time per operation.
11. Standard deviation of queuing time per operation.
12. Mean number of jobs in the system.
13. Standard deviation of the number of jobs in the system.

1.5 Performance Measures

The following performance measures are used to evaluate the relative effectiveness of the due date assignment rules and associated factors:

1. Mean Absolute Lateness (MAL):
   MAL = (1/n) Σ |c_i - d_i|    (1.1)

2. Mean Earliness (ME):
   ME = (1/n) Σ max(d_i - c_i, 0)    (1.2)

3. Mean Tardiness (MT):
   MT = (1/n) Σ max(c_i - d_i, 0)    (1.3)

4. Mean Squared Lateness (MSL):
   MSL = (1/n) Σ (c_i - d_i)²    (1.4)

where c_i, d_i and n denote the completion time of order i, the promised due date of order i, and the sample size, respectively. Mean Absolute Lateness (MAL), which measures the average absolute difference between actual completion dates and promised due dates of the jobs, and Mean Squared Lateness (MSL) are used in this study as the primary performance measures. A smaller MAL value implies better due date prediction capability. MAL is always equal to the sum of Mean Earliness (ME) and Mean Tardiness (MT).
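These measures translate directly into code. The following sketch (plain Python, variable names illustrative) computes MAL, ME, MT and MSL from lists of completion times and due dates, and checks the identity MAL = ME + MT:

```python
def lateness_measures(completions, due_dates):
    """Compute MAL, ME, MT and MSL from completion times c_i and
    promised due dates d_i."""
    n = len(completions)
    lateness = [c - d for c, d in zip(completions, due_dates)]
    mal = sum(abs(l) for l in lateness) / n
    me = sum(max(-l, 0.0) for l in lateness) / n   # only early jobs contribute
    mt = sum(max(l, 0.0) for l in lateness) / n    # only tardy jobs contribute
    msl = sum(l * l for l in lateness) / n
    return mal, me, mt, msl

# three jobs: one early, one on time, one tardy
mal, me, mt, msl = lateness_measures([10.0, 14.0, 20.0], [12.0, 14.0, 17.0])
print(mal, me, mt, msl)
assert abs(mal - (me + mt)) < 1e-12   # MAL is the sum of ME and MT
```

Note that because MSL squares each deviation, it penalizes a few large misses much more heavily than MAL does, which is why both measures are reported.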

1.6 Need for the Study

With the above issues in perspective, it is worthwhile to explore the possibility of using data mining as a tool for predicting manufacturing flow time. Data mining, also called knowledge discovery in databases, can be defined as the nontrivial extraction of implicit, previously unknown, and potentially useful information from databases. In other words, the information stored in databases is different from the knowledge it contains, which is discovered by using data mining tools. Data and knowledge mining is learning from data, and this learning comes in two variants: supervised and unsupervised learning. Supervised learning takes a known set of input data and known responses to the data, and seeks to build a predictor model that generates reasonable predictions for the response to new data. Unsupervised learning is about analyzing data and looking for patterns; it is an extremely powerful tool for identifying structure in data.

It is worthwhile to explore the application of machine learning techniques, both supervised and unsupervised, to manufacturing flow time prediction using computer integrated resource planning and shop floor control systems, supported by the state-of-the-art data warehousing software available today. This study envisages proposing and demonstrating two methods for this task, one based on supervised machine learning and the other based on unsupervised machine learning.

1.7 Data Mining

Data mining is about explaining the past and predicting the future by means of data analysis. Data mining is a multidisciplinary field which combines statistics, machine learning, artificial intelligence and database technology. A

manufacturing system exists to produce a group of parts, subassemblies and/or products. As part of the ongoing digital revolution, it has become easy to capture and store vast amounts of data on fairly inexpensive storage media. Data mining and knowledge discovery is an interdisciplinary field for uncovering hidden and useful knowledge from such large volumes of data. Fayyad et al. (1996) presented a generic data mining process: understanding the application domain; selecting the data; cleaning the data by removing outliers and providing missing values; data preprocessing, integration, reduction and transformation; selecting machine learning algorithms; data mining; interpretation of the results; and using the discovered knowledge.

Figure 1.1 Knowledge Discovery Process (data → selection → target data → preprocessing → preprocessed data → transformation → transformed data → data mining → patterns/maps → interpretation and evaluation → knowledge)
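The stages of Figure 1.1 can be sketched in miniature. The record fields and the one-parameter model below are illustrative assumptions, not the actual pipeline used in this thesis; the point is only the shape of selection → cleaning → transformation → mining → interpretation:

```python
def kdd_pipeline(records):
    """Toy knowledge-discovery run over simulated flow time records."""
    # Selection: keep only the fields of interest
    selected = [(r["total_work"], r["flow_time"]) for r in records]
    # Preprocessing: drop records with missing values
    clean = [(p, f) for p, f in selected if p is not None and f is not None]
    # Transformation: normalise total work to [0, 1]
    pmax = max(p for p, _ in clean)
    scaled = [(p / pmax, f) for p, f in clean]
    # Data mining: least-squares fit of a one-parameter model f ≈ k * p
    k = sum(f * p for p, f in scaled) / sum(p * p for p, _ in scaled)
    # Interpretation: the fitted allowance factor is the extracted "knowledge"
    return k

records = [
    {"total_work": 10.0, "flow_time": 30.0},
    {"total_work": 20.0, "flow_time": 61.0},
    {"total_work": None, "flow_time": 15.0},  # incomplete record, dropped
]
print(kdd_pipeline(records))
```

Each comment marks one stage of the generic process; in a real study, each stage would be far richer (outlier removal, feature construction, model selection), but the flow of data through the stages is the same.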

1.8 Machine Learning

Machine learning systems automatically learn from data. This is often a very attractive alternative to constructing them manually, and in the last decade the use of machine learning has spread rapidly throughout computer science and beyond. Among the different types of machine learning tasks, a crucial distinction is drawn between supervised and unsupervised learning.

1.8.1 Supervised Machine Learning

In supervised learning, the goal is to get the computer to learn a classification system that we have created. More generally, classification learning is appropriate for any problem where deducing a classification is useful and the classification is easy to determine. Supervised learning is the most common technique for training neural networks and decision trees. Both of these techniques are highly dependent on the information given by the pre-determined classifications. In the majority of supervised learning applications, the ultimate goal is to develop a finely tuned predictor function g(x) (sometimes called the hypothesis). Learning consists of using sophisticated mathematical algorithms to optimize this function so that, given input data x about a certain domain, it will predict some interesting value g(x). Mathematically, a simple predictor has the form

g(x) = k_1 + k_2 x    (1.5)

where k_1 and k_2 are constants. The goal is to find the best values of k_1 and k_2 to make the predictor work as well as possible.
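One common way to find such constants is iterative gradient descent on the squared prediction error; this particular optimizer is an illustrative choice for the sketch below, not necessarily the procedure used by the thesis methods. Parameter names k1 and k2 mirror the parameter coefficients in the list of symbols:

```python
def train_predictor(xs, ys, lr=0.01, epochs=5000):
    """Fit g(x) = k1 + k2 * x by gradient descent on the
    mean squared difference between g(x) and the known outputs y."""
    k1, k2 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(k1 + k2 * x) - y for x, y in zip(xs, ys)]
        # gradient of the mean squared error w.r.t. each parameter
        g1 = 2.0 * sum(errors) / n
        g2 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        k1 -= lr * g1
        k2 -= lr * g2
    return k1, k2

# noiseless training examples drawn from y = 2 + 3x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]
print(train_predictor(xs, ys))   # converges close to (2.0, 3.0)
```

The loop is exactly the "measure deviations, nudge the parameters, repeat" cycle described next, and it extends to many variables unchanged, which is what makes the iterative approach attractive when a closed-form normal equation becomes unwieldy.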

Optimizing the predictor g(x) is done using training examples. For each training example x_train, a corresponding output y is known in advance. For each example, the difference between the known, correct value y and the predicted value g(x) is found. With a large sample of training examples, these differences give us a useful way to measure the deviation of g(x). Subsequently, by changing the values of k_1 and k_2, the predictor equation is modified until the system converges on the best values of k_1 and k_2. In this way, the predictor becomes trained and can be used for real-world prediction tasks. This equation is a simple univariate regression equation and can be solved by deriving a simple normal equation. However, as the number of variables increases, it becomes a significant challenge to derive a normal equation. Fortunately, the iterative approach taken by machine learning systems is much more resilient in the face of such complexity.

Under supervised machine learning, the two major subcategories are:

Regression machine learning systems: systems where the value being predicted falls somewhere on a continuous spectrum.

Classification machine learning systems: systems where we seek a yes-or-no prediction.

1.8.2 Unsupervised Machine Learning

In unsupervised machine learning, the goal is to find relationships within data. The system is given a set of data and tasked with finding patterns and correlations therein. The algorithms used to do this are very different from those used for supervised machine learning. Unsupervised learning methods work with observed input patterns x_i, which are often assumed to be independent samples from an underlying unknown probability distribution P(x), together with some explicit or implicit a priori information as to what is important. Two classes of methods

have been suggested for unsupervised learning. Density estimation techniques explicitly build statistical models (such as Bayesian networks) of how underlying causes could create the input. Feature extraction techniques try to extract statistical regularities directly from the inputs. For historical reasons, clustering is often considered synonymous with unsupervised learning. Clustering is a technique for finding similarity groups, called clusters, in data. It groups data instances that are similar to each other into one cluster, and data instances that are very different (far away) from each other into different clusters.

1.9 Manufacturing Simulation

Simulation is designing a model of a real or imagined system and conducting experiments with that model. Computer simulation is an attempt to model a real or imagined system on a computer so that it can be studied to see how the system works. By changing variables in the simulation, predictions can be made about the behavior of the system. Manufacturing represents one of the most important applications of simulation. Simulation can be used to predict the performance of an existing or planned system and to compare alternative solutions for particular design problems. Among the most important applications of simulation is running what-if scenarios to evaluate proposed process changes.

Most manufacturing simulation software has the following features:

Flowchart modeling methodology, including a large library of pre-defined building blocks to model the process without the need for custom programming.

Complete range of statistical distribution options to accurately model process variability.

Ability to define object paths and routes for simulation.

Statistical analysis and report generation.

Performance metrics and dashboards.

Realistic 2D and 3D animation capabilities to visualize results beyond numbers.

Advantages of manufacturing simulation

Improved visibility into the effect of a system or process change.

Opportunity to explore new procedures or methods without disrupting the current system.

Ability to diagnose and fix problems.

Reduced delivery times.

Increased profitability through overall improved operations.

Statement of the Problem

This research work focuses on establishing machine learning techniques as a viable approach for manufacturing flow time estimation in a dynamic job shop.

Challenges

Shop floor managers rely on their tacit knowledge to estimate the flow time and use this information to quote due dates. However, perceptions of shop conditions, such as heavily loaded, moderately loaded, lightly loaded, and

product complexity, which may vary from individual to individual depending on shop floor experience, will have an impact on due date performance. Research has been progressing over the last few decades to base the due date assignment process on explicit information, and several methods have been proposed to address the issue of relying only on the shop floor manager's experience for due date assignment.

Proposed Solution

The advent of data mining and machine learning techniques has opened new opportunities for reliable due date assignment. This research describes an approach that uses a supervised machine learning technique with a set of input factors, thereby eliminating the need for human intervention and automating the decision-making process in predicting the manufacturing flow time. This research also describes another approach using Self-Organizing Maps, an unsupervised machine learning technique, for predicting the flow time. Another significant contribution of this work is the incorporation of an order review and release (ORR) policy in the simulation model, which appears to have been ignored in previous studies. In contrast to the immediate release of orders as and when they arrive, controlled release pools the orders received and releases them systematically to the shop floor based on some predetermined criteria. The original purpose of controlled release of orders is to reduce shop congestion and average tardiness (Melnyk, 1988). However, conflicting research findings have suggested that ORR may not have a significant impact on shop performance (Kanet, 1988). In this research, the modelling methodology used by Sha and Liu (2004) was largely adopted, and ORR was included to improve the model. However, since the data used by Sha and Liu (2004) are not available, no definite conclusion about improvement or otherwise in performance from the inclusion of the ORR policy could be drawn from this research.
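As an illustration of the controlled-release idea, the following sketch pools arriving orders and releases them, oldest first, while the released workload stays under a norm. The workload-norm trigger, the `load_limit` value and the record fields are hypothetical illustrations, not the release criteria used in the simulation model of this thesis.

```python
def release_orders(pool, shop_load, load_limit):
    """Controlled order release (ORR) sketch: orders wait in a pre-shop
    pool and are released, oldest first, only while the total released
    workload stays under the norm `load_limit` (illustrative rule)."""
    released, remaining = [], []
    for order in sorted(pool, key=lambda o: o["arrival"]):  # oldest first
        if shop_load + order["work"] <= load_limit:
            shop_load += order["work"]
            released.append(order)
        else:
            remaining.append(order)
    return released, remaining, shop_load

# three pooled orders; the shop already carries 40 units of work
pool = [{"arrival": 1, "work": 30},
        {"arrival": 2, "work": 50},
        {"arrival": 3, "work": 10}]
released, pool_left, load = release_orders(pool, shop_load=40, load_limit=100)
```

Under immediate release all three orders would enter the shop at once; under the norm, the 50-unit order stays pooled until capacity frees up.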

Objectives of the study

The main objective of this research work is to explore the possibilities of using data mining techniques for the prediction of manufacturing flow time. As mentioned earlier, there are two approaches to learning from data, i.e., supervised machine learning and unsupervised machine learning. This work studies the suitability of both methods for solving this problem and attempts to establish the best approach, with the following objectives:

To propose the use of a model-tree induction approach as a supervised machine learning technique for manufacturing flow time estimation in a dynamic job shop.

To propose the use of Self-Organizing Maps (SOM), an unsupervised machine learning technique, for the prediction of manufacturing flow time in a dynamic job shop.

To compare the performance of the supervised and unsupervised machine learning techniques for manufacturing flow time estimation.

Methodology of the study

The methodology adopted to implement the above-mentioned objectives is given below. Data mining is known to be effective with large amounts of data. Since it is practically not possible to collect data for this study from an actual manufacturing environment, shop floor simulation is used to generate the data. A (10 x 10) job shop problem proposed by Lawrence (1982) is used to generate the data set required for the study. This problem specifies the process

sequence for 10 jobs; there are 10 machines in the system, every job visits all ten machines exactly once in the specified sequence, and no backtracking is allowed. This job shop is modeled using Arena (Evaluation Copy) and run on a Dell workstation. The inter-arrival time (in minutes), generated from a negative exponential distribution, is adjusted suitably to achieve utilization levels of 75% to represent a moderately loaded shop and 85% to represent a heavily loaded shop. A job type is chosen at random for processing, and similar job types are collected to form a batch as described previously. Data sets of more than 5000 records are collected for each of the 10 replications.

The following parameters are recorded when a job enters the shop: (1) entity serial number, (2) part type, (3) number of jobs waiting in front of all machines, and (4) the cumulative processing time of all jobs waiting in front of all machines. The following data are captured when the job exits the system: (1) entity serial number, (2) job arrival time, and (3) flow time of the job.

The following assumptions are made about the production system:

Setup time is negligible.

Transportation time between work centers is negligible and can be ignored.

No pre-emption is allowed, and machines are continuously available.

Model Tree Induction Method

To satisfy the first objective of this work, the model-tree induction method is used for supervised machine learning. Classical decision-tree and decision-rule learning methods were developed in an environment in which class values, and originally attribute values too, were discrete. A technique for dealing with continuous-class learning problems, the model tree, was developed by Quinlan (1992) and embodied in a learning algorithm called M5. In the first stage, a decision tree induction algorithm is used to build a tree; instead of maximizing the information gain at each interior node, a splitting criterion is used that minimizes the intra-subset variation in the class values down each branch. In the second stage, consideration is given to pruning the tree back from each leaf, a technique that was pioneered independently by Breiman et al. (1984) and Quinlan (1986) and has become the standard in decision tree induction. The difference between decision-tree induction and model-tree induction is that when pruning to an interior node, consideration is given to replacing that node with a regression plane instead of a constant value. The attributes that define that regression are precisely those that participate in decisions in the nodes subordinate to the current one. This algorithm is named M5'; see Wang and Witten (1997) for details. The KNIME workbench, available under OGL, implements the M5' algorithm for model tree induction. This work uses the decision tree learner and decision tree predictor nodes of the KNIME workbench. Two-thirds of the data were used for learning the relationship between the six factors and the assigned class, and the model induction tree thus generated was used for predicting the class of the remaining one-third of the data. The k factor is assigned based on the predicted class.
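The first-stage splitting criterion of M5 can be illustrated for a single numeric attribute: among the candidate thresholds, choose the one giving the largest standard-deviation reduction (SDR) in the class values. This is a minimal sketch of the criterion only, not the KNIME implementation.

```python
import statistics

def sd(vals):
    """Population standard deviation; zero for subsets of size one."""
    return statistics.pstdev(vals) if len(vals) > 1 else 0.0

def best_split(xs, ys):
    """M5-style splitting sketch: instead of maximizing information
    gain, pick the threshold on x that most reduces the intra-subset
    standard deviation of the class (target) values."""
    base = sd(ys)
    best_t, best_sdr = None, -1.0
    n = len(ys)
    for t in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # standard-deviation reduction achieved by this split
        sdr = base - (len(left) / n * sd(left) + len(right) / n * sd(right))
        if sdr > best_sdr:
            best_t, best_sdr = t, sdr
    return best_t, best_sdr

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.1, 4.9, 20.0, 20.2, 19.8]
threshold, gain = best_split(xs, ys)
```

With these toy data the criterion splits between x = 3 and x = 10, the point separating the two flat regions of the target.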

Self-Organizing Maps (SOM)

To satisfy the second objective of this work, Self-Organizing Maps, an unsupervised machine learning technique, are used. Past researchers in the area of flow time prediction have attempted to use a macro modeling framework covering the entire set of jobs, irrespective of their characteristics. In other words, disparate and heterogeneous jobs share a single prediction model with the same regression coefficients, even though it is questionable whether dissimilar jobs should be described by the same model with the same coefficients for the independent variables. Although such models have been generally successful, a single macro model may not allow an understanding of how the system changes in state and which variables are important under different conditions. Even if a macro model has good predictive behavior, a set of micro models associated with each state may produce a more precise model for prediction. It is for this reason that this work uses SOM (perhaps for the first time) to initially cluster the training data and then construct linear regression models for each cluster. This method will be referred to as SOM-RA. The Tanagra workbench was used for implementing this method. The simulation output is used to derive the values of average flow time (AFT) and remaining work content (RWK). The total processing time (TWK) of each job is computed from the (10 x 10) job shop problem instance (Lawrence, 1982). A text (.txt) data file containing records with TWK, RWK and AFT fields is generated and used as input to the Tanagra software for SOM clustering. Weka 3, a data mining software package, is used to create separate linear regression equations establishing the relationship between the flow time (predicted value) and TWK, AFT and RWK. The resulting set of six regression equations is used to predict the flow time of the remaining one-third of the data in the original set.
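A minimal sketch of the SOM-RA idea (cluster first, then fit a regression per cluster) on synthetic data is given below. The one-dimensional map, its neighbourhood schedule, and the two synthetic load regimes are illustrative simplifications, not the Tanagra/Weka configuration used in this work.

```python
import numpy as np

def train_som(data, n_nodes=2, iters=500, lr=0.5):
    """Minimal 1-D self-organizing map sketch: nodes start spread across
    the data range, and each presented input pulls its best matching
    unit (and, more weakly, that unit's lattice neighbours) toward it."""
    rng = np.random.default_rng(0)
    weights = np.linspace(data.min(axis=0), data.max(axis=0), n_nodes)
    for t in range(iters):
        x = data[rng.integers(0, len(data))]
        bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
        for j in range(n_nodes):
            # neighbourhood influence decays with lattice distance and time
            h = np.exp(-abs(j - bmu)) * lr * (1.0 - t / iters)
            weights[j] += h * (x - weights[j])
    return weights

def som_ra(X, y, weights):
    """SOM-RA sketch: assign each record to its nearest SOM node, then
    fit an ordinary least-squares regression model per cluster."""
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    models = {}
    for c in np.unique(labels):
        A = np.column_stack([np.ones((labels == c).sum()), X[labels == c]])
        models[c], *_ = np.linalg.lstsq(A, y[labels == c], rcond=None)
    return labels, models

# synthetic demo: two load regimes with different flow-time relationships
rng = np.random.default_rng(1)
X_lo = rng.uniform(0, 1, (100, 2))
X_hi = rng.uniform(5, 6, (100, 2))
X = np.vstack([X_lo, X_hi])
y = np.concatenate([2.0 + 1.0 * X_lo[:, 0], 10.0 + 3.0 * X_hi[:, 0]])
w = train_som(X, n_nodes=2)
labels, models = som_ra(X, y, w)
```

Because each regime obeys its own linear relationship, the per-cluster micro models recover their regime exactly, whereas a single macro regression over all 200 records would blur the two regimes together.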

Merits of this research work

Compared to previous research work done in this area, the methods proposed in this research are more robust and easily amenable to an automated system for due date prediction. The application of data mining is largely confined to big companies, for want of access to software whose acquisition cost is prohibitively high. In this research, software available under an OGL license has been used, demonstrating that even medium and small enterprises can reap the benefits of machine learning with little extra effort.

Organization of the thesis

The first chapter presents an introduction to data mining, machine learning, supervised machine learning, unsupervised machine learning, manufacturing simulation, features of manufacturing simulation software, and advantages of manufacturing simulation. The problem, the proposed solution and the methodology adopted for solving the problem are described, and the contribution made by this work towards more robust flow time prediction is highlighted. The second chapter presents a review of previous studies in this area and concludes with a summary of findings from the literature survey. The third chapter presents the details of flow time estimation using classification methods; a new method based on model tree induction, namely MT-TWK, is presented in this chapter. The fourth chapter describes a novel method for manufacturing flow time estimation using Self-Organizing Maps (SOM). Self-Organizing Maps provide a way of representing multi-dimensional data in much lower dimensional spaces,

usually one or two dimensions. This process of reducing the dimensionality of vectors is essentially a data compression technique known as vector quantization. One of the most interesting aspects of SOMs is that they learn to classify data without supervision. Unlike many other types of network, a SOM does not need a target output to be specified. Instead, where the node weights match the input vector, that area of the lattice is selectively optimized to more closely resemble the data for the class, and over many iterations the SOM eventually settles into a map of stable zones, each of which is effectively a feature classifier. This research uses SOM to initially cluster the training data and then construct local regression models for each cluster; this method is referred to as the SOM-RA method. The fifth chapter presents the computational results of the MT-TWK and SOM-RA methods, along with a comparative study of the performance of supervised and unsupervised machine learning techniques for flow time estimation in a dynamic job shop. The sixth chapter contains the conclusions of this work, and the scope of future work is also presented.

CHAPTER 2

REVIEW OF LITERATURE

2.1 Due Date Assignment methods

In a job shop production system, each job on arrival is assigned a due date for delivery before it is actually released to the shop floor for processing. An analysis of the literature reveals that a variety of methods have been suggested to assign due dates. They may be classified as:

1) Direct procedures (conventional rules)
2) Simulation methods
3) Analytical methods
4) Statistical analysis
5) Methods using artificial neural networks
6) Methods using heuristics
7) Methods using data mining

2.2 Direct Procedures

Direct due date assignment procedures assign due dates using information such as job characteristics and shop conditions, and can be classified as

(a) Exogenous: the due dates are set by an external agency and are announced upon arrival of the job. Two types of due date assignment method are well known in this category (Conway, 1965):

(i) Constant (CON): all jobs are given exactly the same flow allowance:

d_i = r_i + k (2.1)

where d_i is the due date, r_i is the job arrival date and k is a constant allowance.

(ii) Random (RAN): the flow allowance for a job is randomly assigned:

d_i = r_i + k e_i (2.2)

where k is a constant and e_i is a random number. Both of these methods entirely ignore any information about the arriving job, the jobs already in the system, future jobs or the structure of the shop itself.

(b) Endogenous: the due dates are set internally by the scheduler as each job arrives, on the basis of job characteristics, shop status information and an estimate of the job flow time. Some of the due date assignment methods in this category (Conway, 1965) are:

(i) TWK: due dates are based on total work content:

d_i = r_i + k P_i (2.3)

where P_i is the total processing time of the i-th job.

(ii) SLK: jobs are given flow allowances that reflect equal waiting times or equal slacks.

d_i = r_i + P_i + k (2.4)

(iii) NOP: due dates are determined on the basis of the number of operations to be performed on the job:

d_i = r_i + k n_i (2.5)

where n_i is the number of operations of job i.

Due date assignment methods such as TWK, SLK and NOP take into account job characteristics in one form or another. Due date assignment methods such as JIQ, JIS, PPW and WIQ consider shop status information.

(iv) JIQ: due dates are determined based on the current queue lengths (Q_i) in the system (Eilon and Chowdhury, 1976):

d_i = r_i + k Q_i (2.6)

(v) JIS: due dates are determined based on the number of jobs in the system (JIS_i) (Weeks, 1979):

d_i = r_i + k JIS_i (2.7)

(vi) PPW: due dates are determined based on the processing time plus a waiting allowance per operation (Kanet, 1982):

d_i = r_i + P_i + k n_i (2.8)

(vii) WIQ: due dates are determined based on the total processing time (W_i) of all jobs in the work center queues on job i's routing:

d_i = r_i + k W_i (2.9)
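The job-characteristic rules above reduce to one-line computations. The following sketch follows the rule forms as commonly stated in the literature; the arrival time, work content and allowance values in the example are arbitrary.

```python
def con(r, k):
    """CON: constant flow allowance, d_i = r_i + k."""
    return r + k

def twk(r, k, p):
    """TWK: allowance proportional to total work content, d_i = r_i + k*P_i."""
    return r + k * p

def slk(r, k, p):
    """SLK: equal slack on top of the work content, d_i = r_i + P_i + k."""
    return r + p + k

def nop(r, k, n_ops):
    """NOP: allowance proportional to the number of operations, d_i = r_i + k*n_i."""
    return r + k * n_ops

# a job arriving at t = 100 with 40 minutes of work and 5 operations
due_twk = twk(100, 3.0, 40)
due_slk = slk(100, 50.0, 40)
due_nop = nop(100, 25.0, 5)
```

The contrast is visible even in this toy example: TWK scales the whole allowance with work content, while SLK adds a fixed slack regardless of it.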

2.3 Simulation Methods

Hsu and Sha (2004) observed that advances in computer technology have made simulation one of the most popular methods used in due date assignment research. Weeks (1979) used simulation to derive the following rule, based on the number of jobs in the system, which provided good due date performance:

d_i = r_i + P_i + D_bar + A(JIS_i) sigma_D (2.10)

where:
D_bar = mean delay (waiting) time per job
sigma_D = standard deviation of waiting time for the system
JIS_i = number of jobs in the entire system when job i arrives
A(JIS_i) = -1 if JIS_i <= J_bar - sigma_J; 0 if J_bar - sigma_J < JIS_i < J_bar + sigma_J; +1 if JIS_i >= J_bar + sigma_J
J_bar = mean number of jobs in the system
sigma_J = standard deviation of the number of jobs in the system

Ragatz and Mabert (1984) used a simulation study to establish that, firstly, both job characteristic and shop status information should be used to develop due date assignment rules. Secondly, the dispatching rule used to sequence jobs at work centers influences shop performance. Thirdly, information about work center congestion along a job's routing is more useful than information about general shop conditions. Fourthly, the use of more detailed information, as in the RMR rule, provides only a marginal improvement in performance over rules such as WIQ and JIQ that use more aggregate information. Based on a simulation study, Udo (1994) concluded that

(1) Using workload information in assigning due dates for jobs in multi-machine shop environments may improve shop performance in terms of the percentage of tardy jobs, mean lateness and lateness variance.

(2) The form or structure of the workload information may also affect shop performance.

(3) Using the cumulative distribution function of a shop's workload is likely to produce better performance than using proportional workload information.

(4) The SLK/SPT combination is likely to be the most desirable combination in terms of mean job lateness and percentage of tardy jobs, while the TWK/EDD combination is the most desirable in terms of lateness variance.

Philipoom (2000) carried out a simulation study to examine the trade-offs involved in changing the dispatching rule in a shop that can set due dates subject to penalties on the length of the quoted lead time and on tardiness. The results of this study indicate that the static process time rule works well for modest tardiness penalties.

2.4 Analytical Methods

Enns (1993) proposed a method that assigns due dates to jobs based on their predicted completion times. The due date of a job is the sum of its arrival time, its total processing time, and the product of its number of operations and the expected waiting time per operation:

d_i = r_i + P_i + n_i W (2.11)

where W is the expected waiting time per operation.
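The Enns relationship described above can be expressed directly; the numbers in the example are illustrative only.

```python
def enns_due_date(arrival, total_proc_time, n_ops, expected_wait_per_op):
    """Enns (1993)-style due date: arrival time + total processing time
    + (number of operations) x (expected waiting time per operation).
    The expected wait per operation rises and falls with shop load."""
    return arrival + total_proc_time + n_ops * expected_wait_per_op

# the same 6-operation job gets a longer allowance when the shop is congested
d_light = enns_due_date(100, 60, 6, 5.0)   # lightly loaded shop
d_heavy = enns_due_date(100, 60, 6, 20.0)  # heavily loaded shop
```

The load dependence enters entirely through the expected-wait term, which is re-estimated as the number of jobs in the shop changes.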

When the number of jobs in the shop increases, the expected waiting time at each machine increases; similarly, as the number of jobs in the shop decreases, the expected waiting time of an operation decreases. The dynamic total work content (DTWK) method is a modification of the TWK method in which the due date allowance factor k is determined using information about the status of the job shop at the time a job arrives (Cheng and Jiang, 1998; Sha and Liu, 2005). The dynamic due date allowance factor K_t when a job arrives at time t is determined using Little's law. If N_s denotes the number of jobs in the system, lambda the average job arrival rate and F the job flow time, then Little's law for a shop in steady state can be expressed as

N_s = lambda F (2.13)

The average flow time F_t of a job in a shop with J_t jobs, when the shop load is relatively stable, is

F_t = J_t / lambda (2.14)

F_t can also be expressed as the product of the dynamic flow allowance factor K_t and the average processing time P_bar of a job:

F_t = K_t P_bar (2.15)

Using the above two equations, K_t can be expressed as

K_t = J_t / (lambda P_bar) (2.16)

To obtain an allowance factor not less than one, the dynamic allowance factor used for due date assignment is max(1, K_t) instead of K_t. Thus, the due date of a job is determined as

d_i = r_i + max(1, K_t) P_i (2.17)

Baykasoglu and Gocken (2007) proposed ADRES (adaptive response rate exponential smoothing), an interesting variant of simple exponential smoothing. It has an important advantage over normal smoothing models because of the manner in which the smoothing constant is chosen: in ADRES smoothing there is no requirement to actually choose an alpha value. The word adaptive in its name gives a clue to how the model works; the alpha value in the ADRES model is not a single number but adapts to the data (Wilson and Keating, 2002). In a dynamic job shop, where jobs of various types enter and leave the production system continually in a random manner, the queuing time of a job (i.e., the time the job spends at work centers without being worked on, mainly because there are other jobs ahead of it) normally accounts for the major portion (more than 90% in some cases) of its lead time (Plossl, 1985; Chang, 1997). Besides being a major portion of a job's lead time, its uncertainty makes the estimation of queue waiting time all the more important, especially in such complex systems. Job i's queue waiting time at each station on its route is estimated using the ADRES technique because of its simplicity and its ability to adapt to changing circumstances. ADRES uses only two types of data for estimating future queue waiting times: the actual queue waiting times at each station and the last estimated queue waiting times at each station.
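The adaptive mechanism can be sketched using the smoothed-error ratio (the Trigg-and-Leach form of the adaptive response rate); the phi value and the sample waiting times below are illustrative.

```python
def adres(series, phi=0.2):
    """Adaptive response rate exponential smoothing sketch: alpha is
    not chosen in advance but tracks |smoothed error / smoothed
    absolute error|, so the forecast reacts quickly to level shifts
    and settles down when the series is stable."""
    forecast = series[0]
    e = m = 0.0
    forecasts = [forecast]
    for actual in series[1:]:
        err = actual - forecast
        e = phi * err + (1 - phi) * e        # smoothed (signed) error
        m = phi * abs(err) + (1 - phi) * m   # smoothed absolute error
        alpha = abs(e / m) if m else phi     # adaptive response rate
        forecast = forecast + alpha * err
        forecasts.append(forecast)
    return forecasts

# queue waiting times that jump from 10 to 30 minutes mid-stream
waits = [10, 10, 10, 30, 30, 30, 30]
f = adres(waits)
```

When the waiting time jumps, the persistent one-sided error drives alpha toward 1, so the estimate snaps to the new level instead of creeping toward it as a small fixed alpha would.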

LDP (last data point) is a special case of ADRES with the parameter alpha held constant at 1. It resembles the ADRES model except that it uses the most recent actual queue waiting time of orders to estimate the queue waiting time of the next order.

The response mapping rule (RMR), investigated by Ragatz and Mabert (1982) for the single machine environment, is given by (2.18). This rule utilizes response surface mapping procedures to identify important independent variables (X_i's) and estimate various functional rule equations.

Veral (2001) proposed a predictor model, given by (2.19), where FT_ij is the average allowance for operation i at machine j, t_ij the processing time of operation i at machine j, and rho_j the average utilization of machine j. The FT_ij equation tries to capture time-in-system variations among different jobs as a function of processing times and machine utilization levels.

2.5 Statistical Analysis

Statistical analysis uses regression methods to find the relations between order flow times and other variables. Vinod and Sridharan (2011) proposed a regression-based meta-model, based on the regression concept of Kleijnen and Standridge (1988). The due date assignment methods, the

scheduling rules and the mean inter-arrival time of jobs are the independent variables, and the performance measures are the dependent variables. The regression-based meta-models provide a concise analytical representation of the simulation model. The results obtained for the combinations of due date assignment methods and scheduling rules using these meta-models conform well with those of the simulation models.

2.6 Artificial Neural Networks

Computer intelligence, or artificial intelligence (AI), is the generic name given to a field of computer science dedicated to the development of programs that attempt to be as smart as humans in their ability to learn. Artificial neural networks (ANNs) are one of the AI techniques that have gained an important role in solving problems with extremely difficult or unknown analytical solutions (Lawrence, 1994). The ANN property of learning from examples makes ANNs powerful programming tools when domain rules are not completely certain or when some amount of inaccurate or conflicting data exists (Medsker and Liebowitz, 1994). Philipoom et al. (1994) used neural network models to forecast the order due date in a simple flow shop manufacturing system. The neural network model yielded better forecasting results than conventional DDA rules. It was observed that neural networks could outperform conventional regression-based DDA rules and were worthy of further experimentation as the methodology of choice in due date prediction.

Derya Eren Akyol (2007) observed that artificial neural networks (ANNs) have attracted much attention because of the following characteristics. By being exposed to examples of the relationship, ANNs learn and can capture complex relationships between input and output variables that are difficult or impossible to relate analytically, such as the relationship between the performance measures and the operational policy of a manufacturing system, or between job characteristics and the performance measure of a scheduling system. After learning the unknown correlation between the input and output data, they can generalize to predict or classify cases they were not exposed to. In some cases of designing manufacturing systems, ANNs are preferred to time-consuming simulation approaches. In static scheduling environments, it is possible to obtain optimal or near-optimal schedules by mathematical modeling, dynamic programming, branch and bound, or other advanced methods. But since real manufacturing environments are dynamic, flexible scheduling methods are needed to react to changes in the system over time; thus, in dynamic scheduling environments, ANNs are employed to reduce the need for rescheduling. Optimizing networks such as the Hopfield network and its extensions are involved directly in the optimization by mapping the scheduling objective functions to be optimized and the constraints of the problems onto these networks. Min et al. (1998) observed that competitive networks can detect regularities and correlations in input vectors and adapt future responses accordingly.

In recent years, besides the advantages of ANNs such as parallelism, learning, generalization capability, nonlinearity and robustness, several limitations have been perceived, such as settling into local minima, a trial-and-error parameter determination process and long learning times. To compensate for these disadvantages, hybrid systems have been proposed in which ANNs are combined with traditional heuristics, metaheuristics, evolutionary algorithms or other approaches, as well as evolutionary ANNs. Hsu and Sha (1994) modeled an ANN-based DDA rule. The model is shown in Figure 2.1.

Figure 2.1 ANN-based due date assignment model

Two types of ANN-based DDA rules were developed. The first rule, ANN-Com, adopts one neural network to predict the waiting time (including the waiting time in the pre-shop pool and the waiting time in the shop). The second rule, ANN-Sep, uses two neural networks to predict the waiting time in the PSP and in the shop separately.

2.7 Data Mining Techniques

Ozturk et al. (2006) demonstrated the use of data mining (DM), specifically the regression tree approach, for make-to-order manufacturing flow time estimation. Their work simulates three shop types (SHOP-I, SHOP-A and SHOP-V) and generates the training and test data for DM. The analysis starts with a large set of attributes reflecting static and dynamic order and shop characteristics, and an empirical scheme was developed to select a reasonably small subset of attributes having relatively high predictive power. The DM approach was compared with linear regression and three other lead time (LT) estimation methods from the literature. Their attribute selection scheme proved effective: eliminating attributes up to a certain point did not decrease the estimation quality, so the conservative sets of selected attributes were almost as successful as the full set of attributes, regardless of the shop type. It was also observed that DM was particularly successful in exploring the patterns in the data and determining the critical attributes in LT estimation. Some of the selected attributes are common to all shop types; in particular, the total process time of the order, the number of parts waiting or in process at the first machine on the order's route, and the total expected potential load of the machines on the order's route proved essential. Among the three shop types, SHOP-V is the easiest to predict with DM: for this shop, the average absolute error is below 9% of the average realized flow time with the conservative or risky attribute sets, while this value is 12-16% for the other shops. Their results indicate that the regression tree approach of DM, coupled with the proposed attribute selection scheme, outperforms the methods compared.

Among these, linear regression with selected attributes has the estimation quality closest to DM, while TWK has the worst performance. They concluded that knowledge-based approaches constitute a viable alternative and can prove more effective in estimating LT than many other models: instead of postulating a model and estimating its parameters from the data, DM focuses on learning what the data may reveal, with all its peculiarities. The study by Ozturk et al. (2006) can be complemented by its application in an actual manufacturing environment. Such an application seems feasible given the computer-integrated resource planning and shop floor control systems supported by the state-of-the-art data warehousing software available today: a shop floor control system can be adjusted to collect the necessary data, and data warehousing can be used to extract the DM attribute values from these data. This underlines the importance of systematic treatment of historical data, to be used not only for report generation but also for modeling purposes to provide decision support. DM is known to be effective with large amounts of data; in their study, simulation was used as the data source. In a real-life manufacturing system, the accumulation of sufficient data for DM may take a long time, during which the environment (part mix, demand, processes and so on) may change. Therefore, the potential of using DM with a limited amount of data may be worthwhile to explore. They observed that attribute selection is an essential part of the DM process. Most attribute selection methods in the DM literature deal with categorical attributes and are not directly applicable to continuous attributes, as in this case; there appears to be a need to adapt these methods or to develop new ones for the selection of continuous attributes.

Sha and Liu (2004) proposed a model that incorporates a data mining tool (the decision tree) into the widely practiced and studied static due date assignment method (the TWK method) to assign a suitable due date allowance factor k, based on mined scheduling knowledge, when a new order arrives. The new due date assignment method is capable of dynamically adjusting the due date allowance factor by using feedback information on critical factors available from the mined scheduling knowledge. The following objectives were set for their study:

(1) Use a data mining tool, the decision tree, to mine job scheduling knowledge regarding due date assignment in a dynamic job shop, expressed as IF-THEN rules, and use it to assign a more accurate and precise factor k of the TWK method when a job arrives, so as to improve the performance of the TWK method.

(2) Mine the job scheduling knowledge about due date assignment to assist production managers in understanding which factors are most important for predicting the job due date, and how the job due date is affected by various levels of the critical factors.

The job characteristics and shop conditions considered in the model were:

(1) Total processing time of the job
(2) Number of operations of the job
(3) Number of jobs in the work center queues on job i's routing when it is released to the shop
(4) Number of jobs in the system when the job is released to the shop
(5) Total processing time of all jobs in the work center queues on job i's routing when it is released to the shop.
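The mined knowledge takes the form of IF-THEN rules that assign k. The following sketch shows rules of that shape; the thresholds and k values are invented for illustration and are not taken from Sha and Liu's actual tree.

```python
def assign_k(total_proc_time, jobs_in_system):
    """Illustrative IF-THEN rules of the kind a decision tree might
    mine from historical data; all thresholds and k values here are
    hypothetical examples, not mined results."""
    if jobs_in_system > 40:                      # heavily loaded shop
        return 6.0 if total_proc_time > 300 else 5.0
    if jobs_in_system > 20:                      # moderately loaded shop
        return 4.0
    return 2.5                                   # lightly loaded shop

def rtwk_due_date(arrival, total_proc_time, jobs_in_system):
    """Rule-based TWK: d_i = r_i + k * P_i, with k picked by the rules."""
    k = assign_k(total_proc_time, jobs_in_system)
    return arrival + k * total_proc_time
```

The point of the rule-based form is that k is no longer a single static constant: the same job receives a larger allowance when it arrives into a congested shop.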

Based on this study, Sha and Liu (2004) concluded that:

(1) The rule-based TWK (RTWK) model clearly outperforms both the TWK and DTWK methods in all combinations of dispatching rule and shop utilization, illustrating that the RTWK model renders a more accurate job due date prediction.

(2) The relative performance of the DTWK and TWK methods depends on the dispatching rule used. When the SPT rule is used, the TWK method is better than the DTWK method; in the opposite case, the DTWK method is better than the TWK method, except for the combination of the FCFS rule and 80% shop utilization.

The findings of this study indicate that production managers should develop due date assignment methods that depend upon the characteristics of their own production system by using a data mining tool (i.e., the decision tree). By analyzing historical production data, the scheduling knowledge about due date assignment can be extracted and expressed as IF-THEN rules. This form of knowledge representation provides clear indications of which factors are most influential in predicting the job due date, and of how the job due date is affected by various levels of the critical factors. At the same time, the scheduling knowledge can significantly improve the performance of the due date assignment process.

2.8 Heuristics

Raghu and Rajendran (1995) proposed a due date setting policy using job and shop characteristics such as the work content of the job, the average flow time of previously (i.e., recently) completed jobs, and the remaining work content of the jobs presently in the shop. The due date equation can be written as:

d_i = A_i + k_1 TPT_i + k_2 F + k_3 R (2.20)

where A_i is the arrival time of job i, TPT_i is its total work content (total processing time), and k_1, k_2 and k_3 are the parameter coefficients. The average flow time of previously completed jobs (F) is actually a moving average: the flow times of the three most recently completed jobs are considered, as suggested by Vig and Dooley (1991). R is the remaining work content of all jobs present in the shop at the time of arrival of job i. Among the job characteristics, the number of operations of a job is not used, since it was found that in the shop under study many jobs have many operations in common, with some operations missing for some jobs. It was decided to use the remaining work content of jobs presently in the shop instead of the queue length at machines, since it is a better indicator of the expected congestion in the shop and is also dimensionally consistent with the other two parameters. Moreover, as indicated by Vig and Dooley (1991), the addition of more parameters does not result in a proportional increase in the accuracy of the due date prediction. The proposed rule is dynamic, since the congestion information is calculated afresh for each order. The RMS value of lateness has been used as the performance measure to be minimized (henceforth referred to as the objective function). This measure indicates the conformance to due dates and is also a surrogate measure of the variance in the deviation from the due date. The RMS value of lateness is given by the following equation:

RMS value of lateness = sqrt( (1/n) * sum over i of (C_i - d_i)^2 ) (2.21)

where C_i and d_i are the completion time and due date of job i, and n is the number of jobs. Three different methodologies are used to set the control parameters, viz., the Hooke and Jeeves pattern search algorithm, the simulated annealing algorithm and simulated annealing with regression analysis. It was concluded that the solution obtained with the simulated annealing algorithm was better than that obtained with the pattern search algorithm. As

a method of fine-tuning the search, the simulated annealing algorithm is used in combination with regression analysis. It was found that this results in an improvement in the solution obtained, as well as in the computational effort. Baykasoglu and Gocken (2009) proposed a genetic programming technique known as the Gene Expression Programming (GEP) algorithm to estimate the flow time of jobs in a multi-stage job shop. The main objective of their research was to compare the performance of GEP with previously proposed due date assignment models (DDAMs) from the literature with respect to some selected performance criteria. Based on the study, GEP performs better than the other DDAMs for all performance measures. GEP's MAPE performance, which is a measure of delivery reliability, is 8.65%. This means that, for the 8602 jobs, the due dates assigned by GEP deviated from the actual due dates by approximately 8.65% on average. For the same performance measure, ADRES showed the second best performance. All other DDAMs are worse than the GEP DDAM (all are above 12%). The MPE performances of GEP and ADRES are nearly the same but have opposite signs. This means that, on average, GEP overestimates the due dates of the orders (negative bias), while ADRES underestimates the due dates of the jobs (positive bias) on average. In spite of being static, GEP has considerably better performance than the dynamic DDAMs (i.e., ADRES, DPPW) for many performance measures.

2.9 Findings from literature review and research gaps

This study has brought forth the general framework within which research on manufacturing flow time estimation was conducted during the last five decades, with particular emphasis on developments taking place in the last two decades. This review helps in concluding that

(1) Research on flow time estimation is a distinct area of interest among researchers and is different from scheduling research. Flow time estimation research focuses on narrowing the gap between actual flow time and estimated flow time, whereas scheduling research focuses on optimizing one or more cost factors such as tardiness, lateness, etc., and mostly involves model-based research. (2) Flow time estimation has kept pace with the developments taking place in areas such as simulation, the mathematical sciences, soft computing techniques such as simulated annealing, data mining, etc. (3) Factors, which may be categorized as shop characteristics, job characteristics and system characteristics, and which may vary in number from just one (as in the TWK method) to 26 (as in the method proposed by Ozturk et al. (2006)), are used for lead time estimation. (4) Most authors rely on simulation methods for generating the data used for further analysis, and it is evident from the papers reviewed that every author makes use of a different factory setup for simulation purposes. Ozturk et al. (2006) use a comprehensive model with three distinct setups (A, V and I layouts), while Sha and Liu (2005) use a factory setup in which different jobs follow the same process but with different processing times. Sha and Liu (2005) used a simple method of classification (the WAUR method). (5) In a dynamic job shop, with the random arrival of orders and shifting shop congestion levels, it is inappropriate to conclude that any one priority rule is better than another. Instead, a look-ahead capability with the ability to choose the most appropriate priority rule would be another possible research area.

(6) With growing interest in big data, and with organizations carrying huge repositories of all kinds of manufacturing data, the author foresees a need for methods that take advantage of this data in helping organizations to quote lead times that are realistic and achievable.
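Several of the performance measures recurring in this review (the RMS lateness of Eq. 2.21 and the MAPE/MPE figures reported for GEP) have compact definitions; a minimal sketch, with function names of my own choosing:

```python
import math

def rms_lateness(completion_times, due_dates):
    """Eq. (2.21): square root of the mean squared lateness, L_i = C_i - d_i."""
    n = len(completion_times)
    return math.sqrt(sum((c - d) ** 2 for c, d in zip(completion_times, due_dates)) / n)

def mape(actual, estimated):
    """Mean absolute percentage error: average size of the deviation."""
    return 100.0 * sum(abs((a - e) / a) for a, e in zip(actual, estimated)) / len(actual)

def mpe(actual, estimated):
    """Mean percentage error: the sign exposes bias. With the error taken as
    (actual - estimated)/actual, consistent over-estimation gives a negative MPE."""
    return 100.0 * sum((a - e) / a for a, e in zip(actual, estimated)) / len(actual)

flows = [100.0, 200.0]      # actual flow times
quotes = [110.0, 220.0]     # consistently over-estimated allowances
print(mape(flows, quotes))  # approx. 10.0
print(mpe(flows, quotes))   # approx. -10.0 (negative bias, as reported for GEP)

# Jobs finishing 10 min late and 10 min early cancel out in mean lateness
# but not in RMS lateness, which is why RMS penalizes both directions:
print(rms_lateness([110.0, 90.0], [100.0, 100.0]))  # 10.0
```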

CHAPTER 3

METHODOLOGY

3.1 Data classification

The methodology adopted for this study is presented in this chapter and the next. Chapter 3 describes the previous work done by authors on the application of supervised learning methods for flow time estimation, and subsequently a new method for flow time estimation is proposed. Chapter 4 proposes a new method for flow time estimation using Self-Organizing Maps. Supervised learning discovers a pattern in the data that relates data attributes to a target (class) attribute. Classification is one of the forms of supervised learning and can be used to extract models describing important data classes or to predict categorical labels. This chapter describes data classification, the techniques used, well-known algorithms for data classification, previous work on flow time estimation using classification methods and the proposed method for flow time estimation. The overall flow of the proposed MT-TWK model is presented in Figure 3.1. Another method, based on unsupervised learning, is proposed in the next chapter. Data classification is a two-step process, as shown in Figure 3.2 (Tan et al., 2006). In the first step, a model is built describing a predetermined set of data classes or concepts. The model is constructed by analyzing database records described by attributes. Each record is assumed to belong to a predetermined class, as determined by one of the attributes, called the class label attribute. The data records are analyzed to build the model collectively from the training data set.

Run simulation model to generate case set → Transfer data to EXCEL workbook → Compute k value (TWK method) for each case instance → Create k-value file for this case set and use WEKA software to generate histogram → Assign categorical values and also compute CV → Compute values for F2, F3, F4, F5, F6 → Divide case set into training set and test set → Use training set for the learn-model phase using KNIME software → Use the induced model tree for prediction of category for the test set using KNIME software → Assign k-value for predicted category and compute predicted flow time

Figure 3.1 Overall flow of the MT-TWK model for flow time estimation

The individual records making up the training set are referred to as training samples. In the second step, the model is used for classification.

Overview of data classification techniques

There are several basic techniques for data classification. Decision tree induction, Bayesian classification and Bayesian belief networks, neural networks, and association-based classification are some of the well-known techniques. Each technique employs a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the input data.

Figure 3.2 General approach for building the classification model

The model generated by a learning algorithm should both fit the input data well and correctly predict the class labels of records it has never seen before. Hence, a key objective of the learning algorithm is to build models with good generalization capability, i.e., models that accurately predict the class labels of previously unknown records.
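Generalization capability is typically measured by holding out part of the case set and scoring the model only on records it never saw during training. A sketch using a deliberately trivial majority-class model (the names are illustrative, and a real study would use a proper classifier):

```python
from collections import Counter

def majority_class(train_labels):
    """Fit the simplest possible classifier: always predict the class
    label seen most often in the training set."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(predicted_label, test_labels):
    """Fraction of unseen test records the fixed prediction gets right:
    a crude measure of generalization."""
    return sum(1 for y in test_labels if y == predicted_label) / len(test_labels)

train = ["B", "B", "A", "B", "C"]   # labels of the training samples
test = ["B", "A", "B", "B"]         # labels of held-out test records
pred = majority_class(train)        # "B"
print(accuracy(pred, test))         # 0.75
```

The same train/test split appears later in the MT-TWK flow, where the case set is divided before the learn-model phase.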

General approach to solving classification problems

A classification problem may be solved by asking a series of questions about the attributes of the test record. Each time an answer is received, a follow-up question is asked until a conclusion is reached about the class label of the record. The series of questions and their possible answers can be organized in the form of a decision tree, which is a hierarchical structure consisting of nodes and directed edges. The tree has three types of nodes: a root node, which has no incoming edges and zero or more outgoing edges; internal nodes, each of which has exactly one incoming edge and two or more outgoing edges; and leaf (terminal) nodes, each of which has exactly one incoming edge and no outgoing edges. In a decision tree, each leaf node is assigned a class label. The non-terminal nodes, which include the root node and the other internal nodes, contain attribute test conditions to separate records that have different characteristics. Once the decision tree is constructed, it is easy to classify a test record. Starting from the root node, the test condition is applied to the record and the appropriate branch is followed based on the outcome of the test. This leads either to another internal node, for which a new test condition is applied, or to a leaf node. The class label associated with the leaf node is then assigned to the record. Efficient algorithms are available that will induce a reasonably accurate decision tree within a short time. One such algorithm is Hunt's algorithm, which is a precursor to many decision tree induction algorithms such as ID3, C4.5, and CART.
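The leaf-walking procedure just described is simple to express directly. Here a tree is a nested dictionary; the attribute names and class labels are chosen purely for illustration:

```python
# A decision tree as nested dictionaries: internal nodes name the
# attribute to test and branch on its value; leaves carry a class label.
tree = {
    "attribute": "jobs_in_system",
    "branches": {
        "low": {"label": "A"},
        "high": {
            "attribute": "bottleneck_load",
            "branches": {"light": {"label": "B"}, "heavy": {"label": "C"}},
        },
    },
}

def classify(node, record):
    """Start at the root, follow the branch matching the record's value
    for the tested attribute, and stop when a leaf assigns the label."""
    while "label" not in node:
        node = node["branches"][record[node["attribute"]]]
    return node["label"]

print(classify(tree, {"jobs_in_system": "high", "bottleneck_load": "heavy"}))  # C
```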

Hunt's Algorithm

The following is a recursive definition of Hunt's algorithm, in which D_t is the set of training records associated with node t and Y = {y_1, y_2, ..., y_c} is the set of class labels.

Step 1: If all the records in D_t belong to the same class y_t, then t is a leaf node labeled as y_t.

Step 2: If D_t contains records that belong to more than one class, an attribute test condition is selected to partition the records into smaller subsets. A child node is created for each outcome of the test condition and the records in D_t are distributed to the children based on the outcomes. The algorithm is then recursively applied to each child node.

Decision tree induction

The basic algorithm for decision tree induction is a greedy algorithm that constructs decision trees in a top-down, recursive, divide-and-conquer manner. During each iteration, the algorithm considers partitioning the training set using the outcome of a discrete function of the input attributes. After selection of the appropriate split, each node further subdivides the training set into smaller subsets, until no split yields a sufficient gain in the splitting measure or a stopping criterion is satisfied. This algorithm, more commonly known as the ID3 algorithm, is the precursor to the popular C4.5 algorithm. The decision tree induction algorithm is presented in Figure 3.3.

Extracting classification rules from decision trees

The knowledge represented in decision trees can be extracted and represented in the form of IF-THEN rules. One rule is created for each path from the root to a leaf node. Each attribute-value pair along a given path forms a conjunction in the rule antecedent (IF part). The leaf node holds the class

prediction, forming the rule consequent (THEN part). The IF-THEN rules may be easier for humans to understand, especially if the given tree is very large.

Algorithm: Generate_decision_tree
Narrative: Generate a decision tree from the given training data.
Input: The training samples, samples, represented by discrete-valued attributes; the set of candidate attributes, attribute-list.
Output: decision tree.
Method:
(1) create a node N;
(2) if samples are all of the same class C, then
(3) return N as a leaf node labeled with the class C;
(4) if attribute-list is empty then
(5) return N as a leaf node labeled with the most common class in samples; // majority voting
(6) select test-attribute, the attribute among attribute-list with the highest information gain;
(7) label node N with test-attribute;
(8) for each known value ai of test-attribute:
(9) grow a branch from node N for the condition test-attribute = ai;
(10) let si be the set of samples in samples for which test-attribute = ai; // a partition
(11) if si is empty then
(12) attach a leaf labeled with the most common class in samples;
(13) else attach the node returned by Generate_decision_tree(si, attribute-list minus test-attribute);

Figure 3.3 Basic algorithm for inducing a decision tree from training samples
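A minimal runnable sketch of the Figure 3.3 algorithm follows, using information gain for step (6) and majority voting at impure leaves. The data layout (records as dictionaries of attribute values) is an assumption of this sketch, not of the original algorithm:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of the class distribution in a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Reduction in entropy obtained by partitioning the samples on attr."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Recursive induction as in Figure 3.3: a pure node or an empty
    attribute list yields a leaf; otherwise split on the attribute with
    the highest information gain."""
    if len(set(labels)) == 1:
        return labels[0]                              # single class
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # majority voting
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    node = {"attribute": best, "branches": {}}
    for value in set(r[best] for r in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        srows, slabels = zip(*sub)
        node["branches"][value] = build_tree(
            list(srows), list(slabels), [a for a in attrs if a != best])
    return node

# Toy case set in which shop utilization alone determines the k-category:
rows = [{"util": "high", "rule": "SPT"}, {"util": "high", "rule": "EDD"},
        {"util": "low", "rule": "SPT"}, {"util": "low", "rule": "EDD"}]
print(build_tree(rows, ["C", "C", "A", "A"], ["util", "rule"]))
```

On the toy case set the algorithm splits once on `util` and returns pure leaves, since `rule` carries no information about the label.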

C4.5 Algorithm

The C4.5 algorithm proposed by Quinlan (1993) is an extended form of the ID3 algorithm with some additional enhancements, such as the ability to handle continuous attribute values and missing attribute values, alternative measures for selecting attributes, and the pruning of decision trees. It uses the gain ratio as its splitting criterion. Splitting ceases when the number of instances to be split falls below a certain threshold. Error-based pruning is performed after the growing phase.

Splitting criteria

Maimon and Rokach (2005) have identified impurity-based criteria, information gain, the Gini index, likelihood-ratio chi-squared statistics, the DKM criterion, normalized impurity-based criteria, gain ratio, distance measure, binary criteria, the twoing criterion, orthogonal criteria, the Kolmogorov-Smirnov criterion and AUC-splitting criteria as some of the well-known splitting criteria referred to in the literature. Among these, C4.5 uses the gain ratio as the splitting criterion, which is described below. The gain ratio normalizes the information gain as follows:

Gain Ratio(a_i, S) = Information Gain(a_i, S) / Entropy(a_i, S)

where Entropy(a_i, S), the split information, is the entropy of the distribution of the values of attribute a_i in S. It is computed in two stages. First, the information gain is calculated for all attributes. Next, the attribute that has the best gain ratio is selected, taking into consideration only those attributes that have performed at least as well as the average information gain. It has been shown that the gain ratio tends to outperform the simple information gain criterion, from the accuracy aspect as well as from the classifier complexity aspect.
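The normalization above can be sketched directly from counts. The function names are my own; the split information is simply the entropy of the attribute's value distribution:

```python
import math

def entropy(counts):
    """Entropy (in bits) of a discrete distribution given as counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def gain_ratio(information_gain, value_counts):
    """C4.5's gain ratio: information gain divided by the split
    information, i.e. the entropy of the attribute's value distribution.
    The normalization penalizes attributes that shatter the data into
    many small partitions."""
    return information_gain / entropy(value_counts)

# An attribute splitting 8 samples evenly in two has split information
# 1 bit, so an information gain of 0.5 gives a gain ratio of 0.5:
print(gain_ratio(0.5, [4, 4]))  # 0.5
```

An attribute with many distinct values (e.g. a serial number) has a large split information, so its gain ratio is driven down even if its raw information gain is high, which is exactly the bias the criterion is meant to correct.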

Stopping criteria

The growing phase continues until a stopping criterion is triggered. The following conditions are common stopping rules:

1. All instances in the training set belong to a single class y.
2. The maximum tree depth has been reached.
3. The number of cases in the terminal node is less than the minimum number of cases for parent nodes.

Pruning Methods

A tight stopping criterion tends to create small, under-fitted decision trees, while a loose stopping criterion tends to generate large decision trees that are over-fitted to the training set. Pruning methods are employed to produce a sufficiently accurate, compact decision tree; the accuracy of a pruned decision tree indicates how close it remains to the initial tree. Cost-complexity pruning, reduced-error pruning, minimum-error pruning, pessimistic pruning, error-based pruning, optimal pruning and minimum description length (MDL) pruning are some of the pruning methods employed. The C4.5 algorithm employs error-based pruning.

3.2 Flow time estimation using C4.5 algorithm

Sha and Liu (2004) proposed a rule-based TWK due date assignment model. They used the See5 software package (the commercial Windows version of C4.5) as the decision tree learning tool to construct a decision tree and derive a rule set from the case set, which represented the scheduling knowledge about due date assignment, for developing the rule-based TWK due date assignment method.

This may be the first attempt to use data mining for the purpose of due date assignment. The main objectives of their study were: (1) to use a data mining tool (decision tree induction) to mine job scheduling knowledge regarding due date assignment in a dynamic job shop, expressed as IF-THEN rules, in order to assign a more accurate and precise factor k of the TWK method (Eq. 2.3) when a job arrives, so as to improve the performance of the TWK method; and (2) to mine job scheduling knowledge about due date assignment to assist production managers in comprehending which factors are most important for predicting the job due date, and how the job due date is affected by various levels of critical factors. Sha and Liu (2005) considered six attributes in their study:

(1) Job type
(2) Remaining work content on the 2nd bottleneck machine for all the jobs in the shop
(3) Remaining work content on the 1st bottleneck machine for all the jobs in the shop
(4) Remaining work content on the 3rd bottleneck machine for all the jobs in the shop
(5) Number of jobs in the system when a job is released into the shop
(6) Sum of the remaining processing time for all jobs in the shop

Figure 3.4 RTWK model

Among the six attributes, attributes 1 and 2 are related to job characteristics and the remaining ones are related to shop conditions. A graphical representation of the RTWK model is given in Figure 3.4. Each case is a contextualized piece of knowledge representing an experience. Cases were collected on the input variables and the due date allowance factor k for the next 10,000 jobs to form a case set. For each of the case sets (3 dispatching rules × 2 shop utilization levels), the decision tree tool (See5) induced the specific characteristic rules for each combination of dispatching rule and shop utilization. A total of 76 rules were induced over all combinations of dispatching rule and shop utilization factor. The performance of the RTWK model was compared with the TWK model and the DTWK model (Eq. 2.17). The RTWK model appears to be the best with respect

to Mean Absolute Lateness (MAL) and Mean Squared Lateness (MSL) in all combinations of dispatching rule and shop utilization.

3.3 Flow time estimation using Regression Trees

The C4.5 algorithm is more suited to categorical response variables. For continuous response variables, regression trees are more suitable. In the regression tree approach, the basic principle of tree growth is the same. However, while a decision tree can exploit the frequency of a class at a node as the basis of its impurity measure, regression trees must employ variance figures. The basic methods in this approach are CART and M5. In the CART algorithm, the total impurity measure of an offspring node is the weighted sum of the variances of the transactions in that node. The M5 regression tree algorithm (Quinlan, 1992) is similar to CART, with two basic differences. The impurity measure in M5 is based on the sample standard deviation instead of the sample variance. Also, instead of predicting a fixed value at a terminal node, the training transactions falling into a node are used to fit a linear regression model for the node. Ozturk et al. (2006) used the regression tree approach for manufacturing lead time estimation. In their study, a total of 26 data mining attributes were identified initially. About one-fifth of the attributes identified pertained to part type characteristics; the remaining attributes pertained to shop/order characteristics. The Cubist software, which is based on M5 (Quinlan, 1992), was used as the DM tool. Cubist applies linear regression to the set of transactions classified in each terminal node. Hence, the conclusion part of each rule is a distinct linear regression model. Ozturk et al. (2006) articulated that using all 26 attributes will only add to the variance of the estimates without adding much to the model's predictive

power. A small but significant subset of attributes was selected using an attribute selection method and the Cubist runs were repeated with these selected attributes. The performance of the proposed data mining approach was compared with four other methods. The first method is linear regression, where the independent variables are restricted to the attributes in the conservative and risky sets. The second comparison is with the method proposed by Ruben and Mahmoodi (2000):

LT_i = b_0 + b_1 Q_k + b_2 (sum of Q_j over j on route R_i) + b_3 Q_k (sum of Q_j over j on route R_i) (3.3)

where the lead time (LT) estimate for an arriving order i (LT_i) is given by an expression with three elements: the number of parts waiting or in process at the bottleneck machine k (Q_k), the number of parts waiting in machine queue j on the route R_i of part i (Q_j), and the interaction between the two; b_0 to b_3 are fitted coefficients. The third method, proposed by Hopp and Sturgis (2000), was also taken up for comparison:

LT_i = ff [ mu(n_i) + z_alpha * sigma(n_i) ] (3.4)

where mu(n_i) is the mean flow time and sigma(n_i) is the estimated standard deviation of flow time, with n_i being the number of jobs in the system. The fudge factor ff is used to adjust the LT estimation dynamically to account for a service level, and z_alpha is the standard normal multiplier on the standard deviation component. The last method used for comparison was the Total Work Content (TWK) method (Eq. 2.3). Based on the simulation data and Cubist runs, and their application to each of the 10 instances of the three shops, Ozturk et al. (2006) concluded that

(a) The Cubist software implementing the regression tree algorithm was as good with the conservative attribute set as it was with all 26 attributes. (b) Shop configuration SHOP I was the most difficult to predict; in Shop I, routings are random and bottlenecks may come later in the route. (c) Linear regression with selected attributes performs 6-15% worse than Cubist with the conservative attribute set. (d) Compared with the other three methods, the performance of linear regression with selected attributes is the closest to that realized by Cubist. (e) The regression tree approach of DM, coupled with the proposed attribute selection scheme, outperforms all the methods used for comparison. Among these, linear regression with selected attributes has the closest estimation quality to the Cubist method.

3.4 Flow time estimation using Model-Tree induction

A method for flow time estimation using model tree induction is proposed in this section.

3.5 Model Tree Induction Method

The model tree induction method is proposed in this work. Classical decision-tree and decision-rule learning methods were developed in an environment in which class values, and originally attribute values too, were discrete. A technique for dealing with continuous-class learning problems, the model tree, was developed by Quinlan (1992) and embodied in a learning algorithm called M5. In the first stage, a decision tree induction algorithm is used to build a tree. Instead of maximizing the information gain at each interior node, a splitting criterion is used that minimizes the intra-subset variation in the class values

down each branch. In the second stage, consideration is given to pruning the tree back from each leaf, a technique that was pioneered independently by Breiman et al. (1984) and Quinlan (1986) and has become the standard in decision tree induction. The difference between decision-tree induction and model-tree induction is that, when pruning to an interior node, consideration is given to replacing that node by a regression plane instead of a constant value. The attributes that serve to define that regression are precisely those that participate in decisions in nodes subordinate to the current one. This algorithm was named M5' (see Wang and Witten (1997) for details). The original M5 algorithm was modified to reduce the tree size dramatically, with only a small penalty in prediction performance, leading to much more comprehensible models. The KNIME workbench (available under OGL) implements the M5' algorithm for model tree induction.

Methods and procedures

The MT-TWK method proposed in this work is based on a machine learning method proposed by Wang and Witten (1997), which provides a public domain scheme for inducing decision trees from data that involve continuous classes. The independent variables used by Sha and Liu (2005) to predict the k factor were used in this study. The bottleneck machines were characterized by the presence of large waiting lines, and these factors are used as predictor variables; the computed value k is the ratio of the actual flow time that the job experiences in the system to the total processing time of that job. The MT-TWK method cannot accept nominal values and works with categorical values only. Sha and Liu (2005) proposed a method for mapping the discrete values to categorical values,

as described below. Figure 3.5 is a sample histogram of the number of occurrences of factor k in a typical data set.

Figure 3.5 Due date allowance factor k vs the frequency distribution of occurrences

The representative value of class i (CV_i) is assigned and calculated by the weighted average method as follows:

CV_i = ( sum over j of n_ij * k_ij ) / ( sum over j of n_ij ) (3.5)

where n_ij and k_ij denote the number of occurrences of factor j in class i and the value of factor j in class i, respectively. After the target classes are generated, the factor k for every instance of the case set is transformed into a categorical target class. For example, categorical class values of A, B, C, D and E are assigned for 1 ≤ k ≤ 1.2, 1.2 < k ≤ 2.2, 2.2 < k ≤ 3.0, 3.0 < k ≤ 3.8 and k > 3.8, respectively (see Sha and Liu (2005)). The First Come First Served (FCFS), Shortest Processing Time (SPT) and Earliest Due Date (EDD) dispatching rules are used in this study, as these rules are widely used in industry as well as in academic research.

Data collection

Simulation has been commonly used to study the behavior of real-world manufacturing systems, to gain a better understanding of underlying problems and to provide recommendations to improve the systems. An enterprise will not easily open up its databases or data warehouse to a researcher unless there is an implicit trust between them, which usually takes time to develop. Two common approaches to getting around this data access problem are using the open-source data available in various repositories or generating the data by building a simulation model. In the absence of such repositories in the domain of the researcher, a simulation model is an easier way to build up models representing real-life scenarios and to enhance system performance in terms of productivity, queues, resource utilization, cycle time, flow time, etc. (Sk Ahad Ali and Hamid Seifoddini, 2006).

Job Shop Production System

A job shop production system enables the manufacture of a number of different part types such that each part type is allowed to have a unique routing through the system (Rangsaritratsamee et al., 2004; Abidi, 2009). In the

scheduling literature, job shop scheduling problems are classified into two categories: static and dynamic job shop scheduling problems (French, 1982; MacCarthy and Liu, 1993). In the static problem, there is a finite set of jobs, each consisting of a specified sequence of operations requiring processing on various machines. The dynamic job shop problem is described as follows. The job shop consists of a set of machines (work stations), and jobs of various types arrive continuously over time in a random manner. Each job requires a specific set of operations that need to be performed in a specified sequence (routing) on the machines, each involving a certain amount of processing time. The job shop thus becomes a queueing system: a job leaves one machine and proceeds along its route to another machine for the next operation, only to find other jobs already waiting for that machine to complete its current task, so that a queue of jobs forms in front of the machine. Dispatching rules are used to decide the order or priority of the jobs waiting to be processed at each machine so as to achieve the desired objectives.

3.6 Simulation Software

This research work uses the ARENA simulation software from Rockwell Automation. ARENA is built on the SIMAN simulation language. After a simulation model is created graphically, ARENA automatically generates the underlying SIMAN model used to prepare simulation runs. The ARENA template is a core collection of more than 60 modules. It was designed to provide a general-purpose collection of modeling features for all types of applications. In addition to providing core features for resources, queuing, inspection, system logic, and external file interfaces, the ARENA template provides modules specifically focused on aspects of manufacturing and material handling. Three panels compose the ARENA template: the BASIC panel,

SUPPORT panel, and TRANSFER panel. To develop a simulation model using the ARENA template, the user simply picks a module, places it in the model and is then prompted for the necessary information. For example, when placing the CREATE module, the user is prompted to enter details such as those shown in Figure 3.6.

Figure 3.6 CREATE template

After responding with the appropriate information, the user closes the dialog to accept the completed module. ARENA was designed to make creating simulation models an entirely graphical process. All system behavior is represented using graphical modules.

3.7 Job Shop Simulation Model

The purpose of the simulation model created for this research is to generate the data required for further analysis using data mining tools. The literature review reveals that authors have traditionally used the data sets published by Lawrence (1982) for testing new scheduling methods and comparing the results to establish the superiority of the proposed method. The goal of scheduling

research was the optimization of certain parameters. However, the goal of the present research is to infer knowledge from data generated using the simulation model. This research uses a (10 × 10) benchmark problem from Lawrence (1982). This problem has 10 jobs to be processed on 10 machines, each job with a unique routing and processing times. Table 3.1 provides the data for the problem instance. The inter-arrival time, generated from a negative exponential distribution, is adjusted suitably to achieve utilization levels of 75%, representing a moderately loaded shop, and 85%, representing a heavily loaded shop. The part type (a predetermined routing) is randomly assigned upon an order's arrival, with an equal probability of each part type being chosen for release into the shop. This research incorporates an order review and release (ORR) policy (Rajan Suri, 1998) of bunching similar part types to form a batch of size varying from 1 to 9 before release to the production shop. However, the batch contents are processed sequentially only at the first machine of the process sequence, and a job in the batch joins the queue at the second machine in its sequence as soon as it leaves the first machine. Allowing a job to wait until all members of the batch complete processing before moving to the next machine would increase the total waiting time and hence the flow time of the order.

Table 3.1 Job shop problem data (each entry: machine, processing time)

Job   Op1    Op2    Op3    Op4    Op5    Op6    Op7    Op8    Op9    Op10
1     5,18   8,21   10,41  3,45   4,38   9,50   6,84   7,29   2,23   1,82
2     9,57   6,16   2,52   8,74   3,38   4,54   7,62   10,37  5,54   1,52
3     3,30   5,79   4,68   2,61   9,11   7,89   8,89   1,81   10,81  6,57
4     1,91   9,8    4,33   8,55   6,20   3,20   5,32   7,84   2,66   10,?
5     5,40   1,7    5,19   9,7    7,83   3,64   6,56   4,54   8,8    2,39
6     4,91   3,64   6,40   1,63   8,98   5,74   9,61   2,6    7,42   10,15
7     2,80   8,39   9,24   4,75   5,75   6,6    7,44   1,26   3,87   10,22
8     2,15   8,43   3,20   1,12   9,26   7,61   4,79   10,22  6,8    5,80
9     3,62   4,96   5,22   10,5   1,63   7,33   8,10   9,18   2,36   6,?
10    2,96   1,89   6,64   4,95   10,23  8,18   9,15   3,64   7,38   5,8

(?: value not legible in the source.)

The following assumptions are made about the operation of the job shop:

- Pre-emption is not allowed: once an operation is started on a machine, it must be completed before another operation can begin on that machine.
- Machines never break down and are available throughout the run period. Each machine is continuously available for assignment, without significant division of the time scale into shifts or days and without consideration of temporary unavailability such as breakdown or maintenance. Machines may be idle.
- Processing times on the machines are known, finite and independent of the sequence of the jobs to be processed.
- Each job is processed through each of the 10 machines once and only once. Furthermore, a job does not become available to the next machine until processing on the current machine is completed, i.e., splitting of jobs or job cancellation is not allowed.
- In-process inventory is allowed. If the next machine in the sequence needed by a job is not available, the job waits and joins the queue at that machine.

In this study, the model was built of three main sections: job arrival, manufacturing and job disposal. The ARENA model created for this study is presented in Figure 3.7. The process of building ARENA manufacturing models is explained in detail in Tayfur Altiok and Benjamin Melamed (2010).

Figure 3.7 ARENA Model used in this study

Job arrival section

In the job arrival section, entities are created and attributes are assigned. Depending on the attributes, the batching module groups entities, and the batch size is randomly assigned a number between 1 and 9. Once a batch is ready, it is routed to the first machine in the sequence assigned for that job type. Machining sequences are taken from the SEQUENCE module of ARENA. The following parameters are recorded at this moment:
(1) Entity serial number
(2) Part type
(3) Job arrival time
(4) Number of jobs waiting in front of all machines
(5) Cumulative processing time of all jobs waiting in front of all machines

Manufacturing section

Ten manufacturing cells with one machine each are created.

Disposal section

In this section, the entities created are disposed of and the following data are captured:
(1) Entity serial number
(2) Job exit time

The flow time of the job is computed (= Job exit time − Job arrival time) and added to the file against each entity number.

Table 3.2 Simulation Output (Sample). Columns: Serial Number; Job Type; Entry Time (in minutes); Number of Jobs in Queues; Work Content of Jobs in Queues (in minutes); Flow Time (in minutes).

A sample set of the data generated is presented in Table 3.2.

Validation and verification

The model was validated by running a test simulation with the inter-arrival time of jobs set slightly higher than the maximum cumulative processing time over all jobs. The underlying assumption is that if there is no waiting time, the cumulative processing time for each job type should equal the flow time for the job. The computed flow time for each job was tallied against the manually computed cumulative processing time for each job, and thus the model was validated.

Simulation Output

Ten replications of the experiment were carried out for each of the three dispatching rules (First Come First Served, Shortest Processing Time and Earliest Due Date), and the experiment was repeated for the 75% and 85% utilization levels. About 330,000 records were created and used for this study.

Figure 3.8 Proposed MT-TWK Model
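The no-waiting validation check described above can be sketched as follows. This is an illustration only, not the ARENA model itself; the routes and records below are made-up values, not the thesis data.

```python
# Validation sketch: with the inter-arrival time set above the largest total
# processing time, no job ever waits, so each job's flow time must equal the
# cumulative processing time along its route.

def validate_no_wait(records, routes):
    """records: list of (job_type, arrival, exit); routes: job_type -> list of op times."""
    for job_type, arrival, exit_time in records:
        flow_time = exit_time - arrival
        expected = sum(routes[job_type])   # cumulative processing time
        if abs(flow_time - expected) > 1e-9:
            return False
    return True

# Hypothetical two-job check (all numbers invented for illustration).
routes = {1: [18, 21, 41], 2: [57, 16]}
records = [(1, 0.0, 80.0), (2, 100.0, 173.0)]
print(validate_no_wait(records, routes))   # True
```

A record whose flow time deviates from its route's total processing time would make the check fail, signalling a modelling error.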

A graphical representation of the proposed MT-TWK method is presented in Figure 3.8.

Processing Using KNIME

This work uses the KNIME workbench, specifically its decision tree learner and decision tree predictor nodes. The workflow diagram is presented in Figure 3.9. The approach used by Sha and Liu (2005) was based on the C4.5 algorithm, whereas the approach used in this study is based on the M5' algorithm, and is therefore different.

Figure 3.9 KNIME workflow diagram

The sample data for the learning mode is presented in Table 3.3. The input data for the KNIME workbench consists of one order characteristic and five shop characteristics, besides the computed k-factor and the class assigned using the scheme described by Sha and Liu (2005). The model tree induced by KNIME is presented in Figure 3.10.

Figure 3.10 Model Tree generated using KNIME workbench (Partial)

The data set for a given replication is divided into two parts in the ratio 2:1. The two-thirds part is used for learning and the one-third part for testing. The data set is stored in CSV format. The algorithm generates the predicted categorical class for each set of predictor variables. The k value (see Eq. 2.3) assigned to the predicted class is the CV corresponding to that categorical class. The predicted class based on model tree learning is juxtaposed with the assigned class for a sample set in Table 3.4. The flow time is calculated with the class value, and a sample set of values with input data, predicted class and corresponding flow time is given in Table 3.5.
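The learn/predict flow just described can be sketched as below. This is a rough analogue only: scikit-learn's tree classifier stands in for KNIME's decision tree learner/predictor nodes, and the feature values and class-to-CV mapping are invented for illustration.

```python
# Sketch of the MT-TWK flow: 2:1 learn/test split, class prediction handled
# by the algorithm, then flow time = (class CV) x (total work content).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 6))                      # stand-ins for features F1..F6
y = np.digitize(X[:, 0], [0.33, 0.66])        # pretend classes 0=A, 1=B, 2=C

split = 2 * len(X) // 3                       # 2:1 learn/test split
clf = DecisionTreeClassifier(max_depth=4).fit(X[:split], y[:split])
pred_class = clf.predict(X[split:])           # predictor module's role

class_value = {0: 1.5, 1: 2.0, 2: 2.5}        # hypothetical CV (k) per class
twk = rng.uniform(200, 400, size=len(pred_class))   # total work content
flow_time = np.array([class_value[c] for c in pred_class]) * twk
```

The point of the sketch is the automation: no human consults a rule list, since the predictor assigns the class directly.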

Table 3.3 Sample input data for MT-TWK Model (learning mode). Columns: Serial Number, F1, F2, F3, F4, F5, F6, Computed k-factor, Assigned Class. The assigned classes for the sample records are B, C, B, B, A, C, A, E, E, E, E, B, B, B, C.

Legend
F1 Job type
F2 Work content of jobs at the 2nd bottleneck machine
F3 Work content of jobs at the 1st bottleneck machine
F4 Work content of jobs at the 3rd bottleneck machine
F5 Number of jobs waiting in all queues in the shop
F6 Sum of remaining processing times of all jobs in the shop

Note:
1. Serial number indicates the record serial number.
2. Jobs with the same process sequence are combined / batched (but not merged) while entering the shop.

In the RTWK model proposed by Sha and Liu (2005), a rule set was generated from the decision tree. It may be observed that, in certain cases, several of the rules are applicable; to overcome this problem, the rules are sorted by confidence. The rule that reduces the error rate the most appears first and the rule with the lowest confidence appears last. The first rule that covers the new order is applied and the rule consequent is used to arrive at the due date. There is also a default class that is used when none of the rules apply. In the method proposed in this study, the prediction of the target class is handled by the algorithm built into the software. When a new order arrives, a new input case consisting of the predictor variables is appended to the test data set; the algorithm predicts the categorical class, and the class value is used to arrive at the flow time.

Table 3.4 Sample output with assigned class and predicted class. Columns: Serial Number, F1, F2, F3, F4, F5, F6, Computed k-factor, Assigned Class, Predicted Class.

Table 3.5 Sample output with predicted flow time. Columns: Serial Number, F1, F2, F3, F4, F5, F6, Computed k-factor, Assigned Class, Predicted Class, k-factor for the predicted class, MT-TWK Flow time.

Advantages of Model Tree Induction

The C4.5 algorithm uses an entropy-based measure known as information gain as a heuristic for selecting the attribute that will best separate the samples into individual classes. A branch is created for each known value of the test attribute, and the samples are partitioned accordingly. The algorithm applies the same process recursively to form a decision tree for the samples at each partition. The knowledge represented in decision trees can be extracted and represented in the form of IF-THEN rules. The method proposed by Sha and Liu (2005) generated a total of 76 rules, and the decision maker is expected to use the most appropriate rule to assign a value for the k-factor in the conventional TWK method.

In the regression tree algorithm M5, while pruning the tree to an interior node, consideration is given to replacing that node by a regression plane instead of a constant value. The attributes that serve to define that regression are precisely those attributes that participate in decisions at nodes subordinate to the current one. The rules generated have a regression equation as consequent, and the decision maker uses that equation for predicting the flow time. Against this background, a research gap was identified: the effort involved in either using the IF-THEN rules or using regression equations for estimating flow times could be eliminated, with the possibility of automating the decision-making process. The M5' algorithm proposed by Wang and Witten (1996) adopts a modified approach of dropping attributes from the regression equation if doing so improves the error estimates. The approach used by Ozturk et al. (2006) of explicitly creating conservative sets and risky sets by using an attribute selection scheme is eliminated in M5'.
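C4.5's attribute-selection heuristic can be shown in miniature. The labels and the candidate partition below are toy values, not the thesis data; the point is only how information gain rewards a split that separates the classes.

```python
# Information gain = entropy of the parent node minus the weighted entropy
# of the child partitions induced by one attribute's values.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """groups: a partition of `labels` induced by one attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

labels = ['A', 'A', 'B', 'B', 'B', 'C']
perfect = [['A', 'A'], ['B', 'B', 'B'], ['C']]   # perfectly informative split
print(information_gain(labels, perfect))          # equals entropy(labels)
```

A split that leaves the labels mixed earns a lower gain; a non-split (one group) earns zero, which is why C4.5 branches on the highest-gain attribute.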

The KNIME workbench implementation of M5' was used in this study. The most significant advantage of this approach is that the prediction of the target class is handled by the algorithm. A rule set as in C4.5 (See5) or regression equations as in M5' (Cubist) are not explicitly generated. The predictor module uses the generated model tree for prediction of the target class, thus enabling automation of the decision process. Thus, the present work is a significant contribution to the extant literature on this topic. The use of open-source software for this purpose also demonstrates the possibility of using machine learning for decision making in the SME sector, which may not be able to afford commercial software such as See5/Cubist. The experimental results for MAL performance at 75% and 85% shop utilization are presented below.

Table 3.6 Experimental results with respect to mean absolute lateness (MAL, in minutes). Rows: TWK, DTWK, MT-TWK; columns: dispatching rules EDD, SPT and FCFS at utilization rates of 75% and 85%.

The complete results of the experiments conducted are presented in Chapter 5.

CHAPTER 4
FLOW TIME ESTIMATION USING CLUSTERING METHODS

4.1 Unsupervised Learning

In Chapter 3, a supervised learning method, MT-TWK, was proposed. In this chapter, a novel method for estimating manufacturing flow time using an unsupervised learning technique, more specifically Self-Organizing Maps (SOM), is presented. A supervised machine learning algorithm learns a function that maps an input x into an output y. An unsupervised algorithm looks for natural groupings or clusters within the x's without requiring the y's. In other words, a supervised algorithm derives a mapping function from x to y so as to accurately estimate the y's corresponding to new x's, whereas an unsupervised algorithm employs predefined distance/similarity functions to map the distribution of the input x's. For historical reasons, clustering is often considered synonymous with unsupervised learning.

4.2 Clustering: Problem Definition

Assume that A is a finite set of n data objects a_i. Set A has to be partitioned into k disjoint subsets A_j. The quality of the partitioning is judged on two criteria:
(a) Distances between data objects belonging to the same cluster A_j are as small as possible.
(b) Distances between different clusters A_j are as large as possible.

The overall flow of the proposed SOM-RA model (Figure 4.1) is as follows:
1. Run the simulation model to generate the case set.
2. Transfer the data to an EXCEL workbook.
3. Compute AFT, RWK and TWK for each case instance.
4. Divide the case set into a training set and a test set.
5. Use TANAGRA for clustering the training data.
6. Group the data cluster-wise.
7. Use WEKA software to generate regression equations for each cluster.
8. Use TANAGRA for clustering the test data.
9. Based on the cluster identifier, use the corresponding regression equation to compute the flow time.

Figure 4.1 Overall flow of the SOM-RA Model

Distance measures

The similarity among cluster members is measured using a distance measure. Several measures have been proposed; some of the more prominent ones are given below.

Euclidean distance measure

This is the most common metric used.

d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}    (4.1)

Pearson distance measure

This measure divides each dimension by its variance s_i^2, giving a balanced inclusion of all dimensions.

d(x, y) = \sqrt{\sum_{i=1}^{m} \frac{(x_i - y_i)^2}{s_i^2}}    (4.2)

Mahalanobis distance measure

This method takes correlation between the variables into account (S is the covariance matrix).

d(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}    (4.3)

Rectilinear distance measure

In this method the distance between two points is the sum of the absolute differences of their coordinates.

d(x, y) = \sum_{i=1}^{m} |x_i - y_i|    (4.4)
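The four measures above can be written down directly. The vectors below are toy two-dimensional points; `s2` and `S` are a hypothetical per-dimension variance vector and covariance matrix.

```python
# Distance measures of Eqs. (4.1)-(4.4), sketched with NumPy.
import numpy as np

def euclidean(x, y):                     # Eq. (4.1)
    return np.sqrt(np.sum((x - y) ** 2))

def pearson(x, y, s2):                   # Eq. (4.2): variance-weighted
    return np.sqrt(np.sum((x - y) ** 2 / s2))

def mahalanobis(x, y, S):                # Eq. (4.3): accounts for correlation
    d = x - y
    return np.sqrt(d @ np.linalg.inv(S) @ d)

def rectilinear(x, y):                   # Eq. (4.4), a.k.a. Manhattan distance
    return np.sum(np.abs(x - y))

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(x, y), rectilinear(x, y))   # 5.0 7.0
```

With unit variances the Pearson distance reduces to the Euclidean one, and with an identity covariance matrix so does the Mahalanobis distance.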

Minkowski distance measure

d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^p \right)^{1/p}    (4.5)

Distance between clusters

The distance between two clusters is measured by any of the following:
- Minimum distance between members of the two clusters:
  d(A_i, A_j) = \min_{x \in A_i,\, y \in A_j} d(x, y)    (4.6)
- Maximum distance between members of the two clusters:
  d(A_i, A_j) = \max_{x \in A_i,\, y \in A_j} d(x, y)    (4.7)
- Distance between the centroids of the two clusters:
  d(A_i, A_j) = d(\bar{x}_i, \bar{x}_j)    (4.8)

4.3 Hierarchical Clustering vs Non-Hierarchical Clustering

Hierarchical clustering techniques group data items according to some measure of similarity in a hierarchical fashion. Non-hierarchical or partitional clustering methods try to directly divide the data into a set of disjoint clusters, in such a way that the intra-cluster distance is minimized and the inter-cluster distance is maximized.

4.4 K-means Clustering

This algorithm classifies a given data set into a certain number of clusters (say K clusters) fixed a priori. K centers are then defined, one for each cluster.
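The k-means procedure just introduced can be sketched as a Lloyd iteration. This is a minimal illustration with made-up one-dimensional data and K = 2, not the thesis implementation.

```python
# Minimal k-means: assign each point to the closest centroid, recompute
# centroids, and stop when the centroids no longer change.
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point joins its closest centroid's cluster.
        labels = np.argmin(np.abs(points[:, None] - centroids[None, :]), axis=1)
        new = np.array([points[labels == j].mean() for j in range(k)])
        if np.allclose(new, centroids):   # convergence: centroids settled
            break
        centroids = new
    return labels, centroids

points = np.array([1.0, 1.1, 0.9, 10.0, 10.2, 9.8])
labels, centroids = kmeans(points, 2)
```

On this toy data the two obvious groups are recovered with centroids near 1.0 and 10.0, regardless of which points seed the centroids.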

Each point is then assigned to the closest center, and the points assigned to each center form a cluster. The basic k-means algorithm is defined below.

Step 1 Select K points as initial centroids.
Step 2 Repeat: form K clusters by assigning each point to its closest centroid, and recompute the centroid of each cluster.
Step 3 Until the centroids do not change.

Figure 4.2 K-means Algorithm

This relatively efficient algorithm is fast, robust and easy to understand. Its major drawback is that it requires a priori specification of the number of cluster centers.

4.5 Hierarchical Clustering

(i) Agglomerative hierarchical clustering algorithm
(ii) Divisive hierarchical clustering algorithm

These two algorithms are exactly the reverse of each other. A hierarchical clustering is often displayed graphically using a tree-like diagram called a dendrogram. A dendrogram displays both the cluster/sub-cluster relationships and the order in which the clusters are merged (agglomerative view) or split (divisive view). The agglomerative algorithm merges the data one pair at a time on the basis of the nearest distance; after each merge, the pairwise distances between clusters are recalculated, and clusters are formed based on any of the following linkage methods:
(1) Single (nearest) distance
(2) Complete (farthest) distance

(3) Average distance
(4) Centroid distance
(5) Minimization of the sum of squared Euclidean distances (Ward's method)

Compared to k-means clustering, no a priori information about the number of clusters is required.

4.6 Density-Based Clustering

Density-based spatial clustering of applications with noise (DBSCAN) is the most widely used density-based algorithm. The density at a particular point in the data set is estimated by counting the number of points within a specified radius (Eps) of that point. Points are classified as core, border or noise points according to whether they lie in the interior of a dense region, on the edge of a dense region, or in a sparsely occupied region. The basic DBSCAN algorithm is as follows:
1. Label all points as core, border or noise points.
2. Eliminate noise points.
3. Put an edge between all core points that are within Eps of each other.
4. Make each group of connected core points into a separate cluster.
5. Assign each border point to one of the clusters of its associated core points.

DBSCAN does not require knowing the number of clusters in a dataset a priori, as opposed to k-means.
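The five DBSCAN steps above can be followed literally in a short sketch. The one-dimensional data, Eps and the minimum neighbour count are arbitrary toy choices.

```python
# DBSCAN per the five listed steps: find core points, connect nearby core
# points into clusters, attach border points, leave noise unlabelled (-1).
import numpy as np

def dbscan(points, eps, min_pts):
    n = len(points)
    dist = np.abs(points[:, None] - points[None, :])
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])   # step 1

    labels = np.full(n, -1)          # -1 = noise (step 2: never assigned)
    cluster = 0
    for i in range(n):               # steps 3-4: connected core components
        if core[i] and labels[i] == -1:
            labels[i] = cluster
            stack = [i]
            while stack:
                j = stack.pop()
                for k in neighbors[j]:
                    if core[k] and labels[k] == -1:
                        labels[k] = cluster
                        stack.append(k)
            cluster += 1
    for i in range(n):               # step 5: border points join a core's cluster
        if not core[i] and labels[i] == -1:
            for k in neighbors[i]:
                if core[k]:
                    labels[i] = labels[k]
                    break
    return labels

points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0])
labels = dbscan(points, eps=0.5, min_pts=3)
```

Here the two dense groups form two clusters and the isolated point at 20.0 is flagged as noise, with no cluster count supplied in advance.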

4.7 Self-Organising Maps

The SOM is a topologically based unsupervised clustering algorithm (Kohonen 1990). SOM constructs prototypical descriptions of the dataset in a spatially structured format. The resulting clusters are traditionally positioned on a two-dimensional map, where neighbors on the map are similar prototypes representing the cluster centers for subsets of the data. During training, data points lying near each other in the input space are mapped onto nearby neurons on the two-dimensional map. Thus, the SOM can be interpreted as a topology-preserving mapping from the input space onto the map grid.

The SOM is trained iteratively. At each training step, a sample vector x is randomly chosen from the input data set and the distances between x and all prototype vectors m_i are computed. The best matching unit (BMU), denoted here by b, is the neuron whose prototype is closest to x:

\|x - m_b\| = \min_i \|x - m_i\|    (4.9)

Next, the prototype vectors are updated. The BMU and its topological neighbors are moved closer to the input vector. The update rule for the prototype vector of unit i is

m_i(t+1) = m_i(t) + \alpha(t)\, h_{bi}(t)\, [x(t) - m_i(t)]    (4.10)

where t denotes time and α(t) is the learning rate factor (0 < α(t) < 1). The neighbors of the winning neuron are also adjusted, but the adjustment decreases as the distance from the winning neuron in the output grid

increases. This adjustment is governed by the neighborhood function h_{bi}(t), which often takes the Gaussian form

h_{bi}(t) = \exp\left( -\frac{\|r_b - r_i\|^2}{2\sigma(t)^2} \right)    (4.11)

where r_b and r_i denote the positions of neurons b and i on the map and σ(t) is the neighborhood radius, a monotonically decreasing function over the iterations. The Gaussian form promotes a globally well-ordered map, so that the quantization error tends toward a global rather than a local minimum.

Variance and min-max normalization are two commonly used normalization methods. In the variance method, the mean of each attribute is subtracted from its values and the result is divided by the standard deviation of the attribute, so that each transformed attribute has zero mean. In min-max normalization, the minimum value of the attribute is subtracted from each value and the result is divided by the range of the attribute.

SOM has certain distinct advantages, such as the following:
- its ability to deal with high-dimensional data
- its ability to represent non-linear relationships
- it does not assume a priori knowledge of the data distribution, and
- it preserves the topology of the multi-dimensional space

4.8 Estimation of Flow Time Using SOM

Mangiameli, P. et al. (1996) compared the performance of SOM and seven hierarchical clustering methods on 252 data sets with various levels of imperfections, including data dispersion, outliers, irrelevant variables, and non-

uniform cluster densities. It was observed that SOM can improve the quality of decisions that require cluster analysis, such as market segmentation, credit analysis, quality problems and operations problems. It was also pointed out that SOM is easy to use and that the analyst need not have extensive experience with neural networks. The SOM network is also not sensitive to its starting conditions, and cluster results are therefore repeatable. It was demonstrated that cluster results are not sensitive to the initial network learning coefficient; consistent cluster results were produced with learning rates varying from 0.05 upward.

Conceptual framework for the proposed method

Past researchers in the area of flow time prediction have attempted to use a macro modeling framework covering the entire set of jobs, irrespective of their characteristics. In other words, disparate and heterogeneous jobs share the same prediction model with the same regression coefficients for the independent variables. Although such models have been generally successful, a single macro model may not allow an understanding of how the system changes state and which variables are important under different conditions. Even if a macro model has good predictive behavior, a set of micro models associated with each state may produce a more precise model for prediction. It is for this reason that this work uses SOM (perhaps for the first time) to initially cluster the training data and then construct local linear regression models for each cluster. This method will be referred to as SOM-RA.

Macro modeling approaches used for flow time prediction

The total work content (TWK) method is widely used in practice and in previous studies (Conway (1965), Baker and Bertrand (1981), Baker (1984)). In

this method, the due date of each job is set equal to the sum of the job arrival time and a multiple of the due date allowance factor k with the total processing time of job i:

d_i = r_i + k P_i    (4.12)

where d_i is the due date of job i, r_i is the arrival date of job i and P_i is the total processing time of job i along its route.

Cheng et al. (1998) modified the TWK method to provide a more accurate estimation of job flow time. They reasoned that when the shop load is heavy a relatively larger flow time allowance (k) should be assigned to an arriving job, and a smaller allowance should be assigned if the shop load is light. Since k is dynamically adjusted to reflect the shop condition when a job arrives, this method is known as the dynamic work content (DTWK) method:

d_i = r_i + k_t P_i    (4.13)

where k_t denotes the real level of tightness at the time t when the new job arrives.

Bertrand (1983) proposed a due date assignment rule which considered the total work content of the job and the total remaining work content of the jobs presently in the shop. Raghu and Rajendran (1995) proposed a dynamic due date setting rule which added the average flow time of the three most recently completed jobs to the rule proposed by Bertrand (1983). Both observed the performance of regression-based methodology to be quite effective in predicting the flow time of jobs. While Bertrand (1983) considered the processing time of the job (denoted by TWK) and the total remaining work content of all the jobs in the system (denoted by RWK), Raghu and Rajendran (1995) also included the average flow

time for the three recently completed jobs (denoted by AFT) and observed superior performance. Therefore, in this work, the factors considered by Raghu and Rajendran (1995) are used. Mathematically expressed,

TWK = \sum_{i} P_i    (4.14)

AFT = \frac{1}{3} \sum_{j \in S(i)} F_j    (4.15)

RWK = \sum_{j \in J(i)} \sum_{i} t_{ij}    (4.16)

where P_i is the processing time of operation i, F_j is the flow time of job j, S(i) is the set of the last three jobs completed in the system, t_ij is the processing time of operation i of job j and J(i) is the set of jobs in the system when the new job enters the system.

However, the macro models described above may not allow an understanding of how the system changes state and which variables are important under different conditions. In particular, correlation analysis with a macro model can often lead to a poor or misleading understanding of how a variable interacts with the dependent variable. For example, if a variable is negatively correlated with the dependent variable for some states of the system and positively correlated for other states, the resulting macro model may determine that there is no strong correlation either way. Although a macro model can capture overall trends, when a system behaves differently under different states, a set of micro models associated with each state may produce more valid interpretations of the underlying system behavior. In this work, an attempt is made to use SOM to initially cluster the training data and then construct local linear regression models for each cluster. This method, referred to as SOM-RA, will be compared with TWK and DTWK for performance.
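The three predictors of Eqs. (4.14)-(4.16) can be computed directly for a newly arriving job. The shop state below is entirely illustrative; all numbers are made up.

```python
# TWK, AFT and RWK for a new arrival, per Eqs. (4.14)-(4.16).

def twk(route_times):                 # Eq. (4.14): total work content
    return sum(route_times)

def aft(last_three_flow_times):       # Eq. (4.15): avg flow time of last 3 jobs
    return sum(last_three_flow_times) / 3

def rwk(jobs_in_system):              # Eq. (4.16): remaining work content
    return sum(sum(remaining_ops) for remaining_ops in jobs_in_system)

route = [18, 21, 41, 45]                      # processing times along the route
recent = [230.0, 260.0, 245.0]                # flow times of last 3 completions
in_system = [[30, 25], [40], [10, 10, 10]]    # remaining op times per job

print(twk(route), aft(recent), rwk(in_system))   # 125 245.0 125
```

These three values form the feature vector that is later clustered and regressed on.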

Methods and procedures

The simulation output is used to derive the values of average flow time (AFT) and remaining work content (RWK). The total processing time (TWK) of each job is computed from the (10 × 10) job shop problem instance (Lawrence, 1982). A data file in text format (.txt) containing records with TWK, RWK and AFT fields is generated and used as input to the Tanagra software for SOM clustering.

Simulation runs were carried out for shop utilization rates of 75% and 85% with the FCFS, SPT and EDD priority rules. Six experiments were carried out for 50,000 simulated minutes for each replication (a total of 10 replications for each experiment), and the data generated for the first 5,000 minutes were discarded to allow the system to stabilize. About 5,000 jobs at the 75% and 6,000 jobs at the 85% utilization rate were generated for each replication. The simulation output was used to calculate the value of the remaining work content and the average flow time for each job. A sample of the data set used in this study is presented in Table 4.1.

Tanagra Data Mining Software was used for the SOM clustering of the above data. Figures 4.3, 4.4 and 4.5 depict screen shots of the Tanagra software during the clustering process. A graphical representation of the SOM-RA model is presented in Figure 4.6. Tanagra has an inbuilt facility for outlier detection and the same was used in this study. The parameters for Grubbs' test were set at 0.05 for the p-value and 3 for the multiples of sigma. The parameters for SOM clustering were set at 6 for the number of neurons and 0.2 for the learning rate. The variance method was chosen for distance normalization.
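The SOM update of Eqs. (4.9)-(4.10) with parameters like those chosen above (6 neurons, learning rate 0.2) can be sketched as follows. The data are synthetic stand-ins for variance-normalized (TWK, RWK, AFT) records, and the fixed neighbourhood radius is an assumption; Tanagra's exact training schedule is not reproduced here.

```python
# One-pass SOM training on a 3x2 map: find the BMU, then pull the BMU and
# its grid neighbours toward the sample with a Gaussian neighbourhood.
import numpy as np

rng = np.random.default_rng(0)
grid = np.array([(r, c) for r in range(3) for c in range(2)], dtype=float)
proto = rng.normal(size=(6, 3))              # one prototype vector per neuron
data = rng.normal(size=(500, 3))             # z-scored (TWK, RWK, AFT) stand-ins

alpha, sigma = 0.2, 1.0                      # decayed over time in practice
for x in data:
    b = np.argmin(np.linalg.norm(proto - x, axis=1))     # Eq. (4.9): BMU
    d2 = np.sum((grid - grid[b]) ** 2, axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))                   # Gaussian h_bi
    proto += alpha * h[:, None] * (x - proto)            # Eq. (4.10)

# Cluster identifier for a record = index of its best matching unit.
labels = np.argmin(np.linalg.norm(data[:, None] - proto[None], axis=2), axis=1)
```

The final `labels` play the role of Tanagra's cluster identifier, which downstream steps use to pick a regression equation.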

Table 4.1 Sample input data for SOM clustering software. Columns: Total Work Content (TWK) (in minutes), Average Flow Time (AFT) (in minutes), Remaining Work Content (RWK) (in minutes).

Tanagra creates an output with TWK, RWK, AFT and a cluster identifier. A sample output of the Tanagra software is presented in Table 4.2.

Figure 4.3 Tanagra workflow diagram
Figure 4.4 SOM parameters used in this study
Figure 4.5 MAP topology and quality

The actual flow time data is added to this data file from the simulation

output. Two-thirds of this dataset is taken for deriving regression equations for each cluster, after adding the actual flow time data for each record from the simulation output. Weka 3, a data mining software, is used to create separate linear regression equations establishing the relationship between the flow time (predicted value) and TWK, AFT and RWK. The mean, minimum value, maximum value and standard deviation of the three input parameters for all six clusters identified by the SOM clustering algorithm, along with the regression equations, are presented in Table 4.3.

Figure 4.6 SOM-RA Model
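The per-cluster regression step can be sketched as below. NumPy least squares stands in for Weka's linear regression; the data, cluster rule and coefficients are all synthetic, chosen so the per-cluster fits are exactly recoverable.

```python
# SOM-RA in miniature: fit a separate linear model per cluster on the
# two-thirds training split, then dispatch each test case to its cluster's
# equation to predict flow time.
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(100, 400, (300, 3))            # (TWK, RWK, AFT) stand-ins
cluster = (X[:, 1] > 250).astype(int)          # pretend cluster identifier
# Different "shop states": flow time depends on the features differently.
coef = {0: np.array([1.2, 0.1, 0.3]), 1: np.array([1.0, 0.6, 0.1])}
y = np.array([X[i] @ coef[cluster[i]] for i in range(len(X))])

split = 2 * len(X) // 3                        # two-thirds learn, one-third test
models = {}
for c in (0, 1):
    mask = cluster[:split] == c
    models[c], *_ = np.linalg.lstsq(X[:split][mask], y[:split][mask], rcond=None)

pred = np.array([X[i] @ models[cluster[i]] for i in range(split, len(X))])
mal = np.mean(np.abs(pred - y[split:]))        # mean absolute lateness
```

Because each cluster's flow time follows its own linear law here, the per-cluster fits recover the coefficients and the prediction error is essentially zero, which is the intuition behind micro-models per shop state.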

Table 4.2 Sample output data from SOM clustering software. Columns: Total Work Content (TWK) (in minutes), Average Flow Time (AFT) (in minutes), Remaining Work Content (RWK) (in minutes), Cluster_Id. The cluster identifiers are of the form c_som_r_c (e.g., c_som_2_1).

Table 4.3 Cluster characteristics and flow time prediction equations. For each of the six clusters, the table reports the mean, smallest value, largest value and standard deviation of TWK, RWK and AFT, together with the flow time prediction equation fitted for that cluster.

The set of six regression equations (refer Table 4.3) has been used to predict the flow time for the remaining one-third of the data in the original data set.

4.9 Summary

The proposal to use SOM for clustering and to develop a set of micro-models associated with each state was made to emphasize the fact that a job entering a shop for processing does not encounter uniform shop conditions. Shop conditions change continuously depending on factors such as the number of jobs waiting in front of the machines and the job types. A study of the regression equations developed for the six clusters, presented in Table 4.3, corroborates this fact. The regression equation for cluster 3 does not take into account the two factors average flow time (AFT) and remaining work content (RWK); it is essentially the same as the TWK model. This supports the assertion that a single macro model may not allow an understanding of how the system changes state and which variables are important under different conditions. The coefficients of TWK, RWK and AFT vary across the clusters, indicating the change in the weightages of these factors according to the system state. These findings corroborate the assertion in this thesis that a set of micro-models will have better prediction capability than a single regression equation that is insensitive to changes in system state. The results pertaining to flow time prediction are presented in Chapter 5.

CHAPTER 5
RESULTS AND DISCUSSION

5.1 Performance Measures

In this chapter, the results obtained from the experiments conducted for the proposed model tree induction method and the SOM-based method are presented, together with a comparative study of the proposed MT-TWK and SOM-RA methods. The objective of this work is to evaluate the effectiveness and robustness of the proposed MT-TWK model and the SOM-based model in a simulated job shop. Mean absolute lateness and mean squared lateness are used in this study for performance measurement. Vig and Dooley (1993) define the accuracy of an estimate as the closeness of the individual estimates to their true values, and precision as the variability of the prediction errors. Sabuncuoglu and Comlekci (2002), Sha and Liu (2005) and Ozturk et al. (2006) used mean absolute lateness (MAL) as a measure of accuracy and mean squared lateness (MSL) as a measure of precision.

1. Mean Absolute Lateness (MAL)

MAL = \frac{1}{n} \sum_{i=1}^{n} |c_i - d_i|    (5.1)

A smaller MAL indicates better due date prediction capability; it is the sum of mean earliness and mean tardiness.
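Both performance measures named above (MAL, and its companion MSL) can be computed directly from completion times and promised due dates. The values below are toy numbers for illustration.

```python
# Mean absolute lateness and mean squared lateness over n orders, where
# c = completion times and d = promised due dates.

def mal(c, d):          # accuracy: mean absolute lateness
    return sum(abs(ci - di) for ci, di in zip(c, d)) / len(c)

def msl(c, d):          # precision: mean squared lateness
    return sum((ci - di) ** 2 for ci, di in zip(c, d)) / len(c)

completion = [105.0, 98.0, 120.0]
due = [100.0, 100.0, 110.0]
print(mal(completion, due), msl(completion, due))   # 5.666... 43.0
```

MAL penalizes earliness and tardiness symmetrically, while MSL weights large misses more heavily, which is why the two are read as accuracy and precision respectively.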

2. Mean Squared Lateness (MSL)

MSL = \frac{1}{n} \sum_{i=1}^{n} (c_i - d_i)^2    (5.2)

A small MSL indicates a small deviation from the actual due date. Here c_i, d_i and n denote the completion time of order i, the promised due date of order i and the sample size, respectively.

5.2 Results for the MT-TWK Method

Table 5.1 presents the results with respect to MAL values calculated for utilization levels of 75% and 85%. Figures 5.1 and 5.2 depict the performance in terms of MAL with respect to the dispatching rules EDD, SPT and FCFS. A smaller MAL value indicates better due date prediction capability. It may be noted that the behavior of the MT-TWK method is significantly different at the 75% and 85% utilization rates. When the utilization rate is increased from 75% to 85%, the performance of the MT-TWK method improves significantly; it gives the best performance under the FCFS rule and does not perform well under the EDD rule.

Table 5.2 presents the results with respect to MSL values calculated for the 75% and 85% utilization rates. Figures 5.3 and 5.4 portray the performance in terms of MSL with respect to the dispatching rules. As regards MSL performance, at the 75% and 85% utilization rates the MT-TWK approach performs better than the TWK and DTWK approaches under both the FCFS and SPT rules, with moderate performance under the EDD rule. Sha and Liu (2005) used 80% and 90% shop utilization in their study. Table 5.3 presents the percentage change in the MAL and MSL values

with respect to the FCFS dispatching rule between the TWK and DTWK methods. It shows a decreasing trend with decreasing utilization rates, and the MAL and MSL performance reverses in favor of TWK when the utilization factor is somewhere between 75% and 80%. This indicates that at lower utilization levels TWK could give performance better than or comparable to DTWK. For the present study, the data for 80% and 90% were taken from the Sha and Liu (2005) paper.

Among the dispatching rules, in Sha and Liu's (2005) study, EDD produces the lowest MAL values in most of the shop conditions for all the flow time estimation methods. This also corroborates the findings of Sabuncuoglu and Comlekci (2002). However, in this study FCFS produces the lowest MAL values at the 85% utilization rate, with mixed results at the 75% utilization rate. This indicates that the ORR policy introduced in the simulation model could be the reason. Among the three due date assignment rules, MT-TWK gives the best performance under moderately and heavily loaded shops. However, the performance of the MT-TWK rule appears more distinct in a heavily loaded shop under all combinations of dispatching rules and utilization rates, with mixed results for the moderately loaded shop. This could be due to the effect of the ORR policy; the performance of both MAL and MSL under the SPT rule corroborates this. In similar studies conducted earlier, the SPT rule yielded the poorest results in comparison with other dispatching rules. In this study, pre-shop batching of similar jobs appears to have introduced a stabilizing effect on the shop condition, and its effect is more pronounced with the proposed method.

The present study differs from the Sha and Liu (2005) study in the manner in which a class is assigned to the input factor set. In the Sha and Liu (2005) study, a set of rules is prescribed for each combination of dispatching rule and utilization rate, and the human decision maker must choose from 76 rules to determine the class and hence the k value (used in Eq. 2.3). In the present study, the decision-tree predictor module works with the decision-tree learner module to predict the class value, thus freeing the human decision maker from this task. The findings indicate that the performance of the MT-TWK approach is comparable to, if not better than, the RTWK (rule-based TWK) method proposed earlier.

Table 5.1 Experimental results with respect to mean absolute lateness (MAL, in minutes). Rows: TWK, DTWK, MT-TWK; columns: dispatching rules EDD, SPT and FCFS at utilization rates of 75% and 85%.

Table 5.2 Experimental results with respect to mean squared lateness (MSL). Rows: TWK, DTWK, MT-TWK; columns: dispatching rules EDD, SPT and FCFS at utilization rates of 75% and 85%.

Table 5.3 % change in MAL and MSL with respect to utilization rate

                        Utilization rate
Performance measure     75%     80%     85%     90%
MAL
MSL

Figure 5.1 MAL performance with respect to dispatching rules at 75% utilization (MT-TWK)

Figure 5.2 MAL performance with respect to dispatching rules at 85% utilization (MT-TWK)

Figure 5.3 MSL performance with respect to dispatching rules at 75% utilization (MT-TWK)

Figure 5.4 MSL performance with respect to dispatching rules at 85% utilization (MT-TWK)

A three-way ANOVA test is conducted and the results are presented in Tables 5.4 and 5.5. It may be observed that the interaction of the flow time estimation methods (TWK, DTWK and MT-TWK), the utilization rates (75% and 85%) and the dispatching rules (FCFS, SPT and EDD) has an impact on the MAL and MSL performance measures at α = 0.05. However, the interaction of flow time estimation method and dispatching rule is not significant for MSL at α = 0.05.

Table 5.4 Three-factor ANOVA for MAL (α = 0.05)
(A: utilization rate; B: flow time estimation method; C: dispatching rules)

Source          SS       df      MS      F       p-value     Significant
A                                                E-76        yes
B                                                E-55        yes
C                                                E-20        yes
A × B                                            E-35        yes
A × C                                            E-19        yes
B × C                                            E-16        yes
A × B × C                                                    yes
Within
Total

Table 5.5 Three-factor ANOVA for MSL (α = 0.05)
(A: utilization rate; B: flow time estimation method; C: dispatching rules)

Source          SS       df      MS      F       p-value     Significant
A               2.34E…                           E-43        yes
B                                                E-12        yes
C                                                E-21        yes
A × B                                            E-11        yes
A × C                                            E-18        yes
B × C                                                        no
A × B × C                                                    yes
Within          1.05E…
Total           5.86E…

Results for the SOM-RA method

The motivation for exploring SOM-based flow time estimation was the premise that clustering the data helps identify distinct shop conditions encountered by a job as it enters the dynamic job shop. This information can then be used to develop regression-based prediction equations that predict the flow time more accurately. The results support this hypothesis, as is evident from the MAL values in Table 5.6, the MSL values in Table 5.7, and the corresponding Figures 5.5, 5.6, 5.7 and 5.8. The results also indicate that MAL and MSL increase when the utilization rate is increased from 75% to 85%. This is due to increased congestion in the shop; at higher utilization levels it is difficult to predict flow time accurately and precisely. This agrees with the previous studies by Sha and Liu (2005) and Sabuncuoglu and Comlekci (2002). Among the three flow time estimation methods, SOM-RA appears to be the best with respect to MAL and MSL in all combinations of utilization rate and dispatching rule. The best performance is obtained with the FCFS rule for both MAL and MSL performance measures under both moderately and heavily loaded shops. This could be due to the transfer batching policy used in the simulation model. This study reinforces the findings of earlier studies by Conway (1965), Weeks (1979), Enns (1981), Miyazaki (1981), Bertrand (1983), Ragatz and Mabert (1984) and Cheng (1985) that both shop and job characteristics are important for flow time estimation. A three-way ANOVA test was conducted to study the effects of utilization rate, flow time estimation method and dispatching rule on the

performance measures MAL and MSL. As shown in Tables 5.8 and 5.9, all three factors have a significant effect on both performance measures, except for the interaction between priority rule and flow time estimation method and the interaction of all three factors on MSL, at a significance level of 0.05.

Table 5.6 Experimental results for mean absolute lateness (MAL) (SOM-RA), in minutes

                Utilization rate: 75%        Utilization rate: 85%
Approach        EDD     SPT     FCFS         EDD     SPT     FCFS
TWK
DTWK
SOM-RA

Table 5.7 Experimental results for mean squared lateness (MSL) (SOM-RA)

                Utilization rate: 75%        Utilization rate: 85%
Approach        EDD     SPT     FCFS         EDD     SPT     FCFS
TWK
DTWK
SOM-RA
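The SOM-RA idea, clustering historical job records by shop condition and then fitting one regression-based prediction equation per cluster, can be sketched as follows. This is an illustrative sketch under stated assumptions: k-means stands in for the self-organizing map, and the features and synthetic data are invented for the example, not drawn from the simulation model.

```python
# Sketch of the SOM-RA pipeline: cluster historical job records by shop
# condition, fit one regression per cluster, and predict a new job's flow
# time with the matching cluster's equation. KMeans stands in for the SOM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical history: [total_work, jobs_in_queue] -> observed flow time
X = rng.uniform([50, 1], [400, 15], size=(200, 2))
y = 1.8 * X[:, 0] + 12.0 * X[:, 1] + rng.normal(0, 5, 200)

clusterer = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
models = {}
for c in range(3):
    mask = clusterer.labels_ == c
    models[c] = LinearRegression().fit(X[mask], y[mask])

def predict_flow_time(job):
    """Route the job to its cluster, then use that cluster's regression."""
    c = int(clusterer.predict([job])[0])
    return float(models[c].predict([job])[0])

est = predict_flow_time([200.0, 8.0])
print(round(est, 1))  # close to 1.8*200 + 12*8 = 456 (the generating model)
```

Fitting separate equations per cluster is what lets the method track distinct shop states: a single global regression would average over congested and idle conditions, while the per-cluster equations specialize to each.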

Figure 5.5 MAL performance with respect to dispatching rule at 75% utilization rate (SOM-RA)

Figure 5.6 MAL performance with respect to dispatching rule at 85% utilization rate (SOM-RA)

Figure 5.7 MSL performance with respect to dispatching rule at 75% utilization rate (SOM-RA)

Figure 5.8 MSL performance with respect to dispatching rule at 85% utilization rate (SOM-RA)

Table 5.8 Three-factor ANOVA for MAL (α = 0.05) (SOM-RA)
(A: utilization rate; B: FT estimation method; C: priority rule)

Source          Sum of squares   Degrees of freedom   Mean square   F-value   p-value
A               2.23E…
B
C                                                                             E-25
A × B                                                                         E-10
A × C                                                                         E-17
B × C
A × B × C
Within          1.33E…
Total           7.41E…

Table 5.9 Three-factor ANOVA for MSL (α = 0.05) (SOM-RA)
(A: utilization rate; B: FT estimation method; C: priority rule)

Source          Sum of squares   Degrees of freedom   Mean square   F-value   p-value
A
B
C                                                                             E-31
A × B                                                                         E-34
A × C                                                                         E-19
B × C                                                                         E-11
A × B × C
Within
Total

Table 5.10 Experimental results with respect to all methods studied for MAL, in minutes

                Utilization rate: 75%        Utilization rate: 85%
Approach        EDD     SPT     FCFS         EDD     SPT     FCFS
TWK
DTWK
MT-TWK
SOM-RA

Table 5.11 Experimental results with respect to all methods studied for MSL

                Utilization rate: 75%        Utilization rate: 85%
Approach        EDD     SPT     FCFS         EDD     SPT     FCFS
TWK
DTWK
MT-TWK
SOM-RA

Comparative study of SOM-RA and MT-TWK performance

Tables 5.10 and 5.11 indicate that the MAL value increases as the utilization level rises from 75% to 85%. As the system stabilizes, both the MT-TWK method and SOM-RA produce consistent results, with MAL taking its lowest values under the FCFS dispatching rule. This result is expected, since at 85% utilization almost all workstations will have jobs in the queue in front of them waiting to be processed.

It is also evident from Figures 5.9, 5.10, 5.11 and 5.12 that SOM-RA outperforms the conventional methods (TWK and DTWK) as well as the supervised learning method MT-TWK. The clustering algorithm is more efficient at segregating data representing different states of the system, and the corresponding regression equations produce more accurate flow time predictions. Among the four flow time estimation methods, SOM-RA appears to be the best with respect to MAL and MSL in all combinations of utilization rate and dispatching rule. The best performance is obtained with the FCFS rule for both MAL and MSL performance measures under both moderately and heavily loaded shops. This could be due to the transfer batching policy used in the simulation model. This study conclusively demonstrates that SOM-RA, an unsupervised machine learning method, is better than MT-TWK, a supervised machine learning method, at predicting manufacturing flow time in a dynamic job shop.

Figure 5.9 MAL performance with respect to dispatching rules at 75% utilization (all methods)

Figure 5.10 MAL performance with respect to dispatching rules at 85% utilization (all methods)

Figure 5.11 MSL performance with respect to dispatching rules at 75% utilization (all methods)

Job shop flow time prediction using neural networks

Job shop flow time prediction using neural networks Available online at www.sciencedirect.com ScienceDirect Procedia Manufacturing 00 (2017) 000 000 This copy has been created with permission of Elsevier and is reserved to FAIM2017 conference authors and

More information

Makespan estimation in batch process industries: A comparison between regression analysis and neural networks

Makespan estimation in batch process industries: A comparison between regression analysis and neural networks European Journal of Operational Research 145 (2003) 14 30 Discrete Optimization Makespan estimation in batch process industries: A comparison between regression analysis and neural networks W.H.M. Raaymakers,

More information

Operation-based owtime estimation in a dynamic job shop

Operation-based owtime estimation in a dynamic job shop Omega 30 (2002) 423 442 www.elsevier.com/locate/dsw Operation-based owtime estimation in a dynamic job shop I. Sabuncuoglu, A. Comlekci Department of Industrial Engineering, Faculty of Engineering, Bilkent

More information

JOB SHOP SCHEDULING TO MINIMIZE WORK-IN-PROCESS, EARLINESS AND TARDINESS COSTS ZHU ZHECHENG A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

JOB SHOP SCHEDULING TO MINIMIZE WORK-IN-PROCESS, EARLINESS AND TARDINESS COSTS ZHU ZHECHENG A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY JOB SHOP SCHEDULING TO MINIMIZE WORK-IN-PROCESS, EARLINESS AND TARDINESS COSTS ZHU ZHECHENG A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF INDUSTRIAL AND SYSTEMS ENGINEERING NATIONAL

More information

SCHEDULING RULES FOR A SMALL DYNAMIC JOB-SHOP: A SIMULATION APPROACH

SCHEDULING RULES FOR A SMALL DYNAMIC JOB-SHOP: A SIMULATION APPROACH ISSN 1726-4529 Int j simul model 9 (2010) 4, 173-183 Original scientific paper SCHEDULING RULES FOR A SMALL DYNAMIC JOB-SHOP: A SIMULATION APPROACH Dileepan, P. & Ahmadi, M. University of Tennessee at

More information

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING Abbas Heiat, College of Business, Montana State University, Billings, MT 59102, aheiat@msubillings.edu ABSTRACT The purpose of this study is to investigate

More information

Artificial Intelligence in Workforce Management Systems

Artificial Intelligence in Workforce Management Systems Artificial Intelligence in Workforce Management Systems 1 Artificial intelligence (AI) was formally founded as an academic discipline at a conference in 1956, long before workforce management (WFM) systems

More information

SCHEDULING AND CONTROLLING PRODUCTION ACTIVITIES

SCHEDULING AND CONTROLLING PRODUCTION ACTIVITIES SCHEDULING AND CONTROLLING PRODUCTION ACTIVITIES Al-Naimi Assistant Professor Industrial Engineering Branch Department of Production Engineering and Metallurgy University of Technology Baghdad - Iraq dr.mahmoudalnaimi@uotechnology.edu.iq

More information

LOADING AND SEQUENCING JOBS WITH A FASTEST MACHINE AMONG OTHERS

LOADING AND SEQUENCING JOBS WITH A FASTEST MACHINE AMONG OTHERS Advances in Production Engineering & Management 4 (2009) 3, 127-138 ISSN 1854-6250 Scientific paper LOADING AND SEQUENCING JOBS WITH A FASTEST MACHINE AMONG OTHERS Ahmad, I. * & Al-aney, K.I.M. ** *Department

More information

1. For s, a, initialize Q ( s,

1. For s, a, initialize Q ( s, Proceedings of the 2006 Winter Simulation Conference L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, eds. A REINFORCEMENT LEARNING ALGORITHM TO MINIMIZE THE MEAN TARDINESS

More information

Analysis and Modelling of Flexible Manufacturing System

Analysis and Modelling of Flexible Manufacturing System Analysis and Modelling of Flexible Manufacturing System Swetapadma Mishra 1, Biswabihari Rath 2, Aravind Tripathy 3 1,2,3Gandhi Institute For Technology,Bhubaneswar, Odisha, India --------------------------------------------------------------------***----------------------------------------------------------------------

More information

LOAD SHARING IN HETEROGENEOUS DISTRIBUTED SYSTEMS

LOAD SHARING IN HETEROGENEOUS DISTRIBUTED SYSTEMS Proceedings of the 2 Winter Simulation Conference E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds. LOAD SHARING IN HETEROGENEOUS DISTRIBUTED SYSTEMS Helen D. Karatza Department of Informatics

More information

Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture - 24 Sequencing and Scheduling - Assumptions, Objectives and Shop

More information

Optimizing Inplant Supply Chain in Steel Plants by Integrating Lean Manufacturing and Theory of Constrains through Dynamic Simulation

Optimizing Inplant Supply Chain in Steel Plants by Integrating Lean Manufacturing and Theory of Constrains through Dynamic Simulation Optimizing Inplant Supply Chain in Steel Plants by Integrating Lean Manufacturing and Theory of Constrains through Dynamic Simulation Atanu Mukherjee, President, Dastur Business and Technology Consulting,

More information

INTRODUCTION AND CLASSIFICATION OF QUEUES 16.1 Introduction

INTRODUCTION AND CLASSIFICATION OF QUEUES 16.1 Introduction INTRODUCTION AND CLASSIFICATION OF QUEUES 16.1 Introduction The study of waiting lines, called queuing theory is one of the oldest and most widely used Operations Research techniques. Waiting lines are

More information

Data Science in a pricing process

Data Science in a pricing process Data Science in a pricing process Michaël Casalinuovo Consultant, ADDACTIS Software michael.casalinuovo@addactis.com Contents Nowadays, we live in a continuously changing market environment, Pricing has

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MANUFACTURING SYSTEM Manufacturing, a branch of industry, is the application of tools and processes for the transformation of raw materials into finished products. The manufacturing

More information

HOW TO USE AI IN BUSINESS

HOW TO USE AI IN BUSINESS HOW TO USE AI IN BUSINESS How to use AI in business 3 FOREWORDS 20 6 EXAMPLES OF USE CASES 4 INTRODUCTION TO AI 33 SUMMARY 10 MACHINE LEARNING IN PRACTICE 35 USEFUL LINKS 14 BEST PRACTISES FOREWORDS Artificial

More information

OPERATING SYSTEMS. Systems and Models. CS 3502 Spring Chapter 03

OPERATING SYSTEMS. Systems and Models. CS 3502 Spring Chapter 03 OPERATING SYSTEMS CS 3502 Spring 2018 Systems and Models Chapter 03 Systems and Models A system is the part of the real world under study. It is composed of a set of entities interacting among themselves

More information

Learning Objectives. Scheduling. Learning Objectives

Learning Objectives. Scheduling. Learning Objectives Scheduling 16 Learning Objectives Explain what scheduling involves and the importance of good scheduling. Discuss scheduling needs in high-volume and intermediate-volume systems. Discuss scheduling needs

More information

CHAPTER 4 A FRAMEWORK FOR CUSTOMER LIFETIME VALUE USING DATA MINING TECHNIQUES

CHAPTER 4 A FRAMEWORK FOR CUSTOMER LIFETIME VALUE USING DATA MINING TECHNIQUES 49 CHAPTER 4 A FRAMEWORK FOR CUSTOMER LIFETIME VALUE USING DATA MINING TECHNIQUES 4.1 INTRODUCTION Different groups of customers prefer some special products. Customers type recognition is one of the main

More information

Citation for published version (APA): Land, M. J. (2004). Workload control in job shops, grasping the tap s.n.

Citation for published version (APA): Land, M. J. (2004). Workload control in job shops, grasping the tap s.n. University of Groningen Workload control in job shops, grasping the tap Land, Martin IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please

More information

Scheduling of Three FMS Layouts Using Four Scheduling Rules

Scheduling of Three FMS Layouts Using Four Scheduling Rules Scheduling of Three FMS Layouts Using Four Scheduling Rules Muhammad Arshad1 m.arshad8@bradford.ac.uk Milana Milana1 m.milana@student.bradford.ac.uk Mohammed Khurshid Khan1 M.K.Khan@bradford.ac.uk 1 School

More information

Role of Data Mining in E-Governance. Abstract

Role of Data Mining in E-Governance. Abstract Role of Data Mining in E-Governance Dr. Rupesh Mittal* Dr. Saurabh Parikh** Pushpendra Chourey*** *Assistant Professor, PMB Gujarati Commerce College, Indore **Acting Principal, SJHS Gujarati Innovative

More information

Gene Expression Data Analysis

Gene Expression Data Analysis Gene Expression Data Analysis Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu BMIF 310, Fall 2009 Gene expression technologies (summary) Hybridization-based

More information

THE VALUE OF DISCRETE-EVENT SIMULATION IN COMPUTER-AIDED PROCESS OPERATIONS

THE VALUE OF DISCRETE-EVENT SIMULATION IN COMPUTER-AIDED PROCESS OPERATIONS THE VALUE OF DISCRETE-EVENT SIMULATION IN COMPUTER-AIDED PROCESS OPERATIONS Foundations of Computer Aided Process Operations Conference Ricki G. Ingalls, PhD Texas State University Diamond Head Associates,

More information

Scheduling a dynamic job shop production system with sequence-dependent setups: An experimental study

Scheduling a dynamic job shop production system with sequence-dependent setups: An experimental study Robotics and Computer-Integrated Manufacturing ] (]]]]) ]]] ]]] www.elsevier.com/locate/rcim Scheduling a dynamic job shop production system with sequence-dependent setups: An experimental study V. Vinod

More information

SIMULATION AND COMPARISON ANALYSIS OF DUE DATE ASSIGNMENT METHODS USING SCHEDULING RULES IN A JOB SHOP PRODUCTION SYSTEM

SIMULATION AND COMPARISON ANALYSIS OF DUE DATE ASSIGNMENT METHODS USING SCHEDULING RULES IN A JOB SHOP PRODUCTION SYSTEM SIMULATION AND COMPARISON ANALYSIS OF DUE DATE ASSIGNMENT METHODS USING SCHEDULING RULES IN A JOB SHOP PRODUCTION SYSTEM Fitsum Getachew and Eshetie Berhan School of Mechanical and Industrial Engineering,

More information

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 02 Data Mining Process Welcome to the lecture 2 of

More information

Metaheuristics for scheduling production in large-scale open-pit mines accounting for metal uncertainty - Tabu search as an example.

Metaheuristics for scheduling production in large-scale open-pit mines accounting for metal uncertainty - Tabu search as an example. Metaheuristics for scheduling production in large-scale open-pit mines accounting for metal uncertainty - Tabu search as an example Amina Lamghari COSMO Stochastic Mine Planning Laboratory! Department

More information

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere

More information

Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge

Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2007 Prentice Hall Chapter Objectives To explain how knowledge is discovered

More information

MSc Informatics Research Proposal. Human Resource Scheduling in Software Project. Gerhard Wickler Matriculation Number

MSc Informatics Research Proposal. Human Resource Scheduling in Software Project. Gerhard Wickler Matriculation Number MSc Informatics Research Proposal Human Resource Scheduling in Software Project Name Hsueh-Hsing Chao Supervisors Paul Jackson Gerhard Wickler Matriculation Number 0454971 Email H.Chao@sms.ed.ac.uk 1.

More information

THE MICRO-FOUNDATIONS OF DYNAMIC CAPABILITIES, MARKET TRANSFORMATION AND FIRM PERFORMANCE. Tung-Shan Liao

THE MICRO-FOUNDATIONS OF DYNAMIC CAPABILITIES, MARKET TRANSFORMATION AND FIRM PERFORMANCE. Tung-Shan Liao THE MICRO-FOUNDATIONS OF DYNAMIC CAPABILITIES, MARKET TRANSFORMATION AND FIRM PERFORMANCE Tung-Shan Liao Thesis submitted to the Business School, The University of Adelaide, in fulfilment of the requirements

More information

HUMAN RESOURCE PLANNING AND ENGAGEMENT DECISION SUPPORT THROUGH ANALYTICS

HUMAN RESOURCE PLANNING AND ENGAGEMENT DECISION SUPPORT THROUGH ANALYTICS HUMAN RESOURCE PLANNING AND ENGAGEMENT DECISION SUPPORT THROUGH ANALYTICS Janaki Sivasankaran 1, B Thilaka 2 1,2 Department of Applied Mathematics, Sri Venkateswara College of Engineering, (India) ABSTRACT

More information

1 Introduction 1. 2 Forecasting and Demand Modeling 5. 3 Deterministic Inventory Models Stochastic Inventory Models 63

1 Introduction 1. 2 Forecasting and Demand Modeling 5. 3 Deterministic Inventory Models Stochastic Inventory Models 63 CONTENTS IN BRIEF 1 Introduction 1 2 Forecasting and Demand Modeling 5 3 Deterministic Inventory Models 29 4 Stochastic Inventory Models 63 5 Multi Echelon Inventory Models 117 6 Dealing with Uncertainty

More information

Introduction to Research

Introduction to Research Introduction to Research Arun K. Tangirala Arun K. Tangirala, IIT Madras Introduction to Research 1 Objectives To learn the following: I What is data analysis? I Types of analyses I Different types of

More information

2014 Grid of the Future Symposium

2014 Grid of the Future Symposium 21, rue d Artois, F-75008 PARIS CIGRE US National Committee http : //www.cigre.org 2014 Grid of the Future Symposium Concepts and Practice Using Stochastic Programs for Determining Reserve Requirements

More information

PRODUCTION ACTIVITY CONTROL (PAC)

PRODUCTION ACTIVITY CONTROL (PAC) PRODUCTION ACTIVITY CONTROL (PAC) Concerns execution of material plans Contains shop floor control (SFC), and vendor scheduling and follow-up SFC encompasses detailed scheduling and control of individual

More information

A Decision Support System for Market Segmentation - A Neural Networks Approach

A Decision Support System for Market Segmentation - A Neural Networks Approach Association for Information Systems AIS Electronic Library (AISeL) AMCIS 1995 Proceedings Americas Conference on Information Systems (AMCIS) 8-25-1995 A Decision Support System for Market Segmentation

More information

Application of Decision Trees in Mining High-Value Credit Card Customers

Application of Decision Trees in Mining High-Value Credit Card Customers Application of Decision Trees in Mining High-Value Credit Card Customers Jian Wang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 8, P.R. China E-mail: gregret24@gmail.com,

More information

Advanced skills in CPLEX-based network optimization in anylogistix

Advanced skills in CPLEX-based network optimization in anylogistix Advanced skills in CPLEX-based network optimization in anylogistix Prof. Dr. Dmitry Ivanov Professor of Supply Chain Management Berlin School of Economics and Law Additional teaching note to the e-book

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology

Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology Fraudulent Behavior Forecast in Telecom Industry Based on Data Mining Technology Sen Wu Naidong Kang Liu Yang School of Economics and Management University of Science and Technology Beijing ABSTRACT Outlier

More information

Taichung, Taiwan, Republic of China c Industrial Engineering Program, University of Washington, Box

Taichung, Taiwan, Republic of China c Industrial Engineering Program, University of Washington, Box This article was downloaded by: [National Chiao Tung University 國立交通大學 ] On: 26 April 2014, At: 00:58 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954

More information

Analysis of the Reform and Mode Innovation of Human Resource Management in Big Data Era

Analysis of the Reform and Mode Innovation of Human Resource Management in Big Data Era Analysis of the Reform and Mode Innovation of Human Resource Management in Big Data Era Zhao Xiaoyi, Wang Deqiang, Zhu Peipei Department of Human Resources, Hebei University of Water Resources and Electric

More information

Data Warehousing. and Data Mining. Gauravkumarsingh Gaharwar

Data Warehousing. and Data Mining. Gauravkumarsingh Gaharwar Data Warehousing 1 and Data Mining 2 Data warehousing: Introduction A collection of data designed to support decisionmaking. Term data warehousing generally refers to the combination of different databases

More information

Gang Scheduling Performance on a Cluster of Non-Dedicated Workstations

Gang Scheduling Performance on a Cluster of Non-Dedicated Workstations Gang Scheduling Performance on a Cluster of Non-Dedicated Workstations Helen D. Karatza Department of Informatics Aristotle University of Thessaloniki 54006 Thessaloniki, Greece karatza@csd.auth.gr Abstract

More information

Statistics, Data Analysis, and Decision Modeling

Statistics, Data Analysis, and Decision Modeling - ' 'li* Statistics, Data Analysis, and Decision Modeling T H I R D E D I T I O N James R. Evans University of Cincinnati PEARSON Prentice Hall Upper Saddle River, New Jersey 07458 CONTENTS Preface xv

More information

Automatic Design of Scheduling Policies for Dynamic Multi-objective Job Shop Scheduling via Cooperative Coevolution Genetic Programming

Automatic Design of Scheduling Policies for Dynamic Multi-objective Job Shop Scheduling via Cooperative Coevolution Genetic Programming Automatic Design of Scheduling Policies for Dynamic Multi-objective Job Shop Scheduling via Cooperative Coevolution Genetic Programming Su Nguyen 1, Mengjie Zhang 1, Mark Johnston 2, Kay Chen Tan 3 1 Victoria

More information

A STUDY OF QUEUING TIME IN WAITING LINE OF SUPERMARKET AT CASHIER COUNTER BY USING SIMULATION

A STUDY OF QUEUING TIME IN WAITING LINE OF SUPERMARKET AT CASHIER COUNTER BY USING SIMULATION A STUDY OF QUEUING TIME IN WAITING LINE OF SUPERMARKET AT CASHIER COUNTER BY USING SIMULATION NUR ATIRAH BINTI ABIT PC12029 ITM NUR ATIRAH ABIT BACHELOR OF INDUSTRIAL TECHNOLOGY MANAGEMENT WITH HONORS

More information

Analyzing Controllable Factors Influencing Cycle Time Distribution in. Semiconductor Industries. Tanushree Salvi

Analyzing Controllable Factors Influencing Cycle Time Distribution in. Semiconductor Industries. Tanushree Salvi Analyzing Controllable Factors Influencing Cycle Time Distribution in Semiconductor Industries by Tanushree Salvi A Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of

More information

Minimizing Makespan for Machine Scheduling and Worker Assignment Problem in Identical Parallel Machine Models Using GA

Minimizing Makespan for Machine Scheduling and Worker Assignment Problem in Identical Parallel Machine Models Using GA , June 30 - July 2, 2010, London, U.K. Minimizing Makespan for Machine Scheduling and Worker Assignment Problem in Identical Parallel Machine Models Using GA Imran Ali Chaudhry, Sultan Mahmood and Riaz

More information

Examining and Modeling Customer Service Centers with Impatient Customers

Examining and Modeling Customer Service Centers with Impatient Customers Examining and Modeling Customer Service Centers with Impatient Customers Jonathan Lee A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF APPLIED SCIENCE DEPARTMENT

More information

Using Decision Tree to predict repeat customers

Using Decision Tree to predict repeat customers Using Decision Tree to predict repeat customers Jia En Nicholette Li Jing Rong Lim Abstract We focus on using feature engineering and decision trees to perform classification and feature selection on the

More information

Factory Operations Research Center (FORCe II)

Factory Operations Research Center (FORCe II) Factory Operations Research Center (FORCe II) SRC/ISMT 2004-OJ-1214 Configuration, monitoring and control of semiconductor supply chains Jan. 7, 2005 Shi-Chung Chang (task 1) Argon Chen (task 2) Yon Chou

More information

Utilizing Optimization Techniques to Enhance Cost and Schedule Risk Analysis

Utilizing Optimization Techniques to Enhance Cost and Schedule Risk Analysis 1 Utilizing Optimization Techniques to Enhance Cost and Schedule Risk Analysis Colin Smith, Brandon Herzog SCEA 2012 2 Table of Contents Introduction to Optimization Optimization and Uncertainty Analysis

More information

International Journal of Engineering, Business and Enterprise Applications (IJEBEA)

International Journal of Engineering, Business and Enterprise Applications (IJEBEA) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Engineering, Business and Enterprise

More information

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ANALYTICAL MODEL DEVELOPMENT AGENDA Enterprise Miner: Analytical Model Development The session looks at: - Supervised and Unsupervised Modelling - Classification

More information

and type II customers arrive in batches of size k with probability d k

and type II customers arrive in batches of size k with probability d k xv Preface Decision making is an important task of any industry. Operations research is a discipline that helps to solve decision making problems to make viable decision one needs exact and reliable information

More information

STATISTICAL TECHNIQUES. Data Analysis and Modelling

STATISTICAL TECHNIQUES. Data Analysis and Modelling STATISTICAL TECHNIQUES Data Analysis and Modelling DATA ANALYSIS & MODELLING Data collection and presentation Many of us probably some of the methods involved in collecting raw data. Once the data has

More information

Preface to the third edition Preface to the first edition Acknowledgments

Preface to the third edition Preface to the first edition Acknowledgments Contents Foreword Preface to the third edition Preface to the first edition Acknowledgments Part I PRELIMINARIES XXI XXIII XXVII XXIX CHAPTER 1 Introduction 3 1.1 What Is Business Analytics?................

More information

Supply Chain MICROSOFT BUSINESS SOLUTIONS DEMAND PLANNER

Supply Chain MICROSOFT BUSINESS SOLUTIONS DEMAND PLANNER Supply Chain MICROSOFT BUSINESS SOLUTIONS DEMAND PLANNER DEMAND PLANNING FOR BUSINESSES Demand planning is the first step towards business planning. As businesses are moving towards a demand-centric environment

More information

I R TECHNICAL RESEARCH REPORT. Rescheduling Frequency and Supply Chain Performance. by Jeffrey W. Herrmann, Guruprasad Pundoor TR

I R TECHNICAL RESEARCH REPORT. Rescheduling Frequency and Supply Chain Performance. by Jeffrey W. Herrmann, Guruprasad Pundoor TR TECHNICAL RESEARCH REPORT Rescheduling Frequency and Supply Chain Performance by Jeffrey W. Herrmann, Guruprasad Pundoor TR 2002-50 I R INSTITUTE FOR SYSTEMS RESEARCH ISR develops, applies and teaches

More information

Introduction to Business Research 3

Introduction to Business Research 3 Synopsis Introduction to Business Research 3 1. Orientation By the time the candidate has completed this module, he or she should understand: what has to be submitted for the viva voce examination; what

More information

Discrete Event simulation

Discrete Event simulation Discrete Event simulation David James Raistrick Shrink Wrap Conveyor Line Submitted in partial fulfilment of the requirements of Leeds Metropolitan University for the Degree of Advanced Engineering Management

Contents: Preface; 1 Introduction; The Role of Scheduling; The Scheduling Function in an Enterprise; Outline of the Book

Pinedo: Preface; 1 Introduction; 1.1 The Role of Scheduling; 1.2 The Scheduling Function in an Enterprise; 1.3 Outline of the Book

AN EMPIRICAL ANALYSIS OF THE IMPACT OF TRADE ON PRODUCTIVITY IN SOUTH AFRICA'S MANUFACTURING SECTOR

by CHARLES AUGUSTINE ABUKA. Submitted in partial fulfilment of the requirements for the degree of PhD

Evaluation of Value and Time Based Priority Rules in a Push System

Dr. V. Arumugam (Associate Professor, Business and Advanced Technology Center, Universiti Teknologi Malaysia) and Abdel Monem Murtadi

A Study on the Optimum Bucket Size for Master Scheduling: For the Case of Hierarchically Structured Products

The Josai Journal of Business Administration, 2010, Vols. 6 & 7, No. 1, 19-30, Josai University

THE CHALLENGE OF TAKING ADVANTAGE OF INTER-FIRM INFORMATION SHARING IN ENTERPRISE RESOURCE PLANNING SYSTEMS WHILE LIMITING ACCESS TO SENSITIVE CUSTOMER INFORMATION

Michael E. Busing, James Madison University. Keywords: supply chain, ERP, information security, dispatching

SPM 8.2 Salford Predictive Modeler

The SPM Salford Predictive Modeler software suite is a highly accurate and ultra-fast platform for developing predictive, descriptive, and analytical models from

Planning for the Semiconductor Manufacturer of the Future

Hugh E. Fargher & Richard A. Smith. From: AAAI Technical Report SS-92-01, 1992, AAAI (www.aaai.org)

TABLE OF CONTENTS

Certification; Declaration; Acknowledgement; Research Publications; Table of Contents; Abbreviations; List of Figures; List of Tables; List of Keywords; Abstract

Paper 30: Centralized versus Market-based Task Allocation in the Presence of Uncertainty

Abstract: While there have been some efforts to compare centralized versus market-based approaches to general task

DEVELOPMENT AND INVESTIGATION OF A SIMULATION BASED EXPERT SYSTEM FOR DYNAMIC RESCHEDULING OF AN INTEGRATED JOB SHOP

Association for Information Systems, AIS Electronic Library (AISeL), AMCIS 2002 Proceedings, Americas Conference on Information Systems (AMCIS), December 2002

Cross-market Behavior Modeling

Faculty of Engineering and Information Technology, University of Technology, Sydney. A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of

MANAGING NEXT GENERATION ARTIFICIAL INTELLIGENCE IN BANKING

A New Paradigm for Model Management, December 2017. Authors: Jeffrey Brown (Partner), Tammi Ling (Partner), Ege Gurdeniz (Principal). Executive Summary

Effective CRM Using Predictive Analytics, Antonios Chorianopoulos

Wiley. Contents: Preface; Acknowledgments; 1 An overview of data mining: the applications, the methodology, the algorithms, and the

Dynamic Job Shop Scheduling Under Uncertainty Using Genetic Programming

Deepak Karunakaran, Yi Mei, Gang Chen and Mengjie Zhang, School of Engineering and Computer Science, Victoria University of Wellington

PEOPLESOFT ENTERPRISE PROGRAM MANAGEMENT

Control costs with detailed planning, estimates, and forecasting. Gain efficiency by managing projects across your enterprise. Lower costs with application integration

INSIGHTS: Demand Planner for Microsoft Dynamics, Product Overview

November 2007, www.microsoft.com/dynamics. Contents: Demand Planning for Business; Product Overview; Multi-dimensional Data Visibility

Flowshop Scheduling Problem for 10-Jobs, 10-Machines By Heuristics Models Using Makespan Criterion

Ajay Kumar Agarwal, Assistant Professor, Mechanical Engineering Department, RIMT, Chidana, Sonipat, Haryana,

Simulation Analytics

Powerful Techniques for Generating Additional Insights. Mark Peco, CBIP (mark.peco@gmail.com). Objectives: basic capabilities of computer simulation; categories of simulation techniques

Introduction - Simulation: Simulation of industrial processes and logistical systems (MION40)

What is a model? A model is an external and explicit representation of part of reality as seen by the people

How Real-time Dynamic Scheduling Can Change the Game

Iyno Advisors: New Software Approach to Achieving Stable Success in High-Change High-Mix Discrete Manufacturing, by Julie Fraser, Principal, Iyno Advisors

Modeling and Simulation of a Bank Queuing System

2013 Fifth International Conference on Computational Intelligence, Modelling and Simulation. Najmeh Madadi, Arousha Haghighian Roudsari, Kuan Yew Wong, Masoud

"Charting the Course..." Advancing the Analytics-Driven Organization: Course Summary

Description: This unique course directly addresses the cultural, environmental, team and resource issues that impact the overall analytic function in mid-sized and large organizations. This

Burstiness-aware service level planning for enterprise application clouds

Youssef and Krishnamurthy, Journal of Cloud Computing: Advances, Systems and Applications (2017) 6:17, DOI 10.1186/s13677-017-0087-y

Data Mining in Production Planning and Scheduling: A Review

2009 2nd Conference on Data Mining and Optimization, 27-28 October 2009, Selangor, Malaysia. Ruhaizan Ismail, Zalinda Othman and Azuraliza

Metaheuristics

"Meta" is Greek for upper-level methods; "heuristics" comes from the Greek heuriskein, the art of discovering new strategies to solve problems. Exact and approximate methods: exact methods include math programming (LP, IP,

Introduction to Computer Integrated Manufacturing Environment

I. What are the problems facing manufacturing industries today? External pressures: technological advancements; increased cost, quality, and

IMPROVING THE SCHEDULING ALGORITHM OF LIMIS PLANNER

Master thesis, Industrial Engineering and Management. Student: Roel Kikkert (s0141178), Industrial Engineering and Management

Designing an Effective Scheduling Scheme Considering Multi-level BOM in Hybrid Job Shop

Proceedings of the 2012 International Conference on Industrial Engineering and Operations Management, Istanbul, Turkey, July 3-6, 2012

Agent Based Reasoning in Multilevel Flow Modeling

ZHANG Xinxin and LIND Morten, Department of Electric Engineering, Technical University of Denmark, Kgs. Lyngby, DK-2800, Denmark (email: xinz@elektro.dtu.dk, mli@elektro.dtu.dk)

SOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM

Abbas Heiat, College of Business, Montana State University-Billings, Billings, MT 59101, 406-657-1627, aheiat@msubillings.edu. Abstract: CRT and ANN

SELECTED HEURISTIC ALGORITHMS FOR SOLVING JOB SHOP AND FLOW SHOP SCHEDULING PROBLEMS

A thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Technology in Mechanical

Causal Relationship between Lean Supply Chain Management Practices, Green Supply Chain Management Practices and Organisational Performance

By B.L. Lakshmi Meera; Supervisor: Dr. P. Chitramani. A Thesis submitted

A FLEXIBLE JOB SHOP ONLINE SCHEDULING APPROACH BASED ON PROCESS-TREE

JATIT & LLS, www.jatit.org. XIANGDE LIU, GENBAO ZHANG

Performance Improvement of the Flexible Manufacturing System (FMS) with a Proper Dispatching Rules Planning

Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management, Bali, Indonesia, January 7-9, 2014
