A Methodology for Data Mining Customer Order History for Storage Assignment

A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University

In partial fulfillment of the requirements for the degree Master of Science

Carlos A. Egas

December 2012

© 2012 Carlos A. Egas. All Rights Reserved.

This thesis titled A Methodology for Data Mining Customer Order History for Storage Assignment

by

CARLOS A. EGAS

has been approved for the Department of Industrial and Systems Engineering and the Russ College of Engineering and Technology by

Dale T. Masel
Associate Professor of Industrial and Systems Engineering

Dennis Irwin
Dean, Russ College of Engineering and Technology

ABSTRACT

EGAS, CARLOS A., M.S., December 2012, Industrial and Systems Engineering
A Methodology for Data Mining Customer Order History for Storage Assignment
Director of Thesis: Dale T. Masel

Order picking is the most costly activity among warehouse operations. Any unnecessary travel during order picking adds travel time and increases the cost of operating a warehouse. Many scholars and researchers have worked on the question of how to minimize travel distances during order picking. The goal of this thesis is to investigate how Data Mining can be used to reduce excessive travel during order picking using a Data Mining Demand-based method. This algorithm is used to mine knowledge from historical warehouse transactions and define clusters. After the clusters are defined, warehouses within a warehouse are defined and SKUs are arranged according to the aisles assigned to each cluster. The methodology developed is tested and compared with traditional aisle assignment algorithms. The test results show that picker travel was reduced by about 14% when using the new Data Mining model.

ACKNOWLEDGMENTS

I owe my deepest gratitude to Dr. Dale Masel for his patience, encouragement, and guidance on my thesis. During this long journey, he showed commitment and gave me continuous support to complete the project. Without his help, my thesis could not have been completed. I would also like to thank the other members of my committee, Dr. Berisso, Dr. Koonce, and Dr. Lamb, for their time and effort in reviewing this document.

I cannot fully express my gratitude to Dr. James Fales for his encouragement to apply to Ohio University, and for his guidance during my life as an international student in Athens. Although he is no longer with us, he is forever remembered. I am sure he shares our joy and happiness.

My family deserves special mention. My utmost gratitude goes to my parents for their unconditional support and prayers, my wife Rocio for her constant encouragement and understanding, and my precious daughter Camila for making me the happiest father ever. Lastly, I would like to thank my friend, Jerone Anderson, who patiently gave me his feedback. To each and every one of you, mil gracias (a thousand thanks).

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1: INTRODUCTION
  Warehouse Operations
  Methods for Managing Order Picking
  Data Mining
  Objective
CHAPTER 2: LITERATURE REVIEW
  Warehouses
  Layout
  Storage Methods
  Batching and Zoning
  Routing Strategies
  Data Mining for Warehousing
  General Data Mining Methods
  Data Mining for Order Batching
  Data Mining for Storage Location Assignment
CHAPTER 3: METHODOLOGY
  System Description
  Assumptions
  Model Variables
  Data Reduction
  Data Pre-Processing
  Clustering Historical Orders
  Storage Assignment Layout
  Computing Objective Function
CHAPTER 4: TESTING AND RESULTS
  Testing Method
  Data Reduction
  Data Pre-Processing
  Clustering Historical Orders
  Storage Assignment Layout
  Computing Objective Function
  Results
  Clustering DataSet 1
  Clustering DataSet 2
  Comparative Results Using the Rosenwein Distance Metric
CHAPTER 5: CONCLUSIONS
  Main Findings
  Benefits of the Methodology
  Future Research
REFERENCES
APPENDICES
  A. DataSet1 Aisle Distribution for Pairs of SKUs Greater than 200 and 5 Clusters
  B. DataSet2 Aisle Distribution for Pairs of SKUs Greater than 200 and 5 Clusters
  C. Minitab Analysis of Total Aisles Visited
  D. Minitab Analysis of Percentage Improvement
  E. Minitab ANOVA of Improvement DM vs. DMDB and DMRB
  F. Minitab ANOVA of Improvement Popular vs. Non-Popular

LIST OF TABLES

Table 1: Item Master
Table 2: Sales Transactions
Table 3: Pairs Dataset
Table 4: Cleaned Pairs Dataset
Table 5: Dataset Result
Table 6: Number of orders containing SKU pair
Table 7: Matrix scaled values
Table 8: Random cluster assignment
Table 9: Computed Weight
Table 10: Assigning new clusters
Table 11: Partial Cluster Assignments
Table 12: Clusters Assigned
Table 13: Complete Process for computing No. of aisles visited
Table 14: Features Input Tables DataSet
Table 15: Valid Pairs for DataSet
Table 16: Calculation of Total Number of Aisles visited
Table 17: Comparison Total Number of Aisles visited
Table 18: Test Results including the most popular SKUs DataSet
Table 19: Test Results without including the most popular SKUs DataSet
Table 20: Test Results including the most popular SKUs DataSet
Table 21: Test Results without including top popular DataSet
Table 22: Comparison between Rosenwein and Data Mining Demand-based
Table 23: Results of Rosenwein and Data Mining algorithms
Table 24: Results Data Mining demand based vs. Demand based

LIST OF FIGURES

Figure 1: Warehouse Layout
Figure 2: Total Aisles Visited including Popular DataSet
Figure 3: Line Items vs. Improvement including Popular DataSet
Figure 4: Total Aisles Visited without including Popular DataSet
Figure 5: Line Items vs. Improvement without including Popular DataSet
Figure 6: Improvement comparison for popular and non-popular SKUs
Figure 7: Total Aisles Visited including Popular DataSet
Figure 8: Line Items vs. Improvement including Popular DataSet
Figure 9: Total Aisles Visited without including Popular DataSet
Figure 10: Line Items vs. Improvement without including Popular DataSet

CHAPTER 1: INTRODUCTION

1.1 Warehouse Operations

Running a distribution center or warehouse involves a set of activities that must be completed for the facility to fulfill its role within the supply chain. Warehouse operations include receiving, checking, pre-packing, putting away, picking, sorting, packing and shipping. Warehousing costs include land, storage equipment, databases, information systems and the human resources needed to execute the operations. Consequently, warehousing is expensive and accounts for between 2 and 5 percent of a corporation's cost of sales [4].

One important task performed in a warehouse is order picking, the process of retrieving the SKUs (items) required for a customer's order. Picking can be performed with two different approaches: picker-to-part systems and part-to-picker systems. In a picker-to-part system, pickers travel to the areas where the needed SKUs are located, pick the SKUs and bring them to the packing area. Picker-to-part systems are supported by technologies such as pick-to-light systems, synthesized voice and wireless mobile computing. In a part-to-picker system, the SKUs travel to the picker through order-picking machines; AS/RS (Automated Storage and Retrieval Systems) or carousels are used to deliver the SKUs.

One approach to reducing the total picking time of a warehouse is to use appropriate equipment for storing and retrieving items. Two kinds of equipment are needed in a warehouse. The first category is storage equipment. Selecting the right storage equipment for each warehouse is important because it maximizes space utilization. To select appropriate storage equipment, it is necessary to consider not only the amount of material moved in a period of time but also the average inventory. Based on these parameters, different storage equipment can be chosen, for instance single-deep pallet rack, double-deep pallet rack, flow rack, shelving, drive-in rack and drive-through rack. The second category is material handling equipment, for instance conveyors, forklifts, carts, walkie trucks, order pickers and automated guided vehicles.

AIDC (Automatic Identification and Data Capture) technologies such as barcodes and RFID (Radio Frequency Identification) have been extensively applied in picking systems in order to reduce travel time. These technologies have eliminated the need for manual data entry, decreasing the total time of the picking activity. WMS (Warehouse Management Systems), which integrate warehousing equipment and AIDC technologies, have been developed to automate warehouse operations.

Among the components of the picking process, travel is the most time-consuming activity and represents about 50% of the total order picking time [6]. Travel time is a direct expense and has a direct impact on the overall performance of the warehouse [5]. Moreover, the total cost of order picking is estimated to be 55% of total warehouse operating expenses [3]. Therefore, reducing the cost of the picking process has become an important goal for warehouse and distribution operations.
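The two percentages cited above can be combined into a rough, back-of-the-envelope estimate of how much of a warehouse's operating expense is tied to picker travel. This is only a sketch under an assumption of ours, not a claim made by the cited sources: that picking cost is proportional to picking time, so that travel's share of picking time carries over to picking cost.

```latex
\frac{\text{travel cost}}{\text{operating expense}}
\;\approx\;
\underbrace{0.55}_{\text{picking cost / operating expense [3]}}
\times
\underbrace{0.50}_{\text{travel time / picking time [6]}}
\;=\; 0.275
```

Under that assumption, roughly a quarter of warehouse operating expense would be attributable to picker travel, which is why reducing travel is the focus of the storage and routing methods reviewed in Chapter 2.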

1.2 Methods for Managing Order Picking

One important operational decision is whether or not to pick several orders on each picking trip. When multiple orders are picked on a single trip, the method is commonly known as wave picking or batching. After an order is received by the warehouse or distribution center, it is sent to the picking area, where orders are consolidated to minimize the number of trips needed to retrieve the SKUs [4]. The typical question when batching orders is how to form the waves so that they can be retrieved in the minimum time. Another problem with batching is that orders must be sorted either while they are being picked or after the batch has been picked. In both cases, specialized material handling equipment can help to minimize the time spent sorting orders.

Another consideration for managers is determining the best route for picking an order. The goal of routing strategies is to minimize the travel distance of the picker, saving time and improving warehouse efficiency. Many heuristic techniques and mathematical algorithms can be applied to determine the best route for picking an order. Variables such as warehouse layout and aisle length also influence the total picking distance. The most commonly used routing strategies are traversal picking and return aisle picking [9]. In traversal routing, the picker starts and ends at the same point in the warehouse, walking through every aisle of the warehouse. In return aisle picking, the picker enters an aisle and comes back out through the same end of that aisle. The benefit of return aisle picking is that the picker does not need to visit all aisles in the warehouse; hence travel is minimized. Mathematical algorithms have been shown to produce more efficient routes, but they are difficult to implement due to their complexity.

The third method is to assign the most accessible locations to the items with the highest turnover in the warehouse. Under this criterion, the most popular SKUs are stored in the locations adjacent to the shipping area to minimize the distance that pickers need to travel. To determine the products with the highest turnover, it is necessary to collect and analyze a historical database of warehouse orders; basic statistical techniques are then used to determine which items have the highest turnover. The Warehouse Management System (WMS) is responsible for assigning these items to locations in the warehouse.

1.3 Data Mining

Improvements in database technology and the decreasing cost of storage devices have enabled firms to save huge amounts of historical data about their daily operations. Data Mining is a set of techniques that use historical information to discover patterns or rules that can benefit the owner of the data [7]. Data Mining techniques include clustering, association and classification algorithms. The goal of clustering techniques is to form clusters of objects with similar characteristics. Association rule techniques are used to find correlations between two or more variables, for example items that tend to appear together in the same transaction. Classification techniques help to forecast a discrete variable based on attributes of the historical data.

Data Mining techniques have been tested on warehousing problems such as wave formation. For instance, association rules have been used to improve the aggregation of orders in a distribution center with the goal of reducing the cycle time of inventories [1]. This method has been applied to find correlations between customers in an orders database and to use this information to develop a picking strategy that reduces picker travel.

1.4 Objective

The objective of this thesis is to design a methodology that defines warehouse-within-a-warehouse storage areas by using Data Mining clustering techniques to analyze the historical customer orders of a distribution center. The methodology identifies the groups of SKUs, or clusters, that should be stored together. A model for forming clusters is defined by mining historical data from customer orders, and it is compared with other assignment techniques. Finally, a report is produced showing the improvements and conclusions.

This document is organized as follows. Chapter 2 discusses background in warehousing research and different techniques to minimize picking time in a warehouse. Chapter 3 discusses the methodology employed for identifying clusters in customer orders to design the storage areas of a warehouse. The methodology includes pre-processing historical data; defining a model and its control variables; and creating assignment clusters (warehouse within a warehouse). Chapter 4 discusses the model testing process and compares preliminary results with alternative assignment methods. Finally, Chapter 5 presents the conclusions and recommendations for future research.
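To illustrate the kind of analysis the objective describes, the following Python sketch mines SKU co-occurrence from a toy order history. It is a simplified, hypothetical stand-in and not the methodology developed in Chapter 3. It counts how many orders contain each SKU pair, links pairs whose count reaches an assumed threshold, groups linked SKUs with a union-find structure, and scores a storage assignment by the total number of distinct aisles pickers must visit. The function names, the threshold value and the order data are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(orders):
    """Count, for each unordered SKU pair, how many orders contain both SKUs."""
    counts = Counter()
    for order in orders:
        # set() ignores repeated lines; sorted() gives each pair a canonical order
        for a, b in combinations(sorted(set(order)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_skus(orders, threshold):
    """Link SKUs whose pair count is >= threshold and return the linked groups.

    A union-find stand-in for a clustering step (hypothetical, for illustration).
    """
    parent = {sku: sku for order in orders for sku in order}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), n in pair_counts(orders).items():
        if n >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for sku in parent:
        groups.setdefault(find(sku), set()).add(sku)
    return sorted(groups.values(), key=sorted)


def total_aisles_visited(orders, aisle_of):
    """Sum over all orders of the number of distinct aisles a picker must enter."""
    return sum(len({aisle_of[sku] for sku in order}) for order in orders)


# Toy order history (hypothetical): each order is a list of SKU codes.
orders = [["A", "B"], ["A", "B", "C"], ["C", "D"], ["C", "D"], ["A", "B", "E"]]
clusters = cluster_skus(orders, threshold=2)  # three clusters: {A, B}, {C, D}, {E}

# Store each cluster in its own aisle (aisle capacity is ignored here).
clustered_aisles = {sku: aisle for aisle, group in enumerate(clusters, start=1) for sku in group}
# A hypothetical assignment that separates SKUs usually ordered together.
scattered_aisles = {"A": 1, "C": 1, "B": 2, "D": 2, "E": 3}

print(total_aisles_visited(orders, clustered_aisles))  # 7
print(total_aisles_visited(orders, scattered_aisles))  # 11
```

With these toy orders, storing each cluster in its own aisle requires 7 aisle visits in total versus 11 for the scattered assignment. A real storage assignment would also have to respect aisle capacity and the popularity-based placement discussed above, both of which this sketch ignores.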

14 14 CHAPTER 2: LITERATURE REVIEW This chapter summarizes the research that many authors have performed in different areas of warehousing such as layout, storage methods, routing strategies, batching and zoning, and Data Mining. This chapter concludes with an overview and analysis of the application of Data Mining techniques that will be applied to improve warehouse operations. More specifically, we focus on clustering techniques, which will be used in this research for defining warehouses within a warehouse for storage purposes. 2.1 Warehouses Warehouse is defined as a structure or room for the storage of merchandise or commodities [12]. Moreover, De Koster [3] states that warehouses are commonly used for storing or buffering products. Based on both definitions, we can define a warehouse as a storage facility that accomplishes buffering functions in the supply chain of firms. According to the firm s needs, different SKUs are stored in warehouses including raw materials, products in process and finished goods. In every phase of the supply chain management, it is necessary to store and retrieve such items from the warehouses according to the requirements of departments such as manufacturing or shipping. Increased globalization and e-commerce has also increased the pressure for high efficiency in warehouses. Baker [11] explains how new warehouses are being designed in order to meet the requirements of the supply chain agility. This research is based on structured interviews and the performance variable studied was the total cost of the warehouse design. The model developed included variables such as capital costs,

15 15 operating costs, mobility and uniformity. Results of the analysis showed that there is a slight difference in total design cost when designing warehouses for supply chain agility. The importance of warehouses within supply chain management has also been studied. Lambert et al. [13] studied the mission of the warehouses. The authors state that warehouses exist because companies want to achieve economies of scale in both purchasing material and manufacturing products. Firms can also take advantage of increased purchasing power by buying items in volume and storing them in a warehouse. Companies also uses warehouses as a buffer for storing final products in order to support a high level of customer service, meet market demands and support cost-savings initiatives such as Just-in-time systems. Bartholdi and Hackman [5] also discussed the rationale behind a warehouse. The reasons stated include: The ability to match supply with customer demand, consolidate products and provide value-added services. Warehouses also allow firms to handle seasonal changes in demand. During these periods, the demand of groups of SKUs can change dramatically and the firm has to respond at the same speed. Thus, warehouses help firms to store final products and use them to meet the demand. Moreover, warehouses are used to consolidate products from different vendors. Warehouses consolidate items and achieve both time and transportations savings. Lastly, firms also use warehouses to perform value added activities such as final assembly, product s personalization, labeling, sorting, packaging and shipping consolidation. Van den Berg [14] studied the importance of planning and controlling a warehousing system. He has also addressed the impact of strategic, tactical and

16 16 operational decisions on the total throughput of the warehouse. Strategic decisions mean decisions at high levels of the firm where the mission and goals are established for the organization. Tactical decisions deal with material and labor used to store and retrieve the item in the warehouse. Finally, operational decisions are decisions to solve issues in the short-term such as daily operations of a warehouse. Van den Berg also described planning and controlling operations in a warehouse. The planning stage in warehousing systems includes decisions such as how to distribute the items in the warehouse, how to store them in the warehouse and finally how to distribute the work among the warehousing team. The controlling phase consist of batching orders based on factors such as category or demand, and determining the best routing and sequencing strategies to minimize the distance that pickers travel. Warehouse costs have also been analyzed. Warehousing costs include: Operational cost of warehouses for movement of materials, investment in facilities, material handling equipment, hardware and software infrastructure as well as personnel. According to Establish and Davis, total warehousing cost accounts for about 22% of the total logistic costs [10]. The high operational cost of warehouses has contributed to intensive research in areas such as storage and picking strategies to improve the retrieval time of items. 2.2 Layout The layout of a warehouse is also an important topic in the warehouse literature. The physical design of the warehouse infrastructure include storage, material handling

17 17 equipment, location of departmental areas and the implementation of the operating policies. These design factors influence the general efficiency of a warehouse. Defining the correct layout for a warehouse in the initial phase of warehouse construction is very important because future changes in the layout may come at significant cost. Rouwenhorst et al. stated that The logistics costs that are made inside a warehouse are to a large extent already determined during the design phase [19]. The authors emphasized the complexity of decisions made during the design phase of a warehouse. They developed a framework for designing warehouses which is based on processes, resources and organization. The framework differs from traditional design frameworks which are analysis oriented to specific areas in the warehouse layout. The authors examined the design problems and emphasized the relationship among strategic, tactical and operational level decisions. Researchers have also studied the warehouse layout problem to select the location of areas in the warehouse which performs specific functions. De Koster [3] considers two sides to the warehouse layout problem. First, the facility-layout problem is considered. It consists of decisions related to the location of the receiving, picking, storage and shipping areas within the warehouse. The goal of this organization is to minimize the distance between complimentary areas. Second, the internal layout problem is considered. It includes decisions such as the number of aisle and cross-aisles, the type of storage equipment required and the number and length of storage areas. The warehouse layout problem has also been studied by Larson et al. [16]. The authors applied a class-based approach for the design of a warehouse. They presented a practical

18 18 method for determining the layout a warehouse based on a lift pallet storage and retrieval system. The procedure has three phases: 1) Determining aisle layout to minimize the rectilinear distance from input to output; 2) Assigning materials to a storage medium following a heuristic algorithm that obtains the row depth that maximizes storage usage; and 3) Allocation of the floor space to the classes of items according to the results of step 2. The algorithm developed by the authors was tested in a real warehouse where new classes of SKUs were formed and a 45% reduction of material handling costs was achieved. The warehouse layout problem has also been evaluated from various perspectives. Hwang [17] developed a performance evaluation method to evaluate a picking area layout of a warehouse. The mathematical model considers two phases: analysis of order picking warehouse design and transporter performance analysis. More specifically, the model computes the probability of picking a SKUs using a normal distribution. Next, the optimal size of the unit rack is calculated and lastly a travel time analysis of the transporters was performed to assign travel time, picking time and stopping time for each item in the picking order. To test the model, the author used simulation and the model produced lower picking time compared with that of the mathematical model. The warehouse layout and the quantity and dimensions of cross aisles impact the total picking performance of a warehouse. Hsieh et al. [18] developed a model for improving the picking performance of warehouses which includes warehouse layout, storage policies and pick routing strategy as the main factors that influence performance during order picking. The model considers a rectangular warehouse, cross-aisles between

19 19 9 and 11, Input/Output (I/O) points located in the lower part of the warehouse and assumes that the picker start from one I/O point and finish in the I/O on the other side. Finally, authors have proposed different methodologies for defining a warehouse layout. Mohsen [15] developed a flexible and modular framework for designing warehouses. The author states that the difficulty in designing warehouses is caused by many variables, including interactions among operations. Likewise, the author proposed a framework for a designing a warehouse layout which is based in a set of fourteen ordered steps. These include establishing the goal of the warehouse, forecasting and analysis of demand, operating policies, location of functional areas, number of docks, aisle dimensions, zoning, location of I/O areas among others. The methodology proposed by Mohsen is a tool that can be used by warehouse managers when making warehouselayout decisions because the framework takes into account factors which affect operations and design. 2.3 Storage Methods Storage methods are important within warehouse operations because they are designed to reduce the cost of order retrieval. The location where products are stored influences the travel distance and picking time of an order. Two different approaches have been developed to improve the put-away operation: Storage methods to minimize the cost associated with the picking operation and storage equipment to ease replenishment and picking operation. Frazelle [4] considers three storage location strategies: Dedicated, randomized and class based. The dedicated strategy assigned specific location areas of the warehouse

20 20 to specific SKUs. Random strategies assigned locations to SKUs based on the availability of the storage locations. Lastly, Class-based storage used a mix between dedicated and randomized storage strategies. A dedicated area is assigned to a class of SKUs and randomized assignment is used within that class. Bartholdi and Hackman classified storage locations in dedicated and shared locations [5]. The authors state that dedicated storage is the area assigned to a specific product. The drawback of this strategy is the dependency on demand and turnover of the products. Therefore, on average the storage capacity is only about 50% utilized. On the other hand, shared locations can be used to store more than one product. The drawback of this strategy is that the process can be complicated because the same product may be dynamically re-allocated to new locations depending on inventory demands. This strategy allows 66% space utilization in a warehouse. Van den Berg and Zijm [20] examined warehouse management issues focusing on inventory management and storage location assignment. Inventory reduction helps to improve warehouse efficiency because costs are reduced and the picking process is performed more efficiently. Regarding the storage location assignment they asserts that an effective storage policy may reduce the travel times for storage/retrieval and orderpicking. They also described the forward and reserve area of a warehouse. The Forward area is used to store products that have high rotation or demand. Reserve area is used as bulk storage area and for replenishment of the forward area. The division between Forward and Reserve area is important because proper placement of the division can

21 21 improve the picking time of orders by storing the most popular SKUs close to the shipping areas. Researchers have also studied the stock location assignment problem (SLAP). Brinzer et al. [21] described a heuristic method for assigning locations based on the product structure. Their method includes the following steps: 1) Identify the set of component objects to study; 2) Code each part and group them in variant groups (VG) based on similar characteristics; 3) Identify the common demand between VGs. Based on this model, the authors performed a testing using 89 different SKUs and the parts were grouped in 20 different VGs. They concluded that managing 20 groups of SKUs instead of 89 help to reduce up to 75% the information processed by the picker. Additional advantages include reduced picker errors and faster picking time. Petersen and Aase [22] evaluated the performance of Class-based storage between random assignment and volume based assignment. Class-based storage (CBS) strategies group SKUs based on demand and assign random locations within their class-storage area. Moreover, classes with high demand are located in the areas closer to the shipping docks in order to minimize the travel time during picking. The simulation model was set using a rectangular warehouse with one door in the left-lower side; the aisles are wide enough for two pickers and the shelf capacity provides room for 1,000 SKUs. The total travel time is calculated by adding the total turnaround travel time plus the picking time. Additional parameters for the model include the number of classes (two, three and four classes were selected) and the number of SKUs per class (5, 15, and 30). Results of the

22 22 simulation model showed 12 to 26% time savings when using Class-based storage policies compared with random assignment. Studies considering Golden Zone (GZ) storage policies have also been conducted. The Golden Zone is defined as the area between a picker s waist and shoulders [23] and has been studied because storage of SKUs in this area decreases the picking time. Petersen et al. conducted a simulation model to compare Golden Zone storage strategies with other storage methods. The study utilized a Monte Carlo simulation model and low level to picker part (the SKUs are smaller and the picker goes to the SKU). Parameters used in the model included a rectangular 90 x 50 ft, aisles with room for two pickers at a time, order sizes of 3, 10, 20 and 30 SKUs, 1000 unique SKUs, and 80/20 demand. The model also considered slotting measures such as popularity, turnover, volume, pick density and cube per-order index. Results of the simulation model showed that using the Golden Zone concepts reduces the total fulfillment time of an order compared with models where the concept of a Golden Zone is not used. 2.4 Batching and Zoning Alternative strategies to improve the total picking time include batching and zoning. According to De Koster et al. [3] zoning is the process of dividing the picking area in small picking-areas called zones. The rationale behind this strategy is that the total workload is divided by zones in which a picker is assigned. The picking process is performed by many pickers. An advantage of this method is that a picker can reduce the travel time within his/her area. The drawbacks of this method are that a picker is required to work in each zone and the order needs to be consolidated to include the items picked

23 23 from each zone. The authors developed two approaches for zoning: Progressive assembly and Parallel assembly. Under progressive assembly an order is processed by a picker and when his zone is complete, the order is passed to the next zone. This process is repeated until the order is completely filled. Parallel assembly considers many orders that are picked concurrently and assign a picker to each individual order. Batching is the process of consolidating many orders to be picked concurrently by the same crew of pickers. The authors cited two different types of batching: Proximity and Time Window batching. Proximity batching groups under the same batch orders that need to be retrieved together. Under Time Window batching orders are joined together based on either the time they arrived to the logistics system or by using fixed time schedules. De Koster et al. [24] analyzed two heuristic algorithms to batch many line orders together. Seed algorithms and Time Savings algorithms were evaluated using a rectangular warehouse of 15 x 4 meters. Seven orders were processed using batches of 5 SKUs. Seed algorithm evaluation followed two basic steps: First, selection of a seed order needs to be selected using random criterion, order with higher number of aisles, the order with largest aisle, etc. Second, the other orders need to be included in the seed order until the maximum capacity of the picker is reached. The authors also evaluated the Time Savings algorithm or the picking time saved when processing two orders together instead of picking them individually. Using this definition, S-shape and Largest gap routing policies were evaluated using the same warehouse defined by seed algorithms. Finally, simulation models were applied to test

both algorithms. The results showed that order batching performed better than traditional methods such as the first come, first served (FCFS) algorithm. Seed algorithms also produced better results when used with the S-shape routing strategy, whereas Time Savings algorithms performed better with the largest gap routing strategy. Order batching procedures have also been studied in conjunction with automated storage/retrieval systems (AS/RS) under a just-in-time (JIT) control policy. Elsayed et al. [25] developed a function to minimize the sum of the earliness and tardiness of all orders. This study required an algorithm for each step: (1) sequencing orders, which consists of computing a priority index based on the weighted sum of the earliness and tardiness of each batch; (2) creating batches from a seed order, where the next orders are added only if the objective function does not increase and the number of units stays within the capacity of the AS/RS, until all the orders have been included in batches; (3) determining the release time of the AS/RS and inserting idle time between batches so that orders are available when the JIT system requires them. To test this methodology, the authors used two storage racks with 40 bins each. Other parameters of the model included the horizontal and vertical speeds and the capacity of the AS/RS. The results show that batching is advantageous for large orders, while for small orders batching offers no improvement over individual picking. Researchers have also studied batching orders in warehouses with parallel aisles. Gademann et al. [26] presented a branch-and-bound algorithm to solve this problem. The
authors based this study on a rectangular warehouse with parallel aisles and two cross-aisles, one at each side of the warehouse. They assumed no congestion time, a single I/O point from which all orders are retrieved, equal latency for all batches and constant picker speed. The authors described a basic branch-and-bound algorithm and, as a next phase, improved it. The basic algorithm is a branch-and-bound search tree: a logical multi-level tree, in which each node is a batch, is formed according to the number of orders. The improvements to the basic algorithm consist of preprocessing the data and rearranging the lower bounds. To order the input data, the authors applied two strategies: random order and decreasing total picking time. To test the model, Gademann et al. used a computer model in which 24 orders and four batches were created. Both the basic algorithm and its improved version provide an optimal solution; however, the algorithm is not suitable for a large number of orders. Gademann and Van de Velde [27] examined a batching procedure for orders in a parallel-aisle warehouse. They developed a branch-and-price heuristic algorithm, based on linear programming, for batching orders. The authors developed a mathematical model that included minimizing the cost of finding a batch. The model was tested with a computer program written in C++ and the CPLEX linear programming software. The results showed that optimal solutions were found after 8 CPU seconds of processing.

2.5 Routing Strategies

An additional method for reducing picker travel time is to assign efficient routes to the pickers. Routing policies are the last step in filling an order in a
warehouse. Routing strategies determine the path the picker follows through the warehouse when retrieving the SKUs required for an order. Selecting the best routing policy helps to decrease picking time, because less travel time is needed per item picked. According to Petersen [30], there are basically two classes of routing policies: heuristic and optimal. Heuristic policies are popular because they are easy to implement and require only minimal training for pickers. Optimal policies produce more efficient routes but require complex computing algorithms. The author described three major heuristic routing policies and the optimal routing policy. The heuristic strategies are: (1) the traversal strategy, in which a picker enters an aisle and exits at the opposite end; (2) the largest gap strategy, in which a picker enters an aisle only as far as the start of the largest gap within the aisle; (3) the composite strategy, which combines the traversal and largest gap policies. The optimal routing strategy uses a mixture of policies computed by a program. Petersen also performed experiments to evaluate heuristic and optimal routing strategies under two storage policies: diagonal and within-aisle. The tests covered scenarios with 2 to 50 items. His studies confirmed that the composite strategy was the best heuristic policy and that within-aisle storage produces less picking time. The resulting savings in order picking cost were estimated at between 3 and 30 percent. Following the same approach, Goetschalckx and Ratliff [33] also studied the picking process in an aisle. The authors identified two problems in assigning routes to pickers: (1) the within-aisle sequencing problem, which concerns the routing strategies
that will guide the team of pickers within the warehouse. Within-aisle routing policies include traversal, split-traversal, return and split return; (2) the between-aisle sequencing problem, which is the strategy for picking SKUs from both sides of an aisle. This problem depends on the warehouse layout, so there is no optimal general solution. The authors developed an optimal algorithm for picking between aisles. The algorithm is based on traversal travel and on a no-skip property that allows the picker to enter an aisle at one end and leave at the opposite end. The problem was solved by finding the shortest path in an acyclic graph in which each node represents a location to be picked. Finally, the authors compared the results with a heuristic algorithm called Z-pick and tested the algorithm by simulation using routing policies that included optimal traversal, Z-pick and optimal return. The results showed that the traversal policy performs better than a return policy, and that aisle density and aisle width are important variables in determining travel distance. Roodbergen and De Koster [34] studied the problem of routing pickers in a warehouse with a middle aisle. The study was based on a common warehouse layout formed by a set of parallel aisles with a cross aisle in the middle. Hence, pickers can switch to another aisle through the middle aisle if required. Additional assumptions included: SKUs are stored on both sides of the aisle, traversal routing is used, and all items in an order can be picked in a single tour. Using this configuration, the authors developed an algorithm that defines subgraphs based on the aisles and the items to be picked. The problem is then redefined as
finding the shortest tour subgraph that allows the picker to cover the areas required to fill the order. The authors used linear programming to execute the algorithm. They compared the performance of the algorithm in two models, a warehouse with a middle aisle and one without, and found that the warehouse with a middle aisle has a lower average travel time. Research on routing policies in warehouses with a middle aisle has been extended to warehouses with two or more middle aisles. Roodbergen and De Koster [31] focused on routing methods for warehouses with many cross aisles. The authors developed a new heuristic algorithm, called the combined heuristic, which is based on existing heuristics and dynamic programming. Under this policy, the picker first travels to the farthest block and picks all of its sub-aisles from left to right using traversal travel. Then the SKUs in the next block closer to the depot are picked. The heuristic repeats these steps until all the items on the order have been picked. The model was tested using a rectangular warehouse with parallel aisles and two cross aisles. The assumptions were that the picker can travel in both directions within an aisle and that all SKUs can be retrieved in a single trip. The authors performed simulation experiments to compare routing algorithms such as optimal, largest gap, S-shape and aisle-by-aisle with the new combined algorithm. The experiments showed that the combined algorithm performed best among the heuristic algorithms. Another interesting finding was that the average travel time while picking decreased when the number of cross aisles was increased from two to three, except in small warehouses with many picks.

Data Mining for Warehousing

Data Mining is defined as the process of extracting, or mining, knowledge from large amounts of data [7]. The rapid development of Data Mining techniques was made possible by low-cost computer storage and high-speed processors, combined with database technology such as Data Warehousing. These factors led to the creation of the huge historical databases now available in many firms.

General Data Mining Methods

According to Turban et al. [35], Data Mining can provide two main benefits for businesses: identification of trends and discovery of unknown patterns. The authors also discussed Data Mining applications in fields such as marketing, banking, engineering, retailing and sales. The process of extracting information follows these steps:
(1) The data is pre-processed and ordered, and outliers are removed.
(2) A data warehouse with multidimensional tables is populated from different data sources.
(3) Data Mining techniques such as clustering, classification or association are applied to the data.
Once these steps have been completed, the knowledge obtained about the data supports the decision-making process. Association analysis can identify patterns or behaviors that occur frequently in a database. This analysis also helps to identify relationships between variables that are
used to explain future behavior. Classification analysis focuses on discovering models that identify classes of objects within the database. These models are later used to forecast the behavior of each class. Decision trees and neural networks are classification techniques. Lastly, clustering analysis forms groups of elements, or clusters. Each cluster contains elements with similar characteristics that differ from the elements of other clusters. There are various clustering algorithms, such as K-MEANS, EM (Expectation Maximization) and CLARANS (Clustering Large Applications based on RANdomized Search). The most common partitioning algorithm is K-MEANS. Witten and Frank [36] describe K-MEANS as an algorithm in which the input parameter k specifies the number of clusters to be mined. Initially, k cluster centers are chosen at random, and each element is assigned to its closest cluster center using the Euclidean distance. Next, the mean, or centroid, of each cluster is computed and becomes the new center of that cluster. The process is repeated iteratively until the assignment of elements to clusters no longer changes.

Data Mining for Order Batching

Clustering algorithms have also been applied to batching orders in warehouses. Hwang et al. [37] examined order batching in an automated storage and retrieval system (AS/RS) using cluster analysis. Their methodology described orders by attribute vectors and formed batches for pickers based on the similarity among orders. The authors developed and tested six batching algorithms for order picking using simulation software. The results showed that the new
algorithms that used the concept of similarity reduced the average travel distance of pickers. Kim et al. [38] also studied order batching for an automated warehouse using clustering techniques. They developed two heuristic algorithms, an x-coordinate algorithm and a clustering algorithm, for the robots that perform the picking operations. Experimental results showed that this methodology found an optimal solution in all the cases presented. Another clustering procedure for batching orders was proposed by Chen et al. [1]. The authors explored the problem of batching orders in warehouses with parallel aisles and described a batching procedure based on association rule mining of customer orders. The procedure creates an order-item table, transposes it into an item-order table, and then applies the APRIORI algorithm to search for associations among orders. The results showed that the new algorithm yields a shorter average travel distance than typical batching procedures.

Data Mining for Storage Location Assignment

Clustering algorithms have also been applied to improve storage location policies in warehouses. Jane and Laih [39] developed a heuristic algorithm to locate items in a multi-zone picking warehouse. The objective was to minimize idle time when pickers work on the same order. To this end, the authors developed a heuristic algorithm to solve the Natural Cluster (NC) problem. Finally, the authors used historical data from a real warehouse to test the heuristic. The results showed a 29%
increase in the utilization of the picking system and an 18% reduction in the average time spent picking the SKUs of an order. Wu [40] explored the location problem in warehouses in order to identify a small number of items that can be stored in an automated order completion zone. The author used a frequent itemset algorithm as the basis for his own algorithm for mining databases. His experiments showed that the new approach performed better than the item-order-completion distribution (IOCD) defined by Frazelle. Liu [41] studied clustering techniques for stock location in warehouses, using customer order data to improve both the stock location and the picking process of the warehouse. The clustering model uses two similarity measures:
(1) Similarity between pairs of items
(2) Similarity between pairs of customers
The algorithm was tested using a computer simulation program named WITNESS. The results showed that a storage location policy based on clustering could decrease the travel time of pickers. Rosenwein [42] applied clustering analysis to the problem of locating items within a warehouse. The author used binary variables to record which SKUs appear in each order and defined a distance function to measure the similarity between two SKUs, calculated as:

$$ w_{ij} = \sum_{q=1}^{Q} \left| v_{iq} - v_{jq} \right| $$

where:
w_ij = distance between SKU i and SKU j
v_iq = 1 if SKU i is included in order q; otherwise 0
v_jq = 1 if SKU j is included in order q; otherwise 0
Q = number of orders

Next, the cluster median (the SKU that represents all the SKUs of a cluster) is computed. The goal of this method is to minimize the distance between the items of each cluster and its median while maximizing the distance to the SKUs of the other clusters. The author compared this clustering algorithm with a random assignment algorithm and observed a 14% improvement. Ming-Huang et al. [43] proposed a storage location method that assigns locations to new SKUs based on association rule mining between the available storage locations and the new SKUs. The method is adaptive: it focuses on individual SKUs that need a storage location, without reorganizing the entire warehouse. The method is developed in three phases: (1) develop the association rule mining; (2) develop the AIX index to evaluate the correlation between the new SKU and the available storage locations; and (3) formulate the storage assignment based on the AIX index and the storage locations. The goal of Phase 1 is to mine historical data to discover association rules among SKUs. The main indicator is the frequency with which products appear together in an order, but to eliminate small support values, the weighted support count (wsupc_pq) is used. In addition, the authors define the following variables for the model:

Parameters
I: the location set, which contains m locations, I = {1, 2, ..., m}
K: the product set, which contains l products, K = {1, 2, ..., l}
L_i: the set of products already allocated within the aisle of location i
T_k: the turnover rate of product k
D_i: the distance between location i and the outbound exit
wsupc_pq: the weighted support count between products p and q

Decision variable
X_ik = 1 if product k is allocated to location i; 0 otherwise

The goal of Phase 2 is to develop the association index (AIX), given by the following formula:

$$ \mathrm{AIX}_{ik} = \frac{\left( \sum_{k' \in L_i} \mathrm{wsupc}_{kk'} \right) \times T_k}{D_i} \qquad (1) $$

The formula combines the association between product k and the products already allocated to the aisle of location i (k' ∈ L_i), the turnover rate of product k (T_k), and the distance between location i and the exit (D_i).

Phase 3 formulates the general model for the storage location problem. The model is formulated as follows:

$$ \max \sum_{i \in I} \sum_{k \in K} \mathrm{AIX}_{ik} X_{ik} \qquad (2) $$
$$ \sum_{i \in I} X_{ik} = 1 \quad \forall k \in K \qquad (3) $$
$$ \sum_{k \in K} X_{ik} \le 1 \quad \forall i \in I \qquad (4) $$
$$ X_{ik} \in \{0, 1\} \quad \forall i \in I, \ k \in K \qquad (5) $$

In the model, equation (3) guarantees that each SKU is assigned to a storage location, equation (4) limits each location to at most one SKU, and equation (5) ensures a binary solution. The AIX index enters the model as the objective coefficient in equation (2). The model was tested using data from a grocery store in Taiwan and an experimental setting that mimics a real distribution center. The DMSA method was then compared against random storage assignment, and the results indicate that travel distance improved by between 0.40% and 2.85%.
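To make equations (1) to (5) concrete, the following Python sketch computes the AIX index and solves the assignment model for a toy instance by exhaustive enumeration. It is an illustration under stated assumptions, not Ming-Huang et al.'s implementation. The function names `aix` and `solve_assignment` are hypothetical, and the weighted support counts are assumed to be given in a dictionary keyed by unordered product pairs.

```python
from itertools import permutations


def aix(location, product, L, wsupc, T, D):
    """Equation (1): AIX_ik for product k at location i.

    L[i]: products already allocated within the aisle of location i
    wsupc: weighted support counts keyed by frozenset({p, q}) (assumed format)
    T[k]: turnover rate of product k
    D[i]: distance between location i and the outbound exit
    """
    association = sum(wsupc.get(frozenset((product, other)), 0.0) for other in L[location])
    return association * T[product] / D[location]


def solve_assignment(locations, products, L, wsupc, T, D):
    """Maximise equation (2) subject to (3)-(5) by checking every feasible assignment.

    Returns (plan, value); plan is None if there are fewer locations than products.
    """
    best_plan, best_value = None, float("-inf")
    # Each permutation gives every product exactly one location (3), never puts
    # two products in the same location (4), and implies a binary X_ik (5).
    for chosen in permutations(locations, len(products)):
        value = sum(aix(i, k, L, wsupc, T, D) for i, k in zip(chosen, products))
        if value > best_value:
            best_plan, best_value = dict(zip(products, chosen)), value
    return best_plan, best_value
```

Enumeration grows factorially, so it is only practical for a handful of products. Constraints (3) to (5) make the model a linear assignment problem, so larger instances would normally be handed to an assignment or integer-programming solver. One example is SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True`, applied to the matrix of AIX values.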

CHAPTER 3: METHODOLOGY

This chapter explains the methodology on which this study is based. The description includes the variables and assumptions stated for this warehouse system. Furthermore, both the heuristic and the process of mining the historical warehouse data will be explained. Finally, the model used to cluster historical data and the clustering method will be explained.

3.1 System Description

Travel time during order picking has been identified as the most costly activity within warehouse operations [4]. Many approaches have been developed to store products so as to minimize the travel distance when picking SKUs. One popular approach is to divide the warehouse into two areas: the Forward (or Fast-pick) area and the Reserve area. The Forward area stores SKUs in low quantities, and the picking unit is typically eaches or cartons. The Reserve area, on the other hand, stores pallets of SKUs, which are used to replenish the Forward area. The most important distinction between the two is that the Forward area contains only a small quantity of each SKU (e.g., 1 pallet or 1 carton) for more efficient order picking. Therefore, the Forward area is much smaller than the Reserve area, which reduces the travel distance of pickers. As a result, picking in the Forward area is less expensive than in the Reserve area, because there is less distance to travel between SKUs. A question that arises with this typical layout is how SKUs should be stored in the Fast-pick area. For instance, the most popular SKUs can be stored in this area
using random or fixed storage assignment, or a combination of the two. Another possible solution is to arrange the products in classes and then assign each class to areas within the Forward area. The clusters defined in this study are based on the historical behavior of customer orders and are used to determine the storage locations of the SKUs. The ultimate goal is to define warehouse-within-a-warehouse areas in which products can be stored together to minimize the travel distance of the pickers. Data Mining association algorithms were also considered for determining relations among SKUs. However, association can only predict behavior for a pair of SKUs, not for the group of SKUs needed to define a warehouse within a warehouse. The system therefore applies Data Mining techniques, specifically a clustering algorithm, to mine the data obtained from the historical orders of a warehouse. A new distance-based clustering algorithm is defined to form the clusters because of its simplicity and its ability to perform well with large datasets. Clustering techniques incorporate the concept of similarity between pairs of items that should be stored in the same cluster. The objective is to keep SKUs that were ordered together in the same cluster, maximizing the probability that SKUs within a cluster will be picked together in future orders. The clusters defined by this methodology are used to assign SKUs to aisles in the warehouse. To evaluate the effectiveness of the clustering method, the metric used is the number of aisles that the pickers have to visit to pick an order. This objective function was selected because the exact location of an SKU within an aisle is not determined; hence it
is impossible to calculate the travel distance to a location in an aisle. This metric is treated as equivalent to minimizing the total distance traveled by the picker, but it simplifies the calculations, since distances vary from one warehouse to another.

3.2 Assumptions

For the purposes of the model definition, the following assumptions have been made:
Lateral travel distance: The typical Forward area of a warehouse is shown in Figure 1. The distance from each aisle to the I/O dock is ignored; hence, no lateral travel is considered.
Figure 1: Warehouse Layout
Capacity: Typically, a picker has a limited capacity on the number of SKUs that can be picked in a single trip. The methodology assumes that the picker can retrieve all the SKUs of an order in a single trip per aisle, which simplifies the calculations.
Otherwise, the model would have to consider several trips to pick items in the same aisle and other routing strategies, which would complicate the model.

3.3 Model

This section explains the variables and processes that define the Data Mining model. Its purpose is to provide a model that can be implemented in a computer program for testing. Throughout this section, Dataset1 refers to the database of a warehouse in the office supply market.

Variables

The input variables used by the system include database tables and warehouse management variables, as detailed below.

Database Tables

At a minimum, the model requires the tables described in Table 1 and Table 2. Table 1 shows the structure of the Item Master table, which contains all the SKUs managed by the warehouse. Table 2 shows the structure of the Sales table, which contains the sales transactions of a complete year.

Table 1: Item Master
Field | Field Name | Comments
1 | ITEM | Unique ID
2 | STATUS | Item Status
3 | SALESUOM | Sales Unit of Measurement

Table 2: Sales Transactions
Field | Field Name | Comments
1 | TXNDATE | Transaction Date
2 | ITEM | Unique ID
3 | ORDNBR | Purchase Order Number
4 | LINENBR | Purchase Order Line Number
5 | SALES_UOM | Sales Unit of Measurement
6 | SHPQTY | Shipped Quantity

Warehouse Variables

The model uses the following input variables:
NAisles: number of aisles in the Forward area
NClusters: number of clusters defined to mine the historical data

The output variable, used as the objective function to measure the performance of the model, is the Total Number of Aisles Visited. It is defined as the number of different aisles a picker must visit to retrieve the set of SKUs for a hypothetical order.

Data Reduction

The initial step of this methodology consists of reducing the sales transactions table to a consolidated dataset. The information is grouped into pairs of SKUs that appear together within a single order. The number of times a pair of SKUs is ordered together is a good measure of an SKU's affinity for another SKU; therefore, pair association draws together SKUs that are often ordered together. The new dataset based on pairs of SKUs is obtained using the following algorithm:
(1) Create a bi-dimensional matrix based on all possible combinations of the SKUs in the Item Master table. The matrix contains all the possible pairs (SKU1, SKU2) that can be ordered together.

(2) After creating the matrix, calculate how many times each pair appears in an order. To compute this value, the following datasets are used:
PairAll: table that contains SKU1, SKU2 and NumberTimes
Sales: table with all the orders of the year
To compute the number of times a pair of SKUs appears together in the sales database, each order is checked for the pair (SKU1, SKU2) and all the occurrences are added up. The same procedure is followed for every other pair of SKUs. A portion of the resulting matrix is shown in Table 3. As can be seen, certain pairs of SKUs never appear together in the database (zero value). Note also that a pair is defined as the instance of two SKUs that are ordered together; therefore, the value for pair (SKU1, SKU2) is the same as for pair (SKU2, SKU1).

Table 3: Pairs Dataset

(3) Remove from the pairs dataset (Table 3) the pairs (SKU1, SKU2) with zero occurrences, because these pairs are not considered in the analysis. A portion of the resulting dataset is shown in Table 4.

Table 4: Cleaned Pairs Dataset
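The data reduction steps above can be sketched in Python as follows. This is a minimal sketch that assumes the Sales table (Table 2) is available as `(ORDNBR, ITEM)` tuples; the function name `count_pairs` is hypothetical. It does not build the full SKU-by-SKU matrix and then delete its zero cells. Instead, it counts only the pairs that actually occur together, which yields the same cleaned pairs dataset as step (3).

```python
from collections import Counter
from itertools import combinations


def count_pairs(sales_lines):
    """Count the orders that contain each unordered pair of SKUs.

    sales_lines: iterable of (ORDNBR, ITEM) tuples taken from the Sales table.
    Returns a Counter mapping (SKU1, SKU2), with SKU1 < SKU2, to the number of
    orders containing both SKUs. Pairs that never occur together are absent,
    which matches removing the zero-occurrence pairs in step (3).
    """
    # Group the order lines by order; a set ignores SKUs repeated on several lines.
    orders = {}
    for order_number, item in sales_lines:
        orders.setdefault(order_number, set()).add(item)

    pair_counts = Counter()
    for skus in orders.values():
        # sorted() gives one canonical key, so (A, B) and (B, A) are the same pair.
        for pair in combinations(sorted(skus), 2):
            pair_counts[pair] += 1
    return pair_counts
```

Each order is reduced to its set of distinct SKUs, so a pair is counted at most once per order. This matches the definition of v_ij used later as the number of orders that contain both SKUs.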

Data Pre-Processing

This phase selects the pairs of SKUs that form the input dataset for the clustering phase. The dataset described in Table 4 was obtained by counting the number of times each pair of SKUs was ordered together, keeping only pairs ordered together at least once. To select the pairs to be mined, the pairs with the highest and lowest demand are removed from the dataset. In preliminary testing, the SKUs with the highest demand tended to attract many other SKUs, which produced some very large clusters. SKUs with low demand are not normally stored in the forward area. For instance, Dataset1 contained 7,139,075 pairs from 27,713 SKUs, and this number was reduced by eliminating the lowest- and highest-demand pairs. To eliminate the low-demand pairs and select the SKUs to be stored in the forward area (typically between 100 and 150 SKUs), a threshold of 200 occurrences (pairs ordered together more than 200 times) was used in this analysis. This reduced the number of SKUs to 151. Furthermore, the 20 most popular SKUs were eliminated from the table, which reduced the number of SKUs to 133. Note that the threshold can vary according to the number of SKUs and the space available in the forward area. In the dataset analyzed, 133 is a reasonable number of SKUs to store in the forward area. An additional step generates a new matrix with all the combinations of the selected SKUs (those in pairs with more than 200 occurrences). Some of these combinations have a value of zero in the reduced dataset, so their values must be extracted from the sales database. The new dataset is used as the input dataset for the next phase. Table 5 shows a portion of the resulting dataset.
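A minimal sketch of the pre-processing filter follows. It assumes the pair counts are a dictionary keyed by `(SKU1, SKU2)` tuples. It also assumes that an SKU's popularity is the number of orders containing it, a definition the text does not fix. The function name `select_forward_skus` is hypothetical; its defaults mirror the 200-occurrence threshold and the removal of the 20 most popular SKUs used for Dataset1.

```python
def select_forward_skus(pair_counts, sku_popularity, threshold=200, n_top_removed=20):
    """Pre-processing filter; returns (selected SKUs, pair counts among them).

    pair_counts: {(SKU1, SKU2): number of orders containing both}
    sku_popularity: {SKU: popularity}, assumed to be the number of orders containing the SKU
    """
    # Keep SKUs that belong to at least one pair ordered together more than `threshold` times.
    candidates = {sku for pair, n in pair_counts.items() if n > threshold for sku in pair}
    # Drop the most popular candidates, which otherwise attract too many SKUs into one cluster.
    # Ties in popularity are broken by SKU id so the result is deterministic.
    ranked = sorted(candidates, key=lambda s: (-sku_popularity[s], s))
    selected = set(ranked[n_top_removed:])
    # Rebuild the pair dataset over the selected SKUs, keeping every pair's full count,
    # including pairs at or below the threshold.
    selected_pairs = {pair: n for pair, n in pair_counts.items()
                      if pair[0] in selected and pair[1] in selected}
    return selected, selected_pairs
```

Pairs among the selected SKUs that never occur together simply have no entry. This corresponds to a zero cell in the matrix built in the next phase.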

Table 5: Dataset Result

Clustering Historical Orders

This phase applies the Data Mining clustering algorithm to group the SKUs that will be stored together in the Forward area. The six-step algorithm defined below assigns SKUs to clusters through an iterative process.

Input Variables
Number of clusters: nc
Big number: v_max (the maximum value within the pair matrix, used in Step 2)
Number of orders that contain both sku_i and sku_j: v_ij
Number of iterations: ni; defines the number of times the six-step algorithm will run. The algorithm stops, and consequently fixes the value of ni, under the following conditions:
1) The clusters assigned to the SKUs do not change from the previous iteration to the current one.
2) The clusters assigned to the SKUs are identical to those of a previous iteration.
3) ni has reached the maximum number of iterations defined by the application.

The methodology is defined by the following steps:

Step 1

Step 1 creates a bi-dimensional matrix from the dataset produced by the Pre-Processing phase (Table 5). Only about half of the matrix elements hold values, since the matrix is symmetric and each repeated value is recorded once. The resulting matrix is shown in Table 6.

Table 6: Number of orders containing SKU pair

Step 2

This step normalizes the matrix by dividing each value by the maximum value within the matrix (v_ij = v_ij / v_max). Pairs that appear together more frequently in the order database keep greater values than pairs that appear together only a few times, and all values are scaled between 0 and 1. The scaling simplifies the calculations and significantly reduces the number of digits. Table 7 shows the normalized matrix.
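Steps 1 and 2 can be sketched as follows. The sketch assumes the pair dataset is a dictionary keyed by `(SKU1, SKU2)` tuples whose SKUs all appear in the `skus` list; the function name `normalized_matrix` is hypothetical. For convenience it fills both triangles of the matrix, which does not change any value because v_ij = v_ji.

```python
def normalized_matrix(skus, pair_counts):
    """Steps 1-2: symmetric SKU-by-SKU matrix of pair counts scaled to [0, 1].

    skus: list of SKU ids; row/column i of the matrix corresponds to skus[i].
    pair_counts: {(SKU1, SKU2): number of orders containing both}
    """
    index = {sku: i for i, sku in enumerate(skus)}
    n = len(skus)
    matrix = [[0.0] * n for _ in range(n)]  # zero where a pair never occurs together
    for (a, b), count in pair_counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = float(count)  # v_ij = v_ji
    v_max = max((v for row in matrix for v in row), default=0.0)
    if v_max > 0:
        # Step 2: v_ij = v_ij / v_max, so the most frequent pair becomes 1.0
        matrix = [[v / v_max for v in row] for row in matrix]
    return matrix
```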

Table 7: Matrix scaled values

Step 3

This step makes the initial assignment of SKUs to clusters based on popularity. The popularity of an SKU is computed by adding up the number of times the SKU is ordered in Dataset1. The initial assignment places the most popular SKUs in Cluster1, the next most popular in Cluster2, and so on. As shown in Table 8, two clusters are defined in the example: Cluster1 and Cluster2. Table 8 shows the matrix resulting from the cluster assignment.
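A sketch of the Step 3 initial assignment follows. The text does not say how many SKUs each cluster receives. This sketch therefore assumes the popularity ranking is cut into `n_clusters` blocks of equal size (the last block may be smaller) and breaks popularity ties alphabetically. `initial_clusters` is a hypothetical name, and clusters are numbered from 0 rather than from Cluster1.

```python
def initial_clusters(skus, popularity, n_clusters):
    """Step 3: the most popular SKUs go to cluster 0, the next block to cluster 1, etc.

    Assumes equal-size blocks of the popularity ranking; ties are broken by SKU id.
    """
    ranked = sorted(skus, key=lambda s: (-popularity[s], s))
    block = -(-len(ranked) // n_clusters)  # ceiling division: SKUs per cluster
    return {sku: rank // block for rank, sku in enumerate(ranked)}
```

For example, six SKUs split into two clusters give the three most popular SKUs cluster 0 and the remaining three cluster 1.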

Table 8: Initial cluster assignment

Step 4

This step computes the weight of each SKU within its cluster. The weight of an SKU is the sum of its distance-weights (the normalized values from Step 2) to the other SKUs within the same cluster, computed with the formula below:

$$ w_i = \sum_{j=1}^{m} v_{ij} \, x_{ijk} $$

where:
w_i = weight of SKU i
v_ij = normalized distance-weight between SKU i and SKU j
x_ijk = 1 if SKU i and SKU j are both in cluster k; 0 otherwise
m = number of SKUs

For instance, to compute the weight of SKU1, the weights of SKU1-SKU2 (0.35) and SKU1-SKU3 (0.25) are added, resulting in a weight of 0.60. Table 9 shows the computed weight of all SKUs.
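The Step 4 weight can be sketched as below. It takes the Step 2 matrix as a list of rows and `clusters[i]` as the cluster of the SKU in row i; both representations are assumptions, and `cluster_weights` is a hypothetical name. The indicator x_ijk becomes the test `clusters[j] == clusters[i]`. Skipping j = i is harmless because the diagonal of the matrix is zero.

```python
def cluster_weights(matrix, clusters):
    """Step 4: w_i = sum over j of v_ij * x_ijk.

    matrix: normalized values from Step 2 (list of rows)
    clusters: clusters[i] is the cluster of the SKU in row i
    """
    n = len(matrix)
    # x_ijk = 1 exactly when SKU j is in the same cluster as SKU i.
    return [sum(matrix[i][j] for j in range(n) if j != i and clusters[j] == clusters[i])
            for i in range(n)]
```

With SKU1, SKU2 and SKU3 in one cluster, v_12 = 0.35 and v_13 = 0.25, the function returns 0.60 for SKU1, matching the worked example. An SKU alone in its cluster gets weight 0.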

Table 9: Computed Weight

Step 5

Next, the SKU with the maximum weight in each cluster is selected as the centroid of that cluster; only one centroid is defined per cluster. In the example in Table 9, the centroids of Cluster1 and Cluster2 are SKU1 and SKU4, with weights of 0.60 and 1.40, respectively.

Step 6

Using the centroids selected in Step 5, a new matrix of distance-weights from each SKU to each centroid is created. Each SKU is assigned to the cluster of the centroid with which it has the largest distance-weight. Table 10 shows the detailed process.
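The Step 5 centroid selection reduces to an arg-max within each cluster. This sketch uses the hypothetical name `select_centroids` and represents weights and clusters as lists indexed by matrix row. Ties are broken in favor of the SKU that appears first, a rule the text does not specify.

```python
def select_centroids(weights, clusters):
    """Step 5: return {cluster: index of the SKU with the largest weight in it}."""
    centroids = {}
    for i, (w, k) in enumerate(zip(weights, clusters)):
        # Strict '>' keeps the first SKU seen when two weights tie.
        if k not in centroids or w > weights[centroids[k]]:
            centroids[k] = i
    return centroids
```

When SKU1 (0.60) and SKU4 (1.40) have the largest weights in their clusters, the function returns them as the two centroids, as in the example.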

Table 10: Assigning new clusters

Note that SKU1 and SKU4 cannot be moved out of their clusters while they are centroids, so they are not included in the table. Using these new cluster assignments, Steps 3 to 6 are repeated the number of times defined by the variable ni (number of iterations).

Storage Assignment Layout

The last phase of this method uses the clusters defined in the previous step to assign SKUs to aisles in the warehouse. The procedure for SKU assignment is as follows:
(1) Order the SKUs by their assigned cluster: first the SKUs in Cluster1, then those in Cluster2, and so on.
(2) Assign the SKUs to the aisles of the warehouse, starting with the aisle closest to the warehouse door. When an aisle is full, continue with the next aisle until all SKUs are assigned.
Table 11 shows a portion of the dataset after the aisle assignment process.
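The sketch below puts Steps 4 to 6 into a loop with the three stopping conditions and then performs the aisle assignment. It is an illustration under several assumptions:
- The Step 6 assignments take the place of the Step 3 assignment on each repetition.
- Ties in Step 6 go to the lowest-numbered cluster.
- The iteration limit defaults to a hypothetical 50.
- Each aisle holds a fixed, hypothetical number of SKUs (`skus_per_aisle`), because the methodology specifies NAisles but not aisle capacity.

All function names are hypothetical, and clusters are given as a list indexed like the matrix rows.

```python
def reassign(matrix, clusters, centroids):
    """Step 6: move each non-centroid SKU to the centroid it has the largest value with."""
    centroid_rows = set(centroids.values())
    new_clusters = list(clusters)
    for i in range(len(matrix)):
        if i in centroid_rows:
            continue  # a centroid cannot leave its own cluster
        # max() keeps the first (lowest-numbered) cluster when values tie
        new_clusters[i] = max(sorted(centroids), key=lambda k: matrix[i][centroids[k]])
    return new_clusters


def cluster_skus(matrix, clusters, max_iterations=50):
    """Repeat Steps 4-6 until one of the three stopping conditions holds."""
    n = len(matrix)
    clusters = list(clusters)
    seen = [tuple(clusters)]
    for _ in range(max_iterations):  # condition 3: iteration limit
        # Step 4: weight of each SKU within its current cluster
        weights = [sum(matrix[i][j] for j in range(n)
                       if j != i and clusters[j] == clusters[i]) for i in range(n)]
        # Step 5: the heaviest SKU of each cluster becomes its centroid
        centroids = {}
        for i, k in enumerate(clusters):
            if k not in centroids or weights[i] > weights[centroids[k]]:
                centroids[k] = i
        # Step 6: reassign the remaining SKUs to their closest centroid
        clusters = reassign(matrix, clusters, centroids)
        if tuple(clusters) in seen:  # conditions 1 and 2: unchanged or repeated
            break
        seen.append(tuple(clusters))
    return clusters


def assign_aisles(skus, clusters, skus_per_aisle):
    """Order SKUs by cluster and fill aisles one after another, aisle 1 first."""
    order = sorted(range(len(skus)), key=lambda i: clusters[i])  # stable sort
    return {skus[i]: pos // skus_per_aisle + 1 for pos, i in enumerate(order)}
```

Because `assign_aisles` uses a stable sort, SKUs keep their input order within a cluster. Aisle 1 is taken to be the aisle closest to the door.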

Table 11: Partial Cluster Assignments
SKU | Aisle # | Cluster #
SKU1 | 1 | 1
SKU2 | 1 | 1
SKU3 | 1 | 1
SKU4 | 1 | 1
SKU5 | 1 | 1
SKU6 | 1 | 1
SKU7 | 1 | 1
SKU8 | 1 | 2
SKU9 | 1 | 2

As can be seen, the Aisle # column gives the final aisle assignment for each SKU in the SKU column, and the Cluster # column gives the cluster determined by the clustering procedure.

Computing Objective Function

To evaluate the new storage assignment method, the objective function was defined as:
Minimize: number of aisles visited per order picked

To compute the number of aisles visited per order, the methodology follows the steps described below.

Selecting orders to be tested

To test the new storage assignment strategy, a subset of orders was selected from the historical sales database. Orders were selected only if they contained at least five different SKUs (line-items). For orders with fewer than five SKUs, the number of aisles the pickers have to visit is the same as in a system in which SKUs are assigned based on popularity. The improvement would therefore not be significant for such orders.

Determining the aisles to be visited for each SKU

To determine the aisle to be visited to pick each SKU, the aisle assignments made in the Storage Assignment Layout phase are used. Table 12 shows the aisle assignment for the orders selected in the previous step.

Table 12: Clusters Assigned

The columns Aisle1, Aisle2, Aisle3 and Aisle4 are computed using the information shown in Table 11.

Computing the number of visits to each aisle
To compute the number of visits to each aisle for each order, the SKUs of the order located in each aisle are counted; the consolidated counts are shown in Table 13.

Computing the number of aisles to be visited per order
This step computes the number of different aisles visited for each order retrieved, that is, the number of aisles with at least one visit. Table 13 shows the result.

Computing total number of aisles visited
The final step is to add up the number of aisles visited over all orders retrieved. See Table 13 for the results. This is the final value used to compare the different scenarios in Chapter 4.
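Written compactly, and assuming as in Chapter 4 that each aisle is visited at most once per order, the value produced by these steps can be expressed as follows. The symbols are notation introduced here for illustration only: $O$ is the set of tested orders, $S_o$ the SKUs of order $o$, and $a(s)$ the aisle assigned to SKU $s$.

```latex
Z \;=\; \sum_{o \in O} \bigl|\{\, a(s) : s \in S_o \,\}\bigr|
```

$Z$ corresponds to the total in Table 13, and a lower $Z$ is better. Dividing $Z$ by $|O|$ gives the average number of aisles per order, so for a fixed set of test orders, minimizing the total and minimizing the per-order average are equivalent.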

Table 13: Complete Process for computing No. of aisles visited
Column groups: List of orders to be Tested (Order, SKU1 to SKU4) | Aisle to be Visited for Each SKU | # of Visits to Each Aisle (Aisle1, Aisle2, Aisle3, …) | # Aisles Visited
Total # Aisles Visited: 17
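The order-selection and counting steps can be sketched in Python as follows. This is an illustrative sketch under the stated assumptions: `min_skus=5` mirrors the "at least five different SKUs" rule, each aisle counts once per order (single-trip picking), and SKUs without an aisle assignment are skipped, which matches the zero totals noted under Table 16 for items outside the forward area. The function names and the toy orders are hypothetical.

```python
from typing import Dict, Iterable, List


def select_test_orders(orders: Iterable[List[str]], min_skus: int = 5) -> List[List[str]]:
    """Keep orders with at least `min_skus` different SKUs (line items)."""
    return [order for order in orders if len(set(order)) >= min_skus]


def aisles_visited(order: List[str], sku_aisle: Dict[str, int]) -> int:
    """Number of different aisles an order touches (each aisle counted once).

    SKUs with no aisle in `sku_aisle` (outside the forward area) are skipped,
    so an order made only of such SKUs contributes 0.
    """
    return len({sku_aisle[sku] for sku in order if sku in sku_aisle})


def total_aisles_visited(orders: Iterable[List[str]], sku_aisle: Dict[str, int]) -> int:
    """Objective value: aisles visited, summed over all tested orders."""
    return sum(aisles_visited(order, sku_aisle) for order in orders)


sku_aisle = {"A": 1, "B": 1, "C": 2, "D": 3, "E": 3, "F": 2}
orders = [["A", "B", "C", "D", "E"], ["A", "B"], ["A", "C", "F", "X", "Y"]]
tested = select_test_orders(orders)
print([aisles_visited(o, sku_aisle) for o in tested], total_aisles_visited(tested, sku_aisle))
```

The two-line order is dropped by the filter. The first remaining order touches aisles 1, 2 and 3, and the second touches only aisles 1 and 2 because X and Y have no forward-area aisle.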

CHAPTER 4: TESTING AND RESULTS
This chapter discusses the process used to evaluate the model described in Chapter 3. It is divided into two sections: the first discusses the procedure used to test the model, and the second discusses the results obtained from testing and compares them with traditional storage assignment policies.

4.1 Testing Method
To test the methodology, historical datasets from two real warehouses were obtained. The first dataset (Dataset1) comes from a warehouse in the supply chain of a company in the office supply market, while the second (Dataset2) belongs to a company in the retail industry. The following assumptions are made: (1) the picker can retrieve all the SKUs of an order in a single trip, so each aisle is visited at most once per trip; (2) the picker has unlimited capacity; (3) minimizing the number of aisles visited to retrieve the SKUs of an order minimizes the total picking time for that order. The model is also limited in that it does not manage the exact location of an SKU within the warehouse; to simplify calculations, each aisle is treated as a single location. The testing process followed the methodology described in Chapter 3 to determine clusters and assign SKUs to aisles in the storage area. Dataset1 contains the tables Items and Sales for warehouse transactions. Table 14 lists the most important features of both tables from Dataset1 used as input to the model.

Table 14: Features of Input Tables, DataSet1
Table: Item | Value | Table: Sales | Value
# records | 27,713 | # records | 2,184,153 records
# valid SKUs | 27,713 | # orders | 845,375 orders

Data Reduction
The data reduction process, which reduces the Sales table to a set of pairs of SKUs that are ordered together, was applied to Dataset1. The results are saved in the Pairs table, which contains 7,139,075 different pairs of SKUs (SKU1, SKU2) for Dataset1. Pairs of SKUs (SKU1, SKU2) with zero occurrences were eliminated.

Data Pre-Processing
The first step of the data pre-processing phase consists of filtering all pairs from Dataset1 (table Pairs) based on the number of times that the pair (SKU1, SKU2) appears together in the database. Two scenarios were chosen to identify how the frequency of pairs in the database influences the outcome of the Data Mining method. This analysis is relevant because the greater the number of pairs, the longer the method takes to run. The first scenario considers the pairs that appear together more than 200 times (vij > 200), while the second includes the pairs that appear together more than 300 times (vij > 300). The second step of the data pre-processing uses the table generated in Step 1 (Data Reduction) to generate a matrix with all possible combinations of the relevant SKUs and then obtains the number of times these pairs appear together in the Sales table

from Dataset1. The results of this task are shown in Table 15. As can be seen, the number of pairs decreased from 7,139,075 to 7,734 and 2,628, respectively. Pairs of SKUs (SKU1, SKU2) with zero occurrences were eliminated.

Table 15: Valid Pairs for DataSet1
Table: Pairs > 200 | Value | Table: Pairs > 300 | Value
vij > 200 | 7,734 pairs | vij > 300 | 2,628 pairs

Clustering Historical Orders
To allocate the SKUs to clusters, the six-step algorithm defined in Chapter 3 was used. The following variables were considered in this analysis:
(1) Pairs of SKUs (SKU1, SKU2) that appear together in sales orders more than 200 or 300 times. The objective was to test how the number of SKUs influences the outcome of the algorithm. Two scenarios, vij > 200 and vij > 300, were tested. Since vij > 200 contains more SKUs than vij > 300, the threshold on vij can be used to restrict the number of SKUs stored in the forward area of the warehouse.
(2) Five different scenarios containing 2, 3, 5, 7, and 11 clusters were studied, treating the number of clusters as a variable. A cluster is a group of SKUs that share some similarity; therefore, the goal was to find out how the internal grouping of SKUs affects the final outcome of the methodology.

(3) Two different methods for the initial assignment of clusters were analyzed. As defined in Step 1 of the six-step algorithm, the SKUs are initially assigned to clusters, and the iterative process is then run until the assignments are stable (SKUs no longer migrate, or migrate minimally, to other clusters). The two initial assignment methods tested were:
Demand-based: SKUs are assigned based on popularity. The most popular SKUs are assigned to Cluster 1, the next most popular to Cluster 2, and so on until all SKUs are assigned.
Random: SKUs are assigned randomly to the initial clusters. This method is commonly used to improve the utilization of storage areas in the warehouse because the SKUs can be stored in any existing area.

Storage Assignment Layout
Using the cluster assignment from the previous step, the SKUs were assigned to the aisles in the warehouse and compared with Demand-based aisle assignment, which assigns SKUs to aisles based on popularity. A warehouse with 9 aisles was considered, and the SKUs were distributed uniformly across the aisles. The assignment of locations within an aisle was not considered because of the assumption that the picker travels the complete aisle to retrieve all the SKUs stored in it.
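The Data Reduction and Data Pre-Processing steps described earlier in this section can be sketched in memory as follows. This is a simplified illustration: the thesis performs these steps on database tables (Sales, Pairs), whereas here each order is assumed to be a plain list of SKU codes. `normalize` follows Step 2 of the Data Mining method listed in Table 22 (divide each value by vmax). The second pre-processing step, which rebuilds the full matrix of relevant SKUs, is not reproduced. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def count_pairs(orders: Iterable[List[str]]) -> Counter:
    """Data Reduction: v_ij = number of orders that contain both SKU i and SKU j.

    Pairs that never appear together are never stored, which matches dropping
    pairs with zero occurrences.
    """
    counts: Counter = Counter()
    for order in orders:
        # De-duplicate and sort so (i, j) and (j, i) share one key.
        for pair in combinations(sorted(set(order)), 2):
            counts[pair] += 1
    return counts


def filter_pairs(counts: Counter, threshold: int) -> Dict[Pair, int]:
    """Pre-processing: keep only pairs with v_ij > threshold (e.g. 200 or 300)."""
    return {pair: v for pair, v in counts.items() if v > threshold}


def normalize(pairs: Dict[Pair, int]) -> Dict[Pair, float]:
    """Divide every v_ij by the largest value v_max, giving values in (0, 1]."""
    if not pairs:
        return {}
    v_max = max(pairs.values())
    return {pair: v / v_max for pair, v in pairs.items()}


orders = [["A", "B", "C"], ["A", "B"], ["B", "A", "A"], ["C", "D"]]
counts = count_pairs(orders)
print(normalize(filter_pairs(counts, threshold=0)))
```

Sorting each de-duplicated order before forming combinations keeps one key per unordered pair, so a repeated line item (the duplicate "A" in the third order) does not inflate the count.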

Computing Objective Function
Using the aisle assignment from the previous step, the total number of aisles visited per order was computed. To test the new storage location assignment, a set of sales orders from Dataset1 was obtained and the following steps were followed:
The orders with more than 5 different SKUs (order lines) were selected; they account for just 11% of the total orders in Dataset1. However, this method will provide greater benefits when orders with more line items are processed.
The aisles to be visited to pick each SKU in each order were obtained from the six-step clustering algorithm.
The number of times each aisle was visited for each order was computed using the SKUs of each order in the database.
Lastly, the total number of aisles to be visited per order was obtained. An example of how the values of the objective function were calculated for a sample of the orders is shown in Table 16.

Table 16: Calculation of Total Number of Aisles visited
SKU | Aisle1 | Aisle2 | Aisle3 | Aisle4 | Aisle5 | Aisle6 | Aisle7 | Aisle8 | Aisle9 | Total

As can be seen in Table 16, the column Total shows the number of aisles that the picker traveled to pick a specific order. Note that some values in the Total column are zero because the items of the tested order were not among the items considered in the forward area (133 SKUs for Dataset1).

4.2 Results
This section explains the analysis performed on the two datasets. The fundamental analysis was performed on DataSet1, and the findings were confirmed by mining DataSet2.

4.2.1 Clustering DataSet1
Table 17 shows the performance of the six-step Data Mining algorithm using the Random and Demand-based initial cluster assignments and compares both with Demand-based aisle assignment.

Table 17: Comparison of Total Number of Aisles Visited
Clusters | vij | Data Mining Demand-based | Data Mining Random-based | Demand-based | Improvement Data Mining Demand vs. Demand-based | Improvement Data Mining Random vs. Demand-based
… | n>200 | 58,750 | 59,939 | 67,… | …% | 13.20%
… | n>300 | 45,385 | 44,897 | 52,… | …% | 16.87%
… | n>200 | 59,817 | 59,705 | 67,… | …% | 13.65%
… | n>300 | 45,557 | 45,682 | 52,… | …% | 14.86%
… | n>200 | 63,422 | 60,504 | 67,… | …% | 12.14%
… | n>300 | 46,356 | 47,042 | 52,… | …% | 11.54%
… | n>200 | 61,236 | 61,328 | 67,… | …% | 10.64%
… | n>300 | 49,559 | 46,509 | 52,… | …% | 12.82%
… | n>200 | 65,171 | 63,014 | 67,… | …% | 7.68%
… | n>300 | 48,716 | 48,392 | 52,… | …% | 8.43%

As shown in Table 17, each row contains the total number of aisles visited for a given scenario using Data Mining Demand-based, Data Mining Random, and Demand-based assignment. The percentage of improvement was calculated between the Data Mining method (initial Demand-based and Random) and the Demand-based aisle assignment because the latter is a typical assignment method used in warehouses and distribution centers. Table 17 also shows two cutoff values, pairs of SKUs with vij > 200 and vij > 300, which were introduced to determine how the number of SKUs influences the outcome. The results show that the Data Mining algorithm with either Demand-based or Random initial assignment outperformed the Demand-based algorithm. The percentage of improvement ranges from 4% to 15% for Data Mining Demand-based against Demand-based and from 7% to 16% for Data Mining Random-based against Demand-based.
Using Minitab, two experiments were run to test whether these improvements were significant. The first experiment analyzed how the number of clusters, sample size and method (main effects) influenced the total number of aisles visited (response variable). The p-values for the number of clusters, sample size and method were 0.002, 0.0 and 0.0, respectively. Since all the p-values are less than 0.05, each variable has a significant effect on the total number of aisles visited. Appendix C shows the detailed Minitab results. The second experiment analyzed how the number of clusters, sample size and method (main effects) influenced the percentage of improvement (response variable). The p-values for the number of clusters, sample size and method were 0.054, … and …, respectively.

Since all the p-values are greater than 0.05, none of the variables has a significant effect on the percentage of improvement. Appendix D shows the detailed Minitab results. The variance of the percentage of improvement was also analyzed using Minitab. The two columns studied were: 1) Improvement Data Mining Demand vs. Demand-based; 2) Improvement Data Mining Random vs. Demand-based. The p-value obtained was … Since the p-value is greater than 0.05, we can conclude that there is no significant difference between the two improvement columns. Detailed results are shown in Appendix E.
For further analysis, the scenario with 5 clusters was selected to analyze two more variables: (1) the number of line items per order in the set of orders used to test the Data Mining algorithm, which helped focus the evaluation of the method on the picking routes; (2) whether or not to include the 20 most popular SKUs in the analysis. Table 18 shows the results when all popular SKUs are included in the analysis.

Table 18: Test Results including the most popular SKUs, DataSet1
LineItems | Total Aisles Visited Data Mining Demand-based | Total Aisles Visited Demand-based | Percentage Improvement

As can be seen in the table above, the Data Mining Demand-based assignment visits fewer aisles than Demand-based assignment in 33% of the data points. Figure 2 shows the results of Table 18 graphically.

Figure 2: Total Aisles Visited including Popular DataSet1 (Aisles Visited vs. LineItems)
Figure 3: LineItems vs. Improvement including Popular DataSet1 (Percentage Improvement vs. LineItems)

Figure 3 shows the percentage of improvement between Data Mining Demand-based and Demand-based assignment for all the different numbers of line items selected. As can be

seen, the improvement is positive in 4 cases, zero in 3 cases and negative in the remaining 5 cases. In addition, the Data Mining method performs best for moderate numbers of lines. For small numbers of lines, both methods only have to visit a few aisles, and for large orders they must visit almost all aisles, so it is more difficult for Data Mining Demand-based to provide a benefit. Table 19 shows the results of analyzing DataSet1 for different numbers of line items without including the 20 most popular SKUs. As shown, the trend is a positive improvement across the numbers of line items tested. Figure 4 shows a graphical comparison of the Total Aisles Visited between the Data Mining Demand-based and Demand-based aisle assignment methods. As can be seen, the Data Mining Demand-based method performed better than the Demand-based method for 92% of the line items tested.

Table 19: Test Results without including the most popular SKUs, DataSet1
Line Items | Total Aisles Visited Data Mining Demand-based | Total Aisles Visited Demand-based | Percentage Improvement

Figure 4: Total Aisles Visited without including Popular DataSet1 (Aisles Visited vs. LineItems)
Figure 5: Line Items vs. Improvement without including Popular DataSet1 (Percentage Improvement vs. LineItems)

As can be seen in Figure 5, the percentage of improvement of Data Mining Demand-based over Demand-based assignment is positive for all data points tested. Finally, Figure 6 compares the improvement of Data Mining Demand-based assignment over Demand-based with and without the popular SKUs in the analysis.

Figure 6: Improvement comparison for popular and non popular SKUs (Improvement vs. LineItems; series: Improvement with popular SKUs, Improvement without popular SKUs)

To test these findings, the variance of the percentage of improvement was analyzed using Minitab. The two columns studied were: 1) percentage improvement including popular SKUs; 2) percentage improvement without popular SKUs. The p-value obtained was … Since the p-value is lower than 0.05, we can conclude that there is a significant difference between the two improvement columns. Detailed results are shown in Appendix F.
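The comparisons of two improvement columns were run in Minitab, and Chapter 5 describes them as ANOVA. Assuming a standard one-way ANOVA, the statistic behind the reported p-values can be sketched as below. The function name and the toy numbers are illustrative, not thesis data.

```python
from statistics import mean
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    # Between-group variation: how far each column's mean is from the overall mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: spread of values around their own column's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data only: two "improvement columns" with clearly different means.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))
```

The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, df_between, df_within)` returns it, and `scipy.stats.f_oneway` runs the whole test in one call. A p-value below 0.05 is read, as in the text, as a significant difference between the columns.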

4.2.2 Clustering DataSet2
Following the same procedure as Section 4.2.1, DataSet2 was analyzed for comparison with the results obtained from DataSet1. The chosen scenario was 5 clusters and pairs of SKUs that appear together more than 200 times in the database (vij > 200). The first analysis considered all SKUs, including the 20 most popular ones. Table 20 shows the improvements obtained when the most popular SKUs were included in the analysis. As can be seen, the Data Mining algorithm shows a positive improvement for 6 of the 12 numbers of line items selected. Figure 7 shows the same data graphically.

Table 20: Test Results including the most popular SKUs, DataSet2
LineItems | Total Aisles Visited Data Mining Demand-based | Total Aisles Visited Demand-based | Percentage Improvement

Figure 7: Total Aisles Visited including Popular DataSet2 (Aisles Visited vs. LineItems)

Figure 8 shows the percentage of improvement between the Data Mining Demand-based and Demand-based algorithms. As can be seen, the Data Mining method outperformed the Demand-based assignment method in 50% of the observations. In addition, Data Mining Demand-based performed better for orders with between 24 and 34 line items, while Demand-based performed better for orders with more than 40 line items.

Figure 8: Line Items vs. Improvement including Popular DataSet2 (Percentage Improvement vs. LineItems)

The second part of the analysis of Dataset2 included all pairs of SKUs but excluded the 20 most popular SKUs. Table 21 shows that 8 of the 12 numbers of line items evaluated show a positive improvement.

Table 21: Test Results without including top popular, DataSet2
Line Items | Total Aisles Visited Data Mining Demand-based | Total Aisles Visited Demand-based | Percentage Improvement

Figure 9 compares the total aisles visited under Data Mining Demand-based and Demand-based assignment, and Figure 10 shows the percentage of improvement for each number of line items. The improvement is greater than in the scenario where the top popular SKUs were included in the analysis.

Figure 9: Total Aisles Visited without including Popular Dataset2 (Aisles Visited vs. LineItems)
Figure 10: Line Items vs. Improvement without including Popular DataSet2 (Percentage Improvement vs. LineItems)

Comparative results using the Rosenwein Distance Metric
For comparison with the proposed algorithm described in Section 3.3.4, the clustering model proposed by Rosenwein was tested to determine the effectiveness of applying his method to Dataset1. The strategy was to compute the weight of each pair of SKUs (SKU i, SKU j) using the formula defined by Rosenwein [42], discussed in Chapter 2. The Data Mining Demand-based algorithm was then used to compute the distance between each pair of SKUs, and finally the clusters were formed. Table 22 compares the two methods.

Table 22: Comparison between Rosenwein and Data Mining Demand-based
Step 1 (Data Mining method): Create a bi-dimensional table using the number of times a pair of SKUs appears together
Step 1 (Rosenwein method): Compute the weight between each pair of SKUs using Rosenwein's formula
Steps 2 to 5 (both methods):
Step 2: Divide each matrix value by the vmax number
Step 3: Assign SKUs based on popularity
Step 4: Compute the weight/distance between each pair of SKUs
Step 5: Compute the centroid and assign clusters

First, the distance between two different SKUs was calculated using Rosenwein's formula, discussed in Chapter 2. The 133 SKUs from Dataset1 were the input used to compute all possible combinations of pairs of SKUs. Then, the distance between each pair of SKUs (SKU i, SKU j) was calculated as detailed below: for each pair of SKUs (SKU i, SKU j), the values vi and vj were calculated using the sales transactions database and Rosenwein's formula from Chapter 2.

$$w_{ij} = \sum_{q=1}^{Q} \lvert v_{iq} - v_{jq} \rvert$$

where $w_{ij}$ is the distance between SKU i and SKU j; $v_{iq} = 1$ if SKU i is included in order q, and 0 otherwise; $v_{jq} = 1$ if SKU j is included in order q, and 0 otherwise; and $Q$ is the number of orders.

Next, the absolute value of the difference for each pair ($w_{ij}$) was calculated, and the final value was stored in a database. The resulting table was then the input for the Data Mining methodology starting at Step 2. The resulting clusters were then assigned to aisles 1 through 9, and, using the historical database of orders, a simulation was run to compute the total number of aisles visited. Results from the testing are shown in Table 23. As can be seen, the combined Data Mining and Rosenwein method also performed 13.4% better than Demand-based assignment, but it was roughly equivalent to the Data Mining Demand-based method described in the present thesis.

Table 23: Results of Rosenwein and Data Mining algorithms
Clusters | vij | Data Mining Demand-based | Data Mining Rosenwein | Demand-based | Improvement Data Mining Demand-based vs. Demand-based | Improvement Data Mining Rosenwein vs. Demand-based
5 | >200 | 57,871 | 58,749 | 67,… | …% | 13.4%
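Rosenwein's distance, in the form written above, can be computed directly from a list of orders. The sketch below assumes that each order is a list of SKU codes, and it uses hypothetical helper names. It gives the literal sum alongside an equivalent set-based form. Since $|v_{iq} - v_{jq}| = 1$ exactly when one SKU is in order q and the other is not, $w_{ij}$ counts the orders that contain only one of the two SKUs. A smaller $w_{ij}$ therefore means the two SKUs tend to be ordered together.

```python
from typing import Dict, List, Set


def order_index(orders: List[List[str]]) -> Dict[str, Set[int]]:
    """Map each SKU i to the set of orders q with v_iq = 1."""
    index: Dict[str, Set[int]] = {}
    for q, order in enumerate(orders):
        for sku in order:
            index.setdefault(sku, set()).add(q)
    return index


def rosenwein_distance(i: str, j: str, orders: List[List[str]]) -> int:
    """Direct form: w_ij = sum over q of |v_iq - v_jq|."""
    return sum(abs(int(i in order) - int(j in order)) for order in orders)


def rosenwein_distance_fast(i: str, j: str, index: Dict[str, Set[int]]) -> int:
    """Same value: w_ij is the size of the symmetric difference of the two
    SKUs' order sets (orders containing exactly one of them)."""
    return len(index.get(i, set()) ^ index.get(j, set()))


orders = [["A", "B"], ["A"], ["B", "C"], ["A", "B", "C"]]
index = order_index(orders)
print(rosenwein_distance("A", "B", orders), rosenwein_distance_fast("A", "C", index))
```

The set-based form visits each order once to build the index, so it is the more practical choice when many pairs, such as all combinations of the 133 SKUs, must be evaluated.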

To further study the Rosenwein method, the order of the SKUs was modified before running the Data Mining Demand-based algorithm. In the Rosenwein method, the SKUs were ordered alphabetically. However, changing the alphabetical order to either a random or a demand-based order negatively affected the total number of aisles visited by the Data Mining Demand-based method. As can be seen in Table 24, the Demand-based method outperformed the Data Mining Demand-based model in 67% of the observations.

Table 24: Results Data Mining Demand-based vs. Demand-based
Initial Order SKUs | Initial Cluster Assignment | Dataset | Total Aisles Visited Data Mining | Total Aisles Visited Demand-based | Improvement Data Mining vs. Demand-based
Random order | Demand-based | Dataset1 | 70,318 | 67,… | …%
Random order | Demand-based | Dataset2 | 115,… | … | …%
Random order | Demand-based | Rosenwein | 71,832 | 67,… | …%
Random order | Random | Dataset1 | 69,867 | 67,… | …%
Random order | Random | Dataset2 | 116,… | … | …%
Random order | Random | Rosenwein | 72,096 | 67,… | …%
Demand-based order | Demand-based | Dataset1 | 66,430 | 67,… | …%
Demand-based order | Demand-based | Dataset2 | 112,… | … | …%
Demand-based order | Demand-based | Rosenwein | 68,210 | 67,… | …%
Demand-based order | Random | Dataset1 | 66,649 | 67,… | …%
Demand-based order | Random | Dataset2 | 110,… | … | …%
Demand-based order | Random | Rosenwein | 68,135 | 67,… | …%

The results show a relationship between the Data Mining Demand-based method and the initial physical order of SKUs. As shown in Table 23, the best results are achieved when alphabetical order is used as the initial physical assignment of SKUs.

One hypothesis to explain these results is that the alphabetical order of SKUs may reflect relationships between items. For instance, when SKUs are created in an ERP system, the alphanumeric SKU notation may group items into families or categories.
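The thesis does not specify how the initial physical order enters the six-step algorithm. The sketch below illustrates one plausible mechanism and should not be read as the author's procedure. It assumes that Demand-based initial clusters are equal-sized blocks of the popularity ranking, and that SKUs with equal demand keep their position from the initial order, because Python's `sorted()` is stable. Under those assumptions, the alphabetical, random and demand-based starting orders can produce different initial clusters. All helper names are hypothetical.

```python
import random
from typing import Dict, List


def initial_order(skus: List[str], demand: Dict[str, int], mode: str, seed: int = 0) -> List[str]:
    """Hypothetical helper: the physical order of SKUs before clustering starts."""
    if mode == "alphabetical":
        return sorted(skus)
    if mode == "demand":
        return sorted(skus, key=lambda sku: -demand[sku])  # most popular first
    if mode == "random":
        shuffled = list(skus)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown mode: {mode}")


def demand_based_clusters(ordered: List[str], demand: Dict[str, int], k: int) -> Dict[str, int]:
    """Most popular block of SKUs -> Cluster 1, next block -> Cluster 2, ...

    Blocks are assumed to be equal-sized. sorted() is stable, so SKUs with equal
    demand keep their position from `ordered`; this is one way the initial
    order can change the starting clusters.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(ordered, key=lambda sku: -demand[sku])
    size = -(-len(ranked) // k)  # ceiling division: SKUs per cluster
    return {sku: position // size + 1 for position, sku in enumerate(ranked)}


def random_clusters(ordered: List[str], k: int, seed: int = 0) -> Dict[str, int]:
    """Random initial assignment: each SKU gets a cluster drawn from 1..k."""
    rng = random.Random(seed)
    return {sku: rng.randint(1, k) for sku in ordered}


skus = ["B", "A", "C", "D"]
demand = {"A": 5, "B": 5, "C": 1, "D": 9}
print(demand_based_clusters(initial_order(skus, demand, "alphabetical"), demand, k=2))
print(demand_based_clusters(skus, demand, k=2))
```

A and B have the same demand in this toy example. With the alphabetical order, A joins D in Cluster 1. With the given order, B joins D instead. The clustering iterations then start from different points, which is consistent with, though not proof of, the sensitivity reported in Table 24.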

CHAPTER 5: CONCLUSIONS
5.1 Main Findings
This thesis presents a clustering method for assigning SKUs to locations in the forward storage area using historical data from real warehouses. The method defines clusters of SKUs that are assigned to contiguous areas in the forward area of a warehouse in order to minimize the picking time needed to retrieve a customer order. The method was tested using historical order data from two different warehouses, and the results of the following methods were compared: 1) Data Mining Demand-based, which sets the initial cluster assignment based on demand and then assigns SKUs to the forward area based on the resulting clusters; 2) Data Mining Random, which sets the initial cluster assignment randomly and then assigns SKUs to the forward area based on the resulting clusters; 3) Demand-based, which assigns the SKUs to aisles in the forward area based on demand. The Data Mining assignments (Demand-based and Random) were compared against the Demand-based method. The results show the effectiveness of the Data Mining method compared with the Demand-based method. From this comparative analysis, the following conclusions about the effectiveness of the Data Mining assignment method were obtained:
First, the Data Mining Demand-based assignment showed an improvement of between 4% and 17% in the total number of aisles visited when clustering the SKUs of the most frequently ordered pairs (133 SKUs for vij > 200 and 90 SKUs for vij > 300 in Dataset1). This study tested scenarios with different numbers of clusters, including 2, 3, 5, 7 and 11.

Second, including the top 20 popular SKUs in the analysis significantly reduced the improvement of the Data Mining assignment method. The results were analyzed using ANOVA, and a p-value of … confirmed that there is a significant difference between the improvements for Dataset1 when the top 20 SKUs are and are not included. Table 18 shows that, with the top popular SKUs included, the Data Mining assignment had a positive improvement in 33% of the scenarios tested. Table 19 shows that, without the top popular SKUs, the Data Mining assignment had a positive improvement in 92% of the scenarios tested. For Dataset2, Table 20 shows that, with the top popular SKUs included, the Data Mining assignment had a positive improvement in 50% of the scenarios tested. Table 21 shows that, without the top popular SKUs, the Data Mining assignment had a positive improvement in 42% of the scenarios tested.
Third, the number of SKUs selected from DataSet1 and DataSet2 influenced the outcome of the analysis. Two scenarios were tested to determine the population of SKUs for which to make assignments. For Dataset1, Scenario #1 (vij > 200 times) contains 133 SKUs and Scenario #2 (vij > 300 times) includes 90 SKUs. The Data Mining Demand-based assignment performed between 4% and 15% better in Scenario #1 and between 6% and 17% better in Scenario #2. For Dataset2, the only scenario studied was vij > 200, which contains 132 SKUs.

5.2 Benefits of the Methodology
The implementation of the Data Mining method described in the present thesis has the following benefits for firms:
The first benefit of applying this method is that the picking distance is reduced and, consequently, company performance is improved. This thesis has demonstrated that clustering techniques, which combine centroid and distance calculations, can group similar SKUs in the same aisle, reduce the total picking distance and ultimately improve a firm's bottom line.
Another benefit of this methodology is that the final arrangement of SKUs in the forward area does not require a large investment. Although labor and material handling are needed to relocate the SKUs within the warehouse, the total cost of warehouse operation is lower.
An additional benefit is that this method can be implemented as a computer program. Hence, the algorithm can be run several times without additional investment. Thus, firms can consider additional arrangements of the forward storage area to account for changes in demand, such as the seasonality of SKUs.

5.3 Future Research
It was observed during this study that the Data Mining Demand-based method generates better results than traditional SKU assignment methods such as Random or Demand-based. However, future research should study the impact of the initial order assignment of the SKUs. As demonstrated in this thesis, results

changed significantly when the order of SKUs was modified before the first centroid assignment. This calls for further analysis of the relationships between SKUs.
Another future study should examine Rosenwein's weight further to determine whether additional benefits can be obtained. For instance, the weight calculation could be modified, or parts of the Data Mining Demand-based and Rosenwein algorithms could be combined.
Future studies could also examine how seasonality affects the outcome of this method. The arrangement of SKUs in the storage area can vary significantly over time, and companies must react quickly by reordering the SKUs in the forward area to decrease travel time. In the present study, one year of historical data from two different distribution centers was analyzed. The real benefit of considering seasonality is that stronger relationships might be found if a shorter time period is considered. For instance, two SKUs might have a strong relationship from January to June, but if the data analyzed covers January to December, the strength of the relationship might not be apparent, since six months of weak relationship are included in the analysis.
Future analysis should also study the effect of batching orders in the Data Mining method. In this research, the total number of aisles visited was calculated assuming that all SKUs of one order are retrieved at a time. Consolidating several orders before the picking process could further reduce the number of aisles visited to retrieve the SKUs of the group of orders, which could lower the total cost of warehouse operation.

REFERENCES
[1] Chen, M., Huang, C., Chen, K. and Wu, H. (2005) Aggregation of orders in distribution centers using data mining, Expert Systems with Applications, 28.
[2] Microsoft Corporation. Data Mining Concepts. Retrieved April 25, 2006, from
[3] De Koster, R., Le-Duc, T. and Roodbergen, K.J. (2006) Design and Control of Warehouse Order Picking: a literature review, ERIM Report Series Research in Management.
[4] Frazelle, E.H. (2002) World-Class Warehousing and Material Handling, McGraw-Hill.
[5] Bartholdi, J.J. and Hackman, S.T. (2007) Warehouse & Distribution Science, Available: [2008, May 15].
[6] Tompkins, J.A., White, J.A., Bozer, Y.A., Frazelle, E.H. and Tanchoco, J.M. (2003) Facilities Planning, John Wiley & Sons.
[7] Hand, D., Mannila, H. and Smyth, P. (2001) Principles of Data Mining, Cambridge, Mass.: MIT Press.
[8] Jiawei, H. and Kamber, M. (2001) Data mining: concepts and techniques, Morgan Kaufmann Publishers.
[9] Goetschalckx, M. and Ratliff, H.D. (1998) Order picking in an aisle, IIE Transactions, (20)1.
[10] Establish Inc, Herbert W. Davis & Co. (2005) Logistics cost and service 2005, in Council of Supply Chain Management Professionals Conference.
[11] Baker, P. (2006) Designing distribution centers for agile supply chains, (9)3.
[12] Merriam W., On-Line Dictionary, Available at
[13] Lambert, D.M., Stock, J.R. and Ellram, L.M. (1998) Fundamentals of Logistics Management, Boston: Irwin/McGraw-Hill.
[14] Van den Berg, J. (1999) Literature survey on planning and control warehouse system, IIE Transactions, 31.

[15] Mohsen, H. (2002) A framework for the design of warehouse layout, Facilities, (20)13/14.
[16] Larson, N., March, H. and Kusiak, A. (1996) A heuristic approach to warehouse layout with class-based storage, IIE Transactions, 29.
[17] Hwang, H.S., A performance evaluation model for order picking warehouse design, 35th International Conference on Computers and Industrial Engineering.
[18] Hsieh, L. and Tsai, L., The optimum design of a warehouse system on order picking efficiency, International Journal of Advanced Manufacturing Technology, 28.
[19] Rouwenhorst, B., Reuter, B., Stockrahm, V., van Houtum, G.J., Mantel, R.J. and Zijm, V.H.M. (2000) Warehouse design and control: Framework and literature review, European Journal of Operational Research, 122.
[20] Van den Berg, J.P. and Zijm, W.H.M. (1999) Models for warehouse management, International Journal of Production Economics, 59.
[21] Brynzer, H. and Johansson, M.I. (1996) Storage location assignment: Using the product structure to reduce order picking times, International Journal of Production Economics, (46)47.
[22] Petersen, C.G. and Aase, G.R. (2004) Improving order-picking performance through the implementation of class-based storage, International Journal of Physical Distribution & Logistics Management, (34)7.
[23] Petersen, C.G., Siu, C. and Heiser, D. (2005) Improving order-picking performance utilizing slotting and golden zone storage, International Journal of Physical Distribution & Logistics Management, (25)10.
[24] De Koster, M.B.M., Van Der Poort, E.S. and Woltrers, M. (1999) Efficient orderbatching methods in warehouses, International Journal of Production, (37)7.
[25] Elsayed, E.A., Lee, M.K. and Scherer, E. (1993) Sequencing and batching procedures for minimizing earliness and tardiness penalty of order retrievals, International Journal of Production Research, (31)3.
[26] Gademann, N., Van den Berg, J. and Van der Hoff, H. (2001) An order batching algorithm for wave picking in a parallel-aisle warehouse, IIE Transactions, 33.
[27] Gademann, N. and Van de Velde, S. (2005) Order batching to minimize the total travel time in a parallel-aisle warehouse, IIE Transactions, 37.

[28] De Koster, R. and Van Der Poort, E. (1998) Routing orderpickers in a warehouse: a comparison between optimal and heuristic solutions, IIE Transactions, 30.
[29] Roodbergen, K.J. and de Koster, R. (2001) Routing order pickers in a warehouse with a middle aisle, European Journal of Operational Research, 133.
[30] Petersen II, C.G. (1999) The impact of routing and storage policies on warehouse efficiency, International Journal of Operations and Production Management, (19)10.
[31] Roodbergen, K.J. and de Koster, R. (2001) Routing methods for warehouses with multiple cross aisles, International Journal of Production Research, (39)9.
[32] Hwang, H., Oh, Y.H. and Lee, Y.K. (2004) An evaluation of routing policies for order-picking operations in low-level picker-to-part system, International Journal of Production Research, (42)28.
[33] Goetschalckx, M. and Ratliff, H.D. (1998) Order Picking In an Aisle, IIE Transactions, (20)1.
[34] Roodbergen, K.J. and De Koster, R. (2001) Routing order pickers in a warehouse with middle aisle, European Journal of Operational Research, 133.
[35] Turban, E., Leidner, D., Mclean, E. and Wetherbe, J. (2006) Information Technology for Management, John Wiley & Sons.
[36] Witten, I. and Frank, E. (2005) Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann Publishers.
[37] Hwang, H., Baek, W. and Lee, M. (1988) Clustering algorithms for order picking in an automated storage and retrieval system, International Journal of Production Research, (26)2.
[38] Kim, B., Heragu, S., Graves, R. and Onge, A. (2003) Clustering-based orderpicking sequence algorithm for an automated warehouse, International Journal of Production Research, (41)15.
[39] Jane, C. and Laih, Y. (2005) A clustering algorithm for item assignment in a synchronized zone order picking system, European Journal of Operational Research, 166.
[40] Wu, C. (2006) Applying frequent itemset mining to identify a small itemset that satisfies a large percentage of orders in a warehouse, Computers & Operations Research, 33.

82 82 [41] Liu, C. (1999) Clustering techniques for stock location and order-picking in a distribution center, Computers & Operations Research, 26, [42] Rosenwein, M.R. (1994) An application of cluster analysis to the problem of locating items within a warehouse, IIE Transactions, (26)1, [43] Ming-Huang Chiang D., Lin C. and Chen M. (2011) The adaptative approach for storage assignment by mining data of warehouse management system for distribution centers, Enterprise Information systems, (5)2,

APPENDICES

A. DataSet1 Aisle Distribution for Pairs of SKUs greater than 200 and 5 Clusters

SKU              Data Mining  Demand
AAGE             –            –
AAGSK            –            –
ACC (10 SKUs)    –            –
BICGSM11-BE      8            4
BICGSM11-BK      8            4
BICGSM11-RD      8            6
BICGSMG11-BE     1            7
BICGSMG11-BK     1            7
CNMBCI-3EC       9            9
CNMBCI-3EM       9            9
CNMBCI-3EY       9            9
EPST (5 SKUs)    –            –
EVEE91FP12       1            7
EVEE92FP12       2            8
FRJFM20          9            8
FRJKM20          9            8
FRJSX7           9            9
FRJSX9           9            8
FRJTM20          9            8
GEPFM20          2            8
GEPKM20          2            9
GEPTM20          2            8
HEW51629A        2            6
HEW51645A        2            3
HEW51649A        2            8
HEWC1823D        2            7
HEWC6578AN       2            8
HEWC6578DN       2            5
HEWC6615DN       8            5
HEWC6625AN       8            8
HPG (2 SKUs)     –            –
LEX12A           –            –
LEX (2 SKUs)     –            –
LEX15M           –            –
LEX17G (2 SKUs)  –            –
MMM62003/4X      –            –
MMM651           2            7
MMM652           3            6
MMM6539-YW       3            4
MMM653-YW        3            6
MMM6549-YW       3            1
MMM654-YW        3            1
MMM6559-YW       3            1
MMM655-YW        3            3
MMM6569-YW       3            3
MMM (5 SKUs)     –            –
MMM810-3/4X      –            –
MMMC38-BK        3            4
NAT (2 SKUs)     –            –
OIC (2 SKUs)     –            –
PAP (14 SKUs)    –            –
PIL (2 SKUs)     –            –
SAN (3 SKUs)     –            –
SHA (3 SKUs)     –            –
SMC (2 SKUs)     –            –
SPR (40 SKUs)    –            –
SPRHB210         7            2
SPRSP111-1/3     7            2
SPRSP411-1/3     7            5
SPRSP52-1/3      7            4
SPRSP52-1/5      7            4
SPRW (2 SKUs)    –            –
SUG (2 SKUs)     –            –
SWI              –            –
TOM (2 SKUs)     –            –
WLJ (3 SKUs)     –            –

DataSet1 aisle layout, Aisle1 to Aisle9 (DM = Data Mining aisle, DE = Demand aisle):
Aisle1: AAGE, AAGSK, ACC (9 SKUs), BICGSMG11-BE (DM 1, DE 7), BICGSMG11-BK (DM 1, DE 7), EPST (3 SKUs), EVEE91FP
Aisle2: EVEE92FP, GEPFM, GEPKM, GEPTM, HEW51629A (DM 2, DE 6), HEW51645A (DM 2, DE 3), HEW51649A (DM 2, DE 8), HEWC1823D (DM 2, DE 7), HEWC6578AN (DM 2, DE 8), HEWC6578DN (DM 2, DE 5), HPG (2 SKUs), LEX (2 SKUs), LEX15M, MMM62003/4X, MMM
Aisle3: MMM, MMM6539-YW (DM 3, DE 4), MMM653-YW (DM 3, DE 6), MMM6549-YW (DM 3, DE 1), MMM654-YW (DM 3, DE 1), MMM6559-YW (DM 3, DE 1), MMM655-YW (DM 3, DE 3), MMM6569-YW (DM 3, DE 3), MMM (5 SKUs), MMMC38-BK (DM 3, DE 4), NAT (2 SKUs), OIC
Aisle4: OIC, PAP (14 SKUs), PIL (2 SKUs)
Aisle5: SAN (3 SKUs), SHA (3 SKUs), SMC (2 SKUs), SPR (9 SKUs)
Aisle6: SPR (17 SKUs)
Aisle7: SPR (12 SKUs), SPRHB, SPRSP111-1/3 (DM 7, DE 2), SPRSP411-1/3 (DM 7, DE 5), SPRSP52-1/3 (DM 7, DE 4), SPRSP52-1/5 (DM 7, DE 4)
Aisle8: ACC, BICGSM11-BE (DM 8, DE 4), BICGSM11-BK (DM 8, DE 4), BICGSM11-RD (DM 8, DE 6), EPST (2 SKUs), HEWC6615DN (DM 8, DE 5), HEWC6625AN (DM 8, DE 8), LEX12A, LEX17G, SPRW (2 SKUs), SUG (2 SKUs), SWI, TOM (2 SKUs)
Aisle9: CNMBCI-3EC (DM 9, DE 9), CNMBCI-3EM (DM 9, DE 9), CNMBCI-3EY (DM 9, DE 9), FRJFM, FRJKM, FRJSX7 (DM 9, DE 9), FRJSX9 (DM 9, DE 8), FRJTM, LEX17G, MMM810-3/4X, SPR (2 SKUs), WLJ (3 SKUs)
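The Python sketch below shows what the two columns mean for a picker. Appendix C analyzes total aisles visited, so the sketch counts the distinct aisles a picker must enter for one order under the Data Mining (DM) and Demand (DE) assignments. It uses only SKUs whose two aisle numbers are both legible in the DataSet1 table. The five-SKU order and the helper names (`AISLES`, `aisles_visited`) are hypothetical illustrations, not taken from the thesis data or code.

```python
# (Data Mining aisle, Demand aisle) for SKUs whose two aisle numbers are
# both legible in the Appendix A, DataSet1 table.
AISLES = {
    "BICGSM11-BE": (8, 4),
    "BICGSM11-BK": (8, 4),
    "BICGSM11-RD": (8, 6),
    "HEWC6615DN": (8, 5),
    "HEWC6625AN": (8, 8),
    "MMM6539-YW": (3, 4),
    "MMM653-YW": (3, 6),
}
METHOD_INDEX = {"DM": 0, "DE": 1}  # DM = Data Mining, DE = Demand


def aisles_visited(order, method):
    """Distinct aisles a picker must enter to fill `order` under `method`."""
    idx = METHOD_INDEX[method]
    return {AISLES[sku][idx] for sku in order}


# Hypothetical order used only for illustration; not from the thesis data.
order = ["BICGSM11-BE", "BICGSM11-BK", "BICGSM11-RD", "HEWC6615DN", "HEWC6625AN"]
for method in ("DM", "DE"):
    aisles = aisles_visited(order, method)
    print(method, sorted(aisles), len(aisles))
```

For this particular order, the Data Mining assignment keeps every line in aisle 8, while the Demand assignment spreads the same lines over aisles 4, 5, 6 and 8. Summing the set sizes over every historical order would give a total-aisles-visited figure similar in spirit to the response analyzed in Appendix C.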

B. DataSet2 Aisle Distribution for Pairs of SKUs greater than 200 and 5 Clusters

SKU              Data Mining  Demand
S (21 SKUs)      –            –
37X61MA          7            4
37X62MA          7            5
37X66MA          3            7
37X87MA          4            7
37X88MA          7            4
37X89MA          4            –
S                –            6
S (2 SKUs)       –            –
R                –            –
S (4 SKUs)       –            –
D (2 SKUs)       –            –
D                –            2
D (2 SKUs)       –            –
S                –            –

DataSet2 aisle layout, Aisle1 to Aisle9 (DM = Data Mining aisle, DE = Demand aisle):
S S X87MA S S S S X89MA S X61MA S S S S S S X62MA S S S D X88MA R S D S D S S D S D S S S S S S S X66MA S

C. Minitab Analysis of Total Aisles Visited

General Linear Model: Resultxx versus Clusters, vij, Method

Factor     Type   Levels  Values
Clusters   fixed  5       2, 3, 5, 7, 11
vij        fixed  2       >200, >300
Method     fixed  3       DB, DM-DB, DM-RB

Analysis of Variance for Resultxx, using Adjusted SS for Tests
Source: Clusters, vij, Method, Clusters*vij, Clusters*Method, vij*Method, Error, Total
R-Sq = 99.72%   R-Sq(adj) = 98.97%

Unusual Observations for Resultxx: R denotes an observation with a large standardized residual.
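The reported R-Sq(adj) can be cross-checked against R-Sq under one assumption that the Minitab output does not state: the design has exactly one observation per combination of factor levels (5 × 2 × 3 = 30 runs). The model terms are the three main effects and the three two-way interactions listed above, with no three-way term. The sketch below derives each term's degrees of freedom and then applies the usual adjustment R-Sq(adj) = 1 − (1 − R-Sq) × DF_total / DF_error. The function names are illustrative.

```python
from itertools import combinations
from math import prod


def glm_degrees_of_freedom(levels, interactions):
    """Degrees of freedom per model term, plus error and total DF.

    levels: dict mapping each fixed factor to its number of levels.
    interactions: (factor_a, factor_b) pairs included as two-way terms.
    Assumption: exactly one observation per cell of the full factorial.
    """
    dof = {name: k - 1 for name, k in levels.items()}
    for a, b in interactions:
        dof[f"{a}*{b}"] = (levels[a] - 1) * (levels[b] - 1)
    n_obs = prod(levels.values())       # one run per factor combination
    total_df = n_obs - 1
    error_df = total_df - sum(dof.values())
    return dof, error_df, total_df


def adjusted_r_squared(r_sq, total_df, error_df):
    """R-Sq(adj) = 1 - (1 - R-Sq) * DF_total / DF_error."""
    return 1 - (1 - r_sq) * total_df / error_df


# Factors and levels from the Appendix C output.
levels = {"Clusters": 5, "vij": 2, "Method": 3}
pairs = list(combinations(levels, 2))   # Clusters*vij, Clusters*Method, vij*Method
dof, error_df, total_df = glm_degrees_of_freedom(levels, pairs)
print(dof, "Error:", error_df, "Total:", total_df)
print(f"R-Sq(adj) from R-Sq = 99.72%: {adjusted_r_squared(0.9972, total_df, error_df):.2%}")
```

Under this assumption, the model terms use 21 degrees of freedom, leaving 8 for error out of 29 in total. The adjusted value lands within about 0.02 percentage points of the reported 98.97%, a gap explained by R-Sq being rounded to two decimals. The agreement supports the one-run-per-cell reading but does not prove it.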

D. Minitab Analysis of Percentage Improvement

General Linear Model: Improvements versus Clusters, vij, Method

Factor     Type   Levels  Values
Clusters   fixed  5       2, 3, 5, 7, 11
vij        fixed  2       >200, >300
Method     fixed  2       DMDB-DB, DMRB-DB

Analysis of Variance for Improvements, using Adjusted SS for Tests
Source: Clusters, vij, Method, Clusters*vij, Clusters*Method, vij*Method, Error, Total
R-Sq = 88.17%   R-Sq(adj) = 43.83%
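This model's large gap between R-Sq and R-Sq(adj) can be explained by its degrees of freedom. The calculation assumes one observation per factor combination, which gives 5 × 2 × 2 = 20 runs. That assumption is not stated in the output, although it agrees with the 19 total degrees of freedom reported for the same two improvement measures in Appendix E. With Method at two levels, the model terms use 4 + 1 + 1 + 4 + 4 + 1 = 15 degrees of freedom, leaving 4 for error:

$$
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{\mathrm{DF}_{\text{error}}}
= 1 - (1 - 0.8817)\,\frac{19}{4} \approx 0.438
$$

This matches the reported 43.83% to within the rounding of R-Sq. With fifteen model terms estimated from twenty runs, the steep penalty from 88.17% to 43.83% is expected.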

E. Minitab ANOVA of Improvement DM vs DMDB and DMRB

One-way ANOVA: DMDB-DB, DMRB-DB

Source  DF  SS       MS       F     P
Factor   1  0.00090  0.00090  0.68  0.421
Error   18  0.02399  0.00133
Total   19  0.02490

R-Sq = 3.63%   R-Sq(adj) = 0.00%
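The P column follows from F and its two degrees of freedom through the upper tail of the F distribution, P(F > f) = I_x(d2/2, d1/2) with x = d2 / (d2 + d1·f), where I is the regularized incomplete beta function. The standard-library sketch below evaluates I with the continued-fraction method popularized by Numerical Recipes. It is an independent check, not Minitab's routine. With SciPy available, `scipy.stats.f.sf(f, d1, d2)` computes the same tail probability. All function names here are illustrative.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        num = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where it converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_test_p_value(f, df_num, df_den):
    """Upper-tail probability P(F > f) for an F(df_num, df_den) variable."""
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, df_den / (df_den + df_num * f))


# Appendix E: Factor SS = 0.00090 on 1 DF, Error SS = 0.02399 on 18 DF.
ms_factor = 0.00090 / 1
ms_error = 0.02399 / 18
f_stat = ms_factor / ms_error
print(f"F = {f_stat:.2f}, P = {f_test_p_value(f_stat, 1, 18):.3f}")
```

A P value near 0.42 means the DMDB-DB and DMRB-DB improvements are not distinguishable at any usual significance level, which is consistent with the near-zero R-Sq.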

F. Minitab ANOVA of Improvement Popular vs. Non-Popular

One-way ANOVA: Improvement_POPULAR, Improvement_NON_POPULAR

Source  DF  SS       MS       F     P
Factor   1  0.07382  0.07382  8.12  0.009
Error   22  0.19996

R-Sq = 26.96%   R-Sq(adj) = 23.64%
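The summary statistics of this one-way ANOVA follow directly from its SS and DF columns. The sketch below recomputes the error mean square, F, R-Sq and R-Sq(adj) from the Factor and Error rows. It takes the total sum of squares and degrees of freedom as the sums of those two rows, because the Total row is not shown in the output. Only the standard one-way ANOVA formulas are used, and the function name is illustrative.

```python
def one_way_summary(ss_factor, df_factor, ss_error, df_error):
    """Recompute one-way ANOVA summary statistics from the SS and DF columns."""
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    ss_total = ss_factor + ss_error
    df_total = df_factor + df_error
    return {
        "MS_error": ms_error,
        "F": ms_factor / ms_error,
        "R-Sq": ss_factor / ss_total,
        # R-Sq(adj) compares mean squares rather than sums of squares.
        "R-Sq(adj)": 1 - ms_error / (ss_total / df_total),
    }


# Appendix F: Popular vs. Non-Popular improvement.
summary = one_way_summary(0.07382, 1, 0.19996, 22)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The recomputed F (about 8.12), R-Sq (about 26.96%) and R-Sq(adj) (about 23.64%) agree with the Minitab output. Because R-Sq(adj) depends on the total degrees of freedom, the match is also consistent with 23 total degrees of freedom, that is, 24 observations in this comparison.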
