On the Use of Optimization for Data Mining: Theoretical Interactions and eCRM Opportunities


Balaji Padmanabhan
The Wharton School, University of Pennsylvania

Alexander Tuzhilin
Stern School of Business, New York University

Online Appendix A: How Data Mining Can Help Optimization

In domains with a lot of data, optimization approaches can, but often do not, leverage the data in the best possible manner. For example, Caixeta-Filho et al. (2002) describe optimizing planting schedules of bulbs to maximize profits for a flower grower in Brazil. The parameters of this optimization problem include sales forecasts, which are assumed to be generated by the producer when, in fact, historical flower sales data are available. Similarly, past data on consumer purchases and shopping patterns can be used to constrain the search when optimizing the selection of products to recommend to a customer. In this section we study how DM can be useful for formulating and solving optimization problems in domains where large volumes of data are available.

An optimization problem usually consists of specifications of (1) the space over which the optimization is done; (2) the objective function; and (3) the constraints imposed on the space. In this section, we describe how DM can help in each of these steps.

How Data Mining Can be Used for the Specification of the Optimization Space

DM can help in the specification of the space by helping create new variables or select subsets of existing variables (or features) over which the optimization is done. A good example of this combination is the work of Brijs et al. (1999), which addresses the problem of selecting the ideal product assortment for a retail store. Brijs et al. (1999) propose a method that uses association rule discovery methods in DM to discover frequently purchased sets of products and then formulates an integer program to pick the sets (and hence indirectly the products) that maximize profits. Hence DM is first applied to the sales transactions to find all itemsets, such as {muffins, coffee}, that occur above a chosen threshold number of times. The heuristic here is that optimizing product selection over these itemsets can capture cross-selling opportunities.
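The itemset discovery step described above can be illustrated with a small sketch. It is not the algorithm used by Brijs et al. (1999); it simply enumerates all item combinations up to a fixed size and keeps those that meet a support threshold. The function name `frequent_itemsets`, the parameters `min_count` and `max_size`, the inclusive treatment of the threshold, and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Return {itemset: count} for itemsets found in at least min_count transactions.

    Brute-force enumeration, suitable only for tiny examples; the inclusive
    threshold (count >= min_count) is an assumption of this sketch.
    """
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


# Hypothetical sales transactions (sets of purchased products).
transactions = [
    {"muffins", "coffee"},
    {"muffins", "coffee", "juice"},
    {"coffee", "juice"},
    {"muffins", "coffee"},
    {"tea"},
]

frequent = frequent_itemsets(transactions, min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this toy data only {coffee}, {muffins}, and {muffins, coffee} survive the threshold, so a downstream optimization would choose among three candidate sets rather than among all subsets of the four products.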

More specifically, let i = 1, ..., n index the n products. A sales transaction j consists of the set of products purchased, $T_j$, and the number of units $f_{ij}$ of each product i purchased in the transaction. The margin generated by transaction j is computed as

$$m_j = \sum_{i \in T_j} (SP_i - PP_i) \, f_{ij}$$

where $SP_i$ and $PP_i$ are the selling and purchase prices of product i. Given a set of products X, the gross margin for X is

$$M(X) = \sum_{j=1}^{D} M_j$$

where $M_j = m_j$ if $X = T_j$ and 0 otherwise, and D is the number of transactions. Observe that this definition only adds margins from transactions that are exactly the same as X. Given constraints that require some products to be part of the assortment and limits on the total number of products, we can formulate an optimization problem over the frequent itemsets discovered by DM as:

$$\max Z = \sum_{p=1}^{\#\text{frequent sets}} M(X_p) \, q_p \; - \sum_{i=1}^{\#\text{products}} c_i \, r_i$$

where $c_i$ represents the inventory handling cost of product i, $q_p$ and $r_i$ are binary, and $r_i$ is 1 if product i belongs to some selected itemset $X_n$ (that is, one with $q_n = 1$). In this example, the space was reduced from all possible combinations of products to a smaller subset of itemsets generated by a DM method.

It is interesting to note that just as a retailer has space restrictions, in eCRM users have cognitive restrictions that require limiting the content presented in a page. Therefore, the problem of selecting an optimal set of links to place on a Web page is similar to the retail problem discussed above and may be formulated in a similar manner.

DM can also be used to reduce the search space by simplifying an otherwise intractable optimization problem. Clustering techniques in DM are methods to group similar data records, and several clustering techniques proposed in the DM literature deal with categorical and numeric attributes. For example, consider a direct marketing application of mailing catalogs to customers developed by Campbell et al. (2001). The problem addressed by Campbell et al. (2001) is to determine the optimal mail stream (sequence of catalogs) that each customer receives. The specific application consisted of 7 million customers and 40 catalogs, so the individual-level problem has 280 million decision variables (7 million × 40), even without considering the order in which the mailings need to be done. Clustering the customers into similar groups, based on expected profitability and other customer characteristics obtained from the historical data, reduced the 7 million points to 2,000 clusters, which corresponds to only 80,000 decision variables (2,000 × 40).

How Data Mining Can be Used for the Specification of Objective Function

Jan de Wit Co. is Brazil's largest lily farmer. Each year the company plants over 3.5 million bulbs of over 50 varieties of lilies in a greenhouse with a fixed capacity of 20,000 square meters. The company must determine how many and what type of bulbs to plant in the greenhouse each week in order to maximize its profits over the year. An LP solution to this problem is presented in Caixeta-Filho et al. (2002). However, specification of the objective function requires revenue estimates for various types of flowers during the year, and it is assumed that these forecasts are provided by the producer. More generally, the objective function usually has some parameters that need to be estimated. In domains where prior data can be used to build predictive models for these parameters, DM can be useful for generating accurate estimates. For Jan de Wit, historical sales data can be used to build predictive models of sales using DM.

How Data Mining Can be Used for the Specification of Constraints

Consider the problem of placing advertisements on Web pages, as done by DoubleClick. This can be formulated as an optimization problem, as pointed out by Geoffrion and Krishnan (2001). Many of the constraints for this problem concern which types of advertisements to show to whom.

Determining constraints for this task is a multi-dimensional non-linear problem for DoubleClick. More generally, content management in eCRM is an important optimization problem that determines the best content to deliver to specific customers (e.g., which three wines are best to show to a particular customer at wine.com). Although this is an optimization problem, leading CRM vendors do not formulate it as such and instead use ad hoc heuristics to determine what content to show. However, they still use constraints in finding solutions. For example, Broadvision lets domain experts explicitly specify the types of content that should be delivered to certain types of users (e.g., users from .com domains should not be shown advertisements with audio during the day). In contrast, Blue Martini learns these constraints from data using DM. Before the discovered constraints are used to guide content delivery, they are shown to experts, who decide whether to use them. The applications of Web site design and recommender systems discussed in Section 2.3 give additional examples of using DM to learn constraints for eCRM problems.

In general, any DM technique that discovers patterns may help in choosing constraints, since the discovered patterns can be examined by domain experts to decide whether it makes sense to treat them as constraints. A standard approach would be to use decision tree or association rule induction techniques to discover rules from data and then examine the discovered rules manually. However, discovered rules are probabilistic, while constraints in an optimization procedure are usually deterministic. Hence it may be useful to have DM techniques focus on the few strongest rules that have few exceptions, and let the domain expert decide whether to use them as constraints.

An alternative problem that has close parallels with discovering constraints is the problem of learning functional dependencies (FDs) from data.
An FD states that the value of a variable is uniquely determined by the values of some other variable(s). For example, the SKU uniquely determines the product name, or the zip code uniquely determines the city name. Huhtala et al. (1999) describe an algorithm that learns FDs and approximate FDs from data, where an approximate FD is one that can have a few exceptions in the data (e.g., gender is approximately determined by the first name). Rules discovered by the approaches listed above may be especially useful to consider as constraints given their high strength.

Data mining can also be used to modify existing constraints. For example, starting from user-specified rules, Padmanabhan and Tuzhilin (1998) show that DM can be used to discover contradictions to the rules. These contradictions help refine previously known rules and, therefore, specify more elaborate constraints. There is also a large body of work in DM (see SIGKDD Explorations (2002) for a recent collection) that discovers patterns consistent with user-specified constraints. These methods can also be used to discover rules that help refine user-specified constraints based on data.

Similar to the discussion above, there are optimization problems in which the constraints have parameters that need to be estimated. For example, Sery et al. (2001) describe an optimization-based solution to BASF's problem of minimizing the distribution costs of its goods and improving customer service. In the mid 1990s BASF's North American operations shipped 1.6 billion pounds of goods to customers from a network of 135 locations, and needed to minimize distribution costs while satisfying demand and meeting the required product delivery service times. A constraint in this formulation is that the demand at each customer location is fully met. The actual value of the demand at each location is a parameter in the constraint and needs to be estimated before the optimal solution can be found. Predictive DM methods can be used to estimate these parameters, as mentioned previously.

There are several applications where optimization and DM approaches are used together for solving practical business problems.
These applications are often characterized by the objective of finding an optimal solution to a problem that also involves large volumes of data. This section previously reviewed some examples of such applications, such as the Brazilian lily farmer Jan de Wit and the Promocast system. In addition to eCRM, other applications satisfying this property include fraud detection (Chan et al. 1999, Fawcett and Provost 1997, Lee et al. 2000), focusing on the identification of the most likely or highest-payoff fraudulent activities; inventory planning and shelf space allocation in supermarkets (Corstjens and Doyle 1981, Yang 2001, Zufryden 1986), focusing on optimal sets of products to stock and sell; and direct marketing applications, focusing on selecting the optimal set of customers to whom to target a product. These are a rich set of domains where the interactions between DM and optimization can yield significant benefits, but they have not been explored in sufficient detail and remain interesting opportunities for research.

Online Appendix B: Outline of an Optimization Approach to Web Site Design

Opportunities exist for framing Web site design problems as optimization problems. For example, an objective of Web site design can be to maximize navigational simplicity by minimizing the average number of clicks required to get to any page. This problem can be formulated in two separate ways: first, minimize the average number of clicks for the entire population; second, do this for each user separately (as, for example, can be done for my.yahoo pages). In particular, the single-user problem can be formulated as a combination of optimization and DM problems.

The first part is a graph optimization problem that can be formulated as follows. Assume there exist n Web pages $p_1, p_2, \ldots, p_n$ that need to be organized into a Web site by connecting them through links, in order to minimize the expected number of clicks to get to any arbitrary page from a pre-determined home page, $p_h$, for a particular user. In an unconstrained version, this optimization problem is straightforward: all pages can be connected to the home page. In practice, site design has several constraints specifying how pages should or cannot be linked together, such as (1) site designer constraints limiting the maximum number of links that can be placed within any page; (2) parent-child constraints, e.g., a department home page should be a child of the school home page; and (3) sibling constraints: if you have a link to a given page a, you should also have a link to a page b.

Let $i_p$ be the importance (or weight) of page p and let $path_i(p_h, p)$ be a path taken in the graph from the home page to page p. Let $paths(p_h, p)$ be the set of all paths $path_i(p_h, p)$. Let $prob(path_i(p_h, p))$ be the probability of getting to page p from the home page $p_h$ using that path, and let $|path_i(p_h, p)|$ denote the length of the path. The expected length of the path to p from the home page, $E(|path(p_h, p)|)$, can be computed from $prob(path_i(p_h, p))$ taken over all possible paths from $p_h$ to p in a given graph, that is, as $\sum_i prob(path_i(p_h, p)) \cdot |path_i(p_h, p)|$. The objective function can then be defined as: across all possible graphs of n pages, minimize

$$\sum_{p} i_p \cdot E(|path(p_h, p)|)$$

subject to constraints of the types listed above. There are two caveats to this problem. The first is that computing the probabilities $prob(path_i(p_h, p))$ is not easy. The second is that the presence of cycles in the graph needs to be handled appropriately. These caveats constitute some opportunities for future research in this area.

A problem with the above approach is that it is hard to provide user-specific constraints a priori (in many cases we may not even know these constraints before looking at the data). DM can help to address this problem. First, the Web site is structured according to the solution obtained for the optimization problem. As the user navigates this site, data is gathered on the user's access patterns. DM can then be used to identify patterns that can serve as additional constraints.
For example, DM can find that most users access the finance and sports pages together, and this can then be used as a sibling constraint. The optimization problem is then solved again incorporating the additional constraints, and this process can be repeated until convergence.
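One pass of this mine-then-reoptimize loop can be sketched as follows. The sketch makes several simplifying assumptions that are not in the formulation above: the user is assumed to always follow a shortest path, so the expected path length reduces to the breadth-first distance from the home page; only two hand-written candidate designs are compared instead of searching over all graphs; and sibling constraints are mined as page pairs that co-occur in at least a chosen share of sessions. All function names, page names, weights, and sessions are hypothetical.

```python
from collections import deque
from itertools import combinations


def clicks_from_home(links, home):
    """Breadth-first distance (number of clicks) from home to each reachable page."""
    dist = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def objective(links, home, importance):
    """Importance-weighted sum of clicks; infinite if some page is unreachable."""
    dist = clicks_from_home(links, home)
    if any(page not in dist for page in importance):
        return float("inf")
    return sum(weight * dist[page] for page, weight in importance.items())


def feasible(links, max_links, siblings):
    """Check the per-page link limit and sibling constraints (a linked => b linked)."""
    for out in links.values():
        if len(out) > max_links:
            return False
        if any(a in out and b not in out for a, b in siblings):
            return False
    return True


def mine_siblings(sessions, min_share):
    """Page pairs co-accessed in at least min_share of sessions, in both directions."""
    counts = {}
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    found = []
    for (a, b), count in counts.items():
        if count / len(sessions) >= min_share:
            found += [(a, b), (b, a)]
    return found


# Hypothetical page weights, candidate designs, and observed sessions.
importance = {"news": 1, "finance": 3, "sports": 3, "weather": 1}
design_a = {"home": ["news", "finance"], "news": ["sports", "weather"]}
design_b = {"home": ["finance", "sports"], "finance": ["news"], "sports": ["weather"]}
sessions = [
    ["finance", "sports"],
    ["finance", "sports", "news"],
    ["weather"],
    ["sports", "finance"],
]

candidates = {"a": design_a, "b": design_b}
siblings = mine_siblings(sessions, min_share=0.6)
allowed = {name: g for name, g in candidates.items() if feasible(g, 2, siblings)}
best = min(allowed, key=lambda name: objective(allowed[name], "home", importance))
print(siblings, best)
```

In this toy setting both designs satisfy the link limit before mining, but once the finance and sports pages are found to be co-accessed, the first design (which links finance without sports from the home page) is ruled out. A fuller treatment would re-solve the optimization over all admissible graphs and repeat the loop as new access data arrive.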

This is an example of one optimization problem. The research opportunity is to identify similar problems that can be formulated in the context of Web site design. Note that in this mode, problems in Web site design will be mainly formulated as optimization problems (e.g., select the optimal set of links to place in a page delivered to a customer). However, DM will play a key part in the specification of the optimization space (e.g., which subsets of links should be in the consideration set may be discovered using DM) and will also play a part in the specification of the constraints (e.g., DM can be used to discover that for users from a certain domain, links a and b will have to be a part of the dynamically composed page).

References used in the Online Appendix

Brijs, T., Swinnen, G., Vanhoof, K., Wets, G. 1999. Using Association Rules for Product Assortment Decisions: A Case Study. In Procs. of ACM Int'l Conf. on Know. Discovery and Data Mining.

Caixeta-Filho, J.V., Swaay-Neto, J.M., Wagemaker, A.P. 2002. Optimization of the Production Planning and Trade of Lily Flowers at Jan de Wit Company. Interfaces. 32(1).

Campbell, D., Erdahl, R., Johnson, D., Bibelnieks, E., Haydock, M., Bullock, M., Crowder, H. 2001. Optimizing Customer Mail Streams at Fingerhut. Interfaces. 31(1).

Chan, P., Fan, W., Prodromidis, A., Stolfo, S. 1999. Distributed Data Mining in Credit Card Fraud Detection. IEEE Intelligent Systems. 14(6).

Corstjens, M., Doyle, P. 1981. A Model for Optimizing Retail Space Allocations. Mgmt. Science. 27(7).

Fawcett, T., Provost, F. 1997. Adaptive Fraud Detection. Data Mining & Knowl. Discovery. 1(3).

Geoffrion, A.M., Krishnan, R. 2001. Prospects for Operations Research in the E-Business Era. Interfaces. March-April.

Huhtala, Y., Karkkainen, J., Porkka, P., Toivonen, H. 1999. TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies. The Computer Journal. 42(2).

Lee, W., Stolfo, S., Mok, K. 2000. Adaptive Intrusion Detection: A Data Mining Approach. Artificial Intelligence Review. 14(6).

Padmanabhan, B., Tuzhilin, A. 1998. A Belief-Driven Method for Discovering Unexpected Patterns. In Procs. of ACM Int'l Conf. on Knowledge Discovery and Data Mining (ACM SIGKDD).

Sery, S., Presty, V., Shobrys, D.E. 2001. Optimization Models for Restructuring BASF North America's Distribution System. Interfaces. 31(3).

SIGKDD Explorations. 2002. Special Issue on Constraints in Data Mining. (1) June.

Yang, M. 2001. An Efficient Algorithm to Allocate Shelf Space. European J. of Operational Research.

Zufryden, F.S. 1986. A Dynamic Programming Approach for Product Selection and Supermarket Shelf-Space Allocation. Journal of OR Society. 37(4).