Market basket analysis with neural gas networks and self-organising maps Received (in revised form): 3rd March, 2003


Reinhold Decker is Professor and Head of Marketing at the University of Bielefeld, Germany and a visiting professor in marketing at universities in Austria, Russia and Bulgaria. He is author and co-author of numerous publications in journals, conference proceedings and compilations focussing on marketing research and data analysis as well as the author and co-editor of academic books. He is a member of several academic societies and serves as a referee for different journals.

Katharina Monien received her diploma in mathematics in 00 and is a lecturer in marketing at the Department of Economics and Business Administration, University of Bielefeld. Her research interest is the application of neural networks and machine learning in marketing.

Abstract
Market basket analysis has been an elementary part of quantitative decision support in retail marketing for many years and it is regularly cited as a prime application area of data mining. In this paper two competitive neural network approaches are presented and discussed with respect to their suitability for purchase interdependence analysis on the product category level. Particular attention is paid to the user-oriented representation or visualisation of cross-category dependences. Both approaches are applied to point of sales scanner data provided by a German retail chain to check how far they are able to uncover presumed purchase interdependences.

Katharina Monien, Department of Economics and Business Administration, University of Bielefeld, PO Box 0 0, 0 Bielefeld, Germany. Tel: 9 (0) 0 ; Fax: 9 (0) 0 ; e-mail: kmonien@wiwi.uni-bielefeld.de

Henry Stewart Publications 9- (00) Vol,, - Journal of Targeting, Measurement and Analysis for Marketing

INTRODUCTION
Market basket analysis has not only been an important topic of traditional retail marketing for more than years but has also gained an increasing relevance for electronic retailing. According to Russell and Petersen market basket analysis focuses on the decision process in which a consumer selects items from a given set of product categories on the same shopping trip. Correspondingly, the term purchase interdependence is used in the following pages to refer to interrelations between different elements of a retail assortment resulting from purchases that have already been carried out. The analysis of market basket data has experienced a renaissance as a result of various publications on data mining and knowledge discovery in databases since 99 9. In these papers market basket analysis is partly regarded as a typical field of application for data mining and the main emphasis is put on association rule-based approaches. In Decker and Schimmelpfennig the traditional association coefficient-based approach (measuring purchase interdependence by means of cluster analysis and multidimensional scaling) is compared with a rule-based approach reflecting on electronic retailing.

In the present paper both a self-organising map and a neural gas network approach are investigated with respect to their suitability for purchase interdependence analysis on the product category level. The practical benefit of such approaches strongly depends on their ability to uncover relevant interrelations in the assortment to be analysed and the extent to which one succeeds in adequately documenting or visualising the same for decision support in retail marketing. It may be that existing asymmetric interdependences between individual product categories and the inevitable occurrence of random take away effects are special challenges in this context. At the same time purchase interdependence analysis is confronted with a continuously growing database resulting from point of sales (POS) scanning. All these aspects have to be kept in mind when a new approach is discussed.

The rest of the paper is organised as follows. In the following section a description of the self-organising map as well as the less well-known neural gas network approach are presented as tools for purchase interdependence analysis. Then easy-to-interpret representations of purchase interdependences are proposed in the next section. Finally, a reality-based application of both approaches is presented. The paper closes with some conclusions and a short outlook.

PURCHASE INTERDEPENDENCE ANALYSIS BY MEANS OF SELF-ORGANISING MAPS AND NEURAL GAS NETWORKS
Both approaches introduced for purchase interdependence analysis start with two basic assumptions. First, the measurement of purchase interdependence is meaningful only on the multicategory level. This means that all product categories of interest have to be considered simultaneously. Secondly, interrelations of the interesting kind usually lead to similar market basket patterns. Two market baskets have a similar pattern if they are characterised by a more or less identical combination of product categories. To identify those interrelations the authors adapted the neural gas network (NGN) approach introduced by Martinetz and Schulten and the well-known self-organising map (SOM) approach introduced by Kohonen to the problem on hand. In both cases the relevant patterns are represented by neurons (units). The formal counterpart of such a unit is a weight vector. When the training of the neural network is finished, that is to say after the net weights have been fitted to the given data, the resulting final weight vectors are called prototypes.

Before starting the methodological considerations some symbols have to be introduced. Let $n$ be the number of interesting product categories and $m$ the unrestricted number of individual market baskets to be analysed. Then an individual market basket can be defined by binary transaction vector $t_j = (t_{j1}, t_{j2}, \ldots, t_{jn})$ with $j \in \{1, \ldots, m\}$, where $t_{jk} = 1$ if market basket $j$ contains at least one item of product category $k \in \{1, \ldots, n\}$ and $t_{jk} = 0$ otherwise.

Essentials of the SOM approach
In most applications the units of an SOM are organised on a two dimensional grid where the individual positions reflect the interrelations between the respective units. In the following, each unit $u_h$ ($h \in \{1, \ldots, p\}$) is represented by a weight vector $w_{ab} = (w_{ab1}, \ldots, w_{abk}, \ldots, w_{abn})$ where $a$ and $b$ refer to the position of the unit within a rectangular grid and $0 \le w_{abk} \le 1$ holds true.
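The binary coding of market baskets defined above can be sketched as follows. This is a minimal illustration assuming NumPy; the function name `to_binary_vectors` and the toy baskets are hypothetical and are not taken from the paper's SAS implementation.

```python
import numpy as np

def to_binary_vectors(baskets, n_categories):
    """Code each basket as a 0/1 vector of length n (categories numbered 1..n)."""
    t = np.zeros((len(baskets), n_categories), dtype=np.int8)
    for j, basket in enumerate(baskets):
        for k in basket:
            # Only occurrence matters; repeated items of a category are ignored.
            t[j, k - 1] = 1
    return t

# Hypothetical toy data: each basket lists the category number of every item bought.
baskets = [[1, 2, 2], [3], [1, 3, 3, 3], []]
T = to_binary_vectors(baskets, n_categories=3)
print(T)
```

Each row of `T` is one transaction vector; a category bought several times in the same basket still yields a single 1, which mirrors the definition of the vector components.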

Applying the SOM approach to POS scanner data in order to identify existing purchase interdependences means carrying out two different tasks simultaneously: finding an optimal set of prototypes representing similar market baskets and ensuring the optimal topological arrangement of these prototypes. In the resulting map similar weight vectors (representing similar market basket patterns) are located close together. How to interpret such a map in detail with respect to purchase interdependence will be shown in the empirical part of the paper.

The net training process looks like this: at the beginning, all weights $w_{abk}$ have to be initialised in a suitable way. In the following iterations $l \in \{1, \ldots, l_{max}\}$ the distances between each weight vector $w_{ab}$ and a randomly chosen input vector (market basket) $t_j$ are computed according to

$$\mathrm{dist}(t_j, w_{ab}) = \lVert t_j - w_{ab} \rVert = \sqrt{\sum_{k=1}^{n} (t_{jk} - w_{abk})^2} \qquad (1)$$

to determine the winning unit $u_{c_a c_b}$ with minimum distance. Then each weight vector has to be updated as follows:

$$w_{ab}(l+1) = w_{ab}(l) + \varepsilon(l) \cdot nf_{c_a c_b}(a, b) \cdot (t_j - w_{ab})$$

where learning rate $\varepsilon(l) = \varepsilon(0)(1 - l/l_{max})$ is a decreasing function of iteration $l$. The extent of this adaptation can be controlled via neighbourhood function

$$nf_{c_a c_b}(a, b) = \exp\left(-\frac{(a - c_a)^2 + (b - c_b)^2}{2\sigma(l)^2}\right)$$

where $\sigma(l) = \sigma(0)(1 - l/l_{max})$ determines the scope of the neighbourhood kernel. The whole procedure is repeated until the maximum number of iterations $l_{max}$ or any other a priori defined stopping criterion is reached. The minimisation of distances finally leads to the optimal prototype system $\{w_{c_a c_b}\}$. To optimise the topological structure of the whole map the accumulated neighbourhood distance

$$\sum_{(c_a, c_b)} \sum_{(\tilde{c}_a, \tilde{c}_b)} \mathrm{dist}(w_{c_a c_b}, w_{\tilde{c}_a \tilde{c}_b}) = \sum_{(c_a, c_b)} \sum_{(\tilde{c}_a, \tilde{c}_b)} \sqrt{\sum_{k=1}^{n} (w_{c_a c_b k} - w_{\tilde{c}_a \tilde{c}_b k})^2} \qquad (2)$$

with $u_{\tilde{c}_a \tilde{c}_b} \in N_{c_a c_b}$ has to be minimised as well. Neighbourhood set $N_{c_a c_b}$ contains all units $u_{\tilde{c}_a \tilde{c}_b}$ that are topological neighbours of the winning unit $u_{c_a c_b}$ within the rectangular grid.

In the basic SOM approach the number of units $p$ has to be fixed in advance which can be restrictive in some cases. To avoid this disadvantage Alahakoon et al have proposed a so-called growing self-organising map (GSOM) approach where both the size and the shape of the network are determined dynamically during the training process. Because of the fundamental nature of this investigation this aspect is not elaborated on at this point.

Essentials of the NGN approach
Again, each unit $u_h$ ($h \in \{1, \ldots, s\}$) is represented by a weight vector $w_h = (w_{h1}, \ldots, w_{hk}, \ldots, w_{hn}) \in [0; 1]^n$. The dimensionality of these vectors is equal to the number of interesting product categories $n$. In contrast to the SOM approach, there is no a priori defined grid to be fitted. The similarity of both approaches also becomes apparent in the training process. After initialisation the distance between each weight vector $w_h$ and a randomly chosen binary transaction vector $t_j$ is computed in accordance to equation 1. In respect of these distances all units $u_h$ are arranged in such a way that the winning unit with weight vector $w_{win}$ takes rank $r(t_j, w_{win}) = 0$, the co-winning unit takes rank 1 and so on. This ranking explicitly determines the strength of the adaptation of the individual weight vectors in each iteration. It is

$$\Delta w_h(l) = \varepsilon(l) \cdot nf^{(l)}(r(t_j, w_h)) \cdot (t_j - w_h)$$

with neighbourhood function

$$nf^{(l)}(r(t_j, w_h)) = \exp\left(-\frac{r(t_j, w_h)}{\lambda(l)}\right)$$

and learning rates

$$\varepsilon(l) = \varepsilon_{Init} \left(\frac{\varepsilon_{End}}{\varepsilon_{Init}}\right)^{l/l_{max}} \quad \text{and} \quad \lambda(l) = \lambda_{Init} \left(\frac{\lambda_{End}}{\lambda_{Init}}\right)^{l/l_{max}}$$

respectively. Obviously, the numerical value of neighbourhood function $nf^{(l)}(r(t_j, w_h))$ decreases, other things being equal, with an increasing rank of the respective unit. Therefore, only those weight vectors that are close to the input signal $t_j$ are changed significantly according to:

$$w_h(l+1) = w_h(l) + \Delta w_h(l)$$

The basic NGN approach can be extended in different ways. For example, following Martinetz and Schulten a data-driven topological structure can be added to show the interrelations between units. For this purpose a connectivity matrix $C = (c_{hh'})_{h, h' = 1, \ldots, s}$ has to be introduced with $c_{hh'} = 1$ if unit $u_h$ and $u_{h'}$ are connected and $c_{hh'} = 0$ otherwise. Each time unit $u_h$ becomes the winning unit indicator variable $c_{hh'}$ is set to 1 for the co-winning unit $u_{h'}$. To take into account the strength of the connection between two units $u_h$ and $u_{h'}$ an additional controlling variable $age_{hh'}$ is introduced. This variable is set to 0 if $c_{hh'}$ is set to 1. The ages of all the other connections to winning unit $u_h$ are raised by 1. If the age of a connection exceeds the dynamically computed maximum

$$age_{max}(l) = age_{Init} \left(\frac{age_{End}}{age_{Init}}\right)^{l/l_{max}}$$

this connection is removed from the network. Threshold $age_{max}(l)$ depends on pre-defined initial and final values $age_{Init}$ and $age_{End}$ as well as on iteration $l$.

Current extensions of the NGN approach focus on the speeding up of the training process or on the dynamic determination of the number of units to be included in the network. Atukorale and Suganthan, for example, have published a so-called implicit ranking scheme. The basic idea of this proposal is to redefine the neighbourhood function using a kind of normalised distance

$$q(t_j, w_h) = \frac{\mathrm{dist}(t_j, w_h) - dist_{min}}{dist_{max} - dist_{min}}$$

instead of the original rank order to reduce training time. The maximum and minimum distances $dist_{max}$ and $dist_{min}$ between the current input signal $t_j$ and each weight vector $w_h$ have to be defined adequately. For the winning unit $q(t_j, w_{win})$ is equal to 0. Another modification suggested by the same authors is the so-called truncated update rule, where only those weight vectors are updated with normalised distances smaller than a dynamically computed threshold. In an own sensitivity study it became apparent that there are only negligible differences in the running times of these modifications, if $s$ 0. Therefore, a detailed description was dispensed with and a decision made in favour of the original NGN approach for the investigations.
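The original NGN training loop described above (rank-based adaptation, exponentially decaying schedules and the age-controlled connectivity matrix) can be sketched as follows. This is a minimal illustration assuming NumPy, not the authors' SAS implementation; the names `decay` and `train_ngn`, the default schedule values and the toy data are assumptions chosen for the example.

```python
import numpy as np

def decay(init, end, l, l_max):
    """Exponential schedule init * (end / init) ** (l / l_max)."""
    return init * (end / init) ** (l / l_max)

def train_ngn(T, s, l_max, eps=(0.5, 0.005), lam=(10.0, 0.01),
              age_range=(20.0, 200.0), seed=0):
    """Train s units on binary transaction vectors T (shape m x n).

    The default schedule values are illustrative assumptions only.
    """
    rng = np.random.default_rng(seed)
    m, n = T.shape
    W = rng.random((s, n))             # weight vectors, initialised in [0, 1)
    C = np.zeros((s, s), dtype=bool)   # connectivity matrix
    age = np.zeros((s, s))             # ages of the connections
    for l in range(1, l_max + 1):
        t = T[rng.integers(m)]                    # randomly chosen basket
        dist = np.linalg.norm(W - t, axis=1)      # Euclidean distance per unit
        order = np.argsort(dist)
        rank = np.empty(s, dtype=int)
        rank[order] = np.arange(s)                # winner gets rank 0
        e = decay(*eps, l, l_max)
        lm = decay(*lam, l, l_max)
        # Rank-based update: units with a low rank move most towards t.
        W += e * np.exp(-rank / lm)[:, None] * (t - W)
        win, co = order[0], order[1]
        # Age the winner's existing connections, then refresh winner/co-winner.
        nb = C[win].copy()
        age[win, nb] += 1
        age[nb, win] += 1
        C[win, co] = C[co, win] = True
        age[win, co] = age[co, win] = 0
        # Remove winner connections older than the current maximum age.
        old = C[win] & (age[win] > decay(*age_range, l, l_max))
        C[win, old] = False
        C[old, win] = False
    return W, C, age

# Hypothetical toy data: two clearly separated basket patterns.
A_pat, B_pat = [1, 1, 0, 0], [0, 0, 1, 1]
T = np.array([A_pat, B_pat] * 50)
W, C, age = train_ngn(T, s=2, l_max=3000)
print(np.round(W, 2))
```

Because every update moves a weight vector a fraction of the way towards a 0/1 input, all weights stay inside the unit interval, which is what allows them to be read as probabilities later on. An SOM variant would differ mainly in replacing the rank-based factor by a kernel on grid positions.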

Table 1: Elements of the NGN output

Unit | Frequency | Ranking | Weights
$u_1$ | $P(u_1)$ | $PC_{k_1^1} \ldots PC_{k_i^1} \ldots PC_{k_n^1}$ | $w_{1k_1^1} \ldots w_{1k_i^1} \ldots w_{1k_n^1}$
... | ... | ... | ...
$u_s$ | $P(u_s)$ | $PC_{k_1^s} \ldots PC_{k_i^s} \ldots PC_{k_n^s}$ | $w_{sk_1^s} \ldots w_{sk_i^s} \ldots w_{sk_n^s}$

Similarly to the GSOM approach, there is also a dynamic extension of the NGN approach called growing neural gas network. For a detailed description of this see Fritzke.

REPRESENTATION OF INTERRELATIONS
As already emphasised at the beginning of the paper, an essential aspect of purchase interdependence analysis is the adequate documentation or visualisation of the uncovered interrelations. The availability of comprehensible representations (eg as an essential part of regular sales reporting) makes it easier to apply this information to marketing decisions, for instance, with respect to product placement and promotion pricing. In this section some procedures are proposed that can be applied to both the SOM and the NGN approach. For demonstration purposes the focus is on the latter. Table 1 gives a general impression of the NGN output underlying the following considerations. After having finished the training process the weights of each unit $u_h$ can be arranged in descending order: $w_{hk_1} \ge \ldots \ge w_{hk_i} \ge \ldots \ge w_{hk_n} \ge 0$. The corresponding ranking of the individual product categories (abbreviated by PC) is displayed in the centre of the table whereas the share of market baskets $P(u_h)$ (with $\sum_{h=1}^{s} P(u_h) = 1$) which fit prototype $w_h$ is given in column Frequency. Starting from this it is possible to visualise interesting interrelations both on the product category and the market basket level by means of a graph.

Visualising purchase interdependences
To visualise possible interdependences on the product category level $P(PC_{k_i} \mid u_h) := w_{hk_i} \; \forall h, k_i$ (with $k_i \in \{1, \ldots, n\}$) is interpreted as the probability of observing product category $PC_{k_i}$ in a market basket that corresponds to prototype $w_h$. Then this probability is combined with the observed frequency of each prototype to get the compound probability

$$P(PC_{k_i} \cap u_h) = P(u_h) \cdot P(PC_{k_i} \mid u_h) \quad \forall h, k_i$$

To decide whether two product categories have to be connected in a relating graph because of their interdependence a user-defined threshold $d$ can be introduced. Product categories $k$ and $k'$ are connected if both probabilities $P(PC_k \cap u_h)$ and $P(PC_{k'} \cap u_h)$ are greater than $d$ for at least one $h$. Alternatively, a suitable elbow criterion can be applied. Figure 1 shows what such a graph might look like. Product categories and are interdependent in the given sense but without having any other meaningful interrelation, whereas product category, for example, is part of a much more complex net. In the empirical application section such a graph will be created from real POS scanner data with presumed interrelations between individual product categories.
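The compound probabilities and the threshold rule above can be sketched as follows. This is a minimal illustration assuming NumPy; the names `prototype_frequencies` and `interdependence_edges`, the toy prototypes and the threshold value are hypothetical. The frequency of a prototype is taken here as the share of baskets whose nearest prototype it is, which is one plausible reading of "fit".

```python
import numpy as np
from itertools import combinations

def prototype_frequencies(T, W):
    """Share of baskets whose nearest prototype (Euclidean distance) is each unit."""
    d = np.linalg.norm(T[:, None, :] - W[None, :, :], axis=2)
    winners = d.argmin(axis=1)
    return np.bincount(winners, minlength=len(W)) / len(T)

def interdependence_edges(W, P_u, d):
    """Connect categories k and k' if both compound probabilities exceed d
    for at least one prototype. Categories are numbered from 1."""
    compound = P_u[:, None] * W          # frequency of unit times its weights
    above = compound > d
    n = W.shape[1]
    return sorted((k + 1, k2 + 1)
                  for k, k2 in combinations(range(n), 2)
                  if np.any(above[:, k] & above[:, k2]))

# Hypothetical toy prototypes (2 units, 3 categories) and baskets.
W_toy = np.array([[0.9, 0.8, 0.1],
                  [0.1, 0.2, 0.7]])
T_toy = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]])
P_u = prototype_frequencies(T_toy, W_toy)
edges = interdependence_edges(W_toy, P_u, d=0.1)
print(P_u, edges)
```

Raising the threshold prunes edges that rest on rare prototypes, since a high weight in a rarely matched unit still yields a small compound probability.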

Figure 1: Possible structure of a simple purchase interdependence graph

Heterogeneity and asymmetry in market basket patterns
Using connectivity matrix $C$ a graph that represents existing similarities between individual prototypes is created. Two units are neighbouring if they are represented by similar weight vectors. Each edge of the graph is weighted by the reciprocal age $1/age_{hh'}$. The longer a connection has not been confirmed during network training the lower the weight of this edge. If a continuously growing number of market baskets have to be represented (eg as a result of daily POS scanning in retail stores) the current relevance of a connection can be expressed dynamically in this way. A high number of edges with comparatively low weights point to distinctive differences (heterogeneity) with respect to the underlying buying patterns.

Furthermore, using the formula of Bayes, asymmetries on the market basket level can be described. Let

$$P(u_h \mid PC_k) = \frac{P(PC_k \cap u_h)}{P(PC_k)} = \frac{P(u_h) \cdot P(PC_k \mid u_h)}{\sum_{h'=1}^{s} P(u_{h'}) \cdot P(PC_k \mid u_{h'})} \qquad (3)$$

be the probability of observing pattern (prototype) $w_h$ when product category $PC_k$ is already an element of an arising market basket. The computation of these probabilities can provide valuable hints about those product categories which induce the purchase of items of the remaining categories. From a sales-oriented point of view those product categories are of particular interest to the retail management which induce market baskets containing profitable product categories with high probability. Information of this kind is useful primarily for periodical promotion planning. Figure 2 illustrates the general structure of an individual vertex of the graph. The given probabilities result from equation 3.

Figure 2: General structure of vertices

EMPIRICAL APPLICATION TO POS SCANNER DATA
Data description
To demonstrate their general suitability both neural network approaches were applied to POS scanner data collected by a German retail chain in the mid-1990s. For illustration purposes product categories from the chemist's assortment were selected. Accompanying investigations of the available data have shown that the following considerations can also be transferred without restrictions to other categories of products in everyday use. In this context, it should be mentioned that in this implementation, using SAS Release 0, neither the SOM nor the NGN approach is restricted with respect to the maximum number of market baskets to be analysed but, of course, an upper bound may result from the storage capacity of the employed hardware.

The data referred to in Table 2 result from ,09 market baskets. Product categories with possible purchase interdependences have been put together in separate fields. Each market basket is coded with a binary vector where 1 indicates the occurrence and 0 the non-occurrence of the respective product category. The total number of items of a product category in a basket is not considered. In doing this it is possible to abstract from biasing stock-keeping effects. The transformation of the original data (containing, for example, information on price, time and date of purchase) into binary vectors was realised with standard data management facilities of SAS.

Table 2: Profile of the POS scanner data

No | Product category | Occurrence
1 | Shampoos |
2 | Hair conditioners |
3 | Hair lotions |
4 | Tampons |
5 | Sanitary napkins |
6 | Cat food |
7 | Rewards for cats/dogs |
8 | Juices for babies |
9 | Desserts for babies |
10 | Vegetables for babies |
11 | Children's food |
12 | Children's menus |
13 | Denture cleansing agents |
14 | Denture fixer |
15 | Sun protectors/blockers |
16 | After sun lotions |
17 | Shaving soaps/creams |
18 | Razor blades |
19 | Slim/diet food |
20 | Functional/health food |
21 | Cough drops |
22 | Chewing gum |
23 | Heart and nerve tonics |
24 | Eyeshadow |
25 | Lipsticks |

The product categories to, for example, can be assumed to be interrelated in the relevant sense because they are at least partially complementary. The same, but in a somewhat more pronounced way, seems to be valid for product category and. Items of these two groups can only be used jointly. Finally, product categories 21 and 22 represent so-called random take away products, that are often placed spontaneously in the basket at the checkout. A distinct and causally motivated relation to any other product category listed in the table is not discernible.

Results received from SOM
Checking several possible alternatives an SOM layer was found to produce the best results with respect to heterogeneity (cf for this equation 1)

$$het\{w_{c_a c_b}\} = \frac{1}{m} \sum_{j=1}^{m} \mathrm{dist}(t_j, w_{c_a c_b}),$$

and simplicity (cf equation 2)

$$simpl\{w_{c_a c_b}\} = \sum_{(c_a, c_b)} \sum_{(\tilde{c}_a, \tilde{c}_b)} \mathrm{dist}(w_{c_a c_b}, w_{\tilde{c}_a \tilde{c}_b}) \quad 90$$

of the prototype system as well as the interpretability in content. The solution accompanied by the initial learning rate $\varepsilon(0)$ 0 and $\sigma(0)$ for the neighbourhood kernel is shown in Figure 3.

Figure 3: SOM output after $l_{max}$ 00,000 iterations (grid positions $a$/$b$)

Each of the fields of the map represents one unit or prototype. To make interpretations easier, however, only those product categories with weights greater than 0 have been displayed. That is why three product categories,, and 9, do not appear in the map. Regarding the first two product categories (none of them have weights greater than 0), this is not too surprising because of the comparatively low frequency of occurrence (cf Table 2). Product category 9, with maximum weight 0, however, only narrowly misses its inclusion in the map at the expected place (row, column). All in all, the different clouds in Figure 3 conform to a great extent to the presumed interrelations. The hair care products of categories 1, 2 and 3, for instance, define a cloud where product category 1 plays a seemingly special role. Taking into account the basic function of the items of this category (shampoos) within the entire hair care process this seems to be quite plausible. In the same way the product categories to define a fairly compact cloud in the upper right-hand corner of the map. This could be rated as a hint at some stronger relations between these product categories. The interrelation of product categories and, to mention another nice example, is also undoubtedly understandable.

Furthermore, the suggested type of representation is useful with respect to the detection of so-called random take away products. A simple and plausible indicator for this phenomenon is the extent to which a product category scatters across the map. In the present case this seems to be valid for product category 21 (cough drops) which appears in several, but not necessarily neighbouring, units and which is displayed together with different product categories within one unit. For the other presumed random take away product category 22 (chewing gum) unfortunately no such clear picture is obtained. In a countermove this seems to apply for product category.

Results received from NGN
Table 3: NGN output after $l_{max}$ 0,000 iterations (columns: $P(u_h)$; $PC_{k_1}$ to $PC_{k_5}$; $w_{hk_1}$ to $w_{hk_5}$)

Table 3 contains the output of the original NGN approach using 0 units. Even this parsimonious specification turned out to produce acceptable results with respect to heterogeneity $het\{w_h\} = \frac{1}{m} \sum_{j=1}^{m} \mathrm{dist}(t_j, w_h)$ 090 and interpretability. The initial and final values of the learning rates ($\varepsilon_{Init}$ 0, $\varepsilon_{End}$ 000, and $\lambda_{Init}$ 0, $\lambda_{End}$ 00) as well as those of the age ($age_{Init}$ 0, $age_{End}$ 00) are similar to proposals made in Martinetz and Schulten, Martinetz et al and Fritzke. Because of space restrictions only five product categories at a time (starting with the highest weight) have been displayed.

Obviously, the product categories for children are clearly dominating prototype and are strongly interdependent. Even product category at rank has a weight greater than 0. But there are also some other very plausible purchase interdependences indicated by comparatively high weights within one prototype. This applies for example to the hair care product categories 1, 2 and 3 that are dominating prototype 9. At the same time it emerges that product category 1 (shampoos) seemingly contains random take away products because of its co-occurrence in several other market basket prototypes. Another very nice interrelation is uncovered by prototype where the dental care products appear together with cat food and rewards for cats and dogs. Obviously, these products for small pets are predominantly bought by an older clientele. Insights of this kind can be used excellently for customer oriented sales promotions. Finally, prototype contains face and hair care products typically bought by female consumers.

All in all the NGN results are very similar to those produced by the SOM. The NGN approach proves, however, to be more parsimonious both regarding training time and the required number of units. In this respect, at least in this investigation, the NGN approach slightly dominates the SOM approach. Nevertheless, a final assessment needs more comparisons on different data sets.

Taking into account that it makes a great difference whether a prototype represents a frequent or a scarce market basket pattern the compound probabilities $P(PC_{k_i} \cap u_h)$ can be computed using the available weights and frequencies. Additionally, defining a threshold $d$ 00, for example, finally leads to the graphical representation of purchase interdependences on the product category level depicted in Figure 4.

Figure 4: Interdependences on the product category level

The present visualisation of purchase interdependences is remarkable in two respects. First, the individual subgraphs reflect, as expected, the most important interrelations from a data analytical point of view. The starlike subgraph on the left-hand side, for instance, confirms once again the special role of product category which has already been mentioned and which results from the at least partly existing random take away effects. Secondly, graphical representations like this are an elegant way of enabling visualisation of both direct and indirect interdependences. Product categories and, for example, are directly interdependent. But there is also an indirect relation to product category and via product category. In contrast to this the apparently strongly interdependent product categories and seem to occupy a solitary position. This point will be considered later.

Figure 5 shows the result of an application of the Bayes approach (cf equation 3) to the given data. To simplify the graph only those connections (and, as a result, prototypes) have been depicted that exceed an a priori defined probability to appear. The probability of an edge connecting unit $u_h$ and unit $u_{h'}$ is equal to the number of iterations where these units were the winning and the co-winning unit divided by the total numbers of iterations. Fixing the threshold for this probability to /, for instance, results in the graph on hand. Unit u and u are missing because of their weak connections to the other units in the present sense. Each edge of the graph is additionally weighted by its reciprocal age. The small weight of the edge connecting unit u and u (/), for example, indicates that the seeming similarity of both prototypes could not be confirmed for a longer time. The opposite holds for the edge connecting unit u 9 and u
0 Obviously, te latest input signal (transaction vector) as confirmed te common ground of bot prototypes Te reader migt be Journal of Targeting, Measurement and Analysis for Marketing Vol,, Henry Stewart Publications 9- (00)

Market basket analysis wit neural gas networks and self-organising maps 0 09 009 009 0 u 0 00 00 00 000 u 0 00 0 00 00 0 00 0 0099 0 0 00 00 00 000 u 0 0 0 09 0 00 u 9 00 0 0 0 0 0 0 0099 00 00 9 0 0 00 09 0 0 u 0 09 00 0 00 u 0 00 00 00 00 0 0 0 0 0099 00 0 0 00 00 u 00 009 0 00 00 u 00 00 00 00 0099 0 0 09 0 00 Figure Interdependences on te market basket level astonised about te connection of unit u and u altoug bot seem to be represented by different prototypes In fact, tis edge is primarily determined by product categories ranked to, wic are not displayed Terefore, if te NGN is continuously (eg daily) adapted to new POS scanner data te canges of weigts in te course of time provide valuable ints at an emerging alignment or differentiation of buying patterns A corresponding callenge to future researc migt be te development of a measurement framework to monitor dynamically movements in te observed purcase interdependences Additionally, it would be possible to focus on te respective influence of canging sales promotion activities According to Figure eac vertex of te grap in Figure contains te rank order of product categories (first row), te Bayesian probabilities P(u PC ) (second k i row), and te product purcase probabilities P(PC k ) (tird row) Looking i at te last two rows some interesting asymmetries are detectable For instance, in unit u 9, te probability of observing te respective market basket pattern is greater wit product category on and instead of product category (0 versus 0) Te contrary is valid for te probabilities of buying items of tese two product categories (0 versus 0) An example of a more or less symmetric relation is given by product categories 9 and (0 versus 0) in unit u But te fact tat items of product category 9 appear in a market basket wit a iger probability (0 versus 00) makes tis one more interesting for promotional activities Henry Stewart Publications 9- (00) Vol,, - Journal of Targeting, Measurement and Analysis for Marketing
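The relation between the two probability rows in each vertex can be illustrated with a minimal sketch. It assumes that the weights of a prototype can be read as conditional purchase probabilities P(PC_k | u_i) and that P(u_i) is the relative frequency with which unit u_i wins; the compound probability is then their product, and the Bayesian probability P(u_i | PC_k) follows by normalising over all units. The function names and all numbers are hypothetical toy values, not results from the study.

```python
# Illustrative sketch only: toy weights and frequencies, not data from the paper.

def joint_probabilities(weights, unit_freq):
    """Compound probabilities P(PC_k and u_i) = P(PC_k | u_i) * P(u_i).

    Assumption: weights[i][k] is interpreted as P(PC_k | u_i).
    """
    return [[w * p_u for w in row] for row, p_u in zip(weights, unit_freq)]


def bayes_posteriors(weights, unit_freq):
    """Bayesian probabilities P(u_i | PC_k), normalised over all units."""
    joint = joint_probabilities(weights, unit_freq)
    n_categories = len(weights[0])
    # Marginal P(PC_k): sum of the compound probabilities over all units.
    marginals = [sum(row[k] for row in joint) for k in range(n_categories)]
    return [
        [row[k] / marginals[k] if marginals[k] > 0 else 0.0
         for k in range(n_categories)]
        for row in joint
    ]


# Two hypothetical prototypes (rows) and three product categories (columns).
weights = [[0.8, 0.1, 0.3],
           [0.2, 0.6, 0.3]]
unit_freq = [0.25, 0.75]  # hypothetical winning frequencies P(u_i)

posterior = bayes_posteriors(weights, unit_freq)
for i, row in enumerate(posterior):
    print(f"unit {i}:", [round(p, 3) for p in row])
```

In this toy example category 2 is bought with the same probability in both prototypes, yet its Bayesian probability differs strongly between them because the second prototype is three times as frequent. This is the kind of asymmetry the text describes.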

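The construction of the thresholded prototype graph can be sketched as follows. The sketch assumes that the training loop has logged, for every iteration, which pair of units was winner and co-winner, and that the current age of each edge is available as a positive number. The log format, the function names and all numbers are hypothetical and serve only to illustrate the edge probability and the reciprocal-age weight.

```python
from collections import Counter

# Illustrative sketch only: the winner log and edge ages are invented toy data.

def edge_probabilities(winner_pairs):
    """Share of iterations in which two units were winner and co-winner."""
    counts = Counter(frozenset(pair) for pair in winner_pairs)
    total = len(winner_pairs)
    return {tuple(sorted(edge)): n / total for edge, n in counts.items()}


def thresholded_graph(winner_pairs, edge_age, min_prob):
    """Keep edges whose probability exceeds min_prob, weighted by 1 / age."""
    probs = edge_probabilities(winner_pairs)
    return {
        edge: {"prob": p, "weight": 1.0 / edge_age[edge]}
        for edge, p in probs.items()
        if p > min_prob
    }


# Hypothetical log of (winner, co-winner) pairs over ten iterations.
winner_pairs = [(1, 2), (2, 1), (1, 2), (3, 4), (1, 3),
                (2, 1), (1, 2), (3, 4), (1, 2), (2, 3)]
# Hypothetical edge ages (iterations since the edge was last confirmed, from 1).
edge_age = {(1, 2): 1, (3, 4): 4, (1, 3): 7, (2, 3): 9}

graph = thresholded_graph(winner_pairs, edge_age, min_prob=0.15)
for edge, attrs in sorted(graph.items()):
    print(edge, attrs)
```

An edge with a small reciprocal age, such as the one between units 3 and 4 here, has not been confirmed by recent input signals, whereas a weight of 1 means that the latest transaction vector confirmed it.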
It is necessary to point out that asymmetries of the present kind are only valid for individual prototypes. Product categories […] and […], for example, are characterised by an extremely asymmetric relation with respect to prototype […], whereas the same relation looks nearly symmetric for prototype […]0. This is caused by the fact that product category […] (together with […]) dominates the profile of prototype […]. Information about such dominations can be used to force cross-selling within the assortment under consideration.

Last but not least, the Bayesian probabilities can be used to generate rules like this: Product category k determines the occurrence of market basket type […]. Transforming the conditional probabilities into verbal rules eases the communication between the analyst and the decision maker. The interesting nature of such a rule can be determined using the lift, a measure that is well known from data mining:

$$\mathrm{lift}(PC_k \Rightarrow u_i) = \frac{\mathrm{conf}(PC_k \Rightarrow u_i)}{\mathrm{sup}(u_i)} = \frac{P(u_i \mid PC_k)}{P(u_i)}$$

If only product categories with a lift greater than […].0 are considered, another graph results which is very similar to that depicted in Figure […]. The corresponding probabilities are in bold face in Figure […]. This time product categories […] and […] as well as […] and […] would be connected within the respective subgraph. In this way the above-mentioned interesting nature of this interrelation is confirmed from a methodological point of view as well.

Visualising purchase interdependences by means of NGN-based multidimensional scaling

To be able to compare the results of the previous subsection with traditional approaches to purchase interdependence analysis, for instance those starting from association coefficients like Tanimoto, it seems helpful to carry out a simple transformation of the NGN output. In the present case product categories k and k′ are assumed to be interrelated if they have similar probabilities P(PC_k | u_i) and P(PC_k′ | u_i) with respect to all prototypes. Analysing the corresponding similarities by means of multidimensional scaling (MDS) finally leads to Figure […]. Because of the high conformity of this representation (produced with SAS PROC MDS) with the assumptions above, a more intensive interpretation is not necessary.

[Figure […]: MDS representation of the NGN results]

CONCLUSIONS AND OUTLOOK

This paper is concerned with the presentation and discussion of two alternative neural network approaches to purchase interdependence analysis. The empirical investigation showed that both approaches are powerful tools providing outputs that can be processed in different ways to extract information. Both can isolate random take-away effects to a certain degree and can be extended regarding the detection of asymmetries at the market basket level. An important advantage of both approaches is the absence of a methodologically motivated restriction on the maximum number of market baskets to be analysed. The adaptability of NGN makes this approach a useful instrument for dynamic POS scanner data analysis. Considering the fact that, at least for the data here, the NGN approach is superior to the SOM approach with respect to both the training time and the required number of weights, the former is worth a more thorough investigation in the present context. On the other hand, in contrast to NGN, implementations of the basic SOM methodology are available in several commercial or academic tools for data analysis, which makes its application significantly easier.

Future research should concentrate on the development of meaningful quality measures for application-oriented market basket analysis and the identification of possible differences between weekly shopping baskets and those of top-up shopping trips. Beyond this, the reliable and comprehensive isolation of random take-away effects still requires considerably greater effort. The authors are preparing a further application of both approaches to a large data set provided by another retail chain, once again concerning everyday products but different from the chemist's assortment. Those who are interested in the results (which are scheduled to be available in summer 200[…]) are invited to write to the authors.

Acknowledgment

The authors would like to thank two anonymous reviewers for their helpful hints concerning an earlier draft of the paper.

References

1 Böcker, F. (1978) Die Bestimmung der Kaufverbundenheit von Produkten, Duncker & Humblot, Berlin.
2 Hruschka, H. (1991) Bestimmung der Kaufverbundenheit mit Hilfe eines probabilistischen Meßmodells, Zeitschrift für betriebswirtschaftliche Forschung, Vol. […], No. […], pp. […].
3 Merkle, E. (1981) Die Erfassung und Nutzung von Informationen über den Sortimentsverbund in Handelsbetrieben, Duncker & Humblot, Berlin.
4 Hao, M. C., Dayal, U., Hsu, M., Sprenger, T. and Gross, M. H. (2001) Visualization of directed associations in e-commerce transaction data, Hewlett Packard Research Laboratories, Palo Alto.
5 Russell, G. J. and Petersen, A. (2000) Analysis of cross category dependence in market basket selection, Journal of Retailing, Vol. 76, No. 3, pp. 367-392.
6 Agrawal, R., Imielinski, T. and Swami, A. (1993) Mining association rules between sets of items in large databases, in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, pp. 207-216.
7 Brin, S., Motwani, R., Ullman, J. D. and Tsur, S. (1997) Dynamic itemset counting and implication rules for market basket data, in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, Tucson, pp. 255-264.
8 Hu, Z., Chin, W.-N. and Takeichi, M. (2000) Calculating a new data mining algorithm for market basket analysis, in Pontelli, E. and Santos Costa, V. (eds) Practical aspects of declarative languages, Lecture Notes in Computer Science, No. 1753, Springer, Berlin, pp. 169-184.
9 Hao et al. (2001) op. cit.
10 Decker, R. and Schimmelpfennig, H. (2002) Alternative Ansätze zur datengestützten Verbundmessung im Electronic Retailing, in Ahlert, D., Olbrich, R. and Schröder, H. (eds) Jahrbuch Handelsmanagement 2002: Electronic Retailing, Deutscher Fachverlag, Frankfurt, pp. […].
11 Schmalen, H., Pechtl, H. and Schweitzer, W. (1996) Sonderangebotspolitik im Lebensmittel-Einzelhandel, Schäffer-Poeschel, Stuttgart.
12 Martinetz, T. and Schulten, K. (1991) A neural gas network learns topologies, in Kohonen, T., Mäkisara, K., Simula, O. and Kangas, J. (eds) Artificial neural networks, North Holland, Amsterdam, pp. 397-402.
13 Kohonen, T. (1982) Self-organized formation of topologically correct feature maps, Biological Cybernetics, Vol. 43, pp. 59-69.
14 Kohonen, T. (2001) Self-organizing maps, 3rd edn, Springer, Berlin.
15 Alahakoon, D., Halgamuge, S. K. and Srinivasan, B. (2000) Dynamic self-organizing maps with controlled growth for knowledge discovery, IEEE Transactions on Neural Networks, Vol. 11, No. 3, pp. 601-614.
16 Martinetz and Schulten (1991) op. cit.
17 Ibid.
18 Atukorale, A. and Suganthan, N. (2000) Hierarchical overlapped neural-gas network with application to pattern classification, Neurocomputing, Vol. 35, No. […], pp. 165-176.
19 Fritzke, B. (1995) A growing neural gas network learns topologies, in Tesauro, G., Touretzky, D. S. and Leen, T. K. (eds) Advances in neural information processing systems, MIT Press, Cambridge, pp. 625-632.
20 Martinetz and Schulten (1991) op. cit.
21 Martinetz, T., Berkovich, S. G. and Schulten, K. (1993) Neural-gas network for vector quantization and its application to time-series prediction, IEEE Transactions on Neural Networks, Vol. 4, No. 4, pp. 558-569.
22 Fritzke (1995) op. cit.
23 Pedrycz, W. (2001) Granular computing in data mining, in Kandel, A., Last, M. and Bunke, H. (eds) Data mining and computational intelligence, Physica, Heidelberg, pp. […].
24 Brin et al. (1997) op. cit.
25 Merkle (1981) op. cit.
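The lift-based rule filter discussed in connection with the Bayesian probabilities can be illustrated with a minimal sketch. It assumes that the Bayesian probabilities P(u_i | PC_k) and the prototype frequencies P(u_i) are already available; the lift of a rule "product category k determines market basket type i" is then their ratio. The function names, the threshold and all numbers are hypothetical toy values, not results from the study.

```python
# Illustrative sketch only: toy probabilities, not data from the paper.

def lift_matrix(posterior, unit_freq):
    """lift(PC_k => u_i) = P(u_i | PC_k) / P(u_i) for every unit and category."""
    return [[p / p_u for p in row] for row, p_u in zip(posterior, unit_freq)]


def interesting_rules(posterior, unit_freq, min_lift):
    """Rules (category k, unit i, lift) whose lift exceeds min_lift, best first."""
    lifts = lift_matrix(posterior, unit_freq)
    rules = [
        (k, i, value)
        for i, row in enumerate(lifts)
        for k, value in enumerate(row)
        if value > min_lift
    ]
    return sorted(rules, key=lambda rule: rule[2], reverse=True)


# posterior[i][k] = P(u_i | PC_k) for two hypothetical prototypes.
posterior = [[0.57, 0.05, 0.25],
             [0.43, 0.95, 0.75]]
unit_freq = [0.25, 0.75]  # hypothetical prototype frequencies P(u_i)

for k, i, value in interesting_rules(posterior, unit_freq, min_lift=1.0):
    print(f"product category {k} determines market basket type {i} "
          f"(lift {value:.2f})")
```

A lift of exactly 1 means that knowing the product category does not change the probability of the prototype, which is why category 2 produces no rule in this toy example.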