RFM-BASED E-MARKETS SEGMENTATION USING SELF-ORGANIZING MAPS

Pinnacle Research Journals 86 Vol. 3 Issue 2 December 2014, ISSN , pp (Special Issue on Basic and Applied Sciences)

BAHRAM IZADI*, ATIYE SABAGHINIA**
* Department of Management, Faculty of Administrative Sciences and Economics, University of Isfahan, Iran
** MS Student of Business Administration (Marketing), Sheikh Bahaei University, Isfahan, Iran

ABSTRACT
As companies have set mass marketing aside and turned to direct marketing, precise segmentation of the market and prioritization of market segments with appropriate tools have become significant issues in marketing. The purpose of this paper is to segment e-markets on the basis of three variables, recency, frequency and monetary value, using a neural network clustering method. To this end, the ADSL customers of an e-company are clustered with the self-organizing map (SOM), one of the well-known clustering methods in data mining.

KEYWORDS: market segmentation, data mining, RFM model, self-organizing maps

INTRODUCTION
In the past, managers believed in the concept of mass marketing, and the debate centered on creating potentially large markets, which would lead to lower costs and higher income. Today, many companies have moved from mass marketing toward smaller groups of buyers with specific needs and behavioral characteristics, who require individual products and marketing mixes (Schejter et al., 2010). Companies have realized that they cannot appeal to all customers, or at least cannot appeal to all of them in the same way. Buyers are too numerous and too geographically widespread, and they differ in their needs, demands and shopping experiences.
In addition, companies differ greatly in their abilities across market sectors (Kotler & Armstrong, 2011). Market segmentation is therefore the process of dividing a market into distinct segments of customers with similar needs and characteristics (Walker et al., 2005). Companies are not only looking to sell goods; they are trying to create and keep profitable customers, and by segmenting their customers into groups based on specific criteria they can identify them. Market segmentation involves a wide variety of methods, which fall into two main groups: in the first group (a priori approaches), segments are selected on the basis of known features of a known population; the second group consists of post hoc methods, which are empirical and identify segments through multivariate analysis (Hanafizadeh & Mirzazadeh, 2011).
In the past, segmentation relied on subjective methods based on the researcher's perception. In these older methods, the number and type of segments were predetermined, customers were grouped subjectively on predetermined variables, and consequently there was no consistent relationship between segments chosen by speculation. Today, by contrast, most companies follow a customer-oriented approach and rely on data together with new segmentation methods. They study not only the features of segments but also their size and profitability, and they use methods that are easy to understand and free from speculation. The new methods identify valuable and more profitable market segments better and more accurately. Given the current shift toward customer orientation, segmenting markets to identify valuable customers, in order to retain and attract them, is essential. Understanding and separating customers by their needs and by the marketing mix used plays a vital role in marketing (Liu et al., 2012), and precise determination of the relevant market, which customer segmentation with well-known new methods supports, is very important.
Hence, with the increasing expansion of information technology and the huge volume of customer data available, new and efficient techniques created by combining different sciences are used for effective segmentation and for developing appropriate approaches across industries. Companies face very large data sets in their databases, and because these data are valuable, segmenting customers with new data mining methods has become unavoidable if companies are to take advantage of them. The large amount of data, together with the poor performance of traditional statistical techniques on data-intensive problems, motivates the search for effective segmentation tools that can discover useful information about markets and customers; data mining is a solution to this problem (Hiziroglu, 2013). Data mining, as a powerful tool, refers to finding connections, rules and behavioral patterns through the analysis of large quantities of data (Xiao & Fan, 2014).
In recent years, customer transactional data have been analyzed with data mining techniques, using models and variables that differ from those of traditional segmentation. For example, Li et al. (2011) analyzed the characteristics of the customers of a textile (spinning) factory using clustering techniques. Their customer relationship model was based on LRFM, an extension of RFM in which L denotes the length of the relationship. The customers were grouped into five clusters with the k-means method, and different customer groups were identified, including potential customers, new customers, and valuable and core customers. Wei et al. (2012) segmented the customers of a dental clinic with a neural network approach (self-organizing maps) and the extended LRFM model (where L denotes the number of days from the first visit to the last visit). After obtaining 12 clusters, they used a customer relationship matrix for the analysis. Wei et al. (2013) investigated customer relationship management in the hairdressing industry using data mining techniques. They segmented the customers of hairdressing salons in Taiwan by combining k-means and self-organizing maps on the basis of the RFM criteria, identified customer types such as loyal, potential and new customers, and proposed appropriate marketing strategies for each cluster and customer type.
As mentioned, customer segmentation can be done in various ways. In this paper, the variables of the RFM model are used as the input variables of a self-organizing map to segment an electronic market. The article seeks an efficient method for segmenting electronic markets and their customers: the RFM model determines the variables, and data mining is used to segment the market and group customers on the basis of recency, frequency and monetary value. The paper is organized as follows. Section II describes the proposed methodology, Section III presents the results of applying it to an e-marketplace, and the final section presents appropriate strategies for each segment and suggestions for future research.

The proposed method of research
In this part, the proposed method of the paper is presented. The method is based on the RFM model and performs market segmentation with the self-organizing map, that is, with a neural network approach.
First, the relevant data are extracted from the company's database, and the three variables of recency, frequency and monetary value are prepared and weighted. Next, a neural network, the self-organizing map, is used to segment the e-marketplace and identify the different customer clusters. Finally, a strategy is developed for each cluster.

Data Preparation
Customer databases consist of massive transaction data, some of which are irrelevant, redundant or useless and can be removed during data preparation. The first step in cleaning the data is discovering discrepancies, which may be due to several factors. Preparation involves cleaning a subset of the data, inserting appropriate values for (or estimating) missing data, and integrating the data (Han et al., 2011).

RFM model
The RFM model is general and flexible: it can be adapted to particular situations and localized to the characteristics of an industry. This model can be used in combination

with other data mining tools or on its own, and despite its simplicity it can yield useful, applicable results for marketing strategies, including market segmentation and strategy formulation. RFM is a powerful, well-known tool in database marketing that is widely used to measure the value of customers on the basis of their purchasing. The RFM model was first introduced in 1994 by Hughes (Han Hua et al., 2013). Its variables are:
1- Recency: the time interval between a customer's last purchase and the end of the period under study. The shorter this interval, the higher the value of this index in the model.
2- Frequency: the number of transactions a customer makes in a specific period. A greater number of transactions indicates a higher value of this index.
3- Monetary: the amount of money a customer pays for transactions in a specific period. A greater amount indicates a higher value of this index (Coussement et al., 2014).
Several extensions of the model have been developed in subsequent research, including:
1. erfm-emo: combines RFM with demographic and historical data and is used to predict customer churn rates.
2. TRFM: combines RFM with quarterly information for developing seasonal products.
3. RFD: combines RFM with time periods to analyze website visitors.
4. RML: combines RFM with a loyalty factor to evaluate customer loyalty.
5. FRAT: combines RFM with the amount and type of goods sold to improve customer clustering, based on the classification of each product.
6. RFR: combines RFM with influence and network reach for the analysis of social networks (Wei et al., 2010).
7. WRFM: uses the analytic hierarchy process (AHP) to determine the relative weights of the RFM criteria for better decision making.

Self-organizing maps
Self-organizing maps are powerful and attractive tools for displaying multidimensional data in lower-dimensional spaces (usually one or two dimensions). They are also a method for clustering and preprocessing information, and a visualization tool for exploratory data analysis that makes relationships within large amounts of data easy for humans to observe. Self-organizing maps were developed by Professor Teuvo Kohonen of Helsinki University of Technology, Finland. The SOM algorithm is a recursive regression process that represents the input vectors $x \in \mathbb{R}^n$ by a set of prototype vectors $m_i \in \mathbb{R}^n$, one per map unit, through the following steps:

At each training step, a sample vector $x$ is randomly selected from the input data set, and the distances between $x$ and all the prototype vectors are calculated. The best-matching unit (BMU), denoted $b$, is the unit whose prototype is closest to $x$, as given by Equation (1) (Vesanto et al., 2000):

$$\lVert x - m_b \rVert = \min_i \lVert x - m_i \rVert \qquad (1)$$

Next, the prototype vectors are updated: the BMU and its topological neighbors are moved toward the input vector in the input space. The prototype vector of unit $i$ is updated with Equation (2):

$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{bi}(t)\, \big[x(t) - m_i(t)\big] \qquad (2)$$

where $t$ denotes time, since the SOM is trained as a recursive process; $\alpha(t)$ is the learning rate, which decreases monotonically over the training process; and $h_{bi}(t)$ is the neighborhood kernel, a decreasing function of the distance between units $i$ and $b$ on the map grid, centered on the winning unit. The neighborhood function is taken as the Gaussian of Equation (3) (Vesanto & Alhoniemi, 2000):

$$h_{bi}(t) = \exp\!\left(-\frac{\lVert r_b - r_i \rVert^2}{2\sigma^2(t)}\right) \qquad (3)$$

where $\sigma(t)$ is the radius of the neighborhood function, and $r_i, r_b \in \mathbb{R}^2$ are the positions of neurons $i$ and $b$ on the map grid; the radius shrinks as training proceeds.

There is no specific approach for determining the number of clusters; only a general rule of thumb has been proposed, in which $N$ is the number of samples in the data set. The SOM algorithm minimizes the error function of Equation (4):

$$E = \sum_{i=1}^{N} \sum_{j=1}^{C} h_{bj}\, \lVert x_i - m_j \rVert^2 \qquad (4)$$

where $C$ is the number of clusters (map units) and $h_{bj}$ is the neighborhood kernel centered on unit $b$, the best-matching unit of vector $x_i$, evaluated at unit $j$.
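Equations (1) to (4) can be made concrete with a minimal pure-Python sketch. It assumes prototypes are stored as lists of floats and that unit positions $r_i$ lie on a rectangular grid; the function names (`best_matching_unit`, `neighborhood`, `update`, `som_error`, `grid_positions`) are hypothetical and are not taken from the SOM Toolbox or the MATLAB toolbox cited in this paper.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_matching_unit(x, prototypes):
    """Equation (1): index b of the prototype m_b closest to x (first one on ties)."""
    return min(range(len(prototypes)), key=lambda i: sq_dist(x, prototypes[i]))

def neighborhood(positions, b, sigma):
    """Equation (3): h_bi = exp(-||r_b - r_i||^2 / (2 sigma^2)) for every unit i."""
    return [math.exp(-sq_dist(positions[b], r) / (2.0 * sigma ** 2)) for r in positions]

def update(x, prototypes, positions, alpha, sigma):
    """Equation (2): m_i <- m_i + alpha * h_bi * (x - m_i) for every unit; returns b."""
    b = best_matching_unit(x, prototypes)
    h = neighborhood(positions, b, sigma)
    for i, m in enumerate(prototypes):
        prototypes[i] = [mk + alpha * h[i] * (xk - mk) for mk, xk in zip(m, x)]
    return b

def som_error(data, prototypes, positions, sigma):
    """Equation (4): E = sum over samples i and units j of h_bj * ||x_i - m_j||^2."""
    total = 0.0
    for x in data:
        h = neighborhood(positions, best_matching_unit(x, prototypes), sigma)
        total += sum(hj * sq_dist(x, m) for hj, m in zip(h, prototypes))
    return total

def grid_positions(rows, cols):
    """Positions r_i in R^2 of the units of a rows x cols rectangular map."""
    return [(float(r), float(c)) for r in range(rows) for c in range(cols)]
```

With a very small `sigma`, the kernel is effectively zero for every unit except the BMU, so an update with `alpha=1.0` copies the input vector into the winning prototype and leaves the others unchanged; larger values of `sigma` drag the grid neighbors along as well.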
According to Equation (4), because the distances are squared, the SOM penalizes large errors (greater distances) more heavily than small ones. The SOM input data are vectors with n elements, and different map (array) configurations can be considered for clustering them. As mentioned, there is no predetermined way to choose the number of categories. Note that the number of neurons in a two-dimensional map is the product of the number of units along each of its two dimensions. The Euclidean distance is used to compare distances within and between clusters (Schatzmann, 2003).
At every training step, the training vectors are presented to the network in random order, and the weights and biases are updated after each vector is presented; that is, the self-organizing map is trained on the input data set with random-order incremental training. Training stops when any of the following criteria is met: the maximum number of epochs is reached, the error falls to the minimum goal, or the maximum training time is exceeded. At each step, the network determines the winning neuron, and the weights of the winner and of its neighboring neurons move closer to the input vector by amounts governed by the learning rate. The learning rate and the neighborhood distance are updated in two phases. In the ordering phase, the learning rate starts from an initial value and decreases, and the neighborhood distance shrinks from the maximum neuron distance to 1. In this phase, the neuron weights are expected to order themselves in the input space consistently with the positions of the neurons on the map, with large steps producing a global ordering of all the weights; hence the learning rate is relatively large, and the ordering phase stops after a fixed number of steps. In the tuning phase, by contrast, the learning rate decreases slowly, and small changes in the weights yield a more accurate, final adjustment that leads to convergence. In this phase, the weights are expected to spread out over the input space while preserving the topological order established in the ordering phase.
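The two-phase schedule and the random-order presentation just described can be sketched as follows. This is a hypothetical illustration: the function names, the linear decay during ordering, the 1/t decay during tuning and the default values (0.9, 0.02, 1000 ordering steps) are assumptions chosen for the example, not the settings used in this study; only the shrinking of the neighborhood distance to 1 comes from the text.

```python
import random

def som_schedule(step, max_distance, order_steps=1000, order_lr=0.9,
                 tune_lr=0.02, tune_distance=1.0):
    """Return (learning_rate, neighborhood_distance) for a training step.

    Ordering phase (step < order_steps): the learning rate falls linearly from
    order_lr to tune_lr, and the neighborhood distance from max_distance to
    tune_distance. Tuning phase: the distance stays at tune_distance and the
    learning rate keeps shrinking slowly (assumed here: tune_lr * order_steps / step).
    """
    if step < order_steps:
        frac = step / order_steps
        lr = order_lr + (tune_lr - order_lr) * frac
        distance = max_distance + (tune_distance - max_distance) * frac
        return lr, distance
    return tune_lr * order_steps / step, tune_distance

def random_order_epochs(n_samples, n_epochs, seed=0):
    """Yield sample indices epoch by epoch, each epoch in a fresh random order.

    Random-order incremental training applies one weight update per index yielded.
    """
    rng = random.Random(seed)
    for _ in range(n_epochs):
        order = list(range(n_samples))
        rng.shuffle(order)
        yield from order
```

In a full training loop, each index from `random_order_epochs` would select one customer vector, and the pair returned by `som_schedule` would supply $\alpha(t)$ and the neighborhood width for the update of Equation (2). The schedule is continuous at the phase boundary, so the learning rate does not jump when tuning begins.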
In the tuning phase, which is the convergence stage, the learning rate is kept small so that small weight changes produce a more accurate, final adjustment. Thus, while learning to cluster the inputs, the feature map also learns the topology and the distribution of the inputs (Demuth et al., 2008).

The research data analysis
The case examined in this research is the ADSL customers of an internet service provider in Iran. Transaction data from 2006 to 2012, comprising 30000 records, were taken from the company's database. After data preparation, the three variables of recency, frequency and monetary value were extracted, yielding 6000 cleaned records. Part of these data is given in Table 1.

Table 1: Some parts of cleaned data (columns: User Changed Credit, User ID, Date)
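The preparation and extraction step can be illustrated with a minimal sketch. It assumes each raw record is a `(user_id, purchase_date, amount)` tuple; this layout, the function name and the cleaning rules (dropping exact duplicate rows and rows with a missing field) are hypothetical, since the paper does not list its cleaning rules and carried out the preparation in Clementine.

```python
from datetime import date

def prepare_and_extract_rfm(records, period_end):
    """Clean raw transactions and derive (recency, frequency, monetary) per user.

    records: iterable of (user_id, purchase_date, amount) tuples (hypothetical layout).
    Rows with a missing field and exact duplicate rows are dropped (assumed rules).
    Recency   = days from the user's last purchase to period_end.
    Frequency = number of transactions in the period.
    Monetary  = total amount paid in the period.
    """
    seen = set()
    totals = {}
    for rec in records:
        if any(field is None for field in rec) or rec in seen:
            continue
        seen.add(rec)
        user, day, amount = rec
        last, count, total = totals.get(user, (day, 0, 0.0))
        totals[user] = (max(last, day), count + 1, total + amount)
    return {user: ((period_end - last).days, count, total)
            for user, (last, count, total) in totals.items()}

# Example with made-up records: one duplicate row and one row with a missing date.
records = [
    ("u1", date(2012, 3, 1), 20.0),
    ("u1", date(2012, 6, 1), 30.0),
    ("u1", date(2012, 6, 1), 30.0),
    ("u2", date(2011, 12, 15), 50.0),
    ("u3", None, 10.0),
]
rfm = prepare_and_extract_rfm(records, date(2012, 12, 31))
```

Treating identical rows as duplicates is a simplification: two genuine purchases of the same amount on the same day would be merged, so a real pipeline would deduplicate on a transaction identifier instead.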

The weights of the variables were determined by experts: on the basis of the intuitive judgments of experts and professionals in the IT industry, weights of 0.5 for frequency, 0.3 for recency and 0.2 for monetary value were estimated. These weights reflect the view that repeat purchasing matters most in electronic markets: frequent and recent purchases help build long-term relationships, increase loyalty and retain customers in such markets, which in turn results in continued profitability in the relevant market. Some of the weighted data are given in Table 2.

Table 2: Part of the data related to the RFM score (columns: User ID, Recency, Frequency, Monetary, Recency Score, Frequency Score, Monetary Score, RFM Score)
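The weighting can be illustrated with a short sketch. The weights 0.3 (recency), 0.5 (frequency) and 0.2 (monetary) come from the text; the rank-based 1 to 5 scoring and the function names are assumptions, because the paper does not state how the R, F and M scores in Table 2 were computed.

```python
from bisect import bisect_left, bisect_right

# Expert weights stated in the text.
WEIGHTS = {"recency": 0.3, "frequency": 0.5, "monetary": 0.2}

def quintile_scores(values, higher_is_better=True, bins=5):
    """Score each value 1..bins by how many values it beats (assumed scheme).

    Tied values receive the same score. For recency (days since the last
    purchase) smaller is better, so pass higher_is_better=False.
    """
    ordered = sorted(values)
    n = len(ordered)
    scores = []
    for v in values:
        if higher_is_better:
            beaten = bisect_left(ordered, v)       # values strictly smaller
        else:
            beaten = n - bisect_right(ordered, v)  # values strictly larger
        scores.append(1 + beaten * bins // n)
    return scores

def weighted_rfm(recency, frequency, monetary, weights=WEIGHTS):
    """Weighted RFM score per customer: 0.3*R + 0.5*F + 0.2*M with the default weights."""
    r = quintile_scores(recency, higher_is_better=False)
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return [weights["recency"] * ri + weights["frequency"] * fi + weights["monetary"] * mi
            for ri, fi, mi in zip(r, f, m)]
```

Because frequency carries the largest weight, two customers with equal recency and monetary scores are separated mainly by how often they buy, which matches the experts' view that repeat purchasing matters most in this market.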

After data preparation based on the RFM model, carried out with Clementine software, the data are used as input variables for clustering with the SOM method. As mentioned, this is a neural network method that partitions the data set into distinct clusters such that records within a cluster are similar and records in different clusters are dissimilar. Its basic units are neurons organized in an input layer and an output layer. Every input neuron is connected to every output neuron, and each connection has its own weight. During training, each neuron competes with all the other neurons to win. This process is repeated until the changes become very small (Hong, 2012). In this way, a two-dimensional map of the clusters is created on which similar records appear close together and dissimilar records appear far apart. Accordingly, in the present study the data are divided into three clusters, which are shown in Table 3.

Table 3: Identifying customer segments according to the RFM model (columns: Cluster, Number of Customers, Mean Recency Score, Mean Frequency Score, Mean Monetary Score)

As also mentioned, these maps are good visualization tools for displaying data. Picture 1 shows the two-dimensional map; the colors on the map indicate the number of customers in each map unit (cell).
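A sketch of how the contents of Table 3 and the coloring of the map could be tabulated once each customer has been assigned a best-matching unit and a cluster label. The function names and inputs are hypothetical, and because the paper does not describe how map units were grouped into its segments, the cluster labels are simply taken as given.

```python
from collections import Counter, defaultdict

def hit_counts(bmu_indices, n_units):
    """Number of customers mapped to each map unit (what the map's colors encode)."""
    counts = Counter(bmu_indices)
    return [counts.get(unit, 0) for unit in range(n_units)]

def cluster_profiles(labels, rfm_scores):
    """Per-cluster customer count and mean recency, frequency and monetary score.

    labels: cluster id per customer; rfm_scores: (r, f, m) score tuple per customer.
    Returns {cluster: (count, mean_r, mean_f, mean_m)}, i.e. one row of Table 3.
    """
    groups = defaultdict(list)
    for label, scores in zip(labels, rfm_scores):
        groups[label].append(scores)
    profiles = {}
    for label, rows in sorted(groups.items()):
        n = len(rows)
        means = tuple(sum(column) / n for column in zip(*rows))
        profiles[label] = (n, *means)
    return profiles
```

Comparing each cluster's mean scores with the overall means is one simple way to spot the most valuable segment, that is, the cluster whose customers score highest on the heavily weighted frequency dimension.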

Picture 1: Self-organized map

CONCLUSION
In today's world of marketing, market segmentation is of paramount importance for better planning and a sharper focus on markets and customers. Segmentation enables companies to align more closely with customers by highlighting specific customer needs and targeting them effectively. Segmentation carried out today by integrating methods such as data mining, statistics, operations research and computer science, in line with technological development, can be even more helpful for companies. In addition to emphasizing the importance of e-market segmentation, this paper has argued for implementing the segmentation process with new methods such as data mining, and has applied one of these techniques, the self-organizing map, a neural network approach. The recency, frequency and monetary value variables of the RFM model were used as the input variables for segmenting the market. The results revealed three distinct segments and, more importantly, identified the valuable market segment of the ADSL company.

REFERENCES
Birant, D. (2011). Data Mining Using RFM Analysis. Knowledge-Oriented Applications in Data Mining, vol. 8.
Coussement, K., Van den Bossche, F. A. M. and De Bock, K. W. (2014). Data accuracy's impact on performance: Benchmarking RFM analysis, logistic regression, and decision trees. Journal of Business Research, 67.
Demuth, H., Beale, M. and Hagan, M. (2008). Neural Network Toolbox (MATLAB), version 6. The MathWorks, Inc.
Han Hua, H., Cheng-Kui Huang, T. and Hua Kao, Y. (2013). Knowledge discovery of weighted RFM sequential patterns from customer sequence databases. The Journal of Systems and Software, 86.
Han, J., Kamber, M. and Pei, J. (2011). Data Mining: Concepts and Techniques. Third Edition. Printed in the United States of America.
Hanafizadeh, P. and Mirzazadeh, M. (2011). Visualizing market segmentation using self-organizing maps and Fuzzy Delphi method: ADSL market of a telecommunication company. Expert Systems with Applications, 38.
Hiziroglu, A. (2013). Soft computing applications in customer segmentation: State-of-art review and critique. Expert Systems with Applications, 40.
Hong, C. W. (2012). Using the Taguchi method for effective market segmentation. Expert Systems with Applications, 39.
Kotler, P. and Armstrong, G. M. (2011). Principles of Marketing. 14th Edition. Prentice Hall.
Li, D. C., Dai, W. L. and Tseng, W. T. (2011). A two-stage clustering method to analyze customer characteristics to build discriminative customer management: A case of textile manufacturing business. Expert Systems with Applications, 38.
Liu, Y., Kiang, M. and Brusco, M. (2012). A unified framework for market segmentation and its applications. Expert Systems with Applications, 39.
Schatzmann, J. (2003). Using Self-Organizing Maps to Visualize Clusters and Trends in Multidimensional Datasets. Department of Computing, Data Mining Group, Imperial College, London.
Schejter, A. M., Serenko, A., Turel, O. and Zahaf, M. (2010). Policy implications of market segmentation as a determinant of fixed-mobile service substitution: What it means for carriers and policy makers. Telematics and Informatics, 27.
Vesanto, J. and Alhoniemi, E. (2000). Clustering of the Self-Organizing Map. IEEE Transactions on Neural Networks, 11(3).
Vesanto, J., Himberg, J., Alhoniemi, E. and Parhankangas, J. (2000). SOM Toolbox for Matlab 5. Helsinki University of Technology.
Walker, O. C., Boyd, H. W., Mullins, J. and Larreche, J. C. (2005). Marketing Strategy: A Decision-Focused Approach. McGraw-Hill/Irwin.
Wei, J. T., Lin, S. Y. and Wu, H. H. (2010). A review of the application of RFM model. African Journal of Business Management, Vol. 4.
Wei, J. T., Lee, M. C., Chen, H. K. and Wu, H. H. (2013). Customer relationship management in the hairdressing industry: An application of data mining techniques. Expert Systems with Applications, 40.
Wei, J. T., Lin, S. Y., Weng, C. C. and Wu, H. H. (2012). A case study of applying LRFM model in market segmentation of a children's dental clinic. Expert Systems with Applications, 39.
Xiao, F. and Fan, C. (2014). Data mining in building automation system for improving building operational performance. Energy and Buildings, 75, pp. 109-118.