CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT


8.1 Introduction

Customer Relationship Management (CRM) is a process that manages the interactions between a company and its customers. Supported by data mining, CRM can become more efficient at acquiring new customers, increasing the value of existing customers, and retaining good customers. The primary users of CRM software applications are database marketers who want to automate the process of interacting with customers. To be successful, database marketers must first identify market segments containing customers or prospects with high profit potential. They then build and execute campaigns that favorably influence the behavior of these individuals.

The first task, identifying market segments, requires significant data about prospective customers and their buying behaviors. In theory, the more data, the better the result. In practice, however, massive data stores often impede marketers, who struggle to sift through the minutiae to find the nuggets of valuable information. Recently, marketers have added a new class of software to their targeting arsenal. Data mining applications automate the process of searching the mountains of data to find patterns that are good predictors of purchasing behaviors. After mining the data, marketers must feed the results into campaign management software that, as the name implies, manages the campaign directed at the defined market segments.

In the past, the link between data mining and campaign management software was mostly manual. In the worst cases, it involved sneakernet: creating a physical file on tape or disk, which someone then carried to another computer and loaded into the marketing database. This separation of the data mining and campaign management software introduces considerable inefficiency and opens the door to human error. Tightly integrating the two disciplines presents an opportunity for companies to gain competitive advantage.

There are many reasons why real-time CRM analytics is expected to perform as well as, or better than, a Web service that operates in real time but without a real person. In a world of consumer expectations where interaction with real people is considered an important differentiator between a company and its competitors, this matters greatly to both the consumer and the company. Even with an instantaneous Web service, consumers who enter data on the Web expect not only an instantaneous response but also an instantaneously intelligent one. They cannot be told that the telephone conversation they had a few minutes ago with a customer service representative is not yet reflected in the Web service because of a lag in integrating the data or in executing an algorithm. Finally, consumers always want relevant service, where relevance means the current status, not yesterday's status.

In recent years, data mining has gained great popularity not only in research but also in commercial applications. Data mining can help organizations discover meaningful trends, patterns, and correlations in their customer, product, or other data, in order to improve customer relationships and decrease the risk of business operations. The basic data mining techniques include classification, clustering, association rules, regression analysis, and sequence analysis. Other data mining techniques include rule-based reasoning, genetic algorithms, decision trees, fuzzy logic, inductive learning systems, and statistical methods [121].

Generally, no data mining tool for CRM is perfect; each has drawbacks. For example, clustering requires the number of clusters and the initial seeds to be fixed in advance. In decision trees, too many instances lead to large trees, which may decrease the classification accuracy and obscure the relationships derived from the training examples. In artificial neural networks (ANN), the number of hidden neurons, the number of hidden layers, and the training parameters need to be determined, and training times are long, especially on large datasets. Moreover, an ANN acts as a black box, which can lead to inconsistent outputs, and tuning it is a trial-and-error process. Genetic algorithms (GA) also have drawbacks such as slow convergence, a brute-force style of computation, long computation times, and low stability. In association rules, the major drawback is that the number of generated rules is huge and many of them may be redundant.

To address the problems described in the previous paragraph, three methods are explored in this study: rough set (RS) theory, an enhanced K-Means algorithm, and a fuzzy measure [86, 94]. RS theory has five advantages: (1) it does not require any preliminary or additional parameters about the data; (2) it can work with missing values, switch among different reducts, and generate rules at lower cost or in less time; (3) it can handle large amounts of both quantitative and qualitative data; (4) it yields understandable decision rules and is stable; and (5) it can model highly non-linear or discontinuous functional relationships, which provides a powerful method for characterizing complex, multidimensional patterns [43, 80].

K-Means is a widely used clustering algorithm, owing to its simplicity and convenience. Cluster analysis is a statistical technique used to identify a set of groups that both minimizes within-group variation and maximizes between-group variation based on a distance or dissimilarity function; its aim is to find an optimal set of clusters [121]. Thus, this study uses these techniques to cope with the shortcomings above and to improve CRM, so that enterprises can segment customers better and maximize profit in a win-win situation for company and customer, based on RS theory, K-Means clustering, and fuzzy sets [64].

8.2 Customer Relationship Management (CRM)

CRM is a philosophy of business operation for acquiring and retaining customers, increasing customer value, loyalty and retention, and implementing

customer-centric strategies. CRM, which is devoted to improving relationships with customers, focuses on a comprehensive picture of how to integrate customer value, requirements, expectations, and behaviors by analyzing customer transaction data [84]. With good CRM, enterprises can shorten the sales cycle and increase customer loyalty, building closer relationships with customers and further increasing revenue. Thus, excellent CRM can help enterprises keep existing customers and attract new ones. Enterprises apply several methods to enhance customer relationships effectively, including customer relationship management, customer value analysis, enterprise strategy, and positive service mechanisms. Moreover, enterprises also strengthen marketing and sales effectiveness in order to build good CRM.

Kalakota and Robinson (1999) explained that CRM integrates the customer-related functions of the enterprise, such as marketing, sales, services, and technical support for customer needs, and that it usually uses IT to help an enterprise manage customer relationships in a systematic way, improving customer loyalty and increasing overall business profits [60]. It has been estimated that it costs five times as much to attract a new customer as it does to retain an existing one, according to research by the American Management Association [65, 83], and this relationship is particularly obvious in the services sector [26]. Therefore, enterprises understand the importance of developing a close relationship with existing and new customers. Instead of attracting new customers, they would like to perform

as many business operations as possible for customers in order to keep existing customers and build long-term customer relationships. For this reason, this study argues that enterprises should implement customer data analysis to understand their customers, to retain valuable customers, and ultimately to increase their profits.

8.3 Research Model

This study constructs a model in which input attributes are obtained by attribute reduction through RS theory, the reduced attributes are passed to UCAM clustering, and the membership of each customer is obtained through Fuzzy-UCAM clustering. Figure 1.1 in Chapter 1 illustrates the research model of this study. The research model enhances the data mining process by including UCAM, Fuzzy-UCAM, and RF-UCAM, as listed below:

1. Problem definition and data acquisition
2. Data pre-processing and survey
3. Feature reduction via RF-UCAM
4. Clustering via UCAM and Fuzzy-UCAM
5. Data modeling
6. Evaluation
7. Knowledge deployment

Incorporating RF-UCAM into data mining reduces the attribute set, which reduces the computational complexity. The UCAM clustering

algorithm is used to obtain unique clusters, whereas Fuzzy-UCAM helps to obtain the fuzzy membership matrix for the objects in the clusters.

This section briefly introduces the research model of this study and the proposed procedure for segmenting customers. CRM aims to meet the needs of customers and to strengthen the company's relationship with them [114]. In recent years, data mining has gained great popularity in research, but no data mining tool for CRM is perfect, because each has drawbacks: clustering requires the number of clusters and the seeds to be fixed in advance, too many instances in decision trees decrease the classification accuracy, and so on. To solve these problems, three methods are presented in this study: rough set theory, the K-Means algorithm, and Fuzzy C-Means. This study proposes a new procedure that joins RS theory, enhanced K-Means (UCAM), and enhanced Fuzzy C-Means (Fuzzy-UCAM) to address these drawbacks. This procedural approach gives rise to the RF-UCAM algorithm, which is outlined below.

Input: D = {t1, t2, t3, ..., tn} // Set of n data points.
       T // Threshold value.
Output: Clusters. The number of clusters depends on the affinity measure.
Method:
1. Set the threshold value T.
2. Discretize continuous attributes to enhance the rough sets algorithm.
3. Rough set attribute reduction is carried out for dimensionality reduction.

4. Clustering through UCAM.
5. (Optional) Fuzzy measure is computed using Fuzzy-UCAM.

Figure 8.1 Rough-Fuzzy-UCAM (RF-UCAM) algorithm

The RF-UCAM algorithm uses Rough Set Attribute Reduction (RSAR). RSAR is a filter-based tool that extracts knowledge without affecting the information content. The main advantage of RSAR is that it requires no parameters other than the supplied data, and hence it does not affect the originality of the data. In this method, the QuickReduct algorithm is used for feature reduction. In QuickReduct, attribute reduction is achieved by comparing the equivalence relations generated by sets of attributes. Attributes are removed so that the reduced set provides the same predictive capability for the decision feature as the original set. A reduct is a subset of the conditional attribute set that preserves this predictive capability, and a reduct of minimum cardinality is sought. Including RSAR in the research model reduces the processing time, increases the cluster uniqueness, and retains the originality of the data.

8.4 Experimental Results of C-Company Dataset

A dataset collected in practice is used to evaluate the efficiency of the newly proposed methodology. As in the earlier comparisons, the inter-cluster distance and the intra-cluster distance are measured, and several cluster validity measures are also evaluated, as described in the remainder of this section.
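The QuickReduct step described in Section 8.3 can be illustrated with a minimal sketch. It assumes the attributes have already been discretized (step 2 of Figure 8.1) and that a decision attribute is available. The function names, the dictionary-based record layout, and the toy customer records are illustrative assumptions, not taken from this study. The sketch follows the commonly described greedy form of QuickReduct, which adds one attribute at a time until the subset reaches the rough-set dependency degree of the full attribute set; being greedy, it returns a reduct but does not guarantee one of minimum cardinality.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes: groups of row indices that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: fraction of rows whose equivalence class
    under attrs carries a single decision value (the positive region)."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quickreduct(rows, conditional, decision):
    """Greedy forward selection: keep adding the attribute that raises the
    dependency degree most until it matches that of the full attribute set."""
    target = dependency(rows, conditional, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        remaining = [a for a in conditional if a not in reduct]
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Hypothetical, already discretized customer records (illustration only).
rows = [
    {"age": "young", "income": "high", "region": "north", "segment": "A"},
    {"age": "young", "income": "low", "region": "north", "segment": "B"},
    {"age": "old", "income": "high", "region": "south", "segment": "A"},
    {"age": "old", "income": "low", "region": "south", "segment": "B"},
]
print(quickreduct(rows, ["age", "income", "region"], "segment"))  # ['income']
```

In this toy data the segment is fully determined by income, so the other two attributes are dropped without losing any ability to predict the decision attribute.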

Silhouette is a method based on the silhouette width, an indicator of the quality of each object's placement. It compares the average distance of an object to all other objects of the same cluster with its average distance to the objects of the closest other cluster. The placement of the object in its cluster is good when the value approaches its maximum.

The Davies-Bouldin (DB) index is used to find compact and well-separated clusters. The optimum number of clusters corresponds to the minimum value of DB.

The Krzanowski and Lai (KL) index belongs to the so-called elbow models. These approaches plot a certain quality function over all possible values of K and detect the optimum as the point where the plotted curve reaches an elbow, i.e., the value from which the curve considerably decreases or increases. The optimum K corresponds to the maximum of KL.

The Calinski-Harabasz index is calculated from the sum of squared errors between different clusters (inter-cluster) and the squared differences of all objects in a cluster from their respective cluster center (intra-cluster); as the number of clusters K is varied, the maximum value indicates the optimum cluster solution.

The Hartigan index is a within-group dispersion metric for the clustered data. The maximum value of the index is taken as indicating the correct number of clusters in the data [13].

Table 8.1 gives a comparative analysis of K-Means, UCAM, and RF-UCAM. The performance of RF-UCAM is compared with that of K-Means and UCAM on datasets from the UCI data repository and on real-time CRM data. The tabulated values show the better performance of RF-UCAM.

Number of clusters = 3
Validity measures (columns): Inter-cluster distance, Intra-cluster distance, Silhouette, Davies, Calinski, Krzanowski, Hartigan, Execution time
Methods (row groups): K-Means, UCAM, RF-UCAM
Datasets (rows within each method): Thyroid, Glass, Pupil, Dermatology, Iris, Wine, C-company data

Table 8.1: Cluster results of K-Means, UCAM, and RF-UCAM with cluster size 3
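Two of the validity measures named in Table 8.1 can be computed as in the following minimal sketch, which uses the standard textbook definitions of the mean silhouette width and the Davies-Bouldin index with Euclidean distance. The function names, the list-of-clusters input layout, and the toy points are assumptions made for illustration; the sketch does not reproduce the implementation or the values behind Table 8.1.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(mean(coords) for coords in zip(*points))


def silhouette(clusters):
    """Mean silhouette width over all points; expects at least two clusters.
    For each point: a = mean distance to its own cluster, b = mean distance
    to the closest other cluster, width = (b - a) / max(a, b)."""
    widths = []
    for ci, cluster in enumerate(clusters):
        for pi, p in enumerate(cluster):
            if len(cluster) == 1:
                widths.append(0.0)  # common convention for singleton clusters
                continue
            a = mean(dist(p, q) for qi, q in enumerate(cluster) if qi != pi)
            b = min(
                mean(dist(p, q) for q in other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            denom = max(a, b)
            widths.append((b - a) / denom if denom else 0.0)
    return mean(widths)


def davies_bouldin(clusters):
    """Davies-Bouldin index: for each cluster, take the worst ratio of summed
    within-cluster scatter to centroid separation, then average (lower is better)."""
    centers = [centroid(c) for c in clusters]
    scatter = [mean(dist(p, ctr) for p in c) for c, ctr in zip(clusters, centers)]
    k = len(clusters)
    return mean(
        max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    )


# Two compact, well-separated toy clusters (illustration only).
clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(round(silhouette(clusters), 3))      # high value: points sit well inside their clusters
print(round(davies_bouldin(clusters), 3))  # 0.2: scatter 1 + 1 over a centroid distance of 10
```

A silhouette value close to 1 and a small Davies-Bouldin value both indicate compact, well-separated clusters, which is the direction in which the two measures are read in the comparison above.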

8.5 Discussion

This section discusses the clusters obtained by the different methods. The number of clusters is set to 3 in Table 8.1, but any number of clusters Cn can be obtained by adjusting the threshold value. The threshold value ranges between a minimum and a maximum that depend on the range of the attributes in the dataset. Table 8.1 shows the validity measures obtained with various indexes such as Silhouette, Davies, Calinski, and Hartigan. Before the proposed methodology was analyzed with the C-Company data, it was implemented on some of the datasets available in the UCI data repository. The most significant measures, namely inter-cluster distance, intra-cluster distance, and execution time, are shown in Figures 8.2, 8.3, and 8.4.

Figure 8.2 Inter cluster distance

Figure 8.3 Intra cluster distance

Figure 8.4 Time of execution in sec

All the datasets produced results in favor of RF-UCAM by a notable margin: its inter-cluster distance is higher than that of K-Means and UCAM, and its intra-cluster distance and execution time are notably lower, which demonstrates the efficiency of the methodology proposed in this research work. RF-UCAM produces notably higher values for the Silhouette, Calinski, Krzanowski, and Hartigan measures than K-Means and UCAM, and comparably lower values for Davies and for execution time. This favours C-company: applying the new methodology helps to frame a standard layout for CRM through better identification of customers.

8.6 Summary

In this chapter, the new RF-UCAM algorithm is used to retain customers and to find the odd customers by adjusting the threshold value, so that those customers can be converted into stable ones by giving them better care. The fuzzy measure in this approach identifies customers clearly by denoting their membership degree. This approach removes the overhead of fixing the number of clusters and the initial seeds, as required in K-Means and FCM. The proposed methods improve scalability and reduce the clustering error. The approach ensures that the whole clustering mechanism completes in time without loss in the correctness of the clusters. Including RF-UCAM in the research model reduces the processing time, increases the cluster uniqueness, and retains the originality of the data.
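The membership degree mentioned in the summary can be illustrated with the standard Fuzzy C-Means (FCM) membership formula. This is a minimal sketch of the textbook FCM rule for fixed cluster centers, not of Fuzzy-UCAM itself, whose details are not given in this chapter. The function name, the fuzzifier default m = 2, and the example centers are assumptions for illustration.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with m > 1 the fuzzifier."""
    distances = [dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to it
    # (shared equally if several centers coincide with the point).
    on_center = [d == 0 for d in distances]
    if any(on_center):
        return [1.0 / sum(on_center) if hit else 0.0 for hit in on_center]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in distances)
        for d_i in distances
    ]


# Hypothetical cluster centers and one customer point (illustration only).
centers = [(0.0, 0.0), (10.0, 0.0)]
u = fcm_memberships((2.0, 0.0), centers)
print([round(x, 3) for x in u])  # [0.941, 0.059]
```

The memberships of a point always sum to 1, so a customer close to one center receives a dominant degree for that cluster, while a customer between centers receives more evenly spread degrees.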