Data Mining based Product Marketing Technique for Banking Products


2016 IEEE 16th International Conference on Data Mining Workshops

Merve Mitik, Ozan Korkmaz, Pinar Karagoz, Ismail Hakki Toroslu, Ferhat Yucel
Middle East Technical University, Ankara, Turkey
Intertech Bilgi Islem ve Pazarlama Ticaret A.S., Istanbul, Turkiye

Abstract

In direct marketing, increasing the return rate of a campaign requires analyzing a massive customer dataset in order to make the best product offers to customers through the most suitable channels. This problem is very challenging, since positive returns are usually received for only a very small portion of the whole dataset. This paper studies this problem for bank product marketing. The proposed approach is a two-layer system, which first clusters the customers and then constructs a classification model for product and communication channel offers. Experimental analysis on a real-life banking campaign dataset shows promising results.

I. INTRODUCTION

Bank marketing is a management process whose main purpose is to provide banking services to customers for their potential purchase [14]. Two different advertising methods are used in marketing to introduce products [4], namely mass marketing and direct marketing. Mass marketing relies on broadcasting channels such as television, radio, newspapers, and billboards to deliver marketing messages to the largest possible number of people [3]. However, mass marketing is not as effective as it used to be, since there are too many companies, products, and customers. Response rates of broadcast advertisements are low despite their high cost [16]. Direct marketing, on the other hand, is a method in which customers are informed directly about banking products such as credit cards, new savings account types, and credit offers. The customers are analyzed and selected according to their characteristics and are contacted over a communication channel such as phone calls, e-mails, or text messages.
Bank marketing systems are shifting from a mass marketing strategy to a direct marketing strategy [5]. Since the return rate of mass marketing is very low compared to its very high cost per customer, direct marketing is mainly used to market specialized products directly to the customer. Moreover, in direct marketing, the return rate and the effectiveness of the campaigns can be measured using the responses of customers, and the campaigns can be improved accordingly.

Banking management systems hold very large amounts of customer data; therefore, it is infeasible to analyze this data and decide on the products for the customers manually. Data mining techniques can be utilized for this problem. With the help of clustering and classification techniques, customers can be grouped according to similar needs and characteristics, and these characteristics can be modeled.

In this work, the problem of interest can be described as follows: given a set of banking products P and a set of communication channels C, for a given user u, (i) predict whether u will accept any product offer, and (ii) if s/he will accept, predict the product p ∈ P and/or the communication channel c ∈ C.

We propose two methods for solving this problem, partitioning based and model based prediction, for determining the banking product and the channel type used to reach the customer. In the partitioning method, we create a set of clusters with respect to the products, the channels, and the sold or not-sold labels; then, using these clusters, we aim to predict the customer's decision. In the model based method, we use models generated from the logs of product offers to customers to predict the right product and channel if the customer is likely to accept the offer. We used the WEKA Data Mining Software [15] in our experiments, with its algorithms for classification, clustering, and data preprocessing.

The motivation of this paper is to improve the return rate for bank marketing. Therefore, accuracy is an important factor for a campaign result.
The first method aims to increase the accuracy rate using a partitioning based approach, and it improves the prediction rate. However, accuracy is only one of the factors that banks consider when marketing their products. For some campaigns, correctly predicted positive returns can be more important, which requires an improvement in the sensitivity rate. For that purpose, we propose the second method, which is a hybrid solution that combines a model based approach with partitioning. Although it leads to only a small rise in the accuracy rate, the sensitivity ratio increases dramatically with this hybrid solution.

This paper is organized as follows. Section II summarizes the related work. Section III describes the dataset that we have used and its characteristics. Section IV explains the details of the methods that we have proposed. Section V presents the results of the experiments. Finally, Section VI presents the conclusions.

II. RELATED WORK

Data mining techniques, specifically clustering and classification methods, are widely used in various areas to predict different behaviors. Retaining the existing customers in

telecom industry [6], diagnosing lung cancer in the early stages [7], forecasting wind to take better advantage of wind power [8], and predicting petroleum well performance for oilfields [13] are some of the fields that use the clustering and classification techniques of data mining to predict the behaviours of people, diseases, and nature. Moreover, in customer relationship management (CRM), clustering and classification methods are among the most used data mining techniques to increase the response rates of existing customers and to enhance the profit of a campaign [9].

The ultimate motto of bank marketing is to deliver the right product to the right customer at the right time. The work in [2] proposes a solution to a bank marketing problem that includes multiple campaigns, several contact channels, and multiple time periods. The solution aims to choose the customers with the maximum expected return rate for a particular product in order to achieve global maximization. Business constraints are defined in the form of restrictions on the minimum and maximum number of product offers to be made to a customer, and limits on contact channels and on the funding of the campaign [2]. The optimization solution in that paper provides three major improvements over the standard approaches. Firstly, an example shows that the solution yields nearly twice the incremental campaign profit of the competing solutions [2]. Secondly, the solution handles the constraints. Thirdly, the extra information extracted as part of the solution can be used by the company to generate more profitable campaigns in the future. That paper is not about data mining techniques used in bank marketing analysis; however, it gives a broad view of the subject.
In [1], a bank marketing dataset from a Portuguese bank is analyzed with several of the most important data mining classification techniques: multilayer perceptron neural network (MLPNN), tree augmented Naive Bayes (TAN), logistic regression (LR), and C5.0, which is an improved version of the C4.5 decision tree. The performance of these techniques is examined in order to increase campaign effectiveness. The comparison in the study shows that the C5.0 decision tree achieved better performance than MLPNN, TAN, and LR [1]. A similar study, conducted on the same topic and the same dataset as in [1] with the classification methods MLPNN and Naive Bayes, shows that MLPNN is slightly better than Naive Bayes classification [10].

The cost of direct marketing may increase if the techniques applied to identify respondents for products are not well designed, since the data used to create models are in general huge and imbalanced. This problem is widely encountered in data mining for direct marketing, as discussed by Ling and Li in [16]. The problem occurs because using an imbalanced dataset in direct marketing with a single rule or pattern leads to predictions that are entirely negative [16]. Therefore, ensemble methods are used to generate more accurate models by applying more than one classifier to the training dataset [11].

As a different approach, optimization methods can be applied to improve bank product marketing. In [12], different types of heuristic models, such as Branch and Price and Next Product to Buy, are used to find customers who will buy the offered products. However, the idea is the same for heuristic and data mining approaches: to achieve maximum profit by offering products to customers while considering business constraints on the market [2], [12].

III. DATASET

The dataset used in this study was provided by a Turkish bank and relates to direct marketing campaigns. The marketing campaigns were based on the Channel and Product fields and the class field IsSold.
That is, each record in the dataset represents whether a customer, whose id and some profile information are provided, buys the offered product via the specified channel or not. The dataset consists of instances with 13 fields. The attributes in the dataset are either numeric or nominal. The numeric attributes are Id, Date, Period, Customer Age, Customer Period, Active Customer, Active Product, Total Asset, and Total Loan. The nominal attributes are Channel, with values IVN, CC, , SMS; Product, with values Credit Card, Overdraft Account, Loan, and Deposit Account; Education, with values Bachelor's Degree, Academy, Secondary School, Primary School, Master's Degree, High School, Uneducated, and Doctoral Degree; and IsSold, with values Yes and No. The last attribute, IsSold, is the class label attribute, and it denotes whether the customer has accepted the offer or not. The complete list of attributes and their descriptions is given in Table I.

Out of entries in our dataset, 24 of them were duplicates. Therefore, we have removed the duplicates and used entries in our experiments. Attribute IsSold contains 2975 positive (Yes) and negative (No) instances.

There are 4 products offered in the dataset. The number of records for each product is as follows: of them are Credit Card, of them are Overdraft Account, of them are Loan, and of them are Deposit Account. The numbers of positive and negative responses for the 4 products are given in Table II. In order to offer these products, 4 different channels are used: of the records have channel value SMS, of them have value, of them have CC, and of them have IVN values. The numbers of positive and negative responses for the 4 channels are given in Table III.

Among the other attributes, Education, Total Asset, and Total Loan have several empty values. These values have been replaced by Null values.
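The two preprocessing steps just described (dropping duplicate entries and mapping empty attribute values to Null) can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout, the field order, and the `clean` helper are hypothetical, and keeping the first copy of each duplicate is an assumed policy that the paper does not spell out. The authors performed preprocessing with WEKA, not with this code.

```python
def clean(records):
    """Drop exact duplicate records and map empty strings to None (Null).

    Assumption: a duplicate is a record whose fields are all identical,
    and the first occurrence is kept.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Replace empty attribute values by None before comparing.
        normalized = tuple(None if value == "" else value for value in rec)
        if normalized in seen:
            continue  # exact duplicate of an earlier record
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


# Hypothetical toy records: (id, channel, product, education, total_asset, is_sold)
raw = [
    (1, "SMS", "Loan", "High School", 1000.0, "No"),
    (1, "SMS", "Loan", "High School", 1000.0, "No"),  # exact duplicate
    (2, "CC", "Credit Card", "", 250.0, "Yes"),       # missing education
]
rows = clean(raw)
print(len(rows))   # 2
print(rows[1][3])  # None
```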
In this work, when calculating the distances of the instances to clusters (the details are described in Section IV), we applied attribute selection according to information gain. Three of the attributes were found to be the most important ones: ID, Date, and Customer Period. Therefore, these 3 attributes are used in the distance calculations within the algorithms presented.
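For readers unfamiliar with the criterion, information gain is the reduction in class-label entropy obtained by splitting the data on an attribute. The following Python sketch computes it for a discrete attribute. It is a minimal illustration, not the WEKA routine the authors used; the toy data and function names are hypothetical, and numeric attributes such as Date or Customer Period would first have to be discretized (an assumption about preprocessing that the paper does not detail).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one attribute separates the classes, the other does not.
labels = ["Yes", "Yes", "No", "No"]
gain_perfect = information_gain(["a", "a", "b", "b"], labels)
gain_useless = information_gain(["a", "b", "a", "b"], labels)
print(gain_perfect, gain_useless)
```

Attributes would then be ranked by this score and the top ones kept for the distance calculation.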

TABLE I. ATTRIBUTES OF TURKISH BANK DATASET

ID | Attribute Name | Attribute Type | Attribute Description | Domain Values
1 | ID | Numeric | Id of customer | [6477, ]
2 | CHANNEL | Nominal | Which channel is used to reach customer? | {IVN, CC, , SMS}
3 | PRODUCT | Nominal | Which product that contact is made for? | {Credit Card, Overdraft Account, Loan, Deposit Account}
4 | DATE | Numeric | Which date that contact is made? | [ , ]
5 | PERIOD | Numeric | Which period that contact is made? | [120, 124]
6 | CUSTOMER AGE | Numeric | How old is customer? | [16, 90]
7 | CUSTOMER PERIOD | Numeric | How many weeks that bank works with customer? | [1, 1384]
8 | ACTIVE CUSTOMER | Nominal | Is customer an active one? | {0, 1}
9 | ACTIVE PRODUCT | Numeric | How many active product does customer have? | [0, 17]
10 | EDUCATION | Nominal | Education status of customer | {Bachelor's Degree, Academy, Secondary School, Primary School, Master's Degree, High School, Uneducated, Doctoral Degree, Null}
11 | TOTAL ASSET | Numeric | Total asset of customer | [0, ]
12 | TOTAL LOAN | Numeric | Total loan of customer | [0, ]
13 | ISSOLD | Nominal | Has customer subscribed product? | {Yes, No}

TABLE II. PRODUCT VS SOLD QUANTITY
Columns: Credit Card, Overdraft Account, Loan, Deposit Account. Rows: YES, NO.

TABLE III. CHANNEL VS SOLD QUANTITY
Columns: SMS, CC, IVN. Rows: YES, NO.

IV. PROPOSED METHOD

A. Partitioning Based Bank Product and Channel Prediction Technique

In this method, we first divide our training dataset into 32 partitions, one for each combination of product (4), channel (4), and label (2) (PCL). For example, one of the partitions contains only customers with Credit Card, , Yes as product, channel, and label attribute values, while another partition contains customers with Loan, SMS, No as product, channel, and label attribute values. Then, we cluster each of these partitions in order to group together similar customers with the same product, channel, and label fields.

To predict a product and a channel for a new customer, we first determine the closest Yes and No clusters to her by comparing the new customer's data with the centroids of the clusters. If the new customer is closer to a No cluster, we label her as No. If she is closer to a Yes cluster, the new customer may be interested in the product of the nearest cluster, offered through the channel of that cluster. This method is presented in Algorithm 1.

One possible improvement to this model is, instead of checking only the nearest clusters, to take the average of the distances to the N closest Yes clusters and to the N closest No clusters. With this improvement, not only are the nearest neighbors considered, but it can also be determined whether the new customer lies among Yes or No customers.

In this experiment, we used K-Means clustering as the partitioning method. The optimal k value of the K-Means algorithm was found experimentally.
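The construct and predict steps of the partitioning based method can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: they used WEKA's K-Means, whereas this block uses a plain Lloyd's k-means written with the standard library only; customers are assumed to be numeric feature tuples; and the names `kmeans`, `construct_model`, and `predict` are hypothetical. The N-closest-clusters improvement is not covered.

```python
import math
import random
from collections import defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        buckets = defaultdict(list)
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            buckets[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(buckets[i]) if buckets[i] else centroids[i]
                     for i in range(len(centroids))]
    return centroids


def construct_model(training, n):
    """training: list of (features, product, channel, label) tuples.
    Returns {(product, channel, label): [centroids]}, one entry per PCL partition."""
    partitions = defaultdict(list)
    for features, product, channel, label in training:
        partitions[(product, channel, label)].append(features)
    return {key: kmeans(points, n) for key, points in partitions.items()}


def predict(model, features):
    """Compare the closest Yes centroid with the closest No centroid."""
    best = {"Yes": (math.inf, None), "No": (math.inf, None)}
    for (product, channel, label), centroids in model.items():
        for centroid in centroids:
            d = math.dist(features, centroid)
            if d < best[label][0]:
                best[label] = (d, (product, channel))
    if best["Yes"][0] < best["No"][0]:
        product, channel = best["Yes"][1]
        return ("Yes", product, channel)
    return ("No", None, None)


# Hypothetical toy data with two numeric features per customer.
training = [
    ((1.0, 1.0), "Loan", "SMS", "Yes"),
    ((1.2, 0.8), "Loan", "SMS", "Yes"),
    ((9.0, 9.0), "Loan", "SMS", "No"),
    ((8.8, 9.3), "Credit Card", "CC", "No"),
]
model = construct_model(training, n=1)
near_yes = predict(model, (1.1, 1.0))
near_no = predict(model, (9.0, 9.1))
print(near_yes, near_no)
```

In practice the features would be the three selected attributes, probably rescaled so that no single attribute dominates the Euclidean distance; the paper does not state which scaling, if any, was applied.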

Algorithm 1. Partitioning Based Bank Product and Channel Prediction Technique's Algorithm

procedure CONSTRUCTMODEL(trainingData, n)
    partition trainingData into instances with the same PCL
    for each set of PCL instances do
        construct n clusters for each P_i, C_j, L_k as P_i C_j L_k
    end for
end procedure

procedure PREDICTINSTANCE(data, P( )C( )L( ))
    P_i C_j Y(k) = closestCentroid(P( )C( )L( )Y, data)
    P_i C_j N(k) = closestCentroid(P( )C( )L( )N, data)
    if dist(data, P_i C_j Y(k)) < dist(data, P_i C_j N(k)) then
        data.label = Yes
        data.product = P_i
        data.channel = C_j
    else
        data.label = No
    end if
end procedure

B. Model Based Bank Product and Channel Prediction Technique

This is a hybrid two-phase method. In its first phase, we use classification techniques to decide the label of the customer, i.e., whether she is likely to buy any product or not. The previous method makes the label decision by using the following simple rule:

    if distance(closestYesCentroid, NewCustomer) < distance(closestNoCentroid, NewCustomer) then LABEL = YES else LABEL = NO

In the model based method, we replace this rule with a classification based model. Using the training data, a model is constructed to determine the label of the user. We tried several classification methods; the C4.5 decision tree and Naive Bayes produced the best results. In the experiments section, we present the results for these two classifiers.

In the banking dataset we used for training, there are far more negatively labeled instances than positively labeled ones. Because of this huge imbalance of the labels, models constructed using classification techniques will be inclined to produce No labels to increase the accuracy. To avoid this behavior, we generated a model with an equal number of Yes and No customers. If the new customer's label comes out as No from the model, we conclude that she is not interested in any product.
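One way to obtain a training set with equal numbers of Yes and No customers is random undersampling of the majority class. The paper does not say how its balanced set was produced, so the sketch below is an assumption chosen for illustration; the helper name `balance_by_undersampling` and the toy records are hypothetical.

```python
import random


def balance_by_undersampling(records, label_of, seed=0):
    """Return a shuffled subset with equal numbers of Yes and No records.

    Assumption: the larger class is randomly undersampled to the size
    of the smaller class.
    """
    rng = random.Random(seed)
    yes = [r for r in records if label_of(r) == "Yes"]
    no = [r for r in records if label_of(r) == "No"]
    size = min(len(yes), len(no))
    balanced = rng.sample(yes, size) + rng.sample(no, size)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 95 negative and 5 positive records.
records = ([("c%d" % i, "No") for i in range(95)]
           + [("c%d" % i, "Yes") for i in range(95, 100)])
balanced = balance_by_undersampling(records, label_of=lambda r: r[1])
print(len(balanced))  # 10
```

The balanced set would then be passed to the classifier (C4.5 or Naive Bayes in the paper), while the Yes clusters are still built from the full training data.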
However, if the label turns out to be Yes, we use the same approach as in the partitioning method and compare the new customer against the clusters constructed for each product and channel. Then, we find the closest Yes cluster and offer the product of that cluster through the channel of that cluster to the new user. This algorithm is presented in Algorithm 2.

Algorithm 2. Model Based Bank Product and Channel Prediction Technique's Algorithm

procedure CONSTRUCTMODEL(trainingData, n)
    partition trainingData into instances with the same PCY
    for each set of PCY instances do
        construct n clusters for each P_i, C_j as P_i C_j Y(k)
    end for
    classificationModel = classifier(trainingData)    // same amount of Yes and No instances
end procedure

procedure PREDICTINSTANCE(classificationModel, data, P( )C( )Y( ))
    if classificationModel(data) == YES then
        P_i C_j Y(k) = closestCentroid(P( )C( )Y( ), data)
        data.label = Yes
        data.product = P_i
        data.channel = C_j
    else
        data.label = No
    end if
end procedure

V. EXPERIMENTS

A. Experiments with the Baseline Method

In the baseline method, the new customer is checked to see whether she is more similar to the customers who previously responded positively or to those who responded negatively. If she is more similar to the customers who responded No, we predict the label of the new customer as No. However, if she is more similar to the Yes labelled instances, we need to find a product to offer through a proper channel. In this baseline method, the customers who responded Yes are partitioned according to products and channels (i.e., one cluster per product and one cluster per channel). Thus, for the new customer, the product and the channel are determined by finding the most similar centroid among the product clusters and the channel clusters, respectively. For the accuracy evaluation, we check whether the customer's actual product, channel, and label all match our predictions. The results of our experiments with the baseline method are shown in Table IV.
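The evaluation rule above (a prediction counts only if product, channel, and label all match) can be made concrete with a short Python sketch. The counting convention is an assumption: a Yes customer predicted as Yes with the wrong product or channel is counted as a miss (false negative), which the paper does not state explicitly. The function name and the toy data are hypothetical.

```python
def evaluate(actual, predicted):
    """actual, predicted: parallel lists of (label, product, channel) tuples.

    Assumption: for an actual Yes record, only an exact match of label,
    product, and channel counts as a true positive; anything else is a
    false negative. For an actual No record only the label is compared.
    """
    tp = tn = fp = fn = 0
    for (a_label, a_product, a_channel), (p_label, p_product, p_channel) in zip(actual, predicted):
        if a_label == "Yes":
            exact = (p_label == "Yes"
                     and (p_product, p_channel) == (a_product, a_channel))
            if exact:
                tp += 1
            else:
                fn += 1
        elif p_label == "No":
            tn += 1
        else:
            fp += 1
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "tp_rate": tp / (tp + fn) if tp + fn else 0.0,   # sensitivity
        "tn_rate": tn / (tn + fp) if tn + fp else 0.0,
        "counts": (tp, tn, fp, fn),
    }


# Hypothetical toy data.
actual = [("Yes", "Loan", "SMS"), ("Yes", "Loan", "SMS"),
          ("No", "Loan", "SMS"), ("No", "Credit Card", "CC")]
predicted = [("Yes", "Loan", "SMS"), ("Yes", "Loan", "CC"),
             ("No", None, None), ("Yes", "Credit Card", "CC")]
scores = evaluate(actual, predicted)
print(scores)
```

Relaxed variants that check only the product, only the channel, or only the label would compare fewer fields in the exact-match test.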
Although the accuracy rate is 50%, this is mostly due to correct predictions of negative instances, since the true positive rate is very low, at 32.96%. In the bank product marketing problem, it is important to reduce the false positive ratio in order to prevent unproductive marketing efforts and to avoid disturbing customers. Therefore, the aim is to improve the true positive ratio as well as the true negative ratio.

B. Experiments with Partitioning Based Method

In order to evaluate the two methods that we have proposed in this paper, we had to determine the accuracy of suggesting the product and the channel for the customer. However, in addition to this accuracy measure, we also wanted to determine

the accuracy for just determining the suitable product for the customer, or the accuracy for just determining the proper channel for the customer. Finally, we have also determined the accuracy for finding only the label of the customer.

Fig. 1. Model based method
Fig. 2. Partitioning based method

We tested this method's performance with an increasing number of clusters per partition corresponding to products and channels. In this dataset, the rate of positive results is very small. Therefore, the accuracy levels and true positive ratios that we have obtained are very important. Due to the profile distribution of users in the database, we cannot expect the accuracy values to behave monotonically as the number of clusters increases. For this reason, we tried varying numbers of clusters in order to determine the best results. The results of this method are shown in Table V. As seen in the results, partitioning has a positive effect on accuracy, on both true positive and true negative rates. It can be seen that with 1 cluster, the percentage of correct predictions

for product, channel, and label is 44.56% with Yes labelled data and 28.31% with No labelled data. After 10 clusters, we have not observed a significant change in the results. Two interesting results can be observed from this table. Even though the highest accuracy is achieved by using 3 clusters per partition, choosing 10 clusters gives a better result if we are interested in increasing true positive results. In terms of the three variations of the problem, namely determining only the product and the label, only the channel and the label, and finally just the label, the results are as expected. Although the accuracy values are very close to those of the original problem, the number of true positives increases slightly in the simpler versions.

TABLE IV. CHECKING PRODUCT, CHANNEL AND LABEL WITH BASELINE METHOD

TABLE V. CHANGING CLUSTER COUNT WITH CHECKING PRODUCT, CHANNEL, LABEL
Columns: num. of clusters, Acc., TP, TN, FP, FN. Row groups: Channel; Product; Product, Channel.

Fig. 3. Experiment results on the effect of cluster count on the accuracy for Partitioning Based Method

C. Experiments with Model Based Method with Partitioning

In this experiment, among the classification methods we applied, the C4.5 decision tree performed better than the others. The results of the Naive Bayes classifier and the C4.5 decision tree are shown in Table VI and Table VII, respectively. The results of the Naive Bayes classifier show an improvement in true positive predictions; however, true negative predictions decrease. As a result, the accuracy is 43.27% for 100 clusters, which is lower than the accuracy of the partitioning based method. As seen in the results in Table VII, with the C4.5 decision tree a 74.47% accuracy rate is obtained for predicting the product, channel, and label of the customer. True positive rates are also higher than with the partitioning based method. Figure 4 shows the difference between the two methods in terms of their accuracy rates.

Fig. 4. Experimental results on the effect of cluster count on the accuracy for Model Based Method with Partitioning under C4.5 and Naive Bayes classifiers

TABLE VI. APPLYING NAIVE BAYES WHILE CLUSTER COUNT CHANGES WITH CHECKING PRODUCT, CHANNEL, LABEL
Columns: num. of clusters, Acc., TP, TN, FP, FN. Row groups: Channel; Product; Product, Channel.

TABLE VII. APPLYING C4.5 DECISION TREE WHILE CLUSTER COUNT CHANGES WITH CHECKING PRODUCT, CHANNEL, LABEL
Columns: num. of clusters, Acc., TP, TN, FP, FN. Row groups: Channel; Product; Product, Channel.

Similar to the previous experiments, we have determined the accuracy of the method with different numbers of clusters for each partition. In this case, the best result is obtained with 5 clusters, both in terms of accuracy and the ratio of true positives; again, there was no significant change in the results after 10 clusters. Moreover, for determining the label, since a decision tree is used, there is basically only one cluster.

VI. CONCLUSION

Direct marketing in the banking sector faces difficulties in determining the best product and the most appropriate channel for the customer. Under a large data size and a low number of positive instances, we have observed that the previously proposed supervised learning based solutions do not provide a satisfactory accuracy rate. In this paper, we present two methods, the partitioning method and the model based method with partitioning, and evaluate their accuracy. Experimental results show that the partitioning based method alone improves accuracy as well as the true positive and true negative numbers. Moreover, the model based method with partitioning yields even higher accuracy. It is also seen that partitioning has a dramatically positive effect on the true positive ratio.

As future work, we will extend this problem by considering the cost of the channels used to reach the customers and the expected profits from selling the products. In this model, we will be interested not only in accuracy but also in reaching the customers through the most appropriate channel, in order to minimize the cost while maximizing the profit obtained by selling the products.
This optimization version of the problem can be modelled in many different ways, including well-known operations research methods such as integer programming techniques. However, due to the huge number of variables, this approach will be infeasible; therefore, more feasible but potentially non-optimal solutions may be preferred.

ACKNOWLEDGMENTS

This work is partially supported by Intertech within the scope of a research collaboration project.

REFERENCES

[1] Elsalamony, H.A. Bank Direct Marketing Analysis of Data Mining Techniques. International Journal of Computer Applications, 85: , 2014
[2] Cohen, M.-D. Exploiting response models - optimizing cross-sell and upsell opportunities in banking. Information Systems, 29: , 2004
[3] Nueno, Jose Luis, and John A. Quelch. The mass marketing of luxury. Business Horizons 41.6: 61-68, 1998
[4] Nachev, A., and M. Hogan. Application of Multilayer Perceptrons for Response Modeling. Proceedings on the International Conference on Artificial Intelligence, Springfield, The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing, 2014
[5] Kotler, Philip. From mass marketing to mass customization. Planning review 17.5: 10-47,

[6] Almana, Amal M., Mehmet Sabih Aksoy, and Rasheed Alzahrani. A survey on data mining techniques in customer churn analysis for telecom industry. Journal of Engineering Research and Applications 4.5: , 2014
[7] Zubi, Zakaria Suliman, and Rema Asheibani Saad. Improves Treatment Programs of Lung Cancer Using Data Mining Techniques. Journal of Software Engineering and Applications 7.2: 69, 2014
[8] Ozkan, Mehmet Baris, et al. A data mining-based wind power forecasting method: results for wind power plants in turkey. Data Warehousing and Knowledge Discovery, Springer Berlin Heidelberg, , 2013
[9] Ngai, Eric WT, Li Xiu, and Dorothy CK Chau. Application of data mining techniques in customer relationship management: A literature review and classification. Expert systems with applications 36.2: , 2009
[10] Bahari, T. Femina, and M. Sudheep Elayidom. An Efficient CRM-Data Mining Framework for the Prediction of Customer Behaviour. Procedia Computer Science 46: , 2015
[11] Pan, Y., and Tang, Z. Ensemble methods in bank direct marketing. Service Systems and Service Management (ICSSSM), th International Conference on. IEEE, 2014
[12] Nobibon, F. Talla, Roel Leus, and Frits Spieksma. Models for the optimization of promotion campaigns: exact and heuristic algorithms. Social Science Research Network, Available at SSRN , 2008
[13] Lopez-Yanez, Itzama, Leonid Sheremetov, and Oscar Camacho-Nieto. Multivariate Prediction Based on the Gamma Classifier: A Data Mining Application to Petroleum Engineering. Database and Expert Systems Applications, Springer Berlin Heidelberg, 2013
[14] Shih, J-Y., W-H. Chen, and Y-J. Chang. Developing target marketing models for personal loans. Industrial Engineering and Engineering Management (IEEM), IEEE International Conference, 2014
[15] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I.H. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter, 11(1), pp.10-18, 2009
[16] Ling, C.X., and Li, C. Data Mining for Direct Marketing: Problems and Solutions. KDD, Vol. 98, pp ,