E-commerce models for banks profitability

Data Mining VI 485

V. Aggelis
Egnatia Bank SA, Greece

Abstract

Data mining methods already provide considerable assistance in e-business for prediction, knowledge management, and decision support. In e-commerce in particular, a significant number of metrics have been tested and used to measure parameters of interest. In most cases these parameters relate to customer habits and customer profitability. Nowadays, many merchants cooperate with banks to authorize the credit card transactions through which products are purchased. Banks, in turn, are interested in measuring the profit and the strength of this cooperation. In this paper we introduce two models for this purpose: (a) a merchant clustering model and (b) a bank revenue predictive model. In the first model, a bank scores and classifies its cooperating merchants using a number of parameters, while in the second model, a bank predicts its revenue from e-commerce transactions.

Keywords: data mining, clustering, k-means algorithm, predictive model, linear regression.

1 Introduction

Banking and financial data are generally analysed with data mining methods such as linear regression, clustering, and classification, with the aim of developing patterns, rules, and predictive models, and ultimately of forecasting. These methods produce interesting and useful results. However, not all results lead to firm conclusions. The judgment of the data miner and of the user is therefore essential in evaluating the results, and especially the efficiency of the predictive models. Cooperation between data mining experts and people with good knowledge of the data sets is thus important for a proper evaluation of the predictive model. In banking this combination is particularly necessary because of the singularity of bank data and of bank market rules.

A specific kind of bank service is credit card authorization over the internet (e-commerce). E-commerce is relatively new, which makes the extraction of relevant features very important. Since current trends point to growing use, a bank is naturally concerned with enlarging its share of cooperating merchants in this area and with increasing its revenue.

In the present paper, merchant clustering is studied along with the ranking of these merchants according to a new merchant scoring analysis. We also develop and evaluate a predictive model for the bank's revenue through e-commerce. The software used is SPSS Clementine 7.0.

Section 2 describes various clustering techniques and algorithms and gives a general description of predictive models. Section 3 describes the merchant clustering calculation and the predictive model procedure. Section 4 contains experimental results derived from the data set of section 3, and section 5 states conclusions and future work.

2 Theoretical background

2.1 Clustering basics

Clustering techniques [1, 4] belong to the group of undirected data mining tools. The goal of undirected data mining is to discover structure in the data as a whole. There is no target variable to be predicted, so no distinction is made between independent and dependent variables.

Clustering techniques combine observed examples into clusters (groups) that satisfy two main criteria:

- each group or cluster is homogeneous: examples that belong to the same group are similar to each other;
- each group or cluster should be different from other clusters, that is, examples that belong to one cluster should be different from the examples of other clusters.

Depending on the clustering technique, clusters can be expressed in different ways:

- identified clusters may be exclusive, so that any example belongs to only one cluster;
- they may be overlapping, so that an example may belong to several clusters;
- they may be probabilistic, whereby an example belongs to each cluster with a certain probability;
- clusters might have a hierarchical structure, with a crude division of examples at the highest level of the hierarchy, which is then refined into subclusters at lower levels.

2.2 K-means algorithm

K-means [1, 4, 8, 9] is the simplest clustering algorithm. It takes as input a predefined number of clusters, which is the k of its name. "Means" stands for an average: the average location of all the members of a particular cluster. When dealing with clustering techniques, the notion of a high-dimensional space must be adopted, that is, a space whose orthogonal dimensions are the attributes of the table of analysed data. The value of each attribute of an example represents the distance of the example from the origin along the corresponding attribute axis. Of course, in order to use this geometry efficiently, the values in the data set must all be numeric and should be normalized to allow a fair computation of the overall distances in a multi-attribute space.

The K-means algorithm is a simple, iterative procedure in which a crucial concept is that of the centroid. A centroid is an artificial point in the space of records that represents the average location of a particular cluster. The coordinates of this point are the averages of the attribute values of all examples that belong to the cluster. The steps of the K-means algorithm are given in Figure 1.

1. Randomly select k points (they can also be examples) to be the seeds for the centroids of the k clusters.
2. Assign each example to the centroid closest to it, forming in this way k exclusive clusters of examples.
3. Calculate new centroids of the clusters by averaging all attribute values of the examples belonging to the same cluster (centroid).
4. Check whether the cluster centroids have changed their "coordinates". If yes, start again from step 2. If not, cluster detection is finished and all examples have their cluster memberships defined.

Figure 1: K-means algorithm.

Usually this iterative procedure of redefining centroids and reassigning the examples to clusters needs only a few iterations to converge.

2.3 Predictive models basics

A model is an abstract representation of a real-world process. A typical form of a model is Y = aX + b, where Y and X are variables and a and b are parameters. In a predictive model [12, 13, 14, 15], one variable is expressed as a function of the others.
This permits the value of the response variable to be predicted from given values of the others (the predictor variables). The response variable in general predictive models is often denoted by Y, and the p predictor variables by $X_1, \ldots, X_p$. The model yields predictions $\hat{y} = f(x_1, \ldots, x_p; \theta)$, where $\hat{y}$ is the prediction of the model and θ represents the parameters of the model structure. When Y is quantitative, the task of estimating a mapping from the p-dimensional X to Y is known as regression. Prediction models [1, 14, 16] in which the response variable is a linear function of the predictor variables yield the prediction:

$$\hat{Y} = a_0 + \sum_{j=1}^{p} a_j X_j \qquad (1)$$

where $\theta = \{a_0, \ldots, a_p\}$. We have used $\hat{Y}$ rather than simply Y on the left of the expression because it is a model that has been constructed from the data. In other words, the values of $\hat{Y}$ are values predicted from the X, and not values actually observed.

3 Clustering merchants and predicting in banking data set

The term «merchant» describes an on-line shop that has conducted at least one transaction during the period under study. The data sample covers the first quarter of 2004. The following variables are calculated for this period.

- Authorized transactions: we use the OK parameter for the authorized transactions. OK is defined as the count of authorized transactions that the customers of the on-line shop conducted within the period of interest (1st quarter 2004).
- Amount of authorized transactions: we use the OKAmt parameter for the amount of authorized transactions. OKAmt is the total amount of authorized transactions within the above period.
- Void transactions: the Void parameter is used for the void transactions. A transaction can be void for many reasons, such as an unauthorized card transaction, a blocked credit card, connection errors, and others. Void is the count of void transactions that the customers of the on-line shop conducted within the period of interest.
- Amount of void transactions: the VoidAmt parameter is used for the amount of void transactions. VoidAmt is the total amount of void transactions within the first quarter of 2004.
- Revenue: this refers to the bank's revenue, which is a percentage of the total authorized amount for every cooperating on-line shop. For the purposes of our case study, the revenue is 3% of the total authorized amount.
- Recency: recency is the date of the customer's last transaction. Since the value of recency contributes to the score, a numeric value is necessary. Therefore, we define the rec variable as the number of days between the first date concerned (1/1/2004) and the date of the last customer transaction.

In order to have steady results, amounts are expressed in thousands of euro. In addition, rec is set to zero if the value of OK is less than 3, which means that the customers of the on-line shop did not conduct at least one purchase per month.

The merchant score is calculated using the formula:

Score = OK + OKAmt − Void − VoidAmt + Revenue + rec.

A sample of the data set on which the data mining methods are applied is given in Table 1.

Table 1: Sample data set.

Merchant Id   OK   OKAmt (€/1000)   Void   VoidAmt (€/1000)   Revenue   rec
...           ...  ...              ...    ...                ...       ...
100114        2    0.008            0      0                  2.64      0
100115        14   2.1              13     1.65               63.00     89
100117        12   1.
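As an illustration, the variable definitions and the Score formula of section 3 can be applied to the two complete rows of Table 1. This is a minimal Python sketch rather than the procedure actually run in Clementine: the names `Merchant`, `effective_rec` and `score` are hypothetical, and the minus signs in front of Void and VoidAmt are assumed here.

```python
from dataclasses import dataclass


@dataclass
class Merchant:
    merchant_id: int
    ok: int          # OK: count of authorized transactions
    ok_amt: float    # OKAmt: authorized amount, in thousands of euro
    void: int        # Void: count of void transactions
    void_amt: float  # VoidAmt: void amount, in thousands of euro
    revenue: float   # Revenue: bank revenue as listed in Table 1
    rec: int         # rec: days from 1/1/2004 to the last transaction


def effective_rec(m: Merchant) -> int:
    """rec is set to zero when the shop has fewer than 3 authorized transactions."""
    return m.rec if m.ok >= 3 else 0


def score(m: Merchant) -> float:
    """Merchant score; subtracting the two void terms is an assumption."""
    return m.ok + m.ok_amt - m.void - m.void_amt + m.revenue + effective_rec(m)


# The two complete rows of Table 1.
merchants = [
    Merchant(100114, 2, 0.008, 0, 0.0, 2.64, 0),
    Merchant(100115, 14, 2.1, 13, 1.65, 63.00, 89),
]

# Rank merchants from the highest to the lowest score.
ranking = sorted(merchants, key=score, reverse=True)
for m in ranking:
    print(m.merchant_id, round(score(m), 3))
```

Because rec is counted in days while the amounts are in thousands of euro, recency can dominate the score of small merchants, which is one reason the rule that zeroes rec for shops with fewer than 3 authorized transactions matters.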

Merchant clustering is performed using the K-means algorithm, which was discussed in section 2. To generate a prediction, the stepwise linear regression method [2, 6] was used. As the name implies, the stepwise method of field selection builds the equation in steps. The initial model is the simplest model possible, with no input fields in the equation. At each step, input fields that have not yet been added to the model are evaluated, and if the best of those input fields adds significantly to the predictive power of the model, it is added. In addition, input fields that are currently in the model are re-evaluated to determine whether any of them can be removed without significantly detracting from the model. If so, they are removed. The process is then repeated, and other fields are added or removed. When no more fields can be added to improve the model, and no more can be removed without detracting from it, the final model is generated.

Figure 2: Score distribution.
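The stepwise procedure described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the Clementine implementation: significance is judged with partial F statistics against the hypothetical thresholds `f_enter` and `f_remove`, every model includes an intercept, the toy data and the field names `x1`, `x2`, `x3` are invented, and collinear fields are not handled.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(data, target, fields):
    """Residual sum of squares of a least-squares fit on `fields` plus an intercept."""
    design = [[1.0] + [row[f] for f in fields] for row in data]
    k = len(fields) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * row[target] for d, row in zip(design, data)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((row[target] - sum(c * v for c, v in zip(coef, d))) ** 2
               for d, row in zip(design, data))


def partial_f(rss_small, rss_big, n, n_fields_big):
    """F statistic for the single field that separates the two nested models."""
    if rss_big <= 1e-12:  # the bigger model fits (almost) perfectly
        return float("inf")
    return (rss_small - rss_big) / (rss_big / (n - n_fields_big - 1))


def stepwise(data, target, candidates, f_enter=4.0, f_remove=3.9, max_steps=50):
    """Start with no input fields, then alternately add and remove fields."""
    selected, n = [], len(data)
    for _ in range(max_steps):
        changed = False
        # Forward step: add the best remaining field if it helps significantly.
        current = rss(data, target, selected)
        remaining = [f for f in candidates if f not in selected]
        if remaining:
            gains = {f: partial_f(current, rss(data, target, selected + [f]),
                                  n, len(selected) + 1) for f in remaining}
            best = max(gains, key=gains.get)
            if gains[best] > f_enter:
                selected.append(best)
                changed = True
        # Backward step: drop the weakest field if removing it costs little.
        if selected:
            current = rss(data, target, selected)
            losses = {f: partial_f(rss(data, target, [g for g in selected if g != f]),
                                   current, n, len(selected)) for f in selected}
            worst = min(losses, key=losses.get)
            if losses[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Hypothetical toy data: y depends on x1 and x2, while x3 is unrelated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
x3 = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
jitter = [0.05, -0.05, 0.02, -0.02, 0.04, -0.04, 0.01, -0.01, 0.03, -0.03, 0.02, -0.02]
data = [{"x1": a, "x2": b, "x3": c, "y": 2.0 * a + 3.0 * b + e}
        for a, b, c, e in zip(x1, x2, x3, jitter)]

selected = stepwise(data, "y", ["x1", "x2", "x3"])
print(selected)
```

Setting `f_remove` slightly below `f_enter` keeps a field that has just entered from being removed in the same pass, and `max_steps` guards against cycling.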

4 Experimental results

4.1 Clustering

As the histogram of Figure 2 shows, the score distribution is concentrated at values below 500. This is a natural trend, since the majority of merchants have small transaction numbers. Application of the K-means algorithm results in the 2 clusters of Figure 3. Next to each cluster one can see the number of appearances as well as the average value of each variable. This clustering results in the distribution of Table 2 and Figure 4. The most important observations concerning these results are the following:

- There are two basic categories of merchants, the Big ones and the Small ones.
- Big merchants bring the most revenue to banks.
- Merchants with many void transactions are usually not big merchants.

4.2 Prediction model

The stepwise method builds the following prediction in two steps (Figure 5).

$$\widehat{\mathrm{Revenue}} = (2.364) \cdot \mathrm{OK} + (\ ) \cdot \mathrm{Void} \qquad (2)$$

To evaluate and test the appropriateness of model (2), a lift chart was used along with some indicative measures such as R, R-square, Adjusted R-Square and Linear Correlation.

Figure 3: K-means clusters.

Table 2: Clustering results.

Cluster 1 (88,46%)   Big     90%
Cluster 2 (11,54%)   Small   10%
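The two-cluster split follows the K-means steps of Figure 1, which can be sketched as below. This is a minimal illustration, not the Clementine implementation: the function names, the min-max normalization, the fixed random seed and the six (OK, Void) rows are all assumptions made for the example.

```python
import random


def normalize(rows):
    """Min-max scale every attribute to [0, 1] so no attribute dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(rows, k, seed=0, max_iter=100):
    """Return (labels, centroids) following the four steps of Figure 1."""
    rng = random.Random(seed)
    # Step 1: randomly select k examples as the initial centroids.
    centroids = [list(p) for p in rng.sample(rows, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each example to its closest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(row, centroids[c]))
                  for row in rows]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        # Step 4: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (OK, Void) values for three small and three big merchants.
rows = [[2, 0], [5, 1], [3, 2], [400, 10], [450, 12], [500, 8]]
labels, centroids = k_means(normalize(rows), k=2)
print(labels)
```

Normalizing first matters here because OK ranges over hundreds while Void stays near ten, so without scaling the distance would be driven almost entirely by OK.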

Figure 4: Clusters.

Figure 5: Prediction model.

An example of a lift chart is shown in Figure 6. As can be seen, the chart starts well above 1.0 on the left, remains on a high plateau as we move to the right, and then trails off sharply towards 1.0 on the right side of the chart. Using the prediction of the model shows the actual lift.

4.3 Other measures

Other measures of the suitability of the models are supplied in Figure 7.

Figure 6: Lift chart.

Figure 7: Model summary.

The degree to which two or more predictors (X variables) are related to the response (Y) variable is expressed in the correlation coefficient R, which is the square root of R-square [D. Hand, H. Mannila, P. Smyth, 2001; Clementine 7.0 User's Guide, 2002; K. Joreskog, 1999; N.R. Draper and H. Smith, 1998]. To interpret the direction of the relationship between variables, one should look at the signs (plus or minus) of the regression parameters (θ). If a parameter is positive, the relationship of the corresponding variable with the dependent variable is positive; if the parameter is negative, so is the relationship. As can be seen in Figure 7, the value of R for the second-step model is appropriate, since it is close to 1. Additionally, it can be observed that a decrease of ASMR and an increase of BSV are accompanied by an increase in the count of Logins in e-banking services.

R square is commonly used as a measure of a model's goodness of fit. An R square value near 1 indicates a nearly perfect regression. The R square value of the model is considered satisfactory and indicates an acceptable model, bearing in mind that:

- R square is a non-descending function of the number of predictor variables present in the model; that is, adding predictor variables (X's), for example from additional historical data, almost always increases R square. This is because the addition of predictor variables to the model reduces the prediction errors.
- R square assumes that the data set being analysed is the entire population, while in fact it represents only a sample of the population.

Adjusted R square measures the proportion of the variation in the response variable due to the predictor variables. Unlike R square, adjusted R square accounts for the degrees of freedom associated with the sums of squares. Therefore, even though the residual sum of squares decreases or remains constant as new predictor variables are added, this is not the case for the residual variance. For this reason, adjusted R square is generally considered a more accurate goodness-of-fit measure than R square. If adjusted R square is significantly lower than R square, this normally means that some predictor variables are missing. The absence of these variables causes the variation in the dependent variable to be measured improperly. The nearer the adjusted R square is to 1, the better the model is. The adjusted R square value of the model is almost the same as its R square, which therefore indicates an acceptable model.

Figure 8: Linear correlation.

Finally, as can be seen in Figure 8, the Linear Correlation of the model approaches unity. This indicates a strong positive relation, such that high predicted values are associated with high actual values and vice versa.
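The measures discussed in this section can be computed from actual and predicted values as follows. This is a sketch using the standard textbook definitions, which may differ in detail from what Clementine reports; the function names and the four sample values are hypothetical.

```python
import math


def r_square(actual, predicted):
    """1 - RSS / TSS: share of the variation in `actual` explained by the model."""
    mean = sum(actual) / len(actual)
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    tss = sum((a - mean) ** 2 for a in actual)
    return 1.0 - rss / tss


def adjusted_r_square(actual, predicted, n_predictors):
    """R square corrected for the degrees of freedom used by the predictors."""
    n = len(actual)
    r2 = r_square(actual, predicted)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def linear_correlation(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Hypothetical actual and predicted revenues for four merchants.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(round(r_square(actual, predicted), 4))
print(round(adjusted_r_square(actual, predicted, n_predictors=1), 4))
print(round(linear_correlation(actual, predicted), 4))
```

The adjusted value is never larger than R square whenever R square is at most 1 and at least one predictor is used, and the gap between the two widens as predictors are added without a matching reduction in residual error.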

5 Conclusions and future work

This paper shows that a bank can use a merchant score to rank its cooperating merchants according to a two-level model. This result was obtained with the K-means method. The e-banking unit of a bank may therefore easily identify its most important merchants. When the model is continuously retrained, it also reveals how merchants move between levels, giving the bank administration the opportunity to reduce merchant leakage. At the same time, the approach to merchants and the promotion of new services and products are improved, since the bank knows that a merchant is more likely to respond to a promotion campaign if it belongs to the 10% of most beneficial merchants. Correct recognition and analysis of the clustering results offers the e-banking unit of a bank an advantage over the competition. Merchant clustering could be subjected to further exploitation and research.

This study also describes the development of a predictive model of the bank's revenue in relation to the merchants' transactions, and supplies experimental results. It is concluded that there is a strong relation between the bank's revenue and the merchants' transactions as a whole. Future plans include the development of predictive models using other sources. The use of other clustering algorithms, as well as other data mining methods, is a promising and challenging issue for future work.

References

[1] D. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. The MIT Press, 2001.
[2] Clementine 7.0 User's Guide. SPSS, Integral Solutions Limited, 2002.
[3] Clementine Application Template for Customer Relationship Management 6.5. SPSS, Integral Solutions Limited.
[4] K. Collier, B. Carey, E. Grusy, C. Marjaniemi, and D. Sautter. A Perspective on Data Mining. Northern Arizona University.
[5] J. Curry and A. Curry. The Customer Marketing Method: How to Implement and Profit from Customer Relationship Management.
[6] S.A. Madeira. Comparison of Target Selection Methods in Direct Marketing. MSc Thesis, Technical University of Lisbon, 2002.
[7] Retain Customers and Reduce Risk. White Paper, COMPAQ.
[8] P. Bradley and U. Fayyad. Refining Initial Points for K-Means Clustering. Proc. 15th International Conf. on Machine Learning.
[9] H. Zha, C. Ding, M. Gu, X. He, and H. Simon. Spectral Relaxation for K-means Clustering. Neural Info. Processing Systems.
[10] Data-Driven Analysis Tools and Techniques. White Paper, DataPlus Millennium, 2001.

[11] K. Im and S. Park. A Study on Analyzing Characteristics of Target Customers from Refined Sales Data. APIEMS.
[12] D. Foster and R. Stine. Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy. Center for Financial Institutions Working Papers, Wharton School Center for Financial Institutions, University of Pennsylvania.
[13] B. Zupan, J. Demsar, M. Kattan, M. Ohori, M. Graefen, M. Bohanec, and J.R. Beck. Orange and Decisions-at-Hand: Bridging Predictive Data Mining and Decision Support. Workshop Integrating Aspects of Data Mining, Decision Support and Meta-Learning.
[14] A. Raftery, D. Madigan, and J. Hoeting. Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association.
[15] P. Laud and J. Ibrahim. Predictive Model Selection. Journal of the Royal Statistical Society.
[16] N.R. Draper and H. Smith. Applied Regression Analysis. John Wiley & Sons, Inc., 1998.
