Analysis Online Shopping Behavior of Consumer Using Decision Tree Leiyue Yao 1, a, Jianying Xiong 2,b


Advanced Materials Research, Trans Tech Publications, Switzerland

1 Computer Science Department, Computer Science Department, China
1 Computer Science Department, Computer Science Department, China
a specal8212@sohu.com, b ylyyly2001@163.com

Keywords: trading prediction; decision tree; C5.0

Abstract. Trading failure is the main reason for disputes in C2C e-commerce. Predicting the outcome of transactions can therefore assist buyers and sellers in negotiating transactions and helps to reduce transaction disputes. We separate the successful and failed purchase records, then build a decision-making model from consumer purchase behavior data using the C5.0 decision tree and the RFM (Recency, Frequency, Monetary) model, and quantify the importance of the decision variables. The demonstration experiment shows that the prediction accuracy is more than 80%.

Introduction

The "2010 Annual Report of the Chinese e-commerce market data monitoring" [1] shows that transactions in the Chinese e-commerce market reached 4.5 trillion Yuan in 2010. As the representatives of the online retail market, B2C and C2C accounted for [...] billion Yuan, an increase of 97.3%, which is 3% of total retail sales for the year. The report predicts that the online retail market is expected to exceed 1 trillion Yuan within the next two years and will account for more than 5% of annual retail sales.

The "2010 Annual Report of e-commerce complaints" [2] shows that e-commerce complaints fall into three categories: online shopping, online payment, and online booking. Complaints about online shopping account for more than 70% of all e-commerce complaints. Failed online trades are the main cause of disputes. The C2C online trading environment is a special e-commerce platform that relies mainly on trust between the parties to regulate transactions [3]. The seller's reputation value is the main basis on which a buyer selects a trading partner. However, because the trust system is imperfect, integrating seller, buyer, and transaction information can increase the success rate of prediction, which helps buyers and sellers make their choice and helps to reduce transaction disputes [4].

Methods

Study objectives. The research aims to predict the success or failure of customer transactions in order to help reduce disputes in C2C e-commerce transactions. First, a dataset of "success" and "failure" purchase records is built from historical data. Then a classification model is constructed to separate the two classes, and the success of customer transactions is predicted according to the classification model (classification rules).

Fig. 1 The process of data processing

Advanced Materials and Information Technology Processing

Analysis process. The process includes data preparation, data preprocessing, data mining, and rule output.

Related concepts. This section introduces the concepts used in the study.

Reputation. Members of the trading platform give each other a score (-1, 0, 1) after the completion of each transaction. The accumulated evaluation scores are referred to as the reputation of a member [5].

$$RB = \sum_{i=1}^{n} f_i^{S}, \qquad f_i^{S} \in \{-1, 0, 1\} \tag{1}$$

$$RS = \sum_{i=1}^{n} f_i^{B}, \qquad f_i^{B} \in \{-1, 0, 1\} \tag{2}$$

RB is the reputation of the buyer and RS is the reputation of the seller. Over the n transactions of a member, $f_i^{S}$ is the feedback of the seller to the buyer and $f_i^{B}$ is the feedback of the buyer to the seller in the i-th transaction.

Transaction density. Some sellers collude with buyers in order to improve their own reputation. In that case the feedback is concentrated in a small group of buyers, so the transaction density is used to measure it. S stands for the number of feedback scores received after transactions, and NB is the number of buyers connected to the seller.

$$V = \frac{S}{NB} \tag{3}$$

Decision tree model. Classification computes a numerical result from a set of variables and assigns each record to one of several discrete classes according to that result. The characteristics of each class are learned from data that has already been classified, and predictions for other data are then made from these characteristics [6]. Classification usually applies several techniques and then compares or combines their results to obtain the best overall classification; this study used the decision tree. C5.0 is a decision tree algorithm that Quinlan developed as an improvement on C4.5 [7]. It includes all the features of C4.5 and also introduces many new techniques. The most important is boosting, which gives a weight to each sample: the higher the weight of a sample, the greater its impact on the generated decision tree. All samples have the same weight in the initial state. The weight of each training sample is adjusted in every subsequent round, so that a sample misclassified by the current decision tree obtains a higher weight at the next iteration.

RFM model.
According to research by Arthur Hughes of the American Database Marketing Institute, three "magic" elements constitute the best indicators in a customer database. RFM is a customer value model that distinguishes customers based on their behavior: R refers to Recency, F refers to Frequency, and M refers to Monetary [8]. The RFM model can show how a customer's entire profile forms over time, which provides a basis for personalized communication and services. Improving the status of the three indexes provides more support for marketing decisions. CRM generally focuses on analyzing customer contribution, whereas RFM emphasizes customer behavior to distinguish between customers.

Experiment

Data Collecting. (1) Data source. The data is part of the sales transactions in the women's category from one Taobao platform. (2) Successful and failed transaction data. [...] successful purchase records and 8903 failure records were obtained through SQL operations and data filtering. (3) Data sampling. The trade density needs to be calculated to avoid the impact of trade conspiracy. The density statistics for the transactions are shown in Table 1.

Table 1 Transaction density statistics
Number of buyers    Mean value    Min    Max    SD
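The reputation sums in Eqs. (1) and (2), the transaction density in Eq. (3), and the three RFM values can be illustrated with a minimal Python sketch. The `Transaction` record, its field names, and the sample data are hypothetical and are not taken from the paper's dataset. The sketch also assumes that every transaction receives feedback in both directions, so that S equals the number of a seller's transactions, and that Recency is measured in days since the last purchase.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    # Hypothetical record layout, chosen only for this illustration.
    buyer_id: str
    seller_id: str
    amount: float         # money spent in this transaction
    day: date             # purchase date
    seller_feedback: int  # seller's score for the buyer: -1, 0 or 1
    buyer_feedback: int   # buyer's score for the seller: -1, 0 or 1


def buyer_reputation(txs, buyer_id):
    """Eq. (1): sum of the seller-to-buyer feedback scores."""
    return sum(t.seller_feedback for t in txs if t.buyer_id == buyer_id)


def seller_reputation(txs, seller_id):
    """Eq. (2): sum of the buyer-to-seller feedback scores."""
    return sum(t.buyer_feedback for t in txs if t.seller_id == seller_id)


def transaction_density(txs, seller_id):
    """Eq. (3): V = S / NB, feedback count per distinct buyer."""
    own = [t for t in txs if t.seller_id == seller_id]
    buyers = {t.buyer_id for t in own}
    return len(own) / len(buyers) if buyers else 0.0


def rfm(txs, buyer_id, today):
    """Return (recency in days, frequency, monetary) for one buyer."""
    own = [t for t in txs if t.buyer_id == buyer_id]
    if not own:
        return None
    recency = (today - max(t.day for t in own)).days
    frequency = len(own)
    monetary = sum(t.amount for t in own)
    return recency, frequency, monetary


# Made-up sample data.
txs = [
    Transaction("b1", "s1", 120.0, date(2010, 5, 1), 1, 1),
    Transaction("b1", "s1", 80.0, date(2010, 6, 10), 1, 0),
    Transaction("b2", "s1", 40.0, date(2010, 6, 15), -1, 1),
    Transaction("b1", "s2", 60.0, date(2010, 6, 20), 0, -1),
]

print(buyer_reputation(txs, "b1"))        # 2
print(seller_reputation(txs, "s1"))       # 2
print(transaction_density(txs, "s1"))     # 1.5
print(rfm(txs, "b1", date(2010, 6, 30)))  # (10, 3, 260.0)
```

A high density means that a seller's feedback comes from few distinct buyers, which is the pattern the sampling step uses as a sign of possible collusion.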

There are 1902 buyers, with 3561 failure records and 9681 successful records, when the data is filtered on a trade density above [...].

Data Pretreatment.
(1) Buyer and seller statistical variables: the reputation of the seller and the buyer, the buyer's recency R, frequency F, and monetary value M indicators, trading density, order number, and transaction parties.
(2) Transaction statistical variables: buyer ID, seller ID, purchase price, PayPal payments, purchase time, commodity groups, purchase amount, and whether the transaction was successful.
(3) Data combination. The data from (1) and (2) are merged by joining on buyer ID and seller ID.
(4) Data binning. The reputation values, RFM values, and prices are continuous, while the classification model needs discrete data, so the experiment divided each of them into five bins, coded by number.
(5) Sample balance. The proportions of the two classes are quite different, so 100% of the failed transactions and 40% of the successful transactions are used as the final sample.
(6) Sample segmentation. The sample is divided into two parts: a training set and a test set.

Table 2 Proportion of the training set and the test set
Value           Proportion    Number
Training set    61.87%        2830
Test set        38.87%        4451

Result.

Comparison of different algorithms. Four kinds of decision tree were used to compare the accuracy of the classification model. The results show that the classification accuracy of the C5.0 decision tree algorithm was significantly higher than that of the others.

Table 3 Comparison of the decision tree algorithms
Algorithm    Accuracy    Error
C&T          [...]%      [...]%
QUEST        [...]%      [...]%
CHAID        [...]%      [...]%
C5.0         86.65%      13.35%

The C5.0 node in Clementine 12.0 implements the C5.0 decision tree algorithm. The variable "transaction state" is the target variable used to classify customer transactions. The classification results are shown in Table 4.

Table 4 Classification performance on the two data set partitions
           Training set    Test set
Correct    [...]%          [...]%
Error      [...]%          [...]%
Sum        [...]           [...]

The overlap matrix obtained on the test data is shown in Table 5 (each row represents the actual value).

Table 5 Overlap matrix of the test data
Test set    F        T
F           [...]    [...]
T           [...]    [...]

Decision rules.
The importance of the variables for predicting the success of a transaction is shown in Table 6. The ranking is: buyer reputation > Monetary > seller reputation > commodity price > RFM index score > Recency > product number > Frequency.

Table 6 The importance of variables
Variable             Importance
Buyer reputation     [...]
Monetary             [...]
Seller reputation    [...]
Price                [...]
RFM score            [...]
Recency              [...]
Product number       [...]

The buyer's reputation reflects the score accumulated over the buyer's trading history. A good reputation implies that the buyer communicates well with sellers, so it is the most important variable in predicting transaction behavior. Monetary is the total value spent on online shopping. It is the most important indicator of RFM, reflects customer loyalty, and is also important in predicting transaction behavior. The generated decision tree is shown in Figure 2.
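The pretreatment steps (4) to (6) described in the experiment (binning continuous values into five levels, keeping all failed and 40% of the successful transactions, and partitioning into training and test sets) can be sketched in plain Python. The paper does not say how the five bins were formed, so the equal-width binning here is an assumption, as are the function names, the `success` field, the random seed, and the 60/40 split ratio.

```python
import random


def equal_width_bins(values, n_bins=5):
    """Map each continuous value to a bin code 1..n_bins (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins + 1, so clamp it.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def balance_sample(records, success_fraction=0.4, seed=0):
    """Keep every failed record and a random fraction of the successful ones."""
    rng = random.Random(seed)
    failures = [r for r in records if not r["success"]]
    successes = [r for r in records if r["success"]]
    k = round(len(successes) * success_fraction)
    return failures + rng.sample(successes, k)


def split_sample(records, train_fraction=0.6, seed=0):
    """Shuffle the records and split them into a training and a test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up data: 10 failed and 50 successful transactions.
records = [{"id": i, "success": i >= 10} for i in range(60)]
balanced = balance_sample(records)
train_set, test_set = split_sample(balanced)

print(equal_width_bins([0, 10, 20, 30, 40, 50, 100]))  # [1, 1, 2, 2, 3, 3, 5]
print(len(balanced), len(train_set), len(test_set))    # 30 18 12
```

Equal-frequency (quantile) binning would be an equally plausible reading of the binning step. It spreads skewed variables such as price more evenly across the five codes.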

Fig. 2 Decision tree

Conclusion

The study establishes a prediction model of customer purchase behavior based on the decision tree and RFM. The experiments show that the accuracy of the C5.0 algorithm was 86.65%, which is significantly higher than that of the other decision tree algorithms. The sample was partitioned to validate the model: the prediction accuracy on the test data is over 80%, and the importance of the decision variables was calculated. Combining the model with the e-commerce reputation mechanism is more conducive to reducing transaction risk and improving the transaction success rate, thus reducing online purchase transaction disputes.

References

[1] "2010 Annual Report of the Chinese e-commerce market data monitoring".
[2] "2010 Annual Report of e-commerce complaints".
[3] Zacharia, G., Maes, P. Trust management through reputation mechanisms [J]. Applied Artificial Intelligence, 2000, 14(9), p. 881-907.
[4] B.G. Silverman. Implications of Buyer decision theory for design of E-Commerce Web Sites. Human-computer studies, 2001, 55(5).
[5] Sabater J, Sierra C. Review on computational trust and reputation models [J]. Artificial Intelligence Review, 2006, 24(1).
[6] Chenghu, Z., Xin, Y., & Hu, Y. Client transaction behavior pattern recognition based on clustering method. Computer Engineering and Applications (10).
[7] Pang Sulin, Gong Jizhang. C5.0 classification algorithm and its application on individual credit score for banks. System Engineering Theory & Practice, 2009, 29(12).
[8] Peter S. Fader, Bruce G.S. Hardie, & Ka Lok Lee. RFM and CLV: Using Iso-Value Curves for Customer Base Analysis. Journal of Marketing Research, 2005, 11.
