Analysis of Preschool Linguistic Education Based on Orienting Problem Algorithm


Yan Zhao
Hanjiang Normal University, Shiyan, Hubei Province, , China.

Abstract - This paper analyses mixed preschool linguistic education (CPBE) based on orienting convergence. The research steps are: i) we put the orienting fusion method system into the mixed-attribute data orienting problem; ii) we generalize the orient ensemble method; iii) we apply the orient ensemble concepts to solve the mixed-attribute data orienting problem; iv) we establish the algorithm framework; v) we put forward the objective function and algorithm; and finally vi) we test the effectiveness of the algorithm on actual data.

Keywords - Orient problem-based; algorithm framework; preschool linguistic education

I. INTRODUCTION

Since China's preschool linguistic education sector joined the World Trade Organization (WTO), it has faced a fiercely competitive market environment and the influence of advanced foreign technology and ideas, which have brought about change and innovation. In the current changing economy, preschool linguistic education focuses mostly on collecting and analysing all kinds of information and data, and on screening and optimally planning all kinds of customer information, in order to provide differentiated financial products and services [1]. After joining the WTO, the core competitiveness of preschool linguistic education has been converted into competition for high-quality clients, and the industry's demand for data centres and customer analysis has taken shape. For example, Industrial preschool linguistic education screens customer data based on customers' basic services and credit risk, and classifies customers according to their repayment amount; Societe Generale classifies and analyses its target customer base through customer profitability data analysis, and its marketing team adopts a corresponding strategy to expand the customer base according to the analysis result; other peers also analyse their own customers using all kinds of data.
Ping An's subsidiaries will officially join the central preschool linguistic market credit reporting system. The core of a credit reporting system is data collection: the larger the data sample, the stronger the regularities and the more value it holds. Ping An Group has 74 million customers, and the personal information accumulated from them plays a key role in Ping An's innovative Internet finance [2-3]. Customer segmentation is the premise of market segmentation and target marketing. Accurate market segmentation and a differentiated marketing strategy are difficult problems that enterprise marketing must face. The classification and orienting methods of data mining can be applied to customer groups.

II. DATA ORIENTING AND ORIENTING ENSEMBLE

A. Data Orienting

Data mining orienting divides a group of individuals into several categories according to similarity, so that the distance between individuals in the same category is as small as possible and the distance between individuals in different categories is as large as possible. Orienting is often used for targeted marketing, by dividing customers into different user groups according to their behavioural characteristics [4-5]. Converting class attribute data into numerical attribute data does not always give usable results, because the values of a class attribute have no natural order. Z. Huang proposed the k-modes algorithm and extended the k-means algorithm to the k-prototypes method, which orient class-attribute and mixed-attribute data. However, this approach suffers from low accuracy, poor stability and high randomness.

B. Orienting Ensemble

Orient ensemble is a new research field that has emerged over the past two years. The basic idea is to run several independent orients on the original data and combine their results to obtain the final orienting result for the original data. Actual data collections suffer from irregular shapes, noise, large volumes and distributed storage.
However, using multiple orients allows the data to be processed in a distributed way; at the same time, noise and outliers have less effect on the results, so the method is stable and performs well on irregular and noisy data. For large data sets, choosing appropriate orients also gives very good scalability. In addition, the orient ensemble method can handle data sets for which a single orient has difficulty producing an orienting result. At present, most of the research literature has focused on numerical attributes and only some of it discusses class attributes, while mixed-attribute data sets, which are the most widespread in the real world, have not been addressed [6-7].

DOI /IJSSST.a ISSN: x online, print
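Section II.A notes that Huang's k-prototypes method orients mixed-attribute data directly. As described in Huang's paper [3], it compares an item with a prototype using the squared Euclidean distance on the numerical attributes plus a weight γ times the number of mismatched class attributes. The Python sketch below illustrates only that dissimilarity measure; the function name `mixed_dissimilarity`, the sample values and the choice `gamma=0.5` are illustrative assumptions, not part of this paper's CPBE method.

```python
from typing import Sequence


def mixed_dissimilarity(
    x_num: Sequence[float],
    y_num: Sequence[float],
    x_cat: Sequence[str],
    y_cat: Sequence[str],
    gamma: float,
) -> float:
    """Hypothetical helper: k-prototypes-style dissimilarity between two items.

    Squared Euclidean distance over the numerical attributes plus gamma times
    the number of class (categorical) attributes whose values differ.
    """
    if len(x_num) != len(y_num) or len(x_cat) != len(y_cat):
        raise ValueError("items must have the same numerical and class attributes")
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    mismatches = sum(1 for a, b in zip(x_cat, y_cat) if a != b)
    return numeric_part + gamma * mismatches


# Two items with two numerical and two class attributes (values are made up).
d = mixed_dissimilarity([1.0, 2.0], [2.0, 4.0], ["A", "X"], ["B", "X"], gamma=0.5)
print(d)  # 5.5
```

Because k-prototypes starts from randomly chosen prototypes and then reassigns items with this measure, different starting prototypes can lead to different final partitions, which is the randomness mentioned in Section II.A.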

III. CPBE ALGORITHM FRAMEWORK

The CPBE algorithm framework for mixed-attribute data sets covers both numerical attributes and class attributes. Each class attribute is taken as an orienting output and its values serve as the orienting labels, so the class-attribute part is converted into the same number of class orienting results. For the numerical attributes, k-means is used to form orientings, applying multiple orientings to these attributes. Finally, the orienting results of the class attributes and of the numerical attributes are combined by orienting ensemble to obtain the final orienting result. The algorithm framework is shown in Figure 1: each attribute is oriented to give λj, j = 1, 2, ..., m, for m orientings altogether, and the fusion function Γ fuses them into λ to give the final orienting [8].

For example, Table 1 contains eight items, each with two class attributes and two numerical attributes. For class attribute A1, grouping the items by value gives λ1 = ((1,3,7),(2,4,5,6,8)), which classifies the eight items into two classes: items 1, 3 and 7 in one class, and items 2, 4, 5, 6 and 8 in the other. For class attribute A2, the result is λ2 = ((1,2,5,6),(3,7),(4,8)), which classifies the eight items into three classes. For numerical attribute A3, the result is λ3 = ((1,2,8),(3,4,5),(6,7)), also three classes. For numerical attribute A4, the result is λ4 = ((1,2,3),(4,5,6,7,8)), two classes. The four orientings λ1, λ2, λ3 and λ4 are then combined by orienting ensemble to obtain the final result [9-10].

TABLE 1. DATA ITEM TABLE INCLUDING TWO CLASS ATTRIBUTES AND TWO NUMERICAL ATTRIBUTES.
Data item    A1 (class)    A2 (class)    A3 (numerical)    A4 (numerical)
1            A             X
2            B             X
3            A             Y
4            B             Z
5            B             X
6            B             X
7            A             Y
8            B             Z

Fig 1. CPBE Algorithm Framework.
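The grouping step in the example above, together with the agreement measure of Eq. (1) in Section IV, can be sketched in Python. The sketch assumes the class values of Table 1 as listed per item, takes λ3 and λ4 directly from the text because the numerical values of A3 and A4 are not available, and uses hypothetical helper names (`partition_from_values`, `labels_from_partition`, `nmi`). It covers only the per-attribute orientings and their pairwise normalized mutual information; it does not implement the HGPA fusion step.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Class-attribute values of the eight items in Table 1 (items 1..8).
A1 = ["A", "B", "A", "B", "B", "B", "A", "B"]
A2 = ["X", "X", "Y", "Z", "X", "X", "Y", "Z"]


def partition_from_values(values: Sequence[str]) -> List[List[int]]:
    """Group 1-based item numbers by class value (one orient per distinct value)."""
    groups: Dict[str, List[int]] = {}
    for item, value in enumerate(values, start=1):
        groups.setdefault(value, []).append(item)
    return list(groups.values())


def labels_from_partition(partition: Sequence[Sequence[int]], n: int) -> List[int]:
    """Turn a partition such as ((1, 3, 7), (2, 4, 5, 6, 8)) into a label vector."""
    labels = [0] * n
    for label, block in enumerate(partition):
        for item in block:
            labels[item - 1] = label
    return labels


def nmi(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized mutual information in the form of Eq. (1):
    (2/n) * sum over (h, l) of n_hl * log_{k_a*k_b}(n_hl * n / (n_h * n_l))."""
    n = len(a)
    n_h, n_l, n_hl = Counter(a), Counter(b), Counter(zip(a, b))
    base = len(n_h) * len(n_l)
    if base == 1:  # both partitions have a single orient; log base 1 is undefined
        return 0.0
    total = sum(
        count * math.log(count * n / (n_h[h] * n_l[l]), base)
        for (h, l), count in n_hl.items()  # empty cells (n_hl = 0) contribute 0
    )
    return 2.0 * total / n


n = 8
lam1 = partition_from_values(A1)
lam2 = partition_from_values(A2)
lam3 = [[1, 2, 8], [3, 4, 5], [6, 7]]  # taken from the text (A3 values not shown)
lam4 = [[1, 2, 3], [4, 5, 6, 7, 8]]    # taken from the text (A4 values not shown)
print(lam1)  # [[1, 3, 7], [2, 4, 5, 6, 8]]
print(lam2)  # [[1, 2, 5, 6], [3, 7], [4, 8]]

labels = [labels_from_partition(p, n) for p in (lam1, lam2, lam3, lam4)]
for i, li in enumerate(labels, start=1):
    for j, lj in enumerate(labels, start=1):
        if i < j:
            print(f"NMI(lambda{i}, lambda{j}) = {nmi(li, lj):.3f}")
```

Pairwise values like these are the terms that the average normalized mutual information objective of Section IV combines. Note that in this form of Eq. (1) a partition compared with itself scores exactly 1 only when its orients are equal in size.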
Algorithm description: $X = \{X_1, X_2, \ldots, X_n\}$ represents a data set, and $X_i = \{X_{i1}, X_{i2}, \ldots, X_{im}\}$, $i = 1, 2, \ldots, n$, are its data items. Each data item has $m = r + p$ attributes, where $r$ is the number of class attributes and $p$ is the number of numerical attributes. An orienting is denoted $H$; it maps the data set to a label vector (also called the column label). The class attributes of $X$ are denoted $A_i$, $i = 1, \ldots, r$, with domain $V_i$. $H_i$ is the orienting corresponding to class attribute $A_i$: it maps $V_i$ to the natural numbers, dividing the data items according to $A_i$ and giving the result $\lambda_i$. The definition is:

$$\lambda_i = H_i(x_{A_i}), \quad x_{A_i} \in V_i, \quad x \in X, \quad i = 1, \ldots, r.$$

The numerical attributes are denoted $A_j$, $j = r+1, \ldots, m$. $H_j$ is the orienting corresponding to numerical attribute $A_j$; it completes the mapping for all the data items in $X$.

IV. OBJECTIVE FUNCTION AND COMMON ALGORITHMS

A. Objective Function

The goal of orienting $X$ is to find a partition that assigns all the data items in $X$ to $K$ disjoint groups ($K$ is a natural number). For $n$ data items there are many feasible ways to divide them, so finding the best one requires an objective function for the orient partition. Many current orienting algorithms adopt objective functions based on distance or density. Here, following Strehl's theory, maximization of shared information is used as the objective function. Consider two partitions $\lambda^{(a)}$ and $\lambda^{(b)}$ with $k^{(a)}$ and $k^{(b)}$ orients respectively. In $\lambda^{(a)}$, orient $C_h$ contains $n_h^{(a)}$ samples; in $\lambda^{(b)}$, orient $C_l$ contains $n_l^{(b)}$ samples; and $n_{h,l}$ is the number of samples that belong both to class $h$ of $\lambda^{(a)}$ and to class $l$ of $\lambda^{(b)}$. The normalized mutual information is:

$$\phi^{(NMI)}\left(\lambda^{(a)}, \lambda^{(b)}\right) = \frac{2}{n} \sum_{l=1}^{k^{(b)}} \sum_{h=1}^{k^{(a)}} n_{h,l} \log_{k^{(a)} k^{(b)}} \left( \frac{n_{h,l}\, n}{n_h^{(a)}\, n_l^{(b)}} \right) \qquad (1)$$

Define $\Lambda = \{\lambda_1, \ldots, \lambda_{r+p}\}$ as the vector set of the $r + p$ orientings. The average normalized mutual information (ANMI) between a partition $\lambda$ and $\Lambda$ is:

$$\phi^{(ANMI)}(\Lambda, \lambda) = \frac{1}{r+p} \left[ D \sum_{i=1}^{r} \phi^{(NMI)}(\lambda, \lambda_i) + \sum_{j=r+1}^{r+p} \phi^{(NMI)}(\lambda, \lambda_j) \right] \qquad (2)$$

where $\lambda_i$, $i = 1, \ldots, r$, are the partitions corresponding to the class attributes and $\lambda_j$, $j = r+1, \ldots, r+p$, are the divisions corresponding to the numerical attributes. The coefficient $D$ adjusts the influence of the class attributes on the orienting; this issue will be discussed in a separate study. Equation (2) is the objective function. The optimal partition is the one that maximizes Eq. (2):

$$\lambda^{(opt)} = \arg\max_{\lambda \in K_d} \phi^{(ANMI)}(\Lambda, \lambda) \qquad (3)$$

where $K_d$ is the set of all possible partitions.

B. Common Algorithms and their Complexity

Strehl put forward three algorithms, CSPA, HGPA and MCLA, to solve the fusion problem. The complexity of CSPA is $O(n^2 k r)$, that of HGPA is $O(nkr)$ and that of MCLA is $O(n k^2 r^2)$, where $n$ is the number of samples in the data set, $k$ is the number of orients and $r$ is the number of runs (the orientings being combined). Under the condition $k \ll n$, the complexity of HGPA and MCLA is linear in the size of the data set, so these two algorithms are well suited to large data sets [11]. In the CPBE algorithm we propose, the class-attribute part adopts the HGPA algorithm and the numerical-attribute part adopts the commonly used k-means algorithm, so the algorithm complexity is $O(nkr)$. The complexity of CPBE therefore grows linearly with the number of data items $n$. In addition, because the orient fusion algorithm supports distributed computing, the CPBE algorithm based on orienting ensemble can also be computed in a distributed way. Therefore, the CPBE algorithm is practical and effective for massive data sets [12-13].

V. ANALYSIS OF PRESCHOOL LINGUISTIC MARKET CUSTOMER RELATIONSHIP

The preschool linguistic market credit card approval data in customer relationship management (CRM) include six numerical attributes and nine class attributes, 690 items in total. The 15 attributes are: customer name, customer number, ID number, gender, age, domicile of origin, employment status, income, industry, vocation, education, nature of the work unit, customer registration, customer classification and accumulated points. The product table mainly includes product type, name, details, customer profitability, corporate earnings and other information. The customer-product association table links customers with the agent financial products they purchase and mainly includes the following attributes: month, customer number, customer name, type, product name, monthly balance and average monthly balance. After removing the 24 items with missing data, 666 data items were used.

Figure 2 compares the accuracy of the two orienting algorithms under different numbers of orients: the number of orients ranges from 2 to 8, and the accuracy of the two algorithms is compared for each fixed number. The average accuracy of k-prototypes is 0.77, and the average accuracy of CPBE is …

Fig 2. Comparison of the Two Algorithms

The credit card authorization data are used mainly to compare the accuracy of the two algorithms, in order to verify orienting performance. On the same data set, with the number of orients fixed at 2, each algorithm was run eight times at random; the average accuracies of the two algorithms are compared in Table 2.

TABLE 2. COMPARISON OF THE AVERAGE ACCURACY OF THE TWO ALGORITHMS
Algorithm       Average accuracy    First    Second    Third    Fourth    Fifth    Sixth
K-Prototypes
CPBE

The accuracy of the CPBE orienting algorithm is relatively high. Over the eight trials, the accuracy of CPBE changed little: the difference between the highest and lowest values is 0.04.

By contrast, the accuracy of the k-prototypes algorithm changed a lot: the difference between its highest and lowest values is 0.15. The reason is that the k-prototypes algorithm is strongly affected by the selection of initial prototypes. Because it combines multiple orientings in an ensemble, the CPBE algorithm's orienting performance is relatively high and stable [14-16].

This project adopts the orienting method for customer segmentation, which provides a complete solution for postal financial marketing customer segmentation, and verifies its feasibility with an actual case. Here we orient customers according to the selected orient attributes. The orienting algorithm is implemented as follows:
1) select the value of K (this paper uses 12);
2) select the initial centroids as the orient centres;
3) read each record, calculate its distance to each orient centre, assign it to the nearest orient centre and then update that orient centre; repeat until the last record has been processed;
4) repeat step 3 until the orient centres no longer change.

Orienting results: after orienting, we analysed the customer classes and merged orients with similar properties, eventually obtaining 7 orients.

For Orient 1 customers, marketing staff could focus on insurance and wealth management product sales. This group consists of high-end customers aged about 54 who pay attention to their own reimbursement, have a certain financial management awareness, and are currently the main purchasing power for fund products.

Orient 2 consists of elderly customers; compared with other elderly customers, this group is high-end and owns relatively more property. If marketing staff intend to guide them toward financial products, participating (bonus-sharing) insurance is the most appropriate.
Orient 3 consists of rational high-end customers who are general depositors; they can be led to buy some wealth management and insurance products, but the difficulty is relatively high.

Orient 4 is the main purchasing power for insurance products at the network outlets. However, the data show that insurance purchases in this group are somewhat random, that is, marketing success depends partly on luck. In terms of age structure, this group is relatively young, mainly aged 40 to 44.

Orient 5 consists of people over 50 who prefer savings, but suitable insurance can be recommended to them.

Orient 6 consists of people with assets under 2 w (20,000) who focus on savings; they could be guided toward wealth management, such as fixed investment in funds.

Orient 7 consists of lost customers.

Statistics on the distribution of these seven kinds of customers show that 88.3% of customers fall in Orients 3, 4, 5 and 6, with fewer high-end customers (Orients 1 and 2). For daily marketing at outlets, we suggest focusing on Orients 3, 4, 5 and 6 and introducing specific marketing strategies for them. For high-end customers, the emphasis should be on targeted marketing of the major customers at each outlet.

An initial analysis of customer data is conducted first, followed by preprocessing of the agent financial product data, taking into account both customers' differences in product purchases and their profit contribution to branch lines, in order to classify financial customers and provide them with differentiated financial products and services. For further application of big data, UnionPay card data are matched with customers to understand their daily consumption behaviour and discover potential major customers. Customer information is then matched to the outlets and distributed, so that each outlet's marketing team can adopt targeted strategies to expand its customer base on the basis of the data analysis.

VI. SUMMARY AND CONCLUSIONS

In the real world, especially in business, data contain a large number of mixed-attribute variations.
This paper introduced the orienting ensemble method to mixed preschool linguistic education problems, put forward the mixed preschool linguistic education algorithm framework, determined the objective function based on orienting ensemble, and finally verified the process on test data and on customer relationship management (CRM) data. The experiments showed that the CPBE algorithm improves on k-prototypes in terms of accuracy and stability. Further research directions include: i) incorporating fuzzy theory to further improve the algorithm's data processing ability and accuracy; ii) studying parallel algorithms to exploit the advantages of distributed computing.

ACKNOWLEDGMENT

This research is supported by the Education Science Fund Project of Hubei Education Department (2014B554) and the University Key Project of Hanjiang Normal University ( ).

REFERENCES

[1] HAN Jiawei, Kamber M. Data Mining: Concepts and Techniques [M]. San Francisco: Morgan Kaufmann Publishers,
[2] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations [C]// Proc 5th Berkeley Symp on Math Statist. Berkeley: ACM Press, 2003:
[3] HUANG Zhexue. Extensions to the K-Means Algorithm for Clustering Large Data Sets with Categorical Values [J]. Data Mining and Knowledge Discovery, 1998, (2):
[4] CHEN Ning, CHEN An, ZHOU Longxiang. Fuzzy K-Prototypes Algorithm for Clustering Mixed Numeric and Categorical Valued Data [J]. Journal of Software, 2001, 12(8):
[5] Strehl A, Ghosh J. Cluster Ensembles: A Knowledge Reuse Framework for Combining Partitions [J]. Journal of Machine Learning Research, 2002, (3):
[6] Topchy A, Jain A, Punch W. A Mixture Model for Clustering Ensembles [C]// Proc SIAM Data Mining. Florida: SIAM Press,
[7] YANG Linyun, WANG Wenyuan. Clustering Ensemble Approaches: An Overview [J]. Application Research of Computers, 2005, (12):
[8] HE Zengyou, XU Xiaofei, DENG Shengchun. A Cluster Ensemble Method for Clustering Categorical Data [J]. Information Fusion, 2005, (6):
[9] Barbara D, Chen P. Using the Fractal Dimension to Cluster Datasets [C]// Proc. Int'l Conf. on Knowledge Discovery and Data Mining. 2000: 260-264.
[10] Abrahao B, Barbara D, Almeida V, Ribeiro F. Fractal Characterization of Web Workloads [Z]. WWW2002 Conference, Web Engineering Track, Honolulu, Hawaii, 2002.
[11] Barry R L, Kinsner W. Multifractal Characterization for Classification of Network Traffic [C]// Canadian Conference on Electrical and Computer Engineering, 2004, Volume 3. 2004: 1453-1457.
[12] Bouchaud J P, Potters M. Theory of Financial Risks [M]. Cambridge University Press,
[13] Kohli R. Managing Customer Relationships Through E-Business Decision Support Applications: A Case of Hospital-Physician Collaboration [J]. Decision Support Systems, 2001, 32(3):
[14] HAN Jiawei, KAMBER Micheline. Concept of Data Mining and Technology (photocopy edition) [M]. Beijing: Higher Education Press,
[15] BVLGAR. Customer Relationship Management (CRM) Solutions [M]. China Economic Publishing House,
[16] MA Gang, LI Hongxin, YANG Xingkai. Customer Relationship Management (CRM) [M]. CITIC Publishing House, 2005.