Port Customer Credit Risk Prediction Based on Internal and External Information Fusion


The Open Cybernetics & Systemics Journal, 2015, 9, Open Access

Guo Yi 1,*, Huang Le 1 and Lu Ziqiang 2

1 School of Economics and Management, Beijing Jiaotong University, Beijing, P.R. China; 2 Electronic Engineering Institution of PLA, Anhui, Hefei, P.R. China

Abstract: To address information asymmetry, subjectivity and other problems in predicting the credit risk level of port customers, a port customer credit risk evaluation system was designed, which provides the quantification base for port customer credit risk prediction. An external-internal information fusion model is proposed for port customer credit risk prediction. Customers' internal history information and relevant external dynamic information are used as information sources. The prediction model fuses external and internal information to predict customers' credit risk level, and therefore helps port enterprises estimate that level more accurately. The top 100 coal industry customers were selected for the experiment, and their operation data were extracted from the Guangzhou Port group operation and management system. Relevant external information was crawled for the experiments correspondingly. Experiment results show that the port customer credit risk prediction model based on external-internal information fusion has better prediction accuracy than non-information-fusion models.

Keywords: Back propagation neural network, credit risk prediction, information fusion, port.

1. INTRODUCTION

Ports are important hubs of the national comprehensive transportation system and of international logistics. With the growing trend of global trade and the rapid development of the Chinese economy, ports serve more customers correspondingly. To retain major and long-term customers, ports often offer postpaid settlement as a convenience. Although this credit-dependent settlement provides convenience for business activities, it increases the potential risk in daily operation.
As a result of a lack of honesty and an imperfect credit system, customers default on operation charges when they are suffering from poor financial status. This situation often involves debts of hundreds of thousands of Yuan or even more, which seriously affects the normal operation of the port. Owing to the serious information asymmetry between ports and customers, it is hard to find potential customer credit risk from internal data and information. Therefore, predicting customer credit risk more accurately and earlier has become a great challenge for ports.

The traditional method depends on personal judgment based on managers' experience and social sources: managers judge a customer's management status and then estimate the customer's potential risks. Some scholars have also explored port customer credit risk prediction. Mei [1] proposed a port customer credit rating evaluation model, which established dynamic customer credit files by handling customers' finance information. Chu [2] chose willingness to pay, payment history, financial statements, financial strength and collateral as credit evaluation factors to evaluate customers' credit status. Wang [3] proposed a port customer classification system based on actual port operation and classified port customers by using a BP neural network, which advanced quantitative research in this area.

However, in research on port customer credit risk prediction, most of the current literature has focused on building credit evaluation systems or selecting risk control methods; only a few studies have explored quantitative research. Existing quantitative research mainly borrows customer risk prediction methods from the banking or insurance industries, which use customers' financial information to predict their credit risks. But unlike the banking and insurance industries, ports are at a disadvantage in the information asymmetry. It is difficult for ports to obtain comprehensive information about customers, so customers may use unilateral information to mislead ports.
Once financial trouble occurs, malicious concealment or manipulation of information for the customer's own benefit can hardly be avoided. Ports' prediction results are then misled by false information, which increases the potential operation risks. Moreover, the port industry has its own characteristics, and other non-financial information also has important impacts. Ports cannot rely only on financial information to predict customers' credit risks, or they fall into another kind of unilateral prediction. Using external information can reduce the asymmetry of customer information for ports and, meanwhile, increase the accuracy and robustness of port customer credit risk prediction.

Accordingly, this paper first designs a port customer credit risk evaluation system, which quantitatively evaluates the level of port customer credit. On this basis, a multisource external-internal information fusion model is proposed, which predicts the port customer credit risk level by fusing internal business and operation data with external related news articles and industry indexes. Actual business and operation data extracted from the Guangzhou Port group operation and management system, together with corresponding external news and industry indexes that were crawled, are used in the experiments as data sources.

2. PORT CUSTOMER CREDIT EVALUATION SYSTEM

Unlike the banking and insurance industries, the port industry does not have a complete and systematic customer credit risk evaluation system to evaluate a customer's credit risk level. Evaluation is often based on personal subjective judgment, which is not conducive to risk control. To evaluate the credit level of port customers and establish the basis of port customer credit risk prediction, the port customer credit evaluation system should be designed first. The system should be based on the actual situation of the port industry. With an objective quantification of port customer credit, the port can find customers with credit risk earlier, and can thus reduce or eliminate the potential risk and possible loss. To ensure that the port customer credit evaluation system provides an objective and scientific basis for the evaluation, the selection of evaluation indicators should follow the principles below:

1. Scientific Principle: Being scientific is the most fundamental principle of an evaluation system. The port customer credit evaluation system must follow the general laws of port operation and objectively reflect the actual level of port customer credit.

2. Purposeful Principle: The establishment of an evaluation system should be purposeful. The purposes of this evaluation system are evaluating the level of port customers' risk and finding potential or existing credit risk, thus helping the port avoid operation risk and possible loss.

3. Systematic Principle: All influential factors should be taken into account in an effective evaluation system, and these factors should be integrated systematically. When designing the port customer credit evaluation system, the constructor needs a comprehensive analysis of the characteristics of both customers and port operation, and then extracts the key indicators that effectively affect customer credit evaluation.

4. Feasibility Principle: The feasibility principle guarantees that the evaluation system can be applied. The system should find a balance between comprehensiveness and the objective conditions of practical application. Following this principle ensures that all the data the evaluation system needs can be collected and processed.

5. Pertinence Principle: Pertinence is a momentous principle in constructing an industry evaluation system. There are extensive and significant differences between different industries and different enterprises. An excessively generalized evaluation system can hardly evaluate port customer credit accurately. Therefore, the port customer credit evaluation system needs to select representative factors according to the characteristics of port customers and operation.

Based on the above principles, financial situation, credit history, cooperative relationship and customer characteristic are selected as the first grade indicators of the port customer credit evaluation system. These 4 factors cover the main concerns of port customer credit risk evaluation. Furthermore, they are divided into 13 second grade indicators. The structure of the port customer credit evaluation system is shown in Fig. (1).

Fig. (1). Port customer credit risk evaluation system structure.

The main purpose of this evaluation system is to evaluate the level of port customers' credit risk, so the target layer of the port customer credit evaluation system is customer credit. The first grade indicators include financial situation, credit history, cooperative relationship and customer characteristic.
Financial situation mainly describes customers' financial ability and contains 3 second grade indicators: registered capital, trading volume and maximum individual trades. Credit history represents the past credit situation of the customer and is divided into accounts receivable, receivable ratio and overdue days. Cooperative relationship consists of cooperation period, stevedore quantity, purchase frequency and ratio of contract execution. Customer characteristic is subdivided into turnover rate of cargo, main trading cargo categories and commonly used operation mode.

To combine quantitative and qualitative methods in support of decision-making, and to weight each indicator's effect on port customer credit evaluation, the Analytic Hierarchy Process (AHP) is used. The AHP method was proposed in the 1970s [4] and is a multiple objective decision method that combines qualitative and quantitative methods. By dividing a complex problem into key factors in different hierarchies and constructing the corresponding judgment matrix, the AHP method can calculate the weights of indicators [5]. In this paper, 7 relevant domain experts and port managers were invited to give their quantitative opinions on each indicator's weight. Finally, the indicator weights in the port customer credit evaluation system were calculated with the AHP algorithm and are listed in Table 1. Both designing the port customer credit evaluation system and weighting each indicator are bases of port customer credit evaluation.

Table 1. Indicator weights of the port customer credit risk evaluation system.

First grade indicator | Weight | Second grade indicator | Weight
Financial situation | | Registered capital |
 | | Trading volume |
 | | Maximum individual trades |
Credit history | | Accounts receivable |
 | | Receivable ratio |
 | | Overdue days |
Cooperative relationship | | Cooperation period |
 | | Stevedore quantity |
 | | Purchase frequency |
 | | Ratio of contract execution |
Customer characteristic | | Turnover rate of cargo |
 | | Main trading cargo categories |
 | | Commonly used operation mode |

3. MULTISOURCE EXTERNAL-INTERNAL INFORMATION FUSION MODEL

3.1. Multisource External-Internal Information Fusion

The procedure of multisource external-internal information fusion is shown in Fig. (2). All information is collected from two main sources: internal history data and external dynamic data. Internal history data include financial data, operation data, credit history data and customer characteristic data stored in management or operation systems. External dynamic data consist of both relevant industry indexes and news articles.

Fig. (2). Multisource external-internal information fusion procedure.

3.1.1. Internal History Data Preprocessing

Internal history data are mainly extracted from the port management system and operation system. In preprocessing, the data on each customer's registered capital, trading volume, accounts receivable, overdue days, cooperation period, stevedore quantity, purchase frequency, contract requirements, cargo balance, trading cargo categories and operation mode need to be extracted.
Receivable ratio, ratio of contract execution and turnover rate of cargo are then calculated, sorted chronologically and merged with the basic data correspondingly.

3.1.2. External Dynamic Data Preprocessing

External dynamic data consist of relevant industry indexes and news articles. Most relevant industry indexes are structured data, which can be preprocessed in the same way as internal data. News articles, however, are usually unstructured or semi-structured, and it is difficult to fuse structured and unstructured data directly. News articles should therefore be transformed into structured data in preprocessing, so that the structured news can be fused with the other structured data. The main steps of news article processing are listed below:

1. Chinese segmentation. Using the vector space model (VSM) is a common way to structure text, and the VSM method is based on words or phrases. Unlike languages written in Latin script, which have the space as a natural separator, a Chinese article needs a segmentation method to divide the text into a combination of words and phrases. Then the article can be converted to vector form. In addition, there are proper nouns in the port and related industry domains that should be considered, so a domain-relevant dictionary is employed for segmentation.

2. Weighting. Term Frequency-Inverse Document Frequency (TF-IDF) was proposed by Salton in 1973 [6]. Because of its simplicity and its high accuracy and recall rates, it is often used in text weighting [7]. After the weight calculation, each news article can be transformed into vector form.

3. Feature selection. Too many dispersed features do not help further prediction. On the contrary, they increase computing cost and reduce generalization ability. Feature selection can effectively reduce the dimension of the news article vector, so that the prediction model focuses on significant factors. Selecting the top 10% or a fixed number of the highest-frequency words and phrases [8, 9] are common methods. In this paper, the top 100 highest-frequency words or phrases are selected as features.

4. Structured news. After feature selection, the original high-dimensional vector has been transformed into a 100-dimension vector by dimension reduction.
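The weighting and feature selection steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the articles are taken to be already segmented into token lists (the English tokens stand in for Chinese words), the function names `select_features` and `tfidf_vectors` are hypothetical, and a plain `tf * log(N / df)` weighting is used because the paper does not state which TF-IDF variant was applied. Selecting the features first and then weighting only those columns gives the same values as weighting every term and discarding the rest.

```python
import math
from collections import Counter

def select_features(docs, k):
    """Return the k most frequent terms across all tokenized documents."""
    counts = Counter(term for doc in docs for term in doc)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def tfidf_vectors(docs, features):
    """Build one TF-IDF vector per document, restricted to the selected features."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each feature.
    df = {t: sum(1 for doc in docs if t in doc) for t in features}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([
            (tf[t] / length) * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in features
        ])
    return vectors

# Toy corpus of three segmented articles (illustrative data, not from the paper).
docs = [
    ["coal", "price", "rise", "coal"],
    ["coal", "port", "strike"],
    ["port", "throughput", "rise"],
]
features = select_features(docs, k=3)   # the paper uses k = 100
vectors = tfidf_vectors(docs, features)
```

With `k = 100`, each article would become the 100-dimension vector described in step 4.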
At this point the news text processing is complete, and the news text vectors can be fused with other structured data.

3.1.3. Normalization

Both internal history data and external dynamic data are ready for model training and prediction after preprocessing. However, features with different dimensions are not directly comparable and differ greatly in magnitude, which strongly affects training and prediction, so the data need to be normalized further. We use a linear algorithm to normalize all data by Equation (1). We denote the i-th feature as $f^i$; then $f_k^i$ represents the k-th value of the i-th feature, and $n(f_k^i)$ indicates its normalized value. The range of $n(f_k^i)$ after normalization is [0, 1].

$$n(f_k^i) = \frac{f_k^i - \min(f^i)}{\max(f^i) - \min(f^i)} \qquad (1)$$

3.1.4. Aligning Internal and External Data

Aligning internal and external data is based on the assumption that every emerging news article means that something has changed for the relevant customers or industries, and that these changes take effect in sequence. It is therefore necessary to align internal and external data before fusion, which helps eliminate the effects of irrelevant information. Both model training and prediction are thus simplified, which helps the model meet the needs of practical application.

3.2. Port Customer Credit Risk Prediction Model

Estimation theory, uncertainty reasoning and intelligent fusion are basic methods in information fusion. Because intelligent fusion has better adaptive ability and robustness, it has received extensive attention in both academia and industry. The BP neural network is often used in intelligent fusion for its good performance in nonlinear fitting, parallel processing and stability. In the traditional intelligent fusion process, structured data can be fused by a BP neural network [10, 11]. Through vectorization, news articles are transformed into structured data in vector form, so they can be fused with structured data as the inputs of the BP neural network. The port customer credit risk prediction model is shown in Fig. (3), in which N indicates the number of neurons.
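Before a network of this kind can be trained, each record has to be normalized with Equation (1) and joined into a single input vector. The sketch below shows one possible way to do that in plain Python. The helper names (`min_max_normalize`, `normalize_columns`, `fuse`), the toy feature counts and the handling of constant features are all assumptions made for illustration; the paper does not describe its implementation. Normalizing column by column after concatenation is equivalent to normalizing each feature before fusion, since Equation (1) is applied per feature.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] as in Equation (1)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: Equation (1) is undefined, so map to 0.0 (an assumption).
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def normalize_columns(rows):
    """Normalize every feature (column) of a list of equal-length records."""
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [list(rec) for rec in zip(*columns)]

def fuse(internal, index, news):
    """Concatenate aligned internal, index and news features into one input vector."""
    return list(internal) + list(index) + list(news)

# Toy aligned records: 2 internal indicators, 1 industry index, 3 news features each.
# These sizes are hypothetical and much smaller than the paper's input vector.
internal = [[100.0, 5.0], [300.0, 15.0], [200.0, 10.0]]
index = [[50.0], [70.0], [60.0]]
news = [[0.0, 0.2, 0.1], [0.4, 0.0, 0.1], [0.2, 0.1, 0.1]]

raw = [fuse(a, b, c) for a, b, c in zip(internal, index, news)]
inputs = normalize_columns(raw)
```

Each row of `inputs` is then one fused, normalized record that could be fed to the input layer.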
According to the design of the port customer risk evaluation system and the external data structure, there are 114 neurons in the input layer and 1 neuron in the output layer of this model, which outputs the prediction result. The hidden layer is the key to mapping inputs to outputs. Hidden layer neurons fit the mapping pattern by constantly adjusting their own weights. The number of hidden layer neurons affects the accuracy of the result: too few hidden layer neurons cannot fit the mapping pattern completely, but too many can lead to overfitting. There is no fully satisfactory method for choosing the number of hidden layer neurons. On the basis of the Kolmogorov theorem, any continuous function $f: [0,1]^n \to \mathbb{R}^m$, $y = f(x)$, can be simulated with a three-layer BP neural network, so only one hidden layer needs to be built. The number of hidden layer neurons can be calculated by the empirical formula in Equation (2), where $n_h$ indicates the number of hidden layer neurons, $n$ the number of input layer neurons, $m$ the number of output layer neurons, and $a$ is an integer constant between 1 and 10. We found the minimum error in our experiments when $a = 8$, so Equation (2) gives $n_h = \sqrt{114 + 1} + 8 \approx 18.7$ hidden layer neurons before rounding to an integer.

$$n_h = \sqrt{n + m} + a \qquad (2)$$

Fig. (3). Port customer credit risk prediction model based on external-internal information fusion.

4. EXPERIMENTAL RESULTS AND DISCUSSIONS

4.1. Data Sets

Both internal history data and external dynamic data serve as the data sets of the experiments.

1. Internal history data. The internal history data were extracted from the Guangzhou Port group operation and management system. The internal data set contains the history trading and operation data of the top 100 coal industry customers in Guangzhou Port between 2011 and 2012, with 54,315 records, and 2,500 merged records after preprocessing.

2. External dynamic data. The coal industry indexes between 2011 and 2012 were extracted from China Energy Net (accessed 2015/01/10). More than 200 relevant news articles were crawled and preprocessed as vectors.

There are 23,800 records in total after aligning internal and external data. To evaluate the performance of our system, the whole data set is split at June 2012 and at September 2012, respectively, forming two training/testing sets: 1) 18/6 months and 2) 21/3 months. The numbers of records in each set are listed in Table 2.

Table 2. Scale of training set/test set in data sets.

 | Scale of Training Set | Scale of Test Set
Data set 1 | 16,600 | 7,200
Data set 2 | 20,400 | 3,400

4.2. Results and Findings

We measured the accuracy of the model by the Mean Squared Error (MSE), Equation (3), in our experiments.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2 \qquad (3)$$

The MSE results are listed in Table 3. Compared with the non-information-fusion models, which use only internal data or only external data, the internal-external information fusion model has a smaller MSE on all data sets and better fitting performance. This shows that the internal-external information fusion model is more accurate in port customer credit risk prediction. The prediction results on data set 2 are more accurate than those on data set 1 for all three models, which indicates that more adequate data helps train a better-fitting model and thus improves prediction accuracy. In practice, this means that collecting more relevant information contributes to predicting customers' credit risk level. Besides, the experiment results also show that the model using only external data has the worst prediction accuracy on both data sets.
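The error measure of Equation (3) behind these comparisons can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name `weighted_mse` is hypothetical, and because the text does not define the weights $w_i$, the sketch assumes unit weights when none are supplied, which reduces Equation (3) to the ordinary MSE.

```python
def weighted_mse(actual, predicted, weights=None):
    """Mean of w_i * (y_i - y_hat_i)^2 over all records, as in Equation (3)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if weights is None:
        # Assumption: unit weights reduce Equation (3) to the ordinary MSE.
        weights = [1.0] * len(actual)
    total = sum(w * (y - p) ** 2 for w, y, p in zip(weights, actual, predicted))
    return total / len(actual)

# Toy credit risk levels and predictions (illustrative values, not from the paper).
actual = [0.0, 0.5, 1.0, 0.5]
predicted = [0.0, 0.5, 0.5, 1.0]
error = weighted_mse(actual, predicted)
```

A lower value means the predicted risk levels lie closer to the evaluated ones, which is the sense in which the models are ranked here.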
The poor accuracy of the external-only model indicates that relying on external information while neglecting internal information is also one-sided, because of the effects of different characteristics between industries. It also shows that the port customer credit evaluation system is essential.

Table 3. MSE results of port customer credit risk prediction from different models.

 | Information Fusion Based Model | External Information Based Model | Internal Information Based Model
Data set 1 | E | E | E-04
Data set 2 | E | E | E-06

5. CONCLUSION

Quantitative research on port customer credit risk prediction is insufficient. In this paper, we have designed a port customer credit evaluation system as the basis of port customer credit risk prediction. To get rid of the subjectivity and one-sidedness of the traditional manual approach, we have proposed a port customer credit risk prediction model based on external-internal information fusion. We extracted actual operation and business data from the Guangzhou Port group operation and management system, and crawled relevant news articles and coal industry indexes for the same period. Results of experiments show that the internal-external information fusion model has better accuracy in port customer credit prediction, which supports port decision-making in perceiving and avoiding potential customer credit risk earlier.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflict of interest.

ACKNOWLEDGEMENTS

This research is supported by the Natural Science Foundation of China under Grant Nos.

REFERENCES

[1] Y. Mei, "Port income risk control method based on accounting management", Finance and Accounting for Communications, vol. 12.
[2] S. X. Chu, "Prevention and control of financial risk in port enterprises", China Ports, vol. 3.
[3] R. Wang, "Research on Port Freight Operational Risk Control Based on BP Neural Network", M.S. thesis, Beijing Jiaotong University, China.
[4] W. Yoram, and T. L. Saaty, "Marketing applications of the analytic hierarchy process", Management Sciences, vol. 26, no. 7.
[5] J. Y. Guo, Z. B. Zhang, and Q. Y. Sun, "Study and applications of analytic hierarchy process", China Safety Science Journal, vol. 18, no. 5.
[6] G. Salton, and C. T. Yu, "On the construction of effective vocabularies for information retrieval", In: Proceedings of the 1973 Meeting on Programming Languages and Information Retrieval, New York, 1973.
[7] C. Y. Shi, C. J. Xu, and X. J. Yang, "Research of TF-IDF algorithm", Journal of Computer Applications, vol. 29, no. B06.
[8] R. Feldman, and S. James, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge: UK, 2007.
[9] X. D. Li, X. D. Huang, X. T. Deng, and S. F. Zhu, "Enhancing quantitative intra-day stock return prediction by integrating both market news and stock prices information", Neurocomputing, vol. 142.
[10] B. Xie, Z. C. Cai, and W. B. Song, "Study on the application of information fusion in fault diagnosis of aero engine", Science and Technology of West China, vol. 11, no. 2.
[11] E. F. Nakamura, A. A. F. Loureiro, and A. C. Frery, "Information fusion for wireless sensor networks: methods, models, and classifications", ACM Computing Surveys, vol. 39, no. 3, p. 9.

Received: June 10, 2015. Revised: July 29, 2015. Accepted: August 15, 2015.

Yi et al.; Licensee Bentham Open. This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License ( licenses/by-nc/3.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.