Report for PAKDD 2007 Data Mining Competition

Li Guoliang
School of Computing, National University of Singapore
April, 2007

Abstract

The task in the PAKDD 2007 data mining competition is a cross-selling business problem from the consumer finance industry. The objective is to predict the propensity of credit card customers to apply for a home loan. The training data contains cases, 40 features, and 1 class label variable. After data pre-processing, we identified significant features with an information-gain-based feature selection method, such as the number of Bureau enquiries for mortgages and loans, age, and months at the current residence. We selected the top 14 features and applied logistic regression to the training data. Compared with the other data mining methods we tried, logistic regression achieved the best AUC score in 10-fold cross-validation. With this configuration, we built our final classifier on the whole training data. The AUC score is about We expect similar performance on the withheld data.

1 Introduction

The PAKDD 2007 Data Mining Competition [1] is one of the events in the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2007). The background of the competition is that a financial institution has two groups of customers: one with credit cards and another with home loans. The institution would like to identify which customers are likely to apply for a home loan within 12 months after they apply for a credit card. The data for this task is the information available when the customers apply for a credit card. The main task in the competition is to predict the propensity of credit card customers who do not have a home loan to apply for one.

2 Data summary

There are cases in the provided data, in which have known class labels and 8000 have their class labels held out as the hidden test cases.
Among the cases with known class labels, 700 customers applied for a home loan within 12 months after the credit card application. We refer to these 700 cases as positive data in this report. The information is summarized in Table 1. In the data set, there are forty-one features, excluding the customer IDs, which cannot be used in prediction and model building. The target variable, or class label, indicates whether the customer has applied for a home loan. The other features are demographic data, credit information, and other related data. One feature, B_DEF_UNPD_L12M, only has the (known) value 0. This feature is removed from the data before further analysis.

In the remaining data, including both the training and the withheld cases, there are missing values (excluding the withheld class labels). The number of features with missing values is 9 for the entire data. The names of the features with missing values are listed in Table 2. DISP_INCOME_CODE is the feature with the most missing values (about 71%).

Table 1 Summary of the available data

Class label         Number of cases
Positive (known)    700
Negative (known)
Held-out            8000
Total

Table 2 Names of features with missing values

CHQ_ACCT_IND
AMEX_CARD
DINERS_CARD
VISA_CARD
MASTERCARD
RETAIL_CARDS
DISP_INCOME_CODE
CUSTOMER_SEGMENT
CREDIT_CARD_TYPE
ANNUAL_INCOME_RANGE

3 Experiment

In this work, we built models to predict the class labels in the withheld data set. The data mining package Weka [4] is used as the main tool for data preprocessing, feature selection, model building, and prediction. To determine which data mining method performs best for this application, the training data is randomly split into 10 folds for cross-validation.

3.1 Invalid values of nominal features

In the data, some nominal features have invalid values. For example, feature MASTERCARD has invalid values of 1 and 2. We replaced these invalid values with the mode of the corresponding feature.

3.2 Discretization

To apply certain data mining methods, we need to discretize the numeric data into nominal data. The discretization method we used is the one proposed by Fayyad and Irani [2]. This method selects the cut point in the current data set with maximum information gain and splits the data set into two subsets. The method is then applied recursively to each subset until the Minimum-Description-Length-based stopping criterion is reached.
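As an illustration of the discretization in Section 3.2, the following is a minimal Python sketch, not the Weka implementation used in this work. It assumes the standard Fayyad and Irani formulation of the MDL stopping test, and the function names (entropy, best_cut, mdl_accepts, discretize) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, information_gain) for the best binary split,
    or None if no cut between two distinct values exists."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut must fall between two distinct values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if best is None or gain > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


def mdl_accepts(values, labels, cut, gain):
    """MDL stopping test (assumed formulation): accept the cut only if
    the information gain exceeds the cost of encoding the split."""
    n = len(labels)
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def discretize(values, labels):
    """Recursively collect the accepted cut points in ascending order."""
    found = best_cut(values, labels)
    if found is None or not mdl_accepts(values, labels, *found):
        return []
    cut, _ = found
    lo = [(x, y) for x, y in zip(values, labels) if x <= cut]
    hi = [(x, y) for x, y in zip(values, labels) if x > cut]
    return discretize(*zip(*lo)) + [cut] + discretize(*zip(*hi))
```

Calling discretize on a numeric feature and its class labels returns the list of cut points; an empty list means the stopping test rejected every split, so the feature would be mapped to a single interval.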

3.3 Missing value processing

Some data mining methods can deal with missing values, while others cannot. In this work, missing values are replaced with the mean for numeric features and the mode for nominal features.

3.4 Feature selection

In our experiment, we tried three feature selection methods: chi-square, information gain, and gain ratio. Each method measures the strength of each feature for classification using the criterion in its name and ranks the features by that measure. The top 14 features ranked by information gain are listed in Table 3.

Table 3 Top 14 selected features

Order   Information Gain
1       B_ENQ_L6M_GR3
2       B_ENQ_L12M_GR3
3       CURR_RES_MTHS
4       AGE_AT_APPLICATION
5       PREV_RES_MTHS
6       B_ENQ_L3M
7       B_ENQ_L12M_GR2
8       B_ENQ_L6M
9       B_ENQ_L1M
10      RENT_BUY_CODE
11      MARITAL_STATUS
12      ANNUAL_INCOME_RANGE
13      A_DISTRICT_APPLICANT
14      SAV_ACCT_IND

3.5 Data mining methods applied

In our experiment, we tried different data mining methods. After comparison, we selected logistic regression as our final method. Logistic regression [3] is a non-linear transformation of linear regression that applies when the dependent variable is discrete.

3.6 Summary of experiment

The data preprocessing and model building process is summarized in Figure 1. The main evaluation measures we used are ROC and AUC (ROC refers to the receiver operating characteristic and AUC to the area under the ROC curve). Our results show that logistic regression gives the best AUC score. Figure 2 shows the ROC curve from the training data.
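For illustration, AUC can be computed directly from classifier scores as the fraction of positive-negative pairs that are ranked correctly. The sketch below is a plain Python version of that pairwise definition, not the Weka routine used in this work; the function name auc and the 0/1 label encoding are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive case is scored above a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie contributes half a correct pair
    return wins / (len(pos) * len(neg))
```

The double loop costs one comparison per positive-negative pair, which is acceptable for a sketch; a rank-based computation would be preferable on large data. A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.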

Figure 1 The procedure to process the data (steps in order: value merging, discretization, missing value replacement, feature selection, model building)

Figure 2 ROC curve from logistic regression model

4 Discussion

After analyzing the data, we have the following insights from the PAKDD 2007 competition data:

1. The most significant features for this cross-selling problem are B_ENQ_L6M_GR3, B_ENQ_L12M_GR3, and other Bureau enquiry information. These features show the customers' interest in mortgages. The customers may have contacted different financial institutions, and those institutions in turn made more enquiries to the Bureau.

2. The residence history is an important indicator for a home loan. If customers rent a flat, they are more likely to apply for a home loan. If they have stayed at one residence for a long time, they are likely to move to a new home.

3. Age and marital status affect the home loan application. These are the main factors indicating whether the customers need a flat.

4. The income level is another main factor for a home loan. It indicates whether the customers can afford a home loan.

References

[1] PAKDD 2007 Data Mining Competition, in: (2007).
[2] U.M. Fayyad, K.B. Irani, Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, in: Proceedings of 13th International Joint Conference on Artificial Intelligence (IJCAI) (1993).
[3] D.W. Hosmer, S. Lemeshow, Applied logistic regression (John Wiley & Sons, New York, 2000).
[4] I.H. Witten, E. Frank, Data mining: practical machine learning tools and techniques with Java implementations (Morgan Kaufmann, San Francisco, 1999).

Ranking Potential Customers based on GroupEnsemble method

Ranking Potential Customers based on GroupEnsemble method Ranking Potential Customers based on GroupEnsemble method The ExceedTech Team South China University Of Technology 1. Background understanding Both of the products have been on the market for many years,

More information

DATA MINING: A BRIEF INTRODUCTION

DATA MINING: A BRIEF INTRODUCTION DATA MINING: A BRIEF INTRODUCTION Matthew N. O. Sadiku, Adebowale E. Shadare Sarhan M. Musa Roy G. Perry College of Engineering, Prairie View A&M University Prairie View, USA Abstract Data mining may be

More information

Predicting Customer Loyalty Using Data Mining Techniques

Predicting Customer Loyalty Using Data Mining Techniques Predicting Customer Loyalty Using Data Mining Techniques Simret Solomon University of South Africa (UNISA), Addis Ababa, Ethiopia simrets2002@yahoo.com Tibebe Beshah School of Information Science, Addis

More information

Advanced Analytics through the credit cycle

Advanced Analytics through the credit cycle Advanced Analytics through the credit cycle Alejandro Correa B. Andrés Gonzalez M. Introduction PRE ORIGINATION Credit Cycle POST ORIGINATION ORIGINATION Pre-Origination Propensity Models What is it?

More information

ScienceDirect. An Efficient CRM-Data Mining Framework for the Prediction of Customer Behaviour

ScienceDirect. An Efficient CRM-Data Mining Framework for the Prediction of Customer Behaviour Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 725 731 International Conference on Information and Communication Technologies (ICICT 2014) An Efficient CRM-Data

More information

RESULT AND DISCUSSION

RESULT AND DISCUSSION 4 Figure 3 shows ROC curve. It plots the probability of false positive (1-specificity) against true positive (sensitivity). The area under the ROC curve (AUR), which ranges from to 1, provides measure

More information

A Churn Analysis Using Data Mining Techniques: Case of Electricity Distribution Company

A Churn Analysis Using Data Mining Techniques: Case of Electricity Distribution Company , October 25-27, 2017, San Francisco, USA A Churn Analysis Using Data Mining Techniques: Case of Electricity Distribution Company Jiri Pribil, Member, IAENG, Michaela Polejova Abstract This paper focuses

More information

A Comparative Study of Filter-based Feature Ranking Techniques

A Comparative Study of Filter-based Feature Ranking Techniques Western Kentucky University From the SelectedWorks of Dr. Huanjing Wang August, 2010 A Comparative Study of Filter-based Feature Ranking Techniques Huanjing Wang, Western Kentucky University Taghi M. Khoshgoftaar,

More information

Advanced Tutorials. SESUG '95 Proceedings GETTING STARTED WITH PROC LOGISTIC

Advanced Tutorials. SESUG '95 Proceedings GETTING STARTED WITH PROC LOGISTIC GETTING STARTED WITH PROC LOGISTIC Andrew H. Karp Sierra Information Services and University of California, Berkeley Extension Division Introduction Logistic Regression is an increasingly popular analytic

More information

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of

More information

Improving Credit Card Fraud Detection using a Meta- Classification Strategy

Improving Credit Card Fraud Detection using a Meta- Classification Strategy Improving Credit Card Fraud Detection using a Meta- Classification Strategy Joseph Pun, Yuri Lawryshyn Department of Applied Chemistry and Engineering, University of Toronto Toronto ABSTRACT One of the

More information

A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values

A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values Australian Journal of Basic and Applied Sciences, 6(7): 312-317, 2012 ISSN 1991-8178 A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values 1 K. Suresh

More information

Validating a Bankruptcy Prediction by Using Naïve Bayesian Network Model: A case from Malaysian Firms

Validating a Bankruptcy Prediction by Using Naïve Bayesian Network Model: A case from Malaysian Firms 2012 International Conference on Economics, Business Innovation IPEDR vol.38 (2012) (2012) IACSIT Press, Singapore Validating a Bankruptcy Prediction by Using Naïve Bayesian Network Model: A case from

More information

Session 15 Business Intelligence: Data Mining and Data Warehousing

Session 15 Business Intelligence: Data Mining and Data Warehousing 15.561 Information Technology Essentials Session 15 Business Intelligence: Data Mining and Data Warehousing Copyright 2005 Chris Dellarocas and Thomas Malone Adapted from Chris Dellarocas, U. Md. Outline

More information

arxiv: v1 [cs.lg] 13 Oct 2016

arxiv: v1 [cs.lg] 13 Oct 2016 Bank Card Usage Prediction Exploiting Geolocation Information Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, and Lars Schmidt-Thieme arxiv:1610.03996v1 [cs.lg] 13 Oct 2016 Information Systems and

More information

GETTING STARTED WITH PROC LOGISTIC

GETTING STARTED WITH PROC LOGISTIC GETTING STARTED WITH PROC LOGISTIC Andrew H. Karp Sierra Information Services and University of California, Berkeley Extension Division Introduction Logistic Regression is an increasingly popular analytic

More information

CONNECTING CORPORATE GOVERNANCE TO COMPANIES PERFORMANCE BY ARTIFICIAL NEURAL NETWORKS

CONNECTING CORPORATE GOVERNANCE TO COMPANIES PERFORMANCE BY ARTIFICIAL NEURAL NETWORKS CONNECTING CORPORATE GOVERNANCE TO COMPANIES PERFORMANCE BY ARTIFICIAL NEURAL NETWORKS Darie MOLDOVAN, PhD * Mircea RUSU, PhD student ** Abstract The objective of this paper is to demonstrate the utility

More information

SPM 8.2. Salford Predictive Modeler

SPM 8.2. Salford Predictive Modeler SPM 8.2 Salford Predictive Modeler SPM 8.2 The SPM Salford Predictive Modeler software suite is a highly accurate and ultra-fast platform for developing predictive, descriptive, and analytical models from

More information

Application of Data Mining Techniques for Crop Productivity Prediction

Application of Data Mining Techniques for Crop Productivity Prediction Application of Data Mining Techniques for Crop Productivity Prediction Zekarias Diriba zekariaskifle@gmail.com Berhanu Borena PhD Candidate, Addis Ababa University, Ethiopia berhanuborena@gmail.com Abstract

More information

GETTING STARTED WITH PROC LOGISTIC

GETTING STARTED WITH PROC LOGISTIC PAPER 255-25 GETTING STARTED WITH PROC LOGISTIC Andrew H. Karp Sierra Information Services, Inc. USA Introduction Logistic Regression is an increasingly popular analytic tool. Used to predict the probability

More information

Improving a Credit Scoring Model by Incorporating Bank Statement Derived Features

Improving a Credit Scoring Model by Incorporating Bank Statement Derived Features Improving a Credit Scoring Model by Incorporating Bank Statement Derived Features Rory P. Bunker 1, M. Asif Naeem 2, Wenjun Zhang 3 Auckland University of Technology 55 Wellesley St E, Auckland 1010, New

More information

KNOWLEDGE ENGINEERING TO AID THE RECRUITMENT PROCESS OF AN INDUSTRY BY IDENTIFYING SUPERIOR SELECTION CRITERIA

KNOWLEDGE ENGINEERING TO AID THE RECRUITMENT PROCESS OF AN INDUSTRY BY IDENTIFYING SUPERIOR SELECTION CRITERIA DOI: 10.21917/ijsc.2011.0022 KNOWLEDGE ENGINEERING TO AID THE RECRUITMENT PROCESS OF AN INDUSTRY BY IDENTIFYING SUPERIOR SELECTION CRITERIA N. Sivaram 1 and K. Ramar 2 1 Department of Computer Science

More information

Predicting Customer Purchase to Improve Bank Marketing Effectiveness

Predicting Customer Purchase to Improve Bank Marketing Effectiveness Business Analytics Using Data Mining (2017 Fall).Fianl Report Predicting Customer Purchase to Improve Bank Marketing Effectiveness Group 6 Sandy Wu Andy Hsu Wei-Zhu Chen Samantha Chien Instructor:Galit

More information

Bank Card Usage Prediction Exploiting Geolocation Information

Bank Card Usage Prediction Exploiting Geolocation Information Bank Card Usage Prediction Exploiting Geolocation Information Martin Wistuba, Nghia Duong-Trung, Nicolas Schilling, and Lars Schmidt-Thieme Information Systems and Machine Learning Lab University of Hildesheim

More information

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat Startup Machine Learning: Bootstrapping a fraud detection system Michael Manapat Stripe @mlmanapat About me: Engineering Manager of the Machine Learning Products Team at Stripe About Stripe: Payments infrastructure

More information

CREDIT RISK MODELLING Using SAS

CREDIT RISK MODELLING Using SAS Basic Modelling Concepts Advance Credit Risk Model Development Scorecard Model Development Credit Risk Regulatory Guidelines 70 HOURS Practical Learning Live Online Classroom Weekends DexLab Certified

More information

Application of Decision Trees in Mining High-Value Credit Card Customers

Application of Decision Trees in Mining High-Value Credit Card Customers Application of Decision Trees in Mining High-Value Credit Card Customers Jian Wang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 8, P.R. China E-mail: gregret24@gmail.com,

More information

Improvement of Association-based Gene Mapping Accuracy by Selecting High Rank Features

Improvement of Association-based Gene Mapping Accuracy by Selecting High Rank Features Improvement of Association-based Gene Mapping Accuracy by Selecting High Rank Features 1 Zahra Mahoor, 2 Mohammad Saraee, 3 Mohammad Davarpanah Jazi 1,2,3 Department of Electrical and Computer Engineering,

More information

Data Mining in CRM THE CRM STRATEGY

Data Mining in CRM THE CRM STRATEGY CHAPTER ONE Data Mining in CRM THE CRM STRATEGY Customers are the most important asset of an organization. There cannot be any business prospects without satisfied customers who remain loyal and develop

More information

THE APPLICATION OF INFORMATION ENTROPY THEORY BASED DATA CLASSIFICATION ALGORITHM IN THE SELECTION OF TALENTS IN HOTELS

THE APPLICATION OF INFORMATION ENTROPY THEORY BASED DATA CLASSIFICATION ALGORITHM IN THE SELECTION OF TALENTS IN HOTELS ITALIAN JOURNAL OF PURE AND APPLIED MATHEMATICS N. 38 2017 (253 260) 253 THE APPLICATION OF INFORMATION ENTROPY THEORY BASED DATA CLASSIFICATION ALGORITHM IN THE SELECTION OF TALENTS IN HOTELS A. Youyu

More information

Checking and Analysing Customers Buying Behavior with Clustering Algorithm

Checking and Analysing Customers Buying Behavior with Clustering Algorithm Pal. Jour. V.16, I.3, No.2 2017, 486-492 Copyright 2017 by Palma Journal, All Rights Reserved Available online at: http://palmajournal.org/ Checking and Analysing Customers Buying Behavior with Clustering

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

Web Customer Modeling for Automated Session Prioritization on High Traffic Sites

Web Customer Modeling for Automated Session Prioritization on High Traffic Sites Web Customer Modeling for Automated Session Prioritization on High Traffic Sites Nicolas Poggi 1, Toni Moreno 2,3, Josep Lluis Berral 1, Ricard Gavaldà 4, and Jordi Torres 1,2 1 Computer Architecture Department,

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011

TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011 TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011 1 Develop Models for Customers Likely to Churn Churn is a term used to indicate a customer leaving the service of one company in

More information

IBM SPSS Decision Trees

IBM SPSS Decision Trees IBM SPSS Decision Trees 20 IBM SPSS Decision Trees Easily identify groups and predict outcomes Highlights With SPSS Decision Trees you can: Identify groups, segments, and patterns in a highly visual manner

More information

Direct Marketing with the Application of Data Mining

Direct Marketing with the Application of Data Mining Direct Marketing with the Application of Data Mining M Suman, T Anuradha, K Manasa Veena* KLUNIVERSITY,GreenFields,vijayawada,A.P *Email: manasaveena_555@yahoo.com Abstract For any business to be successful

More information

Business-Insight Top winner at the PAKDD 2010 cup

Business-Insight Top winner at the PAKDD 2010 cup Business-Insight Top winner at the PAKDD 2010 cup The objective of the 14th Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD 2010) is Re-Calibration of a Credit Risk Assessment System

More information

Leveraging Analytics and. User Segmentation

Leveraging Analytics and. User Segmentation Freemium Economics Leveraging Analytics and User Segmentation to Drive Revenue Eric Benjamin Seufert ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

Value Proposition for Financial Institutions

Value Proposition for Financial Institutions WWW.CUSTOMERS-DNA.COM VALUE PROPOSITION FINANCIAL INSTITUTIONS Value Proposition for Financial Institutions Customer Intelligence in Banking CRM & CUSTOMER INTELLIGENCE SERVICES INFO@CUSTOMERS-DNA.COM

More information

An Unbalanced Data Classification Model Using Hybrid Sampling Technique for Fraud Detection

An Unbalanced Data Classification Model Using Hybrid Sampling Technique for Fraud Detection An Unbalanced Data Classification Model Using Hybrid Sampling Technique for Fraud Detection T. Maruthi Padmaja 1, Narendra Dhulipalla 1, P. Radha Krishna 1, Raju S. Bapi 2, and A. Laha 1 1 Institute for

More information

Research Article One-Step Dynamic Classifier Ensemble Model for Customer Value Segmentation with Missing Values

Research Article One-Step Dynamic Classifier Ensemble Model for Customer Value Segmentation with Missing Values Mathematical Problems in Engineering, Article ID 869628, 15 pages http://dx.doi.org/10.1155/2014/869628 Research Article One-Step Dynamic Classifier Ensemble Model for Customer Value Segmentation with

More information

Available online at ScienceDirect. Procedia Computer Science 96 (2016 )

Available online at   ScienceDirect. Procedia Computer Science 96 (2016 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 96 (6 ) 68 69 th International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES6,

More information

Detecting and Pruning Introns for Faster Decision Tree Evolution

Detecting and Pruning Introns for Faster Decision Tree Evolution Detecting and Pruning Introns for Faster Decision Tree Evolution Jeroen Eggermont and Joost N. Kok and Walter A. Kosters Leiden Institute of Advanced Computer Science Universiteit Leiden P.O. Box 9512,

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

Tutorial Regression & correlation. Presented by Jessica Raterman Shannon Hodges

Tutorial Regression & correlation. Presented by Jessica Raterman Shannon Hodges + Tutorial Regression & correlation Presented by Jessica Raterman Shannon Hodges + Access & assess your data n Install and/or load the MASS package to access the dataset birthwt n Familiarize yourself

More information

Using Decision Tree to predict repeat customers

Using Decision Tree to predict repeat customers Using Decision Tree to predict repeat customers Jia En Nicholette Li Jing Rong Lim Abstract We focus on using feature engineering and decision trees to perform classification and feature selection on the

More information

Evaluation next steps Lift and Costs

Evaluation next steps Lift and Costs Evaluation next steps Lift and Costs Outline Lift and Gains charts *ROC Cost-sensitive learning Evaluation for numeric predictions 2 Application Example: Direct Marketing Paradigm Find most likely prospects

More information

New Customer Acquisition Strategy

New Customer Acquisition Strategy Page 1 New Customer Acquisition Strategy Based on Customer Profiling Segmentation and Scoring Model Page 2 Introduction A customer profile is a snapshot of who your customers are, how to reach them, and

More information

Applying CHAID for logistic regression diagnostics and classification accuracy improvement Received (in revised form): 22 nd March 2010

Applying CHAID for logistic regression diagnostics and classification accuracy improvement Received (in revised form): 22 nd March 2010 Original Article Applying CHAID for logistic regression diagnostics and classification accuracy improvement Received (in revised form): 22 nd March 2010 Evgeny Antipov is the President of The Center for

More information

Constructive Meta-Level Feature Selection Method based on Method Repositories

Constructive Meta-Level Feature Selection Method based on Method Repositories Constructive Meta-Level Feature Selection Method based on Method Repositories Hidenao Abe 1, and Takahira Yamaguchi 2 1 Department of Medical Informatics, Shimane University, 89-1 Enya-cho Izumo Shimane,

More information

Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System

Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System r""'=~~"''''''''''''''''''''''''''''\;'=="'~''''o''''"'"''~ ~c_,,..! Practical Aspects of Modelling Techp.iques in Logistic Regression Procedures of the SAS System Rainer Muche 1, Josef HogeP and Olaf

More information

Mining a Marketing Campaigns Data of Bank

Mining a Marketing Campaigns Data of Bank Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

Data Warehouses & OLAP

Data Warehouses & OLAP Riadh Ben Messaoud 1. The Big Picture 2. Data Warehouse Philosophy 3. Data Warehouse Concepts 4. Warehousing Applications 5. Warehouse Schema Design 6. Business Intelligence Reporting 7. On-Line Analytical

More information

Financial Services: Maximize Revenue with Better Marketing Data. Marketing Data Solutions for the Financial Services Industry

Financial Services: Maximize Revenue with Better Marketing Data. Marketing Data Solutions for the Financial Services Industry Financial Services: Maximize Revenue with Better Marketing Data Marketing Data Solutions for the Financial Services Industry DataMentors, LLC April 2014 1 Financial Services: Maximize Revenue with Better

More information

Panel: Driving Factors for Prospective Sectors Moderated by: Angel Mitev Panelists: Hristo Hadjitchonev, Angel Marchev Jr., Nikola Toshev, Emil

Panel: Driving Factors for Prospective Sectors Moderated by: Angel Mitev Panelists: Hristo Hadjitchonev, Angel Marchev Jr., Nikola Toshev, Emil Panel: Driving Factors for Prospective Sectors Moderated by: Angel Mitev Panelists: Hristo Hadjitchonev, Angel Marchev Jr., Nikola Toshev, Emil Ivanov, Sergi Sergiev ROI of Data Science Projects Improve

More information

Preliminary Investigations into Knowledge Discovery for Quick Market Intelligence

Preliminary Investigations into Knowledge Discovery for Quick Market Intelligence From: AAAI Technical Report WS-93-02. Compilation copyright 1993, AAAI (www.aaai.org). All rights reserved. Preliminary Investigations into Knowledge Discovery for Quick Market Intelligence William P.

More information

From Profit Driven Business Analytics. Full book available for purchase here.

From Profit Driven Business Analytics. Full book available for purchase here. From Profit Driven Business Analytics. Full book available for purchase here. Contents Foreword xv Acknowledgments xvii Chapter 1 A Value-Centric Perspective Towards Analytics 1 Introduction 1 Business

More information

Churn Prevention in Telecom Services Industry- A systematic approach to prevent B2B churn using SAS

Churn Prevention in Telecom Services Industry- A systematic approach to prevent B2B churn using SAS Paper 1414-2017 Churn Prevention in Telecom Services Industry- A systematic approach to prevent B2B churn using SAS ABSTRACT Krutharth Peravalli, Dr. Dmitriy Khots West Corporation It takes months to find

More information

Marketing Data Solutions for the Financial Services Industry

Marketing Data Solutions for the Financial Services Industry Marketing Data Solutions for the Financial Services Industry Maximize your revenue with better data. Publication Date: July, 2015 www.datamentors.com info@datamentors.com 01. Maximize Revenue with Better

More information

Applications of Machine Learning to Predict Yelp Ratings
