Report for PAKDD 2007 Data Mining Competition
Li Guoliang
School of Computing, National University of Singapore
April 2007

Abstract

The task in the PAKDD 2007 data mining competition is a cross-selling business problem from the consumer finance industry. The objective is to predict the propensity of credit card customers to apply for a home loan. The training data contains [...] cases, 40 features, and 1 class label variable. After data pre-processing, we identified significant features with an information-gain-based feature selection method, such as the number of Bureau enquiries for mortgages and loans, age, and months at the current residence. We selected the top 14 features and applied logistic regression to the training data. Compared with other data mining methods, logistic regression achieved the best AUC score in 10-fold cross-validation. With these settings, we built our final classifier on the whole training data. The AUC score is about [...]. We expect similar performance on the withheld data.

1 Introduction

PAKDD 2007 Data Mining Competition [1] is one of the events of the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2007). The background of the competition is that a financial institution has two groups of customers: one with credit cards and another with home loans. The institution would like to identify which customers are likely to apply for a home loan within 12 months of applying for a credit card. The data for this task is the information available when the customers apply for a credit card. The main task in the competition is to predict the propensity of credit card customers who do not have a home loan to apply for one.

2 Data summary

There are [...] cases in the provided data, of which [...] have known class labels and 8000 have their class labels held out as the hidden test cases.
Among the cases with known class labels, 700 customers applied for a home loan within 12 months of the credit card application. We refer to these 700 cases as positive data in this report. The information is summarized in Table 1.

In the data set there are forty-one features, excluding the customer IDs, which cannot be used for prediction or model building. The target variable, or class label, indicates whether the customer applied for a home loan. The other features are demographic data, credit information, and other related data. One feature, B_DEF_UNPD_L12M, takes only the (known) value 0. This feature is removed from the data before further analysis.
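The removal of a constant feature can be illustrated with a short Python sketch. It is not the procedure used in this report (the experiments were run in Weka); the helper names `constant_features` and `drop_features` and the toy records are assumptions made for illustration, and only the feature names come from the data description above.

```python
def constant_features(rows, missing=None):
    """Return the names of features whose known (non-missing) values are all identical."""
    constant = []
    for name in rows[0].keys():
        known = {row[name] for row in rows if row[name] is not missing}
        if len(known) <= 1:
            constant.append(name)
    return constant


def drop_features(rows, to_drop):
    """Return copies of the records without the listed features."""
    drop = set(to_drop)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


# Toy records; the values are invented for illustration only.
rows = [
    {"B_DEF_UNPD_L12M": 0, "AGE_AT_APPLICATION": 31, "RENT_BUY_CODE": "R"},
    {"B_DEF_UNPD_L12M": 0, "AGE_AT_APPLICATION": 45, "RENT_BUY_CODE": "O"},
    {"B_DEF_UNPD_L12M": None, "AGE_AT_APPLICATION": 28, "RENT_BUY_CODE": "R"},
]
removed = constant_features(rows)
cleaned = drop_features(rows, removed)
```

A feature with a single known value carries no information for separating the classes, so dropping it does not change what a classifier can learn from the remaining features.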
In the remaining data, including both the training and the withheld cases, there are [...] missing values (excluding the withheld class labels). The number of features with missing values is 9 for the entire data. The names of the features with missing values are listed in Table 2. The feature DISP_INCOME_CODE has the most missing values: it has [...] missing values (about 71%).

Table 1 Summary of the available data

Class label        Number of cases
Known: positive    700
Known: negative    [...]
Held-out           8000
Total              [...]

Table 2 Names of features with missing values

CHQ_ACCT_IND
AMEX_CARD
DINERS_CARD
VISA_CARD
MASTERCARD
RETAIL_CARDS
DISP_INCOME_CODE
CUSTOMER_SEGMENT
CREDIT_CARD_TYPE
ANNUAL_INCOME_RANGE

3 Experiment

In this work, we built models to predict the class labels in the withheld data set. The data mining package Weka [4] is used as the main tool for data preprocessing, feature selection, model building and prediction. To determine which data mining method works best for this application, the training data is randomly split into 10 folds for cross-validation.

3.1 Invalid values of nominal features

In the data, some nominal features have invalid values. For example, feature MASTERCARD has invalid values of 1 and 2. We replaced these invalid values with the mode of the corresponding feature.

3.2 Discretization

To apply certain data mining methods, we need to discretize the numeric data into nominal data. The discretization method we used is the one proposed by Fayyad and Irani [2]. This method selects the cut point in the current data set with maximum information gain and splits the data set into two subsets. The method is then applied recursively to each subset until the Minimum-Description-Length-based stopping criterion is reached.
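The recursive procedure described above can be sketched in plain Python. This is an illustrative sketch only, not the Weka implementation used in the experiments: the function names (`entropy`, `best_cut`, `mdl_accepts`, `discretize`) and the toy data are assumptions, and the acceptance test follows the MDL criterion as it is commonly stated for the method of Fayyad and Irani [2], namely that a cut is kept when its information gain exceeds (log2(N - 1) + delta) / N.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (gain, cut, left_labels, right_labels) for the best binary split, or None."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if best is None or gain > best[0]:
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
            best = (gain, cut, left, right)
    return best


def mdl_accepts(labels, gain, left, right):
    """MDL-based stopping test: accept the cut only if the gain pays for its coding cost."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right)
    )
    return gain > (math.log2(n - 1) + delta) / n


def discretize(values, labels):
    """Recursively collect cut points for one numeric feature, in increasing order."""
    if len(values) < 2:
        return []
    found = best_cut(values, labels)
    if found is None:
        return []
    gain, cut, left, right = found
    if not mdl_accepts(labels, gain, left, right):
        return []
    low = [(v, y) for v, y in zip(values, labels) if v <= cut]
    high = [(v, y) for v, y in zip(values, labels) if v > cut]
    return (
        discretize([v for v, _ in low], [y for _, y in low])
        + [cut]
        + discretize([v for v, _ in high], [y for _, y in high])
    )


# Toy feature, invented for illustration: small values are class 0, large values class 1.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 21, 22, 23, 24, 25, 26, 27, 28]
classes = [0] * 8 + [1] * 8
cuts = discretize(ages, classes)
```

Each returned cut point defines a boundary between two nominal intervals. A feature for which no cut passes the MDL test yields an empty list and is treated as a single interval.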
3.3 Missing value processing

Some data mining methods can deal with missing values, while others cannot. In this work, missing values are replaced with the mean for numeric features and the mode for nominal features.

3.4 Feature selection

In our experiment, we tried different feature selection methods: chi-square, information gain, and gain ratio. These three feature selection methods measure the strength of each feature for classification using the criterion in their names, and rank the features by that measure. The top 14 features from these methods are listed in Table 3.

Table 3 Top 14 selected features

Order    Information Gain
1        B_ENQ_L6M_GR3
2        B_ENQ_L12M_GR3
3        CURR_RES_MTHS
4        AGE_AT_APPLICATION
5        PREV_RES_MTHS
6        B_ENQ_L3M
7        B_ENQ_L12M_GR2
8        B_ENQ_L6M
9        B_ENQ_L1M
10       RENT_BUY_CODE
11       MARITAL_STATUS
12       ANNUAL_INCOME_RANGE
13       A_DISTRICT_APPLICANT
14       SAV_ACCT_IND

3.5 Data mining methods applied

In our experiment, we tried different data mining methods. After comparison, we selected logistic regression as our final method. Logistic regression [3] applies a non-linear (logistic) transformation to a linear regression model and is suited to cases where the dependent variable is discrete.

3.6 Summary of experiment

The data preprocessing and model building process is summarized in Figure 1. The main evaluation measures we used are ROC and AUC (ROC refers to the receiver operating characteristic and AUC to the area under the ROC curve). Our results show that logistic regression gives the best AUC score. Figure 2 shows the ROC curve from the training data.
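To make the AUC measure concrete, the sketch below computes it from predicted scores with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve. It is an illustration rather than the evaluation code used for this report (the reported scores came from Weka); the function name `auc_score` and the toy labels and scores are assumptions.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the rank-sum statistic; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of cases that share the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy predictions, invented for illustration: 1 marks a positive case.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_score(toy_labels, toy_scores)
```

The value can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. Because it depends only on the ranking of the scores, it remains informative when positives are rare, as in this data set.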
Figure 1 The procedure to process the data: value merging, discretization, missing value replacement, feature selection, model building

Figure 2 ROC curve from logistic regression model

4 Discussion

After analyzing the data, we have the following insights from the PAKDD 2007 competition data:

1. The most significant features for this cross-selling problem are B_ENQ_L6M_GR3, B_ENQ_L12M_GR3, and other Bureau enquiry information. These features reflect the customers' interest in mortgages. The customers may have contacted several financial institutions, which in turn made more enquiries to the Bureau.
2. The residence history is an important indicator for a home loan. If the customers rent a flat, they are more likely to apply for a home loan. If they have stayed in one place for a long time, they are likely to move to a new home.
3. Age and marital status affect the home loan application. These are the main factors indicating whether the customers need a flat.
4. The income level is another main factor for a home loan. It indicates whether the customers can afford a home loan.

References

[1] PAKDD 2007 Data Mining Competition (2007).
[2] U.M. Fayyad, K.B. Irani, Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, in: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI) (1993).
[3] D.W. Hosmer, S. Lemeshow, Applied logistic regression (John Wiley & Sons, New York, 2000).
[4] I.H. Witten, E. Frank, Data mining: practical machine learning tools and techniques with Java implementations (Morgan Kaufmann, San Francisco, 1999).
More informationMining knowledge using Decision Tree Algorithm
International Journal of Scientific & Engineering Research Volume 2, Issue 5, May-2011 1 Mining knowledge using Decision Tree Algorithm Mrs. Swati.V. Kulkarni Abstract Industry is experiencing more and
More informationCredit Card Marketing Classification Trees
Credit Card Marketing Classification Trees From Building Better Models With JMP Pro, Chapter 6, SAS Press (2015). Grayson, Gardner and Stephens. Used with permission. For additional information, see community.jmp.com/docs/doc-7562.
More informationCost-Sensitive Test Strategies
Victor S. Sheng, Charles X. Ling Department of Computer Science The University of Western Ontario London, Ontario N6A 5B7, Canada {ssheng, cling}@csd.uwo.ca Abstract In medical diagnosis doctors must often
More informationEffective CRM Using. Predictive Analytics. Antonios Chorianopoulos
Effective CRM Using Predictive Analytics Antonios Chorianopoulos WlLEY Contents Preface Acknowledgments xiii xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the
More informationA STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET
A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,
More informationDATA MINING IN THE FINANCIAL SERVICES INDUSTRY
DATA MINING IN THE FINANCIAL SERVICES INDUSTRY PRESENTATION TO KNOWLEDGE DISCOVERY CENTRE (15 FEBRUARY 2001) Steven Parker Head CRM Consumer Banking Standard Chartered 1 STANDARD CHARTERED World s leading
More informationApplying and Evaluating Models to Predict Customer Attrition Using Data Mining Techniques
Journal of Comparative International Management 2003 Management Futures 2003, Vol. 6, No. 1, 10-22 Printed in Canada Applying and Evaluating Models to Predict Customer Attrition Using Data Mining Techniques
More informationMallow s C p for Selecting Best Performing Logistic Regression Subsets
Mallow s C p for Selecting Best Performing Logistic Regression Subsets Mary G. Lieberman John D. Morris Florida Atlantic University Mallow s C p is used herein to select maximally accurate subsets of predictor
More informationDesigning Customer Target Recommendation System Using K-Means Clustering Method
1 Designing Customer Target Recommendation System Using K-Means Clustering Method Evasaria M. Sipayung 1, Herastia Maharani 2, Benny A. Paskhadira 3 Abstract UD Swiss is a company engaged in the field
More informationKeywords acrm, RFM, Clustering, Classification, SMOTE, Metastacking
Volume 5, Issue 9, September 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Comparative
More information