TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011


Develop Models for Customers Likely to Churn

Churn is a term used to indicate a customer leaving the service of one company in favor of another company. The aim of this exercise is to develop models that predict whether a customer is likely to churn, i.e. the company would like to know how to characterize the customers who may soon churn. Moreover, based on the knowledge gained from analyzing and mining the data, you are required to propose a number of practical measures that the company can put into action to try to retain those customers likely to churn.

The dataset to be used for this exercise, available from the course web page in the file churn.arff, has about 3333 customer records. Each customer is described by 20 attributes, plus a binary class attribute churn indicating whether the customer churned. A description of the meaning of each attribute is given in the appendix.

Your approach to the problem should follow the KDD process: data understanding (also called exploratory data analysis), data preprocessing, mining the data, and description of the interesting discovered patterns (knowledge). The suggested software to be used is Weka 3.6. However, feel free to try any ideas you may have to tackle the problem with any other software.

Your final report must clearly document the following aspects.

- Describe your findings for each of the KDD steps mentioned above.
- Indicate how you parametrized each (Weka) algorithm you used in this exercise.
- Indicate performance measures for the models you have built and include a comparison between different models (see also the section on model performance).
- State which software tools you have used [1] and their purpose.

Consider also the possibility of using graphs and images in your report, since they can improve the clarity and the strength of your conclusions.

[1] If a tool is available through a web page, indicate its web address.

1.1 Data Understanding

The first step in approaching a data mining problem is to delve into the data, identify any interesting relationships between the attributes, and formulate some initial hypotheses, i.e. possible associations between the attributes and the class. Graphical tools can aid you in this phase. Below are some suggested points to look at that may help you to better understand the data.

1. For each attribute, find the following information.
   (a) The attribute type, e.g. nominal, ordinal, numeric.
   (b) Percentage of missing values in the data.
   (c) Max, min, mean, standard deviation.
   (d) The type of distribution that the numeric attribute seems to follow (e.g. normal).
   (e) Are there any records that have a value for the attribute that no other record has (i.e. unique values)?
   (f) Study the histogram of the attribute and note how it seems to influence the risk of churning.
   (g) Are there any outliers for the attribute under consideration? If you suspect the existence of outliers for an attribute, you may consider using box plots for outlier detection [2].
2. Observe whether the dataset has an imbalanced class distribution.
3. Switch to the Visualize tab on the upper part of the screen in Weka to visualize 2D scatter plots for each pair of attributes.
   (a) Does any pair of attributes seem to be correlated?
   (b) Which attributes seem to be the most/least linked to the risk of churning? Summarize in a table your findings concerning the predictive value of each attribute with respect to the churn attribute.
   (c) Investigate also possible multivariate associations of attributes with the class attribute, i.e. study scatter plots of two attributes X and Y and try to identify possible high(low)-churn areas (if any).
4. Are there any variables that can be eliminated? Justify your answer and motivate the possible benefits of doing so (if any). Compare your conclusions with the results obtained by using some attribute selection filters in Weka. Do not forget to indicate which filter(s) you have used and to give a brief description.

[2] The course web page has links to some easy-to-use tools that build box plots.
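The per-attribute checks in points 1 and 2 above can also be scripted outside Weka. The following is a minimal sketch in Python using only the standard library. The function names, the 1.5 x IQR box-plot rule, and the toy values are assumptions chosen for illustration; none of them come from churn.arff or from the course material.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return basic statistics for one numeric attribute.

    `values` may contain None for missing entries.
    """
    present = [v for v in values if v is not None]
    missing_pct = 100.0 * (len(values) - len(present)) / len(values)
    return {
        "missing_pct": missing_pct,
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot rule."""
    present = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def class_distribution(labels):
    """Return the fraction of records in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up toy values, NOT taken from churn.arff.
customer_service_calls = [1, 0, 2, 1, 3, None, 1, 2, 9, 1]
churn = ["no", "no", "no", "no", "yes", "no", "no", "no", "yes", "no"]

print(summarize(customer_service_calls))
print(iqr_outliers(customer_service_calls))
print(class_distribution(churn))
```

On the toy data the value 9 is flagged as an outlier and the class split is 80% "no" versus 20% "yes". Whether a flagged value is a genuine outlier or a meaningful signal still has to be judged per attribute.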

1.2 Data Preprocessing

The second step is to preprocess the data such that the transformed data is in a more suitable form for the mining algorithms. Below are some aspects you should consider. You may also consider other aspects of data preprocessing, such as creating new attributes from the existing ones.

1. Attribute selection. Select one or two subsets of attributes with good predictive capability and motivate your choices (recall your conclusions for points 3 and 4 above).
2. Handling missing values.
3. Eliminating outliers.
4. Discretization of numeric attributes. If discretization is used, do not forget to indicate which discretization method you used.
5. Normalization. Motivate the need for normalization (if needed at all).

1.3 Clustering

Investigate the use of clustering techniques, for instance K-means, to segment the customers into groups with similar service usage characteristics. Profile the clusters, i.e. what can you learn about the types of records falling into each cluster (how can you describe each cluster in plain English)? Note that this problem can be addressed by building classification rules (e.g. association rule mining can be used) that predict the cluster attribute.

If K-means is used, justify also the number of clusters you have chosen. Remember that measures such as the Sum of Squared Errors (SSE) and cross-tabulating the clustering against the class labelling (churn) can be useful to assess the quality of the clusters. Compare the clustering results with the two classes existing in the data, i.e. churners and non-churners.

1.4 Building Classification Models for the Data

The third step is to use some classifier algorithms available in Weka to discover hidden patterns in the data. You should repeat the steps described below for each of the datasets you created during preprocessing. Include in your report a brief description of the classification algorithms you have used.
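For the clustering step in section 1.3, the two quality checks named there (SSE and a cross-tabulation of clusters against churn) can be sketched as follows. This is an illustrative Python sketch that assumes the cluster assignments have already been produced by some clustering run. The function names and the toy points are hypothetical and are not taken from churn.arff.

```python
from collections import Counter, defaultdict


def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sse(points, assignments):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, cluster in zip(points, assignments):
        clusters[cluster].append(point)
    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        for p in members:
            total += sum((x - m) ** 2 for x, m in zip(p, c))
    return total


def cross_tab(assignments, labels):
    """Count records for each (cluster, class label) pair."""
    return Counter(zip(assignments, labels))


# Made-up toy data: (day minutes, customer service calls), NOT from churn.arff.
points = [(100.0, 1.0), (120.0, 1.0), (300.0, 5.0), (320.0, 3.0)]
assignments = [0, 0, 1, 1]
churn = ["no", "no", "yes", "no"]

print(sse(points, assignments))
print(cross_tab(assignments, churn))
```

In the toy data almost all of the SSE comes from the minutes attribute, because it has a much larger scale than the number of calls. This is one reason normalization (section 1.2) matters before distance-based clustering. Comparing the SSE for several values of K is one common way to support the choice of the number of clusters.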

1. Start with the OneR classifier.
   (a) What can you conclude? Compare your conclusions with your previous conclusions obtained in section 1.1.
   (b) Compare the accuracy of the classifier on the training set with the accuracy estimate obtained through 10-fold cross-validation. How do you explain the difference (if any)?
2. Use the JRip classifier, i.e. the Weka version of the rule classifier RIPPER.
   (a) Build a classifier with and without rule pruning. Which one is preferable? Motivate your answer.
   (b) Summarize the patterns you obtained and compare with your previous conclusions.
3. Use the J48 classifier, i.e. the Weka version of the decision tree classifier C4.5. Do not forget to append the tree obtained to your report.
   (a) Investigate the use of different J48 parameters, such as pruning and the minimum number of records in the leaves.
   (b) Describe the patterns you obtained and compare with your previous conclusions.
4. Use association rule mining (ARM) to build high-confidence rules predicting churn.
   (a) Investigate the use of the Apriori algorithm. Remember that the implementation of the Apriori algorithm available in Weka can only cope with discrete attributes.
   (b) Describe the patterns you obtained and compare with your previous conclusions.

Models Performance

In the previous step, you have built several models. You need to assess the quality of the models and compare the different models.

1. Weka outputs several performance measures. Choose some of the performance measures and motivate your choice.
2. Summarize in a table the performance measures for each classifier and each dataset.
3. What can you conclude?
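When choosing performance measures, it helps to see how the common ones are derived from the confusion matrix. The sketch below is illustrative Python, not Weka output. The function names and the toy labels are assumptions, and "yes" is assumed to be the churn (positive) class.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def performance(actual, predicted, positive="yes"):
    """Accuracy, precision, recall and F1 with respect to the positive class."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels, NOT results from any model built on churn.arff.
actual = ["yes", "yes", "no", "no", "no", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "no", "no", "no", "no", "yes", "no"]
always_no = ["no"] * len(actual)

print(performance(actual, predicted))
print(performance(actual, always_no))
```

In this toy example a model that always predicts "no" reaches the same accuracy as the other predictions, yet its recall for churners is zero. With an imbalanced class distribution, accuracy alone can therefore hide how poorly the churners are identified.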

1.5 Conclusions

Summarize your major findings and describe which risk factors for churning you have found in the data after analyzing it. Based on your findings, indicate also some practical measures the company can put into action to retain those customers who are likely to churn. This last aspect is, after all, what the company is interested in.

Good luck!

Appendix

Below is a brief description of the meaning of each attribute.

1. State: discrete variable that indicates the state where the customer lives.
2. Account Length: integer variable that indicates how long the account has been active.
3. Area Code.
4. Phone Number.
5. Inter Plan: binary variable indicating whether the customer has an international plan.
6. Voic Plan: binary variable indicating whether the customer has a voice mail plan.
7. No of Vmail Mesgs: integer variable that indicates the number of voice mail messages.
8. Total Day Min: continuous variable that indicates the number of minutes the customer used the service during day time.
9. Total Day Calls: integer variable that indicates the number of calls during day time.
10. Total Day Charge: continuous variable that indicates how much was charged for using the service during day time.
11. Total Evening Min: continuous variable that indicates the number of minutes the customer used the service during evening time.
12. Total Evening Calls: integer variable that indicates the number of calls during evening time.
13. Total Evening Charge: continuous variable that indicates how much was charged for using the service during evening time.
14. Total Night Min: continuous variable that indicates the number of minutes the customer used the service during night time.
15. Total Night Calls: integer variable that indicates the number of calls during night time.
16. Total Night Charge: continuous variable that indicates how much was charged for using the service during night time.
17. Total Int Min: continuous variable that indicates the number of minutes the customer used the service to make international calls.
18. Total Int Calls: integer variable that indicates the number of international calls.
19. Total Int Charge: continuous variable that indicates how much was charged for international calls.
20. No of Calls Customer Service: integer variable that indicates the number of calls to the customer support service.
21. Churn: binary class variable.

Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge

Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2007 Prentice Hall Chapter Objectives To explain how knowledge is discovered

More information

BUSINESS DATA MINING (IDS 572) Please include the names of all team-members in your write up and in the name of the file.

BUSINESS DATA MINING (IDS 572) Please include the names of all team-members in your write up and in the name of the file. BUSINESS DATA MINING (IDS 572) HOMEWORK 4 DUE DATE: TUESDAY, APRIL 10 AT 3:20 PM Please provide succinct answers to the questions below. You should submit an electronic pdf or word file in blackboard.

More information

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 02 Data Mining Process Welcome to the lecture 2 of

More information

Data Mining, CSCI 347, Fall 2017 Exam 1, Sept. 22

Data Mining, CSCI 347, Fall 2017 Exam 1, Sept. 22 Data Mining, CSCI 347, Fall 2017 Exam 1, Sept. 22 1. Supervised learning is best described by: (4 pts.) a. Weka learning which requires user input b. Weka learning which focuses on clusters c. Learning

More information

Data Mining Applications with R

Data Mining Applications with R Data Mining Applications with R Yanchang Zhao Senior Data Miner, RDataMining.com, Australia Associate Professor, Yonghua Cen Nanjing University of Science and Technology, China AMSTERDAM BOSTON HEIDELBERG

More information

Weka Evaluation: Assessing the performance

Weka Evaluation: Assessing the performance Weka Evaluation: Assessing the performance Lab3 (in- class): 21 NOV 2016, 13:00-15:00, CHOMSKY ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning objectives

More information

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy AGENDA 1. Introduction 2. Use Cases 3. Popular Algorithms 4. Typical Approach 5. Case Study 2016 SAPIENT GLOBAL MARKETS

More information

FINAL PROJECT REPORT IME672. Group Number 6

FINAL PROJECT REPORT IME672. Group Number 6 FINAL PROJECT REPORT IME672 Group Number 6 Ayushya Agarwal 14168 Rishabh Vaish 14553 Rohit Bansal 14564 Abhinav Sharma 14015 Dil Bag Singh 14222 Introduction Cell2Cell, The Churn Game. The cellular telephone

More information

Churn Prediction Model Using Linear Discriminant Analysis (LDA)

Churn Prediction Model Using Linear Discriminant Analysis (LDA) IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 5, Ver. IV (Sep. - Oct. 2016), PP 86-93 www.iosrjournals.org Churn Prediction Model Using Linear Discriminant

More information

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration KnowledgeSTUDIO Advanced Modeling for Better Decisions Companies that compete with analytics are looking for advanced analytical technologies that accelerate decision making and identify opportunities

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

SAS Visual Statistics 8.1: The New Self-Service Easy Analytics Experience Xiangxiang Meng, Cheryl LeSaint, Don Chapman, SAS Institute Inc.

SAS Visual Statistics 8.1: The New Self-Service Easy Analytics Experience Xiangxiang Meng, Cheryl LeSaint, Don Chapman, SAS Institute Inc. ABSTRACT Paper SAS5780-2016 SAS Visual Statistics 8.1: The New Self-Service Easy Analytics Experience Xiangxiang Meng, Cheryl LeSaint, Don Chapman, SAS Institute Inc. In today's Business Intelligence world,

More information

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA advanced analysis of gene expression microarray data aidong zhang State University of New York at Buffalo, USA World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI Contents

More information

Business Customer Value Segmentation for strategic targeting in the utilities industry using SAS

Business Customer Value Segmentation for strategic targeting in the utilities industry using SAS Paper 1772-2018 Business Customer Value Segmentation for strategic targeting in the utilities industry using SAS Spyridon Potamitis, Centrica; Paul Malley, Centrica ABSTRACT Numerous papers have discussed

More information

Data Visualization. Prof.Sushila Aghav-Palwe

Data Visualization. Prof.Sushila Aghav-Palwe Data Visualization By Prof.Sushila Aghav-Palwe Importance of Graphs in BI Business intelligence or BI is a technology-driven process that aims at collecting data and analyze it to extract actionable insights

More information

Preface to the third edition Preface to the first edition Acknowledgments

Preface to the third edition Preface to the first edition Acknowledgments Contents Foreword Preface to the third edition Preface to the first edition Acknowledgments Part I PRELIMINARIES XXI XXIII XXVII XXIX CHAPTER 1 Introduction 3 1.1 What Is Business Analytics?................

More information

Indicators of Hydrologic Alteration (IHA) software, Version 7.1

Indicators of Hydrologic Alteration (IHA) software, Version 7.1 Indicators of Hydrologic Alteration (IHA) software, Version 7.1 IHA Software Analyzes hydrologic characteristics and their changes over time. Computes 67 ecologicallyrelevant flow statistics using daily

More information

Predicting Customer Loyalty Using Data Mining Techniques

Predicting Customer Loyalty Using Data Mining Techniques Predicting Customer Loyalty Using Data Mining Techniques Simret Solomon University of South Africa (UNISA), Addis Ababa, Ethiopia simrets2002@yahoo.com Tibebe Beshah School of Information Science, Addis

More information

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos Effective CRM Using Predictive Analytics Antonios Chorianopoulos WlLEY Contents Preface Acknowledgments xiii xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the

More information

SAS Enterprise Miner 5.3 for Desktop

SAS Enterprise Miner 5.3 for Desktop Fact Sheet SAS Enterprise Miner 5.3 for Desktop A fast, powerful data mining workbench delivered to your desktop What does SAS Enterprise Miner for Desktop do? SAS Enterprise Miner for Desktop is a complete

More information

Analysis of Factors Affecting Resignations of University Employees

Analysis of Factors Affecting Resignations of University Employees Analysis of Factors Affecting Resignations of University Employees An exploratory study was conducted to identify factors influencing voluntary resignations at a large research university over the past

More information

CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT

CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT CHAPTER 8 APPLICATION OF CLUSTERING TO CUSTOMER RELATIONSHIP MANAGEMENT 8.1 Introduction Customer Relationship Management (CRM) is a process that manages the interactions between a company and its customers.

More information

Chapter 8 Analytical Procedures

Chapter 8 Analytical Procedures Slide 8.1 Principles of Auditing: An Introduction to International Standards on Auditing Chapter 8 Analytical Procedures Rick Hayes, Hans Gortemaker and Philip Wallage Slide 8.2 Analytical procedures Analytical

More information

Distinguish between different types of numerical data and different data collection processes.

Distinguish between different types of numerical data and different data collection processes. Level: Diploma in Business Learning Outcomes 1.1 1.3 Distinguish between different types of numerical data and different data collection processes. Introduce the course by defining statistics and explaining

More information

part 1 Marketing Research and the Research Process 1 part 5 Data Analysis and Interpretation 349 2, Determine Research Design 57

part 1 Marketing Research and the Research Process 1 part 5 Data Analysis and Interpretation 349 2, Determine Research Design 57 B R I E F C O N T E N T S part 1 Marketing Research and the Research Process 1 part 1 Marketing Research: It's Everywhere! 2 2 Alternative Approaches to Marketing Intelligence 18 3 The Research Process

More information

Predicting Factors which Determine Customer Transaction Pattern and Transaction Type Using Data Mining

Predicting Factors which Determine Customer Transaction Pattern and Transaction Type Using Data Mining Predicting Factors which Determine Customer Transaction Pattern and Transaction Type Using Data Mining Frehiywot Nega HiLCoE, Computer Science Programme, Ethiopia fr.nega@gmail.com Tibebe Beshah HiLCoE,

More information

From Profit Driven Business Analytics. Full book available for purchase here.

From Profit Driven Business Analytics. Full book available for purchase here. From Profit Driven Business Analytics. Full book available for purchase here. Contents Foreword xv Acknowledgments xvii Chapter 1 A Value-Centric Perspective Towards Analytics 1 Introduction 1 Business

More information

SAS BIG DATA ANALYTICS INCREASING YOUR COMPETITIVE EDGE

SAS BIG DATA ANALYTICS INCREASING YOUR COMPETITIVE EDGE SAS BIG DATA ANALYTICS INCREASING YOUR COMPETITIVE EDGE SAS VISUAL ANALYTICS STATE OF THE ART SOLUTION FOR FASTER, SMARTER DECISIONS AIMED AT THE MASSES Data visualization Approachable analytics Robust

More information

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam ECLT 5810 E-Commerce Data Mining Techniques - Introduction Prof. Wai Lam Data Opportunities Business infrastructure have improved the ability to collect data Virtually every aspect of business is now open

More information

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ANALYTICAL MODEL DEVELOPMENT AGENDA Enterprise Miner: Analytical Model Development The session looks at: - Supervised and Unsupervised Modelling - Classification

More information

Building the In-Demand Skills for Analytics and Data Science Course Outline

Building the In-Demand Skills for Analytics and Data Science Course Outline Day 1 Module 1 - Predictive Analytics Concepts What and Why of Predictive Analytics o Predictive Analytics Defined o Business Value of Predictive Analytics The Foundation for Predictive Analytics o Statistical

More information

Advanced Higher Statistics

Advanced Higher Statistics Advanced Higher Statistics 2018-19 Advanced Higher Statistics - 3 Unit Assessments - Prelim - Investigation - Final Exam (3 Hours) 1 Advanced Higher Statistics Handouts - Data Booklet - Course Outlines

More information

Statistics, Data Analysis, and Decision Modeling

Statistics, Data Analysis, and Decision Modeling - ' 'li* Statistics, Data Analysis, and Decision Modeling T H I R D E D I T I O N James R. Evans University of Cincinnati PEARSON Prentice Hall Upper Saddle River, New Jersey 07458 CONTENTS Preface xv

More information

DATA ANALYTICS WITH R, EXCEL & TABLEAU

DATA ANALYTICS WITH R, EXCEL & TABLEAU Learn. Do. Earn. DATA ANALYTICS WITH R, EXCEL & TABLEAU COURSE DETAILS centers@acadgild.com www.acadgild.com 90360 10796 Brief About this Course Data is the foundation for technology-driven digital age.

More information

BUSS1020. Quantitative Business Analysis. Lecture Notes

BUSS1020. Quantitative Business Analysis. Lecture Notes BUSS1020 Quantitative Business Analysis Lecture Notes Week 1: Unit Introduction Introduction Analytics is the discover and communication of meaningful patterns in data. Statistics is the study of the collection,

More information

MAS187/AEF258. University of Newcastle upon Tyne

MAS187/AEF258. University of Newcastle upon Tyne MAS187/AEF258 University of Newcastle upon Tyne 2005-6 Contents 1 Collecting and Presenting Data 5 1.1 Introduction...................................... 5 1.1.1 Examples...................................

More information

New Customer Acquisition Strategy

New Customer Acquisition Strategy Page 1 New Customer Acquisition Strategy Based on Customer Profiling Segmentation and Scoring Model Page 2 Introduction A customer profile is a snapshot of who your customers are, how to reach them, and

More information

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING Abbas Heiat, College of Business, Montana State University, Billings, MT 59102, aheiat@msubillings.edu ABSTRACT The purpose of this study is to investigate

More information

Proactive Data Mining Using Decision Trees

Proactive Data Mining Using Decision Trees 2012 IEEE 27-th Convention of Electrical and Electronics Engineers in Israel Proactive Data Mining Using Decision Trees Haim Dahan and Oded Maimon Dept. of Industrial Engineering Tel-Aviv University Tel

More information

Data Mining using SPSS Modeler 2nd Session

Data Mining using SPSS Modeler 2nd Session IBM Taiwan Claire Lin Data Mining using SPSS Modeler 2nd Session 2014 IBM Corporation Agenda Data Mining Process Business Understanding Data Understanding Live Demo and Exercise Data Preparation and Manipulation

More information

Mining Churning Behaviors and Developing Retention Strategies Based on A Partial Least Square (PLS) Model

Mining Churning Behaviors and Developing Retention Strategies Based on A Partial Least Square (PLS) Model Accepted Manuscript Mining Churning Behaviors and Developing Retention Strategies Based on A Partial Least Square (PLS) Model Hyeseon Lee, Yeonhee Lee, Hyunbo Cho, Kwanyoung Im, Yong Seog Kim PII: S0167-9236(11)00125-4

More information

Why Learn Statistics?

Why Learn Statistics? Why Learn Statistics? So you are able to make better sense of the ubiquitous use of numbers: Business memos Business research Technical reports Technical journals Newspaper articles Magazine articles Basic

More information

One Year Executive Program in Applied Business Analytics

One Year Executive Program in Applied Business Analytics Vivekanand Education Society Institute of Management Studies & Research Big Data & Advanced Analytics One Year Executive Program in Applied Business Analytics In a rapidly changing global environment,

More information

Linear model to forecast sales from past data of Rossmann drug Store

Linear model to forecast sales from past data of Rossmann drug Store Abstract Linear model to forecast sales from past data of Rossmann drug Store Group id: G3 Recent years, the explosive growth in data results in the need to develop new tools to process data into knowledge

More information

Determing the Business Value of Business Intelligence with Data Mining Methods

Determing the Business Value of Business Intelligence with Data Mining Methods Determing the Business Value of Business Intelligence with Data Mining Methods Karin Hartl, Olaf Jacob Department of Information Management University of Applied Sciences Neu-Ulm (HNU) Neu-Ulm, Germany

More information

Predicting the Influencers on Wireless Subscriber Churn

Predicting the Influencers on Wireless Subscriber Churn Predicting the Influencers on Wireless Subscriber Churn Sara Motahari 1, Taeho Jung 2, Hui Zang 3, Krishna Janakiraman 4, Xiang-Yang Li 2, Kevin Soo Hoo 1 1 Sprint Advanced Analysis Lab, Burlingame, CA,

More information

SPSS 14: quick guide

SPSS 14: quick guide SPSS 14: quick guide Edition 2, November 2007 If you would like this document in an alternative format please ask staff for help. On request we can provide documents with a different size and style of

More information

Tutorial Segmentation and Classification

Tutorial Segmentation and Classification MARKETING ENGINEERING FOR EXCEL TUTORIAL VERSION v171025 Tutorial Segmentation and Classification Marketing Engineering for Excel is a Microsoft Excel add-in. The software runs from within Microsoft Excel

More information

BAR CHARTS. Display frequency distributions for nominal or ordinal data. Ej. Injury deaths of 100 children, ages 5-9, USA,

BAR CHARTS. Display frequency distributions for nominal or ordinal data. Ej. Injury deaths of 100 children, ages 5-9, USA, Graphs BAR CHARTS. Display frequency distributions for nominal or ordinal data. Ej. Injury deaths of 100 children, ages 5-9, USA, 1980-85. HISTOGRAMS. Display frequency distributions for continuous or

More information

KDD Challenge Orange Labs R&D

KDD Challenge Orange Labs R&D KDD Challenge 2009 Orange Labs R&D Vincent Lemaire, Research & Development 03/19/2009, presentation to reading group http://perso.rd.francetelecom.fr/lemaire/ contents Oranges Labs CRM at Orange Problems

More information

Relational Data Mining and Web Mining

Relational Data Mining and Web Mining Relational Data Mining and Web Mining Prof. Dr. Daning Hu Department of Informatics University of Zurich Nov 20th, 2012 Outline Introduction: Big Data Relational Data Mining Web Mining Ref Book: Web Intelligence,

More information

Heart Diseases Diagnosis Using Data Mining Techniques

Heart Diseases Diagnosis Using Data Mining Techniques Heart Diseases Diagnosis Using Data Mining Techniques 1 Dr. R. Muralidharan, 2 D. Vinodhini 1 M.Sc. M.Phil. MCA., Ph.D. Vice Principal & HOD in CS Rathinam College of Arts & Science Coimbatore, India 2

More information

Metaheuristics. Approximate. Metaheuristics used for. Math programming LP, IP, NLP, DP. Heuristics

Metaheuristics. Approximate. Metaheuristics used for. Math programming LP, IP, NLP, DP. Heuristics Metaheuristics Meta Greek word for upper level methods Heuristics Greek word heuriskein art of discovering new strategies to solve problems. Exact and Approximate methods Exact Math programming LP, IP,

More information

Questionnaires/Surveys

Questionnaires/Surveys Questionnaires/Surveys International Atomic Energy Agency Exercise Complete mini engagement survey Hand in to course leaders to compile results Why surveys? To capture attitudes and impressions of a large

More information
