Evaluation next steps Lift and Costs

Size: px
Start display at page:

Download "Evaluation next steps Lift and Costs"

Transcription

1 Evaluation next steps Lift and Costs

2 Outline Lift and Gains charts *ROC Cost-sensitive learning Evaluation for numeric predictions 2

3 Application Example: Direct Marketing Paradigm Find most likely prospects to contact Not everybody needs to be contacted Number of targets is usually much smaller than number of prospects Typical Applications retailers, catalogues, direct mail (and ) customer acquisition, cross-sell, attrition prediction... 3

4 Direct Marketing Evaluation Accuracy on the entire dataset is not the right measure Approach develop a target model score all prospects and rank them by decreasing score select top P% of prospects for action How to decide what is the best selection? 4

5 Model-Sorted List Use a model to assign score to each customer Sort customers by decreasing score Expect more targets (hits) near the top of the list No Score Target CustID Age Y N Y Y N N N hits in top 5% of the list If there 15 targets overall, then top 5 has 3/15=20% of targets 5

6 CPH (Cumulative Pct Hits) Definition: CPH(P,M) = % of all targets in the first P% of the list scored by model M CPH frequently called Gains Cumulative % Hits % of random list have 5% of targets Random Pct list Q: What is expected value for CPH(P,Random)? A: Expected value for CPH(P,Random) = P

7 CPH: Random List vs Modelranked list Cumulative % Hits % of random list have 5% of targets, but 5% of model ranked list have 21% of targets CPH(5%,model)=21%. Random Model Pct list

8 Lift Lift(P,M) = CPH(P,M) / P Lift (at 5%) = 21% / 5% = 4.2 better than random Lift Note: Some (including Witten & Eibe) use Lift for what we call CPH P -- percent of the list

9 Lift Properties Q: Lift(P,Random) = A: 1 (expected value, can vary) Q: Lift(100%, M) = A: 1 (for any model M) Q: Can lift be less than 1? A: yes, if the model is inverted (all the non-targets precede targets in the list) Generally, a better model has higher lift 9

10 ROC curves ROC curves Stands for receiver operating characteristic Used in signal detection to show tradeoff between hit rate and false alarm rate over noisy channel Plots TP Rate vs FP Rate True-positive rate is also known as recall TP/(TP+FN): fraction of positives that you correctly classify False Positive rate FP/(FP+TN): fraction of negatives that you incorrectly classify witten & eibe 10

11 A sample ROC curve Jagged curve one set of test data witten & eibe Smooth curve use cross-validation 11

12 ROC curves for two schemes witten & eibe For a small, focused sample, use method A only need to act on a few predictions For a larger one, use method B In between, choose between A and B with appropriate probabilities 13

13 Cost Sensitive Learning There are two types of errors Actual class Yes Yes Predicted class TP: True positive No FN: False negative No Machine Learning methods usually minimize FP+FN Direct marketing maximizes TP FP: False positive TN: True negative 15

14 Different Costs In practice, true positive and false negative errors often incur different costs Examples: Medical diagnostic tests: does X have leukemia? Loan decisions: approve mortgage for X? Web mining: will X click on this link? Promotional mailing: will X buy the product? 16

15 Cost-sensitive learning Most learning schemes do not perform cost-sensitive learning They generate the same classifier no matter what costs are assigned to the different classes Example: standard decision tree learner Simple methods for cost-sensitive learning: Re-sampling of instances according to costs Weighting of instances according to costs Change probability threshold 17

16 Evaluating numeric prediction Same strategies: independent test set, cross-validation, significance tests, etc. Difference: error measures Actual target values: a 1 a 2 a n Predicted target values: p 1 p 2 p n Most popular measure: mean-squared error 2 2 ( p1 a1)... ( pn an) n Easy to manipulate mathematically witten & eibe 18

17 Other measures The root mean-squared error : n) ( p a1)... ( pn a n The mean absolute error is less sensitive to outliers than the mean-squared error: p1 a1... pn an n witten & eibe 19

Chapter 5 Evaluating Classification & Predictive Performance

Chapter 5 Evaluating Classification & Predictive Performance Chapter 5 Evaluating Classification & Predictive Performance Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Why Evaluate? Multiple methods are available

More information

Overview. Presenter: Bill Cheney. Audience: Clinical Laboratory Professionals. Field Guide To Statistics for Blood Bankers

Overview. Presenter: Bill Cheney. Audience: Clinical Laboratory Professionals. Field Guide To Statistics for Blood Bankers Field Guide To Statistics for Blood Bankers A Basic Lesson in Understanding Data and P.A.C.E. Program: 605-022-09 Presenter: Bill Cheney Audience: Clinical Laboratory Professionals Overview Statistics

More information

Weka Evaluation: Assessing the performance

Weka Evaluation: Assessing the performance Weka Evaluation: Assessing the performance Lab3 (in- class): 21 NOV 2016, 13:00-15:00, CHOMSKY ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning objectives

More information

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of

More information

A News-Based Portfolio Management System

A News-Based Portfolio Management System Development and Application of Data Mining and Learning Systems Universität Bonn May 21, 2014 Goal Goal Based on the history of the stock and the corresponding news, predict its direction with a certain

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

Big Data. Methodological issues in using Big Data for Official Statistics

Big Data. Methodological issues in using Big Data for Official Statistics Giulio Barcaroli Istat (barcarol@istat.it) Big Data Effective Processing and Analysis of Very Large and Unstructured data for Official Statistics. Methodological issues in using Big Data for Official Statistics

More information

MODELING THE EXPERT. An Introduction to Logistic Regression The Analytics Edge

MODELING THE EXPERT. An Introduction to Logistic Regression The Analytics Edge MODELING THE EXPERT An Introduction to Logistic Regression 15.071 The Analytics Edge Ask the Experts! Critical decisions are often made by people with expert knowledge Healthcare Quality Assessment Good

More information

Feature Selection in Pharmacogenetics

Feature Selection in Pharmacogenetics Feature Selection in Pharmacogenetics Application to Calcium Channel Blockers in Hypertension Treatment IEEE CIS June 2006 Dr. Troy Bremer Prediction Sciences Pharmacogenetics Great potential SNPs (Single

More information

Machine Learning Techniques For Particle Identification

Machine Learning Techniques For Particle Identification Machine Learning Techniques For Particle Identification 06.March.2018 I Waleed Esmail, Tobias Stockmanns, Michael Kunkel, James Ritman Institut für Kernphysik (IKP), Forschungszentrum Jülich Outlines:

More information

ScienceDirect. An Efficient CRM-Data Mining Framework for the Prediction of Customer Behaviour

ScienceDirect. An Efficient CRM-Data Mining Framework for the Prediction of Customer Behaviour Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 725 731 International Conference on Information and Communication Technologies (ICICT 2014) An Efficient CRM-Data

More information

Supply Chain Management Case Study

Supply Chain Management Case Study Supply Chain Management Case Study ----- Can you predict product backorders? Part backorders is a common supply chain problem. A backorder is a retailer s order for a part that is temporarily out of stock

More information

Segmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches

Segmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches Segmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches Alexander Tuzhilin Stern School of Business New York University (joint work with Tianyi Jiang)

More information

A Semi-automated Peer-review System Bradly Alicea Orthogonal Research

A Semi-automated Peer-review System Bradly Alicea Orthogonal Research A Semi-automated Peer-review System Bradly Alicea bradly.alicea@ieee.org Orthogonal Research Abstract A semi-supervised model of peer review is introduced that is intended to overcome the bias and incompleteness

More information

A Comparative Study of Filter-based Feature Ranking Techniques

A Comparative Study of Filter-based Feature Ranking Techniques Western Kentucky University From the SelectedWorks of Dr. Huanjing Wang August, 2010 A Comparative Study of Filter-based Feature Ranking Techniques Huanjing Wang, Western Kentucky University Taghi M. Khoshgoftaar,

More information

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN Predictive Modeling using SAS Enterprise Miner and SAS/STAT : Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN 1 Overview This presentation will: Provide a brief introduction of how to set

More information

Web Customer Modeling for Automated Session Prioritization on High Traffic Sites

Web Customer Modeling for Automated Session Prioritization on High Traffic Sites Web Customer Modeling for Automated Session Prioritization on High Traffic Sites Nicolas Poggi 1, Toni Moreno 2,3, Josep Lluis Berral 1, Ricard Gavaldà 4, and Jordi Torres 1,2 1 Computer Architecture Department,

More information

Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data

Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data Paper 942-2017 Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data Josephine S Akosa, Oklahoma State University ABSTRACT The most commonly reported model evaluation metric

More information

Credibility: Evaluating What s Been Learned

Credibility: Evaluating What s Been Learned Evaluation: the Key to Success Credibility: Evaluating What s Been Learned Chapter 5 of Data Mining How predictive is the model we learned? Accuracy on the training data is not a good indicator of performance

More information

Real-time crop mask production using high-spatial-temporal resolution image times series

Real-time crop mask production using high-spatial-temporal resolution image times series Real-time crop mask production using high-spatial-temporal resolution image times series S.Valero and CESBIO TEAM 1 1 Centre d Etudes Spatiales de la BIOsphre, CESBIO, Toulouse, France Table of Contents

More information

Keywords acrm, RFM, Clustering, Classification, SMOTE, Metastacking

Keywords acrm, RFM, Clustering, Classification, SMOTE, Metastacking Volume 5, Issue 9, September 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Comparative

More information

Advances in Machine Learning for Credit Card Fraud Detection

Advances in Machine Learning for Credit Card Fraud Detection Advances in Machine Learning for Credit Card Fraud Detection May 14, 2014 Alejandro Correa Bahnsen Introduction Europe fraud evolution Internet transactions (millions of euros) 800 700 600 500 2007 2008

More information

Purchase Predicting with Clickstream Data

Purchase Predicting with Clickstream Data Purchase Predicting with Clickstream Data Submitted By Tanzim Rizwan ID: 2013-1-60-063 Department of Computer Science and Engineering East West University Supervised By Dr. Mohammad Rezwanul Huq Assistant

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0047 ISSN (Online): 2279-0055 International

More information

Advanced Analytics through the credit cycle

Advanced Analytics through the credit cycle Advanced Analytics through the credit cycle Alejandro Correa B. Andrés Gonzalez M. Introduction PRE ORIGINATION Credit Cycle POST ORIGINATION ORIGINATION Pre-Origination Propensity Models What is it?

More information

IMPROVING PREDICTIVE VALUE OF INDICATORS OF POOR PERFORMANCE Patricia Fazio Bronson and David Sparrow

IMPROVING PREDICTIVE VALUE OF INDICATORS OF POOR PERFORMANCE Patricia Fazio Bronson and David Sparrow IMPROVING PREDICTIVE VALUE OF INDICATORS OF POOR PERFORMANCE Patricia Fazio Bronson and David Sparrow The Problem Screening techniques need to be developed to identify Major Defense Acquisition Programs

More information

A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values

A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values Australian Journal of Basic and Applied Sciences, 6(7): 312-317, 2012 ISSN 1991-8178 A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values 1 K. Suresh

More information

Assay Validation Services

Assay Validation Services Overview PierianDx s assay validation services bring clinical genomic tests to market more rapidly through experimental design, sample requirements, analytical pipeline optimization, and criteria tuning.

More information

Predicting Customer Purchase to Improve Bank Marketing Effectiveness

Predicting Customer Purchase to Improve Bank Marketing Effectiveness Business Analytics Using Data Mining (2017 Fall).Fianl Report Predicting Customer Purchase to Improve Bank Marketing Effectiveness Group 6 Sandy Wu Andy Hsu Wei-Zhu Chen Samantha Chien Instructor:Galit

More information

Getting Started with Predictive Analytics

Getting Started with Predictive Analytics Getting Started with Predictive Analytics What is Predictive Analytics? Predictive analytics uses many techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to

More information

Prediction of Road Accidents Severity using various algorithms

Prediction of Road Accidents Severity using various algorithms Volume 119 No. 12 2018, 16663-16669 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Prediction of Road Accidents Severity using various algorithms ABSTRACT: V M Ramachandiran, P N Kailash

More information

IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps://

IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps:// IT Certification Exams Provider! Weofferfreeupdateserviceforoneyear! h ps://www.certqueen.com Exam : A00-240 Title : SAS Certified Statistical Business Analyst Using SAS 9: Regression and Modeling Credential

More information

Mining a Marketing Campaigns Data of Bank

Mining a Marketing Campaigns Data of Bank Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IMPACT FACTOR: 6.017 IJCSMC,

More information

Application of Decision Trees in Mining High-Value Credit Card Customers

Application of Decision Trees in Mining High-Value Credit Card Customers Application of Decision Trees in Mining High-Value Credit Card Customers Jian Wang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 8, P.R. China E-mail: gregret24@gmail.com,

More information

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo Ensemble Modeling Toronto Data Mining Forum November 2017 Helen Ngo Agenda Introductions Why Ensemble Models? Simple & Complex ensembles Thoughts: Post-real-life Experimentation Downsides of Ensembles

More information

Outdoor Temperature and Differential Temperature as Indicators for Timing of VI Sampling to Capture the RME

Outdoor Temperature and Differential Temperature as Indicators for Timing of VI Sampling to Capture the RME Outdoor Temperature and Differential Temperature as Indicators for Timing of VI Sampling to Capture the RME EPA VI Workshop AEHS San Diego, March 20, 2018 What is the current status of sampling approaches?

More information

GENERATING A NETWORK OF INFORMATION DEPENDENCIES AUTOMATICALLY

GENERATING A NETWORK OF INFORMATION DEPENDENCIES AUTOMATICALLY 14 TH INTERNATIONAL DEPENDENCY AND STRUCTURE MODELLING CONFERENCE, DSM 12 KYOTO, JAPAN, SEPTEMBER 13 14, 2012 GENERATING A NETWORK OF INFORMATION DEPENDENCIES AUTOMATICALLY Reid R. Senescu 1, Austen W.

More information

Determining presence/absence threshold for your dataset

Determining presence/absence threshold for your dataset Determining presence/absence threshold for your dataset In PanCGHweb there are two ways to determine the presence/absence calling threshold. One is based on Receiver Operating Curves (ROC) generated for

More information

Decision Analytic Thinking. Pekka Malo, Assist. Prof. (statistics) Aalto BIZ / Department of Information and Service Economy

Decision Analytic Thinking. Pekka Malo, Assist. Prof. (statistics) Aalto BIZ / Department of Information and Service Economy Decision Analytic Thinking Pekka Malo, Assist. Prof. (statistics) Aalto BIZ / Department of Information and Service Economy Agenda Data-driven Decision Making - Predictive analytics as a process (CRISP-DM)

More information

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner SAS Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Melodie Rush Principal

More information

AN ADAPTIVE PERSONNEL SELECTION MODEL FOR RECRUITMENT USING DOMAIN-DRIVEN DATA MINING

AN ADAPTIVE PERSONNEL SELECTION MODEL FOR RECRUITMENT USING DOMAIN-DRIVEN DATA MINING AN ADAPTIVE PERSONNEL SELECTION MODEL FOR RECRUITMENT USING DOMAIN-DRIVEN DATA MINING 1 MUHAMMAD AHMAD SHEHU, 2 FAISAL SAEED* 1 Faculty of Computing. Universiti Teknologi Malaysia, 81310 UTM, Johor Bahru,

More information

An Introduction to Data Analytics For Software Security

An Introduction to Data Analytics For Software Security Chapter 3 An Introduction to Data Analytics For Software Security Lotfi ben Othmane Fraunhofer Institute for Secure Information Technology, Darmstadt, Germany Achim D. Brucker Department of Computer Science,

More information

How Predictive Analytics Can Deepen Customer Relationships. STEVEN RAMIREZ, CEO Beyond the Arc

How Predictive Analytics Can Deepen Customer Relationships. STEVEN RAMIREZ, CEO Beyond the Arc How Predictive Analytics Can Deepen Customer Relationships STEVEN RAMIREZ, CEO Beyond the Arc How Predictive Analytics Can Deepen Customer Relationships Financial Brand Forum 2017 2016 Beyond the Arc,

More information

Model-based analysis of the potential of macroinvertebrates as indicators for microbial pathogens in rivers

Model-based analysis of the potential of macroinvertebrates as indicators for microbial pathogens in rivers DEPARTMENT OF APPLIED ECOLOGY AND ENVIRONMENTAL BIOLOGY RESEARCH GROUP AQUATIC ECOLOGY (AECO) Model-based analysis of the potential of macroinvertebrates as indicators for microbial pathogens in rivers

More information

AMERICAN SOCIETY FOR QUALITY CERTIFIED RELIABILITY ENGINEER (CRE) BODY OF KNOWLEDGE

AMERICAN SOCIETY FOR QUALITY CERTIFIED RELIABILITY ENGINEER (CRE) BODY OF KNOWLEDGE AMERICAN SOCIETY FOR QUALITY CERTIFIED RELIABILITY ENGINEER (CRE) BODY OF KNOWLEDGE The topics in this Body of Knowledge include additional detail in the form of subtext explanations and the cognitive

More information

Macroinvertebrate based mathematical models for the prediction of microbial pathogen in rivers

Macroinvertebrate based mathematical models for the prediction of microbial pathogen in rivers Macroinvertebrate based mathematical models for the prediction of microbial pathogen in rivers Rubén Jerves-Cobo, I. Nopens, P. Goethals Ruben.JervesCobo@UGent.Be Laboratory of Environmental Toxicology

More information

RESULT AND DISCUSSION

RESULT AND DISCUSSION 4 Figure 3 shows ROC curve. It plots the probability of false positive (1-specificity) against true positive (sensitivity). The area under the ROC curve (AUR), which ranges from to 1, provides measure

More information

Model Selection, Evaluation, Diagnosis

Model Selection, Evaluation, Diagnosis Model Selection, Evaluation, Diagnosis INFO-4604, Applied Machine Learning University of Colorado Boulder October 31 November 2, 2017 Prof. Michael Paul Today How do you estimate how well your classifier

More information

Credit Card Marketing Classification Trees

Credit Card Marketing Classification Trees Credit Card Marketing Classification Trees From Building Better Models With JMP Pro, Chapter 6, SAS Press (2015). Grayson, Gardner and Stephens. Used with permission. For additional information, see community.jmp.com/docs/doc-7562.

More information

Chief Mathematician, Bank of Montreal, Toronto, Canada. Published Papers in Top Mathematical Journals in US, UK, Canada, Germany, Israel, Australia

Chief Mathematician, Bank of Montreal, Toronto, Canada. Published Papers in Top Mathematical Journals in US, UK, Canada, Germany, Israel, Australia About the Presenter Chief Mathematician, Bank of Montreal, Toronto, Canada. 40+ Published Papers in Top Mathematical Journals in US, UK, Canada, Germany, Israel, Australia Area of Expertise: from Non-associative,

More information

KDD Challenge Orange Labs R&D

KDD Challenge Orange Labs R&D KDD Challenge 2009 Orange Labs R&D Vincent Lemaire, Research & Development 03/19/2009, presentation to reading group http://perso.rd.francetelecom.fr/lemaire/ contents Oranges Labs CRM at Orange Problems

More information

Ranking Potential Customers based on GroupEnsemble method

Ranking Potential Customers based on GroupEnsemble method Ranking Potential Customers based on GroupEnsemble method The ExceedTech Team South China University Of Technology 1. Background understanding Both of the products have been on the market for many years,

More information

Applying an Interactive Machine Learning Approach to Statutory Analysis

Applying an Interactive Machine Learning Approach to Statutory Analysis Applying an Interactive Machine Learning Approach to Statutory Analysis Jaromir Savelka a,b, Gaurav Trivedi a, Kevin D. Ashley a,b,c a Intelligent Systems Program, University of Pittsburgh, USA b Learning

More information

Applying CHAID for logistic regression diagnostics and classification accuracy improvement Received (in revised form): 22 nd March 2010

Applying CHAID for logistic regression diagnostics and classification accuracy improvement Received (in revised form): 22 nd March 2010 Original Article Applying CHAID for logistic regression diagnostics and classification accuracy improvement Received (in revised form): 22 nd March 2010 Evgeny Antipov is the President of The Center for

More information

depend on knowing in advance what the fraud perpetrators are atlarnpting. Sampling and Analysis File Preparation Nonnal Behavior Samples

depend on knowing in advance what the fraud perpetrators are atlarnpting. Sampling and Analysis File Preparation Nonnal Behavior Samples Use of Data Enterprise Miner & Multivariate Statistical Analysis to Build Fraud Detection Models Shiao-ping Lu, Lucid Consulting Group, Palo Alto, CA ABSTRACT The increasing number of checking account

More information

SOCIAL MEDIA MINING. Behavior Analytics

SOCIAL MEDIA MINING. Behavior Analytics SOCIAL MEDIA MINING Behavior Analytics Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos Effective CRM Using Predictive Analytics Antonios Chorianopoulos WlLEY Contents Preface Acknowledgments xiii xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the

More information

Mining Financial Statement Fraud

Mining Financial Statement Fraud Mining Financial Statement Fraud An Analysis of Some Experimental Issues Jarrod West School of Computing & Mathematics Charles Sturt University, Australia Albury, Australia-2640 jnwest@netspace.net.au

More information

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds Overview In order for accuracy and precision to be optimal, the assay must be properly evaluated and a few

More information

TABLE OF CONTENTS ix

TABLE OF CONTENTS ix TABLE OF CONTENTS ix TABLE OF CONTENTS Page Certification Declaration Acknowledgement Research Publications Table of Contents Abbreviations List of Figures List of Tables List of Keywords Abstract i ii

More information

New Customer Acquisition Strategy

New Customer Acquisition Strategy Page 1 New Customer Acquisition Strategy Based on Customer Profiling Segmentation and Scoring Model Page 2 Introduction A customer profile is a snapshot of who your customers are, how to reach them, and

More information

Quantifying Counts, Costs, and Trends Accurately via Machine Learning

Quantifying Counts, Costs, and Trends Accurately via Machine Learning Quantifying Counts, Costs, and Trends Accurately via Machine Learning George Forman Business Optimization Lab HP Laboratories Palo Alto HPL-2007-164 (R.1) March 26, 2007* supervised machine learning, classification,

More information

The potential of in-forest segregation using an acoustic tool on a harvester head

The potential of in-forest segregation using an acoustic tool on a harvester head Date: June 2016 Reference: GCFF TN-011 The potential of in-forest segregation using an acoustic tool on a harvester head Author/s: John Moore, Peter Carter, Nigel Sharplin, Marco Lausberg Corresponding

More information

Logistic Regression and Decision Trees

Logistic Regression and Decision Trees Logistic Regression and Decision Trees Reminder: Regression We want to find a hypothesis that explains the behavior of a continuous y y = B0 + B1x1 + + Bpxp+ ε Source Regression for binary outcomes Regression

More information

Classifier performance

Classifier performance Classifier performance Dr. K.Vijayarekha Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur-613 401 Joint Initiative of IITs and IISc Funded by MHRD Page 1 of

More information

A Brief History. Bootstrapping. Bagging. Boosting (Schapire 1989) Adaboost (Schapire 1995)

A Brief History. Bootstrapping. Bagging. Boosting (Schapire 1989) Adaboost (Schapire 1995) A Brief History Bootstrapping Bagging Boosting (Schapire 1989) Adaboost (Schapire 1995) What s So Good About Adaboost Improves classification accuracy Can be used with many different classifiers Commonly

More information

Motif Finding: Summary of Approaches. ECS 234, Filkov

Motif Finding: Summary of Approaches. ECS 234, Filkov Motif Finding: Summary of Approaches Lecture Outline Flashback: Gene regulation, the cis-region, and tying function to sequence Motivation Representation simple motifs weight matrices Problem: Finding

More information

BUSINESS DATA MINING (IDS 572) Please include the names of all team-members in your write up and in the name of the file.

BUSINESS DATA MINING (IDS 572) Please include the names of all team-members in your write up and in the name of the file. BUSINESS DATA MINING (IDS 572) HOMEWORK 4 DUE DATE: TUESDAY, APRIL 10 AT 3:20 PM Please provide succinct answers to the questions below. You should submit an electronic pdf or word file in blackboard.

More information

LIACS Data Mining course. an introduction

LIACS Data Mining course. an introduction LIACS Data Mining course an introduction Course Textbook Data Mining Practical Machine Learning Tools and Techniques third edition, Morgan Kaufmann, ISBN 9780123748560 by Ian Witten and Eibe Frank (EUR

More information

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,

More information

CH 1: Economics and Economic Reasoning

CH 1: Economics and Economic Reasoning CH 1: Economics and Economic Reasoning What is Economics? Economics is the study of how human beings coordinate their wants and desires, given the decision-making mechanism, social customs, and political

More information

Evaluating a Leading Indicator: An Application: the Term Spread

Evaluating a Leading Indicator: An Application: the Term Spread Evaluating a Leading Indicator: An Application: the Term Spread Herman O. Stekler The George Washington University hstekler@gwu.edu Tianyu Ye The George Washington University RPF Working Paper No. 2016-004

More information

Diagnostics. Using Diagnostic Tests. Test Sensitivity and Specificity. Dr. Randall Singer Professor of Epidemiology

Diagnostics. Using Diagnostic Tests. Test Sensitivity and Specificity. Dr. Randall Singer Professor of Epidemiology Executive Veterinary Program University of Illinois December 11-12, 2014 Outline Diagnostics Dr. Randall Singer Professor of Epidemiology review of sensitivity, specificity, predictive value test agreement

More information

Video Traffic Classification

Video Traffic Classification Video Traffic Classification A Machine Learning approach with Packet Based Features using Support Vector Machine Videotrafikklassificering En Maskininlärningslösning med Paketbasereade Features och Supportvektormaskin

More information

Professor Dr. Gholamreza Nakhaeizadeh. Professor Dr. Gholamreza Nakhaeizadeh

Professor Dr. Gholamreza Nakhaeizadeh. Professor Dr. Gholamreza Nakhaeizadeh Statistic Methods in in Mining Business Understanding Understanding Preparation Deployment Modelling Evaluation Mining Process (( Part 3) 3) Professor Dr. Gholamreza Nakhaeizadeh Professor Dr. Gholamreza

More information

Marketing Data Solutions for the Financial Services Industry

Marketing Data Solutions for the Financial Services Industry Marketing Data Solutions for the Financial Services Industry Maximize your revenue with better data. Publication Date: July, 2015 www.datamentors.com info@datamentors.com 01. Maximize Revenue with Better

More information

Analysis of Associative Classification for Prediction of HCV Response to Treatment

Analysis of Associative Classification for Prediction of HCV Response to Treatment International Journal of Computer Applications (975 8887) Analysis of Associative Classification for Prediction of HCV Response to Treatment Enas M.F. El Houby Systems & Information Dept., Engineering

More information

Outline. Quality of Data Linkage Megan Bohensky August Background Quality in Data Linkage Study Example Reporting Standards.

Outline. Quality of Data Linkage Megan Bohensky August Background Quality in Data Linkage Study Example Reporting Standards. Outline Background Quality in Data Linkage Study Example Reporting Standards Quality of Data Linkage Megan Bohensky August 2016 2 of 20 Benefits of Data Linkage Data Linkage is Popular Data is distributed

More information

MYTH-BUSTING SOCIAL MEDIA ADVERTISING Do ads on social Web sites work? Multi-touch attribution makes it possible to separate the facts from fiction.

MYTH-BUSTING SOCIAL MEDIA ADVERTISING Do ads on social Web sites work? Multi-touch attribution makes it possible to separate the facts from fiction. RE MYTH-BUSTING SOCIAL MEDIA ADVERTISING Do ads on social Web sites work? Multi-touch attribution makes it possible to separate the facts from fiction. Data brought to you by: In recent years, social media

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

PERSONALIZED INCENTIVE RECOMMENDATIONS USING ARTIFICIAL INTELLIGENCE TO OPTIMIZE YOUR INCENTIVE STRATEGY

PERSONALIZED INCENTIVE RECOMMENDATIONS USING ARTIFICIAL INTELLIGENCE TO OPTIMIZE YOUR INCENTIVE STRATEGY PERSONALIZED INCENTIVE RECOMMENDATIONS USING ARTIFICIAL INTELLIGENCE TO OPTIMIZE YOUR INCENTIVE STRATEGY CONTENTS Introduction 3 Optimizing Incentive Recommendations 4 Data Science and Incentives: Building

More information

PRiSM Roadmap. What s new, what s coming. Bill Bielke

PRiSM Roadmap. What s new, what s coming. Bill Bielke PRiSM Roadmap What s new, what s coming Bill Bielke What s new in PRiSM 2.8 Alarm management, distributed agents, Wonderware, asset health What s new in PRiSM 2.8 Alarm management Distributed agents Wonderware

More information

DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues

DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues Supporting Materials for DRNApred, fast sequence-based method that accurately predicts and discriminates DNA- and RNA-binding residues Jing Yan 1 and Lukasz Kurgan 2, * 1 Department of Electrical and Computer

More information

Adaptive Sampling Scheme for Learning in Severely Imbalanced Large Scale Data

Adaptive Sampling Scheme for Learning in Severely Imbalanced Large Scale Data Proceedings of Machine Learning Research 77:240 247, 2017 ACML 2017 Adaptive Sampling Scheme for Learning in Severely Imbalanced Large Scale Data Wei Zhang Said Kobeissi Scott Tomko Chris Challis Adobe

More information

World Environmental and Water Resources Congress 2012: Crossing Boundaries ASCE 2012

World Environmental and Water Resources Congress 2012: Crossing Boundaries ASCE 2012 2903 A Coupled Decision Trees Bayesian Approach for Water Distribution Systems Event Detection Jonathan Arad 1, Lina Perelman 2, and Avi Ostfeld 3 1 M.Sc. student, Faculty of Civil and Environmental Engineering,

More information

Individual-Based Correlates of Protection

Individual-Based Correlates of Protection Module 8 Evaluating Immunological Correlates of Protection Session 3 Evaluating Correlates of Protection Using Individual, Population, and Titer-specific Approaches Ivan S.F. Chan, Ph.D. Merck Research

More information

Mining knowledge using Decision Tree Algorithm

Mining knowledge using Decision Tree Algorithm International Journal of Scientific & Engineering Research Volume 2, Issue 5, May-2011 1 Mining knowledge using Decision Tree Algorithm Mrs. Swati.V. Kulkarni Abstract Industry is experiencing more and

More information

Retail Product Bundling A new approach

Retail Product Bundling A new approach Paper 1728-2018 Retail Product Bundling A new approach Bruno Nogueira Carlos, Youman Mind Over Data ABSTRACT Affinity analysis is referred to as Market Basket Analysis in retail and e-commerce outlets

More information

Brand Measurement Study. A White Paper January 2018

Brand Measurement Study. A White Paper January 2018 Brand Measurement Study A White Paper January 2018 Introduction Direct vs. Brand Response When a TV ad runs, we can identify two main types of responders: immediate and latent. Immediate represent those

More information

Data Warehousing Class Project Report

Data Warehousing Class Project Report Portland State University PDXScholar Engineering and Technology Management Student Projects Engineering and Technology Management Winter 2018 Data Warehousing Class Project Report Gaya Haciane Portland

More information

Concept Paper. An Item Bank Approach to Testing. Prepared by. Dr. Paul Squires

Concept Paper. An Item Bank Approach to Testing. Prepared by. Dr. Paul Squires Concept Paper On Prepared by Dr. Paul Squires 20 Community Place Morristown, NJ 07960 973-631-1607 fax 973-631-8020 www.appliedskills.com An item bank 1 is an important new approach to testing. Some use

More information

Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES *

Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES * Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES * Michael Gelman a, Yuriy Gorodnichenko b,c, Shachar Kariv b, Dmitri Koustas b, Matthew D. Shapiro c,d, Dan Silverman

More information

SPM 8.2. Salford Predictive Modeler

SPM 8.2. Salford Predictive Modeler SPM 8.2 Salford Predictive Modeler SPM 8.2 The SPM Salford Predictive Modeler software suite is a highly accurate and ultra-fast platform for developing predictive, descriptive, and analytical models from

More information

Document Classification and Clustering II. Linear classifiers

Document Classification and Clustering II. Linear classifiers Document Classification and Clustering II CS 510 Winter 2007 1 Linear classifiers Support both DPC, CPC Can make use of search index DPC: Treat d as a query against category vectors v 1, v 2,, v n CPC:

More information

A Machine Learning Approach to Early Detection of Incorrect Anode Positioning in an Aluminum Electrolysis Cell

A Machine Learning Approach to Early Detection of Incorrect Anode Positioning in an Aluminum Electrolysis Cell A Machine Learning Approach to Early Detection of Incorrect Anode Positioning in an Aluminum Electrolysis Cell Abstract Patrice Côté 1 and Sébastien Guérard 2 1. Principal Advisor, Data Science and Artificial

More information

Analysing Players in Online Games

Analysing Players in Online Games T H E H U N I V E R S I T Y O F G E D I N B U R Masters of Science High Performance Computing with Data Science Analysing Players in Online Games Brian Flynn University of Edinburgh August, 2016 A B S

More information

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest 1. Introduction Reddit is a social media website where users submit content to a public forum, and other

More information

Building the In-Demand Skills for Analytics and Data Science Course Outline

Building the In-Demand Skills for Analytics and Data Science Course Outline Day 1 Module 1 - Predictive Analytics Concepts What and Why of Predictive Analytics o Predictive Analytics Defined o Business Value of Predictive Analytics The Foundation for Predictive Analytics o Statistical

More information

Financial Services: Maximize Revenue with Better Marketing Data. Marketing Data Solutions for the Financial Services Industry

Financial Services: Maximize Revenue with Better Marketing Data. Marketing Data Solutions for the Financial Services Industry Financial Services: Maximize Revenue with Better Marketing Data Marketing Data Solutions for the Financial Services Industry DataMentors, LLC April 2014 1 Financial Services: Maximize Revenue with Better

More information

Segmentation and Targeting

Segmentation and Targeting Segmentation and Targeting Outline The segmentation-targeting-positioning (STP) framework Segmentation The concept of market segmentation Managing the segmentation process Deriving market segments and

More information