My Five Predictive Analytics Pet Peeves

1 My Five Predictive Analytics Pet Peeves
Dean Abbott, Abbott Analytics, Inc.
Predictive Analytics World, San Francisco, CA (#pawcon), April 16,
Blog:

2 Topics
Why Pet Peeves? A call for humility for Predictive Modelers
The Five Pet Peeves
1. Machine Learning Skills > Domain Expertise
2. Just Build the Most Accurate Model!
3. Significance? What do you mean by Significance?
4. My Algorithm is better than Your Algorithm
5. My classifier calls everything 0: time to resample!

3 Peeve 1: Which is Better: Machine Learning Expertise or Domain Expertise?
Question: who is more important in the process of building predictive models?
- The Data Scientist / Predictive Modeler / Data Miner
- The Domain Expert / Business Stakeholder

4 Which is Better: 2012 Strata Conference Debate?
From Strata Conference: machine-learning-expertise-googleanalytics.html
"I think you can get pretty far with some common sense, maybe Google-ing the basic information you need to know about a domain, and a lot of statistical intuition"

5 Formula for Success?

6 Conclusion: Frame the Problem First
Mark Driscoll, moderator of the Strata debate: "Could you currently prepare your data for a Kaggle competition? If so, then hire a machine learner. If not, hire a data scientist who has the domain expertise and the data hacking skills to get you there."
But even this may not work, which brings me to the second pet peeve.

7 Peeve 2: Just Build Accurate Models
The Problems with Model Accuracy:
1. There's More to Success than Accuracy
2. Which Accuracy?

8 The Winner is Best Accuracy

9 Why Model Accuracy is Not Enough: Netflix Prize
netflix-recommendations-beyond-5-stars.html

10 Why Data Science is Not Enough: Netflix Prize
netflix-recommendations-beyond-5-stars.html
There's more to a solution than accuracy: you have to be able to use it!

11 Peeve 3: The Best Model Wins
We select the winning model, but is there a significant difference in model performance?

12 KDD Cup 98 Results
Calculator from

13 Example: Statistical Significance without Practical Significance

Measure                          | Control   | Campaign (based on model)
Number Mailed                    | 5,000,000 | 4,000,000
Response Rate                    | 1%        | 1.011%
Outside margin of error?         |           | yes
i.e., statistically significant? |           | yes
Expected responders              | 50,000    | 40,000
Actual responders                | 50,000    | 40,440
Difference                       |           |

Revenue Per Responder: $100
Total Revenue Expected: $4,000,000
Total Revenue Actual: $4,044,000
Difference Revenue: $44,000

Significance based on z=2 (95.45% confidence)
Cost per contact: negligible ( )
Cost for analysts to build model: $80,000
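
The arithmetic behind this slide can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the slide does not say which test was used, so the sketch uses a one-sample z-test that treats the control response rate as a fixed baseline (a two-sample test that also accounts for sampling error in the control group would give a smaller z). All variable names are made up for the example; the figures come from the slide.

```python
import math

# Figures taken from the slide.
control_rate = 0.01            # control response rate (1%)
campaign_mailed = 4_000_000    # pieces mailed using the model
actual_responders = 40_440     # responders in the model-based campaign
revenue_per_responder = 100    # dollars
model_cost = 80_000            # analyst cost to build the model, dollars

# Assumption: one-sample z-test against the control rate as a known baseline.
expected_responders = control_rate * campaign_mailed
campaign_rate = actual_responders / campaign_mailed
std_error = math.sqrt(control_rate * (1 - control_rate) / campaign_mailed)
z_score = (campaign_rate - control_rate) / std_error

# Practical significance: does the lift pay for the modeling effort?
extra_revenue = (actual_responders - expected_responders) * revenue_per_responder
net_gain = extra_revenue - model_cost

print(f"z-score: {z_score:.2f} (threshold on the slide: z = 2)")
print(f"extra revenue: ${extra_revenue:,.0f}")
print(f"net gain after model cost: ${net_gain:,.0f}")
```

Under this assumption the lift clears the z = 2 bar, yet the extra revenue does not cover the cost of building the model, which is the point of the slide.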

14 Peeve 4: My Algorithm is Better than Your Algorithm
From 2011 Rexer Analytics Data Mining Survey: Data-Miner-Survey-Results-2011.html

15 Every Algorithm Has its Day
Elder, IV, J. F., and Lee, S. S. (1997), Bundling Heterogeneous Classifiers with Advisor Perceptrons, Technical Report, University of Idaho, October,

16 PAKDD Cup 2007 Results: Look at all them Algorithms!
18 Different Algorithms Used in Top 20 Solutions
(A "-" marks a cell that is empty in the transcription.)

Modeling Technique | Modeling Implementation | Participant Affiliation Location | Participant Affiliation Type | AUCROC (Trapezoidal Rule) | AUCROC (Trapezoidal Rule) Rank | Top Decile Response Rate | Top Decile Response Rate Rank
TreeNet + Logistic Regression | Salford Systems | Mainland China | Practitioner | 70.01% | 1 | 13.00% | 7
Probit Regression | SAS | USA | Practitioner | 69.99% | 2 | 13.13% | 6
MLP + n-Tuple Classifier | - | Brazil | Practitioner | 69.62% | 3 | 13.88% | 1
TreeNet | Salford Systems | USA | Practitioner | 69.61% | 4 | 13.25% | 4
TreeNet | Salford Systems | Mainland China | Practitioner | 69.42% | 5 | 13.50% | 2
Ridge Regression | Rank | Belgium | Practitioner | 69.28% | 6 | 12.88% | 9
2-Layer Linear Regression | - | USA | Practitioner | 69.14% | 7 | 12.88% | 9
Log Regr + Decision Stump + AdaBoost + VFI | - | Mainland China | Academia | 69.10% | 8 | 13.25% | 4
Logistic Average of Single Decision Functions | - | Australia | Practitioner | 68.85% | 9 | 12.13% | 17
Logistic Regression | Weka | Singapore | Academia | 68.69% | 10 | 12.38% | 16
Logistic Regression | - | Mainland China | Practitioner | 68.58% | 11 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | - | Singapore | - | 68.54% | 12 | 13.00% | 7
Scorecard Linear Additive Model | Xeno | USA | Practitioner | 68.28% | 13 | 11.75% | 20
Random Forest | Weka | USA | - | 68.04% | 14 | 12.50% | 14
Expanding Regression Tree + RankBoost + Bagging | Weka | Mainland China | Academia | 68.02% | 15 | 12.50% | 14
Logistic Regression | SAS + Salford | India | Practitioner | 67.58% | 16 | 12.00% | 19
J48 + BayesNet | Weka | Mainland China | Academia | 67.56% | 17 | 11.63% | 21
Neural Network + General Additive Model | Tiberius | USA | Practitioner | 67.54% | 18 | 11.63% | 21
Decision Tree + Neural Network | - | Mainland China | Academia | 67.50% | 19 | 12.88% | 9
Decision Tree + Neural Network + Log. Regression | SAS | USA | Academia | 66.71% | 20 | 13.50% | 2
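
The results table also speaks to the "Which Accuracy?" question from Peeve 2: the winner depends on the metric. The short Python sketch below ranks a handful of rows copied from the table by each metric. The row subset and the labels in parentheses are choices made for this illustration, not part of the original slide.

```python
# Subset of rows from the PAKDD Cup 2007 table:
# (technique, AUC in percent, top-decile response rate in percent)
results = [
    ("TreeNet + Logistic Regression", 70.01, 13.00),
    ("Probit Regression", 69.99, 13.13),
    ("MLP + n-Tuple Classifier", 69.62, 13.88),
    ("TreeNet (USA entry)", 69.61, 13.25),
    ("TreeNet (Mainland China entry)", 69.42, 13.50),
    ("Decision Tree + Neural Network + Log. Regression (SAS entry)", 66.71, 13.50),
]

# Pick the "best" model under each metric.
best_by_auc = max(results, key=lambda row: row[1])[0]
best_by_top_decile = max(results, key=lambda row: row[2])[0]

print("Best by AUC:", best_by_auc)
print("Best by top-decile response rate:", best_by_top_decile)
```

The two metrics pick different winners, and the entry ranked 20th by AUC ties for second place on top-decile response rate.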

17 Peeve 5: You Must Stratify Data to Balance the Target Class
For example, 93% non-responders (N), 7% responders (R)
What's the Problem? (The justification for resampling)
- "Sample is biased toward non-responders"
- "Models will learn non-responders better"
- "Most algorithms will generate models that say 'call everything a non-responder' and get 93% correct classification!" (I used to say this too)
Most common solution: Stratify the sample to get 50%/50% (some will argue that one only needs 20-30% responders)

18 Neural Network Results on Same Data
Distribution of Target
NOTE: all models built using JMP 10, SAS Institute, Inc.

19 Sample Decision Tree Built on Imbalanced Population
[Figures: distribution of target; decision tree (Count, G^2, LogWorth per node) with splits on AVG_DON, REC_DON_AMT, RFA_2, CARDPM12, MAX_DON_AMT, CARDGIFT_LIFE, and MAX_DON_DT; predictions of target variable; ROC curve (sensitivity vs. specificity)]
But the ROC curve looks like this.
Why do we get a ROC curve that looks OK, but the confusion matrix says everything is N (No)?

20 So What Happened?
- Note: no algorithm predicts decisions (N or R); they all produce probabilities/likelihoods/confidences
- Every data mining tool creates decisions (and, by extension, forms confusion matrices) by thresholding the predicted probability at 0.5 (i.e., assuming equal likelihoods is the baseline)
- When the imbalance is large, algorithms will not produce probabilities/likelihoods > 0.5; a score this large is far too unlikely for an algorithm to be that sure
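
A minimal Python sketch of this thresholding effect follows. The labels and scores are hypothetical toy values, not the data behind the slides, and using the base rate as the alternative threshold is an assumption made for illustration; the function name `confusion_matrix` is also just a local helper, not a library call.

```python
def confusion_matrix(labels, scores, threshold):
    """Count (actual, predicted) pairs, predicting "R" when score >= threshold."""
    counts = {("N", "N"): 0, ("N", "R"): 0, ("R", "N"): 0, ("R", "R"): 0}
    for actual, score in zip(labels, scores):
        predicted = "R" if score >= threshold else "N"
        counts[(actual, predicted)] += 1
    return counts

# Hypothetical toy data: 2 responders among 10 records, and no score reaches 0.5.
labels = ["N", "N", "N", "N", "N", "N", "N", "N", "R", "R"]
scores = [0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.12, 0.25, 0.22, 0.35]

# Assumed alternative threshold: the observed proportion of responders.
base_rate = labels.count("R") / len(labels)

at_half = confusion_matrix(labels, scores, 0.5)
at_base_rate = confusion_matrix(labels, scores, base_rate)

print("threshold 0.5:      ", at_half)
print("threshold base rate:", at_base_rate)
```

At 0.5 every record is called N even though the responders have the highest scores; lowering the threshold recovers them at the cost of some false positives. The scores themselves never change, only the decision rule.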

21 What the Predictions Look Like

22 Confusion Matrices For the Decision Tree: Before and After

Decision Tree: Threshold at 0.5
Response_STR | N     | R     | Total
N            | 5,002 | 0     | 5,002
R            | 386   | 0     | 386
Total        | 5,388 | 0     | 5,388

Decision Tree: Threshold at
Response_STR | N     | R     | Total
N            | 2,798 | 2,204 | 5,002
R            | 45    | 341   | 386
Total        | 2,843 | 2,545 | 5,388
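
To make the before/after comparison concrete, the sketch below computes sensitivity, specificity, and accuracy from the counts in the two confusion matrices. It assumes rows are actual classes, columns are predicted classes, and R (responder) is the positive class; the helper name `rates` is invented for the example.

```python
def rates(tn, fp, fn, tp):
    """Return (sensitivity, specificity, accuracy) for a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # share of actual responders that were caught
    specificity = tn / (tn + fp)   # share of actual non-responders called N
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return sensitivity, specificity, accuracy

# Counts from the slide; R is treated as the positive class.
before = rates(tn=5002, fp=0, fn=386, tp=0)       # threshold at 0.5
after = rates(tn=2798, fp=2204, fn=45, tp=341)    # lowered threshold

for name, (sens, spec, acc) in (("threshold 0.5", before), ("lowered threshold", after)):
    print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Accuracy falls from about 93% to about 58% after the threshold change, yet sensitivity rises from 0% to about 88%: another case where the headline accuracy number hides what the model is actually useful for.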

23 Conclusions: The Rant is Done!
The Five Pet Peeves
1. Machine Learning Skills > Domain Expertise
   Be humble; we need both data science and domain experts!
2. Just Build the Most Accurate Model!
   Select the model that addresses your metric
3. Significance? What do you mean by Significance?
   Don't get hung up on "best" when many models will do well
   Learn from the differences in patterns found by these models
4. My Algorithm is better than Your Algorithm
   Don't stress about the algorithm; learn to use a few very well
5. My classifier calls everything 0: time to resample!
   Don't throw away 0s needlessly; only do it when there are enough of them that you won't miss them.


More information

A BIT ABOUT US. Richardson and our clients are highly recognized and have won numerous industry awards, including the following:

A BIT ABOUT US. Richardson and our clients are highly recognized and have won numerous industry awards, including the following: Founded in 1978, we are headquartered in Philadelphia and have international offices in the UK, Singapore, Australia, and additional satellite offices around the globe. A BIT ABOUT US We re Richardson,

More information

Credibility: Evaluating What s Been Learned

Credibility: Evaluating What s Been Learned Evaluation: the Key to Success Credibility: Evaluating What s Been Learned Chapter 5 of Data Mining How predictive is the model we learned? Accuracy on the training data is not a good indicator of performance

More information

ACC: Review of the Weekly Compensation Duration Model

ACC: Review of the Weekly Compensation Duration Model ACC: Review of the Weekly Compensation Duration Model Prepared By: Knoware Creation Date July 2013 Version: 1.0 Author: M. Kelly Mara Knoware +64 21 979 799 kelly.mara@knoware.co.nz Document Control Change

More information

Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge

Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge Chapter 13 Knowledge Discovery Systems: Systems That Create Knowledge Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2007 Prentice Hall Chapter Objectives To explain how knowledge is discovered

More information

Getting The Best ROI in Marketing

Getting The Best ROI in Marketing Getting The Best ROI in Marketing Google Adwords Edition Table of Contents Getting The Best ROI in Marketing 1 Table of Contents 1 Introduction 1 Researching Your Marketing & Initial Setup Preparation

More information

Supply Chain Management Case Study

Supply Chain Management Case Study Supply Chain Management Case Study ----- Can you predict product backorders? Part backorders is a common supply chain problem. A backorder is a retailer s order for a part that is temporarily out of stock

More information

TURNING TWEETS INTO KNOWLEDGE. An Introduction to Text Analytics

TURNING TWEETS INTO KNOWLEDGE. An Introduction to Text Analytics TURNING TWEETS INTO KNOWLEDGE An Introduction to Text Analytics Twitter Twitter is a social networking and communication website founded in 2006 Users share and send messages that can be no longer than

More information

TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011

TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011 TNM033 Data Mining Practical Final Project Deadline: 17 of January, 2011 1 Develop Models for Customers Likely to Churn Churn is a term used to indicate a customer leaving the service of one company in

More information

Save Time & Increase Efficiency With Social Media Aggregators

Save Time & Increase Efficiency With Social Media Aggregators Save Time & Increase Efficiency With Social Media Aggregators Increased Brand Recognition & Loyalty With regular posting, keep your brand visible in front of potential customers Convert Prospects to Customers

More information

How to Use PPC Advertising to Grow Your Pool Business!

How to Use PPC Advertising to Grow Your Pool Business! How to Use PPC Advertising to Grow Your Pool Business! Welcome From print materials to online marketing, there is no shortage of ways to spend your marketing budget. And whether your annual budget is $1000

More information

Chapter 4: Foundations for inference. OpenIntro Statistics, 2nd Edition

Chapter 4: Foundations for inference. OpenIntro Statistics, 2nd Edition Chapter 4: Foundations for inference OpenIntro Statistics, 2nd Edition Variability in estimates 1 Variability in estimates Application exercise Sampling distributions - via CLT 2 Confidence intervals 3

More information

Definitive Guide for Better Pricing. Build a solid pricing foundation that will help you create consistent sales and profit growth.

Definitive Guide for Better Pricing. Build a solid pricing foundation that will help you create consistent sales and profit growth. Definitive Guide for Better Pricing Build a solid pricing foundation that will help you create consistent sales and profit growth. INDEX Introduction 2 Identifying New Customers 4 Here Are Some Questions

More information

The 10 Big Mistakes People Make When Running Customer Surveys

The 10 Big Mistakes People Make When Running Customer Surveys The 10 Big Mistakes People Make When Running Customer Surveys If you want to understand what drives customer loyalty for your business and how to align your business to improve customer loyalty, Genroe

More information

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,

More information

Data Science Training Course

Data Science Training Course About Intellipaat Intellipaat is a fast-growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 600,000 in over

More information

Transforming Government with Analytics. Tom Davenport Babson College

Transforming Government with Analytics. Tom Davenport Babson College Transforming Government with Analytics Tom Davenport Babson College ICT and Public Policy Workshop September 30, 2011 What Are Analytics? Optimization What s the best that can happen? Degree of Intelligence

More information

Yes, there is going to be some math (but not much) STATISTICAL APPROACH TO MEDICAL DEVICE VERIFICATION AND VALIDATION

Yes, there is going to be some math (but not much) STATISTICAL APPROACH TO MEDICAL DEVICE VERIFICATION AND VALIDATION Yes, there is going to be some math (but not much) STATISTICAL APPROACH TO MEDICAL DEVICE VERIFICATION AND VALIDATION Medical Device Verification and Validation In the medical device world, verification

More information

Stock Price Prediction with Daily News

Stock Price Prediction with Daily News Stock Price Prediction with Daily News GU Jinshan MA Mingyu Derek MA Zhenyuan ZHOU Huakang 14110914D 14110562D 14111439D 15050698D 1 Contents 1. Work flow of the prediction tool 2. Model performance evaluation

More information

Machine Learning Techniques For Particle Identification

Machine Learning Techniques For Particle Identification Machine Learning Techniques For Particle Identification 06.March.2018 I Waleed Esmail, Tobias Stockmanns, Michael Kunkel, James Ritman Institut für Kernphysik (IKP), Forschungszentrum Jülich Outlines:

More information

Innovation PLAYBOOK TOOLKIT

Innovation PLAYBOOK TOOLKIT Innovation PLAYBOOK TOOLKIT About the Author Greg Satell is a popular author, speaker and innovation advisor, whose work has appeared in Harvard Business Review, Forbes, Fast Company, Inc. and other A-list

More information

2017 North American Pulse of Internal Audit. Public Sector Focus. Courageous Leadership: Instilling Confidence from Within

2017 North American Pulse of Internal Audit. Public Sector Focus. Courageous Leadership: Instilling Confidence from Within 2017 North American Pulse of Internal Audit Public Sector Focus Courageous Leadership: Instilling Confidence from Within Agenda Pulse Overview Topics Communications Not Traditionally Subject to Assurance

More information

Implementing Predictive Analytics to Generate Big Win s for Trading Partners

Implementing Predictive Analytics to Generate Big Win s for Trading Partners Implementing Predictive Analytics to Generate Big Win s for Trading Partners Richard Althoff, Founder Sequoya Analytics & Eric Nordquist, Partner - Sequoya Analytics Our Morning Agenda This session will

More information

Predictive Marketing: Buyer s Guide

Predictive Marketing: Buyer s Guide Predictive Marketing: Buyer s Guide Index Introduction 2 About This ebook 2 The Benefits of Using a Predictive Marketing Solution 2 Recommended Steps to Making an Informed Purchase 2 Step #1: How do you

More information

HOW MX DIFFERS FROM THE COMPETITION

HOW MX DIFFERS FROM THE COMPETITION HOW MX DIFFERS FROM THE COMPETITION How MX is different MX teams up with digital banking providers to supercharge the account holder experience at financial institutions. We fer money management and user

More information

Redefining Measurement for Continuous Learning

Redefining Measurement for Continuous Learning Redefining Measurement for Continuous Learning How to demonstrate impact when learning goes beyond L&D Todd Tauber Vice President, Learning Research Bersin by Deloitte, Deloitte Consulting LLP April 1,

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at Caltech CS/IST Lunch Bunch March 4, 2008 Benefited from joint work with Dr.

More information