depend on knowing in advance what the fraud perpetrators are atlarnpting. Sampling and Analysis File Preparation Nonnal Behavior Samples

Size: px
Start display at page:

Download "depend on knowing in advance what the fraud perpetrators are atlarnpting. Sampling and Analysis File Preparation Nonnal Behavior Samples"

Transcription

1 Use of Data Enterprise Miner & Multivariate Statistical Analysis to Build Fraud Detection Models Shiao-ping Lu, Lucid Consulting Group, Palo Alto, CA ABSTRACT The increasing number of checking account fraud cases and the dollar loss reported at Wells Fargo Bank evoked the need to build an early-detection program at the point r:4 transaction Since the fraud rate is relati\lely small ~ared to milions of transactions per month, it is both desirable and warranted to explore new predictive techniques and to provide eariy warnings to the fraud prevention unit This paper will address data sampling, variable construction and model construclion Several approaches taken in the modeling with the utilization of Data Enterprise Miner include logistic regression, treeanalysis and neural nelw()r1( The accuracy of prediction, misclassification costs and robustness of various models will be compared Other multivariate statistical methods such as factor analysis in variable dimension reduction will also be discussed NTRODUCTON Background n spite the presence of ~ fraud detection programs, the reported cases d checking account fraud and millions d dollar loss are rising at Wells Fargo Bank Majority of the fraud detection is operated in batch processing (after-the-fact) and the detection mechanism is more or less "chasing the fraud" A typical scenario is once the organized airninal group changes the modes d operation, the existing programs are no longer effective n order to bring loss under control and to minimize it, a new strategy must be implemented This paper will discuss the basic concepts and illustrate some d the implementation techniques by utilizing SAS Enterprise Miner Since the fraud rate is relati\lely small (approximately 1 in 10,000 transactions) compared to millions d transaction per month, the model needs to be robust enough to yield high hit rate while keeps the false positive rate low f the latter is not low, major operation inconvenience and c:ustomer complaints will be resulted Based on the mode d operation, two major types d fraud, victimized and perpetrating, have been reported This paper wih iuustrate the model budding on the victimized fraud only A victimized fraud is defined as the crooks use several techniques such as making oounterfeit check, altering the check and dollar amount on a wrilten check or stealing the checks from customers and cash out from a victimized account Our strategy is to look for the deviation d fraudulent transaction behavior from normal customers' behavior pattern t will be good for preventing good customers from being victimized by fraud and also does not depend on knowing in advance what the fraud perpetrators are atlarnpting Sampling and Analysis File Preparation Nonnal Behavior Samples To build nonnal customers' transaction behavior pattem fa a representative set from the 11 million checking account database, first we took a small sample accounts from the 11 mhiion database The occurrence d business account frauds to consumer account frauds is close to 1:1 However 90 % d the checking accounts are consumers Therefore, a stratified sampling approach was used to sample equal number of business and consumer accounts A one-year worth of transactions were then extracted for the sampled accounts and several transaction parameters ~ clerived per-business cycle basis A business cycle is equivalent to one-month length wise except the cycle starting date and anding date vary from customer to customer n older to construct one observation per customer for predictive modeling, weighted average of 12 cycle for the transaction parameters were derived Number d transactions per cycle is used as the weight Fraud Behavior Samples Several thousands of fraudulent transactions occurred in 1996 & 1997 were included and the transaction parameters were computed in a similar way as normal behavior for each fraud transaction To be noted, majority d the frauds occurred within the same cycle due to short interval between transactions Data Analysis and Modeling SAS Enterprise Miner was employed to build the predictive models through various multivariate statistical approaches: neural networks, logistic regrassion and decision tree The diagram shown in Figure 1 mustrates an example d steps taken in predictive modeling The nput Data Source Node requires assignment d target and input variables for the model The distribution d each variable can be viewed by sifll)ly clicking on the variable and select "vidistribution" Figure 2 shows an example d detecting incorrect data from viewing distribution Time on book (ie age d the account) should be greater or equal to 0 n addition, percent of missing data for input variables can be lliewed so to help planning which data inputation strategy is needed Oecision Tree Analysis One primary advantage that tree modeling has over neural networks for our data is that tree-based models can handle missing values by treating them simply as another category d the analysis variable The entire 183

2 data (with 14,417 obselvalion) was used for the modeling and the partition between training and validation is 67% and 33% respectively The "root" of the tree is the entire data set The subsets form the "branches" of the tree Subsets that meet a stepping criterion and thus are not partitioned are "leaws" Any subset in the tree, including the root or leaves, is a "node" Figure 3 presents the results of tree analysis; the upper left summary table stores descriptive statistics about the number and proportion of observations correctly and incorrectly classified in the training and validation data sets; the upper right tree ring summarizes tree complexity, split balance and the discriminatory power of the tree; the right lower graph shows that the 16-leaf tree is optimal in terms of accuracy on validation data set Figure 4 shows the tree diagram for the fraud model The decision tree contains 12 nodes Two subsets (or leaves) were found to contain more than 75% of the fraud; the larger leave has 2143 observations and the small one has 12 observations The classification rate was 92% (1961 of 2141) for the bad accounts and 92% for the good ones That is, 8% of good accounts were misclassified as bad ones (false positive rate) Neural Netwolk Model and Logistic Regre&$ion An artificial neural network can be defined as a computer application that attempts to mimic the neurophysiology of the human brain in the sense that it learns from examples to find patterns in data By finding complex non-linear relationship in data, neural networks can help to make predictions about real-world problems The data was partitioned 50: 50 to training and validation respectively Several hidden layers and number of iterations were experimented Figure 4 presents a comparison of lift chart between neural network and logistic regre&$ion The horizontal axis gives the decile groupings d the predicted probability (the highest are on the left) The Wlrtical axis represents the actual fraud rate in each decile f the top 30% d the cases were targeted then the fraud rate woukl be 97% compared to the baseline fraud rate around 45% (with a lift value c1 22) Lift value is computed as cumulative response rate divided by baseline response rate The fluctuation represent random variation Frgure 5 presents a Lorenz curve (also known as ROC ane) which depicls the cumulative percent of aclual frauds that are captured in each cumulative decile t shows 90% ci all frauds would be captured, if the top 40% of the cases were targeted Overall, neural networks model outperforms logistic regression in terms ar % response rille and % captured response rate The overall misclassification rate is lowest in neural net model {75%), followed by decision tree model (83%) and logistic regression (17%) The two-way classification table ar the best neural net and decision tree models is presented in Table 1 CONCLUSON This paper ilustrates a successful example of use of predictive modeling identifying frauds in miftions of transactions Both the neural nelwo!k model and decision tree model achieved over 90% hit rate with less than 10% false positive rete Efforts in lowering false positive rate are und~ There are several keys to the success: a key strategy d looking for the deviation d fraud transaction behavior from normal one, generation of transaction-based parameters for the model, and utilization ar Enterprise Miner in data analysis and modeling n devalcping a major risk management mode~ the ease ar implementation and integration into the existing infrastructure are also the criteria to selection of an appropriate model Shiao-ping Lu 2669 Greer Road Palo Alto, CA splu@aimnetcom 184

3 Figure 1 Figure2 Percentage : : : me 185

4 186 Figure3

5 ' 0 (~5y4s Total \the ~e~-n, CA5Hirl otal ; Total i 11 io Total ,6'< , 00 < 141'9951 ' ' ~::i~~~~-"-' 1!14~ ~7\ 0 CUi\ 118 :lilt ?1& 1111 Total J the me~ n:, roramtj ~ j ~14\ 303\ % 697%' r::-::: , >= 2sL~s2l 557% 550\ < Figure 4 l ' tn_ _;_- ~ - ~ 81 '* ''rii o<o D :';";JYJ~\10'0 ilii O'< i!:f::";;,:;,'! ''~- ~ - g!rotal~ ; ;,2$',!~ t 0 "' the mean, MLEsJ J < >= ~ 230% 2871\o 0 770'0 713%

6 FigureS Figure6 '8} -ro :eo 50 #J - "_-,"-- '00 :~() ;t{} to /'' J /'' l/ '" 1/ 10 ~ ',,,~,;;;~~- ---~, '>: '< 7 ~ ' _:,, i'"' v ' : l/,: ' / V' v v v v / : ' : : :' ' ; '():~~:: '', 188

7 Table 1 Two-way classification table of Neural Network Model and Decision Tree Model Neural Network Model Frequency Actual Percent Row Percent 0 Fraud Good Fraud Good Decision Tree Model Frequency Actual Percent Row Percent 1 0 Fraud Good Fraud Good

Chapter 5 Evaluating Classification & Predictive Performance

Chapter 5 Evaluating Classification & Predictive Performance Chapter 5 Evaluating Classification & Predictive Performance Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Why Evaluate? Multiple methods are available

More information

Credit Card Marketing Classification Trees

Credit Card Marketing Classification Trees Credit Card Marketing Classification Trees From Building Better Models With JMP Pro, Chapter 6, SAS Press (2015). Grayson, Gardner and Stephens. Used with permission. For additional information, see community.jmp.com/docs/doc-7562.

More information

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner SAS Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Melodie Rush Principal

More information

WHITE PAPER. Building Credit Scorecards Using Credit Scoring for SAS. Enterprise Miner. A SAS Best Practices Paper

WHITE PAPER. Building Credit Scorecards Using Credit Scoring for SAS. Enterprise Miner. A SAS Best Practices Paper WHITE PAPER Building Credit Scorecards Using Credit Scoring for SAS Enterprise Miner A SAS Best Practices Paper Table of Contents Introduction...1 Building credit models in-house...2 Building credit models

More information

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN Predictive Modeling using SAS Enterprise Miner and SAS/STAT : Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN 1 Overview This presentation will: Provide a brief introduction of how to set

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

Churn Prevention in Telecom Services Industry- A systematic approach to prevent B2B churn using SAS

Churn Prevention in Telecom Services Industry- A systematic approach to prevent B2B churn using SAS Paper 1414-2017 Churn Prevention in Telecom Services Industry- A systematic approach to prevent B2B churn using SAS ABSTRACT Krutharth Peravalli, Dr. Dmitriy Khots West Corporation It takes months to find

More information

Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data

Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data Paper 942-2017 Predictive Accuracy: A Misleading Performance Measure for Highly Imbalanced Data Josephine S Akosa, Oklahoma State University ABSTRACT The most commonly reported model evaluation metric

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

Evaluation next steps Lift and Costs

Evaluation next steps Lift and Costs Evaluation next steps Lift and Costs Outline Lift and Gains charts *ROC Cost-sensitive learning Evaluation for numeric predictions 2 Application Example: Direct Marketing Paradigm Find most likely prospects

More information

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved.

AcaStat How To Guide. AcaStat. Software. Copyright 2016, AcaStat Software. All rights Reserved. AcaStat How To Guide AcaStat Software Copyright 2016, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Frequencies... 3 List Variables... 4 Descriptives... 5 Explore Means...

More information

Classification and Regression Trees (CART)

Classification and Regression Trees (CART) Classification and Regression Trees (CART) Revised: 10/10/2017 Summary... 1 Data Input... 4 Analysis Options... 6 Tables and Graphs... 7 Analysis Summary... 8 R Tree Diagram... 10 Tree Diagram... 11 R

More information

Quality Control Charts

Quality Control Charts Quality Control Charts General Purpose In all production processes, we need to monitor the extent to which our products meet specifications. In the most general terms, there are two "enemies" of product

More information

Weka Evaluation: Assessing the performance

Weka Evaluation: Assessing the performance Weka Evaluation: Assessing the performance Lab3 (in- class): 21 NOV 2016, 13:00-15:00, CHOMSKY ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning objectives

More information

SPSS Neural Networks 17.0

SPSS Neural Networks 17.0 i SPSS Neural Networks 17.0 For more information about SPSS Inc. software products, please visit our Web site at http://www.spss.com or contact SPSS Inc. 233 South Wacker Drive, 11th Floor Chicago, IL

More information

Week 1 Unit 4: Defining Project Success Criteria

Week 1 Unit 4: Defining Project Success Criteria Week 1 Unit 4: Defining Project Success Criteria Business and data science project success criteria: reminder Business success criteria Describe the criteria for a successful or useful outcome to the project

More information

USING NEURAL NETWORKS IN ESTIMATING DEFAULT PROBABILITY A CASE STUDY ON RETAIL LENDING

USING NEURAL NETWORKS IN ESTIMATING DEFAULT PROBABILITY A CASE STUDY ON RETAIL LENDING Laura Maria BADEA, PhD Candidate Department of Economic Cybernetics E-mail: laura.maria.badea@gmail.com USING NEURAL NETWORKS IN ESTIMATING DEFAULT PROBABILITY A CASE STUDY ON RETAIL LENDING Abstract.

More information

Describing DSTs Analytics techniques

Describing DSTs Analytics techniques Describing DSTs Analytics techniques This document presents more detailed notes on the DST process and Analytics techniques 23/03/2015 1 SEAMS Copyright The contents of this document are subject to copyright

More information

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING

PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING PREDICTING EMPLOYEE ATTRITION THROUGH DATA MINING Abbas Heiat, College of Business, Montana State University, Billings, MT 59102, aheiat@msubillings.edu ABSTRACT The purpose of this study is to investigate

More information

Determining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for

Determining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for Determining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for human health in the past two centuries. Adding chlorine

More information

Segmentation and Targeting

Segmentation and Targeting Segmentation and Targeting Outline The segmentation-targeting-positioning (STP) framework Segmentation The concept of market segmentation Managing the segmentation process Deriving market segments and

More information

Approaching an Analytical Project. Tuba Islam, Analytics CoE, SAS UK

Approaching an Analytical Project. Tuba Islam, Analytics CoE, SAS UK Approaching an Analytical Project Tuba Islam, Analytics CoE, SAS UK Approaching an Analytical Project Starting with questions.. What is the problem you would like to solve? Why do you need analytics? Which

More information

Session 15 Business Intelligence: Data Mining and Data Warehousing

Session 15 Business Intelligence: Data Mining and Data Warehousing 15.561 Information Technology Essentials Session 15 Business Intelligence: Data Mining and Data Warehousing Copyright 2005 Chris Dellarocas and Thomas Malone Adapted from Chris Dellarocas, U. Md. Outline

More information

Segmentation and Targeting

Segmentation and Targeting Segmentation and Targeting Outline The segmentation-targeting-positioning (STP) framework Segmentation The concept of market segmentation Managing the segmentation process Deriving market segments and

More information

CREDIT RISK MODELLING Using SAS

CREDIT RISK MODELLING Using SAS Basic Modelling Concepts Advance Credit Risk Model Development Scorecard Model Development Credit Risk Regulatory Guidelines 70 HOURS Practical Learning Live Online Classroom Weekends DexLab Certified

More information

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization

Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization Business Intelligence, 4e (Sharda/Delen/Turban) Chapter 2 Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization 1) One of SiriusXM's challenges was tracking potential customers

More information

Using Decision Tree to predict repeat customers

Using Decision Tree to predict repeat customers Using Decision Tree to predict repeat customers Jia En Nicholette Li Jing Rong Lim Abstract We focus on using feature engineering and decision trees to perform classification and feature selection on the

More information

Seven Basic Quality Tools

Seven Basic Quality Tools Pareto Analysis 1 Seven Basic Quality Tools 1. Process Mapping / Flow Charts* 2. Check Sheets 3. Pareto Analysis 4. Cause & Effect Diagrams 5. Histograms 6. Scatter Diagrams (XY Graph) 7. Control Charts

More information

Text Analysis of American Airlines Customer Reviews

Text Analysis of American Airlines Customer Reviews SESUG 2016 Paper EPO-281 Text Analysis of American Airlines Customer Reviews Rajesh Tolety, Oklahoma State University Saurabh Kumar Choudhary, Oklahoma State University ABSTRACT Which airline should I

More information

SPM 8.2. Salford Predictive Modeler

SPM 8.2. Salford Predictive Modeler SPM 8.2 Salford Predictive Modeler SPM 8.2 The SPM Salford Predictive Modeler software suite is a highly accurate and ultra-fast platform for developing predictive, descriptive, and analytical models from

More information

Machine Learning 101

Machine Learning 101 Machine Learning 101 Mike Alperin September, 2016 Copyright 2000-2016 TIBCO Software Inc. Agenda What is Machine Learning? Decision Tree Models Customer Analytics Examples Manufacturing Examples Fraud

More information

Available online at ScienceDirect. Procedia Computer Science 54 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 54 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 54 (2015 ) 396 404 Eleventh International Multi-Conference on Information Processing-2015 (IMCIP-2015) Predicting Financial

More information

ASSIGNMENT SUBMISSION FORM

ASSIGNMENT SUBMISSION FORM ASSIGNMENT SUBMISSION FORM Treat this as the first page of your assignment Course Name: Assignment Title: Business Analytics using Data Mining Crowdanalytix - Predicting Churn/Non-Churn Status of a Consumer

More information

Who Are My Best Customers?

Who Are My Best Customers? Technical report Who Are My Best Customers? Using SPSS to get greater value from your customer database Table of contents Introduction..............................................................2 Exploring

More information

Advances in Machine Learning for Credit Card Fraud Detection

Advances in Machine Learning for Credit Card Fraud Detection Advances in Machine Learning for Credit Card Fraud Detection May 14, 2014 Alejandro Correa Bahnsen Introduction Europe fraud evolution Internet transactions (millions of euros) 800 700 600 500 2007 2008

More information

Decision Tree Learning. Richard McAllister. Outline. Overview. Tree Construction. Case Study: Determinants of House Price. February 4, / 31

Decision Tree Learning. Richard McAllister. Outline. Overview. Tree Construction. Case Study: Determinants of House Price. February 4, / 31 1 / 31 Decision Decision February 4, 2008 2 / 31 Decision 1 2 3 3 / 31 Decision Decision Widely Used Used for approximating discrete-valued functions Robust to noisy data Capable of learning disjunctive

More information

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat Startup Machine Learning: Bootstrapping a fraud detection system Michael Manapat Stripe @mlmanapat About me: Engineering Manager of the Machine Learning Products Team at Stripe About Stripe: Payments infrastructure

More information

Improved Risk Management via Data Quality Improvement

Improved Risk Management via Data Quality Improvement Improved Risk Management via Data Quality Improvement Prepared by: David Loshin Knowledge Integrity, Inc. January, 2011 Sponsored by: 2011 Knowledge Integrity, Inc. 1 Introduction All too frequently, we

More information

Strategic inventory management through analytics

Strategic inventory management through analytics Strategic inventory management through analytics BY SEEMA PHULL, ED LAWTON AND REGINALDO A. MONTAGUE EXECUTIVE SUMMARY Many organizations hold plenty of inventory, which in theory means they should be

More information

Lithium-Ion Battery Analysis for Reliability and Accelerated Testing Using Logistic Regression

Lithium-Ion Battery Analysis for Reliability and Accelerated Testing Using Logistic Regression for Reliability and Accelerated Testing Using Logistic Regression Travis A. Moebes, PhD Dyn-Corp International, LLC Houston, Texas tmoebes@nasa.gov Biography Dr. Travis Moebes has a B.S. in Mathematics

More information

Online Student Guide Types of Control Charts

Online Student Guide Types of Control Charts Online Student Guide Types of Control Charts OpusWorks 2016, All Rights Reserved 1 Table of Contents LEARNING OBJECTIVES... 4 INTRODUCTION... 4 DETECTION VS. PREVENTION... 5 CONTROL CHART UTILIZATION...

More information

STATISTICAL TECHNIQUES. Data Analysis and Modelling

STATISTICAL TECHNIQUES. Data Analysis and Modelling STATISTICAL TECHNIQUES Data Analysis and Modelling DATA ANALYSIS & MODELLING Data collection and presentation Many of us probably some of the methods involved in collecting raw data. Once the data has

More information

SAS Machine Learning and other Analytics: Trends and Roadmap. Sascha Schubert Sberbank 8 Sep 2017

SAS Machine Learning and other Analytics: Trends and Roadmap. Sascha Schubert Sberbank 8 Sep 2017 SAS Machine Learning and other Analytics: Trends and Roadmap Sascha Schubert Sberbank 8 Sep 2017 How Big Analytics will Change Organizations Optimization and Innovation Optimizing existing processes Customer

More information

Quantitative Methods. Presenting Data in Tables and Charts. Basic Business Statistics, 10e 2006 Prentice-Hall, Inc. Chap 2-1

Quantitative Methods. Presenting Data in Tables and Charts. Basic Business Statistics, 10e 2006 Prentice-Hall, Inc. Chap 2-1 Quantitative Methods Presenting Data in Tables and Charts Basic Business Statistics, 10e 2006 Prentice-Hall, Inc. Chap 2-1 Learning Objectives In this chapter you learn: To develop tables and charts for

More information

Report for PAKDD 2007 Data Mining Competition

Report for PAKDD 2007 Data Mining Competition Report for PAKDD 2007 Data Mining Competition Li Guoliang School of Computing, National University of Singapore April, 2007 Abstract The task in PAKDD 2007 data mining competition is a cross-selling business

More information

Seven Basic Quality Tools. SE 450 Software Processes & Product Metrics 1

Seven Basic Quality Tools. SE 450 Software Processes & Product Metrics 1 Seven Basic Quality Tools SE 450 Software Processes & Product Metrics 1 The Seven Basic Tools Checklists (Checksheets) Pareto Diagrams Histograms Run Charts Scatter Diagrams (Scatter Plots) Control Charts

More information

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ANALYTICAL MODEL DEVELOPMENT AGENDA Enterprise Miner: Analytical Model Development The session looks at: - Supervised and Unsupervised Modelling - Classification

More information

Tutorial Segmentation and Classification

Tutorial Segmentation and Classification MARKETING ENGINEERING FOR EXCEL TUTORIAL VERSION v171025 Tutorial Segmentation and Classification Marketing Engineering for Excel is a Microsoft Excel add-in. The software runs from within Microsoft Excel

More information

Data Analysis and Sampling

Data Analysis and Sampling Data Analysis and Sampling About This Course Course Description In order to perform successful internal audits, you must know how to reduce a large data set down to critical subsets based on risk or importance,

More information

Integrating Market and Credit Risk Measures using SAS Risk Dimensions software

Integrating Market and Credit Risk Measures using SAS Risk Dimensions software Integrating Market and Credit Risk Measures using SAS Risk Dimensions software Sam Harris, SAS Institute Inc., Cary, NC Abstract Measures of market risk project the possible loss in value of a portfolio

More information

Advanced Analytics through the credit cycle

Advanced Analytics through the credit cycle Advanced Analytics through the credit cycle Alejandro Correa B. Andrés Gonzalez M. Introduction PRE ORIGINATION Credit Cycle POST ORIGINATION ORIGINATION Pre-Origination Propensity Models What is it?

More information

Using Data Mining Techniques on APC Data to Develop Effective Bus Scheduling Plans

Using Data Mining Techniques on APC Data to Develop Effective Bus Scheduling Plans Using Data Mining Techniques on APC Data to Develop Effective Bus Scheduling Plans Jayakrishna PATNAIK Department of Industrial and Manufacturing Engineering, NJIT Steven CHIEN Department of Civil and

More information

FMS New York/ New Jersey Chapter Meeting January 14, The Impact of Models. by: Scott Baranowski

FMS New York/ New Jersey Chapter Meeting January 14, The Impact of Models. by: Scott Baranowski FMS New York/ New Jersey Chapter Meeting January 14, 2015 The Impact of Models by: Scott Baranowski MEMBER OF PKF NORTH AMERICA, AN ASSOCIATION OF LEGALLY INDEPENDENT FIRMS 2010 Wolf & Company, P.C. About

More information

Tutorial #3: Brand Pricing Experiment

Tutorial #3: Brand Pricing Experiment Tutorial #3: Brand Pricing Experiment A popular application of discrete choice modeling is to simulate how market share changes when the price of a brand changes and when the price of a competitive brand

More information

Chapter 8 Analytical Procedures

Chapter 8 Analytical Procedures Slide 8.1 Principles of Auditing: An Introduction to International Standards on Auditing Chapter 8 Analytical Procedures Rick Hayes, Hans Gortemaker and Philip Wallage Slide 8.2 Analytical procedures Analytical

More information

Predicting Yelp Ratings From Business and User Characteristics

Predicting Yelp Ratings From Business and User Characteristics Predicting Yelp Ratings From Business and User Characteristics Jeff Han Justin Kuang Derek Lim Stanford University jeffhan@stanford.edu kuangj@stanford.edu limderek@stanford.edu I. Abstract With online

More information

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy AGENDA 1. Introduction 2. Use Cases 3. Popular Algorithms 4. Typical Approach 5. Case Study 2016 SAPIENT GLOBAL MARKETS

More information

Identification of Rogue Tools and Process Stage Drift by using JMP Software Visualization and Analytical Techniques

Identification of Rogue Tools and Process Stage Drift by using JMP Software Visualization and Analytical Techniques September 16 th, 2010 Identification of Rogue Tools and Process Stage Drift by using JMP Software Visualization and Analytical Techniques Jim Nelson Engineering Data Specialist, Freescale Semiconductor,

More information

Churn Reduction in the Wireless Industry

Churn Reduction in the Wireless Industry Mozer, M. C., Wolniewicz, R., Grimes, D. B., Johnson, E., & Kaushansky, H. (). Churn reduction in the wireless industry. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in Neural Information

More information

Achieve Better Insight and Prediction with Data Mining

Achieve Better Insight and Prediction with Data Mining Clementine 12.0 Specifications Achieve Better Insight and Prediction with Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.

More information

Proven Strategies for Finding Profitable Seller Carryback Notes

Proven Strategies for Finding Profitable Seller Carryback Notes Advanced Seller Data Services Exclusively serving note investors and brokers since 2004 15685 SW 116 th Avenue Phone: 1-800-992-4536 Suite 136 Fax: 503-549-0589 Tigard, OR 97224 e-mail: sellerdata@comcast.net

More information

TRANSPORTATION ASSET MANAGEMENT GAP ANALYSIS TOOL

TRANSPORTATION ASSET MANAGEMENT GAP ANALYSIS TOOL Project No. 08-90 COPY NO. 1 TRANSPORTATION ASSET MANAGEMENT GAP ANALYSIS TOOL USER S GUIDE Prepared For: National Cooperative Highway Research Program Transportation Research Board of The National Academies

More information

Forecasting Software

Forecasting Software Appendix B Forecasting Software B1 APPENDIX B Forecasting Software Good forecasting software is essential for both forecasting practitioners and students. The history of forecasting is to a certain extent

More information

Forecasting Cash Withdrawals in the ATM Network Using a Combined Model based on the Holt-Winters Method and Markov Chains

Forecasting Cash Withdrawals in the ATM Network Using a Combined Model based on the Holt-Winters Method and Markov Chains Forecasting Cash Withdrawals in the ATM Network Using a Combined Model based on the Holt-Winters Method and Markov Chains 1 Mikhail Aseev, 1 Sergei Nemeshaev, and 1 Alexander Nesterov 1 National Research

More information

A PRIMER TO MACHINE LEARNING FOR FRAUD MANAGEMENT

A PRIMER TO MACHINE LEARNING FOR FRAUD MANAGEMENT A PRIMER TO MACHINE LEARNING FOR FRAUD MANAGEMENT TABLE OF CONTENTS Growing Need for Real-Time Fraud Identification... 3 Machine Learning Today... 4 Big Data Makes Algorithms More Accurate... 5 Machine

More information

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial

Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds Overview In order for accuracy and precision to be optimal, the assay must be properly evaluated and a few

More information

EXPERIENCES ON INCREMENTAL RESPONSE MODELLING

EXPERIENCES ON INCREMENTAL RESPONSE MODELLING EXPERIENCES ON INCREMENTAL RESPONSE MODELLING SAS User Forum Finland Helsinki May 24th 2017 Jaakko Riihimäki Senior Data Analyst, Customer Insights & Analysis Telia Finland Oyj OUTLINE Telia Company Incremental

More information

Today. Last time. Lecture 5: Discrimination (cont) Jane Fridlyand. Oct 13, 2005

Today. Last time. Lecture 5: Discrimination (cont) Jane Fridlyand. Oct 13, 2005 Biological question Experimental design Microarray experiment Failed Lecture : Discrimination (cont) Quality Measurement Image analysis Preprocessing Jane Fridlyand Pass Normalization Sample/Condition

More information

Credit Risk Models Cross-Validation Is There Any Added Value?

Credit Risk Models Cross-Validation Is There Any Added Value? Credit Risk Models Cross-Validation Is There Any Added Value? Croatian Quants Day Zagreb, June 6, 2014 Vili Krainz vili.krainz@rba.hr The views expressed during this presentation are solely those of the

More information

MODELING THE EXPERT. An Introduction to Logistic Regression The Analytics Edge

MODELING THE EXPERT. An Introduction to Logistic Regression The Analytics Edge MODELING THE EXPERT An Introduction to Logistic Regression 15.071 The Analytics Edge Ask the Experts! Critical decisions are often made by people with expert knowledge Healthcare Quality Assessment Good

More information

An Assessment of Deforestation Models for Reducing Emissions from Deforestation & forest Degradation (REDD)

An Assessment of Deforestation Models for Reducing Emissions from Deforestation & forest Degradation (REDD) An Assessment of Deforestation Models for Reducing Emissions from Deforestation & forest Degradation (REDD) A Case Study of Chiquitanía, Bolivia Oh Seok Kim Department of Geography, University of Southern

More information

Multistage Cross-Sell Model of Employers in the Financial Industry

Multistage Cross-Sell Model of Employers in the Financial Industry Paer 4-8 Multistage Cross-Sell Model of Emloyers in the Financial Industry Kwan Park and Steve Donohue The Princial Financial Grou ABSTRACT This aer details the stes to develo a multistage cross-sell model

More information

EST Accuracy of FEL 2 Estimates in Process Plants

EST Accuracy of FEL 2 Estimates in Process Plants EST.2215 Accuracy of FEL 2 Estimates in Process Plants Melissa C. Matthews Abstract Estimators use a variety of practices to determine the cost of capital projects at the end of the select stage when only

More information

HTS Report. d2-r. Test of Attention Revised. Technical Report. Another Sample ID Date 14/04/2016. Hogrefe Verlag, Göttingen

HTS Report. d2-r. Test of Attention Revised. Technical Report. Another Sample ID Date 14/04/2016. Hogrefe Verlag, Göttingen d2-r Test of Attention Revised Technical Report HTS Report ID 467-500 Date 14/04/2016 d2-r Overview 2 / 16 OVERVIEW Structure of this report Narrative Introduction Verbal interpretation of standardised

More information

How to build and deploy machine learning projects

How to build and deploy machine learning projects How to build and deploy machine learning projects Litan Ilany, Advanced Analytics litan.ilany@intel.com Agenda Introduction Machine Learning: Exploration vs Solution CRISP-DM Flow considerations Other

More information

EnterpriseOne JDE5 Forecasting PeopleBook

EnterpriseOne JDE5 Forecasting PeopleBook EnterpriseOne JDE5 Forecasting PeopleBook May 2002 EnterpriseOne JDE5 Forecasting PeopleBook SKU JDE5EFC0502 Copyright 2003 PeopleSoft, Inc. All rights reserved. All material contained in this documentation

More information

Data-driven metallurgical design for high strength low alloy (HSLA) steel

Data-driven metallurgical design for high strength low alloy (HSLA) steel Graduate Theses and Dissertations Graduate College 2008 Data-driven metallurgical design for high strength low alloy (HSLA) steel Wei Hu Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/etd

More information

Sampling Steps. Decide whether or not to sample. (Applies to all circumstances.) Identify the available data, records, and supporting documents.

Sampling Steps. Decide whether or not to sample. (Applies to all circumstances.) Identify the available data, records, and supporting documents. Sampling Steps I. PLAN THE SAMPLE A. Decide whether or not to sample. (Applies to all circumstances.) Define the audit objective. The audit objective usually comes directly from the audit program or is

More information

Office Virtual Training: Quarter-End Billing

Office Virtual Training: Quarter-End Billing Morningstar SM Office Virtual Training: Quarter-End Billing Overview - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1 Reviewing the Billing Process...............................................

More information

Technology Consulting Analytics solutions for manufacturing and industrial products

Technology Consulting Analytics solutions for manufacturing and industrial products www.pwc.in Technology Consulting Analytics solutions for manufacturing and industrial products Overview Technological and digital innovations are transforming the manufacturing and industrial products

More information

Why Learn Statistics?

Why Learn Statistics? Why Learn Statistics? So you are able to make better sense of the ubiquitous use of numbers: Business memos Business research Technical reports Technical journals Newspaper articles Magazine articles Basic

More information

Analysing which factors are of influence in predicting the employee turnover

Analysing which factors are of influence in predicting the employee turnover Analysing which factors are of influence in predicting the employee turnover Research Paper Business Analytics HMN Yousaf Supervised by Dr. Sandjai Bhulai Vrije Universiteit Amsterdam Faculty of Sciences

More information

Clearing the Clutter: An Overview of the Marketing Analytics Ecosystem

Clearing the Clutter: An Overview of the Marketing Analytics Ecosystem Clearing the Clutter: An Overview of the Marketing Analytics Ecosystem Dr. Michael Koved Lecturer, School of Information November 14th, 2017 mkoved@ischool.berkeley.edu Introduction Speaker Introduction

More information

Getting Started with HLM 5. For Windows

Getting Started with HLM 5. For Windows For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 About this Document... 3 1.2 Introduction to HLM... 3 1.3 Accessing HLM... 3 1.4 Getting Help with HLM... 3 Section 2: Accessing

More information

Chapter 1 INTRODUCTION TO SIMULATION

Chapter 1 INTRODUCTION TO SIMULATION Chapter 1 INTRODUCTION TO SIMULATION Many problems addressed by current analysts have such a broad scope or are so complicated that they resist a purely analytical model and solution. One technique for

More information

LOGISTICAL ASPECTS OF THE SOFTWARE TESTING PROCESS

LOGISTICAL ASPECTS OF THE SOFTWARE TESTING PROCESS LOGISTICAL ASPECTS OF THE SOFTWARE TESTING PROCESS Kazimierz Worwa* * Faculty of Cybernetics, Military University of Technology, Warsaw, 00-908, Poland, Email: kazimierz.worwa@wat.edu.pl Abstract The purpose

More information

Understanding and accounting for product

Understanding and accounting for product Understanding and Modeling Product and Process Variation Variation understanding and modeling is a core component of modern drug development. Understanding and accounting for product and process variation

More information

TOPIC #12: GENERAL PUBLIC EXPOSURES SYNOPSIS

TOPIC #12: GENERAL PUBLIC EXPOSURES SYNOPSIS TOPIC #12: GENERAL PUBLIC EXPOSURES SYNOPSIS Prepared by T. Dan Bracken T. Dan Bracken, Inc. Portland, OR Purpose To characterize the EMF exposure of the general public. Introduction Exposure of the general

More information

Ranking Potential Customers based on GroupEnsemble method

Ranking Potential Customers based on GroupEnsemble method Ranking Potential Customers based on GroupEnsemble method The ExceedTech Team South China University Of Technology 1. Background understanding Both of the products have been on the market for many years,

More information

EVALUATING THE ACCURACY OF 2005 MULTITEMPORAL TM AND AWiFS IMAGERY FOR CROPLAND CLASSIFICATION OF NEBRASKA INTRODUCTION

EVALUATING THE ACCURACY OF 2005 MULTITEMPORAL TM AND AWiFS IMAGERY FOR CROPLAND CLASSIFICATION OF NEBRASKA INTRODUCTION EVALUATING THE ACCURACY OF 2005 MULTITEMPORAL TM AND AWiFS IMAGERY FOR CROPLAND CLASSIFICATION OF NEBRASKA Robert Seffrin, Statistician US Department of Agriculture National Agricultural Statistics Service

More information

STP351: Purchase Order Collaboration in SNC

STP351: Purchase Order Collaboration in SNC SAP Training Source To Pay STP351: Purchase Order Collaboration in SNC External User Training Version: 4.0 Last Updated: 03-Apr-2017 3M Business Transformation & Information Technology Progress set in

More information

Journal of Business & Economics Research January 2006 Volume 4, Number 1

Journal of Business & Economics Research January 2006 Volume 4, Number 1 Family Owned Business Fraud: The Silent Thief M. Tony Bledsoe, (E-mail: bledsoem@meredith.edu), Meredith College Susan B. Wessels, (E-mail: wesselss@meredith.edu), Meredith College Abstract The ACFE 2002

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................3

More information

Performance of Energy Management Systems

Performance of Energy Management Systems Performance of Energy Management Systems Greg Wheeler, Oregon State University Performance of building Energy Management Systems (EMS) has been controversial in several studies due to lack of measured

More information

Harbingers of Failure: Online Appendix

Harbingers of Failure: Online Appendix Harbingers of Failure: Online Appendix Eric Anderson Northwestern University Kellogg School of Management Song Lin MIT Sloan School of Management Duncan Simester MIT Sloan School of Management Catherine

More information

Pareto Charts [04-25] Finding and Displaying Critical Categories

Pareto Charts [04-25] Finding and Displaying Critical Categories Introduction Pareto Charts [04-25] Finding and Displaying Critical Categories Introduction Pareto Charts are a very simple way to graphically show a priority breakdown among categories along some dimension/measure

More information

USER S GUIDE. ESTCP Project ER Methods for Minimization and Management of Variability in Long-Term Groundwater Monitoring Results

USER S GUIDE. ESTCP Project ER Methods for Minimization and Management of Variability in Long-Term Groundwater Monitoring Results USER S GUIDE Methods for Minimization and Management of Variability in Long-Term Groundwater Monitoring Results ESTCP Project ER-201209 Dr. Thomas McHugh GSI Environmental, Inc. SEPTEMBER 2015 Distribution

More information

FI334 Umoja Month-End Closing Process. Umoja Period and Year End Closing Process Version 17

FI334 Umoja Month-End Closing Process. Umoja Period and Year End Closing Process Version 17 FI334 Umoja Month-End Closing Process Umoja Period and Year End Closing Process Version 17 Last Copyright Modified: United 16-August-13 Nations 1 Agenda Course Introduction Module 1: Pre-closing Check/Readiness

More information

Predicting Yelp Restaurant Reviews

Predicting Yelp Restaurant Reviews Predicting Yelp Restaurant Reviews Wael Farhan UCSD: A53070918 9500 Gilman Drive La Jolla, CA, 92093 wfarhan@eng.ucsd.edu ABSTRACT Starting a restaurant is a tricky business. Restaurant owners have to

More information

Forecasting erratic demand of medicines in a public hospital: A comparison of artificial neural networks and ARIMA models

Forecasting erratic demand of medicines in a public hospital: A comparison of artificial neural networks and ARIMA models Int'l Conf. Artificial Intelligence ICAI' 1 Forecasting erratic demand of medicines in a public hospital: A comparison of artificial neural networks and ARIMA models A. Molina Instituto Tecnológico de

More information