Graphical Methods for Classifier Performance Evaluation

Maria Carolina Monard and Gustavo E. A. P. A. Batista

University of São Paulo (USP), Institute of Mathematics and Computer Science (ICMC), Department of Computer Science and Statistics (SCE), Laboratory of Computational Intelligence (LABIC), P. O. Box 668, São Carlos, SP, Brazil
{gbatista,

Abstract. Evaluating the performance of classifiers is not as trivial as it would seem at first glance. Even the most widely used methods, such as measuring accuracy or error rate on a test set, have severe limitations. Two of the most prominent limitations of these measures are that they do not consider misclassification costs and that they can be misleading when the classes have very different prior probabilities. In recent years, several researchers have pointed out alternative methods to evaluate the performance of learning systems. Some of those methods are based on graphical evaluation of classifiers. Usually, a graphical evaluation lets the user analyze the performance of a classifier under different scenarios, for instance, with different misclassification costs, and select the classifier parameter setting that provides the best result. The objective of this paper is to survey some of the most used graphical methods for performance evaluation, which do not rely on precise class and cost distribution information.

1 Introduction

In supervised learning, a set of n training examples is given to an inducer. Each example \(E_i\) is a tuple \((x_i, y_i)\), where \(x_i\) is a vector of m feature values and \(y_i\) is the class value. The main objective in supervised learning is to induce a general mapping from the vectors x to the class values y. Thus, the inducer should build a model, \(y = f(x)\), of an unknown function f, also known as the concept function, which predicts y values for previously unseen examples. However, in most cases, the number of examples used to induce a model is not sufficient to completely characterize the function f.
In fact, inducers are usually only able to induce a function h that approximates f, i.e., \(h(x) \approx f(x)\), where h is known as a hypothesis about the concept function f.

For classification problems, the y values are drawn from a discrete set of classes \(C = \{C_1, C_2, \ldots, C_{N_{cl}}\}\), where \(N_{cl}\) is the number of classes. Given a set of training examples, the learning algorithm outputs a classifier such that, given a new unlabelled example, it accurately predicts the label y. Assuming the vectors x correspond to points in an m-dimensional space, \(\mathbb{R}^m\), the objective is to find a function h that approximates the function \(f : \mathbb{R}^m \to C\). Thus, h is a classifier that outputs a class value \(C_k \in C\) for each new example x.

In this work, we restrict our discussion to concept-learning domains (see footnote 1), so y can assume one of two mutually exclusive values. We use the general labels positive and negative to discriminate between the two class values.

Evaluating the performance of classifiers is not as trivial as it would seem at first glance. Even the most widely used methods, such as measuring accuracy or error rate on a test set, have severe limitations. Two of the most prominent limitations of these measures are that they do not consider misclassification costs and that they can be misleading when the classes have very different prior probabilities. In recent years, several researchers have pointed out alternative methods to evaluate the performance of learning systems. Some of those methods are based on a graphical evaluation of classifiers. Usually, a graphical evaluation lets the user analyze the performance of a classifier under different scenarios, for instance, with different misclassification costs, and select the classifier's parameter setting that provides the best result. The objective of this paper is to survey some of the most used graphical methods for performance evaluation, which do not rely on precise class and cost distribution information.

This work is organized as follows: Section 2 describes the conditions under which accuracy and error rate are appropriate measures of classifier performance; Section 3 discusses cost-sensitive learning and Section 4 introduces probabilistic classifiers; Section 5 presents three graphical methods that can be used when accuracy is not an appropriate measure of classifier performance; and Section 6 concludes this work.
2 Accuracy and Error Rate

Accuracy and error rate are appropriate measures when the misclassification costs and the prior probabilities of the classes are the same. However, these assumptions rarely hold in practice. In most application domains, each type of error that a classifier can make has a different cost. For instance, in fraud detection for financial applications, the cost of generating a false alarm is usually lower than the cost of not detecting a fraud.

Also, several researchers have reported that a large intrinsic difference in the prior probability of each class is common in a number of domains. In other words, there is a large difference in the number of examples belonging to each class. In the same fraud detection example, the number of fraudulent transactions is usually much smaller than the number of regular transactions.

When there is a significant difference in the prior probabilities of the classes, error rate and accuracy can be very misleading metrics. For instance, it is straightforward to create a classifier that is 99% accurate if the data set has a majority class with 99% of all examples, by simply classifying every new case as belonging to the majority class.

The different types of errors and hits made by a classifier can be summarized in a confusion matrix. Table 1 illustrates a confusion matrix for a two-class problem.

Footnote 1. However, the methods discussed here can be adapted to multi-class problems by considering the class under study as the positive class, and the remaining classes as the negative class.
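The majority-class example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test set (10 positive and 990 negative examples) is hypothetical, and the helper `confusion_counts` is a name introduced here, not something from the original work. The letters a, b, c and d follow the cell names used in Table 1.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="positive"):
    """Return (a, b, c, d) = (TP, FN, FP, TN) for a two-class problem."""
    a = b = c = d = 0
    for y, p in zip(actual, predicted):
        if y == positive and p == positive:
            a += 1  # true positive
        elif y == positive:
            b += 1  # false negative
        elif p == positive:
            c += 1  # false positive
        else:
            d += 1  # true negative
    return a, b, c, d


# Hypothetical test set: 1% positive (e.g. fraudulent), 99% negative.
actual = ["positive"] * 10 + ["negative"] * 990

# Trivial classifier: always predict the majority class.
majority = Counter(actual).most_common(1)[0][0]
predicted = [majority] * len(actual)

a, b, c, d = confusion_counts(actual, predicted)
accuracy = (a + d) / (a + b + c + d)
tp_rate = a / (a + b)  # fraction of positive cases correctly classified

print(f"accuracy = {accuracy:.2f}, true positive rate = {tp_rate:.2f}")
```

Under these assumptions the trivial classifier reaches 99% accuracy while identifying none of the positive cases, which is the kind of misleading result the text warns about.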

                  Positive Prediction    Negative Prediction
Positive Class    True Positive (a)      False Negative (b)
Negative Class    False Positive (c)     True Negative (d)

Table 1: Different types of errors and hits for a two-class problem.

For a multi-class problem with \(N_{cl}\) classes, the confusion matrix will have \(N_{cl}^2\) entries. The correct classifications lie on the diagonal, and the off-diagonal entries contain the various cross-classification errors.

Several metrics to measure the performance of learning systems can be extracted from a two-class confusion matrix, such as the error rate

\[ Err = \frac{c + b}{a + b + c + d} \]

and the accuracy

\[ Acc = \frac{a + d}{a + b + c + d} = 1 - Err. \]

Also, we can derive other metrics that separate the errors, or hits, that occur in each class. These metrics measure the classification performance on the positive and negative classes independently. Some of these metrics are:

False negative rate: \(FN = \frac{b}{a + b}\) is the percentage of positive cases misclassified as belonging to the negative class;
False positive rate: \(FP = \frac{c}{c + d}\) is the percentage of negative cases misclassified as belonging to the positive class;
True negative rate: \(TN = \frac{d}{c + d} = 1 - FP\) is the percentage of negative cases correctly classified as belonging to the negative class;
True positive rate: \(TP = \frac{a}{a + b} = 1 - FN\) is the percentage of positive cases correctly classified as belonging to the positive class.

3 Cost Sensitive Learning

A cost-sensitive learning system can be used in applications where the misclassification costs are known. A misclassification cost is simply a value that is assigned as a penalty for making a mistake. In this case, misclassification cost can be used in place of error rate, and a cost-sensitive learning system attempts to reduce the cost of misclassified examples instead of the classification error. Usually, a cost matrix is used to define the costs associated with a domain. A cost matrix is similar to a confusion matrix.
Each entry of a cost matrix defines a constant cost for each type of error that can be committed by a classifier. Given a confusion matrix and a cost matrix, the total misclassification cost, TC, can be computed using Equation 1,

\[ TC = \sum_{i=1}^{N_{cl}} \sum_{j=1}^{N_{cl}} Conf_{ij} \, Cost_{ij} \qquad (1) \]

where \(Conf_{ij}\) is the number of examples in entry (i, j) of the confusion matrix and \(Cost_{ij}\) is the cost associated with that entry. If the values on the diagonal are represented with negative costs, then these values can be interpreted as gains or benefits.

So far, fixed numerical values have been used to measure costs. In a utility model of performance analysis, measures of cost can be modified by a function called a utility function. The nature of this function is part of the specification of the problem under study. Utility theory is widely used in economic analysis. For instance, a utility function based on wealth might be used to modify the cost values of an uncertain investment decision, because the risk in investing $10,000 is much greater for a small investor than for a large one [10].

Some learning systems are not able to integrate cost information into the learning process. However, there is a simple and general method to make any learning system cost-sensitive for a concept-learning problem if the costs are known and constant [2]. The idea is to change the class distribution in the training set towards the most costly class. Suppose that errors in the positive class are five times more costly than errors in the negative class. If the number of positive examples is artificially increased by a factor of five, then the learning system, aiming to reduce the number of classification errors, will come up with a classifier that is skewed towards avoiding errors in the positive class, since any such errors are penalized five times more. A theorem in [4] shows how to change the proportion of positive and negative examples in order to make optimal cost-sensitive classifications for a concept-learning problem. Moreover, a general method to make a learning system cost-sensitive is presented in [3]. This last method has the advantage of being applicable to multi-class problems.

4 Probabilistic Classifiers

Most learning algorithms can be adapted to produce probabilistic classifiers, i.e., to induce a classifier that produces the probability of an example being in each class. In this scenario, given a new example x, the classifier does not output a class value, but a tuple \((P(C_1), P(C_2), \ldots, P(C_{N_{cl}}))\), where \(P(C_k)\) is the probability that x belongs to class \(C_k\). Naive Bayes is an intrinsically probabilistic classifier, but other learning systems can be adapted to produce such posterior probability estimates. For instance, in decision trees, the class distributions at the leaves can be used as an estimate.
Rule learning systems can make similar estimates with the class distributions in each rule, and neural networks produce continuous outputs that can be mapped to probability estimates. In fact, it might be more natural to take the probability of each class into account when judging correctness. For instance, an outcome predicted with a probability of 99% should perhaps weigh more heavily than one predicted with a probability of 51%.

Several works have shown that symbolic Machine Learning algorithms, especially decision trees, produce poor probability estimates. In [8], it is concluded that the limitation of decision tree algorithms for probability estimation lies not in the tree structure but in the tree-building algorithm. The Laplace correction is a very simple and effective method for improving the quality of a tree's probability estimates [1]. Even though decision trees do not produce good probability estimates, it is shown in [7] that decision trees produce surprisingly good probability rankings. Probability rankings are the basis for building graphical methods for performance analysis, as discussed in the next section.

5 Graphical Performance Analysis with Probabilistic Classifiers

In practice, costs are rarely known precisely, and the analyst might want to consider different scenarios with different classification costs. For instance, in direct mailing, the number of respondents is much smaller than the number of non-respondents; usually the respondents are less than 1%. Suppose that a mail campaign with a promotional offer is going to be sent to 100,000 households, and it is expected that 1% of them will respond, i.e., 1,000 respondents. Using a predictive model with a certain parameter setting, an analyst may be able to select 40,000 households (40%) for which the response rate is estimated to be 2%, i.e., 800 respondents. With another parameter setting, the analyst may be able to select a smaller set of 10,000 households (10%) with an expected response rate of 3%, i.e., 300 respondents. Which setting should be selected? The answer depends on the cost of sending each offer and the profit obtained by selling each product.

Table 2 (see footnote 2) shows a hypothetical scenario in which the cost of sending each offer is estimated to be $0.70 and the profit obtained by selling each product is $50.00. In mass mailing, with a response rate of 1%, the cost of the mail campaign is greater than the profit obtained from selling the offered product. Among the scenarios analyzed in Table 2, mailing 40% of the customer base provided the best result. However, how could we find the best number of customers to be mailed, given the cost of mailing and the profit per product sold? This question can be answered with the aid of graphical methods for performance analysis, described next.

                                             Mass mailing   Direct mailing   Direct mailing
                                             (100%)         (40%)            (10%)
Number of customers mailed                   100,000        40,000           10,000
Cost of printing and mailing ($0.70 each)    70,000         28,000           7,000
Response rate                                1%             2%               3%
Number of products sold                      1,000          800              300
Profit from sale ($50.00 each)               50,000         40,000           15,000
Net profit                                   -20,000        12,000           8,000

Table 2: Profit analysis for a direct mail campaign.

5.1 Lift Graph

Assume the class of interest is the positive class. In the previous example, the positive class is the class of households that will purchase the product on offer.
Given a classifier that outputs probabilities, each example in the test set can be labelled with the probability that the example belongs to the positive class, i.e., P(positive). If the test set is sorted in descending order of the predicted probability, then it should be similar to the data represented in Table 3. If the learning system is able to identify some predictive patterns, then it is expected that there are more positive examples than negative ones among the top ranked examples. This ranking from most likely to least likely makes it possible to choose any number of examples from the test set. For instance, the top 10 ranked examples could be selected for the mail campaign, and 8 of them will respond. This is the basic idea behind the lift graph.

A lift graph is a widely used method in database marketing [6], and it is built over a test set. The lift graph shows the relationship between the set of the top X% ranked examples and the number of positive examples in this set. The number of positive examples can be expressed as a percentage of the total number of positive examples in the test set. Figure 1 shows a lift graph for the hypothetical direct mailing example.

Footnote 2. The cost of mining the data is not considered for simplicity.

Table 3 (columns: Rank, Predicted Probability, Actual Class): Hypothetical test set with examples ranked by the probability of belonging to the positive class.

Figure 1: A hypothetical lift graph.

The lift graph shows a diagonal line and a curve. The curve, also called the lift curve, represents the performance obtained by the classifier. The x-axis represents the percentage of examples of the test set that were selected according to the probabilistic ranking generated by the classifier. The y-axis represents the percentage of positive examples captured in the subset of selected examples. This percentage is calculated over the total number of positive examples in the test set. The diagonal line represents a random classifier, i.e., a classifier that selects a random subset of examples from the test set. For instance, if 50% of the test set examples were selected at random, it is expected that 50% of the positive examples would be in this set.

The graph in Figure 1 emphasizes two points on the lift curve. The first one represents the selection of the top 10% ranked examples of the test set, and the second one represents the selection of the top 40%. These selections result in mailing 30% and 80% of the positive examples, respectively. These are the same choices shown in Table 2.

Lift graphs are independent of costs and class distribution. This property allows the user to analyze different scenarios in which the selection of a larger subset of examples results in a larger number of contacted buyers. Through the selection of subsets of different sizes and a profit analysis similar to the one shown in Table 2, the user may decide which subset size will provide an appropriate result.
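The construction of a lift curve described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce Figure 1: the scores and labels are hypothetical, and `lift_curve` is a helper name introduced here. Each point pairs the fraction of the test set selected with the fraction of all positive examples captured in that selection.

```python
def lift_curve(scores, labels, positive="positive"):
    """Return a list of (fraction_selected, fraction_of_positives_captured).

    Examples are ranked by descending predicted probability of the
    positive class, and one point is emitted per cut-off position.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, y in ranked if y == positive)
    if total_pos == 0:
        raise ValueError("the test set contains no positive examples")

    points = [(0.0, 0.0)]
    captured = 0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == positive:
            captured += 1
        points.append((k / len(ranked), captured / total_pos))
    return points


# Hypothetical test set of 10 examples, 4 of them positive.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = ["positive", "positive", "negative", "positive", "negative",
          "negative", "positive", "negative", "negative", "negative"]

points = lift_curve(scores, labels)
for selected, captured in points:
    print(f"top {selected:4.0%} of test set -> {captured:4.0%} of positives")
```

A random selection would follow the diagonal, where the two fractions are equal; the further the curve rises above the diagonal, the more positives are captured by contacting the same share of the test set.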
With the addition of cost information to a lift graph, it is possible to obtain a more direct answer to the question posed at the beginning of Section 5, i.e., to find the number of customers to be mailed that gives the highest profit, given the cost of the mailing and the profit per product sold.

5.2 ROI Graph

A ROI (Return on Investment) graph is similar to a lift graph. However, the gain obtained by the classifier is expressed in terms of profit instead of the percentage of positive examples. In order to build a ROI graph, the same procedure used to build a lift graph is applied, i.e., selecting the top X% ranked examples in a test set and calculating the profit obtained from these examples. Figure 2 shows a ROI graph for the direct marketing example. As in lift graphs, ROI graphs usually present a diagonal line and a curve, which is called the ROI curve. The ROI curve represents the profit obtained by the classifier under analysis, and the diagonal line represents the profit obtained by a random classifier.

Figure 2: A hypothetical ROI graph.

The profit is usually calculated using the total cost given by Equation 1, associating negative costs with correct classifications. Consequently, a ROI curve depends on a specific cost matrix, and in order to analyze the behavior of a classifier under different cost scenarios, it is necessary to plot one curve for each cost matrix.

Frequently, a ROI curve presents a maximum point that provides the maximum return on investment. From this point, the percentage of the top ranked test set examples that yields the best net profit for a given cost scenario can be identified. The graph in Figure 2 shows the point with the maximum return on investment, as well as the returns obtained by selecting 10% and 40% of the test set. The latter two points were used in the analysis presented in Table 2.
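As a rough sketch of the profit reasoning behind a ROI curve, the code below first recomputes the net profit of the three scenarios in Table 2 ($0.70 per offer sent, $50.00 per product sold), and then builds a ROI curve over a small ranked test set. The second part is illustrative only: the ranked labels, the per-contact cost of 10 and the per-sale profit of 25 are hypothetical values chosen so that the curve has an interior maximum, and profit is simplified to sales revenue minus contact cost rather than a full cost matrix as in Equation 1. The function names are introduced here for illustration.

```python
def net_profit(n_mailed, n_sold, cost_per_mail=0.70, profit_per_sale=50.00):
    """Net profit of a campaign: revenue from sales minus mailing cost."""
    return round(n_sold * profit_per_sale - n_mailed * cost_per_mail, 2)


# The three scenarios of Table 2: (customers mailed, products sold).
scenarios = {
    "mass mailing (100%)": (100_000, 1_000),
    "direct mailing (40%)": (40_000, 800),
    "direct mailing (10%)": (10_000, 300),
}
table2 = {name: net_profit(n, sold) for name, (n, sold) in scenarios.items()}
for name, value in table2.items():
    print(f"{name}: net profit = {value:,.0f}")


def roi_curve(ranked_labels, cost_per_contact, profit_per_sale,
              positive="positive"):
    """Return (k, cumulative_profit) for contacting the top-k ranked examples."""
    points = [(0, 0.0)]
    profit = 0.0
    for k, y in enumerate(ranked_labels, start=1):
        gain = profit_per_sale if y == positive else 0.0
        profit += gain - cost_per_contact
        points.append((k, profit))
    return points


# Hypothetical test set, already sorted by descending P(positive).
ranked_labels = ["positive", "positive", "negative", "positive", "negative",
                 "negative", "positive", "negative", "negative", "negative"]

curve = roi_curve(ranked_labels, cost_per_contact=10.0, profit_per_sale=25.0)
best_k, best_profit = max(curve, key=lambda point: point[1])
print(f"best cut-off: top {best_k} examples, profit = {best_profit}")
```

Changing the cost or profit values changes the shape of the curve and the position of its maximum, which is why one curve has to be plotted per cost scenario.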

5.3 ROC Graph

TP, TN, FP and FN are four performance measures that have the advantage of being independent of class costs and prior probabilities. The main objective of a classifier is to minimize the false positive and false negative rates or, equivalently, to maximize the true negative and true positive rates. Unfortunately, for most real world applications, there is a tradeoff between FN and FP and, similarly, between TN and TP. ROC graphs [9] (see footnote 3) can be used to analyze the relationship between FN and FP (or TN and TP) for a classifier.

Let us continue considering the positive class as the class under study. On a ROC graph, TP is plotted on the y-axis and FP is plotted on the x-axis. One approach to plotting a ROC graph is to use a probabilistic classifier. A threshold parameter determines the final classification. For instance, the threshold can be set to 0.90, so that only the examples with a positive class probability higher than the threshold are labelled as positive, and the remaining examples are labelled as negative. We can construct more or less strict classifiers by varying the threshold. Plotting all the ROC points that can be produced by varying this threshold produces a ROC curve for the classifier. Typically this is a discrete set of points, including (0,0) and (1,1), which are connected by line segments. Figure 3 illustrates a ROC graph of three classifiers: A, B and C.

Several points on a ROC graph should be noted. The lower left point (0,0) represents a strategy that classifies every example as belonging to the negative class. The upper right point (1,1) represents a strategy that classifies every example as belonging to the positive class. The point (0,1) represents perfect classification, and the line x = y represents the strategy of randomly guessing the class.

Figure 3: A ROC graph for 3 classifiers.
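The threshold sweep described above can be sketched as follows. This is a minimal illustration on a hypothetical test set, with helper names introduced here. Two assumptions are worth flagging: an example is labelled positive when its score is greater than or equal to the threshold (the text says "higher than"; the inclusive form is used so that each distinct score yields one point), and the area under the resulting piecewise-linear curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels, positive="positive"):
    """Return ROC points (FP rate, TP rate), one per distinct threshold."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in the test set")

    points = [(0.0, 0.0)]  # threshold above every score: all negative
    for threshold in sorted(set(scores), reverse=True):
        # Assumption: score >= threshold is labelled positive.
        tp = sum(1 for s, y in zip(scores, labels)
                 if s >= threshold and y == positive)
        fp = sum(1 for s, y in zip(scores, labels)
                 if s >= threshold and y != positive)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical test set of 10 examples, 4 of them positive.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = ["positive", "positive", "negative", "positive", "negative",
          "negative", "positive", "negative", "negative", "negative"]

roc = roc_points(scores, labels)
auc = area_under_curve(roc)
print(f"{len(roc)} ROC points, area under the curve = {auc:.3f}")
```

The lowest threshold labels every example as positive, so the curve ends at (1,1), and a threshold above all scores gives (0,0), matching the two extreme strategies noted in the text.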
From a ROC graph it is possible to calculate an overall measure of quality, the area under the ROC curve (AUC). The AUC is the fraction of the total area that falls under the ROC curve. This measure is equivalent to several other statistical measures for evaluating classification and ranking models [5]. The AUC effectively factors in the performance of a classifier over all costs and distributions. However, it is important to note that for a specific cost matrix, the classifier with the maximum AUC may not be the best classifier.

Footnote 3. ROC is an acronym for Receiver Operating Characteristic, a term used in signal detection to characterize the tradeoff between hit rate and false alarm rate over a noisy channel.

6 Conclusion

The traditional way to build a classification system consists of experimenting with many different classifiers, comparing their performance in terms of accuracy and choosing the classifier that performs best. However, accuracy is often not an appropriate measure of classifier performance, especially in classification problems with heavily imbalanced classes and asymmetric misclassification costs. In practice, costs are rarely known precisely, so it is useful to consider various scenarios. In this work we have described three methods, the lift, ROI and ROC graphs, that can be applied whenever there is a learning scheme that outputs probabilities, as Naive Bayes does, for the predicted class of each member of the set of test instances. These sorts of tools can help free researchers from the need to have precise class and cost distribution information.

Acknowledgements. The authors would like to thank Ronaldo C. Prati for his helpful comments on the draft of this paper. This research is partially supported by the Brazilian Research Councils CAPES and FAPESP.

References

[1] E. Bauer and R. Kohavi. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting and Variants. Machine Learning, 36.
[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth & Brooks, Pacific Grove, CA.
[3] Pedro Domingos. MetaCost: A General Method for Making Classifiers Cost-Sensitive. In Knowledge Discovery and Data Mining.
[4] Charles Elkan. The Foundations of Cost-Sensitive Learning. In Seventeenth International Joint Conference on Artificial Intelligence.
[5] David J. Hand. Construction and Assessment of Classification Rules. John Wiley and Sons.
[6] Charles X. Ling and Chenghui Li. Data Mining for Direct Marketing: Problems and Solutions. In Fourth International Conference on Knowledge Discovery and Data Mining, pages 73-79.
[7] D. D. Margineantu and T. G. Dietterich. Improved Class Probability Estimates from Decision Tree Models. In Nonlinear Estimation and Classification, Lecture Notes in Statistics, 171.
[8] Foster J. Provost and Pedro Domingos. Tree Induction for Probability-based Ranking. Machine Learning, 52(3).
[9] Foster J. Provost and Tom Fawcett. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions. In Knowledge Discovery and Data Mining, pages 43-48.
[10] S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn. Morgan Kaufmann, San Mateo, CA, 1991.


More information

Data Mining and Knowledge Discovery

Data Mining and Knowledge Discovery Data Mining and Knowledge Discovery Petra Kralj Novak Petra.Kralj.Novak@ijs.si Practice, 2010/12/2 1 Practice plan 2010/11/25: Predictive data mining Decision trees Naïve Bayes classifier Evaluating classifiers

More information

Logistic Regression for Early Warning of Economic Failure of Construction Equipment

Logistic Regression for Early Warning of Economic Failure of Construction Equipment Logistic Regression for Early Warning of Economic Failure of Construction Equipment John Hildreth, PhD and Savannah Dewitt University of North Carolina at Charlotte Charlotte, North Carolina Equipment

More information

1.0 Chapter Introduction

1.0 Chapter Introduction 1.0 Chapter Introduction In this chapter, you will learn to use price index numbers to make the price adjustments necessary to analyze price and cost information collected over time. Price Index Numbers.

More information

Metamodelling and optimization of copper flash smelting process

Metamodelling and optimization of copper flash smelting process Metamodelling and optimization of copper flash smelting process Marcin Gulik mgulik21@gmail.com Jan Kusiak kusiak@agh.edu.pl Paweł Morkisz morkiszp@agh.edu.pl Wojciech Pietrucha wojtekpietrucha@gmail.com

More information

Ranking Potential Customers based on GroupEnsemble method

Ranking Potential Customers based on GroupEnsemble method Ranking Potential Customers based on GroupEnsemble method The ExceedTech Team South China University Of Technology 1. Background understanding Both of the products have been on the market for many years,

More information

Churn Prediction for Game Industry Based on Cohort Classification Ensemble

Churn Prediction for Game Industry Based on Cohort Classification Ensemble Churn Prediction for Game Industry Based on Cohort Classification Ensemble Evgenii Tsymbalov 1,2 1 National Research University Higher School of Economics, Moscow, Russia 2 Webgames, Moscow, Russia etsymbalov@gmail.com

More information

Session 15 Business Intelligence: Data Mining and Data Warehousing

Session 15 Business Intelligence: Data Mining and Data Warehousing 15.561 Information Technology Essentials Session 15 Business Intelligence: Data Mining and Data Warehousing Copyright 2005 Chris Dellarocas and Thomas Malone Adapted from Chris Dellarocas, U. Md. Outline

More information

In silico prediction of novel therapeutic targets using gene disease association data

In silico prediction of novel therapeutic targets using gene disease association data In silico prediction of novel therapeutic targets using gene disease association data, PhD, Associate GSK Fellow Scientific Leader, Computational Biology and Stats, Target Sciences GSK Big Data in Medicine

More information

Customer Targeting Models Using Actively-Selected Web Content

Customer Targeting Models Using Actively-Selected Web Content Customer Targeting Models Using Actively-Selected Web Content Prem Melville IBM T.J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598 pmelvil@us.ibm.com Saharon Rosset IBM T.J. Watson Research

More information

DATA MINING: A BRIEF INTRODUCTION

DATA MINING: A BRIEF INTRODUCTION DATA MINING: A BRIEF INTRODUCTION Matthew N. O. Sadiku, Adebowale E. Shadare Sarhan M. Musa Roy G. Perry College of Engineering, Prairie View A&M University Prairie View, USA Abstract Data mining may be

More information

Putting Big Data & Analytics to Work!

Putting Big Data & Analytics to Work! Putting Big Data & Analytics to Work! Prof. dr. Bart Baesens Department of Decision Sciences and Information Management, KU Leuven (Belgium) School of Management, University of Southampton (United Kingdom)

More information

Sawtooth Software. Sample Size Issues for Conjoint Analysis Studies RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

Sawtooth Software. Sample Size Issues for Conjoint Analysis Studies RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc. Sawtooth Software RESEARCH PAPER SERIES Sample Size Issues for Conjoint Analysis Studies Bryan Orme, Sawtooth Software, Inc. 1998 Copyright 1998-2001, Sawtooth Software, Inc. 530 W. Fir St. Sequim, WA

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information

A monopoly market structure is one characterized by a single seller of a unique product with no close substitutes.

A monopoly market structure is one characterized by a single seller of a unique product with no close substitutes. These notes provided by Laura Lamb are intended to complement class lectures. The notes are based on chapter 12 of Microeconomics and Behaviour 2 nd Canadian Edition by Frank and Parker (2004). Chapter

More information

An Empirical Analysis of the Modeling Benefits of Complete Information for ecrm Problems

An Empirical Analysis of the Modeling Benefits of Complete Information for ecrm Problems 1 1 1 1 1 1 1 1 0 An Empirical Analysis of the Modeling Benefits of Complete Information for ecrm Problems Abstract Due to the vast amount of user data tracked online, the use of data-based analytical

More information

Top-down Forecasting Using a CRM Database Gino Rooney Tom Bauer

Top-down Forecasting Using a CRM Database Gino Rooney Tom Bauer Top-down Forecasting Using a CRM Database Gino Rooney Tom Bauer Abstract More often than not sales forecasting in modern companies is poorly implemented despite the wealth of data that is readily available

More information

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam

ECLT 5810 E-Commerce Data Mining Techniques - Introduction. Prof. Wai Lam ECLT 5810 E-Commerce Data Mining Techniques - Introduction Prof. Wai Lam Data Opportunities Business infrastructure have improved the ability to collect data Virtually every aspect of business is now open

More information

Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG

Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG Using Predictive Analytics to Detect Contract Fraud, Waste, and Abuse Case Study from U.S. Postal Service OIG MACPA Government & Non Profit Conference April 26, 2013 Isaiah Goodall, Director of Business

More information

arxiv: v1 [math.oc] 27 Nov 2012

arxiv: v1 [math.oc] 27 Nov 2012 Simulation model for assembly lines with heterogeneous workers Pedro B. Castellucci and Alysson M. Costa arxiv:1211.6406v1 [math.oc] 27 Nov 2012 Instituto de Ciências Matemáticas e de Computação, Universidade

More information

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner SAS Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Melodie Rush Principal

More information

Stock data models analysis based on window mechanism

Stock data models analysis based on window mechanism Journal of Mathematics and Informatics Vol. 1, 2013-14, 60-65 ISSN: 2349-0632 (P), 2349-0640 (online) Published on 20 May 2014 www.researchmathsci.org Journal of Stock data models analysis based on window

More information

Bandwagon and Underdog Effects and the Possibility of Election Predictions

Bandwagon and Underdog Effects and the Possibility of Election Predictions Reprinted from Public Opinion Quarterly, Vol. 18, No. 3 Bandwagon and Underdog Effects and the Possibility of Election Predictions By HERBERT A. SIMON Social research has often been attacked on the grounds

More information

Using Decision Tree to predict repeat customers

Using Decision Tree to predict repeat customers Using Decision Tree to predict repeat customers Jia En Nicholette Li Jing Rong Lim Abstract We focus on using feature engineering and decision trees to perform classification and feature selection on the

More information

Application of the method of data reconciliation for minimizing uncertainty of the weight function in the multicriteria optimization model

Application of the method of data reconciliation for minimizing uncertainty of the weight function in the multicriteria optimization model archives of thermodynamics Vol. 36(2015), No. 1, 83 92 DOI: 10.1515/aoter-2015-0006 Application of the method of data reconciliation for minimizing uncertainty of the weight function in the multicriteria

More information

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16

International Journal of Computer Engineering and Applications, ICCSTAR-2016, Special Issue, May.16 A REVIEW OF DATA MINING TECHNIQUES FOR AGRICULTURAL CROP YIELD PREDICTION Akshitha K 1, Dr. Rajashree Shettar 2 1 M.Tech Student, Dept of CSE, RV College of Engineering,, Bengaluru, India 2 Prof: Dept.

More information

Predictive Modelling for Customer Targeting A Banking Example

Predictive Modelling for Customer Targeting A Banking Example Predictive Modelling for Customer Targeting A Banking Example Pedro Ecija Serrano 11 September 2017 Customer Targeting What is it? Why should I care? How do I do it? 11 September 2017 2 What Is Customer

More information

Validation methodologies for default risk models

Validation methodologies for default risk models p51p56.qxd 15/05/00 12:22 Page 51 Validation methodologies for models The asle Committee has identified credit model validation as one of the most challenging issues in quantitative credit model development.

More information

A Fuzzy Optimization Model for Single-Period Inventory Problem

A Fuzzy Optimization Model for Single-Period Inventory Problem , July 6-8, 2011, London, U.K. A Fuzzy Optimization Model for Single-Period Inventory Problem H. Behret and C. Kahraman Abstract In this paper, the optimization of single-period inventory problem under

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

WHITE PAPER. Building Credit Scorecards Using Credit Scoring for SAS. Enterprise Miner. A SAS Best Practices Paper

WHITE PAPER. Building Credit Scorecards Using Credit Scoring for SAS. Enterprise Miner. A SAS Best Practices Paper WHITE PAPER Building Credit Scorecards Using Credit Scoring for SAS Enterprise Miner A SAS Best Practices Paper Table of Contents Introduction...1 Building credit models in-house...2 Building credit models

More information

A logistic regression model for Semantic Web service matchmaking

A logistic regression model for Semantic Web service matchmaking . BRIEF REPORT. SCIENCE CHINA Information Sciences July 2012 Vol. 55 No. 7: 1715 1720 doi: 10.1007/s11432-012-4591-x A logistic regression model for Semantic Web service matchmaking WEI DengPing 1*, WANG

More information

American Association for Public Opinion Research

American Association for Public Opinion Research American Association for Public Opinion Research Bandwagon and Underdog Effects and the Possibility of Election Predictions Author(s): Herbert A. Simon Source: The Public Opinion Quarterly, Vol. 18, No.

More information

Achieve Better Insight and Prediction with Data Mining

Achieve Better Insight and Prediction with Data Mining Clementine 12.0 Specifications Achieve Better Insight and Prediction with Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.

More information

Software for Typing MaxDiff Respondents Copyright Sawtooth Software, 2009 (3/16/09)

Software for Typing MaxDiff Respondents Copyright Sawtooth Software, 2009 (3/16/09) Software for Typing MaxDiff Respondents Copyright Sawtooth Software, 2009 (3/16/09) Background: Market Segmentation is a pervasive concept within market research, which involves partitioning customers

More information

A MACHINE-LEARNING APPROACH TO OPTIMAL BID PRICING

A MACHINE-LEARNING APPROACH TO OPTIMAL BID PRICING A MACHINE-LEARNING APPROACH TO OPTIMAL BID PRICING Richard D. Lawrence IBM T.J. Watson Research Center P.O. Box 218 Yorktown Heights, New York 10598 ricklawr@us.ibm.com Abstract We consider the problem

More information

EnterpriseOne JDE5 Forecasting PeopleBook

EnterpriseOne JDE5 Forecasting PeopleBook EnterpriseOne JDE5 Forecasting PeopleBook May 2002 EnterpriseOne JDE5 Forecasting PeopleBook SKU JDE5EFC0502 Copyright 2003 PeopleSoft, Inc. All rights reserved. All material contained in this documentation

More information

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat Startup Machine Learning: Bootstrapping a fraud detection system Michael Manapat Stripe @mlmanapat About me: Engineering Manager of the Machine Learning Products Team at Stripe About Stripe: Payments infrastructure

More information

Forest Stewardship Council

Forest Stewardship Council Introduction Frequently Asked Questions (FAQs) FSC-STD-40-004 V3-0 and FSC-STD-20-011 V4-0 27 March 2017 On 1 January 2017, the FSC Board of Directors approved the revised FSC chain-ofcustody standards

More information

Preference Elicitation for Group Decisions

Preference Elicitation for Group Decisions Preference Elicitation for Group Decisions Lihi Naamani-Dery 1, Inon Golan 2, Meir Kalech 2, and Lior Rokach 1 1 Telekom Innovation Laboratories at Ben-Gurion University, Israel 2 Ben Gurion University,

More information

Integrating Market and Credit Risk Measures using SAS Risk Dimensions software

Integrating Market and Credit Risk Measures using SAS Risk Dimensions software Integrating Market and Credit Risk Measures using SAS Risk Dimensions software Sam Harris, SAS Institute Inc., Cary, NC Abstract Measures of market risk project the possible loss in value of a portfolio

More information

Modeling of competition in revenue management Petr Fiala 1

Modeling of competition in revenue management Petr Fiala 1 Modeling of competition in revenue management Petr Fiala 1 Abstract. Revenue management (RM) is the art and science of predicting consumer behavior and optimizing price and product availability to maximize

More information

Database Searching and BLAST Dannie Durand

Database Searching and BLAST Dannie Durand Computational Genomics and Molecular Biology, Fall 2013 1 Database Searching and BLAST Dannie Durand Tuesday, October 8th Review: Karlin-Altschul Statistics Recall that a Maximal Segment Pair (MSP) is

More information

Chapter 8 Analytical Procedures

Chapter 8 Analytical Procedures Slide 8.1 Principles of Auditing: An Introduction to International Standards on Auditing Chapter 8 Analytical Procedures Rick Hayes, Hans Gortemaker and Philip Wallage Slide 8.2 Analytical procedures Analytical

More information

Predictive Planning for Supply Chain Management

Predictive Planning for Supply Chain Management Predictive Planning for Supply Chain Management David Pardoe and Peter Stone Department of Computer Sciences The University of Texas at Austin {dpardoe, pstone}@cs.utexas.edu Abstract Supply chains are

More information

Incentives in Crowdsourcing: A Game-theoretic Approach

Incentives in Crowdsourcing: A Game-theoretic Approach Incentives in Crowdsourcing: A Game-theoretic Approach ARPITA GHOSH Cornell University NIPS 2013 Workshop on Crowdsourcing: Theory, Algorithms, and Applications Incentives in Crowdsourcing: A Game-theoretic

More information

Weka Evaluation: Assessing the performance

Weka Evaluation: Assessing the performance Weka Evaluation: Assessing the performance Lab3 (in- class): 21 NOV 2016, 13:00-15:00, CHOMSKY ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning objectives

More information

Context-Aware Movie Recommendations: An Empirical Comparison of Pre-Filtering, Post-Filtering and Contextual Modeling Approaches

Context-Aware Movie Recommendations: An Empirical Comparison of Pre-Filtering, Post-Filtering and Contextual Modeling Approaches Context-Aware Movie Recommendations: An Empirical Comparison of Pre-Filtering, Post-Filtering and Contextual Modeling Approaches Pedro G. Campos 1,2, Ignacio Fernández-Tobías 2, Iván Cantador 2, Fernando

More information

1.0 Introduction 1.1 Identifying Situations For Use 1.2 Constructing Price Index Numbers

1.0 Introduction 1.1 Identifying Situations For Use 1.2 Constructing Price Index Numbers 1.0 Introduction In this chapter, you will learn to use price index numbers to make the price adjustments necessary to analyze price and cost information collected over time. Price Index Numbers. Price

More information

Energy-Aware Active Chemical Sensing

Energy-Aware Active Chemical Sensing Energy-Aware Active Chemical Sensing Rakesh Gosangi and Ricardo Gutierrez-Osuna Department of Computer Science and Engineering Texas A&M University, College Station, Texas 77840 {rakesh,rgutier}@cse.tamu.edu

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

THE ECONOMICS OF THE ENVIRONMENT Microeconomics in Context (Goodwin, et al.), 3 rd Edition

THE ECONOMICS OF THE ENVIRONMENT Microeconomics in Context (Goodwin, et al.), 3 rd Edition Chapter 12 THE ECONOMICS OF THE ENVIRONMENT Microeconomics in Context (Goodwin, et al.), 3 rd Edition Chapter Summary This chapter has three sections. The first section presents the standard economic theory

More information

Control Charts for Customer Satisfaction Surveys

Control Charts for Customer Satisfaction Surveys Control Charts for Customer Satisfaction Surveys Robert Kushler Department of Mathematics and Statistics, Oakland University Gary Radka RDA Group ABSTRACT Periodic customer satisfaction surveys are used

More information

MANAGERIAL ECONOMICS WILEY A JOHN WILEY & SONS, INC., PUBLICATION. A Mathematical Approach

MANAGERIAL ECONOMICS WILEY A JOHN WILEY & SONS, INC., PUBLICATION. A Mathematical Approach MANAGERIAL ECONOMICS A Mathematical Approach M. J. ALHABEEB L. JOE MOFFITT Isenberg School of Management University of Massachusetts Amherst, MA, USA WILEY A JOHN WILEY & SONS, INC., PUBLICATION PREFACE

More information

Applying an Interactive Machine Learning Approach to Statutory Analysis

Applying an Interactive Machine Learning Approach to Statutory Analysis Applying an Interactive Machine Learning Approach to Statutory Analysis Jaromir Savelka a,b, Gaurav Trivedi a, Kevin D. Ashley a,b,c a Intelligent Systems Program, University of Pittsburgh, USA b Learning

More information

What is Evolutionary Computation? Genetic Algorithms. Components of Evolutionary Computing. The Argument. When changes occur...

What is Evolutionary Computation? Genetic Algorithms. Components of Evolutionary Computing. The Argument. When changes occur... What is Evolutionary Computation? Genetic Algorithms Russell & Norvig, Cha. 4.3 An abstraction from the theory of biological evolution that is used to create optimization procedures or methodologies, usually

More information

Predicting Subscriber Dissatisfaction and Improving Retention in the Wireless Telecommunications Industry

Predicting Subscriber Dissatisfaction and Improving Retention in the Wireless Telecommunications Industry Predicting Subscriber Dissatisfaction and Improving Retention in the Wireless Telecommunications Industry Michael C. Mozer + * Richard Wolniewicz* Robert Dodier* Lian Yan* David B. Grimes + * Eric Johnson*

More information

Resource Decisions in Software Development Using Risk Assessment Model

Resource Decisions in Software Development Using Risk Assessment Model Proceedings of the 39th Hawaii International Conference on System Sciences - 6 Resource Decisions in Software Development Using Risk Assessment Model Wiboon Jiamthubthugsin Department of Computer Engineering

More information

Week 1 Unit 4: Defining Project Success Criteria

Week 1 Unit 4: Defining Project Success Criteria Week 1 Unit 4: Defining Project Success Criteria Business and data science project success criteria: reminder Business success criteria Describe the criteria for a successful or useful outcome to the project

More information

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University Neural Networks and Applications in Bioinformatics Yuzhen Ye School of Informatics and Computing, Indiana University Contents Biological problem: promoter modeling Basics of neural networks Perceptrons

More information

AN ADAPTIVE PERSONNEL SELECTION MODEL FOR RECRUITMENT USING DOMAIN-DRIVEN DATA MINING

AN ADAPTIVE PERSONNEL SELECTION MODEL FOR RECRUITMENT USING DOMAIN-DRIVEN DATA MINING AN ADAPTIVE PERSONNEL SELECTION MODEL FOR RECRUITMENT USING DOMAIN-DRIVEN DATA MINING 1 MUHAMMAD AHMAD SHEHU, 2 FAISAL SAEED* 1 Faculty of Computing. Universiti Teknologi Malaysia, 81310 UTM, Johor Bahru,

More information

Application of Association Rule Mining in Supplier Selection Criteria

Application of Association Rule Mining in Supplier Selection Criteria Vol:, No:4, 008 Application of Association Rule Mining in Supplier Selection Criteria A. Haery, N. Salmasi, M. Modarres Yazdi, and H. Iranmanesh International Science Index, Industrial and Manufacturing

More information

STATISTICAL TECHNIQUES. Data Analysis and Modelling

STATISTICAL TECHNIQUES. Data Analysis and Modelling STATISTICAL TECHNIQUES Data Analysis and Modelling DATA ANALYSIS & MODELLING Data collection and presentation Many of us probably some of the methods involved in collecting raw data. Once the data has

More information

The Combined Model of Gray Theory and Neural Network which is based Matlab Software for Forecasting of Oil Product Demand

The Combined Model of Gray Theory and Neural Network which is based Matlab Software for Forecasting of Oil Product Demand The Combined Model of Gray Theory and Neural Network which is based Matlab Software for Forecasting of Oil Product Demand Song Zhaozheng 1,Jiang Yanjun 2, Jiang Qingzhe 1 1State Key Laboratory of Heavy

More information

Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution

Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution 1290 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 5, OCTOBER 2011 Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution Christina Aperjis, Ramesh Johari, Member, IEEE, and Michael

More information

Risk Analysis Overview

Risk Analysis Overview Risk Analysis Overview What Is Risk? Uncertainty about a situation can often indicate risk, which is the possibility of loss, damage, or any other undesirable event. Most people desire low risk, which

More information

On Optimal Tiered Structures for Network Service Bundles

On Optimal Tiered Structures for Network Service Bundles On Tiered Structures for Network Service Bundles Qian Lv, George N. Rouskas Department of Computer Science, North Carolina State University, Raleigh, NC 7695-86, USA Abstract Network operators offer a

More information

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy AGENDA 1. Introduction 2. Use Cases 3. Popular Algorithms 4. Typical Approach 5. Case Study 2016 SAPIENT GLOBAL MARKETS

More information

Available online at ScienceDirect. Procedia Computer Science 54 (2015 )

Available online at  ScienceDirect. Procedia Computer Science 54 (2015 ) Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 54 (2015 ) 396 404 Eleventh International Multi-Conference on Information Processing-2015 (IMCIP-2015) Predicting Financial

More information

The Efficient Allocation of Individuals to Positions

The Efficient Allocation of Individuals to Positions The Efficient Allocation of Individuals to Positions by Aanund Hylland and Richard Zeckhauser Presented by Debreu Team: Justina Adamanti, Liz Malm, Yuqing Hu, Krish Ray Hylland and Zeckhauser consider

More information

HTS Report. d2-r. Test of Attention Revised. Technical Report. Another Sample ID Date 14/04/2016. Hogrefe Verlag, Göttingen

HTS Report. d2-r. Test of Attention Revised. Technical Report. Another Sample ID Date 14/04/2016. Hogrefe Verlag, Göttingen d2-r Test of Attention Revised Technical Report HTS Report ID 467-500 Date 14/04/2016 d2-r Overview 2 / 16 OVERVIEW Structure of this report Narrative Introduction Verbal interpretation of standardised

More information

Cost-Time Sensitive Decision Tree with Missing Values 1

Cost-Time Sensitive Decision Tree with Missing Values 1 Cost-Time Sensitive Decision Tree with Missing Values 1 Shichao Zhang 1, 2, Xiaofeng Zhu 1, Jilian Zhang 3, Chengqi Zhang 2 1 Department of Computer Science, Guangxi Normal University, Guilin, China 2

More information

Prediction of Employee Turnover in Organizations using Machine Learning Algorithms A case for Extreme Gradient Boosting

Prediction of Employee Turnover in Organizations using Machine Learning Algorithms A case for Extreme Gradient Boosting Prediction of Employee Turnover in Organizations using Machine Learning Algorithms A case for Extreme Gradient Boosting Rohit Punnoose, PhD candidate XLRI Xavier School of Management Jamshedpur, India

More information

Risk Management User Guide

Risk Management User Guide Risk Management User Guide Version 17 December 2017 Contents About This Guide... 5 Risk Overview... 5 Creating Projects for Risk Management... 5 Project Templates Overview... 5 Add a Project Template...

More information