Lecture 6: Decision Tree, Random Forest, and Boosting

Size: px
Start display at page:

Download "Lecture 6: Decision Tree, Random Forest, and Boosting"

Transcription

1 Lecture 6: Decision Tree, Random Forest, and Boosting Tuo Zhao Schools of ISyE and CSE, Georgia Tech

2 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Tree? Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 2/42

3 Decision Trees

4 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Decision Trees Decision trees have a long history in machine learning The first popular algorithm dates back to 979 Very popular in many real world problems Intuitive to understand Easy to build Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 4/42

5 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis History EPAM- Elementary Perceiver and Memorizer 96: Cognitive simulation model of human concept learning 966: CLS-Early algorithm for decision tree construction 979: ID3 based on information theory 993: C4.5 improved over ID3 Also has history in statistics as CART (Classification and regression tree) Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 5/42

6 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Motivation How do people make decisions? Consider a variety of factors Follow a logical path of checks Should I eat at this restaurant? If there is no wait Yes If there is short wait and I am hungry Yes Else No Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 6/42

7 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Decision Flowchart Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 7/42

8 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Example: Should We Play Tennis? If temperature is not hot Play If outlook is overcast Play tennis Otherwise Don t play tennis Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 8/42

9 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Decision Tree Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 9/42

10 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Structure A decision tree consists of Nodes Branches Tests for variables Results of tests Leaves Classification Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting /42

11 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Function Class What functions can decision trees model? Non-linear: very powerful function class A decision tree can encode any Boolean function Proof Given a truth table for a function Construct a path in the tree for each row of the table Given a row as input, follow that path to the desired leaf (output) Problem: exponentially large trees! Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting /42

12 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Smaller Trees Can we produce smaller decision trees for functions? Yes (Possible) Key decision factors Counter examples Parity function: Return on even inputs, on odd inputs Majority function: Return if more than half of inputs are Bias-Variance Tradeoff Bias: Representation Power of Decision Trees Variance: require a sample size exponential in depth Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 2/42

13 Fitting Decision Trees

14 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis What Makes a Good Tree? Small tree: Occam s razor: Simpler is better Avoids over-fitting A decision tree may be human readable, but not use human logic! How do we build small trees that accurately capture data? Learning an optimal decision tree is computationally intractable Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 4/42

15 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Greedy Algorithm We can get good trees by simple greedy algorithms Adjustments are usually to fix greedy selection problems Recursive: Select the best variable, and generate child nodes: One for each possible value; Partition samples using the possible values, and assign these subsets of samples to the child nodes; Repeat for each child node until all samples associated with a node that are either all positive or all negative. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 5/42

16 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Variable Selection The best variable for partition The most informative variable Select the variable that is most informative about the labels Classification error? Example: P(Y = X = ) =.75 v.s. P(Y = X 2 = ) =.55 Which to choose? Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 6/42

17 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Information Theory The quantification of information Founded by Claude Shannon Simple concepts: Entropy: H(X) = x P(X = x) log P(X = x) Conditional Entropy: H(Y X) = x P(X = x)h(y X = x) Information Gain: IG(Y X) = H(Y) H(Y X) Select the variable with the highest information gain Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 7/42

18 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Example: Should We Play Tennis? H(Tennis) = 3/5 log 2 3/5 2/5 log 2 2/5 =.97 H(Tennis Out. = Sunny) = 2/2 log 2 2/2 /2 log 2 /2 = H(Tennis Out. = Overcast) = / log 2 / / log 2 / = H(Tennis Out. = Rainy) = /2 log 2 /2 2/2 log 2 2/2 = H(Tennis Out.) = 2/5 + /5 + 2/5 = Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 8/42

19 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Example: Should We Play Tennis? IG(Tennis Out.) =.97 =.97 If we knew the Outlook we d be able to predict Tennis! Outlook is the variable to pick for our decision tree Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 9/42

20 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis When to Stop Training? All data have same label Return that label No examples Return majority label of all data No further splits possible Return majority label of passed data If max IG =? Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 2/42

21 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Example: No Information Gain? Y X X 2 Both features give IG Once we divide the data, perfect classification! We need a little exploration sometimes (Unstable Equilibrium) Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 2/42

22 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Decision Tree with Continuous Features Decision Stump Threshold for binary classification: More than two classes: Threshold for three classes: Check X j δ j or X j < δ j. Check X j δ j, γ j X j < δ j or γ j < X j. Decompose one node to two modes: First node: Check X j δ j or X j < δ j, Second node: If X j < δ j, check γ j X j or γ j < X j. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 22/42

23 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University Decision Tree for Spam Classification 6/536 ch$<.555 ch$> /77 remove<.6 remove>.6 spam 48/359 hp<.45 hp>.45 spam spam 8/65 9/2 26/337 /22 ch!<.9 george<.5capave<2.97 ch!>.9 george>.5capave> spam spam spam 8/86 /24 6/9 /3 9/ 7/227 george<.5 CAPAVE< <.58 george>.5 CAPAVE> > spam 8/652 /29 36/23 6/8 hp<.3 free<.65 hp>.3 free>.65 spam 8/9 / spam 77/423 3/229 6/94 9/29 CAPMAX<.5 business<.45 CAPMAX>.5 business> spam 2/238 57/85 4/89 3/5 receive<.25edu<.45 receive>.25edu>.45 spam 9/236 /2 48/3 9/72 our<.2 our>.2 spam 37/ /2 Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 23/42

24 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University Decision Tree for Spam Classification Sensitivity ROC curve for pruned tree on SPAM data o TREE Error: 8.7% Specificity SPAM Data Overall error rate on test data 8.7%. ROC curve obtained by vary ing the threshold c of the clas sifier: C(X) =+if ˆP (+ X) >c. Sensitivity: proportion of tru spam identified Specificity: proportion of tru identified. Sensitivity: We may proportion want specificity of true to spam be high, identified and suffer some spam: Specificity: Specificity proportion : 95% of = true Sensitivity identified. : 79% Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 24/42

25 CS764/ISYE/CSE 674: Boosting Machine Learning/Computational Trevor Data Analysis Hastie, Stanford University 2 Decision Tree v.s. SVM ROC curve for TREE vs SVM on SPAM data Sensitivity o o SVM Error: 6.7% TREE Error: 8.7% TREE vs SVM Comparing ROC curves on the test data is a good way to compare classifiers. SVM dominates TREE here Specificity SVM outperforms Decision Tree. AUC: Area Under Curve. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 25/42

26 Bagging and Random Forest

27 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Toy Example Toy Example - No Noise Bayes Error Rate: X2 X Deterministic proble noise comes from sa pling distribution of Use a training sam of size 2. Here Bayes Error %. Nonlinear separable data. Optimal decision boundary: X 2 + X2 2 =. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 27/42

28 x.<-.783 x.>-.783 CS764/ISYE/CSE 674: Machine Learning/ComputationalBoosting Data Analysis Trevor Hastie, Stanford University 5 Toy Example: Decision Tree Classification Tree Trevor Hastie, Stanford University Decision Boundary: Tree 94/2 x.2<-.67 x.2>-.67 /34 72/66 x.2<.4988 x.2> /34 /32 x.<.3632 x.> /7 /7 x.< x.> /26 2/9 x.<-.668 x.2< x.>-.668 x.2> /2 5/4 2/8 /83 X Error Rate:.73 When th are in sification rather no rule Ĉ(X around 3 /5 4/ X Sample size: 2 7 branching nodes; 6 layers. Classification error: 7.3% when d = 2; > 3% when d =. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 28/42

29 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Model Averaging Decision trees can be simple, but often produce noisy (bushy) or weak (stunted) classifiers. Bagging (Breiman, 996): Fit many large trees to bootstrap-resampled versions of the training data, and classify by majority vote. Boosting (Freund & Shapire, 996): Fit many large or small trees to reweighted versions of the training data. Classify by weighted majority vote. Random Forests (Breiman 999): Fancier version of bagging. In general Boosting>Random Forests>Bagging>Single Tree. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 29/42

30 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Bagging/Bootstrap Aggregation Motivation: Average a given procedure over many samples to reduce the variance. Given a classifier C(S, x), based on our training data S, producing a predicted class label at input point x. To bag C, we draw bootstrap samples S,..., S B each of size N with replacement from the training data. Then Ĉ Bag = Mojarity Vote{C(S b, x)} B b=. Bagging can dramatically reduce the variance of unstable procedures (like trees), leading to improved prediction. All simple structures in a tree are lost. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 3/42

31 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Toy Example: Bagging Trees Boosting Trevor Hastie, Stanford University 9 Original Tree Bootstrap Tree /3 7/3 x.2<.39 x.2>.39 x.2<.36 x.2>.36 3/2 2/9 /23 /7 x.3<-.575 x.3>-.575 x.<-.965 x.> /5 /6 /5 /8 Bootstrap Tree 2 Bootstrap Tree 3 /3 4/3 x.2<.39 x.2>.39 x.4<.395 x.4>.395 3/22 /8 2/25 2/5 x.3<-.575 x.3>-.575 x.3<-.575 x.3> /5 /7 2/5 /2 Bootstrap Tree 4 Bootstrap Tree 5 3/3 2/3 x.2<.255 x.2>.255 x.2<.38 x.2>.38 2/6 3/4 4/2 2/ x.3<-.385 x.3>-.385 x.3<-.6 x.3>-.6 2/5 / 2/6 /4 2 branching nodes; 2 layers. 5 dependent trees. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 3/42

32 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Toy Example: Bagging Trees Decision Boundary: Bagging X X Error Rate:.32 Bagging averages m trees, and produ smoother decision bou aries. A smoother decision boundary. Classification error: 3.2% (Single deeper tree 7.3%). Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 32/42

33 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Random Forest Bagging features and samples simultaneously: At each tree split, a random sample of m features is drawn, and only those m features are considered for splitting. Typically m = d or log 2 d, where d is the number of features For each tree grown on a bootstrap sample, the error rate for observations left out of the bootstrap sample is monitored. This is called the out-of-bag error rate. random forests tries to improve on bagging by de-correlating the trees. Each tree has the same expectation. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 33/42

34 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University 22 Random Forest for Spam Classification ROC curve for TREE, SVM and Random Forest on SPAM data Sensitivity o o o Random Forest Error: 5.% SVM Error: 6.7% TREE Error: 8.7% TREE, SVM and RF Random Forest dominates both other methods on the SPAM data 5.% error. Used 5 trees with default settings for random Forest package in R Specificity RF outperforms SVM. 5 Trees. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 34/42

35 oosting CS764/ISYE/CSE 674: Machine Trevor Learning/Computational Hastie, Stanford University Data Analysis 4 RandomSpam: Forest: Variable Variable Importance Importance Scores 3d addresses labs telnet direct table cs 85 parts # credit lab [ conference report original data project font make address order all hpl technology people pm mail over 65 meeting ; internet receive business re( 999 will money our you edu CAPTOT george CAPMAX your CAPAVE free remove hp $! Relative importance The bootstrap roughly covers /3 samples for each time. Do permutations over variables to check how important they are. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 35/42

36 Boosting

37 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University 23 Boosting Weighted Sample C M (x) Boosting Weighted Sample Weighted Sample C 3(x) C 2(x) Average many trees, each grown to re-weighted versions of the training data. Final Classifier is weighted average of classifiers: [ M ] C(x) =sign m= α mc m (x) Training Sample C (x) Average many trees, each grown to re-weighted versions of the training data (Iterative). Final Classifier is weighted average of classifiers: Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 37/42

38 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University 24 Adaboost v.s. Bagging Node Trees Boosting vs Bagging Test Error Bagging AdaBoost 2 points from Nested Spheres in R Bayes error rate is %. Trees are grown best first without pruning. Leftmost term is a single tree Number of Terms Each classifier for bagging is a tree with a single node (decision stump). 2 samples from Spheres in R. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 38/42

39 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University 25 Adaboost AdaBoost (Freund & Schapire, 996). Initialize the observation weights w i =/N, i =, 2,...,N. 2. For m =tom repeat steps (a) (d): (a) Fit a classifier C m (x) to the training data using weights w i. (b) Compute weighted error of newest tree N i= err m = w ii(y i C m (x i )) N i= w. i (c) Compute α m =log[( err m )/err m ]. (d) Update weights for i =,...,N: w i w i exp[α m I(y i C m (x i ))] and renormalize to w i to sum to. [ M ] 3. Output C(x) = sign m= α mc m (x). w i s are the weights of the samples. err m is the weighted training error. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 39/42

40 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Adaboost Boosting Trevor Hastie, Stanford University 26 Test Error Single Stump 4 Node Tree Boosting Stumps A stump is a two-node tree, after a single split. Boosting stumps works remarkably well on the nested-spheres problem Boosting Iterations The ensemble of classifiers is much more efficient than the simple combination of classifiers. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 4/42

41 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University 27 Overfitting of Adaboost Train and Test Error Training Error Nested spheres in - Dimensions. Bayes error is %. Boosting drives the training error to zero. Further iterations continue to improve test error in many examples Number of Terms More iterations continue to improve test error in many examples. The Adaboost is often observed to be robust to overfitting. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 4/42

42 CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Boosting Trevor Hastie, Stanford University 28 Overfitting of Adaboost Train and Test Error Bayes Error Noisy Problems Nested Gaussians in -Dimensions. Bayes error is 25%. Boosting with stumps Here the test error does increase, but quite slowly Number of Terms Optimal Bayes classification error: 25%. The test error does increase, but quite slowly. Tuo Zhao Lecture 6: Decision Tree, Random Forest, and Boosting 42/42

Decision Tree Learning. Richard McAllister. Outline. Overview. Tree Construction. Case Study: Determinants of House Price. February 4, / 31

Decision Tree Learning. Richard McAllister. Outline. Overview. Tree Construction. Case Study: Determinants of House Price. February 4, / 31 1 / 31 Decision Decision February 4, 2008 2 / 31 Decision 1 2 3 3 / 31 Decision Decision Widely Used Used for approximating discrete-valued functions Robust to noisy data Capable of learning disjunctive

More information

Today. Last time. Lecture 5: Discrimination (cont) Jane Fridlyand. Oct 13, 2005

Today. Last time. Lecture 5: Discrimination (cont) Jane Fridlyand. Oct 13, 2005 Biological question Experimental design Microarray experiment Failed Lecture : Discrimination (cont) Quality Measurement Image analysis Preprocessing Jane Fridlyand Pass Normalization Sample/Condition

More information

Copyright 2013, SAS Institute Inc. All rights reserved.

Copyright 2013, SAS Institute Inc. All rights reserved. IMPROVING PREDICTION OF CYBER ATTACKS USING ENSEMBLE MODELING June 17, 2014 82 nd MORSS Alexandria, VA Tom Donnelly, PhD Systems Engineer & Co-insurrectionist JMP Federal Government Team ABSTRACT Improving

More information

CS6716 Pattern Recognition

CS6716 Pattern Recognition CS6716 Pattern Recognition Aaron Bobick School of Interactive Computing Administrivia Shray says the problem set is close to done Today chapter 15 of the Hastie book. Very few slides brought to you by

More information

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo Ensemble Modeling Toronto Data Mining Forum November 2017 Helen Ngo Agenda Introductions Why Ensemble Models? Simple & Complex ensembles Thoughts: Post-real-life Experimentation Downsides of Ensembles

More information

Prediction and Interpretation for Machine Learning Regression Methods

Prediction and Interpretation for Machine Learning Regression Methods Paper 1967-2018 Prediction and Interpretation for Machine Learning Regression Methods D. Richard Cutler, Utah State University ABSTRACT The last 30 years has seen extraordinary development of new tools

More information

Supervised Learning Using Artificial Prediction Markets

Supervised Learning Using Artificial Prediction Markets Supervised Learning Using Artificial Prediction Markets Adrian Barbu Department of Statistics Florida State University Joint work with Nathan Lay, FSU Dept. of Scientific Computing 1 Main Contributions

More information

Using Decision Tree to predict repeat customers

Using Decision Tree to predict repeat customers Using Decision Tree to predict repeat customers Jia En Nicholette Li Jing Rong Lim Abstract We focus on using feature engineering and decision trees to perform classification and feature selection on the

More information

Decision Trees And Random Forests A Visual Introduction For Beginners

Decision Trees And Random Forests A Visual Introduction For Beginners Decision Trees And Random Forests A Visual Introduction For Beginners We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on

More information

Ranking Potential Customers based on GroupEnsemble method

Ranking Potential Customers based on GroupEnsemble method Ranking Potential Customers based on GroupEnsemble method The ExceedTech Team South China University Of Technology 1. Background understanding Both of the products have been on the market for many years,

More information

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat

Startup Machine Learning: Bootstrapping a fraud detection system. Michael Manapat Startup Machine Learning: Bootstrapping a fraud detection system Michael Manapat Stripe @mlmanapat About me: Engineering Manager of the Machine Learning Products Team at Stripe About Stripe: Payments infrastructure

More information

Data Science in a pricing process

Data Science in a pricing process Data Science in a pricing process Michaël Casalinuovo Consultant, ADDACTIS Software michael.casalinuovo@addactis.com Contents Nowadays, we live in a continuously changing market environment, Pricing has

More information

Applications of Machine Learning to Predict Yelp Ratings

Applications of Machine Learning to Predict Yelp Ratings Applications of Machine Learning to Predict Yelp Ratings Kyle Carbon Aeronautics and Astronautics kcarbon@stanford.edu Kacyn Fujii Electrical Engineering khfujii@stanford.edu Prasanth Veerina Computer

More information

Airline Itinerary Choice Modeling Using Machine Learning

Airline Itinerary Choice Modeling Using Machine Learning Airline Itinerary Choice Modeling Using Machine Learning A. Lhéritier a, M. Bocamazo a, T. Delahaye a, R. Acuna-Agost a a Innovation and Research Division, Amadeus S.A.S., 485 Route du Pin Montard, 06902

More information

Introduction to Random Forests for Gene Expression Data. Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 3.

Introduction to Random Forests for Gene Expression Data. Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 3. Introduction to Random Forests for Gene Expression Data Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 3.5 1 References Breiman, Machine Learning (2001) 45(1): 5-32. Diaz-Uriarte

More information

References. Introduction to Random Forests for Gene Expression Data. Machine Learning. Gene Profiling / Selection

References. Introduction to Random Forests for Gene Expression Data. Machine Learning. Gene Profiling / Selection References Introduction to Random Forests for Gene Expression Data Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 3.5 Breiman, Machine Learning (2001) 45(1): 5-32. Diaz-Uriarte

More information

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Machine learning models can be used to predict which recommended content users will click on a given website.

More information

Random Forests. Parametrization and Dynamic Induction

Random Forests. Parametrization and Dynamic Induction Random Forests Parametrization and Dynamic Induction Simon Bernard Document and Learning research team LITIS laboratory University of Rouen, France décembre 2014 Random Forest Classifiers Random Forests

More information

A Brief History. Bootstrapping. Bagging. Boosting (Schapire 1989) Adaboost (Schapire 1995)

A Brief History. Bootstrapping. Bagging. Boosting (Schapire 1989) Adaboost (Schapire 1995) A Brief History Bootstrapping Bagging Boosting (Schapire 1989) Adaboost (Schapire 1995) What s So Good About Adaboost Improves classification accuracy Can be used with many different classifiers Commonly

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

Tree Depth in a Forest

Tree Depth in a Forest Tree Depth in a Forest Mark Segal Center for Bioinformatics & Molecular Biostatistics Division of Bioinformatics Department of Epidemiology and Biostatistics UCSF NUS / IMS Workshop on Classification and

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

Proposal for ISyE6416 Project

Proposal for ISyE6416 Project Profit-based classification in customer churn prediction: a case study in banking industry 1 Proposal for ISyE6416 Project Profit-based classification in customer churn prediction: a case study in banking

More information

Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015

Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015 Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Probabilistic Classification Introduction to Logistic regression Binary logistic regression

More information

Churn Prediction for Game Industry Based on Cohort Classification Ensemble

Churn Prediction for Game Industry Based on Cohort Classification Ensemble Churn Prediction for Game Industry Based on Cohort Classification Ensemble Evgenii Tsymbalov 1,2 1 National Research University Higher School of Economics, Moscow, Russia 2 Webgames, Moscow, Russia etsymbalov@gmail.com

More information

Forecasting Intermittent Demand Patterns with Time Series and Machine Learning Methodologies

Forecasting Intermittent Demand Patterns with Time Series and Machine Learning Methodologies Forecasting Intermittent Demand Patterns with Time Series and Machine Learning Methodologies Yuwen Hong, Jingda Zhou, Matthew A. Lanham Purdue University, Department of Management, 403 W. State Street,

More information

Random Forests. Parametrization, Tree Selection and Dynamic Induction

Random Forests. Parametrization, Tree Selection and Dynamic Induction Random Forests Parametrization, Tree Selection and Dynamic Induction Simon Bernard Document and Learning research team LITIS lab. University of Rouen, France décembre 2014 Random Forest Classifiers Random

More information

PREDICTION OF CONCRETE MIX COMPRESSIVE STRENGTH USING STATISTICAL LEARNING MODELS

PREDICTION OF CONCRETE MIX COMPRESSIVE STRENGTH USING STATISTICAL LEARNING MODELS Journal of Engineering Science and Technology Vol. 13, No. 7 (2018) 1916-1925 School of Engineering, Taylor s University PREDICTION OF CONCRETE MIX COMPRESSIVE STRENGTH USING STATISTICAL LEARNING MODELS

More information

Linear model to forecast sales from past data of Rossmann drug Store

Linear model to forecast sales from past data of Rossmann drug Store Abstract Linear model to forecast sales from past data of Rossmann drug Store Group id: G3 Recent years, the explosive growth in data results in the need to develop new tools to process data into knowledge

More information

SPM 8.2. Salford Predictive Modeler

SPM 8.2. Salford Predictive Modeler SPM 8.2 Salford Predictive Modeler SPM 8.2 The SPM Salford Predictive Modeler software suite is a highly accurate and ultra-fast platform for developing predictive, descriptive, and analytical models from

More information

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT

Copyr i g ht 2012, SAS Ins titut e Inc. All rights res er ve d. ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ENTERPRISE MINER: ANALYTICAL MODEL DEVELOPMENT ANALYTICAL MODEL DEVELOPMENT AGENDA Enterprise Miner: Analytical Model Development The session looks at: - Supervised and Unsupervised Modelling - Classification

More information

Logistic Regression and Decision Trees

Logistic Regression and Decision Trees Logistic Regression and Decision Trees Reminder: Regression We want to find a hypothesis that explains the behavior of a continuous y y = B0 + B1x1 + + Bpxp+ ε Source Regression for binary outcomes Regression

More information

WE consider the general ranking problem, where a computer

WE consider the general ranking problem, where a computer 5140 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 11, NOVEMBER 2008 Statistical Analysis of Bayes Optimal Subset Ranking David Cossock and Tong Zhang Abstract The ranking problem has become increasingly

More information

CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow

CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow haoyeec@stanford.edu ac1408@stanford.edu Problem Statement It is often said that stock prices are determined

More information

Automatic Ranking by Extended Binary Classification

Automatic Ranking by Extended Binary Classification Automatic Ranking by Extended Binary Classification Hsuan-Tien Lin Joint work with Ling Li (ALT 06, NIPS 06) Learning Systems Group, California Institute of Technology Talk at Institute of Information

More information

Comparative Study of Gaussian and Z- shaped curve membership function for Fuzzy Classification Approach

Comparative Study of Gaussian and Z- shaped curve membership function for Fuzzy Classification Approach Comparative Study of Gaussian and Z- shaped curve membership function for Fuzzy Classification Approach Pooja Agarwal 1*, Arti Arya 2, Suryaprasad J. 3 1* Dept. of CSE, PESIT Bangalore South Campus, Bangalore,

More information

Indicator Development for Forest Governance. IFRI Presentation

Indicator Development for Forest Governance. IFRI Presentation Indicator Development for Forest Governance IFRI Presentation Plan for workshop: 4 parts Creating useful and reliable indicators IFRI research and data Results of analysis Monitoring to improve interventions

More information

Decision Support Systems

Decision Support Systems Decision Support Systems 50 (2011) 602 613 Contents lists available at ScienceDirect Decision Support Systems journal homepage: www.elsevier.com/locate/dss Data mining for credit card fraud: A comparative

More information

Rank hotels on Expedia.com to maximize purchases

Rank hotels on Expedia.com to maximize purchases Rank hotels on Expedia.com to maximize purchases Nishith Khantal, Valentina Kroshilina, Deepak Maini December 14, 2013 1 Introduction For an online travel agency (OTA), matching users to hotel inventory

More information

Model Selection, Evaluation, Diagnosis

Model Selection, Evaluation, Diagnosis Model Selection, Evaluation, Diagnosis INFO-4604, Applied Machine Learning University of Colorado Boulder October 31 November 2, 2017 Prof. Michael Paul Today How do you estimate how well your classifier

More information

Sales prediction in online banking

Sales prediction in online banking Sales prediction in online banking Mathias Iden Erlend S. Stenberg Master of Science in Computer Science Submission date: June 2018 Supervisor: Jon Atle Gulla, IDI Norwegian University of Science and Technology

More information

Metaheuristics. Approximate. Metaheuristics used for. Math programming LP, IP, NLP, DP. Heuristics

Metaheuristics. Approximate. Metaheuristics used for. Math programming LP, IP, NLP, DP. Heuristics Metaheuristics Meta Greek word for upper level methods Heuristics Greek word heuriskein art of discovering new strategies to solve problems. Exact and Approximate methods Exact Math programming LP, IP,

More information

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest 1. Introduction Reddit is a social media website where users submit content to a public forum, and other

More information

ADVANCED DATA ANALYTICS

ADVANCED DATA ANALYTICS ADVANCED DATA ANALYTICS MBB essay by Marcel Suszka 17 AUGUSTUS 2018 PROJECTSONE De Corridor 12L 3621 ZB Breukelen MBB Essay Advanced Data Analytics Outline This essay is about a statistical research for

More information

Methodological challenges of Big Data for official statistics

Methodological challenges of Big Data for official statistics Methodological challenges of Big Data for official statistics Piet Daas Statistics Netherlands THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Content Big Data: properties

More information

Application research on traffic modal choice based on decision tree algorithm Zhenghong Peng 1,a, Xin Luan 2,b

Application research on traffic modal choice based on decision tree algorithm Zhenghong Peng 1,a, Xin Luan 2,b Applied Mechanics and Materials Online: 011-09-08 ISSN: 166-748, Vols. 97-98, pp 843-848 doi:10.408/www.scientific.net/amm.97-98.843 011 Trans Tech Publications, Switzerland Application research on traffic

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at Caltech CS/IST Lunch Bunch March 4, 2008 Benefited from joint work with Dr.

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at Dept. of CSIE, National Taiwan University March 21, 2008 Benefited from joint

More information

BUSINESS DATA MINING (IDS 572) Please include the names of all team-members in your write up and in the name of the file.

BUSINESS DATA MINING (IDS 572) Please include the names of all team-members in your write up and in the name of the file. BUSINESS DATA MINING (IDS 572) HOMEWORK 4 DUE DATE: TUESDAY, APRIL 10 AT 3:20 PM Please provide succinct answers to the questions below. You should submit an electronic pdf or word file in blackboard.

More information

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara Customer Relationship Management in marketing programs: A machine learning approach for decision Fernanda Alcantara F.Alcantara@cs.ucl.ac.uk CRM Goal Support the decision taking Personalize the best individual

More information

New restaurants fail at a surprisingly

New restaurants fail at a surprisingly Predicting New Restaurant Success and Rating with Yelp Aileen Wang, William Zeng, Jessica Zhang Stanford University aileen15@stanford.edu, wizeng@stanford.edu, jzhang4@stanford.edu December 16, 2016 Abstract

More information

Insights from the Wikipedia Contest

Insights from the Wikipedia Contest Insights from the Wikipedia Contest Kalpit V Desai, Roopesh Ranjan Abstract The Wikimedia Foundation has recently observed that newly joining editors on Wikipedia are increasingly failing to integrate

More information

Text Categorization. Hongning Wang

Text Categorization. Hongning Wang Text Categorization Hongning Wang CS@UVa Today s lecture Bayes decision theory Supervised text categorization General steps for text categorization Feature selection methods Evaluation metrics CS@UVa CS

More information

Text Categorization. Hongning Wang

Text Categorization. Hongning Wang Text Categorization Hongning Wang CS@UVa Today s lecture Bayes decision theory Supervised text categorization General steps for text categorization Feature selection methods Evaluation metrics CS@UVa CS

More information

Accelerating battery development via early prediction of cell lifetime

Accelerating battery development via early prediction of cell lifetime Accelerating battery development via early prediction of cell lifetime Peter Attia, Marc Deetjen, Jeremy Witmer Stanford University Abstract Early prediction of battery lifetime would aid in their development,

More information

Utilizing Optimization Techniques to Enhance Cost and Schedule Risk Analysis

Utilizing Optimization Techniques to Enhance Cost and Schedule Risk Analysis 1 Utilizing Optimization Techniques to Enhance Cost and Schedule Risk Analysis Colin Smith, Brandon Herzog SCEA 2012 2 Table of Contents Introduction to Optimization Optimization and Uncertainty Analysis

More information

BOOSTED TREES FOR ECOLOGICAL MODELING AND PREDICTION

BOOSTED TREES FOR ECOLOGICAL MODELING AND PREDICTION Ecology, 88(1), 2007, pp. 243 251 Ó 2007 by the Ecological Society of America BOOSTED TREES FOR ECOLOGICAL MODELING AND PREDICTION GLENN DE ATH 1 Australian Institute of Marine Science, PMB 3, Townsville

More information

Machine learning in neuroscience

Machine learning in neuroscience Machine learning in neuroscience Bojan Mihaljevic, Luis Rodriguez-Lujan Computational Intelligence Group School of Computer Science, Technical University of Madrid 2015 IEEE Iberian Student Branch Congress

More information

Predicting Restaurants Rating And Popularity Based On Yelp Dataset

Predicting Restaurants Rating And Popularity Based On Yelp Dataset CS 229 MACHINE LEARNING FINAL PROJECT 1 Predicting Restaurants Rating And Popularity Based On Yelp Dataset Yiwen Guo, ICME, Anran Lu, ICME, and Zeyu Wang, Department of Economics, Stanford University Abstract

More information

Active Learning for Logistic Regression

Active Learning for Logistic Regression Active Learning for Logistic Regression Andrew I. Schein The University of Pennsylvania Department of Computer and Information Science Philadelphia, PA 19104-6389 USA ais@cis.upenn.edu April 21, 2005 What

More information

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy AGENDA 1. Introduction 2. Use Cases 3. Popular Algorithms 4. Typical Approach 5. Case Study 2016 SAPIENT GLOBAL MARKETS

More information

ECONOMIC MODELLING & MACHINE LEARNING

ECONOMIC MODELLING & MACHINE LEARNING ECONOMIC MODELLING & MACHINE LEARNING A PROOF OF CONCEPT NICOLAS WOLOSZKO, OECD TECHNOLOGY POLICY INSTITUTE FEB 22 2017 Economic forecasting with Adaptive Trees 1 2 Motivation Adaptive Trees 3 Proof of

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at CS Department, National Tsing-Hua University March 27, 2008 Benefited from

More information

Detecting gene gene interactions that underlie human diseases

Detecting gene gene interactions that underlie human diseases Genome-wide association studies Detecting gene gene interactions that underlie human diseases Heather J. Cordell Abstract Following the identification of several disease-associated polymorphisms by genome-wide

More information

Detecting gene gene interactions that underlie human diseases

Detecting gene gene interactions that underlie human diseases REVIEWS Genome-wide association studies Detecting gene gene interactions that underlie human diseases Heather J. Cordell Abstract Following the identification of several disease-associated polymorphisms

More information

Predicting ratings of peer-generated content with personalized metrics

Predicting ratings of peer-generated content with personalized metrics Predicting ratings of peer-generated content with personalized metrics Project report Tyler Casey tyler.casey09@gmail.com Marius Lazer mlazer@stanford.edu [Group #40] Ashish Mathew amathew9@stanford.edu

More information

In silico prediction of novel therapeutic targets using gene disease association data

In silico prediction of novel therapeutic targets using gene disease association data In silico prediction of novel therapeutic targets using gene disease association data, PhD, Associate GSK Fellow Scientific Leader, Computational Biology and Stats, Target Sciences GSK Big Data in Medicine

More information

Machine Learning Models for Sales Time Series Forecasting

Machine Learning Models for Sales Time Series Forecasting Article Machine Learning Models for Sales Time Series Forecasting Bohdan M. Pavlyshenko SoftServe, Inc., Ivan Franko National University of Lviv * Correspondence: bpavl@softserveinc.com, b.pavlyshenko@gmail.com

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at CS Department, National Chiao Tung University March 26, 2008 Benefited from

More information

Strength in numbers? Modelling the impact of businesses on each other

Strength in numbers? Modelling the impact of businesses on each other Strength in numbers? Modelling the impact of businesses on each other Amir Abbas Sadeghian amirabs@stanford.edu Hakan Inan inanh@stanford.edu Andres Nötzli noetzli@stanford.edu. INTRODUCTION In many cities,

More information

When to Book: Predicting Flight Pricing

When to Book: Predicting Flight Pricing When to Book: Predicting Flight Pricing Qiqi Ren Stanford University qiqiren@stanford.edu Abstract When is the best time to purchase a flight? Flight prices fluctuate constantly, so purchasing at different

More information

Enabling Foresight. Skills for Predictive Analytics

Enabling Foresight. Skills for Predictive Analytics Enabling Foresight Skills for Predictive Analytics Topic Outline INTRODUCTION Predictive Analytics Definition & Core Concepts Terms Used in Today s Analytics Environment Artificial Intelligence Deep Learning

More information

Prediction from Blog Data

Prediction from Blog Data Prediction from Blog Data Aditya Parameswaran Eldar Sadikov Petros Venetis 1. INTRODUCTION We have approximately one year s worth of blog posts from [1] with over 12 million web blogs tracked. On average

More information

Predictive Analytics

Predictive Analytics Predictive Analytics Mani Janakiram, PhD Director, Supply Chain Intelligence & Analytics, Intel Corp. Adjunct Professor of Supply Chain, ASU October 2017 "Prediction is very difficult, especially if it's

More information

Advances in Machine Learning for Credit Card Fraud Detection

Advances in Machine Learning for Credit Card Fraud Detection Advances in Machine Learning for Credit Card Fraud Detection May 14, 2014 Alejandro Correa Bahnsen Introduction Europe fraud evolution Internet transactions (millions of euros) 800 700 600 500 2007 2008

More information

Dynamic Advisor-Based Ensemble (dynabe): Case Study in Stock Trend Prediction of Critical Metal Companies

Dynamic Advisor-Based Ensemble (dynabe): Case Study in Stock Trend Prediction of Critical Metal Companies Dynamic Advisor-Based Ensemble (dynabe): Case Study in Stock Trend Prediction of Critical Metal Companies Zhengyang Dong Middlesex School, Concord, MA, USA Corresponding Author: Zhengyang Dong Middlesex

More information

Study on Customer Loyalty Prediction Based on RF Algorithm

Study on Customer Loyalty Prediction Based on RF Algorithm 2134 JOURNAL OF COMPUTERS, VOL. 8, NO. 8, AUGUST 213 Study on Customer Loyalty Prediction Based on RF Algorithm Jiong Mu Email: scmjmj@yahoo.com.cn Lijia Xu* Email: lijiaxu13@163.com, Xuliang Duan and

More information

Project Category: Life Sciences SUNet ID: bentyeh

Project Category: Life Sciences SUNet ID: bentyeh Project Category: Life Sciences SUNet ID: bentyeh Predicting Protein Interactions of Intrinsically Disordered Protein Regions Benjamin Yeh, Department of Computer Science, Stanford University CS 229 Project,

More information

Big Data. Methodological issues in using Big Data for Official Statistics

Big Data. Methodological issues in using Big Data for Official Statistics Giulio Barcaroli Istat (barcarol@istat.it) Big Data Effective Processing and Analysis of Very Large and Unstructured data for Official Statistics. Methodological issues in using Big Data for Official Statistics

More information

Accurate Campaign Targeting Using Classification Algorithms

Accurate Campaign Targeting Using Classification Algorithms Accurate Campaign Targeting Using Classification Algorithms Jieming Wei Sharon Zhang Introduction Many organizations prospect for loyal supporters and donors by sending direct mail appeals. This is an

More information

Determining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for

Determining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for Determining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for human health in the past two centuries. Adding chlorine

More information

E-Commerce Sales Prediction Using Listing Keywords

E-Commerce Sales Prediction Using Listing Keywords E-Commerce Sales Prediction Using Listing Keywords Stephanie Chen (asksteph@stanford.edu) 1 Introduction Small online retailers usually set themselves apart from brick and mortar stores, traditional brand

More information

Three lessons learned from building a production machine learning system. Michael Manapat

Three lessons learned from building a production machine learning system. Michael Manapat Three lessons learned from building a production machine learning system Michael Manapat Stripe @mlmanapat Fraud Card numbers are stolen by hacking, malware, etc. Dumps are sold in carding forums Fraudsters

More information

Stock Price Prediction with Daily News

Stock Price Prediction with Daily News Stock Price Prediction with Daily News GU Jinshan MA Mingyu Derek MA Zhenyuan ZHOU Huakang 14110914D 14110562D 14111439D 15050698D 1 Contents 1. Work flow of the prediction tool 2. Model performance evaluation

More information

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report Predicting prokaryotic incubation times from genomic features Maeva Fincker - mfincker@stanford.edu Final report Introduction We have barely scratched the surface when it comes to microbial diversity.

More information

C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING. CAS Annual Meeting November 7-10

C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING. CAS Annual Meeting November 7-10 1 C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING CAS Annual Meeting November 7-10 GLM Process 2 Data Prep Model Form Validation Reduction Simplification Interactions GLM Process 3 Opportunities

More information

FINAL PROJECT REPORT IME672. Group Number 6

FINAL PROJECT REPORT IME672. Group Number 6 FINAL PROJECT REPORT IME672 Group Number 6 Ayushya Agarwal 14168 Rishabh Vaish 14553 Rohit Bansal 14564 Abhinav Sharma 14015 Dil Bag Singh 14222 Introduction Cell2Cell, The Churn Game. The cellular telephone

More information

Acknowledgments. Data Mining. Examples. Overview. Colleagues

Acknowledgments. Data Mining. Examples. Overview. Colleagues Data Mining A regression modeler s view on where it is and where it s likely to go. Bob Stine Department of Statistics The School of the Univ of Pennsylvania March 30, 2006 Acknowledgments Colleagues Dean

More information

Chapter 3 Top Scoring Pair Decision Tree for Gene Expression Data Analysis

Chapter 3 Top Scoring Pair Decision Tree for Gene Expression Data Analysis Chapter 3 Top Scoring Pair Decision Tree for Gene Expression Data Analysis Marcin Czajkowski and Marek Krȩtowski Abstract Classification problems of microarray data may be successfully performed with approaches

More information

Predicting Corporate Influence Cascades In Health Care Communities

Predicting Corporate Influence Cascades In Health Care Communities Predicting Corporate Influence Cascades In Health Care Communities Shouzhong Shi, Chaudary Zeeshan Arif, Sarah Tran December 11, 2015 Part A Introduction The standard model of drug prescription choice

More information

Application of Decision Trees in Mining High-Value Credit Card Customers

Application of Decision Trees in Mining High-Value Credit Card Customers Application of Decision Trees in Mining High-Value Credit Card Customers Jian Wang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 8, P.R. China E-mail: gregret24@gmail.com,

More information

CSE 255 Lecture 3. Data Mining and Predictive Analytics. Supervised learning Classification

CSE 255 Lecture 3. Data Mining and Predictive Analytics. Supervised learning Classification CSE 255 Lecture 3 Data Mining and Predictive Analytics Supervised learning Classification Last week Last week we started looking at supervised learning problems Last week We studied linear regression,

More information

Intro Logistic Regression Gradient Descent + SGD

Intro Logistic Regression Gradient Descent + SGD Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade March 29, 2016 1 Ad Placement

More information

Credit Card Marketing Classification Trees

Credit Card Marketing Classification Trees Credit Card Marketing Classification Trees From Building Better Models With JMP Pro, Chapter 6, SAS Press (2015). Grayson, Gardner and Stephens. Used with permission. For additional information, see community.jmp.com/docs/doc-7562.

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN Predictive Modeling using SAS Enterprise Miner and SAS/STAT : Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN 1 Overview This presentation will: Provide a brief introduction of how to set

More information

Advanced Analytics through the credit cycle

Advanced Analytics through the credit cycle Advanced Analytics through the credit cycle Alejandro Correa B. Andrés Gonzalez M. Introduction PRE ORIGINATION Credit Cycle POST ORIGINATION ORIGINATION Pre-Origination Propensity Models What is it?

More information

Airbnb Price Estimation. Hoormazd Rezaei SUNet ID: hoormazd. Project Category: General Machine Learning gitlab.com/hoorir/cs229-project.

Airbnb Price Estimation. Hoormazd Rezaei SUNet ID: hoormazd. Project Category: General Machine Learning gitlab.com/hoorir/cs229-project. Airbnb Price Estimation Liubov Nikolenko SUNet ID: liubov Hoormazd Rezaei SUNet ID: hoormazd Pouya Rezazadeh SUNet ID: pouyar Project Category: General Machine Learning gitlab.com/hoorir/cs229-project.git

More information

Watts App: An Energy Analytics and Demand-Response Advisor Tool

Watts App: An Energy Analytics and Demand-Response Advisor Tool Watts App: An Energy Analytics and Demand-Response Advisor Tool Santiago Gonzalez, Case Western Reserve University, Electrical Engineering, SUNFEST Fellow Dr. Rahul Mangharam, Electrical and Systems Engineering

More information

Gene Expression Classification: Decision Trees vs. SVMs

Gene Expression Classification: Decision Trees vs. SVMs Gene Expression Classification: Decision Trees vs. SVMs Xiaojing Yuan and Xiaohui Yuan and Fan Yang and Jing Peng and Bill P. Buckles Electrical Engineering and Computer Science Department Tulane University

More information