Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015


http://ce.sharif.edu/courses/93-94/2/ce717-1/

Agenda
- Probabilistic classification
- Introduction to logistic regression
- Binary logistic regression
- Logistic regression: Decision surface
- Logistic regression: ML estimation
- Logistic regression: Gradient descent
- Logistic regression: Multi-class
- Logistic regression: Regularization
- Logistic regression vs. Naïve Bayes

Probabilistic Classification
Generative probabilistic classification (previous lecture)
- motivation: assume a distribution for each class and find the parameters of those distributions
- cons: need to assume distributions; need to fit many parameters
Discriminative approach: logistic regression (focus of today)
- motivation: like least squares, but assume a logistic form $y(x) = \sigma(w^T x)$; classify according to whether $y(x) > 0.5$
- technique: gradient descent

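Introduction to Logistic regression
Logistic regression represents the probability of category i using a linear function of the input variables, $z_i = w_{i0} + \sum_{j=1}^{n} w_{ij} x_j$, which is then mapped to a probability.
The name comes from the logit transformation, which in the two-class case makes the log-odds linear in the inputs:
$\ln \frac{P(Y=1 \mid x)}{P(Y=0 \mid x)} = w_0 + \sum_{j=1}^{n} w_j x_j$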

Binary logistic regression
Logistic Regression assumes a parametric form for the distribution P(Y|X), then directly estimates its parameters from the training data. The parametric model assumed by Logistic Regression in the case where Y is boolean is:
$P(Y=1 \mid X) = \frac{\exp(w_0 + \sum_{i=1}^{n} w_i X_i)}{1 + \exp(w_0 + \sum_{i=1}^{n} w_i X_i)}$   (1)
$P(Y=0 \mid X) = \frac{1}{1 + \exp(w_0 + \sum_{i=1}^{n} w_i X_i)}$   (2)
Notice that equation (2) follows directly from equation (1), because the sum of these two probabilities must equal 1.

Binary logistic regression
We only need one set of parameters: $P(Y=1 \mid x, w) = \sigma(w_0 + \sum_{i=1}^{n} w_i x_i)$
Sigmoid (logistic) function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
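As a minimal sketch of the sigmoid and the single-parameter-set model, the Python below evaluates P(Y=1|x,w). The function names `sigmoid` and `prob_y1` and the split into a bias `w0` and a weight list `w` are illustrative assumptions, not part of the slides.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # z < 0, so exp(z) is at most 1
    return ez / (1.0 + ez)

def prob_y1(w0, w, x):
    """P(Y=1 | x, w) for a bias w0, weight list w, and feature list x."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)
```

Because P(Y=0|x,w) is just 1 - prob_y1(w0, w, x), no second set of parameters is needed.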

Logistic regression vs. Linear regression
[Figure: logistic regression compared with linear regression; adapted from slides of John Whitehead]

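Logistic regression: Decision surface
Given a logistic regression model with weights w and an input x:
- Decision surface: f(x; w) = constant
- Decision surfaces are linear functions of x
- Decision making on Y: predict Y = 1 if $P(Y=1 \mid x, w) > 0.5$, which is equivalent to $w_0 + \sum_{i=1}^{n} w_i x_i > 0$; otherwise predict Y = 0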
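The decision rule can be sketched in a few lines of Python, assuming the usual 0.5 threshold on P(Y=1|x,w). Since the sigmoid exceeds 0.5 exactly when its argument is positive, only the sign of the linear score is needed. The name `predict` and the `w0`/`w` split are hypothetical.

```python
def predict(w0, w, x):
    """Return 1 if w0 + w.x > 0 (equivalently P(Y=1|x,w) > 0.5), else 0."""
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 0
```

With w0 = -1 and w = [1, 1], the boundary is the line x1 + x2 = 1, which illustrates why the decision surface is linear in x.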

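Computing the likelihood in detail
We can re-express the log of the conditional likelihood as (the superscript l indexes training examples):
$l(w) = \sum_l y^l \ln P(y^l = 1 \mid x^l, w) + (1 - y^l) \ln P(y^l = 0 \mid x^l, w)$
$\quad = \sum_l y^l \ln \frac{P(y^l = 1 \mid x^l, w)}{P(y^l = 0 \mid x^l, w)} + \ln P(y^l = 0 \mid x^l, w)$
$\quad = \sum_l y^l \left( w_0 + \sum_{i=1}^{n} w_i x_i^l \right) - \ln\left( 1 + \exp\left( w_0 + \sum_{i=1}^{n} w_i x_i^l \right) \right)$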
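The final form of the conditional log-likelihood, a sum over examples of y(w0 + w.x) - ln(1 + exp(w0 + w.x)), can be evaluated directly. The sketch below assumes labels in {0, 1}; the helper `log1p_exp` and the argument layout are illustrative choices.

```python
import math

def log1p_exp(z):
    """ln(1 + exp(z)), rearranged so that exp never overflows."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_likelihood(w0, w, X, y):
    """Conditional log-likelihood l(w) of labels y (0 or 1) given rows of X."""
    total = 0.0
    for x_l, y_l in zip(X, y):
        z = w0 + sum(wi * xi for wi, xi in zip(w, x_l))
        total += y_l * z - log1p_exp(z)
    return total
```

With all weights at zero every example has probability 0.5, so each contributes -ln 2 to the total.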

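Logistic regression: ML estimation
$l(w)$ is concave in $w$.
What is a concave and a convex function? A function f is convex if $f(\lambda a + (1-\lambda) b) \le \lambda f(a) + (1-\lambda) f(b)$ for all a, b and all $\lambda \in [0, 1]$; f is concave if $-f$ is convex.
There is no closed-form solution for the maximizing w.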

Optimizing concave/convex function
Maximizing a concave function f is equivalent to minimizing the convex function $-f$.
Gradient ascent (concave) / Gradient descent (convex)

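Gradient ascent / Gradient descent
For a function f(w) and a step size $\eta > 0$:
If f is concave, gradient ascent rule: $w \leftarrow w + \eta \nabla_w f(w)$
If f is convex, gradient descent rule: $w \leftarrow w - \eta \nabla_w f(w)$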
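To make the two rules concrete, here is a small one-dimensional sketch: ascent steps along the gradient of a concave function, descent steps against the gradient of a convex one. The test function -(w - 3)^2, the step size `eta`, and the fixed step count are all assumptions chosen for illustration.

```python
def gradient_ascent(grad, w, eta=0.1, steps=100):
    """Repeatedly apply w <- w + eta * grad(w); suited to a concave objective."""
    for _ in range(steps):
        w = w + eta * grad(w)
    return w

def gradient_descent(grad, w, eta=0.1, steps=100):
    """Repeatedly apply w <- w - eta * grad(w); suited to a convex objective."""
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Concave f(w) = -(w - 3)^2 has gradient -2 (w - 3); its maximum is at w = 3.
w_max = gradient_ascent(lambda w: -2.0 * (w - 3.0), 0.0)
# Convex g(w) = (w - 3)^2 = -f(w) has gradient 2 (w - 3); its minimum is also at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), 0.0)
```

Both runs approach the same point, which reflects that maximizing f and minimizing -f are the same problem.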

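Logistic regression: Gradient descent
The weights are updated along the gradient of the conditional log-likelihood, with step size $\eta$:
$w_i \leftarrow w_i + \eta \sum_l x_i^l \left( y^l - \hat{P}(y^l = 1 \mid x^l, w) \right)$
Iteratively updating the weights in this fashion increases the likelihood each round, provided the step size is small enough. Because $l(w)$ is concave, we eventually reach the maximum. We are near the maximum when changes in the weights are small. Thus, we can stop when the sum of the absolute values of the weight differences is less than some small number.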
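The iterative procedure and its stopping rule can be sketched as a batch update loop. This is an illustration under stated assumptions, not the course's reference implementation: the step size `eta`, the tolerance `tol`, the iteration cap, and the tiny non-separable data set are all hypothetical, and each update moves every weight by eta times the sum over examples of x_i (y - P(y=1|x,w)).

```python
import math

def sigmoid(z):
    """Logistic function, written so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train(X, y, eta=0.1, tol=1e-6, max_iters=10000):
    """Batch gradient ascent on the conditional log-likelihood."""
    n = len(X[0])
    w0, w = 0.0, [0.0] * n
    for _ in range(max_iters):
        # Prediction error y - P(y=1|x,w) for every training example.
        errors = [y_l - sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x_l)))
                  for x_l, y_l in zip(X, y)]
        step0 = eta * sum(errors)
        steps = [eta * sum(e * x_l[i] for e, x_l in zip(errors, X))
                 for i in range(n)]
        w0 += step0
        w = [wi + s for wi, s in zip(w, steps)]
        # Stop when the summed absolute weight change is small.
        if abs(step0) + sum(abs(s) for s in steps) < tol:
            break
    return w0, w

# Tiny data set that is not linearly separable, so a finite maximum exists.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 0, 1]
w0, w = train(X, y)
```

On linearly separable data the unregularized weights grow without bound, so the stopping test may only trigger through the iteration cap; that is one motivation for regularization.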

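Logistic regression: multi-class
In the two-class case: $P(Y=1 \mid x) = \sigma(w^T x)$
For multiclass, we work with the soft-max function instead of the logistic sigmoid, with one weight vector $w_k$ per class:
$P(Y=k \mid x) = \frac{\exp(w_k^T x)}{\sum_j \exp(w_j^T x)}$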
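A minimal sketch of the soft-max function follows, assuming one linear score per class has already been computed. Subtracting the maximum score before exponentiating is a standard numerical safeguard that leaves the result unchanged; the function name is illustrative.

```python
import math

def softmax(scores):
    """Map a list of class scores to probabilities that sum to 1."""
    m = max(scores)                       # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With two classes and scores [z, 0], the first output equals the logistic sigmoid of z, so the binary model is a special case.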

Logistic Regression: Regularization
Overfitting the training data is a problem that can arise in Logistic Regression, especially when the data is very high-dimensional and sparse. One approach to reducing overfitting is regularization, in which we create a modified, penalized log-likelihood function that penalizes large values of w:
$w = \arg\max_w \sum_l \ln P(y^l \mid x^l, w) - \frac{\lambda}{2} \|w\|^2$
The derivative of this penalized log-likelihood function is similar to our earlier derivative, with one additional penalty term:
$\frac{\partial l(w)}{\partial w_i} = \sum_l x_i^l \left( y^l - \hat{P}(y^l = 1 \mid x^l, w) \right) - \lambda w_i$
which gives us the modified gradient update rule:
$w_i \leftarrow w_i + \eta \sum_l x_i^l \left( y^l - \hat{P}(y^l = 1 \mid x^l, w) \right) - \eta \lambda w_i$
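One step of the penalized update might look like the sketch below. The names `eta` (step size) and `lam` (penalty strength) are assumptions, and for brevity the bias is folded into `w` through a constant feature and penalized along with the other weights, which is a simplification rather than something the slides specify.

```python
import math

def sigmoid(z):
    """Logistic function, written so exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def regularized_step(w, X, y, eta, lam):
    """One update w_i <- w_i + eta * sum_l x_i^l (y^l - p^l) - eta * lam * w_i."""
    errors = [y_l - sigmoid(sum(wi * xi for wi, xi in zip(w, x_l)))
              for x_l, y_l in zip(X, y)]
    return [wi + eta * sum(e * x_l[i] for e, x_l in zip(errors, X)) - eta * lam * wi
            for i, wi in enumerate(w)]
```

Setting `lam` to 0 recovers the unpenalized update; a larger `lam` shrinks every weight toward zero at each step.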

Logistic Regression vs. Naïve Bayes
In general, NB and LR make different assumptions:
- NB: features are independent given the class -> an assumption on P(X|Y)
- LR: assumes a functional form for P(Y|X), with no assumption on P(X|Y)
LR is a linear classifier:
- the decision rule is a hyperplane
LR is optimized by maximizing the conditional likelihood:
- no closed-form solution
- concave -> global optimum with gradient ascent

Logistic Regression vs. Naïve Bayes
Consider Y and Xi boolean, X = <X1, ..., Xn>.
Number of parameters:
- NB: 2n + 1
- LR: n + 1
Estimation method:
- NB parameter estimates are uncoupled
- LR parameter estimates are coupled

Logistic Regression vs. Gaussian Naïve Bayes
When the GNB modeling assumptions do not hold, Logistic Regression and GNB typically learn different classifier functions.
While Logistic Regression is consistent with the Naïve Bayes assumption that the input features Xi are conditionally independent given Y, it is not rigidly tied to this assumption as Naïve Bayes is.
GNB parameter estimates converge toward their asymptotic values in order log(n) examples, where n is the dimension of X. Logistic Regression parameter estimates converge more slowly, requiring order n examples.

Summary
Logistic Regression learns the conditional probability distribution P(y|x).
Local search: it begins with an initial weight vector and modifies it iteratively to maximize an objective function.
The objective function is the conditional log-likelihood of the data, so the algorithm seeks the probability distribution P(y|x) that makes the observed data most likely.

Any questions?
End of Lecture 9
Thank you!