Data Mining. Chapter 7: Score Functions for Data Mining Algorithms. Fall 2011, Ming Li

1 Data Mining, Chapter 7: Score Functions for Data Mining Algorithms. Fall 2011. Ming Li, Department of Computer Science and Technology, Nanjing University

2 The merit of score function
A score function indicates what counts as a good instantiation of a model structure. Ultimately, a score function should be able to rank models (of different complexity or with different parameters) according to how useful each model is to the data miner. In practice, simple generic score functions that have well-understood properties and are relatively easy to work with are usually used to derive particular score functions for different applications.

3 Score functions for predictive models (I)
Notation: a training set of instances with observed values, and the current model to be evaluated. 0-1 loss (misclassification loss): directly measures classification error. It can hardly be applied to regression.
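As a minimal sketch, the 0-1 loss can be computed as the fraction of misclassified instances. The function name `zero_one_loss` and the choice to average over instances rather than sum are assumptions made here for illustration, not notation from the slides.

```python
def zero_one_loss(y_true, y_pred):
    """Fraction of instances whose predicted label differs from the observed label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)
```

Because the loss only checks whether two labels are equal, it says nothing about how far off a real-valued prediction is, which is why it is a poor fit for regression.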

4 Score functions for predictive models (II)
Least squares loss: the output value should be as close to the observed value as possible. Can be applied to both regression and classification problems.
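A minimal sketch of the least squares loss, assuming the usual squared-difference form averaged over the training set. The name `squared_loss` and the averaging are choices made here for illustration.

```python
def squared_loss(y_true, y_pred):
    """Mean squared difference between observed values and model outputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

For classification, the same function can be applied to numeric label encodings such as -1 and +1.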

5 Score functions for predictive models (III)
Hinge loss: enforces separation of the data with a margin, using the model as the boundary. The output value should have the same sign as the observed value. The data should lie on the correct side of the boundary and be at least a certain distance away from it. Can be applied to classification problems.
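A minimal sketch of the hinge loss for a single instance, assuming labels encoded as -1 or +1 and a margin of 1 (the standard form; the slide's own formula did not survive transcription). The name `hinge_loss` is illustrative.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a real-valued model output."""
    if y not in (-1, 1):
        raise ValueError("label must be -1 or +1")
    # Zero only when the output has the correct sign AND clears the margin of 1.
    return max(0.0, 1.0 - y * score)
```

An instance on the correct side but inside the margin (for example `y = 1`, `score = 0.5`) still incurs a loss, which is what pushes the boundary away from the data.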

6 Score functions for predictive models (IV)
ε-insensitive loss: requires the difference between the output value and the observed value to be less than ε. Can be applied to both classification and regression.
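A minimal sketch of the ε-insensitive loss for one instance, assuming the common linear form in which deviations up to ε cost nothing and larger deviations are charged for the excess. The name `eps_insensitive_loss` and the default `eps` value are illustrative assumptions.

```python
def eps_insensitive_loss(y, output, eps=0.1):
    """Zero inside the tube |y - output| <= eps, linear in the excess outside it."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return max(0.0, abs(y - output) - eps)
```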

7 Remarks on score functions for predictive models
The score functions above make a strong assumption: every instance and every observed label is equally important to the task. Input space: they can hardly emphasize hard or important instances. Output space: they can hardly assign different costs to different kinds of incorrect prediction. Different score functions may act as different upper (lower) bounds on the true score function, so minimizing (maximizing) them can implicitly optimize the true score function.
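The bounding remark can be illustrated with one concrete pair: the hinge loss is an upper bound on the 0-1 loss. The sketch below checks this numerically on a grid of outputs, assuming labels in {-1, +1} and counting an output of exactly 0 as a mistake. All names are illustrative.

```python
def zero_one(y, score):
    """0-1 loss for a label y in {-1, +1} and a real-valued model output."""
    return 1.0 if y * score <= 0 else 0.0

def hinge(y, score):
    """Hinge loss with margin 1."""
    return max(0.0, 1.0 - y * score)

# Model outputs from -3.0 to 3.0 in steps of 0.1.
scores = [s / 10.0 for s in range(-30, 31)]
bound_holds = all(hinge(y, s) >= zero_one(y, s) for y in (-1, 1) for s in scores)
```

Since the hinge loss never falls below the 0-1 loss, driving the hinge loss down also limits the misclassification error.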

8 Score functions for descriptive models (I)
Probabilistic models: better models assign higher probability to the observed data (likelihood). The negative log-likelihood can be regarded as an error term, i.e., the overhead needed to describe the data.
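A minimal sketch of the negative log-likelihood as a score, assuming the simplest possible probabilistic model: independent binary observations with a single Bernoulli parameter `p`. The model choice and the name `bernoulli_nll` are assumptions for illustration only.

```python
import math

def bernoulli_nll(data, p):
    """Negative log-likelihood of 0/1 observations under a Bernoulli(p) model."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)
```

A model that assigns higher probability to the observed data gets a lower score: for the data `[1, 1, 0]`, `p = 2/3` scores better than `p = 0.5`.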

9 Score functions for descriptive models (II)
Non-probabilistic descriptive models. Compactness in clustering: data within a cluster should be close to each other. Compression quality in dimensionality reduction: the recovered data should be similar to the original.
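Both ideas can be sketched with squared Euclidean distance, which is an assumption made here (the slides do not name a distance). `within_cluster_sse` measures compactness as the summed squared distance of points to their cluster centroid, and `reconstruction_error` measures how far recovered data is from the original. Both names are illustrative.

```python
def within_cluster_sse(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total

def reconstruction_error(original, recovered):
    """Mean squared distance between each original point and its recovered version."""
    if len(original) != len(recovered) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for x, r in zip(original, recovered) for a, b in zip(x, r))
    return squared / len(original)
```

Lower values are better in both cases: a clustering that groups nearby points scores lower than one that mixes distant points.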

10 Considering model complexity
Modeling, in other words: reduce the complexity of the data to something more comprehensible. High goodness of fit: the model can fit the training data reasonably well. Low model complexity: the model should not be too difficult to describe. The score function should reflect the difficulty of describing the current system:
S_I(θ, M) = (# bits to describe the data given the model) + (# bits to describe the model and its parameters)
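A minimal sketch of this two-part description length, assuming a Bernoulli model for binary data and a fixed, hypothetical cost of `bits_per_parameter` bits for storing the single parameter `p`. Both the model and the parameter cost are illustrative assumptions, not the slides' definition.

```python
import math

def description_length_bits(data, p, bits_per_parameter=32):
    """Bits to encode 0/1 data under Bernoulli(p), plus bits to encode p itself."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Data given the model: negative log-likelihood measured in bits.
    data_bits = -sum(math.log2(p if x == 1 else 1.0 - p) for x in data)
    # The model: one parameter at an assumed fixed precision.
    model_bits = bits_per_parameter
    return data_bits + model_bits
```

With `p = 0.5` each observation costs exactly one bit, so four observations and an 8-bit parameter give 12 bits in total.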

11 Why consider model complexity?
Model complexity ranges from simple models to complex models, and goodness of fit ranges from not fitting the data to fitting it perfectly. Underfitting: the model only catches the vague picture. Overfitting: the model catches every individual detail. Both lead to poor generalization. Improving generalization ability requires a compromise between low complexity and a good fit to the data: MSE = Variance + Bias².
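For reference, the decomposition behind MSE = Variance + Bias² can be written out for an estimate of a fixed target value. The symbols $\hat{f}$ (the estimate, random through the training data) and $f$ (the target) are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
\mathrm{MSE}(\hat{f})
  &= \mathbb{E}\big[(\hat{f} - f)^2\big] \\
  &= \mathbb{E}\big[(\hat{f} - \mathbb{E}[\hat{f}])^2\big]
     + \big(\mathbb{E}[\hat{f}] - f\big)^2 \\
  &= \mathrm{Variance}(\hat{f}) + \mathrm{Bias}(\hat{f})^2
\end{aligned}
```

The cross term vanishes because $\mathbb{E}\big[\hat{f} - \mathbb{E}[\hat{f}]\big] = 0$. Simple models tend to have a large bias term, complex models a large variance term.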

12 Penalize complexity
To achieve a better compromise, the score function should penalize both the error made by the model (for example through underfitting) and the complexity of the model:
Score(model) = error(model) + penalty(model)
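A minimal sketch of model selection with a penalized score, assuming the penalty is proportional to the number of parameters. The penalty form, the weight, and the candidate numbers in the usage note are all made up for illustration.

```python
def penalized_score(error, n_params, penalty_weight=1.0):
    """Score(model) = error(model) + penalty(model), with a linear penalty."""
    return error + penalty_weight * n_params

def select_model(candidates, penalty_weight=1.0):
    """Return the name of the candidate with the lowest penalized score.

    candidates maps a model name to a (training_error, n_params) pair.
    """
    def score(name):
        error, n_params = candidates[name]
        return penalized_score(error, n_params, penalty_weight)
    return min(candidates, key=score)
```

With hypothetical candidates `{"linear": (10.0, 2), "cubic": (4.0, 4), "degree-15": (3.5, 16)}`, a weight of 1.0 selects the cubic model, while a weight of 0 (no penalty) selects the most complex one.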

13 Score functions using external validation
Split the data into mutually exclusive parts. Design part: used for training. Validation part: used for evaluating the score function. Different types of splits: a random split at a fixed ratio, repeated multiple times; cross-validation, e.g., 10-fold cross-validation; leave-one-out. The validation set should be different from the test set used to evaluate the performance of the model, so that the performance estimate is unbiased.
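A minimal sketch of k-fold cross-validation, assuming contiguous folds without shuffling (in practice the data would usually be shuffled first). `fit` is a hypothetical callable that takes design inputs and targets and returns a prediction function. All names are illustrative.

```python
def k_fold_indices(n, k):
    """Yield (design_indices, validation_indices) for k mutually exclusive folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        design = list(range(0, start)) + list(range(start + size, n))
        yield design, validation
        start += size

def cross_val_score(xs, ys, fit, k):
    """Average squared validation error over k folds."""
    fold_errors = []
    for design, validation in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in design], [ys[i] for i in design])
        errors = [(ys[i] - predict(xs[i])) ** 2 for i in validation]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

def fit_mean(design_xs, design_ys):
    """Toy model for the usage example: always predict the mean design target."""
    mean = sum(design_ys) / len(design_ys)
    return lambda x: mean
```

Setting `k` equal to the number of instances gives leave-one-out. For example, `cross_val_score([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0], fit_mean, k=2)` evaluates the mean predictor on two folds of two instances each.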

14 Score functions for pattern structure
Two important issues in scoring patterns. Coverage: a pattern that appears frequently in the data set should be of more interest. Accuracy: the discovered patterns should provide accurate information. The score should describe the interestingness of the pattern.
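A minimal sketch of both quantities, assuming the pattern is a rule "antecedent implies consequent" over transactions represented as sets of items. Under that assumption, coverage is the fraction of transactions the antecedent applies to, and accuracy is the fraction of those in which the consequent also holds. The rule form and function names are illustrative.

```python
def rule_coverage(transactions, antecedent):
    """Fraction of transactions that contain every item of the antecedent."""
    if not transactions:
        raise ValueError("transactions must be non-empty")
    matched = sum(1 for t in transactions if antecedent <= t)
    return matched / len(transactions)

def rule_accuracy(transactions, antecedent, consequent):
    """Among transactions matching the antecedent, fraction also matching the consequent."""
    matched = [t for t in transactions if antecedent <= t]
    if not matched:
        raise ValueError("antecedent matches no transaction")
    return sum(1 for t in matched if consequent <= t) / len(matched)
```

For the transactions `[{"a", "b"}, {"a"}, {"b"}, {"a", "b", "c"}]`, the rule "a implies b" has coverage 3/4 and accuracy 2/3.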

15 Let's move to Chapter 8

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Machine learning models can be used to predict which recommended content users will click on a given website.

More information

Predicting Corporate Influence Cascades In Health Care Communities

Predicting Corporate Influence Cascades In Health Care Communities Predicting Corporate Influence Cascades In Health Care Communities Shouzhong Shi, Chaudary Zeeshan Arif, Sarah Tran December 11, 2015 Part A Introduction The standard model of drug prescription choice

More information

Introduction to Sample Surveys

Introduction to Sample Surveys Introduction to Sample Surveys Statistics 331 Kirk Wolter September 26, 2016 1 Outline A. What are sample surveys? B. Main steps in a sample survey C. Limitations/Errors in survey data September 26, 2016

More information

Intro Logistic Regression Gradient Descent + SGD

Intro Logistic Regression Gradient Descent + SGD Case Study 1: Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade March 29, 2016 1 Ad Placement

More information

Using Decision Tree to predict repeat customers

Using Decision Tree to predict repeat customers Using Decision Tree to predict repeat customers Jia En Nicholette Li Jing Rong Lim Abstract We focus on using feature engineering and decision trees to perform classification and feature selection on the

More information

CSE 255 Lecture 3. Data Mining and Predictive Analytics. Supervised learning Classification

CSE 255 Lecture 3. Data Mining and Predictive Analytics. Supervised learning Classification CSE 255 Lecture 3 Data Mining and Predictive Analytics Supervised learning Classification Last week Last week we started looking at supervised learning problems Last week We studied linear regression,

More information

Linear model to forecast sales from past data of Rossmann drug Store

Linear model to forecast sales from past data of Rossmann drug Store Abstract Linear model to forecast sales from past data of Rossmann drug Store Group id: G3 Recent years, the explosive growth in data results in the need to develop new tools to process data into knowledge

More information

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of

More information

A standardization approach to adjusting pretest item statistics. Shun-Wen Chang National Taiwan Normal University

A standardization approach to adjusting pretest item statistics. Shun-Wen Chang National Taiwan Normal University A standardization approach to adjusting pretest item statistics Shun-Wen Chang National Taiwan Normal University Bradley A. Hanson and Deborah J. Harris ACT, Inc. Paper presented at the annual meeting

More information

Methodology: Assessment and Cross-Validation

Methodology: Assessment and Cross-Validation Methodology: Assessment and Cross-Validation Dan Sheldon November 18, 2014 First story USPS uses a classifier to distinguish 4 from 9 Pays $1 for every mistake How much money should it budget for 2015?

More information

Data Mining Applications with R

Data Mining Applications with R Data Mining Applications with R Yanchang Zhao Senior Data Miner, RDataMining.com, Australia Associate Professor, Yonghua Cen Nanjing University of Science and Technology, China AMSTERDAM BOSTON HEIDELBERG

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

IMPROVING GROUNDWATER FLOW MODEL PREDICTION USING COMPLEMENTARY DATA-DRIVEN MODELS

IMPROVING GROUNDWATER FLOW MODEL PREDICTION USING COMPLEMENTARY DATA-DRIVEN MODELS XIX International Conference on Water Resources CMWR 2012 University of Illinois at Urbana-Champaign June 17-22,2012 IMPROVING GROUNDWATER FLOW MODEL PREDICTION USING COMPLEMENTARY DATA-DRIVEN MODELS Tianfang

More information

Competing Goals of Responsive Design in a Total Survey Error Framework: Minimization of Cost, Nonresponse Rates, Bias, and Variance

Competing Goals of Responsive Design in a Total Survey Error Framework: Minimization of Cost, Nonresponse Rates, Bias, and Variance Competing Goals of Responsive Design in a Total Survey Error Framework: Minimization of Cost, Nonresponse Rates, Bias, and Variance Andy Peytchev International Total Survey Error Workshop August 2, 2012

More information

Introduction to Data,Mining

Introduction to Data,Mining Introduction to Data,Mining Equivalent,Buzz,Words! Data Mining! Machine Learning! Analytics! Big Data (not in actually, but in 90% of uses)! Data Science! Business Intelligence (often means reporting)!

More information

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner SAS Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner Melodie Rush Principal

More information

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo Ensemble Modeling Toronto Data Mining Forum November 2017 Helen Ngo Agenda Introductions Why Ensemble Models? Simple & Complex ensembles Thoughts: Post-real-life Experimentation Downsides of Ensembles

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis Approaching Performance Projects CS 147: Computer Systems Performance Analysis Approaching Performance Projects 1 / 35 Overview Overview Overview Planning

More information

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy

Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy Applying Regression Techniques For Predictive Analytics Paviya George Chemparathy AGENDA 1. Introduction 2. Use Cases 3. Popular Algorithms 4. Typical Approach 5. Case Study 2016 SAPIENT GLOBAL MARKETS

More information

Support Vector Machines (SVMs) for the classification of microarray data. Basel Computational Biology Conference, March 2004 Guido Steiner

Support Vector Machines (SVMs) for the classification of microarray data. Basel Computational Biology Conference, March 2004 Guido Steiner Support Vector Machines (SVMs) for the classification of microarray data Basel Computational Biology Conference, March 2004 Guido Steiner Overview Classification problems in machine learning context Complications

More information

CSC-272 Exam #1 February 13, 2015

CSC-272 Exam #1 February 13, 2015 CSC-272 Exam #1 February 13, 2015 Name Questions are weighted as indicated. Show your work and state your assumptions for partial credit consideration. Unless explicitly stated, there are NO intended errors

More information

Credibility: Evaluating What s Been Learned

Credibility: Evaluating What s Been Learned Evaluation: the Key to Success Credibility: Evaluating What s Been Learned Chapter 5 of Data Mining How predictive is the model we learned? Accuracy on the training data is not a good indicator of performance

More information

Disentangling Prognostic and Predictive Biomarkers Through Mutual Information

Disentangling Prognostic and Predictive Biomarkers Through Mutual Information Informatics for Health: Connected Citizen-Led Wellness and Population Health R. Randell et al. (Eds.) 2017 European Federation for Medical Informatics (EFMI) and IOS Press. This article is published online

More information

GLMs the Good, the Bad, and the Ugly Ratemaking and Product Management Seminar March Christopher Cooksey, FCAS, MAAA EagleEye Analytics

GLMs the Good, the Bad, and the Ugly Ratemaking and Product Management Seminar March Christopher Cooksey, FCAS, MAAA EagleEye Analytics Antitrust Notice The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to

More information

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 02 Data Mining Process Welcome to the lecture 2 of

More information

Adaptive Time Series Forecasting of Energy Consumption using Optimized Cluster Analysis

Adaptive Time Series Forecasting of Energy Consumption using Optimized Cluster Analysis Adaptive Time Series Forecasting of Energy Consumption using Optimized Cluster Analysis Peter Laurinec, Marek Lóderer, Petra Vrablecová, Mária Lucká, Viera Rozinajová, Anna Bou Ezzeddine 12.12.2016 Slovak

More information

Tree Depth in a Forest

Tree Depth in a Forest Tree Depth in a Forest Mark Segal Center for Bioinformatics & Molecular Biostatistics Division of Bioinformatics Department of Epidemiology and Biostatistics UCSF NUS / IMS Workshop on Classification and

More information

Evolutionary Algorithms

Evolutionary Algorithms Evolutionary Algorithms Evolutionary Algorithms What is Evolutionary Algorithms (EAs)? Evolutionary algorithms are iterative and stochastic search methods that mimic the natural biological evolution and/or

More information

Testing 2. Testing: Agenda. for Systems Validation. Testing for Systems Validation CONCEPT HEIDELBERG

Testing 2. Testing: Agenda. for Systems Validation. Testing for Systems Validation CONCEPT HEIDELBERG CONCEPT HEIDELBERG GMP Compliance for January 16-17, 2003 at Istanbul, Turkey Testing for Systems Validation Dr.-Ing. Guenter Generlich guenter@generlich.de Testing 1 Testing: Agenda Techniques Principles

More information

Chapter 12. Sample Surveys. Copyright 2010 Pearson Education, Inc.

Chapter 12. Sample Surveys. Copyright 2010 Pearson Education, Inc. Chapter 12 Sample Surveys Copyright 2010 Pearson Education, Inc. Background We have learned ways to display, describe, and summarize data, but have been limited to examining the particular batch of data

More information

Predicting user rating on Amazon Video Game Dataset

Predicting user rating on Amazon Video Game Dataset Predicting user rating on Amazon Video Game Dataset CSE190A Assignment2 Hongyu Li UC San Diego A900960 holi@ucsd.edu Wei He UC San Diego A12095047 whe@ucsd.edu ABSTRACT Nowadays, accurate recommendation

More information

Lecture (chapter 7): Estimation procedures

Lecture (chapter 7): Estimation procedures Lecture (chapter 7): Estimation procedures Ernesto F. L. Amaral February 19 21, 2018 Advanced Methods of Social Research (SOCI 420) Source: Healey, Joseph F. 2015. Statistics: A Tool for Social Research.

More information

The power of numbers. And how to always be right (well, most of the time)

The power of numbers. And how to always be right (well, most of the time) The power of numbers And how to always be right (well, most of the time) Mike Snyman (mikesnyman0403@gmail.com) Section of plane Bullet holes per square foot Engine 1.11 Fuselage 1.73 Fuel system 1.55

More information

Active Learning for Logistic Regression

Active Learning for Logistic Regression Active Learning for Logistic Regression Andrew I. Schein The University of Pennsylvania Department of Computer and Information Science Philadelphia, PA 19104-6389 USA ais@cis.upenn.edu April 21, 2005 What

More information

Gasoline Consumption Analysis

Gasoline Consumption Analysis Gasoline Consumption Analysis One of the most basic topics in economics is the supply/demand curve. Simply put, the supply offered for sale of a commodity is directly related to its price, while the demand

More information

Case study 2. Data Mining and Predictive Analytics. Understanding Opinions and Preferences in Product Networks

Case study 2. Data Mining and Predictive Analytics. Understanding Opinions and Preferences in Product Networks Case study 2 Data Mining and Predictive Analytics Understanding Opinions and Preferences in Product Networks Relationships between products Relationships between products browsed together bought together

More information

Unit 14: Introduction to the Use of Bayesian Methods for Reliability Data. Ramón V. León

Unit 14: Introduction to the Use of Bayesian Methods for Reliability Data. Ramón V. León Unit 14: Introduction to the Use of Bayesian Methods for Reliability Data Ramón V. León Notes largely based on Statistical Methods for Reliability Data by W.Q. Meeker and L. A. Escobar, Wiley, 1998 and

More information

Unit 14: Introduction to the Use of Bayesian Methods for Reliability Data

Unit 14: Introduction to the Use of Bayesian Methods for Reliability Data Unit 14: Introduction to the Use of Bayesian Methods for Reliability Data Ramón V. León Notes largely based on Statistical Methods for Reliability Data by W.Q. Meeker and L. A. Escobar, Wiley, 1998 and

More information

WaterlooClarke: TREC 2015 Total Recall Track

WaterlooClarke: TREC 2015 Total Recall Track WaterlooClarke: TREC 2015 Total Recall Track Haotian Zhang, Wu Lin, Yipeng Wang, Charles L. A. Clarke and Mark D. Smucker Data System Group University of Waterloo TREC, 2015 Haotian Zhang, Wu Lin, Yipeng

More information

Examination of Cross Validation techniques and the biases they reduce.

Examination of Cross Validation techniques and the biases they reduce. Examination of Cross Validation techniques and the biases they reduce. Dr. Jon Starkweather, Research and Statistical Support consultant. The current article continues from last month s brief examples

More information

Unravelling Airbnb Predicting Price for New Listing

Unravelling Airbnb Predicting Price for New Listing Unravelling Airbnb Predicting Price for New Listing Paridhi Choudhary H John Heinz III College Carnegie Mellon University Pittsburgh, PA 15213 paridhic@andrew.cmu.edu Aniket Jain H John Heinz III College

More information

{saharonr, lastgift>35

{saharonr, lastgift>35 KDD-Cup 99 : Knowledge Discovery In a Charitable Organization s Donor Database Saharon Rosset and Aron Inger Amdocs (Israel) Ltd. 8 Hapnina St. Raanana, Israel, 43000 {saharonr, aroni}@amdocs.com 1. INTRODUCTION

More information

Our MCMC algorithm is based on approach adopted by Rutz and Trusov (2011) and Rutz et al. (2012).

Our MCMC algorithm is based on approach adopted by Rutz and Trusov (2011) and Rutz et al. (2012). 1 ONLINE APPENDIX A MCMC Algorithm Our MCMC algorithm is based on approach adopted by Rutz and Trusov (2011) and Rutz et al. (2012). The model can be written in the hierarchical form: β X,ω,Δ,V,ε ε β,x,ω

More information

Video Traffic Classification

Video Traffic Classification Video Traffic Classification A Machine Learning approach with Packet Based Features using Support Vector Machine Videotrafikklassificering En Maskininlärningslösning med Paketbasereade Features och Supportvektormaskin

More information

A SIMULATION STUDY OF THE ROBUSTNESS OF THE LEAST MEDIAN OF SQUARES ESTIMATOR OF SLOPE IN A REGRESSION THROUGH THE ORIGIN MODEL

A SIMULATION STUDY OF THE ROBUSTNESS OF THE LEAST MEDIAN OF SQUARES ESTIMATOR OF SLOPE IN A REGRESSION THROUGH THE ORIGIN MODEL A SIMULATION STUDY OF THE ROBUSTNESS OF THE LEAST MEDIAN OF SQUARES ESTIMATOR OF SLOPE IN A REGRESSION THROUGH THE ORIGIN MODEL by THILANKA DILRUWANI PARANAGAMA B.Sc., University of Colombo, Sri Lanka,

More information

Dynamic Probit models for panel data: A comparison of three methods of estimation

Dynamic Probit models for panel data: A comparison of three methods of estimation Dynamic Probit models for panel data: A comparison of three methods of estimation Alfonso Miranda Keele University and IZA (A.Miranda@econ.keele.ac.uk) 2007 UK Stata Users Group meeting September 10. In

More information

Model Selection, Evaluation, Diagnosis

Model Selection, Evaluation, Diagnosis Model Selection, Evaluation, Diagnosis INFO-4604, Applied Machine Learning University of Colorado Boulder October 31 November 2, 2017 Prof. Michael Paul Today How do you estimate how well your classifier

More information

SPM 8.2. Salford Predictive Modeler

SPM 8.2. Salford Predictive Modeler SPM 8.2 Salford Predictive Modeler SPM 8.2 The SPM Salford Predictive Modeler software suite is a highly accurate and ultra-fast platform for developing predictive, descriptive, and analytical models from

More information

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2011 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each

More information

Copyright 2013, SAS Institute Inc. All rights reserved.

Copyright 2013, SAS Institute Inc. All rights reserved. IMPROVING PREDICTION OF CYBER ATTACKS USING ENSEMBLE MODELING June 17, 2014 82 nd MORSS Alexandria, VA Tom Donnelly, PhD Systems Engineer & Co-insurrectionist JMP Federal Government Team ABSTRACT Improving

More information

Genomic Selection with Linear Models and Rank Aggregation

Genomic Selection with Linear Models and Rank Aggregation Genomic Selection with Linear Models and Rank Aggregation m.scutari@ucl.ac.uk Genetics Institute March 5th, 2012 Genomic Selection Genomic Selection Genomic Selection: an Overview Genomic selection (GS)

More information

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara Customer Relationship Management in marketing programs: A machine learning approach for decision Fernanda Alcantara F.Alcantara@cs.ucl.ac.uk CRM Goal Support the decision taking Personalize the best individual

More information

A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values

A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values Australian Journal of Basic and Applied Sciences, 6(7): 312-317, 2012 ISSN 1991-8178 A Comparative evaluation of Software Effort Estimation using REPTree and K* in Handling with Missing Values 1 K. Suresh

More information

(& Classify Deaths Without Physicians) 1

(& Classify Deaths Without Physicians) 1 Advanced Quantitative Research Methodology, Lecture Notes: Text Analysis I: How to Read 100 Million Blogs (& Classify Deaths Without Physicians) 1 Gary King http://gking.harvard.edu April 25, 2010 1 c

More information

THOMAS-KILMANN CONFLICT MODE QUESTIONNAIRE

THOMAS-KILMANN CONFLICT MODE QUESTIONNAIRE THOMAS-KILMANN CONFLICT MODE QUESTIONNAIRE Consider situations in which you find your wishes differing from those of another person. How do you usually respond to such situations? On the following pages

More information

Influence Maximization on Social Graphs. Yu-Ting Wen

Influence Maximization on Social Graphs. Yu-Ting Wen Influence Maximization on Social Graphs Yu-Ting Wen 05-25-2018 Outline Background Models of influence Linear Threshold Independent Cascade Influence maximization problem Proof of performance bound Compute

More information

Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015

Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015 Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Probabilistic Classification Introduction to Logistic regression Binary logistic regression

More information

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report

Predicting prokaryotic incubation times from genomic features Maeva Fincker - Final report Predicting prokaryotic incubation times from genomic features Maeva Fincker - mfincker@stanford.edu Final report Introduction We have barely scratched the surface when it comes to microbial diversity.

More information

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa

The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pa The SPSS Sample Problem To demonstrate these concepts, we will work the sample problem for logistic regression in SPSS Professional Statistics 7.5, pages 37-64. The description of the problem can be found

More information

Week 10: Heteroskedasticity

Week 10: Heteroskedasticity Week 10: Heteroskedasticity Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline The problem of (conditional)

More information

Professor Dr. Gholamreza Nakhaeizadeh. Professor Dr. Gholamreza Nakhaeizadeh

Professor Dr. Gholamreza Nakhaeizadeh. Professor Dr. Gholamreza Nakhaeizadeh Statistic Methods in in Mining Business Understanding Understanding Preparation Deployment Modelling Evaluation Mining Process (( Part 3) 3) Professor Dr. Gholamreza Nakhaeizadeh Professor Dr. Gholamreza

More information

A Comparative Study of Filter-based Feature Ranking Techniques

A Comparative Study of Filter-based Feature Ranking Techniques Western Kentucky University From the SelectedWorks of Dr. Huanjing Wang August, 2010 A Comparative Study of Filter-based Feature Ranking Techniques Huanjing Wang, Western Kentucky University Taghi M. Khoshgoftaar,

More information

Statistical Methods for Quantitative Trait Loci (QTL) Mapping

Statistical Methods for Quantitative Trait Loci (QTL) Mapping Statistical Methods for Quantitative Trait Loci (QTL) Mapping Lectures 4 Oct 10, 011 CSE 57 Computational Biology, Fall 011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 1:00-1:0 Johnson

More information

Airbnb Price Estimation. Hoormazd Rezaei SUNet ID: hoormazd. Project Category: General Machine Learning gitlab.com/hoorir/cs229-project.

Airbnb Price Estimation. Hoormazd Rezaei SUNet ID: hoormazd. Project Category: General Machine Learning gitlab.com/hoorir/cs229-project. Airbnb Price Estimation Liubov Nikolenko SUNet ID: liubov Hoormazd Rezaei SUNet ID: hoormazd Pouya Rezazadeh SUNet ID: pouyar Project Category: General Machine Learning gitlab.com/hoorir/cs229-project.git

More information

Advanced Quantitative Research Methodology, Lecture Notes: Text Analysis: Supervised Learning

Advanced Quantitative Research Methodology, Lecture Notes: Text Analysis: Supervised Learning Advanced Quantitative Research Methodology, Lecture Notes: Text Analysis: Supervised Learning Gary King Institute for Quantitative Social Science Harvard University April 22, 2012 Gary King (Harvard, IQSS)

More information

Machine Learning Based Prescriptive Analytics for Data Center Networks Hariharan Krishnaswamy DELL

Machine Learning Based Prescriptive Analytics for Data Center Networks Hariharan Krishnaswamy DELL Machine Learning Based Prescriptive Analytics for Data Center Networks Hariharan Krishnaswamy DELL Modern Data Center Characteristics Growth in scale and complexity Addition and removal of system components

More information

374 Index. disposal policy, 292 dynamic ambulance relocation problem (DYNAROC), 41 5 Dynamic Programming (DP), 251, 272 3, 275, 276, 277, 293

374 Index. disposal policy, 292 dynamic ambulance relocation problem (DYNAROC), 41 5 Dynamic Programming (DP), 251, 272 3, 275, 276, 277, 293 Index AIDS, see HIV/AIDS epidemic, modelling treatment effects in ambulance dispatch, 40 1 ambulance logistics, 36 50 computational results of, 47 50 dispatching support, 40 1 dynamic ambulance relocation,

More information

A Prediction Reference Model for Air Conditioning Systems in Commercial Buildings

A Prediction Reference Model for Air Conditioning Systems in Commercial Buildings A Prediction Reference Model for Air Conditioning Systems in Commercial Buildings Mahdis Mahdieh, Milad Mohammadi, Pooya Ehsani School of Electrical Engineering, Stanford University Abstract Nearly 45%

More information

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification

Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Survival Outcome Prediction for Cancer Patients based on Gene Interaction Network Analysis and Expression Profile Classification Final Project Report Alexander Herrmann Advised by Dr. Andrew Gentles December

More information

Final Examination. Department of Computer Science and Engineering CSE 291 University of California, San Diego Spring Tuesday June 7, 2011

Final Examination. Department of Computer Science and Engineering CSE 291 University of California, San Diego Spring Tuesday June 7, 2011 Department of Computer Science and Engineering CSE 291 University of California, San Diego Spring 2011 Your name: Final Examination Tuesday June 7, 2011 Instructions: Answer each question in the space

More information

Chapter 4. Phase Four: Evaluating Jobs. Key points made in this chapter

Chapter 4. Phase Four: Evaluating Jobs. Key points made in this chapter C H A P T E R F O U R Chapter 4 Phase Four: Evaluating Jobs Key points made in this chapter The evaluation process: should be done on a factor-by-factor basis rather than job-by-job should include a sore-thumbing

More information

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos Effective CRM Using Predictive Analytics Antonios Chorianopoulos WlLEY Contents Preface Acknowledgments xiii xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the

More information

Review Model I Model II Model III

Review Model I Model II Model III Maximum Likelihood Estimation & Expectation Maximization Lectures 3 Oct 5, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson

More information

The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation

The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation The Essential Role of Pair Matching in Cluster-Randomized Experiments, with Application to the Mexican Universal Health Insurance Evaluation Kosuke Imai Princeton University Joint work with Gary King (Harvard)

Using People Analytics to Help Prevent Absences Due to Mental Health Issues Hitachi Solutions for New Work Styles Using People Analytics to Help Prevent Absences Due to Mental Health Issues With AI, especially deep learning, experiencing a third boom driven by such factors as

A Survey on Recommendation Techniques in E-Commerce A Survey on Recommendation Techniques in E-Commerce Namitha Ann Regi Post-Graduate Student Department of Computer Science and Engineering Karunya University, India P. Rebecca Sandra Assistant Professor

Secondary Math Margin of Error Secondary Math 3 1-4 Margin of Error What you will learn: How to use data from a sample survey to estimate a population mean or proportion. How to develop a margin of error through the use of simulation

Crowe Critical Appraisal Tool (CCAT) User Guide Crowe Critical Appraisal Tool (CCAT) User Guide Version 1.4 (19 November 2013) Use with the CCAT Form version 1.4 only Michael Crowe, PhD michael.crowe@my.jcu.edu.au This work is licensed under the Creative

A Systematic Approach to Performance Evaluation A Systematic Approach to Performance evaluation is the process of determining how well an existing or future computer system meets a set of alternative performance objectives. Arbitrarily selecting performance

The Wisdom Of the Crowds: Enhanced Reputation-Based Filtering The Wisdom Of the Crowds: Enhanced Reputation-Based Filtering Jason Feriante feriante@cs.wisc.edu CS 761 Spring 2015 Department of Computer Science University of Wisconsin-Madison Abstract Crowdsourcing

Active Learning for Conjoint Analysis Peter I. Frazier Shane G. Henderson snp32@cornell.edu pf98@cornell.edu sgh9@cornell.edu School of Operations Research and Information Engineering Cornell University November 1, 2015 Learning User's Preferences

Mallow's Cp for Selecting Best Performing Logistic Regression Subsets. Mary G. Lieberman, John D. Morris, Florida Atlantic University. Mallow's Cp is used herein to select maximally accurate subsets of predictor

Market Concentration and Power. What can the data tell us about θ? If we had data on marginal costs, we should be able to estimate θ easily. Because then we could get L, and having estimated H (easy),

ESTIMATING TOTAL-TEST SCORES FROM PARTIAL SCORES IN A MATRIX SAMPLING DESIGN. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 1980, 40. JANE SACHAR, The Rand Corporation; PATRICK SUPPES, Institute for Mathematical

Predicting Yelp Ratings From Business and User Characteristics Predicting Yelp Ratings From Business and User Characteristics Jeff Han Justin Kuang Derek Lim Stanford University jeffhan@stanford.edu kuangj@stanford.edu limderek@stanford.edu I. Abstract With online

Comparison of Different Independent Component Analysis Algorithms for Sales Forecasting International Journal of Humanities Management Sciences IJHMS Volume 2, Issue 1 2014 ISSN 2320 4044 Online Comparison of Different Independent Component Analysis Algorithms for Sales Forecasting Wensheng

Affymetrix GeneChip Arrays. Lecture 3 (continued) Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Affymetrix GeneChip Arrays Lecture 3 (continued) Computational and Statistical Aspects of Microarray Analysis June 21, 2005 Bressanone, Italy Affymetrix GeneChip Design 5 3 Reference sequence TGTGATGGTGGGGAATGGGTCAGAAGGCCTCCGATGCGCCGATTGAGAAT

OPTIMIZATION AND CV ESTIMATION OF A PLATE COUNT ASSAY USING JMP OPTIMIZATION AND CV ESTIMATION OF A PLATE COUNT ASSAY USING JMP Author: Marianne Toft, Statistician, Novozymes A/S, Denmark ABSTRACT Some of our products are bacterial spores, for which the assay used

C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING. CAS Annual Meeting November 7-10 1 C-14 FINDING THE RIGHT SYNERGY FROM GLMS AND MACHINE LEARNING CAS Annual Meeting November 7-10 GLM Process 2 Data Prep Model Form Validation Reduction Simplification Interactions GLM Process 3 Opportunities

Generalized Maximum Entropy estimation method for studying the effect of the Management's Factors on the Enterprise Performances. Enrico Ciavolino, Researcher of Statistics, University of Salento, Department

Strength in numbers? Modelling the impact of businesses on each other Strength in numbers? Modelling the impact of businesses on each other Amir Abbas Sadeghian amirabs@stanford.edu Hakan Inan inanh@stanford.edu Andres Nötzli noetzli@stanford.edu. INTRODUCTION In many cities,

(DMSTT 21) M.Sc. (Final) Final Year DEGREE EXAMINATION, DEC Statistics. Time : 03 Hours Maximum Marks : 100 (DMSTT 21) M.Sc. (Final) Final Year DEGREE EXAMINATION, DEC. - 2012 Statistics Paper - I : STATISTICAL QUALITY CONTROL Time : 03 Hours Maximum Marks : 100 Answer any Five questions All questions carry

Economics of Strategy Fifth Edition Economics of Strategy Fifth Edition Besanko, Dranove, Shanley, and Schaefer Chapter 16 Performance Measurement and Incentive in Firms Slides by: Richard Ponarul, California State University, Chico Copyright

Preface to the third edition Preface to the first edition Acknowledgments Contents Foreword Preface to the third edition Preface to the first edition Acknowledgments Part I PRELIMINARIES XXI XXIII XXVII XXIX CHAPTER 1 Introduction 3 1.1 What Is Business Analytics?................

Today. Last time. Lecture 5: Discrimination (cont) Jane Fridlyand. Oct 13, 2005 Biological question Experimental design Microarray experiment Failed Lecture : Discrimination (cont) Quality Measurement Image analysis Preprocessing Jane Fridlyand Pass Normalization Sample/Condition

Sawtooth Software. Sample Size Issues for Conjoint Analysis Studies RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc. Sawtooth Software RESEARCH PAPER SERIES Sample Size Issues for Conjoint Analysis Studies Bryan Orme, Sawtooth Software, Inc. 1998 Copyright 1998-2001, Sawtooth Software, Inc. 530 W. Fir St. Sequim, WA

LOSS DISTRIBUTION ESTIMATION, EXTERNAL DATA LOSS DISTRIBUTION ESTIMATION, EXTERNAL DATA AND MODEL AVERAGING Ethan Cohen-Cole Federal Reserve Bank of Boston Working Paper No. QAU07-8 Todd Prono Federal Reserve Bank of Boston This paper can be downloaded

Toronto Data Science Forum. Wednesday May 2 nd, 2018 Toronto Data Science Forum Wednesday May 2 nd, 2018 Prescriptive Analytics: Using Optimization with Predictive Models to find the Best Action Dr. Mamdouh Refaat, Angoss Software (Datawatch) Mamdouh Refaat

Predictive Modeling using SAS. Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN Predictive Modeling using SAS Enterprise Miner and SAS/STAT : Principles and Best Practices CAROLYN OLSEN & DANIEL FUHRMANN 1 Overview This presentation will: Provide a brief introduction of how to set

RFM analysis for decision support in e-banking area RFM analysis for decision support in e-banking area VASILIS AGGELIS WINBANK PIRAEUS BANK Athens GREECE AggelisV@winbank.gr DIMITRIS CHRISTODOULAKIS Computer Engineering and Informatics Department University
