A comparative study of Linear learning methods in Click-Through Rate Prediction


2015 International Conference on Soft Computing Techniques and Implementations (ICSCTI), Department of ECE, FET, MRIU, Faridabad, India, Oct 8-10, 2015. Track 3: Hybrid Intelligence.

Antriksh Agarwal, Avishkar Gupta, Dr. Tanvir Ahmad
Department of Computer Engg, Jamia Millia Islamia, Okhla, Delhi
antriksh5235@gmail.com, avishkar.gupta.delhi@gmail.com, tahmad2@jmi.ac.in

Abstract

A major challenge in the current era of search engine advertising is choosing which advertisements to show in response to a user query. This choice significantly impacts the overall user experience and, more importantly, the advertising revenue stream of the search engine provider. Predicting click-through rates (CTR) for an advertisement is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. This study examines the performance of some well-known statistical learning methods (linear and logistic) with respect to their efficiency in predicting the click-through rate of an impression, where an impression can simply be defined as an instance of a particular advertisement, with each instance defined in terms of the learning parameters in our data set. Our data set consisted of three types of independent attributes acting as regressors for our dependent variable: the app through which the ad was clicked, the site type, and the domain to which it led, with the help of other anonymised variables. The algorithm parameters were fine-tuned to get promising results. In addition, a dimensionality check was conducted on the data set to observe the possibilities of dimensionality reduction. Logistic loss (log-loss) was used as the validation index in all cases.
Our observations led us to the conclusion that, with minimal data pre-processing, linear models give competitive results suited to most practical applications where the learning method chosen should not be computationally expensive. We further verify this claim by comparing the performance of linear models on various subsets of the data set attributes, showing that the performance of the linear techniques was consistent across all of them.

Keywords: Logistic Regression; Click-Through Rate; Linear Models; Logistic Classifier-Regressor; Advertising

I. INTRODUCTION

Advertising via sponsored search results has become the platform for companies to gain a reputation for themselves beyond local markets. It acts as a major revenue source for search engine providers such as Google, Yahoo and Bing, generating revenues of the order of 25 billion dollars and upwards [2]. One way to predict the effectiveness of a marketing campaign would be to record the user's reaction to the ad when it shows up. However, since that is not feasible with current technology, click-through rate, which tells us how many visitors merely "initiated action" in response to the showing of the ad, serves as a metric for understanding user behavior. Different advertisers target different kinds of users: a mountaineering equipment company will be interested in users who may have bought some sporting gear recently, and an airline would prefer to display its ads to people who are frequent fliers. Click-through rate prediction plays an important role in this area of sponsored advertising. A higher click-through rate is a clear indicator of the success of an online marketing campaign. It means that more users are clicking the ad, which means the campaign is reaching its target audience.
Click-through rate prediction is therefore necessary to be able to further optimize ad placement in the sponsored search market. The sponsored search advertising model exploits two key aspects of online advertising [3]. First, the user enters a query to the search engine, which gives away their intent and determines the type of advertisement that will be shown to them. Second, if the user follows the said link, the success can be attributed directly to the search engine provider in the case of sponsored search. However, where these advertisements are placed on websites as banner ads and the like, a large number of factors come into play and things are not so straightforward. The positioning of the ad on the site and the device being used to surf the site are some common ones. Also, the advertising on these sites is directly linked to the traffic volume on the original website where the ad is displayed. Because of this, it is necessary to factor in these attributes when trying to calculate the click-through rate for an advertisement on a site other than that of a search engine provider.

Our work is aimed at predicting CTR in these cases, where the advertisement is not necessarily displayed on a search portal. In these scenarios user queries are no longer available to exploit, and factors such as the theme of the website then have to be taken into account. Most work in this area has been carried out by search engine providers, but their techniques are not applicable 'as-is' here because, in most cases, they do not accommodate the said metrics. Click-through rate can be defined as:

$$\mathrm{CTR} = \frac{\text{number of clicks}}{\text{number of impressions}} \tag{1}$$

where each impression refers to one showing of the ad. This paper compares the performance of some well-known linear and logistic learning methods in click-through rate prediction and touches on the key role that CTR prediction plays in sponsored search. We chose linear models, since training a single layer model would allow us to handle significantly larger data sets and larger models than have been reported elsewhere. Also, the data pre-processing was kept to a minimum, as our objective was to draw a comparison based on the performance of classification on the data set.

II. RELATED WORKS

Craswell, Ramsey, et al. [5] analyzed the effect of a link's position in determining the probability that the link will be clicked. They compared four real-world situation models to that of logistic regression. They proposed a cascade model that is parameter-free and can be applied to click observations without the need for training data. Their model, however, performed badly in lower ranks. Their results went into depth about how the position of an ad affects its probability of being clicked, just like a search result. Azin Ashkan, Charles L.A. Clarke et al. estimated ad click-through rate by exploring user queries and click-through logs. Their findings show that the rank of an ad, query intent, the number of ads displayed on the result page, etc. are effective in estimating click-through rate [6]. A related paper [7] by Zhong, Wang et al. explored the user's post-click behavior (such as the dwell time on the clicked document, and whether there are further clicks on the clicked document). They monitored the user's post-click activity after leaving the search page and proposed a click model. The works of Ye Chen, Tak W. Yan [8] and several others hint at the positional-bias problem in unison. The works of Jingfang Xu et al. [9] and Ben Carterette & Rosie Jones [10] touch on the problem of minimizing relevance judgment errors.
Their findings provided a way of comparing ranking functions by predicting relevance from click-through rates. In addition to this, previous eye-tracking experiments and studies explaining the position bias of user clicks provide a spectrum of hypotheses and models on how an average user examines and possibly clicks web documents returned by a search engine with respect to the submitted query.

III. PROPOSED ARCHITECTURE

Fig. 1 describes the methodology we employed for the prediction of click-through rate based on the independent variables. It has the following modules: Data Pre-processing, Logistic Loss based Classifier Selection, Linear Models and Dimensionality Reduction.

Fig. 1. Proposed Architecture of Prediction

A. Procuring the Data-set

This is the primary step, during which data obtained from the logs of websites are used to derive the independent variables. A raw (not scaled) data set is obtained and saved in a standard format (e.g. CSV).

B. Data Pre-processing

Various data pre-processing steps, such as data scaling, field removal and format conversion, were applied. They can be summarized as follows.

1) Feature Selection: In the field removal step, columns like ID and Serial No. were removed from the data, since these columns only identify the rows and have no role in classification. Candidate features are chosen out of the features obtained in the previous step such that their removal does not affect the accuracy of the classification model. Among those candidates, the ones for whose removal we have a rationale are removed, such as the identifiers of each table as well as features provided in the table.

2) Feature Engineering: Features such as time and hour, which are given in a date-time format in the table, had to be separated, and special functions were created for this.

3) Feature Extraction: Often features are not given as continuous values but as categorical ones. When a feature consists of discrete values instead of the continuous values usually used for classification, we cannot use it directly with the estimators. The estimators expect the input to be continuous and would interpret the categories as being ordered, which is often not desired. One way to convert categorical features into usable features is feature hashing. Feature hashing, also known as the hashing trick, is a fast and space-efficient way of vectorising features, i.e. turning categorical features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly, rather than looking the indices up in an associative array, so that the column index in the sample matrix is determined directly.

C. Log-Loss Based Classifier Selection

In this step the emphasis is on the selection of the classification algorithm. The data set should be tried on various machine learning (ML) algorithms.
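The Feature Engineering and Feature Extraction steps of Section III-B can be illustrated with a small sketch. It is not the authors' code: the `YYMMDDHH` timestamp layout, the helper names `split_hour` and `to_tokens`, the sample values and the vector width of 1024 are all assumptions made for illustration. It uses scikit-learn's `FeatureHasher`, which implements the hashing trick described above.

```python
from datetime import datetime

from sklearn.feature_extraction import FeatureHasher


def split_hour(raw):
    """Split a timestamp string (assumed layout: YYMMDDHH) into (weekday, hour)."""
    ts = datetime.strptime(raw, "%y%m%d%H")
    return ts.weekday(), ts.hour


def to_tokens(row):
    """Turn one impression (dict of categorical values) into 'name=value' tokens."""
    weekday, hour = split_hour(row["hour"])
    tokens = [f"{name}={value}" for name, value in row.items() if name != "hour"]
    tokens += [f"weekday={weekday}", f"hour_of_day={hour}"]
    return tokens


# Two made-up impressions; the hashed IDs are placeholders, not real data.
rows = [
    {"hour": "14102100", "site_category": "a1b2c3d4", "device_type": "1"},
    {"hour": "14102113", "site_category": "e5f6a7b8", "device_type": "0"},
]

# Each token is hashed straight to a column index; no vocabulary is stored.
hasher = FeatureHasher(n_features=2**10, input_type="string")
X = hasher.transform([to_tokens(row) for row in rows])

print(X.shape)  # one row per impression, 1024 hashed columns
```

Prefixing each value with its column name keeps identical IDs from different columns apart before hashing, so no ordering between categories is implied.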
This aids in selection of the base learner. Logarithmic loss is the loss function used in multinomial logistic regression and extensions of it such as neural networks, defined as the negative log-likelihood of the true labels given a probabilistic classifier's predictions.

Logistic loss for $y_i \in \{0, 1\}$:

$$L = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right] \tag{2}$$

where $\hat{y}_i$ is a prediction for the $i$-th instance.

Logistic loss for $y_i \in \{-1, 1\}$:

$$L = \frac{1}{m} \sum_{i=1}^{m} \log \left( 1 + e^{-y_i p_i} \right) \tag{3}$$

where $p_i$ is a raw score from the model and $i \in \{1, \dots, m\}$.

D. Classifier

Identifying to which of a set of categories a new observation belongs, on the basis of a training set whose category membership is known and the statistical relationship among the variables in that set, is commonly referred to as classification. It includes many techniques for modelling and analysing several variables, and finally derives a relationship between a dependent variable and the independent variables. Classifier performance depends greatly on the characteristics of the data to be classified. Various empirical tests have to be performed to compare classifier performance and to find the characteristics of the data that determine it.

A large number of classification algorithms can be phrased in terms of a linear function that assigns a score to each possible category $k$ by combining the feature vector of an instance with a vector of weights, using a dot product. The predicted category is the one with the highest score. This type of score function is known as a linear predictor function and has the following general form:

$$\mathrm{score}(X_i, k) = \beta_k \cdot X_i \tag{4}$$

where $X_i$ is the feature vector for instance $i$, $\beta_k$ is the vector of weights corresponding to category $k$, and $\mathrm{score}(X_i, k)$ is the score associated with assigning instance $i$ to category $k$. Algorithms with this basic setup are known as linear classifiers.

E. Dimensionality Reduction (DR)

DR is the process of removing variables from the data set that are correlated with each other and might degrade the classifier accuracy. The following steps were performed in order to improve the accuracy.
1) Iterative Classification: Each variable in the data set is excluded one by one and a model is built using Logistic Regression. Features whose exclusion results in a logistic loss lower than the default logistic loss (when no variable is removed) are noted down.

2) Dimension Reduction: Candidate features are chosen out of the features obtained in the previous step such that their removal does not affect the accuracy of the classification model. Among those candidates, the ones for whose removal we have a rationale are removed. This accuracy-driven DR approach is also known as the wrapper approach.
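The Iterative Classification step can be sketched as a leave-one-feature-out loop. This is a minimal illustration under stated assumptions rather than the procedure actually run for the paper: the data are synthetic, the column `noise_id`, the helper names `holdout_log_loss` and `leave_one_out`, the hashed width and the simple head/tail hold-out split are all invented for the example.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Synthetic stand-in data: only 'device_type' actually drives the click label.
rng = np.random.default_rng(0)
n = 2000
device = rng.integers(0, 3, n)
y = (rng.random(n) < np.array([0.05, 0.30, 0.60])[device]).astype(int)
data = {
    "site_category": rng.integers(0, 5, n).astype(str),
    "device_type": device.astype(str),
    "noise_id": rng.integers(0, 50, n).astype(str),
}


def holdout_log_loss(columns, features, y, n_train):
    """Hash the chosen columns, fit logistic regression, return hold-out log-loss."""
    tokens = [[f"{name}={columns[name][i]}" for name in features]
              for i in range(len(y))]
    X = FeatureHasher(n_features=2**12, input_type="string").transform(tokens)
    model = LogisticRegression(max_iter=1000).fit(X[:n_train], y[:n_train])
    proba = model.predict_proba(X[n_train:])[:, 1]
    return log_loss(y[n_train:], proba, labels=[0, 1])


def leave_one_out(columns, y, n_train):
    """Note down every feature whose exclusion lowers log-loss below the baseline."""
    all_features = list(columns)
    baseline = holdout_log_loss(columns, all_features, y, n_train)
    candidates = {}
    for name in all_features:
        kept = [f for f in all_features if f != name]
        loss = holdout_log_loss(columns, kept, y, n_train)
        if loss < baseline:
            candidates[name] = loss
    return baseline, candidates


baseline, candidates = leave_one_out(data, y, n_train=1500)
print("baseline log-loss:", round(baseline, 4))
print("removal candidates:", candidates)
```

The loop only produces candidates. As step 2 states, a feature is dropped only when there is also a rationale for removing it.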

IV. EXPERIMENTAL SETUP

A. Experimental Data-Set

We were provided with ten days of sub-sampled click-through rate data by Avazu [1], made available on Kaggle, an online portal for data science. This set consisted of about 40 million lines of training data, which we further sub-sampled, using the first 500,000 records to create a split of training and testing data. The sets had the following attributes.

Fig. 2. Types of attributes in the data-sets, categorized according to their physical intuition

The feature we had to predict among the attributes above is 'click', and the others function as the independent features. Of the independent features, C1 and C14-C21 are categorical features (each value is an ID and has no quantitative significance) whose meaning was anonymized by Avazu for business reasons. These anonymized features represent various attributes, such as the dimensions of the advertisement. The features whose meaning was known, even though they also contain hashed strings, were likewise categorical (discrete) features covering the following attributes:

site_id, site_domain and site_category: specify the site on which an impression of the advertisement was put.
app_id, app_domain and app_category: specify the app in which the advertisement, or the webpage with the advertisement, was shown.
device_id, device_ip, device_model, device_type and device_conn_type: identify the device of the user on which the impressions were shown.

Prototyping Tools

We used the classifiers provided in the scikit-learn toolkit [11], available for the Python programming language. The SciPy toolkit, featuring NumPy, SciPy and all associated packages, was also used.

B. Experimental Procedure

The experiment was then conducted using the architecture proposed in Section III. We tested the architecture using three learning methods: vanilla logistic regression, Stochastic Gradient Descent (SGD Classifier) and a Bayesian method (multinomial Bayes).
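A three-way comparison of this kind could be set up in scikit-learn roughly as follows. This is a hedged sketch on synthetic data, not the experiment reported here: the generated columns, the click probabilities, the split sizes and the hashed width are assumptions, and the printed numbers say nothing about the Avazu results. `alternate_sign=False` is used because `MultinomialNB` requires non-negative features.

```python
import numpy as np
import sklearn
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import log_loss
from sklearn.naive_bayes import MultinomialNB

# Synthetic impressions: click probability depends on both hypothetical columns.
rng = np.random.default_rng(42)
n, n_train = 4000, 3000
site = rng.integers(0, 6, n)
device = rng.integers(0, 3, n)
click_prob = 0.05 + 0.08 * device + 0.04 * (site % 3)
y = (rng.random(n) < click_prob).astype(int)

tokens = [[f"site_category={s}", f"device_type={d}"] for s, d in zip(site, device)]
hasher = FeatureHasher(n_features=2**10, input_type="string", alternate_sign=False)
X = hasher.transform(tokens)

# The logistic loss of SGDClassifier is named "log_loss" from scikit-learn 1.1
# onwards and "log" in older releases.
major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])
sgd_loss = "log_loss" if (major, minor) >= (1, 1) else "log"

models = {
    "SGD Classifier": SGDClassifier(loss=sgd_loss, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Bayes": MultinomialNB(),
}

# Fit each model on the head of the data and score log-loss on the held-out tail.
scores = {}
for name, model in models.items():
    model.fit(X[:n_train], y[:n_train])
    proba = model.predict_proba(X[n_train:])[:, 1]
    scores[name] = log_loss(y[n_train:], proba, labels=[0, 1])

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

All three estimators expose `predict_proba`, so a single log-loss call can serve as the common validation index.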
Logistic regression and Stochastic Gradient Descent were used because logistic regression attempts to minimize log-loss, and SGD supports different loss functions and penalties for classification. We used a Bayesian method to show that the features are not independent of each other, as otherwise the naïve Bayes assumption of feature independence would make it a viable option as well. This was done to find which classifier performed best given our set of chosen features. Some variables, such as id, app_id, site_id and site_domain, were removed at the start, so that the models do not use these distinct-valued attributes to create additional features that are too specific to an impression. This was also done so as not to unnecessarily increase the size of the dataset. Other variables were removed and tested to see which variables best fit our classification. The logistic loss computed provided a fair deal of insight into how well the algorithm was performing on our sub-sampled data, with click, the binary target feature indicating whether the impression was clicked, being the attribute for which the probability of classification was calculated.

Table I. Comparison of different learning methods with their output log-losses.

Learning Method Applied    Logistic Loss
SGD Classifier             [missing]
Logistic Regression        [missing]
Multinomial Bayes          [missing]

We applied various learning methods to check which one gave us the best results. As can be seen above, Logistic Regression gave us the best output. We then conducted experiments to find out which set of attributes, taken together, gave us the best estimate of the click-through rate. For this we tested our results with iterative classification and dimension reduction.

Table II. Using linear models to see how attribute removal changed the output.

Attributes Removed                   Log-Loss
None                                 [missing]
app_category                         [missing]
site_category                        [missing]
device_conn_type                     [missing]
C1, C14-C                            [missing]
C1, app_category, site_category      [missing]
C1, device_conn_type                 [missing]

C. Logistic Regression

Logistic regression is a regression model in which the dependent variable is categorical. Logistic regression measures the relationship between the dependent variable and the independent variables by estimating probabilities using a logistic function. The mathematics of logistic regression begins with the explanation of the logistic function. The logistic function is useful because it can take an input with any value from negative to positive infinity, whereas the output always takes values between zero and one and hence is interpretable as a probability. The logistic function is defined as follows:

$$\sigma(t) = \frac{e^{t}}{e^{t} + 1} = \frac{1}{1 + e^{-t}} \tag{5}$$

If $t$ is viewed as a linear function of an explanatory variable $x$ (or of a linear combination of explanatory variables), then we express $t$ as follows:

$$t = \beta_0 + \beta_1 x \tag{6}$$

And the logistic function can now be written as:

$$F(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} \tag{7}$$

Note that $F(x)$ is interpreted as the probability of the dependent variable equaling a "success" or "case" rather than a failure or non-case. It is clear that the response variables $Y_i$ are not identically distributed: $P(Y_i = 1 \mid X)$ differs from one data point $X_i$ to another, though they are independent given the design matrix $X$ and the shared parameters $\beta$.

V. RESULTS AND DISCUSSION

The experiment showed that, among the linear models listed in Table I, Logistic Regression was the best algorithm for predicting the click-through rate. Such a result is plausible because logistic regression returns well-calibrated predictions, as it directly optimizes log-loss: in its gradient descent, logistic regression is trying to minimize the cost represented by equation (2). Hence, in a way, logistic regression can be expected to give better log-loss values. Stochastic Gradient Descent (SGD) did not give a better result than logistic regression, but it is an online classifier that does not need to be given all the data at the same time. For logistic regression, we have to feed all the data at the same time, and with the amount of data we had in the dataset, we did consider using other algorithms before trying logistic regression for better results. It can be argued that the cost function being minimized in SGD is [equation missing from the source], which is very close to equation (2), but the summation over the terms gave a better result than one which was not being summed over. We also know that the performance of SGD improves as the size of the data increases. So, the improvement of SGD with the size of data is good.

Thus, it is quite possible that, had we supplied the whole of the data set that was available to us, we might have been able to show that SGD was better than Logistic Regression. Other methods like Naïve Bayes tend to push probabilities to 0 or 1. This is mainly because Naïve Bayes assumes that features are conditionally independent given the class, which was not the case in this dataset.

Another result that caught our eye was that adding or removing any of the variables did not contribute much to improving the log-loss value. This shows that, as stated in the book [10], a regression model does not imply a cause-and-effect relationship between the independent and the dependent variables. Even though a strong empirical relationship may exist between them, it cannot be considered evidence that the classifier features and the response are related in a cause-and-effect manner. To establish causality, the relationship between the classifier features and the response must have a basis outside the sample data.

VI. CONCLUSION AND FUTURE SCOPE

Given the excellent performance of Logistic Regression in comparison to the other models, and the consistency of its results across all sets of features, our recommendation would be to use logistic regression in practical situations where one can afford to run batch learning jobs frequently. However, the SGD classifier came in as a close second, and because it is an online learning method, classification can be improved by partially fitting any new data that comes in. SGD is therefore ideal for situations where data is constantly flowing in rather than arriving in batches. For such workflows, Logistic Regression would require training the classifier with the entire dataset each time some modification needs to be made.

Also, from our dimensionality reduction efforts, it is clear that classification will remain consistent even if one of the key features is not present in the data set. We can look into more robust data pre-processing models and observe how pre-processing in different ways affects our results. We can also look into up-and-coming data-intensive parallel programming techniques and GPU-based programming to incorporate larger data sets, since at present we were able to use only part of the training data. Other potentially interesting future work would be to observe how variety websites, such as aggregation or social media platforms, compare to theme-specific sites that focus on only one aspect of content, such as sports portals. This constitutes our future work.

REFERENCES

[1] emarketer, April 2009
[2] Broder, Josifovski, Introduction to Computational Advertising at Stanford, Lecture Notes, 2009

[3] Nick Craswell, Onno Zoeter, Michael Taylor, Bill Ramsey, An Experimental Comparison of Click Position-Bias Models.
[4] Azin Ashkan, Charles L.A. Clarke, Eugene Agichtein, Qi Guo, Estimating Ad Click-through Rate through Query Intent Analysis.
[5] Zhong et al., Incorporating Post-Click Behaviors into a Click Model.
[6] Ye Chen, Tak W. Yan, Position-Normalized Click Prediction in Search Advertising.
[7] Xu et al., Improving Quality of Training Data for Learning to Rank Using Click-Through Data.
[8] Ben Carterette, Rosie Jones, Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks.
[9] M. Young, The Technical Writer's Handbook. Mill Valley, CA: University Science.
[10] Pedregosa, Fabian, et al. "Scikit-learn: Machine learning in Python." The Journal of Machine Learning Research 12 (2011).


More information

Predicting Corporate 8-K Content Using Machine Learning Techniques

Predicting Corporate 8-K Content Using Machine Learning Techniques Predicting Corporate 8-K Content Using Machine Learning Techniques Min Ji Lee Graduate School of Business Stanford University Stanford, California 94305 E-mail: minjilee@stanford.edu Hyungjun Lee Department

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 3/5/18 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 High dim. data Graph data Infinite data Machine

More information

Cryptocurrency Price Prediction Using News and Social Media Sentiment

Cryptocurrency Price Prediction Using News and Social Media Sentiment Cryptocurrency Price Prediction Using News and Social Media Sentiment Connor Lamon, Eric Nielsen, Eric Redondo Abstract This project analyzes the ability of news and social media data to predict price

More information

P P C G L O S S A R Y PPC GLOSSARY

P P C G L O S S A R Y  PPC GLOSSARY The following is a glossary of terms which you will see as you explore the world of PPC. A ACCELERATED AD DELIVERY A method of ad delivery which endeavours to show an ad as often as possible until the

More information

Data mining and Renewable energy. Cindi Thompson

Data mining and Renewable energy. Cindi Thompson Data mining and Renewable energy Cindi Thompson June 2012 Analytics, Big Data, and Data Science 1 What is Analytics? makes extensive use of data, statistical and quantitative analysis, explanatory and

More information

Using Decision Tree to predict repeat customers

Using Decision Tree to predict repeat customers Using Decision Tree to predict repeat customers Jia En Nicholette Li Jing Rong Lim Abstract We focus on using feature engineering and decision trees to perform classification and feature selection on the

More information

A Survey on Recommendation Techniques in E-Commerce

A Survey on Recommendation Techniques in E-Commerce A Survey on Recommendation Techniques in E-Commerce Namitha Ann Regi Post-Graduate Student Department of Computer Science and Engineering Karunya University, India P. Rebecca Sandra Assistant Professor

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

Predicting Airbnb Bookings by Country

Predicting Airbnb Bookings by Country Michael Dimitras A12465780 CSE 190 Assignment 2 Predicting Airbnb Bookings by Country 1: Dataset Description For this assignment, I selected the Airbnb New User Bookings set from Kaggle. The dataset is

More information

Global Media Intelligence Report

Global Media Intelligence Report Q3 2013 Neustar Aggregate Knowledge Global Media Intelligence Report TABLE OF CONTENTS THE GLOBAL MEDIA INTELLIGENCE REPORT Where Math Men Meet Mad Men 3 About the Report 3 EXECUTIVE SUMMARY 4 COST INDEX

More information

Airbnb Price Estimation. Hoormazd Rezaei SUNet ID: hoormazd. Project Category: General Machine Learning gitlab.com/hoorir/cs229-project.

Airbnb Price Estimation. Hoormazd Rezaei SUNet ID: hoormazd. Project Category: General Machine Learning gitlab.com/hoorir/cs229-project. Airbnb Price Estimation Liubov Nikolenko SUNet ID: liubov Hoormazd Rezaei SUNet ID: hoormazd Pouya Rezazadeh SUNet ID: pouyar Project Category: General Machine Learning gitlab.com/hoorir/cs229-project.git

More information

Predict Commercial Promoted Contents Will Be Clicked By User

Predict Commercial Promoted Contents Will Be Clicked By User Predict Commercial Promoted Contents Will Be Clicked By User Gary(Xinran) Guo garyguo@stanford.edu SUNetID: garyguo Stanford University 1. Introduction As e-commerce, social media grows rapidly, advertisements

More information

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1

A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG 1, Hong-Yu FU 1 International Conference on Management Science and Management Innovation (MSMI 2014) A Study of Financial Distress Prediction based on Discernibility Matrix and ANN Xin-Zhong BAO 1,a,*, Xiu-Zhuan MENG

More information

Big Data. Methodological issues in using Big Data for Official Statistics

Big Data. Methodological issues in using Big Data for Official Statistics Giulio Barcaroli Istat (barcarol@istat.it) Big Data Effective Processing and Analysis of Very Large and Unstructured data for Official Statistics. Methodological issues in using Big Data for Official Statistics

More information

Application of Decision Trees in Mining High-Value Credit Card Customers

Application of Decision Trees in Mining High-Value Credit Card Customers Application of Decision Trees in Mining High-Value Credit Card Customers Jian Wang Bo Yuan Wenhuang Liu Graduate School at Shenzhen, Tsinghua University, Shenzhen 8, P.R. China E-mail: gregret24@gmail.com,

More information

Prediction of Google Local Users Restaurant ratings

Prediction of Google Local Users Restaurant ratings CSE 190 Assignment 2 Report Professor Julian McAuley Page 1 Nov 30, 2015 Prediction of Google Local Users Restaurant ratings Shunxin Lu Muyu Ma Ziran Zhang Xin Chen Abstract Since mobile devices and the

More information

Classifying Search Advertisers. By Lars Hirsch (Sunet ID : lrhirsch) Summary

Classifying Search Advertisers. By Lars Hirsch (Sunet ID : lrhirsch) Summary Classifying Search Advertisers By Lars Hirsch (Sunet ID : lrhirsch) Summary Multinomial Event Model and Softmax Regression were applied to classify search marketing advertisers into industry verticals

More information

Airbnb Capstone: Super Host Analysis

Airbnb Capstone: Super Host Analysis Airbnb Capstone: Super Host Analysis Justin Malunay September 21, 2016 Abstract This report discusses the significance of Airbnb s Super Host Program. Based on Airbnb s open data, I was able to predict

More information

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models.

Salford Predictive Modeler. Powerful machine learning software for developing predictive, descriptive, and analytical models. Powerful machine learning software for developing predictive, descriptive, and analytical models. The Company Minitab helps companies and institutions to spot trends, solve problems and discover valuable

More information

ML Methods for Solving Complex Sorting and Ranking Problems in Human Hiring

ML Methods for Solving Complex Sorting and Ranking Problems in Human Hiring ML Methods for Solving Complex Sorting and Ranking Problems in Human Hiring 1 Kavyashree M Bandekar, 2 Maddala Tejasree, 3 Misba Sultana S N, 4 Nayana G K, 5 Harshavardhana Doddamani 1, 2, 3, 4 Engineering

More information

When to Book: Predicting Flight Pricing

When to Book: Predicting Flight Pricing When to Book: Predicting Flight Pricing Qiqi Ren Stanford University qiqiren@stanford.edu Abstract When is the best time to purchase a flight? Flight prices fluctuate constantly, so purchasing at different

More information

How to Drive. Online Marketing through Web Analytics! Tips for leveraging Web Analytics to achieve the best ROI!

How to Drive. Online Marketing through Web Analytics! Tips for leveraging Web Analytics to achieve the best ROI! How to Drive Online Marketing through Web Analytics! Tips for leveraging Web Analytics to achieve the best ROI! an ebook by - Delhi School Of Internet marketing Table of Content Introduction Chapter1:

More information

Churn Prediction Model Using Linear Discriminant Analysis (LDA)

Churn Prediction Model Using Linear Discriminant Analysis (LDA) IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 5, Ver. IV (Sep. - Oct. 2016), PP 86-93 www.iosrjournals.org Churn Prediction Model Using Linear Discriminant

More information

Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES *

Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES * Online appendix for THE RESPONSE OF CONSUMER SPENDING TO CHANGES IN GASOLINE PRICES * Michael Gelman a, Yuriy Gorodnichenko b,c, Shachar Kariv b, Dmitri Koustas b, Matthew D. Shapiro c,d, Dan Silverman

More information

Bitcoin UTXO Lifespan Prediction

Bitcoin UTXO Lifespan Prediction Bitcoin UTXO Lifespan Prediction Robert Konrad & Stephen Pinto December, 05 Background & Motivation The Bitcoin crypto currency [, ] is the most widely used and highly valued digital currency in existence.

More information

State-of-the-Art Diamond Price Predictions using Neural Networks

State-of-the-Art Diamond Price Predictions using Neural Networks State-of-the-Art Diamond Price Predictions using Neural Networks Charley Yejia Zhang, Sean Oh, Jason Park Abstract In this paper, we discuss and evaluate models to predict the prices of diamonds given

More information

Strength in numbers? Modelling the impact of businesses on each other

Strength in numbers? Modelling the impact of businesses on each other Strength in numbers? Modelling the impact of businesses on each other Amir Abbas Sadeghian amirabs@stanford.edu Hakan Inan inanh@stanford.edu Andres Nötzli noetzli@stanford.edu. INTRODUCTION In many cities,

More information

Predicting and Explaining Price-Spikes in Real-Time Electricity Markets

Predicting and Explaining Price-Spikes in Real-Time Electricity Markets Predicting and Explaining Price-Spikes in Real-Time Electricity Markets Christian Brown #1, Gregory Von Wald #2 # Energy Resources Engineering Department, Stanford University 367 Panama St, Stanford, CA

More information

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 3/8/2015 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 High dim. data Graph data Infinite data Machine

More information

3DCNN for Lung Nodule Detection And False Positive Reduction

3DCNN for Lung Nodule Detection And False Positive Reduction 3DCNN for Lung Nodule Detection And False Positive Reduction Ping An Technology (Shenzhen) Co.,Ltd, China WUTIANBO484@pingan.com.cn Abstract The diagnosis of pulmonary nodules can be roughly divided into

More information

Modeling User Click Behavior in Sponsored Search

Modeling User Click Behavior in Sponsored Search Modeling User Click Behavior in Sponsored Search Vibhanshu Abhishek, Peter S. Fader, Kartik Hosanagar The Wharton School, University of Pennsylvania, Philadelpha, PA 1914, USA {vabhi, faderp, kartikh}@wharton.upenn.edu

More information

IBM SPSS & Apache Spark

IBM SPSS & Apache Spark IBM SPSS & Apache Spark Making Big Data analytics easier and more accessible ramiro.rego@es.ibm.com @foreswearer 1 2016 IBM Corporation Modeler y Spark. Integration Infrastructure overview Spark, Hadoop

More information

What about streaming data?

What about streaming data? What about streaming data? 1 The Stream Model Data enters at a rapid rate from one or more input ports Such data are called stream tuples The system cannot store the entire (infinite) stream Distribution

More information

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 02 Data Mining Process Welcome to the lecture 2 of

More information

Fraud Detection for MCC Manipulation

Fraud Detection for MCC Manipulation 2016 International Conference on Informatics, Management Engineering and Industrial Application (IMEIA 2016) ISBN: 978-1-60595-345-8 Fraud Detection for MCC Manipulation Hong-feng CHAI 1, Xin LIU 2, Yan-jun

More information

NICE Customer Engagement Analytics - Architecture Whitepaper

NICE Customer Engagement Analytics - Architecture Whitepaper NICE Customer Engagement Analytics - Architecture Whitepaper Table of Contents Introduction...3 Data Principles...4 Customer Identities and Event Timelines...................... 4 Data Discovery...5 Data

More information

Keyword Performance Prediction in Paid Search Advertising

Keyword Performance Prediction in Paid Search Advertising Keyword Performance Prediction in Paid Search Advertising Sakthi Ramanathan 1, Lenord Melvix 2 and Shanmathi Rajesh 3 Abstract In this project, we explore search engine advertiser keyword bidding data

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu 2 Classic model of algorithms You get to see the entire input, then compute some function of it In this context,

More information

Profit Optimization ABSTRACT PROBLEM INTRODUCTION

Profit Optimization ABSTRACT PROBLEM INTRODUCTION Profit Optimization Quinn Burzynski, Lydia Frank, Zac Nordstrom, and Jake Wolfe Dr. Song Chen and Dr. Chad Vidden, UW-LaCrosse Mathematics Department ABSTRACT Each branch store of Fastenal is responsible

More information

PROGRAMMATIC DEMYSTIFY DIGITAL. From the Digital Experts: An essential bite-size guide to the acronyms and underpinning of modern digital advertising.

PROGRAMMATIC DEMYSTIFY DIGITAL. From the Digital Experts: An essential bite-size guide to the acronyms and underpinning of modern digital advertising. PROGRAMMATIC DEMYSTIFY DIGITAL From the Digital Experts: An essential bite-size guide to the acronyms and underpinning of modern digital advertising. P O W E R ED B Y CONTENTS 1 - Demystify 2 - We demystify

More information

A HYBRID MODERN AND CLASSICAL ALGORITHM FOR INDONESIAN ELECTRICITY DEMAND FORECASTING

A HYBRID MODERN AND CLASSICAL ALGORITHM FOR INDONESIAN ELECTRICITY DEMAND FORECASTING A HYBRID MODERN AND CLASSICAL ALGORITHM FOR INDONESIAN ELECTRICITY DEMAND FORECASTING Wahab Musa Department of Electrical Engineering, Universitas Negeri Gorontalo, Kota Gorontalo, Indonesia E-Mail: wmusa@ung.ac.id

More information

APPLYING MACHINE LEARNING IN MOBILE DEVICE AD TARGETING. Leonard Newnham Chief Data Scientist

APPLYING MACHINE LEARNING IN MOBILE DEVICE AD TARGETING. Leonard Newnham Chief Data Scientist APPLYING MACHINE LEARNING IN MOBILE DEVICE AD TARGETING Leonard Newnham Chief Data Scientist Introduction Who is LoopMe? What we do The problem we solve Data Predictive models Bidders Future Research Lessons

More information

Multi-Touch Attribution

Multi-Touch Attribution Multi-Touch Attribution BY DIRK BEYER HEAD OF SCIENCE, MARKETING ANALYTICS NEUSTAR A Guide to Methods, Math and Meaning Introduction Marketers today use multiple marketing channels that generate impression-level

More information

Azure ML Studio. Overview for Data Engineers & Data Scientists

Azure ML Studio. Overview for Data Engineers & Data Scientists Azure ML Studio Overview for Data Engineers & Data Scientists Rakesh Soni, Big Data Practice Director Randi R. Ludwig, Ph.D., Data Scientist Daniel Lai, Data Scientist Intersys Company Summary Overview

More information

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara Customer Relationship Management in marketing programs: A machine learning approach for decision Fernanda Alcantara F.Alcantara@cs.ucl.ac.uk CRM Goal Support the decision taking Personalize the best individual

More information

ENGG1811: Data Analysis using Excel 1

ENGG1811: Data Analysis using Excel 1 ENGG1811 Computing for Engineers Data Analysis using Excel (weeks 2 and 3) Data Analysis Histogram Descriptive Statistics Correlation Solving Equations Matrix Calculations Finding Optimum Solutions Financial

More information

New restaurants fail at a surprisingly

New restaurants fail at a surprisingly Predicting New Restaurant Success and Rating with Yelp Aileen Wang, William Zeng, Jessica Zhang Stanford University aileen15@stanford.edu, wizeng@stanford.edu, jzhang4@stanford.edu December 16, 2016 Abstract

More information

Predicting Yelp Ratings From Business and User Characteristics

Predicting Yelp Ratings From Business and User Characteristics Predicting Yelp Ratings From Business and User Characteristics Jeff Han Justin Kuang Derek Lim Stanford University jeffhan@stanford.edu kuangj@stanford.edu limderek@stanford.edu I. Abstract With online

More information

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration KnowledgeSTUDIO Advanced Modeling for Better Decisions Companies that compete with analytics are looking for advanced analytical technologies that accelerate decision making and identify opportunities

More information

Preference Elicitation for Group Decisions

Preference Elicitation for Group Decisions Preference Elicitation for Group Decisions Lihi Naamani-Dery 1, Inon Golan 2, Meir Kalech 2, and Lior Rokach 1 1 Telekom Innovation Laboratories at Ben-Gurion University, Israel 2 Ben Gurion University,

More information

Classic model of algorithms

Classic model of algorithms Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit

More information

Enhanced Cost Sensitive Boosting Network for Software Defect Prediction

Enhanced Cost Sensitive Boosting Network for Software Defect Prediction Enhanced Cost Sensitive Boosting Network for Software Defect Prediction Sreelekshmy. P M.Tech, Department of Computer Science and Engineering, Lourdes Matha College of Science & Technology, Kerala,India

More information

Jeffrey D. Ullman Stanford University/Infolab. Slides mostly developed by Anand Rajaraman

Jeffrey D. Ullman Stanford University/Infolab. Slides mostly developed by Anand Rajaraman Jeffrey D. Ullman Stanford University/Infolab Slides mostly developed by Anand Rajaraman 2 Classic model of (offline) algorithms: You get to see the entire input, then compute some function of it. Online

More information

A Soft Classification Model for Vendor Selection

A Soft Classification Model for Vendor Selection A Soft Classification Model for Vendor Selection Arpan K. Kar, Ashis K. Pani, Bijaya K. Mangaraj, and Supriya K. De Abstract This study proposes a pattern classification model for usage in the vendor selection

More information

2015 The MathWorks, Inc. 1

2015 The MathWorks, Inc. 1 2015 The MathWorks, Inc. 1 MATLAB 을이용한머신러닝 ( 기본 ) Senior Application Engineer 엄준상과장 2015 The MathWorks, Inc. 2 Machine Learning is Everywhere Solution is too complex for hand written rules or equations

More information

Real Estate Appraisal

Real Estate Appraisal Real Estate Appraisal CS229 Machine Learning Final Project Writeup David Chanin, Ian Christopher, and Roy Fejgin December 10, 2010 Abstract This is our final project for Machine Learning (CS229) during

More information

Data Science in a pricing process

Data Science in a pricing process Data Science in a pricing process Michaël Casalinuovo Consultant, ADDACTIS Software michael.casalinuovo@addactis.com Contents Nowadays, we live in a continuously changing market environment, Pricing has

More information

C3 Products + Services Overview

C3 Products + Services Overview C3 Products + Services Overview AI CLOUD PREDICTIVE ANALYTICS IoT Table of Contents C3 is a Computer Software Company 1 C3 PaaS Products 3 C3 SaaS Products 5 C3 Product Trials 6 C3 Center of Excellence

More information

FINAL PROJECT REPORT IME672. Group Number 6

FINAL PROJECT REPORT IME672. Group Number 6 FINAL PROJECT REPORT IME672 Group Number 6 Ayushya Agarwal 14168 Rishabh Vaish 14553 Rohit Bansal 14564 Abhinav Sharma 14015 Dil Bag Singh 14222 Introduction Cell2Cell, The Churn Game. The cellular telephone

More information

Stock Prediction using Machine Learning

Stock Prediction using Machine Learning Stock Prediction using Machine Learning Yash Omer e-mail: yashomer0007@gmail.com Nitesh Kumar Singh e-mail: nitesh.321.singh@gmail.com Awadhendra Pratap Singh e-mail: apsingh1096@gmail.com Dilshad Ashmir

More information

Model Selection, Evaluation, Diagnosis

Model Selection, Evaluation, Diagnosis Model Selection, Evaluation, Diagnosis INFO-4604, Applied Machine Learning University of Colorado Boulder October 31 November 2, 2017 Prof. Michael Paul Today How do you estimate how well your classifier

More information

RECOGNIZING USER INTENTIONS IN REAL-TIME

RECOGNIZING USER INTENTIONS IN REAL-TIME WHITE PAPER SERIES IPERCEPTIONS ACTIVE RECOGNITION TECHNOLOGY: RECOGNIZING USER INTENTIONS IN REAL-TIME Written by: Lane Cochrane, Vice President of Research at iperceptions Dr Matthew Butler PhD, Senior

More information

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of

More information

Case studies in Data Mining & Knowledge Discovery

Case studies in Data Mining & Knowledge Discovery Case studies in Data Mining & Knowledge Discovery Knowledge Discovery is a process Data Mining is just a step of a (potentially) complex sequence of tasks KDD Process Data Mining & Knowledge Discovery

More information

INSIGHTS. Driving Decisions With Data: iquanti s Hybrid Approach to Attribution Modeling. Ajay Rama, Pushpendra Kumar

INSIGHTS. Driving Decisions With Data: iquanti s Hybrid Approach to Attribution Modeling. Ajay Rama, Pushpendra Kumar INSIGHTS Driving Decisions With Data: iquanti s Hybrid Approach to Attribution Modeling Ajay Rama, Pushpendra Kumar TABLE OF CONTENTS Introduction The Marketer s Dilemma 1. Media Mix Modeling (MMM) 1.

More information

Finding Hidden Intelligence with Predictive Analysis of Data Mining

Finding Hidden Intelligence with Predictive Analysis of Data Mining Finding Hidden Intelligence with Predictive Analysis of Data Mining Rafal Lukawiecki Strategic Consultant, Project Botticelli Ltd rafal@projectbotticelli.com Objectives Show use of Microsoft SQL Server

More information

NVIDIA AND SAP INDUSTRY CHALLENGES INTEGRATED SOLUTION

NVIDIA AND SAP INDUSTRY CHALLENGES INTEGRATED SOLUTION NVIDIA AND SAP ACCELERATING ENTERPRISE INTELLIGENCE Deep learning is a collection of statistical machine learning techniques that is transforming every digital business. Applications using deep learning

More information

Multi-classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce

Multi-classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce 5 IEEE 5th International Conference on Data Mining Workshops Multi-classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce Qiang Li, Maojie Gu, Keren Zhou and Xiaoming

More information

Getting Started with HLM 5. For Windows

Getting Started with HLM 5. For Windows For Windows Updated: August 2012 Table of Contents Section 1: Overview... 3 1.1 About this Document... 3 1.2 Introduction to HLM... 3 1.3 Accessing HLM... 3 1.4 Getting Help with HLM... 3 Section 2: Accessing

More information

The People-Based Marketing Strategy. Optimize campaign success with humanized data.

The People-Based Marketing Strategy. Optimize campaign success with humanized data. The People-Based Marketing Strategy Optimize campaign success with humanized data. 01 Introducing: People-Based Marketing In an ever-evolving technological world, it s more imperative than ever to adapt

More information