Covariance Matrices & All-pairs Similarity

Size: px
Start display at page:

Download "Covariance Matrices & All-pairs Similarity"

Transcription

1 Matrices & Similarity April 2015, Stanford DAO (Stanford) April / 34

2 Notation for matrix A Given m n matrix A, with m n. a 1,1 a 1,2 a 1,n a 2,1 a 2,2 a 2,n A = a m,1 a m,2 a m,n A is tall and skinny, example values m = 10 12, n = {10 4, 10 6 }. A has sparse rows, each row has at most L nonzeros. A is stored across hundreds of machines and cannot be streamed through a single machine. (Stanford) April / 34

3 Computing A T A We compute A T A. A T A is n n, considerably smaller than A. A T A is dense. Holds dot products between all pairs of columns of A. (Stanford) April / 34

4 Guarantees There is a knob γ which can be turned to preserve similarities and singular values. Paying O(nLγ) communication cost and O(γ) computation cost. With a low setting of γ, preserve similar entries of A T A (via Cosine, Dice, Overlap, and Jaccard ). With a high setting of γ, preserve singular values of A T A. (Stanford) April / 34

5 Computing All Pairs of Cosine Similarities We have to find dot products between all pairs of columns of A We prove results for general matrices, but can do better for those entries with cos(i, j) s Cosine : a widely used definition for " between two vectors cos(i, j) = ct i c j c i c j c i is the i th column of A (Stanford) April / 34

6 Example matrix Rows: users. Columns: movies. (Stanford) April / 34

7 Distributed Computing Environment With such large datasets, we must use many machines. Algorithm code available in and Scalding. Described with Maps and Reduces so that the framework takes care of distributing the computation. (Stanford) April / 34

8 Naive Implementation 1 Given row r i, Map with NaiveMapper (Algorithm 1) 2 Reduce using the NaiveReducer (Algorithm 2) Algorithm 1 NaiveMapper(r i ) for all pairs (a ij, a ik ) in r i do Emit ((j, k) a ij a ik ) end for Algorithm 2 NaiveReducer((i, j), v 1,..., v R ) output c T i c j R i=1 v i (Stanford) April / 34

9 for Very easy analysis 1) Shuffle size: O(mL 2 ) 2) Largest reduce-key: O(m) Both depend on m, the larger dimension, and are intractable for m = 10 12, L = 100. We ll bring both down via clever sampling Assuming column norms are known or estimates available (Stanford) April / 34

10 Dimension Independent Matrix Square using MapReduce Algorithm 3 Mapper(r i ) for all pairs (a ij, a ik ) in r i do With probability min emit ((j, k) a ij a ik ) end for ( 1, γ 1 c j c k ) Algorithm 4 Reducer((i, j), v 1,..., v R ) γ if c i c j > 1 then output b ij 1 R c i c j i=1 v i else output b ij 1 R γ i=1 v i end if (Stanford) April / 34

11 for The algorithm outputs b ij, which is a matrix of cosine similarities, call it B. Four things to prove: 1 Shuffle size: O(nLγ) 2 Largest reduce-key: O(γ) 3 The sampling scheme preserves similarities when γ = Ω(log(n)/s) 4 The sampling scheme preserves singular values when γ = Ω(n/ɛ 2 ) (Stanford) April / 34

12 Shuffle size for Theorem For {0, 1} matrices, the expected shuffle size for Mapper is O(nLγ). Proof. The expected contribution from each pair of columns will constitute the shuffle size: n n i=1 j=i+1 #(c i,c j ) k=1 Pr[Emit(c i, c j )] = n n #(c i, c j )Pr[Emit(c i, c j )] i=1 j=i+1 (Stanford) April / 34

13 Shuffle size for Proof. n n #(c i, c j ) γ #(ci ) #(c j ) i=1 j=i+1 (Stanford) April / 34

14 Shuffle size for Proof. n n #(c i, c j ) γ #(ci ) #(c j ) (by AM-GM) γ 2 i=1 j=i+1 n n 1 #(c i, c j )( #(c i ) + 1 #(c j ) ) i=1 j=i+1 (Stanford) April / 34

15 Shuffle size for Proof. n (by AM-GM) γ 2 n i=1 j=i+1 γ n #(c i, c j ) γ #(ci ) #(c j ) n i=1 j=i+1 n i=1 1 #(c i ) 1 #(c i, c j )( #(c i ) + 1 #(c j ) ) n #(c i, c j ) j=1 (Stanford) April / 34

16 Shuffle size for Proof. n (by AM-GM) γ 2 n i=1 j=i+1 γ n #(c i, c j ) γ #(ci ) #(c j ) n i=1 j=i+1 n i=1 1 #(c i ) 1 #(c i, c j )( #(c i ) + 1 #(c j ) ) n #(c i, c j ) j=1 γ n i=1 1 #(c i ) L#(c i) = γln (Stanford) April / 34

17 Shuffle size for O(nLγ) has no dependence on the dimension m, this is the heart of. Happens because higher magnitude columns are sampled with lower probability: 1 γ c 1 c 2 (Stanford) April / 34

18 Shuffle size for For matrices with real entries, we can still get a bound Let H be the smallest nonzero entry in magnitude, after all entries of A have been scaled to be in [ 1, 1] E.g. for {0, 1} matrices, we have H = 1 Shuffle size is bounded by O(nLγ/H 2 ) (Stanford) April / 34

19 Largest reduce key for Each reduce key receives at most γ values (the oversampling parameter) Immediately get that reduce-key complexity is O(γ) Also independent of dimension m. Happens because high magnitude columns are sampled with lower probability. (Stanford) April / 34

20 Correctness Since higher magnitude columns are sampled with lower probability, are we guaranteed to obtain correct results w.h.p.? Yes. By setting γ correctly. Preserve similarities when γ = Ω(log(n)/s) Preserve singular values when γ = Ω(n/ɛ 2 ) (Stanford) April / 34

21 Correctness Theorem Let A be an m n tall and skinny (m > n) matrix. If γ = Ω(n/ɛ 2 ) and D a diagonal matrix with entries d ii = c i, then the matrix B output by satisfies, with probability at least 1/2. DBD A T A 2 A T A 2 ɛ Relative error guaranteed to be low with constant probability. (Stanford) April / 34

22 Proof Uses Latala s theorem, bounds 2nd and 4th central moments of entries of B. Really need extra power of moments. (Stanford) April / 34

23 Latala s Theorem Theorem (Latala s theorem). Let X be a random matrix whose entries x ij are independent centered random variables with finite fourth moment. Denoting X 2 as the matrix spectral norm, we have E X 2 C[max i j E x 2 ij 1/2 + max j 1/4 ( i E x 2 ij ) 1/2 + i,j E x 4 ij ]. (Stanford) April / 34

24 Proof Prove two things E[(b ij Eb ij ) 2 ] 1 γ (easy) E[(b ij Eb ij ) 4 ] 2 (not easy) γ 2 Details in paper. (Stanford) April / 34

25 Correctness Theorem For any two columns c i and c j having cos(c i, c j ) s, let B be the output of with entries b ij = 1 γ m k=1 X ijk with X ijk as the indicator for the k th coin in the call to Mapper. Now if γ = Ω(α/s), then we have, and ] ( Pr [ c i c j b ij > (1 + δ)[a T e δ A] ij (1 + δ) (1+δ) ] Pr [ c i c j b i,j < (1 δ)[a T A] ij < exp( αδ 2 /2) ) α Relative error guaranteed to be low with high probability. (Stanford) April / 34

26 Correctness Proof. In the paper Uses standard concentration inequality for sums of indicator random variables. Ends up requiring that the oversampling parameter γ be set to γ = log(n 2 )/s = 2 log(n)/s. (Stanford) April / 34

27 Observations helpful when there are some popular columns e.g. The Netflix Matrix (some columns way more popular than others) Power-law columns are effectively neutralized (Stanford) April / 34

28 In practice Forget about theoretical settings for γ Crank up γ until application works Estimates for c i can be used, expectations still hold, but concentration isn t guaranteed If using for singular values, watch for ill-conditioned matrices (Stanford) April / 34

29 Large scale production live at twitter.com (Stanford) April / 34

30 Figure : Y-axis ranges from 0 to 100s of terabytes (Stanford) April / 34

31 Implementation Figure : Widely distributed with as of version 1.2 (Stanford) April / 34

32 Other Similarity Measures Picking out similar columns work for some other measures. Similarity Definition Shuffle Size Reduce-key size #(x,y) Cosine O(nL log(n)/s) O(log(n)/s) #(x) #(y) Jaccard Overlap Dice #(x,y) #(x)+#(y) #(x,y) O((n/s) log(n/s)) O(log(n/s)/s) #(x,y) min(#(x),#(y)) O(nL log(n)/s) O(log(n)/s) 2#(x,y) #(x)+#(y) O(nL log(n)/s) O(log(n)/s) Table : All sizes are independent of m, the dimension. (Stanford) April / 34

33 Locality Sensitive Hashing MinHash from the Locality-Sensitive-Hashing family can have its vanilla implementation greatly improved by. Another set of theorems for shuffle size and correctness in DISCO paper. stanford.edu/~rezab/papers/disco.pdf (Stanford) April / 34

34 Conclusion Consider if you ever need to compute A T A for large sparse A Many more experiments and results in paper at stanford.edu/~rezab (Stanford) April / 34

35 Combiners All bounds are without combining: can only get better with combining For similarities, (without combiners) beats naive with combining outright For singular values, (without combiners) beats naive with combining if the number of machines is large (e.g. 1000) with combining empirically beats naive with combining Difficult to analyze combiners since they happen at many levels. Optimizations break models. with combiners is best of both. (Stanford) April / 34

36 Combiners With k machines shuffle with combiner: O(min(nLγ, kn 2 )) reduce-key with combiner: O(min(γ, k)) Naive shuffle with combiner: O(kn 2 ) Naive reduce-key with combiner: O(k) with combiners is best of both. (Stanford) April / 34

37 1 0.8 DISCO Cosine shuffle size vs accuracy tradeoff DISCO Shuffle / Naive Shuffle avg relative err DISCO Shuffle / Naive Shuffle avg relative err log(p / ε) Figure : As γ = p/ɛ increases, shuffle size increases and error decreases. There is no thresholding for highly similar pairs here. (Stanford) April / 34

A Dynamics for Advertising on Networks. Atefeh Mohammadi Samane Malmir. Spring 1397

A Dynamics for Advertising on Networks. Atefeh Mohammadi Samane Malmir. Spring 1397 A Dynamics for Advertising on Networks Atefeh Mohammadi Samane Malmir Spring 1397 Outline Introduction Related work Contribution Model Theoretical Result Empirical Result Conclusion What is the problem?

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at CS Department, National Tsing-Hua University March 27, 2008 Benefited from

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

A Spectral Approach to Collaborative Ranking

A Spectral Approach to Collaborative Ranking A Spectral Approach to Collaborative Ranking Joachim Giesen Max-Planck Institut für Informatik Stuhlsatzenhausweg 85 D-66123 Saarbrücken jgiesen@mpi-inf.mpg.de Dieter Mitsche Department of Computer Science

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at Caltech CS/IST Lunch Bunch March 4, 2008 Benefited from joint work with Dr.

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at CS Department, National Chiao Tung University March 26, 2008 Benefited from

More information

Automatic Ranking by Extended Binary Classification

Automatic Ranking by Extended Binary Classification Automatic Ranking by Extended Binary Classification Hsuan-Tien Lin Joint work with Ling Li (ALT 06, NIPS 06) Learning Systems Group, California Institute of Technology Talk at Institute of Information

More information

Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach

Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach Yixuan Li 1, Kun He 2, David Bindel 1 and John E. Hopcroft 1 1 Cornell University, USA 2 Huazhong University of Science

More information

Chapter 5 Evaluating Classification & Predictive Performance

Chapter 5 Evaluating Classification & Predictive Performance Chapter 5 Evaluating Classification & Predictive Performance Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Why Evaluate? Multiple methods are available

More information

Chapter 9 Assignment (due Wednesday, August 9)

Chapter 9 Assignment (due Wednesday, August 9) Math 146, Summer 2017 Instructor Linda C. Stephenson (due Wednesday, August 9) The purpose of the assignment is to find confidence intervals to predict the proportion of a population. The population in

More information

Predicting user rating for Yelp businesses leveraging user similarity

Predicting user rating for Yelp businesses leveraging user similarity Predicting user rating for Yelp businesses leveraging user similarity Kritika Singh kritika@eng.ucsd.edu Abstract Users visit a Yelp business, such as a restaurant, based on its overall rating and often

More information

Predicting Yelp Ratings From Business and User Characteristics

Predicting Yelp Ratings From Business and User Characteristics Predicting Yelp Ratings From Business and User Characteristics Jeff Han Justin Kuang Derek Lim Stanford University jeffhan@stanford.edu kuangj@stanford.edu limderek@stanford.edu I. Abstract With online

More information

From Ordinal Ranking to Binary Classification

From Ordinal Ranking to Binary Classification From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at Dept. of CSIE, National Taiwan University March 21, 2008 Benefited from joint

More information

Segmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches

Segmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches Segmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches Alexander Tuzhilin Stern School of Business New York University (joint work with Tianyi Jiang)

More information

HybridRank: Ranking in the Twitter Hybrid Networks

HybridRank: Ranking in the Twitter Hybrid Networks HybridRank: Ranking in the Twitter Hybrid Networks Jianyu Li Department of Computer Science University of Maryland, College Park jli@cs.umd.edu ABSTRACT User influence in social media may depend on multiple

More information

Data Mining. CS57300 Purdue University. March 27, 2018

Data Mining. CS57300 Purdue University. March 27, 2018 Data Mining CS57300 Purdue University March 27, 2018 1 Recap last class So far we have seen how to: Test a hypothesis in batches (A/B testing) Test multiple hypotheses (Paul the Octopus-style) 2 The New

More information

CS269I: Incentives in Computer Science Lecture #5: Market-Clearing Prices

CS269I: Incentives in Computer Science Lecture #5: Market-Clearing Prices CS269I: Incentives in Computer Science Lecture #5: Market-Clearing Prices Tim Roughgarden October 8, 208 Understanding Your Market Markets come in different flavors. Here are three questions worth asking

More information

On utility of temporal embeddings for skill matching. Manisha Verma, PhD student, UCL Nathan Francis, NJFSearch

On utility of temporal embeddings for skill matching. Manisha Verma, PhD student, UCL Nathan Francis, NJFSearch On utility of temporal embeddings for skill matching Manisha Verma, PhD student, UCL Nathan Francis, NJFSearch Skill Trend Importance 1. Constant evolution of labor market yields differences in importance

More information

Chapter 9: Markov Chain

Chapter 9: Markov Chain Chapter 9: Markov Chain Section 9.1: Transition Matrices In Section 4.4, Bernoulli Trails: The probability of each outcome is independent of the outcome of any previous experiments and the probability

More information

Modeling of competition in revenue management Petr Fiala 1

Modeling of competition in revenue management Petr Fiala 1 Modeling of competition in revenue management Petr Fiala 1 Abstract. Revenue management (RM) is the art and science of predicting consumer behavior and optimizing price and product availability to maximize

More information

Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution

Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution 1290 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 5, OCTOBER 2011 Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution Christina Aperjis, Ramesh Johari, Member, IEEE, and Michael

More information

Predict Commercial Promoted Contents Will Be Clicked By User

Predict Commercial Promoted Contents Will Be Clicked By User Predict Commercial Promoted Contents Will Be Clicked By User Gary(Xinran) Guo garyguo@stanford.edu SUNetID: garyguo Stanford University 1. Introduction As e-commerce, social media grows rapidly, advertisements

More information

Homophily and Influence in Social Networks

Homophily and Influence in Social Networks Homophily and Influence in Social Networks Nicola Barbieri nicolabarbieri1@gmail.com References: Maximizing the Spread of Influence through a Social Network, Kempe et Al 2003 Influence and Correlation

More information

Problem Score Problem Score 1 /12 6 /12 2 /12 7 /12 3 /12 8 /12 4 /12 9 /12 5 /12 10 /12 Total /120

Problem Score Problem Score 1 /12 6 /12 2 /12 7 /12 3 /12 8 /12 4 /12 9 /12 5 /12 10 /12 Total /120 EE103/CME103: Introduction to Matrix Methods October 22 2015 S. Boyd Midterm Exam This is an in-class 80 minute midterm. You may not use any books, notes, or computer programs (e.g., Julia). Throughout

More information

Sequencing Problem. Dr.S.S.Kulkarni

Sequencing Problem. Dr.S.S.Kulkarni Sequencing Problem The selection of an appropriate order for a series of jobs to be done on a finite number of service facilities, in some preassigned order, is called sequencing. The general sequencing

More information

WE consider the general ranking problem, where a computer

WE consider the general ranking problem, where a computer 5140 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 11, NOVEMBER 2008 Statistical Analysis of Bayes Optimal Subset Ranking David Cossock and Tong Zhang Abstract The ranking problem has become increasingly

More information

A Study of Compact Reserve Pricing Languages

A Study of Compact Reserve Pricing Languages Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) A Study of Compact Reserve Pricing Languages MohammadHossein Bateni, Hossein Esfandiari, Vahab Mirrokni, Saeed Seddighin

More information

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania

More information

A Particle Swarm Optimization Algorithm for Multi-depot Vehicle Routing problem with Pickup and Delivery Requests

A Particle Swarm Optimization Algorithm for Multi-depot Vehicle Routing problem with Pickup and Delivery Requests A Particle Swarm Optimization Algorithm for Multi-depot Vehicle Routing problem with Pickup and Delivery Requests Pandhapon Sombuntham and Voratas Kachitvichayanukul Abstract A particle swarm optimization

More information

Use of AHP Method in Efficiency Analysis of Existing Water Treatment Plants

Use of AHP Method in Efficiency Analysis of Existing Water Treatment Plants International Journal of Engineering Research and Development ISSN: 78-067X, Volume 1, Issue 7 (June 01), PP.4-51 Use of AHP Method in Efficiency Analysis of Existing Treatment Plants Madhu M. Tomar 1,

More information

COMP20008 Elements of Data Processing. Data Pre-Processing and Cleaning: Recommender Systems and Missing Values

COMP20008 Elements of Data Processing. Data Pre-Processing and Cleaning: Recommender Systems and Missing Values COMP20008 Elements of Data Processing Data Pre-Processing and Cleaning: Recommender Systems and Missing Values Announcements Week 2 Lab: Unix Thanks for your feedback Answers are available on LMS Work

More information

CHAPTER II SEQUENCING MODELS

CHAPTER II SEQUENCING MODELS CHAPTER II SEQUENCING MODELS The basic models in scheduling due to Johnson (1957) and owing to Maggu & Das (1977) T.P. Singh (1985, 86, 2005, 2006) are explained one by one which form a basis of scheduling

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Microarray Data Analysis: Lecture 2. Fran Lewitter, Ph.D. Director Bioinformatics and Research Computing Whitehead Institute Outline Introduction Working with microarray data

More information

Three New Connections Between Complexity Theory and Algorithmic Game Theory. Tim Roughgarden (Stanford)

Three New Connections Between Complexity Theory and Algorithmic Game Theory. Tim Roughgarden (Stanford) Three New Connections Between Complexity Theory and Algorithmic Game Theory Tim Roughgarden (Stanford) Three New Connections Between Complexity Theory and Algorithmic Game Theory (case studies in applied

More information

R&D, International Sourcing, and the Joint Impact on Firm Performance April / 25

R&D, International Sourcing, and the Joint Impact on Firm Performance April / 25 R&D, International Sourcing, and the Joint Impact on Firm Performance Esther Ann Boler, Andreas Moxnes, and Karen Helene Ulltveit-Moe American Economic Review (2015) Presented by Beatriz González April

More information

COMP 465: Data Mining Recommender Systems

COMP 465: Data Mining Recommender Systems //0 COMP : Data Mining Recommender Systems Slides Adapted From: www.mmds.org (Mining Massive Datasets) Customer X Buys Metallica CD Buys Megadeth CD Customer Y Does search on Metallica Recommender system

More information

FIRST FUNDAMENTAL THEOREM OF WELFARE ECONOMICS

FIRST FUNDAMENTAL THEOREM OF WELFARE ECONOMICS FIRST FUNDAMENTAL THEOREM OF WELFARE ECONOMICS SICONG SHEN Abstract. Markets are a basic tool for the allocation of goods in a society. In many societies, markets are the dominant mode of economic exchange.

More information

Haplotype Based Association Tests. Biostatistics 666 Lecture 10

Haplotype Based Association Tests. Biostatistics 666 Lecture 10 Haplotype Based Association Tests Biostatistics 666 Lecture 10 Last Lecture Statistical Haplotyping Methods Clark s greedy algorithm The E-M algorithm Stephens et al. coalescent-based algorithm Hypothesis

More information

Logistics Service Network Design for Time-Critical Delivery

Logistics Service Network Design for Time-Critical Delivery Logistics Service Network Design for Time-Critical Delivery Cynthia Barnhart 1 andsushen 2 1 Massachuestts Institute of Technology, Cambridge, MA 02139 2 FedEx, Memphis, TN 38125 Abstract. Service network

More information

An Ascending Price Auction for Producer-Consumer Economy

An Ascending Price Auction for Producer-Consumer Economy An Ascending Price Auction for Producer-Consumer Economy Debasis Mishra (dmishra@cae.wisc.edu) University of Wisconsin-Madison Rahul Garg (grahul@in.ibm.com) IBM India Research Lab, New Delhi, India, 110016

More information

Prediction of Google Local Users Restaurant ratings

Prediction of Google Local Users Restaurant ratings CSE 190 Assignment 2 Report Professor Julian McAuley Page 1 Nov 30, 2015 Prediction of Google Local Users Restaurant ratings Shunxin Lu Muyu Ma Ziran Zhang Xin Chen Abstract Since mobile devices and the

More information

The Efficient Allocation of Individuals to Positions

The Efficient Allocation of Individuals to Positions The Efficient Allocation of Individuals to Positions by Aanund Hylland and Richard Zeckhauser Presented by Debreu Team: Justina Adamanti, Liz Malm, Yuqing Hu, Krish Ray Hylland and Zeckhauser consider

More information

Predictive Modelling for Customer Targeting A Banking Example

Predictive Modelling for Customer Targeting A Banking Example Predictive Modelling for Customer Targeting A Banking Example Pedro Ecija Serrano 11 September 2017 Customer Targeting What is it? Why should I care? How do I do it? 11 September 2017 2 What Is Customer

More information

Modules to Whole Food Webs

Modules to Whole Food Webs Modules to Whole Food Webs 1 Even Simplified Whole Food Webs are extremely complex. How is this complexity maintained from year to year? 2 Stability of Food Web Modules Before tackling food webs of hundreds

More information

Convex and Non-Convex Classification of S&P 500 Stocks

Convex and Non-Convex Classification of S&P 500 Stocks Georgia Institute of Technology 4133 Advanced Optimization Convex and Non-Convex Classification of S&P 500 Stocks Matt Faulkner Chris Fu James Moriarty Masud Parvez Mario Wijaya coached by Dr. Guanghui

More information

Robust Performance of Complex Network Infrastructures

Robust Performance of Complex Network Infrastructures Robust Performance of Complex Network Infrastructures Agostino Capponi Industrial Engineering and Operations Research Department Columbia University ac3827@columbia.edu GRAPHS/SIMPLEX Workshop Data, Algorithms,

More information

TOWARD MORE DIVERSE RECOMMENDATIONS: ITEM RE-RANKING METHODS FOR RECOMMENDER SYSTEMS

TOWARD MORE DIVERSE RECOMMENDATIONS: ITEM RE-RANKING METHODS FOR RECOMMENDER SYSTEMS TOWARD MORE DIVERSE RECOMMENDATIONS: ITEM RE-RANKING METHODS FOR RECOMMENDER SYSTEMS Gediminas Adomavicius YoungOk Kwon Department of Information and Decision Sciences Carlson School of Management, University

More information

ICS 4U - Team Project. Books, Ratings, Suggestions and More!

ICS 4U - Team Project. Books, Ratings, Suggestions and More! ICS 4U - Team Project Books, Ratings, Suggestions and More! Note to Printed Slides If you are reading this only on the materials provided by the Summer Institute and did not see the actual presentation,

More information

Multiple Regression. Dr. Tom Pierce Department of Psychology Radford University

Multiple Regression. Dr. Tom Pierce Department of Psychology Radford University Multiple Regression Dr. Tom Pierce Department of Psychology Radford University In the previous chapter we talked about regression as a technique for using a person s score on one variable to make a best

More information

Cost-Aware Compressive Sensing for Networked Sensing Systems

Cost-Aware Compressive Sensing for Networked Sensing Systems Cost-Aware Compressive Sensing for Networked Sensing Systems Liwen Xu, Xiaohong Hao, Nicholas D. Lane, Xin Liu, Thomas Moscibroda Tsinghua University, Microsoft Research, U.C. Davis ABSTRACT Compressive

More information

Cost-Aware Compressive Sensing for Networked Sensing Systems

Cost-Aware Compressive Sensing for Networked Sensing Systems Cost-Aware Compressive Sensing for Networked Sensing Systems Liwen Xu, Xiaohong Hao, Nicholas D. Lane, Xin Liu, Thomas Moscibroda Tsinghua University, Microsoft Research, U.C. Davis ABSTRACT Compressive

More information

You are Who You Know and How You Behave: Attribute Inference Attacks via Users Social Friends and Behaviors

You are Who You Know and How You Behave: Attribute Inference Attacks via Users Social Friends and Behaviors You are Who You Know and How You Behave: Attribute Inference via Users Social Friends and Behaviors Neil Zhenqiang Gong ECE Department, Iowa State University neilgong@iastate.edu Bin Liu MSIS Department,

More information

On-Line Restricted Assignment of Temporary Tasks with Unknown Durations

On-Line Restricted Assignment of Temporary Tasks with Unknown Durations On-Line Restricted Assignment of Temporary Tasks with Unknown Durations Amitai Armon Yossi Azar Leah Epstein Oded Regev Keywords: on-line algorithms, competitiveness, temporary tasks Abstract We consider

More information

Lecture 6: Decision Tree, Random Forest, and Boosting

Lecture 6: Decision Tree, Random Forest, and Boosting Lecture 6: Decision Tree, Random Forest, and Boosting Tuo Zhao Schools of ISyE and CSE, Georgia Tech CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Tree? Tuo Zhao Lecture 6: Decision

More information

CSE 473: Artificial Intelligence. Outline

CSE 473: Artificial Intelligence. Outline CSE 473: Artificial Intelligence Expecti-Max Search for Stochastic Games Dan Weld Based on slides from Mausam, Andrey Kolobov, Dan Klein, Stuart Russell, Pieter Abbeel, (best illustrations from ai.berkeley.edu)

More information

A logistic regression model for Semantic Web service matchmaking

A logistic regression model for Semantic Web service matchmaking . BRIEF REPORT. SCIENCE CHINA Information Sciences July 2012 Vol. 55 No. 7: 1715 1720 doi: 10.1007/s11432-012-4591-x A logistic regression model for Semantic Web service matchmaking WEI DengPing 1*, WANG

More information

Influence Maximization on Social Graphs. Yu-Ting Wen

Influence Maximization on Social Graphs. Yu-Ting Wen Influence Maximization on Social Graphs Yu-Ting Wen 05-25-2018 Outline Background Models of influence Linear Threshold Independent Cascade Influence maximization problem Proof of performance bound Compute

More information

Restaurant Recommendation for Facebook Users

Restaurant Recommendation for Facebook Users Restaurant Recommendation for Facebook Users Qiaosha Han Vivian Lin Wenqing Dai Computer Science Computer Science Computer Science Stanford University Stanford University Stanford University qiaoshah@stanford.edu

More information

Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17

Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17 Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2 B. Rosner, 5/09/17 1 Outline 1. Testing for effect modification in logistic regression analyses 2. Conditional logistic

More information

3 Ways to Improve Your Targeted Marketing with Analytics

3 Ways to Improve Your Targeted Marketing with Analytics 3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers

More information

THE internet has changed different aspects of our lives, job

THE internet has changed different aspects of our lives, job 1 Recommender Systems for IT Recruitment João Almeida and Luís Custódio Abstract Recruitment processes have increasingly become dependent on the internet. Companies post job opportunities on their websites

More information

Modelling Load Retrievals in Puzzle-based Storage Systems

Modelling Load Retrievals in Puzzle-based Storage Systems Modelling Load Retrievals in Puzzle-based Storage Systems Masoud Mirzaei a, Nima Zaerpour b, René de Koster a a Rotterdam School of Management, Erasmus University, the Netherlands b Faculty of Economics

More information

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of

More information

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest

Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest 1. Introduction Reddit is a social media website where users submit content to a public forum, and other

More information

UNED: Evaluating Text Similarity Measures without Human Assessments

UNED: Evaluating Text Similarity Measures without Human Assessments UNED: Evaluating Text Similarity Measures without Human Assessments Enrique Amigó Julio Gonzalo Jesús Giménez Felisa Verdejo UNED, Madrid {enrique,julio,felisa}@lsi.uned.es Google, Dublin jesgim@gmail.com

More information

CMSC 474, Introduction to Game Theory Analyzing Normal-Form Games

CMSC 474, Introduction to Game Theory Analyzing Normal-Form Games CMSC 474, Introduction to Game Theory Analyzing Normal-Form Games Mohammad T. Hajiaghayi University of Maryland Some Comments about Normal-Form Games Only two kinds of strategies in the normal-form game

More information

The Core Pareto Optimality and Social Welfare Maximizationty

The Core Pareto Optimality and Social Welfare Maximizationty The Core Pareto Optimality and Social Welfare Maximizationty Econ 2100 Fall 2017 Lecture 14, October 19 Outline 1 The Core 2 Core, indiviual rationality, and Pareto effi ciency 3 Social welfare function

More information

TRANSPORTATION PROBLEM AND VARIANTS

TRANSPORTATION PROBLEM AND VARIANTS TRANSPORTATION PROBLEM AND VARIANTS Introduction to Lecture T: Welcome to the next exercise. I hope you enjoyed the previous exercise. S: Sure I did. It is good to learn new concepts. I am beginning to

More information

A MATHEMATICAL MODEL OF PAY-FOR- PERFORMANCE FOR A HIGHER EDUCATION INSTITUTION

A MATHEMATICAL MODEL OF PAY-FOR- PERFORMANCE FOR A HIGHER EDUCATION INSTITUTION Page 13 A MATHEMATICAL MODEL OF PAY-FOR- PERFORMANCE FOR A HIGHER EDUCATION INSTITUTION Matthew Hathorn, Weill Cornell Medical College John Hathorn, Metropolitan State College of Denver Lesley Hathorn,

More information

CSCI 3210: Computational Game Theory. Inefficiency of Equilibria & Routing Games Ref: Ch 17, 18 [AGT]

CSCI 3210: Computational Game Theory. Inefficiency of Equilibria & Routing Games Ref: Ch 17, 18 [AGT] CSCI 3210: Computational Game Theory Inefficiency of Equilibria & Routing Games Ref: Ch 17, 18 [AGT] Mohammad T. Irfan Email: mirfan@bowdoin.edu Web: www.bowdoin.edu/~mirfan Split or steal game u NE outcome

More information

Weka Evaluation: Assessing the performance

Weka Evaluation: Assessing the performance Weka Evaluation: Assessing the performance Lab3 (in- class): 21 NOV 2016, 13:00-15:00, CHOMSKY ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning objectives

More information

Dynamic Programming Algorithms

Dynamic Programming Algorithms Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow

CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow haoyeec@stanford.edu ac1408@stanford.edu Problem Statement It is often said that stock prices are determined

More information

(32) where for are sub-matrices and, for are sub-vectors. Such a system of equations can be solved by solving the following sub-problems

(32) where for are sub-matrices and, for are sub-vectors. Such a system of equations can be solved by solving the following sub-problems Module 4 : Solving Linear Algebraic Equations Section 4 : Direct Methods for Solving Sparse Linear Systems 4 Direct Methods for Solving Sparse Linear Systems A system of Linear equations --------(32) is

More information

Introduction to Recommendation Engines

Introduction to Recommendation Engines Introduction to Recommendation Engines A guide to algorithmically predicting what your customers want and when. By Tuck Ngun, PhD Introduction Recommendation engines have become a popular solution for

More information

ON QUADRATIC CORE PROJECTION PAYMENT RULES FOR COMBINATORIAL AUCTIONS YU SU THESIS

ON QUADRATIC CORE PROJECTION PAYMENT RULES FOR COMBINATORIAL AUCTIONS YU SU THESIS ON QUADRATIC CORE PROJECTION PAYMENT RULES FOR COMBINATORIAL AUCTIONS BY YU SU THESIS Submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Electrical and Computer

More information

Cartelization of Cross-Holdings

Cartelization of Cross-Holdings Cartelization of Cross-Holdings Cheng-Zhong Qin, Shengping Zhang and Dandan Zhu May 25, 2012 Abstract Cross-holdings between firms cause their profits to be mutually dependent. This paper analyzes a model

More information

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014

ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014 ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each

More information

SUPPLEMENT TO THE VALUE OF FREE WATER: ANALYZING SOUTH AFRICA S FREE BASIC WATER POLICY (Econometrica, Vol. 83, No. 5, September 2015, )

SUPPLEMENT TO THE VALUE OF FREE WATER: ANALYZING SOUTH AFRICA S FREE BASIC WATER POLICY (Econometrica, Vol. 83, No. 5, September 2015, ) Econometrica Supplementary Material SUPPLEMENT TO THE VALUE OF FREE WATER: ANALYZING SOUTH AFRICA S FREE BASIC WATER POLICY (Econometrica, Vol. 83, No. 5, September 2015, 1913 1961) BY ANDREA SZABÓ This

More information

BIOSTATISTICAL METHODS

BIOSTATISTICAL METHODS BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH Combination Chemotherapy: DESIGNING CLINICAL RESEARCH Use of a combination of two therapeutics, especially drug-drug combination (called combination

More information

Submitted to the Grand Rapids Community College Provost and AGC Executive Committee.

Submitted to the Grand Rapids Community College Provost and AGC Executive Committee. Oscar Neal Fall 2012 Sabbatical Report Submitted to the Grand Rapids Community College Provost and AGC Executive Committee. The main objective of my sabbatical was to complete my dissertation proposal

More information

On-line Multi-threaded Scheduling

On-line Multi-threaded Scheduling Departamento de Computación Facultad de Ciencias Exactas y Naturales Universidad de Buenos Aires Tesis de Licenciatura On-line Multi-threaded Scheduling por Marcelo Mydlarz L.U.: 290/93 Director Dr. Esteban

More information

Accuracy-based identification of local RNA elements

Accuracy-based identification of local RNA elements Media Engineering and Technology Faculty German University in Cairo Accuracy-based identification of local RNA elements Bachelor Thesis Author: Supervisors: Essam A. Moaty A. Hady Prof. Dr. Rolf Backofen

More information

Price of anarchy in auctions & the smoothness framework. Faidra Monachou Algorithmic Game Theory 2016 CoReLab, NTUA

Price of anarchy in auctions & the smoothness framework. Faidra Monachou Algorithmic Game Theory 2016 CoReLab, NTUA Price of anarchy in auctions & the smoothness framework Faidra Monachou Algorithmic Game Theory 2016 CoReLab, NTUA Introduction: The price of anarchy in auctions COMPLETE INFORMATION GAMES Example: Chicken

More information

Collusive Behavior in a Network Constrained Market

Collusive Behavior in a Network Constrained Market Collusive Behavior in a Network Constrained Market Karla Atkins Jiangzhuo Chen V.S. Anil Kumar Matthew Macauley Achla Marathe 1 Introduction Market power is the ability of a firm to raise the of a product

More information

Priority-Driven Scheduling of Periodic Tasks. Why Focus on Uniprocessor Scheduling?

Priority-Driven Scheduling of Periodic Tasks. Why Focus on Uniprocessor Scheduling? Priority-Driven Scheduling of Periodic asks Priority-driven vs. clock-driven scheduling: clock-driven: cyclic schedule executive processor tasks a priori! priority-driven: priority queue processor tasks

More information

Supply Chain Supernetworks and Environmental Criteria

Supply Chain Supernetworks and Environmental Criteria Supply Chain Supernetworks and Environmental Criteria Anna Nagurney and Fuminori Toyasaki Department of Finance and Operations Management Isenberg School of Management University of Massachusetts Amherst,

More information

Pricing in Dynamic Advance Reservation Games

Pricing in Dynamic Advance Reservation Games Pricing in Dynamic Advance Reservation Games Eran Simhon, Carrie Cramer, Zachary Lister and David Starobinski College of Engineering, Boston University, Boston, MA 02215 Abstract We analyze the dynamics

More information

LECTURE 13 THE NEOCLASSICAL OR WALRASIAN EQUILIBRIUM INTRODUCTION

LECTURE 13 THE NEOCLASSICAL OR WALRASIAN EQUILIBRIUM INTRODUCTION LECTURE 13 THE NEOCLASSICAL OR WALRASIAN EQUILIBRIUM INTRODUCTION JACOB T. SCHWARTZ EDITED BY KENNETH R. DRIESSEL Abstract. Our model is like that of Arrow and Debreu, but with linear Leontief-like features,

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Functional Genomics: Microarray Data Analysis Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Outline Introduction Working with microarray data Normalization Analysis

More information

Opinion dynamics: The effect of the number of peers met at once

Opinion dynamics: The effect of the number of peers met at once Diemo Urbig, Jan Lorenz, Heiko Herzberg (28) Opinion dynamics: The effect of the number of peers met at once Journal of Artificial Societies and Social Simulation vol., no. 2 4

More information

Optimal Dynamic Pricing of Perishable Products Considering Quantity Discount Policy

Optimal Dynamic Pricing of Perishable Products Considering Quantity Discount Policy Journal of Information & Computational Science 10:14 (013) 4495 4504 September 0, 013 Available at http://www.joics.com Optimal Dynamic Pricing of Perishable Products Considering Quantity Discount Policy

More information

DFT based DNA Splicing Algorithms for Prediction of Protein Coding Regions

DFT based DNA Splicing Algorithms for Prediction of Protein Coding Regions DFT based DNA Splicing Algorithms for Prediction of Protein Coding Regions Suprakash Datta, Amir Asif Department of Computer Science and Engineering York University, Toronto, Canada datta, asif @cs.yorku.ca

More information

Worker Skill Estimation from Crowdsourced Mutual Assessments

Worker Skill Estimation from Crowdsourced Mutual Assessments Worker Skill Estimation from Crowdsourced Mutual Assessments Shuwei Qiang The George Washington University Amrinder Arora BizMerlin Current approaches for estimating skill levels of workforce either do

More information

Analytic Preliminaries for Social Acceptability of Legal Norms &

Analytic Preliminaries for Social Acceptability of Legal Norms & Analytic Preliminaries for Social Acceptability of John Asker February 7, 2013 Yale Law School Roadmap Build on case studies of environments where extra-legal norms are at least as important as formal

More information

Using the FLQ formula in estimating interregional output multipliers

Using the FLQ formula in estimating interregional output multipliers Using the FLQ formula in estimating interregional output multipliers 9. Input-Output Workshop, Bremen 15.03.2018 M. Jahn 1, T. Tohmo 2, A.T. Flegg 3 1 Hamburg Institute of International Economics, 2 University

More information

THE NORMAL CURVE AND SAMPLES:

THE NORMAL CURVE AND SAMPLES: -69- &KDSWHU THE NORMAL CURVE AND SAMPLES: SAMPLING DISTRIBUTIONS A picture of an ideal normal distribution is shown below. The horizontal axis is calibrated in z-scores in terms of standard deviation

More information

Single-cell sequencing

Single-cell sequencing Single-cell sequencing Harri Lähdesmäki Department of Computer Science Aalto University December 5, 2017 Contents Background & Motivation Single cell sequencing technologies Single cell sequencing data

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu Web advertising We discussed how to match advertisers to queries in real- time But we did not discuss how to

More information