Covariance Matrices & All-pairs Similarity
|
|
- Nathan Sanders
- 5 years ago
- Views:
Transcription
1 Matrices & Similarity April 2015, Stanford DAO (Stanford) April / 34
2 Notation for matrix A Given m n matrix A, with m n. a 1,1 a 1,2 a 1,n a 2,1 a 2,2 a 2,n A = a m,1 a m,2 a m,n A is tall and skinny, example values m = 10 12, n = {10 4, 10 6 }. A has sparse rows, each row has at most L nonzeros. A is stored across hundreds of machines and cannot be streamed through a single machine. (Stanford) April / 34
3 Computing A T A We compute A T A. A T A is n n, considerably smaller than A. A T A is dense. Holds dot products between all pairs of columns of A. (Stanford) April / 34
4 Guarantees There is a knob γ which can be turned to preserve similarities and singular values. Paying O(nLγ) communication cost and O(γ) computation cost. With a low setting of γ, preserve similar entries of A T A (via Cosine, Dice, Overlap, and Jaccard ). With a high setting of γ, preserve singular values of A T A. (Stanford) April / 34
5 Computing All Pairs of Cosine Similarities We have to find dot products between all pairs of columns of A We prove results for general matrices, but can do better for those entries with cos(i, j) s Cosine : a widely used definition for " between two vectors cos(i, j) = ct i c j c i c j c i is the i th column of A (Stanford) April / 34
6 Example matrix Rows: users. Columns: movies. (Stanford) April / 34
7 Distributed Computing Environment With such large datasets, we must use many machines. Algorithm code available in and Scalding. Described with Maps and Reduces so that the framework takes care of distributing the computation. (Stanford) April / 34
8 Naive Implementation 1 Given row r i, Map with NaiveMapper (Algorithm 1) 2 Reduce using the NaiveReducer (Algorithm 2) Algorithm 1 NaiveMapper(r i ) for all pairs (a ij, a ik ) in r i do Emit ((j, k) a ij a ik ) end for Algorithm 2 NaiveReducer((i, j), v 1,..., v R ) output c T i c j R i=1 v i (Stanford) April / 34
9 for Very easy analysis 1) Shuffle size: O(mL 2 ) 2) Largest reduce-key: O(m) Both depend on m, the larger dimension, and are intractable for m = 10 12, L = 100. We ll bring both down via clever sampling Assuming column norms are known or estimates available (Stanford) April / 34
10 Dimension Independent Matrix Square using MapReduce Algorithm 3 Mapper(r i ) for all pairs (a ij, a ik ) in r i do With probability min emit ((j, k) a ij a ik ) end for ( 1, γ 1 c j c k ) Algorithm 4 Reducer((i, j), v 1,..., v R ) γ if c i c j > 1 then output b ij 1 R c i c j i=1 v i else output b ij 1 R γ i=1 v i end if (Stanford) April / 34
11 for The algorithm outputs b ij, which is a matrix of cosine similarities, call it B. Four things to prove: 1 Shuffle size: O(nLγ) 2 Largest reduce-key: O(γ) 3 The sampling scheme preserves similarities when γ = Ω(log(n)/s) 4 The sampling scheme preserves singular values when γ = Ω(n/ɛ 2 ) (Stanford) April / 34
12 Shuffle size for Theorem For {0, 1} matrices, the expected shuffle size for Mapper is O(nLγ). Proof. The expected contribution from each pair of columns will constitute the shuffle size: n n i=1 j=i+1 #(c i,c j ) k=1 Pr[Emit(c i, c j )] = n n #(c i, c j )Pr[Emit(c i, c j )] i=1 j=i+1 (Stanford) April / 34
13 Shuffle size for Proof. n n #(c i, c j ) γ #(ci ) #(c j ) i=1 j=i+1 (Stanford) April / 34
14 Shuffle size for Proof. n n #(c i, c j ) γ #(ci ) #(c j ) (by AM-GM) γ 2 i=1 j=i+1 n n 1 #(c i, c j )( #(c i ) + 1 #(c j ) ) i=1 j=i+1 (Stanford) April / 34
15 Shuffle size for Proof. n (by AM-GM) γ 2 n i=1 j=i+1 γ n #(c i, c j ) γ #(ci ) #(c j ) n i=1 j=i+1 n i=1 1 #(c i ) 1 #(c i, c j )( #(c i ) + 1 #(c j ) ) n #(c i, c j ) j=1 (Stanford) April / 34
16 Shuffle size for Proof. n (by AM-GM) γ 2 n i=1 j=i+1 γ n #(c i, c j ) γ #(ci ) #(c j ) n i=1 j=i+1 n i=1 1 #(c i ) 1 #(c i, c j )( #(c i ) + 1 #(c j ) ) n #(c i, c j ) j=1 γ n i=1 1 #(c i ) L#(c i) = γln (Stanford) April / 34
17 Shuffle size for O(nLγ) has no dependence on the dimension m, this is the heart of. Happens because higher magnitude columns are sampled with lower probability: 1 γ c 1 c 2 (Stanford) April / 34
18 Shuffle size for For matrices with real entries, we can still get a bound Let H be the smallest nonzero entry in magnitude, after all entries of A have been scaled to be in [ 1, 1] E.g. for {0, 1} matrices, we have H = 1 Shuffle size is bounded by O(nLγ/H 2 ) (Stanford) April / 34
19 Largest reduce key for Each reduce key receives at most γ values (the oversampling parameter) Immediately get that reduce-key complexity is O(γ) Also independent of dimension m. Happens because high magnitude columns are sampled with lower probability. (Stanford) April / 34
20 Correctness Since higher magnitude columns are sampled with lower probability, are we guaranteed to obtain correct results w.h.p.? Yes. By setting γ correctly. Preserve similarities when γ = Ω(log(n)/s) Preserve singular values when γ = Ω(n/ɛ 2 ) (Stanford) April / 34
21 Correctness Theorem Let A be an m n tall and skinny (m > n) matrix. If γ = Ω(n/ɛ 2 ) and D a diagonal matrix with entries d ii = c i, then the matrix B output by satisfies, with probability at least 1/2. DBD A T A 2 A T A 2 ɛ Relative error guaranteed to be low with constant probability. (Stanford) April / 34
22 Proof Uses Latala s theorem, bounds 2nd and 4th central moments of entries of B. Really need extra power of moments. (Stanford) April / 34
23 Latala s Theorem Theorem (Latala s theorem). Let X be a random matrix whose entries x ij are independent centered random variables with finite fourth moment. Denoting X 2 as the matrix spectral norm, we have E X 2 C[max i j E x 2 ij 1/2 + max j 1/4 ( i E x 2 ij ) 1/2 + i,j E x 4 ij ]. (Stanford) April / 34
24 Proof Prove two things E[(b ij Eb ij ) 2 ] 1 γ (easy) E[(b ij Eb ij ) 4 ] 2 (not easy) γ 2 Details in paper. (Stanford) April / 34
25 Correctness Theorem For any two columns c i and c j having cos(c i, c j ) s, let B be the output of with entries b ij = 1 γ m k=1 X ijk with X ijk as the indicator for the k th coin in the call to Mapper. Now if γ = Ω(α/s), then we have, and ] ( Pr [ c i c j b ij > (1 + δ)[a T e δ A] ij (1 + δ) (1+δ) ] Pr [ c i c j b i,j < (1 δ)[a T A] ij < exp( αδ 2 /2) ) α Relative error guaranteed to be low with high probability. (Stanford) April / 34
26 Correctness Proof. In the paper Uses standard concentration inequality for sums of indicator random variables. Ends up requiring that the oversampling parameter γ be set to γ = log(n 2 )/s = 2 log(n)/s. (Stanford) April / 34
27 Observations helpful when there are some popular columns e.g. The Netflix Matrix (some columns way more popular than others) Power-law columns are effectively neutralized (Stanford) April / 34
28 In practice Forget about theoretical settings for γ Crank up γ until application works Estimates for c i can be used, expectations still hold, but concentration isn t guaranteed If using for singular values, watch for ill-conditioned matrices (Stanford) April / 34
29 Large scale production live at twitter.com (Stanford) April / 34
30 Figure : Y-axis ranges from 0 to 100s of terabytes (Stanford) April / 34
31 Implementation Figure : Widely distributed with as of version 1.2 (Stanford) April / 34
32 Other Similarity Measures Picking out similar columns work for some other measures. Similarity Definition Shuffle Size Reduce-key size #(x,y) Cosine O(nL log(n)/s) O(log(n)/s) #(x) #(y) Jaccard Overlap Dice #(x,y) #(x)+#(y) #(x,y) O((n/s) log(n/s)) O(log(n/s)/s) #(x,y) min(#(x),#(y)) O(nL log(n)/s) O(log(n)/s) 2#(x,y) #(x)+#(y) O(nL log(n)/s) O(log(n)/s) Table : All sizes are independent of m, the dimension. (Stanford) April / 34
33 Locality Sensitive Hashing MinHash from the Locality-Sensitive-Hashing family can have its vanilla implementation greatly improved by. Another set of theorems for shuffle size and correctness in DISCO paper. stanford.edu/~rezab/papers/disco.pdf (Stanford) April / 34
34 Conclusion Consider if you ever need to compute A T A for large sparse A Many more experiments and results in paper at stanford.edu/~rezab (Stanford) April / 34
35 Combiners All bounds are without combining: can only get better with combining For similarities, (without combiners) beats naive with combining outright For singular values, (without combiners) beats naive with combining if the number of machines is large (e.g. 1000) with combining empirically beats naive with combining Difficult to analyze combiners since they happen at many levels. Optimizations break models. with combiners is best of both. (Stanford) April / 34
36 Combiners With k machines shuffle with combiner: O(min(nLγ, kn 2 )) reduce-key with combiner: O(min(γ, k)) Naive shuffle with combiner: O(kn 2 ) Naive reduce-key with combiner: O(k) with combiners is best of both. (Stanford) April / 34
37 1 0.8 DISCO Cosine shuffle size vs accuracy tradeoff DISCO Shuffle / Naive Shuffle avg relative err DISCO Shuffle / Naive Shuffle avg relative err log(p / ε) Figure : As γ = p/ɛ increases, shuffle size increases and error decreases. There is no thresholding for highly similar pairs here. (Stanford) April / 34
A Dynamics for Advertising on Networks. Atefeh Mohammadi Samane Malmir. Spring 1397
A Dynamics for Advertising on Networks Atefeh Mohammadi Samane Malmir Spring 1397 Outline Introduction Related work Contribution Model Theoretical Result Empirical Result Conclusion What is the problem?
More informationFrom Ordinal Ranking to Binary Classification
From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at CS Department, National Tsing-Hua University March 27, 2008 Benefited from
More informationExploring Similarities of Conserved Domains/Motifs
Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;
More informationA Spectral Approach to Collaborative Ranking
A Spectral Approach to Collaborative Ranking Joachim Giesen Max-Planck Institut für Informatik Stuhlsatzenhausweg 85 D-66123 Saarbrücken jgiesen@mpi-inf.mpg.de Dieter Mitsche Department of Computer Science
More informationFrom Ordinal Ranking to Binary Classification
From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at Caltech CS/IST Lunch Bunch March 4, 2008 Benefited from joint work with Dr.
More informationFrom Ordinal Ranking to Binary Classification
From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at CS Department, National Chiao Tung University March 26, 2008 Benefited from
More informationAutomatic Ranking by Extended Binary Classification
Automatic Ranking by Extended Binary Classification Hsuan-Tien Lin Joint work with Ling Li (ALT 06, NIPS 06) Learning Systems Group, California Institute of Technology Talk at Institute of Information
More informationUncovering the Small Community Structure in Large Networks: A Local Spectral Approach
Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach Yixuan Li 1, Kun He 2, David Bindel 1 and John E. Hopcroft 1 1 Cornell University, USA 2 Huazhong University of Science
More informationChapter 5 Evaluating Classification & Predictive Performance
Chapter 5 Evaluating Classification & Predictive Performance Data Mining for Business Intelligence Shmueli, Patel & Bruce Galit Shmueli and Peter Bruce 2010 Why Evaluate? Multiple methods are available
More informationChapter 9 Assignment (due Wednesday, August 9)
Math 146, Summer 2017 Instructor Linda C. Stephenson (due Wednesday, August 9) The purpose of the assignment is to find confidence intervals to predict the proportion of a population. The population in
More informationPredicting user rating for Yelp businesses leveraging user similarity
Predicting user rating for Yelp businesses leveraging user similarity Kritika Singh kritika@eng.ucsd.edu Abstract Users visit a Yelp business, such as a restaurant, based on its overall rating and often
More informationPredicting Yelp Ratings From Business and User Characteristics
Predicting Yelp Ratings From Business and User Characteristics Jeff Han Justin Kuang Derek Lim Stanford University jeffhan@stanford.edu kuangj@stanford.edu limderek@stanford.edu I. Abstract With online
More informationFrom Ordinal Ranking to Binary Classification
From Ordinal Ranking to Binary Classification Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk at Dept. of CSIE, National Taiwan University March 21, 2008 Benefited from joint
More informationSegmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches
Segmenting Customer Bases in Personalization Applications Using Direct Grouping and Micro-Targeting Approaches Alexander Tuzhilin Stern School of Business New York University (joint work with Tianyi Jiang)
More informationHybridRank: Ranking in the Twitter Hybrid Networks
HybridRank: Ranking in the Twitter Hybrid Networks Jianyu Li Department of Computer Science University of Maryland, College Park jli@cs.umd.edu ABSTRACT User influence in social media may depend on multiple
More informationData Mining. CS57300 Purdue University. March 27, 2018
Data Mining CS57300 Purdue University March 27, 2018 1 Recap last class So far we have seen how to: Test a hypothesis in batches (A/B testing) Test multiple hypotheses (Paul the Octopus-style) 2 The New
More informationCS269I: Incentives in Computer Science Lecture #5: Market-Clearing Prices
CS269I: Incentives in Computer Science Lecture #5: Market-Clearing Prices Tim Roughgarden October 8, 208 Understanding Your Market Markets come in different flavors. Here are three questions worth asking
More informationOn utility of temporal embeddings for skill matching. Manisha Verma, PhD student, UCL Nathan Francis, NJFSearch
On utility of temporal embeddings for skill matching Manisha Verma, PhD student, UCL Nathan Francis, NJFSearch Skill Trend Importance 1. Constant evolution of labor market yields differences in importance
More informationChapter 9: Markov Chain
Chapter 9: Markov Chain Section 9.1: Transition Matrices In Section 4.4, Bernoulli Trails: The probability of each outcome is independent of the outcome of any previous experiments and the probability
More informationModeling of competition in revenue management Petr Fiala 1
Modeling of competition in revenue management Petr Fiala 1 Abstract. Revenue management (RM) is the art and science of predicting consumer behavior and optimizing price and product availability to maximize
More informationBilateral and Multilateral Exchanges for Peer-Assisted Content Distribution
1290 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 19, NO. 5, OCTOBER 2011 Bilateral and Multilateral Exchanges for Peer-Assisted Content Distribution Christina Aperjis, Ramesh Johari, Member, IEEE, and Michael
More informationPredict Commercial Promoted Contents Will Be Clicked By User
Predict Commercial Promoted Contents Will Be Clicked By User Gary(Xinran) Guo garyguo@stanford.edu SUNetID: garyguo Stanford University 1. Introduction As e-commerce, social media grows rapidly, advertisements
More informationHomophily and Influence in Social Networks
Homophily and Influence in Social Networks Nicola Barbieri nicolabarbieri1@gmail.com References: Maximizing the Spread of Influence through a Social Network, Kempe et Al 2003 Influence and Correlation
More informationProblem Score Problem Score 1 /12 6 /12 2 /12 7 /12 3 /12 8 /12 4 /12 9 /12 5 /12 10 /12 Total /120
EE103/CME103: Introduction to Matrix Methods October 22 2015 S. Boyd Midterm Exam This is an in-class 80 minute midterm. You may not use any books, notes, or computer programs (e.g., Julia). Throughout
More informationSequencing Problem. Dr.S.S.Kulkarni
Sequencing Problem The selection of an appropriate order for a series of jobs to be done on a finite number of service facilities, in some preassigned order, is called sequencing. The general sequencing
More informationWE consider the general ranking problem, where a computer
5140 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 11, NOVEMBER 2008 Statistical Analysis of Bayes Optimal Subset Ranking David Cossock and Tong Zhang Abstract The ranking problem has become increasingly
More informationA Study of Compact Reserve Pricing Languages
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) A Study of Compact Reserve Pricing Languages MohammadHossein Bateni, Hossein Esfandiari, Vahab Mirrokni, Saeed Seddighin
More informationNear-Balanced Incomplete Block Designs with An Application to Poster Competitions
Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania
More informationA Particle Swarm Optimization Algorithm for Multi-depot Vehicle Routing problem with Pickup and Delivery Requests
A Particle Swarm Optimization Algorithm for Multi-depot Vehicle Routing problem with Pickup and Delivery Requests Pandhapon Sombuntham and Voratas Kachitvichayanukul Abstract A particle swarm optimization
More informationUse of AHP Method in Efficiency Analysis of Existing Water Treatment Plants
International Journal of Engineering Research and Development ISSN: 78-067X, Volume 1, Issue 7 (June 01), PP.4-51 Use of AHP Method in Efficiency Analysis of Existing Treatment Plants Madhu M. Tomar 1,
More informationCOMP20008 Elements of Data Processing. Data Pre-Processing and Cleaning: Recommender Systems and Missing Values
COMP20008 Elements of Data Processing Data Pre-Processing and Cleaning: Recommender Systems and Missing Values Announcements Week 2 Lab: Unix Thanks for your feedback Answers are available on LMS Work
More informationCHAPTER II SEQUENCING MODELS
CHAPTER II SEQUENCING MODELS The basic models in scheduling due to Johnson (1957) and owing to Maggu & Das (1977) T.P. Singh (1985, 86, 2005, 2006) are explained one by one which form a basis of scheduling
More informationBioinformatics for Biologists
Bioinformatics for Biologists Microarray Data Analysis: Lecture 2. Fran Lewitter, Ph.D. Director Bioinformatics and Research Computing Whitehead Institute Outline Introduction Working with microarray data
More informationThree New Connections Between Complexity Theory and Algorithmic Game Theory. Tim Roughgarden (Stanford)
Three New Connections Between Complexity Theory and Algorithmic Game Theory Tim Roughgarden (Stanford) Three New Connections Between Complexity Theory and Algorithmic Game Theory (case studies in applied
More informationR&D, International Sourcing, and the Joint Impact on Firm Performance April / 25
R&D, International Sourcing, and the Joint Impact on Firm Performance Esther Ann Boler, Andreas Moxnes, and Karen Helene Ulltveit-Moe American Economic Review (2015) Presented by Beatriz González April
More informationCOMP 465: Data Mining Recommender Systems
//0 COMP : Data Mining Recommender Systems Slides Adapted From: www.mmds.org (Mining Massive Datasets) Customer X Buys Metallica CD Buys Megadeth CD Customer Y Does search on Metallica Recommender system
More informationFIRST FUNDAMENTAL THEOREM OF WELFARE ECONOMICS
FIRST FUNDAMENTAL THEOREM OF WELFARE ECONOMICS SICONG SHEN Abstract. Markets are a basic tool for the allocation of goods in a society. In many societies, markets are the dominant mode of economic exchange.
More informationHaplotype Based Association Tests. Biostatistics 666 Lecture 10
Haplotype Based Association Tests Biostatistics 666 Lecture 10 Last Lecture Statistical Haplotyping Methods Clark s greedy algorithm The E-M algorithm Stephens et al. coalescent-based algorithm Hypothesis
More informationLogistics Service Network Design for Time-Critical Delivery
Logistics Service Network Design for Time-Critical Delivery Cynthia Barnhart 1 andsushen 2 1 Massachuestts Institute of Technology, Cambridge, MA 02139 2 FedEx, Memphis, TN 38125 Abstract. Service network
More informationAn Ascending Price Auction for Producer-Consumer Economy
An Ascending Price Auction for Producer-Consumer Economy Debasis Mishra (dmishra@cae.wisc.edu) University of Wisconsin-Madison Rahul Garg (grahul@in.ibm.com) IBM India Research Lab, New Delhi, India, 110016
More informationPrediction of Google Local Users Restaurant ratings
CSE 190 Assignment 2 Report Professor Julian McAuley Page 1 Nov 30, 2015 Prediction of Google Local Users Restaurant ratings Shunxin Lu Muyu Ma Ziran Zhang Xin Chen Abstract Since mobile devices and the
More informationThe Efficient Allocation of Individuals to Positions
The Efficient Allocation of Individuals to Positions by Aanund Hylland and Richard Zeckhauser Presented by Debreu Team: Justina Adamanti, Liz Malm, Yuqing Hu, Krish Ray Hylland and Zeckhauser consider
More informationPredictive Modelling for Customer Targeting A Banking Example
Predictive Modelling for Customer Targeting A Banking Example Pedro Ecija Serrano 11 September 2017 Customer Targeting What is it? Why should I care? How do I do it? 11 September 2017 2 What Is Customer
More informationModules to Whole Food Webs
Modules to Whole Food Webs 1 Even Simplified Whole Food Webs are extremely complex. How is this complexity maintained from year to year? 2 Stability of Food Web Modules Before tackling food webs of hundreds
More informationConvex and Non-Convex Classification of S&P 500 Stocks
Georgia Institute of Technology 4133 Advanced Optimization Convex and Non-Convex Classification of S&P 500 Stocks Matt Faulkner Chris Fu James Moriarty Masud Parvez Mario Wijaya coached by Dr. Guanghui
More informationRobust Performance of Complex Network Infrastructures
Robust Performance of Complex Network Infrastructures Agostino Capponi Industrial Engineering and Operations Research Department Columbia University ac3827@columbia.edu GRAPHS/SIMPLEX Workshop Data, Algorithms,
More informationTOWARD MORE DIVERSE RECOMMENDATIONS: ITEM RE-RANKING METHODS FOR RECOMMENDER SYSTEMS
TOWARD MORE DIVERSE RECOMMENDATIONS: ITEM RE-RANKING METHODS FOR RECOMMENDER SYSTEMS Gediminas Adomavicius YoungOk Kwon Department of Information and Decision Sciences Carlson School of Management, University
More informationICS 4U - Team Project. Books, Ratings, Suggestions and More!
ICS 4U - Team Project Books, Ratings, Suggestions and More! Note to Printed Slides If you are reading this only on the materials provided by the Summer Institute and did not see the actual presentation,
More informationMultiple Regression. Dr. Tom Pierce Department of Psychology Radford University
Multiple Regression Dr. Tom Pierce Department of Psychology Radford University In the previous chapter we talked about regression as a technique for using a person s score on one variable to make a best
More informationCost-Aware Compressive Sensing for Networked Sensing Systems
Cost-Aware Compressive Sensing for Networked Sensing Systems Liwen Xu, Xiaohong Hao, Nicholas D. Lane, Xin Liu, Thomas Moscibroda Tsinghua University, Microsoft Research, U.C. Davis ABSTRACT Compressive
More informationCost-Aware Compressive Sensing for Networked Sensing Systems
Cost-Aware Compressive Sensing for Networked Sensing Systems Liwen Xu, Xiaohong Hao, Nicholas D. Lane, Xin Liu, Thomas Moscibroda Tsinghua University, Microsoft Research, U.C. Davis ABSTRACT Compressive
More informationYou are Who You Know and How You Behave: Attribute Inference Attacks via Users Social Friends and Behaviors
You are Who You Know and How You Behave: Attribute Inference via Users Social Friends and Behaviors Neil Zhenqiang Gong ECE Department, Iowa State University neilgong@iastate.edu Bin Liu MSIS Department,
More informationOn-Line Restricted Assignment of Temporary Tasks with Unknown Durations
On-Line Restricted Assignment of Temporary Tasks with Unknown Durations Amitai Armon Yossi Azar Leah Epstein Oded Regev Keywords: on-line algorithms, competitiveness, temporary tasks Abstract We consider
More informationLecture 6: Decision Tree, Random Forest, and Boosting
Lecture 6: Decision Tree, Random Forest, and Boosting Tuo Zhao Schools of ISyE and CSE, Georgia Tech CS764/ISYE/CSE 674: Machine Learning/Computational Data Analysis Tree? Tuo Zhao Lecture 6: Decision
More informationCSE 473: Artificial Intelligence. Outline
CSE 473: Artificial Intelligence Expecti-Max Search for Stochastic Games Dan Weld Based on slides from Mausam, Andrey Kolobov, Dan Klein, Stuart Russell, Pieter Abbeel, (best illustrations from ai.berkeley.edu)
More informationA logistic regression model for Semantic Web service matchmaking
. BRIEF REPORT. SCIENCE CHINA Information Sciences July 2012 Vol. 55 No. 7: 1715 1720 doi: 10.1007/s11432-012-4591-x A logistic regression model for Semantic Web service matchmaking WEI DengPing 1*, WANG
More informationInfluence Maximization on Social Graphs. Yu-Ting Wen
Influence Maximization on Social Graphs Yu-Ting Wen 05-25-2018 Outline Background Models of influence Linear Threshold Independent Cascade Influence maximization problem Proof of performance bound Compute
More informationRestaurant Recommendation for Facebook Users
Restaurant Recommendation for Facebook Users Qiaosha Han Vivian Lin Wenqing Dai Computer Science Computer Science Computer Science Stanford University Stanford University Stanford University qiaoshah@stanford.edu
More informationTopics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2. B. Rosner, 5/09/17
Topics in Biostatistics Categorical Data Analysis and Logistic Regression, part 2 B. Rosner, 5/09/17 1 Outline 1. Testing for effect modification in logistic regression analyses 2. Conditional logistic
More information3 Ways to Improve Your Targeted Marketing with Analytics
3 Ways to Improve Your Targeted Marketing with Analytics Introduction Targeted marketing is a simple concept, but a key element in a marketing strategy. The goal is to identify the potential customers
More informationTHE internet has changed different aspects of our lives, job
1 Recommender Systems for IT Recruitment João Almeida and Luís Custódio Abstract Recruitment processes have increasingly become dependent on the internet. Companies post job opportunities on their websites
More informationModelling Load Retrievals in Puzzle-based Storage Systems
Modelling Load Retrievals in Puzzle-based Storage Systems Masoud Mirzaei a, Nima Zaerpour b, René de Koster a a Rotterdam School of Management, Erasmus University, the Netherlands b Faculty of Economics
More information2 Maria Carolina Monard and Gustavo E. A. P. A. Batista
Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of
More informationPredicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest
Predicting Reddit Post Popularity Via Initial Commentary by Andrei Terentiev and Alanna Tempest 1. Introduction Reddit is a social media website where users submit content to a public forum, and other
More informationUNED: Evaluating Text Similarity Measures without Human Assessments
UNED: Evaluating Text Similarity Measures without Human Assessments Enrique Amigó Julio Gonzalo Jesús Giménez Felisa Verdejo UNED, Madrid {enrique,julio,felisa}@lsi.uned.es Google, Dublin jesgim@gmail.com
More informationCMSC 474, Introduction to Game Theory Analyzing Normal-Form Games
CMSC 474, Introduction to Game Theory Analyzing Normal-Form Games Mohammad T. Hajiaghayi University of Maryland Some Comments about Normal-Form Games Only two kinds of strategies in the normal-form game
More informationThe Core Pareto Optimality and Social Welfare Maximizationty
The Core Pareto Optimality and Social Welfare Maximizationty Econ 2100 Fall 2017 Lecture 14, October 19 Outline 1 The Core 2 Core, indiviual rationality, and Pareto effi ciency 3 Social welfare function
More informationTRANSPORTATION PROBLEM AND VARIANTS
TRANSPORTATION PROBLEM AND VARIANTS Introduction to Lecture T: Welcome to the next exercise. I hope you enjoyed the previous exercise. S: Sure I did. It is good to learn new concepts. I am beginning to
More informationA MATHEMATICAL MODEL OF PAY-FOR- PERFORMANCE FOR A HIGHER EDUCATION INSTITUTION
Page 13 A MATHEMATICAL MODEL OF PAY-FOR- PERFORMANCE FOR A HIGHER EDUCATION INSTITUTION Matthew Hathorn, Weill Cornell Medical College John Hathorn, Metropolitan State College of Denver Lesley Hathorn,
More informationCSCI 3210: Computational Game Theory. Inefficiency of Equilibria & Routing Games Ref: Ch 17, 18 [AGT]
CSCI 3210: Computational Game Theory Inefficiency of Equilibria & Routing Games Ref: Ch 17, 18 [AGT] Mohammad T. Irfan Email: mirfan@bowdoin.edu Web: www.bowdoin.edu/~mirfan Split or steal game u NE outcome
More informationWeka Evaluation: Assessing the performance
Weka Evaluation: Assessing the performance Lab3 (in- class): 21 NOV 2016, 13:00-15:00, CHOMSKY ACKNOWLEDGEMENTS: INFORMATION, EXAMPLES AND TASKS IN THIS LAB COME FROM SEVERAL WEB SOURCES. Learning objectives
More informationDynamic Programming Algorithms
Dynamic Programming Algorithms Sequence alignments, scores, and significance Lucy Skrabanek ICB, WMC February 7, 212 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationCS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow
CS229 Project Report Using Newspaper Sentiments to Predict Stock Movements Hao Yee Chan Anthony Chow haoyeec@stanford.edu ac1408@stanford.edu Problem Statement It is often said that stock prices are determined
More information(32) where for are sub-matrices and, for are sub-vectors. Such a system of equations can be solved by solving the following sub-problems
Module 4 : Solving Linear Algebraic Equations Section 4 : Direct Methods for Solving Sparse Linear Systems 4 Direct Methods for Solving Sparse Linear Systems A system of Linear equations --------(32) is
More informationIntroduction to Recommendation Engines
Introduction to Recommendation Engines A guide to algorithmically predicting what your customers want and when. By Tuck Ngun, PhD Introduction Recommendation engines have become a popular solution for
More informationON QUADRATIC CORE PROJECTION PAYMENT RULES FOR COMBINATORIAL AUCTIONS YU SU THESIS
ON QUADRATIC CORE PROJECTION PAYMENT RULES FOR COMBINATORIAL AUCTIONS BY YU SU THESIS Submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Electrical and Computer
More informationCartelization of Cross-Holdings
Cartelization of Cross-Holdings Cheng-Zhong Qin, Shengping Zhang and Dandan Zhu May 25, 2012 Abstract Cross-holdings between firms cause their profits to be mutually dependent. This paper analyzes a model
More informationECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014
ECONOMICS AND ECONOMIC METHODS PRELIM EXAM Statistics and Econometrics May 2014 Instructions: Answer all five (5) questions. Point totals for each question are given in parentheses. The parts within each
More informationSUPPLEMENT TO THE VALUE OF FREE WATER: ANALYZING SOUTH AFRICA S FREE BASIC WATER POLICY (Econometrica, Vol. 83, No. 5, September 2015, )
Econometrica Supplementary Material SUPPLEMENT TO THE VALUE OF FREE WATER: ANALYZING SOUTH AFRICA S FREE BASIC WATER POLICY (Econometrica, Vol. 83, No. 5, September 2015, 1913 1961) BY ANDREA SZABÓ This
More informationBIOSTATISTICAL METHODS
BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH Combination Chemotherapy: DESIGNING CLINICAL RESEARCH Use of a combination of two therapeutics, especially drug-drug combination (called combination
More informationSubmitted to the Grand Rapids Community College Provost and AGC Executive Committee.
Oscar Neal Fall 2012 Sabbatical Report Submitted to the Grand Rapids Community College Provost and AGC Executive Committee. The main objective of my sabbatical was to complete my dissertation proposal
More informationOn-line Multi-threaded Scheduling
Departamento de Computación Facultad de Ciencias Exactas y Naturales Universidad de Buenos Aires Tesis de Licenciatura On-line Multi-threaded Scheduling por Marcelo Mydlarz L.U.: 290/93 Director Dr. Esteban
More informationAccuracy-based identification of local RNA elements
Media Engineering and Technology Faculty German University in Cairo Accuracy-based identification of local RNA elements Bachelor Thesis Author: Supervisors: Essam A. Moaty A. Hady Prof. Dr. Rolf Backofen
More informationPrice of anarchy in auctions & the smoothness framework. Faidra Monachou Algorithmic Game Theory 2016 CoReLab, NTUA
Price of anarchy in auctions & the smoothness framework Faidra Monachou Algorithmic Game Theory 2016 CoReLab, NTUA Introduction: The price of anarchy in auctions COMPLETE INFORMATION GAMES Example: Chicken
More informationCollusive Behavior in a Network Constrained Market
Collusive Behavior in a Network Constrained Market Karla Atkins Jiangzhuo Chen V.S. Anil Kumar Matthew Macauley Achla Marathe 1 Introduction Market power is the ability of a firm to raise the of a product
More informationPriority-Driven Scheduling of Periodic Tasks. Why Focus on Uniprocessor Scheduling?
Priority-Driven Scheduling of Periodic asks Priority-driven vs. clock-driven scheduling: clock-driven: cyclic schedule executive processor tasks a priori! priority-driven: priority queue processor tasks
More informationSupply Chain Supernetworks and Environmental Criteria
Supply Chain Supernetworks and Environmental Criteria Anna Nagurney and Fuminori Toyasaki Department of Finance and Operations Management Isenberg School of Management University of Massachusetts Amherst,
More informationPricing in Dynamic Advance Reservation Games
Pricing in Dynamic Advance Reservation Games Eran Simhon, Carrie Cramer, Zachary Lister and David Starobinski College of Engineering, Boston University, Boston, MA 02215 Abstract We analyze the dynamics
More informationLECTURE 13 THE NEOCLASSICAL OR WALRASIAN EQUILIBRIUM INTRODUCTION
LECTURE 13 THE NEOCLASSICAL OR WALRASIAN EQUILIBRIUM INTRODUCTION JACOB T. SCHWARTZ EDITED BY KENNETH R. DRIESSEL Abstract. Our model is like that of Arrow and Debreu, but with linear Leontief-like features,
More informationBioinformatics for Biologists
Bioinformatics for Biologists Functional Genomics: Microarray Data Analysis Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Outline Introduction Working with microarray data Normalization Analysis
More informationOpinion dynamics: The effect of the number of peers met at once
Diemo Urbig, Jan Lorenz, Heiko Herzberg (28) Opinion dynamics: The effect of the number of peers met at once Journal of Artificial Societies and Social Simulation vol., no. 2 4
More informationOptimal Dynamic Pricing of Perishable Products Considering Quantity Discount Policy
Journal of Information & Computational Science 10:14 (013) 4495 4504 September 0, 013 Available at http://www.joics.com Optimal Dynamic Pricing of Perishable Products Considering Quantity Discount Policy
More informationDFT based DNA Splicing Algorithms for Prediction of Protein Coding Regions
DFT based DNA Splicing Algorithms for Prediction of Protein Coding Regions Suprakash Datta, Amir Asif Department of Computer Science and Engineering York University, Toronto, Canada datta, asif @cs.yorku.ca
More informationWorker Skill Estimation from Crowdsourced Mutual Assessments
Worker Skill Estimation from Crowdsourced Mutual Assessments Shuwei Qiang The George Washington University Amrinder Arora BizMerlin Current approaches for estimating skill levels of workforce either do
More informationAnalytic Preliminaries for Social Acceptability of Legal Norms &
Analytic Preliminaries for Social Acceptability of John Asker February 7, 2013 Yale Law School Roadmap Build on case studies of environments where extra-legal norms are at least as important as formal
More informationUsing the FLQ formula in estimating interregional output multipliers
Using the FLQ formula in estimating interregional output multipliers 9. Input-Output Workshop, Bremen 15.03.2018 M. Jahn 1, T. Tohmo 2, A.T. Flegg 3 1 Hamburg Institute of International Economics, 2 University
More informationTHE NORMAL CURVE AND SAMPLES:
-69- &KDSWHU THE NORMAL CURVE AND SAMPLES: SAMPLING DISTRIBUTIONS A picture of an ideal normal distribution is shown below. The horizontal axis is calibrated in z-scores in terms of standard deviation
More informationSingle-cell sequencing
Single-cell sequencing Harri Lähdesmäki Department of Computer Science Aalto University December 5, 2017 Contents Background & Motivation Single cell sequencing technologies Single cell sequencing data
More informationCS246: Mining Massive Datasets Jure Leskovec, Stanford University
CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu Web advertising We discussed how to match advertisers to queries in real- time But we did not discuss how to
More information