Constructing the Search Results Page 2016 Machine Learning Summer School Arequipa, Peru Dilan Gorur


2 Overview Players in the search game Utility definition Whole page optimization Counterfactual reasoning Tuning ad auctions 8/04/2016 MLSS

3 The Search Game

4 The Search Game Publisher User Advertiser

5 The Search Game Each player has its own role, goal, and utility. Publisher User Advertiser

6 Roles Users: submit a query to express intent; browse relevant information. Advertisers: provide ads to be matched with queries; bid on query phrases; pay per click (or other action). Publisher: execute the query to retrieve information; run the ad auction; publish the SERP. User Publisher Advertiser

7 Goals Users: navigate; find information; learn. Advertisers: return on investment. Publishers: user retention; revenue; volume growth. User Publisher Advertiser

8 Publisher Utility function User Advertiser A utility function is a mathematical function that assigns a value to all possible choices. It expresses the preferences of the marketplace participants with respect to perceived risk and expected return. Each participant's utility function will be different. A utility function can be composed of several simpler utilities.

9 Publisher Utilities User Advertiser Conflicting or synergistic: Ads offer information. The user clicks, the publisher makes money, the advertiser gets volume. (Irrelevant) ads annoy users; if clicked, the publisher still makes money and the advertiser gets volume. Page real estate is limited, so page components compete for visibility. A higher cost per click is better for the publisher, worse for the advertiser. How to balance?

10 Publisher Formulating Search Game Utility User Advertiser Total utility is a balance of individual utilities: f(PUB_U, USER_U, ADV_U). How to choose a combination function f? How to define PUB_U, USER_U, and ADV_U? Parameterize to express business priorities. Capture the long-term goals of the system. Find a balance between all players.

11 Publisher Per SERP Utility User Advertiser Bottom up: focus on per-SERP quality. Simplify f to be e.g. a convex combination of individual components: α·PUB_U + β·USER_U + γ·ADV_U. No single performance metric reflects the complete picture, therefore a multitude of Key Performance Indicators (KPIs) help define utilities: revenue per mille (RPM), click-through rate (CTR), quick-back rate (QBR), coverage (Cov), impression yield (IY), dwell time (DT), cost per click (CPC).
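As a concrete sketch of such a convex combination, a per-SERP utility could be assembled from normalized KPIs. The KPI choices, names, and weights below are illustrative assumptions, not the talk's actual definitions:

```python
# Hypothetical sketch: whole-page utility as a convex combination of
# per-player utilities built from (normalized) KPIs.
def whole_page_utility(kpis, alpha=0.4, beta=0.4, gamma=0.2):
    """kpis: dict of KPI values, each assumed normalized into [0, 1]."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9  # convex combination
    pub_u = kpis["rpm"]                  # publisher: revenue per mille
    user_u = kpis["ctr"] - kpis["qbr"]   # user: clicks minus quick-backs
    adv_u = kpis["iy"] - kpis["cpc"]     # advertiser: yield minus cost
    return alpha * pub_u + beta * user_u + gamma * adv_u

u = whole_page_utility({"rpm": 0.6, "ctr": 0.3, "qbr": 0.1, "iy": 0.5, "cpc": 0.2})
```

The α, β, γ knobs are where the business priorities from the previous slide would enter.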

12 Publisher Per SERP Utility User Advertiser Monitor KPIs over time to ensure optimal long-term utility: session-based KPIs, and KPIs on long-range windows. Pick web and vertical search results based on IR models. Optimize which ads to show based on an auction mechanism. Construct each SERP to optimize whole-page utility.

13 Whole Page Optimization

14 Whole Page Optimization WPO is the task of determining the search results page (SERP) content and layout given vertical results blocks. SERPs should satisfy the query intent of users, retain users over time so the publisher has query volume, and allow advertisers to reach their target audience for a sustainable business. How to trade off between revenue-generating and volume-generating page components?

15 SERP components search query Organic search/web search/algo Sponsored search/ads Text Ads Product Ads Vertical blocks Related searches

16 SERP components search query Organic search/web search/algo Sponsored search/ads Text Ads Product Ads Vertical blocks ml ads image module knowledge module map module sidebar ads

17 Scanning pattern The golden triangle scanning pattern Top-down Left-to-right Motivates top-heavy ranking models Motivates ads placement on top Hotchkiss et al. 2005, Enquiro

18 Triangle shifted Aversion to ads and/or poor ad quality? Scanning pattern still Top-down Left-to-right Buscher et al., SIGIR

19 Triangle disrupted Visually rich content captures attention Attention pattern is not a triangle anymore Browsing may not be top-down ICOMP Study


21 WPO Objective Users see the whole page, not individual sections on the page. Vertical blocks provide visual richness. Ad blocks are getting visually richer. Page components compete for the user's attention. Understand how page layout influences user engagement and satisfaction, which in turn impacts the search volume and revenue potential of the search engine. Go beyond single-column linear ranking of page components. Learn a whole-page utility function that can be optimized to make the best use of the finite page real estate and limited user attention.

22 Modeling layout impact Help answer positioning and component interaction questions. Use click as a proxy for user engagement in this talk. Learn P(x; S): the click probability on component x given page layout S. Use the click probabilities to compute the utility per layout, e.g. α·PUB_U + β·USER_U + γ·ADV_U.

23 Estimating Click Probabilities using Choice Models

24 Estimating click probabilities Modeling the probabilities requires exploratory data on layouts. Click data is biased by the policy in use, with limited variation in observed layouts. Ads, web, and vertical blocks are independently constructed, and blocks are compared using rank scores. ML models on randomized layout data can be used to capture the interaction between the page components and give insights into the attributes that impact the page utility.

25 Choice Models Terminology The goal of choice models is to understand and model the behavioral process that leads to the subject's choice. Data: comparisons of alternatives from a choice set. Marketing: which cell phone to buy. Web: which link to view.

26 Discrete Choice Models Terminology Alternatives: items or courses of action (e.g. taking the bus). Choice set: discrete (finitely many alternatives), exhaustive (all possible alternatives are included), mutually exclusive (choosing one implies not choosing any other). Data: repeated choices from subsets of the choice set; the number of times each alternative is chosen over some others.

27 Bradley-Terry-Luce (a.k.a. Logit) The most widely used choice model. Allows multinomial choice probabilities to be inferred from binomial choice experiments. Starts with the assumption that each alternative has a fixed utility independent of the choice set and arrives at the choice probabilities: P(i; S_m) = U(i) / Σ_{j∈S_m} U(j) = exp(β^T x_i) / Σ_{j∈S_m} exp(β^T x_j)
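A minimal sketch of these BTL/logit choice probabilities; the feature vectors and weights below are made up for illustration:

```python
import math

def btl_probs(X, beta):
    """Bradley-Terry-Luce / logit choice probabilities:
    P(i; S) = exp(beta . x_i) / sum_j exp(beta . x_j)."""
    scores = [math.exp(sum(b * xi for b, xi in zip(beta, x))) for x in X]
    z = sum(scores)
    return [s / z for s in scores]

# Two options with one-hot features; these weights give utilities 2 and 1,
# so the probabilities come out to [2/3, 1/3].
p = btl_probs([[1.0, 0.0], [0.0, 1.0]], [math.log(2.0), 0.0])
```

Note that each option's score depends only on its own features, which is exactly the choice-set independence assumption the next slide questions.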

28 Bradley-Terry-Luce Assumptions Simple scalability: scores of alternatives can be scaled such that the choice probability is a monotone function of the scores of the respective alternatives. Independence from irrelevant alternatives (IIA): the probability of choosing an alternative over another does not depend on the choice set. Simple scalability implies IIA; IIA implies RUM (a class of models from the economics literature). There are cases where both assumptions are violated, e.g. when alternatives share aspects.

29 Elimination by Aspects Takes into account similarities between alternatives. Represents characteristics of the alternatives in terms of their aspects (binary feature vectors). Each aspect has a utility (a positive weight). The EBA choice process: select an aspect at random, with probability proportional to its utility; eliminate all alternatives without that aspect; repeat until a single option remains.
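The elimination process above induces a recursive choice probability. A minimal sketch, using abbreviated aspect names from the worked example later in the talk (the function itself is an illustrative reconstruction, not the talk's code):

```python
def eba_prob(i, S, aspects, w):
    """P(i; S) under Elimination by Aspects.
    S: frozenset of option names; aspects: option -> set of aspect names;
    w: aspect -> positive weight."""
    if len(S) == 1:
        return 1.0 if i in S else 0.0
    all_aspects = set.union(*(aspects[o] for o in S))
    shared = set.intersection(*(aspects[o] for o in S))
    discriminating = all_aspects - shared  # aspects shared by all never discriminate
    total = sum(w[a] for a in discriminating)
    if total == 0:                         # identical options: choose uniformly
        return 1.0 / len(S)
    p = 0.0
    for a in discriminating:
        # picking aspect a eliminates every option that lacks it
        survivors = frozenset(o for o in S if a in aspects[o])
        p += (w[a] / total) * eba_prob(i, survivors, aspects, w)
    return p

aspects = {"image": {"rel", "img", "ml"},
           "QSml": {"rel", "qs", "ml"},
           "QSrr": {"rel", "qs", "rr"}}
w = {"rel": 10.0, "img": 4.0, "ml": 2.0, "qs": 2.0, "rr": 1.0}
p_rr = eba_prob("QSrr", frozenset(aspects), aspects, w)  # 5/27, about 0.185
```

Because the recursion only branches on discriminating aspects, an aspect shared by every remaining option (like relevance here) never affects the outcome.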

30 EBA choice probabilities Paired comparison When presented with two options, the decision process reduces to choosing an option with probability proportional to the summed weights of its distinguishing features: P(i; {i, j}) = Σ_k w_k f_ik (1 − f_jk) / [Σ_k w_k f_ik (1 − f_jk) + Σ_k w_k f_jk (1 − f_ik)] where f_ik is the kth feature of option i and w_k is the weight associated with feature k.

31 Example click probabilities per layout Feature utilities: image = {relevance: 10, image: 4, ML: 2}; QSrr = {relevance: 10, QS: 2, RR: 1}. The shared relevance aspect does not discriminate, so P(QSrr; {image, QSrr}) = (2 + 1) / ((4 + 2) + (2 + 1)) = 3/9 = 1/3. (Features: relevance, image, query suggestion, mainline, right rail.)

32 EBA choice probabilities Multiple comparison More generally, the multiple-choice probability resulting from the recursive elimination process is: P(i; S_m) = Σ_{α ∈ F_Sm \ F°_Sm} [w_α / Σ_{β ∈ F_Sm \ F°_Sm} w_β] · P(i; S_m^α) where P(i; S) is the probability of choosing option i from a set of options S, F_Sm is the set of features that at least one option in set S_m has, F°_Sm is the set of features that are shared by all options in S_m, and S_m^α is the set of options in S_m that have feature α.

33 Example click probabilities per layout Feature utilities: image = {relevance: 10, image: 4, ML: 2}; QSml = {relevance: 10, QS: 2, ML: 2}; QSrr = {relevance: 10, QS: 2, RR: 1}. P_EBA(QSrr; {image, QSrr, QSml}) = 1/9 + (2/9)(1/3) = 5/27 ≈ 0.19.

34 Example click probabilities per layout With the same feature utilities (total utilities: image 16, QSml 14, QSrr 13), P_EBA(QSrr; {image, QSrr, QSml}) = 5/27 ≈ 0.19, while P_BTL(QSrr; {image, QSrr, QSml}) = 13/43 ≈ 0.30.
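The arithmetic behind the two numbers can be checked directly; the weights come from the slide's feature table, and the elimination steps follow the EBA process described earlier:

```python
# Aspect weights from the example: relevance, image, mainline, query
# suggestion, right rail.
w = {"rel": 10, "img": 4, "ml": 2, "qs": 2, "rr": 1}

# BTL: each option's utility is the sum of its aspect weights.
U = {"image": w["rel"] + w["img"] + w["ml"],   # 16
     "QSml":  w["rel"] + w["qs"] + w["ml"],    # 14
     "QSrr":  w["rel"] + w["qs"] + w["rr"]}    # 13
p_btl = U["QSrr"] / sum(U.values())            # 13/43, about 0.30

# EBA: 'rel' is shared by all three options, so it never discriminates.
# First pick among {img: 4, ml: 2, qs: 2, rr: 1} (total weight 9):
#   rr (prob 1/9) -> QSrr wins outright;
#   qs (prob 2/9) -> {QSml, QSrr} remain, then rr vs ml gives QSrr prob 1/3;
#   img and ml eliminate QSrr.
p_eba = 1/9 + (2/9) * (1/3)                    # 5/27, about 0.19
```

The gap between the two (0.30 vs 0.19) is exactly the similarity effect BTL's IIA assumption ignores: the two query-suggestion options share the QS aspect and cannibalize each other under EBA.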

35 BTL = EBA with independent features P(i; S_m) = U(i) / Σ_{j∈S_m} U(j) = Σ_{α∈x_i} u_α / Σ_{β∈S_m} u_β

36 Elimination by Aspects Can cope with choice scenarios where the IIA and simple scalability assumptions do not hold.

37 Elimination by Aspects Independence from Irrelevant Alternatives IIA suggests the probability of choosing the car would drop from 1/2 to 1/3 when a second, essentially identical bus is introduced as the third alternative.

38 Elimination by Aspects Simple Scalability Simple scalability assumes that the choice probabilities change smoothly with the scale (utility) of an alternative. In some situations introducing a small but unique aspect may lead to an immense change in the probabilities.

39 Elimination by Aspects Simple Scalability {Trip to Paris} vs {Trip to Rome}: 50/50. {Trip to Paris} vs {Trip to Paris + free cup of coffee}: 0/100. {Trip to Paris} vs {Trip to Rome} vs {Trip to Paris + free cup of coffee}: 0/50/50.

40 Discrete Choice Models Different approaches Algebraic or absolute theories vs. probabilistic or stochastic theories. Stochastic behavioral models: randomness in the decision rule. Random utility models: randomness in the determination of subjective value. They start from different perspectives but arrive at similar model structures.

41 Multinomial model likelihood X: observed number of clicks on each component i for each layout m. F: the N × K binary feature matrix. w: weight vector. P(i; S_m): probability of a click on component i in layout m. x_im: number of clicks on component i in layout m (Σ_{i∈S_m} x_im = X_m). S_m: the mth layout (the set of components on the SERP and their positions). M: number of different layouts observed.
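The likelihood formula itself did not survive extraction; reconstructed from the definitions above, the multinomial likelihood over observed layouts is presumably:

```latex
P(X \mid F, w) \;=\; \prod_{m=1}^{M} \frac{X_m!}{\prod_{i \in S_m} x_{im}!}
\prod_{i \in S_m} P(i;\, S_m)^{\,x_{im}}
```

with P(i; S_m) given by the EBA (or BTL) choice probabilities as a function of F and w.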

42 Priors for model parameters Weights are assumed to have independent Gamma priors. Hierarchical Beta-Bernoulli priors are assumed on the features: the presence of each feature f_ik is independent of all other assignments conditioned on the feature presence probability π_k, and the π_k are independently generated from a Beta distribution.

43 MCMC inference Metropolis-Hastings sampling to update the weights: use a Gamma proposal distribution with its mean at the current weight value and its standard deviation adjusted to keep the acceptance ratio close to 0.5. Gibbs sampling for updating each f_ik conditioned on F_-ik and w, where F_-ik is the set of feature assignments for all options except i for feature k, and m_-i,k is the number of other options that possess feature k.
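A sketch of the Metropolis-Hastings weight update described above, using a Gamma proposal centered at the current value; the `log_post` callback and the `scale` tuning knob are placeholders, and the talk's actual implementation is not shown:

```python
import math
import random

def mh_update_weight(w_k, log_post, scale=0.5):
    """One Metropolis-Hastings step for a positive weight w_k.
    The Gamma proposal has mean w_k and standard deviation scale * w_k;
    `scale` would be tuned to keep the acceptance ratio near 0.5.
    `log_post(w)` is the (unnormalized) log posterior of the weight."""
    shape = 1.0 / scale ** 2            # mean = shape / rate = w_k
    rate = shape / w_k
    prop = random.gammavariate(shape, 1.0 / rate)

    def log_q(x, mean):
        # log density of the Gamma proposal centered at `mean`
        sh = 1.0 / scale ** 2
        rt = sh / mean
        return sh * math.log(rt) - math.lgamma(sh) + (sh - 1) * math.log(x) - rt * x

    # asymmetric proposal, so include the Hastings correction
    log_accept = (log_post(prop) - log_post(w_k)
                  + log_q(w_k, prop) - log_q(prop, w_k))
    return prop if math.log(random.random()) < log_accept else w_k
```

In the full sampler this step would be interleaved with the Gibbs updates of the binary feature assignments f_ik.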

44 Experiments

45 Whole Page Utility Learn click probabilities given the layout. The end goal is to pick the layout that optimizes the whole-page utility: WPU = max_k [α·PUB_U + β·USER_U + γ·ADV_U]_k. Publisher, user, and advertiser utilities can take various forms. Here, we use: volume (measured as DSQ) for PUB_U; ad clicks for ADV_U; SERP richness (as a proxy for visual appeal) and session success rate (SSR) for USER_U.

46 Queries A succinct representation of the user's intent. Advertisers bid on query keywords/phrases. Types of queries: Informational: want to learn about something (zika virus). Navigational: want to go to a particular page (facebook). Transactional: want to do something (travel insurance). Some queries are more monetizable than others.

47 Randomized data collection Variation of layout per query is necessary to train the model without production biases. Limit to a small fraction of queries. Queries with commercial intent Queries that have high revenue potential Explore a subset of page components Randomly assign components to particular positions Aggregate data used for modeling: For each layout observed, log clicks on each page component. Click counts are used to learn click probabilities for components in any layout.

48 POLE Exploring layouts Web ranking not modified; ads placement not modified. Place three of the following modules at TOP/MOP/BOP/RR: Local module, Image module, Video module, News module, Query suggestion module, Travel module, Finance module, Twitter module, Info module, Reference module, Dictionary module. (Diagram: TOP ads and right-rail slots RR 1-3, with mainline positions TOP and MOP.)

49 Data For each instance of a query in the experiment set, a random subset of modules was selected and randomly placed on the page. Data collection was restricted to 15 verticals, and the positions explored were limited to TOP, MOP_3, BOP, and RR. Each page component with its position is treated as a distinct option; the SERP layout constitutes the choice set.

50 Exploratory analysis of the experimentation slice

51 AdsAlgoLayout vs RichLayout The x-axis represents coverage at increasing RPM thresholds. We use clicks on volume-generating verticals such as query suggestions, image, and video as proxies for contributors to DSQ. This is not the revenue-optimal layout: RPS is high, but the search volume loss results in less revenue gain. Certain query categories show a loss in both volume and RPS.

52 RevenueOptimized vs RichLayout RichLayout leads to an increase in DSQ but a drop in revenue compared to a RevenueOptimizedLayout.

53 Optimizing whole page utility Simplified objective: for each query, pick the layout that leads to maximum ad clicks while keeping at least one vertical block on the page and ensuring no search volume loss. We train a model to predict click probabilities of each component for a given layout and ask counterfactual questions such as: How sensitive are the click probabilities to the positioning of page components? Which component would lead to the least click shift from ads if included in the layout?

54 Model training

55 Model training Queries are grouped into categories based on intent classifiers. Inherent assumption: each page component is equally relevant to every query conditioned on the query category. Clicks on each option are aggregated for each layout in each category, and an EBA model is trained per group given the aggregate click data. Segment queries using category scores. Use (latent) binary features to model page components. Infer weights (and features) using MCMC, then apply the learned weights to the features to predict click probabilities. We use a single posterior sample (i.e. a sample from the MCMC chain after burn-in) of the model parameters to do model evaluations, and compare alternative models using held-out data.

56 Modeling details Example feature matrix with shared features Rows: Page components Columns: Features The feature matrix shows the feature sharing pattern between options BTL/Logit corresponds to EBA with only a diagonal feature matrix EBA with a block diagonal feature matrix is a special case of a nested logit model

57 Learned feature importance

58 Model comparison Different specifications, corresponding to well-known models: 1. A diagonal feature matrix (Fdiag), i.e. assuming no shared features between options, equivalent to the logit model. 2. A tree-structured feature matrix (Fnest) manually constructed using component similarities, used to define an EBA equivalent to the nested logit model. 3. A manually engineered feature matrix (Feng) using domain knowledge about the page component similarities, without any restriction on the matrix. 4. A latent feature matrix (Fltn) which is treated as a model parameter and inferred from the data. 5. Baseline model: EBA with a diagonal matrix with weight values set to the observed overall CTR of the options, completely ignoring the co-occurrence of the other options (Pctr). Upper bound: empirical click probabilities (Pemp).

59 Model comparison Log likelihood values of the four models, along with the values from Pctr and Pemp, and the KL divergence between Pemp and the other models. Both metrics show a significant performance improvement in going from the diagonal feature structure to more flexible features. Models with the more flexible features Feng and Fltn are comparable, and they show a small gain over using Fnest.

60 % click shift from text ads (TOP) when one additional component is introduced to the base page layout of ads, algo, and query suggestions.

61 Model based simulations

62 Having richness while preserving clicks on ads The idea is to use the learned model to keep some richness on the page. Assume a base layout (e.g. {AdsTop, AdsBop, AdsRR, Algo}). For each module, create a proposed layout by adding the module to the base layout at positions TOP/MOP_3/RR. Pick the proposed layout with the maximum ad click probability.
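A minimal sketch of this proposal loop; the module names, position labels, and the `click_model` predictor below are stand-ins for the trained EBA model, not the talk's actual system:

```python
# Enumerate proposed layouts (base layout plus one module at one position)
# and keep the proposal the click model scores highest for ad clicks.
def best_rich_layout(base, modules, positions, click_model):
    """base: set of option names; click_model(layout) -> predicted
    P(click on ads; layout)."""
    best, best_p = None, -1.0
    for mod in modules:
        for pos in positions:
            # each (module, position) pair is a distinct option
            layout = base | {f"{mod}@{pos}"}
            p_ad = click_model(layout)
            if p_ad > best_p:
                best, best_p = layout, p_ad
    return best, best_p
```

With one module and three positions per proposal, the search is a small exhaustive enumeration; the counterfactual power comes entirely from the model scoring layouts that were never served.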

63 AdsAlgoLayout vs RichLayout The x-axis represents coverage at increasing RPM thresholds. We use clicks on volume generating verticals such as query suggestions, image and video as proxies for contributors to DSQ. This is not the revenue optimal layout. RPS is high, but search volume loss results in less revenue gain. Certain query categories show a loss in both volume and RPS.

64 RuleBasedLayout vs RevenueOptimizedLayout and RuleBasedLayout vs RichLayout [Chart: relative revenue, RPS, and DSQ change (left axis 0.0% to 3.0%, right axis 0.00% to -3.00%) plotted against RuleBasedLayout coverage of the experiment slice size, from 100% down to 1%.]

65 ModelBasedLayout vs RevenueOptimizedLayout

66 Evolution of the experimentation slice: RichLayout, then RuleBasedLayout, then ModelBasedLayout

67 Summary Randomization allows collecting unbiased data for model learning. The EBA choice model captures page component similarity, the importance of each component, and position prominence. EBA gives an accurate estimate of the true click probabilities. Using click probabilities to construct the SERP leads to improved whole-page utility.

68 Other directions Possible to improve data collection using e.g. contextual bandits Other models may do equally well or better Alternative choice models Context-aware ranking models Use more inputs User features Session info Federated search rather than independently constructed blocks Variations on the whole page utility function possible