Netflix Optimization: A Confluence of Metrics, Algorithms, and Experimentation. CIKM 2013, UEO Workshop. Caitlin Smallwood

Similar documents
From Theory to Data Product

RECURRING REVENUE OPTIMIZATION: THE BASICS

Insights from the Wikipedia Contest

Marketing. Georgian Ballroom

Modern Retrieval Evaluations. Hongning Wang

Hispanic Americans Foreshadow the Future of Media

Workfront Training Course Guide

Project Planning & Management. Lecture 11 Project Risk Management

Measuring user engagement: a holistic view

RECOMMENDER SYSTEM IN RETAIL

Drive your results with target-setting analytics

Making Good Impressions. Appnext's Best Practices for Ad Monetization

Segmentation and Targeting

AdColony & Nielsen present: Cross-Platform Video Ad Effectiveness Study

Data Mining in CRM THE CRM STRATEGY

Seminar in E-Business & Recommender Systems University of Fribourg, Department of Informatics

Marketing and CS. Ranking Ads on Search Engines. Enticing you to buy a product. Target customers. Traditional vs Modern Media.

Chapter 3. Integrating AHP, clustering and association rule mining

PRODUCT DESCRIPTIONS AND METRICS

Segmentation and Targeting

Predictive Conversion Modeling

PREPAID CUSTOMER SEGMENTATION IN TELECOMMUNICATIONS

SPM 8.2. Salford Predictive Modeler

Product Recommender System for Small-Scale Niche Businesses using Association Rule Mining

Organizations do not need a Big Data Strategy; they need a Business Strategy that incorporates Big Data

Robotic Process Automation

Building the In-Demand Skills for Analytics and Data Science Course Outline

2016 Acquisition and Retention Study. Drivers of customer retention

CREDIT RISK MODELLING Using SAS

Yelp Recommendation System Using Advanced Collaborative Filtering

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Combine attribution with data onboarding to bridge the digital marketing divide

Making Marketing Smarter with Analytics

Business Models. CDN industry. Ads. Other (freemium, bundles) Pricing V

The Basics of ITIL Help Desk for SMBs

DELIVERING INNOVATIONS TO ENHANCE THE DRIVING EXPERIENCE

Ask the Expert Model Selection Techniques in SAS Enterprise Guide and SAS Enterprise Miner

Call Center Benchmark

webinar for YMCAs Jump Start January

Predicting user rating for Yelp businesses leveraging user similarity

Accelerate Your CX Journey

Five Tips to Achieve a Lean Manufacturing Business

How to enable revenue growth in the digital age

Improve Alerting Accuracy

The U.S. Digital Video Benchmark 2012 Review. Adobe Digital Index

Evaluation next steps Lift and Costs

Pricing and the Psychology Consumption

Are you leveraging your path to purchase as a path to growth?

Predicting user rating on Amazon Video Game Dataset

ENTERPRISE WORKFORCE OPTIMIZATION:

Employer Branding Essentials. 4 Tips Inspired by LinkedIn's Top Attractors Ranking

Global Media Intelligence Report

Oracle Real-Time Decisions for Decision Optimization and Its Introduction

Business Intelligence, Analytics, and Predictive Modeling Kim Gaddy

Effective Frequency: Reaching Full Campaign Potential

Software Feature Sets. Powerful. Flexible. Intuitive. Alight feature sets. Driver-Based Planning & Analytics

Cross-channel attribution

The Art of Optimizing Media Asset Management

Verint Engagement Management Solution Brief. Overview of the Applications and Benefits of

IBM ICE (Innovation Centre for Education) Welcome to: Unit 1 Overview of delivery models in Cloud Computing. Copyright IBM Corporation

BUS 168 Chapter 6 - E-commerce Marketing Concepts: Social, Mobile, Local

1 This document was adapted from information from Center for Public Health Quality, Charlotte Area Health Education Center, NC State University

Postgraduate Diploma in Digital Marketing

Adapting Your Strategy to User Engagement Patterns: Age & App Usage in the United States

VERTICAL VIDEO AND BEYOND

Online Model Evaluation in a Large-Scale Computational Advertising Platform

Predicting Yelp Ratings From Business and User Characteristics

Top Customer Engagement Tools for Utilities

29 Must-Know Terms For Every Social Media Analyst. (and the people who work with them)

IT S TIME TO RETHINK CONCEPT TESTING

Good afternoon members of the jury and members of the audience.

Achieve Better Insight and Prediction with Data Mining

Fit For Purpose. Resilience & Agility in Modern Business. Presenter David J. Anderson CEO. Agile Business Conference London October 2014 Release 1.

The Digital Maturity Model & Metrics Accelerating Digital Transformation

Call Center Benchmark India

Data Mining Applications with R

Big Data. Methodological issues in using Big Data for Official Statistics

Hello Attribution. Goodbye Confusion. A comprehensive guide to the next generation of marketing attribution and optimization

Unleashing the Enormous Power of Call Center KPIs. Call Center Best Practices Series

MEASURING ROI & ATTRIBUTING SUCCESS TO DIGITAL MARKETING

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

The Fallacy of the Net Promoter Score: Customer Loyalty Predictive Model. Mohamed Zaki, Dalia Kandeil, Andy Neely and Janet McColl-Kennedy

TBM Awards Nomination Questionnaire

Clustering Method using Item Preference based on RFM for Recommendation System in u-commerce

Leveraging Smart Meter Data & Expanding Services BY ELLEN FRANCONI, PH.D., BEMP, MEMBER ASHRAE; DAVID JUMP, PH.D., P.E.

The Data Dilemma. Unlocking Customer Insights with Machine Learning

Turn Data into Business Value

Web Marketing Straight Talk: Who s the Real Boogeyman? Presented by Chuck Gordon, CEO SpareFoot

Talent Review and Development Process: A Step-by-Step Guide

A Personalized Company Recommender System for Job Seekers Yixin Cai, Ruixi Lin, Yue Kang

Data maturity model for digital advertising

Measuring online impact on offline conversations

AKAMAI WHITE PAPER. The Power of Media Analytics Data-Driven Insights for Boosting Audience Engagement

How to become a CLTV aligned organization?

Splitting Approaches for Context-Aware Recommendation: An Empirical Study

A Roadmap to Success: Five Guiding Principles to Incentive Compensation Planning. KMK Consulting, Inc.

ACCENTURE & SAP SUCCESS FACTORS INVESTIGATE CAPABILITIES WORKBOOK. Imagine where we will go together...

UNLOCK THE EXPERTISE INSIDE YOUR FIRM TO... TRAIN STAFF ENGAGE CLIENTS ATTRACT PROSPECTS

Calculation of Spot Reliability Evaluation Scores (SRED) for DNA Microarray Data

Transcription:

Netflix Optimization: A Confluence of Metrics, Algorithms, and Experimentation. CIKM 2013, UEO Workshop. Caitlin Smallwood

(Image: the confluence of the Allegheny and Monongahela Rivers forming the Ohio River)

TV & Movie Enjoyment Made Easy: stream any video in our collection on a variety of devices for $7.99 a month.

Over 800 Partner Products

Content Partners

Kids

Original Content

The UI

The Data: visitor data, user metadata, social data, user plays (streaming), user ratings, user searches, device streaming performance, video metadata, video impressions.

A few facts: 40M members globally. Ratings: 4M+/day. Searches: 3M+/day. Plays: 1B+/month.

Metrics

"Engagement is a user's response to an interaction that gains, maintains, and encourages their attention, particularly when they are intrinsically motivated." - Jacques, 1996

User Engagement Measurement Techniques. Self-reported or explicit: satisfaction, likelihood to recommend, likelihood to use or re-use, self-reported usage, self-reported preferences. Physical observation of users: in-person user experience research, eye tracking. Behavioral observation of users: analytics on behavioral data.

Common user engagement metrics: lifetime value (LTV), retention, page views, time spent, number of distinct actions, recency of last visit/use, time between visits/uses.

YOUR Engagement Metrics. What's your business model? Monthly subscription. What do you want your customers to do? Retain monthly (forever) because they enjoy the service. What do your happiest, most valuable customers do? Retain month over month and watch.

Monthly Retention

Specificity. (Diagram of billing/cancellation event flows: billing → cancel request → cancel; cancel request → revoke cancel; billing failure → billing recovery.)

Voluntary Cancel Rate: our most satisfied customers watch more. (Chart: voluntary cancel rate, 0-20%, against customers' stream hours in the past 28 days, 0-50.)

Median streaming hours per user (over a 28-day period)

User engagement = user granularity: % of users who do X; medians or, better, distributions of user-level volume measures.
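
To make "user granularity" concrete, here is a minimal pandas sketch computing a "% of users who do X" metric and the median/distribution of per-user hours; the DataFrame and column names (`user_id`, `hours`) are invented for illustration:

```python
import pandas as pd

# Hypothetical per-user streaming totals over a 28-day window.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "hours": [0.0, 3.5, 12.0, 0.0, 45.0],
})

# "% of users who do X": here, the share of users with any streaming.
pct_streamed = (users["hours"] > 0).mean() * 100

# Medians or, better, distributions of user-level volume measures.
median_hours = users["hours"].median()
deciles = users["hours"].quantile([i / 10 for i in range(1, 10)])

print(f"{pct_streamed:.0f}% of users streamed; median = {median_hours} hours")
print(deciles)
```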

Negative metrics can also be useful: percent of users with no streaming (over a 28-day period).

One process for identifying engagement metrics: decide on criteria for a good metric; brainstorm metrics that might meet the criteria; identify the best candidates (via predictive modeling or other analytic techniques, expert judgment, qualitative research); validate by trying to use the metric (experiment measurement, algorithm or model input, trends).

Some metric criteria suggestions: the metric is correlated with core business metrics (conversion, retention) and contributes unique predictive power beyond the other metrics; the metric is user-level, or weights users properly toward core metrics; the metric is actionable; the metric shows differentiation.

Brainstorm from all angles: variety/novelty, joy, trust, focused attention; positive and negative experiences; what, who, how, why?; recency, frequency, monetization; short-term, long-term, changes over time; metric variants.

Variable Importance Measure: an example of ranking metrics' abilities to explain core business metrics (retention in this case). (Chart: variable importance scores, roughly 0 to 0.012, by metric.)
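
As a hedged illustration of this kind of ranking (not Netflix's actual method), one could fit a random forest to a retention label and read off variable importances; all data and metric names below are synthetic:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5_000

# Synthetic candidate engagement metrics per user (names are hypothetical).
X = pd.DataFrame({
    "hours_28d":       rng.gamma(2.0, 5.0, n),
    "days_with_play":  rng.integers(0, 29, n),
    "distinct_titles": rng.integers(0, 40, n),
    "days_since_play": rng.integers(0, 29, n),
})

# Synthetic retention label, loosely driven by viewing volume.
p_retain = 1 / (1 + np.exp(-(0.05 * X["hours_28d"] + 0.1 * X["days_with_play"] - 2)))
y = rng.random(n) < p_retain

# Rank candidate metrics by how much they help explain retention.
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```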

Algorithms

Algorithms for: content recommendations, search results, the streaming experience.

80% of plays are based on recommendations. The same algorithms power the recommendations on all devices (phone, PS3, tablet).

The Basics. Data inputs: explicit member data (taste preferences, title ratings); implicit member data (viewing history, queue adds, ratings); non-personalized data (content library, title tags, popularity). Algorithms turn these inputs into recommendations: rows, and titles within rows.

What the algorithms do: row selection, video ranking, video-video similarity, user-user similarity, search recommendations. They also need to consider complex characteristics and tradeoffs such as: popularity vs. personalization, diversity, novelty/freshness, evidence.

Probability, statistics, optimization, dynamic systems; hypothesis testing, estimation, the delta method, bootstrapping; linear and generalized linear models; matrix factorization; Markov processes; various clustering algorithms; Bayesian models; Latent Dirichlet Allocation; L1 and L2 regularization; association rules; tree-based methods; bagging and boosting; vector spaces and the Mahalanobis distance.

Source of signals: the entire population, segments, or individuals. Any can provide valuable signals.

Evolution occurs through experimentation

Faster Innovation Through Offline Testing: train model → evaluate model offline → A/B test → roll out change → generate hypothesis → (repeat).

Offline Metrics. Offline metrics help guide decisions on what to A/B test; understand their limitations and ignore them as needed. No offline set of metrics is predictive enough of cancelation rates. Some metrics predict local algorithm metrics, in line with the way the algorithms are optimized.

Root Mean Squared Error (RMSE): historically used to measure the accuracy of predicted star ratings; is it good for offline optimization?
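
For reference, a minimal sketch of RMSE over predicted vs. actual star ratings:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Toy example: predicted vs. actual star ratings.
print(rmse([3.8, 4.2, 2.5], [4, 5, 2]))  # ≈ 0.56
```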

Why would RMSE improvement be a key driver to increase retention?

Personalized Video Ranking

Personalized Video Ranking is a top-N problem. Natural metrics come from information retrieval: mean reciprocal rank, precision, recall. But which correlate with cancelation rates and overall usage?
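
A quick sketch of these retrieval metrics for a single ranked row; "relevant" is loosely taken here to mean titles the member later played (an assumption for illustration):

```python
def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant item, 0 if none appears."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k items that are relevant."""
    return sum(item in relevant for item in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    return sum(item in relevant for item in ranked[:k]) / len(relevant)

# Toy example: a ranked row of titles vs. the titles the member played.
ranked = ["t9", "t3", "t7", "t1", "t4"]
played = {"t3", "t4"}
print(reciprocal_rank(ranked, played))    # 0.5
print(precision_at_k(ranked, played, 3))  # 0.33...
print(recall_at_k(ranked, played, 3))     # 0.5
```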

Interesting Challenges in Algorithms: How do we develop recommender systems that directly optimize long-term goals (user retention and overall consumption) offline? The effect of presentation bias: can any offline metric help? Can we remove this bias from our signals and algorithms? What's the best way to define the space of rows of videos? What's the best way to construct a page of recommendations? How can we best cold-start users and videos?

Experimentation

Controlled Experiment: a target population is randomly split between Version A and Version B, identical except for the treatment being tested! Then analyze & compare key metrics (with statistical confidence measures).
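
One common way to implement the random split (a sketch, not necessarily Netflix's allocation mechanism) is to hash a member ID into a test cell, so assignment is random across members but stable per member:

```python
import hashlib

def assign_cell(member_id: str, experiment: str, n_cells: int = 2) -> int:
    """Deterministically hash a member into one of n_cells test cells."""
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode()).hexdigest()
    return int(digest, 16) % n_cells

# A member always lands in the same cell for a given experiment,
# and assignment is effectively random across members.
print(assign_cell("member-123", "similars-row-test"))  # 0 or 1
```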

The Appeal: Causality

Ingredients of great experimentation: innovation and prioritization of impactful tests; experimental design (methodology, test cell design, sampling...); execution of the controlled experiment; accuracy (of data, engineering, statistics); proper decision-making metrics & measurement techniques; pace & agility; interpretation and decision-making.

2010

Now

Characteristics specific to Netflix testing. Challenges: sampling of new members has efficiency limitations; monthly billing cycles increase our testing timelines; the breadth of devices and UIs impacts pace of execution and adds complexity across the ecosystem. Assets: clear core metrics; member identification (logged-in, paying customers); great data; a bias toward product simplicity; a culture of learning, openness, & debate; executive commitment & participation.

Metrics: cumulative retention, streaming, and many other secondary engagement metrics.

Streaming measurement: the Kolmogorov-Smirnov (KS) test. (Chart: distributions of hours, with the KS test statistic marked.)

Streaming measurement: KS example
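
A minimal sketch of such a two-sample KS comparison using SciPy, with synthetic per-member hours for two cells:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic per-member streaming hours for control and treatment cells.
control = rng.gamma(shape=2.0, scale=5.0, size=10_000)
treatment = rng.gamma(shape=2.0, scale=5.5, size=10_000)

# Two-sample KS test: the statistic is the largest gap between the
# two empirical CDFs of hours.
result = stats.ks_2samp(control, treatment)
print(f"KS statistic = {result.statistic:.4f}, p = {result.pvalue:.2e}")
```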

Streaming measurement: thresholds with z-tests for proportions. (Chart: profiles win the confirmation test.)
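
A sketch of a two-proportion z-test, e.g. for the share of members in each cell above some streaming threshold (the counts are invented):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Invented counts: members above a streaming threshold in each cell.
z, p = two_proportion_ztest(x1=5300, n1=10_000, x2=5100, n2=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```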

Streaming measurement: streaming score model. Probability of retaining at each future billing cycle, based on streaming S hours at N days of tenure. (Chart: retention vs. total hours consumed during 22 days of membership.)
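
A hedged sketch of the idea (not the actual model): fit a logistic regression of next-cycle retention on hours streamed (S) and tenure (N), then score a member; all data below is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Synthetic members: S hours streamed and N days of tenure.
hours = rng.gamma(2.0, 6.0, n)
tenure = rng.integers(1, 365, n)
X = np.column_stack([hours, tenure])

# Synthetic label: retained at the next billing cycle.
logit = -1.5 + 0.06 * hours + 0.004 * tenure
y = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)

# P(retain at next cycle | streamed 22 hours at 60 days of tenure).
print(model.predict_proba([[22.0, 60.0]])[0, 1])
```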

Challenges with hours: not all hours have equal value to customers; TV shows vs. feature films have dramatically different consumption rates; the service is available after a cancel request; and the timespan for the hours measurement matters.

Similars Algorithm Experiment: Algorithm A vs. Algorithm B.

What should we measure in this test? Ideas? Retention & overall streaming. CTR on Similars rows; share of hours from Similars rows? Should we care about cannibalization? Horizontal position played? Should we measure whether the new algorithm generated results that were more similar? What does the customer expect out of the row based on its label? Did the customer enjoy the titles more even if he/she did not watch more in total hours?

How might we know whether a customer enjoyed a title? They gave it a high rating (but only a subset of users rate). They came back to watch again (a different opportunity for a TV show vs. a movie). Fraction of content viewed: FCV = duration watched / title runtime.

Fraction of Content Viewed (FCV) variants: average FCV; # of titles viewed / # of streaming hours; % of titles with FCV <= 15%; % of sessions with FCV <= 15%; % of active days with a play >= 90%; % of play days with a full play; % of hours from browse plays. Further variants for: TV vs. movies; episode, season, or show; timeframes; how the title was found.
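
A minimal pandas sketch of FCV and a couple of the variants above, over a hypothetical play log (all column names invented):

```python
import pandas as pd

# Hypothetical play log: one row per play of a title.
plays = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "title_id": ["a", "b", "a", "c", "b"],
    "watched_min": [10, 85, 3, 110, 44],
    "runtime_min": [95, 90, 95, 120, 45],
})

# FCV = duration watched / title runtime, computed per play.
plays["fcv"] = plays["watched_min"] / plays["runtime_min"]

avg_fcv = plays["fcv"].mean()
pct_low = (plays["fcv"] <= 0.15).mean() * 100   # plays with FCV <= 15%
pct_full = (plays["fcv"] >= 0.90).mean() * 100  # near-full plays

print(f"avg FCV = {avg_fcv:.2f}; FCV<=15%: {pct_low:.0f}%; FCV>=90%: {pct_full:.0f}%")
```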

Nearly every engagement metric is highly correlated with total streaming hours.

Best metric variants do provide some lift: controlling for streaming hours, these metrics improve retention prediction.
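
One hedged way to test "lift beyond hours" is to compare retention models with and without the candidate metric, keeping streaming hours in both; everything below is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 20_000

hours = rng.gamma(2.0, 6.0, n)                   # total streaming hours
metric = np.clip(rng.normal(0.6, 0.2, n), 0, 1)  # hypothetical FCV-style metric
logit = -2.0 + 0.05 * hours + 1.5 * metric
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_base = hours.reshape(-1, 1)              # hours only
X_full = np.column_stack([hours, metric])  # hours + candidate metric

Xb_tr, Xb_te, Xf_tr, Xf_te, y_tr, y_te = train_test_split(
    X_base, X_full, y, test_size=0.3, random_state=0)

auc_base = roc_auc_score(
    y_te, LogisticRegression(max_iter=1000).fit(Xb_tr, y_tr).predict_proba(Xb_te)[:, 1])
auc_full = roc_auc_score(
    y_te, LogisticRegression(max_iter=1000).fit(Xf_tr, y_tr).predict_proba(Xf_te)[:, 1])
print(f"AUC hours-only = {auc_base:.3f}; hours + metric = {auc_full:.3f}")
```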

New metrics are often tested as algorithm input signals (and vice versa).

Acknowledgements (Metrics, Algorithms, Experimentation): Carlos Gomez-Uribe, Juliette Aurisset, Kelly Uphoff.