As we approach the pinnacle of the big data movement, businesses face increasing pressure to integrate data analytics into their regular decision-making and to iterate constantly on the insights provided by increasingly large and multidimensional data sets. Yet while the quality and quantity of data have risen steadily, the accessibility and availability of analytical software solutions have remained stagnant.

Many organizations, especially in the entertainment industry, face the difficult task of quantifying massive amounts of qualitative data. Much innovation in this industry has revolved around market research and user segmentation, with the key insight behind large-scale marketing software being the development of granular viewer-level profiles. By identifying cross-platform viewer preferences, behaviors, and interests, marketing executives have been able to apply predictive analytics to the downstream stages of film production through targeted advertising campaigns.

However, the greatest uncertainty and complexity in film production lie further upstream, at stages where talent and crew are often not fully determined and where investment and distribution are still relatively unknown. This problem is particularly severe for studios and producers, who at this stage often rely exclusively on general comps reports of box office performance for similar films. With so much on the line financially, creatively, and temporally, basic financial analyses often prove insufficient in today's competitive, data-driven landscape. And as more technology giants with talented data science teams enter the entertainment industry (Amazon and Netflix being prime examples in the U.S.), it is imperative that traditional film industry professionals balance their urgent analytics needs with a long-term outlook, taking care to integrate data into their production processes thoughtfully and deliberately.
Machine Learning as a Service

Although predictive marketing analytics and audience research are hallmarks of the entertainment industry, they tend to rely on survey methodologies that still depend on human execution. Pilot has developed an analytical software solution that leverages novel academic research and democratizes access to advanced machine learning techniques. Our team built an intuitive user interface on top of our sophisticated prediction models so that anyone can perform complex box office forecasting with no background in computer science or statistics. Pilot predictions can be calculated as early as the ideation phase of pre-production, and can be used in granular scenario analysis by varying input variables. Stakeholders can further use the platform to inform decisions on film acquisition, insurance valuation, co-investment, and marketing spend.
Predictions are generated for the first eight weeks of a prospective project's theatrical run. For first-week gross, across a historical data set of over 4,000 limited- and wide-release American films dating back to 1990, Pilot forecasts currently have a mean absolute error of $7M and a median absolute error of $3.5M.

To arrive at its predictions, our model breaks down basic variables such as prospective cast members, directors, writers, producers, release date, genre, and plot information into hundreds of secondary features, such as network-based variables, markers of seasonality, and country-specific economic indicators. For example, some key variables estimate how influential a prospective director is with the cast and writers, while others indicate how tightly connected cast members are to each other. We also use consumer spending indices, economic forecasts, and other indicators of consumption to adjust our predictions to the broader economic climate.

A final set of secondary variables is then chosen through well-established variable selection methods more sophisticated than simple correlations, optimizing for generalizability and assigning more recent movies a larger weight. Given these final variables, Pilot predicts film performance using cutting-edge machine learning methods, including neural networks, and a proprietary back-end architecture that outputs a single prediction from a variety of models. Our models statistically infer the optimal weight to assign to each variable, along with the interactions between variables that provide the highest prediction accuracy.

Pre-Production Scenario Analysis

The core deliverable of the forecasting module is a week-by-week prediction of a prospective project's domestic theatrical box office revenue, based on the inputs specified for the project. In the forecast chart, the blue band represents a 95% confidence interval for a film's performance over its first eight weeks at the box office.
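As a note on the two error figures quoted above for first-week gross: a mean absolute error that is larger than the median absolute error is what one would expect when a few films are missed by a wide margin while most are predicted more closely. The sketch below illustrates this with made-up numbers; the grosses and predictions are hypothetical and are not drawn from Pilot's data set.

```python
import statistics

def absolute_errors(actual, predicted):
    """Return the absolute error for each (actual, predicted) pair."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

def mean_absolute_error(actual, predicted):
    return statistics.fmean(absolute_errors(actual, predicted))

def median_absolute_error(actual, predicted):
    return statistics.median(absolute_errors(actual, predicted))

# Hypothetical first-week grosses and forecasts, in millions of dollars.
actual_gross = [2.0, 5.0, 8.0, 12.0, 90.0]
forecast = [3.0, 4.0, 10.0, 9.0, 60.0]

mae = mean_absolute_error(actual_gross, forecast)
medae = median_absolute_error(actual_gross, forecast)

# One large miss on the blockbuster pulls the mean well above the median.
print(f"mean absolute error:   ${mae:.1f}M")
print(f"median absolute error: ${medae:.1f}M")
```

Reporting both figures gives a fuller picture than either alone: the median describes the typical film, while the mean reflects the cost of the occasional large miss.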
As input variables are edited, the forecast will shift in:
- narrowness, reflecting confidence in a given estimate
- magnitude, reflecting the overall success of the film
- decay, reflecting film performance week-to-week

As a result, when making foundational casting decisions while packaging a film project, Pilot can be used to optimize ROI by balancing the cost of including a particular individual against the projected box office earnings of a project with their contribution. To aid in this process, Pilot also provides historical information on major cast and crew included as part of a prospective film. The previous graph plots the box office performance of films that a sample actor has starred in over time.

Users can derive actionable insights from informed searches across variables that can optimize film revenue without compromising the artistic process, as no inputs are tied to the script and no humans are involved in making qualitative judgements. Instead, quantifying prior connections between a film's cast and crew as a network plays heavily into our predictions. Above is a graph displaying a small subset of our network, showing 322 edges connecting 104 actors in collaborations over the last two years. Even this subset is far too complex to interpret manually. Over the past decades, tens of thousands of actors and crew members have generated hundreds of thousands of pairwise collaborations, which contain a wealth of information useful for statistical inference.
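To make the idea of network-based variables concrete, the following minimal sketch counts pairwise collaborations across past films and then scores how tightly connected a prospective cast is. The function names, the `cast_density` measure, and the sample credits are all hypothetical illustrations; the actual features Pilot uses are proprietary and are not described here.

```python
from collections import Counter
from itertools import combinations

def collaboration_counts(films):
    """Count how many past films each pair of people has shared.

    `films` is a list of credit lists, one list of names per film.
    Pairs are stored as alphabetically sorted tuples.
    """
    counts = Counter()
    for people in films:
        for pair in combinations(sorted(set(people)), 2):
            counts[pair] += 1
    return counts

def cast_density(cast, counts):
    """Fraction of pairs in a prospective cast that have worked together before."""
    pairs = list(combinations(sorted(set(cast)), 2))
    if not pairs:
        return 0.0
    linked = sum(1 for pair in pairs if counts[pair] > 0)
    return linked / len(pairs)

# Hypothetical credits for three past films.
past_films = [
    ["Actor A", "Actor B", "Actor C"],
    ["Actor A", "Actor B"],
    ["Actor C", "Actor D"],
]
counts = collaboration_counts(past_films)

# Of the three pairs in this prospective cast, only A and B have collaborated.
prospective_cast = ["Actor A", "Actor B", "Actor D"]
print(cast_density(prospective_cast, counts))
```

Each distinct pair with a nonzero count corresponds to one edge in a collaboration graph like the one described above, so the same counts could feed other graph measures as well.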
Film Acquisition, Investment, and Insurance

The film industry is a volatile one, particularly if you examine stock returns of major production companies. [1] De Vany writes: "There is no typical movie and averages signify nothing...the movie business is completely and utterly non-gaussian because it is a business of the extraordinary." Films are also subject to seasonality and broader economic influences. Most individuals can call to mind at least one box office bust as easily as they can recall a blockbuster, which speaks to the variance in revenues that even seemingly safe films can produce.

However, the film industry is not as unpredictable as it may seem. In aggregate, films reliably perform countercyclically to the economy, tend to have stronger performance around major holidays, and are consumed relatively equally during economic downturns and upticks. [2] Furthermore, in our historical dataset, which contains a number of proprietary features, more than 50% of the variance in film performance can be explained using even the simplest linear regression methods, without any information available past the pre-production phase. Despite variability in execution, some of the most important factors in film, the cast and crew, remain constant across decades.

One of our earliest clients used to rely on one-time comps analyses in Excel from third-party consultants during the film packaging process. Unfortunately, the comps analyses they received naively compared their indie film to those of big-budget studios, and even major changes to their project's cast and crew did not result in different comps. Pilot's visualizations fit into existing comps workflows, and we validate our predictions in real time by also calculating predictions for similar historical films. To provide an honest forecast, historical projections are made using models that have not seen the data relevant to those films.

[1] De Vany, Arthur, 2004. Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry.
[2] Vogel, Harold L., 2010. Entertainment Industry Economics: A Guide for Financial Analysis.
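Two ideas from the section above, variance explained by a simple linear regression and forecasts made by models that have not seen the film being forecast, can be sketched together. The example below is only an illustration under stated assumptions: it uses a single hypothetical pre-production feature, made-up numbers, and leave-one-out refitting as one simple way to keep each film out of its own training data. It is not a description of Pilot's actual validation procedure.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_predictions(xs, ys):
    """Predict each film with a model fitted on every other film."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

def r_squared(ys, predictions):
    """Share of variance in ys explained by the predictions."""
    mean_y = fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: a cast's average prior gross vs. first-week gross ($M).
prior_gross = [5.0, 10.0, 20.0, 30.0, 45.0, 60.0]
first_week = [3.0, 6.0, 9.0, 16.0, 20.0, 31.0]

held_out = leave_one_out_predictions(prior_gross, first_week)
print(f"held-out R^2: {r_squared(first_week, held_out):.2f}")
```

Because every prediction comes from a model that never saw the film in question, the resulting R^2 is a more conservative estimate than one computed on the training data itself.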
Estimation of Downstream Costs

Production is only part of the equation for creating a film. Distribution, exhibition, advertising, marketing, and syndication are all time-intensive and capital-intensive endeavors. For example, even low-end media campaigns can cost $8-12M, [3] while TV advertising alone cost $24M [4] for Tears of the Sun (2002), a film that grossed only $86M worldwide on a production budget of $70M. [5] Marketing costs have grown faster than inflation in the past decades, and an additional 50% of a film's budget is often allocated to marketing alone. [6] Further, theatrical exhibition often does not fully recoup a film's cost. Instead, ancillary revenues are typically required to recover a film's investment, and are directly related to box office performance. [7]

Accordingly, downstream factors are considered as early as green-lighting. Pilot provides 95% confidence intervals, represented by upper and lower bounds on its week-by-week projections, for a simple estimation of the ROI in best-case, worst-case, and most-likely scenarios. Provided far upstream in the production process, these revenue projections can help answer important questions about risk profile, investor upside potential, and flexibility in advertising costs, and can demystify the time course of a film's box office performance. Similarly, by testing different release dates against the distribution schedules they permit and the competitive climate at the time of release, distributors can preemptively optimize downstream revenues in ancillary formats.

The Future of Film Analytics

By relying purely on historical data to forecast box office revenue, Pilot is uniquely positioned to avoid the sampling biases inherent to audience research methodologies. However, there is an important balance between upstream predictions based on historical data and downstream marketing based on audience segmentation.
An optimal solution combines both strategies in a full-cycle forecast: as a film's release date approaches and more audience data becomes available, one could adjust predictions made with historical data using social media sentiment analysis and pre-release analysis of trends in audience taste. Pilot is the first step toward an end-to-end entertainment analytics platform. As you leverage data to maintain a competitive edge, our suite of software will tackle the financials, so you can focus on what matters most: creating.

[3] Friedman, R.G., 2006. Motion picture marketing. In: Squire, J.E. (ed.), The Movie Business Book, Intl. 3rd ed.
[4] Elberse, A., Anand, B., 2007. The effectiveness of pre-release advertising for motion pictures: an empirical investigation using a simulated market.
[5] Internet Movie Database (IMDB). Retrieved November 15, 2016.
[6] Vogel, 2010.
[7] Ibid.