SAS Enterprise Guide: Point, Click and Run is all that takes
Aruna Buddana, TiVo Inc, Alviso, CA

ABSTRACT
The Audience Research and Measurement team at TiVo constantly collects and processes anonymous data from our TiVo DVRs to understand the TV viewing habits of a typical DVR user. This second-by-second viewership data is combined with program and commercial occurrence data. Our ratings product, Stop Watch, uses it to offer program ratings, spot ratings and fast-forward rates down to the spot level. A variant of Stop Watch, called Power Watch, offers the same metrics based on a longitudinal sample of 45,000 opt-in households. It also provides the ability to analyze viewership patterns by household demographics or any self-reported data. Any demographic or attitudinal information available for this opt-in panel is derived from survey responses, and no personally identifiable information is used. The panel makes it possible to segment viewing data at the household level, and it offers program ratings, spot ratings, reach and frequency based on household demographic data. One advantage of Power Watch is that the panel can be divided into custom segments, such as "Planning to buy a house in the next 6 months." The viewing habits of such segments can then be examined independently or together. Media advertisers use this information to target a segment by placing their advertisements in the programs or networks that index highly for it. We created a segment analysis tool in SAS Enterprise Guide that runs automatically for the selected segments. Once created, the tool produces bar charts of program ratings at different data cuts, statistical significance tests, and a comparison of a brand's performance across the segments.
The tool cut the time we spend on data management and on building the iterative segment-comparison analyses for our clients by almost 80%. That frees analysts' time and effort for interpreting the results and offering a summarized view of the target's behavior. This paper demonstrates how the analysis tool is created and executed in SAS Enterprise Guide. Base SAS and SAS Enterprise Guide 4.2 are the SAS products used in the paper.

INTRODUCTION
SAS Enterprise Guide provides an easy graphical user interface that can cut down the analysis time for recurring analysis tasks. Automating an entire analysis task is simple and seamless. Its point-and-click GUI components make it possible to integrate charting with different statistical techniques. This paper outlines the features of SAS Enterprise Guide that help in building automation tools for easier and faster analysis results. It demonstrates a case study with blinded results, using screenshots of Enterprise Guide's process flow diagrams along with an explanation of the code. The following flowchart summarizes the process flow and gives an idea of the analysis requirements. Subsequent sections explain how these tasks are executed and automated with SAS Enterprise Guide (EG).
1. Gather the research questions from the stakeholders.
2. Create a project in SAS EG and identify which task components can help answer them.
3. Manipulate the data so that it is compatible with the task. For example, existing data might need to be transposed to display a horizontal bar chart.
4. Check the output options and choose the ones that work best. Exporting the results can be created as a step in the project.
5. Insert code into the existing task components to create standardized reporting.
6. Create prompts or macro variables as needed so that dynamic, user-defined input is available.
Figure 1. Process of creating an automated tool
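The flowchart notes that existing data may need to be transposed before it can feed a bar chart. Below is a minimal Python sketch of that reshaping. It assumes the data arrive in long format, with one row per segment and measure. The measure names and values are hypothetical. In SAS itself this reshaping is typically done with PROC TRANSPOSE.

```python
from collections import defaultdict


def transpose_long_to_wide(rows):
    """Pivot (segment, measure, value) rows into one row per measure.

    Each output row maps segment name -> value, so a grouped bar chart can
    draw one group of bars per measure and one bar per segment.
    """
    wide = defaultdict(dict)
    for segment, measure, value in rows:
        wide[measure][segment] = value
    return dict(wide)


# Hypothetical long-format input; the numbers are made up for illustration.
long_rows = [
    ("Segment1", "live", 0.62),
    ("Segment1", "time_shifted", 0.38),
    ("Segment2", "live", 0.55),
    ("Segment2", "time_shifted", 0.45),
]

wide = transpose_long_to_wide(long_rows)
print(wide)
# {'live': {'Segment1': 0.62, 'Segment2': 0.55},
#  'time_shifted': {'Segment1': 0.38, 'Segment2': 0.45}}
```

The wide layout puts each measure on its own row and each segment in its own column. That is the shape a grouped bar chart usually expects.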
ANALYSIS REQUIREMENTS
Power Watch, one version of our ratings product, can combine viewership data with behavioral and/or demographic attributes of the households. This allows researchers, analysts, media advertisers, brand agencies and networks to examine viewership differences between these groups of households, called segments. With this data, the analytics team constantly seeks ways to produce intuitive analyses that help answer research questions such as the following:
1. Are there differences in the way households in the segments watch TV?
2. Does time-shifting behavior differ across household segments?
3. Does one segment watch TV at different times of day, or in a different way, compared to other segments?
4. Do some segments favor broadcast networks over cable networks?
5. Are there favorite networks that different segments choose to watch?
6. Do these segments watch commercials in different ways? If they do, which brands performed well?
The analysis reveals the TV viewership differences among these mutually exclusive segments so that the target audience can be identified. This in turn helps brand agencies and advertisers place their TV spots to reach their intended audiences. Because these comparisons can be run for numerous segments, the analysis is a highly repetitive task. We built the automated tool in SAS EG to run these recurring analysis tasks, and the sections below walk through a sample of the tool step by step.
AN OVERVIEW OF THE TOOL
SAS EG organizes activities as projects. The tool is therefore a SAS EG project: a collection of the analysis datasets, the EG task components, the results they produce, and the relations between them. Whether the entire project can run with one click, starting from a dataset or task in EG, depends on how the components are linked together.
Normally, tasks are linked to their results. Right-clicking an object provides additional options for linking components, and these links are what make the implementation completely automated.
Once the analysis requirements are gathered, SAS EG is set up with a query that connects to the database to pull the data, using PROC SQL through a Microsoft ODBC connection. The query returns a data file with the relevant historical metrics for the segments in question and for the requested time frame. In the tool, the query runs with the segment names and the time frame passed in as macro variables. SAS EG can request input from the user every time the code is run and store that input in macro variables. SAS EG calls these inputs prompts. They can be created, edited and added to any program or task in the project using the Prompt Manager. Alternatively, right-click the program or task and select Prompts in its Properties.
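Here is a minimal Python sketch of what happens when a prompt value is stored in a macro variable and a program refers to it as &name. It imitates only the textual substitution of simple SAS macro references; real SAS macro processing does much more. The query text, table and column names, and prompt values are hypothetical, not the tool's actual query.

```python
import re

# A SAS-style macro reference: '&' followed by a name. An optional trailing
# '.' ends the reference and is consumed, as SAS does (e.g. &segment._r).
MACRO_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)\.?")


def resolve(text, prompts):
    """Substitute &name references with prompt values in a single pass.

    Names are matched case-insensitively (SAS macro variable names are not
    case sensitive), so the prompts dict uses lowercase keys. Unlike SAS,
    which only writes a warning for an unresolved reference, this sketch
    raises KeyError.
    """
    def substitute(match):
        return prompts[match.group(1).lower()]

    return MACRO_REF.sub(substitute, text)


# Hypothetical query: the table and column names are illustrative only.
query = (
    "select segment_name, program_id, rating "
    "from ratings.segment_ratings "
    "where segment_name in (&segments) "
    'and air_date between "&start_date"d and "&end_date"d'
)

# Values a user might enter at the prompts (hypothetical).
prompts = {
    "segments": "'Segment1','Segment2'",
    "start_date": "01JAN2011",
    "end_date": "31MAR2011",
}

print(resolve(query, prompts))
```

One detail carries over to real SAS code: macro references inside single quotes are not resolved. That is why the date literals in the query above use double quotes.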
Figure 2. Screenshot of EG Project (labeled: Prompt Manager)
Once the prompts have been created, they can be used as macro variables in the query program (segment_r in Figure 2) with the standard notation &macro_variable. Figure 3 shows the prompts that the tool uses to pull the segment ratings data for the given segments and time period. One prompt also asks the user, as a reminder, whether the results from the previous analysis run have been stored. Creating and editing prompts is not covered in this paper.
Figure 3. Screenshot of Prompt Manager
The data pulled from the database is then treated: labels are added, new columns are created for ease of analysis, missing data is deleted, and in general the data is cleaned and manipulated as needed. In plain SAS, this process would have required the analyst to write a good number of DATA steps or PROC SQL queries. Enterprise Guide instead provides an easy-to-use Query Builder for any dataset, which makes data manipulation easy and effortless. Like any EG task component, the Query Builder offers a point-and-click graphical interface. It can be used to join multiple data tables, add new columns, and add new prompts or macro variables, which can
further help in the conditional execution of the task. It can also create new variables, summarize the variables in the dataset, filter and sort, and create distinct data by removing duplicates. In essence, it acts as a single hybrid module for data manipulation.
Once the final dataset is ready, analysis components are selected from EG's task menu and executed. For segment comparisons of the selected variables, one-way ANOVA and bar graphs are chosen to explain the viewership differences for the following data cuts:
1. All time-shifted viewing
2. All time-shifted commercial viewing
3. Viewing by all broadcast networks vs. all cable networks (media type)
4. Viewing by daypart (the time of day during which a program is broadcast)
5. Viewing by network
The bar graphs are used in a PowerPoint presentation of the analysis results. Each data cut in the list above is run through the statistical test and the relevant bar graph component, and only the statistically significant results are presented. SAS EG runs the individual tasks. Their results can be linked to one another so that the complete program runs at the click of a button. The results can be exported to the local machine and used in the PowerPoint presentation. Copying and pasting the bar graphs became tedious, so we also tried writing a VBA script that inserts them into the PowerPoint template automatically. That script is beyond the scope of this paper.
The results of the statistical tests can be stored as tables and used to filter the analysis results. For example, in the analysis of viewership differences by network, a network is selected only if it shows a statistically significant difference that also meets a threshold of n households. The threshold calculation is coded in a separate program linked to the ANOVA output, and a Query Builder task joins the two output datasets.
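The filtering step amounts to a join followed by a filter. The paper does not specify how the household difference is computed. This Python sketch assumes the threshold program produces one household-difference value per network, for example the largest gap between segments. The field names, the 0.05 significance level and the threshold value are hypothetical. In the tool itself, a Query Builder task does the join and the threshold comes from a prompt.

```python
def filter_networks(anova_rows, diff_rows, alpha=0.05, threshold=100):
    """Inner-join ANOVA p-values to household differences by network and keep
    networks that are significant and meet the household threshold.

    anova_rows: dicts with 'network' and 'p_value' (ANOVA output).
    diff_rows:  dicts with 'network' and 'hh_diff' (threshold program output).
    """
    hh_diff = {row["network"]: row["hh_diff"] for row in diff_rows}
    selected = []
    for row in anova_rows:
        network = row["network"]
        if network not in hh_diff:
            continue  # inner join: skip networks missing from either table
        if row["p_value"] < alpha and hh_diff[network] >= threshold:
            selected.append(network)
    return selected


# Hypothetical outputs; network names and numbers are made up.
anova_rows = [
    {"network": "NET_A", "p_value": 0.001},
    {"network": "NET_B", "p_value": 0.20},
    {"network": "NET_C", "p_value": 0.01},
    {"network": "NET_D", "p_value": 0.03},
]
diff_rows = [
    {"network": "NET_A", "hh_diff": 250},
    {"network": "NET_B", "hh_diff": 400},
    {"network": "NET_C", "hh_diff": 40},
]

print(filter_networks(anova_rows, diff_rows, threshold=100))  # ['NET_A']
```

NET_B fails the significance test, NET_C is significant but below the threshold, and NET_D has no household-difference row, so the inner join drops it.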
Figure 4 depicts the entire tool; the label "Filtering the analysis results" in the screenshot marks the filtering step.
Figure 4. Screenshot of the tool
Another advantage of automating the tool with SAS EG is that prompts (macro variables) can be attached to the first step of the process flow. Their values then carry over to the appropriate downstream programs and analysis tasks. For example, the threshold macro variable used to filter the network analysis can change with every new segment comparison. The threshold prompt is therefore attached to segment_r, the program that starts the tool run. The threshold-calculating program uses its value later, once the network analysis is complete.
Graphs can also be customized by adding custom code to a task component. For example, to display the data values inside the bars of a bar graph, right-click the bar graph component and select Modify Task. Then click Preview code and select Insert code to add the required statements.
SAS EG can display results in HTML, PDF and RTF formats, which makes the results easily portable and available to external users. Result options are set using the Options choice on the Tools menu. Graph rendering formats include ActiveX, Java, PNG, GIF and JPEG, among others. ActiveX and Java graphs are interactive: graph options such as axis labels and values, graph colors, and background color or pattern can be changed in the results window. The results can be exported to the local machine. The export can be added as a step in the project so that the results are stored automatically once the analysis is complete. An option prevents overwriting existing output. When it is selected, output files are stored with a time stamp, which makes archiving historical results effortless. The automated run starts when one clicks Run Branch from..; all the components are then queued according to how they are linked together.
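SAS EG performs this queuing itself, and the paper does not describe EG's scheduler. The Python sketch below therefore models the behavior as an assumption: a topological sort (Kahn's algorithm) over the links, restricted to the components downstream of the node where Run Branch starts. The component names and links are hypothetical.

```python
from collections import deque


def run_order(links, start):
    """Return the components reachable from `start`, ordered so that each
    component comes after every upstream component that links to it
    (Kahn's topological sort restricted to the branch).

    links maps a component name to the names it links to downstream.
    """
    # 1. Collect the branch: every component reachable from `start`.
    branch, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in branch:
            branch.add(node)
            stack.extend(links.get(node, []))

    # 2. Count incoming links that come from inside the branch.
    indegree = {node: 0 for node in branch}
    for node in branch:
        for child in links.get(node, []):
            indegree[child] += 1
    if indegree[start]:
        raise ValueError("start component is part of a cycle")

    # 3. Queue a component once all of its upstream components have run.
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in links.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(branch):
        raise ValueError("links contain a cycle")
    return order


# Hypothetical links, loosely modelled on the tool in Figure 4.
links = {
    "segment_r": ["Query Builder"],
    "Query Builder": ["One-Way ANOVA", "Bar Chart"],
    "One-Way ANOVA": ["threshold program", "Filter query"],
    "threshold program": ["Filter query"],
    "Bar Chart": ["Export"],
    "Filter query": ["Export"],
}

print(run_order(links, "segment_r"))
```

Starting further down the flow runs only that part of the branch. For example, run_order(links, "Bar Chart") yields just the bar chart and the export.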
During a run, SAS EG color-codes the components: queued components are highlighted in yellow, and the task currently running is highlighted in green. Figure 5 shows the tool while it is running.
Figure 5. Screenshot of the project in execution mode
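Keeping existing output, as described earlier in this section, amounts to writing each export under a new, time-stamped name instead of replacing the previous file. Below is a minimal Python sketch of that idea. The folder, file names and time-stamp pattern are assumptions and need not match the names SAS EG generates.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(folder, base_name, ext, now=None):
    """Build folder/base_name_YYYYMMDD_HHMMSS.ext for one export run."""
    if now is None:
        now = datetime.now()
    return Path(folder) / f"{base_name}_{now:%Y%m%d_%H%M%S}.{ext}"


def export_without_overwrite(folder, base_name, ext, content, now=None):
    """Write content to a new time-stamped file; never replace an old one."""
    path = timestamped_path(folder, base_name, ext, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" raises FileExistsError rather than overwriting an existing file.
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(content)
    return path


# Hypothetical export name for one run of the network analysis.
print(timestamped_path("exports", "network_anova", "html",
                       datetime(2011, 3, 1, 14, 5, 9)))
```

Opening the file in exclusive-creation mode makes a name collision fail loudly instead of silently replacing an archived result.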
INDIVIDUAL COMPONENTS OF THE TOOL
QUERY BUILDER: Used for easy data manipulation. New columns can be added, and datasets can be created, merged with other datasets and summarized.
Figure 6. Query Builder
ONE-WAY ANOVA: Performs a one-way analysis of variance to determine whether viewership differs significantly among a group of segments. The variables are assigned to the analysis roles as needed. The analysis needs to identify which segments watch differently, so Bonferroni's method of multiple comparisons is checked in the means comparison tab of the ANOVA component. The Bonferroni correction controls the experimentwise Type I error rate when multiple pairwise t tests are performed on the main-effect means. The task also offers an option to plot the means. Figure 7 shows the ANOVA component.
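SAS computes this test and its p-values. The Python sketch below only shows what is being computed: a one-way ANOVA, followed by Bonferroni pairwise comparisons that use the pooled within-group mean square error (MSE), as SAS's Bonferroni t tests do. The Python standard library has no F or t distribution. The sketch therefore stops at the test statistics and the Bonferroni-adjusted per-comparison alpha; p-values or critical values would come from a statistics library. The group names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """One-way ANOVA for a dict of group name -> list of observations.

    Returns (F statistic, between-group df, within-group df, MSE).
    """
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    mse = ss_within / df_within  # pooled within-group mean square error
    f_stat = (ss_between / df_between) / mse
    return f_stat, df_between, df_within, mse


def bonferroni_pairs(groups, mse, alpha=0.05):
    """Pairwise t statistics that use the pooled ANOVA MSE, plus the
    Bonferroni per-comparison alpha (overall alpha / number of pairs)."""
    pairs = list(combinations(groups, 2))
    t_stats = {}
    for a, b in pairs:
        se = sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        t_stats[(a, b)] = (mean(groups[a]) - mean(groups[b])) / se
    return t_stats, alpha / len(pairs)


# Hypothetical viewing measure for three segments (made-up numbers).
groups = {"Segment1": [2, 3, 4], "Segment2": [5, 6, 7], "Segment3": [8, 9, 10]}

f_stat, df_b, df_w, mse = one_way_anova(groups)
print(f_stat, df_b, df_w)  # 27.0 2 6
t_stats, alpha_pc = bonferroni_pairs(groups, mse)
for pair, t in t_stats.items():
    print(pair, round(t, 3))
```

Consider a pair of segments. Its difference is significant when |t| exceeds the two-sided critical t value at the per-comparison alpha, with df_within degrees of freedom.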
Figure 7. ANOVA
BAR GRAPHS: Vertical bar graphs are used to show the differences, and some of the graphs are customized by inserting custom code.
Figure 8. Bar Graph (labeled: Option to insert custom code)
SAMPLE RESULTS
TIME-SHIFTED VS. LIVE VIEWERSHIP
Result1. Time-shifted vs. live viewing
VIEWERSHIP BY MEDIA TYPE
Result2. Viewership by media type
VIEWERSHIP BY NETWORK
Result3. Viewership by network (Segment 1 to Segment 5; network labels include NIK and TOON)
CONCLUSION
Creating an automated tool with SAS EG has made delivering analysis results to the stakeholders more efficient and cut the run time by almost 80%. The analyst team has gained a significant amount of time to analyze and interpret the results and to explain how the segments differ in the way they watch TV. The features of SAS Enterprise Guide that we found most helpful for creating automated, customized tools are:
1. An easy-to-use GUI for creating and executing a collection of data manipulation and analysis tasks
2. The Prompt Manager, for creating macro variables that can be added to or modified in any task
3. A one-click operation, Run Branch, that runs all the connected tasks consecutively
4. The provision to insert custom code into existing components to create customized output
5. The ability to produce output in different formats, including PDF, HTML and RTF
RECOMMENDED READING
http://support.sas.com/software/products/guide/index.html#documentation
http://www2.sas.com/proceedings/sugi31/109-31.pdf
http://www.wuss.org/proceedings08/08wuss%20proceedings/papers/how/how01.pdf
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Name: Aruna Buddana
Enterprise: TiVo Inc
Address: 2190 Gold Street
City, State ZIP: Alviso, CA 95002
Work Phone: 408-914-9827
E-mail: abuddana@tivo.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.