Will My Followers Tweet? Predicting Twitter Engagement using Machine Learning. David Darmon, Jared Sylvester, Michelle Girvan and William Rand

Abstract: Managers who are first to understand how social media is evolving have an advantage over those who are slower to understand what their followers are doing. Despite the advantage such knowledge would bring, user predictability in social media is not well understood. We use two different machine learning methods to model the behavior of 15,000 users on the basis of their past behavior during a seven-week period. We demonstrate that the behavior of users on Twitter can be well modeled as processes with self-feedback. We also explore how different structural segments of Twitter users behave differently. These insights would enable differential targeting schemes that might increase customer engagement with disparate groups of Twitter followers.

Keywords: social media, Twitter, prediction, machine learning

1. Introduction

Since the earliest days of marketing research, marketers have known that what one consumer says to another has a larger impact on that consumer's decision to purchase a product, respond to an advertising slogan, or participate in a call to action than anything an advertiser can do (Ryan & Gross, 1943). However, for most of the history of marketing, tapping into word-of-mouth communication between consumers has been difficult, expensive, and time-consuming. The recent growth in social media usage presents a unique opportunity for brand managers to examine these conversations. A manager who proactively monitors these conversations can identify key times to launch marketing campaigns, reach out to key influentials, or even launch proactive countermarketing campaigns.
In this paper, we begin research that moves in the direction of predictive social media analytics, i.e., tools that not only describe current user behavior on social media but also predict future user behavior. We begin by using the past behavior of users to predict when they will produce content (i.e., tweet) on a major social media platform, namely Twitter. We first describe the data we use to examine this question, then discuss two predictive analytic methods drawn from machine learning, computational mechanics and echo state networks, and present initial results of predicting when users will tweet, along with the behavioral models of individuals that these methods produce. We also examine the behavioral characteristics of structural segments of Twitter users, and end with a discussion of future directions.

2. Framework and Approaches

In order to predict individual behavior on social media, we adopt a computational agent perspective (DeDeo, 2012). The user receives inputs from their surroundings, combines those inputs in ways that depend on their own internal states, and produces an observed behavior or output. In the context of a microblogging platform such as Twitter, the inputs may be streams from other Twitter users, real-world events, etc., and the observed behavior may be a tweet, mention, or retweet. As a first approximation to the computation performed by a user, we might consider only the user's own past behavior as input for determining their future behavior. From this perspective, the behavior of the user can be modeled using only the time points at which social interactions occurred (Perry & Wolfe, 2010). Such point process models, while very simple, have recently found success in describing social systems (Ver Steeg & Galstyan, 2012; Cho, Galstyan, Brantingham & Tita, 2013). We propose extending this previous work by explicitly studying the predictive capability of point process models. That is, given observed behavior for a user, we seek a model that not only captures the user's dynamics but is also useful for predicting the user's future behavior from their past behavior. The rationale behind this approach is that if we can construct models that both reproduce the observed behavior and successfully predict future behavior, the models capture something about the computational aspects of the user, in the sense outlined above.

We explore two machine learning frameworks that enable this modeling. The first is the causal state modeling approach, motivated by results from computational mechanics, which assumes that every individual can initially be modeled as a biased coin and then adds structure as necessary to capture patterns in the data. It does this by expanding the number of states used to represent the underlying behavior of the agent. Causal state models have been used successfully in various fields (Haslinger, Klinkner & Shalizi, 2010; Cointet, Faure & Roth, 2007; Padró & Padró, 2005). The second approach is echo state networks, which assume that agent behavior is the result of a complex set of internal states with intricate relationships to the output variables of interest, and which fit only the weights connecting the internal states to the output variables, leaving the internal connections fixed (Jaeger, 2001; Ozturk, Xu & Príncipe, 2007). Echo state networks have proven useful in a number of different domains (Jaeger & Haas, 2004; Salmen & Ploger, 2005; Tong, Bickett, Christiansen & Cottrell, 2007).

3. The Data

The data consist of the statuses of 15,000 Twitter users over a 49-day period; 12,043 of these users were active during the period of our study. In this paper, we discard the content of the tweets and instead examine only whether or not an individual tweets in a particular time interval. For most of this paper we consider time intervals of 10 minutes, though we have data at a resolution of 1 second. In addition, the users were filtered to include only the 3,000 most active users over the 49-day period. A base activity measure, which we call the tweet rate, was defined as the proportion of seconds in the 7 AM to 10 PM window in which the user tweeted. Among the top 3,000 users, tweet rates ranged from 8.5 × 10^-5 to 0.38, and 90% of these users had a tweet rate below 0.05. After this filtering, our dataset consists of 3,000 binary time series of length 57,600 (the number of ten-minute intervals in our dataset).

4. Prediction Results

The 49 days of user activity were partitioned chronologically into a 45-day training set and a 4-day testing set. This partition was chosen to account for possible changes in user behavior over time, which would not be captured by shuffling the days. Thus, for each user, the training set consists of 4,320 timepoints and the testing set consists of 384 timepoints. In all cases, we predict tweet behavior on the out-of-sample test set, counting a prediction as correct when it matches the observed tweet/no-tweet outcome for a ten-minute interval. We compared the accuracy rates of the causal state model and the echo state network to a baseline accuracy rate for each user. The baseline predictor is the majority vote of tweet vs. no-tweet behavior over the training days.
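A minimal sketch of the preprocessing and chronological split described above, assuming tweet timestamps are given as seconds since the start of one contiguous observation window (the paper additionally restricts each day to a 7 AM to 10 PM window, which is omitted here). The function names are hypothetical and not taken from the authors' code.

```python
SECONDS_PER_BIN = 600  # ten-minute intervals, as in the text

def binarize(tweet_times, window_start, n_bins, bin_seconds=SECONDS_PER_BIN):
    """Return a 0/1 list: 1 if the user tweeted at least once in that bin."""
    series = [0] * n_bins
    for t in tweet_times:
        idx = int((t - window_start) // bin_seconds)
        if 0 <= idx < n_bins:  # drop tweets outside the observation window
            series[idx] = 1
    return series

def tweet_rate(series):
    """Coarsened tweet rate: proportion of bins containing at least one tweet."""
    return sum(series) / len(series)

def chronological_split(series, n_train):
    """First n_train bins train, the rest test. No shuffling, so every test
    bin lies in the future of every training bin."""
    return series[:n_train], series[n_train:]

series = binarize([30, 650, 660, 1900, 4000], window_start=0, n_bins=8)
train, test = chronological_split(series, n_train=5)
print(series)              # [1, 1, 0, 1, 0, 0, 1, 0]
print(tweet_rate(series))  # 0.5
print(train, test)         # [1, 1, 0, 1, 0] [0, 1, 0]
```

In the paper's setting, `n_train` would be the number of bins falling in the first 45 days.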
In the context of our data, the baseline predictor always predicts that a user tweets if that user usually tweeted in the training set, and always predicts that the user does not tweet otherwise. For any process with memory, as we would expect from most Twitter users, a predictor that conditions on the user's recent behavior should be able to outperform this base rate.
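To illustrate why memory helps, here is a minimal sketch comparing the majority-vote baseline with a first-order Markov predictor, which predicts whichever symbol most often followed the previous interval's symbol in training. The first-order predictor is a simple stand-in chosen for illustration; it is not the causal state model or echo state network used in the paper, and the synthetic user's transition probabilities are hypothetical.

```python
import random

def simulate_user(n, p_after_tweet, p_after_quiet, seed=0):
    """Hypothetical user whose chance of tweeting in an interval depends only
    on whether they tweeted in the previous interval."""
    rng = random.Random(seed)
    series, prev = [], 0
    for _ in range(n):
        p = p_after_tweet if prev == 1 else p_after_quiet
        prev = 1 if rng.random() < p else 0
        series.append(prev)
    return series

def majority_baseline(train):
    """Majority symbol of the training set (ties go to 'no tweet')."""
    return 1 if 2 * sum(train) > len(train) else 0

def fit_order1(train):
    """For each previous symbol, the next symbol that followed it most often
    (ties go to 0)."""
    counts = {0: [0, 0], 1: [0, 0]}
    for prev, cur in zip(train, train[1:]):
        counts[prev][cur] += 1
    return {s: 1 if c[1] > c[0] else 0 for s, c in counts.items()}

def compare(series, n_train):
    """Test-set accuracy of the baseline and of the first-order predictor."""
    train, test = series[:n_train], series[n_train:]
    base = majority_baseline(train)
    rule = fit_order1(train)
    prevs = series[n_train - 1:-1]  # symbol observed just before each test point
    base_acc = sum(y == base for y in test) / len(test)
    markov_acc = sum(rule[p] == y for p, y in zip(prevs, test)) / len(test)
    return base_acc, markov_acc

series = simulate_user(12_000, p_after_tweet=0.7, p_after_quiet=0.05)
base_acc, markov_acc = compare(series, n_train=10_000)
print(f"baseline {base_acc:.3f}, first-order {markov_acc:.3f}")
```

Because this synthetic user rarely tweets, the baseline always predicts "no tweet" and is usually right. The first-order predictor gives up some of the quiet intervals that immediately follow a tweet, but gains more by predicting that tweets continue in bursts.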

The comparison between the baseline and the causal state model and echo state network predictors is shown in Figure 1. In both plots, each red point corresponds to the baseline accuracy rate on the testing set for a given user, and the blue point corresponds to the accuracy rate on the testing set using one of the models. Here, the tweet rate is computed from the coarsened time series; that is, the tweet rate is the proportion of ten-minute windows over the 49-day period that contain tweets. The model predictions clearly improve on the baseline prediction, especially for users with a tweet rate above 0.2. Overall, the causal state models and the echo state networks both showed improvement, in some cases drastic improvement, over the baseline predictor. Moreover, for a large proportion of the users, the two methods gave similar predictive results. These predictive analytics can give managers the ability to predict when users are likely to tweet, and managers can take that information into account when scheduling the deployment of social media marketing content.

Figure 1: The improvement over the baseline accuracy rate for the causal state model (CSM) and the echo state network (ESN). Each panel plots accuracy rate against tweet rate. In both plots, each red point corresponds to the baseline accuracy rate for a user, and the connected blue point is the accuracy rate using the causal state model or the echo state network.

Moreover, since different users may be predicted to tweet at different times, and since inspecting a user's past timeline makes it possible to infer what they might tweet about, brand managers can predict when a particular follower or influential who often tweets on a particular topic will tweet. This gives managers the ability to predict when certain topics will be tweeted about. In future work, we also hope to predict specifically when a user will tweet on a particular topic.
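The echo state network compared against the baseline in Figure 1 can be sketched in pure Python as a one-step-ahead tweet predictor. Everything below is an assumption for illustration, not the authors' configuration: a 15-unit tanh reservoir with uniform random weights, recurrent weights scaled by their largest absolute row sum rather than by spectral radius, a 50-step washout, a ridge-regression readout thresholded at 0.5, and synthetic data. All names are hypothetical.

```python
import math
import random

def make_reservoir(n_units, seed=0, scale=0.9):
    """Fixed random input and recurrent weights. The recurrent matrix is scaled
    so its largest absolute row sum equals `scale` < 1; since tanh is
    1-Lipschitz, the state update is then a contraction (a conservative
    stand-in for the usual spectral-radius scaling)."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_units)] for _ in range(n_units)]
    norm = max(sum(abs(v) for v in row) for row in w)
    return w_in, [[v * scale / norm for v in row] for row in w]

def run_reservoir(inputs, w_in, w):
    """Reservoir state after each input: x_t = tanh(W x_{t-1} + w_in * u_t)."""
    x = [0.0] * len(w_in)
    states = []
    for u in inputs:
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + wi * u)
             for row, wi in zip(w, w_in)]
        states.append(x + [1.0])  # constant feature acts as a bias term
    return states

def ridge_fit(features, targets, lam=1e-2):
    """Readout weights: solve (X^T X + lam*I) beta = X^T y by Gauss-Jordan."""
    d = len(features[0])
    a = [[sum(f[i] * f[j] for f in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(d):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [arj - factor * acj for arj, acj in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(d)]

def esn_predict(series, n_train, n_units=15, washout=50, seed=0):
    """One-step-ahead 0/1 predictions for series[n_train:]. Only the linear
    readout is trained; the reservoir weights stay fixed."""
    w_in, w = make_reservoir(n_units, seed)
    states = run_reservoir(series, w_in, w)
    # the state after interval t is paired with the symbol at t + 1
    beta = ridge_fit(states[washout:n_train - 1], series[washout + 1:n_train])
    preds = []
    for t in range(n_train - 1, len(series) - 1):
        score = sum(bi * si for bi, si in zip(beta, states[t]))
        preds.append(1 if score > 0.5 else 0)
    return preds

# Toy data (hypothetical parameters): tweeting makes tweeting again more likely.
rng = random.Random(1)
series, prev = [], 0
for _ in range(4000):
    prev = 1 if rng.random() < (0.7 if prev else 0.05) else 0
    series.append(prev)

preds = esn_predict(series, n_train=3000)
test = series[3000:]
accuracy = sum(p == y for p, y in zip(preds, test)) / len(test)
print(f"ESN one-step accuracy: {accuracy:.3f}")
```

Each prediction uses only inputs observed up to the preceding interval, because the reservoir state at time t depends only on symbols up to t.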
Predicting topics in this way involves moving beyond a binary alphabet to a larger alphabet in which different topics are encoded as different symbols. If this proves successful, managers would have a tool to predict not only when certain users will tweet, but also what they will tweet about.

5. Behavioral Models and Network Segmentation

The causal state model creates behavioral models for individuals that can be interpreted as transitions, for each individual user, between states of behavior. These states may map to real-world states of the users' behavior: for instance, tweeting from work, tweeting from a mobile device, not tweeting while driving, etc. We can draw these models as state diagrams, with transitions between the states labeled by whether or not the user tweets during that transition. We illustrate four such state diagrams in Figure 2. Out of all the users, 58.8% had inferred causal state models similar to Figure 2(b), where a user has a tweeting state A and a non-tweeting state P.

Figure 2: Typical 1-, 2-, 3-, and 4-state causal state models, panels (a) to (d). Of the 3,000 users, 383 (12.8%), 1,765 (58.8%), 132 (4.4%), and 100 (3.3%) had these numbers of states, respectively.

To investigate whether different communities within the 15,000 users have stereotyped behavior, we applied a community detection algorithm to the network and then examined the statistics of various dynamical measures (statistical complexity, entropy rate, etc., described below) by community. To perform the community detection, we used the fast-greedy algorithm of Clauset, Newman, and Moore (2004). For any network, after partitioning the nodes into communities, we can compute the fraction of edges that lie within communities. We consider the normalized modularity, which subtracts from this fraction the fraction expected in a randomized version of the observed network. The algorithm merges communities greedily, which gives a hierarchy of possible community structures, and of those structures we choose the one that maximizes the normalized modularity. A maximum always exists, since the normalized modularity is 0 for a network with a single community and, for typical networks, grouping communities together beyond a certain iteration decreases the modularity. Using this procedure on the 15,000-user network, we detected 69 communities at a maximum normalized modularity of 0.2435. The largest four of these accounted for 98% of the users. The largest community contained 7,520 users, and the Twitter account of Om Malik, which served as the seed for collecting the network, belonged to this community.
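The quantity being maximized is, in Newman's standard form, Q = Σ_c [e_c/m − (d_c/2m)²]: the fraction of edges inside communities minus its expected value in a random network with the same degrees. Below is a minimal, deliberately naive sketch of the greedy search. It applies the same greedy merge rule as Clauset, Newman, and Moore (2004) but without their efficient heap-based bookkeeping, so it is only suitable for toy graphs. The function names are hypothetical; for real networks, the networkx function `greedy_modularity_communities` implements the Clauset-Newman-Moore algorithm.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Q = sum over communities c of [e_c / m - (d_c / 2m)^2], where m is the
    number of edges, e_c the edges inside c, and d_c the summed degree of c."""
    m = len(edges)
    inside, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def merge(part, keep, drop):
    """Relabel every node of community `drop` as community `keep`."""
    return {n: keep if c == drop else c for n, c in part.items()}

def greedy_merge(nodes, edges):
    """Start from singletons, repeatedly apply the merge that gives the highest
    Q, and return the best (Q, partition) seen along the way."""
    part = {n: n for n in nodes}
    best_q, best = modularity(edges, part), dict(part)
    while True:
        # merging communities with no edge between them never increases Q
        pairs = {tuple(sorted((part[u], part[v])))
                 for u, v in edges if part[u] != part[v]}
        if not pairs:
            break
        part = max((merge(part, a, b) for a, b in pairs),
                   key=lambda candidate: modularity(edges, candidate))
        q = modularity(edges, part)
        if q > best_q:
            best_q, best = q, dict(part)
    return best_q, best

# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q, part = greedy_merge(range(6), edges)
print(round(q, 4))  # 0.3571, i.e. 5/14, with each triangle as one community
```

Putting every node in one community gives Q = 0, which is why the text can argue that the maximum lies at some intermediate level of the hierarchy.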
The other three of the four largest communities contained 6,410, 400, and 409 users.

The causal state model inferred for each user has two complementary metrics associated with it: its statistical complexity and its entropy rate. The statistical complexity, loosely, specifies how many bits of information about the past of the process we need to retain to optimally predict its future. The entropy rate specifies the inherent unpredictability of the process due to randomness in its dynamics. Both can be computed directly from the inferred causal state model. Figure 3 shows the distribution of these values across the communities. Most users have statistical complexities between 0 and 1, which corresponds to a causal state model in which the user alternates between tweeting and non-tweeting states. However, the statistical complexity distributions exhibit long tails in all of the communities, and these tails differ from community to community. Similarly, the entropy rates tend to cluster near 0.4, but the tails of the distributions differ by community. Thus, the communities are heterogeneous in terms of the dynamics of the users they contain.
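A minimal sketch of computing both metrics from a causal state model, assuming the model is unifilar (each state emits a given symbol through at most one transition) and is written as a dictionary of the hypothetical form `{state: [(symbol, probability, next_state), ...]}`. The statistical complexity is the Shannon entropy of the stationary distribution over causal states; the entropy rate is the stationary average of each state's next-symbol entropy. The probabilities in the two-state example are made up, shaped like the tweeting state A and non-tweeting state P of Figure 2(b).

```python
import math

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def stationary(model, iters=2000):
    """Stationary state distribution by iterating the lazy chain
    pi <- (pi + pi T) / 2, which has the same fixed point as pi T but
    cannot oscillate on periodic models."""
    pi = {s: 1.0 / len(model) for s in model}
    for _ in range(iters):
        nxt = {s: 0.5 * pi[s] for s in model}
        for s, transitions in model.items():
            for _symbol, p, t in transitions:
                nxt[t] += 0.5 * pi[s] * p
        pi = nxt
    return pi

def complexity_and_entropy_rate(model):
    """(statistical complexity C_mu, entropy rate h_mu), both in bits."""
    pi = stationary(model)
    c_mu = entropy(pi.values())
    # In a unifilar model a state's next-symbol distribution equals its
    # transition probabilities, so h_mu averages their entropies under pi.
    h_mu = sum(pi[s] * entropy([p for _sym, p, _t in model[s]]) for s in model)
    return c_mu, h_mu

coin = {"A": [(1, 0.2, "A"), (0, 0.8, "A")]}           # Bernoulli user
two_state = {"A": [(1, 0.7, "A"), (0, 0.3, "P")],       # A: just tweeted
             "P": [(1, 0.05, "A"), (0, 0.95, "P")]}     # P: just quiet
print(complexity_and_entropy_rate(coin))       # C_mu ~ 0, h_mu ~ 0.722
print(complexity_and_entropy_rate(two_state))  # roughly 0.59 and 0.37 bits
```

A two-state model can have a statistical complexity of at most 1 bit, which is consistent with the observation that most users fall between 0 and 1.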

Figure 3: Estimated densities of the inferred statistical complexities and inferred entropy rates across the four largest communities (1 to 4), with the remaining 65 communities grouped as 'Rest'. For the statistical complexities, users with statistical complexity equal to 0, corresponding to a Bernoulli ('coin flip') process, are treated as point masses and not used to estimate the continuous density.

Examining the behavioral complexity of users in different structural communities gives managers the ability to examine how different network segments behave, which gives insight into how they might engage with those different users. At the simplest level, since we do not know when users are actively engaged with Twitter except through their tweets, this gives us the ability to predict when they are most likely to be on Twitter. At a richer level, it also gives us the ability to predict when a user is most likely to engage on Twitter, meaning that the user may be induced to tweet about the company if provided with usable content. Because these models are combined with network segmentation, a manager can target different segments with different content at different times, tailoring both to each segment's particular preferences for being active on Twitter and to the segment's particular interests, in a form of social dayparting.

6. Conclusion and Future Work

In this paper, we have shown that by building representations of the latent states of user behavior, we can start to predict users' actions on social media. We have done this using two different approaches, which capture the complexity of user behavior in different ways. Ultimately, the two methods performed very similarly on a large proportion of the users. This was not expected.
The two methods differ drastically in their modeling paradigms, and the data were quite dynamic, providing plenty of opportunity for differentiation. Our best explanation is that most users ultimately exhibit only a few latent states of behavioral processing, so any model that can capture these states will do well at capturing the behavior of users. We have also shown that different clusters of users based on network ties exhibit very different types of behavior, enabling segmentation of users and the possibility of targeting different social media content to different user groups.

One of the biggest weaknesses of the present approach is its failure to incorporate exogenous inputs to a user. That is, we have treated each user as an autonomous unit and focused only on using their own past behavior to predict their future behavior. In a social context such as Twitter, it makes more sense to incorporate network effects and examine how the behavior of friends and friends of friends directly impacts a user's behavior. For example, the behavior of many users, especially those with a low tweet rate, may become predictable after incorporating the behavior of the users they follow.

We have seen that taking a predictive, model-based approach to exploring user behavior has allowed us to discover typical user profiles with predictive power on a popular social media platform, that it is possible to segment these users based on network properties, and that these segments exhibit different behaviors. Such predictions, which take social context into account, could be useful in any number of domains. Brand managers could use these models to understand who will respond to a message sent to a group of users, and potentially even to help determine whether a particular piece of content will go viral. Predicting user behavior on social media has the potential to transform both our understanding of human interactions with social media and the ability of organizations to engage with their audiences.

7. References

Cho, Y.-S., Galstyan, A., Brantingham, J., & Tita, G. (2013). Latent point process models for spatial-temporal networks. arXiv preprint arXiv:1302.2671.
Clauset, A., Newman, M., & Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70.
Cointet, J.-P., Faure, E., & Roth, C. (2007). Intertemporal topic correlations in online media. In Proceedings of the 1st International Conference on Weblogs and Social Media (ICWSM).
DeDeo, S. (2012). Evidence for non-finite-state computation in a human social system. arXiv preprint arXiv:1212.0018.
Haslinger, R., Klinkner, K., & Shalizi, C. (2010). The computational structure of spike trains. Neural Computation, 22(1), 121-157.
Jaeger, H. (2001). The echo state approach to analysing and training recurrent neural networks (Technical Report 148). Fraunhofer Institute for Autonomous Intelligent Systems.
Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667), 78-80.
Ozturk, M. C., Xu, D., & Príncipe, J. C. (2007). Analysis and design of echo state networks. Neural Computation, 19(1), 111-138.
Padró, M., & Padró, L. (2005). A named entity recognition system based on a finite automata acquisition algorithm. Procesamiento del Lenguaje Natural, 35, 319-326.
Perry, P. O., & Wolfe, P. J. (2010). Point process modeling for directed interaction networks. arXiv preprint arXiv:1011.1703.
Ryan, B., & Gross, N. C. (1943). The diffusion of hybrid seed corn in two Iowa communities. Rural Sociology, 8(1), 15-24.
Salmen, M., & Ploger, P. G. (2005). Echo state networks used for motor control. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (pp. 1953-1958). IEEE.
Tong, M. H., Bickett, A. D., Christiansen, E. M., & Cottrell, G. W. (2007). Learning grammatical structure with echo state networks. Neural Networks, 20(3), 424-432.
Ver Steeg, G., & Galstyan, A. (2012). Information transfer in social media. In Proceedings of the 21st International World Wide Web Conference (pp. 509-518). ACM.