Measuring Pair-wise Social Influence in Microblog


2012 ASE/IEEE International Conference on Social Computing and 2012 ASE/IEEE International Conference on Privacy, Security, Risk and Trust

Zibin Yin, Ya Zhang
Institute of Image Communication & Networking, Shanghai Jiao Tong University, Shanghai, China
{zibinyin, ya

Abstract

The development of microblog services has created an unprecedented opportunity for people to share information. To better understand information propagation in such social networks, an important task is to measure the influence among users. A number of previous works measure users' influence by analyzing network characteristics or by retweet rate. However, a high indegree does not necessarily mean high influence, and the retweet rate fluctuates over time. In this paper, we propose a user interaction model for microblogs that considers three key factors: a user's active level, a user's willingness to retweet, and the influence between a pair of users. One advantage of this model is that fitting it requires only a subgraph and hence may be performed in a piece-wise fashion. Furthermore, we can find users with potential influence in the network. We fit the model with a Sina Microblog dataset and show that it is able to predict influence with high accuracy. Moreover, the model can be used to predict retweet rates and to find influential users.

Keywords-Microblog; Influence; Social Network; Prediction

I. INTRODUCTION

Microblog has revolutionized the way information is propagated and consumed by providing a combined social network and messaging service. It enables people to share information about their activities and opinions in the form of short messages of 140 characters, called tweets. It also allows people to selectively subscribe to other people's tweets. As the most prominent microblogging service, Twitter has more than 140 million registered accounts that post 110 million tweets per day.
The counterpart of Twitter in China, Sina Microblog, has also been growing exponentially since its inception in 2009 and now has over 200 million registered users. With its significant user volume, microblog is now recognized as one of the most important types of media. Corporations and organizations, as well as individuals, have attempted to market themselves through this channel.

Measuring social influence has become one of the most important tasks in social network analysis. While there could be many interpretations of social influence, we consider it to be the capability to engage one's audience in conversations and to have one's messages spread. Many applications (e.g. viral marketing [1] and personalized recommendations [2]) leverage individuals with high social influence to optimize information propagation. For example, a marketing campaign can reach a wider audience at lower cost by advertising through the most influential users (e.g. celebrities).

[Footnote: Corresponding author.]

The measurement of influence has been widely studied in the field of social networks. The number of followers is considered to be one of the most important indicators of influence, i.e. a large number of followers indicates high influence. However, several studies have shown that the number of followers does not correlate well with influence [3]. Some other studies use the retweet rate as the measure of influence [3], [4], [5]. Nevertheless, the retweet rate in microblog is often affected by factors such as the type of content in the message, the popularity of the message, and the user's active level. The retweet rate of an individual often changes over time and does not reflect an inherent characteristic of the user. Another popular set of influence measures is based on network structure. PageRank [6] and HITS [7], while not originally proposed for computing influence in social networks, are naturally adopted to rate users on a microblog site.
However, the scores given by the two algorithms do not necessarily reflect the influence of users precisely, because a following relationship does not imply influence. In addition, there are spoofing techniques that target the PageRank and HITS algorithms. For example, one can raise his or her influence score by following some authority users. In recent years, several websites (e.g. Klout, PostRank and PeerIndex) have started to offer services that compute an individual's influence comprehensively, based on the individual's profiles on the few most popular social network sites.

Social influence may be measured at two scales: at the global level (i.e. a user's global social influence) and between two users (i.e. pairwise social influence). Most existing studies measure social influence at the global level. In this paper, we attempt to measure the reciprocal influence between individuals with a simple information propagation model. We model a user's reaction to messages with three important factors: the user's active level (A), the user's willingness to retweet (W), and the pairwise influence between two users (I). The first two factors represent a user's personal characteristics, and the third is what we would like to infer. We name the proposed information propagation model the AWI model. The model utilizes the structural properties of the network as well as user behaviors and interactions. This differentiates our measure of influence from earlier ones [4], [8], [9], which are mostly based on the retweet rate and/or the number of followers.

To the best of our knowledge, the proposed AWI model is the first to incorporate a user's active level and willingness to retweet in such an information propagation model. Fitting the model with social interaction data provides an accurate estimate of the social influence between a pair of users. Moreover, assuming that the influences between individuals are independent of each other, the model may be fitted locally, i.e. inferring the influence of other users on a user i requires fetching only the profiles of user i and all of his followees. We build a dataset for validating this model by using the Search API of Sina Microblog. We show empirically that the proposed model is both effective and efficient in predicting social influence.

The rest of the paper is organized as follows. In Section II, we summarize the related work on measuring social influence. We introduce the proposed AWI model and present how to fit the model parameters with social interaction data in Section III. Details about the experimental data gathered from Sina Microblog are presented in Section IV. The experimental results, with a comparison to previous methods, are presented in Section V. Finally, we conclude the paper in Section VI.

II. RELATED WORK

Microblogging services have attracted a lot of attention from researchers in various disciplines. Early research initiatives mainly studied the structure and properties of the Twitter network, such as investigating Twitter's information propagation patterns [10] and analyzing the structure of the Twitter network to identify influential users [3], [9]. Twitter messages have also been successfully applied in other fields [11], [12], [13]. Despite the above efforts, the problem of how to characterize information propagation and social influence, two important but not well defined types of social behavior, is still open.
Some studies discussed the presence of key influential people in a social network, highlighting the value of highly connected individuals as key elements in the propagation of information through the network. Ghosh et al. [14] summarized graph-based influence measures into two types: geodesic path-based ranking measures and topological ranking measures. Kwak et al. [10] ranked Twitter users by the number of followers and by PageRank and found that these two rankings are similar.

Recently, many studies have revealed that the number of followers is not necessarily a good indicator of influence. Cha et al. [3] compared three measures of influence on Twitter: indegree, retweets, and mentions. Their study shows that the indegree of users did not correlate well with retweets and mentions, while the latter two correlated well with each other, indicating that the number of followers is not a good measure of influence. Similarly, Ye et al. [15] proposed three metrics for social influence (follower influence, reply influence and retweet influence) and examined their stability, assessments, and correlations with each other. Huberman et al. [16] define friends in a network as users who exchanged directed messages. They observed that the number of messages correlates better with the number of friends than with the number of followers or followees. Kwak et al. [10] examined the propagation of tweets and found that, for originators with up to 1,000 followers, the average number of users reached by a retweet is not affected by the number of followers the originator has.

In addition, there are many studies of user activity in microblogging services. Agarwal et al. [17] discovered that the most influential bloggers were not necessarily the most active in the blogosphere. Romero et al. [4] propose a model which shows that the influence of a user depends not only on the size of the influenced audience, but also on its passivity.
These studies motivate us to incorporate a user's active level in the information propagation model.

Galuba et al. [18] proposed an information propagation model which simultaneously takes into account several key factors: content popularity, influence between individuals and the rate of propagation. Similarly, Goyal et al. [19] proposed both static and time-dependent models for capturing influence and presented algorithms for learning the parameters of the various models and for testing the models. Their major goal is to predict the probability of influence between two neighboring users. These two studies are probably the closest works to this paper. The major difference between their works and our proposed model is that we introduce measures of a user's willingness to retweet and active level into the information propagation model.

III. THE AWI MODEL FOR TWEET PROPAGATION

Suppose user i is a follower of user j and user j posts a tweet through a microblog service. We want to build a model for the retweeting behavior of user i, i.e. to predict whether user i will retweet the tweet or not. We assume that the retweeting behavior of a user is determined by the following three factors: active level, willingness to retweet, and pairwise influence.

Active level: If user i retweets a tweet, he must see the tweet first. It is reasonable to assume that the probability that a user sees a tweet is proportional to the probability that the user is active on the microblog site. We call the latter probability the user's active level and use it to model the retweeting behaviors among users.

Willingness to retweet: Some users are observed to be more likely to retweet than other users. We call this property the willingness to retweet and consider it a factor in modeling users' retweeting behaviors.

Pairwise influence: The influence of user j on user i is considered to be one of the key elements that determine the retweeting behavior of user i.
The above three factors may be seen as important properties of the corresponding users. The first two properties are inherent to individual users and the third represents the interaction between two individuals. All of them are hidden variables in the model, and only the user's retweeting behavior is observable. We now present the proposed model of individuals' retweeting behaviors in microblog and illustrate it in Figure 1.

Fig. 1. The Retweeting Behavior Model for Microblog Users. (Figure not reproduced; the variables shown are A_i, R_ij, W_i and I_ij, with parameters α_i and ω_i.)

The model contains four component variables: active level, willingness to retweet, pairwise influence between two users, and retweets between two users. We call the model the AWI model hereafter. The model makes the following assumptions about a user's retweeting behavior in microblog:

- A user retweets a followee's tweet if and only if he is active, has the willingness to retweet, and is influenced by his followee.
- Active users are more likely to publish tweets, retweets and comments.
- A user with a high willingness to retweet is more likely to retweet a followee's tweet.
- The influence of user j on user i is independent of user i's active level and willingness to retweet.

Based on the above assumptions, we may formulate the model with the following equation:

$$R_{ij}^{m} = 1 \iff A_j = 1,\ A_i = 1,\ W_i = 1,\ I_{ij} = 1,\ F_{ij} = 1,$$

where $F_{ij}$, $R_{ij}^{m}$, $A_i$, $W_i$ and $I_{ij}$ are all binary variables. $F_{ij} = 1$ means that user i is a follower of user j, $R_{ij}^{m} = 1$ means that user i retweets the message m from user j, $A_i = 1$ means that user i is active, $W_i = 1$ means that user i has the willingness to retweet, and $I_{ij} = 1$ means that user i is influenced by user j.

According to the model, if user i is a follower of user j, then given a message m by user j, the probability that user i retweets m may be expressed as follows:

$$P(R_{ij}^{m} = 1 \mid A_j = 1, F_{ij} = 1) = P(A_i = 1, W_i = 1, I_{ij} = 1 \mid A_j = 1, F_{ij} = 1) = \frac{P(A_i = 1)\, P(W_i = 1 \mid A_i = 1)\, P(I_{ij} = 1 \mid F_{ij} = 1)}{P(A_j = 1)} = \frac{\alpha_i \omega_i f_{ij}}{\alpha_j}$$

where $\alpha_i = P(A_i = 1)$ reflects the active level of user i, $\omega_i = P(W_i = 1 \mid A_i = 1)$ corresponds to the willingness to retweet of user i, and $f_{ij} = P(I_{ij} = 1 \mid F_{ij} = 1)$ represents the strength of the influence of user j on user i. The active level ($\alpha_i$) and willingness to retweet ($\omega_i$) of user i often fluctuate with time. We assume that the two variables are constant only within a short time frame.
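As a minimal sketch of the retweet probability above, the following Python function evaluates $\alpha_i \omega_i f_{ij} / \alpha_j$ for one follower/followee pair. The function name `retweet_probability`, the input validation, and the clipping of the result to 1 are illustrative assumptions and are not part of the original model description.

```python
def retweet_probability(alpha_i, omega_i, f_ij, alpha_j):
    """Probability that follower i retweets a message posted by followee j.

    Evaluates alpha_i * omega_i * f_ij / alpha_j, the expression given by
    the AWI model. All four inputs are probabilities in [0, 1].
    """
    for name, value in (("alpha_i", alpha_i), ("omega_i", omega_i),
                        ("f_ij", f_ij), ("alpha_j", alpha_j)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    if alpha_j == 0.0:
        # User j posted the message, so a zero active level is inconsistent.
        raise ValueError("alpha_j must be positive")
    p = alpha_i * omega_i * f_ij / alpha_j
    # Assumption: clip to 1 so that the result stays a valid probability.
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(retweet_probability(alpha_i=0.5, omega_i=0.4, f_ij=0.5, alpha_j=0.8))
```

The clipping mirrors the constraint that the ratio must lie between 0 and 1; in a fitted model that constraint would be enforced during optimization rather than afterwards.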
The influence of user j on user i ($f_{ij}$) is considered to be much more stable than $\alpha_i$ and $\omega_i$. Assuming that the $R_{ij}^{m}$ are independent of each other, we may write the likelihood of all observations of the users' retweeting behaviors as follows:

$$L = \prod_{F_{ij} = 1,\ m \in M_j} \left( \frac{\alpha_i \omega_i f_{ij}}{\alpha_j} \right)^{R_{ij}^{m}} \left( 1 - \frac{\alpha_i \omega_i f_{ij}}{\alpha_j} \right)^{1 - R_{ij}^{m}} \qquad (1)$$

where $M_j$ is the set of messages published by user j.

As a social network is usually overwhelmingly large, fitting a model at the scale of the entire network is very expensive, if not impossible. One advantage of the proposed AWI model is that it requires only the local subgraph of a user in order to fit the parameters for that user. With this property, we can fit the parameters for the entire network in a piece-wise fashion.

A. Active Level

The active level of user i often fluctuates with time, and it is impossible to determine a user's active level at a given time point. In this study, we assume that a user's active level is stable within a certain time span T and is proportional to the number of tweets published by the user. In order to estimate the active level $\alpha_i$ of user i within T, we first divide T into S small time units $t_1, t_2, \ldots, t_S$ of duration $\tau_0$ ($S = T / \tau_0$). A binary variable $y_k$ indicates whether the user posts any message within $t_k$. Assuming that an individual's activeness follows a binomial distribution, the maximum likelihood estimate of the active level of user i within T is

$$\hat{\alpha}_i = \frac{n\{y_k = 1\}}{S} \qquad (2)$$

where $n\{y_k = 1\}$ is the number of small time units with $y_k = 1$.

B. Willingness to Retweet

The willingness to retweet reflects the tendency of a user to react to other people's messages and is an inherent characteristic of the user. We treat it as another important factor in the model. We use the following example to illustrate the importance of including the willingness to retweet in the model.
Suppose there are two users (A and B) with the same active level, and both of them follow another user (C). Suppose A and B react similarly to C's messages (with the same retweet rate). If A also retweets many other people's messages at a similar rate but B barely reacts to other users' messages, then we say that A has a higher willingness to retweet than B. By incorporating this variable in the model, we are able to distinguish user A from user B.

As with the estimation of the active level of a user, we assume a binomial distribution for the willingness to retweet. With maximum likelihood estimation, the willingness to retweet of user i in the time span T is

$$\omega_i^{(T)} = \frac{RT_i}{TW_i} \qquad (3)$$

where $RT_i$ is the number of retweets that user i posted in T, and $TW_i$ is the total number of messages (tweets and retweets) that user i posted in T. Laplace smoothing is adopted to avoid division by zero.

C. The Influence Score

The active level and willingness to retweet of a user are observed to change over time. To account for this, we divide the time span into a set of segments of duration $\tau_1$ (denoted as $T_s$). Within each segment, the active level and willingness to retweet are considered to be relatively stable. For each user, the values of the two variables within each time segment are first estimated independently. The influence score of each user pair is then obtained by maximizing the following likelihood function, which is a reformulation of Eq. 1 that takes the time segments into account:

$$L = \prod_{T \in T_s} \ \prod_{F_{ij} = 1} \ \prod_{m \in M_j} \left( \frac{\alpha_i^{(T)} \omega_i^{(T)} f_{ij}}{\alpha_j^{(T)}} \right)^{R_{ij}^{m(T)}} \left( 1 - \frac{\alpha_i^{(T)} \omega_i^{(T)} f_{ij}}{\alpha_j^{(T)}} \right)^{1 - R_{ij}^{m(T)}}$$

By definition, the variable $f_{ij}$ and the expression $\alpha_i^{(T)} \omega_i^{(T)} f_{ij} / \alpha_j^{(T)}$ take values between 0 and 1. Hence, the influence score $f_{ij}$ may be estimated by solving the following optimization problem:

$$\max_{f_{ij}} \ \log L \qquad (4)$$
$$\text{s.t.} \quad 0 \le \frac{\alpha_i^{(T)} \omega_i^{(T)} f_{ij}}{\alpha_j^{(T)}} \le 1, \qquad 0 \le f_{ij} \le 1$$

We use Newton's method to solve the above optimization problem and obtain the influence score $f_{ij}$.

IV. DATA COLLECTION

Sina Weibo (literally "Sina Microblog") is a famous Chinese microblogging site launched by SINA Corporation in August 2009. It now has more than 200 million registered users, and more than 3 million tweets are published every day, on average about 40 tweets per second. In this study, the social interaction data was collected from Sina Microblog using its API service. Due to the restrictions on API access, it is impossible for us to obtain the entire dataset from Sina Microblog. We selected a set of users from Sina Microblog and retrieved their profiles and posting behaviors as well as their following relationships. The dataset we collected contains 52,722 users, 1,064,739 following and follower relationships, and 23,018,041 messages.
The profile information retrieved for each user includes the ID, name, location, gender, description, whether the user is verified, and the numbers of followers and followees. The fields associated with each message include the poster's ID and name, as well as the ID, content and creation time stamp of the message. If a message is a retweet of or reply to another message, the original message's ID and poster ID are also included. The post dates of the messages span from Aug. [date missing], when Sina Microblog was just launched, to Mar. [date missing].

V. EXPERIMENTS

In this section, we present the experiments on the dataset fetched from Sina Microblog. To evaluate the accuracy of the AWI model, we first show that the model predicts the retweet rates of users with high accuracy. Then we use the influence scores as edge weights in the user-user relationship graph and compute the weighted PageRank for each user. We compare the resulting ranking of the most influential users with the rankings produced by several other algorithms, including PageRank [6], HITS [7], and Romero's IP algorithm [4].

A. Prediction of Retweet Rates

Based on our observation of the Sina Microblog dataset, a user's active level and willingness to retweet are time-sensitive properties of the user. On the other hand, the influence between a pair of users is relatively stable. In this experiment, we validate the accuracy of the AWI model in predicting individuals' retweet rates. We first divide the fetched Sina Microblog dataset into two sets by the time stamps of the messages:

Training set: The training set contains the messages with earlier time stamps. It is used to estimate the pairwise influence scores for the users.

Test set: The test set contains the messages with more recent time stamps. The retweet rates of individual users are computed on the test set and used as ground truth.
There are two parameters, $\tau_0$ and $\tau_1$, in the experimental process, where $\tau_0$ is the duration of the time unit used for calculating the active level, and $\tau_1$ is the duration of the time segment within which the active level and willingness to retweet are considered to be relatively stable. As Figure 2 shows, we divide the time span of our dataset into segments of duration $\tau_1$, where the training data contains K segments and the test data contains L segments. We denote the k-th segment as $T_k$ ($k = 1, \ldots, K + L$).

Fig. 2. The time span of the dataset is divided into segments by $\tau_1$.

The experimental process for training and testing is designed as follows. The training step obtains an estimate of the pairwise influence scores from the training data. In the test step, we predict the retweet rate of user i to user j in the test data by following the principle of our AWI model.

Training:
1) Divide the training data into K segments by $\tau_1$;
2) Estimate the active level ($\alpha_i^{(T_k)}$) for each user within each time segment $T_k$ ($k = 1, \ldots, K$) according to Eq. 2;
3) Estimate the willingness to retweet ($\omega_i^{(T_k)}$) for each user within each time segment $T_k$ ($k = 1, \ldots, K$) according to Eq. 3;
4) Estimate the influence score ($f_{ij}$) for each user pair according to Eq. 4.
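The training steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the add-one form of the Laplace smoothing, and the bracketing safeguard around the Newton iteration are all hypothetical choices. The per-segment factor `c` stands for $\alpha_i^{(T)} \omega_i^{(T)} / \alpha_j^{(T)}$, so that the retweet probability in a segment is `c * f`.

```python
import math


def active_level(active_flags):
    """Eq. 2: share of time units in which the user posted something."""
    return sum(1 for y in active_flags if y) / len(active_flags)


def willingness(num_retweets, num_messages, smoothing=1.0):
    """Eq. 3 with add-one (Laplace) smoothing; the exact form is assumed."""
    return (num_retweets + smoothing) / (num_messages + 2.0 * smoothing)


def _grad_hess(f, segments):
    """First and second derivative of the log-likelihood with respect to f."""
    g = h = 0.0
    for c, retweeted, posted in segments:
        missed = posted - retweeted
        if retweeted:
            g += retweeted / f
            h -= retweeted / (f * f)
        if missed:
            d = 1.0 - c * f
            if d <= 0.0:  # outside the feasible region of Eq. 4
                return -math.inf, -math.inf
            g -= missed * c / d
            h -= missed * c * c / (d * d)
    return g, h


def fit_influence(segments, tol=1e-10, max_iter=200):
    """Maximize log L over f_ij for a single (i, j) pair.

    segments: list of (c, retweeted, posted), one entry per time segment,
    where c = alpha_i * omega_i / alpha_j > 0, posted is the number of
    messages by user j and retweeted is how many of them user i retweeted.
    """
    if sum(r for _, r, _ in segments) == 0:
        return 0.0  # no retweets observed: the likelihood peaks at f = 0
    lo = 0.0
    hi = min(1.0, 1.0 / max(c for c, _, _ in segments))
    g_hi, _ = _grad_hess(hi, segments)
    if g_hi >= 0.0:
        return hi  # likelihood is still increasing at the upper bound
    f = 0.5 * (lo + hi)
    for _ in range(max_iter):
        g, h = _grad_hess(f, segments)
        if g > 0.0:
            lo = f  # the maximizer lies to the right of f
        else:
            hi = f  # the maximizer lies to the left of f
        f_new = f - g / h if math.isfinite(h) and h < 0.0 else math.nan
        if not lo < f_new < hi:  # Newton step left the bracket (or is NaN)
            f_new = 0.5 * (lo + hi)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f
```

For a single segment the maximizer has the closed form `retweeted / (posted * c)`, which gives a quick sanity check: `fit_influence([(0.5, 10, 100)])` should be close to 0.2. Because the log-likelihood is concave in `f`, falling back to bisection when a Newton step leaves the bracket does not change the solution; it only guards against overshooting.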

Testing:
1) Divide the test data into L segments by $\tau_1$;
2) Predict the active level ($\alpha_i^{(T_k)}$) of user i within each time segment $T_k$ of the test data. As the active level is considered to be relatively stable within $\tau_1$, we use the data within $T_{k-1}$ to predict $\alpha_i^{(T_k)}$ according to Eq. 2 ($k = K + 1, \ldots, K + L$);
3) As with the active level, predict the willingness to retweet ($\omega_i^{(T_k)}$) of user i within each time segment $T_k$ of the test data according to Eq. 3, using the data within $T_{k-1}$ ($k = K + 1, \ldots, K + L$);
4) Predict the retweet rate of user i to user j within each time segment $T_k$ ($k = K + 1, \ldots, K + L$) with

$$R_{ij}^{k} = \frac{\alpha_i^{(T_k)} \omega_i^{(T_k)} f_{ij}}{\alpha_j^{(T_k)}},$$

where $f_{ij}$ is the influence score estimated in the training process;
5) Obtain the predicted retweet rate of user i to user j in the test data as follows, where $n_j^{T_k}$ is the number of messages that user j posted within time segment $T_k$ ($k = K + 1, \ldots, K + L$):

$$PredictRate_{ij} = \frac{\sum_{k=K+1}^{K+L} n_j^{T_k} R_{ij}^{k}}{\sum_{k=K+1}^{K+L} n_j^{T_k}}$$

6) Calculate the mean square error (MSE) between the predicted retweet rate and the actual retweet rate in the test data:

$$MSE = \frac{1}{N} \sum \left( PredictRate_{ij} - TrueRate_{ij} \right)^2$$

where N is the number of user pairs, and $TrueRate_{ij}$ is the ratio of the number of retweets by user i of user j's messages to the number of messages that user j posted in the test data.

We set the training data to the first 870 days and the test data to the next 68 days. Furthermore, we experimented with a range of sizes for $\tau_0$ and empirically set the size of the time unit to 1 day in the rest of the experiments. We also tried different sizes of $\tau_1$, including 30 days, 15 days, 10 days and 5 days.

TABLE I. THE MSE OF THE RETWEET RATE PREDICTION EXPERIMENT
Columns: # Messages | Baseline (×10^-4) | AWI model (×10^-4) with τ_1 = 30 days | 15 days | 10 days | 5 days
[Table values missing.]

Table I shows the results of this experiment, where we use the retweet rate from the training data as a baseline and compare the MSE across user groups, which are classified by the number of messages that user j posted for each pair of users.
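Testing steps 4) to 6) above can be sketched as follows. The function names and the tuple layout of the per-segment inputs are hypothetical, and clipping each per-segment rate to 1 is an added assumption; the per-segment active levels and willingness values are taken as already predicted from the preceding segment.

```python
def predict_rate(segments, f_ij):
    """Predicted retweet rate of user i to user j over the test segments.

    segments: list of (alpha_i, omega_i, alpha_j, n_j), one entry per test
    segment T_k, where n_j is the number of messages user j posted in T_k.
    """
    weighted_sum = 0.0
    total_messages = 0
    for alpha_i, omega_i, alpha_j, n_j in segments:
        if n_j == 0 or alpha_j <= 0.0:
            continue  # user j posted nothing here; the segment adds nothing
        # Step 4: per-segment rate, clipped to 1 (assumption).
        r_k = min(1.0, alpha_i * omega_i * f_ij / alpha_j)
        # Step 5: weight each segment by the number of messages of user j.
        weighted_sum += n_j * r_k
        total_messages += n_j
    return weighted_sum / total_messages if total_messages else 0.0


def mean_square_error(predicted, actual):
    """Step 6: MSE between predicted and true retweet rates over user pairs."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical numbers for one user pair and two test segments.
    rate = predict_rate([(0.5, 0.4, 0.8, 10), (1.0, 0.2, 0.5, 30)], f_ij=0.5)
    print(rate, mean_square_error([rate], [0.2]))
```

Weighting by `n_j` makes the prediction directly comparable with the true rate, which is the total number of retweets divided by the total number of messages posted by user j in the test data.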
Table I shows that the influence score of our AWI model performs well in predicting the retweet rate. Furthermore, the smaller $\tau_1$ is, the smaller the MSE of the AWI model is, so we empirically set $\tau_1$ to 5 days in the rest of the experiments. The high accuracy in predicting the retweet rate is attributed to the stability of the influence score. We calculate the Pearson correlation coefficients of the retweet rates and of the influence scores between the training data and the test data. The correlation coefficient of the influence scores is [value missing], while the correlation coefficient of the retweet rates is only [value missing].

B. Finding Influential Users

Finding influential users is crucial in sociology and viral marketing. However, popular users who have many followers are not necessarily influential in terms of spawning retweets. Cha et al. [3] find that the influence of a user is not gained spontaneously or accidentally, but through concerted effort such as limiting tweets to a single topic. Topological measures such as indegree alone therefore reveal very little about the influence of a user.

Based on the obtained dataset, we generate a weighted graph G = (N, E, W) and set the influence score ($f_{ij}$) as the weight of each edge. We use this graph to compute the weighted PageRank value of each node, because PageRank has been widely used to rank web pages as well as people based on their influence [6], [9]. In this experiment, the weight calculation method is what differentiates our ranking of influential users from earlier works. In order to compare it with other algorithms, we also compute PageRank [6], HITS [7] and Romero's Influence-Passivity algorithm (IP algorithm) [4].

Table II shows the top 20 users ranked by the values of weighted PageRank (with the edge weights of our method), together with their ranks under the other algorithms and their numbers of followers. As Table II shows, some famous organizations and celebrities such as Sina Technology, Newsweek and Jiong He rank high regardless of the algorithm.
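The weighted PageRank computation described above can be sketched with a plain power iteration. The text does not spell out the edge direction, the weight normalization, the damping factor or the treatment of users without outgoing edges, so the choices below are assumptions: each edge points from follower i to followee j with weight $f_{ij}$, outgoing weights are normalized per follower, the damping factor is 0.85, and the rank of users who follow nobody is spread uniformly.

```python
def weighted_pagerank(edges, damping=0.85, max_iter=100, tol=1e-12):
    """Weighted PageRank by power iteration.

    edges: dict mapping (follower, followee) -> influence score f_ij.
    Returns a dict mapping each user to a rank; the ranks sum to 1.
    """
    nodes = sorted({user for edge in edges for user in edge})
    n = len(nodes)
    # Total outgoing weight per follower, used to normalize edge weights.
    out_weight = {u: 0.0 for u in nodes}
    for (follower, _), weight in edges.items():
        out_weight[follower] += weight

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by users with no outgoing weight is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_weight[u] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {u: base for u in nodes}
        for (follower, followee), weight in edges.items():
            if out_weight[follower] > 0.0:
                share = weight / out_weight[follower]
                new_rank[followee] += damping * rank[follower] * share
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical graph: "a" and "b" follow "c", and "b" also follows "a".
    scores = weighted_pagerank({("a", "c"): 0.9, ("b", "c"): 0.1, ("b", "a"): 0.1})
    for user, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(user, round(score, 4))
```

Under these assumptions a user with few followers can still rank high when the followers it does have carry large influence scores and are themselves highly ranked, which is the effect the comparison with unweighted PageRank is meant to expose.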
However, there are some users who rank very low under the other algorithms. For example, Embassy of Maldives is an official user, but it has only 30 followers, far fewer than the others. As is well known, the Maldives is a famous tourist destination, and most of the messages posted by Embassy of Maldives introduce local attractions. These messages gained many retweets and propagated very widely in the network. Although only a few users follow Embassy of Maldives, other users can still receive its updates. Android APP is another example. Android APP has only 77 followers, but it is a famous Android application market whose applications are very popular in the network. Many users share apps from Android APP even though they do not follow it. There is an upper limit on the number of followees in Sina Microblog (the maximum is 2000), as there is on Twitter.com. Some users may choose not to follow users like Embassy of Maldives and Android APP because they can get the same information from their other followees. Despite their lack of followers, users like Embassy of Maldives and Android APP are no doubt influential. Previous algorithms ignore these users, but our AWI model can be used to discover them.

TABLE II. THE RANK BY DIFFERENT ALGORITHMS, INCLUDING PAGERANK, HITS, ROMERO'S IP ALGORITHM AND WEIGHTED PAGERANK
Columns: Name | Followers | PageRank | HITS-auth | IP-influence | Weighted PageRank
Users listed, in order: Sina Technology, Android APP, Newsweek, Jiong He, Xiaoyu Wang, elong, Shiyi Pan, Global News, Fashion Trend, Headline Blog, Microworld, Chenggong Bi, Embassy of Maldives, William Feng, Vista Story, ZINGT Blog, Sina Fashion, Phoenix News, Food Channel, Southern Weekly
[Table values missing.]

To further illustrate the difference between our method and the others, we calculate the Pearson correlation coefficients between the number of followers and the values produced by these algorithms. The correlation coefficients between the number of followers and the values of PageRank, as well as the authority values of HITS, are very large. The values reach [value missing] and [value missing], because these algorithms consider only the structure of the user graph and ignore users' behaviors. Although Romero et al. [4] use the retweet rate as the edge weight in the IP algorithm, its result also has a high correlation with the number of followers. The value is [value missing], which shows that the IP algorithm also emphasizes indegree. Although following a user is an important behavior in microblogging services that signals interest and influence, it is not the only indicator of influence. When we use the influence score as the edge weight, the correlation coefficient between the values of our method and the number of followers is [value missing], which is much smaller than the others. That is because our method eliminates the influence of robot accounts, suspended accounts and unaffected users. Taking into account users' personal characteristics as well as the structure of the network allows us to find potentially influential users who were previously ignored.

VI. CONCLUSION

Microblogging services are among the most rapidly growing social networks, and vast efforts have been devoted to studying them.
In this paper, we presented the AWI model for tweet propagation in microblog. We use this model to infer the influence score between a pair of users. Our measure of influence reflects the properties of the network as well as the characteristics of the users. The AWI model has several advantages. It can eliminate the influence of robot accounts and suspended accounts and can find users' potential influence. The influence score in our algorithm is more stable than the retweet rate, which is more in line with the definition of influence. Through several experiments, we find that the influence score can be used to predict retweet rates and to find influential users, with good performance in both tasks. In particular, our method can find some influential users who were previously ignored. The model can be extended to take the content of messages into account, which will be our future work.

ACKNOWLEDGMENT

This work is partially supported by Shanghai Science and Technology Rising Star Program (11QA ), Shanghai Talent Development Fund ( ), 973 Program (2010CB731406), and STCSM (12DZ ).

REFERENCES

[1] P. D. Matthew Richardson, "Mining knowledge-sharing sites for viral marketing," The Eighth ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD '02).
[2] K. H. B. T. Xiaodan Song, Yun Chi, "Information flow modeling based on diffusion rate for prediction and ranking," The 16th International Conference on World Wide Web.
[3] F. B. K. P. G. Meeyoung Cha, Hamed Haddadi, "Measuring user influence in Twitter: The million follower fallacy," The 4th International AAAI Conference on Weblogs and Social Media (ICWSM '10).
[4] S. A. B. A. H. Daniel M. Romero, Wojciech Galuba, "Influence and passivity in social media," The 20th ACM Conference Companion on World Wide Web.
[5] A. K. Aron Yu, C. Vic Hu, "Khyrank: Using retweets and mentions to predict influential users."
[6] R. M. T. W. Lawrence Page, Sergey Brin, "The PageRank citation ranking: Bringing order to the web," Technical Report, Stanford InfoLab.
[7] J. M. Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM.
[8] A. ronel Martìnez Teutle, "Twitter: Network properties analysis," Electronics, Communications and Computer (CONIELECOMP), the 20th International Conference.
[9] J. J. Q. H. Jianshu Weng, Ee-Peng Lim, "TwitterRank: Finding topic-sensitive influential twitterers," WSDM.
[10] H. P. S. M. Haewoon Kwak, Changhyun Lee, "What is Twitter, a social network or a news media?" The 19th International World Wide Web (WWW) Conference.
[11] X.-J. Z. Johan Bollen, Huina Mao, "Twitter mood predicts the stock market," Computational Science.
[12] Y. M. Takeshi Sakaki, Makoto Okazaki, "Earthquake shakes Twitter users: Real-time event detection by social sensors," 19th Int. Conf. on World Wide Web (WWW).
[13] E. K. J Ritterman, M Osborne, "Using prediction markets and Twitter to predict a swine flu pandemic," The 1st International Workshop on Mining Social Media.
[14] K. L. Rumi Ghosh, "Predicting influential users in online social networks," KDD Workshop on Social Network Analysis (SNA-KDD).
[15] F. W. Shaozhi Ye, "Measuring message propagation and social influence on Twitter.com," The 2nd International Conference on Social Informatics (SocInfo '10).
[16] F. W. Bernardo A Huberman, Daniel M Romero, "Social networks that matter: Twitter under the microscope," First Monday.
[17] L. T. P. S. Y. Nitin Agarwal, Huan Liu, "Identifying the influential bloggers in a community," WSDM.
[18] D. C. Z. D. W. K. Wojciech Galuba, Karl Aberer, "Outtweeting the twitterers - predicting information cascades in microblogs," 3rd Workshop on Online Social Networks (WOSN).
[19] L. V. L. Amit Goyal, Francesco Bonchi, "Learning influence probabilities in social networks," WSDM.