Machine learning for Dynamic Social Network Analysis


1 Machine learning for Dynamic Social Network Analysis. Applications: Control. Manuel Gomez Rodriguez, Max Planck Institute for Software Systems. IIT Hyderabad, December 2017.

2 Outline of the Seminar
REPRESENTATION: TEMPORAL POINT PROCESSES
1. Intensity function
2. Basic building blocks
3. Superposition
4. Marks and SDEs with jumps
APPLICATIONS: MODELS
1. Information propagation
2. Opinion dynamics
3. Information reliability
4. Knowledge acquisition
APPLICATIONS: CONTROL (this lecture)
1. Influence maximization
2. Activity shaping
3. When to post
4. When to fact check

3 Applications: Control. 1. Influence maximization 2. Activity shaping 3. When to post 4. When to fact check

4 Influence maximization. Can we find the most influential group of users? Why this goal?

5 Idea adoption representation (Lecture 2). We represent idea adoptions using terminating temporal point processes, one counting process N_u(t) per user. Each adoption event records a user, an idea, and a time, and each user's process terminates after its first event.

6 Idea adoption intensity (Lecture 2). The intensity of each counting process N_u(t) combines sources (given) and follow-ups (modeled): the follow-up intensity sums the influence of each user v on user u over v's previous messages, weighted by a memory kernel, and each user adopts the idea only once (terminating process). [Gomez-Rodriguez et al., ICML]

7 Influence of a set of sources. Influence of a set of source nodes A, activated at t_A = 0: the average number of nodes that follow up by time T. It depends on the probability that each node n follows up by time T, which in turn depends on the node's intensity (previous slide).

8 Influence estimation: exact vs. approximate. Computing the exact follow-up probabilities can be exponential in the network size, and thus not scalable. Key idea of approximate influence estimation: estimate the size of the neighborhood reachable from the source nodes A (activated at t_A = 0) within time T.
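As an illustration of what is being estimated, influence can also be approximated by plain Monte Carlo simulation of continuous-time cascades: sample pairwise transmission delays, propagate adoption times shortest-path style from the sources, and average the number of nodes reached by time T. This is a sketch for intuition, not the scalable estimator of Du et al.; the graph encoding and names (`alpha`, `simulate_cascade`) are hypothetical.

```python
import heapq
import random

def simulate_cascade(alpha, sources, T, rng):
    """One cascade sample: edge (u, v) transmits after an Exp(alpha[u][v])
    delay; a node's adoption time is the earliest arrival from any adopter.
    Returns the set of nodes adopting by time T (Dijkstra-style sweep)."""
    times = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    adopted = set()
    while heap:
        t, u = heapq.heappop(heap)
        if u in adopted or t > T:
            continue
        adopted.add(u)
        for v, rate in alpha.get(u, {}).items():
            tv = t + rng.expovariate(rate)
            if tv <= T and tv < times.get(v, float("inf")):
                times[v] = tv
                heapq.heappush(heap, (tv, v))
    return adopted

def influence(alpha, sources, T, n_samples=2000, seed=0):
    """Average number of nodes adopting by time T over sampled cascades."""
    rng = random.Random(seed)
    return sum(len(simulate_cascade(alpha, sources, T, rng))
               for _ in range(n_samples)) / n_samples
```

On a two-edge star graph with fast transmission rates, the estimate approaches the full network size, matching the intuition that influence counts expected follow-ups.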

9 How good are approximate methods? They not only come with theoretical guarantees, but also work well in practice. [Du et al., NIPS 2013]

10 Maximizing the influence. Once we have defined influence, can we find the set of source nodes that maximizes it? Theorem. For a wide variety of influence models, the influence maximization problem is NP-hard. [Du et al., NIPS 2013]

11 NP-hardness of influence maximization. Influence maximization is NP-hard by reduction from Set Cover, a well-known NP-hard problem. [Du et al., NIPS 2013]

12 Submodularity of influence maximization. The influence function satisfies a natural diminishing-returns property: submodularity. For source sets A ⊆ B, the marginal gain of adding a node to A is at least its marginal gain when added to B. Consequence: the greedy algorithm comes with a (1 − 1/e) ≈ 63% provable guarantee, and in practice the greedy solution is often very close to the optimal solution. [Du et al., NIPS 2013]
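The greedy algorithm referred to above can be sketched generically: repeatedly add the element with the largest marginal gain. For any monotone submodular set function this attains the (1 − 1/e) guarantee (Nemhauser et al.). The coverage example below is illustrative; names are hypothetical.

```python
def greedy_maximize(ground_set, F, k):
    """Greedy selection for a monotone submodular set function F:
    repeatedly add the element v maximizing F(S | {v}) - F(S).
    Returns a set S of at most k elements within (1 - 1/e) of optimal."""
    S = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        for v in ground_set - S:
            gain = F(S | {v}) - F(S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no element adds value
            break
        S.add(best)
    return S

# Example: maximum coverage, a classic monotone submodular objective
# (analogous to influence: each "source" covers a set of reachable nodes).
sets = {"a": {1, 2, 3}, "b": {3, 4, 6}, "c": {5}}
def coverage(S):
    covered = set()
    for v in S:
        covered |= sets[v]
    return len(covered)
```

Here `greedy_maximize({"a", "b", "c"}, coverage, 2)` first picks "a" (gain 3), then "b" (gain 2), mirroring how greedy influence maximization picks sources.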

13 Applications: Control. 1. Influence maximization 2. Activity shaping 3. When to post 4. When to fact check

14 Activity shaping. Can we steer users' activity in a social network in general? Why this goal?

15 Activity shaping vs. influence maximization. Activity shaping is a generalization of influence maximization. Influence maximization: a fixed incentive, applied one time, for the same piece of information; it is only about maximizing adoption. Activity shaping: a variable incentive, applied multiple times, for multiple pieces of information; it covers many different activity shaping tasks.

16 Event representation (Lecture 2). We represent messages using nonterminating temporal point processes, one counting process N_u(t) per user; each recurrent event records a user and a time. [Zarezade et al., Allerton 2017]

17 Multidimensional Hawkes process. For each user u, we represent actions as a counting process N_u(t) with intensity λ_u(t) (actions per time unit): λ_u(t) = μ_u + Σ_v a_vu Σ_{t_i ∈ H_v(t)} κ(t − t_i), where the rate μ_u captures exogenous actions, and the endogenous term combines the user influence matrix (a_vu) with a non-negative kernel κ(·) modeling memory of past events. [Zarezade et al., Allerton 2017]
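As a concrete sketch of the intensity just described, the rate of user u can be evaluated from the event history, here assuming an exponential memory kernel κ(s) = exp(−w·s); the encoding of `mu`, `A`, and `history` is illustrative.

```python
import math

def hawkes_intensity(u, t, history, mu, A, w=1.0):
    """Intensity of user u at time t in a multidimensional Hawkes process
    with exponential kernel kappa(s) = exp(-w * s):
    lambda_u(t) = mu[u] + sum over past events (v, t_i) of A[u][v] * kappa(t - t_i).
    `history` is a list of (user, time) pairs; A[u][v] is the influence of
    user v on user u (endogenous part), mu[u] the exogenous rate."""
    rate = mu[u]
    for v, ti in history:
        if ti < t:
            rate += A[u][v] * math.exp(-w * (t - ti))
    return rate
```

With no past events the intensity reduces to the exogenous rate; each past event of an influencing user adds an exponentially decaying bump.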

18 Steering endogenous actions. The total activity is the superposition of organic actions and directly incentivized actions; we steer the network by controlling the intensities of the directly incentivized actions. [Zarezade et al., Allerton 2017]

19 Cost-to-go & Bellman's principle of optimality. Optimization problem: minimize a loss over dynamics defined by jump SDEs. To solve the problem, we first define the corresponding optimal cost-to-go; the cost-to-go, evaluated at t_0, recovers the optimization problem. [Zarezade et al., Allerton 2017]

20 Hamilton-Jacobi-Bellman (HJB) equation. Lemma. The optimal cost-to-go satisfies Bellman's principle of optimality, which yields the Hamilton-Jacobi-Bellman (HJB) equation: a partial differential equation in J (with respect to λ and t). [Zarezade et al., Allerton 2017]

21 Solving the HJB equation. Consider a quadratic loss that rewards organic actions and penalizes directly incentivized actions. We propose a form for the optimal intensity and then show that it solves the HJB equation: the solution reduces to a matrix Riccati differential equation and a first-order ODE with a closed-form solution, both computed offline, once. [Zarezade et al., Allerton 2017]

22 The Cheshire algorithm. Intuition: steering actions means sampling action users and times from u*(t). In more detail: since the intensity function u*(t) is stochastic, we sample from it using the superposition principle and standard thinning. Easy to implement: it only requires sampling from an inhomogeneous Poisson process. [Zarezade et al., Allerton 2017]
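The standard thinning step mentioned above (Ogata-style) is simple to sketch, assuming the intensity can be bounded from above on the horizon; the function names are illustrative, not from the paper.

```python
import random

def sample_by_thinning(intensity, upper_bound, t0, tf, rng=None):
    """Ogata-style thinning: propose events from a homogeneous Poisson
    process with rate `upper_bound` >= intensity(t) on [t0, tf], and
    accept each proposal at time t with probability
    intensity(t) / upper_bound. Returns the accepted event times."""
    rng = rng or random.Random(0)
    t, events = t0, []
    while True:
        t += rng.expovariate(upper_bound)  # next candidate event
        if t > tf:
            return events
        if rng.random() < intensity(t) / upper_bound:
            events.append(t)
```

For a constant intensity the accepted events form an ordinary Poisson process, which is a quick sanity check on the sampler.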

23 Experiments on real data. Experiments on five Twitter datasets (users) where actions are tweets and retweets. 1. Fit the model parameters: exogenous rates and influence matrix. 2. Simulate steering of endogenous actions, with directly incentivized tweets chosen by each method. [Zarezade et al., Allerton 2017]

24 Performance vs. # of incentivized tweets. [Figure: number of tweets M(t_f) vs. number of incentivized tweets on the Sports and Series datasets.] Cheshire (in red) reaches 30K tweets 20-50% faster than the second-best performer. [Zarezade et al., Allerton 2017]

25 Applications: Control. 1. Influence maximization 2. Activity shaping 3. When to post 4. When to fact check

26 Social media as a broadcasting platform. Everybody can build and reach their own audience and broadcast information to it. [Figure: broadcasted content and audience reaction.]

27 Attention is scarce. Social media users follow many broadcasters. [Figure: Instagram and Twitter feeds, with older posts pushed down.]

28 What are the best times to post? Can we design an algorithm that tells us when to post to achieve high visibility?

29 Representation of broadcasters and feeds. A broadcaster's posts form a counting process N(t); a user's feed is a sum of counting processes, M(t) = A^T N(t), where A encodes who follows whom. [Zarezade et al., WSDM 2017]
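The feed construction M(t) = A^T N(t) is, at any fixed time, just a matrix-vector product over the counts. A minimal sketch with a hypothetical list-of-lists encoding (`A[i][j] = 1` if follower j follows broadcaster i):

```python
def feed_counts(A, N):
    """Feed counts M = A^T N at a fixed time: follower j's feed size is
    the sum of post counts of the broadcasters j follows.
    A[i][j] = 1 if follower j follows broadcaster i, else 0;
    N[i] = number of posts by broadcaster i so far."""
    n_broadcasters, n_followers = len(A), len(A[0])
    return [sum(A[i][j] * N[i] for i in range(n_broadcasters))
            for j in range(n_followers)]
```

For example, a follower who follows two broadcasters with 3 and 2 posts sees a feed of 5 stories.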

30 Broadcasting and feed intensities. Given a broadcaster i and her followers, the posts N(t) have a broadcaster intensity function (tweets/hour) and each feed M(t) has a feed intensity function (tweets/hour), which includes the feed due to other broadcasters. [Zarezade et al., WSDM 2017]

31 Definition of visibility function. Visibility of broadcaster i at follower j: r_ij(t), the position of the highest-ranked tweet by broadcaster i in follower j's feed (r_ij(t) = 0 means the top of the feed). In general, visibility depends on the feed ranking mechanism; here the feed ranks stories with older tweets pushed down. [Zarezade et al., WSDM 2017]

32 Optimal control of temporal point processes. We formulate the when-to-post problem as a novel stochastic optimal control problem (of independent interest): visibility and feed dynamics (a system of stochastic equations with jumps), optimizing visibility (optimal control of jumps), and experiments (Twitter). [Zarezade et al., WSDM 2017]

33 Visibility dynamics in a FIFO feed (I). In a feed in reverse chronological order, new tweets push older tweets down. Over [t, t+dt): if other broadcasters post a story and broadcaster i does not, the rank increases, e.g., r_ij(t) = 2 becomes r_ij(t+dt) = 3; if broadcaster i posts a story, the rank resets, r_ij(t) = 2 becomes r_ij(t+dt) = 0; if nobody posts, the rank is unchanged, r_ij(t+dt) = 2. [Zarezade et al., WSDM 2017]

34 Visibility dynamics in a FIFO feed (II). The zero-one law above yields a stochastic differential equation (SDE) with jumps: dr_ij(t) = −r_ij(t) dN_i(t) + dM_{j\i}(t), where N_i(t) counts broadcaster i's posts and M_{j\i}(t) counts the other broadcasters' posts in follower j's feed. OUR GOAL: optimize r_ij(t) over time, so that it stays small, by controlling dN_i(t) through the intensity µ_i(t). [Zarezade et al., WSDM 2017]
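The zero-one dynamics can be simulated directly by replaying the two event streams: each post by another broadcaster increments the rank, each own post resets it to zero. A small sketch (function and variable names are illustrative, not from the paper):

```python
def simulate_rank(own_times, others_times):
    """Replay the FIFO-feed rank r(t) of a broadcaster's latest post:
    r increases by one whenever another broadcaster posts and resets to
    zero whenever the broadcaster posts (the zero-one law of
    dr(t) = -r(t) dN_i(t) + dM(t)). Returns (time, rank) after each event."""
    events = sorted([(t, "own") for t in own_times] +
                    [(t, "other") for t in others_times])
    r, trace = 0, []
    for t, kind in events:
        r = 0 if kind == "own" else r + 1
        trace.append((t, r))
    return trace
```

Replaying, say, an others' post, then an own post, then another others' post yields ranks 1, 0, 1, matching the case analysis on the previous slide.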

35 Feed dynamics. We consider a general feed intensity (e.g., Hawkes, inhomogeneous Poisson) combining a deterministic arbitrary intensity with stochastic self-excitation, which yields a jump stochastic differential equation (SDE). [Zarezade et al., WSDM 2017]

36 The when-to-post problem. Optimization problem: minimize a nondecreasing loss plus a terminal penalty, over dynamics defined by jump SDEs. [Zarezade et al., WSDM 2017]

37 Bellman's principle of optimality. Lemma. The optimal cost-to-go satisfies Bellman's principle of optimality, which yields the Hamilton-Jacobi-Bellman (HJB) equation: a partial differential equation in J (with respect to r, λ and t). [Zarezade et al., WSDM 2017]

38 Solving the HJB equation. Consider a quadratic loss that favors some periods of time (e.g., times in which the follower is online) and trades off visibility against the number of broadcasted posts. We propose the optimal intensity and then show that it only depends on the current visibility. [Zarezade et al., WSDM 2017]

39 The RedQueen algorithm. Consider s(t) = s. Then the optimal intensity is u*(t) = (s/q)^{1/2} r(t). How do we sample the next posting time? By the superposition principle: for each event time t_i in the follower's feed, sample Δ_i ~ Exp((s/q)^{1/2}) and take min_i (t_i + Δ_i). It only requires sampling M(t_f) times. [Zarezade et al., WSDM 2017]

40 When-to-post for multiple followers. Consider n followers and a quadratic loss that favors some periods of time (e.g., times in which the followers are online) and trades off visibility against the number of broadcasted posts. We can then show that the optimal intensity only depends on the current visibilities, and the efficient sampling algorithm is easily adapted to multiple followers. [Zarezade et al., WSDM 2017]

41 Novelty in the problem formulation. The problem formulation is unique in two key technical aspects: I. The control signal is a conditional intensity (previous work: a time-varying real vector). II. The jumps are doubly stochastic (previous work: memory-less jumps). [Zarezade et al., WSDM 2017]

42 Case study: one broadcaster. [Figure: significance (followers' retweets per weekday), the broadcaster's posts, and the average position over time during 01/06-31/06, for the true posts and for RedQueen.] RedQueen achieves a 40% lower average position than the broadcaster's true posts. [Zarezade et al., WSDM 2017]

43 Evaluation metrics. Consider a follower's wall on which the broadcaster's highest post has rank r(t_1) = 0, r(t_2) = 1, r(t_3) = 0, r(t_4) = 1, r(t_5) = 2, r(t_6) = 0 as the broadcaster and other broadcasters post. Position over time = 0·(t_2 − t_1) + 1·(t_3 − t_2) + 0·(t_4 − t_3) + 1·(t_5 − t_4) + 2·(t_6 − t_5). Time at the top = (t_2 − t_1) + (t_4 − t_3). [Zarezade et al., WSDM 2017]
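Both metrics are integrals of the rank step function: position over time integrates the rank itself, and time at the top integrates the indicator of rank zero. A minimal sketch, assuming the rank is piecewise constant between consecutive event times (names are illustrative):

```python
def position_over_time(times, ranks):
    """Integral of the rank step function: sum of ranks[k] * (times[k+1] - times[k]),
    where ranks[k] is the rank holding on [times[k], times[k+1])."""
    return sum(r * (t2 - t1)
               for r, t1, t2 in zip(ranks, times, times[1:]))

def time_at_top(times, ranks):
    """Total time during which the rank is 0 (the post is at the top)."""
    return sum(t2 - t1
               for r, t1, t2 in zip(ranks, times, times[1:]) if r == 0)
```

On the slide's example with unit-length intervals and ranks 0, 1, 0, 1, 2, the position over time is 4 and the time at the top is 2.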

44 Position over time (average across users; lower is better). Compared against the broadcasters' true posts and the method by Karimi et al., RedQueen achieves (i) a 0.28x lower average position, on average, than the broadcasters' true posts and (ii) a lower average position for 100% of the users. [Zarezade et al., WSDM 2017]

45 Time at the top (average across users; higher is better). Compared against the broadcasters' true posts and the method by Karimi et al., RedQueen achieves (i) a 3.5x higher time at the top, on average, than the broadcasters' true posts and (ii) a higher time at the top for 99.1% of the users. [Zarezade et al., WSDM 2017]

46 Applications: Control. 1. Influence maximization 2. Activity shaping 3. When to post 4. When to fact check

47 Opinionated, inaccurate, fake news.

48 Solution: resort to fact-checks by third parties. Send information to trusted third parties (e.g., Snopes) for fact-checking. Challenges: 1. Fact-checking is costly. 2. Which content should be fact-checked? 3. What should be done after fact-checking? [Kim et al., WSDM 2018]

49 Detect & prevent = flags by crowd + fact-check. Major social networking sites are testing the following mechanism. [Figure: example cascade where an edge S → D means D follows S; a story is shared and re-shared (Bob 3.00pm, Joe 3.25pm, Christine 3.27pm, Chris 5.00pm, David 5.15pm), users such as Beth are exposed, some exposures become flags (4.15pm), and the story is eventually fact-checked.] [Kim et al., WSDM 2018]

50 Challenges. The previous procedure faces several challenges, each matched to a solution: uncertainty about the number of exposures (probabilistic exposure models), flags that can be manipulated (robust flag aggregation), and costly fact-checking with a trade-off between flags and fact-checks (optimal fact-checking, the focus of this talk). [Kim et al., WSDM 2018]

51 Procedure representation & modeling. Shares form one counting process per story, N_p(t), and exposures another, N_e(t), with intensities (shares/exposures per time unit) driven by re-shares and flags; whether a story is misinformation is unknown, and the intensities depend on it. Fact-checks form a counting process M(t) with survival rates (fact-checks per time unit). [Kim et al., WSDM 2018]

52 Rate of misinformation. From the exposures N_e(t) and flags N_f(t), we compute the average number of users exposed to misinformation by time t, with flagging behavior estimated from historical data. The rate of misinformation is conditional on the observed flags, and thus allows for posterior updates. [Kim et al., WSDM 2018]

53 When to fact-check? Find the survival rates (fact-checks per time unit) that minimize a non-decreasing loss on the rate of misinformation over time, trading off fact-checks vs. misinformation, given the dynamics of exposures, shares, re-shares and flags. This is a stochastic optimal control problem for jump SDEs. [Kim et al., WSDM 2018]

54 Solving the optimal control problem. We define the optimal cost-to-go J(·), apply Bellman's principle of optimality under the dynamics dM(t), dN(t), dN_f(t), dN_p(t), dλ_e(t), and obtain the Hamilton-Jacobi-Bellman (HJB) equation: a partial differential equation in J (with respect to M, N, N_f, N_p, λ_e, and t). [Kim et al., WSDM 2018]

55 Optimal solution for fact-checking. For a general family of share and exposure intensities and an additive quadratic loss, the optimal fact-checking intensity for each story only depends on the current rate of misinformation, with a parameter that trades off fact-checks vs. misinformation. [Kim et al., WSDM 2018]

56 The CURB algorithm. Intuition: adaptive planning of the time to fact-check, driven by exposures, flags, and re-shares. The planning function f(..) uses the superposition principle and standard thinning: after each new event at time t_k, the scheduled fact-check time is updated to f(t_k, ·). Easy to implement: it only requires sampling O(N_e(t_f)) times. [Kim et al., WSDM 2018]

57 Performance vs. # of fact-checks (Twitter). CURB and the oracle achieve an optimal trade-off between precision and misinformation reduction. [Kim et al., WSDM 2018]

58 Misinformation reduction over time (Twitter). Both CURB and the oracle prevent the spread of misinformation. [Kim et al., WSDM 2018]

59 REPRESENTATION: TEMPORAL POINT PROCESSES
1. Intensity function
2. Basic building blocks
3. Superposition
4. Marks and SDEs with jumps
APPLICATIONS: MODELS
1. Information propagation
2. Opinion dynamics
3. Information reliability
4. Knowledge acquisition
APPLICATIONS: CONTROL
1. Influence maximization
2. Activity shaping
3. When to post
4. When to fact check

60 Thanks! Interested? Internships/PhDs/postdoc positions at learning.mpi-sws.org