Towards Modelling Language Innovation Acceptance in Online Social Networks


1 Towards Modelling Language Innovation Acceptance in Online Social Networks
Date: 2016/05/02
Authors: Daniel Kershaw, Matthew Rowe and Patrick Stacey
Source: ACM WSDM '16
Advisor: Jia-ling Koh
Speaker: Yi-hui Lee

2 Outline: Introduction, Approach, Experiment, Conclusion

3 Introduction
Goal: In this work we demonstrate how innovations in language can be identified across two different Online Social Networks (OSNs) through the operationalisation of known language acceptance models that incorporate relatively simple statistical tests.
Example: "bae" ("your babe", or "Before Anyone Else"), as in "ur bae"; 2014: Pharrell, "Come Get It Bae".

4 Introduction (cont.)
Reddit: [figure omitted]
Twitter: [figure omitted]

5 Introduction (cont.)
Framework: Input, Pre-Processing, Data Grouping, Operationalisation, Output
Operationalisation: 1. Frequency 2. Form 3. Meaning 4. Classification


7 Approach
Pre-Processing:
- Posts are processed with TwitterNLP's POS tagger.
- Hashtags (#), mentions (@) and HTTP links are removed using regular expressions.
- Long repetitions of the same letter are truncated to just three characters, e.g. "soooooooo" is normalised to "sooo".
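The cleaning rules on this slide can be sketched with a few regular expressions. This is a minimal illustration, not the authors' code: the patterns, the function name `preprocess`, and the choice to drop the whole hashtag or mention token are assumptions, and the sketch caps each run at three repeated letters.

```python
import re

# Illustrative patterns (assumptions, not taken from the original work).
URL_RE = re.compile(r"https?://\S+")          # HTTP/HTTPS links
TAG_RE = re.compile(r"[#@]\w+")               # hashtags and mentions
REPEAT_RE = re.compile(r"([A-Za-z])\1{3,}")   # the same letter 4 or more times in a row


def preprocess(text: str) -> str:
    """Remove links, hashtags and mentions, then cap letter runs at three."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1\1", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = preprocess("soooooooo cute #bae @you http://t.co/x")
print(cleaned)
```

Links are removed first so that a "#" or "@" inside a URL is not treated as a hashtag or mention.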

8 Approach (cont.)
Data Grouping: Time
To group the data by time, a function weekofyear(e) returns the week in which the Tweet or Reddit post e was created.

Post                 Tokens                  Time (week)
I am a girl          I, am, a, girl          1
I watch the movie    I, watch, the, movie    2
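The time grouping can be sketched as follows. The slide names the function weekofyear(e) but not its definition, so the ISO 8601 week number is used here as an assumption; `group_by_week` and the example dates are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def weekofyear(created_at: datetime) -> int:
    """Week number of a post's creation time (ISO 8601 week, an assumption)."""
    return created_at.isocalendar()[1]


def group_by_week(posts):
    """Map week number -> list of whitespace-separated tokens posted that week."""
    groups = defaultdict(list)
    for created_at, text in posts:
        groups[weekofyear(created_at)].extend(text.split())
    return dict(groups)


# Hypothetical posts: (creation time, text).
posts = [
    (datetime(2016, 1, 5), "I am a girl"),
    (datetime(2016, 1, 12), "I watch the movie"),
]
by_week = group_by_week(posts)
print(by_week)
```

A dataset spanning more than one year would need a (year, week) key rather than the week number alone.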

9 Approach (cont.)
Data Grouping: Community
1. Reddit: the Louvain community detection algorithm is used. The dataset is broken down into three community levels: local (the subreddit), regional (a collection of subreddits) and global (all subreddits).
2. Twitter: the data is geographically bound to the UK, so Tweets can be clustered using the longitude and latitude associated with each Tweet (the coordinates provided by the Twitter API; see dev.twitter.com/overview/terms/geo-developer-guidelines).

10 Approach (cont.)
Data Grouping: Community
- A low-level community defined by a postcode (e.g. LA1) could be compared to a subreddit (the lowest community level in Reddit), potentially showing a greater convergence in topic and language use.
- A higher-level community could be classed as showing the general patterns that are globally understood across all sub-communities.

Post                Tokens                 Community (Twitter/Reddit)
I am a boy          I, am, a, boy          Twitter
I watch the show    I, watch, the, show    Reddit
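Grouping the same tokens at several community levels can be sketched as below. The three level names follow the local, regional and global split described for Reddit; applying them to postcodes, the `parent_of` mapping, and the postcode "LA2" are assumptions made only for illustration.

```python
from collections import defaultdict


def community_levels(local: str, parent_of: dict) -> dict:
    """Return the community a post belongs to at each level."""
    return {
        "local": local,                               # e.g. a postcode or a subreddit
        "regional": parent_of.get(local, "unknown"),  # hypothetical parent grouping
        "global": "all",                              # the whole network
    }


def group_by_level(posts, parent_of):
    """Map (level, community name) -> tokens posted in that community."""
    groups = defaultdict(list)
    for local, text in posts:
        for level, name in community_levels(local, parent_of).items():
            groups[(level, name)].extend(text.split())
    return dict(groups)


# Hypothetical mapping from local communities to a regional community.
parent_of = {"LA1": "LA", "LA2": "LA"}
posts = [("LA1", "I am a boy"), ("LA2", "I watch the show")]
groups = group_by_level(posts, parent_of)
print(groups[("local", "LA1")])
print(len(groups[("global", "all")]))
```

Every post contributes its tokens once per level, so the later measures can be computed per level without regrouping the raw data.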

11 Approach (cont.)
Operationalisation: Frequency
T(w, t) is the share of the tokens in week t that are the word w.

Time (week)    Tokens
1              I, am, a, girl, I, am, a, boy
2              I, watch, the, movie, I, watch, the, show
n              When, bae, eat

Time (week)    Word     T(w, t)
1              I        2/8
1              am       2/8
1              a        2/8
1              girl     1/8
1              boy      1/8
2              I        2/8
2              watch    2/8
2              the      2/8
2              movie    1/8
2              show     1/8
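The worked example on this slide can be reproduced with a short sketch. It assumes T(w, t) is the count of w in week t divided by the total number of tokens in week t, which matches the 2/8 and 1/8 values shown; the function name `term_frequency` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def term_frequency(tokens_by_week):
    """Return {week: {word: share of that week's tokens}} as exact fractions."""
    result = {}
    for week, tokens in tokens_by_week.items():
        counts = Counter(tokens)
        total = len(tokens)
        result[week] = {word: Fraction(count, total) for word, count in counts.items()}
    return result


tokens_by_week = {
    1: "I am a girl I am a boy".split(),
    2: "I watch the movie I watch the show".split(),
}
T = term_frequency(tokens_by_week)
print(T[1]["I"], T[1]["girl"])
```

`Fraction` reduces values automatically, so 2/8 is printed as 1/4 but compares equal to `Fraction(2, 8)`.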

12 Approach (cont.)
Operationalisation: Form
MP(w, t, P) and MS(w, t, S) are the shares of the tokens in week t that carry the prefix P or the suffix S.

Time (week)    Tokens
1              I, am, watching, I, am, listening, homosexual
2              I, am, homosexual, they, are, homogeneous, joking
n              When, bae, eating, homogeneous

Time (week)    Prefix "homo": MP(w, t, P)    Suffix "ing": MS(w, t, S)
1              1/7                           2/7
2              2/7                           1/7
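The prefix and suffix shares in this example can be checked with a small sketch. It assumes each measure is the number of tokens in the week that start (or end) with the given string, divided by the number of tokens in that week, which reproduces the 1/7 and 2/7 values; `prefix_share` and `suffix_share` are hypothetical names.

```python
from fractions import Fraction


def prefix_share(tokens, prefix):
    """Share of tokens that start with `prefix` (case-insensitive)."""
    hits = sum(token.lower().startswith(prefix) for token in tokens)
    return Fraction(hits, len(tokens))


def suffix_share(tokens, suffix):
    """Share of tokens that end with `suffix` (case-insensitive)."""
    hits = sum(token.lower().endswith(suffix) for token in tokens)
    return Fraction(hits, len(tokens))


week1 = ["I", "am", "watching", "I", "am", "listening", "homosexual"]
week2 = ["I", "am", "homosexual", "they", "are", "homogeneous", "joking"]

print(prefix_share(week1, "homo"), prefix_share(week2, "homo"))
print(suffix_share(week1, "ing"), suffix_share(week2, "ing"))
```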

13 Approach (cont.)
Operationalisation: Meaning
- Word2vec
- $W2V_t^c$: word2vec applied to each community (c)

14 Approach (cont.)
Operationalisation: Meaning [figure omitted]

15 Approach (cont.)
Operationalisation: Meaning
The measure captures similarity between communities while still showing variation. If the value is near 0, the word may be too diverse for general usage (i.e. too colloquial); a value near 1 would potentially indicate that the word is too specific.
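The transcript does not preserve the formula behind this value, so the sketch below is only one hypothetical way to obtain a score between 0 and 1: the average Jaccard overlap between a word's nearest-neighbour sets in each community's word2vec model. The neighbour sets, community names and function names are all made up for illustration and should not be read as the authors' measure.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cross_community_similarity(neighbours_by_community: dict) -> float:
    """Mean pairwise Jaccard overlap of a word's neighbour sets across communities."""
    pairs = list(combinations(neighbours_by_community.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical nearest neighbours of one word in two community models.
same_context = {"c1": {"babe", "love"}, "c2": {"babe", "love"}}
mixed_context = {"c1": {"babe", "love"}, "c2": {"love", "food"}}
no_overlap = {"c1": {"babe", "love"}, "c2": {"food", "eat"}}

print(cross_community_similarity(same_context))
print(cross_community_similarity(mixed_context))
print(cross_community_similarity(no_overlap))
```

Under this assumed measure, identical neighbour sets score 1 and disjoint sets score 0, matching the two extremes described on the slide.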

16 Approach (cont.)
Operationalisation: Classification
Increase/Decrease: Spearman's rank correlation is computed between time t = 1, 2, ..., n and the word's frequency T(w, t).
- bae: Increase
- TGIF: Decrease
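A trend label of this kind can be sketched with a plain-Python Spearman's rank correlation (Pearson correlation of average ranks). The frequency series, the zero threshold, the "No trend" label and the function names are assumptions for illustration; the sketch also omits any significance test.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(x), average_ranks(y))


def classify_trend(frequencies, threshold=0.0):
    """Label a weekly frequency series by the sign of its correlation with time."""
    weeks = list(range(1, len(frequencies) + 1))
    rho = spearman(weeks, frequencies)
    if rho > threshold:
        return "Increase"
    if rho < -threshold:
        return "Decrease"
    return "No trend"


# Made-up weekly frequencies, for illustration only.
rising = [0.01, 0.02, 0.04, 0.05]
falling = [0.05, 0.03, 0.02, 0.01]
print(classify_trend(rising), classify_trend(falling))
```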

17 Approach (cont.)
Operationalisation: Limitations
The three methods proposed do not, however, cover all the categories proposed in the VFRGT and FUDGE frameworks.



20 Experiment
Frequency: [figure omitted]

21 Experiment (cont.)
Form: [figure omitted]

22 Experiment (cont.)
Meaning: Words classified as innovations did not appear across all the communities; when they did, they appeared at a low rank, and thus the embedding learned by the word2vec function generated sparse words within the context of the innovation.


24 Conclusion
- We demonstrated that, through the use of relatively simple statistical tests, one is able to use known linguistic models to assess language and its change in online social networks.
- When the methods are applied to two online social networks, they can show variation in innovation usage and persistence.
- These methods can be applied to the individual communities that make up the networks, where we have shown how varying community structure has potentially different language dynamics.

25 Conclusion (cont.)
Future work: look into identifying the dynamics of language innovations within the context of users, along with the influence communities have over language and innovation diffusion.