Data Mining in Social Network. Presenter: Keren Ye


References
- Kwak, Haewoon, et al. "What is Twitter, a social network or a news media?" Proceedings of the 19th International Conference on World Wide Web. ACM, 2010.
- Pak, Alexander, and Patrick Paroubek. "Twitter as a Corpus for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.

Data Mining in Social Network: What is Twitter, a social network or a news media?

Twitter Basic Features
- Tweet about any topic within the 140-character limit
- Follow others to receive their tweets

Twitter Space Crawl
- Application Programming Interface (API)
- Data collection
  - Profiles of all users: June 6th - June 31st, 2009
  - Profiles of users who mentioned trending topics: June 6th - September 24th, 2009

Twitter Space Crawl
- User profiles
  - 41.7 million (41,700,000) user profiles
  - 1.47 billion (1,470,000,000) directed relations of following and being followed
- Trending topics and associated tweets
  - 4,262 unique trending topics and their tweets
  - Query the API every five minutes for the top-10 trending topic titles
  - Grab all the tweets that mention each trending topic

Twitter Space Crawl: Removing Spam Tweets
- Why
  - Undermine the accuracy of PageRank
  - Spam keywords hinder relevant web page extraction
  - Add noise and bias to the analysis
- How
  - Filter out tweets from users who have been on Twitter for less than a day
  - Remove tweets that contain three or more trending topics
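The two filtering rules can be sketched as a small predicate. This is only an illustration: the function name `is_spam`, its arguments, and the case-insensitive substring match used to detect a trending topic are assumptions, not details taken from the slides.

```python
from datetime import datetime, timedelta


def is_spam(tweet_text, tweet_time, account_created, trending_topics):
    """Return True if a tweet matches either spam rule from the slide.

    All names here are hypothetical; topic matching by substring is an
    assumption made for the sake of a short example.
    """
    # Rule 1: the author had been on Twitter for less than a day.
    if tweet_time - account_created < timedelta(days=1):
        return True
    # Rule 2: the tweet contains three or more trending topics.
    text = tweet_text.lower()
    mentioned = sum(1 for topic in trending_topics if topic.lower() in text)
    return mentioned >= 3
```

A tweet is dropped from the corpus whenever `is_spam(...)` returns True.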

Basic Analysis: Followings and Followers (CCDF)
- CCDF: complementary cumulative distribution function
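For readers unfamiliar with the CCDF, a minimal sketch of how it could be computed from a list of follower (or following) counts is given here. The function name `ccdf` is hypothetical, and the convention P(X >= x) is an assumption; some texts define the CCDF as P(X > x) instead.

```python
from collections import Counter


def ccdf(values):
    """Return sorted (x, P(X >= x)) pairs for the observed values."""
    n = len(values)
    counts = Counter(values)
    points = []
    remaining = n  # number of observations that are >= the current x
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points
```

Plotting these pairs on log-log axes gives the kind of curve usually shown for degree distributions.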

Basic Analysis: Followers vs. Tweets
- x: number of followers a user has
- y: number of tweets the user has posted

Basic Analysis: Followings vs. Tweets
- x: number of followings a user has
- y: number of tweets the user has posted

Basic Analysis: Reciprocity
- Top users by the number of followers in Twitter are mostly celebrities and mass media
- 77.9% of user pairs with any link between them are connected one-way only
- 22.1% have a reciprocal relationship between them (r-friends)
- 67.6% of users are not followed by any of their followings in Twitter
- A source of information? A social networking site?
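As an illustration of how a reciprocity figure like the one above could be measured, the sketch below counts, among all user pairs with at least one link, the share that follow each other. The function name and the `(follower, followee)` edge representation are assumptions.

```python
def reciprocity(edges):
    """Fraction of linked user pairs that follow each other.

    edges: iterable of (follower, followee) tuples (hypothetical format).
    """
    edge_set = {(a, b) for a, b in edges if a != b}
    # Each unordered pair with at least one link between its members.
    pairs = {frozenset((a, b)) for a, b in edge_set}
    reciprocal = sum(
        1
        for pair in pairs
        if all((a, b) in edge_set for a in pair for b in pair if a != b)
    )
    return reciprocal / len(pairs) if pairs else 0.0
```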

Basic Analysis: Degree of separation
- Small world phenomenon (Stanley Milgram)
  - Any two people could be connected on average within six hops from each other
- Main difference
  - The directed nature of Twitter relationships: only 22.1% of user pairs are reciprocal
  - Can we expect the path between two users in Twitter to be longer than in other known networks?
  - MSN: 180 million users, with median and 90th-percentile degrees of separation of 6.0 and 7.8 respectively

Basic Analysis: Degree of separation
- Choose a seed randomly
- Compute the shortest paths between the seed and the rest of the network
- Average path length: 4.12
- Social network? Source of information?
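The seed-based measurement can be sketched with a breadth-first search. This is a minimal sketch, assuming the graph is given as a dictionary from each user to the users they follow and that paths run along the "follows" direction; both the names and the direction are assumptions.

```python
from collections import deque


def path_lengths_from_seed(follows, seed):
    """Shortest hop counts from seed to every reachable user (BFS)."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        for nxt in follows.get(user, ()):
            if nxt not in dist:
                dist[nxt] = dist[user] + 1
                queue.append(nxt)
    del dist[seed]  # the seed itself is not counted
    return dist


def average_path_length(follows, seed):
    """Mean shortest-path length from the seed to the reachable users."""
    dist = path_lengths_from_seed(follows, seed)
    return sum(dist.values()) / len(dist) if dist else 0.0
```

Repeating this for many random seeds and pooling the path lengths would approximate the distribution for the whole network.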

Basic Analysis: Homophily
- A contact between similar people occurs at a higher rate than among dissimilar people
- Investigate homophily in two contexts
  - Geographic location
  - Popularity

Basic Analysis: Homophily
- Geographic location
- Popularity
- Social network? Source of information?

Trending the trends: Motivation
- Interpret the act of following as subscribing to tweets
- How trending topics rise in popularity, spread through the followers network, and eventually die
- Review 4,266 unique trending topics from June 3rd to September 25th, 2009
  - Apple's Worldwide Developers Conference, the E3 Expo, NBA Finals, and the Miss Universe Pageant

Trending the trends: Compare to Google Trends
- Similarity
  - Only 126 (3.6%) out of 3,479 unique trending topics from Twitter exist in 4,597 unique hot keywords from Google
- Freshness
  - On average 95% of topics each day are new in Google while only 72% of topics are new in Twitter
  - Interactions might be a factor that keeps trending topics persistent
- Social network?
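One way to read the freshness figures is as the daily share of topics that were not in the previous day's list. The sketch below computes that share; treating "new" as "absent from the previous day" is an assumption, as are the function name and the input format.

```python
def daily_freshness(topics_by_day):
    """Share of each day's topics that were absent the day before.

    topics_by_day: list of sets of topic strings in chronological order
    (hypothetical format). The first day has no predecessor and is skipped.
    """
    fresh = []
    for prev, today in zip(topics_by_day, topics_by_day[1:]):
        if today:
            fresh.append(len(today - prev) / len(today))
    return fresh
```

Averaging the returned list gives a single freshness figure per service.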

Trending the trends: Compare to CNN Headline News
- Preliminary results
  - More than half the time CNN was ahead in reporting
  - However, some news broke out on Twitter before CNN
- Source of information?

Trending the trends: Singleton, Reply, Mention, and Retweet
- Singleton: tweet with no reply or retweet
- Reply
- Mention: tweet addressing a specific user
- Both replies and mentions include @ followed by the addressed user's Twitter id
- Retweet: marked with either RT followed by @user id or via @user id
- Among all tweets mentioning the 4,266 unique trending topics, singletons are most common, followed by replies and retweets
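The four categories can be approximated from the tweet text alone. The sketch below is a rough heuristic, not the paper's procedure: the regular expressions are hypothetical, and treating a tweet that starts with @user as a reply (and any other @user occurrence as a mention) is an assumption.

```python
import re

# Hypothetical patterns; a retweet is marked with "RT @user" or "via @user".
RETWEET = re.compile(r"\b(?:RT|via)\s*@\w+", re.IGNORECASE)
REPLY = re.compile(r"^\s*@\w+")  # assumption: replies start with @user
MENTION = re.compile(r"@\w+")


def tweet_type(text):
    """Classify a tweet as retweet, reply, mention, or singleton."""
    if RETWEET.search(text):
        return "retweet"
    if REPLY.match(text):
        return "reply"
    if MENTION.search(text):
        return "mention"
    return "singleton"
```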

Trending the trends
- Out of 41 million Twitter users, a large number of users (8,262,545) participated in trending topics, and about 15% of those users participated in more than 10 topics during four months.

Trending the trends: Impact of retweet

Data Mining in Social Network: Twitter as a Corpus for Sentiment Analysis and Opinion Mining

Motivation
- Recognize positive / negative / objective sentiment

Corpus collection
- Use the Twitter API
- The whole data set is huge; a subset is enough for training purposes
- Use sentiment-related emoticons to get the positive / negative training corpus
  - Happy emoticons: :-), :), =), :D etc.
  - Sad emoticons: :-(, :(, =(, ;( etc.
- For the objective training corpus
  - Retrieve text messages from Twitter accounts of popular newspapers and magazines
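The emoticon-based labeling can be sketched as follows. The emoticon lists are the ones named on the slide (without the "etc."); the function name and the choice to skip tweets with no emoticon or with both kinds are assumptions.

```python
HAPPY = (":-)", ":)", "=)", ":D")
SAD = (":-(", ":(", "=(", ";(")


def emoticon_label(text):
    """Return 'positive', 'negative', or None for a raw tweet text."""
    has_happy = any(e in text for e in HAPPY)
    has_sad = any(e in text for e in SAD)
    if has_happy == has_sad:
        # Assumption: no emoticon, or mixed emoticons, gives no label.
        return None
    return "positive" if has_happy else "negative"
```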

Training the classifier
- Feature
- Feature extraction
- Model
- Model evaluation

Training the classifier: Feature
- Presence of an n-gram as a binary feature
- E.g., "I love the sound my iPod makes when I shake to shuffle it. Boo bee boo"
  - Unigram (1-gram): presence of "I", "love", "the", ...
  - Bigram (2-gram): presence of "I love", "love the", "the sound", ...
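A minimal sketch of binary n-gram features: the set of n-grams present in a token list, where membership in the set plays the role of the binary feature. The function name `ngrams` is hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (joined by spaces) present in tokens."""
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
```

For the tokens `["I", "love", "the", "sound"]`, `ngrams(tokens, 2)` contains "I love", "love the", and "the sound".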

Training the classifier: Feature extraction
- Filtering: remove URL links, Twitter user names and emoticons
- Tokenization: segment text by splitting it by spaces and punctuation marks
- Remove stopwords
- Construct n-grams
  - Negation is attached to a word which precedes it or follows it, e.g., "I do+not", "do+not like"
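The extraction steps can be sketched end to end. Several details are assumptions made to keep the example short: the stopword list contains only articles, the negation list contains only "no" and "not", the regular expressions are hypothetical, and the negation is attached only to the preceding word (the slide also allows attaching it to the following word).

```python
import re

STOPWORDS = {"a", "an", "the"}  # assumption: articles only
NEGATIONS = {"no", "not"}       # assumption: a minimal negation list
URL_OR_USER = re.compile(r"https?://\S+|www\.\S+|@\w+")
EMOTICONS = re.compile(r"[:;=]-?[()D]")


def preprocess(text):
    """Filter, tokenize, and remove stopwords."""
    text = URL_OR_USER.sub(" ", text)   # drop URLs and user names
    text = EMOTICONS.sub(" ", text)     # drop emoticons
    tokens = [t for t in re.split(r"[^\w']+", text.lower()) if t]
    return [t for t in tokens if t not in STOPWORDS]


def attach_negation(tokens):
    """Glue each negation onto the word that precedes it."""
    out = []
    for tok in tokens:
        if tok in NEGATIONS and out:
            out[-1] = out[-1] + "+" + tok
        else:
            out.append(tok)
    return out


def bigrams(tokens):
    """Consecutive token pairs joined by a space."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]
```

For example, `bigrams(attach_negation(preprocess("I do not like fish")))` yields the bigrams "i do+not", "do+not like", and "like fish".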

Training the classifier: Naive Bayes Model
- P(s | M) = P(s) P(M | s) / P(M)
- s: sentiment
- M: Twitter message

Training the classifier: Naive Bayes Model - An example
- "I love the sound my iPod makes when I shake to shuffle it. Boo bee boo"
- P(s = + | M) ~ P(+) P(I | +) P(love | +) P(the | +) P(sound | +) ...
- P(s = - | M) ~ P(-) P(I | -) P(love | -) P(the | -) P(sound | -) ...
- By counting occurrences in the training set, we can get:
  - P(+), P(-)
  - P(I | +), P(I | -), P(love | +), P(love | -), ...
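The counting approach can be sketched as a tiny Naive Bayes classifier. This is an illustration under stated assumptions, not the authors' implementation: it counts token occurrences rather than binary presence, uses add-one (Laplace) smoothing to avoid zero probabilities, and works in log space; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (tokens, label). Returns the counts the model needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def log_score(tokens, label, model):
    """log P(label) + sum of log P(token | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(label_counts[label] / total_docs)
    for tok in tokens:
        # Add-one smoothing is an assumption, not taken from the slides.
        p = (word_counts[label][tok] + 1) / (total_words + len(vocab))
        score += math.log(p)
    return score


def classify(tokens, model):
    """Pick the label with the highest log score."""
    return max(model[0], key=lambda label: log_score(tokens, label, model))
```

Working with log probabilities avoids numerical underflow when many small factors are multiplied.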

Training the classifier: Other details of the model
- POS-tags as extra information
- Discriminate common n-grams since they do not strongly indicate sentiment
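One possible way to spot n-grams that do not indicate sentiment is to measure how evenly an n-gram is spread across the sentiment classes, for instance with Shannon entropy. The sketch below is an assumption about how such a score could be computed, not a statement of the exact criterion used in the paper; the function name is hypothetical.

```python
import math


def label_entropy(label_counts):
    """Shannon entropy (in bits) of an n-gram's distribution over labels.

    label_counts: mapping label -> number of times the n-gram was seen
    with that label. High entropy means the n-gram appears evenly across
    sentiments and so carries little sentiment information.
    """
    total = sum(label_counts.values())
    probs = [c / total for c in label_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

N-grams whose entropy exceeds a chosen threshold could then be dropped or down-weighted.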

Training the classifier: Model Evaluation
- Precision: measures the proportion of correctly tagged tokens within the set of all the tokens that were unambiguously tagged by the evaluated system. It is therefore a measure of the accuracy of the tagging effectively performed by the system.
- Decision: measures the proportion of tokens unambiguously tagged within the set of all tokens processed by the evaluated system. It therefore quantifies to what extent the evaluated system effectively tags the input data.
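The two measures can be computed together from a list of predictions. In this sketch, `None` stands for an item the system declined to tag; that encoding and the function name are assumptions.

```python
def precision_and_decision(predictions, gold):
    """Return (precision, decision) for parallel prediction/gold lists.

    predictions: labels, with None for items left undecided (assumption).
    gold: the reference labels.
    """
    decided = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    correct = sum(1 for p, g in decided if p == g)
    # Precision: correct among the items the system actually tagged.
    precision = correct / len(decided) if decided else 0.0
    # Decision: tagged items among all items processed.
    decision = len(decided) / len(gold) if gold else 0.0
    return precision, decision
```

A system can trade one measure for the other: declining to tag uncertain items tends to raise precision while lowering decision.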

Training the classifier

Conclusion
- Essence of data mining
  - Find interesting patterns
- General idea of the two papers
  - Subjective way: propose problem, explain the reason
  - Objective way: propose problem, solve it
- Domain knowledge of the two
  - Statistics and data visualization
  - Machine learning technology

Thanks