TURNING TWEETS INTO KNOWLEDGE. An Introduction to Text Analytics

Similar documents
How to Set-Up a Basic Twitter Page

TEXT SURVEY BEST PRACTICES

SOCIAL MEDIA. Facebook. Instagram. Twitter. Hashtags # Using Facebook, Twitter and Instagram to build your team!

Twitter page management for beginners

Data Preprocessing, Sentiment Analysis & NER On Twitter Data.

Data Mining in Social Network. Presenter: Keren Ye

Lumière. A Smart Review Analysis Engine. Ruchi Asthana Nathaniel Brennan Zhe Wang

MOBILE CLOUD ENTERPRISE. The Next Step in Our Evolution and Yours

Alert to their needs. How two companies use SMS to deliver instant information

Twitter. Runa Sarkar Indian Institute of Management Calcutta

FACEBOOK GUIDE HOW TO USE FACEBOOK FOR RECRUITMENT MARKETING

COMMUNICATIONS H TOOLKIT H NATIONAL VOTER REGISTRATION DAY. A Partner Communications Toolkit for Traditional and Social Media

Discover Prepaid Jeff Lewis Interview

An Executive s Guide to B2B Video Marketing. 8 ways to make video work for your business

White Paper: VANTIQ Competitive Landscape

Managing the Cognitive Transformation. Paul Harmon

Gartner's Top 10 Strategic Technology Trends for 2014

Beating the Competition with Cognitive Commerce

WHITE PAPER: CUSTOMER DATA PLATFORMS FOR BUSINESS-TO-BUSINESS SOFTWARE AS A SERVICE (SAAS) MARKETING

THE ULTIMATE GUIDE TO MARKETING AUTOMATION FOR SALESFORCE

Social Media 101: A Twitter Primer

Text Analysis and Search Analytics

Promote Your Business With LinkedIn

Dean College Social Media Handbook

Technical Assistance Center Webinar. Building a Social Media Presence

The Importance of Supplementing NPS Scores with Insights Drawn from Real Comments and Reviews. Whitepaper

Machine Learning Technologies for The Hospitality Industry

Keyword Analysis. Section 1: Google Forecast. Example, inc

Week 1 Unit 1: Intelligent Applications Powered by Machine Learning

Social Networking: Today s Communication Tools June 19, 2012 Grace Thompson, SPHR, CTC

Grow Your Small Business With Salesforce SELL. SERVICE. MARKET. SUCCEED.

NLP WHAT S HAPPENING TO NLP?

MSP Marketing Engine. Ready-to-Go Programs

Introduction to E-Business I (E-Bay)

Home Automation. Industry Analysis

YOUR TRAFFIC WITH. Ramon Berrios. Cofounder of SociallyRich.co. on Instagram for More Tips

Cloud Assisted Trend Analysis of Twitter Data using Hadoop

Mobile Marketing. This means you need to change your strategy for marketing to those people, or risk losing them to your competition.


BIG DATA IS OLX. We impact lives - 100s of terabytes of data at a time

Leveraging the Social Breadcrumbs

29 Must-Know Terms For Every Social Media Analyst. (and the people who work with them)

THE ULTIMATE CAREER FAIR CHECKLIST FOR EMPLOYERS

Social Media Marketing Navigating the Maze

Predicting the Odds of Getting Retweeted

7Time Saved in trying to schedule in early stage telephone interviews

Social Media Is More Than a Popularity Contest

Optimising engagement with consumers Tini Sevak

Pay Per Click. September 2016

CivicScience Insight Report

Digital Marketing. Ways of reaching customers through digital marketing. Content Marketing. Mobile Marketing. Conversion Rate Optimization

Agenda. The Power of Social Data: Transforming Big Data into Decisions. 1. Data and Decisions. 2. Amazon as Data Refinery

AN E-BOOK BY KENTICO CMS GETTING STARTED WITH CONTENT MARKETING

Unit 6 Good Choice. What is the most important thing to consider when you buy a product? Rank them 1 4. (1 = most important) Answer the question.

Why Your SIEM Isn t Adding Value And Why It May Not Be The Tool s Fault Co-management applied across the entire security environment

SOCIAL MEDIA TOOLKIT FOR NONPROFITS

Predicting Stock Prices through Textual Analysis of Web News

SOCIAL MEDIA GLOSSARY General terms

Social Media and Self-Advocacy KIT MEAD AUTISTIC SELF ADVOCACY NETWORK (ASAN)

BYOC : Twitter for Business An Introductory Guide. Session Ellis Holman System z Client Architect

Jobseeker s Guide to Writing an Effective LinkedIn Profile

Text Mining. Theory and Applications Anurag Nagar

Knowledge Discovery and Data Mining

Why careers in IT rule

BUY. Data-Driven Attribution. Playbook

When culture goes live

Student Questionnaire

WHAT S UNDER THE HOOD OF PROGRAMMATIC? INSIDE THE FIVE-STEP PROCESS THAT ROCKETS AN ONLINE AD FROM FOR SALE TO SOLD

Mobile Matters for Facebook Marketers


Holiday Purchasing Habits: A Digital Advantage for Local Businesses

Social Media Content Analysis for Marketing Strategy

koss resource white paper Growing Your CRE Business In the Digital Paradigm Part 1 of 12 The Landscape Is Changing

Introduction to digital marketing

Basics of Social Media. The why, the what, and the how for your small business or nonprofit

The Ultimate Guide to Keyword Research

Branding. KDLH Promotion 1. Branding. Wow, you are pretty big. Building Your Image into the Community. Building Your Image into the Community

Mindset Of The Millionaire Traders

Accelerate Lesson 9 FACEBOOK ADVERTISING

24 Ways to Build a Great Company Culture

Leveraging Social Media for More Effective Cross-Media Campaigns. David Baldaro, XMPie

Experiences in the Use of Big Data for Official Statistics

Guide to CLOUD ACCOUNTING How you can transform your business today

Working in a Customer Service Culture

LOAN OFFICER GUIDE TO MARKETING : LEAD NURTURING REALTOR GUIDE TO MARKETING TOTALMORTGAGE.COM PART 1

Take Part. Get Set For Life.

DISCOVERING THE BUY BOX SWEET SPOT

Proactive Listening: Taking Action to Meet Customer Expectations

Social Media. BootCamp

Marketing. Social Media A NEW, IN-DEPTH 2-DAY COURSE. Enroll today online at NationalSeminarsTraining.com/SMKT2 or call

WHY SOCIAL MEDIA MATTERS TO FOOD TRUCK OWNERS

AI TO BETTER ENGAGE YOUR DIGITALLY CONNECTED CUSTOMER OVERVIEW TABLE OF CONTENTS

2015 Global State of Multichannel Customer Service Report

Edge Analytics for IoT Device Intelligence

Overview: ADVERTISING ON FACEBOOK

Advertising, Design, Social Media & Printing Easton - Philadelphia

A Guide to Social Media Team

Transcription:

TURNING TWEETS INTO KNOWLEDGE An Introduction to Text Analytics

Twitter Twitter is a social networking and communication website founded in 2006 Users share and send messages that can be no longer than 140 characters long One of the Top 10 most-visited sites on the internet Initial Public Offering in 2013 Valuation ~$31 billion 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 1

Impact of Twitter Use by protestors across the world Natural disaster notification, tracking of diseases Celebrities, politicians, and companies connect with fans and customers Everyone is watching! 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 2

2013 AP Twitter Hack The Associated Press is a major news agency that distributes news stories to other news agencies In April 2013 someone tweeted the above message from the main AP verified Twitter account S&P500 stock index fell 1% in seconds, but the White House rapidly clarified 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 3

Understanding People Many companies maintain online presences Managing public perception in age of instant communication essential Reacting to changing sentiment, identifying offensive posts, determining topics of interest How can we use analytics to address this? 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 4

Using Text as Data Until now, our data has typically been Structured Numerical Categorical Tweets are Loosely structured Textual Poor spelling, non-traditional grammar Multilingual 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 1

Text Analytics We have discussed why people care about textual data, but how do we handle it? Humans can t keep up with Internet-scale volumes of data ~500 million tweets per day! Even at a small scale, the cost and time required may be prohibitive 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 2

How Can Computers Help? Computers need to understand text This field is called Natural Language Processing The goal is to understand and derive meaning from human language In 1950, Alan Turing proposes a test of machine intelligence: passes if it can take part in a real-time conversation and cannot be distinguished from a human 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 3

History of Natural Language Processing Some progress: chatterbots like ELIZA Initial focus on understanding grammar Focus shifting now towards statistical, machine learning techniques that learn from large bodies of text Modern artificial intelligences : Apple s Siri and Google Now 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 4

Why is it Hard? Computers need to understand text Ambiguity: I put my bag in the car. It is large and blue It = bag? It = car? Context: Homonyms, metaphors Sarcasm In this lecture, we ll see how we can build analytics models using text as our data 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 5

Sentiment Mining - Apple Apple is a computer company known for its laptops, phones, tablets, and personal media players Large numbers of fans, large number of haters Apple wants to monitor how people feel about them over time, and how people receive new announcements. Challenge: Can we correctly classify tweets as being negative, positive, or neither about Apple? 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 1

Creating the Dataset Twitter data is publically available Scrape website, or Use special interface for programmers (API) Sender of tweet may be useful, but we will ignore Need to construct the outcome variable for tweets Thousands of tweets Two people may disagree over the correct classification One option is to use Amazon Mechanical Turk 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 2

Amazon Mechanical Turk Break tasks down into small components and distribute online People can sign up to perform the tasks for a fee Pay workers, e.g. $0.02 per classified tweet Amazon MTurk serves as a broker, takes small cut Many tasks require human intelligence, but may be time consuming or require building otherwise unneeded capacity 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 3

Our Human Intelligence Task Actual question we used: Judge the sentiment expressed by the following item toward the software company "Apple Workers could pick from Strongly Negative (-2) Negative (-1) Neutral (0) Positive (+1) Strongly Positive (+2) Five workers labeled each tweet 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 4

Our Human Intelligence Task For each tweet, we take the average of the five scores. LOVE U @APPLE (1.8) @apple @twitter Happy Programmers' Day folks! (0.4) So disappointed in @Apple Sold me a Macbook Air that WONT run my apps. So I have to drive hours to return it. They wont let me ship it. (-1.4) We have labels, but how do we build independent variables from the text of a tweet? 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 5

A Bag of Words Fully understanding text is difficult Simpler approach: Count the number of times each words appears This course is great. I would recommend this course to my friends. THIS COURSE GREAT WOULD FRIENDS 2 2 1 1 1 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 1

A Simple but Effective Approach One feature for each word - a simple approach, but effective Used as a baseline in text analytics projects and natural language processing Not the whole story though - preprocessing can dramatically improve performance! 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 2

Cleaning Up Irregularities Text data often has many inconsistencies that will cause algorithms trouble Computers are very literal by default Apple, APPLE, and ApPLe will all be counted separately. Change all words to either lower-case or upper-case Apple APPLE ApPLe apple apple apple apple 3 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 3

Cleaning Up Irregularities Punctuation also causes problems basic approach is to remove everything that isn t a,b,,z Sometimes punctuation is meaningful Twitter: @apple is a message to Apple, #apple is about Apple Web addresses: www.website.com/somepage.html Should tailor approach to the specific problem @Apple APPLE! --apple-- apple apple apple apple 3 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 4

Removing Unhelpful Terms Many words are frequently used but are only meaningful in a sentence - stop words Examples: the, is, at, which Unlikely to improve machine learning prediction quality Remove to reduce size of data Two words at a time? The Who! Take That! Take 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 5

Stemming Do we need to draw a distinction between the following words? argue argued argues arguing Could all be represented by a common stem, argu Algorithmic process of performing this reduction is called stemming Many ways to approach the problem 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 6

Stemming Could build a database of words and their stems Pro: handles exceptions Con: won t handle new words, bad for the Internet! Can write a rule-based algorithm e.g. if word ends in ed, ing, or ly, remove it Pro: handles new/unknown words well Con: many exceptions, misses words like child and children (but would get other plurals: dog and dogs) 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 7

Stemming The second option is widely popular Porter Stemmer by Martin Porter in 1980, still used! Stemmers have been written for many languages Other options include machine learning (train algorithms to recognize the roots of words) and combinations of the above Real example from data: by far the best customer care service I have ever received by far the best custom care servic I have ever receiv 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 8

Sentiment Analysis Today Over 7,000 research articles have been written on this topic Hundreds of start-ups are developing sentiment analysis solutions Many websites perform real-time analysis of tweets tweetfeel shows trends given any term The Stock Sonar shows sentiment and stock prices 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 1

Text Analytics in General Selecting the specific features that are relevant in the application Applying problem specific knowledge can get better results Meaning of symbols Features like number of words 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 2

The Analytics Edge Analytical sentiment analysis can replace more laborintensive methods like polling Text analytics can deal with the massive amounts of unstructured data being generated on the internet Computers are becoming more and more capable of interacting with humans and performing human tasks In the next lecture, we ll discuss IBM Watson, an impressive feat in the area of Text Analytics 15.071x Turning Tweets Into Knowledge: An Introduction to Text Analytics 3