
1 Text Classification. CMSC 498J: Social Media Computing. Department of Computer Science, University of Maryland, Spring 2016. Hadi Amiri.

2 Lecture Topics. Introduction to text classification. The VW toolkit. Target-Dependent Churn Classification in Microblogs (Amiri and Daumé III, AAAI 2016).

3 Text Classification. Classify e-mails as spam, not-spam; news articles as business, entertainment, health, international, politics, sports, and technology; movie reviews as positive, negative, neutral; jokes as funny, not-funny; essays as A, B, C, D, or F; etc.

4 Text Classification- Cnt. Let's say we have: Input: a set of documents X = {x1, ..., xn} and a set of labels or predicted classes Y = {Class-1, ..., Class-k}; we know the label for each document: (x1, y1), ..., (xn, yn). Output: we aim to learn a function f (a classifier) that can map inputs to their corresponding outputs, f: X → Y.

5 Text Classification- Cnt. X = {"i'd love verizon's coverage", "actually t-mobile has great deals", "i hate t-mobile! one more bill!!", "i cant take it anymore! i hate verizon"}; Y = {+1, -1}, where +1 = positive and -1 = negative.

6 Text Classification- Cnt. With the same X and Y, the labeled pairs are {(x1, y1), (x2, y2), ..., (x4, y4)} = {("i'd love verizon's coverage", +1), ("actually t-mobile has great deals", +1), ("i hate t-mobile! one more bill!!", -1), ("i cant take it anymore! i hate verizon", -1)}.

7 Text Classification- Cnt. f("i'd love verizon's coverage") = +1; f("actually t-mobile has great deals") = +1; f("i hate t-mobile! one more bill!!") = -1; f("i cant take it anymore! i hate verizon") = -1. Classification: the output variable takes class labels, i.e., Y = {-1, +1}. Regression: the output variable takes continuous values, i.e., Y = [-1, +1].
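To make the setup concrete, here is a minimal sketch of learning f from the four labeled tweets above. It assumes scikit-learn with a bag-of-words representation and a logistic-regression classifier; the lecture does not prescribe this toolkit, and the representation is only previewed here (it is explained in the slides that follow).

```python
# A minimal sketch of learning f: X -> Y from the four labeled tweets above.
# Assumes scikit-learn; the lecture itself does not prescribe a toolkit.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

X = ["i'd love verizon's coverage",
     "actually t-mobile has great deals",
     "i hate t-mobile! one more bill!!",
     "i cant take it anymore! i hate verizon"]
y = [+1, +1, -1, -1]

vectorizer = CountVectorizer()             # bag-of-words features (see below)
features = vectorizer.fit_transform(X)     # documents -> numeric vectors

f = LogisticRegression().fit(features, y)  # learn the mapping f: X -> Y

# f should map a clearly negative, unseen tweet to -1.
print(f.predict(vectorizer.transform(["i hate verizon so much"])))
```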

8 Text Classification- Cnt. f(tweet) = y for unseen tweets such as: "Cant wait to leave verizon for t-mobile!", "One more bill!! I cant take it anymore, the unlimited data isnt even worth it. My days with verizon are numbered.", "Verizon: I will change carriers as soon as contract is up.", "Verizon your customer service is horrible. this loyal customer will be gone." How can we determine f? Why do we need to learn f? What is a good representation for documents to be correctly classified?

9 How can we determine f? Given {(x1, y1), (x2, y2), ..., (xn, yn)} we aim to find f(.). An ideal f(.) is a function such that f(xi) = yi for all i. Hard to find, why? We just expect f(xi) to be close to yi. Error = (y - f(x))^2: zero error when the prediction and the actual label are the same, non-zero otherwise. Loss function: l(yi, f(xi)) = (yi - f(xi))^2; total loss over the data: Σi l(yi, f(xi)) = Σi (yi - f(xi))^2.

10 How can we determine f- Cnt. Given {(x1, y1), (x2, y2), ..., (xn, yn)} we aim to find a function f(.) that minimizes the error. Three popular loss functions: squared loss (linear classifiers), hinge loss (SVMs), logistic loss (logistic classifiers).

11 How can we determine f- Cnt. Three popular loss functions: squared loss (linear classifiers), l(y, f(x)) = (y - f(x))^2; hinge loss (SVMs); logistic loss (logistic classifiers).
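The slide gives only the squared-loss formula; the standard definitions of the other two (standard facts, not taken from the slides), for labels y in {-1, +1}, are:

```latex
\ell_{\text{squared}}(y, f(x))  = (y - f(x))^2
\ell_{\text{hinge}}(y, f(x))    = \max\bigl(0,\; 1 - y\,f(x)\bigr)
\ell_{\text{logistic}}(y, f(x)) = \log\bigl(1 + e^{-y\,f(x)}\bigr)
```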

12 Document Representation. What is a good representation for documents to be correctly classified? One that helps the classifier distinguish documents of different classes!

13 Document Representation- Cnt. Features. How do we classify objects such as people and cars? We use features/characteristics of those objects!

14 Document Representation- Cnt. We need knowledge about which features make good predictors of class membership: having wheels or not distinguishes people from cars, but doesn't distinguish cars from trains.

15 Document Representation- Cnt. X = {"i would love verizon coverage", "i hate verizon one more bill", "i hate verizon"}; Y = {+1, -1}. Bag-of-words representation: Features = {i, would, love, verizon, coverage, hate, one, more, bill}.

16 Document Representation- Cnt. X = {"i would love verizon coverage", "i hate verizon one more bill", "i hate verizon"}; Y = {+1, -1}. Bag-of-words representation, Features = {i, would, love, verizon, coverage, hate, one, more, bill}:

     i  would  love  verizon  coverage  hate  one  more  bill
x1   1    1     1       1        1        0    0     0     0
x2   1    0     0       1        0        1    1     1     1
x3   1    0     0       1        0        1    0     0     0

17 Document Representation- Cnt. Bag-of-words representation restricted to sentiment words. Same X and Y; Features = {would, love, hate}:

     would  love  hate
x1     1      1     0
x2     0      0     1
x3     0      0     1

18 Document Representation- Cnt. Numeric input (e.g., Weka's ARFF-style format), with numeric attributes i, would, love, verizon, coverage, hate, one, more, bill, cant and class {positive, negative}:

1,1,1,1,1,0,0,0,0,0,positive
1,0,0,1,0,1,1,1,1,0,negative
1,0,0,1,0,1,0,0,0,0,negative
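As a cross-check, a short sketch that reproduces the binary matrix above with scikit-learn's CountVectorizer (illustrative by assumption; the slides build the matrix by hand). Note the custom token pattern: the default one drops single-character tokens such as "i".

```python
# Reproduce the binary bag-of-words matrix above (illustrative; the slides
# construct it by hand).
from sklearn.feature_extraction.text import CountVectorizer

X = ["i would love verizon coverage",
     "i hate verizon one more bill",
     "i hate verizon"]

vec = CountVectorizer(binary=True, token_pattern=r"(?u)\b\w+\b")
M = vec.fit_transform(X).toarray()

print(vec.get_feature_names_out())  # features, in alphabetical order
print(M)                            # one row per document; 1 = word present

# Restricting to the sentiment words of slide 17 is one line:
sent = CountVectorizer(binary=True, vocabulary=["would", "love", "hate"])
print(sent.fit_transform(X).toarray())
```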

19 Vowpal Wabbit (VW). Vowpal Wabbit: fast learning, simplicity, namespace definition, easy ablation analysis. Input format: Label [Importance] [Base] ['Tag] |Namespace Feature ... |Namespace Feature ... where Namespace = a letter like a, b, c, ... and Feature = String[:Float].

20 Vowpal Wabbit (VW)- Cnt. Example input in the Label |Namespace Features |Namespace Features format:

+1 |a i would love verizon coverage |b would love
-1 |a i hate verizon one more bill |b hate
-1 |a i hate verizon |b hate

Here namespace a holds the bag-of-words features and namespace b holds the sentiment-word features. How to add a sentiment feature in Weka?
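A sketch of how this file could be trained and ablated from the command line, driven here via Python's subprocess (the file names are hypothetical; -d, -f, -t, -i, -p, and --ignore are standard VW options):

```python
# Train VW on the example above and ablate namespace b (sentiment words).
# File names are hypothetical; the flags shown are standard VW options.
import subprocess

with open("train.vw", "w") as fh:
    fh.write("+1 |a i would love verizon coverage |b would love\n"
             "-1 |a i hate verizon one more bill |b hate\n"
             "-1 |a i hate verizon |b hate\n")

# Train on all namespaces.
subprocess.run(["vw", "-d", "train.vw", "-f", "model.vw"], check=True)

# Easy ablation analysis: retrain while ignoring namespace b.
subprocess.run(["vw", "-d", "train.vw", "--ignore", "b", "-f", "model_no_b.vw"],
               check=True)

# Predict with the trained model (-t: test only, no learning).
subprocess.run(["vw", "-t", "-i", "model.vw", "-d", "train.vw", "-p", "preds.txt"],
               check=True)
```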

21 Test and Training Data. Never test on training data!!! Split the data into training data and test data: 1. Training phase: learn the classifier on the training data. 2. Test phase: evaluate the classifier on the test data.

22 Test and Training Data- Cnt. How to create test and training data? Use k-fold cross-validation, e.g., k = 5 or 10.
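A minimal sketch of k-fold cross-validation with scikit-learn (an assumed toolkit; the toy data below is too small for k = 5, so k = 2 is used):

```python
# k-fold cross-validation sketch; in practice k = 5 or 10, but the toy
# dataset here only supports k = 2.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X = ["i would love verizon coverage",
     "actually t-mobile has great deals",
     "i hate verizon one more bill",
     "i hate verizon"]
y = [+1, +1, -1, -1]

features = CountVectorizer().fit_transform(X)
scores = cross_val_score(LogisticRegression(), features, y, cv=2)
print(scores, scores.mean())  # one held-out score per fold
```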

23 Classification Performance. Precision, Recall, F1-Score. F1 = 2 · precision · recall / (precision + recall). In which applications is precision (recall) more important than recall (precision)?

24 Classification Performance. Precision, Recall, F1-Score. F1 = 2 · precision · recall / (precision + recall). How to obtain high recall (regardless of precision!)?

25 Classification Performance. Precision, Recall, F1-Score. F1 = 2 · precision · recall / (precision + recall). How to obtain high precision (regardless of recall!)?
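The two questions above have a mechanical answer, sketched here with scikit-learn (an assumed toolkit): predicting positive for everything drives recall to 1.0 at the cost of precision, and predicting positive only on the surest cases does the reverse.

```python
# Toy illustration of the precision/recall trade-off.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [+1, +1, -1, -1, -1, -1]  # two true positives

# Predict +1 for everything: recall = 1.0, precision = 2/6.
y_all_pos = [+1] * 6
print(precision_score(y_true, y_all_pos), recall_score(y_true, y_all_pos))

# Predict +1 only when very confident: precision = 1.0, recall = 1/2.
y_cautious = [+1, -1, -1, -1, -1, -1]
print(precision_score(y_true, y_cautious), recall_score(y_true, y_cautious))

# F1 balances the two.
print(f1_score(y_true, y_cautious))  # 2*(1.0*0.5)/(1.0+0.5) ≈ 0.67
```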

26 Churn Prediction. Churn happens when a customer leaves a brand or stops using its service (Karnstedt et al., 2010; Dave et al.). Churn rate indicates customer response and the average time an individual remains a customer. Hadi Amiri, Hal Daumé III. Target-Dependent Churn Classification in Microblogs. AAAI 2016.

27 Churn Prediction- Problem Def. Target-dependent churn classification in microblogs: given a tweet ti posted by a user uj about a brand bk, determine whether ti is churny or non-churny with respect to bk. We assume that for each brand there exists a list of competing brands.

28 Churn Prediction- Sample (figure).

29 Churn Prediction- Features. At first glance, it might be enough to produce a list of churny keywords and use them as features, as sketched below: Features = {leave, leaving, switch, switching, numbered, cancel, canceling, discontinue, give up, call off, through with, get rid, end contract, change to, changing, ...}
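A sketch of this keyword baseline (the keyword list is from the slide; the code itself is illustrative, not the paper's implementation):

```python
# Illustrative churny-keyword baseline features; not the paper's implementation.
CHURNY_KEYWORDS = ["leave", "leaving", "switch", "switching", "numbered",
                   "cancel", "canceling", "discontinue", "give up", "call off",
                   "through with", "get rid", "end contract", "change to",
                   "changing"]

def churny_keyword_features(tweet: str) -> dict:
    """One binary feature per churny keyword/phrase occurring in the tweet."""
    text = tweet.lower()
    return {kw: kw in text for kw in CHURNY_KEYWORDS}

feats = churny_keyword_features("My days with verizon are numbered.")
print([kw for kw, present in feats.items() if present])  # ['numbered']
```

The next slide shows why this baseline falls short.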

30 Reasons of Poor Performance. Comparative micro-posts that compare two brands: "I am leaving verizon for t-mobile". Simple language constituents: compare phrases like "switch to att" vs. "switch from att". Negation: "att is awesome, I'll never leave them". Churny keywords do not necessarily convey churny content: "t-mobile to give att customers up to $450 to switch", "I need a little att s #help b4 leaving the states". Subtle uses of language, as in "debating if I should stay with att" or "in 2 months, bye att", that contain no obvious churny keywords but are clearly churny with respect to the brand.

31 Churn Prediction- Features. Demographic features.

32 Churn Prediction- Features. Content features, e.g., "I want to switch from crappy t-mobile to verizon or att".

33 Churn Prediction- Features. Content features.

34 Churn Prediction- Features. Context features.

35 Churn Prediction- Dataset. The churn dataset includes tweets about Verizon, T-Mobile, and AT&T, labeled as churny and non-churny by three annotators. Fleiss kappa: 0.62 --> substantial agreement. Cohen's kappa: 0.93 --> substantial agreement. The above two values are not directly comparable.

36 Churn Prediction- Results.

37 Churn Prediction- Results. Feature groups: neighboring words; syntactic and comparative; user activities in thread; friends' activities in thread.

38 Churn Prediction- Results. Top demographic features: average friend active days about competitors, average friend posts about competitors, and average friend active days about brand. Top content features: OBject=verizon (verizon is the object of the micro-post); L1=leaving (L1 is the first left neighboring word of the brand, as in "I am leaving Verizon"); poss_contract_my (the possession-modifier syntactic relation between the words "my" and "contract", as in "my contract is over"). Top context features are all from user posts in threads: USer_leaving, USer_contract, and USer_dobj_leave_verizon (the direct-object relation between the transitive verb "leave" and the brand, as in "I want to leave Verizon").
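The feature names above describe dependency relations; a hedged sketch of extracting such relation features with spaCy (an assumed toolkit, not necessarily what the authors used):

```python
# Sketch of dependency-relation features such as poss_contract_my and
# dobj_leave_verizon, using spaCy (an assumption; the paper may differ).
import spacy

nlp = spacy.load("en_core_web_sm")

def relation_features(text: str) -> list[str]:
    """Emit relation_head_child features for possessive and direct-object arcs."""
    doc = nlp(text)
    return [f"{tok.dep_}_{tok.head.lemma_}_{tok.text.lower()}"
            for tok in doc if tok.dep_ in ("poss", "dobj")]

print(relation_features("I want to leave Verizon"))  # ['dobj_leave_verizon']
print(relation_features("my contract is over"))      # ['poss_contract_my']
```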

39 Churn Prediction- Conclusion. Motivated the problem of target-dependent churn prediction in social media and identified important social churn indicators. Many potential areas remain to explore, from mining influential users of brands to ad routing to stock prediction.

40 Questions?