T-PICE: Twitter Personality based Influential Communities Extraction System

2014 IEEE International Congress on Big Data

Eleanna Kafeza, Business School, Athens University of Economics and Business, Greece
Andreas Kanavos, Computer Engineering and Informatics Department, University of Patras, Greece
Christos Makris, Computer Engineering and Informatics Department, University of Patras, Greece
Pantelis Vikatos, Computer Engineering and Informatics Department, University of Patras, Greece

Abstract: The identification of influential users in social media communities has recently become a major concern, since these users can contribute to viral marketing campaigns. In our approach, we extend the notion of influence from users to networks and consider personality as a key characteristic for identifying influential networks. We describe the Twitter Personality based Influential Communities Extraction (T-PICE) system, which creates the most influential communities in a Twitter network graph by taking users' personality into account. We expand existing approaches to user personality extraction by applying machine learning techniques to aggregated data that represent several aspects of user behavior. We use an existing modularity-based community detection algorithm and extend it with a pre-processing step that eliminates graph edges based on users' personality. The effectiveness of our approach is demonstrated by sampling the Twitter graph and comparing the influence of the communities created with and without considering the personality factor. We define several metrics to measure the influence of communities. Our results show that the T-PICE system creates the most influential communities.

Keywords: classification; influential community detection; personality mining; social media analytics

I. INTRODUCTION

On social networking sites, only a fraction of users can influence other users.
Businesses try to identify influential users for propagating communication messages, in most cases by looking at users' static profiles. In our approach, instead of identifying users, we identify communities that demonstrate high activity, and we generate user profiles based on user behavior. We look into complex relationships involving users' personality in social media networks so as to identify the communities that best conduct information. Our objective is to extract from social media data the appropriate features that represent these complex relationships, which stem from different data origins, and subsequently use them to identify influential communities. The importance of considering psychological mechanisms for understanding Internet use has already been identified in the literature [16], supporting the view that user personality plays a dominant role in social media communication. In this work, we investigate the role that personality plays in information diffusion. User personality can be described by a combination of personality traits that express tendencies to behave in certain ways. There are five basic dimensions of personality that remain stable in individuals, forming the Big Five model [18]. Our proposed Twitter Personality based Influential Communities Extraction methodology (T-PICE) identifies the networks that have the highest possible communication capability. It extracts a diversity of user information from Twitter and creates user profiles as tuples of the extracted, aggregated information. Classification algorithms from the Weka toolkit are used to map these user profiles to personality traits. We train classifiers using feature vectors augmented with the predefined category of each personality trait; the resulting models are evaluated to determine the best classification algorithm for each trait. Hence, each node of the Twitter graph is associated with a 5-tuple that represents the user's personality.
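The per-trait model selection described above (detailed in Sections IV-C and V) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than the authors' Weka pipeline: it assumes "F-Measure" means the class-frequency-weighted average of the per-class F1 scores for High and Low (the paper does not say which averaging it uses), and the classifier names clf_a and clf_b and their predictions are hypothetical toy data.

```python
def f1(y_true, y_pred, label):
    """F1 score of one class (e.g. "High") from parallel lists of true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f_measure(y_true, y_pred, labels=("High", "Low")):
    """Per-class F1 averaged with weights proportional to class frequency (assumed averaging)."""
    n = len(y_true)
    return sum(y_true.count(label) / n * f1(y_true, y_pred, label) for label in labels)


def best_per_trait(datasets):
    """datasets: trait -> (y_true, {classifier_name: y_pred}); returns trait -> (name, score)."""
    best = {}
    for trait, (y_true, predictions) in datasets.items():
        scores = {name: weighted_f_measure(y_true, pred) for name, pred in predictions.items()}
        winner = max(scores, key=scores.get)
        best[trait] = (winner, scores[winner])
    return best


# Hypothetical toy predictions for a single trait (not results from the paper).
toy = {
    "A": (["High", "High", "Low", "Low"],
          {"clf_a": ["High", "Low", "Low", "Low"],
           "clf_b": ["High", "High", "Low", "Low"]}),
}
print(best_per_trait(toy))  # {'A': ('clf_b', 1.0)}
```

In the full setting, the winning model of each trait would predict High or Low for every graph node, which yields the 5-tuple mentioned above.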
We propose the use of personality traits as an additional parameter for influential community detection. The T-PICE framework identifies communities within the Twitter network using the modularity optimization method described in [2]. We extend the approach of [2] by considering the personality relationship between nodes in a pre-processing step. Our contributions are threefold: first, we extend existing approaches for extracting personality from users' behavior in social media data; second, we identify the mining algorithms that best fit each personality trait; and third, we extend community detection algorithms with a pre-processing step that accounts for users' personality. Furthermore, we propose a unified framework that combines personality mining and community detection to address the problem of identifying influential communities. Our results show that the T-PICE system creates the most influential communities.

The remainder of the paper is organized as follows. Section II overviews related work. The proposed system architecture is described in Section III. Sections IV and V present the modules and sub-modules of our model and the details of the system implementation, respectively. Section VI presents our experimental results, and Section VII discusses them. Finally, Section VIII presents our concluding remarks, open problems and future work.

/14 $ IEEE DOI /BigData.Congress

II. RELATED WORK

The automatic extraction of each user's personality has gained the interest of researchers in recent years. Computational linguistics and data mining have been used for the automatic recognition of personality from text. The most widely known model for personality trait qualification is the Big Five [18]. According to the Big Five model, human personality is described as a vector of five trait values, as shown in Table I. The combination of the Big Five personality dimensions explains the dynamics of a personality. For example, a person may be very talkative (high Extraversion), not very tolerant and sensitive (low Agreeableness), systematic and punctual (high Conscientiousness), easily anxious (high Neuroticism) and extremely curious (high Openness).

Table I
PERSONALITY TRAITS
Trait | Description
Agreeableness (A) | This personality dimension includes attributes such as affability, tolerance, sensitivity, trust and kindness
Conscientiousness (C) | Common features of this dimension include organization, punctuality, achievement orientation and dependability
Extraversion (E) | This trait describes individuals who are outgoing, talkative, sociable and enjoy social situations
Neuroticism (N) | Individuals high in this trait tend to be anxious, irritable, temperamental and moody
Openness (O) | This trait features characteristics such as curiosity, originality, intellectuality, creativity and openness to new ideas

In the existing literature, the problem of automatic recognition of personality traits has been addressed, in a limited manner, using computational linguistics and characteristics of social network structure. In recent years, supervised learning approaches have been used for extracting personality types.
In [17], the authors first presented a detailed correlation analysis between the Big Five personality traits and the features contained in LIWC [20] and MRC [5]; they then classified the Big Five personality traits using regression and classification models. The authors in [12] tested linguistic features derived from LIWC for predicting personality in a large corpus of blogs, using Support Vector Machines (SVM) as the classification algorithm. In [21], the authors used a combination of decision trees with linear models at the leaves (the M5' algorithm), categorizing High and Low scores in the Big Five traits from Twitter profiles. Prediction of personality trait scores of Facebook users is addressed in [9], using M5 trees based on linguistic characteristics and social network features. A study that automatically recognizes Big Five personality traits from Facebook status messages is presented in [1], observing that a sparse MNB (Multinomial Naive Bayes) model performs better than SMO (Support Vector Machines trained with Sequential Minimal Optimization) and BLR (Bayesian Logistic Regression). Other efforts using unsupervised learning and statistical methods have been introduced in [3] and [4], using an annotated Twitter dataset and Facebook relationships, respectively. Furthermore, some studies address personality trait recognition with datasets that are not derived from social networks. These studies have introduced methods for recognizing bloggers' personality [19] or a speech-based dialogue system that understands a user's personality [1]; datasets in different languages [3] have also been used. The above literature review indicates that there are many studies on automatic personality identification. However, their results are not directly comparable because of the different methods and datasets used. Our approach differs from the existing studies.
More specifically, our proposed personality mining methodology differs from [4], since the latter used data from Facebook rather than Twitter and did not apply any data mining techniques; instead, it used features from the correlation analysis of [17]. Moreover, in [9], the authors apply data mining techniques to short texts, such as the "About me" or "Blurb" texts of Facebook accounts. In [3], emotional stability is described without the Big Five personality traits, using an unsupervised learning method. A similar work [21] uses only structural features, without the linguistic characteristics of users' text. In [1], the authors use Facebook data and introduce a classification model using only the SMO classification algorithm. Our approach integrates existing techniques by applying a variety of data mining techniques [11] that have not been used together in the existing literature, and hence performs an elaborate comparison to identify the best approach for personality data mining. Furthermore, our user profile creation model integrates existing approaches, using both the network structure and linguistic aspects; it expands the existing literature by creating a user profile that takes into account several network structure features and social media metrics that have not been considered before. We sample Twitter and extract the corresponding Twitter network, which is partitioned into communities using a widely used community detection approach [2], [8]. We extend modularity-based community detection by inserting a pre-processing step that eliminates graph edges based on the personalities of their endpoints. To the best of our knowledge, this is the first time that such an extensive study has considered Twitter data for personality mining.

III. SYSTEM ARCHITECTURE

In T-PICE, users' personality is extracted based on a variety of elements: the linguistic presence, the user behavior within the network and the way communities are formed based on the network structure. Figure 1 presents a generic model of the system architecture for a personality mining system that identifies influential communities. The system is composed of the following modules:
- The social media crawler. The crawler is responsible for sampling and traversing the social media; it also collects information regarding users' activity and their connections for a given topic.
- The user profile creation. This module takes the social media graph as input and creates a vector that represents the user profile. The profile combines a linguistic analysis of the users' tweets with the network characteristics; this is where we extract attributes that represent the user's structural position within the social media graph, as well as the user's metrics. These metrics capture user behavior, such as the number of tweets, retweets, etc.
- The personality classification module takes the above user profile as input and determines the user's personality based on the Big Five theory. A personality test in the form of a questionnaire is used to train the classifier.
- The communities decomposition module takes the users' personality into consideration and extracts communities using different criteria.
- The influential communities identification module takes the communities as input and determines the influential ones.

IV. PERSONALITY MINING FOR THE IDENTIFICATION OF INFLUENTIAL COMMUNITIES

An influential community is a community that demonstrates a high level of activity, i.e., many tweets or followers. We argue that personality plays an important role in determining influential communities; hence, we augment existing approaches in community detection with personality detection. In the following subsections, we present the modules and sub-modules of our model.

A. Social Media Crawler

The social media crawler traverses Twitter and creates a social media graph where nodes are users and edges represent the follow connection between two users. For our experiments, we use a topic-based sampling approach where tweets are collected via a keyword search query. The process creates a sample of the Twitter graph as follows: initially, it retrieves the users who have posted a tweet within the given time period, together with their followers. Subsequently, it connects users that follow each other, or that have a common follower, through that follower. The process for generating the social media graph is presented in Algorithm 1.

Algorithm 1 Generation of Social Media Graph
input: Query/Keyword #q
output: the sample graph Users, the list of followers of each user Followers[], and the list of followers to be inserted into Users, Newnodes
identify the set of tweets for the given #q, T = {t_1, t_2, ..., t_i}
for each t_i ∈ T do
    u_i = user of tweet t_i
    Followers[u_i] = followers of u_i
    Users = Users ∪ {u_i}
end for
for each u_k ∈ Users do
    for each f_j ∈ Followers[u_k] = {f_1, f_2, ..., f_j} do
        if f_j ∈ Users then
            link f_j with u_k
        else
            for each u_l ∈ Users with u_l ≠ u_k do
                if f_j ∈ Followers[u_l] then
                    Newnodes = Newnodes ∪ {f_j}
                    link f_j with u_k and link f_j with u_l
                end if
            end for
        end if
    end for
end for
Users = Users ∪ Newnodes

B. User Profile Creation

The user profile is determined by the user's behavior in social media. Several aspects describe user behavior, such as the use of words, emotions, frequency of communication, number of friends, etc. Moreover, a user's social relationships play an important role in user profiling, and such relationships can be extracted from the social graph based on users' communication patterns.
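Algorithm 1 from Section IV-A can be turned into a small runnable sketch. The version below is a minimal Python illustration under stated assumptions: the Twitter API calls are replaced by a hypothetical in-memory dictionary followers_of, the function name build_social_graph is ours, and follow links are stored as undirected pairs, matching the way the algorithm links a follower with one or two sampled users.

```python
def build_social_graph(tweet_authors, followers_of):
    """Sketch of Algorithm 1.

    tweet_authors: ids of users who posted a tweet matching the keyword query
    followers_of:  dict user id -> iterable of follower ids (stands in for the Twitter API)
    Returns (nodes, edges); each follow link is stored as an undirected frozenset pair.
    """
    users = set()
    followers = {}
    # "for each t_i in T": collect the tweet authors and their followers.
    for u in tweet_authors:
        users.add(u)
        followers[u] = set(followers_of.get(u, ()))

    new_nodes = set()
    edges = set()
    for u_k in users:
        for f_j in followers[u_k]:
            if f_j in users:
                # The follower is itself a sampled user: link the two directly.
                edges.add(frozenset((f_j, u_k)))
            else:
                # Otherwise keep f_j only if it also follows another sampled user u_l.
                for u_l in users:
                    if u_l != u_k and f_j in followers[u_l]:
                        new_nodes.add(f_j)
                        edges.add(frozenset((f_j, u_k)))
                        edges.add(frozenset((f_j, u_l)))
    return users | new_nodes, edges


# Hypothetical toy sample: dave follows both alice and carol, so he is added as a bridge.
nodes, edges = build_social_graph(
    ["alice", "bob", "carol"],
    {"alice": {"bob", "dave"}, "bob": {"erin"}, "carol": {"dave", "frank"}},
)
print(sorted(nodes))  # ['alice', 'bob', 'carol', 'dave']
```

Like the pseudocode, the sketch compares every non-sampled follower against all other sampled users, so its cost grows with the number of sampled users times their total follower count; followers such as erin and frank, who follow only one sampled user, are left out of the graph.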
In our work, we extend existing approaches to predicting personality traits by sketching the user profile from heterogeneous information collected from different sources of social media data. We aggregate information based on:
- the linguistic and emotional content of the tweets;
- the user's communication behavior;
- the network structure aspects of the user's presence in Twitter.

1) Linguistic and Emotional Analysis: The Linguistic Inquiry and Word Count (LIWC) software measures the cognitive and emotional properties of a person. It is a widely used linguistic analysis tool that parses users' text (tweets in our case) and assigns the words to psychologically meaningful categories. There are 80 such features, covering linguistic and psychological use of language as well as personal concerns. Hence, each Twitter user is represented as a vector of 80 values that characterize their linguistic and emotional behavior.

Figure 1. System Architecture

Definition: The User Linguistic profile is a tuple of 80 characteristics that represent the user's linguistic presence in Twitter, $l(c_1, \ldots, c_{80})$.

2) Social Media Analytics: Social media analytics can be used to monitor and capture user behavior. The followers of a user, the number of contributions to the social network and the frequency of contribution are some aspects that differentiate user behavior.

Definition: The User Social Media Analytics profile is a tuple $a(y_1, \ldots, y_6)$, where each value is a metric extracted from the user's behavior in social media. More precisely, in the case of Twitter, the Twitter analytics profile is a tuple $a(y_1, \ldots, y_6)$, where $y_1$ is the number of Followers, $y_2$ is the number of Direct Tweets, $y_3$ is the number of Retweets, $y_4$ is the number of Conversations, $y_5$ is the Frequency of the user's Tweets and $y_6$ is the number of Hashtag Keywords, as in [15]. These metrics describe the user's communication behavior in Twitter.

3) Network Information: Each user is represented as a node in the social graph. As such, the user has some structural network characteristics, which are associated with their behavior.

Definition: The User Network Structure profile is a tuple $n(z_1, z_2, z_3)$, where $z_1$ is the Egocentric Network Density, $z_2$ is the Betweenness Centrality and $z_3$ is the Closeness Centrality.

4) User Profile: A user profile is the union of the different user profiles, i.e., the linguistic, analytics and network profiles. By incorporating different aspects of user behavior, we construct a more complete user profile that better captures user behavior.

Definition: The User Profile is $UP(x_1, \ldots, x_n) = l(c_1, \ldots, c_{80}) \cup a(y_1, \ldots, y_6) \cup n(z_1, z_2, z_3)$, so that $n = 80 + 6 + 3 = 89$.

C. Personality Classification

We predict user personality from the UP vector using machine learning techniques. A predefined label, High or Low, for each personality trait is added to the UP vector based on the score derived from the questionnaire, creating a separate dataset for each trait. Subsequently, the five datasets are used for training the classifiers. We employ a variety of classification algorithms to gain a better understanding of which method best suits each personality trait and to identify the best classifier for each trait. Performance is evaluated with the F-Measure metric. The models with the highest F-Measure value for each personality trait are used to predict new test instances.

D. Communities Decomposition

In our approach, we aim to identify the most influential communities in the Twitter graph. Among the several algorithms for community detection, modularity-based community detection is considered one of the most popular methods. Existing approaches do not consider node features of the graph as a parameter for community detection. We base our community detection module on modularity detection and extend the approach presented in [2] with a pre-processing step in which graph edges are removed or kept according to one of the following alternatives:
1) Links between nodes with equal personality traits are removed (EL).
2) Links between nodes with different personality traits are removed (DL).
3) Based on [21], links between nodes that have the same values in Agreeableness, Extraversion and Openness are kept, while the rest are removed (AEO).

After the pre-processing step, the modularity-based community detection algorithm [2] is used to partition the network into communities.

E. Influential Communities Identification

To identify influential communities, we use the following activity metrics: the number of Tweets, the number of Followers and the Borda Count of tweets and followers. These metrics capture the activity level within each community. Moreover, we define a new combined metric by dividing the selected activity metric by the size of the community. This metric gives insight into the influence of each community by presenting the number of tweets per node or the number of followers per node. We rank the communities for each approach (i.e., Blondel, EL, DL and AEO) and compare the results.

V. IMPLEMENTATION

We based our experiments on Twitter and used the Twitter API to collect tweets. We implemented the Twitter graph using Twitter4J (footnote 1) and colored the graph according to our methodology. We sampled the Twitter graph using the process of Algorithm 1. We collected tweets published over a time interval of 21 days (06/01/ /01/2014) using the keyword #SocialNetworks. Our Twitter graph consists of 693 nodes. To construct the training set, we conducted a survey of 80 individuals. Each user answered a questionnaire (footnote 2) that determines user personality as described in [13], [14]. Then, we crawled Twitter to retrieve the relevant information for each of these users and constructed the UP vector. In our implementation, the UP vector consists of 80 linguistic metrics, 6 Twitter analytics metrics and 3 network information metrics, as presented in Table II.
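The pre-processing alternatives of Section IV-D can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each node carries a hypothetical 5-tuple of "H"/"L" labels in the order (A, C, E, N, O); "equal personality" is taken to mean that all five labels match; and the helper modularity only scores a given partition with Newman's modularity, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside c and d_c the total degree of c. It does not implement the Louvain optimization of [2], which would search for the partition maximizing Q on the filtered graph.

```python
from collections import defaultdict

# Assumed encoding: a personality is a 5-tuple of "H"/"L" labels in the order (A, C, E, N, O).
AEO_POSITIONS = (0, 2, 4)  # Agreeableness, Extraversion, Openness


def keep_edge(p, q, mode):
    """Decide whether the link between personalities p and q survives pre-processing."""
    if mode == "EL":   # remove links between equal personalities
        return p != q
    if mode == "DL":   # remove links between different personalities
        return p == q
    if mode == "AEO":  # keep only links whose endpoints agree on A, E and O
        return all(p[i] == q[i] for i in AEO_POSITIONS)
    raise ValueError(f"unknown mode: {mode}")


def preprocess(edges, personality, mode):
    """Filter an undirected edge list before running modularity-based detection."""
    return [(u, v) for u, v in edges if keep_edge(personality[u], personality[v], mode)]


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] of a given partition."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


# Hypothetical toy graph: nodes 1 and 2 share one personality, nodes 3 and 4 another.
personality = {1: ("H", "H", "H", "L", "H"), 2: ("H", "H", "H", "L", "H"),
               3: ("L", "H", "H", "L", "H"), 4: ("L", "H", "H", "L", "H")}
edges = [(1, 2), (2, 3), (3, 4), (1, 3)]
for mode in ("EL", "DL", "AEO"):
    print(mode, preprocess(edges, personality, mode))
```

On this toy graph DL and AEO keep the same two links, because the only disagreement between the two personality groups is in Agreeableness; EL keeps exactly the two cross-group links.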
For each user in the dataset, we compute a score for each personality trait based on the answers to the personality questionnaire. This score is the mean value of the corresponding questions, as described in [13]. To train the classifiers, we define a High and a Low category for each trait based on a threshold. We determine the threshold for each trait based on previous research [1]. Table III presents the distribution of instances over the High and Low categories for each personality trait. Thus, five datasets are created, one for each personality trait. We separated each dataset into training and test sets using two approaches: a) K-Fold Cross-Validation (K = 10) and b) Leave-One-Out Cross-Validation. The rationale for using both techniques is that, when splitting with 10-Fold Cross-Validation, important information can be removed from the training set, whereas Leave-One-Out Cross-Validation holds out only one sample at a time and evaluates classification performance on that single sample.

Table II
THE USER PROFILE FEATURE VECTOR
Features | # | Description
LIWC | 80 | 4 general descriptor categories (total word count, words per sentence, percentage of words captured by the dictionary, and percentage of words longer than six letters), 22 standard linguistic dimensions (e.g., percentage of words in the text that are pronouns, articles, auxiliary verbs, etc.), 32 word categories tapping psychological constructs (e.g., affect, cognition, biological processes), 7 personal concern categories (e.g., work, home, leisure activities), 3 paralinguistic dimensions (assents, fillers, nonfluencies), and 12 punctuation categories (periods, commas, etc.)
Twitter Metrics | 6 | Followers, Tweets, Retweets, Conversations, Frequency, Hashtag Keywords
Network | 3 | Egocentric Network Density, Betweenness Centrality, Closeness Centrality

Table III
DISTRIBUTION OF LABELS
Trait | High (%) | Low (%)
Agreeableness (A) | |
Conscientiousness (C) | |
Extraversion (E) | |
Neuroticism (N) | |
Openness (O) | |
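The three Network features of Table II (defined in Section IV-B.3) can be computed directly from the sampled graph. The following pure-Python sketch assumes an undirected, unweighted graph stored as a hypothetical dictionary of adjacency sets. The paper does not state its exact conventions, so the choices here are assumptions: the ego network includes the ego itself, betweenness (Brandes' algorithm) is normalized by the number of node pairs that exclude the node, and closeness is the number of reachable nodes divided by the sum of their distances, which for a connected graph equals the usual (n - 1) / sum of distances.

```python
from collections import deque


def bfs(adj, source):
    """Breadth-first search returning distances, shortest-path counts, predecessors and visit order."""
    dist, sigma, preds, order = {source: 0}, {source: 1}, {source: []}, []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def ego_density(adj, node):
    """z_1: density of the ego network (node, its neighbours and the links among them)."""
    members = {node} | adj[node]
    k = len(members)
    if k < 2:
        return 0.0
    links = sum(1 for u in members for v in adj[u] if v in members) / 2
    return links / (k * (k - 1) / 2)


def betweenness(adj):
    """z_2: Brandes' betweenness centrality, normalized for an undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        _, sigma, preds, order = bfs(adj, s)
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):  # accumulate dependencies from the farthest nodes back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted from both endpoints, hence (n-1)(n-2) rather than (n-1)(n-2)/2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in bc.items()}


def closeness(adj, node):
    """z_3: number of reachable nodes divided by the sum of their shortest-path distances."""
    dist = bfs(adj, node)[0]
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def network_profile(adj):
    """Map each node to its (z_1, z_2, z_3) tuple."""
    bc = betweenness(adj)
    return {v: (ego_density(adj, v), bc[v], closeness(adj, v)) for v in adj}


# Hypothetical path graph a - b - c stored as adjacency sets.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(network_profile(adj)["b"])
```

In the path graph, node b lies on the only shortest path between a and c, so its normalized betweenness is 1, while the end nodes a and c have betweenness 0.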
The classifiers were chosen from the bayes, functions, lazy, trees and rules categories of the Weka library (footnote 3). Table IV shows the F-Measure results of 10-Fold Cross-Validation for each classifier and each trait. Based on these results, we select the best classifier for each trait, depicted in bold in the table. Similarly, Table V shows the results for Leave-One-Out Cross-Validation. For personality traits A, C and E, AdaBoost, BayesNet and JRip are selected as the best classifiers under both approaches. In the case of N, 10-Fold Cross-Validation selects Ridor, whereas under Leave-One-Out Cross-Validation IBk achieves the best performance. Because the F-Measure is substantially larger under 10-Fold Cross-Validation, we select Ridor as the best classifier. In the case of O, 10-Fold Cross-Validation selects JRip, while Leave-One-Out Cross-Validation selects J48 and PART. Again, because the F-Measure of 10-Fold Cross-Validation is substantially larger, we select JRip as the best classifier.

1. Twitter4J API:
3. Weka toolkit:

Table IV
10-FOLD CROSS-VALIDATION
Classifiers | A | C | E | N | O
AdaBoost | | | | |
BayesNet | | | | |
IBk | | | | |
J48 | | | | |
JRip | | | | |
Multilayer Perceptron | | | | |
Naive Bayes Classifier | | | | |
PART | | | | |
Ridor | | | | |
RotationForest | | | | |
SMO | | | | |

Table V
LEAVE-ONE-OUT CROSS-VALIDATION
Classifiers | A | C | E | N | O
AdaBoost | | | | |
BayesNet | | | | |
IBk | | | | |
J48 | | | | |
JRip | | | | |
Multilayer Perceptron | | | | |
Naive Bayes Classifier | | | | |
PART | | | | |
Ridor | | | | |
RotationForest | | | | |
SMO | | | | |

The classification into High and Low categories for each personality trait creates a tuple of 5 labels for each Twitter user. In other words, there are $2^5 = 32$ different combinations that characterize people's personality, which can be depicted as different colors of the graph's nodes.

Figure 2. Comparison of Community Detection Algorithms based on the percentage of Followers of the top communities
Figure 3. Comparison of Community Detection Algorithms based on the percentage of Tweets of the top communities
Figure 4. Comparison of Community Detection Algorithms based on the percentage of Borda Count of the top communities

VI. RESULTS

In Figures 2, 3 and 4, we present the performance of each of our algorithms in determining the influential communities. We rank the influence of a community using different metrics for different application scenarios. For example, we use the number of tweets within each community as the ranking metric for applications that require finding influential communities regarding a topic, a specific time period or an event. For the top communities, we compute the percentage of tweets from nodes participating in them, relative to the total number of tweets in the original crawled graph. For applications that are more generic and require an overall estimate of the influence of a community, we determine influence based on the number of followers. In cases where both tweets and followers are of interest, we use the Borda Count of tweets and followers to measure influence. The Borda Count is a single-winner election method in which voters rank options in order of preference.
Namely, each option gets 1 point for each last-place vote received, 2 points for each next-to-last-place vote, and so on, up to N points for each first-place vote (where N is the number of options). Since we are interested in identifying the more influential communities and not just the top one, we use the sum of the metrics over the first three communities. Figure 2 presents the percentage of followers for the first three communities as well as the corresponding community sizes. We observe that our proposed methods (EL and DL) significantly increase the number of followers relative to the community size in the first three communities, compared to the Blondel and AEO approaches. DL detects the communities with the best percentage of followers. In Figure 3, we use the percentage of tweets to measure the performance of all methods. We observe that DL performs best with respect to the percentage of tweets. Blondel and AEO have the same results, while EL yields the lowest percentage of tweets. When looking at tweets relative to community size, EL is better, followed by DL and Blondel. In Figure 4, we evaluate all methods using the Borda Count of followers and tweets. In this case, EL achieves remarkably better performance, while the other three methods perform nearly the same.

The metrics introduced above for measuring the influence of a community do not take the size of the community into consideration. Hence, we introduce a normalized metric based on size (see Table VI). This metric can be used in a variety of applications, especially when cost is associated with the size of the communities. Such applications include advertising, where we look for the smallest communities with the largest impact. In all cases, the AEO algorithm, which deletes edges whose endpoints differ in at least one of the Agreeableness, Extraversion or Openness traits, gives worse results than EL and DL. Moreover, we conducted a set of experiments with AEO variations in which the removed edges are those between personalities that differ in three traits, and the results we obtained are similar to AEO. Keeping links between personalities that do not differ much creates balanced personality graphs where communities are not influential. This result is consistent with the metric/size results (Table VI). EL achieves the best results across all metrics.
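The ranking scheme of Section VI (a Borda Count over tweets and followers, summed over the top three communities) and the size-normalized metric of Table VI can be sketched as follows. This is a minimal Python sketch under stated assumptions: each community is a hypothetical record with tweets, followers and size; the two activity metrics act as the voters; ties are broken by input order because Python's sort is stable; and the numbers are invented for illustration. The paper does not specify how the per-size values in Table VI are aggregated over the top communities, so per_size is shown per community only.

```python
def borda_scores(communities, voters=("tweets", "followers")):
    """Each metric acts as a voter ranking all N communities: 1st place gets N points, last gets 1."""
    n = len(communities)
    scores = {c["name"]: 0 for c in communities}
    for metric in voters:
        ranked = sorted(communities, key=lambda c: c[metric], reverse=True)
        for position, c in enumerate(ranked):
            scores[c["name"]] += n - position
    return scores


def top_k_total(values, k=3):
    """Sum of a metric over the k best communities (the paper uses the first three)."""
    return sum(sorted(values.values(), reverse=True)[:k])


def per_size(communities, metric):
    """Size-normalized metric: activity per node of each community."""
    return {c["name"]: c[metric] / c["size"] for c in communities}


# Hypothetical communities (illustrative numbers, not the paper's data).
EXAMPLE = [
    {"name": "c1", "tweets": 120, "followers": 900, "size": 40},
    {"name": "c2", "tweets": 200, "followers": 500, "size": 80},
    {"name": "c3", "tweets": 150, "followers": 1500, "size": 30},
    {"name": "c4", "tweets": 30, "followers": 100, "size": 10},
]
borda = borda_scores(EXAMPLE)
print(borda)               # {'c1': 5, 'c2': 6, 'c3': 7, 'c4': 2}
print(top_k_total(borda))  # 18
```

Here c2 leads on tweets and c3 on followers, and the Borda Count ranks c3 first because it places highly under both voters.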
The top communities extracted by the different community decomposition approaches differ in how diverse their members' personalities are, as shown in Table VII. For each algorithm, we report the average percentage of dissimilar personalities in the top communities. This metric is computed as the number of nodes with different personalities divided by the total number of nodes in the top communities. According to Table VII, EL exhibits the greatest personality diversity in the resulting influential communities, while DL results in less variation in personalities.

Table VI
NORMALIZED METRIC FOR RATING INFLUENTIAL COMMUNITIES
Communities Decomposition | Tweets / Size | Followers / Size | Borda Count / Size
Simple Blondel | 1,704 | 2,201 | 1,150
EL | 2,065 | 2,794 | 1,467
DL | 1,651 | 2,584 | 1,054
AEO | 1,322 | 1,892 | 1,111

Table VII
AVERAGE OF DISSIMILARITY RATES OF USERS' PERSONALITY OF THE TOP COMMUNITIES
Communities Decomposition | Ranking Tweets | Ranking Followers | Ranking Borda Count
Simple Blondel | 58.7% | 66.3% | 66.3%
EL | 69.3% | 70.2% | 67.8%
DL | 54.1% | 61.1% | 54.2%
AEO | 63.4% | 60.2% | 60.2%

VII. DISCUSSION

In our work, we use [2] for community detection, an approach based on the modularity criterion, which is a popular technique for community detection. Modularity measures the density of links inside communities compared to links between communities. When the links between similar personalities are discarded (EL) in the pre-processing step, modularity is determined based only on the density of links between users that have different personalities. Hence, more heterogeneous communities are created, which tend to be more influential. Similarly, when the links between different personalities are deleted (DL), influential homogeneous communities are created. Looking at the extreme cases, we observe the following.
If all nodes of the Twitter graph correspond to individuals with the same mixture of Big Five traits, the EL approach will, after the pre-processing step, lead to a graph that includes only isolated nodes. The influence of the communities will then be much lower, because each top community consists of a single node; in this case, the influential network is reduced to influential users. On the other hand, the DL approach will keep the graph as it is, and thus the influential communities will be the same as in Blondel. If all nodes of the extracted graph have different personalities, which is possible only for graphs with at most 32 nodes, the EL approach will keep the graph as it is, and thus the influence of the communities will be the same as in Blondel, whereas DL will create a graph with isolated nodes and thus reduce the influence of the top communities. Our results show that, across the different cases and metrics, EL or DL outperforms Blondel, creating the most influential communities, which exhibit either a heterogeneous or a homogeneous personality distribution.

VIII. CONCLUSION - FUTURE WORK

In this work, we looked into the problem of determining influential communities in Twitter. We propose the Twitter Personality based Influential Communities Extraction methodology (T-PICE), a unified framework that extracts users' personality based on several aspects of user behavior and, using machine learning algorithms, colors the network graph according to the 32 possible personality descriptions defined by the Big Five personality model. Furthermore, we determine the best classification algorithm for each personality trait in order to improve the performance of our system. The influential communities are created based on several variations of modularity-based community detection, in which personality is also considered in a pre-processing step. Finally, the proposed variations and the original community detection algorithm are compared using metrics that measure the activity level of the top three communities. The top communities detected by EL (where links between nodes with equal personality traits are removed) and DL (where links between nodes with different personality traits are removed) indicate that both personality-heterogeneous and personality-homogeneous communities are the more influential ones, creating networks of higher information diffusion. The T-PICE system can serve as a tool for marketing managers or advertisers, helping them identify influential communities and thus better promote their products. As future work, we are interested in examining the scalability problems that emerge when considering bigger graphs. In addition, we aim to conduct more experiments on several subjects and to identify, at a finer level of granularity, the parameters that influence the results of our algorithms. Finally, we will investigate the evolution of influential communities over time, as well as the impact of other features on the influential community ranking.

REFERENCES
[1] F. Alam, E. A. Stepanov and G. Riccardi, "Personality Traits Recognition on Social Network - Facebook", Computational Personality Recognition.
[2] V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, "Fast Unfolding of Community Hierarchies in Large Networks", Journal of Statistical Mechanics: Theory and Experiment, P10008.
[3] F. Celli and L. Rossi, "The Role of Emotional Stability in Twitter Conversations", Semantic Analysis in Social Media.
[4] F. Celli and L. Polonio, "Relationships between Personality and Interactions in Facebook", Social Networking: Recent Trends, Emerging Issues and Future Outlook.
[5] M. Coltheart, "The MRC Psycholinguistic Database", Quarterly Journal of Experimental Psychology, Volume 33A.
[6] C. Dwork, R. Kumar, M. Naor and D. Sivakumar, "Rank Aggregation Methods for the Web", World Wide Web Conference (WWW).
[7] M. Farah and D. Vanderpooten, "An Outranking Approach for Rank Aggregation in Information Retrieval", Conference on Research and Development in Information Retrieval (SIGIR).
[8] S. Fortunato, "Community Detection in Graphs", Physics Reports 486.
[9] J. Golbeck, C. Robles and K. Turner, "Predicting Personality with Social Media", Human Factors in Computing Systems (CHI).
[10] L. R. Goldberg, "The Development of Markers for the Big-Five Factor Structure", Psychological Assessment, Volume 4, Issue 1.
[11] J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., The Morgan Kaufmann Series in Data Management Systems.
[12] F. Iacobelli, A. J. Gill, S. Nowson and J. Oberlander, "Large Scale Personality Classification of Bloggers", Affective Computing and Intelligent Interaction (ACII).
[13] O. P. John, E. M. Donahue and R. L. Kentle, The Big Five Inventory - Versions 4a and 54, Berkeley: University of California, Institute of Personality and Social Research.
[14] O. P. John and S. Srivastava, "The Big Five Trait Taxonomy: History, Measurement, and Theoretical Perspectives", in Handbook of Personality: Theory and Research, 2nd ed., New York: The Guilford Press.
[15] E. Kafeza, A. Kanavos, C. Makris and D. Chiu, "Identifying Personality-based Communities in Social Networks", Legal and Social Aspects in Web Modeling (Keynote Speech in LSAWM), in conjunction with the International Conference on Conceptual Modeling (ER).
[16] R.
N. Landers and J. W. Lounsbury, An Investigation of Big Five and Narrow Personality Traits in Relation to Internet Usage, Journal of Computers in Human Behavior, Volume 22, Issue 2, pp , [17] F. Mairesse, M. A. Walker, M. R. Mehl and R. K. Moore, Using Linguistic Cues for the Automatic Recognition of Personality in Conversation and Text, Journal of Artificial Intelligence Research (JAIR), Volume 30, pp , [18] R. R. McCrae and O. P. John, An Introduction to the Five- Factor Model and Its Applications, Journal of Personality, Volume 60, Issue 2, pp , [19] H. Mohtasseb and A. Ahmed, Mining Online Diaries for Blogger Identification, Data Mining and Knowledge Engineering (ICDMKE), pp , [20] J. W. Pennebaker, M. E. Francis and R. J. Booth, Linguistic Inquiry and Word Count (LIWC): LIWC2001, New Jersey: Lawrence Erlbaum Associates, [21] D. Quercia, M. Kosinski, D. Stillwell and J. Crowcroft, Our Twitter Profiles, Our Selves: Predicting Personality with Twitter, Social Computing (SocialCom)/Privacy, Security, Risk and Trust (PASSAT), pp ,
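The behavior of the EL and DL pre-processing variants in the two degenerate cases discussed in Section VII can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the T-PICE implementation: it assumes each Big Five trait is binarized to high/low (consistent with the 2^5 = 32 personality descriptions), represents the graph as a plain edge list, and uses hypothetical helper names (`preprocess`, `isolated_nodes`, `ALL_PERSONALITIES`). It shows that a personality-homogeneous graph loses every edge under EL, while a graph whose nodes all have distinct personalities loses every edge under DL.

```python
from itertools import combinations, product

# Assumption: each Big Five trait (Openness, Conscientiousness, Extraversion,
# Agreeableness, Neuroticism) is binarized to low (0) / high (1),
# giving 2**5 = 32 possible personality descriptions.
TRAITS = ("O", "C", "E", "A", "N")
ALL_PERSONALITIES = list(product((0, 1), repeat=len(TRAITS)))


def preprocess(edges, personality, mode):
    """Return the edges kept by a (hypothetical) EL or DL pre-processing step.

    mode == "EL": drop links between nodes with equal personality.
    mode == "DL": drop links between nodes with different personality.
    """
    if mode not in ("EL", "DL"):
        raise ValueError("mode must be 'EL' or 'DL'")
    kept = []
    for u, v in edges:
        same = personality[u] == personality[v]
        if (mode == "EL" and not same) or (mode == "DL" and same):
            kept.append((u, v))
    return kept


def isolated_nodes(nodes, edges):
    """Return the nodes that have no incident edge in `edges`."""
    touched = {n for edge in edges for n in edge}
    return [n for n in nodes if n not in touched]


if __name__ == "__main__":
    nodes = [0, 1, 2, 3]
    edges = list(combinations(nodes, 2))  # complete graph on 4 nodes
    # Every node has the same personality description.
    homogeneous = {n: (1, 0, 1, 1, 0) for n in nodes}
    # Pairwise-distinct descriptions; only possible for at most 32 nodes.
    heterogeneous = {n: ALL_PERSONALITIES[n] for n in nodes}
    for label, pers in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
        for mode in ("EL", "DL"):
            kept = preprocess(edges, pers, mode)
            print(label, mode, len(kept), "edges kept,",
                  len(isolated_nodes(nodes, kept)), "isolated nodes")
```

In a full pipeline, the filtered edge list would then be handed to a modularity-based community detection algorithm such as that of Blondel et al. [2]; recent NetworkX versions, for example, provide `networkx.community.louvain_communities`. Community detection is omitted here so that the sketch has no third-party dependencies.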


More information

CONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK

CONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK CONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK Ms.S.P.VidhyaPriya 1,B.Gokhila 2, T.Santhiya 3, K.Saranya 4 1 M.E.,Assistant Professor-CSE, Kathir College Of Engineering,

More information

Final Report: Local Structure and Evolution for Cascade Prediction

Final Report: Local Structure and Evolution for Cascade Prediction Final Report: Local Structure and Evolution for Cascade Prediction Jake Lussier (lussier1@stanford.edu), Jacob Bank (jbank@stanford.edu) ABSTRACT Information cascades in large social networks are complex

More information

Opinion Mining Task and Techniques: A Survey

Opinion Mining Task and Techniques: A Survey Volume 4, No. 8, May-June 2013 International Journal of Advanced Research in Computer Science Harpreet Kaur Assistant Professor Department of Information Technology DAV Institute of Engineering and Technology

More information

SOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM

SOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM SOFTWARE DEVELOPMENT PRODUCTIVITY FACTORS IN PC PLATFORM Abbas Heiat, College of Business, Montana State University-Billings, Billings, MT 59101, 406-657-1627, aheiat@msubillings.edu ABSTRACT CRT and ANN

More information

«ARE WE DISRUPTING OURSELVES?» Jörg Besier Managing Director, Accenture

«ARE WE DISRUPTING OURSELVES?» Jörg Besier Managing Director, Accenture «ARE WE DISRUPTING OURSELVES?» Jörg Besier Managing Director, Accenture ARE WE DISRUPTING OURSELVES? HOW AI SHAPES THE FUTURE OF THE IT INDUSTRY SWISS COGNITIVE TANK, ACCENTURE OFFICE IN ZURICH APRIL 12,

More information

TDWI Analytics Fundamentals. Course Outline. Module One: Concepts of Analytics

TDWI Analytics Fundamentals. Course Outline. Module One: Concepts of Analytics TDWI Analytics Fundamentals Module One: Concepts of Analytics Analytics Defined Data Analytics and Business Analytics o Variations of Purpose o Variations of Skills Why Analytics o Cause and Effect o Strategy

More information

Improving Consumer Consumption Preference Prediction Accuracy with Personality Insights

Improving Consumer Consumption Preference Prediction Accuracy with Personality Insights Improving Consumer Consumption Preference Prediction Accuracy with Personality Insights IBM-Acxiom Personality Insights (PI) Proof-of-Concept Project (POC) March 2016 Executive Summary IBM Personality

More information

Experiences in the Use of Big Data for Official Statistics

Experiences in the Use of Big Data for Official Statistics Think Big - Data innovation in Latin America Santiago, Chile 6 th March 2017 Experiences in the Use of Big Data for Official Statistics Antonino Virgillito Istat Introduction The use of Big Data sources

More information

Enabling News Trading by Automatic Categorization of News Articles

Enabling News Trading by Automatic Categorization of News Articles SCSUG 2016 Paper AA22 Enabling News Trading by Automatic Categorization of News Articles ABSTRACT Praveen Kumar Kotekal, Oklahoma State University Vishwanath Kolar Bhaskara, Oklahoma State University Traders

More information

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos

Effective CRM Using. Predictive Analytics. Antonios Chorianopoulos Effective CRM Using Predictive Analytics Antonios Chorianopoulos WlLEY Contents Preface Acknowledgments xiii xv 1 An overview of data mining: The applications, the methodology, the algorithms, and the

More information

Predictive Analytics Using Support Vector Machine

Predictive Analytics Using Support Vector Machine International Journal for Modern Trends in Science and Technology Volume: 03, Special Issue No: 02, March 2017 ISSN: 2455-3778 http://www.ijmtst.com Predictive Analytics Using Support Vector Machine Ch.Sai

More information

Social Media Insights Social Media Trends and Analytics Implications

Social Media Insights Social Media Trends and Analytics Implications Social Media Insights 2018 Social Media Trends and Analytics Implications 4 trends for 2018 01 02 03 VIDEO CONTENT CONSUMPTION SKYROCKETS TRUST DECLINES, PEER INFLUENCE RISES HUMANS, MEET AI 04 THE PROMISE

More information

Identifying Splice Sites Of Messenger RNA Using Support Vector Machines

Identifying Splice Sites Of Messenger RNA Using Support Vector Machines Identifying Splice Sites Of Messenger RNA Using Support Vector Machines Paige Diamond, Zachary Elkins, Kayla Huff, Lauren Naylor, Sarah Schoeberle, Shannon White, Timothy Urness, Matthew Zwier Drake University

More information

Article Review: Personality assessment in organisational settings

Article Review: Personality assessment in organisational settings Article Review: Personality assessment in organisational settings Author Published 2009 Journal Title Griffith University Undergraduate Psychology Journal Downloaded from http://hdl.handle.net/10072/340326

More information

A Comparative Study of Filter-based Feature Ranking Techniques

A Comparative Study of Filter-based Feature Ranking Techniques Western Kentucky University From the SelectedWorks of Dr. Huanjing Wang August, 2010 A Comparative Study of Filter-based Feature Ranking Techniques Huanjing Wang, Western Kentucky University Taghi M. Khoshgoftaar,

More information

A Comparison of Indonesia s E-Commerce Sentiment Analysis for Marketing Intelligence Effort (case study of Bukalapak, Tokopedia and Elevenia)

A Comparison of Indonesia s E-Commerce Sentiment Analysis for Marketing Intelligence Effort (case study of Bukalapak, Tokopedia and Elevenia) A Comparison of Indonesia s E-Commerce Sentiment Analysis for Marketing Intelligence Effort (case study of Bukalapak, Tokopedia and Elevenia) Andry Alamsyah 1, Fatma Saviera 2 School of Economics and Business,

More information

Research Article Rice Products Feature Analyzing on the Base of Online Review Mining

Research Article Rice Products Feature Analyzing on the Base of Online Review Mining Advance Journal of Food Science and Technology 7(1): 49-53, 2015 DOI:10.19026/ajfst.7.1265 ISSN: 2042-4868; e-issn: 2042-4876 2015 Maxwell Scientific Publication Corp. Submitted: August 31, 2014 Accepted:

More information