Applying Delta TFIDF to Sentiment Analysis and Text Categorization
Honors Thesis
Shamit Patel
University of Maryland, Baltimore County
Department of Computer Science and Electrical Engineering
Advisor: Dr. Tim Finin
May 26, 2009

Abstract

Text mining is becoming increasingly important for the automatic classification of electronic documents (Weiss et al., 1999). This paper explores two instances of text mining: sentiment analysis and text categorization. The problem is to determine which of two approaches classifies electronic documents more accurately when used with various support vector machine (SVM) kernels: a baseline method that simply does a word count, or Delta Term Frequency-Inverse Document Frequency (Delta TFIDF).

1. Introduction

Human beings are curious by nature, and asking questions is an integral part of human behavior. People constantly seek new knowledge through channels such as news broadcasts, newspapers, blogs, forums, and myriad other sources of information. Whether this information influences their decisions or their way of living depends on how they interpret the sentiment inherent in those opinions. How people analyze the text they read can also influence their actions. For example, if someone receives an email that they believe is spam but is actually legitimate, and they delete it, an important piece of information may be lost. Automatic text categorization can help solve this problem by shifting the burden of textual analysis to a machine learning algorithm such as a support vector machine.

Every day, we face many decisions, some trivial and some significant. Sentiment analysis aims to automate part of the decision-making process so that one can quickly judge whether or not to do something.
For example, if one is deciding whether to buy a particular product, a sentiment analyzer can quickly determine whether a lengthy review of the product leans positive or negative, so the review does not have to be read in full. Although this may seem a trivial application, sentiment analysis can also be applied to higher-stakes problems, such as detecting terrorist activity. Some other interesting applications of sentiment analysis are:

Influence Networks: Studying influence networks is one approach to understanding stakeholder communities. According to Aafia Chaudhry, a physician, the study of influence networks is the science of opinion leadership and of innovation adoption (Grimes, 2008a).

Measuring Market Effectiveness: Companies spend a lot of money promoting brand image and awareness. However, studying past consumer activity is of little use in understanding potential buyers who are not responding to market communication. Surveys and social-media mining can fill the gap (Grimes, 2008a).

Customer Experience Management/Enterprise Feedback Management: These initiatives aim to take organizations beyond measurement to dynamic stakeholder involvement. They also aim to capture the voice of the customer across the many enterprise-customer contact points, which may include surveys and other forms of communication (Grimes, 2008a).

My thesis evaluates how accurately various information retrieval techniques classify electronic documents by sentiment or text category when combined with different support vector machine kernels. Two information retrieval techniques were used to represent the documents quantitatively: a baseline method that simply counts the number of words that relate to the particular category, and a Delta TFIDF approach that weights words according to their importance in the document. Next, linear and polynomial SVM kernels were used to train and test the SVM classifier. Finally, the accuracies of the baseline method and the Delta TFIDF method were compared statistically to determine which method classified the documents more accurately. Overall, sentiment analysis and text categorization could change the way we think about information and how we make our everyday decisions.

2. Background

2.1 Sentiment Analysis and Text Categorization

Sentiment analysis seeks to use automated means to determine, extract, and process subjective information found in text. It is applied to a variety of domains such as blogs, email, surveys, and news articles.
A prime goal of sentiment analysis is to generate market intelligence and to identify opportunities and issues (Grimes, 2008b). Text categorization, in turn, is the automatic classification of documents into categories from a predefined set. It can be applied to filing patents into patent directories, spam filtering, authorship attribution, and many other domains. In fact, the accuracy of contemporary text categorization rivals that of human experts, thanks to a combination of information retrieval and machine learning technology (Sebastiani, n.d.).

2.2 Document Indexing

Two information retrieval techniques were used for document indexing: a baseline method and Delta TFIDF. Document indexing maps a document $d_j$ into a condensed representation of its content that can be directly interpreted by a classifier-building algorithm and by the resulting classifier. A document $d_j$ is usually represented as a vector of term weights $\vec{d}_j = \langle w_{1j}, \ldots, w_{|T|j} \rangle$, where $T$ is the dictionary: the set of terms, or features, that occur at least once in at least a given number of training documents (Sebastiani, n.d.). Some examples of features that occur in a positive review of the movie City of Angels are "mediocrity," "exhilarating," "really enjoyed," and "wonderfully" (Martineau & Finin, n.d.).

An indexing technique is defined by a description of what a term is and a method for calculating term weights. A popular choice is to identify terms with the words in the document. Term weights may be binary-valued or real-valued: binary weights signify whether or not a particular term occurs in a document, while non-binary weights are computed by probabilistic or statistical methods (Sebastiani, n.d.).

The baseline method simply uses word counts as term values, with no further weighting. By contrast, Delta TFIDF is an improved version of the standard Term Frequency-Inverse Document Frequency (TFIDF) information retrieval technique. TFIDF is a popular class of statistical term-weighting functions built on two intuitions. The more often a term appears in a document, the more important it is for that document; this is the term-frequency intuition. The more documents a term appears in, the less it contributes to describing the meaning of any one document in which it occurs; this is the inverse-document-frequency intuition (Sebastiani, n.d.).

A dimensionality reduction stage is usually applied to shrink the document representations from |T|, the size of the dictionary, to a much smaller, predefined number of terms. This reduces overfitting, the tendency of a classifier to classify the data it was trained on better than unseen test data. Dimensionality reduction is often implemented via feature selection: each term is scored with a function that captures its degree of positive or negative correlation with the category, and only the highest-scoring terms are kept for document representation (Sebastiani, n.d.).

2.3 Classifier Learning

In this project, support vector machines are used to classify documents into their appropriate categories.
This is an instance of supervised learning, in which the SVM algorithm is trained on data labeled with the class of each training instance. The set of documents is split into three disjoint sets: the training set, from which the SVM algorithm builds the classifier; the validation set, on which the classifier is fine-tuned; and the test set, on which the classifier's accuracy is evaluated (Sebastiani, n.d.). Accuracy is defined as the percentage of test instances that the SVM classifies correctly. Both kinds of errors, false positives and false negatives, therefore count equally against accuracy.

2.4 Support Vector Machines

Geometrically, the support vector machine algorithm considers all the decision surfaces σ1, σ2, σ3, … in |T|-dimensional space that separate the positive training instances from the negative ones, and selects the surface σi that separates them by the widest possible margin. In other words, it maximizes the minimum distance between the separating hyperplane and any training example. A benefit of SVMs for text categorization is that dimensionality reduction is typically unnecessary, because SVMs are fairly robust to overfitting and scale up to high dimensionalities (Sebastiani, n.d.). Finally, a linear SVM kernel attempts to separate the data with a linear function, while a polynomial SVM kernel separates the data with a polynomial function.

3. Related Work

Mullen and Collier (2004) use SVMs to combine diverse sources of relevant information, including components of sentences and information about the topic of the text. Leopold and Kindermann (2002) argue that in text classification, term-frequency transformations have a larger effect on SVM performance than the choice of kernel function itself. This finding bears directly on the statistical comparison at the end of this project, since the baseline method and Delta TFIDF are two different term-frequency transformations. In addition, sentiment analysis and text categorization were performed here over different scales of data, namely sentence-level and document-level data.

Joachims (1999) describes a way to make SVM learning practical for large-scale tasks, which matters when memory and time constraints limit the ability to analyze data. It is also worth noting whether SVM solutions are global and whether they are unique: because SVM training is a convex optimization problem, any solution it finds is a global optimum, although that solution is not necessarily unique. Burges (1998) explains how SVMs can be implemented in practice and applies them to pattern recognition. This matters because, beyond sentiment, other patterns can also be inferred from text, such as political affiliation, nationality, field of study, and many other categories. In general, SVMs are among the most accurate machine learning algorithms for sentiment analysis and text categorization.

4. Thesis Statement

The goal of this project is to evaluate the Delta TFIDF technique on several kinds of text mining problems, in order to determine whether it classifies electronic documents more accurately than the baseline method.

5. Methodology

The approach used in this research can be divided into four principal stages:

1. Featurization: This stage performs the document indexing described in the background and generates Sparse Feature Vector (SFV) files for the SVM algorithm. The data is then separated into training and testing folds using 10-fold cross-validation.
2. Training: The SVM algorithm learns from the training folds.
3. Testing: The SVM algorithm classifies the test instances into the appropriate categories.
4. Statistical Comparison: The accuracies of the baseline and Delta TFIDF methods on the test instances are compared using two-tailed t tests.
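The folding in stage 1 and the accuracy measure used in stage 3 can be sketched in a few lines of Python. This is a minimal illustration, not the featurization code actually used (the thesis used Justin Martineau's code); the function names `make_folds`, `train_test_splits`, and `accuracy` are hypothetical, and shuffling with a fixed seed is an assumption, since the thesis does not say how documents were assigned to folds.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def make_folds(n_items: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split indices 0..n_items-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed gives reproducible folds (assumption)
    return [indices[i::k] for i in range(k)]


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold serves as the test set exactly once."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Percentage of test instances classified correctly.

    False positives and false negatives count the same: both are simply
    'not correct'.
    """
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Example: the 597 product reviews (399 positive + 198 negative) in 10 folds.
folds = make_folds(597)
for train, test in train_test_splits(folds):
    assert not set(train) & set(test)  # a document is never in both sets
```

With 597 documents, seven folds hold 60 documents and three hold 59, which is as close to equal-sized as 597 allows.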
5.1 Featurization

First, the raw data were separated into the appropriate categories. For example, positive product reviews were placed into a positive directory and negative product reviews into a negative directory. Next, the SFV representation of the data points was created, and the data was divided into 10 equal-sized, non-overlapping folds for cross-validation. A master keys index was also produced so that every bigram (Martineau, 2009), that is, a pair of terms that frequently appear next to each other (Sebastiani, n.d.), was given a consistent and unique identifier. Then the training and testing folds for the baseline method were produced. Next, the per-fold document counts needed for the IDF scores in the Delta TFIDF method were computed. Finally, the training and testing folds for the Delta TFIDF experiment were created (Martineau, 2009).

In this last stage of featurization, a bag-of-words approach was used, in which each word is associated with a value, usually that word's frequency in a particular document. The Delta TFIDF approach weights these values by how biased they are toward one corpus: it assigns feature values for a document by computing the difference between a word's TFIDF scores in the positive and negative training corpora (Martineau & Finin, n.d.). Given the following:

1. $C_{t,d}$ is the frequency of term $t$ in document $d$
2. $P_t$ is the number of documents in the positively labeled training set with term $t$
3. $P$ is the number of documents in the positively labeled training set
4. $N_t$ is the number of documents in the negatively labeled training set with term $t$
5. $N$ is the number of documents in the negatively labeled training set
6. $V_{t,d}$ is the feature value for term $t$ in document $d$

the feature value for term $t$ in document $d$ is (Martineau & Finin, 2008):

$$V_{t,d} = C_{t,d} \log_2\left(\frac{P}{P_t}\right) - C_{t,d} \log_2\left(\frac{N}{N_t}\right) = C_{t,d} \log_2\left(\frac{P/P_t}{N/N_t}\right) = C_{t,d} \log_2\left(\frac{P \, N_t}{P_t \, N}\right)$$

This term-frequency transformation boosts words that are unevenly distributed between the positive and negative classes and discounts evenly distributed words. For sentiment classification, this better reflects their actual importance within the document. The value of an evenly distributed feature is zero, and the more uneven the distribution, the more important the feature. Features that are more prominent in the negative training set than in the positive training set have a positive score, while features that are more prominent in the positive training set than in the negative training set have a negative score. This creates a clear linear separation between positive and negative features. Finally, in the domain of sentiment analysis, Delta TFIDF places a much higher weight on sentiment-bearing words than either TFIDF or a raw term count (Martineau & Finin, n.d.). Overall, the Delta TFIDF technique determined the sentiment and text categories of the documents accurately, as reported in Section 6.

5.2 Training

During training, the SVM algorithm learned from the feature vectors using Thorsten Joachims's SVM-light software package. This is supervised learning, because the training instances contain their class labels.

5.3 Testing

During testing, the accuracy of the trained SVM classifier was measured, again using SVM-light. This is usually the final stage of the standard machine learning methodology.

5.4 Statistical Comparison

Finally, the baseline method and the Delta TFIDF approach were compared statistically to determine which method detects sentiment and text category more accurately. This comparison used two-tailed t tests. In a two-tailed t test, the null hypothesis states a particular value (here, no difference between the two methods' accuracies), and the alternative hypothesis allows a difference in either direction, positive or negative (Stockburger, 1996). The purpose of the test is to determine whether the Delta TFIDF method is more accurate than the baseline method at a statistically significant level. If the difference between the two methods' accuracies is significant at the 95% confidence level (a P-value below 0.05), then the method that is consistently more accurate is also considered statistically more accurate.

6. Results

Experiments were performed in two different domains: product reviews and spam emails.
The product review domain was chosen because the consumer industry is a major business and consumers could benefit greatly from sentiment analysis: instead of reading a lengthy product review, they could quickly decide whether or not to buy a particular product. The spam domain was chosen because spam detection is critical for anyone who uses email, since spam can clog up disk space and cause many other problems. For the product reviews, the task was to determine whether a review expresses positive or negative sentiment. For the spam emails, the task was to determine whether or not an email is spam.
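Both tasks are binary, which is what Delta TFIDF requires: its feature values compare a positive and a negative training corpus. The Section 5.1 formula can be sketched in Python as below, next to a raw word count standing in for the baseline. This is an illustration under stated assumptions, not the code used in the thesis: the function names are hypothetical, and the add-one smoothing parameter `alpha` is an assumption, because the formula is undefined when a term never occurs in one of the training sets and the thesis does not say how that case was handled. With `alpha=0` the sketch applies the formula exactly.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def doc_frequencies(docs: Iterable[List[str]]) -> Counter:
    """Number of documents containing each term (P_t or N_t for one class)."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def baseline_counts(doc: List[str]) -> Dict[str, float]:
    """Baseline feature values: raw term counts C_{t,d}, with no weighting."""
    return {term: float(c) for term, c in Counter(doc).items()}


def delta_tfidf(doc: List[str], pos_df: Counter, n_pos: int,
                neg_df: Counter, n_neg: int, alpha: float = 1.0) -> Dict[str, float]:
    """V_{t,d} = C_{t,d} * log2((P * N_t) / (P_t * N)) for every term t in doc.

    alpha is added to P_t and N_t (hypothetical add-one smoothing) so that a
    term seen in only one training set still gets a finite value; with
    alpha=0 the formula is applied exactly.
    """
    values = {}
    for term, c in Counter(doc).items():
        p_t = pos_df[term] + alpha  # Counter returns 0 for unseen terms
        n_t = neg_df[term] + alpha
        values[term] = c * math.log2((n_pos * n_t) / (p_t * n_neg))
    return values


# Toy training sets (hypothetical), P = N = 2 documents each.
pos_docs = [["great", "plot"], ["great", "acting"]]
neg_docs = [["bad", "plot"], ["bad", "acting"]]
pos_df, neg_df = doc_frequencies(pos_docs), doc_frequencies(neg_docs)

doc = ["great", "great", "plot"]
print(baseline_counts(doc))
print(delta_tfidf(doc, pos_df, len(pos_docs), neg_df, len(neg_docs)))
```

Here "plot" occurs equally often in both classes and gets a value of 0, while "great" occurs only in positive documents and gets a negative value, matching the sign convention described in Section 5.1. With `alpha=0`, a term absent from one training set makes the formula undefined, and Python raises `ZeroDivisionError` or `ValueError`.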
6.1 Product Review Data

The product review data used in this research is associated with (Ding, Liu, & Yu, 2008), (Hu & Liu, 2004a), and (Hu & Liu, 2004b). This data is at the sentence level. There were 399 positive reviews and 198 negative reviews. Here are some examples of positive and negative product reviews:

Positive Review of Apex AD2600 Progressive-scan DVD player (Ding, Liu, & Yu, 2008), (Hu & Liu, 2004a), (Hu & Liu, 2004b)

excellent second dvd, or first dvd for hdtv ready tv. wow! simple to use and hook up. comes with standard rca jacks for output, along with s- video output ( s-video cable not included, must be purchased seperately ) and also component video outputs. the progressive scan option can be turned off easily by a button on the remote control which is one of the simplest and easiest remote controls i have ever seen or used. i also own an " apex ad 1201 " dvd player and have had no problems with it since i purchased it almost 1 1/2 years ago. one big difference between the 1201 and the 2600 models is that the 2600 model is virtually silent. and does n't need to be placed in a cabinet like the 1201 does. friends of mine who own apex tv sets are also all very pleased. i would not hesitate to purchase this if you are uncertain of the brand name. consider it for a future gift too!

Negative Review of Apex AD2600 Progressive-scan DVD player (Ding, Liu, & Yu, 2008), (Hu & Liu, 2004a), (Hu & Liu, 2004b)

frustrating just hope you never lose / break the remote for this player! we 've purchased 3 universal remotes so far-all claiming to work " apex " dvd players and none worked. called customer service and basically was told to either keep buying univ. remotes to try or buy the replacement remote for $ 23 ( which is almost half of what i paid for the whole player ). if anybody knows of a remote to work this-i 'd love to hear from you! ( on here of course ) also, a couple dvd 's would n't play and they were new ones!
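To make the SFV representation concrete, the sketch below turns a phrase from the positive review above into one sparse line. It assumes the SFV files use SVM-light's plain-text input format (a target of +1 or -1 followed by `<feature>:<value>` pairs with integer feature ids in increasing order), since that is what SVM-light reads; the exact layout of the thesis's files is not given. The helpers `terms`, `build_key_index`, and `svmlight_line` are hypothetical, and this master keys index simply numbers unigrams and bigrams in order of first appearance.

```python
from collections import Counter
from typing import Dict, List, Mapping


def terms(tokens: List[str]) -> List[str]:
    """Unigrams plus bigrams (pairs of adjacent tokens) of one document."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def build_key_index(docs: List[List[str]]) -> Dict[str, int]:
    """Hypothetical master keys index: one consistent, unique id per term.

    Ids start at 1 because SVM-light feature numbers start at 1.
    """
    index: Dict[str, int] = {}
    for doc in docs:
        for term in doc:
            index.setdefault(term, len(index) + 1)
    return index


def svmlight_line(label: int, values: Mapping[str, float], index: Dict[str, int]) -> str:
    """Format one example as '<label> <id>:<value> ...' with ids ascending.

    Zero values and terms missing from the index are omitted, which keeps
    the vector sparse.
    """
    feats = sorted((index[t], v) for t, v in values.items() if t in index and v != 0)
    body = " ".join(f"{i}:{v:.6g}" for i, v in feats)
    return f"{label:+d} {body}".rstrip()


# A phrase from the positive Apex review, as a +1 (positive) example
# with raw counts as feature values.
tokens = "simple to use and hook up".split()
index = build_key_index([terms(tokens)])
print(svmlight_line(+1, Counter(terms(tokens)), index))
```

The same `svmlight_line` call would accept Delta TFIDF values instead of counts; only the numbers after each colon change, while the master keys index keeps every feature id consistent across folds.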
Linear SVM Kernel

Table 1 gives the accuracies obtained on the product review data with a linear SVM kernel. The c-value sets the trade-off between training error and margin (Joachims, 2009).

Table 1. Accuracies of Baseline and Delta TFIDF Methods using a Linear SVM Kernel for Product Review Data

Accuracy    Baseline Method (c = 10,000)    Delta TFIDF (c = 100,000)
Fold 1      …                               73.33%
Fold 2      …                               85.00%
Fold 3      …                               85.00%
Fold 4      …                               80.00%
Fold 5      …                               81.67%
Fold 6      …                               88.33%
Fold 7      …                               81.67%
Fold 8      …                               76.67%
Fold 9      …                               77.97%
Fold 10     …                               84.48%
Average     79.24%                          81.41%
P-value     …

Figure 1. Accuracy of Baseline Method vs. Accuracy of Delta TFIDF Method using a Linear SVM Kernel for Product Review Data
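For reference, the c-value corresponds to the constant $C$ in the standard soft-margin SVM training problem below, where $\mathbf{x}_i$ are the training vectors, $y_i \in \{+1, -1\}$ their labels, and $\xi_i$ slack variables measuring how far each example falls on the wrong side of its margin. This is the textbook formulation, not something specific to this thesis. The margin width is $2/\lVert\mathbf{w}\rVert$, so a large $C$ (such as the 10,000 and 100,000 used here) penalizes training errors heavily at the cost of a narrower margin. A kernel function replaces the dot products $\mathbf{x}_i \cdot \mathbf{x}_j$ that appear in the dual form of this problem; that is where the linear and polynomial kernels differ.

$$
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\,(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n.
$$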
The P-value for this comparison indicates that the difference between the accuracies of the baseline method and the Delta TFIDF method in classifying product reviews is significant only at about the 94.9% confidence level, just short of the 95% threshold set in Section 5.4. This result therefore suggests, but does not firmly establish, that Delta TFIDF analyzes product reviews more accurately than the baseline method with a linear kernel. Figure 1 is consistent with this: here the Delta TFIDF method is only slightly more accurate than the baseline method.

Polynomial SVM Kernel

Table 2 gives the accuracies obtained on the product review data with a polynomial SVM kernel.

Table 2. Accuracies of Baseline and Delta TFIDF Methods using a Polynomial SVM Kernel for Product Review Data

Accuracy    Baseline Method (c = 10,000)    Delta TFIDF (c = 10,000)
Fold 1      …                               73.33%
Fold 2      …                               85.00%
Fold 3      …                               85.00%
Fold 4      …                               80.00%
Fold 5      …                               83.33%
Fold 6      …                               88.33%
Fold 7      …                               81.67%
Fold 8      …                               76.67%
Fold 9      …                               77.97%
Fold 10     …                               84.48%
Average     67.66%                          81.58%
P-value     …
Figure 2. Accuracy of Baseline Method vs. Accuracy of Delta TFIDF Method using a Polynomial SVM Kernel for Product Review Data

The P-value for this comparison indicates a statistically significant difference between the accuracies of the baseline method and the Delta TFIDF method in classifying product reviews. This result provides strong evidence that Delta TFIDF is a more accurate way to analyze product reviews than the baseline method. It also indicates that Delta TFIDF works far better than the baseline method with a polynomial kernel, perhaps because of the weighting of word counts in the former method. Finally, Figure 2 supports this conclusion, since Delta TFIDF is consistently more accurate than the baseline method in classifying the documents.

6.2 Spam Data

The spam data used in this research is associated with (Metsis, Androutsopoulos, & Paliouras, 2006). This data is at the document level. Ham emails are emails that are not spam. There were 1,500 spam emails and 3,672 ham emails. Here are some examples of spam and ham emails:

Spam (Metsis, Androutsopoulos, & Paliouras, 2006)

Subject: get that new car 8434 people nowthe weather or climate in any particular environment can change and affect what people eat and how much of it they are able to eat.
Ham (Metsis, Androutsopoulos, & Paliouras, 2006)

Subject: meter jan 1999 george, i need the following done : jan 13 zero out receipt package id 2666 allocate flow of 149 to deliv package id 392 jan 26 zero out receipt package id 3011 zero out deliv package id 392 these were buybacks that were incorrectly nominated to transport contracts ( ect 201 receipt ) let me know when this is done hc

Linear SVM Kernel

Table 3 gives the accuracies obtained on the spam data with a linear SVM kernel.

Table 3. Accuracies of Baseline and Delta TFIDF Methods using a Linear SVM Kernel for Spam Data

Accuracy    Baseline Method (c = 10,000)    Delta TFIDF (c = 10,000)
Average     96.62%                          98.92%
P-value     …

Per-fold Delta TFIDF accuracies: 98.84% (four folds), 99.23% (two folds), 99.61%, 99.42%, 98.46%, and 97.87%.
Figure 3. Accuracy of Baseline Method vs. Accuracy of Delta TFIDF Method using a Linear SVM Kernel for Spam Data

The P-value for this comparison shows that the difference between the accuracies of the baseline method and the Delta TFIDF method in classifying spam emails is statistically significant. Figure 3 provides additional evidence that Delta TFIDF is more accurate than the baseline in classifying emails as spam or ham, since its accuracy is consistently and visibly higher. Delta TFIDF is therefore a strong method for spam detection.
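The two-tailed t test from Section 5.4 can be sketched as below. Because both methods were scored on the same ten folds, this sketch assumes a paired test on the per-fold differences; the thesis says only that two-tailed t tests were used, so the paired variant is an assumption. To stay within the Python standard library, it compares |t| with the tabulated two-tailed critical value for 9 degrees of freedom at the 0.05 level (2.262) instead of computing an exact P-value. The function names and the accuracy lists in the example are hypothetical, not the thesis's data.

```python
import math
from statistics import mean, stdev
from typing import Sequence

# Two-tailed critical value of Student's t for alpha = 0.05 with
# 9 degrees of freedom (10 folds - 1), from a standard t table.
T_CRIT_DF9 = 2.262


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of the per-fold differences a[i] - b[i].

    Raises ZeroDivisionError if every difference is identical (zero variance).
    """
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def significantly_different(a: Sequence[float], b: Sequence[float],
                            t_crit: float = T_CRIT_DF9) -> bool:
    """Reject 'no difference' when |t| exceeds the two-tailed critical value."""
    return abs(paired_t_statistic(a, b)) > t_crit


# Hypothetical per-fold accuracies (%), for illustration only.
delta = [81, 82, 80, 83, 81, 82, 80, 83, 81, 82]
base = [79, 80, 79, 80, 78, 81, 79, 80, 79, 80]
print(round(paired_t_statistic(delta, base), 2), significantly_different(delta, base))
```

If SciPy is available, `scipy.stats.ttest_rel` performs the same paired test and also returns the exact two-sided P-value.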
Polynomial SVM Kernel

Table 4 gives the accuracies obtained on the spam data with a polynomial SVM kernel.

Table 4. Accuracies of Baseline and Delta TFIDF Methods using a Polynomial SVM Kernel for Spam Data

Accuracy    Baseline Method (c = 10,000)    Delta TFIDF (c = 10,000)
Average     90.53%                          98.92%
P-value     …

Per-fold Delta TFIDF accuracies: 98.84% (four folds), 99.23% (two folds), 99.61%, 99.42%, 98.46%, and 97.87%.

Figure 4. Accuracy of Baseline Method vs. Accuracy of Delta TFIDF Method using a Polynomial SVM Kernel for Spam Data
The P-value for this comparison shows that the difference between the accuracies of the baseline method and the Delta TFIDF method in classifying spam emails is statistically significant. This P-value is also much lower than the one for the linear SVM kernel, reflecting a larger gap in accuracy: the baseline drops from 96.62% with the linear kernel to 90.53% with the polynomial kernel, while Delta TFIDF stays at 98.92%. Finally, Figure 4 shows that Delta TFIDF is far more accurate than the baseline here, which is further evidence that Delta TFIDF improves the accuracy of text categorization over the baseline method.

7. Discussion

The Delta TFIDF method is more accurate in classifying electronic documents than the baseline method, which simply counts words (Martineau & Finin, n.d.); in all four experiments reported here, it achieved a higher average accuracy. This research could potentially be used in many business intelligence applications, including national security, stock market analysis, and other economic and government initiatives. Beyond analyzing text for sentiment, it could in principle be extended to analyzing physical human sentiment, which could be useful in lie detection tests. Overall, this research may have implications that cannot yet be foreseen.

For future work, the Delta TFIDF method should be applied to multiclass sentiment analysis and text categorization, in which the data has more than two classes. An example of multiclass text categorization is newsgroup article classification, in which each article is assigned to one of many newsgroups. I started this task and hope that this work will be continued in the future. Given the success of the Delta TFIDF method on binary classification, I expect it to perform well on multiclass classification too.

8. Conclusions

The Delta TFIDF approach statistically surpasses the baseline method on different scales of data for sentiment analysis and text categorization (Martineau & Finin, n.d.). The SVM approach to text mining also promises to be a leading way to classify electronic documents. Because datasets are heterogeneous, the word count and Delta TFIDF techniques may sometimes produce very similar results and sometimes very different ones. For example, in one domain Delta TFIDF may be only slightly more accurate than word count, while in another it may be much more accurate. Overall, the combination of information retrieval and machine learning technology promises to keep improving the accuracy of sentiment analysis and text categorization (Sebastiani, n.d.).

Acknowledgements

I would like to thank Justin Martineau for guiding me throughout this project and for allowing me to use his featurization code. I would also like to thank Dr. Tim Finin for advising me on what would constitute a good honors thesis. I thank Xiaowen Ding, Minqing Hu, Bing Liu, and Philip S. Yu for the product review data associated with (Ding, Liu, & Yu, 2008), (Hu & Liu, 2004a), and (Hu & Liu, 2004b), and Vangelis Metsis, Ion Androutsopoulos, and Georgios Paliouras for the spam data set associated with (Metsis, Androutsopoulos, & Paliouras, 2006). Finally, I would like to thank Thorsten Joachims for the SVM-light program that was used to train and test the SVM classifier.

References

Burges, C. J. (1998). A tutorial on Support Vector Machines for pattern recognition. In U. Fayyad (Ed.), Data mining and knowledge discovery (Rep. No. 2, pp ). Retrieved February 13, 2009, from Kluwer Academic Publishers Web site:

Ding, X., Liu, B., & Yu, P. S. (2008, February). A Holistic Lexicon-Based Approach to Opinion Mining. Proceedings of First ACM International Conference on Web Search and Data Mining (WSDM-2008).

Grimes, S. (2008a, February 19). Sentiment Analysis: A Focus on Applications. BeyeNETWORK: Global coverage of the business intelligence ecosystem. Retrieved May 12, 2009, from

Grimes, S. (2008b, January 22). Sentiment Analysis: Opportunities and Challenges. BeyeNETWORK: Global coverage of the business intelligence ecosystem. Retrieved May 12, 2009, from

Hu, M., & Liu, B. (2004a, August). Mining and summarizing customer reviews. In ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Symposium conducted at the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Seattle, WA. Retrieved May 4, 2009, from

Hu, M., & Liu, B. (2004b, July). Mining Opinion Features in Customer Reviews. In Nineteenth National Conference on Artificial Intelligence. Symposium conducted at the Nineteenth National Conference on Artificial Intelligence, San Jose, CA. Retrieved May 4, 2009, from

Joachims, T. (2009, March 21). How to use. In SVM-LIGHT Support Vector Machine. Retrieved May 4, 2009, from

Joachims, T. (1999). Making large-scale SVM learning practical. Retrieved February 13, 2009, from

Leopold, E., & Kindermann, J. (2002). Text categorization with Support Vector Machines. How to represent texts in input space? In N. Cristianini (Ed.), Machine Learning (Rep. No. 46, pp ). Retrieved February 13, 2009, from Kluwer Academic Publishers Web site:

Martineau, J. (2009). Procedure. Unpublished typescript, University of Maryland, Baltimore County, Baltimore, MD.
Martineau, J., & Finin, T. (2009, May). Delta TFIDF: An Improved Feature Space for Sentiment Analysis. In Third AAAI International Conference on Weblogs and Social Media. Symposium conducted at the third AAAI Conference on Weblogs and Social Media, San Jose, CA. Retrieved May 4, 2009, from

Martineau, J., & Finin, T. (2008). Improving the Bag of Words Feature Space for SVM Based Sentiment Analysis. Association for the Advancement of Artificial Intelligence.

Metsis, V., Androutsopoulos, I., & Paliouras, G. (2006, July). Spam Filtering with Naive Bayes - Which Naive Bayes? In CEAS Third Conference on Email and Anti-Spam (pp. 1-9). Mountain View, CA. Retrieved May 4, 2009, from

Mullen, T., & Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. Retrieved February 9, 2009, from nlp_corrected.pdf

Rennie, J. D., & Rifkin, R. (2002, April). Improving Multiclass Text Classification with the Support Vector Machine. Retrieved February 13, 2009, from

Sebastiani, F. (n.d.). Text Categorization. Retrieved May 15, 2009, from

Stockburger, D. W. (1996). Introductory Statistics: Concepts, Models, and Applications. Joplin, MO: Southwest Missouri State University. Retrieved May 17, 2009, from

Weiss, S. M., Apte, C., Damerau, F. J., Johnson, D. E., Oles, F. J., Goetz, T., et al. (1999, July/August). Maximizing Text-Mining Performance. IEEE Intelligent Systems, 1094(7167), 2-8. Retrieved May 4, 2009, from
More informationE-Commerce Sales Prediction Using Listing Keywords
E-Commerce Sales Prediction Using Listing Keywords Stephanie Chen (asksteph@stanford.edu) 1 Introduction Small online retailers usually set themselves apart from brick and mortar stores, traditional brand
More informationPredicting Yelp Ratings From Business and User Characteristics
Predicting Yelp Ratings From Business and User Characteristics Jeff Han Justin Kuang Derek Lim Stanford University jeffhan@stanford.edu kuangj@stanford.edu limderek@stanford.edu I. Abstract With online
More informationPredictive Analytics Using Support Vector Machine
International Journal for Modern Trends in Science and Technology Volume: 03, Special Issue No: 02, March 2017 ISSN: 2455-3778 http://www.ijmtst.com Predictive Analytics Using Support Vector Machine Ch.Sai
More informationClassification Model for Intent Mining in Personal Website Based on Support Vector Machine
, pp.145-152 http://dx.doi.org/10.14257/ijdta.2016.9.2.16 Classification Model for Intent Mining in Personal Website Based on Support Vector Machine Shuang Zhang, Nianbin Wang School of Computer Science
More informationA logistic regression model for Semantic Web service matchmaking
. BRIEF REPORT. SCIENCE CHINA Information Sciences July 2012 Vol. 55 No. 7: 1715 1720 doi: 10.1007/s11432-012-4591-x A logistic regression model for Semantic Web service matchmaking WEI DengPing 1*, WANG
More informationDetermining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for
Determining NDMA Formation During Disinfection Using Treatment Parameters Introduction Water disinfection was one of the biggest turning points for human health in the past two centuries. Adding chlorine
More informationThe Customer Is Always Right: Analyzing Existing Market Feedback to Improve TVs
The Customer Is Always Right: Analyzing Existing Market Feedback to Improve TVs Jose Valderrama 1, Laurel Rawley 2, Simon Smith 3, Mark Whiting 4 1 University of Central Florida 2 University of Houston
More informationPrediction of Google Local Users Restaurant ratings
CSE 190 Assignment 2 Report Professor Julian McAuley Page 1 Nov 30, 2015 Prediction of Google Local Users Restaurant ratings Shunxin Lu Muyu Ma Ziran Zhang Xin Chen Abstract Since mobile devices and the
More informationNew restaurants fail at a surprisingly
Predicting New Restaurant Success and Rating with Yelp Aileen Wang, William Zeng, Jessica Zhang Stanford University aileen15@stanford.edu, wizeng@stanford.edu, jzhang4@stanford.edu December 16, 2016 Abstract
More informationThe Art of Ignoring. Hi, I m Alwin Hoogerdijk and my presentation today is about the Art of Ignoring. But first let me introduce myself.
The Art of Ignoring Hi, I m Alwin Hoogerdijk and my presentation today is about the Art of Ignoring. But first let me introduce myself. I am the President and founder of Collectorz.com. We make collection
More informationFORECASTING & REPLENISHMENT
MANHATTAN ACTIVE INVENTORY FORECASTING & REPLENISHMENT MAXIMIZE YOUR RETURN ON INVENTORY ASSETS Manhattan Active Inventory allows you to finally achieve a single, holistic view of all aspects of your inventory
More informationAutomatic Detection of Rumor on Social Network
Automatic Detection of Rumor on Social Network Qiao Zhang 1,2, Shuiyuan Zhang 1,2, Jian Dong 3, Jinhua Xiong 2(B), and Xueqi Cheng 2 1 University of Chinese Academy of Sciences, Beijing, China 2 Institute
More informationReaction Paper Regarding the Flow of Influence and Social Meaning Across Social Media Networks
Reaction Paper Regarding the Flow of Influence and Social Meaning Across Social Media Networks Mahalia Miller Daniel Wiesenthal October 6, 2010 1 Introduction One topic of current interest is how language
More informationSawtooth Software. Sample Size Issues for Conjoint Analysis Studies RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.
Sawtooth Software RESEARCH PAPER SERIES Sample Size Issues for Conjoint Analysis Studies Bryan Orme, Sawtooth Software, Inc. 1998 Copyright 1998-2001, Sawtooth Software, Inc. 530 W. Fir St. Sequim, WA
More informationComputational Gambling
Introduction Computational Gambling Konstantinos Katsiapis Gambling establishments work with the central dogma of Percentage Payout (PP). They give back only a percentage of what they get. For example
More informationBrian Macdonald Big Data & Analytics Specialist - Oracle
Brian Macdonald Big Data & Analytics Specialist - Oracle Improving Predictive Model Development Time with R and Oracle Big Data Discovery brian.macdonald@oracle.com Copyright 2015, Oracle and/or its affiliates.
More informationDesign Like a Pro. Boost Your Skills in HMI / SCADA Project Development. Part 3: Designing HMI / SCADA Projects That Deliver Results
INDUCTIVE AUTOMATION DESIGN SERIES Design Like a Pro Boost Your Skills in HMI / SCADA Project Development Part 3: Designing HMI / SCADA Projects That Deliver Results The end of a project can be the most
More informationA STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET
A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,
More informationExperiences in the Use of Big Data for Official Statistics
Think Big - Data innovation in Latin America Santiago, Chile 6 th March 2017 Experiences in the Use of Big Data for Official Statistics Antonino Virgillito Istat Introduction The use of Big Data sources
More informationOnline Algorithms and Competitive Analysis. Spring 2018
Online Algorithms and Competitive Analysis CS16: Introduction to Data Structures & Algorithms CS16: Introduction to Data Structures & Algorithms Spring 2018 Outline 1. Motivation 2. The Ski-Rental Problem
More informationInsights from the Wikipedia Contest
Insights from the Wikipedia Contest Kalpit V Desai, Roopesh Ranjan Abstract The Wikimedia Foundation has recently observed that newly joining editors on Wikipedia are increasingly failing to integrate
More informationVIDEO 1: WHAT IS CONTENT MARKETING?
VIDEO 1: WHAT IS CONTENT MARKETING? Hi, I m Justin with HubSpot Academy. Welcome to the class on Understanding Content Marketing. This class will introduce you to the world of content marketing and provide
More informationPredicting user rating for Yelp businesses leveraging user similarity
Predicting user rating for Yelp businesses leveraging user similarity Kritika Singh kritika@eng.ucsd.edu Abstract Users visit a Yelp business, such as a restaurant, based on its overall rating and often
More informationApplication of Machine Learning to Financial Trading
Application of Machine Learning to Financial Trading January 2, 2015 Some slides borrowed from: Andrew Moore s lectures, Yaser Abu Mustafa s lectures About Us Our Goal : To use advanced mathematical and
More information2. Materials and Methods
Identification of cancer-relevant Variations in a Novel Human Genome Sequence Robert Bruggner, Amir Ghazvinian 1, & Lekan Wang 1 CS229 Final Report, Fall 2009 1. Introduction Cancer affects people of all
More informationGetting started with digital evidence management. Your complete guide to saving time and money with a digital evidence management system
Getting started with digital evidence management Your complete guide to saving time and money with a digital evidence management system Introduction What is a digital evidence management system? A digital
More informationA proposed Novel Approach for Sentiment Analysis and Opinion Mining
A proposed Novel Approach for Sentiment Analysis and Opinion Mining Ravendra Ratan Singh Jandail Computing Science and Engineering, Galgotias University, India Abstract as the people are being dependent
More informationTEXT MINING APPROACH TO EXTRACT KNOWLEDGE FROM SOCIAL MEDIA DATA TO ENHANCE BUSINESS INTELLIGENCE
International Journal of Advance Research In Science And Engineering http://www.ijarse.com TEXT MINING APPROACH TO EXTRACT KNOWLEDGE FROM SOCIAL MEDIA DATA TO ENHANCE BUSINESS INTELLIGENCE R. Jayanthi
More informationCustomer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara
Customer Relationship Management in marketing programs: A machine learning approach for decision Fernanda Alcantara F.Alcantara@cs.ucl.ac.uk CRM Goal Support the decision taking Personalize the best individual
More informationVideo Traffic Classification
Video Traffic Classification A Machine Learning approach with Packet Based Features using Support Vector Machine Videotrafikklassificering En Maskininlärningslösning med Paketbasereade Features och Supportvektormaskin
More informationADWORDS IS AN AUTOMATED ONLINE AUCTION. WITHIN A CAMPAIGN, YOU IDENTIFY KEYWORDS THAT TRIGGER YOUR ADS TO APPEAR IN SPECIFIC SEARCH RESULTS.!
1. What is AdWords? ADWORDS IS AN AUTOMATED ONLINE AUCTION. WITHIN A CAMPAIGN, YOU IDENTIFY KEYWORDS THAT TRIGGER YOUR ADS TO APPEAR IN SPECIFIC SEARCH RESULTS. This type of campaign is called a Search
More informationThe New Marketing Metrics for B2B. Measurements that really matter to the success of your business
The New Marketing Metrics for B2B Measurements that really matter to the success of your business Table of Contents Introduction Step 1: Analyze Your Customer s Buying Process Step 2: Identify Your Marketing
More informationCommunication Intelligence in the Mailstream:
Customer Communication Management Communication Intelligence in the Mailstream: A Customer Communication Management Getting and keeping customers and doing it profitably is a challenge as old as commerce
More informationMillennials are crowdsourcingyouhow companies and brands have the chance to do
millennial pulse 2017 SPECIAL REPORT Millennials are crowdsourcingyouhow companies and brands have the chance to do what Millennials think they can t do themselves Be the crowd. Millennials are counting
More informationRank hotels on Expedia.com to maximize purchases
Rank hotels on Expedia.com to maximize purchases Nishith Khantal, Valentina Kroshilina, Deepak Maini December 14, 2013 1 Introduction For an online travel agency (OTA), matching users to hotel inventory
More informationPredictive Modelling for Customer Targeting A Banking Example
Predictive Modelling for Customer Targeting A Banking Example Pedro Ecija Serrano 11 September 2017 Customer Targeting What is it? Why should I care? How do I do it? 11 September 2017 2 What Is Customer
More informationCHANNELADVISOR WHITE PAPER. Everything You Ever Wanted to Know About Feedback on EBay
CHANNELADVISOR WHITE PAPER Everything You Ever Wanted to Know About Feedback on EBay Everything You Ever Wanted to Know About Feedback on EBay 2 An important part of successful selling on ebay is the feedback
More informationAnalytics for Banks. September 19, 2017
Analytics for Banks September 19, 2017 Outline About AlgoAnalytics Problems we can solve for banks Our experience Technology Page 2 About AlgoAnalytics Analytics Consultancy Work at the intersection of
More informationSURVEY PAPER ON TECHNIQUES USED IN OPINION MINING
SURVEY PAPER ON TECHNIQUES USED IN OPINION MINING Vikrant R. Harmalkar 1, Omkar H. Jagdale 2, Swati N. Chavan 3, Prof. Nidhi Sharma 4 1,2,3,4 Department of CSE, BVCOENM, Abstract With the growing availability
More informationPrinciples of Verification, Validation, Quality Assurance, and Certification of M&S Applications
Introduction to Modeling and Simulation Principles of Verification, Validation, Quality Assurance, and Certification of M&S Applications OSMAN BALCI Professor Copyright Osman Balci Department of Computer
More informationPredicting the Odds of Getting Retweeted
Predicting the Odds of Getting Retweeted Arun Mahendra Stanford University arunmahe@stanford.edu 1. Introduction Millions of people tweet every day about almost any topic imaginable, but only a small percent
More informationHow to Use a Weird "Trade- In" Loophole to Bank $300 to $500 PER DAY
How to Use a Weird "Trade- In" Loophole to Bank $300 to $500 PER DAY Presented by: Luke Sample Hosted by: John S. Rhodes Copyright 2016 WebWord, LLC. All Rights Reserved. This guide may not be reproduced
More informationGIVING ANALYTICS MEANING AGAIN
GIVING ANALYTICS MEANING AGAIN GIVING ANALYTICS MEANING AGAIN When you hear the word analytics what do you think? If it conjures up a litany of buzzwords and software vendors, this is for good reason.
More informationFrom Relevance Laggard to Leader
From Relevance Laggard to Leader Becoming more relevant to your customers, communities and staff WWW.COVEO.COM 1 JANUARY 23, 2017 The Coveo Relevance Maturity Model Cheap Search is Expensive. Your customers
More informationBig Data. Methodological issues in using Big Data for Official Statistics
Giulio Barcaroli Istat (barcarol@istat.it) Big Data Effective Processing and Analysis of Very Large and Unstructured data for Official Statistics. Methodological issues in using Big Data for Official Statistics
More informationHello Attribution. Goodbye Confusion. A comprehensive guide to the next generation of marketing attribution and optimization
Hello Attribution Goodbye Confusion A comprehensive guide to the next generation of marketing attribution and optimization Table of Contents 1. Introduction: Marketing challenges...3 2. Challenge One:
More informationDelivering success online
The Edelytics Guide to School Marketing Horses for Courses Edelytics is the perfect mix of admission expertise and digitally proficiency. Our team comprises of exadmission heads, SEO experts, P.R. professionals,
More informationLumière. A Smart Review Analysis Engine. Ruchi Asthana Nathaniel Brennan Zhe Wang
Lumière A Smart Review Analysis Engine Ruchi Asthana Nathaniel Brennan Zhe Wang Purpose A rapid increase in Internet users along with the growing power of online reviews has given birth to fields like
More informationNow, I wish you lots of pleasure while reading this report. In case of questions or remarks please contact me at:
Preface Somewhere towards the end of the second millennium the director of Vision Consort bv, Hans Brands, came up with the idea to do research in the field of embedded software architectures. He was particularly
More informationTURNING TWEETS INTO KNOWLEDGE. An Introduction to Text Analytics
TURNING TWEETS INTO KNOWLEDGE An Introduction to Text Analytics Twitter Twitter is a social networking and communication website founded in 2006 Users share and send messages that can be no longer than
More informationMining the reviews of movie trailers on YouTube and comments on Yahoo Movies
Mining the reviews of movie trailers on YouTube and comments on Yahoo Movies Li-Chen Cheng* Chi Lun Huang Department of Computer Science and Information Management, Soochow University, Taipei, Taiwan,
More informationRECOGNIZING USER INTENTIONS IN REAL-TIME
WHITE PAPER SERIES IPERCEPTIONS ACTIVE RECOGNITION TECHNOLOGY: RECOGNIZING USER INTENTIONS IN REAL-TIME Written by: Lane Cochrane, Vice President of Research at iperceptions Dr Matthew Butler PhD, Senior
More informationSUPPORTING INVESTMENT MANAGEMENT PROCESSES WITH MACHINE LEARNING TECHNIQUES
Association for Information Systems AIS Electronic Library (AISeL) Wirtschaftsinformatik Proceedings 2009 Wirtschaftsinformatik 2009 SUPPORTING INVESTMENT MANAGEMENT PROCESSES WITH MACHINE LEARNING TECHNIQUES
More informationPredicting International Restaurant Success with Yelp
Predicting International Restaurant Success with Yelp Angela Kong 1, Vivian Nguyen 2, and Catherina Xu 3 Abstract In this project, we aim to identify the key features people in different countries look
More informationMembers Guide for MasterResellRights.com.
Members Guide for MasterResellRights.com. A Word from Connor First of all I want to thank you for being a member of MRR and it's network of sites, it is truly appreciated. This guide was written as a way
More informationWhite Paper. Demand Signal Analytics: The Next Big Innovation in Demand Forecasting
White Paper Demand Signal Analytics: The Next Big Innovation in Demand Forecasting Contents Introduction... 1 What Are Demand Signal Repositories?... 1 Benefits of DSRs Complemented by DSA...2 What Are
More informationPrediction of Personalized Rating by Combining Bandwagon Effect and Social Group Opinion: using Hadoop-Spark Framework
Prediction of Personalized Rating by Combining Bandwagon Effect and Social Group Opinion: using Hadoop-Spark Framework Lu Sun 1, Kiejin Park 2 and Limei Peng 1 1 Department of Industrial Engineering, Ajou
More informationVisualizing Crowdfunding
Visualizing Crowdfunding Alexander Chao UC Berkeley B.A. Statistics 2015 2601 Channing Way Berkeley, Ca 94704 alexchao56@gmail.com ABSTRACT With websites such as Kickstarter and Indiegogo rising in popular
More informationSplitting Approaches for Context-Aware Recommendation: An Empirical Study
Splitting Approaches for Context-Aware Recommendation: An Empirical Study Yong Zheng, Robin Burke, Bamshad Mobasher ACM SIGAPP the 29th Symposium On Applied Computing Gyeongju, South Korea, March 26, 2014
More informationTweeting Questions in Academic Conferences: Seeking or Promoting Information?
Tweeting Questions in Academic Conferences: Seeking or Promoting Information? Xidao Wen, University of Pittsburgh Yu-Ru Lin, University of Pittsburgh Abstract The fast growth of social media has reshaped
More informationCONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK
CONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK Ms.S.P.VidhyaPriya 1,B.Gokhila 2, T.Santhiya 3, K.Saranya 4 1 M.E.,Assistant Professor-CSE, Kathir College Of Engineering,
More informationSticky Sites LESSON PLAN. Essential Question How do websites attract visitors and keep them there?
LESSON PLAN Sticky Sites Essential Question How do websites attract visitors and keep them there? Lesson Overview Students learn about some of the features that attract and retain visitors to websites.
More informationReal-Time ERP / MES Empowering Manufacturers to Deliver Quality Products On-Time
Real-Time ERP / MES Empowering Manufacturers to Deliver Quality Products On-Time KEN HAYES, CPIM, OCP VICE PRESIDENT, NEW PRODUCT DEVELOPMENT PROFITKEY INTERNATIO NAL Sponsored by Real-time is a commonly
More informationInsurance Marketing Benchmarks Report
Insurance Marketing Benchmarks Report 2017 Introduction How can I attract and maintain policyholders? That s a question successful insurance agents ask themselves on a regular basis. Better coverage, competitive
More informationA PRIMER TO MACHINE LEARNING FOR FRAUD MANAGEMENT
A PRIMER TO MACHINE LEARNING FOR FRAUD MANAGEMENT TABLE OF CONTENTS Growing Need for Real-Time Fraud Identification... 3 Machine Learning Today... 4 Big Data Makes Algorithms More Accurate... 5 Machine
More informationUnravelling Airbnb Predicting Price for New Listing
Unravelling Airbnb Predicting Price for New Listing Paridhi Choudhary H John Heinz III College Carnegie Mellon University Pittsburgh, PA 15213 paridhic@andrew.cmu.edu Aniket Jain H John Heinz III College
More informationDETECTING COMMUNITIES BY SENTIMENT ANALYSIS
DETECTING COMMUNITIES BY SENTIMENT ANALYSIS OF CONTROVERSIAL TOPICS SBP-BRiMS 2016 Kangwon Seo 1, Rong Pan 1, & Aleksey Panasyuk 2 1 Arizona State University 2 Air Force Research Lab July 1, 2016 OUTLINE
More informationSunnie Chung. Cleveland State University
Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:
More informationLimits of Software Reuse
Technical Note Issued: 07/2006 Limits of Software Reuse L. Holenderski Philips Research Eindhoven c Koninklijke Philips Electronics N.V. 2006 Authors address: L. Holenderski WDC3-044; leszek.holenderski@philips.com
More informationThe Economics of E-commerce and Technology. The Nature of Technology Industries
The Economics of E-commerce and Technology The Nature of Technology Industries 1 Technology Firms are Different Main ideas so far can be applied to any firm Porter s five forces Competitive advantage Technology
More informationMeasuring Cross-Device, The Methodology
Measuring Cross-Device, The Methodology As the first company to crack-the-code on cross-screen, Tapad Data Scientists are asked to explain the power of our cross-screen technology on a near-daily basis.
More information"Nothing," replied the artist, "will ever be attempted, if all possible objections must first be overcome."
PERSONALIZED QUESTIONNAIRES FOR CANADA'S ANNUAL SURVEY OF MANUFACTURES John S. Crysdale, Statistics Canada 13-C8 Jean Talon Building, Ottawa, Ontario, Canada K1A 0T6 "Nothing," replied the artist, "will
More informationIntroduction to Analytics Tools Data Models Problem solving with analytics
Introduction to Analytics Tools Data Models Problem solving with analytics Analytics is the use of: data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based
More informationAnalyzing Customer Behavior at Amazon.com
Analyzing Customer Behavior at Amazon.com Andreas S. Weigend Chief Scientist, Amazon.com KDD: August 2003 SAS: October 2003 Analyzing Customer Behavior at Amazon.com Andreas S. Weigend Chief Scientist,
More informationTracking #metoo on Twitter to Predict Engagement in the Movement
Tracking #metoo on Twitter to Predict Engagement in the Movement Ana Tarano (atarano) and Dana Murphy (d km0713) Abstract: In the past few months, the social movement #metoo has garnered incredible social
More information2 Maria Carolina Monard and Gustavo E. A. P. A. Batista
Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of
More informationHow ToolsGroup s SO99+ Complements SAP APO
White Paper Powerfully Simple How ToolsGroup s SO99+ Complements SAP APO March, 2014 White Paper Powerfully Simple The SAP Planning Platform 3 What s Changed? 3 More Products, Shorter Life Spans 4 Planner
More informationWHITE PAPER HOW TO SET MANAGED SERVICES PRICING BY KARL W. PALACHUK
WHITE PAPER HOW TO SET MANAGED SERVICES PRICING BY KARL W. PALACHUK Price is what you pay. Value is what you get. Warren Buffett INTRODUCTION Whether you re moving from on-demand support to managed services
More informationThe Importance of Supplementing NPS Scores with Insights Drawn from Real Comments and Reviews. Whitepaper
The Importance of Supplementing NPS Scores with Insights Drawn from Real Comments and Reviews Whitepaper INTRODUCTION/EXECUTIVE SUMMARY The Net Promoter Score (NPS) system has transformed the way businesses
More informationECONOMIC MACHINE LEARNING FOR FRAUD DETECTION
ECONOMIC MACHINE LEARNING FOR FRAUD DETECTION Maytal Saar-Tsechansky 2015 UT CID Report #1511 This UT CID research was supported in part by the following organizations: identity.utexas.edu ECONOMIC MACHINE
More informationOlin Business School Master of Science in Customer Analytics (MSCA) Curriculum Academic Year. List of Courses by Semester
Olin Business School Master of Science in Customer Analytics (MSCA) Curriculum 2017-2018 Academic Year List of Courses by Semester Foundations Courses These courses are over and above the 39 required credits.
More informationIntroduction. Context for Digital Transformation. Customer Experience
Introduction The last decade has seen a massive shift in our economy and we are starting to see entire industries disrupted and transformed. Business models that were stable for decades or centuries have
More informationOntosCAI Competitive Affairs/Intelligence Analyze, Monitor, Understand your Competitive Environment
NOW YOU KNOW [ SERIES] OntosCAI Competitive Affairs/Intelligence Analyze, Monitor, Understand your Competitive Environment [DANIEL HLADKY, ONTOS INTERNATIONAL AG] Competition has always been central to
More informationKnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration
KnowledgeSTUDIO Advanced Modeling for Better Decisions Companies that compete with analytics are looking for advanced analytical technologies that accelerate decision making and identify opportunities
More informationMarketing & Big Data
Marketing & Big Data Surat Teerakapibal, Ph.D. Lecturer in Marketing Director, Doctor of Philosophy Program in Business Administration Thammasat Business School What is Marketing? Anti-Marketing Marketing
More informationAn Unbalanced Data Classification Model Using Hybrid Sampling Technique for Fraud Detection
An Unbalanced Data Classification Model Using Hybrid Sampling Technique for Fraud Detection T. Maruthi Padmaja 1, Narendra Dhulipalla 1, P. Radha Krishna 1, Raju S. Bapi 2, and A. Laha 1 1 Institute for
More informationCOMMERCIAL INTENT HOW TO FIND YOUR MOST VALUABLE KEYWORDS
COMMERCIAL INTENT HOW TO FIND YOUR MOST VALUABLE KEYWORDS COMMERCIAL INTENT HOW TO FIND YOUR MOST VALUABLE KEYWORDS High commercial intent keywords are like invitations from prospective customers. They
More informationPredictive analytics [Page 105]
Week 8, Lecture 17 and Lecture 18 Predictive analytics [Page 105] Predictive analytics is a highly computational data-mining technology that uses information and business intelligence to build a predictive
More informationThe Big PowerPoint Study. Where is time wasted and how we can prevent it
The Big PowerPoint Study Where is time wasted and how we can prevent it 2 How We Waste Valuable Time Working With PowerPoint A B2B Study by GfK on Behalf of Made in Office The average office employee spends
More information