Modeling and Predicting User Interests based on Taxonomy. Makoto Nakatsuji




Abstract

In this thesis, we analyze user interests based on a domain-specific taxonomy. We propose modeling user interests and measuring the similarity of users according to the taxonomy of the domain, and we apply our method to recommender systems. We propose identifying novel topics, that is, topics that include new concepts that are likely to be interesting to the user even though those concepts are not present in the user profile. We try to expand user interests significantly by letting the user browse those topics.

Recommender systems are widely used by content providers to drive their commercial success. Many content providers adopt methods based on collaborative filtering (CF), a broad term for the process of recommending items to an active user (the user who receives the recommendation) based on the intuition that users who access the same items as the active user tend to have similar interests. Basic CF methods measure the similarity of users only from their co-rating behavior on items, and compute recommendations for the active user by analyzing the items possessed by the users most similar to that user. As a result, they are apt to recommend the types of items that the user has already accessed. For example, if the user rates a horror movie highly, typical CF methods recommend items made by the same director, performed by the same actors, or belonging to the same genre, horror. Those items are not truly novel, since they are often already known to the user or easily discovered by the user.

We also apply our method to knowledge management in a system development domain. We extract user knowledge of system development from the mail accumulated for a system development project. Developers responsible for modules or development procedures often collaborate with each other in the course of their work. Given the long development schedules common for complex projects, some turnover of personnel must be accepted. It is essential that new people be able to utilize the know-who and know-how information created by the original experts, as contained in the logs of message systems.

With respect to the above issues, the thesis studies the following topics.

1. Identifying novel topics based on user interests

We introduce a method that extracts user interests from users' blog entries and measures the similarity of users according to the taxonomy of items. We also introduce a new measure, the score of novelty, to express how novel the recommended items are for the user based on the taxonomy. This metric is useful in two ways. First, it presents, in an easy-to-understand manner, the relationship between the user's present interests and the target item. That is, the user can understand why the presented items differ from those that the user has accessed before. Second, the user can raise the novelty threshold if she wants items that are completely unknown to her.

We start with a proposal to build user interests according to a taxonomy of items. We consider that users who like certain items may also like the classes that include those items. Our method thus reflects the rating of the user on an item in the rating of a class that includes that item. We then measure the similarity of users by using not only co-rating behavior on items but also co-rating behavior on classes in the taxonomy. As a result, we can accurately identify many items for the user by analyzing the items of users who share the same items and/or the same classes with the user.

Our method then creates a user graph whose nodes are users; weighted edges are set between users according to their similarity. It performs Random Walk with Restarts over the user graph and extracts user nodes that are frequently passed by the walk even though the weights on the edges from the starting node to those nodes are not high. The users so extracted are likely to have items with high novelty for the starting node user. An offline evaluation conducted on several datasets finds that our method identifies more novel items with higher accuracy than previous methods.

We also perform an online experiment to analyze user reactions to topics recommended on the basis of our assessments. By analyzing the frequency of user access to the novel items output by our recommendation scheme over time, we confirmed the effectiveness of our novel topic recommendation. We found that the novel topics recommended by our technique were used to create new communication links between users; this was confirmed by evaluating the frequency of comments between users who came to know each other through our online recommendations.

2. Cross-domain recommendations over domain-specific user graphs

Content providers want to make recommendations across multiple interrelated domains such as music and movies. However, existing collaborative filtering methods fail to accurately identify items that may be interesting to the user but that lie in domains that the user has not accessed before. Our method is based on the observation that users who share similar items, or who share social connections, can provide recommendation chains (sequences of transitively associated edges) to items in other domains. It first builds domain-specific-user graphs (DSUGs) whose nodes, users, are linked by weighted edges that reflect the similarity of users. It then connects the DSUGs via the users who rated items in several domains, to create a cross-domain-user graph (CDUG). It performs Random Walk with Restarts on the CDUG to extract user nodes that are related to the starting user node on the CDUG even though they are not present in the DSUG of the starting user node. Then, it incorporates the items possessed by those users into the recommendations for the starting node user. Furthermore, to extract many more user nodes, we employ our proposed taxonomy-based similarity measure, under which users are similar if they share the same items and/or the same classes. Thus we can set many suitable routes from the starting user node to other user nodes in the CDUG. An evaluation of users' implicit ratings of items in two interrelated domains, as extracted from a blog portal, indicates that our method identifies potentially interesting items in other domains with higher accuracy than is possible with existing CF methods.

3. Analyzing the developer's knowledge based on development-related taxonomies

Product developers frequently discuss topics related to their development project with others, but often use technical terms whose meanings are not clear to non-specialists. To provide non-experts with a precise and comprehensive understanding of the information discussed, the method proposed herein categorizes the messages using a taxonomy of the products developed and a taxonomy of the tasks relevant to those products. The instances in the taxonomy are products and/or tasks manually selected as relevant to the system development. We apply our previously proposed method, which extracts user interests from blogs, to the extraction of developers' knowledge from the mail messages accumulated in mailing lists. The problem is that there is no existing taxonomy related to system development; we therefore semi-automatically enrich the taxonomy from the mail accumulated in the mailing list for the system development. Using such an expanded taxonomy, we can analyze user knowledge accurately. This provides a concrete application example that drives forward taxonomy-based knowledge management.

Thus, in this research we have proposed a taxonomy-based method of user knowledge analysis from the viewpoints of managing users' knowledge and providing novelty to the user. The method proposed in this thesis sets a research direction for taxonomy-based knowledge management of users and for novel item identification in future recommender systems.


Acknowledgments

I wish to express my sincere gratitude to my supervisor, Professor Toru Ishida at Kyoto University, for his continuous guidance, valuable discussion, and advice. I am grateful to my thesis committee at Kyoto University, Professor Katsumi Tanaka and Professor Toyoaki Nishida, for their valuable comments. Associate Professor Shigeo Matsubara and Assistant Professor Hiromitsu Hattori always gave me kind advice and helpful comments. The coordinators at Ishida & Matsubara laboratory, Ms. Yoko Kubota and Ms. Terumi Kosugi, always helped me with my office tasks. I wish to thank all members of Ishida & Matsubara laboratory for their support in numerous ways.

I appreciate the constructive discussions with, and the kind advice from, my colleagues Dr. Yoshihiro Otsuka, Dr. Ko Fujimura, Mr. Tadasu Uchiyama, Mr. Makoto Yoshida, Mr. Akimichi Tanaka, Dr. Toshio Uchiyama, Mr. Tatsuyuki Kimura, Mr. Yu Miyoshi, and Mr. Yasuhiro Fujiwara. I wish to thank all members of NTT Cyber Solutions Laboratories and NTT Network Service Systems Laboratories for their support in numerous ways.

Finally, I want to express my gratitude to my family: my parents Hiroshi Nakatsuji and Mieko Nakatsuji, my elder brothers Satoru Nakatsuji and Susumu Nakatsuji, and my wife Kumiko Nakatsuji. They gave me sincere support and encouragement.


Contents

1 Introduction
  Objectives
  Research Issues
  Thesis Outline

2 Background
  Collaborative filtering
  Accurate Item Recommendation
  Novel Item Recommendation
  Cross-domain Recommendation
  Random Walk with Restarts

3 Identifying novel topics based on user interests
  Introduction
  Background and our purpose
  Approaches
  Impact of applications based on our method
  Related works
  Collaborative Filtering
  Interest ontology extraction
  Designing service-domain ontology
  Interest ontology generation algorithm
  Introducing interest weight to ontology
  Detecting novelty by similarity measurements
  3.5.1 Interest-weight-based similarity measurement
  Innovative blog-entry detection
  Offline Experimental results
  Datasets and methodology
  Measuring interest distributions of blog users
  Measuring performance of extracted interest ontology
  Comparing filtering algorithms
  Analyzing size of user-oriented community
  Measuring performance of detecting novel topics
  Online Experimental results
  Explaining our online experiment
  Evaluating recommendation results based on extracted interest ontology
  Evaluating detection of novel topics
  Evaluating activation of blog community
  Summary
  Discussion
  Summary

4 Analyzing accuracy and novelty of taxonomy-based recommendations
  Introduction
  Related Works
  Background
  Collaborative Filtering
  Random Walk with Restarts
  Proposed Method
  Problem Definition
  Modeling user interests
  Measuring user similarity
  Identifying highly novel items from the user graph
  Evaluation
  Datasets
  4.5.2 Methodology
  Compared methods
  Results on accuracy
  Results on item novelty
  Results when extending movie taxonomy
  Results when using restaurant dataset
  Summary

5 Cross-domain recommendations over domain specific user graphs
  Introduction
  Related Works
  Background
  Collaborative Filtering
  Random Walk with Restarts
  Method
  Creating a cross-domain-user graph (CDUG)
  Identifying items in other domains
  Modeling user interests
  Measuring similarity of users
  Evaluation
  Datasets
  Methodology
  Compared methods
  Results
  Summary

6 Analyzing the developer's knowledge based on development-related taxonomies
  Introduction
  Related Works
  Method
  The design of taxonomies
  Method
  6.3.3 Analyzing know-who using expanded taxonomy
  Analyzing semantic relationships between similar users
  Analyzing know-how using expanded taxonomies
  Evaluation
  Dataset and methodology
  Evaluating the characteristics of relevant concepts
  Evaluating the effectiveness of know-who analysis
  Evaluating the effect of know-how analysis
  Summary

7 Conclusion

List of Figures

3.1 Procedure for designing service-domain ontology
Procedure for generating interest ontologies
Hops in filtering algorithms
Applying interest weight to ontology
Measuring similarity based on degree of interest agreement
Definitions of terms and examples
Community creation service of recommending innovative blog entries
Experimental results of user distributions and ontology extraction
Experimental results of our ontology extraction and detection of novel topics
(a) number of users obtained by changing X. (b) number of users that have high interest weight after changing X
Snapshot of online experimental service DoblogMusic
User accesses to DoblogMusic
Increasing user-user communication through DoblogMusic
Taxonomy in movie domain
An explanation of score of novelty
Measuring similarity of user a and u
Example of extended taxonomy of MovieLens items
Example of creating the column-normalized adjacency matrix of the CDUG
5.2 Measuring similarity of users a and u
MAE when we set T D =
Image of taxonomy with relevant and unified concepts
Procedure of assigning s
Explanatory image of classifying phrases to concepts
Assigning taxonomy-based semantic tags to the relationships between users
Accuracy of extracting relevant concepts
Results of know-who analysis. (X axis indicates the number of users and Y axis indicates accuracy of the results.)
Results of know-how analysis. (MS means mail set.)

List of Tables

4.1 Definition of main symbols
MAE against movie dataset
MAE against non-Japanese music dataset
MAE against Japanese music dataset
Prediction coverage versus item novelty against non-Japanese music dataset
Prediction coverage versus item novelty against Japanese music dataset
MAE against movie dataset with extended taxonomy
MAE against restaurant dataset
Prediction coverage versus item novelty against restaurant dataset


Chapter 1 Introduction

1.1 Objectives

Recommender systems are widely used by content providers to drive their commercial success. Many content providers adopt methods based on collaborative filtering (CF), a broad term for the process of recommending items to an active user (the user who receives the recommendation) based on the intuition that users who access the same items as the active user tend to have similar interests. Basic CF methods measure the similarity of users only from their co-rating behavior on items, and compute recommendations for the active user by analyzing the items possessed by the users most similar to that user. As a result, they are apt to recommend the types of items that the user has already accessed. For example, if the user rates a horror movie highly, typical CF methods recommend items made by the same director, performed by the same actors, or belonging to the same genre, horror. Those items are not truly novel, since they are often already known to the user or easily discovered by the user.

One solution to this problem is to increase the diversity of the items in the recommendation list for the active user, so that the list includes items in several classes defined by a taxonomy. However, those schemes fail to consider the semantic relationships between a user and the items that are recommended to the user. These semantic relationships are necessary if the user is to accept the recommended items, especially when the user has not thought of those items before.

Our purpose is to expand the user's interests significantly by identifying topics that do not lie in the classes that the user has accessed before, and recommending those topics to the user. However, it is generally difficult to understand what types of classes the user is interested in and what types of items do not lie in the classes that the user has accessed before.

In this thesis, we propose modeling user interests in detail following a taxonomy of items, and then measuring the similarity of users according to that taxonomy. By analyzing the relationships between the user's present knowledge and the recommended items, we can understand what types of items are recommended to the user and how far the recommended items are from the user's present interests. We also apply our taxonomy-based user knowledge modeling to a system development domain and analyze what types of development activities each user understands well.

1.2 Research Issues

In this thesis, we address the following research issues.

1. How to analyze the knowledge of users
2. How to identify items that are not known to the user but may be interesting to the user

To solve the former problem, we introduce a taxonomy-based approach. Taxonomies of items are designed by service providers to enable their customers to access their preferred items easily. Thus, we consider that users who like certain items may also like the classes that include those items. Our method therefore reflects the rating of the user on an item in the rating of a class that includes that item. Next, it measures the similarity of users by using not only co-rating behavior on items but also co-rating behavior on classes in the taxonomy. As a result, we can accurately identify many items for the active user by analyzing the interests of users who share the same items and/or the same classes with the active user.

In the system development domain, we apply our taxonomy-based approach to the extraction of developers' knowledge from the mail messages accumulated in mailing lists. The problem is that there is no existing taxonomy related to system development; we therefore semi-automatically enrich the taxonomy from the mail accumulated in the mailing list for the system development. Using such an expanded taxonomy, we can analyze user knowledge accurately.

To solve the latter problem, we define the score of novelty as the smallest number of hops, over the taxonomy, from a class that the user has accessed before to the class that includes a candidate item. It indicates the relationship between the present interests of a user and the items that are recommended to the user, using a taxonomy of items defined by service designers. With this measure, the active user can understand what types of items are recommended and how far the recommended items are from his or her present interests. Here, we regard a concept as a class defined in the taxonomy created in each service domain. When items are presented with supporting information such as their novelty, the user can more readily become interested in items not stored in his or her profile and so acquire new interests.

Furthermore, to identify items with higher novelty, we introduce a graph-based approach. We consider that users who are similar to the active user tend to share the same items in the same classes with that user, and so are not likely to provide items with high novelty for the user. We create a user graph whose nodes are users and whose weighted edges are set between users according to their similarity.
We perform Random Walk with Restarts (RWR) [60] on the user graph and try to extract user nodes at which the walk arrives frequently even though the weights on the edges from the starting node to those user nodes are small; small weights mean that such users share fewer items and/or fewer classes with the active user. Thus, we incorporate the items held by the users so discovered when computing the prediction values for the starting node user, in order to identify items with higher novelty.
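
The graph-based step described above can be illustrated with a minimal sketch. It assumes a small, hypothetical similarity matrix, a restart probability of 0.15, and a fixed number of power iterations; none of these values comes from the thesis, and the helper names `random_walk_with_restarts` and `weakly_linked_candidates` are illustrative only. The sketch column-normalizes the weighted user graph, iterates the RWR update, and then lists the users whose direct edge from the starting user is weak, ordered by how often the walk visits them.

```python
def random_walk_with_restarts(weights, start, restart_prob=0.15, iterations=200):
    """Approximate RWR relevance scores of every user node with respect to `start`.

    weights[i][j] is the (symmetric, non-negative) similarity between users i and j.
    restart_prob and iterations are illustrative values, not taken from the thesis.
    """
    n = len(weights)
    # Column-normalize: column j then holds the transition probabilities out of node j.
    column_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [weights[i][j] / column_sums[j] if column_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(iterations):
        # One step: follow an edge with probability 1 - restart_prob, else jump back to start.
        scores = [
            (1.0 - restart_prob) * sum(transition[i][j] * scores[j] for j in range(n))
            + (restart_prob if i == start else 0.0)
            for i in range(n)
        ]
    return scores


def weakly_linked_candidates(weights, scores, start, max_direct_weight):
    """Users whose direct edge from `start` is weak, ordered by RWR score (highest first)."""
    candidates = [
        u for u in range(len(weights))
        if u != start and weights[start][u] <= max_direct_weight
    ]
    return sorted(candidates, key=lambda u: scores[u], reverse=True)


# Hypothetical four-user graph: user 0 is strongly tied to user 1 only.
similarity = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.1, 0.8, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
]
scores = random_walk_with_restarts(similarity, start=0)
print(weakly_linked_candidates(similarity, scores, start=0, max_direct_weight=0.2))
```

In this toy graph, user 3 has no direct edge to user 0 but still receives a non-zero score through user 2, which is the kind of indirectly related user the approach aims to surface.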

We also try to identify items of interest in domains that the active user has not accessed before. This will be a key tool for content providers that want to offer items across multiple interrelated domains, especially when they have large rating datasets in some domains while they can collect only limited rating datasets in others. Our approach is based on the observation that users who share similar items, or who share social connections, can provide recommendation chains (sequences of transitively associated edges) to items in other domains. It first builds domain-specific-user graphs (DSUGs) whose nodes, users, are linked by weighted edges that reflect the similarity of users. It then connects the DSUGs via the users who rated items in several domains, or via the users who share social connections, to create a cross-domain-user graph (CDUG). It performs Random Walk with Restarts on the CDUG to extract user nodes that are related to the starting user node on the CDUG even though they are not present in the DSUG of the starting user node. Then, it incorporates the items possessed by those users into the recommendations for the starting node user.

1.3 Thesis Outline

This thesis consists of seven chapters, including this introductory chapter.

Chapter 2 introduces the background of this thesis and describes current studies of user modeling and novel topic identification using collaborative filtering techniques. We also introduce Random Walk with Restarts (RWR), which measures the relatedness of two nodes in a graph, because we use RWR to measure the relatedness between users. First, we give an overview of collaborative filtering. Second, we review related work on collaborative filtering, which can be categorized into three parts: accurate item recommendation, novel item recommendation, and cross-domain recommendation. Third, we look at the uses of RWR in information retrieval and recommendation studies.

Chapter 3 introduces the notion of novel topics: topics that include new concepts that are likely to be interesting to the user even though those concepts are not present in the user profile. We try to expand user interests significantly by letting the user browse those topics. We introduce a new measure, the score of novelty, to express how novel the recommended items are for the user, and we try to identify items with high novelty for the user while also guaranteeing highly accurate recommendation results. We first build the interests of a user as a hierarchy of classes in which a rating value of the user is assigned to each class and item. Next, we measure the similarity of users using user ratings of items as well as of classes, and generate a user group that has high similarity to the user. The novel topics for the user are then identified with the score of novelty, by determining a suitable size for the user group and analyzing the items possessed by the users in that group. We perform an online experiment to analyze user reactions to topics recommended on the basis of our assessments. By analyzing the frequency of user access to the novel items output by our recommendation scheme over time, we confirmed the effectiveness of our novel topic recommendation. We found that the novel topics recommended by our technique were used to create new communication links between users; this was confirmed by evaluating the frequency of comments between users who came to know each other through our online recommendations.

Chapter 4 analyzes our taxonomy-based recommendation method from the viewpoint of the accuracy and novelty of the prediction results. Our method takes two approaches. First, it measures the similarity of users using the items rated by users and a taxonomy of items, which lets it accurately identify many items for the user. Second, it creates a user graph whose nodes are users; weighted edges are set between users according to their similarity.
It performs Random Walk with Restarts over the user graph and extracts user nodes that are frequently passed by the walk even though the weights on the edges from the starting node to those nodes are not high. The users so extracted are likely to have items with high novelty for the starting node user. An evaluation conducted on several datasets finds that our method identifies more novel items with higher accuracy than previous methods.
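
Item novelty here refers to the score of novelty defined in Section 1.2 as the smallest number of hops over the taxonomy from a class the user has accessed to the class that includes a candidate item. The following is a minimal sketch of that definition, assuming the taxonomy is a tree stored as a child-to-parent dictionary and that one hop is one parent-child link in either direction; the class names and the function `score_of_novelty` are hypothetical and not taken from the thesis.

```python
from collections import deque


def score_of_novelty(parent_of, accessed_classes, item_class):
    """Smallest number of taxonomy hops from any accessed class to `item_class`.

    parent_of maps each class to its parent class (assumed tree-shaped taxonomy).
    Returns None when `item_class` cannot be reached from the accessed classes.
    """
    # Build an undirected adjacency list from the child -> parent links.
    neighbours = {}
    for child, parent in parent_of.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)

    # Multi-source breadth-first search starting from every accessed class.
    distance = {cls: 0 for cls in accessed_classes}
    queue = deque(accessed_classes)
    while queue:
        current = queue.popleft()
        if current == item_class:
            return distance[current]
        for nxt in neighbours.get(current, ()):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return None


# Hypothetical movie taxonomy (child -> parent).
taxonomy = {
    "Horror": "Movie",
    "Comedy": "Movie",
    "Slasher": "Horror",
    "Romantic comedy": "Comedy",
}
print(score_of_novelty(taxonomy, {"Slasher"}, "Romantic comedy"))
```

Under these assumptions, an item in a class the user already accessed scores 0, and the score grows as the item's class lies further from the user's accessed classes in the taxonomy.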

Chapter 5 extends our taxonomy-based recommendation to identify items that may be interesting to the user but that lie in domains that the user has not accessed before. Content providers want to make recommendations across multiple interrelated domains such as music and movies. However, existing collaborative filtering methods fail to accurately identify such items. Our method is based on the observation that users who share similar items, or who share social connections, can provide recommendation chains (sequences of transitively associated edges) to items in other domains. It first builds domain-specific-user graphs (DSUGs) whose nodes, users, are linked by weighted edges that reflect the similarity of users. It then connects the DSUGs via the users who rated items in several domains, or via the users who share social connections, to create a cross-domain-user graph (CDUG). It performs Random Walk with Restarts on the CDUG to extract user nodes that are related to the starting user node on the CDUG even though they are not present in the DSUG of the starting user node. Then, it incorporates the items possessed by those users into the recommendations for the starting node user. Furthermore, to extract many more user nodes, we employ a taxonomy-based similarity measure under which users are similar if they share the same items and/or the same classes. Thus we can set many suitable routes from the starting user node to other user nodes in the CDUG. An evaluation of users' implicit ratings of items in two interrelated domains, and of users' social connection histories, as extracted from a blog portal, indicates that our method identifies potentially interesting items in other domains with higher accuracy than is possible with existing CF methods.

Chapter 6 applies our previously proposed method, which extracts user interests from blogs, to the extraction of developers' knowledge from the mail messages accumulated in mailing lists.
The problem is that there is no existing taxonomy related to system development; we therefore semi-automatically enrich the taxonomy from the mail accumulated in the mailing list for the system development. Using such an expanded taxonomy, we can analyze user knowledge accurately. This provides a concrete application example that drives forward taxonomy-based knowledge management.

Chapter 7 summarizes the main contributions of the thesis and concludes it by summarizing the results obtained through this research. We also discuss prospects for future research.
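
As an illustration of the cross-domain-user graph (CDUG) outlined in this chapter for Chapter 5, the following minimal sketch merges two hypothetical domain-specific user graphs (DSUGs) and checks which users become reachable from a starting user. The input format, the rule of keeping the larger weight when two domains link the same pair of users, and the names `build_cdug` and `reachable_users` are all assumptions made for this example; the thesis itself runs Random Walk with Restarts on the CDUG rather than a plain reachability search.

```python
from collections import deque


def build_cdug(dsugs):
    """Merge domain-specific user graphs into one cross-domain user graph.

    dsugs maps a domain name to {(user_a, user_b): similarity}.
    A user who appears in several domains keeps a single node, which links the DSUGs.
    """
    cdug = {}
    for edges in dsugs.values():
        for (a, b), weight in edges.items():
            cdug.setdefault(a, {})
            cdug.setdefault(b, {})
            # Assumption: when several domains link the same pair, keep the larger weight.
            cdug[a][b] = max(weight, cdug[a].get(b, 0.0))
            cdug[b][a] = max(weight, cdug[b].get(a, 0.0))
    return cdug


def reachable_users(graph, start):
    """All users connected to `start` by a chain of edges (excluding `start`)."""
    seen = {start}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for other in graph.get(user, {}):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    seen.discard(start)
    return seen


# Hypothetical data: bob rated items in both domains and so bridges the two DSUGs.
dsugs = {
    "music": {("alice", "bob"): 0.8},
    "movie": {("bob", "carol"): 0.6},
}
music_only = build_cdug({"music": dsugs["music"]})
cdug = build_cdug(dsugs)
print(sorted(reachable_users(music_only, "alice")))
print(sorted(reachable_users(cdug, "alice")))
```

In this example carol is absent from alice's music DSUG but becomes reachable in the CDUG through bob, which is the kind of recommendation chain the cross-domain method relies on.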


Chapter 2 Background

Our method extends CF and uses RWR to identify items with high novelty. Thus, we explain both in this chapter.

2.1 Collaborative filtering

CF methods can be classified into two approaches: memory-based CF and model-based CF. Memory-based CF is based on the assumption that each user belongs to a larger group of similarly behaving users. This method is referred to as user-oriented memory-based CF; an analogous method, which builds item similarity groups using co-purchase history, is known as item-oriented [89]. Model-based CF, on the other hand, generates predictions by using a model that is optimized on training data. Clustering [75, 94] and Bayesian network models are examples of the model-based approach [76, 95].

In computing the similarity of users, basic CF methods often use the Pearson correlation approach [88] or the cosine-based approach [68]. If we define $M$ as the number of items rated by both user $a$, who is to receive the recommendation, and user $u$, $r_{a,I_i}$ as the rating value of user $a$ for item $I_i$, and $\bar{r}_a$ as the average value of the item ratings given by $a$, the Pearson correlation coefficient measures the similarity $S(a,u)$ between $a$ and $u$ according to equation (2.1).

\[
S(a,u) = \frac{\sum_{i=1}^{M} (r_{a,I_i} - \bar{r}_a)(r_{u,I_i} - \bar{r}_u)}{\sqrt{\sum_{i=1}^{M} (r_{a,I_i} - \bar{r}_a)^2}\,\sqrt{\sum_{i=1}^{M} (r_{u,I_i} - \bar{r}_u)^2}} \tag{2.1}
\]

When we use the cosine-based approach, we compute the similarity $S(a,u)$ between $a$ and $u$ according to equation (2.2).

\[
S(a,u) = \frac{\sum_{i=1}^{M} r_{a,I_i}\, r_{u,I_i}}{\sqrt{\sum_{i=1}^{M} r_{a,I_i}^2}\,\sqrt{\sum_{i=1}^{M} r_{u,I_i}^2}} \tag{2.2}
\]

The advantage of the Pearson correlation approach is that it takes into account that different users might have different rating schemes. If we assume $N$ is the set of users that are most similar to the active user $a$, the predicted rating $p_{a,I_i}$ of $a$ on item $I_i$ is obtained by equation (2.3).

\[
p_{a,I_i} = \bar{r}_a + \frac{\sum_{u \in N} (r_{u,I_i} - \bar{r}_u)\, S(a,u)}{\sum_{u \in N} S(a,u)} \tag{2.3}
\]

Below, we review related work on collaborative filtering, which can be categorized into three parts: accurate item recommendation, novel item recommendation, and cross-domain recommendation.

Accurate Item Recommendation

Most CF studies focus on improving the accuracy of the prediction results. Here, we explain some of those works, namely those using the matrix factorization technique, the taxonomy-based technique, and the graph mining technique.

Koren et al. proposed a method based on matrix factorization, which characterizes both items and users by vectors of factors inferred from item rating patterns [77, 78]. High correspondence between item and user factors leads to a recommendation. These methods have become popular in recent years because they combine good scalability with predictive accuracy. However, their method does not aim to suggest novel items for the user. Furthermore, the matrix factorization technique usually analyzes latent factors inferred from item rating patterns, so it is not easy for a user to understand why the identified items are recommended. We consider that presenting semantic reasons to a user is important, especially when recommending novel items. Thus, we use the taxonomy of items to explain why the presented items are novel for the active user.

Some researchers use a taxonomy of items to raise the accuracy of prediction results [96]. Their method was shown to be useful when the transaction data of users was sparse. However, in measuring user similarity, their method focuses only on the classes that include items rated by both users, and on their super classes. As a result, this method naively assumes that users who share many items with the active user are highly similar to that user; those users may hold many items that are good for the user as well as many that are not so good. Our method differs from the previous taxonomy-based method in that it focuses on the width of user interests according to the taxonomy. The width of user interests is computed by checking, for each class in the taxonomy, how many of its sub-classes the user is interested in. This is based on our observation that the interests of users are not always of the same type even when the users share many sub-classes. For example, readers can naturally guess that users who love only the rock genre are a somewhat different type of user from users who love both the rock and classic genres. By carefully analyzing such nuances when measuring the similarity of users, we can accurately identify many items for the active user by analyzing the interests of users who share the same items and/or the same classes with the active user.

The authors of [90, 91] assign an a priori score to the classes in the taxonomy of items and compute the relationships between the scores assigned to different classes. Then, they propagate those scores for a specific user to predict each user's preferences.
Their method was also shown to be useful when the transaction data of users was sparse. More recently, they learn item taxonomies autonomously by using clustering algorithms and thereby improve the prediction accuracy [92]. Their method is outside the scope of CF methods (they call their approach ontology filtering) because they do not compute similarities of users. We consider that the taxonomy can be used to explain what types of items are recommended to the active user and why users are computed as similar even if they do not share any items. Thus, we measure the similarities of users using the taxonomy of items and recommend an item to the active user together with its score of novelty.

Some researchers have started to use random walks or RWR on a graph to compute recommendations [95, 70, 76, 69, 87]. Yildirim and Krishnamoorthy perform random walks on an item graph whose nodes are items and whose weighted edges are set between item nodes according to item similarity; they confirmed that their approach overcame the sparsity problem, in which the accuracy of prediction results decreases when the transaction dataset is sparse. Some researchers also use graph analysis techniques to study recommendations [72, 83, 67, 73]. For example, to solve the sparsity problem with a graph-based method, Huang proposed a method that follows the transitively associated edges on a graph whose nodes are items and users [72]. However, to the best of our knowledge, no study has identified items with higher novelty using random walks on the user graph or using other graph-based methods.

Novel Item Recommendation

The notion of "novel" is defined in different ways in related papers. To the best of our knowledge, several studies define novel items as items that are not known to the user but are interesting for the user [71]. However, this definition is very abstract, which makes it difficult to evaluate item novelty in detail.

Onuma et al. proposed a method that identifies as novel (they call such items "surprising") those items that are accessed by users similar to the active user but also accessed by users not so similar to the active user [87]. Their idea is to treat the problem of identifying novel items as node selection on a graph, giving high scores to nodes that are well connected to the older choices and, at the same time, well connected to unrelated choices.
Their evaluation example shows that their method generates more diverse recommendation results, such as recommending surprising items among comedy, horror, and SF movies to a user who only likes comedy movies. We consider the score of novelty in our paper to be a more natural definition because it lets the active user understand, through the taxonomy of items, why the recommended items are novel. The authors in [90, 91] define novel items as those that are identified by a certain method but not by other methods. Their evaluation shows that their proposed method can identify items that cannot be identified by a method that only ranks items by their popularity. We consider that such a metric evaluates popularity rather than novelty. As explained above, we consider that the definitions in previous works do not help the active user understand why the recommended items are determined to be novel or how novel they are.

Unlike the works above, Nakatsuji et al. proposed a taxonomy-based algorithm to find novel items, defined as items included in classes that the active user has not accessed before (in their paper, they call these innovative items). Their online evaluation shows that user clicks tend to concentrate on novel items[85]. Unfortunately, they focused on the application of novel item recommendation and did not investigate how accurate and how novel the items predicted by their method are, nor did they compare the accuracy and novelty of the identified items with those predicted by other CF methods. We improve and confirm the prediction accuracy by measuring similarity of users while considering the width of user interests according to the taxonomy. Furthermore, our method identifies items with higher novelty for the active user by applying a graph-based approach.

Herlocker and his co-workers also pointed out that novel items and serendipitous items are different, though both are unknown to the user but interesting to the user[71].
The difference is that the former are more easily found by the user than the latter. Our method does not distinguish novel items from serendipitous items. Instead, it makes users aware of how far the recommended items are from their present interests through our proposed measure, the score of novelty. However, as the reader can naturally imagine, items with high novelty for the user cannot be easily discovered by the user. For example, a user who has so far demonstrated an interest only in music items in Classic cannot easily discover interesting items in Jazz by himself. Our evaluation, described later, also shows that existing CF methods have difficulty in accurately identifying items with high novelty for the user. Admittedly, our evaluation did not explicitly treat serendipitous items because the evaluation dataset was taken from user access histories. However, the previous online evaluation by Nakatsuji et al.[85] presented novel (or serendipitous) items that were not included in the users' access histories, and confirmed that actual users were interested in those items.

Cross-domain Recommendation

Related to the studies of novel item identification, there are a few recent works on cross-domain recommendation[79, 80, 84], which predicts items located in domains in which the active user has not explicitly shown interest before. Bin and his co-workers analyze users who show similar rating behaviors against items across several item domains[79, 80]. Their method shares the knowledge learned from the rating datasets of multiple item domains even when the users and items of these datasets do not overlap. Nakatsuji et al. also proposed cross-domain recommendation based on the observation that users who share similar items or who share social connections can provide recommendation chains (sequences of transitively associated edges) to items in other domains[84]. We consider, however, that novel item identification within a domain is still important because users who access items in a domain have already expressed interest in that domain. Thus, it is natural to present novel items within the domain to expand the user's interests, by analyzing user interests in detail based on the domain-specific taxonomy.

2.2 Random Walk with Restarts

A graph is a natural representation of data that have some inherent relational structure. In a graph, objects and their relationships can be represented as nodes and weighted edges respectively, where weights denote the strength of a relationship. Measuring the relatedness of two nodes in the graph can be achieved by using RWR theory[60]. Starting from node $a$, an RWR is performed by following a randomly selected link to another node at each step. Additionally, at every step there is a probability, $\alpha$, that the walk restarts at $a$.

Let $p^{(t)}$ be a column vector where $p^{(t)}_u$ denotes the probability that the random walk at step $t$ is at node $u$. $q$ is a column vector whose elements are set to zero; only the element corresponding to $a$ is set to one, i.e. $q(a)=1$. Also let $A$ be the column-normalized adjacency matrix of the graph. In other words, $A$ is the transition probability table where its element $A(u,v)$ gives the probability of $v$ being the next node given that the current node is $u$. The stationary probabilities for each node can be obtained by recursively applying equation (2.4) until convergence, and they give us the long-term visit rate of each node with a bias towards a particular starting node.

$$p^{(t+1)} = (1-\alpha)\,A\,p^{(t)} + \alpha\,q \tag{2.4}$$

Therefore, $p^{(l)}_u$, where $l$ is the status after convergence, can be considered as a measure of relatedness between nodes $a$ and $u$.
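As an illustration of equation (2.4), the following is a minimal Python sketch of the power iteration, not the implementation used in the thesis. The function name `random_walk_with_restart`, the default `alpha`, the tolerance, and the convention that `adjacency[v][u]` holds the weight of the edge from `u` to `v` (so that each column sums to one after normalization) are assumptions made here for the example.

```python
def random_walk_with_restart(adjacency, start, alpha=0.15, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - alpha) * A p + alpha * q until the vector stops changing.

    adjacency[v][u] is assumed to be the weight of the edge u -> v.
    """
    n = len(adjacency)
    # Column-normalize so that column u holds the transition probabilities out of u.
    col_sums = [sum(adjacency[v][u] for v in range(n)) for u in range(n)]
    A = [[adjacency[v][u] / col_sums[u] if col_sums[u] else 0.0 for u in range(n)]
         for v in range(n)]
    # q is zero everywhere except at the start node.
    q = [0.0] * n
    q[start] = 1.0
    p = q[:]
    for _ in range(max_iter):
        nxt = [(1 - alpha) * sum(A[v][u] * p[u] for u in range(n)) + alpha * q[v]
               for v in range(n)]
        # Stop when the L1 change between successive vectors is below tol.
        if sum(abs(x - y) for x, y in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path graph 0 - 1 - 2 with unit weights; the walk restarts at node 0.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
relatedness = random_walk_with_restart(path_graph, start=0)
```

In this sketch the returned list plays the role of $p^{(l)}$: its entry for node $u$ is the relatedness of $u$ to the start node. A larger `alpha` keeps the walk closer to the start node.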


Chapter 3 Identifying novel topics based on user interests

3.1 Introduction

In this section, we first describe the background and purpose of our study. We explain our approach to identifying novel topics, and then describe the impact of applications based on our method.

Background and our purpose

Blogs are becoming more popular for publishing and discussing shared interests among users. Information sharing systems for blogs could enable users to expand their interests by browsing the collections of blog entries published by other users. However, to retrieve information from blog entries, current blog services simply employ keyword searches of blogs using Google or simple metadata attached to blog entries, i.e. RSS metadata such as titles, creators, dates and so on. Unfortunately, neither approach offers detailed semantics about the content described in blog entries. Moreover, there is no function to generate personalized searches easily; users are restricted by their own knowledge or imagination when entering search keywords. Such keyword searches are time-consuming and troublesome.

For example, users cannot perform a keyword search if they do not understand, to some degree beforehand, what they want to search for. Thus, when keywords cannot be specified, information retrieval from blog entries often cannot be performed even if the database contains topics that the user might become interested in.

To counteract the above problems, the study on Adaptive Information Filtering (AIF)[35] cooperates with the user in constructing a user profile; recommendations are offered based on the profile. Making a user profile interactively beforehand is good for offering recommendations to users, as indicated by the high accuracy of AIF. Unfortunately, common complaints about AIF are that users need to make their own profiles and that already known information is often encountered many times. This is because recommendation systems with conventional AIF only check the possibility of the user being interested in the document and fail to identify whether the information has already been presented to the user. To filter out these redundant documents, novelty-detection researchers[49] define a novel document as a document that includes new information that is relevant according to the user profile. They extract relevant documents from a document stream and then classify the documents as novel or not; novel documents are provided to the user. Novelty detection can, however, only provide documents that offer new information about concepts that are already present in the user profile.

In our study[42, 85], we define a novel topic as a topic that includes new concepts that are likely to be interesting to the user even though those concepts are not present in the user profile. The goal is to expand the user's interests significantly by identifying novel topics and recommending those to the user. In particular, we first focus on novel topic identification in blogs because blogs have become a popular method of publishing and searching for information that can appeal to users.

Approaches

To achieve the above-mentioned goal, we use the following approaches.

We start with a proposal to build user interests according to a taxonomy of items. We consider that users who like items may also like the classes that include those items. Our method thus reflects the rating of the user on an item in that of a class that includes that item. In the taxonomy, items are objects that the user is interested in, such as music artists, music songs, movie titles and so on. On the other hand, the classes are defined using a taxonomy of items in a service domain. For example, we can set classes as genres, which are defined by item sets. By classifying blog entries into each class and item in the taxonomy, we can automatically generate user interests according to the taxonomy. In classifying user entries according to the taxonomy of items, we remove classification mistakes automatically by using the taxonomy of items and the continuity of descriptions about user interests, as explained in our previous papers[85, 42]. Of course, we can also build user interests according to the taxonomy by using the buying histories and listening histories of users.

Next, we measure the similarity of users by considering the degree of interest agreement between each class and item. Most previous techniques for measuring similarity of users use the Pearson correlation coefficient or cosine-based similarity against items rated by both users, as we will explain in Section 3.3. In this paper, we build user interests according to the taxonomy of items and measure similarity of users by using not only co-rating behaviors against items but also those against classes in the taxonomy. By considering the degree of interest agreement between each class and item, we can measure the similarity of users while considering the width and depth of a user's interests through the taxonomy of items. As a result, we can identify many items accurately for the user by analyzing the items of users who share the same items and/or the same classes with the user.
We also establish a new evaluation method that determines a suitable size of user group $G_U$, whose users are similar to the active user $a$, who receives recommendations, by observing the difference between the interests of user $a$ and the interests of users in $G_U$ while changing the size of $G_U$.

Finally, novel topics for the active user $a$ are identified by analyzing the classes, $C$, in which users in user group $G_U$ are interested even though $a$ has not explicitly shown interest in $C$. We introduce a measure, the score of novelty, to understand how novel the recommended items are for the user, and try to identify items of high novelty for the user while also guaranteeing highly accurate recommendation results[42, 85]. Accuracy is also important because users trust accurate recommendation results and tend to use such services[82]. We define the score of novelty as the smallest number of hops, over the taxonomy, from a class that the user has accessed before to the class that includes a candidate item. By accurately identifying items that are highly novel to the user and recommending those to him, he may accept those items and widen his interests.

We show two evaluation steps based on users' implicit ratings against music items extracted from a large number of blog entries collected by the blog portal Doblog. The taxonomy of music artists is provided by ListenJapan.

was one of the biggest blog portals in Japan. Unfortunately, Doblog terminated services on May

The first step is an offline experiment that evaluates the accuracy in predicting users' hidden interests using our implicit rating dataset and investigates the distribution of user interests extracted from blogs according to the score of novelty. The results show that our method can identify items with higher accuracy than the previous methods, including a previous taxonomy-based method[50]. They also show that our method can identify items with higher novelty than the recommendations manually created by the designers at the service provider.
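The definition of the score of novelty above can be sketched as a breadth-first search over the taxonomy. The following Python sketch is only an illustration under stated assumptions: the taxonomy is given as a hypothetical child-to-parent map, one hop is taken to be one parent-child edge, and the function name `score_of_novelty` and the root class `Music` are introduced here for the example and do not come from the thesis.

```python
from collections import deque


def score_of_novelty(parent_of, accessed_classes, item_class):
    """Smallest number of taxonomy edges between item_class and any accessed class.

    parent_of maps each class to its parent class (the root has no entry).
    Returns None if no accessed class is reachable from item_class.
    """
    # Treat the taxonomy as an undirected graph so the search can move up and down.
    neighbors = {}
    for child, parent in parent_of.items():
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)

    targets = set(accessed_classes)
    queue = deque([(item_class, 0)])
    seen = {item_class}
    while queue:
        node, hops = queue.popleft()
        if node in targets:
            return hops  # breadth-first order guarantees this is the minimum
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical music taxonomy; class names other than the root follow the genres
# mentioned in the text, but the tree shape is an assumption.
taxonomy = {
    "Rock": "Music", "Classic": "Music",
    "Alternative": "Rock",
    "Madchester": "Alternative", "Shoegaze": "Alternative",
}
novelty = score_of_novelty(taxonomy, {"Madchester"}, "Classic")
```

Under these assumptions, an item in a class the user has already accessed has a score of zero, an item in a sibling class has a score of two, and items in distant genres have larger scores. A user who wants completely unknown items could then filter candidates by a minimum score.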

The second step is an online experiment for analyzing user reactions to topics recommended by an online experimental service based on our method. Most prior works used only offline synthetic data to evaluate their recommendation techniques. However, analyzing the reactions of actual users to recommendations is very important for confirming whether the recommended novel topics are actually effective. By analyzing the frequency of user access to novel items output by our recommendation scheme over time, we confirmed the effectiveness of our novel topic recommendation. We found that the novel topics recommended by our technique were used for creating new communication links between users; this was confirmed by evaluating the frequency of comments between users who came to know each other through our online recommendations.

Impact of applications based on our method

Most recommendation schemes fail to consider the semantic relationships between a user and the items that are recommended to the user. Thus, the user cannot easily understand why particular items were recommended. These semantic relationships are necessary if the user is to accept recommended items, especially when the user has not thought of those items before. Our method can attach the score of novelty, which indicates how novel the recommended items are to the user. It indicates the relationships between the present interests of a user and the items that are recommended to the user, by using a taxonomy of items defined by service designers. That is, our method can recommend to user $a$ content items that belong to a concept that user $a$ does not know of, together with their score of novelty. Here, we consider a concept to be a class defined in the taxonomy created in each service domain.
By presenting items with supporting information such as the novelty of those items, the user can more readily become interested in items not stored in his/her profile and so acquire new interests.

We provided an experimental service DoblogMusic at for Doblog users from August to December

An example might help understanding. Consider user $A$, who has items $I_1$ and $I_2$ under the class Rock in her interests, and suppose we extract users $X$ whose interests are similar to those of $A$ according to the results of similarity measurements between user $A$ and other users. If there are many users in $X$ who are interested in item $I_3$ under the class Classic, we can recommend item $I_3$ of class Classic to user $A$ together with information indicating its score of novelty, because Classic and Rock are not semantically similar given the definition in the taxonomy of items in the music domain. Thus, we can recommend items to user $A$ with a phrase such as "you may not have heard about item $I_3$ in the Classic genre, but users whose interests are similar to yours are interested in item $I_3$." By presenting some unknown items to the user together with the score of novelty, or using phrases like the one described above, user $A$ may develop an interest in $I_3$ even though its class may not be known to user $A$, i.e. not stored in the profile of $A$. In this way, user $A$ has a chance to expand his/her interests significantly if he/she accesses novel item $I_3$.

The paper is organized as follows. Section 3.2 introduces related works and Section 3.3 explains the technical background of the paper. Section 3.4 describes our model of user interests according to the taxonomy of items. Section 3.5 describes our similarity measurement of users using the taxonomy of items and explains how novel topics are identified based on the similarity measurement results. Sections 3.6 and 3.7 describe our offline and online experimental studies, respectively. Section 3.8 concludes this paper.

3.2 Related works

In [21], the authors classify web pages and place them in a topic directory by using pages in the directory and hyperlink relationships among pages. On the other hand, we extract interest ontologies and use them for innovative blog entry detection.
Therefore, we do not need a huge volume of web pages and hyperlink relationships. We classify blog entries by using only a service-domain ontology, and remove classification mistakes by using class characteristics and the continuity of descriptions about user interests. In [22, 20], the authors extract blog community web pages by adapting an existing extraction technique that is similar to the technique in [19] and the PageRank algorithm[24]. The problem in applying the technique in [22, 20] to creating and activating a blog community is that it cannot provide innovative information to users, because pages are only extracted if they already have link relationships.

Many online content providers, such as Amazon, offer recommendations based on collaborative filtering[45, 33, 88], which is a broad term for the process of recommending items to users based on the intuition that users within a particular group tend to behave similarly under similar circumstances. One advantage of collaborative filtering techniques is that they can recommend relevant items that are different from those in a user's profile. However, the existing collaborative filtering techniques do not use the taxonomies attached to content items to consider the semantic relationships between user $A$ and the content items that are recommended to $A$. As a result, the user cannot understand the semantic reasons why those items are recommended or how innovative the recommended items are, and so is less likely to access the recommendation.

To apply a semantic approach to retrieving information from a blog, semblog[23] tries to construct a user profile using a personal ontology, which is a manually constructed classification of a user's blog entries in a category directory of the ontology according to their interests. A category directory is built by users beforehand to construct an ontology-mapping-based search framework.
However, manual ontology creation is a time-consuming and troublesome task for users, and applying a semantic ontology to a blog community is difficult. We automatically extract a user-interest ontology; thus, creating and updating ontologies is easy for users.

Among research studies of ontology mapping[44, 34, 41], similarity measurements considering the approximation of classes and class topologies are proposed in [41]. In addition to class topology, we consider each user's weighted interest in each class and instance. Furthermore, by analyzing conjunctions in the class topologies of ontologies with high similarity scores, we detect innovative instances, those that other users have in their ontologies but the user does not.

3.3 Collaborative Filtering

Our method extends CF to identify novel topics. Thus, we explain CF in this section. CF methods can be classified into two approaches: memory-based CF and model-based CF. Memory-based CF is based on the assumption that each user belongs to a larger group of similarly behaving users. This method is referred to as user-oriented memory-based CF[36]; an analogous method that builds item similarity groups using co-purchase history is known as item-oriented[45]. On the other hand, model-based CF generates predictions by using a model that is optimized with training data. Clustering[75, 94] and Bayesian network models[76, 95] are examples of the model-based approach.

In computing similarity of users, basic CF methods often use the Pearson correlation approach[46, 88] or the cosine-based approach[33]. If we define $M$ as the number of items rated by both users $a$ and $u$, $r_{a,I_i}$ as the rating value of user $a$ for item $I_i$, and $\bar{r}_a$ as the average value of item ratings given by $a$, the Pearson correlation coefficient measures the similarity $S(a,u)$ between $a$ and $u$ according to equation (3.1).

$$S(a,u) = \frac{\sum_{i=1}^{M} (r_{a,I_i} - \bar{r}_a)(r_{u,I_i} - \bar{r}_u)}{\sqrt{\sum_{i=1}^{M} (r_{a,I_i} - \bar{r}_a)^2 \, \sum_{i=1}^{M} (r_{u,I_i} - \bar{r}_u)^2}} \tag{3.1}$$

When we use the cosine-based approach, we set $\bar{r}_a$ and $\bar{r}_u$ to zero in equation (3.1). The advantage of the Pearson correlation approach is that it takes into account that different users might have different rating schemes.
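To make equation (3.1) concrete, here is a small Python sketch of the Pearson and cosine-based similarities, together with the usual mean-adjusted weighted prediction that equation (3.2) of this section expresses. The function names, the dictionary-based rating format, and the fallbacks used when a denominator is zero are assumptions made for this illustration, not details taken from the thesis.

```python
from math import sqrt


def mean_rating(ratings):
    """Average of all ratings given by one user (ratings: item -> value)."""
    return sum(ratings.values()) / len(ratings)


def similarity(ratings_a, ratings_u, centered=True):
    """Equation (3.1) over co-rated items; centered=False gives the cosine variant."""
    common = set(ratings_a) & set(ratings_u)
    if not common:
        return 0.0
    mean_a = mean_rating(ratings_a) if centered else 0.0
    mean_u = mean_rating(ratings_u) if centered else 0.0
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)
               * sum((ratings_u[i] - mean_u) ** 2 for i in common))
    return num / den if den else 0.0  # assumed fallback when a user has no variance


def predict_rating(ratings_a, neighbor_ratings, item):
    """Mean of the active user plus similarity-weighted deviations of the neighbors."""
    num = den = 0.0
    for ratings_u in neighbor_ratings:
        if item not in ratings_u:
            continue  # a neighbor who has not rated the item contributes nothing
        s = similarity(ratings_a, ratings_u)
        num += (ratings_u[item] - mean_rating(ratings_u)) * s
        den += s
    mean_a = mean_rating(ratings_a)
    return mean_a + num / den if den else mean_a  # assumed fallback: user's own mean


# Hypothetical ratings on a 1-5 scale.
active = {"I1": 5, "I2": 1, "I3": 3}
neighbor = {"I1": 4, "I2": 2, "I3": 3, "I4": 5}
predicted = predict_rating(active, [neighbor], "I4")
```

Because the prediction works with deviations from each user's own mean, a neighbor who rates everything high does not inflate the result, which is the rating-scheme advantage noted in the text. The sketch assumes the neighbors are the most similar users, so their similarities are positive.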

If we assume $N$ is the set of users that are most similar to the active user $a$, the predicted rating of $a$ on item $I_i$, $p_{a,I_i}$, is obtained by the following equation (3.2).

$$p_{a,I_i} = \bar{r}_a + \frac{\sum_{u \in N} (r_{u,I_i} - \bar{r}_u)\, S(a,u)}{\sum_{u \in N} S(a,u)} \tag{3.2}$$

3.4 Interest ontology extraction

We first explain how to design the service-domain ontology of a service domain, with examples from the content delivery services of music and movies, and then describe a method that can automatically extract interest ontologies.

Designing service-domain ontology

We describe the design procedure so as to support the generation of interest ontologies. We use OWL (Web Ontology Language)[25] for describing a service-domain ontology in detail. The problem is that most users find it very difficult to design detailed ontologies. Our solution is to permit the use of simple ontologies. These ontologies require only a hierarchical relationship among the classes (subClassOf description) and a property description that specifies the enumeration of the instances (oneOf description); they restrict the succession condition in the class hierarchy. Our method, described in Section 3.4.2, can automatically extract an interest ontology by classifying user blog entries into service-domain ontologies without user intervention.

As shown in Fig. 3.1, first, the ontology designer chooses the target service domain for extracting user interests. The designer then chooses metadata that reflects user interests by analyzing the activity of an existing community such as a Bulletin Board System (BBS). In the music domain, the designer chooses metadata of genres or artists, considering that the community is founded on this metadata. Finally, the designer chooses the metadata that represents the restriction properties of a class hierarchy and classifies other metadata into classes. For example, the designer chooses genres as a property and classifies artists as instances of classes.

[Figure 3.1: Procedure for designing service-domain ontology. (1) Designer chooses music domain for creating blog community. (2) Selecting metadata for extracting user interests. (3) For example, classifying artists (instances) by genre (class).]

Service designers need only construct a service-domain ontology for the intended domains and gradually increase the number of ontologies as the service is expanded. Designers should also adjust the granularity of end classes to reflect user interests in detail. Fortunately, the designers of many content directories, such as All Media Guide (AMG) and ListenJapan, have developed content taxonomies with fine granularity to support users when they browse and buy content according to their interests. Therefore, we construct service-domain ontologies according to these directories.

Interest ontology generation algorithm

We explain the interest ontology generation algorithm by analyzing the interest distribution of users, as shown in Fig. 3.2.

[Figure 3.2: Procedure for generating interest ontologies. (1) Creating index for all entries. (2) Classifying entries into service-domain ontology. (3) Analyzing user's interest distribution based on user ID of classified blog entry. (4) Extracting interest ontology by arranging entries based on user ID. (5) User modifies interest ontology.]

Basic ontology generation algorithm

First, we describe the merit of generating user interests according to a service-domain ontology. We use the service-domain ontology as defined by the experts in each service domain. By using the accurate and detailed knowledge included in the service-domain ontology, we can extract the user-interest ontologies accurately. We note that many service providers assign various name attributes to their content items with the idea of assisting users in locating content items via keyword search. The current version of our method uses exact keyword matching to extract user interests as described in the user's blog entries. The polysemy problem can be eased by applying maintenance knowledge of service-domain ontologies. The basic ontology generation algorithm (BOGA) is described below.

(1) BOGA makes index files for all blog entries (which can be collected through the ping server). For example, our experiments in Section 3.6 and Section 3.7 used all Doblog blog entries stored over a roughly four-year period. Here, we assume that each collected blog entry has a unique user ID.

(2) BOGA classifies all collected blog entries into a service-domain ontology. BOGA classifies blog entry $E_i$ into instance $I_i$ ($\in$ class $C_i$) if there is a name attribute of $I_i$ in $E_i$. BOGA permits each blog entry to be classified into two or more classes. For example, consider the service-domain ontology in Fig. 3.2. BOGA classifies the blog entry into instance Happy Mondays of class Madchester when there is a "Happy Mondays" character string in the description in the blog entry.

(3) BOGA measures the number of users interested in each instance of $C_e$, which is one of the end classes in the service-domain ontology. In calculating the number of interested users, BOGA counts a user only once, even if the same user describes the same instance or class in two or more blog entries. BOGA calculates the number of users interested in class $C_e$ by obtaining the number of users interested in all instances in $C_e$. Thus, the distribution of interested users in the domain can be measured by recursively counting the number of users from $C_e$ up to the root class $C_r$.

(4) BOGA extracts only the classification results for one user ID from all classification results in order to develop an interest ontology for this user ID. In Fig. 3.2, BOGA can extract an interest ontology of user A when the blog entries of this user describe the instances Stone Temple Pilots, New Order, and Farm.

(5) Finally, our method allows the user to inspect his/her interest ontology and delete instances that he/she considers not to be his/her actual interests.

Ontology-filtering algorithms

BOGA can make classification mistakes. For example, BOGA classifies blog entries that mention "farm" in the sense of an agricultural farm into the instance Farm of class Madchester. To filter out such mistakes caused by words with several meanings, we make use of the following characteristics: the taxonomy of instances in ontologies and the durability of user interests as expressed in the user's blog.

- Instances that belong to the same class have the same characteristics.
- Adjacent classes have similar characteristics. Instances of those classes also have similar characteristics.
- User interests continue for a certain period, and an interest is described over two or more days.

We propose two filtering algorithms: FA1 and FA2. First, we explain FA1.

Filtering algorithm 1

We subdivide procedure (2) of BOGA to permit FA1 to be applied.

(2-1) When the name attribute $n(I_i)$ of instance $I_i$ ($\in C_i$) is described in blog entry $E_i$, FA1 checks whether a name attribute of another instance of the same class, $I_k$ $\{(I_k \in C_i) \land (I_k \neq I_i)\}$, is described anywhere in the blog entries that the user has accumulated. We call instances $I_k$ classification decision elements (CDEs).

(2-2) Blog entry $E_i$ is classified as mentioning instance $I_i$ when there is a description of CDEs, and is not classified as mentioning instance $I_i$ when there is no such description.

In Fig. 3.2, when the description of Farm exists in $E_i$, and New Order is described somewhere among the accumulated blog entries of the user, $E_i$ is assumed to be a blog entry about instance Farm of Madchester and is classified accordingly. We can filter classification mistakes accurately by using the many CDEs created from the accurate and comprehensive knowledge contained in the service-domain ontology maintained by expert domain designers.

Filtering algorithm 2

In addition, we propose filtering algorithm 2 (FA2), which provides more restrictive classification than FA1. In procedure (2-1) of FA1, FA2 instead checks whether CDEs are described in the same blog entry $E_i$. Blog entry $E_i$ is classified into $I_i$ if CDEs are described, and not otherwise.
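The classification step of BOGA and the two filters can be sketched in a few lines of Python. This is a simplified illustration for a single user, not the authors' implementation: the function names, the dictionary that maps each end class to its instance names, and the use of case-insensitive whole-word matching as a stand-in for exact keyword matching are all assumptions of this sketch.

```python
import re


def find_instances(entry_text, ontology):
    """Return the (class, instance) pairs whose name appears in the entry text."""
    hits = set()
    for cls, instances in ontology.items():
        for name in instances:
            # Whole-word, case-insensitive match (an assumed reading of exact matching).
            if re.search(r"\b" + re.escape(name) + r"\b", entry_text, flags=re.IGNORECASE):
                hits.add((cls, name))
    return hits


def classify_entries(entries, ontology, mode="BOGA"):
    """Classify one user's entries; mode is "BOGA", "FA1", or "FA2".

    Returns one set of (class, instance) pairs per entry.
    """
    raw = [find_instances(text, ontology) for text in entries]
    if mode == "BOGA":
        return raw
    all_hits = set().union(*raw) if raw else set()
    result = []
    for hits in raw:
        # FA1 looks for CDEs in all of the user's entries, FA2 only in the same entry.
        scope = all_hits if mode == "FA1" else hits
        kept = {(cls, name) for (cls, name) in hits
                if any(c == cls and n != name for (c, n) in scope)}
        result.append(kept)
    return result


# Hypothetical fragment of a service-domain ontology (end class -> instance names).
ontology = {
    "Madchester": ["Farm", "New Order", "Stone Roses", "Happy Mondays"],
    "Shoegaze": ["My Bloody Valentine", "Ride"],
}
entries = ["The Farm played a great set.", "Still listening to New Order every day."]
fa1_result = classify_entries(entries, ontology, mode="FA1")
```

In this sketch, an entry that mentions only an agricultural farm is classified by BOGA but dropped by FA1, because no other Madchester instance appears in that user's entries. FA2 additionally drops the first entry above, since New Order is mentioned in a different entry.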

[Figure 3.3: Hops in filtering algorithms. The figure marks ranges of zero hops, one hop, and two hops in a taxonomy under Rock that includes the classes Alternative, US Indie, Madchester, Shoegaze, Athens, and Elephant 6.]

Adjusting range of CDEs

In addition, we introduce a mechanism that adjusts the range of CDEs by using the class hierarchy. We consider that descriptions of classes and instances of interest often appear together with instances of the same class and those of neighboring classes. We add a new adjustment parameter, the hop limit, which defines the range of CDEs. In Fig. 3.3, when two hops from end classes are permitted, we assume that CDEs include instances of sibling classes and those of the grandparent class.

Introducing interest weight to ontology

In addition, we introduce the interest weight as a parameter that indicates the degree of a user's interest in each class and instance of an interest ontology. By using this parameter, we can create a virtual community of users who have almost the same degree of interest in the same classes or instances.

Here, we explain the idea of calculating interest weight using Fig. 3.4. In this paper, we extract the interest weight of a user for item $I_i$ by analyzing the number of times $I_i$ is present in all of his/her blog entries.

[Figure 3.4: Applying interest weight to ontology. Example with two entries of user A. Entry 1: "I like stone roses and my bloody valentine recently. I like Nirvana and new order much more." Entry 2: "As for shoegaze, I think Ride and My bloody valentine are best." Interest weights for instances: Nirvana 1/4, New Order 1/4, Stone Roses 1/4, Ride 1/2, My Bloody Valentine 3/4. Interest weights under classes: Madchester 1/2, Shoegaze 5/4, Alternative 2.]

Note that a user's interest cannot be measured by the simple number of occurrences of an instance, since $I_i$ might simply be part of a list. Our proposal is to apply the following ideas to extract interest weight from blog entries.

(1) The interest weight of every blog entry is one.

(2) If $N(E_i)$ kinds of name attributes of interest instances appear in blog entry $E_i$, the interest weight of each instance in $E_i$ becomes $1/N(E_i)$.

(3) When we define the set of all accumulated blog entries of a user as $E$, the interest weight $S(I_i)$ of each instance $I_i$ is

$$S(I_i) = \sum_{E_i \in E,\; I_i \in E_i} \frac{1}{N(E_i)},$$

and the interest weight $S(C_i)$ of each class $C_i$ is

$$S(C_i) = \sum_{I_i \in C_i} S(I_i).$$

We also consider that a user who is interested in $I_i$ is also interested in class $C$ if $I_i$ lies under class $C$. In the same way, we consider that a user who is interested in $C$ is also interested in the super class of $C$. Thus, we give the following definition: for a user who has an interest in instances deep in the class hierarchy, the classes higher in the hierarchy tend to have larger interest weight values.

(4) The interest weight of the instances is reflected in that of the class

50 that includes the instance. The interest weight of the classes is reflected in that of the super class. For example, in Fig. 3.4, we give the interest weight of instance Stone Roses as 1/4, that of instance My Bloody Valentine as 1/4 + 1/2 = 3/4, that of class Shoegaze as 3/4 + 1/2 = 5/4, and that of class Alternative as 1/2 + 5/4 + 1/4 = Detecting novelty by similarity measurements In this section, we propose to measure the similarity between ontologies through a consideration of interest weights. We detect innovative topics for user u by measuring the similarity between the user-interest ontology of u and those other users. Next, we determine a group of users, of appropriate size, whose interest ontologies are similar to that of u Interest-weight-based similarity measurement We now explain our similarity measurement in detail by using Fig We use Table 3.1 that gives definitions of terms used in this Section, and examples based on Fig We first define the terms interest ontology O A of user A and O B of user B, topology T 1, which is composed of a class and subclass relationship, and topology T 2, which is composed of a class and instance relationship. Furthermore, we define common classes C i as classes that both ontologies have, and common instances I i as instances that both ontologies have. For example, there are five common classes, a1, b1, b2, c3, and c4, in Fig In particular, we define a common class set that formalizes topology T 1 as C(T 1 ), and a common class set that formalizes topology T 2 as C(T 2 ). For example, in Fig. 3.5, C(T 1 ) has common classes a1 and b2, and C(T 2 ) has common classes b2, c3, and c4. We also give the degree of interest agreement of common instance I i as I(I i ), that of common class C i as I(C i ), 32

51 Interest ontology of user A: OA Interest ontology of user B: OB 3 5 b1 m k a1 0 1 c3 a b c b n 2 3 c4 2 g h c1 l b1 a b2 n c3 c4 a e p 3 c j 0 b3 d Class Instance Interest weight of instance Interest weight under the class. Topology T1 Topology T2 Figure 3.5: Measuring similarity based on degree of interest agreement. and that of common topology created by common class C i as I t (C i ). In [41], the authors calculate the similarity between ontologies considering the degree of similarity between class topologies T 1. In addition, we apply the following ideas to create user-interest-based virtual communities. Evaluating the degree of interest agreement between C i s and I i s from the interest weight with smaller value. This filters users who simply enumerate a lot of instances in an blog entry and creates a virtual community among users who have similar or larger interest weight values with respect to that of each user. Treating topologies T 1 and T 2 separately because we consider that T 1 reflects the width and depth of a user s interests while T 2 reflects the objects in which users are interested. Decreasing the computational complexity by generating the class schema of user-interest ontologies according to that of service-domain ontologies. Accessing a large number of blog entries, as is done in our experiments in Section 6.4, is important for useful ontology mapping. 33
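Since every agreement value in the similarity measurement is built from interest weights, it may help to see rules (1) to (4) of the interest weight as code. The following is only a minimal sketch: the `PARENT` and `CLASS_OF` tables are an assumed reading of the hierarchy in Fig. 3.4, the function name `interest_weights` is hypothetical, and each entry is assumed to have already been reduced to the set of instance names it mentions.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed taxonomy fragment modeled on Fig. 3.4: subclass -> super class.
PARENT = {"Shoegaze": "Alternative", "Madchester": "Alternative"}

# Assumed placement of each instance in the class that directly contains it.
CLASS_OF = {
    "Nirvana": "Alternative",
    "New Order": "Madchester",
    "Stone Roses": "Madchester",
    "My Bloody Valentine": "Shoegaze",
    "Ride": "Shoegaze",
}


def interest_weights(entries):
    """Return (instance weights, class weights) for a list of blog entries.

    Each entry is the set of instance names detected in that entry.
    """
    inst = defaultdict(Fraction)
    cls = defaultdict(Fraction)
    for found in entries:
        if not found:
            continue
        # Rules (1) and (2): one unit of weight per entry, split evenly.
        share = Fraction(1, len(found))
        for name in found:
            inst[name] += share  # rule (3): sum over all entries
            c = CLASS_OF[name]
            while c is not None:  # rule (4): reflect the weight in every super class
                cls[c] += share
                c = PARENT.get(c)
    return dict(inst), dict(cls)


# The two entries of user A in Fig. 3.4, reduced to the instances they mention.
entries = [
    {"Stone Roses", "My Bloody Valentine", "Nirvana", "New Order"},
    {"Ride", "My Bloody Valentine"},
]
inst, cls = interest_weights(entries)
```

With these two entries the sketch gives 3/4 for My Bloody Valentine, 5/4 for Shoegaze, and 2 for Alternative, matching the worked example for Fig. 3.4. Exact fractions are used so that the values can be compared with the figure directly.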

[Table 3.1: Definitions of terms and examples. Column headers that survive: type of a graph, hierarchy of COIs, number of edges, average number of users among COIs, variance of the number of users among COIs. The cell values are not recoverable.]

(1) We analyze the classes common to $O_A$ and $O_B$ and extract the common classes that belong to $C(T_1)$ and $C(T_2)$.

(2) When common class $C_i$ has common instance $I_i$ between the ontologies, we assign the smaller of the two interest weights of $I_i$ to $I(I_i)$. For example, $I(a)$ is 2.

(3) Similarly, we assign the smaller of the two interest weights of common class $C_i$ to $I(C_i)$. For example, $I(b1)$ is 3.

(4) For each $C_i \in C(T_1)$, we define $N(C_i)$ as the intersection of the sets of subclasses of $C_i$ in the two ontologies, and $U(C_i)$ as the union of those sets. For example, for common class $a1 \in C(T_1)$, $N(a1) = \{b1, b2\}$ and $U(a1) = \{b1, b2, b3\}$. We then give $I_t(C_i)$ as $I_t(C_i) = \frac{\sum_{C_j \in N(C_i)} I(C_j)}{|U(C_i)|}$. For example, $I_t(a1)$ is given by $(I(b1) + I(b2))/3 = 7$. Thus, we obtain the degree of interest agreement $S(T_1)$ of $C(T_1)$ as $S(T_1) = \sum_{C_i \in C(T_1)} I_t(C_i)$. In Fig. 3.5, $S(T_1) = (I(b1) + I(b2))/3 + (9 + 3)/2$.

(5) For each $C_i \in C(T_2)$, we also define the instance set of $C_i$ in ontology $O_A$ as $I_A(C_i)$, and the instance set of $C_i$ in ontology $O_B$ as $I_B(C_i)$. We then give $I_t(C_i)$ as $I_t(C_i) = \frac{\sum_{I_i \in C_i} I(I_i)}{|I_A(C_i) \cup I_B(C_i)|}$. For example, $I_t(c3)$ is the sum of the agreement values of the common instances of c3 divided by $|I_A(c3) \cup I_B(c3)| = 4$, which yields $5/4$. Thus, we assign the degree of interest agreement $S(T_2)$ of $C(T_2)$ as $S(T_2) = \sum_{C_i \in C(T_2)} I_t(C_i)$. In Fig. 3.5, $S(T_2) = 2/1 + 5/4 + \cdots$.

(6) By using an evaluation function $f(X)$, which corresponds to the relative degree of importance of a topology, we finally determine the similarity score between the ontologies, $S_O(AB)$, as $S_O(AB) = S(T_1) + f(S(T_2))$. For example, if $f(X)$ equals $X$, then in Fig. 3.5, $S_O(AB) = S(T_1) + S(T_2)$.

As procedures (4) and (5) show, our algorithm judges interest ontologies to be more similar the more closely they share topology $T_1$, which expresses the depth and width of user interest. Our similarity measurement returns higher similarity values if there are more common classes in $C(T_1)$ that form topology $T_1$ in both ontologies (in other words, if there are more classes that appear in both ontologies). When it calculates similarities over topology $T_2$ in different ontologies, our method yields almost the same effect as calculating the similarities over $T_1$. Thus, our method identifies two different ontologies as being similar if they have common instances in common classes at deeper or wider levels of their hierarchies.

3.5.2 Innovative blog-entry detection

We use our similarity measurement for innovative blog-entry detection and user-oriented community creation.

(1) We calculate the similarity between the ontology of user A and the ontologies of the other users in set $U$. By using the heuristic threshold $X$, we select the $X$ users who have the highest similarity to user A as the interest-sharing virtual community $G_U$.

(2) We then analyze the difference instances between the ontology of user A and the ontologies of $G_U$. We also define a parameter called the score of novelty, which indicates how many hops we need to get from a difference instance of an ontology of $G_U$ to a class of the ontology of user A. In Fig. 3.6, we need three hops to go from difference instance Elf Power of the ontology of user B to class Rock of the ontology of user A. By recommending blog entries with a high score of novelty, the interests of users may be significantly expanded. Lowering the level of novelty may produce new concepts that are more comfortable, but these will prove to be less satisfying.

[Fig 3.6: Community creation service of recommending innovative blog entries. The figure shows the blog entries and interest ontologies of user a and user b and four steps: (1) extracting user interests; (2) measuring similarity; (3) recommending items via other users' entries, so that user a can become interested in artist Elf Power through such entries; (4) creating a community by browsing the recommended entry.]

(3) Finally, we extract innovative instances $G_I$, which are unknown to user A but well known to users in $G_U$; the innovative blog entries about $G_I$ are recommended to user A together with the score of novelty. As we defined in Section 5.3, innovative topics are concepts that are new and interesting to the user; thus the score of novelty of an innovative instance is more than one.

Here, determining the most suitable size of $G_U$ is very important for detecting attractive and innovative instances. If the size of $G_U$ is reduced, the difference between user-interest ontologies is smaller, and instances in $G_I$ may be close to the user-interest ontology of each user. However, there may be few novel instances in $G_I$. On the other hand, if the size of $G_U$ is increased, the difference between user-interest ontologies is larger, and instances in $G_I$ may be too novel for the user. Thus, we observe the difference between the user-interest ontology of user u and those of $G_U$ while changing the size of $G_U$. The most suitable size of $G_U$ is the point at which there is a rapid increase in the number of instances in $G_I$. Details of this process are given in Section 3.6.5.

An example of community creation is depicted in Fig. 3.6. User B is included in user group $G_U$, whose interest ontologies are measured as similar to the interest ontology of user A. If users in $G_U$ often have an interest in Elf Power, user A has the potential to be interested in Elf Power even though the class Elephant 6 that includes Elf Power is many hops from the class Rock, in which user A has a known interest. Furthermore, by browsing blog entries concerning these novel instances, users may expand their interests and share interests with each other.

3.6 Offline Experimental results

We now present the results of offline experiments and simulation studies that demonstrate the performance of interest ontology extraction and novel blog-entry detection.

3.6.1 Datasets and methodology

The proposed methods were tested using the large-scale blog portal Doblog, which holds 1,600,000 blog entries from 55,000 users. We also used the service-domain ontology of the music domain, as shown in Fig. 3.2, which was created by referring to public information on listen Japan, a web portal storing music artist genre information. (A genre hierarchy very similar to our service-domain ontology is published by listen Japan.) Our experimental service-domain ontology contains 114 classes as genres, covering a wide range of genres in the music domain such as Rock, Classic, Jazz, and Soul, and its instances are 4,300 artists. Its class hierarchies have four levels on average, and the deepest class hierarchy has five levels. Furthermore, each class and instance of the service-domain ontology has two or more name attributes. For example, the instance R.E.M. has the name attributes "R.E.M." and "REM". Overall, the 4,300 instances were given 7,600 name attributes.
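The experiments that follow rely on the similarity procedure (1) to (6) of Section 3.5.1 and on the score of novelty of Section 3.5.2, so a compact sketch of both may be useful here. It rests on several assumptions: the `Ontology` container and all function names are hypothetical, the weights are made-up values rather than those of Fig. 3.5, the hierarchy in `PARENT` is only loosely modeled on Fig. 3.6, and the score of novelty is read as the number of class-to-class hops from the class that contains the difference instance up to the nearest class already in the user's ontology.

```python
from dataclasses import dataclass


@dataclass
class Ontology:
    class_w: dict     # class -> interest weight under the class
    inst_w: dict      # instance -> interest weight
    subclasses: dict  # class -> set of its subclasses (topology T1)
    instances: dict   # class -> set of its instances (topology T2)


def topology_score(kids_a, kids_b, w_a, w_b):
    """Sum of I_t(C_i) over the common classes of one topology."""
    score = 0.0
    for c in kids_a.keys() & kids_b.keys():  # common classes
        common = kids_a[c] & kids_b[c]       # N(C_i), or the common instances
        union = kids_a[c] | kids_b[c]        # U(C_i), or the union of instance sets
        if common:
            # The agreement of each common child is the smaller of the two weights.
            score += sum(min(w_a[k], w_b[k]) for k in common) / len(union)
    return score


def similarity(a, b, f=lambda x: x):
    """S_O(AB) = S(T1) + f(S(T2)); f defaults to the identity."""
    s_t1 = topology_score(a.subclasses, b.subclasses, a.class_w, b.class_w)
    s_t2 = topology_score(a.instances, b.instances, a.inst_w, b.inst_w)
    return s_t1 + f(s_t2)


def score_of_novelty(instance_class, known_classes, parent):
    """Hops from the instance's class up to the nearest class the user knows."""
    hops, c = 0, instance_class
    while c is not None and c not in known_classes:
        c = parent.get(c)
        hops += 1
    return hops if c is not None else None  # None: no known ancestor class


# Made-up ontologies for two users (not the values of Fig. 3.5).
user_a = Ontology(
    class_w={"a1": 9, "b1": 3, "b2": 6},
    inst_w={"x": 2, "y": 1},
    subclasses={"a1": {"b1", "b2"}},
    instances={"b2": {"x", "y"}},
)
user_b = Ontology(
    class_w={"a1": 10, "b1": 5, "b2": 4, "b3": 1},
    inst_w={"x": 3, "z": 1},
    subclasses={"a1": {"b1", "b2", "b3"}},
    instances={"b2": {"x", "z"}},
)

# Assumed class hierarchy (subclass -> super class) and classes known to user A.
PARENT = {
    "Alternative": "Rock",
    "US Indie": "Alternative",
    "Athens": "US Indie",
    "Elephant 6": "Athens",
    "Shoegaze": "Alternative",
}
known = {"Rock", "Alternative", "Shoegaze"}
```

In this toy data, $S(T_1) = (3 + 4)/3$ and $S(T_2) = 2/3$, so `similarity(user_a, user_b)` is approximately 3. Ranking all other users by this value and keeping the top $X$ would give a community in the sense of $G_U$, and `score_of_novelty("Elephant 6", known, PARENT)` counts three hops under the assumed hierarchy.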

For evaluating accuracy, we defined correct answers as blog entries that have descriptions of the classified classes or instances, and evaluated the generated interest ontology by using the precision and recall of the classification results. In this paper, precision means the proportion of correct answers in the classification results, and recall means the proportion of the correct answers in all blog entries that appear in the classification results. When recall is high, the extracted interest ontologies better cover user interests. However, when precision is low, the created interest ontologies include classification mistakes, and the novel topics detected for the user are unreliable. Thus, achieving high precision is indispensable. In the evaluation, we applied filtering algorithms to instances that consist of just one word, such as "police", because we consider that such instances have a high probability of having several meanings. We used Namazu to generate index files of blog entries.

3.6.2 Measuring interest distributions of blog users

Graphs of the user distributions in the music domain are depicted in Fig. 3.7-(a). This figure shows the number of users in each class at the different levels of the class hierarchy in the ontology. Each class has about 200 users, even the end classes. By checking the blog entries classified into end classes, we confirmed that these blog entries frequently contain unique words that describe the features of these classes. For example, blog entries classified into the end class Death Metal contain the phrase "death voice" with high probability. This is because the end classes in our service-domain ontology have a granularity that is appropriate for extracting the uniqueness of the blog entries classified into these classes. End class granularity is important because it controls whether we can determine if a user is interested in end class instances or not.

[Fig 3.7: Experimental results of user distributions and ontology extraction. (a) Interest distribution of blog users in the experimental service-domain ontology: number of users in each class at the 2nd, 3rd, and 4th levels of the class hierarchy. (b) Comparison of the precision of instances with one word among BOGA, FA1, and FA2, by genre and artist.]

3.6.3 Measuring performance of extracted interest ontology

We evaluated the accuracy of FA2 by checking a randomly selected 1/4 of the classified blog entries. As shown in Table 3.2-(a), the achieved precision is higher than 90% with a high recall of 80%. Thus, our filtering algorithm is effective for generating suitable user-interest ontologies. The reason why we achieved high performance is that we used the comprehensive knowledge contained in the service-domain ontology. We can filter out classification mistakes accurately by using the many CDEs defined in the service-domain ontology. We also found that negative comments constituted only about 5 percent of all comments. It seems that if a user has a negative opinion related to a hobby, he/she most often does not express it, whereas a positive opinion leads to a lot of comments.
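The precision and recall figures reported here follow the definitions given in Section 3.6.1. As a small illustration, they can be computed from two sets of blog-entry identifiers. The function name and the identifiers below are hypothetical, and the numbers are made up for the example; they are not the experimental data.

```python
def precision_recall(classified, correct):
    """Return (precision, recall) for one class or instance.

    classified: set of blog-entry ids that the classifier assigned to it.
    correct:    set of blog-entry ids that truly describe it.
    """
    hits = len(classified & correct)
    precision = hits / len(classified) if classified else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Made-up example: 10 entries were classified, 9 of them correctly,
# and 2 further correct entries were missed.
classified = set(range(1, 11))
correct = set(range(1, 10)) | {11, 12}
precision, recall = precision_recall(classified, correct)
```

In this example precision is 9/10 and recall is 9/11, which shows how a filter can keep precision high while still missing some correct entries.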

Table 3.2: Experimental results of our ontology extraction and detection of novel topics. Only two panels survive, each listing the percentages of instances by score of novelty:
First surviving panel: [value missing]%, 15.2%, 23.2%, 4.0%
Second surviving panel: 23.4%, 23.1%, 44.3%, 9.2%

From those results, we assume that a user's comment is positive if the blog entry repeatedly refers to something about their hobbies. As explained earlier, the interest weight reflects how many times the user described a certain instance in a certain class. We also note that there are a few events and people, such as Michael Jackson, that are described in blog entries as a form of gossip. Extracting and manually checking these instances may improve system accuracy if they are frequent in blog entries over a certain period of time.

3.6.4 Comparing filtering algorithms

We compared BOGA and the filtering algorithms by randomly checking 1/4 of the blog entries that were classified into instances with one word. Graphs of the precision achieved by BOGA, FA1, and FA2 for the 83 instances that were randomly selected from among the 827 instances with one word are shown in Fig. 3.7-(b). The accuracies of BOGA and the filtering algorithms are shown in Table 3.2-(b). These results indicate that precision improves in the order of BOGA, FA1, and FA2, while recall decreases significantly with FA2, even though with FA1 it decreases only slightly compared to BOGA. To improve recall while maintaining the high precision of FA2, we plan to add a method that also checks for CDEs in blog entries where these elements have a high probability of appearing, such as trackbacks of the entry or entries close to it in the time series.

Analyzing Fig. 3.7-(b) in more detail, there are eight instances for which the precision cannot be improved even with FA2, and those instances lower the overall precision. We thus extracted the instances for which the number of classified entries increased by ten times or more when FA2 was replaced by FA1. This yielded 28 instances, and 5 of those instances had a precision of 0. The reason is that they do not co-occur in the same blog entry with CDEs, even though the user was interested in them and often described the name attributes of these instances. Thus, precision can be effectively improved by deleting these instances from the service-domain ontology.

We also evaluated the accuracy of FA2 while changing the hop limit. Two hops were better than zero hops with respect to both the number of correct answers and precision, as shown in Table 3.2-(c). However, four hops yielded worse precision than two hops, although the number of correct answers was slightly better. This is because our service-domain ontology has a large number of instances in end classes, and the relationship between end classes and super classes is closer than the relationship between super classes and grandfather classes. For example, end class Acid Metal has the super class Metal and the grandfather class Rock. In this case, the relationship between Acid Metal and Metal is closer than the relationship between Metal and Rock. Thus, two hops offer better precision than zero hops because two hops include many CDEs, and four hops have lower precision than two hops because the resulting instances are far from the end classes.

Furthermore, we analyzed the cases in which the number of classified blog entries changed by at least a fixed factor when four hops were used instead of two hops. Such cases represent classification mistakes. For example, with four hops, the CDEs of the instance Europe in class Northern Metal included instances in class Adult Contemporary under the class Rock. In this case, blog entries with the description "Europe tour" were also classified into Europe in Northern Metal. Therefore, the number of correct answers with high precision can be effectively increased by changing the hop limit so that these mistakenly classified blog entries are removed from the classified instances.
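The effect of the hop limit discussed above can be illustrated with a short sketch. It is not the FA2 implementation: the hierarchy in `PARENT` (in particular, placing Northern Metal under Metal), the placeholder artist names in `INSTANCES`, and all function names are assumptions, and the CDEs of an instance are approximated here as the names of the classes within the hop limit of its class together with their instances.

```python
from collections import deque

# Assumed hierarchy fragment: subclass -> super class.
PARENT = {
    "Metal": "Rock",
    "Acid Metal": "Metal",
    "Northern Metal": "Metal",
    "Adult Contemporary": "Rock",
}

# Placeholder instances; only "Europe" comes from the text.
INSTANCES = {
    "Northern Metal": {"Europe"},
    "Acid Metal": {"acid_metal_artist"},
    "Adult Contemporary": {"ac_artist"},
}


def classes_within(start, hop_limit, parent):
    """Breadth-first search over the class tree, up to hop_limit hops from start."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, set()).add(child)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        if dist[c] == hop_limit:
            continue  # do not expand beyond the hop limit
        neighbours = set(children.get(c, ()))
        if c in parent:
            neighbours.add(parent[c])
        for n in neighbours:
            if n not in dist:
                dist[n] = dist[c] + 1
                queue.append(n)
    return dist


def cdes(instance_class, hop_limit):
    """Class names and instance names within the hop limit (assumed CDE range)."""
    names = set()
    for c in classes_within(instance_class, hop_limit, PARENT):
        names.add(c)
        names |= INSTANCES.get(c, set())
    return names


def keep_entry(entry_terms, instance, instance_class, hop_limit):
    """Accept the classification only if some other CDE co-occurs in the entry."""
    context = cdes(instance_class, hop_limit) - {instance}
    return instance in entry_terms and bool(context & entry_terms)


# An entry that mentions "Europe" next to an Adult Contemporary artist.
entry = {"Europe", "tour", "ac_artist"}
```

Under these assumptions, two hops from Northern Metal reach Metal, Rock, and the brother class Acid Metal, so `keep_entry(entry, "Europe", "Northern Metal", 2)` rejects the entry. With four hops, Adult Contemporary falls inside the range and the same entry is accepted, which is the kind of mistake described for the "Europe tour" entries.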

[Fig 3.8: (a) Number of users obtained by changing X. (b) Number of users that have high interest weight after changing X. Both panels plot the number of users who are interested in the artists of each group (famous group, moderately famous group, and group with a small number of fans) against X, the number of users in the group $G_U$.]

3.6.5 Analyzing size of user-oriented community

We determined the suitable size of $G_U$, as described in Section 3.5.2, by observing the difference between the user-interest ontology of each user u and those of $G_U$ while changing the size of $G_U$.

First, we selected user A from among all users extracted by our service-domain ontology and analyzed a suitable size of $G_U$ by changing parameter $X$, which represents the number of users who have high similarity to user A in the interest-sharing community $G_U$ (see Section 3.5.2). In this evaluation, we divided the novel instances $G_I$ into three instance groups in order of the appearance rate of the instances when we set $X$ to 70: a very popular instance group, a moderately popular instance group, and an instance group with a small number of fans. We then calculated the number of users who were interested in the artists of each instance group while changing $X$ from 10 to 70 in steps of 1. Graphs of the number of users who were interested in each instance group obtained while changing $X$ are shown in Fig. 3.8-(a). Next, we focused on users who had high interest weights in their interest ontologies. Graphs of the number of such users obtained while changing $X$ are shown in Fig. 3.8-(b).

The very popular instance group was recommended to users regardless of the value of $X$ (see Fig. 3.8-(a)). The instance group with a small number of fans, on the other hand, was recommended most often when $X$ was ten (Fig. 3.8-(b)); the moderately popular instance group was recommended more often as $X$ was increased. This is because users with high interest weights tend to discuss instances in the instance group with a small number of fans, rather than instances in the famous instance group. Furthermore, the number of users in each instance group increased suddenly when $X$ was greater than 60. This is because the difference between a user's ontology and those of $G_U$ is larger when $X$ is greater than 60, and instances with a low probability of being interesting come to be recommended more often. From this result, novel topics are effectively detected with respect to detailed user interests when $X$ is smaller than 60, given the datasets used in our experiment. This result also suggests that the suitable size of $G_U$ is given by $X = 60$, because the number of instances of each group increased sharply when $X$ exceeded that point.

3.6.6 Measuring performance of detecting novel topics

We next evaluated novel blog-entry detection. In the evaluation, we compared the proportion of novel instances in the manually defined recommendation lists created by the "you might like these artists" feature of the music portal listen Japan with the proportion of novel instances in the recommendation lists created by our methods. The designers of the music portal listen Japan have manually defined the artists ($A_n$) that are considered to be relevant to each artist ($A_i$). We checked 75 users, out of the total of 1503 users, who were judged to be interested in the music domain of our service-domain ontology. First, we identified the $X$ users who had high similarity to user A, as described in Section 3.5.2; following Section 3.6.5, we set $X$ according to the analysis given there. We took from the recommendation lists created by our method the top 150 instances that appeared frequently in the interests of those $X$ users. The manually defined recommendation lists were generated by passing the user interests, extracted by our algorithm, to the portal's recommendation system. The manual recommendation lists included, on average, 23 instances.

Table 3.2-(d) and (e) show the percentages of recommended instances and their score of novelty for the manually generated recommendation lists and our lists, respectively. These results indicate that our technique recommends more instances with a higher score of novelty than the manually created recommendation lists. Another conclusion that can be drawn is that users actually have a much wider range of interests than predicted by the music portal experts.

[Fig 3.9: Snapshot of online experimental service DoblogMusic. The snapshot shows a user's blog site and the recommendation page of DoblogMusic: (1) genres and artists extracted for the user, for example Genre: Alternative rock, Emo, Lo-fi; Artists: Jimmy eat World, Get up kids; (2) recommendations; (3) automatic tagging of artists or genres to each entry by classifying blog entries into items and classes in the taxonomy. The length of each bar indicates the strength of the prediction value, and its color indicates the score of novelty, from small (0) to large (3); for example, novel artists shown with red bars have a large score of novelty.]

[Fig 3.10: User accesses to DoblogMusic. (a) Number of users accessing DoblogMusic. (b) Number of accesses of DoblogMusic.]

3.7 Online Experimental results

To evaluate the effectiveness of novel topic detection, we offered an experimental service DoblogMusic to Doblog users. We used a larger service-


More information

Chapter 3. Integrating AHP, clustering and association rule mining

Chapter 3. Integrating AHP, clustering and association rule mining Chapter 3. Integrating AHP, clustering and association rule mining This chapter presents a novel product recommendation methodology that combines group decision-making and data mining. This method has

More information

Bayesian Agile Planner (BAP) Release Planning

Bayesian Agile Planner (BAP) Release Planning Bayesian Agile Planner (BAP) Release Planning December, 2015 By: Murray Cantor, Ph.D. Introduction Often businesses have what are, to some, contradictory development team goals: 1) Creating delivery plans

More information

Data Mining in CRM THE CRM STRATEGY

Data Mining in CRM THE CRM STRATEGY CHAPTER ONE Data Mining in CRM THE CRM STRATEGY Customers are the most important asset of an organization. There cannot be any business prospects without satisfied customers who remain loyal and develop

More information

Butler County Community College Business Technology and Workforce Development Spring COURSE OUTLINE Personal Selling

Butler County Community College Business Technology and Workforce Development Spring COURSE OUTLINE Personal Selling Butler County Community College Jared McGinley Business Technology and Workforce Development Spring 2003 COURSE OUTLINE Personal Selling Course Description: BA215. Personal Selling. 3 hours credit. This

More information

Some insights about Insights

Some insights about Insights Can I get some insights, please? Over the years, I have come to somewhat dislike the term insights almost to the same level as, say, a Data Lake. And that s saying something. Not because these concepts

More information

Mining the reviews of movie trailers on YouTube and comments on Yahoo Movies

Mining the reviews of movie trailers on YouTube and comments on Yahoo Movies Mining the reviews of movie trailers on YouTube and comments on Yahoo Movies Li-Chen Cheng* Chi Lun Huang Department of Computer Science and Information Management, Soochow University, Taipei, Taiwan,

More information

Rank hotels on Expedia.com to maximize purchases

Rank hotels on Expedia.com to maximize purchases Rank hotels on Expedia.com to maximize purchases Nishith Khantal, Valentina Kroshilina, Deepak Maini December 14, 2013 1 Introduction For an online travel agency (OTA), matching users to hotel inventory

More information

SELECTING ADS RELEVANT TO LIVE EVENTS TO AN ONLINE AUDIENCE

SELECTING ADS RELEVANT TO LIVE EVENTS TO AN ONLINE AUDIENCE Technical Disclosure Commons Defensive Publications Series April 02, 2015 SELECTING ADS RELEVANT TO LIVE EVENTS TO AN ONLINE AUDIENCE Fong Shen Tao Huang jian Chen Claire Cui Xiaodan Song Follow this and

More information

Modeling of competition in revenue management Petr Fiala 1

Modeling of competition in revenue management Petr Fiala 1 Modeling of competition in revenue management Petr Fiala 1 Abstract. Revenue management (RM) is the art and science of predicting consumer behavior and optimizing price and product availability to maximize

More information

Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling

Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling Peter I. Hofgesang and Wojtek Kowalczyk Free University of Amsterdam, Department of Computer Science, Amsterdam, The Netherlands

More information

Improving information delivery to oil and gas exploration and field development teams. Experiences from Shell

Improving information delivery to oil and gas exploration and field development teams. Experiences from Shell Improving information delivery to oil and gas exploration and field development teams Experiences from Shell Roger Abel, Shell and Paul Cleverley Flare Solutions Limited, First published in Hart s E&P,

More information

Concept-Based Readability Measurement and Adjustment for Web Services Descriptions

Concept-Based Readability Measurement and Adjustment for Web Services Descriptions Concept-Based Readability Measurement and Adjustment for Web Services Descriptions Pananya Sripairojthikoon, Twittie Senivongse Department of Computer Engineering, Faculty of Engineering, Chulalongkorn

More information

CHAPTER 2 LITERATURE SURVEY

CHAPTER 2 LITERATURE SURVEY 10 CHAPTER 2 LITERATURE SURVEY This chapter provides the related work that has been done about the software performance requirements which includes the sub sections like requirements engineering, functional

More information

How to Get More Value from Your Survey Data

How to Get More Value from Your Survey Data Technical report How to Get More Value from Your Survey Data Discover four advanced analysis techniques that make survey research more effective Table of contents Introduction..............................................................3

More information

REASONING ABOUT CUSTOMER NEEDS IN MULTI-SUPPLIER ICT SERVICE BUNDLES USING DECISION MODELS

REASONING ABOUT CUSTOMER NEEDS IN MULTI-SUPPLIER ICT SERVICE BUNDLES USING DECISION MODELS REASONING ABOUT CUSTOMER NEEDS IN MULTI-SUPPLIER ICT SERVICE BUNDLES USING DECISION MODELS Sybren de Kinderen, Jaap Gordijn and Hans Akkermans The Network Institute, VU University Amsterdam, The Netherlands

More information

Task Control for bpm'online. User Manual

Task Control for bpm'online. User Manual Task Control for bpm'online User Manual Index of Contents Task Control for bpm'online OVERVIEW 3 Installation 4 Activities form 6 Quick filters at the Activities section 8 Activities control 9 child tasks

More information

Event-based Social Networks: Linking the Online and Offline Social Worlds

Event-based Social Networks: Linking the Online and Offline Social Worlds Event-based Social Networks: Linking the Online and Offline Social Worlds Xingjie Liu *, Qi He #, Yuanyuan Tian #, Wang-Chien Lee *, John McPherson #, Jiawei Han + The Pensalvania State University *, IBM

More information

Adaptive Multi-attribute Diversity for Recommender Systems

Adaptive Multi-attribute Diversity for Recommender Systems Adaptive Multi-attribute Diversity for Recommender Systems Tommaso Di Noia 1, Jessica Rosati 2,1, Paolo Tomeo 1, Eugenio Di Sciascio 1 1 Polytechnic University of Bari Via Orabona, 4 70125 Bari, Italy

More information

A Propagation-based Algorithm for Inferring Gene-Disease Associations

A Propagation-based Algorithm for Inferring Gene-Disease Associations A Propagation-based Algorithm for Inferring Gene-Disease Associations Oron Vanunu Roded Sharan Abstract: A fundamental challenge in human health is the identification of diseasecausing genes. Recently,

More information

RECOMMENDER SYSTEM IN RETAIL

RECOMMENDER SYSTEM IN RETAIL RECOMMENDER SYSTEM IN RETAIL 2015 Shoppers Drug Mart. All rights reserved. Unauthorized duplication or distribution in whole or in part via any channel without written permission strictly prohibited. TRADITIONAL

More information

A SIMULATION STUDY OF THE ROBUSTNESS OF THE LEAST MEDIAN OF SQUARES ESTIMATOR OF SLOPE IN A REGRESSION THROUGH THE ORIGIN MODEL

A SIMULATION STUDY OF THE ROBUSTNESS OF THE LEAST MEDIAN OF SQUARES ESTIMATOR OF SLOPE IN A REGRESSION THROUGH THE ORIGIN MODEL A SIMULATION STUDY OF THE ROBUSTNESS OF THE LEAST MEDIAN OF SQUARES ESTIMATOR OF SLOPE IN A REGRESSION THROUGH THE ORIGIN MODEL by THILANKA DILRUWANI PARANAGAMA B.Sc., University of Colombo, Sri Lanka,

More information

Application and Evaluation of the Personal Software Process

Application and Evaluation of the Personal Software Process 55 Application and Evaluation of the Personal Software Process Hamdy K.Elminir #1, Eman A.Khereba *1, Mohamed Abu Elsoud #1, Ibrahim El-Hennawy #2 1 Computer Science department, Mansoura University, 60

More information

Research Article Research on E-Commerce Platform-Based Personalized Recommendation Algorithm

Research Article Research on E-Commerce Platform-Based Personalized Recommendation Algorithm Applied Computational Intelligence and So Computing Volume 2016, Article ID 5160460, 7 pages http://dx.doi.org/.1155/2016/5160460 Research Article Research on E-Commerce Platform-Based Personalized Recommendation

More information

OroTimesheet User Guide

OroTimesheet User Guide OroTimesheet User Guide www.orotimesheet.com Copyright 1996-2018 OroLogic Inc. Revision 8.57.0 Table des matières I Contents Contents...I OroTimesheet User Guide...1 Presentation...1 Starting off on the

More information

Experiences in the Use of Big Data for Official Statistics

Experiences in the Use of Big Data for Official Statistics Think Big - Data innovation in Latin America Santiago, Chile 6 th March 2017 Experiences in the Use of Big Data for Official Statistics Antonino Virgillito Istat Introduction The use of Big Data sources

More information

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction

Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction Paper SAS1774-2015 Predictive Modeling Using SAS Visual Statistics: Beyond the Prediction ABSTRACT Xiangxiang Meng, Wayne Thompson, and Jennifer Ames, SAS Institute Inc. Predictions, including regressions

More information

A Social Network-Based Recommender System (SNRS)

A Social Network-Based Recommender System (SNRS) A Social Network-Based Recommender System (SNRS) Jianming He and Wesley W. Chu Computer Science Department University of California, Los Angeles, CA 90095 Email: {jmhek, wwc}@cs.ucla.edu Abstract Social

More information

The Lazy Man s Cash Formula

The Lazy Man s Cash Formula The Lazy Man s Cash Formula Copy Exactly How I Generate $1,927 Per Day Online Almost Effortlessly By Mr. X Now, let s start making some automated revenue as I promised you! Legal Disclaimers All material

More information

Clustering Method using Item Preference based on RFM for Recommendation System in u-commerce

Clustering Method using Item Preference based on RFM for Recommendation System in u-commerce Clustering Method using Item Preference based on RFM for Recommendation System in u-commerce Young Sung Cho 1, Song Chul Moon 2, Seon-phil Jeong 3, In-Bae Oh 4, Keun Ho Ryu 1 1 Department of Computer Science,

More information

IDEAL CUSTOMER AVATAR TOOLKIT

IDEAL CUSTOMER AVATAR TOOLKIT IDEAL CUSTOMER AVATAR TOOLKIT Ideal Customer AVATAR TOOLKIT 2 According to the SBA, 95% of small businesses fail within the first 5 years. One of the main reasons these businesses fail is the lack of a

More information

Achieve Better Insight and Prediction with Data Mining

Achieve Better Insight and Prediction with Data Mining Clementine 12.0 Specifications Achieve Better Insight and Prediction with Data Mining Data mining provides organizations with a clearer view of current conditions and deeper insight into future events.

More information

1. Search, for finding individual or sets of documents and files

1. Search, for finding individual or sets of documents and files WHITE PAPER TURNING UNSTRUCTURED TEXT INTO INSIGHT Extending Business Intelligence With Text Analysis and Search EXECUTIVE SUMMARY While traditional business intelligence (BI) has transformed business

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information

Personality-Based Recommendation in E-Commerce

Personality-Based Recommendation in E-Commerce Personality-Based Recommendation in E-Commerce Ciro Bologna 1, Anna Chiara De Rosa 2, Alfonso De Vivo 3, Matteo Gaeta 4, Giuseppe Sansonetti 4, and Valeria Viserta 2 1 Qui! Group S.p.A. 2 Poste Italiane

More information

Using the structure of overlap between search results to rank retrieval systems without relevance judgments

Using the structure of overlap between search results to rank retrieval systems without relevance judgments Information Processing and Management 43 (7) 1059 1070 www.elsevier.com/locate/infoproman Using the structure of overlap between search results to rank retrieval systems without relevance judgments Anselm

More information

Exercises in Environmental Physics

Exercises in Environmental Physics Exercises in Environmental Physics Valerio Faraoni Exercises in Environmental Physics Valerio Faraoni Physics Department Bishop s University Lennoxville, Quebec J1M 1Z7 Canada vfaraoni@cs-linux.ubishops.ca

More information

TEXT MINING APPROACH TO EXTRACT KNOWLEDGE FROM SOCIAL MEDIA DATA TO ENHANCE BUSINESS INTELLIGENCE

TEXT MINING APPROACH TO EXTRACT KNOWLEDGE FROM SOCIAL MEDIA DATA TO ENHANCE BUSINESS INTELLIGENCE International Journal of Advance Research In Science And Engineering http://www.ijarse.com TEXT MINING APPROACH TO EXTRACT KNOWLEDGE FROM SOCIAL MEDIA DATA TO ENHANCE BUSINESS INTELLIGENCE R. Jayanthi

More information

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista

2 Maria Carolina Monard and Gustavo E. A. P. A. Batista Graphical Methods for Classifier Performance Evaluation Maria Carolina Monard and Gustavo E. A. P. A. Batista University of São Paulo USP Institute of Mathematics and Computer Science ICMC Department of

More information

A Unified Theory of Software Testing Bret Pettichord 16 Feb 2003

A Unified Theory of Software Testing Bret Pettichord 16 Feb 2003 A Unified Theory of Software Testing Bret Pettichord 16 Feb 2003 This paper presents a theory, or model, for analyzing and understanding software test techniques. It starts by developing a theory for describing

More information

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara

Customer Relationship Management in marketing programs: A machine learning approach for decision. Fernanda Alcantara Customer Relationship Management in marketing programs: A machine learning approach for decision Fernanda Alcantara F.Alcantara@cs.ucl.ac.uk CRM Goal Support the decision taking Personalize the best individual

More information

Business Customer Value Segmentation for strategic targeting in the utilities industry using SAS

Business Customer Value Segmentation for strategic targeting in the utilities industry using SAS Paper 1772-2018 Business Customer Value Segmentation for strategic targeting in the utilities industry using SAS Spyridon Potamitis, Centrica; Paul Malley, Centrica ABSTRACT Numerous papers have discussed

More information

Toshio Takahara ( ) 1. Introduction. 2. Types of Object and Object Changes

Toshio Takahara ( ) 1. Introduction. 2. Types of Object and Object Changes The General Picture of TRIZ From the Viewpoint of Changing s A Method of Resolving Differences Based on the Concepts of Functions and Process s Part 3 Toshio Takahara ( ) Abstract An important thing is

More information

Application of neural network to classify profitable customers for recommending services in u-commerce

Application of neural network to classify profitable customers for recommending services in u-commerce Application of neural network to classify profitable customers for recommending services in u-commerce Young Sung Cho 1, Song Chul Moon 2, and Keun Ho Ryu 1 1. Database and Bioinformatics Laboratory, Computer

More information

Optimization of Reinforced Concrete Frames by Harmony Search Method

Optimization of Reinforced Concrete Frames by Harmony Search Method 11 th World Congress on Structural and Multidisciplinary Optimisation 07 th -12 th, June 2015, Sydney Australia Optimization of Reinforced Concrete Frames by Harmony Search Method Moacir Kripka, Deise

More information

PM Created on 1/14/ :49:00 PM

PM Created on 1/14/ :49:00 PM Created on 1/14/2015 12:49:00 PM Table of Contents... 1 Lead@UVa Online Training... 1 Introduction and Navigation... 1 Logging Into and Navigating the Site... 2 Managing Notes and Attachments... 9 Customizing

More information

Splitting Approaches for Context-Aware Recommendation: An Empirical Study

Splitting Approaches for Context-Aware Recommendation: An Empirical Study Splitting Approaches for Context-Aware Recommendation: An Empirical Study Yong Zheng, Robin Burke, Bamshad Mobasher ACM SIGAPP the 29th Symposium On Applied Computing Gyeongju, South Korea, March 26, 2014

More information

Tweet Segmentation Using Correlation & Association

Tweet Segmentation Using Correlation & Association 2017 IJSRST Volume 3 Issue 3 Print ISSN: 2395-6011 Online ISSN: 2395-602X Themed Section: Science and Technology Tweet Segmentation Using Correlation & Association Mr. Umesh A. Patil 1, Miss. Madhuri M.

More information

KNIFE SHARPENING BY MORS KOCHANSKI DOWNLOAD EBOOK : KNIFE SHARPENING BY MORS KOCHANSKI PDF

KNIFE SHARPENING BY MORS KOCHANSKI DOWNLOAD EBOOK : KNIFE SHARPENING BY MORS KOCHANSKI PDF Read Online and Download Ebook KNIFE SHARPENING BY MORS KOCHANSKI DOWNLOAD EBOOK : KNIFE SHARPENING BY MORS KOCHANSKI PDF Click link bellow and free register to download ebook: KNIFE SHARPENING BY MORS

More information

9. Verification, Validation, Testing

9. Verification, Validation, Testing 9. Verification, Validation, Testing (a) Basic Notions (b) Dynamic testing. (c) Static analysis. (d) Modelling. (e) Environmental Simulation. (f) Test Strategies. (g) Tool support. (h) Independent Verification

More information

TABLE OF CONTENTS ix

TABLE OF CONTENTS ix TABLE OF CONTENTS ix TABLE OF CONTENTS Page Certification Declaration Acknowledgement Research Publications Table of Contents Abbreviations List of Figures List of Tables List of Keywords Abstract i ii

More information

Corporate Profile

Corporate Profile www.datamine.gr Corporate Profile 1 www.datamine.gr 2 Contents About Datamine 4 Innovative Products for Demanding Business Scenarios 5 CAS for Telecoms 6 CAS for Retailers 7 Segment Designer 8 Corporate

More information

Review Manager Guide

Review Manager Guide Guide February 5, 2018 - Version 9.5 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

More information

ANVIL MARKETING SERVICES

ANVIL MARKETING SERVICES ANVIL MARKETING SERVICES Media Marketing for Local Business 2012 SEO and Social Media Consultant +44 208 144 0973 Angus Watson International Sales Manager +44 203 608 4007 http://anvilmarketing.co.uk 1

More information

Now, I wish you lots of pleasure while reading this report. In case of questions or remarks please contact me at:

Now, I wish you lots of pleasure while reading this report. In case of questions or remarks please contact me at: Preface Somewhere towards the end of the second millennium the director of Vision Consort bv, Hans Brands, came up with the idea to do research in the field of embedded software architectures. He was particularly

More information

Application of Intelligent Methods for Improving the Performance of COCOMO in Software Projects

Application of Intelligent Methods for Improving the Performance of COCOMO in Software Projects Application of Intelligent Methods for Improving the Performance of COCOMO in Software Projects Mahboobeh Dorosti,. Vahid Khatibi Bardsiri Department of Computer Engineering, Kerman Branch, Islamic Azad

More information

Wind turbine vibration study: a data driven methodology

Wind turbine vibration study: a data driven methodology University of Iowa Iowa Research Online Theses and Dissertations Fall 2009 Wind turbine vibration study: a data driven methodology Zijun Zhang University of Iowa Copyright 2009 Zijun Zhang This thesis

More information

Multidimensional Aptitude Battery-II (MAB-II) Extended Report

Multidimensional Aptitude Battery-II (MAB-II) Extended Report Multidimensional Aptitude Battery-II (MAB-II) Extended Report Name: Sam Sample A g e : 30 (Age Group 25-34) Gender: Male Report Date: January 17, 2017 The profile and report below are based upon your responses

More information

RELATION-BASED ITEM SLOTTING

RELATION-BASED ITEM SLOTTING RELATION-BASED ITEM SLOTTING A Thesis presented to the Faculty of the Graduate School University of Missouri In Partial Fulfillment Of the Requirements for the Degree Master of Science by Phichet Wutthisirisart

More information

Introduction. If you are curious of what to expect, then read on

Introduction. If you are curious of what to expect, then read on Introduction If you are reading this, then you re probably preparing to take the Advertising Fundamentals exam and are not quite sure of what s in store for you? or you feel a little under confident about

More information

Reduce paper-based errors

Reduce paper-based errors December 2008 Page 1 Reduce paper-based errors The Syncade Application eliminates the need for paper-based documentation. It enables quick, easy access to electronic information - when it is needed, where

More information

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo

Ensemble Modeling. Toronto Data Mining Forum November 2017 Helen Ngo Ensemble Modeling Toronto Data Mining Forum November 2017 Helen Ngo Agenda Introductions Why Ensemble Models? Simple & Complex ensembles Thoughts: Post-real-life Experimentation Downsides of Ensembles

More information

Finding Right Revenue Models for Digital Goods in. Various Market Segments. - An Analysis of the Japanese Digital Music Market -

Finding Right Revenue Models for Digital Goods in. Various Market Segments. - An Analysis of the Japanese Digital Music Market - Finding Right Revenue Models for Digital Goods in Various Market Segments - An Analysis of the Japanese Digital Music Market - Jiro Kokuryo and Motohiro Hattori Graduate School of Business Administration,

More information

Misinformation Systems

Misinformation Systems Case 1-1 Ackoff s Management Misinformation Systems This case is from a classic article entitled Management Misinformation Systems. It was written by Russell L. Ackoff and appeared in Management Science.

More information

Bandwagon and Underdog Effects and the Possibility of Election Predictions

Bandwagon and Underdog Effects and the Possibility of Election Predictions Reprinted from Public Opinion Quarterly, Vol. 18, No. 3 Bandwagon and Underdog Effects and the Possibility of Election Predictions By HERBERT A. SIMON Social research has often been attacked on the grounds

More information

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET

A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET A STUDY ON STATISTICAL BASED FEATURE SELECTION METHODS FOR CLASSIFICATION OF GENE MICROARRAY DATASET 1 J.JEYACHIDRA, M.PUNITHAVALLI, 1 Research Scholar, Department of Computer Science and Applications,

More information

Can Advanced Analytics Improve Manufacturing Quality?

Can Advanced Analytics Improve Manufacturing Quality? Can Advanced Analytics Improve Manufacturing Quality? Erica Pettigrew BA Practice Director (513) 662-6888 Ext. 210 Erica.Pettigrew@vertexcs.com Jeffrey Anderson Sr. Solution Strategist (513) 662-6888 Ext.

More information

Exploring Similarities of Conserved Domains/Motifs

Exploring Similarities of Conserved Domains/Motifs Exploring Similarities of Conserved Domains/Motifs Sotiria Palioura Abstract Traditionally, proteins are represented as amino acid sequences. There are, though, other (potentially more exciting) representations;

More information

Quality Control Assessment in Genotyping Console

Quality Control Assessment in Genotyping Console Quality Control Assessment in Genotyping Console Introduction Prior to the release of Genotyping Console (GTC) 2.1, quality control (QC) assessment of the SNP Array 6.0 assay was performed using the Dynamic

More information

International Journal of Recent Trends in Electrical & Electronics Engg., Dec IJRTE ISSN:

International Journal of Recent Trends in Electrical & Electronics Engg., Dec IJRTE ISSN: Market Basket Analysis of Sports Store using Association Rules Harpreet Kaur 1 and Kawaljeet Singh 2 1 Department of Computer Science, Punjabi University, Patiala, India. 2 University Computer Centre,

More information

Promote Your Business With LinkedIn

Promote Your Business With LinkedIn Promote Your Business With LinkedIn Greater Aiken SCORE Workshop North Augusta, SC - July 19, 2017 Presented by: Kelley O. Kohr, JD WSI Digital Marketing 1 2 AGENDA LinkedIn Means Business! Get Started

More information