Tag cloud generation for results of multiple keywords queries


1 Tag cloud generation for results of multiple keywords queries Martin Leginus, Peter Dolog and Ricardo Gomes Lage IWIS, Department of Computer Science, Aalborg University

2 What are tag clouds? A tag cloud is a visual retrieval interface depicting the most important terms of a dataset. Tag clouds are either built on top of the entire dataset or generated from a query result (query based tag clouds).

3 What are tag clouds? Tag clouds built on top of the entire dataset.

4 What are tag clouds? Query based tag clouds.

5 Motivation The work is motivated by personalization tasks, surveillance systems and information retrieval tasks defined with multiple keywords.

6 Techniques
Most Frequent Tags from Corpus (MFTC)
Most Frequent Tags from Query Result Set (POP)
The most frequent topics within the system are propagated to the tag cloud. The tag cloud does not cover other, less frequently represented topics which could be relevant for the user.
Term frequency inverse document frequency selection (TFIDF)
For each tag from the documents associated with the query keywords, tf-idf is computed. These values are aggregated and sorted in descending order. Semantic similarities between tags are not considered.

7 Techniques Max coverage selection (COV): maximization of coverage and minimization of overlap between the tags of the tag cloud. Optimizing coverage might result in tag clouds that contain terms with high coverage but are irrelevant to the specific user's information retrieval goal.

8 Graph based techniques
1. The tag space is transformed into a graph.
   1. Calculate the tag pair co-occurrence using Jaccard similarity for all tags.
   2. When the similarity of a tag pair is greater than a predefined threshold α, we consider such tags similar.
   3. Each similar tag pair is transformed into two directed edges, t1 → t2 and t2 → t1.
2. Graph based methods for relevance estimation.
   The algorithms rank the importance of a tag t with respect to the query keywords T_q: I(t | T_q).
   The top-k most relevant tags are selected for the final tag cloud.

9 Graph based techniques 1. The tag space is transformed into a graph. Calculate the tag pair co-occurrence using Jaccard similarity for all tags.
Samuel L. Jackson is assigned to Goodfellas (1990), Pulp Fiction (1994), Die Hard: With a Vengeance (1995), Kill Bill: Vol. 2 (2004).
Tarantino is assigned to Reservoir Dogs (1992), Pulp Fiction (1994), Four Rooms (1995), Jackie Brown (1997), Kill Bill: Vol. 1 (2003), Kill Bill: Vol. 2 (2004).
The two tags co-occur at Pulp Fiction and Kill Bill: Vol. 2, i.e. at 2 of the 8 distinct movies, so JAC(Samuel L. Jackson; Tarantino) = 2/8 = 0.25.
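The Jaccard computation and the thresholded graph construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `jaccard` and `build_tag_graph`, the dictionary layout and the threshold value 0.2 are hypothetical and do not come from the slides.

```python
from itertools import combinations


def jaccard(items_a, items_b):
    """Jaccard similarity of two sets of items annotated with a tag."""
    union = items_a | items_b
    return len(items_a & items_b) / len(union) if union else 0.0


def build_tag_graph(tag_items, alpha):
    """Return directed edges (t1, t2) and (t2, t1) for every tag pair
    whose Jaccard similarity is strictly greater than alpha."""
    edges = set()
    for t1, t2 in combinations(sorted(tag_items), 2):
        if jaccard(tag_items[t1], tag_items[t2]) > alpha:
            edges.add((t1, t2))
            edges.add((t2, t1))
    return edges


# Tag assignments taken from the slide example.
tag_items = {
    "Samuel L. Jackson": {
        "Goodfellas", "Pulp Fiction",
        "Die Hard: With a Vengeance", "Kill Bill: Vol. 2",
    },
    "Tarantino": {
        "Reservoir Dogs", "Pulp Fiction", "Four Rooms",
        "Jackie Brown", "Kill Bill: Vol. 1", "Kill Bill: Vol. 2",
    },
}

similarity = jaccard(tag_items["Samuel L. Jackson"], tag_items["Tarantino"])
edges = build_tag_graph(tag_items, alpha=0.2)  # alpha is an illustrative value
```

Comparing every tag pair is quadratic in the number of tags; for datasets of the size reported later in the slides, a real implementation would presumably compare only tags that share at least one item.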


11 Graph based techniques Graph based methods for relevance estimation: distance based approaches (computationally expensive) and stochastic approaches (simulation of a random traversal of the graph). In this work, we focus only on stochastic approaches.

12 Stochastic Graph based techniques The importance of nodes in the graph is measured through the simulation of a stochastic process, i.e., a random traversal of the graph. A transition probability is defined from a node to every node that has an incoming edge from it.

13 Stochastic Graph based techniques [Figure: example graph with the nodes Bruce Willis, Reservoir Dogs, Unbreakable, Samuel L. Jackson, Quentin Tarantino, Pulp Fiction, Kill Bill Vol. 2]

14 Stochastic Graph based techniques Starts a random walk from the Pulp Fiction node: 5 options of transitions.

15 Stochastic Graph based techniques Jumped to the Bruce Willis tag: only three options of transitions.

16 Stochastic Graph based techniques Jumped to the Unbreakable tag: only three options of transitions.

17 Stochastic Graph based techniques The random walk converges after some time: if it is run longer, the share of time the token stays at a given node remains the same.

18 Stochastic Graph based techniques At each step of the random walk, it is possible to perform a random restart, which starts the random walk again from one of the root nodes (the query tags).

19 PageRank with priors Relative importance with respect to the query tags is introduced through a vector of prior probabilities. The random surfer is given a back probability, i.e. the probability of jumping back to the query tags. The resulting ranks, biased towards the query tags, are considered the definition of importance after convergence, i.e. I(t | T_q). The method requires setting several parameters, such as the back probability and the prior probabilities, with respect to a specific dataset.
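A minimal sketch of PageRank with priors follows. It assumes the commonly used update rule, rank(v) = (1 - β) · Σ p(v | u) · rank(u) + β · prior(v), with a uniform transition probability over the outgoing edges of u; neither the formula nor the parameter values survived in the transcription, so both are assumptions here. The graph, the function names and β = 0.3 are hypothetical.

```python
def pagerank_with_priors(out_edges, priors, beta=0.3, iterations=100):
    """Iterate the assumed update rule for a fixed number of steps.

    out_edges: dict mapping each node to a list of its successors.
    priors:    dict of prior probabilities (missing nodes count as 0).
    beta:      back probability of restarting at the prior distribution.
    """
    nodes = list(out_edges)
    rank = {v: priors.get(v, 0.0) for v in nodes}
    for _ in range(iterations):
        # Restart mass, distributed according to the priors.
        new_rank = {v: beta * priors.get(v, 0.0) for v in nodes}
        # Walk mass, spread uniformly over outgoing edges (assumption).
        for u in nodes:
            targets = out_edges[u]
            if not targets:
                continue
            share = (1.0 - beta) * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


def top_k_tags(rank, query_tags, k):
    """Select the k highest ranked tags that are not query tags."""
    candidates = [t for t in rank if t not in query_tags]
    return sorted(candidates, key=lambda t: rank[t], reverse=True)[:k]


# Hypothetical symmetric tag graph; the edges are illustrative only.
graph = {
    "Pulp Fiction": ["Tarantino", "Samuel L. Jackson", "Bruce Willis"],
    "Tarantino": ["Pulp Fiction", "Samuel L. Jackson"],
    "Samuel L. Jackson": ["Pulp Fiction", "Tarantino", "Unbreakable"],
    "Bruce Willis": ["Pulp Fiction", "Unbreakable"],
    "Unbreakable": ["Bruce Willis", "Samuel L. Jackson"],
}
query_tags = {"Pulp Fiction"}
priors = {"Pulp Fiction": 1.0}

ranks = pagerank_with_priors(graph, priors)
cloud = top_k_tags(ranks, query_tags, k=3)
```

Because every node in this example has at least one outgoing edge and the priors sum to 1, the ranks keep summing to 1 across iterations; isolated nodes would need separate handling.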

20 HITS with priors The same prior probabilities and a back probability are used.

21 K step Markov Chain This method differs in the implementation of the random surfer model. The surfer is implemented with a path length limitation K, which determines how often we jump back to the root nodes. Relative importance with respect to the root nodes is introduced through a vector of prior probabilities.
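One way to read the K step Markov chain is as a walk that starts from the prior distribution, takes exactly K steps, and scores each node by the probability mass that visits it along the way. The sketch below follows that reading, which is an assumption because the slide gives no formula; the uniform transition probabilities, the function name `k_step_markov` and the example graph are hypothetical.

```python
def k_step_markov(out_edges, priors, k=3):
    """Accumulate the probability mass reaching each node in steps 1..k
    of a walk that starts from the prior distribution."""
    current = {v: priors.get(v, 0.0) for v in out_edges}
    score = {v: 0.0 for v in out_edges}
    for _ in range(k):
        nxt = {v: 0.0 for v in out_edges}
        for u, targets in out_edges.items():
            if not targets:
                continue
            # Uniform transition probability over outgoing edges (assumption).
            share = current[u] / len(targets)
            for v in targets:
                nxt[v] += share
        current = nxt
        for v in out_edges:
            score[v] += current[v]
    return score


# Hypothetical symmetric tag graph; the edges are illustrative only.
graph = {
    "Pulp Fiction": ["Tarantino", "Samuel L. Jackson", "Bruce Willis"],
    "Tarantino": ["Pulp Fiction", "Samuel L. Jackson"],
    "Samuel L. Jackson": ["Pulp Fiction", "Tarantino", "Unbreakable"],
    "Bruce Willis": ["Pulp Fiction", "Unbreakable"],
    "Unbreakable": ["Bruce Willis", "Samuel L. Jackson"],
}
priors = {"Pulp Fiction": 1.0}

one_step = k_step_markov(graph, priors, k=1)
three_steps = k_step_markov(graph, priors, k=3)
```

A small K keeps the scores concentrated near the query tags, while a larger K lets the walk drift towards generally well connected tags.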

22 Prior probabilities A uniform prior distribution results in the inclusion of irrelevant tags in the final tag cloud. Instead, the priors are set according to the relative popularity of the query tags.


24 Datasets Bibsonomy contains 206k items, 51k tags and 466k tagging posts. Movielens contains 16k tags, 7k movies and 95k tagging posts. Delicious contains 187k tags, 355k bookmarks and 2046k tagging posts.

25 Synthetic metrics Synthetic metrics express the quality of the tag selection process (Venetis 2011). Relevance of a tag cloud: expresses how relevant the tags in the tag cloud are to the query tags. We compute the average relevance of all tags in the tag cloud. The metric captures the extent to which the resources associated with the tag cloud tags overlap with the resources retrieved by the query tags. We do not consider coverage, as this metric might be misleading: tags with high coverage can be irrelevant and not discriminative enough for retrieval tasks.
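The averaging described above can be sketched as follows. The per-tag formula did not survive in the transcription, so the sketch assumes a common overlap definition: the fraction of a tag's documents that also appear in the query result. All names and the toy data are hypothetical.

```python
def tag_relevance(tag_docs, query_docs):
    """Fraction of the tag's documents that the query also retrieves
    (assumed definition of per-tag relevance)."""
    if not tag_docs:
        return 0.0
    return len(tag_docs & query_docs) / len(tag_docs)


def cloud_relevance(cloud, tag_to_docs, query_docs):
    """Average relevance over all tags in the tag cloud."""
    if not cloud:
        return 0.0
    total = sum(tag_relevance(tag_to_docs[t], query_docs) for t in cloud)
    return total / len(cloud)


# Toy data: document ids retrieved by the query and per candidate tag.
query_docs = {1, 2, 3, 4}
tag_to_docs = {
    "specific": {1, 2},             # fully inside the query result
    "broad": {1, 2, 5, 6, 7, 8},    # mostly outside the query result
}

specific_score = tag_relevance(tag_to_docs["specific"], query_docs)
broad_score = tag_relevance(tag_to_docs["broad"], query_docs)
average = cloud_relevance(["specific", "broad"], tag_to_docs, query_docs)
```

Under this definition a tag whose documents lie entirely within the query result scores highest, which matches the idea of a tag acting as a more specific subtopic of the retrieved documents.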

26 [Figure legend: a set of tags issued as a query; documents retrieved by the keyword query; documents associated with each of two candidate tags.] One tag is more relevant than the other: it can be perceived as a more specific subtopic of the documents returned by the query and is more discriminative for filtering purposes.

27 Results Bibsonomy

28 Results Movielens

29 Results Delicious

30 Limitations The methods do not perform well on datasets with a long tail distribution of tags. This is caused by the way the tag space is transformed into a graph structure.

31 Conclusions The graph based methods perform best on the Movielens and Bibsonomy datasets. We proposed an extension of the setting of prior probabilities for the random walk based algorithms. The methods do not perform well on the Delicious dataset.

32 Future work Propose an enhanced graph creation. Enhance tag selection to generate more diverse and novel tag clouds. Extend the synthetic metrics so that they better capture the diversity and novelty of tag clouds.

33 Questions


35 Possible questions: Why is there a need to adjust the prior probabilities? When a rarely used tag is chosen as a query tag, it does not co-occur with many other tags. Therefore, few edges connect its graph node with other nodes. A random traversal of the graph initiated from the rarely used tag (node) might reach unimportant or irrelevant nodes (tags). Consequently, irrelevant tags are included in the tag cloud. We verified this assumption with a series of preliminary evaluations.

36 Possible questions: Why is Delicious different? There are many very frequent tags in the dataset, i.e., almost 20 tags that were assigned at least [...] times and almost 500 tags that were assigned by users at least 1000 times. On the other hand, there are tags used fewer than 10 times. The underlying co-occurrence graph links very frequent tags with very rarely used tags. As a result, more frequent tags are included in the tag clouds, and this inclusion lowers relevance.


More information

Creation of a PAM matrix

Creation of a PAM matrix Rationale for substitution matrices Substitution matrices are a way of keeping track of the structural, physical and chemical properties of the amino acids in proteins, in such a fashion that less detrimental

More information

Text Mining. Theory and Applications Anurag Nagar

Text Mining. Theory and Applications Anurag Nagar Text Mining Theory and Applications Anurag Nagar Topics Introduction What is Text Mining Features of Text Document Representation Vector Space Model Document Similarities Document Classification and Clustering

More information

Evaluating Tagging Behavior in Social Bookmarking Systems: Metrics and design heuristics

Evaluating Tagging Behavior in Social Bookmarking Systems: Metrics and design heuristics Evaluating Tagging Behavior in Social Bookmarking Systems: Metrics and design heuristics Umer Farooq 1, Thomas G. Kannampallil 1, Yang Song 2, Craig H. Ganoe 1, John M. Carroll 1, and C. Lee Giles 2 1

More information

A Propagation-based Algorithm for Inferring Gene-Disease Associations

A Propagation-based Algorithm for Inferring Gene-Disease Associations A Propagation-based Algorithm for Inferring Gene-Disease Associations Oron Vanunu Roded Sharan Abstract: A fundamental challenge in human health is the identification of diseasecausing genes. Recently,

More information

Ph.D. Defense: Resource Allocation Optimization in the Smart Grid and High-performance Computing Tim Hansen

Ph.D. Defense: Resource Allocation Optimization in the Smart Grid and High-performance Computing Tim Hansen Ph.D. Defense: Resource Allocation Optimization in the Smart Grid and High-performance Computing Tim Hansen Department of Electrical and Computer Engineering Colorado State University Fort Collins, Colorado,

More information

Latent Semantic Analysis. Hongning Wang

Latent Semantic Analysis. Hongning Wang Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:

More information

Thesaurus based Keyword Extraction. Luit Gazendam (Novay) Christian Wartena (Novay) Rogier Brussee (Univ. of Applied Sciences Utrecht)

Thesaurus based Keyword Extraction. Luit Gazendam (Novay) Christian Wartena (Novay) Rogier Brussee (Univ. of Applied Sciences Utrecht) Thesaurus based Keyword Extraction Luit Gazendam (Novay) Christian Wartena (Novay) Rogier Brussee (Univ. of Applied Sciences Utrecht) Problem description Keywords used for organising and retrieval documents

More information

XPLODIV: An Exploitation-Exploration Aware Diversification Approach for Recommender Systems

XPLODIV: An Exploitation-Exploration Aware Diversification Approach for Recommender Systems Proceedings of the Twenty-Eighth International Florida Artificial Intelligence Research Society Conference XPLODIV: An Exploitation-Exploration Aware Diversification Approach for Recommender Systems Andrea

More information

Inferring Social Ties across Heterogeneous Networks

Inferring Social Ties across Heterogeneous Networks Inferring Social Ties across Heterogeneous Networks CS 6001 Complex Network Structures HARISH ANANDAN Introduction Social Ties Information carrying connections between people It can be: Strong, weak or

More information

Advanced Job Daimler. Julian Leweling, Daimler AG

Advanced Job Daimler. Julian Leweling, Daimler AG Advanced Job Analytics @ Daimler Julian Leweling, Agenda From Job Ads to Knowledge: Advanced Job Analytics @ Daimler About Why KNIME? Our Inspiration Use Case KNIME Walkthrough Application Next steps Advanced

More information

Derek Davis, Gerardo Figueroa, and Yi-Shin Chen

Derek Davis, Gerardo Figueroa, and Yi-Shin Chen IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 47, NO. 6, JUNE 2017 979 SociRank: Identifying and Ranking Prevalent News Topics Using Social Media Factors Derek Davis, Gerardo Figueroa,

More information

LEAST COST SEARCH ALGORITHM FOR THE IDENTIFICATION OF A DNAPL SOURCE

LEAST COST SEARCH ALGORITHM FOR THE IDENTIFICATION OF A DNAPL SOURCE LEAST COST SEARCH ALGORITHM FOR THE IDENTIFICATION OF A DNAPL SOURCE Z. DOKOU and G. F. PINDER Research Center for Groundwater Remediation Design, University of Vermont, Department of Civil and Environmental

More information

Experimental Techniques 2

Experimental Techniques 2 Experimental Techniques 2 High-throughput interaction detection Yeast two-hybrid - pairwise organisms as machines to learn about organisms yeast, worm, fly, human,... low intersection between repeated

More information

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong

Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Progress Report: Predicting Which Recommended Content Users Click Stanley Jacob, Lingjie Kong Machine learning models can be used to predict which recommended content users will click on a given website.

More information

Silvia Calegari, Marco Comerio, Andrea Maurino,

Silvia Calegari, Marco Comerio, Andrea Maurino, A Semantic and Information Retrieval based Approach to Service Contract Selection Silvia Calegari, Marco Comerio, Andrea Maurino, Emanuele Panzeri, and Gabriella Pasi Department of Informatics, Systems

More information

Final Project Report CS224W Fall 2015 Afshin Babveyh Sadegh Ebrahimi

Final Project Report CS224W Fall 2015 Afshin Babveyh Sadegh Ebrahimi Final Project Report CS224W Fall 2015 Afshin Babveyh Sadegh Ebrahimi Introduction Bitcoin is a form of crypto currency introduced by Satoshi Nakamoto in 2009. Even though it only received interest from

More information

A Site Observation Directed Test Pattern Generation Method for Reducing Defective Part Level

A Site Observation Directed Test Pattern Generation Method for Reducing Defective Part Level A Site Observation Directed Test Pattern Generation Method for Reducing Defective Part Level Michael R. Grimaila, Sooryong Lee, Jennifer Dworak, M. Ray Mercer, and Jaehong Park Department of Electrical

More information

Inference and computing with decomposable graphs

Inference and computing with decomposable graphs Inference and computing with decomposable graphs Peter Green 1 Alun Thomas 2 1 School of Mathematics University of Bristol 2 Genetic Epidemiology University of Utah 6 September 2011 / Bayes 250 Green/Thomas

More information

Finding Similar Tweets and Similar Users by Applying Document Similarity to Twitter Streaming Data

Finding Similar Tweets and Similar Users by Applying Document Similarity to Twitter Streaming Data Proc. vol.6,no2,2013,pp.22-30 Schl. ITE Tokai Univ. Vol. xx,no.xx,20xx,pp.xxx -xxx Paper Paper Finding Similar Tweets and Similar Users by Applying Document Similarity to Twitter Streaming Data by Iwao

More information

Predicting ratings of peer-generated content with personalized metrics

Predicting ratings of peer-generated content with personalized metrics Predicting ratings of peer-generated content with personalized metrics Project report Tyler Casey tyler.casey09@gmail.com Marius Lazer mlazer@stanford.edu [Group #40] Ashish Mathew amathew9@stanford.edu

More information

Novartis E2E CM case study

Novartis E2E CM case study Technical R&D/CHAD CM Unit Novartis E2E CM case study Markus Krumme, CM Unit Head Cambridge, MA September 26, 2016 Continuous Manufacturing at Novartis Basel ~300 m 2 productive area, 2 upstream trains,

More information

MINING SUPPLIERS FROM ONLINE NEWS DOCUMENTS

MINING SUPPLIERS FROM ONLINE NEWS DOCUMENTS MINING SUPPLIERS FROM ONLINE NEWS DOCUMENTS Chih-Ping Wei, Department of Information Management, National Taiwan University, Taipei, Taiwan, R.O.C., cpwei@im.ntu.edu.tw Lien-Chin Chen, Department of Information

More information

Automatic Tagging and Categorisation: Improving knowledge management and retrieval

Automatic Tagging and Categorisation: Improving knowledge management and retrieval Automatic Tagging and Categorisation: Improving knowledge management and retrieval 1. Introduction Unlike past business practices, the modern enterprise is increasingly reliant on the efficient processing

More information

Machine Learning: Algorithms and Applications

Machine Learning: Algorithms and Applications Machine Learning: Algorithms and Applications Floriano Zini Free University of Bozen-Bolzano Faculty of Computer Science Academic Year 2011-2012 Lecture 4: 19 th March 2012 Evolutionary computing These

More information

Eyal Carmi. Google, 76 Ninth Avenue, New York, NY U.S.A. Gal Oestreicher-Singer and Uriel Stettner

Eyal Carmi. Google, 76 Ninth Avenue, New York, NY U.S.A. Gal Oestreicher-Singer and Uriel Stettner RESEARCH NOTE IS OPRAH CONTAGIOUS? THE DEPTH OF DIFFUSION OF DEMAND SHOCKS IN A PRODUCT NETWORK Eyal Carmi Google, 76 Ninth Avenue, New York, NY 10011 U.S.A. {eyal.carmi@gmail.com} Gal Oestreicher-Singer

More information

CLASS/YEAR: II MCA SUB.CODE&NAME: MC7303, SOFTWARE ENGINEERING. 1. Define Software Engineering. Software Engineering: 2. What is a process Framework? Process Framework: UNIT-I 2MARKS QUESTIONS AND ANSWERS

More information

Spatial Information in Offline Approximate Dynamic Programming for Dynamic Vehicle Routing with Stochastic Requests

Spatial Information in Offline Approximate Dynamic Programming for Dynamic Vehicle Routing with Stochastic Requests 1 Spatial Information in Offline Approximate Dynamic Programming for Dynamic Vehicle Routing with Stochastic Requests Ansmann, Artur, TU Braunschweig, a.ansmann@tu-braunschweig.de Ulmer, Marlin W., TU

More information

A Systematic Approach to Performance Evaluation

A Systematic Approach to Performance Evaluation A Systematic Approach to Performance evaluation is the process of determining how well an existing or future computer system meets a set of alternative performance objectives. Arbitrarily selecting performance

More information

ABSTRACT. Timetable, Urban bus network, Stochastic demand, Variable demand, Simulation ISSN:

ABSTRACT. Timetable, Urban bus network, Stochastic demand, Variable demand, Simulation ISSN: International Journal of Industrial Engineering & Production Research (09) pp. 83-91 December 09, Volume, Number 3 International Journal of Industrial Engineering & Production Research ISSN: 08-4889 Journal

More information