Similarity and trust metrics used in Recommender Systems: A survey


Maryam Jallouli 1, Sonia Lajmi 1,2, Ikram Amous 1
1 MIRACL Laboratory, Technopole of Sfax, University of Sfax, P.O.Box 242, 3031 Sfax, Tunisia
2 Al Baha University, Saudi Arabia
Jallouli.maryam@gmail.com, slajmi@bu.edu.sa, ikram.amous@isecs.rnu.tn

Abstract. Recommender systems suggest the most appropriate items to users in order to help customers find the most relevant items and to facilitate sales. The collaborative filtering recommendation algorithm is the most successful recommendation technique. Since collaborative filtering systems rely on neighbors as their source of information, the quality of their recommendations depends on how these neighbors are selected. Neighbors can be selected using either similarity or trust metrics. In this paper, we analyze these two types of neighbor selection metrics as used in the recommendation literature. For each type, we first define it and then review the different metrics that have been proposed.

Keywords: Recommender system, neighbor selection, similarity metric, trust metric.

1 Introduction

With the growing development of social networks such as Facebook and Twitter and of video streaming websites such as YouTube and Dailymotion, companies are more and more interested in publishing their products on social networks to increase their profits. These products, called items, can be of different sorts: friends, movies, music, books, news, images, etc. On the other hand, the number of social network users keeps increasing. For example, according to a statistic¹, Facebook has 1.13 billion users, 1.03 billion of whom log on every day. Recommender systems have consequently emerged as a way to offer each user the most appropriate items. In general, the recommendation process consists of an input, a recommender system algorithm, and an output. These steps are shown in Fig. 1.
Input can be obtained in two steps. First, we collect the users' history of interactions with items. How this history is represented depends on the type of RS used; it can be a vector, a matrix or a tensor. The typical input of most RS algorithms is a rating matrix. It consists of an m × n table, where each row represents a user, each column represents an item, and the intersection of a row and a column holds the rating the user gave to the item [1]. An empty cell indicates that the user has not yet rated the item. The second step is to calculate the distance between the target user and the other users in order to find the target user's nearest neighbors. The output of this step is an m × m user matrix, where m is the number of users and cell (u, v) holds the similarity or trust intensity between users u and v. In traditional approaches, neighbors are chosen based on similarity, whereas the latest approaches are based on trust.

¹ July 2016.

Concerning recommender system algorithms, the objective is to predict the missing ratings in the rating matrix. In the last decade, several recommender system approaches have been proposed. They can be divided into three distinct classes: content-based, collaborative filtering and hybrid approaches. Collaborative filtering (CF) is the most widely used technique to build recommender systems (RS) [1]. For a target user, it finds the user's nearest neighbors from the user matrix and, according to these neighbors' interests, predicts the target user's interest in an item he has not yet rated. The rating is computed as a weighted average of the neighbors' ratings. The success of CF therefore depends critically on the similarity or trust metric used to find the most similar or trusted users (neighbors).

Fig. 1. General structure of a recommender system.

From the above process, the second step of the input, which is the most important step and the focus of this paper, is to obtain the user matrix. According to Fig. 1, this matrix can be obtained with either similarity or trust metrics. These two types of metrics are detailed in the next two sections.
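The CF prediction step described above, a weighted average of the neighbors' ratings, can be illustrated with a minimal user-based sketch. It assumes, for illustration only, that the rating matrix R is stored sparsely as a dictionary mapping each user to a dictionary of item ratings, that neighbors are weighted by the cosine similarity over co-rated items (Section 2.2), and that only the k most similar users who rated the target item are kept. The function names cosine_sim and predict and the parameter k are hypothetical and do not come from the surveyed works.

```python
import math
from typing import Dict, Optional

# Sparse rating matrix R: user -> {item: rating}; absent items are unrated.
Ratings = Dict[str, Dict[str, float]]


def cosine_sim(ratings: Ratings, u: str, v: str) -> float:
    """Cosine of the angle between the rating vectors of u and v,
    restricted to the items both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings: Ratings, u: str, item: str, k: int = 2) -> Optional[float]:
    """Predict u's rating of `item` as the similarity-weighted average of the
    ratings given to `item` by u's k most similar users who rated it."""
    candidates = [
        (cosine_sim(ratings, u, v), ratings[v][item])
        for v in ratings
        if v != u and item in ratings[v]
    ]
    # Keep the k most similar users and drop non-positive weights.
    neighbors = [(s, r) for s, r in sorted(candidates, reverse=True)[:k] if s > 0]
    total_weight = sum(s for s, _ in neighbors)
    if total_weight == 0:
        return None  # no usable neighbor: the rating stays missing
    return sum(s * r for s, r in neighbors) / total_weight
```

For example, with ratings = {"alice": {"i1": 5, "i2": 3}, "bob": {"i1": 5, "i2": 3, "i3": 4}, "carol": {"i1": 1, "i2": 5, "i3": 2}}, predict(ratings, "alice", "i3") returns about 3.2: bob (similarity 1.0) rated i3 as 4 and carol (similarity about 0.67) rated it as 2. Replacing cosine_sim with a trust lookup gives the trust-based variant of the same prediction step.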
For simplicity, a number of notations are introduced. $R$ denotes the 2-D rating matrix, with user-item dimensions. Each value in this matrix, denoted by $r_{u,i}$, represents the rating of user $u$ for item $i$. $m$ and $n$ denote the number of users and items, respectively. $T$ is a 2-D trust matrix with user-user dimensions. Each cell in this matrix, denoted by $t_{u,v}$, represents the trust value between users $u$ and $v$.

Our survey is organized as follows. First, we present some similarity metrics used in this field. Second, we detail specific metrics related to trust. To this end, we define the notion of trust in recommender systems and then summarize the four properties of trust. Then, we review three categories of approaches in this area: trust metrics based on (i) trust propagation, (ii) users' interactions and (iii) rating vectors. The three main research contributions of this paper are as follows: (1) we clarify most of the terms used in this field and thus remove ambiguities; (2) we propose a new categorization of trust metrics, which can be based on trust propagation, users' interactions or users' ratings; and (3) unlike other surveys, which present only one kind of approach, we present both similarity and trust metrics.

2 Similarity Metrics

2.1 Definitions

In the field of RS, the similarity between two users is measured by considering all the items that have been rated by both users. Several mathematical formulations can be used to calculate this similarity. As can be seen in the formulas below, each formula sums terms over the set of items co-rated by the two users, denoted $I_{u,v}$. Cosine similarity and the Pearson correlation coefficient are the most widely used metrics. To address the shortcomings of these two measures, Bayesian similarity [2] has been proposed as an alternative metric. In the following formulas, $\mathrm{sim}(u,v)$ denotes the similarity value between users $u$ and $v$.

2.2 Metrics Based On Vectors of Ratings

Cosine similarity (COS). The earliest work using the cosine similarity metric [3] for user-user CF is [4].
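This similarity metric [3] represents each user by a vector containing the user's ratings, and defines the similarity between two users as the cosine of the angle between their vectors, which is why it is also known as vector-based similarity. Restricting the vectors to the set $I_{u,v}$ of items rated by both users, the cosine similarity between two users $u$ and $v$ is defined in Eq. 1:

$$\mathrm{sim}(u,v)=\cos(\vec{r}_u,\vec{r}_v)=\frac{\vec{r}_u\cdot\vec{r}_v}{\|\vec{r}_u\|\,\|\vec{r}_v\|}=\frac{\sum_{i\in I_{u,v}} r_{u,i}\,r_{v,i}}{\sqrt{\sum_{i\in I_{u,v}} r_{u,i}^{2}}\,\sqrt{\sum_{i\in I_{u,v}} r_{v,i}^{2}}} \qquad (1)$$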

where $\vec{r}_u\cdot\vec{r}_v$ denotes the dot product of the two vectors and $\|\cdot\|$ the Euclidean norm.

Pearson correlation coefficient (PCC). This formula [3] is based on how much the ratings of two users on their co-rated items deviate from each user's average rating. The PCC of two users $u$ and $v$ is defined in Eq. 2:

$$\mathrm{sim}(u,v)=\frac{\sum_{i\in I_{u,v}}(r_{u,i}-\bar{r}_u)(r_{v,i}-\bar{r}_v)}{\sqrt{\sum_{i\in I_{u,v}}(r_{u,i}-\bar{r}_u)^{2}}\,\sqrt{\sum_{i\in I_{u,v}}(r_{v,i}-\bar{r}_v)^{2}}} \qquad (2)$$

where $\bar{r}_u$ and $\bar{r}_v$ are the mean ratings of users $u$ and $v$, and $I_{u,v}$ is the set of items rated by both users.

Bayesian similarity (BS). This approach, designed by [2], takes into consideration both the direction and the length of rating vectors. The rating similarity between a pair of users is measured by excluding the chance correlation and the user bias from the overall similarity, as in Eq. 3:

$$\mathrm{sim}(u,v)=s'(u,v)-s_c(u,v)-\alpha \qquad (3)$$

where $s'(u,v)$ denotes the raw similarity between users $u$ and $v$, $s_c(u,v)$ denotes the chance correlation and $\alpha$ is a constant representing the general user bias.

The similarity metrics presented in this section are general. The next section is more specific: it details the trust metrics used in the literature to improve recommender systems.

3 Trust Metrics

3.1 Definitions

Several definitions of trust have been proposed in the context of recommender systems [5] [6]. However, these definitions rest on different kinds of evidence, and a solid definition of trust can in many cases be quite elusive. We therefore categorize trust metrics into those based on trust propagation, users' interactions and rating vectors; the definition of trust differs from one category to another.

In RS, trust can be broadly classified into two types: explicit and implicit. Explicit trust refers to situations where trust information is specified by the users themselves. For example, users in FilmTrust [7] can declare other users as trusted neighbors. However, this information is usually not available, either because of privacy concerns or because users are unwilling to make the effort to provide it. Hence, implicit methods are preferred: efforts have been made to infer trust with computational methods rather than having it specified by users. In the following formulas, $t_{u,v}$ denotes the trust value from user $u$ to user $v$.
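In the explicit case described above, trust statements are directed and usually sparse, so the trust matrix T can be sketched as a nested dictionary keyed first by the truster and then by the trustee. The sketch below is illustrative only: the statement format (truster, trustee, value), the normalization of trust values to [0, 1] and the helper names build_trust_matrix and trust_value are assumptions, not the FilmTrust data format. A missing entry is returned as None, marking a trust value that an implicit or propagation-based metric would have to infer.

```python
from typing import Dict, Iterable, Optional, Tuple

# Sparse, directed trust matrix T: truster -> {trustee: t_uv}.
TrustMatrix = Dict[str, Dict[str, float]]


def build_trust_matrix(statements: Iterable[Tuple[str, str, float]]) -> TrustMatrix:
    """Store each explicit statement (u, v, value) as the directed entry t_uv.
    Values are assumed to be normalized to [0, 1]."""
    trust: TrustMatrix = {}
    for truster, trustee, value in statements:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"trust value out of [0, 1]: {value}")
        trust.setdefault(truster, {})[trustee] = value
    return trust


def trust_value(trust: TrustMatrix, u: str, v: str) -> Optional[float]:
    """Return t_uv if u has explicitly stated trust in v, else None (unknown)."""
    return trust.get(u, {}).get(v)
```

With T = build_trust_matrix([("alice", "bob", 0.9), ("bob", "carol", 0.6)]), trust_value(T, "alice", "bob") is 0.9 while trust_value(T, "bob", "alice") is None: storing one direction says nothing about the other.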
3.2 Properties

Four distinct properties that can be attributed to trust are identified in [8].

Asymmetry. Trust varies from one user to another: it is personal and subjective. Toward the same target user, each user may hold a different opinion, depending on his understanding of that user. For example, considering two users $u$ and $v$, the fact that $u$ trusts $v$ does not mean that $v$ trusts $u$.

Transitivity. In real life, people tend to trust a friend of a friend rather than a stranger. Based on this assumption, transitivity means that if user $u$ trusts $v$ and $v$ trusts $w$, it can be concluded that $u$ trusts $w$ to some degree. This information can be inferred by propagating trust in social networks to identify trusted friends.

Dynamicity. Trust changes over time as evidence and experience change: it can increase with positive feedback and decrease with negative feedback, so it is neither stable nor fixed. Moreover, trust between two users is difficult to establish but easy to break: forming solid trust requires a lot of evidence, whereas a little evidence suffices to decrease it.

Context dependence. Trust depends on the context of interest. A user may trust another user in one specific situation but not necessarily in another, where the situation refers to the context, such as time and location. For example, a user $u$ can trust a user $v$ in the context of electronics but not in the context of movies.

3.3 Metrics Based On Trust Propagation

In trust propagation methods, the system infers the trust intensity between two users $u$ and $v$ from the non-empty values of the trust matrix. First, the trust intensities contained in this matrix are represented as a graph, named the trust graph, in which nodes represent users and edges represent the trust values between them. In general, the trust matrix is incomplete, which means that the trust graph is incomplete too.
Therefore, trust propagation, which computes the trust intensity on missing edges, is one of the most important processes. Several trust metrics based on trust propagation have been proposed in the literature. In this section, we describe how Advogato, Appleseed, TidalTrust and MoleTrust compute the indirect trust between pairs of users and how trust propagates in the trust network.

Advogato. The Advogato maximum-flow trust metric was proposed by Levien [9] in order to discover the most trustworthy users. It takes as input a set of trustworthy seed users and a number of users defined as base users. This metric uses a breadth-first search algorithm: each node receives a capacity that depends on its distance $d$ from the seeds and on the out-degree of the nodes, as in Eq. 4:

$$\mathrm{cap}(d)=\frac{\mathrm{cap}(d-1)}{\overline{\mathrm{outdeg}}(d-1)} \qquad (4)$$

where $\overline{\mathrm{outdeg}}(d-1)$ is the average out-degree of the nodes at distance $d-1$ from the seeds and $\mathrm{cap}(0)$ is the capacity of the seeds.

Appleseed. Appleseed was proposed by Ziegler [10] in his PhD thesis. The approach is similar to Advogato, but its basic intuition is motivated by the spreading activation mechanism: Appleseed models trust as energy and propagates it using spreading activation models.
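The capacity assignment of Advogato can be sketched as a breadth-first search from the seeds. This is a minimal sketch under stated assumptions: the trust graph is given as adjacency lists, all seeds share one initial capacity (the default of 800 is purely illustrative), and the capacity of level d is that of level d-1 divided by the average out-degree of the nodes at level d-1, following Eq. 4. It stops before the maximum-flow computation that Advogato then runs over these capacities, and the function name advogato_capacities is hypothetical.

```python
from collections import deque
from typing import Dict, List


def advogato_capacities(
    graph: Dict[str, List[str]], seeds: List[str], seed_capacity: float = 800.0
) -> Dict[str, float]:
    """Capacity of every node reachable from the seeds, following
    cap(d) = cap(d-1) / avg_outdeg(d-1), where d is the BFS distance."""
    if not seeds:
        return {}
    # Breadth-first search from all seeds at once gives each node its distance d.
    dist: Dict[str, int] = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in dist:
                dist[succ] = dist[node] + 1
                queue.append(succ)

    # Group reached nodes by distance; BFS guarantees levels 0..max are non-empty.
    levels: Dict[int, List[str]] = {}
    for node, d in dist.items():
        levels.setdefault(d, []).append(node)

    # Each level's capacity is the previous one divided by the previous
    # level's average out-degree (always > 0, since level d was reached).
    level_cap = {0: seed_capacity}
    for d in range(1, max(levels) + 1):
        prev = levels[d - 1]
        avg_out = sum(len(graph.get(n, [])) for n in prev) / len(prev)
        level_cap[d] = level_cap[d - 1] / avg_out
    return {node: level_cap[d] for node, d in dist.items()}
```

For the graph {"seed": ["a", "b"], "a": ["c"], "b": ["c", "d"]}, the seed keeps 800, a and b receive 800 / 2 = 400, and c and d receive 400 / 1.5, so capacity shrinks quickly as nodes get farther from the trusted seeds.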

TidalTrust. Golbeck proposed TidalTrust [11] to compute trust predictions. She performs a modified breadth-first search to find the most trusted users along the shortest paths. To compute the trust value between $u$ and $v$, this metric aggregates the trust values between $u$'s direct neighbors and $v$, weighted by the direct trust values between $u$ and these neighbors, as in Eq. 5:

$$t_{u,v}=\frac{\sum_{w\in N(u)} t_{u,w}\,t_{w,v}}{\sum_{w\in N(u)} t_{u,w}} \qquad (5)$$

where $N(u)$ denotes the set of neighbors of $u$. Note that $t_{w,v}$ is computed recursively if it is an indirect trust value.

MoleTrust. The authors of [12] introduce MoleTrust as a trust metric. Its underlying idea is similar to TidalTrust in that it also uses explicit trust, but the two differ in how trust is propagated: TidalTrust only follows the shortest paths between the two users, whereas MoleTrust propagates trust up to a predefined maximum depth (the trust propagation horizon). The trust value from $u$ to $v$ aggregates the direct trust values that the users directly trusting $v$ assign to $v$, weighted by the trust of $u$ in these users, as in Eq. 6:

$$t_{u,v}=\frac{\sum_{w\in In(v)} t_{u,w}\,t_{w,v}}{\sum_{w\in In(v)} t_{u,w}} \qquad (6)$$

where $In(v)$ is the set of in-links of $v$, that is, the set of users for whom $v$ is a direct neighbor.

3.4 Metrics Based On Users' Interactions

Mayer et al. [5] defined trust as "the willingness of a party to be vulnerable to the actions of another party based on the expectation that the other will perform a particular action important to the trustor, irrespective of the ability to monitor or control that other party". Users' interactions can be used to infer such implicit trust. For example, if user $u$ wrote positive comments about many items rated by user $v$, then $u$ will consider $v$ an expert and hence a trustworthy user. In this context, we describe the three most used and effective trust metrics based on users' interactions. Each of them takes into consideration a group of factors and uses them as criteria to select trusted users. These factors are defined by each approach as follows.

Expertise and Preference based Trust (EPT). This approach was proposed by Kim and Phalak [14].
The key idea of this work is that a user trusts an expert in the user's own area of interest. For example, suppose that, according to his history of activities in a social network, user $u$ is interested in movies, and that user $j$ has a good reputation for movies and electronic products because he writes many helpful comments. In this case, we can expect $u$ to trust $j$ to some extent in the movie category, but not in the electronic products category. Based on this assumption, Kim and Phalak take expertise (reputation) and preference (interest) as the main factors of trust. The degree of trust between two users $u$ and $j$ is thus a function of the expertise of the content provider $j$ and the preference of the content user $u$:

$$t_{u,j}=\sum_{c=1}^{C} p_{u,c}\,e_{j,c} \qquad (7)$$

where $p_{u,c}\in[0,1]$ is the preference level of user $u$ for category $c$, $e_{j,c}\in[0,1]$ is the expertise of content provider $j$ in category $c$, and $C$ is the total number of categories. The functions computing a content provider's expertise in a category and a content user's preference for a category are defined in their paper [14].

Trust Antecedent Framework (TAF). In [15], the authors assume that trust results from three main trustee factors [5], namely ability, benevolence and integrity, and one trustor factor known as trust propensity. Ability refers to the skills and competence needed to deliver the desired outcome, benevolence is the willingness to do good to the trustor, and integrity is adherence to a set of sound moral principles. Trust propensity, on the other hand, describes how easily a trustor trusts someone. Given these factors, the trust value is computed (Eq. 8) from an interaction intensity from $u$ to $v$ and the average rating received by $v$, which together determine the ability factor, along with a term determining benevolence, a term measuring integrity, and the normalized trust propensity of the trustor. The formula of each factor is defined in their paper [15].

Extended Trust Antecedent Framework (ETAF). In [16], the authors use the four generic trust factors of [15], but in a different way: the formalization of each trust factor is separated into two perspectives, local and global.
Local trustworthiness means that the trust factors are computed from the direct interactions between the two users, as in [15], whereas global trustworthiness means that the trust factors are modeled from the interactions of all users with the target trustor or trustee. Both kinds of trustworthiness are taken into consideration by [16]: the trust value $t_{u,v}$ is derived (Eq. 9) from $L_{u,v}$ and $G_{u,v}$, the local and global trustworthiness of user $v$ from the perspective of user $u$. The formulas of these metrics are presented in detail in [16].

3.5 Metrics Based On Vectors of Ratings

Since our focus here is to infer trust from user ratings, the most appropriate definition of trust in this case is the one proposed by Guo [6]: trust is defined as "one's belief towards the ability of others in providing valuable ratings". In this section, we review different trust metrics based on rating vectors used in the literature.

k-nearest recommenders (KNR). Lathia et al. [18] compute a predicted trust value by considering the ratings of other users. In particular, a user who provides ratings, even opposite ones, is considered more trustworthy than one who is not willing to share opinions. Here, trust is obtained by averaging the values provided over all co-rated items:

$$t_{u,v}=\frac{1}{|I_{u,v}|}\sum_{i\in I_{u,v}}\left(1-\frac{|r_{u,i}-r_{v,i}|}{r_{\max}}\right) \qquad (10)$$

where $I_{u,v}$ is the set of items rated by both users $u$ and $v$, and $r_{\max}$ is the maximum value of the rating scale.

Pearson Correlation Coefficient (PCC). In [19], Papagelis et al. adopt the PCC metric to first compute user similarity:

$$\mathrm{sim}(u,v)=\frac{\sum_{i\in I_{u,v}}(r_{u,i}-\bar{r}_u)(r_{v,i}-\bar{r}_v)}{\sqrt{\sum_{i\in I_{u,v}}(r_{u,i}-\bar{r}_u)^{2}}\,\sqrt{\sum_{i\in I_{u,v}}(r_{v,i}-\bar{r}_v)^{2}}} \qquad (11)$$

In a second step, they define $\theta_s$ and $\theta_n$ as the thresholds on user similarity and on the number of co-rated items, respectively. The trust value between two users $u$ and $v$ is then defined as:

$$t_{u,v}=\begin{cases}\mathrm{sim}(u,v) & \text{if } \mathrm{sim}(u,v)\ge\theta_s \text{ and } |I_{u,v}|\ge\theta_n\\ 0 & \text{otherwise}\end{cases} \qquad (12)$$

Trust-based CF. In [20], Hwang and Chen predict a rating with a simple version of Resnick's prediction formula, based on a single user $v$:

$$p^{(v)}_{u,i}=\bar{r}_u+\left(r_{v,i}-\bar{r}_v\right) \qquad (13)$$

where $\bar{r}_u$ and $\bar{r}_v$ are the mean ratings of users $u$ and $v$, respectively. The trust value is then calculated by averaging the prediction error over the co-rated items:

$$t_{u,v}=\frac{1}{|I_{u,v}|}\sum_{i\in I_{u,v}}\left(1-\frac{|p^{(v)}_{u,i}-r_{u,i}|}{r_{\max}}\right) \qquad (14)$$

By analogy with this work, [21] computes predictions with Resnick's formula but computes trust from the mean squared distance (MSD) between predicted and actual ratings:

$$t_{u,v}=1-\frac{1}{|I_{u,v}|}\sum_{i\in I_{u,v}}\left(\frac{p^{(v)}_{u,i}-r_{u,i}}{r_{\max}}\right)^{2} \qquad (15)$$

In their approach, if $t_{u,v}\ge\theta$, where $\theta$ is a predefined threshold, the two users are considered trusted neighbors.

4 Conclusion

In this paper, we discussed the process of making recommendations in CF approaches. This process consists of three steps: input, a recommendation algorithm and output. The second step of the input concerns the calculation of the user matrix, which can be obtained with two distinct types of metrics: similarity and trust metrics. This paper presented a review of these two types of metrics: similarity and trust metrics were first defined and then reviewed in detail. In future work, we intend to present in detail the second step of the recommendation process, namely the recommendation algorithm.

References

1. Adomavicius, G., Tuzhilin, A.: Toward the Next Generation of Recommender Systems: A Survey of the State of the Art and Possible Extensions. IEEE Transactions on Knowledge and Data Engineering, vol. 17 (2005)
2. Guo, G., Zhang, J., Yorke-Smith, N.: A Novel Bayesian Similarity Measure for Recommender Systems. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI) (2013)
3. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining, chapter 2. Pearson Addison Wesley (2006)
4. Breese, J.S., Heckerman, D., Kadie, C.: Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (1998)
5. Mayer, R.C., Davis, J.H., Schoorman, F.D.: An Integrative Model of Organizational Trust. Academy of Management Review
6. Guo, G.: Integrating Trust and Similarity to Ameliorate the Data Sparsity and Cold Start for Recommender Systems. In: Proceedings of the 7th ACM Conference on Recommender Systems (RecSys) (2013)
7. Golbeck, J., Hendler, J.: FilmTrust: Movie Recommendations using Trust in Web-based Social Networks. In: Proceedings of the IEEE Consumer Communications and Networking Conference (CCNC), vol. 1 (2006)
8. Castelfranchi, C., Falcone, R.: Trust Theory: A Socio-Cognitive and Computational Model. Wiley (2010)
9. Levien, R., Aiken, A.: Advogato's Trust Metric. Available online (2002)
10. Ziegler, C.-N.: Towards Decentralized Recommender Systems. PhD thesis, University of Freiburg (2005)
11. Golbeck, J.A.: Computing and Applying Trust in Web-based Social Networks. PhD thesis (2005)
12. Massa, P., Avesani, P.: Trust-aware Recommender Systems. In: Proceedings of the 2007 ACM Conference on Recommender Systems (2007)
13. Mayer, R.C., Davis, J.H., Schoorman, F.D.: An Integrative Model of Organizational Trust. Academy of Management Review (1995)
14. Kim, Y.A., Phalak, R.: A Trust Prediction Framework in Rating-based Experience Sharing Social Networks without a Web of Trust. Information Sciences, vol. 191 (2012)
15. Nguyen, V.-A., Lim, E.-P., Jiang, J., Sun, A.: To Trust or Not to Trust? Predicting Online Trusts using Trust Antecedent Framework (2008)
16. Guo, G., Zhang, J., Thalmann, D., Yorke-Smith, N.: ETAF: An Extended Trust Antecedents Framework for Trust Prediction. In: International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2014)
17. Guo, G.: Integrating Trust and Similarity to Ameliorate the Data Sparsity and Cold Start for Recommender Systems. In: 7th ACM Conference on Recommender Systems (RecSys) (2013)
18. Lathia, N., Hailes, S., Capra, L.: Trust-based Collaborative Filtering. In: Trust Management II (2008)
19. Papagelis, M., Plexousakis, D., Kutsuras, T.: Alleviating the Sparsity Problem of Collaborative Filtering using Trust Inferences. In: Trust Management (2005)
20. Hwang, C.-S., Chen, Y.-P.: Using Trust in Collaborative Filtering Recommendation. In: New Trends in Applied Artificial Intelligence (2007)
21. Shambour, Q., Lu, J.: A Trust-Semantic Fusion-based Recommendation Approach for e-Business Applications. Decision Support Systems (2012)